For smallholder farmers in Uganda, accessing credit is difficult. There are many reasons for this, one being a disconnect between agriculture and finance. Financial institutions know that smallholders have a large unmet demand for credit. According to the CGAP Smallholder Families Data Hub, more than half of Uganda’s smallholders say credit is important for their agricultural activities. But meeting this demand can be complex and expensive, and lenders tend to be risk-averse when it comes to farmers. More than two-thirds of Uganda’s smallholders derive most of their income from agriculture. Farming income is seasonal, variable and hard to predict, and financial institutions hesitate to lend to people whose income streams are irregular.
But what if incomes, as well as expenses and other cashflows around agriculture, were easier to predict? Even as the alternative lending space fills up with new players applying data-driven models, agricultural lending, particularly to smallholder growers, remains a frontier area. This is despite the fact that smallholders are often a large untapped client segment for lenders. If the financial inclusion community is going to build truly scalable models for financing sparse, remote populations, then we need to find smarter ways of using data to understand smallholders as customers: their needs, behaviors, incentives and, crucially, their agricultural activities.
Recently, CGAP started working with Harvesting, an agricultural fintech company, and PRIDE Microfinance Limited, Uganda’s largest microfinance institution (MFI), to develop a new credit scoring system for lending to farmers. Given the enormous impact that effective credit scoring could have on rural livelihoods if it achieves scale, we are working to create a data-driven solution using demographic, transactional, production and financial data. The following are some early insights from our collaboration that may be useful to other providers who are exploring alternative credit scoring.
You’ll need data . . . a lot of data
Credit scoring is a scale business. To develop a traditional credit scorecard with reasonable predictive power, you need a lot of data from thousands, ideally tens of thousands, of farmers. If you have data on 5,000 farmers and the default rate is 5 percent (typical of an MFI’s agrilending portfolio), then you have only 250 defaulters from whom to learn what went wrong. Understanding what went wrong is a good way to understand not just what is driving defaults but also how to improve the quality of credit products. With PRIDE, we will seek to improve how client data are collected and see what additional data points are needed. We expect that this will rapidly increase the quantity and quality of data being collected on lending decisions and outcomes.
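The arithmetic above can be sketched in a few lines of Python. This is only an illustration: the function names are hypothetical, and the target of 500 defaulters in the usage example is an arbitrary assumption, not a figure from the project.

```python
import math
from fractions import Fraction


def expected_defaulters(n_borrowers: int, default_rate: float) -> int:
    """Expected number of defaulted borrowers in a portfolio."""
    # Fraction(str(...)) keeps 0.05 exact and avoids floating-point drift.
    rate = Fraction(str(default_rate))
    return round(n_borrowers * rate)


def borrowers_needed(target_defaulters: int, default_rate: float) -> int:
    """Smallest portfolio expected to contain at least `target_defaulters` defaults."""
    rate = Fraction(str(default_rate))
    return math.ceil(target_defaulters / rate)


# The example from the text: 5,000 farmers at a 5 percent default rate.
print(expected_defaulters(5000, 0.05))  # 250

# Hypothetical target: how many borrowers for 500 observed defaults?
print(borrowers_needed(500, 0.05))  # 10000
```

At a fixed default rate, the number of defaults available to learn from grows only in proportion to the portfolio, which is why scale matters.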
Focus on collecting high-quality new data, not cleaning up old data
The best data point to predict the likelihood of a borrower defaulting on a loan will always be his or her repayment history. But that’s a catch-22. If a prospective borrower can’t access credit, how can he or she build a credit history? For this reason, many smallholders don’t have credit histories. Even when they do, historical data can be challenging for lenders because they are often incomplete or of poor quality. Reconstructing or fixing old data can be very time-consuming and of limited use. A smarter approach is to look forward and focus on building a better system for capturing new, high-quality data.
Automate your data collection
Credit scoring is only one part of a long and often complex system of lending. Map and analyze the entire lending process, from customer acquisition to loan repayment, to understand where else data and technology can drive improvements. Technology can be used to improve productivity — for example, by replacing paper loan applications with tablet-based ones. But as we look at process automation, the impact of technology will hinge (in the short term, at least) on human capacity. Building the capacity of loan officers, IT teams and data analysts through training and workshops is critical to getting credit scoring and other processes working efficiently. Reinforcing the importance of collecting the right, clean data upfront can save time and effort further down the line, and loan officer training is a critical step to ensure quality data are collected.
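As one illustration of collecting clean data upfront, a tablet-based application form could check each record at the point of capture, before it is saved. The sketch below is a minimal example under assumed conditions: the field names (farmer_id, land_size_acres, main_crop, requested_amount_ugx) and the rules are hypothetical and do not describe PRIDE's actual loan application.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class LoanApplication:
    # All field names are hypothetical, chosen only for illustration.
    farmer_id: str
    land_size_acres: Optional[float]
    main_crop: str
    requested_amount_ugx: Optional[int]


def validate(app: LoanApplication) -> List[str]:
    """Return a list of problems; an empty list means the record can be saved."""
    problems = []
    if not app.farmer_id.strip():
        problems.append("farmer_id is missing")
    if app.land_size_acres is None or app.land_size_acres <= 0:
        problems.append("land_size_acres must be a positive number")
    if not app.main_crop.strip():
        problems.append("main_crop is missing")
    if app.requested_amount_ugx is None or app.requested_amount_ugx <= 0:
        problems.append("requested_amount_ugx must be a positive amount")
    return problems


# A complete record passes; an incomplete one is flagged for the loan officer.
complete = LoanApplication("F001", 2.5, "maize", 500000)
incomplete = LoanApplication("", None, "", -1)
print(validate(complete))
print(validate(incomplete))
```

Flagging problems while the loan officer is still with the client is far cheaper than trying to repair the record later.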
Get everyone on board with the changes, not just executives
Top management buy-in is critical to changing lending operations. But even with executives on board, everyone in the process needs to see the value and benefits of having good data and to have the right incentives and motivations. When it comes to data-based credit scoring, three distinct teams need to work together to drive change: the product team must ensure the product being developed aligns with institutional strategy and meets the needs of the target market; the risk team needs to advise on risk management measures; and the digital team must keep pulling relevant data to support continuous development of the model. Without these three groups on board, and agreement from top management, change is unlikely to happen.
Recognize that data can discriminate
There is plenty of evidence that data-driven models can discriminate against groups such as women, youth or disabled populations because of biases in the historical data on which these models are built. As CGAP develops credit scoring models that work for the rural poor, one of the key challenges we anticipate is how to correct for these biases. We will share what we learn along the way. One question we will be looking at is whether providers can go beyond neutralizing negative biases to better recognize undervalued characteristics of underserved populations, turning credit scoring systems into stronger tools for financial inclusion.
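One simple way to start looking for such bias is to compare approval rates across groups. The sketch below is an assumption-laden illustration, not CGAP's method: the function names are hypothetical, the decisions are made-up toy data, and a gap in approval rates is only a signal to investigate, not proof of discrimination.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple


def approval_rates(decisions: Iterable[Tuple[str, bool]]) -> Dict[str, float]:
    """Share of approved applications per group, from (group, approved) pairs."""
    totals = defaultdict(int)
    approved = defaultdict(int)
    for group, was_approved in decisions:
        totals[group] += 1
        if was_approved:
            approved[group] += 1
    return {group: approved[group] / totals[group] for group in totals}


def approval_ratio(rates: Dict[str, float], group: str, reference: str) -> float:
    """Approval rate of `group` relative to `reference`; 1.0 means parity."""
    return rates[group] / rates[reference]


# Made-up toy data: 2 of 4 women approved, 3 of 4 men approved.
toy_decisions = [
    ("women", True), ("women", False), ("women", True), ("women", False),
    ("men", True), ("men", True), ("men", True), ("men", False),
]
rates = approval_rates(toy_decisions)
print(rates)
print(approval_ratio(rates, "women", "men"))
```

A ratio well below 1.0 for a group suggests checking whether the model, or the historical data behind it, is penalizing that group for reasons unrelated to repayment.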
Going from traditional lending decisions to credit scoring is an iterative process with many positive feedback loops. It’s worth starting with a simple minimum viable product and building it up as more data feed through the system. With every new loan cycle, new data are generated, models are refined and accuracy is improved. Once the systems are running, new data sources can be integrated into later models to improve the predictive power of the scorecard. Creating a data-driven credit scoring system is hard work, but if done right it has the potential to lower the risk and cost of serving smallholders.