Learning from Financial Inclusion Research: What Should We Expect?
There is a real puzzle in the world of financial inclusion: Where’s the impact? The question is not unique to financial inclusion, but it is a particularly pressing one. After more than 30 years in the spotlight, the lack of clear, compelling and consistent evidence that financial inclusion interventions reduce poverty has raised questions about whether continued investment is wise. There are a number of efforts to look at existing evidence and come to conclusions about whether, where and why financial inclusion matters.
I’ve been thinking a lot about this puzzle lately, in part because I’m engaged in a comprehensive review of the evidence for investing in financial systems for CDC. But I’m not the only one thinking about it. CGAP is working on a new theory of change. IPA has a review of evidence on resilience. Dvara Research has a financial inclusion evidence gap map for India. Caribou Digital has an evidence gap map for digital financial services. 3ie and the Campbell Collaboration have published a “systematic review of reviews” of evidence on financial inclusion interventions. And there are more efforts along similar lines.
Before considering what conclusions these various reviews reach, it’s worth stepping back to think about what we should expect to find in impact evaluations of financial inclusion programs — and what we should expect to learn from reviews (and reviews of reviews) of these impact evaluations.
What should we expect to find in impact evaluations?
One of the reasons that financial inclusion became such a popular sector is that, intuitively, providing access to quality financial services should make a meaningful difference to poor households. We know that these households are subject to a lot of volatility and risk and that they struggle to accumulate usefully large sums.
While not focused at the household level, two decades of empirical research using micro- and macro-economic data have established not only that financial system development precedes significant economic growth, but that the relationship is causal. From history and anthropology, we learn that essentially every civilization and culture going back to ancient Mesopotamia has developed financial services. Research such as financial diaries illustrates that informal financial services precede formal ones and fill gaps once formal services arise.
The evidence from theory, history, anthropology and empirical econometrics is remarkably consistent in demonstrating that financial systems and services are important to households and that they drive economic growth. Shouldn’t we, then, expect to find large positive impact from financial inclusion programs that target households?
Not necessarily. The nature of a financial system is that (a) it is a system and that (b) it is a system with the express purpose of distributing gains across participants. Financial systems drive growth by moving capital from people, places and times where there is a surplus to people, places and times where it is scarce. Put another way, the purpose of a financial system is to create spillover effects — the kind of spillover effects that make it difficult to observe household-level impact. Keep in mind that the financial system is not just the formal financial system. Since we now know that it is rare that households are not already significant users of at least informal financial services before any intervention is put in place, the impact of any intervention is likely to be dispersed through the system. Consider the famous finding that M-Pesa reduced poverty in Kenya. It did this by allowing a very large number of households to increase their engagement in the cash economy (or, alternatively, in the financial system). For each household, the change wasn’t much — a few pennies a day. Those modest benefits do add up in aggregate, but if you are looking for big changes at the household level, you’re not going to see them. Other recent research has shown significant spillover effects from interventions that boost access to the financial system.
Beyond the systemic nature of a financial system, there is also the fact that impact evaluations of financial inclusion interventions do not measure whether financial services have a positive impact on households; they measure whether particular programs have meaningfully improved the financial services available to particular households. Whether a program has done so depends on its design, implementation and many other factors. Impact evaluations that find modest effects do not necessarily tell you anything about the value of the system.
Consider, for instance, that many rigorous impact evaluations of water and sanitation interventions find little to no effect on diarrheal disease. Does that mean that there are no benefits of clean water or sanitation? Of course not. It does mean that the programs aren’t fully dealing with the myriad sources of water contamination. That’s a problem that needs solving, but it’s not a reason to say that clean water and sanitation don’t make a difference for poor households.
What should we expect to learn from reviews of impact evaluations?
The case for basing policy decisions on a review of many impact evaluations, rather than on a single one, is just as intuitive as the case for investing in financial inclusion. It would seem that a review of other reviews would be even better.
Reality is again more complicated. Researchers conducting meta-analyses, as they are called, face a difficult task picking which studies to include. There are a lot of poor-quality studies out there, but trying to select for quality can easily turn into cherry picking. In an attempt to produce a high-quality review, the standard practice is to choose which studies are included “systematically,” by creating a set of objective, transparent rules for what will and won’t be included (e.g., specific details on the program design, location, participant selection, implementer, cost, details on participants, availability of original data, etc.).
But there’s a problem in the research world: Many studies don’t include the information necessary to apply those rules. So rather than selecting for quality and relevance, a systematic meta-analysis is really selecting based on whether the authors wrote a paper that can be included in a meta-analysis. The world would be better if every paper included the important information that is used in systematic reviews, but that’s not the world we live in.
When you extend the systematic review to a “review of reviews,” these problems are amplified. Often, the vast majority of studies are excluded, and it is unclear whether what is included is the most useful evidence for drawing conclusions. You get a situation where very careful and useful meta-analyses designed to uncover critical evidence for program design are excluded. A good example is the exclusion of Rachel Meager’s important work on heterogeneity in microcredit impact evaluations from a recent Campbell Collaboration review of reviews.
Even worse, systematic reviews appear to provide a judgment on whether a category of intervention (in this case, financial inclusion) “works,” when what they are really evaluating is not a category but a limited set of programs designed to influence a category. And they do this while obscuring the policy-relevant details of program implementation that help explain the programs’ impact (or lack thereof).
Systematic reviews and reviews of reviews can be helpful, especially to uncover cherry picking of research to support conclusions. But in the end, what we can learn from them can often be less than what we can learn from a theory-informed, nonsystematic but thorough reading of the research: Financial inclusion is important for poor households, but a particular financial service is not sufficient for them to escape poverty, and we have a lot of work to do to improve the quality, utility and delivery of financial services and the inclusivity of financial systems.
Timothy Ogden is managing director of the Financial Access Initiative, a research center housed at NYU-Wagner focused on how financial services can better meet the needs and improve the lives of low-income households.
Hi Tim. Thanks for citing our recent Review of Reviews (RoR) as an example of systematic evidence appraisal, even if in a discussion of the drawbacks. No doubt, high-level evidence reviews have major shortcomings, as our conclusions note.
Our main response would be that, despite the shortcomings of Systematic Reviews (SRs) and RoRs, they are still vastly preferable to unsystematic and potentially biased lit reviews precisely because they assess the evidence base in a scientific and neutral way, in the sense of applying clearly defined (rather than undisclosed, hence potentially arbitrary or biased) filters. They mirror the approach taken by evidence-based medicine, though it bears mentioning that interventions are often more easily comparable in the medical sciences than in social policy or development.
While your blog raises many interesting points for debate (including concerns we have about the methodology of the M-PESA study you positively highlight) one key thing we’d like to clarify is that our RoR was, in fact, strongly theory-informed. We started out from a theory of change and mapped the evidence (and gaps in it) onto it. We wished that all underlying evidence had been theory-informed, too.
Of course, it was frustrating to have to exclude some SR/Meta-Analysis evidence on the basis of our protocol (which was peer-reviewed and agreed in advance, as part of our being systematic). This didn’t apply to the paper you mention (by Rachel Meager), however, which was neither an SR nor a MA, and hence not applicable.
Fortunately, your final paragraph pretty much summarises what our RoR says, barring the very categorical statement that “Financial inclusion is important for poor households”. We believe readers of our RoR will learn important lessons for practice, including that the evidence is strongest for savings service delivery programmes, and that targeted programme components (rather than financial services alone) matter for social and gender outcomes. (The Campbell plain language summary, for those with less time, is here: https://campbellcollaboration.org/media/k2/attachments/0369_IDCG_Duvend…)
In sum, being more systematic doesn’t mean being less relevant; neither is the converse true.
Maren & Phil
Maren & Phil,
First let me say something that got cut from the original post because of length (even though the kind CGAP editors let me go several hundred words over the normal max): I admire the huge amount of work you did on the RoR. It is very hard and painstaking work to pull off, and it is a useful contribution to the literature. I also hope it's clear that I understand the reasons for the choices made on inclusion/exclusion criteria in systematic reviews and reviews of reviews.
That being said, we have (I hope) a good faith disagreement over the value of systematic reviews and especially of reviews of reviews in domains with highly variable contexts. I do hold to the categorical statement about the value of financial services to poor households, not based on impact evaluations but from theory, history, anthropology and high-frequency data surveys like financial diaries. In my opinion, there is no better evidence for what matters to households than revealed preferences: these households create financial services to meet their needs if/when there are none available. That is a pattern visible across cultures and throughout history.
Our disagreement is encapsulated in the perspective on the Meager paper. Again, I understand why it is excluded from your review. My argument is that exclusions like that do more harm than good in policy terms.
I also disagree strongly with the statement about systematic reviews being "vastly preferable" and the implication that disclosure makes inclusion/exclusion criteria not "potentially arbitrary or biased". The disclosure, or the method for creating the exclusion/inclusion criteria, does nothing to reduce arbitrariness or bias, but it does give the illusion of doing so. An example so that my point is clear: a choice to not consider cost-benefit when assessing evidence is arbitrary and introduces bias (this is a frustration of mine with most of the impact literature in financial inclusion).
Finally, an example of how our disagreement yields different policy conclusions: I believe the evidence for savings interventions is particularly weak. The majority of studies are short-term and have many unexplainable features, but most importantly do not address the business model issues that go along with savings products. That's why, despite the big push around savings interventions roughly 10 years ago, they have fallen out of favor among funders and practitioners.