The latest CCAF Global AI in Financial Services Report reinforces a persistent reality – scaling AI in financial services is being stymied by the dual binding constraints of data quality and availability.
Across respondents surveyed by CCAF, 46% of regulators and 34% of fintechs identify data availability and quality as the leading constraint, while vendors report even sharper challenges among their clients — 72% cite data quality and completeness, and 41% cite data-sharing and privacy restrictions.
These findings are striking not because they are new, but because they are persistent. Despite rapid advances in AI capabilities, the underlying data foundations have not kept pace. CGAP’s forthcoming working paper, “Powering AI with Inclusive Data: A Roadmap for Financial Inclusion,” argues that this is not incidental. We find that AI adoption is constrained more by the strength, inclusiveness, and usability of underlying data than by the sophistication of algorithms. The paper will provide a detailed roadmap for improving data availability and quality to make financial systems more inclusive.
The constraint is data availability as much as quality
While the CCAF survey emphasizes data quality, the constraint is more fundamental. Many financial systems face simultaneous gaps in both the availability and the quality of data needed to support AI.
For large segments of the population, particularly women, informal workers, and micro and small enterprises, data trails remain thin, fragmented, or entirely absent. Even where digital activity exists, it is often not captured or structured in ways that financial institutions can use.
For example, a woman running an informal retail business may transact daily through cash or messaging platforms, but without a formal transaction history or standardized records, these economic activities remain invisible to financial institutions. This creates a data availability constraint, limiting the ability of AI systems to generate reliable and generalizable insights.
At the same time, even when data exists, it is often incomplete, siloed, or not fit for purpose. Because AI models learn from both historical and real-time data, fragmented and biased digital footprints — especially for women, informal workers, and rural users — are carried through and amplified. Weak data foundations, marked by poor quality, limited interoperability, and governance gaps, ultimately limit model accuracy and reinforce bias.
The result is a dual constraint: AI systems are being developed on datasets that are both limited in availability and lacking in reliability. Advancing toward data-driven financial inclusion therefore requires strengthening both dimensions at once, expanding the availability of data trails while improving their quality, structure, and governance. Both the performance and the inclusiveness of AI depend on it.
The “connected but invisible” gap is undermining AI outcomes
A central reason these challenges persist is that data gaps are concentrated among underserved populations.
Across many markets, individuals like the woman in the example above are digitally connected but remain effectively invisible within financial datasets. Their economic lives, often informal, irregular, or outside traditional financial systems, are not adequately captured or recognized. This creates a “connected but invisible” dynamic, where participation in the economy does not translate into visibility within data systems.
As a result, financial institutions continue to rely on narrow, traditional datasets that fail to reflect the realities of large customer segments. When AI systems are trained on these datasets, they do not correct these gaps. Instead, they inherit and scale them.
For instance, AI systems trained on conventional financial data may underestimate women’s creditworthiness or overstate their risk because women are less likely to appear in traditional credit datasets and are often misrepresented by proxies such as formal employment, asset ownership, or stable income.
This dynamic is reflected in broader risks highlighted in CCAF’s survey and in CGAP’s work, including bias, exclusion, and lack of explainability in AI-driven financial services. These risks are not purely algorithmic; they are rooted in who is represented in the data and who is not.
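One simple way to make "who is represented in the data" concrete is to compare each group's share of a dataset with its share of a reference population. The sketch below is illustrative only: the function name, the segment labels, and every number in it are hypothetical and do not come from the CCAF survey or CGAP's paper.

```python
from collections import Counter


def representation_gaps(record_groups, benchmark_shares):
    """Return dataset share minus benchmark share for each group.

    A negative value means the group is under-represented in the dataset
    relative to the benchmark; a positive value means over-represented.
    """
    total = len(record_groups)
    if total == 0:
        raise ValueError("record_groups must not be empty")
    counts = Counter(record_groups)
    return {
        group: counts.get(group, 0) / total - share
        for group, share in benchmark_shares.items()
    }


# Hypothetical training set of 100 records, labeled by customer segment.
records = ["salaried"] * 70 + ["informal"] * 25 + ["msme"] * 5

# Hypothetical population shares, e.g. from a national survey (assumed values).
benchmark = {"salaried": 0.40, "informal": 0.45, "msme": 0.15}

gaps = representation_gaps(records, benchmark)
for group, gap in gaps.items():
    print(f"{group}: {gap:+.2f}")
```

A check like this only shows how large the gap is. It says nothing about whether the records that do exist for a group are accurate or fit for purpose, which is the quality side of the dual constraint.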
The question is not just how to deploy more advanced AI models, but how to build data systems that make AI viable, reliable, and inclusive. This means progressing toward data-driven financial inclusion, where AI is not the starting point but an accelerator that becomes effective only once data systems are sufficiently mature. That shift highlights three priorities.
- First, data systems must be treated as core infrastructure, including through investments in digital public infrastructure such as interoperable data-sharing frameworks, particularly open finance.
- Second, inclusion must be intentional, with deliberate efforts to expand and better represent underserved populations in datasets.
- Third, financial services providers and public sector authorities in data-constrained environments must build or use synthetic datasets, apply advanced sampling techniques, and combine these with alternative data to resolve the “connected but invisible” paradox of individuals who are economically active yet statistically invisible.
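As one illustration of what a sampling-based correction can look like, the sketch below applies post-stratification weighting, a standard survey technique chosen here as an example rather than a method named in the paper. Each record is weighted so that group shares in the weighted data match a target population. The group labels and shares are hypothetical.

```python
from collections import Counter


def poststratification_weights(sample_groups, population_shares):
    """Return one weight per record so weighted group shares match the target.

    weight = target population share / observed sample share for the
    record's group. Groups absent from the sample cannot be corrected
    this way, because there are no records to weight.
    """
    n = len(sample_groups)
    counts = Counter(sample_groups)
    return [population_shares[g] / (counts[g] / n) for g in sample_groups]


# Hypothetical sample: women are 20% of records but 50% of the target population.
sample_groups = ["men"] * 8 + ["women"] * 2
population_shares = {"men": 0.5, "women": 0.5}

weights = poststratification_weights(sample_groups, population_shares)

# Weighted share of women after the correction.
women_weight = sum(w for g, w in zip(sample_groups, weights) if g == "women")
weighted_share_women = women_weight / sum(weights)
print(round(weighted_share_women, 3))
```

Reweighting rebalances groups that are thinly represented, but it cannot create information about people who leave no data trail at all. That is where the other measures in the list, such as alternative data and synthetic datasets, would have to come in.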
AI readiness starts with data foundations
CCAF’s findings point to the need for a fundamental shift in how the industry scales AI. The persistence of data-related constraints makes one point clear: AI’s trajectory in financial services will be determined less by advances in algorithms and more by the availability, quality, and governance of the data systems that underpin them.
Until these foundations are strengthened, data will remain the binding constraint to scaling AI. However, it is also the greatest opportunity. Institutions that invest in building richer, more representative, and better-governed data ecosystems will not only unlock AI’s potential but also define what responsible and inclusive AI looks like in practice.