In no other sector of society does AI have the potential to transform lives as it does in healthcare. But the stakes are too high to trust in technology that is built on a flawed data foundation.
For more than two decades, data management has been one of the most formidable challenges facing biopharma companies.
Even the industry’s most seasoned data engineers often struggle to identify and integrate high-quality sources, thanks to the complexities of healthcare data and an integration process fraught with pitfalls. As a result, datasets are frequently incomplete, biased, or otherwise compromised, leading to skewed analytics, subpar decision-making, and results that range from less-than-optimal to dismal.
Yet many companies that still lack confidence in their datasets (and/or data partners) are on the verge of embracing AI — technology whose performance is dependent on the very data that powers it.
Defying Logic: AI That Is Built on an Incomplete or Biased Data Foundation
AI that is trained and operates on a flawed dataset will deliver the same inaccuracies and impaired insights that have plagued biopharma for decades.
Here is just one example: Continuous patient-level enrollment data is crucial to delivering accurate insurance assignments and payer insights for Commercial and Market Access teams. If a dataset has not been built to enable this granular level of insight, AI cannot compensate for that weakness and will not be able to deliver accurate payer analytics.
Trite But True
Never has the “garbage in, garbage out” adage been more fitting than with AI technology, which amplifies a dataset’s flaws and weaknesses. Here are three examples of how this manifests:
- Hallucinations. When data is erroneous, inconsistent, or unrepresentative, it can mislead algorithms into identifying patterns that do not exist and making unfounded predictions. These “hallucinations” can produce flawed conclusions about patient populations, market trends, drug efficacy, and more. Ultimately, hallucinations impact critical business decisions and can lead to wasted resources and missed opportunities.
- Bias. Data that is not representative across race, ethnicity, and socioeconomic dimensions will cause AI to amplify bias and contribute to healthcare inequities. If underserved populations are underrepresented, misclassified, or incompletely captured in the training data, AI models will perform poorly when applied to these groups. This can lead to inaccurate diagnoses, ineffective treatments, and limited access to care for those who need it most.
- Ambiguity. Perhaps one of the least understood and most underestimated flaws within datasets is a lack of clinical context, which can lead to inaccurate inferences and insights that aren’t actionable. For example, a dataset might show a correlation between a drug and a positive outcome, but without details about patient demographics, comorbidities, and treatments before and after diagnosis, AI cannot surface true cause and effect or identify which patient subgroups would benefit most. This ambiguity can hinder the development of targeted therapies and personalized medicine.
The risks associated with using flawed data to power AI are numerous and significant.
How to Assess the Data Foundation That Supports AI
The ability to trust AI begins with establishing confidence in its data foundation.
AI and the Data Quality Imperative
While AI has the potential to exponentially improve every facet of the drug life cycle, it will fail if it is not trained and operating on the highest-quality healthcare data. As biopharma companies seek to incorporate AI into their processes, they must carefully evaluate the data foundation and the AI technology in tandem.
Machine learning and generative-AI assistants are moving the industry forward, but what’s next? Watch our on-demand webinar introducing Marmot™, the first fully conversational AI for healthcare analytics.
Are you still exploring how to best incorporate the power of AI into your organization? Learn more here.
To see more articles like this, follow Komodo Health on X, LinkedIn, or YouTube, and visit our Resources Hub.