I was recently at a meeting where the sponsoring pharma company reviewed the commercial real-world data (RWD) sources they had licensed for research. It was an impressively long list, one that I guessed cost them millions of dollars each year. I'd had direct experience analyzing data from many of the vendors. Not all those experiences were good. The other thing that struck me was overlap: by my estimation, at least three of the databases were highly likely to contain claims and/or electronic health record (EHR) data for the same patients. This type of redundancy is, of course, known—there are only so many payers and health providers in the United States—but it is generally very difficult to address because commercial vendors are unwilling to disclose where their data come from.
At one point in the meeting came a truly revealing presentation: for one important study, the company's analysts went looking for occurrence rates of one of their primary endpoints (a lab value) and found that it wasn't there. This was not an extraction error: the RWD vendor had assured the pharma company that the endpoint was in the database. It was a sufficiency error: the value was recorded for only a tiny fraction of patients, and even for those, there wasn't enough data to tell what happened over time.
How on earth could this happen?
Probing the decision-making behind the data purchases helped me understand some of the thinking that may have led to this problem. Here's what I heard at the meeting, backed up by later discussions with others who are licensing RWD.
First and foremost, RWD clients want big numbers: specifically, as many patients as possible with the condition or drug of interest. Looking at data vendors' websites, I see that they have picked up on this. One vendor promises, to my amusement, that it has more patients in its database than there are people living in the United States!
I certainly agree that, all things being equal, having more people with the condition or on the treatment is better than having fewer. Unfortunately, all things are not equal. To put it bluntly, many database vendors aren't up to providing what they promise. Because "try before you buy" is not something most vendors are willing to offer, purchasers are left in a perilous situation. Spending hundreds of thousands or millions of dollars on an unverified product is not something we do in our everyday business or home lives. Why should it be normal with health records?
Of course, the situation isn’t going to change unless those who purchase these databases demand more from their data providers. With that in mind, below are three questions that I typically recommend clients ask before they purchase RWD.[a]
[a] This list primarily applies to HEOR studies that address market access. The regulatory requirements for RWE studies submitted to FDA (e.g., for label extensions or to make marketing claims) are very different, and demand a much higher level of attention to detail. See https://www.fda.gov/regulatory-information/search-fda-guidance-documents/considerations-use-real-world-data-and-real-world-evidence-support-regulatory-decision-making-drug.
Question 1: What is the research question?
First, it's critical to be very clear about what your organization wants to learn from RWD before shopping. It might be helpful to imagine that you are designing a prospective clinical trial rather than a retrospective study. What is the patient population of interest, including inclusion and exclusion criteria? What is being compared? What is the main outcome of interest? What are the primary and secondary hypotheses?[b] Be as specific as possible. Very few (if any) databases will have everything needed to answer these questions, but laying out everything that would ideally be available allows you to set priorities: must-haves versus may-haves.
[b] The PICOTS framework is a good guide: https://www.pcori.org/engagement/engagement-resources/Engagement-Tool-Resource-Repository/picots-framework-how-write.
“Very few (if any) databases will have everything needed to answer these questions, but laying out everything that would ideally be available allows you to set priorities.”
Question 2: Who is your primary audience for the study?
Next, envision whom the study is intended to influence. Because the level of scrutiny for real-world evidence (RWE) studies will always be greater than for clinical trials (due to unmeasured confounding, among other issues), it is important to think through what your audience will accept and reject with respect to your question, data, and analysis (that goes for the target journal, too).
Question 3: What is your budget?
Not everyone has deep pockets when it comes to purchasing RWD. Database costs range from a few thousand dollars to well over a million dollars. Often, higher-level questions for internal decision making (e.g., disease prevalence) can be answered just as well or better with less expensive data—typically government-sponsored databases such as SEER-Medicare or MEPS—than with commercial databases.
Simplified Questions for Purchasers of RWD

1. Define the study and hypotheses:
   a. What is the primary question?
   b. What is the patient population (inclusion and exclusion criteria)?
   c. What endpoints are of primary interest?
   d. What is the primary hypothesis about the impact of the disease or treatment on the primary endpoint?
   e. What sample size is needed to test the hypothesis?
2. Review databases that may address the study and hypotheses:
   a. Ability to define the patient population with sufficient accuracy and specificity
   b. Availability of primary and secondary endpoints:
      i. Accuracy relative to the input source
      ii. Completeness, as a whole and over time for specific patients (longitudinally)
   c. Availability of clinical and demographic factors needed to account for confounding
   d. Recency of the database relative to the current standard of care and the research question
   e. Potential number of patients available for hypothesis testing
   f. Number of months or years of observation relative to the time needed to observe the endpoints of interest from the index date
These three questions can be useful for generating a short list of potential data vendors. The next step is to reach out to them with some very specific questions about data quality and completeness. I and others have written guidance on standards for data quality in RWD that can serve as useful references.1-3 My experience here is that while some vendors are willing to provide the information, many are not.[c] This probably reflects the relative imbalance in negotiating power between data providers and data users, particularly for specific types of RWD.

For the (often relatively junior) person in charge of purchasing and managing RWD for pharma companies, getting data vendors to answer a specific list of questions might be intimidating. For these folks, here are three suggestions. First, if you are from a large pharma company, you have power: data vendors value long-term relationships with deep-pocketed firms. Second, if data vendors won't cooperate with basic information to help you understand what you are buying, ask for summary statistics on the question of interest, or better yet take a "short-term lease before long-term rental" approach. That is, lease the data for just long enough to do the necessary data probing, letting the vendor know that you will consider a longer-term contract after you have done the scoping exercise. Third, the head of HEOR or Market Access should participate in the negotiations. For better or worse, when someone at the VP level is on the line, RWD companies tend to be more accommodating.
[c] Unlike commercial vendors, most government-sponsored databases (e.g., SEER-Medicare) have extensive documentation of quality and completeness.
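For analysts doing that short-lease scoping exercise, the sufficiency check described earlier can be sketched in a few lines. This is a minimal illustration only; the record layout (patient ID, lab name, lab date) and the 180-day follow-up threshold are my assumptions, not any vendor's data dictionary, and a real feasibility assessment would be considerably more involved:

```python
# Hypothetical sufficiency probe for a lab endpoint in a leased RWD extract.
# Two checks: what share of the cohort has the endpoint at all, and what
# share has enough longitudinal follow-up to see change over time.
from datetime import date
from collections import defaultdict

def endpoint_sufficiency(records, cohort_size, endpoint, min_days=180):
    """records: iterable of (patient_id, lab_name, lab_date) tuples."""
    dates_by_patient = defaultdict(list)
    for patient_id, lab_name, lab_date in records:
        if lab_name == endpoint:
            dates_by_patient[patient_id].append(lab_date)
    with_endpoint = len(dates_by_patient)
    # "Longitudinal" here means measurements spanning at least min_days
    longitudinal = sum(
        1 for ds in dates_by_patient.values()
        if (max(ds) - min(ds)).days >= min_days)
    return {"pct_with_endpoint": with_endpoint / cohort_size,
            "pct_longitudinal": longitudinal / cohort_size}

# Toy data: a 1,000-patient cohort where the endpoint was recorded for
# only 3 patients, and only 1 of those has usable follow-up
records = [
    (1, "eGFR", date(2023, 1, 1)),
    (1, "eGFR", date(2023, 9, 1)),
    (2, "eGFR", date(2023, 3, 15)),
    (3, "eGFR", date(2023, 4, 2)),
]
print(endpoint_sufficiency(records, cohort_size=1000, endpoint="eGFR"))
# → {'pct_with_endpoint': 0.003, 'pct_longitudinal': 0.001}
```

Numbers like these, produced in an afternoon on a short-term lease, are exactly the evidence that would have flagged the sufficiency error described above before a multi-year contract was signed.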
RWD Priorities: A Shared Resource or Moving the Market Needle?
Some data purchasers may take issue with the approach I outline above, noting that they are not buying data for a single study. This fits with the bigness idea: the more that's in there, the more one can get out in terms of multiple studies that can support multiple products. My response is, "How much do you spend on a single clinical trial?" Do you expect that trial to answer every question about a product and the condition it is treating? Of course, companies design trials with multiple endpoints and multiple planned sub-analyses, but here is the thing: first and foremost, the study must answer a primary question that moves the company along the path to regulatory approval or to impacting practice. Shouldn't we demand no less of an RWE study: that it provide findings that clinicians, payers, and patients must consider in coverage, clinical decision-making, and use?
This brings me to my second point about demanding more from RWD vendors: studies rest on the quality of the data that is collected along the way. This issue is exactly the same whether one is designing a clinical trial or selecting a database for an RWE study. Unlike trial sponsors, however, RWD users cannot control what goes into the databases.
“Studies rest on the quality of the data that is collected along the way. This issue is exactly the same whether one is designing a clinical trial or selecting a database for an RWE study.”
Thinking Long-Term (Getting from “Data Vendor Might” to “Doing What’s Right”)
The suggestions above are short-run tactics for a relatively young RWD industry. Do I see improvements in RWD quality over time? Pharma and device manufacturers are still figuring out what they want from these databases, learning slowly what types of studies do and don’t move the market for their products. Moreover, managing a large real-world database is extremely expensive. Because there are economies of scale in data, a certain amount of consolidation is inevitable. The speed of this change, in my opinion, will be influenced by demands of clients for better quality (and yes, quantity).
In the long run, I believe that when purchasers demand more from RWD, this will filter back to the vendors curating the data, and ultimately to the many parties who are responsible for recording the information in the first place. Case in point: documenting performance status for patients with cancer. Multiple studies show that cancer patients' performance status (a measure of how well a patient is able to perform ordinary tasks and carry out daily activities) at diagnosis correlates with ability to tolerate treatment and with outcomes.4 It is an inclusion criterion for the vast majority of cancer clinical trials. Still, oncology providers record their patients' performance status only about 60% of the time. Perhaps this is okay in clinical care, but it is a major problem for those of us who need this information for high-quality analyses of retrospective data. Since EHR data is now a major source of revenue for healthcare providers, data vendors have theoretical leverage to demand more accountability in recording important clinical information (which, as a side effect, could also improve quality of care). Efforts to improve data completeness and accuracy are costly to revenue-constrained providers, so data vendors will either have to pay to improve data entry or provide direct incentives to providers for meeting benchmarks. This in turn will only happen if prospective clients take their business elsewhere.
In summary, we still have a quality problem in big health data that is partly a product of users not being very demanding about what they are purchasing. Addressing this problem will improve the quality of RWD studies, and ultimately the impact of RWD on market access, clinical practice, and patient outcomes. It starts with us. It’s time to demand more from RWD vendors.
– Scott Ramsey, MD, PhD
Senior Partner and Chief Medical Officer, Curta
Acknowledgement
Elizabeth Brouwer, MPH, PhD, Associate Director, provided important improvements to this article.
CONNECT WITH US
The experts at Curta are ready to discuss your real-world study needs. For more information, please contact Elizabeth at Elizabeth.Brouwer@Curta.com or Scott at Scott.Ramsey@Curta.com.
REFERENCES
- Fleurence RL, Kent S, Adamson B, Tcheng J, Balicer R, Ross JS, et al. Assessing Real-World Data From Electronic Health Records for Health Technology Assessment: The SUITABILITY Checklist: A Good Practices Report of an ISPOR Task Force. Value Health. 2024;27(6):692-701. doi: 10.1016/j.jval.2024.01.019.
- Orsini LS, Berger M, Crown W, Daniel G, Eichler HG, Goettsch W, et al. Improving Transparency to Build Trust in Real-World Secondary Data Studies for Hypothesis Testing-Why, What, and How: Recommendations and a Road Map from the Real-World Evidence Transparency Initiative. Value Health. 2020;23(9):1128-1136. doi: 10.1016/j.jval.2020.04.002.
- Determining Real-World Data’s Fitness for Use and the Role of Reliability. Duke Margolis Institute for Health Policy. https://healthpolicy.duke.edu/publications/determining-real-world-datas-fitness-use-and-role-reliability. Accessed December 2024.
- National Cancer Institute. NCI dictionary of cancer terms. https://www.cancer.gov/publications/dictionaries/cancer-terms/def/performance-status. Accessed December 2024.