The Centers for Medicare & Medicaid Services will use artificial intelligence to infer a patient’s race from factors such as name, ZIP code, and language preference when the patient’s race is not fully disclosed on hospital forms, in an effort to help identify and reduce health-care inequities.
The CMS thinks the computer-estimated data, from two algorithms developed by independent contractors, would be a closer match to what respondents might have self-reported if they were given more races and ethnicities to select from as their response. The de-identified data will be shared with hospitals, which already use information on race to identify risk factors for certain illnesses or to keep their own tabs on disparities in quality of care.
The CMS currently relies on data from the Social Security Administration to understand health equity gaps among Medicare beneficiaries, but the accuracy of that data is questionable. “For example, prior to 1980, only three categories were available for individuals to self-report race,” the CMS said. They were Black, White, and Other.
The agency has tried correcting inaccurate records and launching initiatives such as “direct mailings” to beneficiaries, to no avail: the data still identify Black and White patients fairly accurately but struggle to identify other races.
Privacy advocates warn that the use of artificial intelligence could harm the very groups it’s trying to protect. Because the software is built to make decisions the way humans do, it tends to perpetuate its builders’ biases, even unintentional ones. People are also becoming more cautious about how their data is used and potentially misused, and the sensitivity of health-care data warrants added vigilance.
“I commend them for trying to address the equity gap,” said Andrew Crawford, policy counsel for the Center for Democracy and Technology. “But algorithms that are using proxies to make inferences and conclusions about race, they haven’t always proven to be accurate.”
The CMS said the algorithms are highly accurate, but “there remains the small risk of unintentionally introducing measurement bias.” They are “notably less accurate” for patients who identify as “American Indian/Alaskan Native or multiracial,” the CMS said.
Federal health officials are aware that AI-type programs cause angst and have tried to address that fear. “It’s important to make sure that data is used in ways that aren’t detrimental to those communities that we’re actually trying to help,” Micky Tripathi, the national coordinator for health information technology at the Department of Health and Human Services, said in a recent interview.
The CMS will use the algorithms for Medicare patients at hospitals. The Biden administration proposed this as part of two annual rules (RIN 0938-AU43 and RIN 0938-AU44) that dictate how Medicare health facilities are paid. One of the rules was finalized Aug. 3, meaning the CMS will move forward with the algorithms’ implementation. The agency anticipates sharing the data with hospitals confidentially in 2022 and publishing the results in 2023, but any public release of the data will be subject to future rulemaking.
The algorithms will be a temporary measure “until more accurate forms of self-identified demographic information are available,” according to a CMS fact sheet.
Health-care data as a whole has one inherent flaw. “By definition, it only represents the people who receive health care,” meaning that marginalized communities with less access to care won’t be represented in the numbers, said Trishan Panch, a lecturer at the Harvard T.H. Chan School of Public Health and co-founder of Wellframe, a digital health management platform.
The CMS says it will only use the data to make inferences about whether disparities exist at individual hospitals. The agency said it doesn’t “intend to use it to make inferences about any single individual.”
The first algorithm, Medicare Bayesian Improved Surname Geocoding, takes data that a patient may have self-reported, along with the patient’s surname and address, and combines them with U.S. census data on surnames and on the racial and ethnic demographics of the patient’s neighborhood to estimate the likelihood that the patient belongs to a given racial or ethnic group. Medicare already uses this model to report performance data for Medicare Advantage plans.
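For readers curious about the mechanics, the core idea behind surname-and-geography estimation can be sketched in a few lines of Python. This is a simplified illustration, not the CMS model: it assumes we already know, for a given surname, the share of people with that name in each group (as in census surname tables), plus each group’s share of the patient’s neighborhood and of the national population. It combines them with Bayes’ rule under the usual simplifying assumption that surname and location are independent once race is known, so the estimate is proportional to P(race | surname) × P(race | neighborhood) / P(race). The function name, the group labels, and all numbers are hypothetical, and the sketch omits the self-reported data the Medicare version also uses.

```python
def bisg_posterior(p_race_given_surname, p_race_given_tract, p_race_national):
    """Combine surname and neighborhood evidence with Bayes' rule.

    Hypothetical helper for illustration. Each argument maps a group label
    to a probability. Assumes surname and location are independent given
    race, so P(race | surname, tract) is proportional to
    P(race | surname) * P(race | tract) / P(race).
    """
    unnormalized = {
        race: p_race_given_surname[race] * p_race_given_tract[race] / prior
        for race, prior in p_race_national.items()
    }
    total = sum(unnormalized.values())
    # Rescale so the estimated probabilities sum to 1.
    return {race: value / total for race, value in unnormalized.items()}


# Made-up inputs for three hypothetical groups.
surname = {"group_a": 0.6, "group_b": 0.3, "group_c": 0.1}
tract = {"group_a": 0.2, "group_b": 0.5, "group_c": 0.3}
national = {"group_a": 0.5, "group_b": 0.3, "group_c": 0.2}

posterior = bisg_posterior(surname, tract, national)
print({race: round(p, 2) for race, p in posterior.items()})
# prints {'group_a': 0.27, 'group_b': 0.56, 'group_c': 0.17}
```

The output is a probability for each group rather than a single label, which is consistent with the CMS’s stated plan to use the estimates to look for disparities across a hospital’s patients, not to classify any one person.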
The second algorithm, developed by the Research Triangle Institute, uses the same data, plus a patient’s language preference, to “reclassify some beneficiaries as Hispanic or Asian/Pacific Islander,” the CMS said.
“This is a very technically tractable problem,” Panch said of the issue the CMS is trying to solve, adding that “there’s a form of machine learning, supervised learning, that would work very well.”
Machines are very good at determining which “types of pixels are a cat” in images on Google, Panch said. The same type of programming can tell you which “kinds of pixels are a South Asian person.”
Human labelers do the first rounds of labeling to train the algorithms. Who those humans are, and what biases they have, affects what the algorithms end up generating, critics say.
“Unfortunately, the health-care system doesn’t have a great track record when it comes to providing equitable services to every community, especially marginalized communities here in the U.S.,” Crawford said.
The Covid-19 pandemic was a staggering reminder of how racial, economic, and social factors like where people live and work leave some underrepresented groups more vulnerable to illness. People of color experienced higher rates of hospitalization and intensive care than White patients, according to data from Kaiser Permanente. The gap motivated the Biden administration to make health equity a top priority.
The algorithms are “envisioned as an intermediate step, filling the need for more accurate demographic information for the purposes of exploring inequities in service delivery,” according to the CMS.
“Self-reported race and ethnicity data are the gold standard for classifying an individual according to race or ethnicity,” the rule said. As it rolls out the algorithms, the CMS plans to work toward that standard by expanding the collection and sharing of data “using electronic data definitions which permit nationwide, interoperable health information exchange.”
One hurdle the CMS could be trying to avoid is people’s reluctance to disclose their race on forms, said Dianne Bourque, a health privacy lawyer at Mintz. “People are certainly more sensitive to privacy than they have been in the past,” she said.
The CMS said it is “mindful that additional resources, including data collection and staff training may be necessary to ensure that conditions are created whereby all patients are comfortable answering all demographic questions, and that individual preferences for non-response are maintained.”
Proceed With Caution
As the CMS prepares to implement the algorithms, Crawford said “caution is the first thing that comes to mind.”
“Don’t just jump headlong into this without really understanding how these algorithms work and what their biases might be,” he said. “The data coming out is not going to be perfect, even in the best case scenario.”
Medical groups were also wary of the CMS’s proposals. The Association of American Medical Colleges urged the CMS “not to use indirectly estimated race and ethnicity data due to concerns with accuracy and actionability of such data,” according to the association’s comments to the CMS when the rule was proposed. “Instead, CMS should invest in data collection improvements that standardize and use data already collected by hospitals.”
America’s Essential Hospitals wrote in comments that this information, if publicly reported, could have “unintended consequences” including “the risk that consumers will rely on inaccurate results when making important care decisions.”
Panch said the initiative is a necessary stopgap because the CMS hasn’t historically collected this data. “I don’t see any way around it.”