AI And The ‘Black Box’ Problem: US FDA Is More Comfortable Than Some May Think
Executive Summary
A recurring question about using artificial intelligence in drug development is whether the US Food and Drug Administration can accept a model that operates as a black box, meaning that developers cannot explain exactly how the model does what it does.
The US Food and Drug Administration is very much prepared to consider artificial intelligence applications that are not explainable, that is, models that operate essentially as a “black box.”
Key Takeaways
- The FDA will consider AI applications even if the model is not fully explainable.
- If the AI model is hard to understand but other evidence supports its findings, a discussion with FDA may be possible.
- Not using AI because we do not understand how it works is not the best path forward, a Parexel official said.
That was one key message from a recent FDA/Clinical Trials Transformation Initiative meeting on AI in drug and biological product development. The event more broadly served as an opportunity for the agency to gather input and share initial thoughts about its upcoming guidance on the use of AI in drug development. (Also see "AI In Drug Development: Regulatory Clarity Needed On Inspections, Human Role" - Pink Sheet, 28 Aug, 2024.)
FDA officials said that in the absence of guidance, there are some industry misconceptions about the agency’s AI stance.
“We have heard FDA does not allow large language models,” said Tala Fakhouri, Center for Drug Evaluation and Research associate director for data science and artificial intelligence policy. “I'm not sure where that came from.”
Fakhouri also said she has heard “that the models have to be explainable.”
“Again, not all models are explainable,” she said. “It will depend on the context of use and model risk, where we might ask more questions.”
“Generally, at the FDA, we put more emphasis on transparency,” Fakhouri added. “We want to know more about the data that you may have used to train your model. We want to know why you chose a specific modeling technique. We want to understand the methods behind AI, not necessarily explainability.”
Fakhouri and a panel of agency colleagues were asked to elaborate on the point during a wrap-up session. Fakhouri read the question: “How do we propose to validate an AI model that is not explainable? In other words, how do you validate a black box model? If you don’t know how it works, how can you claim you validated it?”
“If you are only asking me to trust this model that I don't understand, and this is the only piece of evidence that you have, I think this is going to be a hard pill to swallow,” said Hussein Ezzeldin, senior digital health expert in the Center for Biologics Evaluation and Research’s Office of Biostatistics and Pharmacovigilance. “But if you're bringing something that may be hard to understand, but it has other evidence that would support these findings, then maybe it might be a different discussion.”
“If you don’t understand how the model works, but you see concurrence between the output of the model and the data that you already have, then this is some sort of a validation, even though you don't really understand how the model works,” Ezzeldin added.
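To make the idea of "concurrence" concrete, the following is a minimal illustrative sketch, not an FDA-endorsed method, of how a sponsor might quantify agreement between an opaque model's outputs and data it already holds. The function name `concurrence_report`, the toy threshold model and the sample values are all hypothetical; the sketch assumes categorical outputs and reports raw agreement alongside Cohen's kappa, which adjusts for agreement expected by chance.

```python
from collections import Counter
from typing import Callable, Dict, Hashable, Sequence


def concurrence_report(
    model: Callable[[object], Hashable],
    inputs: Sequence[object],
    reference: Sequence[Hashable],
) -> Dict[str, float]:
    """Compare a black-box model's outputs with independently known labels.

    The model is treated as opaque: it is only called, never inspected.
    """
    if len(inputs) != len(reference) or len(reference) == 0:
        raise ValueError("inputs and reference must be non-empty and equal in length")

    predictions = [model(x) for x in inputs]
    n = len(reference)

    # Observed agreement: fraction of cases where model and reference match.
    observed = sum(p == r for p, r in zip(predictions, reference)) / n

    # Chance agreement: probability both would pick the same label at random,
    # given how often each label appears in predictions and in the reference.
    pred_counts = Counter(predictions)
    ref_counts = Counter(reference)
    labels = set(pred_counts) | set(ref_counts)
    expected = sum(pred_counts[c] * ref_counts[c] for c in labels) / (n * n)

    # Cohen's kappa is undefined when chance agreement is exactly 1.
    kappa = (observed - expected) / (1 - expected) if expected < 1 else float("nan")
    return {"agreement": observed, "kappa": kappa}


# Hypothetical example: an opaque scoring rule checked against known outcomes.
def opaque_model(score: float) -> bool:
    return score > 0.5


report = concurrence_report(
    opaque_model,
    inputs=[0.1, 0.4, 0.6, 0.9],
    reference=[False, False, True, False],
)
print(report)  # {'agreement': 0.75, 'kappa': 0.5}
```

High agreement on data the model has not seen would be one piece of supporting evidence of the kind Ezzeldin described. It would not, on its own, explain how the model reaches its outputs.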
Another FDA official, speaking from the audience, said that there is often a misperception that models are either fully explainable or a complete black box.
“It is not 0/1,” she said. “Typically, you don’t work with models that you don’t understand” at all.
“You understand some things about those models, and there are certain other things that you don't know completely,” the official added.
She then used an analogy similar to one offered by FDA Commissioner Robert Califf about the importance of understanding a drug’s mechanism of action. (Also see "The Aspirin Test For AI?" - Pink Sheet, 24 Apr, 2024.)
There are many drugs “where we have some understanding of the mechanism of action, but we don’t know in detail how these drugs work,” the official said. She specifically cited SSRI antidepressants, where it is well established that the drugs block reuptake of serotonin.
“How does that translate into improvement in depression or anxiety? We have no clue,” she said. “That is a principle that not only we are comfortable with as a society and in clinical trials and drug development, we have been using many of these drugs for ages in one of the most demanding tasks and more serious tasks, which is treating people.”
Fakhouri gave a different example, indicating that meeting speakers reported that “Generative AI is being used to generate the first draft” of regulatory documents. “Even the makers of those tools don’t understand how they work.”
Parexel Executive VP Stephen Pyke, who chairs the Association of Clinical Research Organizations’ AI/ML Committee, stressed the importance of becoming “comfortable” with the concept.
“I’m a statistician and I got trained about Occam's Razor: simple is preferable wherever it can do the job,” he said. However, “some jobs can’t be done with simple. We have to get used to this idea. There are certain tasks that will only be amenable to AI that we can't explain.”
“It is just about recognizing that there are going to be those applications, and we have to figure out, what do we do?” Pyke said. One option is “we won’t use those solutions, even though we think as best we can tell they work and work well, because we don't understand how they work. That does not seem to me a good place to go.”