Clinical trials and AI at a crossroads: will the GDPR stifle innovation?
Updated: Aug 1, 2018
Depending on who you talk to, AI is poised to make world-changing leaps and bounds, or is tremendously overhyped. Much as with bitcoin and blockchain, hundreds of start-ups are prefixing ‘deep-’ to their names to tie in, regardless of how much they actually use deep learning, itself a subset of machine learning (ML), which is in turn a subset of AI.
The medical world is no exception. One exciting avenue is the application of ML to clinical trials (ML is our preferred term, since AI at its extremes shades into sci-fi, futuristic territory, whereas ML is being applied and used today).
As anyone in related fields knows, clinical trials have changed little, and remained incredibly frustrating, for decades. Consider the cancer drug Avastin: after more than 750 completed clinical trials, at a cost of billions of dollars and untold human time and dashed hopes, we still cannot predict its effect on a given subject. The variables, from genetic makeup to age and physiology to the characteristics of the treatment being tested, are so numerous that trial results are often not replicable.
Adding to the annoyance is the now desperately old-fashioned way of finding candidates. According to a Wired article from earlier this year, the US currently has 19,816 clinical trials open; about 18,000 of them will experience significant difficulties finding subjects, and a third will never happen at all. A US federal registry keeps the details of every one of these trials, but it is painfully difficult for experts to navigate, and almost impossible for individuals.
Notably, the details of trials are entered into ClinicalTrials.gov as structured, easily searchable information, but patient details are free-form: any text at all can be entered, or even images. This leads to partial information, overlapping terms, and a multitude of other problems that make finding candidates such a painful task. The resources expended trying to fill these trials, and the wasted potential of the many that never get off the ground, are obvious.
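To see why free-form entry makes candidate matching so painful, consider a minimal sketch (with entirely hypothetical records): the same condition written three different ways defeats a naive keyword search, while the curated, structured version of the same data is trivially queryable.

```python
# Hypothetical free-text trial entries: the same condition (non-small
# cell lung cancer) appears under several spellings and abbreviations.
free_text_entries = [
    "Recruiting adults w/ NSCLC, stage IIIb",
    "Non-small cell lung cancer patients, ECOG 0-1",
    "non small-cell carcinoma of the lung (advanced)",
]

def naive_search(entries, keyword):
    """Return entries containing the keyword verbatim (case-insensitive)."""
    return [e for e in entries if keyword.lower() in e.lower()]

# The verbatim search finds only one of the three relevant entries.
hits = naive_search(free_text_entries, "non-small cell lung cancer")

# After expert curation into structured fields, the same query is trivial
# and finds all three.
structured_entries = [
    {"condition": "NSCLC", "stage": "IIIb"},
    {"condition": "NSCLC", "performance_status": "ECOG 0-1"},
    {"condition": "NSCLC", "stage": "advanced"},
]
structured_hits = [e for e in structured_entries if e["condition"] == "NSCLC"]
```

Real eligibility criteria are far messier than this toy, which is exactly why structuring the registry required human experts to train the models.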
ML and other analysis methods, such as decision trees, have been applied to these problems. A US company, Antidote, hired experts to train an ML solution to comb through the data and standardize it into a structured, sortable format. In 2015, big pharma companies including Novartis, Pfizer, and Eli Lilly decided to coordinate and back the solution, which has now processed about 50% of ClinicalTrials.gov. In addition to making it far easier for companies to search for appropriate candidates, the tool also lets individuals quickly search for their condition and get a list of potential trials.
Addressing the efficiency and accuracy of the trials themselves is a bigger problem, however, with massive potential. With millions (or billions, or trillions if we consider DNA permutations) of different combinations of conditions, assessing all the possible contributing factors that may make a treatment succeed or fail is far beyond human ability. ML techniques, if allowed, can reveal commonalities and trends across large populations of trial subjects, limited only by the quality of the data fed to the algorithm.
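The kind of cross-population analysis described above can be sketched very simply. The example below uses entirely synthetic subject records: it groups subjects by a single candidate factor and compares response rates, the elementary step that an ML pipeline would repeat across thousands of factors and their interactions, far beyond what a human reviewer could enumerate.

```python
from collections import defaultdict

# Synthetic trial subjects (hypothetical data, for illustration only).
subjects = [
    {"age_group": "under_50", "marker_positive": True,  "responded": True},
    {"age_group": "under_50", "marker_positive": True,  "responded": True},
    {"age_group": "under_50", "marker_positive": False, "responded": False},
    {"age_group": "over_50",  "marker_positive": True,  "responded": True},
    {"age_group": "over_50",  "marker_positive": False, "responded": False},
    {"age_group": "over_50",  "marker_positive": False, "responded": False},
]

def response_rate_by(subjects, factor):
    """Response rate for each value of a single candidate factor."""
    counts = defaultdict(lambda: [0, 0])  # value -> [responders, total]
    for s in subjects:
        counts[s[factor]][0] += s["responded"]
        counts[s[factor]][1] += 1
    return {value: resp / total for value, (resp, total) in counts.items()}

# In this toy data the marker separates responders cleanly;
# age group, by contrast, tells us nothing.
rates = response_rate_by(subjects, "marker_positive")
```

Garbage in, garbage out applies in full force here: if the underlying trial records are as inconsistent as the free-text registry entries, no amount of sweeping over factors will surface reliable trends.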
GDPR Article 22 is a key sticking point threatening AI development
Key phrase: if allowed. This brings us to the current debate over the GDPR, the sweeping EU privacy regulation that went ‘live’ a short time ago, on May 25th. The regulation contains significant provisions that could limit AI development, leading to complaints that it will stifle EU innovation and leave the bloc well behind the rest of the world.
Some of the more widely discussed changes are not necessarily a problem, at least for the clinical trial world. Affirmative and clear consent, for instance, poses little difficulty in a field that already has a strong consent culture. The ‘right to be forgotten’, meanwhile, is bypassed because research/scientific data falls into a special category in which subjects who sign up forgo their right to erasure.
No, the tricky provision regarding ML and other automated processing techniques is Article 22:
1. The data subject shall have the right not to be subject to a decision based solely on automated processing, including profiling, which produces legal effects concerning him or her or similarly significantly affects him or her.
Article 22 (2)(c) allows for the above to not apply if the subject gives explicit consent, but the Article continues in (3):
3. In the cases referred to in points (a) and (c) of paragraph 2, the data controller shall implement suitable measures to safeguard the data subject’s rights and freedoms and legitimate interests, at least the right to obtain human intervention on the part of the controller, to express his or her point of view and to contest the decision.
Thus, even with consent, subjects must be informed about such processing, and must be able to obtain human intervention on the part of the data controller and contest any decisions. An algorithm cannot be a black box spitting out decisions at the end; at any moment in the middle of the process, a data subject may request a pause or an explanation, or contest a decision.
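One possible shape for such a safeguard (a hypothetical sketch, not anything the regulation itself prescribes) is a pipeline where the automated step never finalises an outcome on its own: every decision carries a plain-language explanation, and a contested decision is routed to a human reviewer who can confirm or overturn it.

```python
from dataclasses import dataclass

@dataclass
class Decision:
    subject_id: str
    outcome: str
    explanation: str          # plain-language basis for the decision
    contested: bool = False
    final: bool = False

def automated_screen(subject_id: str, risk_score: float) -> Decision:
    """Automated step: proposes an outcome but does not finalise it."""
    outcome = "include" if risk_score < 0.7 else "exclude"
    return Decision(subject_id, outcome,
                    f"risk score {risk_score:.2f} vs threshold 0.70")

def contest(decision: Decision) -> Decision:
    """Subject contests; the decision is routed to a human reviewer."""
    decision.contested = True
    return decision

def human_review(decision: Decision, overturn: bool) -> Decision:
    """A human at the controller confirms or overturns the outcome."""
    if overturn:
        decision.outcome = ("exclude" if decision.outcome == "include"
                            else "include")
    decision.final = True
    return decision

d = automated_screen("subj-042", 0.82)   # automated proposal: "exclude"
d = contest(d)                           # subject exercises the safeguard
d = human_review(d, overturn=True)       # reviewer overturns to "include"
```

Whether a hook like this, with a human rubber-stamping or overriding each contested outcome, would count as meaningful intervention rather than token oversight is precisely the ambiguity discussed below.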
This provision has potentially massive ramifications for the entire ML industry, and commenters have been quick to point out that requiring human hands to be able to fiddle with such processing ruins the entire point of the exercise: letting machines solve problems beyond human capability. Furthermore, the power and complexity of the algorithms that can be used will need to be limited, so that they can feasibly be explained to a data subject, and the need for review of all steps will increase labour (and thus, cost).
Further complicating the situation is ambiguity: does token human oversight satisfy the condition? Is there an exact definition of ‘automated’? Billions and billions in investment and development await such clarifications. If the US and other parts of the world implement the massive database usability improvements we have discussed, to say nothing of the more advanced ML interpretation of the data itself, there is cause to worry that the EU will be left behind.