At C2 Labs, we enjoy working on problems at the extreme ends of complexity and scale. When it comes to solving big problems, it is hard to find one much bigger than cancer. In the US alone, cancer affects over 1.7 million people per year, resulting in over 600k deaths. As cancer progresses through its stages, the outlook becomes more dire, and clinical trials are increasingly a last hope for many of these patients. Unfortunately, finding the clinical trial that is the "best fit" for a patient can be a daunting task. Many variables go into identifying the right trial, and it can be difficult for patients to find all of the options, let alone sort through them, when they are already stressed and emotional from the diagnosis.
Oak Ridge National Laboratory (ORNL) recognized this issue and, as part of the Smoky Mountains Data Challenge (SMDC), put out an open call for innovative approaches that use Artificial Intelligence (AI) to match patients to clinical trials. C2 Labs participated in this challenge, which gave us a chance to apply our growing data science capabilities and high-end DevOps experience to come up with a creative solution to this problem.
We started by setting realistic goals for what we could accomplish. The timeline was short (weeks) and there was no budget, so all of the work would be volunteer effort from our dedicated staff, after hours and on weekends. Based on these limitations, we set out to meet the following goals with our solution:
Performance and Scalability: solution should be able to grow to handle large datasets at Petabyte (PB) size using horizontal scaling
Open Source: there was no budget for this work, so we used only open source and free tools
Interoperability: solution should leverage standards for interchanging data (e.g. open REST APIs)
Simplicity: user experience should be intuitive and easy to navigate for a patient
Relevance: solution should return more relevant results than standard string/RegEx based searches of trials
In the end, we wanted to deliver a Proof of Concept (POC) application and data model that offered practical real-world value while also being extensible by others in the open source community. We knew that we didn't have the domain knowledge to make a fundamental breakthrough (we are not a healthcare-related company and have no medical doctors on staff). However, we do have some world-class DevOps talent and a growing data science practice.
We leveraged this talent to deliver a "Bring Your Own Algorithm" (BYOA) architecture that allows future data scientists to plug their algorithms into our architecture without having to worry about the underlying details of how the data is retrieved or displayed. This approach lets people much smarter than us focus on breakthroughs in AI and data science without being concerned about the pesky IT details. To accomplish our BYOA approach, we implemented the architecture below:
The pieces of this architecture work as follows:
We pulled data into a scale-out Elasticsearch database hosted in Azure Kubernetes, using a custom Python script to query the NIH Cancer.gov APIs and to clean and index the data
We developed a Node.js API using Express.js that can spawn an arbitrary number ("to the N") of sub-processes to execute the AI/ML algorithms that interact with the data
The user interface was developed in Angular and is hosted in a container on Kubernetes. It lets the patient search for trials by entering some basic information, then applies the AI/ML algorithms to return the best results
The application also provides a feedback loop (thumbs up or down) for the user to indicate whether a result was useful. This feedback can be used to train future ML algorithms to learn which results are most likely to be a positive match.
The final result of the application is shown below:
Green scores indicate a high-probability match, yellow indicates a potential match, and red indicates an unlikely match. The "View" button takes the user to the Cancer.gov site to find more information on the trial. The end result is an intuitive solution that provides real-world value in finding the clinical trials that best fit the patient. In addition, the architecture can be extended by others with additional algorithms to improve the overall results. To that end, we have shared the source code and results as open source in our GitHub repository for others to use and extend as they see fit.
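The color coding described above amounts to banding a numeric score. A minimal sketch in Python follows, assuming scores are normalized to the range 0 to 1. The cutoffs of 0.7 and 0.4 are placeholders chosen for illustration, not the thresholds the application uses, and `score_band` is a hypothetical name.

```python
def score_band(score: float, high: float = 0.7, low: float = 0.4) -> str:
    """Map a match score in [0, 1] to a display color.

    The default cutoffs are illustrative assumptions only.
    """
    if not 0.0 <= score <= 1.0:
        raise ValueError("score must be between 0 and 1")
    if score >= high:
        return "green"   # high-probability match
    if score >= low:
        return "yellow"  # potential match
    return "red"         # unlikely match


bands = [score_band(s) for s in (0.9, 0.5, 0.1)]
```

Keeping the cutoffs as parameters means they can be tuned later, for example against the thumbs up/down feedback, without touching the display logic.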
To find out more:
Interested in learning more about how Data Science and DevOps can be combined to digitally transform healthcare? Contact Us today for a no cost consultation on how C2 Labs can help accelerate your digital transformation.