On March 16, 2020, researchers and leaders from the Allen Institute for AI, Chan Zuckerberg Initiative (CZI), Georgetown University’s Center for Security and Emerging Technology (CSET), Microsoft, and the National Library of Medicine (NLM) at the National Institutes of Health released the COVID-19 Open Research Dataset (CORD-19) of scholarly literature about COVID-19, SARS-CoV-2, and the Coronavirus group.
Requested by The White House Office of Science and Technology Policy, the dataset represents the most extensive machine-readable Coronavirus literature collection available for data and text mining to date, with over 29,000 articles, more than 13,000 of which have full text.
The call-to-action is to use AI to analyze the corpus to discover new insights about COVID-19. Rehinged.AI ran 9,000 papers (the commercial use subset that we pulled on 3/24/20) through the Rehinged.AI data processing engine, using our natural language processing algorithms to analyze and score the language used about specific treatments of coronaviruses.
Here are the 14 drugs and the number of articles about the coronavirus that mentioned them:
- Tamiflu;Oseltamivir | 302
- Corticosteroid | 236
- Chloroquine | 139
- Quercetin | 80
- Avigan;Favipiravir | 45
- Ritonavir | 37
- Ibuprofen | 32
- Vitamin C | 31
- Remdesivir | 14
- Hydroxychloroquine | 12
- Actemra;Tocilizumab | 5
- Tocilizumab | 5
- Aluvia;Kaletra | 4
- Kevzara | 1
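As a rough illustration of how per-drug article counts like these could be produced, here is a minimal sketch. It is not the Rehinged.AI engine: the function name `count_articles_mentioning`, the sample texts, and the whole-word, case-insensitive matching rule are all assumptions made for this example. Labels follow the "brand;generic" convention of the list above, and an article counts once if it mentions any alias in the label.

```python
import re

# Labels use the same "brand;generic" convention as the list above.
DRUG_LABELS = ["Tamiflu;Oseltamivir", "Remdesivir", "Kevzara"]


def count_articles_mentioning(articles, labels):
    """Return {label: number of articles that mention at least one alias}."""
    patterns = {}
    for label in labels:
        aliases = [re.escape(alias) for alias in label.split(";")]
        # Whole-word, case-insensitive match on any alias in the group.
        patterns[label] = re.compile(
            r"\b(?:" + "|".join(aliases) + r")\b", re.IGNORECASE
        )

    counts = {label: 0 for label in labels}
    for text in articles:
        for label, pattern in patterns.items():
            if pattern.search(text):
                counts[label] += 1  # one count per article, not per mention
    return counts


# Toy stand-in texts (hypothetical, not taken from CORD-19).
sample_articles = [
    "Oseltamivir (Tamiflu) was given to all patients.",
    "Remdesivir has shown efficacy against MERS in mouse models.",
    "No antiviral was administered.",
]

mention_counts = count_articles_mentioning(sample_articles, DRUG_LABELS)
for label, n in mention_counts.items():
    print(f"{label} | {n}")
```

Counting each article once per alias group is what keeps a paper that names both Tamiflu and oseltamivir from being counted twice.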
One of the ways that the Rehinged.AI platform interprets content at scale is by measuring “perception metrics” on a corpus (text or groups of text).
A perception metric can be any noun or adjective. The perception metrics are typically selected by the end user to match the topic and objects of interest. For example, most brand managers have a group of three to seven perception metrics that they want their brand to be known for. A common example for corporate brands is “trust.”
Our AI platform has the ability to interpret and compare any objects on any perception metrics. For this project, we measured the above articles for the words “improvement”, “good”, “effective” and “unique.”
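To make the idea of scoring an object on a perception metric more concrete, here is a deliberately simple sketch. It does not reproduce our scoring algorithm; it swaps in a plain sentence-level co-occurrence measure: of the sentences that mention a drug, what fraction also contain a word associated with the metric? The word lists, the function names, and the sample texts are all hypothetical and chosen only for illustration.

```python
import re

# Hypothetical word lists for two of the metrics named above.
METRIC_TERMS = {
    "improvement": {"improvement", "improved", "improve", "improves"},
    "effective": {"effective", "effectively", "efficacy"},
}


def split_sentences(text):
    """Naive sentence splitter: break after ., ! or ? followed by whitespace."""
    return [s for s in re.split(r"(?<=[.!?])\s+", text) if s]


def cooccurrence_score(texts, drug_aliases, metric_terms):
    """Fraction of drug-mentioning sentences that also contain a metric term."""
    drug_sentences = 0
    hits = 0
    for text in texts:
        for sentence in split_sentences(text):
            tokens = set(re.findall(r"[a-z0-9\-]+", sentence.lower()))
            if tokens & drug_aliases:
                drug_sentences += 1
                if tokens & metric_terms:
                    hits += 1
    # No mentions at all means there is nothing to score.
    return hits / drug_sentences if drug_sentences else 0.0


# Toy stand-in texts (hypothetical, not taken from CORD-19).
sample_texts = [
    "The patient received remdesivir. After remdesivir, the condition improved.",
    "Ibuprofen had no significant benefit.",
]

remdesivir_score = cooccurrence_score(
    sample_texts, {"remdesivir"}, METRIC_TERMS["improvement"]
)
ibuprofen_score = cooccurrence_score(
    sample_texts, {"ibuprofen"}, METRIC_TERMS["improvement"]
)
print(remdesivir_score, ibuprofen_score)
```

A measure this simple ignores negation ("did not improve") and context, which is one reason real scoring needs more than keyword matching.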
It’s important to note that natural language processing of text does not always produce an “aha” insight or a conclusive result. It is simply an algorithm applied consistently across a large amount of data. Moreover, no AI algorithm can reach 100% accuracy, but our AI tools have been periodically validated against well-known results, and their scores are at about the same level of accuracy as human interpretation.
Rehinged.AI customers use our platform to obtain data-driven results for content (including video) that they don’t have the time or ability to review and interpret. So, the speed of the insight is the true value. Since Kevzara is only mentioned in one of the 9,000 articles, we don’t need AI and NLP to read a single article; humans (even if they are still not 100% accurate) are statistically better than a machine at interpreting a single article.
However, a human would struggle to find understanding in a corpus of 9,000 articles, so having a data processing engine be able to process articles at scale, in real time, provides value.
For this project, we reviewed these drugs in two batches: one covering papers that discuss “coronaviruses” (of which there are many), and a smaller one covering papers that specifically discuss COVID-19 (which was named on 2/11/2020).
- Learning #1: Ritonavir scored high for “improvement” in the COVID-19 batch. This means that the language in these papers referenced improvement with patients.
“The 2019-nCoV could be different, and there are initial positive reports that lopinavir and ritonavir, which are HIV protease inhibitors, have some clinical efficacy against 2019-nCoV, similar to prior studies using them against SARS.”
“Lopinavir is one kind of protease inhibitor used to treat HIV infection, with ritonavir as a booster. Lopinavir and/or ritonavir has anti coronavirus activity in vitro. Hong Kong scholars found that, compared with ribavirin alone, patients treated with lopinavir/ritonavir and ribavirin had lower risk of acute respiratory distress syndrome (ARDS) or death caused by SARS-CoV. Lopinavir/ritonavir has also been clinically tested in treatment of COVID-19, and showed wonderfully effective treatment for some patients, but the general clinical effect has not been determined.”
- Learning #2: Remdesivir scored high for “improvement” in the COVID-19 batch. This means that the language in these papers referenced improvement with patients.
“Research should continue to be undertaken to screen other clinically available antivirals in cell culture models of 2019-nCoV, in hopes that a drug candidate would emerge useful against the virus that could be rapidly implemented in the clinic. One promising example could be remdesivir, which interferes with the viral polymerase and has shown efficacy against MERS in mouse models.”
“Nelfinavir was predicted to be a potential inhibitor of SARS-CoV-2 main protease. The first patient in the US had been trial-treated with intravenous remdesivir (a novel nucleotide analogue prodrug in development) due to a severe infection. No adverse reactions were observed during the administration, and the patient’s condition was effectively improved.”
- Learning #3: Hydroxychloroquine and Vitamin C scored high for “improvement” in the coronavirus batch. This means that the language in these papers referenced improvement with patients.
“Members of the quinoline family, such as chloroquine and hydroxychloroquine, have shown antiviral activity against several viruses, such as coronaviruses, human immunodeficiency virus, and respiratory syncytial virus. Concerning Flavivirus, quinoline derivatives have proved active against the Hepatitis C virus, West Nile virus, Japanese Encephalitis virus, Zika virus, and dengue virus.”
“In one paper, published in 2017, we investigated the relationship between diet and upper respiratory tract infections (URTI). Using data from the web-based food frequency questionnaire we found an inverse association between intake of vitamin C, vitamin E, docosahexaenoic (DHA) and arachidonic acid (AA) and risk of URTI among women, while intake of vitamin E and zinc was associated with an increased risk of URTI among men.”
- Learning #4: Chloroquine scored low for “improvement” in the coronavirus batch. This means that the language in these papers didn’t reference improvement with patients.
“As cyclophilin is a critical host factor responsible for the replication of many members of the Coronaviridae family, cyclosporine A was suggested to be a pan-coronavirus inhibitor. In another example, chloroquine was shown to have anti-FIPV and anti-inflammatory activities in vitro and further relieved clinical symptoms in FIP-infected cats. The compound, however, poses safety concerns and it may inflict liver damage.”
- Learning #5: Tamiflu (oseltamivir) and Ibuprofen scored low for “improvement” in the coronavirus batch. This means that the language in these papers didn’t reference improvement with patients.
“Some potential candidates for symptom relief such as steam inhalation have not been shown to be helpful whilst ibuprofen had no significant benefit and may cause harm.”
“The most recent clinical trial (NCT02293863) aimed to investigate the safety and clinical activity of a single intravenous (IV) dose of MHAA4549A in adult participants hospitalized with severe influenza A in combination with oseltamivir was recently concluded and results indicated no advantage on any of the primary clinical outcomes evaluated when compared with the standard of care.”
Here is the interactive view from our platform of the publications that mentioned "coronavirus":
Here is the interactive view from our platform of the publications that mentioned COVID-19:
We are currently performing additional NLP analysis to interpret and extract intelligence from the scientific papers. We are also looking at expanding the database of scientific papers beyond the COVID-19 Open Research Dataset, as well as performing text analysis of non-scientific articles.