On 16 March, the COVID-19 Open Research Dataset (CORD-19) was released. It is an open-source, machine-readable collection of scholarly literature covering COVID-19, SARS-CoV-2, and the coronavirus group. This free resource contains over 29,000 relevant scholarly articles, including over 13,000 with full text.
The release of the dataset is the result of a collaborative effort between the Allen Institute for AI, Chan Zuckerberg Initiative, Georgetown University, Microsoft, and the US National Library of Medicine (NLM). This resource is intended to mobilize researchers to apply recent advances in natural language processing to generate new insights in support of the fight against this infectious disease.
The CORD-19 dataset is available on the Allen Institute’s SemanticScholar.org website and will continue to be updated as new research is published in archival services and peer-reviewed publications.
Kaggle is hosting a challenge using this dataset, and at present there are 10 initial tasks for people to work on. These key scientific questions have been drawn from the National Academies of Sciences, Engineering, and Medicine’s research topics and the World Health Organization’s R&D Blueprint for COVID-19.