Supporting Nuclear Data Evaluation through NLP

Nuclear science research and applications rely on a system of centrally maintained resources for nuclear data, evaluations, and literature. All of these resources are fed by a constant torrent of new articles published in over 80 mainstream journals: in 2020, over 4000 papers were published in the top nine nuclear science journals alone. Currently, searching, categorizing, and tabulating these articles is a manual and laborious process. Assuming 30 minutes of processing time per paper (to read the paper, categorize it, and extract all relevant data), those 4000 papers alone take about 2000 hours per year, so simply keeping up with new literature requires roughly one full-time PhD scientist. Automation is not just desirable; it is a requirement.
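The workload estimate above can be checked with a short calculation. This is a minimal sketch: the paper count and the 30 minutes per paper come from the text, while the figure of 2000 working hours per full-time year and the function name `annual_effort_fte` are assumptions introduced here for illustration.

```python
# Figures taken from the text above.
PAPERS_PER_YEAR = 4000      # papers in the top nine journals in 2020
MINUTES_PER_PAPER = 30      # read, categorize, and extract data

# Assumption (not from the text): a full-time year is about 2000 working hours.
WORK_HOURS_PER_YEAR = 2000


def annual_effort_fte(papers, minutes_per_paper, work_hours_per_year=WORK_HOURS_PER_YEAR):
    """Return the yearly processing effort in full-time equivalents (FTE)."""
    total_hours = papers * minutes_per_paper / 60
    return total_hours / work_hours_per_year


if __name__ == "__main__":
    fte = annual_effort_fte(PAPERS_PER_YEAR, MINUTES_PER_PAPER)
    print(f"Estimated effort: {fte:.2f} FTE")
```

Because the estimate covers only the top nine of more than 80 journals, the full workload would be larger than this figure.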

NucScholar is a project to automate the processing of nuclear science literature, boosting researcher productivity while lowering the effort required to maintain important databases. We leverage significant developments in natural language processing (NLP) that are available through open-source tools such as PDFMiner, Gensim, and Bidirectional Encoder Representations from Transformers (BERT).