Data and Methods

Science communication and the emerging research on misinformation are highly interdisciplinary. This research draws on psychology, communication, risk assessment, computer science, sociology, and many other disciplines. The goal of our analysis was to map the research across these disciplines and to see how the disciplines are connected. Because the analysis had to span disciplines, we could not start from a single key author or journal. Instead, we needed a seed set that had been verified by experts and that covered these disciplines. With this seed set, we could then create maps of the emerging field and apply automated methods for finding related papers through citation links and word similarity.

Some questions we want to address are:

  • What fields are producing papers in science communication and misinformation research? How are those fields connected through these papers?
  • What authors are working in this space? Who is collaborating? And what authors are central to the research conversations?
  • What keywords and phrases are being used in this field? How do the fields connect through these common phrases?
  • Can we automatically identify related papers in this research area, in order to expand our view of the research?
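
As an illustration of the last question, expanding a seed set through citation links can be sketched as follows. This is a minimal sketch and not necessarily the method used in this project: the function name, the (citing, cited) pair layout, and the `min_links` threshold are all assumptions introduced for the example.

```python
from collections import Counter


def expand_by_citations(seed_ids, citations, min_links=2):
    """Return non-seed papers linked to at least `min_links` seed papers.

    citations: iterable of (citing_id, cited_id) pairs (hypothetical layout).
    min_links: assumed threshold; not a value taken from the report.
    """
    seed = set(seed_ids)
    links = Counter()
    for citing, cited in citations:
        if citing in seed and cited not in seed:
            links[cited] += 1   # a seed paper cites this candidate
        elif cited in seed and citing not in seed:
            links[citing] += 1  # this candidate cites a seed paper
    return {paper for paper, n in links.items() if n >= min_links}


# Toy data for illustration only.
toy_seed = {"A", "B"}
toy_citations = [("A", "X"), ("B", "X"), ("Y", "A"), ("A", "B")]
candidates = expand_by_citations(toy_seed, toy_citations)
```

A higher threshold keeps only papers that are tightly connected to the seed set; a threshold of 1 returns every paper one citation step away.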

To begin our analysis, we used two bibliographies. The first set we call “Collection 1” (C1). This set consisted of papers in an Endnote library assembled by the National Academy. Details on how this Endnote library was constructed are provided below. The second set we call “Collection 2” (C2). This collection of papers comes from the National Academy report “Communicating Science Effectively: A Research Agenda”. This report was published in 2017 and included many of the top scholars in the field of science communication. C1 focused more on misinformation research. C2 focused more generally on science communication.

In parts of the analysis, we separate C1 and C2. We do this to distinguish the misinformation literature from the science communication literature. For most of the analysis, though, we combine C1 and C2. We call this combined set C3. For this report, we matched the titles in both collections to records in our database, in the Web of Science (WoS), and in the Microsoft Academic Graph (MAG). For all matched papers, we extracted the titles and abstracts. For many of these records, we had to add bibliographic data by hand, given the difficulty of finding and matching records.
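
One common way to match reference titles against database records is to compare normalized titles. The sketch below assumes exact matching after normalization; the report does not describe the actual matching procedure, and the function names, record IDs, and titles here are hypothetical.

```python
import re
import unicodedata


def normalize_title(title):
    """Lowercase, strip accents and punctuation, collapse whitespace."""
    text = unicodedata.normalize("NFKD", title)
    text = "".join(ch for ch in text if not unicodedata.combining(ch))
    text = re.sub(r"[^a-z0-9]+", " ", text.lower())
    return text.strip()


def match_titles(reference_titles, database_records):
    """Map each reference title to a record ID, or None if no match.

    database_records: dict of record ID -> title (hypothetical layout).
    """
    index = {normalize_title(t): rid for rid, t in database_records.items()}
    return {t: index.get(normalize_title(t)) for t in reference_titles}


# Hypothetical records and references, for illustration only.
records = {"REC-001": "Example Title: A Study of Things"}
refs = ["Example title - a study of things", "A Book Not in the Database"]
matches = match_titles(refs, records)
unmatched = [t for t, rid in matches.items() if rid is None]
```

Titles left in `unmatched` are the ones that would need bibliographic data added by hand.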

The final seed set for C3 included 1,075 papers. There were 840 records extracted from the Endnote library (C1) and 298 records extracted from the National Academy review of science communication (C2). Of these, 63 papers appeared in both C1 and C2, so the combined set contains 840 + 298 - 63 = 1,075 unique papers. The details of the data collection are as follows:

NAS bibliography on misinformation (C1)

  • 1,918 references based on search criteria
  • Matched 840 papers with WoS IDs
  • Non-matches included recent papers (2018), papers from other databases, etc.

NAS report on science communication (C2)

  • Extracted 428 references from 23 pages of (unstructured) references
  • Matched 330 papers with DOIs
  • Matched 298 papers with WoS IDs
  • Non-matches included books, reports, etc.

Total matches with WoS IDs for SEED SET: 1,075 papers

  • 840 papers from C1; 298 papers from C2
  • 63 papers in both C1 and C2
  • Citations to/from this core set
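
The seed set total can be checked with inclusion-exclusion: papers that appear in both collections are counted once. The snippet below is a minimal sketch; the function name and the toy IDs are made up for illustration, and only the three counts come from this section.

```python
def combined_size(n_c1, n_c2, n_both):
    """Size of the union of two collections via inclusion-exclusion."""
    return n_c1 + n_c2 - n_both


# Counts reported in this section: C1, C2, and papers in both.
c3_size = combined_size(840, 298, 63)
print(c3_size)  # 1075

# The same relationship on toy ID sets (hypothetical IDs).
c1_ids = {"id-1", "id-2", "id-3"}
c2_ids = {"id-3", "id-4"}
c3_ids = c1_ids | c2_ids  # set union removes the duplicate
toy_size = combined_size(len(c1_ids), len(c2_ids), len(c1_ids & c2_ids))
```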

Additional details on the inclusion and exclusion criteria for the seed sets can be found in this document. That document also includes tables of the most cited papers and the venues represented in this set.

Click here to download the list of publications in these seed collections.

Next Steps

What data is missing? Do the results make sense? Did you find “connectors”? What other bibliographic features should be added? What papers should not be in the set?

These are all questions we are interested in exploring. Please send us feedback and we will update our data and results. We will continue to add more papers as we refine our automated bibliography methods.

