Millions of research papers at risk of disappearing from the Internet

A study identified more than two million articles that did not appear in a major digital archive, despite having an active DOI. Credit: Anna Berkut/Alamy

 

An analysis of DOIs suggests that digital preservation is not keeping up with burgeoning scholarly knowledge.

More than one-quarter of scholarly articles are not being properly archived and preserved, a study of more than seven million digital publications suggests. The findings, published in the Journal of Librarianship and Scholarly Communication on 24 January (ref. 1), indicate that systems to preserve papers online have failed to keep pace with the growth of research output.

“Our entire epistemology of science and research relies on the chain of footnotes,” explains author Martin Eve, a researcher in literature, technology and publishing at Birkbeck, University of London. “If you can’t verify what someone else has said at some other point, you’re just trusting to blind faith for artefacts that you can no longer read yourself.”

Eve, who is also involved in research and development at digital-infrastructure organization Crossref, checked whether 7,438,037 works labelled with digital object identifiers (DOIs) are held in archives. DOIs — which consist of a string of numbers, letters and symbols — are unique fingerprints used to identify and link to specific publications, such as scholarly articles and official reports. Crossref is the largest DOI registration agency, allocating the identifiers to about 20,000 members, including publishers, museums and other institutions.
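As an illustration of how a DOI is structured, the sketch below splits one into its registrant prefix and item suffix and builds the corresponding resolver link. It uses this article's own DOI as the example. The function names `split_doi` and `doi_url` are hypothetical helpers introduced here, not part of Crossref's tooling or of Eve's study, and the validity check is deliberately minimal.

```python
from typing import Tuple
from urllib.parse import quote


def split_doi(doi: str) -> Tuple[str, str]:
    """Split a DOI into (prefix, suffix) at the first slash.

    Assumption: a DOI starts with "10." and has a non-empty suffix.
    This is a minimal check, not a full validator.
    """
    prefix, sep, suffix = doi.strip().partition("/")
    if not (prefix.startswith("10.") and sep and suffix):
        raise ValueError(f"not a DOI: {doi!r}")
    return prefix, suffix


def doi_url(doi: str) -> str:
    """Build a doi.org resolver link, percent-encoding unusual suffix characters."""
    prefix, suffix = split_doi(doi)
    return f"https://doi.org/{prefix}/{quote(suffix, safe='/')}"


if __name__ == "__main__":
    example = "10.1038/d41586-024-00616-5"  # DOI of this article
    print(split_doi(example))
    print(doi_url(example))
```

A working resolver link shows only that the DOI is registered and points somewhere. It says nothing about whether the work is held in a preservation archive, which is the gap the study measures.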

The sample of DOIs included in the study was made up of a random selection of up to 1,000 registered to each member organization. Twenty-eight per cent of these works — more than two million articles — did not appear in a major digital archive, despite having an active DOI. Only 58% of the DOIs referenced works that had been stored in at least one archive. The other 14% were excluded from the study because they were published too recently, were not journal articles or did not have an identifiable source.
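The proportions above can be checked against the sample size with a few lines of arithmetic. The sketch below takes the reported total of 7,438,037 DOIs and the three rounded percentages. Because the percentages are rounded, the counts are approximations, not figures from the study. The names `TOTAL_DOIS`, `SHARES` and `approx_count` are illustrative.

```python
# Figures reported in the article; percentages are rounded, so counts are approximate.
TOTAL_DOIS = 7_438_037
SHARES = {
    "not in a major archive": 28,
    "in at least one archive": 58,
    "excluded from the study": 14,
}


def approx_count(total: int, percent: int) -> int:
    """Approximate number of works corresponding to a rounded percentage."""
    return round(total * percent / 100)


if __name__ == "__main__":
    # The three categories should account for the whole sample.
    assert sum(SHARES.values()) == 100
    for label, percent in SHARES.items():
        print(f"{label}: about {approx_count(TOTAL_DOIS, percent):,}")
```

Under these rounded figures, the 28% share comes to roughly 2.08 million works, consistent with the article's "more than two million".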

Preservation challenge

Eve notes that the study has limitations: namely that it tracked only articles with DOIs, and that it did not search every digital repository for articles (he did not check whether items with a DOI were stored in institutional repositories, for example).

Nevertheless, preservation specialists have welcomed the analysis. “It’s been hard to know the real extent of the digital preservation challenge faced by e-journals,” says William Kilbride, managing director of the Digital Preservation Coalition, headquartered in York, UK. The coalition publishes a handbook detailing good preservation practice.

“Many people have the blind assumption that if you have a DOI, it’s there forever,” says Mikael Laakso, who studies scholarly publishing at the Hanken School of Economics in Helsinki. “But that doesn’t mean that the link will always work.” In 2021, Laakso and his colleagues reported (ref. 2) that more than 170 open-access journals had disappeared from the Internet between 2000 and 2019.

Kate Wittenberg, managing director of the digital archiving service Portico in New York City, warns that small publishers are at higher risk of failing to preserve articles than are large ones. “It costs money to preserve content,” she says, adding that archiving involves infrastructure, technology and expertise that many smaller organizations do not have access to.

Eve’s study suggests some measures that could improve digital preservation, including stronger requirements at DOI registration agencies and better education and awareness of the issue among publishers and researchers.

“Everybody thinks of the immediate gains they might get from having a paper out somewhere, but we really should be thinking about the long-term sustainability of the research ecosystem,” Eve says. “After you’ve been dead for 100 years, are people going to be able to get access to the things you’ve worked on?”

Nature 627, 256 (2024)

doi: https://doi.org/10.1038/d41586-024-00616-5

Updates & Corrections

  • Clarification 05 March 2024: The headline of this story has been edited to reflect the fact that some of these papers have not entirely disappeared from the Internet. Rather, many papers are still accessible but have not been properly archived.

References

  1. Eve, M. P. J. Libr. Sch. Commun. 12, eP16288 (2024).

  2. Laakso, M., Matthias, L. & Jahn, N. J. Assoc. Inf. Sci. Technol. 72, 1099–1112 (2021).
