Vanishing Act: The Erosion of Online Footnotes and Implications for Scholarship in the Digital Age

By Michael Bugeja and Daniela Dimitrova

Chapter 1: Extinct Citations, Missing Links and Other Bibliographical Wonders from Vanishing Act

A decade ago, most research was done in the library rather than through its Web site, and scholars, editors, graduate directors and librarians were meticulous about the integrity of footnotes. They knew that citation was the backbone of research, from agronomy to zoology in the sciences and from art history to Zen studies in the humanities. The footnote upheld standards because it allowed others to test hypotheses or replicate experiments. Testing and replication are at the heart of the peer review and scientific processes upon which academe is based, from papers by first-year and transfer students to grants by postdoc and professor.

Because so much depended on the footnote, the foundation of all scholarship, academicians admonished students for sloppy or erroneous citation. This was the norm even a decade ago, when most research was done in the library rather than through its Web site. Our discipline of communication scholarship was as exacting as any other in the academy, especially when it came to footnotes. Students submitting dissertations and faculty submitting journal articles were fastidious about the accuracy of footnotes, knowing that their reputations relied on the fine print at the bottom of the page or at the end of the manuscript. Unacceptable were citations that simply named the source without specifying the document, as in “U.S. Mint, 801 9th Street NW, Washington, DC 20220-0001.” The worst mistakes would contain particulars, including an article’s title and date of publication, but locate it in the wrong volume and issue of a journal. Indeed, if dissertation advisers went to the stacks to verify citations, as they often did, they would be aghast at checking a citation and finding it in no volume or number, finding it with wrong pages or other particulars, or discovering a journal with the cited pages ripped out. Those mistakes could doom a letter of recommendation for a job or advanced study.

Now scholars don’t worry so much about footnotes. The emphasis in the Internet age has been more on access to data than on retrieval thereof, with academics promoting that access via technology initiatives that have all but transformed some university libraries into computer centers with gourmet coffee rather than learning centers with expert archivists. The phenomenon of vanishing citations seems more like a technological glitch (a downed server or corrupt file) than a lapse in methodology. The typical student, professor and researcher now seems to overlook the disappearance of primary sources in an article or a document rather than question where those sources went or try to recover them, merely because the Internet glitches so frequently that the convenience of online research would be severely undermined if we kept our meticulous ways. As we will learn, the tilt toward convenience over substance has put at risk the peer review and scientific process upon which research, invention and innovation have been based since the Enlightenment.

Indeed, Davis argues that convenience may play a role in the increasing popularity of online citations. Our research pivots on three time-tested aspects of accountable scholarship:

  • Text must be stabilized. Researchers must have access to original documents rather than manipulated versions thereof.
  • Citations in those documents must be retrievable. Researchers must have access to archives that guarantee the longevity of both technical formats and cited URLs.
  • The digital library must remain a repository of fact as much as a dissemination point of information. The emphasis on the latter has eroded the former so that no universal archive exists to ensure the permanency associated with the scientific tradition.

The digital library is an oasis of convenience rather than an oasis of escape for the curious mind. With the advent of easily accessible data from a library open online at all hours, citation mistakes are common and routinely overlooked. This is not to say that footnotes are out of fashion; quite the contrary, if you analyze the explosion of citation in scholarly articles or conference papers. Citation indices in library databanks or on Google Scholar are gauges of how influential a professor’s work has been over time. In fact, it is easier than ever to cite an online source with the select, copy and paste function of the typical computer. It only makes sense that footnoting sources would become more plentiful in the digital age because the platforms have multiplied. American Psychological Association style, for instance, has formats for articles from online periodicals; online periodicals with assigned digital object identifiers (bar codes); articles from a database; digital abstracts; electronic newspaper articles; electronic books; sections of Web documents or online book chapters; online book reviews; dissertations and theses from a database; online encyclopedias and dictionaries; online bibliographies and annotated bibliographies; digital data sets; interactive maps and other graphic representations of data; qualitative data and online interviews; online lecture notes and presentation slides; non-periodical Web documents; Web pages or reports; computer software and downloaded software; e-mail; online forum or discussion board postings; blogs (weblogs) and video blog posts; wikis; and audio and video podcasts. We have become more meticulous about identifying the digital source than about checking whether it has vanished.

There is no question that the Internet and other digital technologies and applications are fabulous resources for quick and easy source material used in footnotes. But new media platforms and applications are abysmal when it comes to later retrieval, primarily because text and graphics accumulate in a device, server or databank, and those are the domains (literally and figuratively) of computer science, whose god is the server and whose angelic orders are the file system. There is no history or tradition associated with libraries that honors these deities the way that librarians used to honor fact in the archival repository. New librarians are apt to be digital natives skilled in social networks but may forget why they should be fastidious about fact. Our book challenges them as well as information and computer scientists whose god is the file folder rather than the book shelf. This study continues to challenge advocates of online scholarship to stop touting the convenience of easy access and start resolving issues of later retrieval. Despite what literary natives know about the importance of reliable footnotes in literature reviews and scientific protocols, especially in the medical arts, academia has looked askance at suggestions that footnote accuracy must be maintained if the integrity of scholarship is to endure. Our book asks “why.” Why should Internet-based scholarship be less trustworthy than that of the paper era? We also want to know the four other “W”s and the “H”:

  • When do Internet citations disappear?
  • Where do disappearing citations go?
  • Who is responsible for preserving them?
  • What can be done to lengthen the lifespan of those citations?
  • How can that be achieved?

We began our exploration of this phenomenon more than seven years ago when co-author Michael Bugeja, writing a book about the Internet, no less, checked his citations before submitting his manuscript to Oxford University Press and discovered that almost half of his Web citations had vanished. This was 2003, and Bugeja and research partner and co-author Daniela Dimitrova had not yet considered the phenomenon. Literary natives, both were aghast at the implications for scholarship in the digital age. Bugeja had to inform his editor that delivery of Interpersonal Divide: The Search For Community in a Technological Age would be delayed for weeks while he searched for missing URLs and made photocopies of every citation to prove the accuracy of his citations, if ever challenged. Meanwhile, Dimitrova researched the effect of “link-rot” in library science. One early chronicler of the effect was the Web Surveying Team at the Georgia Institute of Technology, which reported lapsed URLs in a survey. About the time we noticed the impact of lapsed footnotes, a study was published examining Internet footnotes in New England Journal of Medicine, The Journal of the American Medical Association and Science, noting that after two years, 13% of online references were inaccessible.

That is how our journey began, and here, for the time being, it will end, with our sounding a warning that vanishing footnotes over time will result in ever more unreliable literature reviews and follow-up studies; be the cause of failed experiments in the hard and medical sciences because data of previous studies no longer can be retrieved; and seriously erode the methodology of history and media history, whose research procedures rely more than those of any other discipline on accurate citation.

Our several studies, synopsized here and expanded in this book, contain new data, analyses and discussion. We warn that the decay of footnotes threatens the very tradition of peer review upon which degrees are conferred, articles published, and knowledge advanced. As such, our research has a historical component. In The Footnote: A Curious History, Princeton historian Anthony Grafton observes that citations, especially in scientific (and, by extension, social science) works, contain a compendium of information, such as the intellectual culture of an academic program, the pedagogical methods of its graduate students, and the editorial preferences of its journal. Intellectual culture, pedagogical methods, and editorial preferences are precisely what the current academic generation stands to lose if the phenomenon of disappearing online citations continues. The only prevailing culture will be the most current evolving one; the pedagogical method will be based on what can be accessed in the current moment; and the editorial preferences will be those listed on the current masthead. This book will document comprehensively how that scenario is likely to result if editors continue to overlook deteriorating digital citations.

This book also looks at where we started with what we call the “half-life” effect, or how long it typically takes for one half of online citations in a journal to vanish from the Web. Through a methodology that we adapted and perfected over the past several years, analyzing data on individual articles, we can project with reasonable precision when the half-life will occur for an entire journal. We measure how many footnotes have lapsed in each article published in each edition of a journal over the course of a year. That gives us not only an idea of the scope of the link-rot but also a general view of the phenomenon over time. Once the data are retrieved for one journal, we can compare them with data for others in the same genre, giving an overall view that might yield anything from best and worst editing practices to comparisons of communication journals with those of other disciplines. Compiling data over a period of years makes the half-life phenomenon a more salient issue. Typically, in books such as this, research methodologies are omitted for the sake of reader interest because the diction of scholarly analysis is dense; however, we include the methodology in an appendix so that other researchers interested in the half-life effect can document, replicate and advance the preservation of citation. As such, this book also chronicles how far we have come in our research and what conclusions we can share with other scholars to address the half-life issue. Specifically, we have tracked the use of online citations by scholars in our discipline over a four-year period and examined the rate of decay of those citations, enabling us to estimate a half-life for online citations in journalism and communication journals. That figure is important in assessing the half-life phenomenon over time. For instance, if editors know the half-life in advance, they can take steps to mitigate its effects, following advice the authors provide later in this book, and then chart the rate of decay to see if it has slowed or quickened since the last analysis. Collectively, then, half-life estimations shared later in this work can be compared with new rates of decay for the same journals in private studies by editors or by communication scholars. In sum, for all these reasons, we opted to reprint the methodology in Appendix A.
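
The half-life idea described above can be sketched numerically. The Python below is an illustration only and is not the methodology reprinted in Appendix A. It assumes that online citations lapse at a constant rate (exponential decay) and that pooling a journal's articles at their citation-weighted mean age is an acceptable simplification. The names ArticleCheck, half_life_months and journal_half_life, and the sample counts used with them, are hypothetical.

```python
import math
from dataclasses import dataclass


@dataclass
class ArticleCheck:
    """One article's online citations, checked some months after publication."""
    online_citations: int  # online footnotes found in the article
    lapsed: int            # how many of those no longer resolve
    age_months: float      # months between publication and the check


def half_life_months(surviving_fraction: float, age_months: float) -> float:
    """Half-life implied by a surviving fraction, ASSUMING exponential decay."""
    if not 0.0 < surviving_fraction < 1.0:
        raise ValueError("surviving fraction must lie strictly between 0 and 1")
    if age_months <= 0:
        raise ValueError("age must be positive")
    # surviving = 0.5 ** (age / half_life)  =>  half_life = age * ln(0.5) / ln(surviving)
    return age_months * math.log(0.5) / math.log(surviving_fraction)


def journal_half_life(checks: list[ArticleCheck]) -> float:
    """Pool the articles of one journal and estimate a single half-life."""
    total = sum(c.online_citations for c in checks)
    lapsed = sum(c.lapsed for c in checks)
    if total == 0:
        raise ValueError("no online citations to analyze")
    # Simplification: treat the pooled footnotes as if all had the
    # citation-weighted mean age.
    mean_age = sum(c.age_months * c.online_citations for c in checks) / total
    return half_life_months((total - lapsed) / total, mean_age)


if __name__ == "__main__":
    # Hypothetical counts for three articles checked 36 months after publication.
    sample = [
        ArticleCheck(online_citations=20, lapsed=5, age_months=36),
        ArticleCheck(online_citations=10, lapsed=4, age_months=36),
        ArticleCheck(online_citations=10, lapsed=1, age_months=36),
    ]
    print(f"Estimated half-life: {journal_half_life(sample):.1f} months")
```

With these made-up counts, 10 of 40 online citations have lapsed after 36 months, which under the constant-rate assumption implies a half-life of roughly 87 months. Real decay need not be exponential, so an estimate of this kind is best read as a rough projection to be checked against later measurements.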

It is high time for scholars not only in our disciplines but in the entire academy to understand the half-life phenomenon and the risks associated with online sources. Vanishing online footnotes undermine the building blocks of research, and their disappearance raises concerns about the reliability and replicability of scholarship. The loss of replicability alone undermines the foundation of peer review, for without footnotes tracking back to accessible sources, the scientific method becomes impossible. No one person is responsible for that method, as it evolved alongside the history of science. Its fundamental form is to ask a question, research what has been written about that or similar questions (literature review), formulate a hypothesis, create a methodology to test the hypothesis, collect findings, analyze and discuss data, and posit some conclusions. Publication is a part of that scientific method, for without dissemination of data, so that others may scrutinize hypotheses, findings and/or conclusions, we lack the requisite component of replicability. A study has to be accessible over time for its assertions to be proved correct, partially correct, or wrong. Consequently, the entire infrastructure of this method on which we have based centuries of progress is built on the foundation of the footnote; otherwise, we research on shifting sand, an apt metaphor for the half-life effect.

In science and medicine, this effect could literally be a life-and-death matter. The inability to replicate the research of others would undermine the foundations of scientific research as it has been known. When citations in medical databanks lapse, physicians must search for missing footnotes using the Wayback Machine, which compiles snapshots of the Web at regular intervals, or hunt for missing sources using other Internet tools, all the while making sure to access an identical version of the vanished text. The process takes time that otherwise could be devoted to experimentation, especially as online citations in leading medical journals tend to lapse at a rate of 13 percent over 27 months. In the humanities and social sciences, where the stakes might not be life and death, the entire research enterprise is nonetheless threatened in the same way when citations and even texts turn to vapor. Worse, the academic tradition is being violated, especially in literature reviews, because one study that should be linked theoretically or methodologically to another no longer can be, owing to vanishing links. As a result, studies cannot build upon each other for richer literature reviews and deeper analyses. It was therefore critical for us to inquire whether this phenomenon was being addressed by the people disseminating the data: journal editors. Their views will be shared later in this book, providing additional insights into the challenges associated with Internet citation. Later we also will provide other precautions that scholars can take until online information is stabilized so as to preserve the integrity of peer review and scientific method.

Specifically, the present book focuses on nine leading journals in the area of journalism and communication. Using longitudinal data and a content analysis methodology, we analyzed the use of online footnotes in refereed journal articles over a four-year period (2000-2003). We could have continued our data analyses into 2008 (and perhaps beyond), but we were less interested in charting recent vanishings and more interested in documenting when citations first began to lapse, showing the acceleration to the point where the half-life could be predicted. We were equally interested in charting the effect in traditional vs. new media journals of communication, to see if content about digital platforms also resulted in more lapsed citations; then we wanted to compare those data with those of media history journals, as the latter’s primary methodologies all assume footnote reliability. In sum, we found it more useful to publish findings from our own historical perspective as one of the first research teams to study the effect and the only communication-based team to do so in depth. As such, this book builds upon and extends our previous work.