Digital Archive on Paper (Part I)

I recall reading a story a while back (I wish I could find it) in which someone doing research followed a reference to a journal article about data NASA gathered during its lunar missions. The researcher wanted to re-evaluate some of the results (or perhaps compute different ones) and needed access to the raw data. As I recall, the old raw data was eventually found after a search, but it was stored in some strange compressed format that no one alive understood.

I have the impression that scientific articles almost never contain raw experimental data. The same problem exists in computer science: although there is less experimental data, something more important is usually missing, namely the source code. All this ancillary data could be useful to researchers who want to follow up on the research. It could also make fraudulent results easier to catch. In today’s digital age, providing access to ancillary data ought to be practical.

When ancillary data is provided with a modern article, the typical solution is to provide electronic access via a URL written in the article. Often it is up to the author to maintain this URL, and given the number of dead links on the WWW, authors do a poor job of this. The lifespan of a journal article is much longer than any author can reasonably be expected to support.

Authors are not experts in archiving (and neither am I). Someone more responsible is needed to maintain this data. One obvious choice is the journal publisher. However, as far as I have been able to tell, journal publishers don’t generally support storage of data. Also, I am not very confident that they would do a sufficient job of maintaining URLs. Furthermore, even if they do support archiving ancillary data, the data's lifespan is limited to that of the publisher, which is still often shorter than the lifespan of journal articles.

Journal articles last as long as libraries support access to them, so it makes sense to look to libraries to provide the service of supporting electronic ancillary data. But I am not aware of any (university) library that provides this service. And unless all libraries kept individual copies of the data, its lifespan would be limited to the lifespan of an individual institution.

In my opinion, the best way to give ancillary data a sufficient lifespan is to attach the data to the article itself. For electronic documents this is not too difficult. Adobe’s PDF format supports attaching files to documents. With this method, the ancillary data lasts as long as the document. I have started using this feature. For instance, I attached my Haskell source code to my recent document published at the university’s WebDOC service. Unfortunately, authors of journal articles often don’t have the option of attaching files to PDF documents. Mathematics and computer science journals usually require LaTeX source code. For example, the arXiv generates PDFs from LaTeX sources, and I don’t know of a way of attaching files to those PDFs. Furthermore, the longevity of electronic journal articles has yet to be proven.

This brings me to what I see as the best solution for the current state of affairs. In addition to electronic attachments for electronic versions, we need to print the digital data on paper and include it with the printed publication.
