Note: A longer version of this story originally appeared in the Spring 2009 Duke University Libraries Magazine.
Part detective, part digital archaeologist.
That’s how electronic records archivist Seth Shaw sees himself.
Unlike archivists of generations past, he doesn’t restore medieval manuscripts. Instead, he excavates data from 3½-inch floppy disks, digital camera memory cards, and hard drives circa 2000. A Nobel laureate in economics even offered Shaw his username and password in order to donate his e-mail correspondence to Duke Libraries.
Shaw is part of Duke's new digital information strategy, an effort to explore methods for managing and archiving the deluge of digital information. The goal is to ensure the university’s vast and varied digital output – from course Web sites and dissertations to wikis and raw scientific data – will be available to future scholars, for uses we can’t currently imagine.
“The academy is based on building on the work someone has done before you,” says Paolo Mangiafico, director of Duke's digital information strategy. “We need to provide incentives for people to share data, help other people get to that data and mash it up, and make sure the stuff persists over time. Someone might not think to do those mash-ups until 20 years from now.”
The efforts are part of a new university initiative, funded in part by a Mellon Foundation grant, aimed at developing not just a “digital attic,” but a technological infrastructure and a set of policies to find new uses for previous work.
Mangiafico is reaching out to faculty, archivists and information technology staff across the university as part of the initiative, a joint project of the Office of the Provost, Duke Libraries and the Office of Information Technology. (To find out more, contact him at firstname.lastname@example.org.)
“What worries me is the stuff I don’t know about – keeping track of the new content that comes up that doesn’t have that paper equivalent,” says University Archivist Tim Pyatt. “Hundreds of us are trying to find these solutions independently. We need to be thinking about this together, so we’re not spending multiple resources to solve the same problem.”
Unlike the archivists of generations past who could set aside boxes of letters or old photographs for later cataloging, Seth Shaw faces the twin ticking time bombs of technology obsolescence and “bit rot.”
“When you stick papers in a box, you don’t have to go back and check every month to make sure they’re still there,” Shaw says. “You don’t assume the box is spontaneously going to die on you, like a hard drive might.”
These new-age archivists often have to invent their methods as they go.
In an effort to capture the “first rough draft” of Duke’s current history, for example, Shaw wrote his own computer program to copy seven years’ worth of multimedia files and press releases off servers and hard drives in the Office of News and Communication. He left with 211 gigabytes’ worth of university news – and the knowledge that each passing day generates a new flood of digital data he can’t possibly hope to sift through, let alone store.
Meanwhile, he also faces growing skittishness among prospective data donors, who fear what one Duke student called the “promiscuous access” of online data sharing.
“It’s one thing to have a box of papers on the shelf in the library that someone might pull down,” Shaw says. “It’s a whole different story if you donate your papers and someone could type your name into Google and find your files.”
In an era when the first draft of scholarship is written in wikis and blogs and researchers can store an entire career on a flash drive, Shaw sometimes feels as if he’s trying to outrun an avalanche. “There is no future-proofing,” he says. “We’ll always be trying to keep up with technology.”
“It’s hard to decide what’s important in advance, but the tools and infrastructure we build now need to factor in the long term,” Mangiafico says.
Only through that kind of forethought and coordination can the university facilitate the kind of data-driven “mash-ups” that will fuel the next generation of unexpected collaborations, Mangiafico predicts: “With enough eyeballs, you make better discoveries.”