- Katherine Boss
- Meredith Broussard
- Nora Paul
- Ben Welsh
Remember that story you read online in 2005, the one with the cool Flash graphics? How about that amazing interactive data visualization that you saw way back when, the one that made you want to level up your news nerd game? Good luck finding those stories today. Data journalism is disappearing from the web.
Data journalism is more fragile than most people realize. Every time a news organization reorganizes its staff or updates its CMS or stops paying the bill for the data team’s servers, complex data journalism projects are lost. Conventional archiving methods, like the Internet Archive’s crawlers or the automated archiving feeds of companies like Lexis-Nexis, are no longer sufficient to capture projects that involve big data, databases, streaming data or interactive graphics.
In this session, we’ll discuss why data journalism is the new digital ephemera, and we’ll explore the state of the art for archiving. We’ll talk about strategies data journalists can use to preserve their own work and how news organizations can better preserve their valuable digital assets. Finally, we’ll report on how journalists, librarians and scholars are thinking about future-proofing the news.
Nora Paul (NP)
There's no one whose job it is to advocate for saving old news; where someone does, that's not all they're doing.
News orgs have never been good at preserving the news.
Ben Welsh (BW)
Having the CMS be archive-ready is the best way, but we aren't (usually) CMS people.
Stuff that is 5-10 years old is probably dead.
The Five Commandments
- I. Thou shalt not make a mess and expect someone else to clean it up for you.
- II. Thou shalt publish as static files immediately or eventually.
- III. Thou shalt not depend on rando links.
- IV. Thou shalt version your CSS and base templates.
- V. Thou shalt see the big archives as a platform.
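One way to act on commandment III is to audit a published page for assets hosted somewhere you don't control. The sketch below is illustrative only and was not shown in the session; the names `ExternalAssetFinder` and `find_external_assets` and the example hosts are hypothetical. It uses only the Python standard library and lists script, stylesheet, image and iframe URLs that point off-site.

```python
from html.parser import HTMLParser
from urllib.parse import urlparse


class ExternalAssetFinder(HTMLParser):
    """Collect asset URLs that are served from a host other than our own."""

    # Tags that load assets, and the attribute holding the asset URL.
    ASSET_ATTRS = {"script": "src", "link": "href", "img": "src", "iframe": "src"}

    def __init__(self, own_host):
        super().__init__()
        self.own_host = own_host
        self.external = []

    def handle_starttag(self, tag, attrs):
        attr_name = self.ASSET_ATTRS.get(tag)
        if attr_name is None:
            return
        url = dict(attrs).get(attr_name)
        if not url:
            return
        # Relative URLs have an empty host and are treated as our own.
        host = urlparse(url).netloc
        if host and host != self.own_host:
            self.external.append(url)


def find_external_assets(html, own_host):
    """Return the off-site asset URLs found in an HTML string, in page order."""
    finder = ExternalAssetFinder(own_host)
    finder.feed(html)
    return finder.external
```

Anything this returns is a candidate for copying into the project's own static files before the project is archived.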
Katherine Boss (KB)
The Internet Archive is only searchable by date.
The problem: Our stuff is dynamic, and libraries haven't figured out what to do with it.
Maybe we can monetize our archives.
Flash. :(
Emulation might be a solution.
For static objects, PDFs. But migration is not successful for dynamic projects.
ReproZip, the reproducibility packer.
Meredith Broussard (MB)
Four recommendations for what you can do.
This has to happen at the institutional level. Individuals should save their own work, but that's not enough.
- Take a video. Walkthrough of your project.
- Bake out. Static versions of dynamic pages.
- Plan for the future. Sunset plan at time of launch.
- Work with libraries, institutions and commercial archives.
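The "bake out" recommendation can be sketched in a few lines. This is a minimal illustration, not a tool mentioned by the panel: it assumes each dynamic page can be produced by calling a function that returns its HTML, and the name `bake` and the `pages` mapping are hypothetical.

```python
from pathlib import Path


def bake(pages, out_dir):
    """Write each rendered page to a static file under out_dir.

    pages maps a URL path such as "/story/" to a zero-argument
    function that returns that page's content as a string.
    """
    out_dir = Path(out_dir)
    written = []
    for url_path, render in pages.items():
        relative = url_path.strip("/")
        if not relative or not Path(relative).suffix:
            # Directory-style URLs ("/", "/story/") become ".../index.html".
            target = out_dir / relative / "index.html"
        else:
            # URLs with a file extension ("/data.json") keep their name.
            target = out_dir / relative
        target.parent.mkdir(parents=True, exist_ok=True)
        target.write_text(render(), encoding="utf-8")
        written.append(target)
    return written
```

The resulting directory can be served by any plain web server, which is what makes it easier to hand to a library or archive than a running application.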
These are human issues, not computational problems. What are those?
NP: Difference between libraries and archives. Preservation v. access.
BW: People underestimate the risk to their work until they lose something they care about. And people who do know and care don't know how they can make a difference. There needs to be "The Checklist" to make sure stuff is archive-ready.
MB: We need digital archivists and librarians back in newsrooms.
What are some resources that can help?
- ReproZip
- Dodging the Memory Hole
- Internet Archive API tools
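The notes don't say which Internet Archive tools were meant. As one example, the Wayback Machine has an availability endpoint that reports the closest snapshot of a URL. The sketch below assumes the JSON shape that endpoint is documented to return (`archived_snapshots.closest`); the function names are hypothetical, and only `lookup` touches the network.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

AVAILABILITY_ENDPOINT = "https://archive.org/wayback/available"


def availability_url(page_url, timestamp=None):
    """Build the query URL; timestamp is an optional YYYYMMDD-style string."""
    params = {"url": page_url}
    if timestamp:
        params["timestamp"] = timestamp
    return AVAILABILITY_ENDPOINT + "?" + urlencode(params)


def closest_snapshot(payload):
    """Return the snapshot URL from a parsed response, or None if absent."""
    closest = payload.get("archived_snapshots", {}).get("closest")
    if closest and closest.get("available"):
        return closest.get("url")
    return None


def lookup(page_url, timestamp=None):
    """Ask the Wayback Machine for the closest snapshot (network call)."""
    with urlopen(availability_url(page_url, timestamp), timeout=10) as response:
        return closest_snapshot(json.load(response))
```

A `None` result for a project you care about is a prompt to submit it for archiving, or to bake it out yourself.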
How do you think about linkrot?
Local files vs. CDNs for updates?
Technical and design challenges of archiving?
Katherine Boss is the Librarian for Journalism, Media, Culture and Communication at New York University. Her research focuses on archiving and preserving born digital news content, and she is the co-leader of the Archiving and Preserving News Applications working group of the Journalism Digital News Archive. She holds a bachelor’s in Journalism, a master’s in Library and Information Science, and a master’s in Media Studies. @katy_boss
Meredith Broussard teaches data journalism at NYU's Arthur L. Carter Journalism Institute. Her current research focuses on artificial intelligence in investigative reporting, with a particular interest in using data analysis for social good. Her new book is "Artificial Unintelligence: How Computers Misunderstand the World." @merbroussard or meredithbroussard.com
Nora Paul is co-author of Future-Proofing the News: Preserving the First Draft of History. She is the former director of the Minnesota Journalism Center at the University of Minnesota, where she also taught classes on information strategies. She was formerly a faculty member at the Poynter Institute and worked at the Miami Herald, where she ran the news research library. Now blissfully retired, but happy to share her perspective on the archiving panel.
Ben Welsh is the editor of the Data Desk, a team of reporters and computer programmers in the Los Angeles Times newsroom. He is also an organizer of the California Civic Data Coalition, a network of journalists working to open public data, and the founder of PastPages, an open-source archive dedicated to better preserving digital news.
Description and speakers from official schedule