Big Data, Smart Savings: Duke’s Data Attic Cuts Research Storage Costs by 63%
By Tracey Collins
When Dr. Mark Palmeri, a faculty member in Duke’s Biomedical Engineering department, was faced with the challenge of managing over 1 petabyte of ultrasound research data, he turned to Duke’s new low-cost storage solution, the Data Attic, for his archival needs.
Mark’s research group had accumulated massive amounts of data over the years, spanning active NIH-funded projects, retired faculty, and student projects. Their existing storage setup was expensive and difficult to manage. With grant funding winding down and faculty transitions on the horizon, Mark needed a smart, sustainable way to store and organize 200 TB (about 20% of his total data) for archiving.
“As the Duke Ultrasound research group has grown, so have our data storage needs. This constant growth in our data storage footprint has required us to get more aggressive in our data archiving protocols as we cannot maintain so much live data online,” commented Mark.
An Innovative Shift in Strategy
Initially, Mark planned to organize the Data Attic by individual projects or users. However, after collaborating with the Compute and Data Services Alliance (CDSA) team, he realized that organizing by lab made more sense. This approach allowed the team to preserve historical data, simplify management, and prepare for future transitions.
“We are re-thinking how we do group-wide data storage. As we have migrated to file storage hosted within the research computing environment, the ability to partition storage based on funded research project may make more sense and allow our total data storage needs to be more strategically defined by project, and not research group,” Mark said. “The Duke Data Attic is another resource that is pushing us in that direction as it makes scoping data storage needs by project, not PI, easier to budget and allocate resources to long-term.”
Instead of uploading raw data directly, Mark used a tool called Restic, which compresses and deduplicates data while preserving its history. This allowed him to store more data using less space and keep track of changes over time. With support from the CDSA team, he received secure access keys to upload data directly, bypassing slower tools and streamlining the process.
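For readers unfamiliar with Restic, the upload step described above can be sketched roughly as follows. This is only an illustration, not the commands Mark's group actually ran: the endpoint URL, bucket name, source path, tag, and key values are all hypothetical placeholders. The sketch builds a `restic backup` command for an S3-compatible repository and the environment variables Restic reads for credentials, without executing anything.

```python
import os


def restic_backup_command(endpoint, bucket, source_dir, tag):
    """Build the argument list for a restic backup to an S3-compatible repository.

    The endpoint and bucket are placeholders; the article does not state
    the real values used for the Data Attic.
    """
    repo = f"s3:{endpoint}/{bucket}"
    return ["restic", "-r", repo, "backup", source_dir, "--tag", tag]


def restic_environment(access_key, secret_key, password_file):
    """Return a copy of the current environment with the variables restic
    reads for S3 credentials and the repository password file."""
    env = dict(os.environ)
    env["AWS_ACCESS_KEY_ID"] = access_key
    env["AWS_SECRET_ACCESS_KEY"] = secret_key
    env["RESTIC_PASSWORD_FILE"] = password_file
    return env


if __name__ == "__main__":
    # All values below are hypothetical examples.
    cmd = restic_backup_command(
        "https://s3.example.edu", "lab-archive", "/data/project_a", "project_a"
    )
    env = restic_environment("EXAMPLE_KEY", "EXAMPLE_SECRET", "/secure/restic.pw")
    print(" ".join(cmd))
    # To actually run the backup (requires restic installed and an
    # initialized repository):
    #   import subprocess
    #   subprocess.run(cmd, env=env, check=True)
```

Each run of `restic backup` records a new snapshot, which is how the history of changes is preserved; unchanged data is deduplicated rather than uploaded again.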
Big Savings, Smart Planning
By switching to the Data Attic, Mark dramatically reduced his current and future storage costs. Storing 200 TB in Data Commons would cost around $13,000 per year. By migrating archive data away from valuable computational storage to the Data Attic and using Restic’s compression (estimated at 30%), he now pays about $1,260 annually for 36 TB, with a long-term projection of $4,760 per year for 136 TB. Compared with the Data Commons cost, that is a projected long-term savings of 63.4%.
Every faculty member is allocated 2 TB of free Ceph S3 storage, and Ph.D. students receive 500 GB of free Ceph storage. If faculty need additional storage, the cost is $35/TB annually.
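The figures above can be checked with a few lines of arithmetic. The sketch below takes the billable volumes (36 TB and 136 TB), the $35/TB rate, and the roughly $13,000 Data Commons estimate directly from the article; the function and constant names are illustrative only, and the article does not spell out how the 136 TB projection was derived from the original 200 TB.

```python
RATE_PER_TB = 35.0              # Data Attic rate in dollars per TB per year
DATA_COMMONS_ANNUAL = 13_000.0  # approximate yearly cost of 200 TB in Data Commons


def annual_cost(billable_tb, rate=RATE_PER_TB):
    """Yearly cost in dollars for a given number of billable terabytes."""
    return billable_tb * rate


def savings_percent(old_cost, new_cost):
    """Percentage saved when moving from old_cost to new_cost."""
    return (old_cost - new_cost) / old_cost * 100


if __name__ == "__main__":
    current = annual_cost(36)     # current archive volume
    projected = annual_cost(136)  # long-term projection
    savings = savings_percent(DATA_COMMONS_ANNUAL, projected)
    print(f"Current annual cost:   ${current:,.0f}")
    print(f"Projected annual cost: ${projected:,.0f}")
    print(f"Projected savings:     {savings:.1f}%")
```

With these inputs the current cost comes to $1,260, the projection to $4,760, and the savings to 63.4%, matching the numbers reported in the article.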
“The Duke Data Attic is the most affordable option that we have access to now,” Mark said. “It allows us to easily retrieve data if needed, meaning that the lab members are not hesitant to designate datasets for archiving, which was a previous barrier to usage.”
The team also developed a plan for what to do when grants end or faculty retire. Data is now funded through discretionary codes, and sunset policies help manage retention. This ensures that valuable research data isn’t lost—and that costs stay manageable.
Redefining the Path Forward for Other Researchers
Mark’s experience is now being used as a model for other Duke researchers with large data needs. His collaboration with the CDSA team helped uncover technical challenges, improve tools, lower costs, and shape best practices for data management.
“Using the Duke Data Attic means that we do not have to try to grow our live storage beyond the current one petabyte ceiling we currently have, which helps us avoid spending so much of our grant budgets on data storage needs,” Mark said.
For more information about the Data Attic, contact cdsa_research@duke.edu or join the CDSA mailing list to stay connected.