Is your “cold data” burning a hole in your pocket and the environment?
Q: Christof, the term “cold data” is used to describe data that is infrequently accessed. Typically, what percentage of an organization’s data falls into this category?
A: There’s no single answer to how much of a company’s data is “cold”, as it varies significantly across industries and individual organizations. However, here are some insights and factors to consider:
General Estimates:
- Studies suggest that around 60%-80% of enterprise data can be classified as cold (infrequently accessed).
- This percentage tends to grow over time, as more data is generated but not actively utilized.
Factors Affecting Cold Data Percentage:
- Industry:
  - Industries with heavy regulations (financial, healthcare, legal) often need to retain more data for compliance, increasing their “cold storage” pool.
  - Companies facing rapid data growth (media, research) also tend to have higher amounts of inactive data.
- Data Management Practices:
  - Organizations with strict data retention policies and active data lifecycle management will have a smaller proportion of truly cold data.
  - Companies without established practices tend to accumulate more unclassified and unused data over time.
- Business Needs:
  - Certain data types (historical sales figures, past project files) might become irrelevant for daily operations but still hold value for analytics or reference later on.
Q: Clearly, cold data makes up a high percentage of total data for many organizations. Can you explain why analyzing cold data is important?
A: Companies should invest the effort to analyze their cold data for several reasons:
- Cost Savings: Identifying cold data can help you implement tiered storage solutions – moving it to cheaper, long-term storage options like tape libraries, reducing active storage costs.
- Security: Reducing the active data footprint shrinks the attack surface exposed to cyber threats.
- Efficiency: Archiving cold data keeps core systems uncluttered, improving performance for frequently used information.
Tips for Estimation:
- Data Classification Tools: Software can analyze data access patterns and categorize it based on frequency.
- Consult Your Internal Experts: Interview IT personnel, departmental heads, and data analysts to identify what data is actively used and what could be archived.
Remember, the focus should be on identifying truly inactive data, not just arbitrarily aiming for a percentage.
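As a rough illustration of what data classification tools do, the sketch below walks a directory tree and totals the bytes whose last-access time is older than a threshold. The function name `cold_data_summary` and the 180-day cut-off are hypothetical choices, not taken from any particular product. Access times are only a first approximation: many systems update them lazily (`relatime`) or not at all (`noatime`), and backup or antivirus scans can touch them, so real tools usually combine several signals.

```python
import os
import time

SECONDS_PER_DAY = 86_400


def cold_data_summary(root, cold_after_days=180, now=None):
    """Walk a directory tree and measure how much data looks "cold".

    A file counts as cold when its last-access time (st_atime) is older
    than cold_after_days. Returns (cold_bytes, total_bytes, cold_paths).
    """
    now = time.time() if now is None else now
    cutoff = now - cold_after_days * SECONDS_PER_DAY
    cold_bytes = 0
    total_bytes = 0
    cold_paths = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                st = os.stat(path)
            except OSError:
                continue  # file vanished or is unreadable; skip it
            total_bytes += st.st_size
            if st.st_atime < cutoff:
                cold_bytes += st.st_size
                cold_paths.append(path)
    return cold_bytes, total_bytes, cold_paths
```

For example, calling `cold, total, _ = cold_data_summary("/data/projects")` on a (hypothetical) share and then computing `cold / total` gives the fraction of that share's bytes that has not been read within the threshold.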
Q: If we focus just on the cost implications, can you estimate the annual cost of storing 1TB of data on an HDD?
A: The running cost is driven largely by energy consumption, and unfortunately there’s no single definitive answer to how much energy it takes to store 1TB on HDDs in a data center for one year. Here’s why, and how we can estimate it:
Factors influencing energy consumption:
- HDD Type: Energy usage varies between consumer-grade drives, enterprise HDDs, and the specific model.
- Idle vs. Active Power: HDDs consume less power when idle (not actively reading/writing data). Data access frequency impacts overall consumption.
- Data Center Efficiency: Power Usage Effectiveness (PUE) measures a data center’s energy efficiency. A lower PUE means less overhead energy used for cooling, etc.
- Power Source: The data center’s electricity source (renewable vs. fossil fuels) doesn’t change how much energy the HDD consumes, but it does affect the environmental footprint.
Making an Estimate:
Let’s use some assumptions to get a ballpark figure:
- HDD: A typical enterprise-grade HDD might consume 5-10 watts when active, and 1-3 watts when idle.
- Usage Pattern: Let’s assume a moderate amount of activity, averaging around 5 watts.
- Power Usage Effectiveness (PUE): We’ll use a PUE of 1.5 (reasonably efficient data center). This means for every watt the HDD uses, an additional 0.5 watts is used for overhead.
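The “moderate activity” average can be read as a time-weighted mix of active and idle draw. A minimal sketch, assuming a hypothetical drive that draws 8 W when active and 2 W when idle and is busy half the time (figures picked from within the ranges above); the function name is illustrative:

```python
def average_power_watts(active_w, idle_w, active_fraction):
    """Time-weighted average power draw of a drive.

    active_fraction is the share of time spent reading/writing (0.0 to 1.0);
    the rest of the time the drive is assumed to sit at its idle draw.
    """
    if not 0.0 <= active_fraction <= 1.0:
        raise ValueError("active_fraction must be between 0 and 1")
    return active_w * active_fraction + idle_w * (1.0 - active_fraction)


# Hypothetical drive: 8 W active, 2 W idle, busy half the time.
print(average_power_watts(8.0, 2.0, 0.5))  # 5.0
```

Lowering `active_fraction` moves the average toward the idle figure, which is the situation a drive holding mostly cold data is in.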
Calculations:
- Daily Consumption: 5 watts * 24 hours/day = 120 Wh (watt-hours) per day.
- With PUE: 120 Wh * 1.5 = 180 Wh per day.
- Yearly Consumption: 180 Wh/day * 365 days/year = 65,700 Wh, or roughly 65.7 kWh (kilowatt-hours) per year.
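The calculation above can be reproduced as a small function, which makes it easy to try other average-power or PUE values. It assumes a constant average draw around the clock; the function name is illustrative.

```python
HOURS_PER_DAY = 24
DAYS_PER_YEAR = 365


def yearly_energy_kwh(avg_watts, pue=1.0):
    """Annual energy in kWh for a device with a constant average draw,
    scaled by the facility PUE to include cooling and other overhead."""
    daily_wh = avg_watts * HOURS_PER_DAY           # W x h = Wh per day
    return daily_wh * pue * DAYS_PER_YEAR / 1000.0  # Wh -> kWh


print(yearly_energy_kwh(5.0, pue=1.5))  # 65.7
```

With `pue=1.0` the same function gives the drive's own consumption before facility overhead, 43.8 kWh per year.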
Important Notes:
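- This is a rough, per-drive estimate. Actual consumption can be higher or lower depending on the factors mentioned earlier, and because a single enterprise HDD typically holds many terabytes, the share attributable to 1TB is correspondingly smaller (while RAID or replication increases it).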
- Energy cost depends on your electricity provider’s rates.
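To turn the energy figure into money, multiply it by your tariff. Because the estimate is for one drive, dividing by the drive’s capacity gives the share attributable to 1TB. A sketch, assuming a hypothetical tariff of 0.20 per kWh and a hypothetical 20TB drive; it ignores RAID or replication overhead, and the function names are illustrative:

```python
def annual_energy_cost(kwh_per_year, price_per_kwh):
    """Yearly electricity cost, in whatever currency the tariff uses."""
    return kwh_per_year * price_per_kwh


def cost_per_tb(drive_cost, drive_capacity_tb, fill_fraction=1.0):
    """Share of one drive's yearly cost attributable to 1TB of stored data.

    fill_fraction < 1.0 models a drive that is only partly full, which
    raises the cost carried by each stored terabyte.
    """
    return drive_cost / (drive_capacity_tb * fill_fraction)


drive_kwh = 65.7    # per-drive estimate, including PUE 1.5
tariff = 0.20       # hypothetical price per kWh
capacity_tb = 20    # hypothetical drive capacity

drive_cost = annual_energy_cost(drive_kwh, tariff)
print(f"per drive: {drive_cost:.2f}")                          # per drive: 13.14
print(f"per TB:    {cost_per_tb(drive_cost, capacity_tb):.3f}")  # per TB:    0.657
```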
Useful Resources:
- Backblaze regularly publishes hard drive data and statistics from its data centers: https://www.backblaze.com/b2/hard-drive-test-data.html
- PUE is a ratio that describes how efficiently a data center uses power. Learn more about PUE here: https://www.thegreengrid.org/
Q: How would the costs compare for storing that same 1TB of data on a tape library?
A: Here’s how storing 1TB on tape compares to HDDs in a data center, focusing on energy consumption:
Tape Library Advantages
- Extremely Low Idle Power: Tape libraries consume minimal power when not actively accessing data. A tape cartridge at rest uses virtually no energy.
- Reduced Cooling Needs: Due to their low power draw, tape libraries generate less heat, significantly reducing the energy burden of a data center’s cooling systems.
- Offline Nature: Tape offers inherent offline (air-gapped) protection, mitigating some cybersecurity risks and reducing the energy spent on always-on security measures.
Rough Estimate for Tape:
It’s difficult to give a precise energy figure for tape, as it depends heavily on how often the tapes are accessed. However, in a primarily archival scenario, let’s assume:
- Monthly Tape Access: Tapes are read/written once per month for backups or verification.
- Active Power Draw: The library and drive consume ~200 watts for about 3 hours during this activity.
Calculations:
- Monthly Consumption: (200 watts * 3 hours) = 600 watt-hours, or 0.6 kWh per month
- Yearly Consumption: (0.6 kWh/month * 12 months) = 7.2 kWh per year.
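The tape arithmetic can be sketched the same way. The function below is illustrative: `standby_watts` (the library’s draw between sessions) and `pue` are optional parameters that the simple estimate above leaves out, so they default to values that reproduce it.

```python
HOURS_PER_YEAR = 365 * 24


def tape_yearly_kwh(active_watts, hours_per_session, sessions_per_year,
                    standby_watts=0.0, pue=1.0):
    """Annual tape-library energy in kWh: active sessions plus optional
    standby draw for the rest of the year, scaled by the facility PUE."""
    active_hours = hours_per_session * sessions_per_year
    active_wh = active_watts * active_hours
    standby_wh = standby_watts * max(0, HOURS_PER_YEAR - active_hours)
    return (active_wh + standby_wh) * pue / 1000.0


print(tape_yearly_kwh(200, 3, 12))           # 7.2
print(tape_yearly_kwh(200, 3, 12, pue=1.5))  # 10.8
```

Applying the same PUE of 1.5 used for the HDD figure gives 10.8 kWh per year; entering a measured `standby_watts` for a specific library shows how much its idle draw adds.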
Comparison
- HDD Estimate: ~65.7 kWh/year
- Tape Estimate: ~7.2 kWh/year
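The comparison comes down to a difference and a ratio. A short sketch using the two estimates above; because one figure is for an always-on drive and the other for a library used a few hours a month, the ratio is indicative rather than exact. The function name is illustrative.

```python
def compare_energy(hdd_kwh, tape_kwh):
    """Return (kWh saved per year by using tape, HDD-to-tape energy ratio)."""
    saved = hdd_kwh - tape_kwh
    ratio = hdd_kwh / tape_kwh if tape_kwh > 0 else float("inf")
    return saved, ratio


saved, ratio = compare_energy(65.7, 7.2)
print(f"{saved:.1f} kWh/year saved; HDD uses {ratio:.1f}x the energy of tape")
# 58.5 kWh/year saved; HDD uses 9.1x the energy of tape
```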
Key Takeaways:
- Tape libraries have the potential to massively reduce energy consumption compared to always-on HDD storage, especially for infrequently accessed data.
- This energy benefit translates directly into cost savings and a lower environmental footprint.
Important Notes:
- If data on tape needs frequent access, the power savings diminish.
- This estimate excludes the library’s standby power and data center overhead; applying the same PUE of 1.5 used for the HDD figure gives about 10.8 kWh per year.
- Tape libraries have upfront costs, so a total cost of ownership (TCO) comparison with HDDs needs to be made over a longer period. However, through the Cristie READY program, customers can gain all the benefits of tape archive without upfront costs, thanks to our true OPEX “pay-per-use” model.
Q: Thanks Christof. Clearly tape offers vast energy savings plus added security benefits for cold data. Beyond that, Cristie offers tape backup and archive from our data centers, which are located in wind turbines and powered directly by renewable energy at source. Surely that provides an incredible double win for companies looking for secure backup and archive while massively reducing their carbon footprint?
A: Absolutely! The benefits of our wind farm data centers and the Cristie READY program are far-reaching. Let’s cover them in more detail in our next Q&A session.