Dark Data: What is it and Why Should I Care?

Posted by Ben Austin on Dec 8, 2014 5:15:00 PM

In today's business world, organizations are amassing and storing more data than ever before. We truly are in the age of Big Data, which often presents as many challenges for growing companies as it does benefits.

Most of the data being gathered by your organization is going to be used to improve something about the way you do business. Whether it's information about how your users are utilizing your product, results gathered from your marketing efforts, or internal statistics about your development processes, your company's constantly growing data is a major asset that, with the correct analysis, can increase your bottom line.

But along with that valuable data, your company is almost certainly also storing an increasing amount of data that has no real tactical value at all. Gartner has dubbed this unmanaged information "Dark Data." Sure, it sounds a bit dramatic, but realistically, storing an ever-growing amount of this unstructured information is a costly and potentially risky endeavor that some believe could become a major speed bump along the Big Data highway.

Let's take a look at what dark data actually is, how it could impact your organization, and what steps you can take to manage it.

What is Dark Data?

As with many buzz terms that float around the Web, the exact definition of "Dark Data" can be hard to nail down. According to Gartner, which originally coined the term, dark data is defined as, "the information assets organizations collect, process and store during regular business activities, but generally fail to use for other purposes."

By this definition, much - if not most - of the information your organization stores could be referred to as dark data. This is because, as useful as data can be, the majority of the information we tend to hold on to is simply collateral: we keep it in case we ever need to prove that something occurred in the past, but it is almost entirely obsolete for any other use.

Specific examples of dark data will vary widely from company to company, but any of the following could fall under this fairly broad term if they are outdated or unstructured:

  • Customer Information
  • Log Files
  • Account Information
  • Previous Employee Data
  • Financial Statements
  • Raw Survey Data
  • Email Correspondences
  • Notes or Presentations
  • Old Versions of Relevant Documents

What's the Problem with Dark Data?

There are many issues associated with dark data that can become more prevalent as time goes by. If you think of dark data as the clutter that is amassed inside the house of a hoarder, the first problem becomes obvious: Space. As that unorganized data continues to grow, it takes up storage that could otherwise be used for your valuable assets. More storage means more overhead costs, which - particularly in the era of Big Data - are already a significant concern in most organizations.

Aside from increased storage costs, having large amounts of unstructured or unorganized data can potentially lead to serious security risks. Along with outdated and seemingly useless documents, dark data will likely also contain sensitive, proprietary information. If you haven't seen the news, data breaches - like the one that just rocked Sony Pictures - are becoming more and more prevalent each week. Just because employees at your organization don't want to take the time to go through piles of old information doesn't mean that hackers aren't willing to mine that data for years-old embarrassments that your company had hiding in the basement.

On the other end of the spectrum, your organization may also be missing out on some great opportunities by allowing dark data to steadily build up in your database. Along with extremely sensitive information that could be potentially harmful in the case of a breach, there's likely going to be a lot of untapped potential inside that mass of information. As with the hoarder and their overabundance of useless stuff, it's difficult for your company to find the information that could be truly valuable amid a giant mass of unstructured legacy data.

How Can Dark Data be Managed?

While you'll likely never be able to completely rid yourself of legacy data, that's not necessarily a bad thing. Your goal shouldn't be to toss out any information you're not currently using. Rather, it should be to have a process in place that allows you to manage and organize your legacy data in order to keep the risks and costs associated with dark data within reasonable limits.

Audit and Prune your Database

Do regular audits of your entire database and make sure you have a process for getting rid of old, unneeded data. Nail that down as early as possible, and stick to it moving forward. This won't necessarily make up for the lack of organization of your previous information, but it will slow the build-up of new dark data, which will be helpful in the future.
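As a rough illustration of what a first audit pass might look like, here is a minimal Python sketch. It assumes your legacy data lives as files under a directory tree, which may not match your setup at all, and the function name `find_stale_files` and the age threshold are hypothetical choices made for this example. It only reports candidates for review; it never deletes anything.

```python
import os
import time
from pathlib import Path


def find_stale_files(root, max_age_days, now=None):
    """Return (path, size_bytes, age_days) for files not modified in max_age_days.

    Hypothetical helper: it only reports candidates and never deletes anything.
    """
    now = time.time() if now is None else now
    cutoff = now - max_age_days * 86400  # 86400 seconds per day
    stale = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = Path(dirpath) / name
            try:
                info = path.stat()
            except OSError:
                continue  # file vanished or is unreadable; skip it
            if info.st_mtime < cutoff:
                age_days = int((now - info.st_mtime) // 86400)
                stale.append((str(path), info.st_size, age_days))
    # Largest files first, since they cost the most to keep around.
    stale.sort(key=lambda row: row[1], reverse=True)
    return stale
```

Calling something like `find_stale_files("/path/to/shared/drive", 365)` on a schedule would give you a recurring list of the biggest untouched files to review. Last-modified time is only a proxy for "unused," so treat the output as a starting point for a human decision rather than a delete list.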

Part of that process should include the pruning of old data. This isn't necessarily data dumping, but it's a bit more than mining for hidden gems. The goal here is simply to provide more structure to your legacy data over time so you can easily decipher what is necessary to hang on to and for how long.
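One way to picture "what to hang on to and for how long" is a simple retention table. The sketch below is only an assumption about how such a policy could be expressed: the categories, the retention periods, and the function name `retention_action` are all made up for illustration, and real retention periods depend on your own legal and business requirements.

```python
from datetime import date

# Hypothetical retention periods in days. These numbers are placeholders,
# not recommendations.
RETENTION_DAYS = {
    "log_file": 90,
    "raw_survey_data": 365,
    "financial_statement": 7 * 365,
}
DEFAULT_RETENTION_DAYS = 365  # fallback for categories not listed above


def retention_action(category, last_used, today, archive_after_days=30):
    """Classify a record as 'keep', 'archive', or 'review_for_deletion'."""
    age_days = (today - last_used).days
    limit = RETENTION_DAYS.get(category, DEFAULT_RETENTION_DAYS)
    if age_days > limit:
        return "review_for_deletion"  # past its retention period
    if age_days > archive_after_days:
        return "archive"  # still retained, but moved to an organized archive
    return "keep"


today = date(2014, 12, 8)
print(retention_action("log_file", date(2014, 1, 1), today))
print(retention_action("financial_statement", date(2014, 1, 1), today))
print(retention_action("log_file", date(2014, 12, 1), today))
```

The point of the middle "archive" state is that old data gets a structured home before anyone has to decide whether it can go.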

Anytime you can find a new use for old data is a big win - it's like finding $5 in that pair of pants you hadn't worn in a month. That's why, rather than dumping old data, I'd recommend simply finding a manageable format for it. That way, when (or if) you ever actually need that information, you'll have exactly what you need at arm's length.

Find a Suitable Way to Back Up Your Data

"But if we're not getting rid of data, how are we saving storage costs?"

Good question. The answer here comes less from what data you're storing and more from how you're storing it. If your backup and disaster recovery plan involves taking traditional, full backups of your database in order to maintain daily or weekly restore points, then you are only making the storage problem worse. This means that you're constantly duplicating and storing all of that useless, unorganized information over and over again.

More modern backup solutions allow you to take a single snapshot (or initial replica) of your database, and then make incremental or differential backups from that point forward. This means that you're only copying over that dark data one time, and recycling it for each restore point. This may not solve the security issue, but it will certainly allow you to cut back on those painful storage costs.
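To make the incremental idea concrete, here is a minimal, file-level Python sketch. It is not how any particular backup product works: the function names, the `manifest.json` file, and the one-folder-per-restore-point layout are assumptions made for this example. The first run copies everything (the initial replica), and each later run copies only files whose contents have changed since the previous run. The backup folder is assumed to live outside the source folder.

```python
import hashlib
import json
import shutil
from pathlib import Path


def file_digest(path):
    """SHA-256 of a file's contents, read in 1 MiB chunks."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            h.update(chunk)
    return h.hexdigest()


def incremental_backup(source, backup_root, label):
    """Copy only files that are new or changed since the previous run.

    Returns the relative paths that were copied into backup_root/label.
    """
    source = Path(source)
    backup_root = Path(backup_root)
    manifest_path = backup_root / "manifest.json"
    # The manifest remembers the content hash of every file seen last time.
    previous = json.loads(manifest_path.read_text()) if manifest_path.exists() else {}
    current = {}
    copied = []
    for path in sorted(source.rglob("*")):
        if not path.is_file():
            continue
        rel = path.relative_to(source).as_posix()
        digest = file_digest(path)
        current[rel] = digest
        if previous.get(rel) != digest:  # new file or changed contents
            dest = backup_root / label / rel
            dest.parent.mkdir(parents=True, exist_ok=True)
            shutil.copy2(path, dest)
            copied.append(rel)
    backup_root.mkdir(parents=True, exist_ok=True)
    manifest_path.write_text(json.dumps(current, indent=2))
    return copied
```

Unchanged dark data is copied exactly once, on the first run, no matter how many restore points follow. This sketch leaves out a lot that a real tool handles, such as tracking deletions and reassembling a full restore from a chain of increments, so treat it as an illustration of the storage savings rather than a backup strategy.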

Store in an Encrypted Form

This should go without saying, but encrypting any and all of your assets - including dark or legacy data - should provide your company with peace of mind and will save a lot of headaches if you are on the wrong side of a breach.

But it's not only important to encrypt your data while it's sitting on your own in-house server; it's also crucial that strong encryption is used while it's being stored offsite or in the cloud, as well as anytime it's traveling across your network.

If you set up and stick to a data audit and management process, back up your servers using modern techniques, and encrypt your information as thoroughly as possible, you should be able to quell the majority of risks and costs that are typically associated with dark data.

What other specific types of dark data have you come across at your organization, and what ways have you found to sufficiently manage its cost and risk? Let us know by posting in the comments section.
