Computing Deltas in Server Data

Posted by Ben Austin on Jan 20, 2015 6:24:00 AM


The following is an excerpt from our recently published eBook, The Big Book of Backup. If you'd like to download the full eBook, click here.

All backup strategies must have a method to determine which data to include and exclude. To find the differences between iterations, or deltas, the backup system uses either file-level or block-level backup technology. Each method carries its own benefits and limitations.

Check out the information below, which outlines the three main technologies used to calculate deltas in server data, and consider which option would best fit the needs of your organization.

File-Level: File Attributes

For file-based backup technology, the system checks each file’s attributes, such as file name and type, date created or date last modified. The last modified date is often the most important attribute for backup systems, as a new date indicates the file has changed and the most recent version should be included in the backup iteration.

Although straightforward and easy to implement, this approach shares the disadvantages of file-based backup in general: the system must examine the attributes of every file on every pass, which is time-consuming.

Even a minor change forces the entire file to be backed up. Additionally, attribute data is not always reliable, because users and other programs can set file attributes manually.

Block-Level: Checksums

A checksum is a value computed over a block of data by one of many algorithms; for practical purposes, it identifies the block on the basis of its content. By comparing each block's checksum with the one recorded in the previous iteration, the system accurately identifies changed blocks, and only those blocks are backed up.

A backup solution that relies on checksums has a clear advantage over file-based systems: when a large file changes, only the modified blocks need to be transferred rather than the entire file.

However, the system still has to read every data block, compute its checksum and compare the result with the previous one, a process that consumes considerable time, bandwidth and computing resources.
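A minimal sketch of block-level delta detection might look like the following (the 4 KB block size and the use of SHA-256 are illustrative choices; real backup products use various block sizes and checksum algorithms):

```python
import hashlib

BLOCK_SIZE = 4096  # illustrative block size; real systems vary

def block_checksums(data, block_size=BLOCK_SIZE):
    """Compute one checksum per fixed-size block of the input data."""
    return [hashlib.sha256(data[i:i + block_size]).hexdigest()
            for i in range(0, len(data), block_size)]

def changed_blocks(old_sums, new_sums):
    """Return indices of blocks whose checksums differ from the previous
    iteration. Blocks beyond the old length are treated as new."""
    changed = []
    for i, s in enumerate(new_sums):
        if i >= len(old_sums) or old_sums[i] != s:
            changed.append(i)
    return changed
```

For example, if only the second 4 KB block of a file changes between runs, `changed_blocks` flags just that one block for backup. The cost the paragraph above describes is visible here too: every block must still be read and hashed, even though most of them turn out to be unchanged.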

Block-Level: Continuous Data Protection

Continuous Data Protection (CDP) is an efficient strategy for recording block deltas. A relatively small software agent installed on each computer or server monitors disk activity and tracks data blocks as they are updated. The recorded information is held in a small amount of memory, and the backup system queries the agent rather than directly examining each individual block on the disk.

Because CDP drastically reduces computing overhead, network bandwidth and disk read operations, the backup system can query the block information much more frequently, providing almost real-time monitoring and backup.

Additionally, disks need only be fully scanned once upon initial installation or following a computer crash or reboot.
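Conceptually, the agent maintains a change journal that the backup system drains on each cycle. The toy class below sketches that idea under the assumptions stated in its docstring; it is not a real CDP driver, which would hook disk writes at the operating-system level:

```python
class ChangeJournal:
    """Toy sketch of a CDP-style change journal (hypothetical design).

    A driver or agent hooks write operations and records the indices of
    modified blocks; the backup system queries the journal instead of
    scanning the disk block by block.
    """

    def __init__(self):
        self._dirty = set()  # block indices written since the last backup

    def on_write(self, block_index):
        # Called by the (hypothetical) write hook for every block update.
        self._dirty.add(block_index)

    def drain(self):
        # The backup system collects and resets the dirty set each cycle,
        # so only blocks changed since the previous query are returned.
        changed, self._dirty = sorted(self._dirty), set()
        return changed
```

Because querying the journal is cheap compared with hashing every block, the backup system can poll it frequently, which is what enables the near real-time behavior described above.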

Which of these methods do you currently use at your organization? Hoping to learn a bit more about how backup solutions actually work? Click the button below to download the entire eBook, The Big Book of Backup.
