Kirth A. Prawl
Prof. Eli Weissman
Devry College of New York
October 10, 2012
Deduplication technology is a way to streamline backup and storage of data by making sure that redundant records are filtered out before backup. In some cases, deduplication can eliminate as much as 95% of the data traditionally backed up. If that figure seems high, consider corporate documents that are distributed to hundreds or even thousands of employees. Each employee may hold an identical copy of electronic files such as procedure manuals.
When all workstations are backed up without considering duplicate records, the backup itself will contain a great deal of wasted space. By managing the data more efficiently and identifying files that are identical, deduplication technology can produce a much smaller backup. That backup is more manageable in terms of time and resources, and it consumes less physical media such as disk or tape space.
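The simplest form of the idea above is file-level deduplication (sometimes called single-instance storage). Each file is fingerprinted by hashing its entire contents, and only one copy per distinct fingerprint is kept. The following is a minimal sketch under stated assumptions: files are held in memory as a path-to-bytes mapping, SHA-256 is used as the fingerprint, and the function name `plan_file_level_backup` and the workstation paths are hypothetical, chosen only for illustration.

```python
import hashlib


def file_digest(data: bytes) -> str:
    """Return a SHA-256 fingerprint of a file's entire contents."""
    return hashlib.sha256(data).hexdigest()


def plan_file_level_backup(files: dict[str, bytes]) -> tuple[dict[str, bytes], dict[str, str]]:
    """Split a set of files into unique contents and a path-to-digest catalog.

    `files` maps a path (for example, on a workstation) to that file's bytes.
    Returns (unique, catalog):
      unique  maps each distinct digest to a single stored copy of the contents;
      catalog maps every original path to the digest of its contents, so any
              file can be restored from its single stored copy.
    """
    unique: dict[str, bytes] = {}
    catalog: dict[str, str] = {}
    for path, data in files.items():
        digest = file_digest(data)
        unique.setdefault(digest, data)  # keep only the first copy seen
        catalog[path] = digest           # every path still gets a reference
    return unique, catalog


if __name__ == "__main__":
    # Hypothetical example: three workstations hold the same manual.
    manual = b"Procedures manual v3 ..." * 1000
    files = {
        "ws01/manual.pdf": manual,
        "ws02/manual.pdf": manual,
        "ws03/manual.pdf": manual,
        "ws03/notes.txt": b"local notes",
    }
    unique, catalog = plan_file_level_backup(files)
    logical = sum(len(d) for d in files.values())
    physical = sum(len(d) for d in unique.values())
    print(f"{len(files)} files, {len(unique)} unique; {logical} bytes -> {physical} bytes")
```

Only whole, byte-identical files are collapsed here. A manual that differs by a single byte on one machine would be stored twice, which is why many systems deduplicate at the segment level instead.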
Deduplication technology can be implemented at the network level and consist only of local operations, or it can be integrated into cloud-based technologies that take advantage of economies of scale through remote and managed services.
Deduplicating Data
Eliminating redundant data can significantly shrink storage requirements and improve bandwidth efficiency. Because primary storage has become cheaper over time, enterprises typically store many versions of the same information so that new work can reuse old work. Some operations, such as backup, store highly redundant information. Deduplication lowers storage costs because fewer disks are needed, and it shortens backup and recovery times because far less data may need to be transferred. For backup and other nearline data, it is reasonable to assume that a great deal of the data is duplicated. The same data is stored over and over again, consuming unnecessary disk or tape space, electricity to power and cool the drives, and bandwidth for replication. The result is a chain of cost and resource inefficiencies within the organization.
Deduplication works by segmenting the incoming data stream, uniquely identifying each data segment (typically by computing a fingerprint such as a cryptographic hash), and comparing the segments with previously stored data. If an incoming segment duplicates one that has already been stored, it is not stored again; instead, a reference to the existing copy is created. If the segment is unique, it is stored on disk.
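The steps above can be sketched in a few lines. This sketch assumes three things. First, it splits the stream into fixed-size segments; commercial products often use variable-size, content-defined chunking instead. Second, it uses SHA-256 as the fingerprint. Third, an in-memory dictionary stands in for the disk store. The class `ChunkStore` and its methods are hypothetical names used only for illustration. Each stored stream is represented by a "recipe", meaning a list of fingerprints that reference the stored segments.

```python
import hashlib


class ChunkStore:
    """In-memory sketch of a segment-level deduplicating store.

    Assumptions: fixed-size segments and SHA-256 fingerprints; the class and
    method names are illustrative, not taken from any product.
    """

    def __init__(self, chunk_size: int = 4096) -> None:
        self.chunk_size = chunk_size
        self.chunks: dict[str, bytes] = {}  # fingerprint -> unique segment

    def write(self, stream: bytes) -> list[str]:
        """Store a data stream and return its recipe (list of fingerprints)."""
        recipe = []
        for start in range(0, len(stream), self.chunk_size):
            segment = stream[start:start + self.chunk_size]  # 1. segment the stream
            fp = hashlib.sha256(segment).hexdigest()          # 2. identify the segment
            if fp not in self.chunks:                         # 3. compare with stored data
                self.chunks[fp] = segment                     # unique: store it once
            recipe.append(fp)                                 # always keep a reference
        return recipe

    def read(self, recipe: list[str]) -> bytes:
        """Rebuild the original stream by following its references."""
        return b"".join(self.chunks[fp] for fp in recipe)

    def stored_bytes(self) -> int:
        """Physical bytes actually held by the store."""
        return sum(len(c) for c in self.chunks.values())


if __name__ == "__main__":
    store = ChunkStore()
    # Hypothetical volume made of four distinct 4 KiB segments.
    week1 = b"".join(bytes([i]) * 4096 for i in range(4))
    # A week later only the last segment has changed.
    week2 = week1[:-4096] + b"\xff" * 4096
    r1 = store.write(week1)
    r2 = store.write(week2)
    assert store.read(r1) == week1 and store.read(r2) == week2
    logical = len(week1) + len(week2)
    print(f"logical {logical} bytes, stored {store.stored_bytes()} bytes")
```

Fixed-size segmentation keeps the sketch short, but it only detects duplicates that line up on segment boundaries. Inserting a single byte near the start of a file shifts every later segment so that none of them match. Content-defined chunking avoids this problem by placing segment boundaries according to the data itself.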
For example, a file or volume that is backed up every week creates a significant amount of duplicate data. Deduplication algorithms analyze the data and can store only the compressed, unique changed elements of that file. With typical backup retention policies on ordinary enterprise data, this process can reduce storage capacity requirements by an average of 10 to 30 times or more. In other words, many companies can store 10 TB to 30 TB of backup data on 1 TB of physical disk capacity, which has substantial economic benefits.
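Where a ratio like "10 to 30 times" comes from can be seen with some simple arithmetic. Suppose a volume of size V receives a full backup every week for n weeks, and a fraction c of the volume changes between backups. The logical data backed up is nV. A deduplicating store keeps V once and then only about cV of new data each week, so the ratio is n / (1 + (n - 1)c). For instance, 26 weekly fulls with a 2% weekly change rate give 26 / 1.5, or roughly 17:1. This model is a simplifying assumption, not a vendor formula. It ignores duplicates within a single backup and across machines, both of which push real ratios higher. The optional `compression` parameter and the function name `dedup_ratio` are hypothetical.

```python
def dedup_ratio(weeks: int, change_rate: float, compression: float = 1.0) -> float:
    """Estimated logical-to-physical ratio for weekly full backups.

    Simplified model (an assumption, not a vendor formula):
      * each weekly full backup copies the whole volume (size V);
      * between backups a fraction `change_rate` of the volume changes;
      * deduplication stores V once, then only the changed data each week;
      * the unique data is further compressed by the factor `compression`.
    """
    if weeks < 1 or not 0.0 <= change_rate <= 1.0 or compression <= 0:
        raise ValueError("invalid parameters")
    logical = weeks                                          # in units of V
    physical = (1 + (weeks - 1) * change_rate) / compression  # in units of V
    return logical / physical


if __name__ == "__main__":
    # Hypothetical retention policies; the numbers are illustrative only.
    for weeks, rate in [(12, 0.05), (26, 0.02), (52, 0.01)]:
        print(f"{weeks} weeks at {rate:.0%} weekly change: {dedup_ratio(weeks, rate):.1f}:1")
```

Multiplying the physical capacity by the ratio gives the logical capacity it can hold. This is the sense in which 1 TB of disk can hold 10 TB to 30 TB of backups when the ratio falls between 10:1 and 30:1.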
Virtual Tape Libraries
When data deduplication technology is implemented around a virtual tape library (VTL), the capabilities of the VTL must be considered as part of the evaluation process. The savings from data deduplication are unlikely to outweigh the difficulties caused by a substandard VTL. Consider the functionality, performance, stability, and support of the VTL as well as of its deduplication extension.
Impact of Deduplication on Backup Performance
It is important to consider where and when data deduplication takes place in relation to the backup process. Some solutions attempt deduplication inline, while data is being backed up. This method processes the backup stream as it arrives at the deduplication application, so performance depends on the strength of a single node. Such an approach can slow down backups, jeopardize backup windows and degrade VTL