
Demystifying Deduplication

Preprinted from Dell Power Solutions, Q1 2010. Copyright 2009 Dell Inc. All rights reserved.


By Joe Colucci and Kay Benaroch

Deduplication holds the promise of efficient storage and bandwidth utilization, accelerated backup and recovery, reduced costs, and more. Understanding how this technology fits into a comprehensive data management strategy and taking advantage of Dell Services consulting expertise can provide the first steps toward an optimized storage and backup environment.

Deduplication has been around for several years, but as organizations contend with rapid data growth and increasingly constrained IT budgets, this technology has risen to the top of many priority lists, holding the promise of efficient storage capacity and bandwidth utilization, accelerated backup and recovery, reduced costs, and other benefits. At the same time, however, deduplication is surrounded by confusing and even contradictory information from a myriad of sources.

By understanding the basics of deduplication and its place within an overall approach to data management, IT administrators can evaluate how it can help them address their own IT challenges. In addition to the comprehensive Dell portfolio of platforms, software, and services designed to optimize storage and backup environments (see the "Dell/EMC data protection and deduplication transform backup operations" sidebar in this article), Dell Services consulting teams can help organizations assess their current infrastructure and goals to determine whether deduplication can help meet their needs and, if so, how they can deploy it for maximum benefit.

Deduplication technologies and approaches

Deduplication is the process of eliminating duplicate copies of data and replacing them with pointers to a single copy. It typically serves two main purposes: reducing the amount of storage capacity required to store data, and reducing the network bandwidth required for performing backups or replication. The deduplication process is applied to an entire file system or storage device, which is what primarily differentiates it from compression. Key elements of deduplication include the level of granularity (file-level or block-level deduplication) and where the deduplication occurs during the backup process (at the source client or at the storage target).

Currently, the dominant application for deduplication is backup storage, because of the repetitive nature of backup data. However, deduplication has also moved into other areas such as network attached storage (NAS) and archive storage, a trend that is likely to continue as the technology reaches maturity (see Figure 1).

File-level and block-level deduplication

Some deduplication processes examine files in their entirety to determine whether they are duplicates, which is referred to as file-level deduplication or single-instance storage (SIS); others break the data into blocks and try to find duplicates among the blocks, which is referred to as block-level deduplication. Block-level deduplication typically provides more granularity and a greater reduction in the amount of utilized storage capacity compared with file-level deduplication. Some deduplication software attempts to increase this efficiency even further by varying the size of the blocks to help locate additional commonalities, an approach known as variable-block deduplication.

Block-level deduplication is typically used for backup storage, but is not typically used with NAS or archiving systems because of the performance impact of the extreme disk fragmentation that block-level deduplication causes by its nature. File-level deduplication, however, can provide significant advantages for NAS. User home directories offer an excellent use case: multiple users often store the same documents or spreadsheets in their home directories. The Microsoft Windows Storage Server 2008 platform includes a SIS deduplication feature specifically for this purpose.
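The difference in granularity between file-level and block-level deduplication can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration rather than a description of any product named in this article: the function names are hypothetical, SHA-256 is used as the content fingerprint, the block size is fixed and deliberately tiny, and the "files" are short byte strings standing in for documents in user home directories.

```python
import hashlib

BLOCK_SIZE = 4  # assumed, deliberately tiny so the result is easy to follow


def fingerprint(data):
    """Content fingerprint used to recognize duplicate data (assumed: SHA-256)."""
    return hashlib.sha256(data).hexdigest()


def file_level_store(files):
    """Single-instance storage: keep one copy per distinct file."""
    store = {}     # fingerprint -> file contents
    pointers = {}  # file name -> fingerprint of the single stored copy
    for name, data in files.items():
        key = fingerprint(data)
        store.setdefault(key, data)
        pointers[name] = key
    return store, pointers


def block_level_store(files):
    """Fixed-size block deduplication: keep one copy per distinct block."""
    store = {}     # fingerprint -> block contents
    pointers = {}  # file name -> ordered list of block fingerprints
    for name, data in files.items():
        keys = []
        for i in range(0, len(data), BLOCK_SIZE):
            block = data[i:i + BLOCK_SIZE]
            key = fingerprint(block)
            store.setdefault(key, block)
            keys.append(key)
        pointers[name] = keys
    return store, pointers


def stored_bytes(store):
    """Capacity consumed by the unique copies only."""
    return sum(len(value) for value in store.values())


def restore_blocks(name, store, pointers):
    """Rebuild a file by following its block pointers."""
    return b"".join(store[key] for key in pointers[name])


files = {
    "alice/report.doc": b"AAAABBBBCCCC",
    "bob/report.doc": b"AAAABBBBCCCC",    # exact duplicate of Alice's file
    "carol/report.doc": b"AAAABBBBDDDD",  # differs only in the last block
}

file_store, file_ptrs = file_level_store(files)
block_store, block_ptrs = block_level_store(files)

raw = sum(len(data) for data in files.values())
print("raw bytes:        ", raw)
print("file-level bytes: ", stored_bytes(file_store))
print("block-level bytes:", stored_bytes(block_store))
```

In this toy data set, file-level deduplication removes only the exact duplicate, while block-level deduplication also shares the two blocks that the third file has in common with the others. The per-file pointer lists in the block-level store hint at why restoring a file may touch many scattered blocks, which is the fragmentation concern raised above.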

The Windows Storage Server 2008 SIS feature retains a single copy of a file in a SIS common store and replaces duplicate copies with file system links, all performed transparently to end users. Microsoft states that it is not uncommon to recover 25 to 40 percent of existing disk space after a Windows Storage Server 2008 SIS file consolidation has completed. Because backup sizes are reduced for SIS-aware backup solutions, this approach can also help accelerate backup and recovery processes. The Dell NX4 storage system provides file-level deduplication with functionality similar to the Microsoft SIS feature, which can also help reduce backup sizes when using Network Data Management Protocol (NDMP).

Figure 1. Deduplication technology life cycle
Introduction: point products only; backup data only
Growth: point products and initial software integration; nearline and backup data
Maturity: ubiquitous software integration; primary, nearline, and backup data
Integration points shown: backup target appliances, backup software integration, file system integration, archive software integration, array integration, OS integration

Sidebar: Dell/EMC data protection and deduplication transform backup operations

Backup technologies are evolving in response to the pressures of growth, time, resources, and budgets. The Dell/EMC storage portfolio includes platforms, software, and services designed to optimize storage infrastructures, simplify backup processes, and reduce the demand for scarce resources, including the following:

Dell NX4: The Dell NX4 network attached storage (NAS) array incorporates a single-instance storage (SIS) methodology designed to optimize storage efficiency and reduce backup sizes by eliminating duplicate copies of files from Network Data Management Protocol (NDMP) backups.

Dell/EMC CX4 Series: Strong data protection and high performance are central features of Dell/EMC CX4 Fibre Channel and Internet SCSI (iSCSI) storage arrays, which utilize software such as the EMC SnapView, MirrorView, RecoverPoint, and NetWorker applications to help quickly and flexibly recover from data loss or corruption. EMC NetWorker provides a heterogeneous layer of management and control for backup processes and media from a single interface and is designed to bridge the gap between traditional and next-generation [...].

EMC Disk Library, Avamar, and Data Domain: Data deduplication has the power to transform backup processes by helping to significantly reduce the amount of data copied during backups and the demand for storage media, resources, and network bandwidth. Dell/EMC CX4 Series storage platforms integrated with the EMC Disk Library 4000 Series are designed to provide 4 Gbps performance and policy-based data deduplication. EMC Avamar integrated solutions incorporate global source-based deduplication to support next-generation backup requirements including remote offices, NAS, desktop and laptop backups, and virtualized environments. EMC Data Domain in-line deduplication storage systems support leading backup and archive applications, enterprise applications, and backup hardware platforms to deliver scalable storage with extended retention (including up to 56 PB of capacity for Data Domain DDX arrays) and impressive data reduction ratios. This range of EMC products offers the versatility, speed, and scalability to help ensure comprehensive data retention, replication, and [...].

Dell Services teams can help organizations assess their data, environment, and requirements to identify and deploy appropriate storage systems as part of a comprehensive Intelligent Data Management strategy. For more information, visit [...].

Sidebar image caption: Dell/EMC CX4 Series arrays offer strong data protection and high performance

Source and target deduplication

Deduplication can occur at two points in the backup process: at the source or at the target. Each type has its own advantages and disadvantages.

In source deduplication, the deduplication occurs on the host that is being backed up, which helps significantly reduce the amount of data transferred over the network during a backup. By reducing the amount of network traffic, it can also alleviate network bottleneck conditions as well as issues related to backing up over a wide area network (WAN) link. The trade-off, however, is that this approach uses processor cycles on the backup client itself, which may be undesirable for production servers. More important is that until recently, many source deduplication solutions did not integrate well (if at all) into existing backup environments.

In target deduplication, the deduplication occurs on the backup storage itself. This approach does not reduce the network traffic during the backup, but can help do so when data is replicated off-site for disaster recovery.
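The network trade-off between the two approaches can be sketched in a few lines of Python. This is a simplified model under stated assumptions, not the protocol of any product: the `BackupTarget` class and both backup functions are hypothetical, chunks are a fixed 4 bytes, SHA-256 serves as the fingerprint, and the small cost of exchanging fingerprints over the network is ignored.

```python
import hashlib

CHUNK = 4  # assumed fixed chunk size, kept tiny for readability


def chunks(data):
    """Split a backup stream into fixed-size chunks."""
    return [data[i:i + CHUNK] for i in range(0, len(data), CHUNK)]


def fingerprint(chunk):
    return hashlib.sha256(chunk).hexdigest()


class BackupTarget:
    """Hypothetical backup storage that keeps one copy of each unique chunk."""

    def __init__(self):
        self.store = {}

    def missing(self, fingerprints):
        """Report which fingerprints the target does not hold yet."""
        return {f for f in fingerprints if f not in self.store}

    def write(self, chunk):
        self.store.setdefault(fingerprint(chunk), chunk)


def target_side_backup(data, target):
    """The client ships every chunk; the target hashes and drops duplicates."""
    sent = 0
    for chunk in chunks(data):
        sent += len(chunk)   # everything crosses the network
        target.write(chunk)  # deduplication work happens on the target
    return sent


def source_side_backup(data, target):
    """The client hashes locally (its own CPU cost) and ships only new chunks."""
    by_fingerprint = {fingerprint(c): c for c in chunks(data)}
    sent = 0
    for f in target.missing(by_fingerprint.keys()):
        target.write(by_fingerprint[f])
        sent += len(by_fingerprint[f])
    return sent


monday = b"AAAABBBBCCCC"
tuesday = b"AAAABBBBDDDD"  # only the last chunk changed since Monday

t = BackupTarget()
target_sent = [target_side_backup(monday, t), target_side_backup(tuesday, t)]

s = BackupTarget()
source_sent = [source_side_backup(monday, s), source_side_backup(tuesday, s)]

print("target-side bytes sent per backup:", target_sent)
print("source-side bytes sent per backup:", source_sent)
```

Both targets end up storing the same unique chunks; the difference lies in how much data crosses the network on the second backup and in where the hashing work is done.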

Target deduplication also avoids using system resources on the backup client. Target deduplication appliances can be a good option because they are generally easier to integrate into existing backup environments than source deduplication solutions.

Target deduplication solutions can use either in-line or post-process deduplication. In-line deduplication means that the data is deduplicated before it is actually written to disk. This approach requires only enough free space to write the changed data, but does require system resources on the appliance during the backup to identify the duplicate data. As processor power and cache sizes have increased, however, this effect has become less significant than it was in the past.

Post-process deduplication writes the backup data in its entirety, and then later deduplicates it as a batch process. The advantage of this approach is that the deduplication process does not affect the write performance during backups.
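The space and timing difference between in-line and post-process deduplication can be modeled with a short Python sketch. It rests on several assumptions that go beyond the text: the function names are hypothetical, chunks are a fixed 4 bytes, SHA-256 is the fingerprint, and the post-process variant is assumed to land each backup in a staging area that is released once the batch pass has finished.

```python
import hashlib

CHUNK = 4  # assumed fixed chunk size, kept tiny for readability


def chunks(data):
    return [data[i:i + CHUNK] for i in range(0, len(data), CHUNK)]


def inline_backup(backups):
    """Deduplicate before writing; returns (final_bytes, peak_bytes)."""
    store = {}
    used = peak = 0
    for data in backups:
        for chunk in chunks(data):
            # Fingerprinting sits in the write path, so it costs appliance
            # resources while the backup is running.
            key = hashlib.sha256(chunk).hexdigest()
            if key not in store:
                store[key] = chunk
                used += len(chunk)
                peak = max(peak, used)
    return used, peak


def post_process_backup(backups):
    """Write each backup in full, then deduplicate it as a batch."""
    store = {}
    used = peak = 0
    for data in backups:
        # Step 1: the whole backup is written at full speed to staging space.
        peak = max(peak, used + len(data))
        # Step 2: a later batch pass folds it into the deduplicated store,
        # after which the staging copy is assumed to be released.
        for chunk in chunks(data):
            key = hashlib.sha256(chunk).hexdigest()
            if key not in store:
                store[key] = chunk
                used += len(chunk)
    return used, peak


backups = [b"AAAABBBBCCCC", b"AAAABBBBDDDD"]  # second backup changes one chunk

inline_result = inline_backup(backups)
post_result = post_process_backup(backups)

print("in-line      (final, peak):", inline_result)
print("post-process (final, peak):", post_result)
```

Under these assumptions both methods converge on the same final capacity, but the post-process variant needs room for a full backup before the batch pass runs, while the in-line variant needs room only for the changed data.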

