
UNDERSTANDING DATA DEDUPLICATION - SNIA

Understanding Data Deduplication
Thomas Rivera, SEPATON
© 2009 Storage Networking Industry Association. All Rights Reserved.

Legal Notice
The material contained in this tutorial is copyrighted by the SNIA. Member companies and individual members may use this material in presentations and literature under the following conditions:
- Any slide or slides used must be reproduced in their entirety without modification.
- The SNIA must be acknowledged as the source of any material used in the body of any document containing material from these presentations.
This presentation is a project of the SNIA Education Committee.
Neither the author nor the presenter is an attorney, and nothing in this presentation is intended to be, or should be construed as, legal advice or an opinion of counsel.

… current understanding of the relevant issues involved. The author, the presenter, ... any reliance on or use of this information. NO WARRANTIES, EXPRESS OR IMPLIED. USE AT YOUR OWN RISK. Understanding Data Deduplication ... Factors associated with higher data deduplication ratios; Factors associated with lower data deduplication ratios.


Transcription of UNDERSTANDING DATA DEDUPLICATION - SNIA


If you need legal advice or a legal opinion, please contact your attorney. The information presented herein represents the author's personal opinion and current understanding of the relevant issues involved. The author, the presenter, and the SNIA do not assume any responsibility or liability for damages arising out of any reliance on or use of this information.
NO WARRANTIES, EXPRESS OR IMPLIED. USE AT YOUR OWN RISK.

The SNIA DMF
DMF Workgroups:
- Data Protection Initiative (DPI): defining best practices for data protection and recovery technologies such as backup, CDP, data deduplication and VTL
- Information Lifecycle Management Initiative (ILMI): developing, educating and promoting ILM practices, implementation methods, and benefits
- Long-term Archive and Compliance Storage Initiative (LTACSI): addressing the challenges of retaining, securing, and preserving digital information for the long term
This tutorial has been developed, reviewed and approved by members of the Data Management Forum (DMF).

The DMF is an industry resource to those responsible for the accessibility and integrity of their organization's information. The DMF focuses on the technologies and trends related to data protection, ILM and long-term digital information retention.

Data deduplication is a capacity optimization technology that is being used to dramatically improve storage efficiency. This technical session will:
- Review various data deduplication methodologies
- Identify the factors that influence space savings
- Provide scenarios where data deduplication is used

Agenda: How Data Deduplication Works; Scenarios; Q & A

Terminology
- Data deduplication is the replacement of multiple copies of data, at variable levels of granularity, with references to a shared copy in order to save storage space and/or bandwidth.
- Subfile data deduplication is a form of data deduplication that operates at a finer granularity than an entire file or data object.
- Single instance storage is a form of data deduplication that operates at the granularity of an entire file or data object.
- Compression is the encoding of data to reduce its storage requirement; deduplicated data can also be compressed.
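To make the three data reduction terms concrete, here is a minimal Python sketch. It assumes SHA-256 fingerprints and deliberately tiny 4-byte segments so the byte counts are easy to follow; the function names are hypothetical and do not describe any particular product.

```python
import hashlib
import zlib


def single_instance_bytes(files):
    """File-level single instance storage: identical whole files are kept once."""
    unique = {hashlib.sha256(f).hexdigest(): f for f in files}
    return sum(len(f) for f in unique.values())


def subfile_dedup_bytes(files, segment_size=4):
    """Subfile deduplication: identical fixed-size segments inside files are kept once.

    The 4-byte segment size is only for readability; real systems use kilobytes.
    """
    unique = {}
    for f in files:
        for i in range(0, len(f), segment_size):
            segment = f[i:i + segment_size]
            unique[hashlib.sha256(segment).hexdigest()] = segment
    return sum(len(s) for s in unique.values())


def compressed_bytes(data):
    """Compression re-encodes bytes; it can also be applied to deduplicated data."""
    return len(zlib.compress(data))


if __name__ == "__main__":
    files = [b"AAAABBBBCCCC", b"AAAABBBBCCCC", b"AAAABBBBDDDD"]
    print(sum(len(f) for f in files))    # 36 bytes written
    print(single_instance_bytes(files))  # 24: the exact copy is shared, the edited file is not
    print(subfile_dedup_bytes(files))    # 16: only AAAA, BBBB, CCCC, DDDD are kept
```

The edited third file shows why subfile deduplication usually finds more redundancy than single instance storage: a one-segment change makes the whole file "new" at file granularity, but only one segment new at subfile granularity.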

Space Reduction Ratio & Percent
[Figure: Capacity Optimization, Bytes In → Bytes Out]

$$\text{Ratio} = \frac{\text{Bytes In}}{\text{Bytes Out}} \qquad \% = 1 - \frac{\text{Bytes Out}}{\text{Bytes In}}$$

$$\% = 1 - \frac{1}{\text{Ratio}} \qquad \text{Ratio} = \frac{1}{1 - \%}$$

Space Reduction Ratio | Space Reduction Percent
2:1 | 1/2 = 50%
5:1 | 4/5 = 80%
10:1 | 9/10 = 90%
20:1 | 19/20 = 95%
100:1 | 99/100 = 99%
500:1 | 499/500 = 99.8%

- Ratios can meaningfully be compared only under the same set of assumptions.
- Relatively low space reduction ratios provide significant space savings.
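The ratio and percent formulas can be checked with exact rational arithmetic. This is a small sketch using Python's standard `fractions` module; the helper names are hypothetical, and percentages are returned as fractions of 1 (so 0.9 means 90%).

```python
from fractions import Fraction


def reduction_ratio(bytes_in, bytes_out):
    """Space reduction ratio = Bytes In / Bytes Out."""
    return Fraction(bytes_in, bytes_out)


def reduction_percent(bytes_in, bytes_out):
    """Space reduction percent = 1 - Bytes Out / Bytes In (as a fraction of 1)."""
    return 1 - Fraction(bytes_out, bytes_in)


def percent_from_ratio(ratio):
    """% = 1 - 1/Ratio."""
    return 1 - 1 / Fraction(ratio)


def ratio_from_percent(percent):
    """Ratio = 1 / (1 - %); pass the percent as a Fraction or a string such as "0.95"."""
    return 1 / (1 - Fraction(percent))


if __name__ == "__main__":
    # Recompute the ratio/percent pairs listed on the slide.
    for r in (2, 5, 10, 20, 100, 500):
        p = percent_from_ratio(r)
        print(f"{r}:1  {p} = {float(p):.1%}")
```

The output makes the slide's last point visible: going from 10:1 to 20:1 raises the space saved from 90% to only 95%, so most of the benefit is already captured at modest ratios.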

Data Deduplication: How It Works
1. Evaluate data
2. Identify redundancy
3. Create or update reference information
4. Store and/or transmit unique data once
5. Read and/or reproduce data

[Figure: Data Deduplication Simplified. Legend: new unique data; repeat data; pointer to unique data segment]
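The five steps of the "How It Works" flow can be mapped onto a toy in-memory store. This is a minimal sketch, assuming fixed-size segments and SHA-256 fingerprints; the `DedupStore` class and its methods are hypothetical, and real products differ in segmentation, indexing and on-disk layout. Reads follow the stored references and re-verify each segment's fingerprint, in the spirit of the reconstitution/verification stage shown on the read-path slide.

```python
import hashlib


class DedupStore:
    """Toy deduplicating store (illustrative only, not a product design)."""

    def __init__(self, segment_size=4096):
        self.segment_size = segment_size
        self.segments = {}  # fingerprint -> unique segment bytes, stored once
        self.recipes = {}   # object name -> ordered fingerprints (reference information)

    def write(self, name, data):
        recipe = []
        for i in range(0, len(data), self.segment_size):  # 1. evaluate data
            seg = data[i:i + self.segment_size]
            fp = hashlib.sha256(seg).hexdigest()           # 2. identify redundancy
            self.segments.setdefault(fp, seg)              # 4. store unique data once
            recipe.append(fp)
        self.recipes[name] = recipe                        # 3. create/update references

    def read(self, name):
        out = bytearray()
        for fp in self.recipes[name]:                      # 5. reproduce data from references
            seg = self.segments[fp]
            if hashlib.sha256(seg).hexdigest() != fp:      # verify during reconstitution
                raise ValueError(f"segment {fp[:12]} failed verification")
            out += seg
        return bytes(out)

    def stored_bytes(self):
        return sum(len(s) for s in self.segments.values())


if __name__ == "__main__":
    store = DedupStore(segment_size=4)
    store.write("dump1", b"AAAABBBBCCCC")
    store.write("dump2", b"AAAABBBBCCCCDDDD")
    print(store.read("dump2"))   # b'AAAABBBBCCCCDDDD'
    print(store.stored_bytes())  # 16
```

Here 28 bytes are written but only 16 are stored, a 7:4 (1.75:1) space reduction ratio; successive backups that mostly repeat earlier ones, as in the Dump #1 to #4 illustration, push the ratio higher.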

[Figure: Data Deduplication Simplified, Dump #1 to Dump #4. Legend: new unique data; repeat data; pointer to unique data segment]

[Figure: Deduplicated Data read path. Labels: Application/CIFS/NFS/VTL Interface; Read Request; Read Fulfilled; Deduplication Engine; Metadata References; Reconstitution/Verification]

Approach
- Component: hardware (e.g., chip or card) integrated into a larger system
- Gateway: a dedicated data deduplication engine that must be combined with a storage system
- Appliance: a dedicated deduplication engine integrated with a storage system
- Storage System: a general-purpose storage system with data deduplication capabilities
- Grid Storage: a storage system that can scale independently without constraints to physical attributes
- Software: includes application agents, virtual appliances, or storage software

Approach
Multiple deployment examples are illustrated; specific deployments are selected based on the customer's situation.
[Figure: deployment examples. Labels: Gateway; Agents; Storage System; Virtual Appliance; Appliance; Integrated Replication; CIFS, NFS, FC, iSCSI, VTL; WAN; Grid Storage; Application-specific protocol]

Source or Target
Source deduplication
- Identifies duplicate data at the client
- Transfers unique segments to a central repository
- Separate client and server components
Target deduplication
- Identifies duplicate data where the data is being stored
- Stores unique segments
- Standalone system
Considerations
- Neither approach enables greater or lesser space savings
- Scope of data deduplication may vary by implementation
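The difference between source and target deduplication is where duplicates are detected, which changes network traffic but not the space stored. The sketch below is a simplified model under stated assumptions: fixed 4-byte segments, SHA-256 fingerprints, and a hypothetical `Repository` class standing in for the central repository; the cost of exchanging fingerprints is ignored.

```python
import hashlib

SEGMENT_SIZE = 4  # tiny segments, purely for illustration


def fingerprinted_segments(data, size=SEGMENT_SIZE):
    """Split data into fixed-size segments paired with their SHA-256 fingerprints."""
    return [(hashlib.sha256(data[i:i + size]).hexdigest(), data[i:i + size])
            for i in range(0, len(data), size)]


class Repository:
    """Central repository keeping each unique segment once, keyed by fingerprint."""

    def __init__(self):
        self.segments = {}

    def missing(self, fingerprints):
        return {fp for fp in fingerprints if fp not in self.segments}

    def put(self, fp, segment):
        self.segments.setdefault(fp, segment)


def source_dedup_backup(data, repo):
    """Client fingerprints locally and sends only segments the repository lacks.

    Returns payload bytes sent over the network (fingerprint exchange not counted).
    """
    pairs = fingerprinted_segments(data)
    needed = repo.missing(fp for fp, _ in pairs)
    sent = 0
    for fp, segment in pairs:
        if fp in needed:
            repo.put(fp, segment)
            needed.discard(fp)  # a segment repeated within this backup is sent once
            sent += len(segment)
    return sent


def target_dedup_backup(data, repo):
    """Client sends the full stream; the target deduplicates as it stores."""
    for fp, segment in fingerprinted_segments(data):
        repo.put(fp, segment)
    return len(data)


if __name__ == "__main__":
    day1, day2 = b"AAAABBBBCCCC", b"AAAABBBBDDDD"
    src, tgt = Repository(), Repository()
    print(source_dedup_backup(day1, src), source_dedup_backup(day2, src))  # 12 4
    print(target_dedup_backup(day1, tgt), target_dedup_backup(day2, tgt))  # 12 12
    print(len(src.segments), len(tgt.segments))                            # 4 4
```

Both repositories end up holding the same four unique segments, consistent with the point that neither approach yields greater or lesser space savings; source deduplication instead reduces what crosses the network on the second backup.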

Inline or Post-Process
Inline deduplication
- Data deduplication performed before writing the deduplicated data
Post-process deduplication
- Data deduplication performed after the data to be deduplicated has been initially stored
Considerations
- A product may implement both methods
- A product may provide methods to control when particular data is deduplicated
- May impact replication, usable capacity, scalability, …

Fixed or Variable Size Segment
Fixed-length segment deduplication
- Evaluation of data uses a fixed reference window to look at segments of data during the deduplication process
- Provides fixed granularity, such as 4 KB, 8 KB, or 128 KB
Variable-length segment deduplication
- Evaluation of data uses a variable-length window to find duplicate data in the stream or volume of data processed
- Provides variable granularity, such as an average of 4 KB or 32 KB
Method chosen may affect deduplication results
- Effects observed will vary by method
- Segmentation may not apply to all deduplication
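One common way to obtain variable-length segments is content-defined chunking, where boundaries are chosen by a hash of the data itself rather than by byte offset. The sketch below uses a Rabin–Karp-style polynomial rolling hash over a 16-byte window; this hash, the 64-byte average segment size (far smaller than the 4 KB or 32 KB averages above, chosen only so the demo runs quickly) and the absence of minimum/maximum segment sizes are all assumptions, not a description of any vendor's method.

```python
import random


def fixed_segments(data, size=64):
    """Fixed-length segmentation: a boundary every `size` bytes."""
    return [data[i:i + size] for i in range(0, len(data), size)]


def variable_segments(data, window=16, divisor=64, base=257, mod=(1 << 31) - 1):
    """Content-defined (variable-length) segmentation with a polynomial rolling hash.

    A boundary is placed after byte i when the hash of the `window` bytes ending
    at i is divisible by `divisor`, giving segments of roughly `divisor` bytes on
    average for random-looking data. No minimum/maximum segment size is enforced.
    """
    segments, start, h = [], 0, 0
    drop = pow(base, window - 1, mod)  # weight of the byte leaving the window
    for i, byte in enumerate(data):
        if i >= window:
            h = (h - data[i - window] * drop) % mod
        h = (h * base + byte) % mod
        if i >= window - 1 and h % divisor == 0:
            segments.append(data[start:i + 1])
            start = i + 1
    if start < len(data):
        segments.append(data[start:])
    return segments


def shared_segments(old, new):
    """Number of distinct segments of `old` that also appear in `new`."""
    return len(set(old) & set(new))


if __name__ == "__main__":
    rng = random.Random(42)
    original = bytes(rng.getrandbits(8) for _ in range(8192))
    edited = b"X" + original  # insert a single byte at the front
    old_fixed, old_var = fixed_segments(original), variable_segments(original)
    print("fixed:", shared_segments(old_fixed, fixed_segments(edited)), "of", len(set(old_fixed)))
    print("variable:", shared_segments(old_var, variable_segments(edited)), "of", len(set(old_var)))
```

Inserting one byte shifts every fixed-length boundary, so essentially no fixed segments match the earlier version, while content-defined boundaries depend only on local content and re-synchronize after the first segment. This is one concrete way the segmentation method can affect deduplication results.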

Data Deduplication Benefits
Data deduplication can help organizations:
- Satisfy ROI/TCO requirements
- Manage data growth
- Increase efficiency of storage and backup
- Reduce overall cost of storage
- Reduce network bandwidth
- Reduce operational costs, including:
  - Infrastructure costs: required space, power and cooling
  - Movement toward a greener data center
- Reduce administrative costs

Data Deduplication Scope
- Multiple repositories per controller
- Single repository per controller
- Single repository shared by multiple controllers
System capacity varies independently from the scope.
[Figure: deduplication scope. Labels: Dedupe Repository; Application Servers]
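Scope matters because duplicates are only found among data written to the same repository. The following sketch models each repository as an independent deduplication domain; the `DedupeRepository` class, the `stored_bytes_for_scope` helper, the 4-byte segments and the three sample streams (standing in for three application servers) are hypothetical, and controllers are not modeled explicitly.

```python
import hashlib


def segments(data, size=4):
    """Fixed-size segments; 4 bytes only to keep the example readable."""
    return [data[i:i + size] for i in range(0, len(data), size)]


class DedupeRepository:
    """One deduplication domain: duplicates are detected only among its own segments."""

    def __init__(self):
        self.index = {}  # fingerprint -> segment

    def write(self, data):
        for seg in segments(data):
            self.index.setdefault(hashlib.sha256(seg).hexdigest(), seg)

    def stored_bytes(self):
        return sum(len(seg) for seg in self.index.values())


def stored_bytes_for_scope(streams, placement):
    """Write streams[i] into the repository named placement[i]; return total bytes stored."""
    repos = {}
    for data, name in zip(streams, placement):
        repos.setdefault(name, DedupeRepository()).write(data)
    return sum(repo.stored_bytes() for repo in repos.values())


if __name__ == "__main__":
    streams = [b"AAAABBBBCCCC", b"AAAABBBBDDDD", b"AAAABBBBCCCC"]
    print(stored_bytes_for_scope(streams, ["repo1", "repo2", "repo3"]))  # 36: one repository each
    print(stored_bytes_for_scope(streams, ["shared"] * 3))             # 16: one shared repository
```

The same 36 bytes of input occupy 36 bytes when each stream lands in its own repository but only 16 bytes in a single shared repository, even though the total system capacity could be identical in both layouts.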

