IBM TS7650G ProtecTIER Deduplication Gateway
IBM System Storage TS7650 and TS7650G with ProtecTIER
Understand the concepts of data deduplication
Learn about native replication and grid manager
Reduce your storage hardware requirements

Alex Osuna, Reimar Pflieger, Lothar Weinert, Xu X Yan, Erwin Zwemmer

ibm.com/redbooks

International Technical Support Organization
IBM System Storage TS7650 and TS7650G with ProtecTIER
August 2010
SG24-7652-02
Note: Before using this information and the product it supports, read the information in Notices on page ix.
Third Edition (August 2010) This edition applies to Version 2.3 of ProtecTIER.
Copyright International Business Machines Corporation 2010. All rights reserved. Note to U.S. Government Users Restricted Rights -- Use, duplication or disclosure restricted by GSA ADP Schedule Contract with IBM Corp.
Contents
Notices
Trademarks
Preface
  The team who wrote this book
  Now you can become a published author, too!
  Comments welcome
  Stay connected to IBM Redbooks
Summary of changes
  August 2010, Third Edition
  March 2010, Second Edition

Part 1. Introduction and architecture

Chapter 1. Concepts of data deduplication
  1.1 Data deduplication
  1.2 Types of data deduplication
    1.2.1 Hash-based data deduplication
    1.2.2 Content aware
    1.2.3 HyperFactor, deduplication, and bandwidth savings
  1.3 Data deduplication processing
    1.3.1 Inline method
    1.3.2 Post-processing method
  1.4 Components of a data deduplication system
    1.4.1 Server
    1.4.2 Data deduplication software
    1.4.3 Disk array
  1.5 Benefits of data deduplication
    1.5.1 Reduction of storage requirements
    1.5.2 Reduction of environmental costs

Chapter 2. IBM System Storage TS7600 with ProtecTIER architecture
  2.1 TS7650G ProtecTIER Deduplication Gateway
    2.1.1 TS7650G Gateway terms
    2.1.2 TS7650G ProtecTIER Deduplication Gateway (3958-DD3)
    2.1.3 Disk array
    2.1.4 Deployment
  2.2 TS7650 ProtecTIER Deduplication Appliance
    2.2.1 TS7650 Deduplication Appliance
    2.2.2 TS7650 ProtecTIER Deduplication Appliance features
    2.2.3 Available models
    2.2.4 Deployment
    2.2.5 Two-node clustered configuration
  2.3 Terms and definitions
  2.4 ProtecTIER Virtual Tape (VT)
    2.4.1 Data deduplication
    2.4.2 ProtecTIER native replication
  2.5 ProtecTIER Manager
  2.6 IBM TS3000 System Console
  2.7 Operating system
  2.8 Rack

Chapter 3. ProtecTIER native replication overview and operation
  3.1 How it works
    3.1.1 Replication features
    3.1.2 Typical deployment
    3.1.3 ProtecTIER native replication Management Interface
  3.2 Normal operation concepts
    3.2.1 Replication
    3.2.2 Replication data transfer
    3.2.3 Visibility switch control
    3.2.4 Single domain and multiple domain backup application environments

Part 2. Planning for data deduplication and replication

Chapter 4. Hardware planning for the 3958-AP1 and 3958-DD3
  4.1 General overview of the TS7650 and TS7650G
  4.2 Hardware and software components for the 3958-DD3
  4.3 3958-DD3 and 3958-AP1 feature codes
  4.4 IBM System Storage TS7600 with ProtecTIER software
    4.4.1 5639-XXB ProtecTIER Enterprise Edition V2.3 Base Software
    4.4.2 IBM System Storage ProtecTIER Appliance Edition (AE) V2.3 software
    4.4.3 ProtecTIER Manager V2.3 console software
  4.5 Feature codes for Red Hat Linux
  4.6 Recommended 3958-DD3 configuration options
    4.6.1 Single node configuration
    4.6.2 Two-node cluster configuration
  4.7 Usage considerations
    4.7.1 Virtual tape libraries
    4.7.2 Fibre Channel ports and host assignment considerations
    4.7.3 Firewall environments: Ports assignments for ProtecTIER Replication Manager
  4.8 Installation planning
    4.8.1 Installation worksheets
    4.8.2 Supported backup server operating environments
    4.8.3 Planning ProtecTIER Installation
    4.8.4 Installation tasks
    4.8.5 Host attachment considerations
    4.8.6 SAN configuration
The team who wrote this book
This book was produced by a team of specialists from around the world working at the International Technical Support Organization, San Jose Center.

Alex Osuna is a Project Leader at the International Technical Support Organization, Tucson Center. He writes extensively and on all areas of storage. Before joining the ITSO four years ago, Alex worked for the Tivoli Western Region as a Principal SE in storage. Alex has over 30 years of experience in the IT industry, 28 of them with IBM, mainly focused on storage. He holds certifications from IBM, Microsoft, Red Hat, and the Open Group.

Reimar Pflieger is an IT Specialist from Germany working at the IBM Global Technology Services Organization. He provides post-sales support as Product Field Engineer for RMSS products in Mainz. He joined IBM in 1998 and worked for many years as a Process Support and Manufacturing Engineer in Disk and Wafer Production. In his current role as an RMSS Product Field Engineer, he supports Open Systems Tape, tape libraries from entry level to the high-end level, and tape encryption solutions. His operating system experience includes Linux, Windows, and AIX platforms. This is the first IBM Redbooks publication that Reimar has co-authored.

Lothar Weinert is a Certified High End Tape Solution and IBM i Specialist working at the Tape Support Centre in Mainz, Germany, where he has worked since 2001. In his current position, Erwin provides EMEA support to colleagues for the complete spectrum of IBM TotalStorage and System Storage tape and optical products. Prior to his position with IBM Germany, he worked as an AS/400 Technical Support Representative. He joined IBM Holland in 1995 as an IBM Customer Engineer for AS/400 systems and multivendor products.

Xu X Yan is a Senior Technical Sales Enablement Specialist and Technical Expert at IBM China. He has 17 years of experience in IT and nine years of experience in the storage field. His focus is on tape-related products and solutions.

Erwin Zwemmer is a Certified High-End Tape Solution and IBM i Specialist working at the Tape Support Centre in Mainz, Germany, where he has worked since 2001. In his current position, Erwin provides EMEA support to colleagues for the complete spectrum of IBM TotalStorage and System Storage tape and optical products. Prior to his position with IBM Germany, he worked as an AS/400 Technical Support Representative. He joined IBM Holland in 1995 as an IBM Customer Engineer for AS/400 systems and multivendor products.
1.2 Types of data deduplication
Many vendors offer products that perform deduplication. Various methods are used for deduplicating data. Three methods frequently used are:
- Hash based
- Content aware
- HyperFactor
1.2.1 Hash-based data deduplication
Hash-based data deduplication methods use a hashing algorithm to identify chunks of data. Commonly used algorithms are Secure Hash Algorithm 1 (SHA-1) and Message-Digest Algorithm 5 (MD5). When data is processed by a hashing algorithm, a hash is created that represents the data. A hash is a bit string (128 bits for MD5 and 160 bits for SHA-1) that represents the data processed. If you process the same data through the hashing algorithm multiple times, the same hash is created each time. Examples of hash codes are:

MD5: 16-byte long hash
# echo The Quick Brown Fox Jumps Over the Lazy Dog | md5sum
9d56076597de1aeb532727f7f681bcb0
# echo The Quick Brown Fox Dumps Over the Lazy Dog | md5sum
5800fccb352352308b02d442170b039d

SHA-1: 20-byte long hash
# echo The Quick Brown Fox Jumps Over the Lazy Dog | sha1sum
f68f38ee07e310fd263c9c491273d81963fbff35
# echo The Quick Brown Fox Dumps Over the Lazy Dog | sha1sum
d4e6aa9ab83076e8b8a21930cc1fb8b5e5ba2335

Hash-based deduplication breaks data into chunks, either fixed or variable length, depending on the product, and processes the chunk with the hashing algorithm to create a hash. If the hash already exists, the data is deemed to be a duplicate and is not stored. If the hash does not exist, then the data is stored and the hash index is updated with the new hash.
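The two properties described above (fixed digest length, and the same input always giving the same hash) can be checked with a few lines of Python. This is a minimal sketch using the standard hashlib module; the helper name digests is hypothetical, and the trailing newline is added on the assumption that the text is being hashed the way echo would emit it.

```python
import hashlib

def digests(text: str) -> dict:
    """Return the MD5 and SHA-1 hex digests of a text string (hypothetical helper)."""
    data = text.encode("utf-8")
    return {
        "md5": hashlib.md5(data).hexdigest(),    # 128 bits = 32 hex characters
        "sha1": hashlib.sha1(data).hexdigest(),  # 160 bits = 40 hex characters
    }

# echo appends a newline, so it is included here as an assumption.
jumps = digests("The Quick Brown Fox Jumps Over the Lazy Dog\n")
dumps = digests("The Quick Brown Fox Dumps Over the Lazy Dog\n")
again = digests("The Quick Brown Fox Jumps Over the Lazy Dog\n")

print(len(jumps["md5"]) * 4, "bit MD5,", len(jumps["sha1"]) * 4, "bit SHA-1")
print("same input, same hash:", jumps == again)
print("one letter changed, different hash:", jumps["md5"] != dumps["md5"])
```

A one-character change in the input produces an unrelated digest, which is why a hash can stand in for the chunk when checking for duplicates.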
In Figure 1-2, data chunks A, B, C, D, and E are processed by the hash algorithm and create hashes Ah, Bh, Ch, Dh, and Eh. For the purposes of this example, we assume that this is all new data. Later, chunks A, B, C, D, and F are processed. F generates a new hash, Fh. Because A, B, C, and D generated the same hashes as before, the data is presumed to be the same data, so it is not stored again. Because F generates a new hash, the new hash and new data are stored.
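The walk-through above can be sketched in Python as follows. This is an illustration of generic hash-based deduplication with fixed-length chunks and SHA-1, not the ProtecTIER implementation (which uses HyperFactor); the names ChunkStore and split_fixed are hypothetical.

```python
import hashlib

class ChunkStore:
    """Toy hash index: stores a chunk only if its hash has not been seen (hypothetical)."""

    def __init__(self):
        self.index = {}  # hex digest -> stored chunk

    def write(self, chunk: bytes) -> bool:
        """Return True if the chunk was new and stored, False if it was a duplicate."""
        key = hashlib.sha1(chunk).hexdigest()
        if key in self.index:
            return False          # hash already exists: presume duplicate, do not store
        self.index[key] = chunk   # new hash: store the data and update the index
        return True

def split_fixed(data: bytes, size: int) -> list:
    """Slice data into fixed-length chunks."""
    return [data[i:i + size] for i in range(0, len(data), size)]

store = ChunkStore()
# First pass: chunks A, B, C, D, E are all new.
first = [store.write(c) for c in split_fixed(b"AAAABBBBCCCCDDDDEEEE", 4)]
# Second pass: A, B, C, D are duplicates; only F is stored.
second = [store.write(c) for c in split_fixed(b"AAAABBBBCCCCDDDDFFFF", 4)]

print(first)             # [True, True, True, True, True]
print(second)            # [False, False, False, False, True]
print(len(store.index))  # 6
```

Ten chunks were written but only six are stored, because the index recognizes the four repeated chunks by their hashes.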
Figure 1-2 (not reproduced). Step 1: slice data into chunks (fixed or variable): A, B, C, D, E.
1.5.2 Reduction of environmental costs
If data deduplication reduces your disk storage requirements, the environmental costs for running and cooling the disk storage are also reduced.
Chapter 2.
IBM System Storage TS7600 with ProtecTIER architecture
In this chapter, we introduce the IBM System Storage TS7600 with ProtecTIER Deduplication Gateway and discuss the following topics:
- TS7650G ProtecTIER Deduplication Gateway technology and product requirements
- TS7650 ProtecTIER Deduplication Appliance technology and product requirements
- IBM System Storage ProtecTIER Enterprise Edition V2.3 (ProtecTIER) software with HyperFactor
- Performance and capacity
2.1 TS7650G ProtecTIER Deduplication Gateway
On April 18, 2008, IBM acquired Diligent Technologies, a privately held company headquartered in Framingham, Massachusetts, to tackle pervasive storage issues, such as redundant data, complex infrastructure, rising energy costs, and never-enough capacity. The acquired data deduplication technology includes compliance applications, backup/recovery, and archiving in a single solution, which keeps the data secure at the same time. However, the acquisition of Diligent represents more than just gaining the benefits of a stand-alone product. IBM sees the potential to incorporate Diligent's technology across System Storage virtual tape libraries and information infrastructure. Over time, IBM plans to integrate the acquired technology with the IBM suite of storage solutions. The TS7650G ProtecTIER Deduplication Gateway, comprising 3958-DD3 hardware combined with IBM System Storage ProtecTIER Enterprise Edition V2.3 (ProtecTIER) software, offers the acquired data deduplication technology. Note: ProtecTIER software is ordered with the TS7650G order, but it is shipped separately.
2.1.1 TS7650G Gateway terms
These are terms for the IBM virtualization solution from the TS7650 family that does not include a disk storage repository, allowing the customer to choose from a variety of storage options. The TS7650G consists of the 3958 DD3 and the 3958 DD1, the two types of server used in the Gateway:

3958 DD3
This is a newer, higher performance server available since March 2009. This server is based on the IBM System x3850 M2 Type 7233. When used as a server in the TS7650G, its machine type and model are 3958 DD3. Use this machine type and model for service purposes.

3958 DD1
This is the original server introduced in August 2008. This server is based on the IBM System x3850 M2 Type 7141. When used as a server in the TS7650G, its machine type and model are 3958 DD1. Use this machine type and model for service purposes. This model is no longer available.
2.4 ProtecTIER Virtual Tape (VT)
The ProtecTIER Virtual Tape service (Figure 2-11) emulates traditional tape libraries. By emulating tape libraries, ProtecTIER VT enables you to transition to disk backup without having to replace your entire backup environment. Your existing backup application can access virtual robots to move virtual cartridges between virtual slots and drives. The backup application perceives that the data is being stored on cartridges while ProtecTIER actually stores data on a deduplicated disk repository on the storage fabric.
Figure 2-11 ProtecTIER Virtual Tape Service
ProtecTIER provides two main features:
- Data deduplication
- ProtecTIER native replication

Each of these brings its own set of options that you can customize to your environment. The ProtecTIER software is configured by and ordered from IBM or its business partners. The order includes the ProtecTIER software and the ProtecTIER Manager GUI application.
2.4.1 Data deduplication
Data deduplication solutions from IBM employ an advanced form of data compression that identifies and eliminates redundant data across the data landscape, making it possible to significantly reduce the amount of data that must be protected. This in turn dramatically increases the effective capacity of existing disk storage so that far less physical disk is required to protect the data (Figure 2-12). Beyond the direct resource savings associated with needing less disk space, which can be in the hundreds of thousands or millions of dollars, the benefits of data deduplication include:
- Greater productivity that comes from being able to perform more frequent backups with the same amount of physical disk
- Increased efficiency because of the greater likelihood that data will be able to be restored from disk rather than a slower medium
- Reduced energy consumption that results from reducing the amount of disk in operation
Figure 2-12 Disk savings with ProtecTIER
Data deduplication uses algorithms to compare data and eliminate duplicated or redundant data. Standard compression only works at the file level, while Storwize and database real-time compression both work at the subfile level. Data deduplication can be applied at the subfile level to reduce data size across a much broader landscape. HyperFactor technology uses a pattern algorithm that can reduce the amount of space required for storage in the backup environment by up to a factor of 25, based on evidence from existing implementations. The capacity expansion that results from data deduplication is often expressed as a ratio, essentially the ratio of nominal data to the physical storage used. A 10:1 ratio, for example, means that 10 times more nominal data is being managed than the physical space required to store it. Capacity savings of 18:1 and greater have been reported from data deduplication, and up to 25:1 in the case of IBM solutions.
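The ratio arithmetic can be made concrete with a short Python sketch. The function names are hypothetical, and the figures in the usage lines are illustrative values, not measurements.

```python
def dedup_ratio(nominal_tb: float, physical_tb: float) -> float:
    """Ratio of nominal (pre-deduplication) data to physical storage used."""
    if physical_tb <= 0:
        raise ValueError("physical_tb must be positive")
    return nominal_tb / physical_tb

def space_saved_fraction(ratio: float) -> float:
    """Fraction of nominal capacity that does not need physical disk at a given ratio."""
    return 1.0 - 1.0 / ratio

def physical_needed_tb(nominal_tb: float, ratio: float) -> float:
    """Physical capacity needed to hold nominal_tb at a given deduplication ratio."""
    return nominal_tb / ratio

# Illustrative values only.
print(dedup_ratio(100, 10))                # 100 TB nominal on 10 TB physical
print(round(space_saved_fraction(10), 2))  # fraction saved at 10:1
print(round(space_saved_fraction(25), 2))  # fraction saved at 25:1
print(physical_needed_tb(250, 25))         # physical TB for 250 TB nominal at 25:1
```

Note that the saving grows slowly at high ratios: moving from 10:1 to 25:1 raises the saved fraction from 90% to 96%.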
Operating system features
The operating system installed on the 3958-DD3 server is a distribution of Red Hat Advanced Server V5.2 64-bit. IBM System Storage ProtecTIER Enterprise Edition V2.3 software and Red Hat Enterprise Linux 5.2 Advanced Platform (RHELAP) x86_64 are loaded on the 3958-DD3 server with your initial order of the TS7650G. Refer to the following website to check whether there has been recent host software-specific maintenance for the ProtecTIER V2.3 software:
http://www-03.ibm.com/systems/storage/tape/3958-DD3/index.html
Select Support & download, then Download, then Fixes updates and drivers.
Software features
The ProtecTIER EE V2.3 server software in a single node configuration allows you to create:
- Up to 16 virtual libraries
- Up to 256 virtual tape drives per single node
- Up to 500,000 virtual cartridges
- Scalability up to 1 PB per single node or two-node clustered configuration
DS4700 disk array configuration
In this section we describe the features of the IBM System Storage DS4700 in a single node configuration. This disk array can host up to 16 disk drives, using FC or SATA technology and in different sizes. The disk drive characteristics in terms of technology, size, and speed must be chosen after a capacity and performance planning assessment. We describe the assessment phases in Chapter 5, Planning for deduplication and replication on page 95. When attaching expansions, drive loops are configured as redundant pairs utilizing one port from each controller. This helps ensure data access in the event of a path/loop or controller failure. The controller can be expanded with up to six expansion enclosures. In this full configuration, the disk array is cabled using both drive channel pairs, with the six expansion enclosures evenly spread across the drive channel pairs (three each). Figure 4-5 shows the rear view of the disk array, with the D1 and D2 ports for FC connections to expansion drawers, the H1 and H2 ports for FC connections to the 3958-DD3 server, and the two P ports that indicate the power outlets.
Figure 4-5 Rear view of the I/O connections of the disk array DS4700 (figure labels: Controller A, Controller B, H1, H2, Ethernet, RS-232)
EXP810 expansion configuration
Figure 4-13 FC connectivity between Gateway servers and disk arrays in the two-node cluster configuration
Port 1 of each QLogic FC Adapter in slots 6 and 7 is connected to the two controllers (controller A and controller B) of the first disk array mounted in the base frame, while port 2 of each QLogic FC Adapter in slots 6 and 7 is connected to the two controllers (controller A and controller B) of the second disk array mounted in the expansion frame.
The operating system installed on the Gateway server 3958-DD3 is a distribution of Red Hat Advanced Server V5.2 64-bit.
The ProtecTIER Enterprise Edition V2.3 server software in a two-node clustered configuration allows you to create:
- Up to 16 virtual libraries
- Up to 512 virtual tape drives, with 256 virtual tape drives per node
- Up to 500,000 virtual cartridges
- Scalability of up to 1 PB for a single repository
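As a planning aid, the limits quoted for the single node and two-node clustered configurations can be captured in a small lookup and checked against a proposed design. This is a hypothetical sketch: the LIMITS table simply restates the figures given in this chapter, and check_plan is not part of any ProtecTIER tool.

```python
# Limits as stated in this chapter for ProtecTIER EE V2.3 (restated, not queried from a system).
LIMITS = {
    "single_node": {"libraries": 16, "drives": 256, "cartridges": 500_000, "capacity_pb": 1},
    "two_node": {"libraries": 16, "drives": 512, "cartridges": 500_000, "capacity_pb": 1},
}

def check_plan(config: str, libraries: int, drives: int,
               cartridges: int, capacity_pb: float) -> list:
    """Return the names of the limits that a planned configuration exceeds."""
    planned = {
        "libraries": libraries,
        "drives": drives,
        "cartridges": cartridges,
        "capacity_pb": capacity_pb,
    }
    limits = LIMITS[config]
    return [name for name, value in planned.items() if value > limits[name]]

# 300 virtual drives exceed the single node limit but fit a two-node cluster.
print(check_plan("single_node", 4, 300, 1000, 0.5))  # ['drives']
print(check_plan("two_node", 4, 300, 1000, 0.5))     # []
```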
DS4700 disk array two-node clustered configuration
The configuration of each disk array is the same as in the single node configuration, but now all the host ports H1 and H2 are occupied because of the attachment of the two Gateway servers, as shown in Figure 4-15 on page 81.
DS4700 expansion two-node clustered configuration
The configuration of each disk array is the same as the single node configuration. Refer to DS4700 disk array configuration on page 70 for more details. In the two-node clustered configuration, we have two disk arrays with six expansion units attached to each one.
In this section we show the recommended rack layout of the server frame for the two-node clustered configuration. For easier installation and maintenance, we recommend that the components included in the purchase of the 3958-DD3 servers and the TS3000 System Console occupy one frame (the server frame), whereas specific supported disk arrays occupy a second frame (the cache frame). A cache frame may be used to host specific supported disk arrays and expansions. For example, two DS4700s may be mounted in a customer-supplied cache frame, while an IBM System Storage DS8300 is shipped with its own dedicated enclosure and expansion enclosures if required, so it does not need a customer-supplied cache frame. Because of its performance and scalability (in terms of cache, capacity, and number of I/O Host Adapters), only a single DS8300 disk array may be deployed in the two-node clustered configuration instead of two disk arrays required for an implementation with DS4700. Note: Both 3958-DD3 servers must be in the same rack when two nodes are clustered together. This is true for any storage subsystem that you intend to deploy.
Media server qualification
For each media server that will connect to the IBM System Storage TS7600 with ProtecTIER, provide the data shown in Table 5-1.
Table 5-1 Characteristics of the deployed media server
Item | Media server 1 | Media server 2 | Media server 3
Media server operating system and version | | |
Backup software and version | | |
FC HBA model | | |
HBA firmware version | | |
HBA driver version | | |
Connected to ProtecTIER through loop or switch | | |
Front-end fabric connectivity
For each switch that will be connected to the front-end ports (media server facing) of IBM System Storage TS7600 with ProtecTIER, provide the information shown in Table 5-2.
Table 5-2 Characteristics of the fabrics
Switch characteristic | Switch 1 | Switch 2
Switch model | |
Switch release | |
Storage
For each storage array connected to the TS7650Gs, provide the information shown in Table 5-3 after an accurate assessment of the disk sizing (see Disk configurations for TS7650G on page 111).
Table 5-3 Characteristics of the disk arrays
Item | Disk array 1 | Disk array 2
Disk array make and model | |
Disk capacity on implementation | |
Number of hard disk drives (HDDs) | |
Size of HDDs | |
HDDs revolutions per minute (RPM) | |
Number of controllers | |
Controller cache size | |
The connection between the TS7650G and disk array is a loop or switch topology | |
Data protection survey
Accurate capacity planning considers the behavior of each data type. A data type can be a file system, an operating system, databases, and so on. The size of one full backup is usually equal to the size of the online disk capacity. The Data Protection Survey is a worksheet that IBM technical personnel use during capacity sizing and that provides information about the number of versions, frequency, and retention for each backup. We assume that the retention of a weekly backup is associated with the retention of its incremental backup.
Important information for this survey includes:
- All the workloads that you back up in your environment.
- How much capacity is used for your current full backups to physical tape.
- How much capacity is used for the current daily backups to physical tape, including differentials, incrementals, and cumulative backups.
- The rate at which the data received from the backup application changes from backup to backup. This measurement has most relevance when like backup policies are compared. (Data change rates might range from 1% to >25%, but are difficult to observe directly.)
- How often the full backups are performed.
- How many cycles of full backups are kept.
- How many daily backups are performed in between full backups, including differential, incremental, and cumulative backups.
- How many cycles of full and incremental, differential, or cumulative backups are kept.
- Whether a monthly full backup is kept for longer periods than the regular weekly full backup.

The information in Table 5-4 is relevant for IBM Tivoli Storage Manager users with an Incremental Forever policy for some of their data. IBM Tivoli Storage Manager (TSM) uses a different paradigm that is known as Progressive Incremental. In this case, an initial full backup is taken, then all future backups (even those on the weekends) are considered incremental, so there is no full backup on a weekly basis. TSM uses a much more sophisticated and intelligent way to do backups, because only new or changed files are backed up. TSM is able to do this type of backup because of its relational database, which tracks each individual file and knows exactly what your computer's state was on each day. When a restore is required, just the version of the file needed is restored. In addition to the information listed above, provide the information in Table 5-4 if you are using Tivoli Storage Manager as your backup application.
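To show how the survey answers feed into capacity planning, the following Python sketch combines full backup size, cycles kept, daily backups per cycle, and change rate into a rough nominal capacity figure. This is a deliberately simplified model for illustration and not the sizing method used by IBM technical personnel; it assumes a conventional full-plus-incremental policy, that each incremental is roughly full size times change rate, and that incrementals share the retention of their full backup. The function name is hypothetical.

```python
def nominal_capacity_tb(full_tb: float, full_cycles_kept: int,
                        incrementals_per_cycle: int, change_rate: float) -> float:
    """Rough nominal (pre-deduplication) capacity held under retention.

    Assumptions (illustrative only):
    - each incremental backup is about full_tb * change_rate in size
    - incrementals are retained for as long as the full backup of their cycle
    """
    incremental_tb = full_tb * change_rate
    per_cycle_tb = full_tb + incrementals_per_cycle * incremental_tb
    return full_cycles_kept * per_cycle_tb

# Example: 10 TB weekly full, 4 cycles kept, 6 daily incrementals, 5% change rate.
estimate = nominal_capacity_tb(10, 4, 6, 0.05)
print(round(estimate, 2))  # 52.0
```

Dividing such a nominal figure by an expected deduplication ratio gives a first approximation of the physical repository size, which the formal sizing process then refines per data type.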
Chapter 6.
IBM System Storage TS7600 with ProtecTIER initial setup
In this chapter, we provide information about how to install ProtecTIER Manager and how to set up the IBM System Storage TS7600 with ProtecTIER system so that it is ready to use with backup applications. We cover:
- Enabling SNMP support
- Getting started
- Adding nodes in ProtecTIER Manager
- Creating repositories
- Setting up the virtual libraries and cartridges
- Setting up replication and replication policies

At this point the IBM SSR and the IBM ProtecTIER Specialist have already:
- Installed the TS7650 or TS7650G systems, TS3000, KVM kit, Ethernet switch, and network power switch
- Checked the disk arrays configuration (provided by the customer)
- Connected the network cables (provided by the customer) to the network switches (provided by the customer) in the customer configuration
- Connected the Fibre Channel cables
- Applied cable labels
- Set IP addresses according to customer-provided network assignments
- Created the backup set for the TS7650 or TS7650G systems
- Verified that the IBM System Storage TS7600 with ProtecTIER hardware is functioning properly
- Set up the TS3000, which includes connecting the customer-supplied analog phone line and network cable
- Installed any clustering hardware and software, if applicable
- Verified fencing functionality, if applicable

For more details, refer to the IBM System Storage TS7600 with ProtecTIER Introduction and Planning Guide, GC53-1152. After these tasks are complete, the following steps must be performed to fully set up the IBM System Storage TS7600 with ProtecTIER:
1. Enable ProtecTIER SNMP support.
2. Install ProtecTIER Manager Software.
3. Add one or more nodes.
4. Create a repository (TS7650G only).
5. Create a virtual library (robot, drives, slots, and cartridges).
6. Set up ProtecTIER Replication Manager.
6.1 Enabling ProtecTIER SNMP support
ProtecTIER responds to SNMP, implementing MIB and generating the appropriate traps. The server responds to SNMP discovery and queries and to the standard MIB requests.
6.1.1 Defining the IP address
When the server is installed at the user site, the IP address of the SNMP management station (for example, the TS3000 Service Console) must be made available to the ProtecTIER servers to enable SNMP support. To do this:
1. Edit the configuration file. The snmpd.conf file is found in the /etc/snmp directory.
2. From the command line, use the vi editor:
vi /etc/snmp/snmpd.conf <Enter>
The following output is displayed:
###########################################################################
#
# snmpd.conf
#
# - created by the snmpconf configuration program
#
###########################################################################
# SECTION: System Information Setup
#
# This section defines some of the information reported in
# the "system" mib group in the mibII tree.

# syslocation: The [typically physical] location of the system.
# Note that setting this value here means that when trying to
# perform an snmp SET operation to the sysLocation.0 variable will make
# the agent return the "notWritable" error code. IE, including
# this token in the snmpd.conf file will disable write access to
# the variable.
# arguments: location_string

#syslocation Unknown (edit /etc/snmp/snmpd.conf)

# syscontact: The contact information for the administrator
# Note that setting this value here means that when trying to
8.3.5 How to determine what is available for restoration at the disaster recovery site
This section suggests ways for users to determine what catalog and data sets are complete or not complete, matched, and readily available to restore at the secondary/DR site.
Before running a restore for disaster recovery, the user must verify that the list of associated cartridges is completely replicated to the remote site. Otherwise, an earlier full backup image must be used for recovery (usually the previous night's). Determining the time of the last usable full backup is easiest when there is a specific time each day at which the replication backlog is zero (that is, there is no pending data to replicate). If this is not the case, then the user can assess the cartridges by recovering the backup application catalog and scanning it to find the last full backup whose associated cartridges all completed replication. The best practice for ensuring that a copy of the catalog is available at the remote site is to use the native replication function of ProtecTIER. Each day, the catalog should be backed up on a virtual cartridge following the daily backup workload so that it will be replicated to the remote site at the end of each replication cycle.
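The catalog scan described above amounts to finding the most recent full backup whose cartridges have all finished replicating. The following Python sketch illustrates that selection logic only; the FullBackup record, the barcodes, and the replicated set are hypothetical stand-ins for data that would come from the backup application catalog and from ProtecTIER Manager.

```python
from dataclasses import dataclass
from datetime import datetime

@dataclass(frozen=True)
class FullBackup:
    """Hypothetical catalog entry: when the full backup finished and which cartridges it used."""
    finished: datetime
    cartridges: frozenset

def last_restorable_full(backups, replicated):
    """Return the latest full backup whose cartridges are all replicated, or None."""
    complete = [b for b in backups if b.cartridges <= replicated]
    return max(complete, key=lambda b: b.finished, default=None)

# Hypothetical sample data: the most recent backup still has one cartridge pending.
backups = [
    FullBackup(datetime(2010, 8, 1, 23, 0), frozenset({"A00001", "A00002"})),
    FullBackup(datetime(2010, 8, 2, 23, 0), frozenset({"A00003", "A00004"})),
]
replicated = {"A00001", "A00002", "A00003"}

choice = last_restorable_full(backups, replicated)
print(choice.finished if choice else "no fully replicated full backup")
```

In this sample the previous night's backup is selected, matching the fallback described in the text when the latest image is not yet completely replicated.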
If the catalog is backed up to a virtual cartridge, use the cartridge view of the library in ProtecTIER Manager to query each of the cartridges used for catalog backup to find the most recent sync dates marked on the cartridges. Assuming that there are multiple backup copies, you must find the latest backup that finished replication. To recover the backup application catalog from a backup on a virtual cartridge, you must work with the replicated cartridges to get an updated copy of the catalog to the remote site, keeping the following in mind:
- Each cartridge has a last sync time that displays the last time that the cartridge's data was fully replicated to the remote site. (The sync time is updated during the replication, and not just when the replication for this cartridge is finished.)
- The cartridge marked with the most recent last sync time date should be used to recover the backup application catalog.
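Choosing the catalog cartridge is a simple "latest timestamp" selection. The sketch below assumes the last sync times have been read manually from the cartridge view in ProtecTIER Manager and typed into a dictionary; the barcodes, times, and the function name pick_catalog_cartridge are hypothetical.

```python
from datetime import datetime

def pick_catalog_cartridge(last_sync: dict) -> str:
    """Return the barcode with the most recent last sync time.

    last_sync maps barcode -> last sync time (hypothetical, manually collected values).
    Raises ValueError if the dictionary is empty.
    """
    return max(last_sync, key=last_sync.get)

# Hypothetical catalog backup cartridges and their last sync times.
catalog_cartridges = {
    "CAT001": datetime(2010, 8, 1, 2, 15),
    "CAT002": datetime(2010, 8, 2, 2, 40),
    "CAT003": datetime(2010, 7, 31, 2, 5),
}

print(pick_catalog_cartridge(catalog_cartridges))  # CAT002
```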
DR test operation in a multiple (two) domain backup environment
This use case describes the option to perform a DR test to simulate a scenario of disaster at the primary site and the recovery operation from the replicated cartridges at the remote/DR site. This scenario assumes that: Backups may still run at the primary/local site (as this is just a DR test). The remote site has different/separate backup servers from the local site. Some or all of the cartridges may be part of a visibility-switch-enabled policy.
Backups can be running at the local primary site while recovery at the remote site is occurring from replicated cartridges. Once the remote site exits DR mode, replication will resume between the two sites. See Table 10-2.
Table 10-2 NetBackup DR test simulation in two domain backup environment NBU Server/media 1 ProtecTIER 1 (local site). Lib A. NBU Server/media 2 ProtecTIER 2 (remote/DR site). Lib A'. Lib B physical library. Local site. Prerequisites Create repository. Create libraries. Install the ProtecTIER Replication Manager SW module (on the local or remote ProtecTIER node): Create Grid A. Add a repository to Grid A. Pair local and remote repositories. Create a replication policy to select the cartridges for replication and the specific remote library for the visibility switch. DR test use case Run regular backup activity to Lib A (to cartridges included in the replication policy). Enter DR mode using the designated ProtecTIER Manager wizard. Recover the backup application from the replicated catalog/database (either from the catalog backup cartridge or from other means). Rescan the backup application to learn the library dimensions. Backups can continue running to Lib A while the system is in DR mode. However, no data will be replicated to the remote until exiting DR mode. As a result, the new backup data to be replicated is accumulated as a replication backlog/queue. Run the command line to import cartridges that are located in the import/export slots. This imports all available cartridges in these slots into the designated library. Remember that all cartridges in the library are in read-only mode. Move required cartridges from the repository shelf to the required libraries (through ProtecTIER Manager). Create repository. Create libraries. Remote site.
NBU Server/media 2 Eject all cloned physical cartridges from Lib B (including the catalog backup) and save them in a safe location for recovery purposes. The cloned (virtual) cartridges cannot be left in the library, as the next visibility switch iteration will run over the backup application catalog/database. Therefore, the cartridges used for cloning will be considered as scratch. Once duplication/clone operation is completed, run vault policy to eject all Lib A' required cartridges using vault_policy_Remote. All ejected cartridges will move from Lib A' to the repository 2 (remote) shelf.
Cartridges ejected from Lib A' will move from the repository 1 (local) shelf to Lib A import/export slots. To move the cartridges into the library, the user must issue a command to import them: 'vltinject <vault_policy_local>' This script must run in a loop until all respective cartridges are imported into the library. Once cartridges are imported, they can be used for new backups.
Recovery at the remote site from duplicate/cloned (physical) cartridges. Every cloned box consists of two types of cartridges: the backup application catalog consistent with this complete set of cartridges, and the cloned cartridges. For every box of cartridges that requires recovery, perform the following steps:
1. Import all cartridges from the box into library B.
2. Recover the backup application from the catalog/database located on one of the cartridges.
3. Rescan the library dimensions.
4. Restore cartridges from Lib B to the backup server.
5. Once the restore operation completes, eject all required cartridges from Lib B to the box.
6. Continue with the next box.
Full/selective recovery at the remote site from replicated cartridges. Recover the backup application from the latest complete catalog/database located on one of the virtual cartridges in Lib A'.
NBU Server/media 2 Restore cartridges from Lib A' to the NBU backup server: If selective restore is required, scan the catalogue/DB for the cartridge containing the exact file. If full recovery is required, restore all required cartridges.
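The table above says that the import command ('vltinject <vault_policy_local>') must run in a loop until all respective cartridges are imported. A minimal sketch of such a loop follows. How to count the cartridges still waiting in the import/export slots is not described in the text, so the sketch takes that check as a caller-supplied function; `pending`, `import_until_done`, and the fake helpers are hypothetical names used only for illustration. The command is invoked exactly in the form quoted in the text, with no additional options assumed.

```python
import subprocess
import time
from typing import Callable


def run_vltinject(policy: str) -> None:
    """Run the import command in the form quoted in the text."""
    subprocess.run(["vltinject", policy], check=True)


def import_until_done(inject: Callable[[], None],
                      pending: Callable[[], int],
                      max_attempts: int = 20,
                      delay_s: float = 0.0) -> int:
    """Call inject() until pending() reports zero; return the number of calls."""
    attempts = 0
    while pending() > 0:
        if attempts >= max_attempts:
            raise RuntimeError("cartridges still pending after max_attempts")
        inject()
        attempts += 1
        time.sleep(delay_s)
    return attempts


# Dry run with stand-ins: five cartridges waiting, two imported per call (assumed).
state = {"waiting": 5}


def fake_inject() -> None:
    state["waiting"] = max(0, state["waiting"] - 2)


def fake_pending() -> int:
    return state["waiting"]


attempts = import_until_done(fake_inject, fake_pending)
print("import calls needed:", attempts)
```

In a real environment, `inject` would be something like `lambda: run_vltinject("vault_policy_local")`, and `pending` would have to be implemented with whatever inventory method the site uses. The attempt limit keeps the loop from running forever if a cartridge never leaves the slots.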
Figure 11-22 Port Attributes section, Nodes window in ProtecTIER Manager
The Link speed column shows the transmission speed of the port. Possible values are:
AUTO: a transmission speed that is auto-negotiated between the two ports depending on the combined highest possible link speed.
1 GB: a fixed transmission speed of 1 Gigabit per second.
2 GB: a fixed transmission speed of 2 Gigabits per second.
4 GB: a fixed transmission speed of 4 Gigabits per second.
DOWN: There is no Fibre Channel connection.
The Topology column displays the Fibre Channel topology of the port. Possible values are:
LOOP: Fibre Channel Arbitrated Loop connection.
P2P: Peer-to-peer connection.
DOWN: There is no Fibre Channel connection.
The User setup column is the user-assigned link speed and topology. Possible values are a combination of the Link Speed and Topology column values above, separated by a comma.
There is also the Scan button (marked with a pair of glasses icon) displayed in the far right column for each port. Clicking this icon opens the Scan Port dialog box. Scanning the port displays a numbered list of the WWNs of the remote ports detected by the port. This is useful during the initial setup or when diagnosing problems.
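For readers who record port settings in their own notes or scripts, the User setup value described above can be checked with a small parser. This is a sketch only: it assumes the value is written as link speed first and topology second, with the exact spellings listed above, which the text does not state explicitly. `parse_user_setup` is a hypothetical helper, not part of ProtecTIER Manager.

```python
from typing import Tuple

# Values as listed for the Link speed and Topology columns.
LINK_SPEEDS = {"AUTO", "1 GB", "2 GB", "4 GB", "DOWN"}
TOPOLOGIES = {"LOOP", "P2P", "DOWN"}


def parse_user_setup(value: str) -> Tuple[str, str]:
    """Split a 'speed,topology' string and validate both parts.

    Assumes the order is link speed, then topology (not confirmed by the text).
    """
    parts = [p.strip() for p in value.split(",")]
    if len(parts) != 2:
        raise ValueError(f"expected 'speed,topology', got {value!r}")
    speed, topology = parts
    if speed not in LINK_SPEEDS:
        raise ValueError(f"unknown link speed: {speed!r}")
    if topology not in TOPOLOGIES:
        raise ValueError(f"unknown topology: {topology!r}")
    return speed, topology


print(parse_user_setup("AUTO, P2P"))
```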
Note: Scanning the port causes a disruption to any active traffic using the link. A dialog box is displayed that asks you to confirm that you want to continue with the port scan (Figure 11-23.)
Figure 11-23 Scan port confirmation window in ProtecTIER Manager
Once the scan port operation has completed, the window shown in Figure 11-24 appears.
Figure 11-24 Scan Port result window in ProtecTIER Manager
The Version Information section
The Version Information pane (Figure 11-25) displays information about the version of the ProtecTIER, the model (in this case TS7650G), the Linux RPM version, and the DTC Emulex RPM installed and running on the node. Only the node that was defined first in a two-node cluster will display the TS7650G model information. The second node in the two-node cluster displays NA.
Figure 11-25 Version Information section, Nodes window in ProtecTIER Manager
The Fibre Channel Ports Throughput section
The Fibre Channel (FC) Ports Throughput pane displays the rate of data movement and I/O operations for both read and write operations for the node (Figure 11-26). The data movement rate is also displayed graphically for each front-end Fibre Channel port on the node. The bars will be colored blue (for read) or orange (for write). There is enough space for four bars to be displayed at once in the bar graph, one bar for each FC port. You can change the scale of the graph by editing the value in the Scale Graph To field to see the throughput rates in more or less detail.
LAB VALIDATION REPORT
IBM TS7650G ProtecTIER
Enterprise-class Data De-duplication
By Brian Garrett
With Claude Bouffard
November, 2008
Copyright 2008, The Enterprise Strategy Group, Inc. All Rights Reserved.
ESG LAB VALIDATION
IBM TS7650G: Enterprise-class Data Deduplication
Table of Contents
Introduction
Background
Introducing the IBM TS7650G
ESG Lab Validation
Getting Started
Performance
Capacity Savings
Fault Tolerance
ESG Lab Validation Highlights
Issues to Consider
ESG Lab's View
Appendix
ESG Lab Reports
The goal of ESG Lab reports is to educate IT professionals about emerging technologies and products in the storage, data management and information security industries. ESG Lab reports are not meant to replace the evaluation process that should be conducted before making purchasing decisions, but rather to provide insight into these emerging technologies. Our objective is to go over some of the more valuable features and functions of products, show how they can be used to solve real customer problems, and identify any areas needing improvement. ESG Lab's expert third-party perspective is based on our own hands-on testing as well as on interviews with customers who use these products in production environments. This ESG Lab report was sponsored by IBM.
All trademark names are property of their respective companies. Information contained in this publication has been obtained by sources The Enterprise Strategy Group (ESG) considers to be reliable but is not warranted by ESG. This publication may contain opinions of ESG, which are subject to change from time to time. This publication is copyrighted by The Enterprise Strategy Group, Inc. Any reproduction or redistribution of this publication, in whole or in part, whether in hard-copy format, electronically, or otherwise to persons not authorized to receive it, without the express consent of the Enterprise Strategy Group, Inc., is in violation of U.S. Copyright law and will be subject to an action for civil damages and, if applicable, criminal prosecution. Should you have any questions, please contact ESG Client Relations at (508) 482.0188.
Introduction
The IBM System Storage TS7650G ProtecTIER De-duplication Gateway, based on innovative Diligent ProtecTIER technology acquired by IBM in 2008, was designed to deliver the scalability, performance, and ease of use required to meet the data protection needs of midrange and enterprise-class data centers. This ESG Lab Report examines the enhanced ease of use, performance, and fault tolerance of the latest version of ProtecTIER inline de-duplication software running within a clustered pair of IBM TS7650G appliances.
Background
A recent ESG survey of IT decision makers within enterprise-class organizations with more than 1,000 employees indicates that the need to reduce backup times and the cost of storage systems top the list of data protection challenges.1 At the root of these challenges is an inability to keep pace with the growing capacity of information that needs to be protected. As a matter of fact, enterprise-class organizations are more likely to report capacity growth as a challenge (61% as shown in Figure 1 vs. 52% for all respondents). These organizations are also more likely to be faced with stringent service level agreements, compliance initiatives, and legal discovery requests, all of which are driving the need to reduce recovery times and improve the reliability of backup and recovery processes.
FIGURE 1. ENTERPRISE-CLASS DATA PROTECTION CHALLENGES
Source: ESG Research Report, Data Protection Trends, January 2008
Enterprise-class data protection challenges are driving IT managers to adopt a number of technologies, including backup to disk, virtual tape library (VTL), and data de-duplication. As a matter of fact, 73% of enterprises surveyed by ESG indicate that they are using a disk-to-disk-to-tape backup process, 28% have deployed a VTL solution, and 22% have deployed data de-duplication with another 38% planning to do so. Clearly, these technologies are changing the way that organizations deal with growing backup and recovery challenges.
Introducing the IBM TS7650G
The IBM TS7650G is a gateway-based VTL solution used for disk-to-disk backup as shown in Figure 2. Backup software running on a media server connects with one or two TS7650G gateways over a Fibre Channel (FC) storage area network (SAN). ProtecTIER software running on the TS7650G servers emulates one or more tape libraries. From the media server's perspective, disk capacity managed by the TS7650G behaves like a tape library, so there is no need to change existing backup software, policies, or procedures. Acting as a gateway, the IBM TS7650G manages access to a virtualized pool of disk capacity within one or more FC attached disk arrays.
FIGURE 2. AN ENTERPRISE-CLASS DATA PROTECTION ARCHITECTURE
ProtecTIER software running on the TS7650G gateway provides:
- Virtual tape library emulation for capacity within one or more SAN attached disk arrays
- Easy deployment and integration with existing backup software, infrastructure, and processes
- Enterprise-class capacity and performance scalability
- Enterprise-class data integrity
- Inline data de-duplication, which can be used to reduce retained capacity by a factor of 25:1 or more
Inline data de-duplication is a powerful technology capable of drastically reducing the capacity required to store backup data on disk. The concept of data de-duplication is simple: when multiple copies of the same data are sent to a system, the system finds the redundancy and stores only one copy of the data as it maintains an index to keep track of all data within the system. The motivation for data de-duplication is also simple: storing fewer copies of the same data can significantly reduce the capacity required to keep backup images on disk for quick and reliable restores.
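The store-one-copy-and-keep-an-index idea can be illustrated with a toy example. The sketch below is not HyperFactor and makes no claim about how ProtecTIER works internally. It uses a generic, well-known approach (fixed-size blocks indexed by a SHA-256 digest), and the class name, block size, and sample data are assumptions chosen purely for illustration.

```python
import hashlib
from typing import Dict, List


class DedupStore:
    """Toy de-duplicating store: each unique block is kept once, keyed by digest."""

    def __init__(self, block_size: int = 4096) -> None:
        self.block_size = block_size
        self.blocks: Dict[str, bytes] = {}  # the index: digest -> stored block
        self.nominal_bytes = 0              # total bytes ever written by clients

    def write(self, data: bytes) -> List[str]:
        """Store data and return its recipe (the ordered list of block digests)."""
        recipe = []
        for offset in range(0, len(data), self.block_size):
            block = data[offset:offset + self.block_size]
            digest = hashlib.sha256(block).hexdigest()
            self.blocks.setdefault(digest, block)  # keep only the first copy
            recipe.append(digest)
        self.nominal_bytes += len(data)
        return recipe

    def read(self, recipe: List[str]) -> bytes:
        """Rebuild the original data from its recipe."""
        return b"".join(self.blocks[d] for d in recipe)

    @property
    def stored_bytes(self) -> int:
        return sum(len(b) for b in self.blocks.values())


# Two "backups" that differ in a single 4-byte block.
store = DedupStore(block_size=4)
first = b"AAAABBBBCCCCDDDD"
second = b"AAAABBBBXXXXDDDD"
recipe1 = store.write(first)
recipe2 = store.write(second)

print("written:", store.nominal_bytes, "bytes; stored:", store.stored_bytes, "bytes")
```

Both backups remain fully readable from their recipes even though only the changed block of the second backup consumed new capacity.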
The concept of inline de-duplication is also simple. It eliminates redundant data as it is being backed up. Compared to post-process de-duplication, which removes redundant data after a backup has landed on disk, inline de-duplication eliminates the capacity required to store backup data in its native format. Inline de-duplication also eliminates the overall performance impact that post-process de-duplication can have on processes that typically run after backup jobs have completed (e.g., a clone to physical tape for offsite vaulting or the attempted recovery of corrupt data). To better understand data de-duplication technology, consider the example of a PowerPoint presentation attached to an e-mail. If the e-mail is sent to multiple recipients and then forwarded to yet another set of recipients, data de-duplication technology can be used to store the presentation only once. This is an example of data de-duplication technology working at the file level. Next, consider what happens when one of the e-mail recipients modifies a slide in the presentation and forwards it to a group of colleagues. Block-level data de-duplication algorithms, like those utilized by ProtecTIER, can be used to store only the new, unique data associated with the changed slide. The inline de-duplication algorithm at the core of the ProtecTIER platform is called HyperFactor. Using an efficient in-memory index to keep track of where duplicate data is stored, HyperFactor was designed to provide high-speed de-duplication services for organizations with enterprise-class performance, scalability, and availability requirements.2 The latest version of ProtecTIER software includes support for a two-node cluster of inline de-duplication gateways. As shown in Figure 3, a pair of TS7650G gateways running ProtecTIER software is used to create a single repository of de-duplicated backup data.
The two gateways share a global file system stored on disk to multiply the performance and capacity savings that can be achieved with a single node. With state information stored safely on disk, this approach can also be used to ensure that backup and recovery operations are available after a hardware failure.
FIGURE 3. PROTECTIER HYPERFACTOR DATA DE-DUPLICATION
The balance of this report presents the results of ESG Lab testing, which was designed to confirm IBM's claims of up to 900 MB/sec or more of aggregate backup performance, retained capacity savings of 90% or more, and enterprise-class fault tolerance of ProtecTIER version 2.1 software running on a clustered pair of IBM TS7650G gateways.
The TS7650G is ideally suited for organizations that need to back up more than 10 TB of data nightly.
ESG Lab Validation
ESG Lab performed hands-on testing of an enterprise-class TS7650G inline de-duplication solution at an IBM facility located in Tel Aviv, Israel. A pair of TS7650G gateways were deployed as a fault tolerant active-active cluster to test the speed, reliability, and recoverability of backup and recovery operations while avoiding the inconveniences of changing existing backup software, infrastructure, policies, and processes.
Getting Started
Three quad-core servers running Red Hat Linux and Veritas NetBackup software were used as media servers for backup and recovery operations.3 The media servers were connected to the TS7650G appliances through an FC SAN. A total of eight 4 Gbps FC connections provided up to 32 Gbps of theoretical bandwidth for backup and restore operations. The TS7650G gateways were connected to an IBM DS8300 disk array using eight 4 Gbps FC connections. An FC SAN was implemented between the media servers and the TS7650G appliances using a QLogic SANbox 5200 FC switch. The storage array was directly connected to the TS7650G gateways.4
FIGURE 4. ESG LAB TEST BED
A two-bay IBM DS8300 disk array housed GB 15K RPM FC drives. The DS8300 was configured with 50 TB of usable backup capacity accessed through 36 RAID-5 LUNs.3 The ProtecTIER management console was used to configure and monitor the solution through an Ethernet connected web browser. ProtecTIER Manager was used to browse the test bed, beginning with the virtual tape library configuration. As shown in Figure 5, the system was configured as an ATL P3000 tape library with 64 tape drives and 964 tape cartridges.
See the Appendix for more configuration details. While IBM supports SAN connections as well, it should be noted that the direct connect methodology used during ESG Lab testing reduces the number of FC switch connections, which lowers the cost of the total solution.
FIGURE 5. VIEWING VIRTUAL TAPE LIBRARY CONFIGURATION WITH PROTECTIER MANAGER
ESG Lab noticed that the GUI has been enhanced since it was tested by ESG for the first time in 2006. Most noticeable was the addition of tabs to manage from either the system or node level due to the introduction of clustered support in ProtecTIER version 2.1. The user interface has also improved in terms of usability. The clear and valuable depiction of used versus nominal disk space, which shows the capacity utilized and savings provided by ProtecTIER, is more prominently displayed. The excellent performance and trending graphs, absent in a number of early data de-duplication solutions tested by ESG Lab, have been enhanced as well. It's obvious that IBM's excellent GUI design principles are working their way into the ProtecTIER management console. ESG was also pleased to see that ProtecTIER documentation has been expanded and enhanced in accordance with IBM standards. Pre-configured Veritas NetBackup resource definitions and backup policies were reviewed. From a backup administrator's perspective, managing the 64 tape drives presented by the VTL software running on the TS7650G gateways felt exactly the same as managing a physical ATL P3000 tape library.
Why This Matters
Unrelenting capacity growth and shrinking backup windows are driving a growing number of enterprises to deploy backup to disk processes. As a matter of fact, a recent ESG survey indicates that 70% of enterprise-class organizations have already deployed a backup to disk solution due to the fast backup and recovery performance of disk compared to tape. FC-attached VTL technology simplifies the deployment and management of a disk-based backup solution, especially for large organizations that have standardized on FC SAN technology. ESG Lab recently spoke with a customer who summed it up well when he said, "Deploying a pair of IBM TS7650G gateways was simple. My backup administrators didn't have to learn anything new."
Performance
Midrange and enterprise-class data centers backing up ten terabytes of data or more nightly are faced with a number of conflicting challenges. The system needs to be fast to avoid a missed backup window. The system needs to store and retain tens, or hundreds, of terabytes of backup data for quick and reliable restores. And the cost of the system, including the cost of the relatively expensive disk drives, has to fit the budget. While inline de-duplication can drastically reduce the amount of disk capacity (and hence the cost), it is a complex operation that must be done in real time: it has to be fast, and it needs to support a large pool of capacity so that it can find and eliminate duplicate data throughout the enterprise. The HyperFactor technology at the heart of the IBM TS7650G solution was designed to meet the performance and capacity needs of enterprise-class data center environments. ProtecTIER version 2.1 software extends the performance and capacity scalability of HyperFactor technology using a clustered pair of TS7650G gateways, which implement de-duplication over a shared pool of capacity. Supporting up to a petabyte (1 PB) of physical capacity and twenty-five petabytes (25 PB) of retained backup capacity, IBM claims that a pair of TS7650G gateways can deliver up to 900 MB/sec or more of sustained aggregate backup performance.
ESG Lab Testing
The performance of a single and dual-node TS7650G solution was tested using the configuration depicted in Figure 4 and documented in the Appendix. Backup and restore performance was monitored using the ProtecTIER Manager GUI as shown in Figure 6. The aggregate performance of 64 backup jobs running in parallel was used to confirm that ProtecTIER was reporting the same performance as NetBackup.5
FIGURE 6. MONITORING AGGREGATE BACKUP PERFORMANCE WITH PROTECTIER
The performance screenshot shown in Figure 6 was taken as 60 backups ran a 3rd consecutive full backup with a data change rate of 3% between each backup. Results for the full series of backup and restore jobs will be presented in detail later in this section. This particular screenshot is presented to document performance during the test with the fastest aggregate performance witnessed by ESG Lab. It also illustrates how performance is spread relatively evenly across each of the TS7650G gateways (NY and LA) and each of the front-end FC interfaces within each gateway (FE ports). ESG Lab performance testing began with an analysis of how aggregate backup performance scales as the number of backup jobs running in parallel is increased. The performance scalability of a clustered pair of TS7650G gateways for the first set of full backups is depicted in Figure 7.
The sum of the throughput reported by each Veritas NetBackup job was compared to the performance shown in the ProtecTIER GUI.
FIGURE 7. ADDING STREAMS TO INCREASE AGGREGATE BACKUP PERFORMANCE
What the Numbers Mean
The first full backup job, running alone (single stream), sustained 125 MB/sec of performance as inline de-duplication processed backup data in real time and reduced the amount of data stored on disk. ESG Lab's experience testing inline de-duplication solutions indicates that 125 MB/sec for a single backup stream is an excellent level of performance. A sustained performance level of 125 MB/sec can be used to protect up to 3.6 TB of data in an eight-hour shift. Performance scales in a near-linear fashion as the number of backup-job streams is increased. A maximum performance of 1,040 MB/sec (1 GB/sec) for 24 full backup streams running in parallel exceeds IBM's conservative claims of up to 900 MB/sec of aggregate backup performance for a pair of clustered TS7650G gateways.
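The conversion from a sustained throughput to the amount of data that fits in a backup window is simple arithmetic, sketched below. The function name is hypothetical, and decimal units (1 TB = 1,000,000 MB) are assumed, which is consistent with the 3.6 TB figure quoted above.

```python
def protected_tb(rate_mb_per_sec: float, hours: float) -> float:
    """Data volume (TB, decimal units) moved at a sustained rate over a window."""
    seconds = hours * 3600
    return rate_mb_per_sec * seconds / 1_000_000


single_stream_tb = protected_tb(125, 8)     # one stream, eight-hour shift
peak_24_stream_tb = protected_tb(1040, 8)   # 24 streams at 1,040 MB/sec

print(f"single stream: {single_stream_tb:.1f} TB in 8 hours")
print(f"24 streams:    {peak_24_stream_tb:.1f} TB in 8 hours")
```

At 125 MB/sec the result is 3.6 TB, matching the figure in the text. The same calculation at 1,040 MB/sec gives roughly 30 TB for an eight-hour window.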
A series of daily full backup jobs with a daily change rate of 3% between each backup job was run. The aggregate performance for 64 backup jobs (streams) running in parallel was recorded. A Veritas NetBackup verify job was run after the third full backup to measure maximum theoretical restore performance.6 Single and clustered dual-node TS7650G solutions were used to measure performance scalability when moving from a single node to a cluster of two TS7650G gateways. The results are shown in Figure 8.
FIGURE 8. ONE VERSUS TWO NODE AGGREGATE PERFORMANCE
TABLE 1. PROTECTIER AGGREGATE PERFORMANCE ANALYSIS
Operation 1st full backup 2nd full backup 3rd full backup full restore
One Node (MB/sec)
Two Nodes (MB/sec) 1,043
870 1,420
1,260 1,390 1,965
Normally, a verify job is used to verify the integrity of previously backed up data. In this case, the verify operation was used to measure the maximum restore performance of TS7650G appliances without incurring the potential performance overhead of the file system and disk array where the data is being restored.
What the Numbers Mean
The aggregate backup performance of a two-node cluster is significantly greater than that of a single node, even as the two nodes work together to de-duplicate a common pool of backup data. Compared to a single node, two nodes working together increase backup performance for the first full backup by 67%. Aggregate performance increases for the second and third backup jobs. This is due in part to the fact that less data is being written to disk (only the 3% of data that has changed since the last backup). The speed and efficiency of HyperFactor inline de-duplication is also a factor. A maximum aggregate performance of 1,371 MB/sec (1.37 GB/sec) was recorded for the 3rd set of full backups. This is well in excess of IBM's conservative claims of 900 MB/sec. An aggregate backup rate of 1,371 MB/sec (4.9 TB/hour) can be used to protect 39.5 TB of data in an eight-hour shift. A peak aggregate restore performance of nearly 2 GB/sec is impressive, considering the fact that HyperFactor is running the de-duplication algorithm in reverse. This clearly demonstrates the power and efficiency of the in-memory indexing algorithm at the heart of HyperFactor technology.
ESG spoke recently with an IBM customer that has deployed a single TS7650G gateway connected to an IBM XIV Storage System full of SATA disk drives. He currently has 280 TB of IBM TSM backup data stored on IBM XIV disk. An upgrade to a two-node TS7650G solution is scheduled for the next maintenance window. The single-node system has improved performance by 30% compared to the enterprise-class tape library used previously. While he's impressed with the backup performance and expects that it will improve when he upgrades to a clustered pair of TS7650G gateways, he's most impressed with his new-found ability to respond more quickly to ad hoc restore requests.
ESG research indicates that the number one data protection challenge reported by IT professionals within enterprise organizations is the need to reduce backup times.7 Quicker recoveries are also needed to meet service level agreements and increase end-user productivity. ESG Lab has confirmed via hands-on testing that IBM can easily exceed its aggregate backup throughput claims of 900 MB/sec for a two-node TS7650G solution. A peak aggregate performance of 1.37 GB/sec for backups and 1.97 GB/sec for restores was observed. Based on experience testing a number of backup to disk solutions, ESG Lab is extremely impressed with the performance of a clustered IBM TS7650G solution, especially given the fact that it performs inline de-duplication to reduce the amount of disk capacity required to retain backup data on disk.
Source: ESG Research Report, Data Protection Trends, 2008
Capacity Savings
Data de-duplication is a resource-intensive process of examining data to identify and eliminate redundancy. It can have a significant impact on the capacity of data stored, which, in turn, can deliver significant economic benefits. IBM TS7650G solutions use an inline de-duplication method, which eliminates duplicate data as it is being backed up. The solutions use an efficient algorithm that uses an in-memory index to keep track of duplicate data. This efficient approach, referred to as HyperFactor, is designed to meet the extreme scalability and performance requirements of enterprise-class data center environments. The level of disk capacity savings that can be achieved with data de-duplication varies according to the backup policy in use, the number of backup images retained on disk (a.k.a. retention policy), the rate at which data is changing, and the amount of duplicate data found within an organization. The ratio of capacity backed up versus capacity stored on disk after de-duplication is generally known in the industry as the data de-duplication factor. An ESG survey of early adopters of de-duplication solutions indicates that factors between 10:1 and 20:1 are commonly achieved.8 A de-duplication ratio of 10:1 reduces disk capacity requirements by 90%.
ESG Lab Testing
ESG Lab recorded the capacity backed up and stored on disk during the performance tests presented earlier in this report. An 875 GB collection of file data designed to mimic the contents of typical office productivity files (e.g., documents, spreadsheets, presentations) was used during this round of ESG Lab testing. The data de-duplication results presented below are similar to those recorded during a round of ESG Lab testing performed in 2006 using Oracle, Exchange, office productivity, and audio files harvested from production backup data sets. Daily changes were emulated using an IBM utility that randomly changed 3% of the data after each set of backup jobs had completed.
The amount of disk capacity backed up by Veritas NetBackup and consumed on disk after de-duplication was recorded using the ProtecTIER GUI after each backup job. A sample screen shot is shown in Figure 9. In this example, 3,500.7 GB of backed up data consumes only 634.4 GB of disk capacity.
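The relationship between the two numbers in this example, the de-duplication ratio, and the percentage of capacity saved is a short calculation. The sketch below uses hypothetical function names and only the figures quoted in the text.

```python
def dedup_ratio(backed_up_gb: float, consumed_gb: float) -> float:
    """De-duplication factor: capacity backed up divided by capacity consumed."""
    return backed_up_gb / consumed_gb


def savings_percent(ratio: float) -> float:
    """Share of disk capacity avoided at a given de-duplication ratio."""
    return (1 - 1 / ratio) * 100


figure9_ratio = dedup_ratio(3500.7, 634.4)

print(f"ratio: {figure9_ratio:.1f}:1")
print(f"savings: {savings_percent(figure9_ratio):.1f}%")
print(f"savings at 10:1: {savings_percent(10):.0f}%")
```

The Figure 9 example works out to about 5.5:1, or roughly 82% less disk capacity than the data backed up. The 10:1 case reproduces the 90% reduction stated in the text.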
FIGURE 9. PROTECTIER CAPACITY UTILIZATION CONSOLE
Values recorded for the first four backups were used to project the savings for the 8th through 32nd backup jobs. The results are shown in Figure 10.
28% report up to 10x reduction, 40% between 10x and 20x, 16% greater than 21x, and 17% don't know. Source: ESG Research Report, Data Protection Trends, 2008
FIGURE 10. DE-DUPLICATION CAPACITY SAVINGS
TABLE 2. PROTECTIER DATA DE-DUPLICATION CAPACITY SAVINGS
Backup Iteration   Backed Up (GB)   Consumed (GB)   De-duplication Ratio
1st                875              -               1.6
2nd                1,750            -               3.0
3rd                2,626            -               4.3
4th                3,501            -               5.5
8th                7,001            -               8.5
28th               24,500           1,256           19.5
32nd               28,000           1,361           20.6
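The report does not describe how the later backup iterations were projected from the first four. One plausible reading, offered here only as an assumption, is a linear model in which every additional full backup adds 875 GB of nominal data but only the 3% of changed data to the consumed capacity. The sketch below applies that model, taking the Figure 9 reading (634.4 GB consumed) as the state after the fourth backup, which is also an assumption. All function names are hypothetical.

```python
DATASET_GB = 875.0        # size of the test data set, from the text
CHANGE_RATE = 0.03        # daily change rate, from the text
BASE_ITERATION = 4        # assumed: Figure 9 shows the state after backup 4
BASE_CONSUMED_GB = 634.4  # consumed capacity quoted for Figure 9


def projected_consumed(iteration: int) -> float:
    """Consumed capacity (GB) if each later backup stores only changed data."""
    extra_backups = iteration - BASE_ITERATION
    return BASE_CONSUMED_GB + extra_backups * DATASET_GB * CHANGE_RATE


def projected_ratio(iteration: int) -> float:
    """Nominal capacity after `iteration` full backups divided by consumed."""
    return iteration * DATASET_GB / projected_consumed(iteration)


for n in (8, 28, 32):
    print(f"backup {n}: {projected_consumed(n):.1f} GB consumed, "
          f"ratio {projected_ratio(n):.1f}:1")
```

Under these assumptions the model gives about 1,369 GB consumed and a ratio of about 20.4:1 for the 32nd backup. That is close to, but not identical with, the 1,361 GB and 20.6 reported in Table 2, so ESG's actual projection method presumably differs in detail.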
ESG research indicates that cost is the leading reason why non-adopters have not yet embraced a disk-based backup strategy.9 Data de-duplication addresses the cost issue by reducing the disk capacity required to maintain multiple generations of backup data on disk for quick and reliable restores. ESG Lab has confirmed that HyperFactor inline de-duplication can be used to reduce retained backup to disk capacity requirements by a factor of up to 20 to 1.
Fault Tolerance
Two ProtecTIER gateways can be configured as a cluster. The ProtecTIER clustered approach not only maximizes performance and de-duplication savings as shown earlier in this report, it also minimizes downtime in the unlikely event of a ProtecTIER TS7650G hardware failure. ProtecTIER uses an active-active cluster approach that ESG refers to as a true cluster. As shown in Figure 11, a global file system, which is maintained on disk and shared by each node in the cluster, is used to create a single pool of highly available backup to disk capacity. With system configuration, state, and status information stored safely on disk, two nodes can work together while sharing a single repository. Any node can service backup or restore requests for any virtual cartridge. Clustering and the global file system ensure that backup and restore services are highly available.
FIGURE 11. HIGH AVAILABILITY/DR TEST CASE
If one of the nodes in the cluster fails, the system continues to be available for new backup and restore requests. Backup jobs running on virtual tape drives being serviced by a failing node eventually time out and fail in ProtecTIER version 2.1. Jobs running on the surviving node continue without error. Failed jobs are restarted and serviced by the surviving node. Virtual cartridges that appear to be stuck in a failed tape drive from a backup software perspective can be reassigned and found with a routine backup software inventory job if needed. A higher level of non-stop availability using host-based failover drivers is planned for a future release of ProtecTIER software.
ESG Lab Testing
The configuration and data sets used during the performance and de-duplication tests presented earlier were used during this phase of testing. To simulate an outage, errors were injected during the fourth generation of backup jobs running on three media servers. Ten Veritas NetBackup jobs running in parallel were used to assess the performance and availability of a ProtecTIER cluster after a node restart, a node shutdown, and a node power failure.
Auto Restart
A manual restart of a ProtecTIER node was performed as backup jobs were balanced across both nodes in the cluster. A sustained aggregate backup rate of 1.3 GB/sec was noted ten minutes after the backup jobs had started. As one of the nodes was restarting, aggregate performance dropped sharply for a minute and then increased to 600 MB/sec. As expected, the aggregate throughput during the failure was roughly half that of a healthy two-node system. Most of the jobs continued to completion after the restart. One of the five NetBackup jobs that had timed out was restarted and completed without error. ProtecTIER Manager and the Veritas NetBackup GUI were used to confirm that the job had been restarted and was being serviced by the node that had been restarted.
- 12 Copyright 2008, The Enterprise Strategy Group, Inc. All Rights Reserved.
Manual Shutdown
A manual shutdown of a ProtecTIER node was performed to assess the ability of the system to ride through a planned service event (e.g., a code or hardware upgrade). This test was also used to confirm that jobs running on a node that had been shut down are automatically migrated to a surviving node when the backup job is restarted. The screenshot shown in Figure 12 shows the state of the system after the shutdown. Note that the offline node is depicted in red. The surviving node, depicted in green, is online and acting as a proxy for the management GUI. It should also be noted that the surviving appliance is delivering excellent aggregate performance of 867 MB/sec.
FIGURE 12. PROTECTIER SYSTEM STATUS AFTER A NODE SHUTDOWN
As expected, the jobs running on the surviving node completed without error. Jobs that were being serviced by the shut-down node hung and eventually timed out. Veritas NetBackup reported that it could not write an image to media due to an input/output error. While the node was shut down, one of the failed jobs was restarted and completed without error. ProtecTIER Manager was used to confirm that the virtual tape library and cartridge used by the restarted job had been automatically migrated to the surviving node.
Power Failure
A power failure was tested to verify that backup jobs being serviced by a failed node can be reassigned, discovered, and restarted. The node power failure was induced as ten backup jobs were running on one of the NetBackup media servers. Before the power failure, it was noted that jobs were balanced across both nodes in the controller. Jobs running on the surviving node completed without error. ProtecTIER Manager was used to unload and move one of the virtual tape cartridges being used by the failing node at the time of the power failure. The virtual tape drive's cartridge was reassigned to the surviving node. A NetBackup inventory job discovered the cartridge and a NetBackup verify of the virtual tape cartridge completed without error.
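The failover behavior observed in these tests can be illustrated with a small toy model. This is only a sketch under stated assumptions: the `Job` and `Cluster` classes, the node names, and the balancing rule are hypothetical and are not part of ProtecTIER or NetBackup. It shows the pattern reported above: jobs on the failed node fail, jobs on the surviving node keep running, and a restarted job is serviced by the surviving node.

```python
from dataclasses import dataclass, field


@dataclass
class Job:
    name: str
    node: str
    state: str = "running"  # one of: running, failed


@dataclass
class Cluster:
    # Hypothetical two-node cluster; True means the node is online.
    nodes: dict = field(default_factory=lambda: {"node1": True, "node2": True})
    jobs: list = field(default_factory=list)

    def _least_loaded_online_node(self):
        online = [n for n, up in self.nodes.items() if up]
        if not online:
            raise RuntimeError("no node is online")
        # Pick the online node that currently services the fewest running jobs.
        return min(
            online,
            key=lambda n: sum(j.node == n and j.state == "running" for j in self.jobs),
        )

    def submit(self, name):
        job = Job(name, self._least_loaded_online_node())
        self.jobs.append(job)
        return job

    def fail_node(self, node):
        # Jobs serviced by the failed node time out and fail;
        # jobs on the surviving node are left untouched.
        self.nodes[node] = False
        for job in self.jobs:
            if job.node == node and job.state == "running":
                job.state = "failed"

    def restart(self, job):
        # A restarted job is picked up by a node that is still online.
        if job.state != "failed":
            raise ValueError("only failed jobs can be restarted")
        job.node = self._least_loaded_online_node()
        job.state = "running"
        return job


cluster = Cluster()
jobs = [cluster.submit(f"backup-{i}") for i in range(10)]
cluster.fail_node("node2")
failed = [j for j in jobs if j.state == "failed"]
restarted = cluster.restart(failed[0])
print(restarted.name, "now serviced by", restarted.node)
```

The model deliberately leaves out cartridge reassignment and inventory discovery, which the tests above performed with ProtecTIER Manager and NetBackup.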
Midrange and enterprise-class organizations with mission critical applications that can tolerate little or no downtime cannot afford the risk of their data protection infrastructure being unavailable. ESG Lab has confirmed that ProtecTIER clustering technology can be used to ensure that backup and recovery services remain available in the unlikely event of an IBM TS7650G hardware failure.
ESG Lab Validation Highlights
From a backup administrator's perspective, managing backup jobs running on a virtual tape library presented by ProtecTIER felt exactly the same as managing a physical tape library.
ESG Lab noted that the ProtecTIER GUI and documentation are benefitting from the use of IBM standards and guidelines.
Excellent single stream backup performance of 125 MB/sec was recorded.
A dual-node cluster reached a peak aggregate backup performance of up to 1,390 MB/sec.
A peak aggregate restore rate of 1,965 MB/sec was measured.
Inline data de-duplication capacity savings of 20 to 1 were confirmed using a daily full backup policy, a three percent daily change rate, and a retention period of 30 days.
Availability of backup and recovery services after an appliance reboot, shutdown and power failure was confirmed.
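To make the de-duplication policy parameters above more concrete, the following is a rough, idealized model of how a capacity-savings ratio follows from a daily full backup policy, a daily change rate, and a retention period. It is a sketch, not ESG Lab's measurement method: the function `dedup_ratio` and its `compression` parameter are hypothetical, and the model assumes that each day's full backup stores only the changed fraction of the data after the first full.

```python
def dedup_ratio(retention_days, daily_change_rate, compression=1.0):
    """Idealized logical-to-physical capacity ratio for daily full backups.

    Sizes are expressed in units of one full backup.
    compression is a hypothetical extra reduction factor (1.0 means none).
    """
    # Logical capacity: one full backup retained per day.
    logical = float(retention_days)
    # Physical capacity: the first full, plus only the changed data on each later day.
    physical = (1.0 + (retention_days - 1) * daily_change_rate) / compression
    return logical / physical


ratio = dedup_ratio(retention_days=30, daily_change_rate=0.03)
print(f"{ratio:.1f} to 1")
```

With no additional compression assumed, this simple model yields roughly 16 to 1 for a 30-day retention and a three percent change rate. The 20 to 1 figure above is a measured result, and the sketch is not intended to reproduce it exactly.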
Issues to Consider
ESG Lab believes that extremely large enterprise-class data center environments would benefit from the support of more than two nodes per cluster to further improve performance and data consolidation. IBM has advised ESG that multiple node cluster support is planned for a future release of the TS7600 line of offerings.
While optional two-node clustered support increases the fault tolerance of a TS7650G solution, multi-path failover drivers and integration with industry leading software would provide even greater levels of nonstop reliability. IBM has advised ESG Lab this is planned for a future release.
The results presented in this document were obtained in a controlled test environment which was designed to stress the performance limits of TS7650G solutions. Performance in production environments will vary due to a number of factors including the performance of file systems, primary storage systems and networks.
ESG Lab's View
IT managers have been struggling for decades to keep up with the relentless growth in the volume of data that needs to be protected. The challenges are particularly acute for IT managers responsible for the protection of vital information assets within medium to enterprise-class data centers. Traditional backup methods are straining under the load. Backup windows are shrinking. The number of recovery requests is increasing. Budgets are being stressed. The frequency of legal and regulatory discovery requests is rising. Risk levels and blood pressures are rising as a growing number of IT managers wonder if data can be recovered quickly.
A growing number of organizations are turning to disk-based backup methods to improve the speed and reliability of backup and recovery operations. As a matter of fact, ESG research indicates that 73 percent of enterprise-class organizations have already deployed some form of disk-based backup solution.[10] While disk can be used to accelerate performance, the costs are high compared to legacy tape solutions. Data de-duplication addresses the cost issue by reducing disk capacity requirements. As a result, a growing number of organizations (22 percent according to a recent ESG survey) have deployed a disk-based data de-duplication system for improved performance when storing and accessing active backup data. The strategic blend of disk and tape (for infrequently accessed data and/or long term archive) is being used by a growing number of organizations to meet performance objectives, minimize costs, reduce energy consumption and protect critical data at all levels.
ESG Lab was impressed with the disk-based ProtecTIER inline de-duplication architecture when it was first tested in 2006. Initial deployment using existing backup software and processes was easy. Enterprise-class levels of performance were measured (266 MB/sec for a single server running ProtecTIER inline de-duplication software).
Capacity was reduced by a factor of 28 to 1 for a month's worth of retained full backups. Since then, ProtecTIER solutions have been adopted by more than 200 worldwide customers, many with multiple deployments. More than 30 PB of managed capacity has been deployed in some of the largest enterprise-class organizations in the world, including three of the top ten global telecommunications firms.
More recently, ProtecTIER technology was acquired by IBM, and version 2.1 software with optional two-node cluster support running on a purpose-built IBM Gateway server was released. IBM TS7650 gateways and two-node cluster support dramatically increased the capacity and performance scalability of a ProtecTIER solution. ESG Lab hands-on testing in 2008 confirmed backup performance of up to 1,371 MB/sec, which is well in excess of IBM's conservative claims of 900 MB/sec. Aggregate restore performance of 1,965 MB/sec (7 TB/hour) was measured. ESG Lab's experience with a number of backup to disk solutions indicates that these are extremely impressive levels of performance, especially given the fact that de-duplication is being done in real time over a logical pool of retained backup capacity of 20 PB, or more, in size. This is a crucial consideration for enterprise-class organizations that need to back up 10 TB or more of data nightly.
One logical pool is easy to purchase, deploy and manage. One logical pool increases the cost and energy savings that can be achieved with de-duplication. And one logical pool with optional two-node clustered support increases the availability of backup and recovery services as validated by ESG Lab.
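The throughput figures quoted above can be checked with simple unit arithmetic. The sketch below assumes decimal units (1 TB = 1,000,000 MB) and a sustained rate over the whole job, which a production environment may not achieve. The helper names are hypothetical and exist only for this illustration.

```python
MB_PER_TB = 1_000_000  # assumption: decimal units, 1 TB = 1,000,000 MB
SECONDS_PER_HOUR = 3600


def mb_per_sec_to_tb_per_hour(mb_per_sec):
    """Convert a sustained rate in MB/sec to TB/hour."""
    return mb_per_sec * SECONDS_PER_HOUR / MB_PER_TB


def hours_to_move(terabytes, mb_per_sec):
    """Hours needed to move the given amount of data at a sustained rate."""
    return terabytes * MB_PER_TB / mb_per_sec / SECONDS_PER_HOUR


# Restore rate quoted in the report: 1,965 MB/sec, stated as about 7 TB/hour.
print(f"restore: {mb_per_sec_to_tb_per_hour(1965):.2f} TB/hour")

# Time to back up 10 TB at the measured rate and at IBM's claimed 900 MB/sec.
print(f"10 TB at 1,371 MB/sec: {hours_to_move(10, 1371):.2f} hours")
print(f"10 TB at 900 MB/sec: {hours_to_move(10, 900):.2f} hours")
```

Under these assumptions, a 10 TB nightly backup takes a little over two hours at the measured rate and a little over three hours at the claimed rate.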
With high-speed inline data de-duplication that reduces the cost of retained disk capacity, optional two-node cluster support, up to one petabyte (1PB) of physical capacity, and VTL technology that is easy to deploy within an existing backup infrastructure, ESG Lab has confirmed that IBM System Storage TS7650G ProtecTIER Deduplication Gateways are designed to meet the disk-based data protection needs of the enterprise-class data center.
Appendix
TABLE 3. TEST CONFIGURATION
ProtecTIER Software: Version 2.1
ProtecTIER Gateway Servers: Two IBM TS7650G gateways based on an IBM x3850 server each with a quad core 2.9 GHz Xeon processor, 32 GB of RAM and a pair of 4 Gbps FC HBAs.
ProtecTIER Storage: IBM DS8300 with GB 15K RPM FC drives configured with 36 RAID-5 LUNs for 50 TB of backup data (20 7+1, 8 6+1) and four RAID-10 LUNs for meta data (4+4).
Fibre Channel Switch: QLogic SANbox 5200
Backup Software: Symantec Veritas NetBackup, Version 6, SP 5
Media Server #1: Four 2.8 GHz Opteron CPU cores, 16 GB RAM, Red Hat Linux version 4 update 5, one four port Emulex 4 Gbps PCI-Express HBA, driver 8.0.16.27
Media Server #2: Four 2.3 GHz Xeon CPU cores, 22 GB RAM, Red Hat Linux version 4 update 5, one four port QLogic 4 Gbps PCI-Express HBA, driver 8.01.4-08
Media Server #3: Four 2 GHz Xeon CPU cores, 16 GB RAM, Red Hat Linux version 4 update 5, one four port Emulex 4 Gbps PCI-Express HBA, driver 8.0.16.27
20 Asylum Street Milford, MA 01757 Tel: 508-482-0188 Fax: 508-482-0218 www.enterprisestrategygroup.com