Although I have used many backup products for VMware ESX infrastructure, I had never used the VMware Data Recovery appliance, so last week I spent some time researching it. Here are my notes; they may help you understand the product architecture, issues, limitations and so on. While I was researching the product, VMware released a new version (Data Recovery 2.0). Here is the white paper for Data Recovery 2.0. Some interesting features have been included. When I get a chance to work on DR 2.0, I will post my understanding in this series! To avoid duplicating work, I am going to give you some URLs or copy and paste some content 😉
VMware’s Backup and Recovery product (From VMware Uptime Blog)
One of the many capabilities introduced in VMware vSphere 4 is VMware Data Recovery (VDR), a virtual machine backup and recovery product. Market research and customer feedback showed that many people wanted an integrated option for protecting virtual machines in a VMware environment. Further analysis showed that this need was more pronounced among VMware customers that had (or planned to have) fewer than 100 virtual machines in their environment and where IT responsibilities (including VMware) were shared among 2-3 IT administrators (as opposed to having a dedicated VMware administrator on staff).
VMware has been helping customers address their backup challenges in two ways: making significant investments in the vStorage APIs for Data Protection that third-party backup tools use to integrate their backup/recovery products with vSphere, and in providing an integrated option optimized for vSphere customers with smaller environments. VDR is built using the vStorage APIs for Data Protection and incorporates a user interface, policy engine and data deduplication; see the diagram below on how it all fits together. I’ll cover these blocks in a series of blogs but I wanted to start out by discussing Data Deduplication (de-dupe).
Given that we had made a decision to only use disks as the destination for the VDR backups, we had to look for a solution that offered disk storage savings, and this is where de-dupe comes in. In a nutshell, de-dupe avoids storing the same data twice, and de-dupe is HOT: just check out the mergers and acquisitions news!
What VMware decided to implement for VDR de-dupe is (take a deep breath) – block based in-line destination deduplication. Deconstructing it means the following:
1. We discover data commonality at the disk block level as opposed to the file level.
2. It is done as we stream the backup data to the destination disk as opposed to a post-backup process.
3. The actual de-dupe process occurs as we store the data on the destination disk as opposed to when we are scanning the source VM’s virtual disks prior to the backup.
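The three properties above can be illustrated with a small Python sketch. This is only a toy model, not VDR's actual implementation: the `DedupeStore` class, the 4 KB block size and the choice of SHA-256 are all assumptions made for illustration. It shows data being split into blocks, each block being hashed as it streams in, and the destination keeping only one copy of each unique block.

```python
import hashlib

BLOCK_SIZE = 4096  # assumed block size for illustration; not VDR's actual value


class DedupeStore:
    """Toy destination store that keeps one copy of each unique block."""

    def __init__(self):
        self.blocks = {}   # digest -> block bytes (each unique block stored once)
        self.recipes = {}  # backup name -> ordered list of digests

    def write_backup(self, name, stream):
        """Dedupe in-line: hash each block as it arrives at the destination."""
        recipe = []
        for offset in range(0, len(stream), BLOCK_SIZE):
            block = stream[offset:offset + BLOCK_SIZE]
            digest = hashlib.sha256(block).hexdigest()
            # Store the block only if the destination has not seen it before.
            self.blocks.setdefault(digest, block)
            recipe.append(digest)
        self.recipes[name] = recipe

    def read_backup(self, name):
        """Rebuild a backup by following its list of block digests."""
        return b"".join(self.blocks[d] for d in self.recipes[name])


store = DedupeStore()
vm1 = b"A" * BLOCK_SIZE + b"B" * BLOCK_SIZE
vm2 = b"A" * BLOCK_SIZE + b"C" * BLOCK_SIZE  # shares its first block with vm1
store.write_backup("vm1", vm1)
store.write_backup("vm2", vm2)
print(len(store.blocks))  # 3 unique blocks stored instead of 4
```

Because the commonality is found per block, the two backups share storage even though the "files" as a whole differ.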
When it comes to deduplication, there are different techniques and hash algorithms used to accomplish the result. I am not going to get into a theoretical discussion of the pros and cons of the various types of de-dupe technologies available and which approach provides the best disk savings. I personally think that it totally depends on the customers’ IT environment constraints and their overall business goals plus a lot of the storage savings is going to be data driven anyway (the more data commonality there is, the better the de-dupe rate). We chose this de-dupe architecture because it fit best with what we were trying to achieve with VDR and what the vSphere platform provided to us. What were these reasons? Stay tuned to this space……
VMware Data Recovery Taking Advantage of vSphere 4 (From VMware Uptime Blog)
I wanted to explain in more detail why we chose the type of de-dupe that we did. As I had mentioned in my previous post, we chose to implement block based in-line destination deduplication for VMware Data Recovery (VDR). There are a few reasons for this, two of which are due to enhancements in the VMware vSphere 4 platform itself.
1) Change block tracking: Any new VM provisioned on vSphere will use virtual hardware version 7 (you can also upgrade your existing version 4 VMs to version 7). With VM version 7, the VMkernel tracks the changed blocks of the VM’s virtual disks. (By the way, this is the same change block tracking functionality that enhances Storage VMotion in vSphere 4). So, instead of having to scan the VM’s virtual disks to determine which blocks have changed every time a backup occurs, VDR just makes an API call to the VMkernel and gets this information “for free”.
Thus, VDR is able to dramatically cut down the amount of time and CPU cycles to calculate the changed blocks on a virtual disk. In addition, change block tracking also helps on the restore side of the equation. For example, if you wanted to restore yesterday’s VM image, VDR will make the reverse change block API call and will just transfer the changed blocks from yesterday’s backup to revert the VM to its previous state. So, given that there is a lot of intelligence in the platform about virtual disk blocks, block based de-dupe seemed like a natural direction for VDR to take.
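To make the idea concrete, here is a minimal Python sketch of change tracking. It is a toy model only: `TrackedDisk` and `query_changed_blocks` are hypothetical names standing in for the platform's tracking and API call, and they do not reflect the real vSphere interface. The point is that the backup asks for the list of changed blocks instead of scanning the whole disk.

```python
BLOCK_SIZE = 4  # tiny blocks so the example is easy to follow


class TrackedDisk:
    """Toy virtual disk that records which blocks were written since the last backup."""

    def __init__(self, num_blocks):
        self.blocks = [bytes(BLOCK_SIZE)] * num_blocks
        # Before the first backup, every block counts as changed.
        self.changed = set(range(num_blocks))

    def write(self, index, data):
        self.blocks[index] = data
        self.changed.add(index)  # the "platform" records the change at write time

    def query_changed_blocks(self):
        """Hypothetical stand-in for the platform call: return and reset the change set."""
        changed, self.changed = sorted(self.changed), set()
        return changed


def incremental_backup(disk):
    """Copy only the blocks reported as changed; no full scan of the disk."""
    return {i: disk.blocks[i] for i in disk.query_changed_blocks()}


disk = TrackedDisk(num_blocks=8)
full = incremental_backup(disk)   # first backup copies all 8 blocks
disk.write(2, b"abcd")
disk.write(5, b"wxyz")
delta = incremental_backup(disk)  # second backup copies only blocks 2 and 5
print(sorted(delta))  # [2, 5]
```

The cost of the second backup depends on how much changed, not on the size of the disk, which is where the savings in time and CPU cycles come from.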
2) Hot add disk: VDR can “hot add” virtual disk snapshots directly to the VDR virtual appliance. This is accomplished by leveraging capabilities of the vSphere storage stack. This means that VDR can bypass the LAN and stream the data from the snapshots directly to the de-dupe destination disk. In addition to reducing load on the LAN and effectively eliminating the need to block out other LAN traffic during the backup window, the streaming of data to the destination de-dupe disk on the Data Recovery appliance will be considerably faster.
Note that there are three caveats to enabling hot add disk with VDR:
a. The source virtual disks need to be on shared storage
b. The ESX host where the VDR appliance is running needs to have visibility to this shared storage
c. You will need a vSphere edition that includes Hot Add as a feature
The knock against destination (or target) based de-dupe is the fact that it consumes precious network bandwidth with the unnecessary transfer of data that will be discarded as part of the de-dupe process. However, given that VDR only transfers changed blocks and can transfer these blocks off-LAN, the concern did not apply and thus we felt comfortable with a destination based de-dupe architecture.
So does this mean that unless you have both change block tracking and hot add disk features enabled in vSphere 4, VDR and its de-dupe capability are useless to you? Absolutely not! All data that is protected by VDR will be de-duped, so you will enjoy the storage savings independent of what VM version is being backed up or what vSphere edition you have installed. What change block tracking and hot add disk add are additional efficiency and performance gains that will allow even more data to be protected in an ever shrinking backup window.
If you would like to know more about Changed Block Tracking, here are some good reads…
http://www.yellow-bricks.com/2009/12/21/changed-block-tracking/ (read comment section for Eric’s finding about block size & how it works)
A number of features, such as the data integrity check and the retention policy, are not exposed outside the appliance for reconfiguration. However, if you want to change the behavior of the appliance, you can do some tuning inside it by creating a “datarecovery.ini” file. Here is a VMware KB article that covers this: “VMware Data Recovery datarecovery.ini options”
The Data Recovery appliance runs automated integrity checks on the backup destination store. If for any reason a virtual machine’s restore point gets corrupted, the appliance locks the store and will not execute backup jobs. Up to version 1.2.1 there was no automated notification for this, so every time you have to manually check the reports or restore section for this kind of error.
If you are facing any issues related to this, the URLs below may help you fix them.
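The lock behavior can be pictured with a short Python sketch. This is only a toy model under my own assumptions, not how the appliance is actually implemented: `BackupStore`, `StoreLockedError` and the use of SHA-256 digests are all hypothetical. It shows a store that re-hashes its restore points, locks itself when one no longer matches, and then refuses new backups.

```python
import hashlib


class StoreLockedError(Exception):
    """Raised when a backup is attempted against a locked store."""


class BackupStore:
    """Toy destination store with an integrity check that can lock it."""

    def __init__(self):
        self.restore_points = {}  # name -> (data, digest recorded at backup time)
        self.locked = False

    def backup(self, name, data):
        if self.locked:
            raise StoreLockedError("store is locked; deal with the damaged restore point first")
        self.restore_points[name] = (data, hashlib.sha256(data).hexdigest())

    def integrity_check(self):
        """Re-hash every restore point; lock the store if any digest no longer matches."""
        damaged = [
            name
            for name, (data, digest) in self.restore_points.items()
            if hashlib.sha256(data).hexdigest() != digest
        ]
        self.locked = bool(damaged)
        return damaged


store = BackupStore()
store.backup("vm1-monday", b"good data")

# Simulate on-disk corruption: the data changes but the recorded digest does not.
_, recorded = store.restore_points["vm1-monday"]
store.restore_points["vm1-monday"] = (b"bad data!", recorded)

damaged = store.integrity_check()
print(damaged, store.locked)  # ['vm1-monday'] True

try:
    store.backup("vm1-tuesday", b"more data")
    blocked = False
except StoreLockedError:
    blocked = True  # nothing announces this; you only see it if you look
```

In this model, as in the appliance up to 1.2.1, the failure is silent unless someone checks, which is why the manual check matters.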
Remove Lock files on backup destination stores:
Run Manual Integrity check:
How to perform manual integrity check when you have a damaged restore point?
Some final Notes:
Requirements to Implement/Use VMware Data Recovery
- VDR is available only for Essentials Plus, Advanced, Enterprise and Enterprise Plus
- VDR appliance should have access to all shared datastores
- Virtual machines should be hardware version 7 (older versions can still be backed up, but without changed block tracking)
Feature Limitations of VMware Data Recovery
- Limited to 100 virtual machines
- One backup per virtual machine per day
- VDR will not back up virtual machines with Fault Tolerance enabled
- VDR does not support vCenter Linked Mode
- Virtual Machines with hardware version lower than 7 will take longer to be backed up.
- The deduplication feature cannot be disabled so all backups done by VDR are deduplicated
- Not compatible with VI3 hosts. VDR requires the presence of a VMware vCenter Server 4
- VDR will not back up a VM whose disks are on an RDM that is not in virtual compatibility mode
- Doesn’t support IPv6
- Only two deduplication stores per appliance
- 1TB backup destination store limitation
- Maximum 8 concurrent jobs (backup or restore)
- No Backup Server level GUI or Command line tool to do administration
- No PowerShell support
- No Email Reporting
- No Replication to another backup store
- No support for tape/external media
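If you are sizing a deployment, the numeric limits in the list above are easy to check up front. Here is a small Python sketch that does so. The limit values come from the list; the `AppliancePlan` class and `check_plan` function are hypothetical helpers of my own, not part of any VMware tool.

```python
from dataclasses import dataclass, field

# Limits taken from the list above.
MAX_VMS = 100
MAX_DEDUPE_STORES = 2
MAX_STORE_SIZE_TB = 1
MAX_CONCURRENT_JOBS = 8


@dataclass
class AppliancePlan:
    """Hypothetical description of what one VDR appliance is expected to handle."""
    vm_count: int
    store_sizes_tb: list = field(default_factory=list)
    concurrent_jobs: int = 1


def check_plan(plan):
    """Return a list of human-readable problems; an empty list means the plan fits."""
    problems = []
    if plan.vm_count > MAX_VMS:
        problems.append(f"{plan.vm_count} VMs exceeds the limit of {MAX_VMS}")
    if len(plan.store_sizes_tb) > MAX_DEDUPE_STORES:
        problems.append(f"{len(plan.store_sizes_tb)} stores exceeds the limit of {MAX_DEDUPE_STORES}")
    for size in plan.store_sizes_tb:
        if size > MAX_STORE_SIZE_TB:
            problems.append(f"store of {size} TB exceeds the limit of {MAX_STORE_SIZE_TB} TB")
    if plan.concurrent_jobs > MAX_CONCURRENT_JOBS:
        problems.append(f"{plan.concurrent_jobs} concurrent jobs exceeds the limit of {MAX_CONCURRENT_JOBS}")
    return problems


ok_plan = AppliancePlan(vm_count=60, store_sizes_tb=[1, 0.5], concurrent_jobs=4)
big_plan = AppliancePlan(vm_count=150, store_sizes_tb=[2, 1, 1], concurrent_jobs=8)
print(check_plan(ok_plan))   # []
for problem in check_plan(big_plan):
    print(problem)
```

A plan that fails these checks would need to be split across more than one appliance or handled by a different backup product.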
Apart from these feature limitations, I like this product very much. It perfectly does what it promises. Check out my installation & configuration guide with lab results over here.