TechOnTip Weblog

Run book for Technocrats

Fundamentals of Data Backup

Posted by Brajesh Panda on July 15, 2012

  1. Types of Backup
    1. Full
  • Complete backup of the given data (files/folders/application data/databases etc.)
  • Takes a long time to complete, in proportion to the data size
  • Advantage:     only a single backup set is needed to restore the data
    2. Incremental
  • Backup of the data changed since the last backup (full or incremental)
  • Advantage:    as the amount of changed data is small, an incremental backup completes quickly
  • Disadvantage: at restore time you need the full backup media + all incremental backup media taken since then; this can mean a lot of tapes
    3. Differential
  • Backup of the data changed since the last full backup
  • Advantage:    at restore time you need only two media sets, i.e. the last full backup + the last differential backup
  • Disadvantage: a differential takes more time because it backs up all changes since the last full backup
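The difference between incremental and differential selection can be sketched with a hypothetical mtime-based file scan (a simplification; real backup software typically tracks archive bits or change journals instead):

```python
import os

def files_changed_since(root, since_epoch):
    """Return files under `root` modified after `since_epoch` (seconds)."""
    changed = []
    for dirpath, _dirs, names in os.walk(root):
        for name in names:
            path = os.path.join(dirpath, name)
            if os.path.getmtime(path) > since_epoch:
                changed.append(path)
    return changed

# Full backup:         copy everything
# Incremental backup:  files_changed_since(root, last_backup_time)
# Differential backup: files_changed_since(root, last_full_backup_time)
```

Because the differential always measures from the last *full* backup, its selection only grows until the next full run, while each incremental starts fresh from the previous backup of any kind.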
  2. Methods of Backup
    1. Online
  • Target data is live; users are using the data.
    • e.g. a file server – people are actively using files while your backup software takes the backup
    • e.g. an Oracle server – the database is online/live while your backup software takes the backup
  • During an online backup the server or services are live, so there is a chance of data changing mid-backup; the data is not consistent ("dirty"). Backup agents therefore have application-awareness features (like VSS etc.) to take a consistent backup, so that consistent, good data can be restored.
  • Because of these application-awareness features, online backup tools/agents are expensive
  • With this method you need no downtime for the application, because the agent can take the backup while the production server is online & users are working.
    2. Offline
  • Target data is in offline mode; no user is using the data
    • e.g. an Oracle server – database services are down, there are no user connections, & you are simply backing up the database & log files from their directories.
  • No application-awareness features are needed as with online backup, because the data is already offline & clean; no data change happens during the backup job
  • Because of this, these tools are cheaper.
  • With this method you need downtime for the application, which is not possible in every environment.


  3. Types of Backup Media
    1. Portable
  • Media which can be easily moved (ported) from one place to another
  • Floppy, CD, DVD, Tape, USB Hard Disk
    2. Non-Portable
  • Media which can't be moved from one place to another
  • Backup server hard disks, directly connected storage


  4. Deciding between Tape and Disk
    1. Tape
  • Tapes are magnetic media, so reads & writes are sequential
  • Tape drive speed is low compared to hard disk RPM (rotations per minute)
  • Each tape has a finite life; after a certain number of write cycles, tapes wear out.
  • Backup jobs which write to tape are called Disk-to-Tape (D2T) backup jobs
  • You need another device to write to tapes, which brings in additional expense:
    • Standalone tape drive
    • Tape autoloader
    • Tape library
  • Tape is the oldest backup method & can retain data for many years
  • Tape capacity is limited by its generation:
    • DLT
    • LTO1 – 100GB
    • LTO2 – 200GB
    • LTO3 – 400GB
    • LTO4 – 800GB
    • LTO5 – 1500GB
  • (DLT – Digital Linear Tape)
  • (LTO – Linear Tape Open; the number stands for the generation, e.g. LTO1 = Generation 1)
  • The capacities above are raw (native) capacities; compressing data at backup time can increase effective capacity by roughly 50%, depending on the backup software & what is being backed up
  • Backups can be compressed while writing, so more data fits on the tape.
  • For data security, the data can be encrypted
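As a rough illustration of write-time compression, the standard-library `gzip` module shows how the achievable ratio depends entirely on the data; highly repetitive payloads compress far better than the rule-of-thumb 50%:

```python
import gzip

# Highly repetitive payload (backup streams often contain much repetition)
raw = b"backup payload with lots of repetition " * 1000
compressed = gzip.compress(raw, compresslevel=6)

print(f"raw: {len(raw)} bytes, compressed: {len(compressed)} bytes, "
      f"ratio: {len(compressed) / len(raw):.2%}")
```

Already-compressed data (media files, encrypted data) will show almost no gain, which is why vendors quote capacity increases as "depending on the data".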
    2. Disk
  • Disks are based on an electric motor & data platters and provide random reads & writes; this random access gives higher data throughput
  • Disk RPM is higher than tape speeds, also helping data throughput
  • Data on disk can be stored for an indefinite period
  • Backup jobs that use disk as the media target are called Disk-to-Disk (D2D) backup jobs


  5. Key Words (Jargon)
    1. Backup Server
  • A server where specialized backup server software runs & communicates with backup agents. It manages backup devices & orchestrates backups.
  • Many companies provide backup server software:
    • Symantec NetBackup
    • Symantec BackupExec
    • Commvault
    • CA Arcserve
    • EMC Networker
    • EMC Avamar
    • EMC DataDomain
    • Windows Backup Tool
    2. Backup Agents
  • Each backup server comes with different types of agents.
  • These agents help the backup server communicate with the source server, shipping the selected data to the backup server to be backed up
  • Agents come with application-awareness features, so consistent data can be backed up
    3. Backup Job
  • A backup job is a task on the backup server where you define what, when & where to back up
    4. Backup Policy
  • If there are 1000 servers, with per-server jobs you would have to create 1000 backup jobs & manage them day to day, which may not be practical.
  • So we create backup policies defining when & where these backups will run
  • Then we connect the backup policy to the target agents (the what), and the backup jobs get created automatically.
  • Any time we want to change the when & where, we change one policy; the respective backup jobs get updated automatically
  • You can also configure duplicate backup jobs in the same policy without any complex or repetitive backup job configuration.
    5. Selection List
  • In every backup job we have to select the source server & the respective files on that server, i.e. what you are going to back up.
  • A selection list makes this easy to manage within a backup policy.
  • You create a selection list & connect it to a specific backup policy
  • It is easy to select, unselect or remove any old selection from that list.
    6. Backup Throughput
  • The data transfer rate from the agent to the backup server's media device
  • Mostly measured in MB/min (megabytes per minute)
  • It depends on multiple factors: source server utilization at backup time, source server network card speed & configuration, intermediate network switches & stack, backup server network card configuration, available network bandwidth for the backup server, how many backup jobs are running at a time, etc.
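Throughput in MB/min is simply the data size divided by the elapsed minutes; a quick hypothetical calculation:

```python
def backup_throughput_mb_per_min(data_bytes, elapsed_seconds):
    """Average transfer rate in MB (1024*1024 bytes) per minute."""
    megabytes = data_bytes / (1024 * 1024)
    return megabytes / (elapsed_seconds / 60.0)

# Hypothetical job: 120 GB backed up in 4 hours
print(round(backup_throughput_mb_per_min(120 * 1024**3, 4 * 3600)))  # → 512
```

Such a figure is useful for sizing backup windows: if the window is fixed, the required throughput tells you whether the network & media devices can keep up.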
    7. Media Set
  • One media item or a group of media (tapes, disks etc.) to which a backup is written
    8. Media Retention
  • On a media set we can define retention, i.e. how long we would like to retain those backups.
  • The value depends on the required lifetime of the given data
  • e.g. weekly, monthly, yearly and so on
  • After that period the media can be overwritten or recycled
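A retention check can be sketched in a few lines; the policy names and periods below are illustrative assumptions, not from any particular backup product:

```python
from datetime import date, timedelta

# Illustrative retention periods (real products let you configure these)
RETENTION = {
    "weekly":  timedelta(weeks=1),
    "monthly": timedelta(days=31),
    "yearly":  timedelta(days=365),
}

def media_reusable(backup_date, policy, today=None):
    """True once the retention period for this media set has elapsed."""
    today = today or date.today()
    return today >= backup_date + RETENTION[policy]

# A weekly tape written on June 1 is long past retention by July 15
print(media_reusable(date(2012, 6, 1), "weekly", today=date(2012, 7, 15)))  # → True
```

The backup/media server runs exactly this kind of check before deciding which tapes or disk volumes it may overwrite.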
    9. Media Server
  • The server which keeps & manages all media sets per their retention periods, and at backup time presents the media to the backup server for use
  • Media servers are mostly connected to tape drives, autoloaders, tape libraries and external disk storage systems


  6. Advanced Technologies
    1. Duplicate Backup Set
  • If you want to copy the same backup to multiple media, you run duplicate backup set jobs.
  • e.g. copying the same backup set from backup disk to tape, or tape to tape
    2. De-Duplication of Backup
  • While backing up data we often back up the same data again & again, wasting a lot of space
  • The de-duplication feature removes those duplicated files & keeps just one copy
  • De-duplication is mostly implemented in D2D environments, where jobs use a disk as the media target.
  • De-duplication comes in two types:
    • Source-side de-duplication
      • Backup agents detect duplicates at the production server end, using production server resources (CPU, memory, disk etc.). After detection, the agent sends only a single copy of the data to the backup server. This reduces network bandwidth, but the de-duplication process consumes processing power at the source, possibly impacting production services.
    • Target-side de-duplication
      • The backup server detects duplicates while writing to the target media set, or after the backup completes. All data, including duplicate files, is transferred to the backup server, which then runs the de-duplication process and retains only one copy of the data. This consumes more network bandwidth. It is the most commonly implemented solution these days.
  • How duplicate files are detected depends on the backup software & vendor.
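A minimal sketch of target-side, chunk-level de-duplication using content hashes (SHA-256 and fixed chunks are assumptions here; real products vary in hashing and chunking strategy):

```python
import hashlib

def deduplicate(chunks):
    """Keep one stored copy per unique chunk, keyed by its SHA-256 digest.
    The index records each chunk's digest so the stream can be reassembled."""
    store = {}   # digest -> unique chunk data
    index = []   # per-chunk digests, in original order
    for chunk in chunks:
        digest = hashlib.sha256(chunk).hexdigest()
        store.setdefault(digest, chunk)  # store only the first occurrence
        index.append(digest)
    return store, index

chunks = [b"alpha", b"beta", b"alpha", b"alpha"]
store, index = deduplicate(chunks)
print(len(store), "unique of", len(index))  # → 2 unique of 4
```

Restore walks the index and looks each digest up in the store, which is why losing the de-duplicated store affects every backup that references it.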
