Sunday, October 31, 2010

A $3500 Avamar Case Study!

I searched for Avamar on Amazon.com, and the first link is to a case study which costs $3500!!!

http://www.amazon.com/Large-Transportation-Company-Rides-Avamar/dp/B002HE1EK2/ref=sr_1_1?ie=UTF8&s=books&qid=1288580313&sr=1-1


Thursday, September 30, 2010

How Avamar has changed my backups

Avamar is nothing short of a revolutionary backup product. While deduplication concepts and algorithms have existed in theory or in bits and pieces in other backup and replication products, I believe it is the first to combine client side and target side deduplication in one hardware and software product.

There are several discussions of the pros and cons of Avamar on message boards and email lists, comparing Avamar with more established backup products such as NetBackup, Tivoli and NetWorker (another EMC product). I want to add to that discussion and talk about how Avamar has changed my backups. Coming from Commvault and Backup Exec, I found Avamar a big change in a positive direction.

First the pros:

1) I now measure my daily backups in MBs and GBs. 70% of my servers only back up 200 - 400 MB per day. Another 20% back up 400 MB - 1 GB per day and the remaining 10% back up anywhere from 1 GB to 25 GB per day. If I see a server backing up more than a few GB of data I get worried that I didn't set up exclusions correctly.

2) Backups are really fast, with the majority of them finishing within 30 minutes. I know this will not apply to all servers and different kinds of data, but it is working out well for my servers. Compare this to Commvault, where it took 5 - 10 minutes just to scan the files to back up.

3) No more complicated setups of incremental, differential and full backups. Every backup is full. Avamar combines a daily incremental backup with the previous backups to create a virtual full backup. Unlike a traditional backup product, if you delete previous backups, the subsequent backups are not affected. So there is no golden or master backup to worry about. But just as a disclaimer, SQL backups do have an option for incremental backup, so this doesn't apply to them.

4) Restores are easy and if you are not restoring several hundred GBs or TBs of data, then they are quick. Since no tape is involved, I feel confident about restoring the data. When I was using tape I was always scared that the tape might stop during the restore and my data could become unrecoverable. Probably just an irrational fear.

5) Replicating backups to another Avamar grid is straightforward and restoring from the replicated data is really easy. There are no auxiliary or staged clones or snapshot backups to worry about.

6) There are no media servers or backup nodes. In Commvault and Backup Exec there is a concept of a control backup server, and then several media servers which connect the clients to the tape or backup-to-disk appliances. In Avamar there is one central grid and all clients connect to it for backup.

7) Significantly lowered network utilization. After the first backup, very little data is sent to Avamar, which equals less network usage. Before, I couldn't even count how many slow application response time issues were blamed on "...the backups are running which slows down the network".

8) No more huge spikes in SAN throughput. With Commvault, a graph of the read throughput during the backup window looked like a plateau, a giant plateau. With Avamar, it looks more like a steep mountain which tapers off quickly.


Now the cons:

1) Horrible admin console. Each action opens a new window. For example, if I want to look at the backup activity, that opens a new window. If I want to look at previous backups, that is another window. If I am working on an issue and looking around, I can easily have 6 - 7 windows open. But that's not all: each Avamar grid is managed separately. So if I am managing both Avamar grids from the GUI, I can have 14 - 16 windows open at any time.

2) Blackout Window. Avamar needs some time to go through old backups and delete them. The deletion of old backups is called garbage collect and can take 1 - 3 hours. No backups can run during this time and all running backups are killed. This period is referred to as the blackout window. What this does is set up a hard stop for all backups and force a backup window. Garbage collect is scheduled to start at 6:00 PM (default, can be changed), and if a client is still backing up data at that time, the backup is killed and the data is not available for restore. So if you have a client that experienced a lot of changed data and is taking a long time to back up, you could potentially run into a situation where the backup is not available for restore.

3) Limited SQL restore options. No incremental restores or log only restores are available.

4) CPU intensive backups. While a backup is running, the client will experience high CPU usage, sometimes in excess of 70%.

5) Confusing storage limits. Avamar can only be 95% full before becoming read only, at which point old backups have to be deleted to create space. With Commvault I always had the option of destaging the backups to tape and freeing up space on my primary backup target. Also, purchasing additional Avamar storage (nodes) is EXPENSIVE!

6) No way for the admin to determine how much space will be freed up by deleting backups. I have asked EMC and they don't know either. This probably ties into the fundamentals of how Avamar works, but it is frustrating when you are close to being full and don't know which backups to delete to free up the most space. The DPN report helps somewhat, but it is still confusing.

7) Difficult to isolate from the network to test restores. In a traditional backup application, the admin can set up a separate network, do a disaster recovery restore of the backup server and test restores of critical servers like Active Directory and Exchange. With Avamar, the backups have to be first replicated to a second grid, and then the restores can be tested.
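
The blackout window in con 2 works like a hard deadline, which is easy to check for ahead of time. Here is a minimal Python sketch, assuming the default 6:00 PM start mentioned above and an admin-supplied guess at how long a backup will run. The function name and its parameters are my own illustration, not anything Avamar provides, and the sketch ignores how long the window itself lasts.

```python
from datetime import datetime, time, timedelta

def finishes_before_blackout(start, estimated_duration, blackout_start=time(18, 0)):
    """Return True if a backup started at `start` is expected to finish
    before the next blackout window begins.

    blackout_start defaults to 6:00 PM (the default described above).
    estimated_duration is a guess supplied by the admin (hypothetical input).
    """
    deadline = datetime.combine(start.date(), blackout_start)
    if start >= deadline:
        # Started after today's window began, so the next hard stop is tomorrow's.
        deadline += timedelta(days=1)
    return start + estimated_duration <= deadline
```

For example, a backup started at 4:00 PM with an estimated three hours of work would be killed at 6:00 PM, so the function returns False for it.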

Wednesday, August 4, 2010

Isilon Scale Out NAS

Last week I sat through a presentation about Isilon's Scale Out NAS Platform. The technology and innovation behind Isilon is interesting and the features and benefits of scale out NAS are quite impressive. Having worked with the EMC Celerra and Microsoft Windows based NAS devices, I was accustomed to the usual NAS setup: 2 or more NAS heads in an N+1 high availability cluster, with the NAS heads connecting to back-end storage and converting block based data to file based data. Scale out NAS, as implemented by Isilon, gets rid of the individual NAS heads and combines the entire unit into one big cluster.

Features of scale out NAS I am most impressed with are:

1) No individual NAS heads. The scale out NAS device is made of individual hardware nodes. These nodes are clustered together and work seamlessly with each other. There is no longer one dedicated standby node; each node is capable of taking over if another node fails.

2) All resources are pooled together. There are no RAID groups or back-end LUNs. A file system spans the entire storage system. This leads to high IOPS, redundancy and reliability. However, I don't know how Isilon balances the file systems on the back-end. As you keep on creating file systems, wouldn't the last file system perform poorly compared to the first file system?

3) There is no RAID in the traditional sense. Unlike other NAS devices where you define what RAID protection you want for your particular file system, in the Isilon scale out NAS device you define how much redundancy you want for your data. You can configure N+2 data protection and that will protect your data against two disk failures or two node failures. All data is striped across all nodes in the cluster, so no one disk or node contains all your data. The best feature? You can define protection per file, per directory, or per cluster. Compared to other NAS devices where you define protection at the file system level, in the Isilon systems you have complete flexibility in choosing your protection scheme.

4) Hardware nodes can be added while the NAS cluster is in production and the NAS balances the data across the new nodes. Maybe this answers my concern about file system performance in number 2.

5) No more long rebuild times after disk failures. Since there is no RAID, there are no parity drives. When a disk fails, the free space on all the remaining drives is used to recreate the data. But what happens when the system is nearly full? Does rebuild performance deteriorate as free space gets low?

One thing I will have to experience myself is the promise of linear performance. In traditional NAS, performance increases rapidly as disks are added, but after a certain point performance becomes nearly flat. Each new disk shows diminishing returns in performance. Isilon promises linear performance gains, that is, as more disks are added, performance increases in proportion. I am not doubting this, but this is something I will have to test myself and verify.
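
To put rough numbers on the N+2 protection in point 3, here is a small Python sketch of the generic arithmetic behind any N+M protection scheme. This is an assumption on my part about how to reason about the capacity cost, not a description of how Isilon actually lays data out; the function name and the example stripe width of 8 are hypothetical.

```python
def protection_overhead(data_units, protection_units):
    """Fraction of raw capacity used for protection in a generic N+M scheme.

    data_units is N (units holding data), protection_units is M (units
    holding redundancy). The scheme tolerates up to M simultaneous failures.
    """
    if data_units <= 0 or protection_units < 0:
        raise ValueError("need at least one data unit and no negative protection")
    return protection_units / (data_units + protection_units)

# Hypothetical example: 8 data units protected at N+2.
overhead = protection_overhead(8, 2)
```

Under these assumptions, 8+2 spends one fifth of raw capacity on protection, and the share shrinks as the stripe gets wider.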

In conclusion, I was very impressed with the Isilon scale out NAS. Now if I only had the money :)

Saturday, June 26, 2010

Avamar - Viewing Large Files

Avamar and viewing large file folders:

The Avamar GUI has a problem working with folders containing more than 50,000 files. When trying to do a backup or restore, the Avamar GUI can only show up to 50,000 files. To verify this limitation, I wrote a Perl script to create 60,000 files. The Perl script is as follows:

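# Create 60,000 files named 0 through 59999 in the current folder.
for ($a = 0; $a < 60000; $a++) {
    open(FILE, ">$a") or die "Cannot create $a: $!";
    close FILE;
}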
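
For anyone without Perl handy, the same test can be sketched in Python. The function name and the folder argument are my own, and naming the files 0, 1, 2 and so on is an assumption. The files are created empty.

```python
import os

def create_test_files(folder, count):
    """Create `count` empty files named 0, 1, 2, ... inside `folder`."""
    os.makedirs(folder, exist_ok=True)
    for i in range(count):
        # Opening in write mode creates the file; nothing is written to it.
        with open(os.path.join(folder, str(i)), "w"):
            pass
```

Calling create_test_files on a scratch folder with a count of 60000 reproduces the test folder described here.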

I ran this script in a test folder, and tried to back up this folder. When I clicked on this folder to select files, the Avamar GUI became unresponsive. On the client side I noticed the avagent.exe process jump up to 25% CPU usage, and memory usage also increased. After 30 seconds I got a pop up in the Avamar GUI saying that "The client was unable to complete this browse request within the allotted limit of 10 seconds". It gave me two options: increase the time limit, or view partial results. I clicked on "view partial results" and it only showed me the first 20,000 files.



I clicked on the top folder which selected all the files inside the folder, and then I started the backup. After the backup completed the log reported that it backed up 60,001 files (it also backed up the perl script).

Backup #140 timestamp 2010-06-16 14:32:09, 60,001 files, 3 folders, 10.53 MB (60,001 files, 274 bytes, 0.00% new)

When trying to restore, I ran into the same issue. If I browse for the file by date, and go to that folder I get a popup message saying "Backup list truncated. The console truncated the backup listing because the maximum number of entries was exceeded. The preference max_backup_nodes in mcclient.xml specifies the limit. If the data of interest is not listed, refresh the view and then reselect your backup. This will allow you to select other folders. Selecting the same folder or any other folder that exceeds the limit will cause the truncation to occur again."

I clicked OK, and it only showed me the first 50,000 files. I didn't count, but I am assuming it did. I followed the EMC solution esg110422, which had me raise the MAX_BACKUP_NODES value. The XML file mcclient.xml is located under c:\program files\var\mc\gui_data\prefs on the computer where the Avamar administrator GUI (console) is installed. I increased the value to 500,000 and rebooted my desktop according to the instructions. That did not make any difference, and I still saw only 50,000 file entries. I tried refreshing and reselecting, but that did not help; I still could not see files beyond the first 50,000 entries. This causes a huge issue because you have to restore the whole folder to get a single file back, something I did not want to do.

I contacted support, and was told that 50,000 is the limit and cannot be changed. I can however use the command line to restore the files. I tried mccli from the Avamar utility node and I was able to restore the file by providing the full path and file name.

So, this is how things are right now. If you have a folder with more than 50,000 files, listing all the files in the GUI is not possible, which makes it impossible to restore any file which is not in that list. You can however use the command line to do a restore. Both mccli.exe and avtar.exe can be used to do the restore.

Avamar System State Backup

NOTE: This applies to Avamar 5.0 with no service pack installed. Windows 2008 system state backups have changed in Avamar 5.0 Service Pack 2.

Avamar System State Backups:

There appears to be some confusion related to Windows system state backups with Avamar. Most newcomers to Avamar are used to the backup software handling the system state backup natively, that is, without the need for scripting or storing the backups locally on the client. Avamar handles system state backups differently from other backup applications. For Windows 2000 and Windows 2003 clients, Avamar utilizes the NTBackup utility to create a system state backup which is stored locally on the client. Starting with Windows 2008, Avamar is capable of making a backup of the system state using the VSS plugin and storing it on the Avamar node(s) itself. However, this method is no longer supported by EMC, and EMC recommends scripting the backup using the Windows Backup utility and directing it to a shared folder.

Following are the system state backup requirements, observations and best practices for each Windows OS.

Windows 2000 and Windows 2003:

Windows 2000 is not officially supported by Avamar 5.0+, so the documentation may not cover it. However, its system state backup procedure is similar to Windows 2003. System state backups on Windows 2000 are around 300 - 500 MB in size, while in Windows 2003, they can range from 800 MB to 3 GB. If there is enough space on the local C: drive, then use the option in the dataset to create a system state backup. When a client backup is initiated, the Avamar agent calls the NTBackup program and creates a system state backup locally. The file is called systemstate.bkf. On Windows 2003, this procedure uses VSS, which is known to cause issues with SQL in certain situations; however, in my experience this process has been mostly problem free.

Inside the Windows File System dataset options there is an additional setting called "Backup the System Profile" and this is set to disabled by default. I believe this is what is causing the most confusion. This option only works with Windows 2003 servers and requires that the Avamar Backup System State agent (referred to as AvamarBackupSystemState-windows-x86-5.0.100-409.msi) be installed on the client. Installing this and enabling the system profile option creates a recovery profile for the client. This profile contains all the configuration information that would be necessary to perform a bare metal restore of the server. The profile is called a HomeBase profile and is often referred to as HomeBase Lite since it only works with Windows 2003 and only offers limited capabilities; for example, it will not install the OS or the service pack as is common with other bare metal restore applications. This option is not necessary to restore the client to similar hardware.

If a complete restore of the system is necessary, then install the operating system and the service pack, restore the system state backup file to another client and follow the procedure outlined in the documentation to restore the client. During my testing I only had the Windows system state and the server data, so I had to follow the directions from the version 4.1 Administrator Guide. After following the instructions, and a couple of reboots later, the system came back up.

If the client does not have enough space to create a system state backup locally on the C drive, then the backup can be redirected to another local drive, or be scripted to send it to a remote share. To send the system state to a different local drive, include this parameter in the dataset definition that is applied to the client: systemstatefile=d:\avamarfolder. The redirect location can be any directory on the drive. However, if you are doing this, then it is a good idea to document where the system state file resides so it can be located when it is needed.

If the backup needs to be scripted, then the Avamar dataset can be instructed to run a script before the backup begins. This option is found under advanced dataset options and is called pre-script. The script needs to be placed inside c:\program files\avs\etc\scripts and can only be a .bat, .vbs or .js file. Be sure to uncheck the option underneath which says "Abort backup if script fails". Even if the script fails to run, the data should still be backed up. When manually creating a system state backup, the system state and either c:\windows\system32 (for Windows 2003) or c:\winnt\system32 (for Windows 2000) need to be specified. The script I use is as follows:

Windows 2000 Script:

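@echo off
rem Windows 2000 installs to c:\WINNT by default, not c:\WINDOWS.
ntbackup backup "c:\WINNT\system32" /m normal /f "\\servername\share\%COMPUTERNAME%.bkf"
rem /a appends the system state to the .bkf file created above.
ntbackup backup systemstate /M normal /f "\\servername\share\%COMPUTERNAME%.bkf" /a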

Windows 2003 Script:

@echo off
ntbackup backup "c:\WINDOWS\system32" /m normal /f "\\servername\share\%COMPUTERNAME%.bkf"
ntbackup backup systemstate /M normal /f "\\servername\share\%COMPUTERNAME%.bkf" /a

These scripts require that a share already exist, and it should have share permissions set to allow Everyone to write. This is because the Avamar agent runs as the System user and this is the only option I have found to allow the System user to write to a remote share. I haven't tried running the Avamar client agent as a specific user, so I don't know if that will allow specific share level permissions. I know this is a security hole, but I don't have a workaround for this.

One important thing to do when redirecting system state backups to a remote share is to create an avtar.cmd file in c:\program files\avs\var and put the following parameter in it: --backupsystem=false. This is important because if by mistake a different dataset is selected for a client and it does not specify that the system state should be directed to a remote share, the system state backup will be created locally. This parameter blocks the system state backup from being made by the Avamar agent. I do this because I only enable redirecting system state backups to a remote share for clients which have a very small amount of free space available locally, and if by accident the drives fill up, the client could crash.

When the client backup runs, it calls the system state backup script, which creates the backup on a remote share. This remote share can then be backed up by installing the Avamar agent on the server hosting the share and backing up that server last.

One thing to note about the Windows dataset is that it includes a SystemState/ option under Source Data. This makes some people think that the system state will be backed up for Windows 2000 and 2003 servers. This is not true, and this option is there to back up the system state for Windows 2008 servers. Even then, this option is not used.
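
The pre-script is the same two ntbackup lines on both operating systems apart from the system folder (system32 under c:\winnt on Windows 2000, under c:\windows on Windows 2003). A small Python sketch that renders the script text makes that explicit. The function name is my own, \\servername\share is the same placeholder used in the scripts, and only the ntbackup switches already shown above are used.

```python
def ntbackup_script(system_folder, share=r"\\servername\share"):
    """Build the pre-script text: a normal backup of the system folder,
    then a system state backup appended (/a) to the same .bkf file."""
    target = share + r"\%COMPUTERNAME%.bkf"
    return "\n".join([
        "@echo off",
        f'ntbackup backup "{system_folder}" /m normal /f "{target}"',
        f'ntbackup backup systemstate /m normal /f "{target}" /a',
    ])

# One script per operating system; only the system folder differs.
windows_2000_script = ntbackup_script(r"c:\WINNT\system32")
windows_2003_script = ntbackup_script(r"c:\WINDOWS\system32")
```

Generating the text this way is only a convenience for keeping the two variants in sync; the resulting file still has to be saved as a .bat file in the scripts folder.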

Windows 2008:

Windows 2008 system state backups cannot be made with Avamar natively. EMC has a document on Powerlink called "Windows Server 2008 Offline System Recovery Using Windows Server Backup with Avamar" which describes how to configure system state backups for Windows 2008. Do not use the VSS Plugin to back up the system state, even though it appears to be the obvious choice and the client logs may indicate success if you enable the "Backup the System State" option. If you enable it, you will see that a successful system state backup was made, but this backup cannot be properly restored. If you try to restore from the VSS plugin system state backup, it will appear to restore data, but it will never complete; it just gets stuck at 99%. EMC says that this is a Microsoft issue due to recent changes they have made to Windows 2008. Thus, the process outlined in the document mentioned earlier has to be followed in order to make a successful, restorable system state backup.

To enable the system state backup, the Windows Backup utility has to be installed on Windows 2008. This utility is not installed by default, so go to Manage, Features and enable it. Once it is installed, system state backups can be made. I use a script to start the backup and redirect it to a remote share. The script is:

@echo off
wbadmin start backup -backupTarget:\\remoteserver\share$ -allCritical -quiet

Then, go to the dataset being used with this server and specify the script under pre-script option.

Windows 2008 system state backups take much longer than Windows 2003 backups and are at least 10 GB in size because they back up more data. Whatever server the system state backups are going to should be backed up last, otherwise the system state backups it holds will be at least a day old.

The script provided in the documentation has a line to delete the system state backup after it has been backed up, but I don't do that.

Avamar Retention

Retention.

There are two types of retention in Avamar, basic and advanced. Basic retention policy can be specified in three ways:

Retention Period: Allows you to define how long a backup should be maintained. Length can be defined in days, weeks, months or years. The retention period is calculated from the start time. So if a job started on 3/31/2010 11:00 PM but ends on 4/1/2010 5:00 AM, the retention period will use 3/31/2010 to calculate when to expire the job.

End Date: Expire jobs on this particular date. This is not a moving backup window, and all jobs that have this retention policy will expire on the defined date. This is good for one time backup jobs where a system may need to be backed up, but after a certain date its backups are no longer necessary.

No end date: Backups never expire.
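
The point about the retention period being counted from the start time is easy to see with the dates from the example above. This is a minimal Python sketch, assuming a retention period given in days; the function name is hypothetical and this is not how Avamar itself is implemented.

```python
from datetime import datetime, timedelta

def expiration_date(start_time, retention_days):
    """Expiry is counted from the job's start time, not its end time."""
    return start_time + timedelta(days=retention_days)

# The job from the example: started 3/31/2010 11:00 PM, ended 4/1/2010 5:00 AM.
job_start = datetime(2010, 3, 31, 23, 0)
expires = expiration_date(job_start, 30)
```

With a 30 day retention period the job expires on 4/30/2010, a day earlier than if the end time had been used.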

The second type of retention in Avamar, advanced, allows you to define how long to keep backups based on how they are tagged. Backup jobs can be tagged as daily, weekly, monthly and yearly. Every backup job is a daily job and is marked with a "D". If a backup was made on a Sunday, it is tagged with a "W" to signify it is a weekly. The very first backup job of the month is marked with an "M", which stands for monthly. The very first backup job of the year is marked with a "Y" for yearly. Tags can be combined for backup jobs to create layers of retention. The first backup job of any system is tagged as "DWMY". Jobs made on a Sunday are tagged "DW", while the first backup of the month is marked "DM" unless it falls on a Sunday, in which case it is tagged "DWM".

Retention periods for each tag can be defined in days, weeks, months and years. The job expires when it is older than the time period defined in the retention policy. For example, if the advanced retention policy is set to Daily: 20 days, Weekly: 40 days, Monthly: 100 days and Yearly: 365 days, and a job is tagged as DWMY, the D tag drops off after 20 days, the W tag after 40 days, and the M tag after 100 days. If you look at the job after 100 days, it will have only one tag, Y. After 365 days, the job will expire.
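
The tagging rules above can be written down as a short Python sketch. The function and its parameters are my own illustration of the rules as described here, not Avamar code; the caller has to say whether a job is the first of its month, the first of its year, or the first ever for the system.

```python
from datetime import date

def tags_for_backup(backup_date, first_of_month=False, first_of_year=False,
                    first_ever=False):
    """Return the advanced retention tags for a scheduled backup job."""
    if first_ever:
        return "DWMY"               # the first backup of any system gets every tag
    tags = "D"                      # every scheduled backup is a daily
    if backup_date.weekday() == 6:  # date.weekday(): Monday is 0, Sunday is 6
        tags += "W"
    if first_of_month:
        tags += "M"
    if first_of_year:
        tags += "Y"
    return tags
```

For example, a Sunday job that is also the first backup of its month comes out as "DWM", while a first-of-month job on any other day comes out as "DM".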
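
The way tags drop off over time in this example can be sketched in Python as well. The function name and the dictionary layout are hypothetical, retention is expressed in days only, and a tag is treated as dropped once the job is older than that tag's retention.

```python
def remaining_tags(tags, age_days, retention_days):
    """Return the tags still active on a job that is `age_days` old.

    retention_days maps each tag letter to its retention in days.
    An empty result means the job has expired.
    """
    return "".join(t for t in tags if age_days <= retention_days[t])

# The policy from the example above.
policy = {"D": 20, "W": 40, "M": 100, "Y": 365}
```

With this policy, a DWMY job that is 30 days old still carries W, M and Y; past 100 days only Y is left, and past 365 days nothing is left and the job expires.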

According to the best practices guide for Avamar 5.0, weekly backups are equal to three daily backups, and a monthly backup is equal to six daily backups. This helps conserve space by reducing the amount of data that is kept on the system. But this also reduces the number of days you can go back to recover data from.

One important thing to note about advanced retention is that it does not apply to on demand and client initiated backups. On demand jobs have an option to specify basic retention just before initiating the job. Client initiated backups use the End User On Demand Retention policy. Both kinds of job get tagged with an "N", which stands for not tagged.

These tags can be changed by going to a job under Backup Management and selecting what tags you want to apply to a job. A job marked weekly, can be changed to daily, monthly, yearly, or a combination of all four tags. When jobs run as scheduled jobs, they are automatically tagged. If only basic retention is enabled, the jobs are still tagged, but only the expiration date is used to expire the job.

The best practice for retention is to use advanced retention since it saves space. Another best practice is to set minimum retention to 14 days for all jobs. This is because retention can only be specified in time periods. There is no setting in Avamar to never delete the very last backup job, or to delete a job only once a newer backup is available. If there is a problem with backing up a client, and retention is set to 7 days, it is likely that the failure will go unnoticed until all backups have been deleted. Setting minimum retention to 14 days buys some time for the admin to check if a job failed and, if so, why.

Avamar Restoring ACLs only (File Permissions)

Restoring only the ACL for a Windows host is a bit tricky with Avamar. There is no explicit option in the GUI that only restores the ACL. If you are going through the GUI, then you can only restore the files and the Access Control List together. I searched through the Administrator guide and was unable to find anything related to restoring the ACL only.

I looked through the avtar.exe command line and found a parameter that can be used to specify that only the ACL be restored. The parameter is --restore-acls-only=true, which is specified in the avtar.cmd file. The avtar.cmd file is located in c:\program files\avs\var\ if the default installation location was selected during install. However, when I tried to do a restore of several files and folders I saw these errors in the job log:

WARN: <0000> ntsecurity error:Unable to reset security on pre-existing directory "%s" during restore "C:\Documents and Settings\srvsandtm\Desktop", LastError=87 (code 87: The parameter is incorrect)

I looked through the avtar.exe command line again and found another related parameter called --existing-dir-aclrestore=true. After much experimentation I found out that this parameter restores the files inside the folders, and the security of the folder itself. If the files inside had their security modified, but they exist at the time of the restore, then only the ACL of the folder is restored.

I still got the same error stated above, but it did not have any effect on restoring the ACL.

So in summary, if you want to restore folder ACL and file ACL (security), then use --restore-acls-only=true. Only those folders and files that exist will have their ACL restored. If a file or folder does not exist, then it is not restored. If you want only the folder ACL restored but don't want the file ACL touched, then use --existing-dir-aclrestore=true. During a regular restore, that is, with no parameters, if a folder exists then its ACL is not restored.