Backing up Volumes with Millions of files - Suggestions?

Status
Not open for further replies.

markey164 (Technical User)
Apr 22, 2008
Hi All,

Looking for your thoughts on this scenario.

At our site, we have a volume containing ~20,000 user home folders totalling ~30 million files. The volume is on a NetApp filer, which backs up via NDMP to a tape library with four LTO-4 tape drives.

All other volumes on the NAS back up fine (and fast) with a throughput of 100-200GB/hr, but as soon as the backups reach the above volume, performance is dire.

We had a single subclient for the volume to begin with, and when the job started it would sit there for at least 12 hours, simply scanning the volume before even beginning to back it up. The throughput would often be only around 5GB/hr and rarely went above 50-60GB/hr, obviously caused by the sheer number of small files it is handling.

The volume has 26 subfolders representing the surnames of the users. We've now set up a subclient per folder to give us a better idea of the breakdown of each folder in our reports. However, it still takes 4-5 hours to scan many of the folders, and the throughput is still only a few GB/hr on average. Also, for some reason the overall backup window for this volume is now longer since we split it into separate subclients per folder.

We are looking at implementing synthetic fulls; however, whilst this will address the *full* backups, it seems likely the incrementals are still going to run into a two-day window because of the sheer number of files they still have to scan.
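To put the scan problem in numbers, a rough back-of-envelope calculation (using the ~30 million files and ~12 hour scan phase quoted above; the figures come from this thread, the arithmetic is only illustrative) shows why incrementals stay slow even when little data has changed:

```python
# Rough scan-rate estimate from the figures quoted in this thread:
# ~30 million files enumerated in ~12 hours before backup even starts.
files = 30_000_000
scan_seconds = 12 * 3600

rate = files / scan_seconds
print(f"Scan rate: ~{rate:.0f} files/s")  # ~694 files/s

# An incremental still has to enumerate every file to find the changed
# ones, so even if only 1% of files changed, the scan time is the same.
```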

Does anyone else have problems backing up volumes with millions of files, and what steps have you taken to improve the situation?

I'm interested in any thoughts/comments/suggestions on this subject.
 
Sorry for the delayed response, guys, been on holiday :eek:)

@ Cabraun - Yes, we have considered synthetic fulls. However, the scan phase itself seems to be 50% of the problem. Even if we move to synthetic fulls, the job will still have to scan 30 million files regardless of whether it is an incremental or a full, so a lengthy delay remains. Whilst synthetics will help somewhat, I'm not sure this is the optimum solution.

Interesting that your incremental only takes 20 minutes for 16 million images. I wonder if the difference here is something to do with ours being NDMP, rather than a regular server backup?

@Psy053

* No, I haven't run any performance testing. What would you advise in this scenario with a NetApp filer?

* No AV software involved. This scenario is backing up a NetApp filer; it runs the proprietary Data ONTAP OS and doesn't have any AV software on it.
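For basic performance testing on the filer itself, one option is to watch the filer's own counters during the backup window. A sketch, assuming console access to a 7-mode Data ONTAP filer (commands and options vary by ONTAP version, and the volume name below is hypothetical):

```shell
# Run on the filer console while the NDMP job is in its scan phase.
# One-second samples of CPU, protocol ops, disk and tape throughput:
sysstat -x 1

# Check space and, more importantly here, inode usage on the volume
# (vol_users is a placeholder name):
df -h /vol/vol_users
df -i /vol/vol_users
```

If the disks show high utilisation while tape throughput stays low during the scan phase, that points at metadata walking rather than data movement as the bottleneck.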

@ standiferl - Hmm, searching 'direct to disk' in the CommVault Books Online only returns two results, neither of which is relevant. I've heard of the feature but can't seem to find any documentation on it. Is it perhaps known by another name? I know CommVault change the names of their features from time to time.

In answer to your query about the number of disks in the volume in question: it's a 7.25 TB volume, but I'm not sure how many disks it's using, other than saying it's obviously several. I don't directly look after and maintain the NetApp, and I'm not sure how to determine how many disks it is using.
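For anyone in the same position: whoever does administer the filer can list the disks behind a volume from the console. A sketch for 7-mode Data ONTAP (the volume and aggregate names are hypothetical; output varies by version):

```shell
# FlexVol: find the containing aggregate, then list its disks
vol status vol_users        # shows the containing aggregate, e.g. aggr1
aggr status -r aggr1        # RAID layout: every disk in that aggregate

# Traditional (non-flexible) volume: list its disks directly
vol status -r vol_users
```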

@ CraigMcGill - I've had a read through the Data Classification Enabler documentation, but it only talks about Windows iDataAgents. Can you confirm for sure whether DCE can be used with NDMP? The documentation doesn't mention NDMP either way.

@Calippo - Regarding synthetics, I'm not sure this will entirely solve our problem, as per my reply to Cabraun above. Regarding DCE, the documentation only talks about Windows iDataAgents, so I'm not sure if you can use this with NDMP?

The image-level iDA looks interesting. I've not used this before, but I'll look into it.

Thank you for all the responses so far, guys. Hopefully this discussion will help others in a similar situation ;o). We'll keep experimenting and discussing with our consultants, and report back any progress or solutions we find.
 
Sorry Markey

I didn't see any reference to a NAS client, my mistake.
With generic NDMP you can do incrementals and fulls.
If this NAS were a Celerra then, rather than the PAX method using dump, you could do VBB instead, which is block-level and supported by CommVault, but this won't apply for you as your NAS is a NetApp.

I don't know of any other NDMP block-level backup supported via CommVault, but perhaps NetApp have something similar. CommVault have excellent support for both EMC and NetApp.

regards

---------------------------------------
EMCTA & EMCIE - Backup & Recovery
Legato & Commvault Certified Specialist
MCSE
 
It looks like you have three options.

First option: persist with NDMP, but consider creating qtrees inside /vol/data so that the files to be backed up are logically partitioned. This will be much faster, as the filer won't need to walk the filesystem to discover files to be sent to tape; it can simply use the inode table attached to a given qtree. Qtrees have the same appearance as regular directories for users, so they won't notice any difference or need to make any share configuration changes. However, you can't create qtrees "in place" for existing directories like /vol/data/groups. You will need to create the qtrees and then move the files into them.
Using qtrees you should see a marked reduction in the time for NDMP Phase I to complete.
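As a sketch of the qtree approach (7-mode syntax; the volume and qtree names are hypothetical, and as noted above the existing files must be moved into the qtrees afterwards, e.g. over NFS/CIFS from an admin host):

```shell
# On the filer console: create one qtree per surname letter in /vol/data
qtree create /vol/data/a
qtree create /vol/data/b
# ... and so on through to z

# Qtrees appear as ordinary directories to CIFS/NFS clients, so shares
# and user paths keep working once the files are moved in.
```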

Second option: bring up a Windows or Unix iDataAgent and map the CIFS/NFS shares from the FS iDataAgent. With this method you can do a synthetic full backup policy (which you can't with NDMP); then the only issue is the scan time, which would be offset by the savings you gain from doing incremental backups.

I recently set up a Windows FS iDA to back up a Celerra. The NAS has 3TB of data and ten CIFS shares, over Gigabit Ethernet. We configured a subclient for each share and defined synthetic full backups. The reason we did this rather than NDMP was that we wanted to single-instance the backups to disk via the SIS-enabled storage policy and also have the option of CI support. This works great, by the way, and a lot of other NAS users do this instead of NDMP.

Third option: you could do SnapMirror to Tape, which is a nice NetApp solution to the problem, but it would seem it's not integrated with the CommVault NDMP client.
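For reference, SnapMirror to Tape in 7-mode is driven from the filer console, along these lines (a sketch only; the volume and tape device names are hypothetical, and because this runs outside CommVault, restores and cataloguing are filer-side too):

```shell
# Write a block-level image of the volume to a local tape drive:
snapmirror store vol_users rst0a

# Restore that image from tape into a restricted destination volume:
snapmirror retrieve vol_users_restore rst0a
```

Because it copies blocks rather than walking files, the per-file scan penalty discussed above disappears, at the cost of losing per-file restore granularity from the backup application.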

That's the end of my MindMeld.

---------------------------------------
EMCTA & EMCIE - Backup & Recovery
Legato & Commvault Certified Specialist
MCSE
 
