Hi All,
Looking for your thoughts on this scenario.
At our site, we have a volume containing ~20,000 user home folders totalling ~30 million files. The volume is on a NetApp filer, which backs up via NDMP to a tape library with four LTO-4 tape drives.
All other volumes on the NAS back up fine (and fast), with a throughput of 100-200GB/hr, but as soon as the backups reach the above volume, performance is dire.
We had a single subclient for this volume to begin with, and when the job started it would sit there for at least 12 hours simply scanning the volume before it even began backing anything up. Throughput would often be only around 5GB/hr and rarely went above 50-60GB/hr, which is obviously down to the sheer number of small files it has to handle.
The volume has 26 subfolders, one per initial letter of the users' surnames. We've now set up a subclient per folder to give us a better idea of the breakdown of each folder in our reports. However, it still takes 4-5 hours to scan many of the folders, and the throughput is still only a few GB/hr on average. Also, for some reason, the overall backup window for this volume has actually got longer since we split it into separate subclients per folder.
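For anyone curious about the sort of per-folder breakdown we're after, a rough Python sketch along the lines below (run against an NFS mount of the volume on an admin host; the mount point is purely a placeholder, not our real path) will tally file counts and sizes per surname folder. It's just a sizing aid, not part of the backup itself:

#!/usr/bin/env python3
# Rough sketch: tally file count and total size per top-level surname folder.
# Assumes the home volume is NFS-mounted on an admin host; VOLUME_ROOT is a
# placeholder, not our actual mount point.
import os

VOLUME_ROOT = "/mnt/home_volume"  # placeholder mount point

def tally(folder):
    """Walk one surname folder and return (file_count, total_bytes)."""
    count, total = 0, 0
    for dirpath, _dirnames, filenames in os.walk(folder):
        count += len(filenames)
        for name in filenames:
            try:
                total += os.path.getsize(os.path.join(dirpath, name))
            except OSError:
                pass  # skip files that disappear or can't be stat'ed
    return count, total

for entry in sorted(os.listdir(VOLUME_ROOT)):
    path = os.path.join(VOLUME_ROOT, entry)
    if os.path.isdir(path):
        files, size = tally(path)
        print(f"{entry}: {files} files, {size / 2**30:.1f} GiB")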
We are looking at implementing synthetic fulls; however, whilst this will address the *full* backups, it seems likely the incrementals will still run into a two-day window because of the sheer number of files they have to scan.
Does anyone else have problems backing up volumes with millions of files, and if so, what steps have you taken to improve the situation?
I'm interested in any thoughts/comments/suggestions on this subject.