rsync performance

Status
Not open for further replies.

jpadie

Technical User
Nov 24, 2003
10,094
FR
I am running the Ubuntu 11.04 live CD on a generic 2.5GHz Core 2 Duo with 4GB RAM.

I am having some interesting issues with rsync on which I'd love some advice.

Basically, I am trying to clone a 2TB hard drive (completely full but for 10GB). The source drive is formatted with HFS+. The files are typically films and shows, between 500MB and 2GB in size. Both drives are SMART verified.

I originally tried this via rsync on a Windows machine (where the target drive will eventually live). The source and target drives are connected to the motherboard via SATA 300 connections. They are identical drives spinning at 7200 rpm.

The target drive is formatted with NTFS (some files are too big for FAT32, and I need junction points, so NTFS became necessary anyway).

I gave up on Windows as only 500GB had been transferred after 24 hours.

I then moved to the Linux solution I am currently using. The first test was a full disk clone using dd. This was fast: I completed the clone in about 6 hours and averaged 86MB/s. This would have resulted in an HFS+ disk, which would have been OK to use in Windows but suboptimal. Both source and target were, of course, unmounted at the time.

However, the target was not mountable in the end (missing superblock), and it also reported itself to be full afterwards. So something was going wrong with dd.
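As a sanity check on the figures quoted above, here is a minimal Python sketch of the arithmetic. It assumes decimal units (1 GB = 10^9 bytes) and a drive of exactly 2 TB; the function names are made up for illustration.

```python
def throughput_mb_per_s(bytes_moved: float, hours: float) -> float:
    """Average throughput in decimal megabytes per second."""
    return bytes_moved / (hours * 3600) / 1e6


def hours_to_copy(total_bytes: float, mb_per_s: float) -> float:
    """Hours needed to move total_bytes at a steady rate in MB/s."""
    return total_bytes / (mb_per_s * 1e6) / 3600


GB = 1e9  # assumption: decimal gigabyte
TB = 1e12  # assumption: decimal terabyte

# 500GB in 24 hours (the Windows rsync attempt)
windows_rate = throughput_mb_per_s(500 * GB, 24)

# 2TB at the 86MB/s average reported for dd
dd_hours = hours_to_copy(2 * TB, 86)

print(f"Windows rsync: {windows_rate:.1f} MB/s")   # 5.8 MB/s
print(f"dd at 86 MB/s: {dd_hours:.1f} h for 2 TB")  # 6.5 h
```

So the dd figure is consistent with "about 6 hours", and the Windows attempt was running at well under a tenth of that rate.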

I then tried rsync on the mounted partitions. In the past, rsync'ing these drives on a Mac (HFS+ -> HFS+) was fast: about seven hours, and subsequent rsyncs took a few minutes.

This time (HFS+ -> NTFS) the sync took about 24 hours. I then launched the process again more than two hours ago and it has not yet even finished the file comparison.

This is of concern as I intend to use these two drives in a WAN sync environment where I expect about 1GB of change per week. Across ADSL this will be slow, but if the file comparison takes more than an hour over directly connected drives, I don't stand a hope across the WAN.

So ... my questions are:

1. Is this kind of behaviour expected?
2. Is there an issue with NTFS whereby an rsync would take such a long time (I know that 8.3 filenames are created, but surely that should not more than triple the transfer time)?
3. Is there a better way of keeping these two drives in sync across a WAN? (In fact it is iTunes libraries that reside on these drives, but the iTunes API only works on entire files: no delta copy, no restart, etc.)

Some info is pasted below, just in case it is helpful.

Code:
top - 12:57:12 up 1 day,  3:50, 10 users,  load average: 0.17, 0.21, 0.22
Tasks: 153 total,   2 running, 149 sleeping,   1 stopped,   1 zombie
Cpu(s): 28.1%us, 33.1%sy,  0.0%ni, 35.9%id,  1.8%wa,  0.0%hi,  1.2%si,  0.0%st
Mem:   3085340k total,  3072040k used,    13300k free,   760568k buffers
Swap:        0k total,        0k used,        0k free,  1962848k cached
Code:
  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND            
 5882 root      20   0  5244 1548  796 R   53  0.1 787:02.93 mount.ntfs         
 6904 root      20   0 48644 2660  416 S   33  0.1  32:34.33 rsync              
 6902 root      20   0 48696 6468  784 S   30  0.2  29:10.06 rsync
 
What was your rsync command-line? Does --size-only speed it up at all? Does -vv give you any clues about what is taking the time?
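For anyone wanting to try those two flags without committing to a full transfer, here is a rough Python sketch that only builds the command line. The mount points /mnt/hfs/ and /mnt/ntfs/ and the helper name are assumptions, not taken from the thread; -r, -v, --size-only and --dry-run are standard rsync options.

```python
import shlex


def build_rsync_command(src, dst, size_only=False, verbosity=2, dry_run=True):
    """Return an rsync argument list; nothing is executed here."""
    cmd = ["rsync", "-r"]  # recursion only, as described in this thread
    if verbosity > 0:
        cmd.append("-" + "v" * verbosity)  # verbosity=2 gives -vv
    if size_only:
        cmd.append("--size-only")  # compare file sizes only, ignore mtimes
    if dry_run:
        cmd.append("--dry-run")  # report what would be done, copy nothing
    cmd += [src, dst]
    return cmd


# Hypothetical mount points for the HFS+ source and NTFS target
cmd = build_rsync_command("/mnt/hfs/", "/mnt/ntfs/", size_only=True)
print(shlex.join(cmd))
```

The resulting list can be passed to subprocess.run(cmd) once it looks right. A dry run still walks both trees, so it should show how long the comparison phase alone takes.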

Annihilannic
 
Thanks for posting

I cannot use --size-only, as metadata may change in a file without affecting its overall size.

I am very loath to interrupt the current process to add verbosity to the rsync, as that might put me back to square one on the transfer time.

The only option used this time around was recursion.

However, I am not really looking for help with the rsync command, but for information as to whether this behaviour is expected and/or whether there are known issues with syncing across filesystem boundaries (in particular to NTFS, with all its hangups).

On a more general level, if people have input on question 3 above, it would be great to hear.
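For context on what the "file comparison" phase does: by default rsync uses a quick check that skips a file when its size and modification time match on both sides. The rsync man page also notes that if times are not preserved (no -t or -a), the next run behaves as if -I had been given and updates every file. With recursion as the only option, that may be a factor here, though nothing in this thread confirms it. Below is a minimal Python sketch of that quick-check logic; it is an illustration, not rsync's actual code, and the function and parameter names are invented.

```python
import os
import tempfile


def needs_transfer(src_path, dst_path, modify_window=0.0, size_only=False):
    """Approximate rsync's quick check: compare size, then mtime."""
    try:
        dst = os.stat(dst_path)
    except FileNotFoundError:
        return True  # missing on the target: must be copied
    src = os.stat(src_path)
    if src.st_size != dst.st_size:
        return True
    if size_only:
        return False  # mirrors --size-only: mtime is ignored
    # mirrors --modify-window: tolerate small timestamp differences
    return abs(src.st_mtime - dst.st_mtime) > modify_window


with tempfile.TemporaryDirectory() as tmp:
    a = os.path.join(tmp, "a.bin")
    b = os.path.join(tmp, "b.bin")
    for path in (a, b):
        with open(path, "wb") as fh:
            fh.write(b"x" * 1024)

    # Same size, same mtime: the quick check skips the file
    os.utime(a, (1_000_000, 1_000_000))
    os.utime(b, (1_000_000, 1_000_000))
    same_times = needs_transfer(a, b)

    # Same size, but the target mtime was not preserved: flagged again
    os.utime(b, (2_000_000, 2_000_000))
    different_times = needs_transfer(a, b)
    size_only_result = needs_transfer(a, b, size_only=True)

print(same_times, different_times, size_only_result)  # False True False
```

If unpreserved timestamps do turn out to be the cause, adding -t (or -a) to the rsync options would be the usual remedy, and it keeps the mtime check that --size-only would give up.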
 
Yes, it is to be expected... the initial sync takes more time than any subsequent syncs...

Also note that if the NTFS-formatted drive uses anything other than a 4k cluster (block size), it will cause more writes and more computation when writing the files...



Ben
"If it works don't fix it! If it doesn't use a sledgehammer..."
How to ask a question, when posting them to a professional forum.
Only ask questions with yes/no answers if you want "yes" or "no"
 
Ben
This was not the first sync that was taking so long. My concern was that simply exchanging the file diff was taking a long time.

Since then I have experimented more in a variety of OSes and have determined that the issue appears to be with crossing file system boundaries into NTFS. FAT32 is quick, as is exFAT. NTFS is just slow, probably because of the ACL and 8.3 filename requirements (which I could turn off, I guess).

Also, the drives had compression enabled. Turning this off has helped marginally.

Bottom line is that I have abandoned NTFS, as it is just not compatible with my requirements.
 
Just a suggestion: maybe you should switch the NTFS driver to NTFS-3G...

PS: Tuxera Ltd. (creator of the NTFS-3G driver) began cooperating with Microsoft in 2009, which gave them access to the API, including access to the exFAT FS...



 
Thanks. I was using the NTFS-3G driver in Linux, and the native driver, obviously, in Windows. Speed was poor in both, which leads me to believe that crossing file formats is intrinsically slow. The problem manifests only when crossing FS boundaries.
 