Dec 11, 2004 #1 rkdash Technical User Jul 15, 2004 14 US
Hi, I have a large ASCII file (3 columns, > 2 GB). I want to resample it by selecting, for example, every 5th line. How can I do this? Thanks
Dec 11, 2004 #2 futurelet Programmer Mar 3, 2004 539 US
[tt] awk '!(NR%5)' file [/tt]
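For anyone wondering which lines that one-liner keeps: NR%5 is zero on lines 5, 10, 15, ..., so !(NR%5) is true there and awk's default action prints the line. A quick check on a numbered sample, where the seq output is only a stand-in for the real file (my assumption, not the poster's data):

```shell
# Stand-in input: the numbers 1..20, one per line.
every_fifth=$(seq 1 20 | awk '!(NR%5)')
echo "$every_fifth"    # keeps lines 5, 10, 15, 20

# Variant that starts counting at the first line instead
# (keeps lines 1, 6, 11, 16).
from_first=$(seq 1 20 | awk 'NR%5==1')
echo "$from_first"
```

Change the 5 to whatever sampling interval is needed.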
Dec 11, 2004 #3 olded Programmer Oct 27, 1998 1,065 US
I know this is an awk forum, but I'll bet the sed version is faster on a 2-gig file:
sed -n 'n;n;n;n;p' file
Dec 11, 2004 Thread starter #4 rkdash Technical User Jul 15, 2004 14 US
Thanks futurelet and olded. I tried with:
sed -n '1~5p' infile > outfile
and it worked.
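One thing to watch when comparing the three commands in this thread: they do not all keep the same lines. The first~step address is a GNU sed extension, and 1~5 starts at line 1, while the awk and n;n;n;n;p versions keep lines 5, 10, 15, and so on. A small sketch, assuming GNU sed is installed and again using seq as stand-in input:

```shell
# GNU sed only: first~step addressing.
gnu_from_first=$(seq 1 20 | sed -n '1~5p')    # lines 1, 6, 11, 16
gnu_every_fifth=$(seq 1 20 | sed -n '0~5p')   # lines 5, 10, 15, 20

# Portable form from earlier in the thread: skip four lines, print the fifth.
portable=$(seq 1 20 | sed -n 'n;n;n;n;p')     # lines 5, 10, 15, 20

echo "$gnu_from_first"
echo "$gnu_every_fifth"
echo "$portable"
```

For resampling a big data file either offset is probably fine; it only matters if the output has to line up with the awk result.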
Dec 11, 2004 #5 futurelet Programmer Mar 3, 2004 539 US
> I know this is an awk forum, but I'll bet the sed version is faster on a 2-gig file:
Why?
Dec 12, 2004 #6 olded Programmer Oct 27, 1998 1,065 US
>Why?
Because sed has a smaller footprint than awk. Of course, that's just a guess on my part; you'd have to test it. If this problem were a one-time-only deal, I wouldn't worry about it.
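A rough way to run that test is sketched below. It is only a sketch: the 200000-line, 3-column sample file and the run_timed helper are made up for illustration, and date +%s only resolves whole seconds, so the line count needs to be raised a lot (or the real file used) before the numbers mean anything. Results will also depend on which awk and sed implementations are installed.

```shell
# Build a hypothetical 3-column sample file (not the poster's data).
sample=$(mktemp)
awk 'BEGIN { for (i = 1; i <= 200000; i++) print i, i * 2, i * 3 }' > "$sample"

# run_timed LABEL COMMAND...: run COMMAND on the sample file, save its
# output to "$sample.LABEL", and report elapsed wall-clock seconds.
run_timed() {
    label=$1
    shift
    start=$(date +%s)
    "$@" "$sample" > "$sample.$label"
    end=$(date +%s)
    echo "$label: $((end - start)) s"
}

run_timed awk awk '!(NR%5)'
run_timed sed sed -n 'n;n;n;n;p'

# Both commands should keep exactly the same lines.
cmp "$sample.awk" "$sample.sed" && echo "outputs match"
```

Run it a few times, since the first pass also pays for reading the file into the disk cache. The temporary files can be removed afterwards with rm "$sample" "$sample.awk" "$sample.sed".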
Dec 12, 2004 #7 futurelet Programmer Mar 3, 2004 539 US
I'm not sure what you mean by footprint. On my system, sed.exe is 1,060,864 bytes; one version of mawk is 67,072 bytes (almost a 16:1 ratio). Awk spends some time parsing every line into fields, but that time is probably negligible compared to the time needed to read and write the line.
Dec 12, 2004 #8 olded Programmer Oct 27, 1998 1,065 US
Hmmm... That's interesting. On my Solaris 7 system, here are the sizes:
76408 Sep 1 1998 /bin/awk
104656 Feb 15 2002 /bin/nawk
24796 Feb 28 2002 /bin/sed
As I said, it would be an interesting technical exercise to see which is faster. I wish I had the time ....
How about that? Two guys are arguing about whose "tool" is smaller. -) -)
Ed
Dec 12, 2004 #9 futurelet Programmer Mar 3, 2004 539 US
Thanks for the info. I was wondering whether sed was really so complex that it had to be that huge. The version I use under DOS is
GNU sed version 4.0.7 (DJGPP port 2003-09-09 (r1))
It must be grossly bloated due to linked-in libraries.