Why is sort -k not working all the time?

Status
Not open for further replies.

kristo5747

Programmer
Mar 16, 2011
41
US
I have a script that loads two lists of file names into separate arrays:

First, I get a file list from a ZIP file and fill `FIRST_Array()` with it. Second, I get a file list from a control file within the ZIP file and fill `SECOND_Array()` with it.

Code:
                while read length date time filename 
                do
                        FIRST_Array+=( "$filename" )
                        echo "$filename" >> FIRST.report.out
                done < <(/usr/bin/unzip -qql AAA.ZIP |sort -k11 -t~)
Third, I compare both arrays like so:

Code:
    diff -q <(printf "%s\n" "${FIRST_Array[@]}") <(printf "%s\n" "${SECOND_Array[@]}") |wc -l
I can tell why `diff` fails because I also write each array to a file: `FIRST.report.out` and `SECOND.report.out` are simply not sorted the same way.

1) FIRST.report.out (what's inside the ZIP file)


Code:
JGS-Memphis~AT1~Pre-Test~X-BanhT~JGMDTV387~6~P~1100~HR24-500~033072053326~20120808~240914.XML
JGS-Memphis~PRE~DTV_PREP~X-GuinE~JGMDTV069~6~P~1100~H24-700~033081107519~20120808~240914.XML
JGS-Memphis~PRE~DTV_PREP~X-MooreBe~JGM98745~40~P~1100~H21-200~029264526103~20120808~240914.XML
JGS-Memphis~FUN~Pre-Test~X-RossA~jgmdtv168~2~P~1100~H21-200~029415655926~20120808~240914.XML
2) SECOND.report.out (what's inside the ZIP's control file)

Code:
JGS-Memphis~AT1~Pre-Test~X-BanhT~JGMDTV387~6~P~1100~HR24-500~033072053326~20120808~240914.XML
JGS-Memphis~FUN~Pre-Test~X-RossA~jgmdtv168~2~P~1100~H21-200~029415655926~20120808~240914.XML
JGS-Memphis~PRE~DTV_PREP~X-GuinE~JGMDTV069~6~P~1100~H24-700~033081107519~20120808~240914.XML
JGS-Memphis~PRE~DTV_PREP~X-MooreBe~JGM98745~40~P~1100~H21-200~029264526103~20120808~240914.XML
Using sort -k11 -t~ made sense to me, since ~ is the field delimiter and the file's date is the 11th field. But it does not work consistently.

The sorting gets worse when my script processes bigger ZIP files. Why is sort -k not working all the time? How can I sort both arrays?
 
Why not simply this?
done < <(/usr/bin/unzip -qql AAA.ZIP | sort)

FYI, your sort -k11 -t~ is working as expected, i.e. sorting on the first 11 characters ...

Hope This Helps, PH.
 
I believe sort -k11 does reference the 11th field, but it means "from the 11th field to the end of the line"; sort -k 11,11 would limit the sort key to the 11th field only. I think your main problem is that the data from the 11th field onwards is not unique (as even your small set of example data shows), so the key alone does not determine the sort order.
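
To illustrate the point about non-unique keys: as far as I know, a POSIX sort breaks ties between equal keys by comparing the whole line, so any columns in front of the file name (such as the size column in an unzip listing) can end up deciding the order. A minimal sketch, where the size values and the `listing` helper are invented for illustration and the names come from the example data above:

```shell
# Hypothetical "size name" lines; the sizes are made up for this sketch.
listing() {
  printf '%s\n' \
    ' 100 JGS-Memphis~PRE~DTV_PREP~X-GuinE~JGMDTV069~6~P~1100~H24-700~033081107519~20120808~240914.XML' \
    ' 900 JGS-Memphis~FUN~Pre-Test~X-RossA~jgmdtv168~2~P~1100~H21-200~029415655926~20120808~240914.XML' \
    '  50 JGS-Memphis~AT1~Pre-Test~X-BanhT~JGMDTV387~6~P~1100~HR24-500~033072053326~20120808~240914.XML'
}

# Key = field 11 to end of line. It is identical on every line, so the tie
# is broken by comparing whole lines, which start with the size column.
by_key=$(listing | LC_ALL=C sort -t'~' -k11 | awk '{print $2}' | cut -d'~' -f2)

# Sorting only the file names gives an order that depends on the names alone.
by_name=$(listing | awk '{print $2}' | LC_ALL=C sort | cut -d'~' -f2)

echo "sorted with -k11 on the full listing:" $by_key   # AT1 PRE FUN
echo "sorted on the bare names:" $by_name              # AT1 FUN PRE
```

The two orders differ even though the set of names is the same, which would be consistent with the mismatch between the two report files.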

But PHV's first suggestion still stands; if the contents are expected to be identical it shouldn't matter what field you sort on as long as the sort key is unique enough to guarantee an identical sort order, so you may as well sort on the whole line.
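
A rough sketch of that idea, assuming both lists contain only the bare file names (no size, date or time columns) and no name contains a newline. The `first_list` and `second_list` helpers and their sample names are hypothetical stand-ins for the two arrays:

```shell
# Hypothetical stand-ins for the two file lists; in the script these would be
# printf '%s\n' "${FIRST_Array[@]}" and printf '%s\n' "${SECOND_Array[@]}".
first_list() {
  printf '%s\n' 'site~PRE~b.XML' 'site~FUN~c.XML' 'site~AT1~a.XML'
}
second_list() {
  printf '%s\n' 'site~AT1~a.XML' 'site~PRE~b.XML' 'site~FUN~c.XML'
}

# Sort the bare names on the whole line, in a fixed locale, on both sides.
sorted_first=$(first_list | LC_ALL=C sort)
sorted_second=$(second_list | LC_ALL=C sort)

if [ "$sorted_first" = "$sorted_second" ]; then
  result="same files"
else
  result="lists differ"
fi
echo "$result"   # same files
```

Sorting after the names have been extracted, rather than sorting the raw listing, keeps the extra listing columns from influencing the order.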

Annihilannic
 