Mixed-up sorting

Status
Not open for further replies.

Daedelus (Technical User, US), Aug 14, 2002
When I sort files on my workstation, the following two entries come out in this order:
M22759/16-2-9
M22759/16-22-9
On a remote Linux cluster that is being installed, they sort as:
M22759/16-22-9
M22759/16-2-9

I figured it was a collating-order difference, and sure enough, when I checked, the default locale on my workstation is en_US, while on the Linux cluster it is en_US.UTF-8.

Unfortunately, when I switch either machine to the other's locale, each one still sorts the same way it did before.
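For anyone trying to reproduce this, here is a minimal shell sketch that feeds the two sample entries to sort once per locale named on the command line. It sets LC_ALL rather than LANG because POSIX gives LC_ALL precedence over both LC_COLLATE and LANG; if a login profile already exports LC_ALL or LC_COLLATE, changing only LANG has no effect on collation, which is one possible reason a locale switch seems to do nothing. The helper name compare_locales and the locale names in the last line are assumptions; run `locale -a` on each machine to see the exact spelling (glibc often lists en_US.utf8).

```shell
#!/bin/sh
# compare_locales (hypothetical helper): sort the two sample entries from
# this thread once for each locale name given as an argument.
compare_locales() {
  for loc in "$@"; do
    echo "== $loc =="
    # LC_ALL overrides LC_COLLATE and LANG, for this one command only.
    printf '%s\n' 'M22759/16-2-9' 'M22759/16-22-9' | LC_ALL="$loc" sort
  done
}

# Show the settings currently in effect before overriding them.
echo "LANG=${LANG-} LC_COLLATE=${LC_COLLATE-} LC_ALL=${LC_ALL-}"

# Assumed locale names; an uninstalled name typically falls back to C.
compare_locales C en_US.UTF-8
```

In the C locale the entries come out as -2-9 then -22-9, because '-' (0x2D) is lower than '2' (0x32) at the first position where the two lines differ.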

The only other difference I have found so far is that the cluster's default login shell is /usr/bin/ksh, whereas on my workstation it is /bin/ksh. But changing shells doesn't make a difference either.

Does anyone have another idea as to why the sorting order would be different?
 
What happens when you try this?
Code:
LC_COLLATE=C_C.C sort /path/to/yourfile

Hope This Helps
PH.
 
Also, how are you sorting the files? Simply using ls, or are you piping it through sort?

Annihilannic.
 
Annihilannic: Forgive me for not being clear in my original post. PHV has it right. I am trying to sort the contents of files, not the filenames.

PHV: That works: both the Linux cluster and my workstation sort in the same fashion (-2-9 before -22-9) for the C_C.C locale. Thanks!

That solves my immediate problem, but can anyone explain to me why the two would sort differently for the same en_US locale?
 
Presumably a bug! I would guess it is either a difference in the locale definitions or a bug in the way sort handles the locale information.

Annihilannic.
 
I did some checking. The Linux cluster does not have a C_C.C locale, so I assume that when I tried it, it defaulted to the C locale. That worked fine anyway.
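Because an unknown locale name is usually replaced by the C locale without any error, which is what the poster assumes happened with C_C.C here, it can help to confirm that a name is installed before relying on it. A minimal sketch, assuming `locale -a` is available (it is on glibc-based Linux and on AIX); has_locale is a hypothetical helper name, and it matches only the exact spelling you pass.

```shell
#!/bin/sh
# has_locale (hypothetical helper): succeed only if the exact locale name
# appears as a whole line in the list printed by `locale -a`.
has_locale() {
  # -F: literal match (so '.' is not a regex wildcard), -x: whole line, -q: quiet
  locale -a 2>/dev/null | grep -Fqx -- "$1"
}

for loc in C C_C.C en_US.UTF-8 en_US.utf8; do
  if has_locale "$loc"; then
    echo "$loc: installed"
  else
    echo "$loc: not installed (sort would typically fall back to C)"
  fi
done
```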

The AIX operating system on my workstation has only a few locales (C, POSIX, and 4 variations of en_US), all of which have the same collation order for ordinary characters:

!"#$%&'()*+,-./
0123456789
:;<=>?@
ABCDEFGHIJKLMNOPQRSTUVWXYZ
[\]^_`
abcdefghijklmnopqrstuvwxyz
{|}~

The Linux cluster, which has the C, POSIX, and many en_... locales, uses three different collation orders for ordinary characters (I didn't check for differences in the non-graphic or special characters):

C and POSIX use the same order as above.
en_CA and en_CA.utf8 use the collation order:

_-,;:!?/.`^~'"()[]{}@$*\&#%+<=>|
0123456789
AaBbCcDdEeFfGgHhIiJjKkLlMmNnOoPpQqRrSsTtUuVvWwXxYyZz

All the remaining en_... locales use the order:

`^~<=>|_-,;:!?/.'"()[]{}@$*\&#%+
0123456789
aAbBcCdDeEfFgGhHiIjJkKlLmMnNoOpPqQrRsStTuUvVwWxXyYzZ
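These lists do not by themselves explain the original pair, since '-' and '/' precede the digits in every order shown, so a plain character-by-character comparison would always put -2-9 first. One plausible explanation, offered as an assumption rather than a confirmed diagnosis: glibc's en_US collation compares strings in several passes and, at least in versions of that era, ignores punctuation such as '/' and '-' on the first pass. Comparing only letters and digits, M227591629 against M2275916229, the first difference is 9 versus 2, so -22-9 sorts first. The sketch below models only that first pass, by sorting on a key with all non-alphanumeric characters removed; first_pass_sort is a hypothetical name, and the model ignores glibc's later passes (case, accents, punctuation tie-breaks).

```shell
#!/bin/sh
# first_pass_sort (hypothetical helper): rough model of a collation that
# ignores punctuation on its first pass. Each line is decorated with a key
# holding only its letters and digits, sorted by that key in byte order,
# and then undecorated.
first_pass_sort() {
  tab=$(printf '\t')
  while IFS= read -r line; do
    key=$(printf '%s' "$line" | LC_ALL=C tr -cd '[:alnum:]')
    printf '%s\t%s\n' "$key" "$line"
  done | LC_ALL=C sort -t "$tab" -k1,1 -k2 | cut -f2-
}

echo '-- plain byte order (C locale) --'
printf '%s\n' 'M22759/16-2-9' 'M22759/16-22-9' | LC_ALL=C sort
echo '-- punctuation ignored on the first pass --'
printf '%s\n' 'M22759/16-2-9' 'M22759/16-22-9' | first_pass_sort
```

The first listing reproduces the workstation's order and the second reproduces the cluster's, which is consistent with (though not proof of) the punctuation-ignoring explanation.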

I've tried searching for a site that will tell me what the standard definition of en_US ought to be, so I can figure out whether AIX or Linux is set up wrong, but I've not been successful. Based on the little I have found, it looks like the AIX installation is the flawed one, but I'm not sure.

Since the main reason I need the file sorted is so I can use "look" for fast lookups, and "look" ignores the locale and expects the standard ASCII collation order, I will always sort using the C or POSIX locale.
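A minimal sketch of that workflow, assuming (as described above) a `look` that expects plain byte/ASCII order; the helper build_lookup_file and the file names parts.txt and parts.sorted are hypothetical. Forcing LC_ALL=C on the sort makes the sorted file independent of whatever locale a login profile sets on either machine.

```shell
#!/bin/sh
# build_lookup_file (hypothetical helper): sort a data file in byte order
# so that `look` can binary-search it.
#   $1 = unsorted input file, $2 = sorted output file
build_lookup_file() {
  LC_ALL=C sort -o "$2" "$1"
}

# Example input: the two entries from this thread, in the "wrong" order.
printf '%s\n' 'M22759/16-22-9' 'M22759/16-2-9' > parts.txt
build_lookup_file parts.txt parts.sorted

# Prefix lookup, run only if look is installed on this system.
if command -v look >/dev/null 2>&1; then
  LC_ALL=C look 'M22759/16-2' parts.sorted
fi
```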

Thanks for your interest!
 