Creating & Referencing Multidimensional Arrays

Status
Not open for further replies.

qwerty987

Technical User
Jun 3, 2004
6
GB
Have the following data:
minute, second, fraction_sec, type, ID, data1, data2, data3

I want to count the number of entries (rows) in each second for each ID in the file when, say, type==1 && data1==0. The IDs are hexadecimal (letters and numbers), so the only way I could think of was to keep a list of the IDs already seen in ID_list; when I print out the count array I then just go through that list.

Tried to do the counting using a 3d array:
Code:
{if ($4 == 1 && $6 == 0) {
	for(k=1; k <= ID_no; k++) {
		if(ID_list[k] == $5) {
			ID_exists = 1
			break
			}
		}
	if(ID_exists == 0) {
		ID_list[ID_no] = $5
		ID_no++
		}
	ID_exists = 0
	if(count[$5,$1,$2] == "")
		count[$5,$1,$2] = 1
	else
		count[$5,$1,$2]++
	}
}
However, when I try to print the count array by cycling through it with nested for loops, I get nothing for minutes 0 to 9 (for all seconds) and nothing for seconds 0 to 9 in all other minutes. Data does exist for these times!
0: 0,0,0,0,0,0,0,0,...
1: 0,0,0,0,0,0,0,0,...
etc...
9: 0,0,0,0,0,0,0,0,...
10: 0,0,0,0,0,0,0,0,0,0,7,5,32,2,5,...
11: 0,0,0,0,0,0,0,0,0,0,8,15,4,6,8,...

This is what I have been using to print out the data:
Code:
for(x = 1; x < ID_no; x++) {
	printf("ID: %s\n", ID_list[x])
	for (y = 0; y <= 59; y++) {
		printf("%d: ", y)
		for (z = 0; z <= 59; z++) {
			printf("%d,", count[ID_list[x],y,z])
		}
		printf("\n")
	}
}
Is AWK doing something with the single digit minutes and seconds?
What has really stumped me is I can create test scripts for 1 ID which work for the single digit minutes and seconds!?!

Many Thanks for any help.
 
Sorry, I should have posted some example data. Note that this is just a tiny part of the data. Each file represents 1 hour, so there is no hour field for the time.

minute,second,fraction_sec,type,ID,error_check,data1,data2,data3,data4
0,50,17171598,2,,,17000,49,,3320
0,50,17177944,2,,,39000,58,,7024
0,50,17303326,1,4010000000,0,21375,130,0,
0,50,17349201,2,,,-1200,31,,40
0,50,17356881,2,,,39000,60,,7024
0,50,17386657,1,400553,0,,184,11,
0,50,17421413,2,,,,30,,404
0,50,17434773,2,,,,39,,200
0,50,17489448,2,,,48400,88,,2234
0,50,17601249,2,,,,104,,7443
0,50,17683473,2,,,,36,,4400
0,50,17699233,2,,,,28,,2000
0,50,17702005,1,e4185b,1,-99900,39,0,
0,50,17717797,2,,,12700,126,,2340
0,50,17763880,1,20aba2,1,-99900,58,0,
0,50,17803589,2,,,,126,,4021
0,50,17810757,2,,,,44,,4
0,50,17819675,1,40097c,1,,91,11,
0,50,17852273,1,4006c8,0,11750,130,0,
0,50,17905707,1,1b39fd,1,11750,133,0,
0,50,17923864,1,400553,0,5700,197,0,
0,50,17939429,1,70b815,1,13325,89,0,
 
What output are you expecting from your sample data? I got

ID: 400553
0: 0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,2,0,0,0,0,0,0,0,0,0,
1: 0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,
.....
which seems correct, since both lines for id 400553 have 0 in field 1 and 50 in field 2. This gives a 2 in position 50 of line 0 and everything else 0.

The line

0,50,17303326,1,4010000000,0,21375,130,0,

was ignored because it is stored in row zero of the array and you start printing at row 1. Increment ID_no before storing ID_list if you want this line printed out.
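A minimal sketch of that fix, assuming the same comma-separated layout as the sample data. The helper name list_ids and the seen array are illustrative and not from the original script; the `in` test simply replaces the linear search through ID_list. Note that once the counter is incremented before storing, the print loop has to run to x <= ID_no, otherwise the last ID is dropped instead of the first.

```sh
# Hypothetical helper: list each ID once, in first-seen order, numbered from 1.
list_ids() {
    awk -F, '
    $4 == 1 && $6 == 0 {
        if (!($5 in seen)) {          # first time this ID appears
            seen[$5] = 1
            ID_list[++ID_no] = $5     # pre-increment, so the first ID lands in slot 1
        }
        count[$5, $1, $2]++           # an unset element counts as 0, so ++ is enough
    }
    END {
        for (x = 1; x <= ID_no; x++)  # <= so the last ID is printed too
            printf("%d: %s\n", x, ID_list[x])
    }'
}

printf '%s\n' \
    '0,50,17303326,1,4010000000,0,21375,130,0,' \
    '0,50,17386657,1,400553,0,,184,11,' \
    '0,50,17923864,1,400553,0,5700,197,0,' | list_ids
```

With these three sample lines, 4010000000 is listed as ID 1 and 400553 as ID 2.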

CaKiwi
 
The output is correct, but I have a problem when I process 100 MB files of this data. I can create a test file which works perfectly for the single-digit minutes and seconds (see below). But when I process the larger files I get nothing but strings of zeros for minutes 0-9, and for seconds 0-9 in all other minutes. Have I made some error in how these array entries are referenced (either in writing the data or reading it back)?

More test data:

minute,second,fraction_sec,type,ID,error_check,data1,data2,data3,data4
0,0,4342883,1,40095d,0,,,158,11
0,0,25507669,1,40095d,0,,13500,163,0
0,1,7382634,1,40095d,0,,13475,131,0
0,1,24847380,1,40095d,0,,13425,164,0
0,1,26479475,1,40095d,0,,12925,153,0
0,1,6351170,1,40095d,0,,12900,159,0
0,2,23973929,1,40095d,0,,,144,11
0,2,6603607,1,40095d,0,,12775,153,0
0,2,12607687,1,40095d,0,,12750,113,0
0,3,12675616,1,40095d,0,,12750,115,0
0,3,24755072,1,40095d,0,,12725,156,0
0,3,8059772,1,40095d,0,,12525,120,0
0,10,6652163,1,40095d,0,,12475,117,0
0,10,17283577,1,40095d,0,,12475,166,0
0,10,25602835,1,40095d,0,,12450,167,0
0,11,18056461,1,40095d,0,,,160,11
0,11,18891953,1,40095d,0,,12325,166,0
0,12,19275572,1,40095d,0,,,130,11
0,20,12733831,1,40095d,0,,12250,170,0
0,21,23393896,1,40095d,0,,,122,11
0,22,4129126,1,40095d,0,,12175,171,0
0,22,10818334,1,40095d,0,,12075,140,0
0,23,6720528,1,40095d,0,,12000,103,0
0,45,11619514,1,40095d,0,,11975,155,0
0,45,21553786,1,40095d,0,,11975,161,0
0,57,12764370,1,40095d,0,,11900,156,0
0,57,13976332,1,40095d,0,,11900,160,0
0,59,7613763,1,40095d,0,,11850,117,0
0,59,14058647,1,40095d,0,,11850,147,0
1,0,9784745,1,40095d,0,,,115,11
1,0,11364027,1,40095d,0,,,151,11
1,0,13087915,1,40095d,0,,11750,118,0
1,1,21691795,1,40095d,0,,11675,142,0
1,1,30742045,1,40095d,0,,11550,138,0
1,2,20863575,1,40095d,0,,,111,11
1,2,21675953,1,40095d,0,,11450,132,0
1,8,9797070,1,40095d,0,,11350,117,0
1,8,576598,1,40095d,0,,11250,131,0
1,9,22922284,1,40095d,0,,11200,138,0
1,9,27922895,1,40095d,0,,11175,135,0
1,10,12416520,1,40095d,0,,11150,103,0
1,10,19495784,1,40095d,0,,11150,112,0
1,11,19537531,1,40095d,0,,11150,127,0
1,12,1376349,1,40095d,0,,11100,109,0
1,13,199762,1,40095d,0,,,123,11
1,14,219293,1,40095d,0,,10575,128,0
1,15,5016538,1,40095d,0,,10550,131,0
1,16,17056730,1,40095d,0,,10550,114,0
1,17,20004379,1,40095d,0,,10400,84,0
1,18,16616507,1,40095d,0,,9550,115,0
 
Perhaps storing such a large amount of data in arrays is causing a problem for awk. You could try splitting the file into 60 files, one for each minute, and running them separately.

CaKiwi
 
Adding 99 to the minute and second as they are written to the array appears to fix the problem.
I think awk was interpreting the single digits being read in as strings, whereas later, when referencing the array, I was cycling through using numbers.
It returned 0 because there was no entry in the array under the numeric index.
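For background: awk array subscripts are always stored as strings, so a field used directly as a subscript keeps its exact text. The sketch below assumes, without confirmation from the thread, that some fields in the large files carry padding or a leading zero. The helper name compare_keys and the three input lines are made up for illustration.

```sh
# Hypothetical helper: count the same first field with and without numeric coercion.
compare_keys() {
    awk -F, '
    {
        raw[$1]++        # subscript is the field text exactly as read
        num[$1 + 0]++    # arithmetic forces a number, so " 5", "05" and "5" all become 5
    }
    END { printf("raw[5]=%d num[5]=%d\n", raw[5], num[5]) }'
}

printf '%s\n' ' 5,a' '05,b' '5,c' | compare_keys
```

This prints raw[5]=1 num[5]=3: looking up with the number 5 only finds the entry stored under the exact string "5", while any arithmetic on the field (adding 99 included) normalises all three spellings to the same key.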

The new working code:
Code:
	if(count[$5,$1 + 99,$2 + 99] == "")
		count[$5,$1 + 99,$2 + 99] = 1
	else
		count[$5,$1 + 99,$2 + 99]++

I then just need to reference the array with 99 added and I just take this off again before displaying.

I imagine I could fix this in another way by not adding 99 when writing (so indices are strings) and forcing the reading as a string (rather than a number).
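One way to read the entries back as strings, sketched here under the assumption that a sparse listing is acceptable instead of the full 60 x 60 grid: loop over the keys that actually exist and split each one on SUBSEP, the separator awk places between the parts of a multi-part subscript. The helper name print_counts is illustrative; the output is piped through sort because the order of a for-in loop is unspecified.

```sh
# Hypothetical helper: print "ID minute:second count" for every stored entry.
print_counts() {
    awk -F, '
    $4 == 1 && $6 == 0 { count[$5, $1, $2]++ }
    END {
        for (key in count) {
            split(key, part, SUBSEP)   # part[1]=ID, part[2]=minute, part[3]=second
            printf("%s %s:%s %d\n", part[1], part[2], part[3], count[key])
        }
    }' | sort
}

printf '%s\n' \
    '0,50,17386657,1,400553,0,,184,11,' \
    '0,50,17923864,1,400553,0,5700,197,0,' \
    '1,9,22922284,1,400553,0,,11200,138,0' | print_counts
```

Because the keys are read back exactly as they were stored, it does not matter whether a minute was written as "5", " 5" or "05"; the entry is still found.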

If anyone has a "prettier" way of doing this...
 
My guess is that the "single-digit" fields are padded with a leading space character. This would mean that awk treats them as strings. To fix, either change the field separator to FS=" *, *" or use the int function, e.g. $2=int($2)
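A small comparison of the two suggestions, assuming a padded input line invented for illustration (the thread never shows the real padding). The helper names trim_with_fs and trim_with_int are hypothetical.

```sh
# Hypothetical helpers: show fields 1 and 2 after each clean-up approach.
trim_with_fs() {
    # A regex separator swallows blanks on either side of each comma.
    awk 'BEGIN { FS = " *, *" } { printf("[%s][%s]\n", $1, $2) }'
}
trim_with_int() {
    # int() converts the field text to a number, ignoring leading blanks.
    awk -F, '{ printf("[%s][%s]\n", int($1), int($2)) }'
}

printf '%s\n' ' 3, 7,123' | trim_with_fs
printf '%s\n' ' 3, 7,123' | trim_with_int
```

The first call prints [ 3][7] and the second prints [3][7]. The separator approach cannot remove a blank at the very start of the record, because no comma precedes it, so int() (or adding 0) is the safer choice for the first field.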
 
Tried the $2=int($2) suggestion and it appears to solve the problem too.

Many thanks for the "cleaner" solution!

 
The "standard" way to force awk to coerce to numeric is to add 0, like this:
if(count[$5,$1+0,$2+0] == "")

Hope This Helps, PH.
 
Similar to what I found by adding 99 and then taking it away again.

Thanks for all the help
 