Tek-Tips is the largest IT community on the Internet today!

Members share and learn making Tek-Tips Forums the best source of peer-reviewed technical information on the Internet!

  • Congratulations IamaSherpa on being selected by the Tek-Tips community for having the most helpful posts in the forums last week. Way to Go!

CSV PARSER

Status
Not open for further replies.

Diubidone

IS-IT--Management
Nov 3, 2009
13
IT
Hi,

I need to parse a big CSV file that has some issues

It contains list of users of a network as it follows:

domain\firstname.secondname,number_of_connections_with_that_user,ip_used_to_connect

Example:

college\john.doe,5,192.168.1.1
college\mario.rossi,1,192.168.1.3
college\john.doe,5,192.168.1.2

and so go on...

So as you can see I have John Doe that connected 5 times so I would need an output like this

college\john.doe,5,192.168.1.2,192.168.1.1,....

Can anyone help me?

PS we're talking about a csv file that has 5000 rows.

I did a sort -t , file.csv and I have all the data contigous now on my stdout. I need to do the junction based on first field which is in my example domain\username, domain beeing always the same...

Anyone can help? Don't know much about awk or sed, I tried to use em but have little time to solve the issue...
 
Hi

Note that I considered that would be correct to sum the connection counts :
Code:
sort -t , file.csv | \
awk -F, 'l!=$1{if(c)print l","c i;l=$1;c=0;i=""}{c+=$2;i=i","$3}END{print l","c i}'
Tested with [tt]gawk[/tt] and [tt]mawk[/tt].

Feherke.
 
Hi,

Wow thx for ur reply...two things:

1. The connections are already summoned in the original csv file as I said in first thread:

Here's a piece of the csv after I did the sort -t s

Domain\X901K631,6,10.144.190.58
Domain\X901K631,6,10.144.190.59
Domain\X901K631,6,10.144.190.60
Domain\X901K631,6,10.144.190.62
Domain\X901K631,6,10.144.190.69
Domain\X901K631,6,10.144.190.73

as you can see I have all 6 ips that connected to user what I'n need is something like:

Domain\X901K631,6,10.144.190.58,10.144.190.59,10.144.190.60,...

way cool would be

Domain\X901K631,6,10.144.190.58
10.144.190.59
10.144.190.60

I know awk controls formating too...but that's way outta my league for now...although I'm learning...

 
mmmh I this command:

sort -t , 25_26_10_2009.csv | awk -F, 'l==$1{print"\t",$3;next}{l=$1}1'

I have no output whatsoever

is \t the variable you're using to select rows? Shoud I replace it?

Thanks for your help...
 
Hi

Diubidone said:
I have no output whatsoever
Then try to call [tt]print[/tt] explicitly :
Code:
awk -F, 'l==$1{print"\t",$3;next}{l=$1[highlight];print[/highlight]}'
Diubidone said:
is \t the variable you're using to select rows?
man awk | String Constants said:
Within strings, certain escape sequences are recognized, as in C. These are:
[gray](...)[/gray]
\t horizontal tab.


Feherke.
 
Your script works fine on the example I gave to you but not on my original csv file.

So there's a problem with the original csv file... It's really long, we're talking about 5000 rows...could it be the reason?
 
Hi

No. Size not matters here.

By the way, while you are not sorting by a field, is pointless to specify -t for [tt]sort[/tt].

I see no other problem/strangeness.

Are you using the same [tt]awk[/tt] interpreter for both the sample and the original file ?

Feherke.
 
yes the system is the same...

I'll try to work on my csv file...

Thank you for your help...
 
Ok the problem was the csv file...

I copied the content and pasted in another file.csv and the thing worked like a charm...

I know I'm busting your nuts but could you explain me the code?
 
Hi

Code:
awk -F, '         [gray]# use "," as FS [sup][small](*)[/small][/sup][/gray]
l==$1 {           [gray]# if variable l [sup][small](**)[/small][/sup] equals field 1 do[/gray]
  print "\t",$3   [gray]# output tab character, OFS [sup][small](***)[/small][/sup] and field 3[/gray]
  next            [gray]# continue with next record, skipping the rest of code[/gray]
}
{                 [gray]# always do[/gray]
  l=$1            [gray]# set variable l to field 1[/gray]
  print           [gray]# print current record[/gray]
}
'
(*) [tt]FS[/tt] = field separator
(**) l used to come from "last" in my codes
(***) [tt]OFS[/tt] = output field separator

The above is logically the same as the following, but should be abit faster :
Code:
awk -F, '         [gray]# use "," as FS[/gray]
l==$1 {           [gray]# if variable l equals field 1 do[/gray]
  print "\t",$3   [gray]# output tab character, OFS and field 3[/gray]
}
l!=$1 {           [gray]# if variable l not equals field 1 do[/gray]
  l=$1            [gray]# set variable l to field 1[/gray]
  print           [gray]# print current record[/gray]
}
'

Feherke.
 
Status
Not open for further replies.

Part and Inventory Search

Sponsor

Back
Top