CSV PARSER

Diubidone · Nov 3, 2009

Hi,

I need to parse a big CSV file that has some issues

It contains list of users of a network as it follows:

domain\firstname.secondname,number_of_connections_with_that_user,ip_used_to_connect

Example:

college\john.doe,5,192.168.1.1
college\mario.rossi,1,192.168.1.3
college\john.doe,5,192.168.1.2

and so go on...

So as you can see I have John Doe that connected 5 times so I would need an output like this

college\john.doe,5,192.168.1.2,192.168.1.1,....

Can anyone help me?

PS we're talking about a csv file that has 5000 rows.

I did a sort -t , file.csv and I have all the data contigous now on my stdout. I need to do the junction based on first field which is in my example domain\username, domain beeing always the same...

Anyone can help? Don't know much about awk or sed, I tried to use em but have little time to solve the issue...

feherke · Nov 3, 2009

Hi

Note that I considered that would be correct to sum the connection counts :

Code:

sort -t , file.csv | \
awk -F, 'l!=$1{if(c)print l","c i;l=$1;c=0;i=""}{c+=$2;i=i","$3}END{print l","c i}'

Tested with [tt]gawk[/tt] and [tt]mawk[/tt].

Feherke.

http://free.rootshell.be/~feherke/

Diubidone · Nov 3, 2009

Hi,

Wow thx for ur reply...two things:

1. The connections are already summoned in the original csv file as I said in first thread:

Here's a piece of the csv after I did the sort -t s

Domain\X901K631,6,10.144.190.58
Domain\X901K631,6,10.144.190.59
Domain\X901K631,6,10.144.190.60
Domain\X901K631,6,10.144.190.62
Domain\X901K631,6,10.144.190.69
Domain\X901K631,6,10.144.190.73

as you can see I have all 6 ips that connected to user what I'n need is something like:

Domain\X901K631,6,10.144.190.58,10.144.190.59,10.144.190.60,...

way cool would be

Domain\X901K631,6,10.144.190.58
10.144.190.59
10.144.190.60

I know awk controls formating too...but that's way outta my league for now...although I'm learning...

feherke · Nov 3, 2009

Hi

Like this ?

Code:

awk -F, 'l==$1{print"\t",$3;next}{l=$1}1'

Feherke.

http://free.rootshell.be/~feherke/

Diubidone · Nov 3, 2009

mmmh I this command:

sort -t , 25_26_10_2009.csv | awk -F, 'l==$1{print"\t",$3;next}{l=$1}1'

I have no output whatsoever

is \t the variable you're using to select rows? Shoud I replace it?

Thanks for your help...

feherke · Nov 3, 2009

Hi

Diubidone said:
I have no output whatsoever

Then try to call [tt]print[/tt] explicitly :

Code:

awk -F, 'l==$1{print"\t",$3;next}{l=$1[highlight];print[/highlight]}'

Diubidone said:
is \t the variable you're using to select rows?

man awk | String Constants said:
Within strings, certain escape sequences are recognized, as in C. These are:
[gray](...)[/gray]
\t horizontal tab.

Feherke.

http://free.rootshell.be/~feherke/

Diubidone · Nov 4, 2009

Your script works fine on the example I gave to you but not on my original csv file.

So there's a problem with the original csv file... It's really long, we're talking about 5000 rows...could it be the reason?

feherke · Nov 4, 2009

Hi

No. Size not matters here.

By the way, while you are not sorting by a field, is pointless to specify -t for [tt]sort[/tt].

I see no other problem/strangeness.

Are you using the same [tt]awk[/tt] interpreter for both the sample and the original file ?

Feherke.

http://free.rootshell.be/~feherke/

Diubidone · Nov 4, 2009

yes the system is the same...

I'll try to work on my csv file...

Thank you for your help...

Diubidone · Nov 4, 2009

Ok the problem was the csv file...

I copied the content and pasted in another file.csv and the thing worked like a charm...

I know I'm busting your nuts but could you explain me the code?

feherke · Nov 4, 2009

Hi

Code:

awk -F, '         [gray]# use "," as FS [sup][small](*)[/small][/sup][/gray]
l==$1 {           [gray]# if variable l [sup][small](**)[/small][/sup] equals field 1 do[/gray]
  print "\t",$3   [gray]# output tab character, OFS [sup][small](***)[/small][/sup] and field 3[/gray]
  next            [gray]# continue with next record, skipping the rest of code[/gray]
}
{                 [gray]# always do[/gray]
  l=$1            [gray]# set variable l to field 1[/gray]
  print           [gray]# print current record[/gray]
}
'

(*) [tt]FS[/tt] = field separator
(**) l used to come from "last" in my codes
(***) [tt]OFS[/tt] = output field separator

The above is logically the same as the following, but should be abit faster :

Code:

awk -F, '         [gray]# use "," as FS[/gray]
l==$1 {           [gray]# if variable l equals field 1 do[/gray]
  print "\t",$3   [gray]# output tab character, OFS and field 3[/gray]
}
l!=$1 {           [gray]# if variable l not equals field 1 do[/gray]
  l=$1            [gray]# set variable l to field 1[/gray]
  print           [gray]# print current record[/gray]
}
'

Feherke.

http://free.rootshell.be/~feherke/

Tek-Tips is the largest IT community on the Internet today!

Members share and learn making Tek-Tips Forums the best source of peer-reviewed technical information on the Internet!

CSV PARSER

Diubidone

IS-IT--Management

feherke

Programmer

Diubidone

IS-IT--Management

feherke

Programmer

Diubidone

IS-IT--Management

feherke

Programmer

Diubidone

IS-IT--Management

feherke

Programmer

Diubidone

IS-IT--Management

Diubidone

IS-IT--Management

feherke

Programmer

Similar threads

Part and Inventory Search

Sponsor