Tek-Tips is the largest IT community on the Internet today!

Members share and learn making Tek-Tips Forums the best source of peer-reviewed technical information on the Internet!

  • Congratulations biv343 on being selected by the Tek-Tips community for having the most helpful posts in the forums last week. Way to Go!

Parsing text

Status
Not open for further replies.

nelpalad

Technical User
Dec 7, 2004
7
US
Hi,

I have a problem. Im a new to perl, i want to make some simple perl script that parse text and save it to excel format or a text tab delimited. Here is the same text file.

101854 000 / 0 0 0 PR 0 ntserver-[1234] 192.168.0.9-[1234] de:6-19 kp:0-0 ed:0 oi:0 Du:0-0 TE:0-0 0 0 [] 0 0 INA:1191
.
.
.and so on

Now, i want to use perl so that i can open this in excel with each column. ex. 101954 has its column and 000 has its own column etc. etc.
 
Hi,

You want to break that down into fields for us please? I can't tell where fields start and end at the moment.

Mike

To err is human,
but to really foul things up -
you require a man Mike.

Want to get great answers to your Tek-Tips questions? Have a look at faq219-2884

 
If you write the file in tab-delimited format and call it something.csv , you can open it using a spreadsheet program (OpenOffice.org, MS Excel, whatever). Assuming the fields are separated by spaces, the code is very simple:

[tt]
open I,'<infile.txt' or die "Can't open infile";
open O,'>outfile.csv' or die "Can't open outfile";
for(<I>) [/tt]# for each line in the input file[tt]
{
s/ +/\t/g; [/tt]# convert each sequence of one or more spaces to a tab[tt]
print O $_; [/tt]# write to output file[tt]
}
close I;
close O;
[/tt]

-----
$world=~s/war/peace/g;
 
You might also want to take a look at the module Spreadsheet::WriteExcel before making a decision to the direction you'd like to go.

- Rieekan
 
Assuming the fields are separated by spaces:
Code:
awk 'BEGIN{OFS="\t"}{$1=$1;print}' infile >outfile
 
Thanks to all of yours answers. But how can i remove the - and place the two items between the - to separate column?

example.

1-3 ; i want to remove the - sign and put 1 and 3 to a separate column?
 
Code:
awk 'BEGIN{FS="[ \t]+|-";OFS="\t"}{$1=$1;print}'

With comments:
[tt]
# Convert text file to tab-delimited format for
# importing into Excel.
# Put this code in file "tab-delimit.awk" and run with
# awk -f tab-delimit.awk inputfile >outputfile

BEGIN {
# Where do we want to split into columns?
# We set the field-separator to match whitespace
# (one or more spaces or tabs: "[ \t]+") or a "-".
FS="[ \t]+|-"
# We set the output-field-separator to a tab.
OFS="\t"
}

# Main loop. The code below will be executed
# for every line read from input file.

{ # When we assign to a field, the line ($0) is rebuilt
# with OFS between fields. This is how we put the tabs
# where they belong.
$1=$1
# Print the line.
print
}
[/tt]
 
You never told us where your fields begin and end, as requested, but assuming they're delimited by whitespace:
Code:
while (<>) {
    chomp;
    s/-/ /g; # replace dashes
    print join("\t", split), "\n"; # output tab-delimited
}
 
Hi,

Im sorry for my incomplete details. Here's what i want to do:

101854 000 / 0 0 0 PR 0 ntserver-[1234] 192.168.0.9-[1234] de:6-19 kp:0-0 ed:0 oi:0 Du:0-0 TE:0-0 0 0 [] 0 0 INA:1191

the start in the line will be 101854 then, i want to separate the entry with - in between. Ex. ntserver-[1234] i want this to be the ntserver will be on different column in excel and also the entry '-' and [1234]. and also the separate the entry with : sign in between. ex. in de:6-19 is it possible doing this? de will be separed column and : and 6, - and 19?
 
nelpalad, I'm sure it's possible to do what you want. However, I'm still confused about what exactly that is. Could you show me how you want the data broken up? Something like this, for example:
Code:
 1: 101854 
 2: 000 / 0 0 0 PR 0 
 3: ntserver
 4: [1234] 
 5: 192.168.0.9
 6: [1234] 
 7: de
 8: 6
 9: 19 
10: kp
11: 0
12: 0 
13: ed
14: 0 
15: oi
16: 0 
17: Du
18: 0
19: 0 
20: TE
21: 0
22: 0 0 0 
23: [] 0 0 
24: INA
25: 1191
I realize you want this in columns not rows, of course, but if you can show me like this it will be easier for me to understand. Thanks.


 
Split at "-" or ":", preserving those characters.
Code:
awk '{gsub(/:|-/,"\t&\t")}1' infile >outfile
Split at "-" or ":", preserving those characters, or at one or more blanks, replacing with tabs.
Code:
awk '{gsub(/:|-/,"\t&\t");gsub(/ +/,"\t")}1' in >out
 
:) It's that awk chap here again...

I *think* the OP want's it like this: (Don't worry if English is not your first language Nepalad, just keep on asking)

101854
000 / 0 0 0 PR 0 ntserver
[1234] 192.168.0.9
[1234] de
6-19 kp
0-0 ed
0 oi
0 Du
0-0 TE
0-0 0 0 [] 0 0 INA
1191

Is that how it needs to be split?

Mike

To err is human,
but to really foul things up -
you require a man Mike.

Want to get great answers to your Tek-Tips questions? Have a look at faq219-2884

 
Code:
open FH, ">outfile.txt";
$line=
"101854 000 / 0 0 0 PR 0 ntserver-[1234] 192.168.0.9-[1234] de:6-19 kp:0-0 ed:0 oi:0 Du:0-0 TE:0-0 0 0 [] 0 0 INA:1191";
@data=split/[-:\s+]/, $line;
foreach (@data) {
 print FH "$_,";
}
print FH "\n";

produces
Code:
101854,000,/,0,0,0,PR,0,ntserver,[1234],192.168.0.9,[1234],de,
6,19,kp,0,0,ed,0,oi,0,Du,0,0,TE,0,0,0,0,[],0,0,INA,1191,

Is this what you're after
--Paul

Nancy Griffith - songstress extraordinaire,
and composer of the snipers anthem "From a distance ...
 
Hi Paul,

Yes, this is what i want to do. Separate each item in each column.

Thanks everyone for your help.

Could you or anyone suggest a book for reference to help me start building perl + cgi + postgresql + apache?

Cheers!
 
Code:
awk 'BEGIN{FS="[: \t-]";OFS="\t"}{$1=$1"";print}'
 
Status
Not open for further replies.

Part and Inventory Search

Sponsor

Back
Top