Seems simple but I'm stumped

jaycastaldo · Jan 23, 2004

Well, I thought I had something good on my hands until I realized I had spaces in what was supposed to be the first field in my file. Here is an example:

John 1 Main St 555-1212
Joe 2 Main St 555-1313
Jack 3 Main St 555-1414
Jack Frost 1 Main St 555-1515
John Doe 2 Main St 555-1616
Joe Smoe 3 Main St 555-1717

I want to grab the first field whether it contains only a first name or a first and last name. The problem is that the names are variable: there might be more than one space in a name, like Jack Frost Jr. I also want to be able to print the 2nd field as a whole field even if it has a billion different whitespaces. Any help would be appreciated, because I've been doing a lot of looking and have found nothing that really helps. Thanks a lot. I know it's kind of hard to teach while trying to provide a quick solution, but please comment as well as possible. Thanks again.

aigles · Jan 24, 2004

The problem is to determine where the second field starts (there is no specific separator).

1) If field 2 always starts at the first numeric character (as in your example data), you can do:

[tt]
{
pos = match($0, "[0-9]");           # position of the first digit
field1 = substr($0, 1, pos-1);
sub("[[:space:]]*$", "", field1);   # Trim trailing spaces
field2 = substr($0, pos);
printf "[%s] [%s]\n",field1,field2;
}
[/tt]

2) If your fields are fixed length (as in your example data, where field 1 is 10 chars), you can do:

[tt]
{
field1 = substr($0, 1, 10);
field2 = substr($0,12);
printf "[%s] [%s]\n",field1,field2;
}
[/tt]

With gawk you can read fixed-width data.
In the BEGIN clause, define the variable FIELDWIDTHS.

[tt]
BEGIN {
FIELDWIDTHS = "11 9999";
}
{
printf "[%s] [%s]\n",$1,$2;
}
[/tt]

Jean Pierre.

PHV · Jan 24, 2004

Try something like this:

Code:

awk 'match($0,/[1-9][0-9]*[^0-9]*/){
 f1=substr($0,1,RSTART-1);sub(/[ \t]*$/,"",f1)
 f2=substr($0,RSTART,RLENGTH-1);sub(/[ \t]*$/,"",f2)
 f3=substr($0,RSTART+RLENGTH)
 printf "%s,%s,%s\n",f1,f2,f3
}' /path/to/infile

The assumptions are:
Field1 never contains a digit
Field2 always starts with a number and contains no other digits
Field3 always starts with a digit
Just a thought: does field2 always start at the same column position?

Hope This Helps
PH.

jaycastaldo · Jan 24, 2004

Well, I was trying to hide my data as best as possible and didn't give enough parameters. Unfortunately, none of the fields are fixed length. Second, all the fields can contain a combination of numbers, characters and spaces. Sorry.

Ex.

John1111111 1111111 1 Main St. 555-1212
Joe Doe 11122222222 2 Main St. 555-1212

I want to get all of "John1111111 1111111" into one field. Every other field is basically the same. It's a tough deal and I'm wondering if awk is the right way to go, but I don't know any other Unix command that can pull info from a file properly. I have tried both suggestions and hopefully I can jury-rig something. Thanks, and please keep the help coming, because I am far from solving this. Thanks again.

jaycastaldo · Jan 24, 2004

To simplify this: the first field contains characters, spaces, and numbers; the second field will only contain numbers with no spaces.
Ex.

"1111JJJ 222222222 12132Jack" 5551212
"1122JJJ 333333333 12132Joe" 5551313

Thanks, I hope this makes it easier
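
Given that description (field 2 is a single run of digits with no spaces), one possible approach is to take the last whitespace-separated token as field 2 and everything before it as field 1. This is only a minimal sketch: it assumes field 2 is the last thing on the line and that the double quotes in the example are not part of the real data. `split_last_field` is a hypothetical helper name, not something from the thread.

```shell
# split_last_field: hypothetical helper; reads files given as arguments, or stdin.
split_last_field() {
  awk '{
    field2 = $NF                             # last whitespace-separated token
    field1 = $0
    sub(/[ \t]+[^ \t]+[ \t]*$/, "", field1)  # drop that token and the blanks before it
    printf "[%s] [%s]\n", field1, field2
  }' "$@"
}

printf '1111JJJ 222222222 12132Jack 5551212\n' | split_last_field
# prints: [1111JJJ 222222222 12132Jack] [5551212]
```

Because only the tail of the line is examined, digits and spaces inside field 1 do not matter here.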

PHV · Jan 24, 2004

jaycastaldo, you need to post a wider variety of sample data.
Does field2 always start with a number between, say, 1 and 999?
Will field1 never have a blank-delimited number between 1 and 999?
Off topic: Salut JP

jaycastaldo · Jan 24, 2004

field2 will ALWAYS start with a digit between 0 and 9.
field1 can have a combination of characters, spaces, and numbers as displayed in my previous example. Just to give you a little info, field1 holds firewall rules that contain names of programs in uppercase and lowercase, numbers, and spaces in between them. Thanks, and sorry for being so difficult; I'm trying to protect my configs as much as possible.

PHV · Jan 24, 2004

jaycastaldo, your last data example is completely different from the first!
If you want a helpful answer, please post a consistent input example and the expected result.
BTW, feel free to replace digits with other digits and letters with other letters to maintain confidentiality.

jaycastaldo · Jan 24, 2004

I understand that it wasn't consistent, but I am trying to protect my data. If I listed my real data, it would take hours for me to cover up everything that needs to be protected. Oh yeah, by the way, I work for the US Government, so you can imagine. I'm just trying to get the theory down. I know it is hard to go by and I apologize, but I need to protect the data I am getting paid to protect as much as possible.

jaycastaldo · Jan 24, 2004

Well, I think I found a delimiter: anything with more than one space. The names will contain one space but never two spaces in a row. Can anybody work with that? I have tried `awk -F " " '{print $1}'`, which works fine, but when I try to print more than one field, no can do: it only prints the first field. Anybody?
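
One way to express "two or more spaces" as a separator is to give awk a regular expression as the field separator. A single space as the separator is awk's default and splits on any run of blanks, so it cannot tell a one-space gap inside a name from a wider gap between fields. The sketch below is only an illustration: the column spacing in the sample line is assumed, and `split_on_double_space` is a hypothetical helper name.

```shell
# split_on_double_space: hypothetical helper; reads files given as arguments, or stdin.
split_on_double_space() {
  # The separator regex is one space followed by one or more spaces,
  # written with bracket expressions so the spaces stay visible.
  awk -F '[ ][ ]+' '{
    printf "[%s] [%s] [%s]\n", $1, $2, $3
  }' "$@"
}

printf 'Jack Frost   1 Main St   555-1515\n' | split_on_double_space
# prints: [Jack Frost] [1 Main St] [555-1515]
```

Single spaces inside "Jack Frost" and "1 Main St" are left alone because they never form a run of two.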

CaKiwi · Jan 25, 2004

I'm not exactly sure what you want to do with the data, but this splits it at the first multiple-space delimiter and prints it all out:

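{
n1 = match($0, /[ ][ ]+/)          # start of the first run of two or more spaces
printf "%s", substr($0, 1, n1)     # first field plus one space
n2 = split(substr($0, n1), a)      # rest of the line, split on whitespace
for (j = 1; j <= n2; j++) printf "%s ", a[j]
print ""
}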

CaKiwi

"I love mankind, it's people I can't stand" - Linus Van Pelt

aigles · Jan 26, 2004

Hi jaycastaldo ,

With these assumptions:
o field1 (Names) ends at the first multi-space delimiter, if one exists.
o field1 contains no more than one space.

you can do:
[tt]
{
sub(" *$", "");                  # Trim trailing spaces
pos = match($0, /[ ][ ]+/);      # First run of two or more spaces
if (pos > 0) {
Names = substr($0, 1, pos-1);
Addr = substr($0, pos+RLENGTH);
} else {
Names = $1 " " $2;
$1 = ""; $2 = "";
Addr = $0;
sub("^ *", "", Addr);
}
printf "[%s] [%s]\n", Names, Addr;
}
[/tt]

With your data examples, the result is:
[tt]
[John] [1 Main St 555-1212]
[Joe] [2 Main St 555-1313]
[Jack] [3 Main St 555-1414]
[Jack Frost] [1 Main St 555-1515]
[John Doe] [2 Main St 555-1616]
[Joe Smoe] [3 Main St 555-1717]
[John1111111 1111111] [1 Main St. 555-1212]
[Joe Doe] [11122222222 2 Main St. 555-1212]
[/tt]

As you can see, there is a problem in the last line.
Your definition of the fields is incomplete.

Jean Pierre.

Ygor · Jan 27, 2004

Perhaps break fields depending on contents? e.g....

awk 'BEGIN {numeric = "^[0-9]+$"}
{
for (x=1; x<NF; x++) {
if ($x ~ numeric && $(x+1) !~ numeric) {
$x = "] [" $x
break
}
}
print "[" $0 "]"
}' file1

[John ] [1 Main St 555-1212]
[Joe ] [2 Main St 555-1313]
[Jack ] [3 Main St 555-1414]
[Jack Frost ] [1 Main St 555-1515]
[John Doe ] [2 Main St 555-1616]
[Joe Smoe ] [3 Main St 555-1717]
[John1111111 1111111 ] [1 Main St. 555-1212]
[Joe Doe 11122222222 ] [2 Main St. 555-1212]
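
The space before the first "]" in that output appears because awk rebuilds $0 with the output field separator between every field once $x is assigned. If that space is unwanted, a variant can build the two parts by explicit concatenation. This is only a sketch of the same content-based rule; `split_at_numeric_field` is a hypothetical helper name.

```shell
# split_at_numeric_field: hypothetical helper; reads files given as arguments, or stdin.
split_at_numeric_field() {
  awk '{
    numeric = "^[0-9]+$"
    cut = NF + 1                 # NF + 1 means no split point was found
    for (x = 1; x < NF; x++)
      if ($x ~ numeric && $(x+1) !~ numeric) { cut = x; break }
    left = right = ""
    for (i = 1; i <= NF; i++) {
      if (i < cut) left = left (left == "" ? "" : " ") $i
      else right = right (right == "" ? "" : " ") $i
    }
    printf "[%s] [%s]\n", left, right
  }' "$@"
}

printf 'Joe Doe 11122222222 2 Main St. 555-1212\n' | split_at_numeric_field
# prints: [Joe Doe 11122222222] [2 Main St. 555-1212]
```

As in the original, the split lands on the first all-digit field that is followed by a field that is not all digits.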
