matching digits and adding tabs to each line of a file

Status
Not open for further replies.

marc7

Programmer
Oct 4, 2001
i am using perl to go through a database and add some tabs and characters. i am having trouble with one particular segment though. it looks like this:

1 012345 Literature Book Pkged with Audio Companion $52.50 $52.50

what i need to do is add a tab between 1 and 012345. sometimes the number 1 will be one digit and sometimes it will be two digits. can anyone help me with the pattern matching statement? any guidance is appreciated. thanks.
 
Code:
$var =~ s/\A(\d+) +(.*)\Z/$1\t$2/;
Class exercise: you try to figure out what it does and why. :)
Tracy Dryden
tracy@bydisn.com

Meddle not in the affairs of dragons,
For you are crunchy, and good with mustard.
 

1 012345 Literature Book Pkged with Audio

$var =~ s/\A(\d+) +(.*)\Z/$1\t$2/;

Please explain what this is doing?
It starts at the beginning of the string (which is the number 1?), and then what is the " +(.*)" doing? I know the \Z ends the string, but which part of the string is it ending?
 
RegExs are my favourite waste of time. \A and \Z, as you rightly supposed, match the beginning and end of the string, much like ^ and $. The difference is that when /m is used, ^ and $ match at the start and end of every line, whereas \A and \Z always refer to the whole string. \Z matches at the very end of the string or just before a trailing newline.

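" +" matches 1 or many spaces
"(.*)" matches 0 or many characters (anything except a newline)

The \Z anchors the match at the end of the string, just before a trailing newline if there is one. The "(.*)" would not match the newline in any case unless /s is used.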
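
A small sketch of that breakdown, run against the sample line from the original question. The variable names are only for illustration, and single spaces between fields are assumed:

```perl
use strict;
use warnings;

# Sample record from the original question (single spaces assumed).
my $line = '1 012345 Literature Book Pkged with Audio Companion $52.50 $52.50';

# \A start of string, (\d+) leading digits, " +" one or more spaces,
# (.*) the rest of the line, \Z end of string.
(my $tabbed = $line) =~ s/\A(\d+) +(.*)\Z/$1\t$2/;
print "$tabbed\n";

# \Z also matches just before a trailing newline, so a line that has
# not been chomped still matches and keeps its newline.
my $with_nl = "12 012345 Literature\n";
(my $tabbed_nl = $with_nl) =~ s/\A(\d+) +(.*)\Z/$1\t$2/;
print $tabbed_nl;
```

In both cases only the gap after the leading digits is turned into a tab; the rest of the line is put back unchanged through $2.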

HTH,
Barbie. Leader of Birmingham Perl Mongers
 
Code:
$var =~ s/\A(\d+) +(.*)\Z/$1\t$2/;
-- matches ---------------------+-------- RegEx -----------
beginning of string             | \A
1 or more digits, $1 catch      | (\d+)
1 or more spaces, I prefer \s+  | [space]+
0 or more anythings, $2 catch   | (.*)
end of string                   | \Z
--------------------------------+--------------------------
The \Z is actually useless in this case, since the greedy .* is guaranteed to swallow up the rest of the line. I would have used:


Code:
$var =~ s/\s+/\t/;
which avoids the overhead of capturing and simply replaces the first run of one or more whitespace characters with a tab, which in this example's case should be safe.
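
As a rough illustration of "first match only", here is a sketch comparing the substitution with and without /g. The shortened record and the variable names are just for the example:

```perl
use strict;
use warnings;

# Shortened version of the sample record, for illustration only.
my $record = '1 012345 Literature Book';

# Without /g, s/// stops after the first match, so only the first
# run of whitespace becomes a tab.
(my $first_only = $record) =~ s/\s+/\t/;

# With /g, every run of whitespace would be replaced instead.
(my $every_run = $record) =~ s/\s+/\t/g;

print "$first_only\n";
print "$every_run\n";
```

Leaving off /g is what keeps the spaces inside the description intact.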

--Jim
 
Unfortunately, on re-reading the question, Jim's answer would do the substitution on every line, whereas you need to substitute only the spaces after the first word on the line, where that word contains 1 or 2 digits, so you could say:

$var =~ s/^(\d{1,2})\s+/$1\t/;
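
To see where the anchored pattern and the plain \s+ version differ, here is a quick sketch. The header row is a made-up line, not something from the original data, and the variable names are hypothetical:

```perl
use strict;
use warnings;

# Two shortened records plus a hypothetical header row that does not
# start with digits (not part of the original post).
my @lines = (
    '1 012345 Literature Book',
    '12 012345 Literature Book',
    'Qty Item Description',
);

my (@anchored, @loose);
for my $line (@lines) {
    # Only touches lines that begin with one or two digits.
    (my $a_copy = $line) =~ s/^(\d{1,2})\s+/$1\t/;
    # Replaces the first run of whitespace on any line.
    (my $l_copy = $line) =~ s/\s+/\t/;
    push @anchored, $a_copy;
    push @loose,    $l_copy;
}

print "$_\n" for @anchored, @loose;
```

Both patterns give the same result on the two records; only the anchored one leaves the header row alone.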

Barbie
Leader of Birmingham Perl Mongers
 
I'm not quite sure I follow your logic, Barbie.

It seems to me that marc7 is dealing with a flat-file database where the first character on every line is a number. If you examine his post above, I think it's reasonable to assume that we have one 'record' as an example and that the records are all one line in his file. Here in the forum, the line has wrapped due to its length, but his script need not consider the limitations of this forum.

I'm not sure I follow you when you say that my pattern would do the substitution on every line. It's up to marc7 as to whether he applies the pattern to each line or not. Of course, in this context we are assuming that is the case.

Assuming my pattern would be applied to each line in the file, it is true that the first match of one or more spaces would be replaced with a single tab. Since the first occurrence of one or more spaces always follows the first 'field' or set of digits, it is unnecessary to use positional assertions or to match the digits' presence.

My RegEx would in fact produce the desired result described by marc7. However, should the match not be to the FIRST space/set of spaces, a more complicated pattern would need to be used. Fortunately over-complication of the pattern can wait until that requirement surfaces.
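
For what it's worth, applying the pattern line by line might look something like this sketch. The real file name and layout are unknown, so an in-memory filehandle with made-up records stands in for the database file:

```perl
use strict;
use warnings;

# Stand-in for the real database file (its name and contents are not
# given in the thread); the second record is invented for the example.
my $data = "1 012345 Literature Book\n12 067890 Audio Companion\n";
open my $in, '<', \$data or die "cannot open in-memory data: $!";

my @out;
while (my $line = <$in>) {
    $line =~ s/\s+/\t/;    # first run of whitespace only
    push @out, $line;
}
close $in;

print @out;
```

With a real file, the open call would take the file's path instead of a reference to a string.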

--Jim
 
My apologies, I had wrongly assumed that the record was actually 2 lines. I guess that comes of thinking too hard at the end of a long day ;)

Barbie. Leader of Birmingham Perl Mongers
 
No one ever gets penalized for thinking. Least of all for questioning. No apology necessary. But thanks.

--Jim
 