Perl RegEx Query 3

Status
Not open for further replies.

jriggs420

Programmer
Sep 30, 2005
116
US
I'm trying to learn perl on my own as part of a side project through work. I have come up with a perl script that is nearly perfect and will save me TONS of time. Here's what I want to happen:

___DATA.FILE______
- Here is a line
- This is a line too
- You get the idea

___DESIRED.OUTPUT__
[one tab] 'HERE IS A LINE'/
'THIS IS A LINE TOO'/
'YOU GET THE IDEA'<---here's the problem, the current code places a '/', but I don't need one on the last line.

The code I have is:
Code:
open(TXT,"$ARGV[0]") || die "Cannot open the data file";
$instr = do{local $/; <TXT> };
close(TXT);
chomp($instr);
$instr =~ s/\-\s+([^\n]+)/\t\U'$1\'\//g;
chop($instr);
open(TXT,">$ARGV[1]") || die "Cannot open the formatted file";
print TXT $instr;
close(TXT);
This script works beautifully except for the '/' on the last line. I also have some pattern switching going on after the regex line, i.e. $instr =~ s/cat/dog/g; I'm not sure how I can get that last '/' off of that line. Any suggestions would be great. Many thanks in advance-
Joe
 
open (INF, $ARGV[0]) or die $!;
my @data = ();
while(<INF>){
$chomp($_);
#your $_ =~s/this/that/g statements
push (@data, $_);
}
close INF;

my $output = "\t'" . (join("'/\n\t'", @data)) . "'\n";
 
OMG: chomp should not have a $ in front of it. Must be Friday afternoon!
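A minimal sketch of the join idea from the post above, assuming every data line starts with "- ". The sub name format_lines and the sample lines are made up for illustration and are not from the original script. Because join() only puts the separator between elements, the last line never picks up a trailing '/'.

```perl
use strict;
use warnings;

# Hypothetical helper (not from the thread): takes raw lines,
# returns the formatted block as one string.
sub format_lines {
    my @data;
    for my $line (@_) {
        chomp(my $copy = $line);      # work on a copy, strip the newline
        next unless length $copy;     # skip empty lines
        $copy =~ s/^-\s+//;           # drop the leading "- "
        push @data, uc $copy;         # uppercase the rest
    }
    # join() inserts the separator only BETWEEN elements
    return "\t'" . join("'/\n\t'", @data) . "'\n";
}

print format_lines("- Here is a line\n", "- This is a line too\n", "- You get the idea\n");
```

Skipping empty lines is a design choice in this sketch: a blank line at the end of the data file would otherwise show up as an extra '' entry.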
 
it seems, based on the few sample lines, this would be easier without slurping the file into a scalar. Not really sure of what you are doing but based on the small sample of lines you posted:

Code:
open(TXTIN,"$ARGV[0]") || die "Cannot open the data file";
open(TXTOUT,">$ARGV[1]") || die "Cannot open the formatted file";
while(<TXTIN>) {
   chomp;
   my $eol = "/\n";
   $eol = "\n" if eof;#<--end of file so we don't want the /
   if (index($_,'-') == 0) {
      substr $_,0,2,"\t";
      $_ = uc($_).$eol;
      print TXTOUT $_;
   }
   else {
      print TXTOUT "$_\n";
   }
}
close(TXTIN);
close(TXTOUT);

if all the lines start with "- " then the if condition is not necessary, which means the else condition isn't necessary either.

Note no regexps used. If the purpose is to learn regexp's then you can substitute in your code where necessary.

Congrats and props for learning perl on your own and welcome to Tek-tips.
 
I missed the quotes:

change these lines:

Code:
   if (index($_,'-') == 0) {
      substr $_,0,2,"\t";
      $_ = uc($_).$eol;

to:

Code:
   if (index($_,'-') == 0) {
      $_ = substr $_,2;
      $_ = "\t'" . uc($_) . "'$eol";
 
Fellas, all of your suggestions are much appreciated, though I'm not following what some lines are doing in your snippets---specifically
Code:
  if (index($_,'-') == 0) {
      $_ = substr $_,2;
I can guess what is going on, but, if someone wouldn't mind explaining, that'd be great.
Also, what is the best way to determine if $_ contains a '-' (or any other character for that matter), or not. From what I've tried
Code:
if ($_=~m/cat/)#would ~/\-/ match '-'??
{
...do something
}
doesn't work, is this simply a typo, or an error in my perl logic? Hope this isn't too many stoopid questions for one post-

Joe
 
you can look those functions up:


has a nice listing of all the core perl functions with some examples and explanations. I will also try to explain:

if (index($_,'-') == 0)

is the same as:

if (/^-/)

They both literally mean: if (the string begins with a dash)

index() should do the checking faster than a regular expression though. 0 (zero) means the substring "-" is in the very first position of the string. If the dash was in the second position the condition would fail since that is position 1 (one). If index() returns -1 that means the substring was not anywhere in the string.

This:

$_ = substr $_,2;

removes the first two characters from the variable $_ (the dash and the space after it in this case).

This:

substr $_,0,2,"\t";

replaced the first two characters with the tab, but I changed that after I noticed you wanted to wrap the text in single-quotes. Here are the arguments to substr():

substr EXPRESSION,OFFSET,LENGTH,REPLACEMENT

this regexp:

Code:
if ($_=~ m/-/)
{
...do something
}

will return true if there is a dash anywhere in $_, it's the same as:

Code:
if (index($_,'-') > -1) {
...
}
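The equivalences described above can be checked with a short sketch. The variable names and the sample string are made up for illustration.

```perl
use strict;
use warnings;

my $line = "- Here is a line";

# index() returns the position of the first match, or -1 if there is none
my $starts_with_dash = index($line, '-') == 0 ? 1 : 0;   # same test as /^-/
my $has_dash         = index($line, '-') > -1 ? 1 : 0;   # same test as /-/

# Two-argument substr: everything from offset 2 onwards
my $rest = substr $line, 2;

# Four-argument substr: replace 2 characters at offset 0,
# modifying $copy in place
my $copy = $line;
substr $copy, 0, 2, "\t";

print "starts with dash: $starts_with_dash\n";
print "has dash: $has_dash\n";
print "rest: $rest\n";
```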
 
Hey Kevin just wanted to say thanks. What you have there should be enough to keep me busy for at least a couple of weeks. *
 
does this help?

Code:
#!/usr/bin/perl

open (TXT, "< $ARGV[0]") || die "Cannot open the data file";
undef $/; $_ = <TXT>; $/ = "\n";
close TXT;

s/^-\s*(.*)$/\t\U'$1'/mg; s|\n|/\n|mg;

open (TXT, "> $ARGV[1]");
print TXT;
close TXT;


Kind Regards
Duncan
 
Duncan-
That looks pretty close to the regex I have come up with, but I'm not sure if it's any more efficient than the one I have, since I'm not sure what is going on in yours. Would you mind breaking down
Code:
s/^-\s*(.*)
That's something I haven't come across yet, Looks like it's removing all leading '-', but is it requiring the input to have them in order to work properly? Here's what I have so far:
Code:
if ($instr=~ /^-/)
{$instr =~ s/\-\s+([^\n]+)/\t\U'$1\'\//g;
}
else
{
$instr =~ s/([^\n]+)/\t\U'$1\'\//g;
}
$instr =~ s/\/$/ /;
I know there's a better way to do this, but since I am new to perl and it works I am pretty happy. If anyone has any recommendations or questions, by all means, please post- TIA.
Joe
 
I always recommend using the /x flag on regexes when you're learning - it makes it so much easier to keep track of what you've got. /x allows you to include comments and whitespace in the pattern (you can still match whitespace with \s, \t, etc). It does not apply to the replacement half of s///, which is treated like a double-quoted string, so whitespace and comments there would end up in the output. As an example, your line
Code:
$instr =~ s/\-\s+([^\n]+)/\t\U'$1\'\//g;
could be written as
Code:
$instr =~ s/    # REPLACE
    \-          #   a literal minus, followed by
    \s +        #   one or more whitespace characters, followed by...
    (           #   (start capturing group 1)
      [^\n] +   #     ...one or more non-newlines
    )           #   (stop capturing)
  /\t\U'$1\'\//xg;
# WITH: \t  a tab
#       \U  (uppercase till \E or the end of the replacement)
#       '   a literal quote
#       $1  captured substring 1 (upcased)
#       \'  a literal quote
#       \/  a literal slash

I've been a little pathological with this example, but hope you may find this style easier to write and to understand. You can see from this that your second literal quote does not need to be backslash-escaped. As a matter of fact, neither does your leading literal minus: a minus is only a meta-character inside a character class such as [a-z], so outside one the backslash is harmless but optional. Escaping it anyway is good, healthy defensive programming.
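The same /x layout can be applied to the s/^-\s*(.*)$/ pattern asked about earlier in the thread. This is only a sketch: the sample text is made up, and the comments are my reading of the pattern rather than the original poster's explanation. It suggests that lines without a leading dash are simply left alone, so the input does not have to start every line with one.

```perl
use strict;
use warnings;

my $text = "- Here is a line\nno dash here\n- You get the idea\n";

$text =~ s/
    ^         # start of a line (because of the m flag)
    -         # a literal minus
    \s*       # zero or more whitespace characters
    (.*)      # capture the rest of the line as group 1
    $         # end of the line (because of the m flag)
  /\t\U'$1'/xmg;

print $text;
```

Note that the comments inside the pattern avoid the slash delimiter and variable names such as $1, since both are still interpreted there.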

HTH,

fish

"As soon as we started programming, we found to our surprise that it wasn't as easy to get programs right as we had thought. Debugging had to be discovered. I can remember the exact instant when I realized that a large part of my life from then on was going to be spent in finding mistakes in my own programs."
--Maur
 
Thx, fish, regexes definitely aren't my strong suit, and that on top of trying to crack a terse language like perl has made it all the harder. Here's a star for what it's worth. Also, if anyone is still reading this thread--Suppose I wanted to go through the data input line by line. What would be the best way to go about it? I'm thinking a 'foreach' loop, but the example in my generic "Teach yourself Perl in 24 hours" book doesn't want to work, any suggestions?
 
Suppose I wanted to go through the data input line by line. What would be the best way to go about it?

go back to my first post:

Code:
open(FILEHANDLE,'yourfile.txt') or die "$!";
while(<FILEHANDLE>) {
   # do something with each line (it is in $_)
}
close(FILEHANDLE);

this is generally a very good way to process a file line by line. The "best" way depends on the size of the file and what you are doing, so there is no one-size-fits-all answer for that question.
 
Thanks for the pointer Kevin, I didn't realize that 'while' was actually going through line for line. Generally, the input file will be less than 40 lines, so is a foreach appropriate here or no? As a side note, Kevin your code works fairly well except that the last line was always
Code:
 ''
, and that was it, I couldn't get the eol to work despite much tweaking, other than that, it worked great.
 
foreach is most appropriate where you have some form of array or list to process. If you slurp the entire file into an array then yes, foreach is most appropriate.
If however you wish to process a file "line by line" (which in my humble opinion is often far safer) then you will probably read from a filehandle or stdin and be best off using a while loop as has been suggested here before.
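A small sketch of the two styles side by side. It reads from an in-memory filehandle so it runs without a data file; the sample text and variable names are assumptions made for illustration.

```perl
use strict;
use warnings;

my $input = "- Here is a line\n- This is a line too\n- You get the idea\n";

# while: reads one line at a time from the filehandle
open(my $fh, '<', \$input) or die "Cannot open in-memory file: $!";
my @seen_by_while;
while (my $line = <$fh>) {
    chomp $line;
    push @seen_by_while, $line;
}
close($fh);

# foreach: slurp all lines into an array first, then walk the list
open($fh, '<', \$input) or die "Cannot open in-memory file: $!";
my @lines = <$fh>;
close($fh);
chomp @lines;
my @seen_by_foreach;
foreach my $line (@lines) {
    push @seen_by_foreach, $line;
}

print scalar(@seen_by_while), " lines via while, ",
      scalar(@seen_by_foreach), " lines via foreach\n";
```

For a file of around 40 lines either form should be fine; the while form just avoids holding the whole file in memory.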


Trojan.
 