Hi All,
I hope someone might be able to help me by taking a look at my regex and telling me where i am going wrong
I have a load of text files in the format
Title:
blah blah blah over one or many lines
Article Body:
blah blah blah over many lines
I am trying to capture the content of the title and the article body without the "Title:" or "Article Body:"
Additionally, there is sometimes other thing in between where the title and body text appear in the file such as
Wordcount:
this is a number
Keywords:
blah blah blah
I want to ignore these other items in the file.
As a rule the content blocks to capture have their field name followed by a semi colon and then i want to capture everything until the next field name followed by a semi colon, but without any of the field names.
Here is what i am doing so far.
I have read the file into a single var ($result) since the files are not very big but there is lots of them,
If i could, it would be better in one regex but I am not sure how to get that working.
I would really appreciate any suggestions as my regex knowledge is quite rusty (and probably wasn't that great to start with
Thanks,
Jez
I hope someone might be able to help me by taking a look at my regex and telling me where i am going wrong
I have a load of text files in the format
Title:
blah blah blah over one or many lines
Article Body:
blah blah blah over many lines
I am trying to capture the content of the title and the article body without the "Title:" or "Article Body:"
Additionally, there is sometimes other thing in between where the title and body text appear in the file such as
Wordcount:
this is a number
Keywords:
blah blah blah
I want to ignore these other items in the file.
As a rule the content blocks to capture have their field name followed by a semi colon and then i want to capture everything until the next field name followed by a semi colon, but without any of the field names.
Here is what i am doing so far.
I have read the file into a single var ($result) since the files are not very big but there is lots of them,
Code:
my $pattern = qr/^Title:\n+^(.+\n)/mx;
if($result=~/$pattern/){
$title = $1;
}
my $pattern2 = qr/^Article Body:\n+^(.+\n)/mx;
if($result=~/$pattern2/){
$body = $1;
}
If i could, it would be better in one regex but I am not sure how to get that working.
I would really appreciate any suggestions as my regex knowledge is quite rusty (and probably wasn't that great to start with
Thanks,
Jez