Tek-Tips is the largest IT community on the Internet today!

REGEXP PROBLEM

Status
Not open for further replies.

ccherchi

Programmer
Jun 18, 2002
14
FR
Hi,

I am trying to use this line to retrieve the countries from the following source code:

preg_match_all("/<select name=\"coutry\".*?(<option>.*?<\/option>)+.*?<\/select>/s", $page, $result);

[SOURCE CODE BEGINNING]

<select name="coutry">
<option>country1</option>
<option>country2</option>
<option>country3</option>
</select>

<select name="city">
<option>city1</option>
<option>city2</option>
<option>city3</option>
</select>

[SOURCE CODE END]

But this doesn't work. By modifying the regular expression I can, at best, retrieve all "<option>.*?</option>" matches, but I only need the country entries (I would like to get an array with these values: "country1", "country2" and "country3").

Can someone help me do this?

Thanks a lot.

Cedric.
 
I think you're trying to make yourself crazy by trying to do it all in a single regular expression.

Using the HTML code you provided (misspellings included), this script will open a file called country.txt and get the countries and cities out into two separate arrays.

If each form element is not on its own line, however, this script will break.

<?php
// Open the saved HTML page for line-by-line reading.
$fh = fopen("country.txt", "r") or die("Could not open");

$countries = array();
$cities = array();

while (($line = fgets($fh)) !== false)
{
    // The opening <select> tag tells us which list the next lines belong to.
    if (preg_match("/name=\"coutry\"/", $line))
    {
        get_names($countries);
    }

    if (preg_match("/name=\"city\"/", $line))
    {
        get_names($cities);
    }
}
fclose($fh);

print_r($countries);
print "<br>";
print_r($cities);

// Reads lines up to the closing </select> tag (or end of file) and appends
// the text found between tags on each line to $storage.
function get_names(&$storage)
{
    global $fh;

    $line = fgets($fh);
    // Stop at end of file too; otherwise a missing </select> loops forever.
    while ($line !== false && !preg_match("/<\/select>/", $line))
    {
        preg_match_all("/>([^<>]+)</", $line, $match);
        foreach ($match[1] as $temp)
        {
            array_push($storage, $temp);
        }

        $line = fgets($fh);
    }
}

?>
______________________________________________________________________
Perfection in engineering does not happen when there is nothing more to add.
Rather it happens when there is nothing more to take away.
 
> If each form element is not on its own line, however, this script will break.

How would you do this without regard to line breaks? For example, is it possible to first strip all the CR/LF codes from a text file and then follow that with the appropriate regex to extract the needed sections?
 
Sure. Trim each line as you read it from the input. Concatenate all the lines by appending each in turn to the end of the value in a variable.

Then run each of the preg_match_all() functions you need to match all of the values you're looking for.
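
The steps above can be sketched roughly as follows. This is only an illustration under stated assumptions: the $lines array stands in for lines read with fgets(), and options_for_select() is a hypothetical helper name, not something from the original script. It first isolates one <select> block, then collects the options inside that block only.

```php
<?php
// Sample input standing in for lines read with fgets(). The misspelled
// name "coutry" is kept exactly as it appears in the posted HTML.
$lines = [
    "<select name=\"coutry\">\r\n",
    "<option>country1</option>\r\n",
    "<option>country2</option>\r\n",
    "<option>country3</option>\r\n",
    "</select>\r\n",
    "<select name=\"city\">\r\n",
    "<option>city1</option>\r\n",
    "</select>\r\n",
];

// Trim each line and append it to one long string.
$page = '';
foreach ($lines as $line) {
    $page .= trim($line);
}

// Hypothetical helper: returns the <option> texts of the <select> whose
// name attribute equals $name, or an empty array if no such block exists.
function options_for_select($html, $name)
{
    // Step 1: isolate the single <select>...</select> block.
    $block = '/<select\s+name="' . preg_quote($name, '/') . '"[^>]*>(.*?)<\/select>/s';
    if (!preg_match($block, $html, $m)) {
        return [];
    }

    // Step 2: collect every option, searching inside that block only.
    preg_match_all('/<option>(.*?)<\/option>/s', $m[1], $opts);
    return array_map('trim', $opts[1]);
}

$countries = options_for_select($page, 'coutry');
$cities = options_for_select($page, 'city');
print_r($countries);
```

Two passes are used because a repeated capture group such as (<option>.*?<\/option>)+ keeps only its last repetition, so a single pattern cannot return every option of one list.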


This will work for small files. Sooner or later, though, you're going to run into PHP's runtime configured memory limits.


I often have to process 50-60MB files looking for particular substrings using perl, so I tend to think in terms of line-by-line processing.
______________________________________________________________________
My God! It's full of stars!
______________________________________________________________________
 
A quick follow up...

In general, how are non-printing ASCII codes like CR/LF, tab, etc. handled in regular expressions? For the CR/LF example, could it be as simple as searching for a \n?
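
For reference, a small sketch of how these codes are usually written in PHP's preg_* patterns: \r is carriage return, \n is line feed and \t is tab. The escape uses a backslash, not a forward slash. The sample string and variable names below are made up for illustration.

```php
<?php
// Sample text with a Windows line ending (CR LF), a Unix one (LF) and a tab.
$text = "one\r\ntwo\nthree\tfour";

// Remove every line ending, whichever convention was used.
$flat = preg_replace('/\r\n|\r|\n/', '', $text);

// Split into lines on any of the three line-ending forms.
$parts = preg_split('/\r\n|\r|\n/', $text);

// \t in the pattern matches a literal tab character.
$hasTab = preg_match('/\t/', $text) === 1;

var_dump($flat, $parts, $hasTab);
```

Putting \r\n first in the alternation means a CR LF pair is treated as one line ending rather than two.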
 
I think you need the ungreedy modifier.

Where you have /s at the end of the pattern in the preg_match_all call,

put /sU (note: the U is UPPERCASE).
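
A short sketch of what the U modifier does, using a made-up two-option string. U inverts greediness: a plain .* becomes lazy, while .*? becomes greedy. Since the pattern in the original question already uses .*?, adding U there would turn those quantifiers greedy, so it is worth testing before relying on it.

```php
<?php
$html = '<option>a</option><option>b</option>';

// Default: .* is greedy and runs to the last </option>.
preg_match('/<option>.*<\/option>/s', $html, $greedy);

// .*? is lazy and stops at the first </option>.
preg_match('/<option>.*?<\/option>/s', $html, $lazy);

// With U, a plain .* behaves lazily.
preg_match('/<option>.*<\/option>/sU', $html, $uPlain);

// With U, .*? flips back to greedy.
preg_match('/<option>.*?<\/option>/sU', $html, $uQuestion);

print_r([$greedy[0], $lazy[0], $uPlain[0], $uQuestion[0]]);
```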

Anikin
Hugo Alexandre Dias
Web-Programmer
anikin_jedi@hotmail.com
 