
preg_match_all not matching everything

Status
Not open for further replies.

pushyr

Programmer
Jul 2, 2007
159
GB
I'm trying to use preg_match_all, but it's not matching everything; about 30% gets missed.

Here's a sample of the full text that I'm running preg_match_all on:

----------------------------------------------------------

GET / HTTP/1.1
Host: User-Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10.7; rv:14.0) Gecko/20100101 Firefox/14.0.1
Accept: text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8
Accept-Language: en-us,en;q=0.5
Accept-Encoding: gzip, deflate
Connection: keep-alive
Cookie: .ASPXANONYMOUS=ghi5dIC1zQEkAAAANjU0MDkyZGItM2YwMC00ZDg5LTlhMmEtNWUxMjM3ZTc5MzVl_z-MlT--rJT7DPRjsj2jovs02Dw1; ASP.NET_SessionId=pxy2jgalg2wjvv2akolfvd55; validexistingcustomer=1; Default_cookie=keYmJXIPFYU1LdkuJQFvQsDGa2yFwJicWX50T0yxAlLRGhZJJ249/4zx5vvvAjiNyXinLlmP6Vv3HCU=; __utma=130466954.1493506846.1345479130.1345479130.1345479130.1; __utmb=130466954.29.10.1345479130; __utmc=130466954; __utmz=130466954.1345479130.1.1.utmcsr=(direct)|utmccn=(direct)|utmcmd=(none); ASPDNSFGUID=99E31ADE25BD2ED4DF34AE9327195E5E59B9F28C49C3D4EFA07958062AEBF78E46EF77AB842319F799721CA41CF3018CD3251D01162AA79D0E476462CF05ED5EFD78DA199DE8F7E7749114B6D1B321AC4712F068BA85CCAE0EAAD3D888BC87F1425BD988B593B9CC7E9A8AF022876A4437B97A1177F0E74CE6C14368FFD99E83272C168CAF11D17A3F1C69D4B95DC1E7132D1FE54D5B129917CC103DA6F8A71528B760FACECE0AF125D69A5CB092FE1397DCC288

HTTP/1.1 301 Moved Permanently
Cache-Control: private
Pragma: no-cache
Content-Type: text/html; charset=utf-8
Expires: Mon, 20 Aug 2012 16:38:51 GMT
Location: Server: Microsoft-IIS/7.5
X-AspNet-Version: 2.0.50727
X-Powered-By: ASP.NET
Date: Mon, 20 Aug 2012 16:38:50 GMT
Content-Length: 84792
----------------------------------------------------------

GET / HTTP/1.1
Host: User-Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10.7; rv:14.0) Gecko/20100101 Firefox/14.0.1
Accept: text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8
Accept-Language: en-us,en;q=0.5
Accept-Encoding: gzip, deflate
Connection: keep-alive
Cookie: .ASPXANONYMOUS=ghi5dIC1zQEkAAAANjU0MDkyZGItM2YwMC00ZDg5LTlhMmEtNWUxMjM3ZTc5MzVl_z-MlT--rJT7DPRjsj2jovs02Dw1; ASP.NET_SessionId=pxy2jgalg2wjvv2akolfvd55; validexistingcustomer=1; Default_cookie=keYmJXIPFYU1LdkuJQFvQsDGa2yFwJicWX50T0yxAlLRGhZJJ249/4zx5vvvAjiNyXinLlmP6Vv3HCU=; __utma=130466954.1493506846.1345479130.1345479130.1345479130.1; __utmb=130466954.29.10.1345479130; __utmc=130466954; __utmz=130466954.1345479130.1.1.utmcsr=(direct)|utmccn=(direct)|utmcmd=(none); ASPDNSFGUID=99E31ADE25BD2ED4DF34AE9327195E5E59B9F28C49C3D4EFA07958062AEBF78E46EF77AB842319F799721CA41CF3018CD3251D01162AA79D0E476462CF05ED5EFD78DA199DE8F7E7749114B6D1B321AC4712F068BA85CCAE0EAAD3D888BC87F1425BD988B593B9CC7E9A8AF022876A4437B97A1177F0E74CE6C14368FFD99E83272C168CAF11D17A3F1C69D4B95DC1E7132D1FE54D5B129917CC103DA6F8A71528B760FACECE0AF125D69A5CB092FE1397DCC288

HTTP/1.1 200 OK
Cache-Control: private
Pragma: no-cache
Content-Type: text/html; charset=utf-8
Expires: Mon, 20 Aug 2012 16:38:52 GMT
Server: Microsoft-IIS/7.5
X-AspNet-Version: 2.0.50727
X-Powered-By: ASP.NET
Date: Mon, 20 Aug 2012 16:38:52 GMT
Content-Length: 84665
----------------------------------------------------------

GET /getseal?host_name= HTTP/1.1
Host: seal.verisign.com
User-Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10.7; rv:14.0) Gecko/20100101 Firefox/14.0.1
Accept: */*
Accept-Language: en-us,en;q=0.5
Accept-Encoding: gzip, deflate
Connection: keep-alive
Referer:
HTTP/1.1 200 OK
Server: Apache-Coyote/1.1
Cache-Control: max-age=0, must-revalidate
Content-Type: text/javascript
Transfer-Encoding: chunked
Date: Mon, 20 Aug 2012 16:36:40 GMT
----------------------------------------------------------

I'm trying to match everything that's in between these separator lines:

----------------------------------------------------------

Here's my code:

Code:
preg_match_all("#----------------------------------------------------------(.*?)----------------------------------------------------------#is", $content, $content0);
$content00 = $content0[0];

And here are my results:

Array
(
[0] => ----------------------------------------------------------

GET / HTTP/1.1
Host: User-Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10.7; rv:14.0) Gecko/20100101 Firefox/14.0.1
Accept: text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8
Accept-Language: en-us,en;q=0.5
Accept-Encoding: gzip, deflate
Connection: keep-alive
Cookie: .ASPXANONYMOUS=ghi5dIC1zQEkAAAANjU0MDkyZGItM2YwMC00ZDg5LTlhMmEtNWUxMjM3ZTc5MzVl_z-MlT--rJT7DPRjsj2jovs02Dw1; ASP.NET_SessionId=pxy2jgalg2wjvv2akolfvd55; validexistingcustomer=1; Default_cookie=keYmJXIPFYU1LdkuJQFvQsDGa2yFwJicWX50T0yxAlLRGhZJJ249/4zx5vvvAjiNyXinLlmP6Vv3HCU=; __utma=130466954.1493506846.1345479130.1345479130.1345479130.1; __utmb=130466954.29.10.1345479130; __utmc=130466954; __utmz=130466954.1345479130.1.1.utmcsr=(direct)|utmccn=(direct)|utmcmd=(none); ASPDNSFGUID=99E31ADE25BD2ED4DF34AE9327195E5E59B9F28C49C3D4EFA07958062AEBF78E46EF77AB842319F799721CA41CF3018CD3251D01162AA79D0E476462CF05ED5EFD78DA199DE8F7E7749114B6D1B321AC4712F068BA85CCAE0EAAD3D888BC87F1425BD988B593B9CC7E9A8AF022876A4437B97A1177F0E74CE6C14368FFD99E83272C168CAF11D17A3F1C69D4B95DC1E7132D1FE54D5B129917CC103DA6F8A71528B760FACECE0AF125D69A5CB092FE1397DCC288

HTTP/1.1 301 Moved Permanently
Cache-Control: private
Pragma: no-cache
Content-Type: text/html; charset=utf-8
Expires: Mon, 20 Aug 2012 16:38:51 GMT
Location: Server: Microsoft-IIS/7.5
X-AspNet-Version: 2.0.50727
X-Powered-By: ASP.NET
Date: Mon, 20 Aug 2012 16:38:50 GMT
Content-Length: 84792
----------------------------------------------------------
[1] => ----------------------------------------------------------

GET /getseal?host_name= HTTP/1.1
Host: seal.verisign.com
User-Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10.7; rv:14.0) Gecko/20100101 Firefox/14.0.1
Accept: */*
Accept-Language: en-us,en;q=0.5
Accept-Encoding: gzip, deflate
Connection: keep-alive
Referer:
HTTP/1.1 200 OK
Server: Apache-Coyote/1.1
Cache-Control: max-age=0, must-revalidate
Content-Type: text/javascript
Transfer-Encoding: chunked
Date: Mon, 20 Aug 2012 16:36:40 GMT
----------------------------------------------------------
)

As you can see, there should be 3 matches, but I only got 2. This block of text didn't get matched, and I'm scratching my head as to why:

----------------------------------------------------------

GET / HTTP/1.1
Host: User-Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10.7; rv:14.0) Gecko/20100101 Firefox/14.0.1
Accept: text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8
Accept-Language: en-us,en;q=0.5
Accept-Encoding: gzip, deflate
Connection: keep-alive
Cookie: .ASPXANONYMOUS=ghi5dIC1zQEkAAAANjU0MDkyZGItM2YwMC00ZDg5LTlhMmEtNWUxMjM3ZTc5MzVl_z-MlT--rJT7DPRjsj2jovs02Dw1; ASP.NET_SessionId=pxy2jgalg2wjvv2akolfvd55; validexistingcustomer=1; Default_cookie=keYmJXIPFYU1LdkuJQFvQsDGa2yFwJicWX50T0yxAlLRGhZJJ249/4zx5vvvAjiNyXinLlmP6Vv3HCU=; __utma=130466954.1493506846.1345479130.1345479130.1345479130.1; __utmb=130466954.29.10.1345479130; __utmc=130466954; __utmz=130466954.1345479130.1.1.utmcsr=(direct)|utmccn=(direct)|utmcmd=(none); ASPDNSFGUID=99E31ADE25BD2ED4DF34AE9327195E5E59B9F28C49C3D4EFA07958062AEBF78E46EF77AB842319F799721CA41CF3018CD3251D01162AA79D0E476462CF05ED5EFD78DA199DE8F7E7749114B6D1B321AC4712F068BA85CCAE0EAAD3D888BC87F1425BD988B593B9CC7E9A8AF022876A4437B97A1177F0E74CE6C14368FFD99E83272C168CAF11D17A3F1C69D4B95DC1E7132D1FE54D5B129917CC103DA6F8A71528B760FACECE0AF125D69A5CB092FE1397DCC288

HTTP/1.1 200 OK
Cache-Control: private
Pragma: no-cache
Content-Type: text/html; charset=utf-8
Expires: Mon, 20 Aug 2012 16:38:52 GMT
Server: Microsoft-IIS/7.5
X-AspNet-Version: 2.0.50727
X-Powered-By: ASP.NET
Date: Mon, 20 Aug 2012 16:38:52 GMT
Content-Length: 84665
----------------------------------------------------------
 
Code:
$pattern = '/\-{58}(.*?)\-{58}|$/s';
(Assuming that I have counted the number of dashes correctly.) You need to match either the dashes or the end of the subject to capture the final segment.
 
Hi jpadie,

I used your exact pattern, but it didn't work; I'm still getting the same results.

Code:
preg_match_all('/\-{58}(.*?)\-{58}|$/s', $content, $content0);
 
Sorry, I didn't read the question carefully enough. You need to make the closing separator a zero-width assertion with a lookahead, (?=...). Otherwise the regex consumes the closing dashes, so they are no longer available to match as the opening separator of the next block.

Code:
$pattern = '/\-{58}(.*?)(?=\-{58}|$)/s';
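To see the difference on something small, here is a rough sketch rather than the poster's actual data: the three "block" strings are made up, and the separator is assumed to be exactly 58 dashes as in the pattern above. It also shows one side effect worth knowing about: with the lookahead version, the very last separator can match on its own with an empty capture, so the sketch trims the captures and drops empty ones.

```php
<?php
// Made-up miniature of the log: three blocks that share separator lines.
$sep = str_repeat('-', 58);
$content = $sep . "\nblock A\n" . $sep . "\nblock B\n" . $sep . "\nblock C\n" . $sep;

// Consuming pattern: each match swallows its closing separator, so that
// separator cannot also open the next block.
$consumed = preg_match_all('/-{58}(.*?)-{58}/s', $content, $m1);

// Lookahead pattern: the closing separator (or end of subject) is only asserted.
preg_match_all('/-{58}(.*?)(?=-{58}|$)/s', $content, $m2);

// The final separator matches with an empty capture, so trim and drop empties.
$blocks = array_values(array_filter(
    array_map('trim', $m2[1]),
    function ($s) { return $s !== ''; }
));

print_r($m1[1]);  // blocks A and C only
print_r($blocks); // blocks A, B and C
```

The consuming pattern skips every other block, which matches the symptom in the original question.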
 
You have totally helped me out on this one. I can't thank you enough!
 
No worries. You can achieve the same goal without a regular expression, of course.

Code:
// Build the 58-dash separator line.
$separator = str_repeat('-', 58);
// Strip leading/trailing dashes, then split the text on the separator.
$matches = explode($separator, trim($text, '-'));
echo '<pre>';
print_r($matches);
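The explode() approach above may leave stray newlines around each piece, and possibly an empty or whitespace-only entry at the ends. One possible variation, offered only as a sketch, is preg_split() with the PREG_SPLIT_NO_EMPTY flag, letting the pattern absorb whitespace next to each separator. The sample text here is invented for illustration and the separator is again assumed to be 58 dashes.

```php
<?php
// Invented sample in the same shape as the log: blocks between 58-dash lines.
$sep = str_repeat('-', 58);
$text = $sep . "\n\nGET / HTTP/1.1\n\nHTTP/1.1 200 OK\n"
      . $sep . "\n\nGET /getseal HTTP/1.1\nHTTP/1.1 200 OK\n"
      . $sep . "\n";

// Split on each separator plus any whitespace touching it; blank lines
// inside a block are kept, and empty pieces at the ends are discarded.
$blocks = preg_split('/\s*-{58}\s*/', $text, -1, PREG_SPLIT_NO_EMPTY);

echo '<pre>';
print_r($blocks);
```

Each entry in $blocks is then one request/response pair with no surrounding whitespace.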
 