
Scanning HTML for file names and their paths 2

Status
Not open for further replies.

Sleidia

Technical User
May 4, 2001
1,284
FR


Hi :)

I'm now using PHPMailer and I have an issue with the way it attaches
images to an HTML mail.

As you can see below, you have to use $mail->AddEmbeddedImage for every image contained in the HTML mail :

Code:
$mail = new PHPMailer();
$mail->From = $_SESSION["admin_email"];
$mail->FromName = strtoupper($_SERVER['HTTP_HOST']);
$mail->Subject = $mail_subject;

foreach ($my_array["pics"]["filenames"] as $key => $filename) {
    $mail->AddEmbeddedImage($my_array["pics"]["paths"][$key] . $filename, $filename, $filename, "base64", "image/gif");
}

$mail->Body = $mail_html;
$mail->AltBody = $mail_text;
$mail->AddAddress($GLOBALS["member_email"], $GLOBALS["member_email"]);

The question is: because $mail_html is user input, how can I scan its content so that every image it uses ends up in
$my_array["pics"]["paths"] (the paths) and $my_array["pics"]["filenames"] (the file names)?


Thanks!
 
can you post an example of your $mail_html variable so that we can see the patterns you use?
 
Hi jpadie :)

Well, $mail_html can be any HTML content.
There aren't any patterns at all because images can be used from several different directories.

So, from what I understand, the delimiters should be img src=" or something like that, right? Only gif/jpg/flash files need to be populated into the array.
 
but you are only talking about <img> tags? i guess you must be providing absolute paths for each image (or a base directive in the <head> content)? can you confirm?
 
Actually, I only need to collect all the paths and filenames of all the <img> tags. Let's get rid of swf files. Jpg and gif will be enough. No background images either.

As for fixing the paths if needed, I'll do it myself later.

So, my bet is that all I need is a REGEX expression that finds all the <img src="?"> in a string, right? REGEX are my weakest point ;(

Actually, I've just found this ... which I'm going to try right now ;)
 
Looks like the programmer of this forum isn't too good with regular expressions either ahaha ;)
 
here's a regex for you. the matches of the src attribute will be in $matches[1] the img tags will be in $matches[0]

but doesn't msgHTML() do this stuff automatically? i have not looked at the source code of phpmailer for a while so i might (well) be wrong.

Code:
$pattern = '/<\\s*img .*?src\\s*=\\s*((\'|").*?\.(png|jpg|jpeg|gif)(\\2)).*?>/imsx';
$text = '<img src="http://www.domain.com/path/to/file.png" align="center" />';
preg_match_all($pattern, $text, $matches);
print_r($matches);
 
Hi,

Here is a slight change in jpadie's regex from my side (untested).

Code:
$pattern = '/\<\s*img .*?src\s*=\s*.*?\.(png|jpg|jpeg|gif).*?\>/iz';



--------------------------------------------------------------------------
I never set a goal because you never know what's going to happen tomorrow.
 
spookie - a few questions:

1/ why are you escaping the angle brackets?
2/ I'm not aware of the z modifier in preg. what does it do?
3/ your pattern does not (appear to) capture the src attribute in a backreference, which is something sleidia was after so that it could be used to pass into the phpmailer method.
4/ i have found that i needed to double slash the special characters like \s (so it becomes \\s); otherwise php thinks i am escaping the next character. have you found a way around this?

apart from the above, the key difference that i see between our respective patterns is that mine requires the src attribute to be valid html (by being enquoted).

 
jpadie,

As per my knowledge, \s is itself a character class (it matches whitespace characters), so it does not need to be escaped.

\z matches at the end of the string, so typically the regex would match a string like

Code:
$text = '<img src="http://www.domain.com/path/to/file.png"
and not
Code:
$text = '<img src="http://www.domain.com/path/to/file.png" align="center" />\n';

I have not coded in PHP for some time now and just tried to suggest a way ahead to Sleidia, so apologies if I am wrong somewhere.


 
Typo:

z matches at the end of the string, so typically the regex would match a string like

Code:
$text = '<img src="http://www.domain.com/path/to/file.png" align="center" />';
and not

Code:
$text = '<img src="http://www.domain.com/path/to/file.png" align="center" />\n';


:)

 

THAAAAAAAAAAAAAANKS a lot Jpadie ;)

Your code was the only one that did what I needed.
I ran it on some very complex HTML and here is the result I got :

Code:
Array
(
    [0] => Array
        (
            [0] => <img src="../fr/_dir_layouts_/layout_1/images/body_logo.gif">
            [1] => <img height="185" width="667" alt="" src="../_dir_files_/Image/fr/header_welcome.jpg" />
            [2] => <img src="../fr/_dir_layouts_/layout_1/images/body_slogan.gif">

            [3] => <img src="../fr/_dir_layouts_/layout_1/images/nav_top_left_deco.jpg">
            [4] => <img src="../fr/_dir_pics_/_dir_banners_/banner_3000.jpg" border="0">
            [5] => <img height="228" alt="" width="200" src="../_dir_files_/Image/fr/home_1.jpg" />
            [6] => <img height="228" alt="" width="200" src="../_dir_files_/Image/fr/home_2.jpg" />
            [7] => <img height="228" alt="" width="200" src="../_dir_files_/Image/fr/home_3.jpg" />
            [8] => <img height="228" alt="" width="200" src="../_dir_files_/Image/fr/home_4.jpg" />

            [9] => <img src="../fr/_dir_pics_/_dir_banners_/banner_right.jpg" border="0">
        )

    [1] => Array
        (
            [0] => "../fr/_dir_layouts_/layout_1/images/body_logo.gif"
            [1] => "../_dir_files_/Image/fr/header_welcome.jpg"
            [2] => "../fr/_dir_layouts_/layout_1/images/body_slogan.gif"
            [3] => "../fr/_dir_layouts_/layout_1/images/nav_top_left_deco.jpg"
            [4] => "../fr/_dir_pics_/_dir_banners_/banner_3000.jpg"
            [5] => "../_dir_files_/Image/fr/home_1.jpg"
            [6] => "../_dir_files_/Image/fr/home_2.jpg"
            [7] => "../_dir_files_/Image/fr/home_3.jpg"
            [8] => "../_dir_files_/Image/fr/home_4.jpg"
            [9] => "../fr/_dir_pics_/_dir_banners_/banner_right.jpg"
        )

    [2] => Array
        (
            [0] => "
            [1] => "
            [2] => "
            [3] => "
            [4] => "
            [5] => "
            [6] => "
            [7] => "
            [8] => "
            [9] => "
        )

    [3] => Array
        (
            [0] => gif
            [1] => jpg
            [2] => gif
            [3] => jpg
            [4] => jpg
            [5] => jpg
            [6] => jpg
            [7] => jpg
            [8] => jpg
            [9] => jpg
        )

    [4] => Array
        (
            [0] => "
            [1] => "
            [2] => "
            [3] => "
            [4] => "
            [5] => "
            [6] => "
            [7] => "
            [8] => "
            [9] => "
        )

)

This is simply crazily beautiful ahaha ;)

Lastly, could you tell me what data is supposed to go in keys 2 and 4?

Did I say thanks? ;)

 
key 2 and 4 are just the quotes that are captured as part of the backreference. it's a byproduct of the regex needed to match the quote types. you can get rid of key 4 by deleting the brackets around the \\2

you can get rid of key 3 too, by making it a non-capturing group. to do these bits and bobs (and remove the double slashes pointed out by spookie) change the pattern to

Code:
$pattern = '/<\s*img .*?src\s*=\s*((\'|").*?\.(?:png|jpg|jpeg|gif)\2).*?>/imsx';
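
To tie this back to the original question, here is a minimal sketch of how the matches could be turned into the two arrays Sleidia asked about. The function name collect_image_paths() and the sample HTML are made up for illustration and are not part of PHPMailer. The sketch assumes every src is quoted, as the pattern above requires, and it adds a trailing slash to each path so that path . filename rebuilds the original src.

```php
<?php
// Hypothetical helper (not a PHPMailer function): collects the directory and
// file name of every <img> src found by the pattern posted above.
function collect_image_paths(string $html): array
{
    $pattern = '/<\s*img .*?src\s*=\s*((\'|").*?\.(?:png|jpg|jpeg|gif)\2).*?>/imsx';
    $pics = ['paths' => [], 'filenames' => []];

    if (preg_match_all($pattern, $html, $matches)) {
        foreach ($matches[1] as $quoted_src) {
            // $matches[1] still carries the surrounding quotes, so strip them.
            $src = trim($quoted_src, '"\'');
            // Trailing slash so that path . filename gives back the src value.
            $pics['paths'][]     = dirname($src) . '/';
            $pics['filenames'][] = basename($src);
        }
    }

    return ['pics' => $pics];
}

// Sample input, loosely modelled on the output shown earlier in the thread.
$mail_html = '<p><img src="../fr/images/body_logo.gif">'
           . '<img height="185" alt="" src="../_dir_files_/Image/fr/header_welcome.jpg" /></p>';

$my_array = collect_image_paths($mail_html);
print_r($my_array['pics']);
```

The resulting $my_array["pics"] could then be fed to the AddEmbeddedImage loop from the first post. Relative paths such as ../fr/ would still need to be resolved against the right base directory, which Sleidia said would be handled separately.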

@spookie
don't get me wrong, i was asking questions as i didn't know the answers myself. on testing, i see that you are correct that built-in char classes don't need escaping.
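
A quick way to check the escaping point, as a small sketch: in a single-quoted PHP string only \\ and \' are treated as escape sequences, so '\s' and '\\s' end up as the same two characters and the regex engine sees the same whitespace class either way. The variable names and sample strings are only for illustration.

```php
<?php
// In single quotes, '\s' keeps its backslash, and '\\s' collapses to one
// backslash followed by "s", so both strings are identical.
$single = '\s';
$double = '\\s';

var_dump($single === $double);  // bool(true)
var_dump(strlen($single));      // int(2): a backslash and the letter s

// Both spellings hand PCRE the same pattern, so both match a space.
var_dump(preg_match('/a\sb/', 'a b'));   // int(1)
var_dump(preg_match('/a\\sb/', 'a b'));  // int(1)
```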

i cannot, however, find any documentation on the z modifier. but I do see that there is a \z assertion which appears to be similar to the $ assertion and does what you say. I think that this will need to be before the modifiers though (i.e. before the forward slash near the end of the pattern). it does need to be preceded by the backslash.
However, I don't think that would work for Sleidia as I had assumed that there would be many img tags in the string.
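
The difference between $ and \z, and the reason an end anchor does not suit a string holding many img tags, can be sketched as follows. The patterns here are simplified for illustration and are not the ones posted above. Note that the newline is written in double quotes, because '\n' in single quotes is a literal backslash followed by n.

```php
<?php
$tag = '<img src="http://www.domain.com/path/to/file.png" align="center" />';

// Without the D or m modifier, $ also matches just before a final newline;
// \z matches only at the very end of the subject.
$dollar = '/\.png".*?>$/';
$z      = '/\.png".*?>\z/';

var_dump(preg_match($dollar, $tag));         // int(1)
var_dump(preg_match($z, $tag));              // int(1)
var_dump(preg_match($dollar, $tag . "\n"));  // int(1)
var_dump(preg_match($z, $tag . "\n"));       // int(0)

// With several tags in one string, anchoring at the end keeps only the last.
$two_tags = $tag . $tag;
var_dump(preg_match_all('/<img[^>]*>/', $two_tags));    // int(2)
var_dump(preg_match_all('/<img[^>]*>\z/', $two_tags));  // int(1)
```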


 
jpadie,

It's great that the solution worked perfectly!! Also thanks for some tips for me to learn. A star from me..


 