There's just one little issue I'm trying to figure out with regular expressions. I've used them plenty of times before, but I've always had trouble with expressions that work on URLs.
In this case, I'm using Python to grab information from a PHP page. The page contents include something like this:
<a href="display.php?f=2359">Random text</a>
<a href="display.php?f=3256">More random text</a>
...and so forth.
So, I have this:
Code:
import urllib2
import re

p = re.compile('(display.php\?)')
# path holds the URL of the PHP page
for line in urllib2.urlopen(path):
    for matched in p.findall(line):
        print matched
That works without a problem for matching every occurrence of display.php?, but if I try to change it even slightly, it falls apart:
Code:
p = re.compile('(display.php\?f)')
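I'm hoping, of course, that this new expression will match display.php?f, which most definitely IS in the HTML file, but it doesn't match anything.
I just don't understand why that expression doesn't work. I've also tried changing it to this: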
Code:
p = re.compile('(display.php\?)(f)')
...which also doesn't work.
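As a sanity check, the three patterns can be run against the two sample lines quoted above instead of the live page. This is only a minimal Python 3 sketch: `sample_lines` is a hypothetical stand-in for the fetched page, and the patterns are written as raw strings (the same regexes, just without relying on string-escape behaviour). If all three match here, the difference presumably lies in the actual page source rather than in the regex syntax.

```python
import re

# Hypothetical stand-in for the lines read from the page,
# copied from the sample HTML quoted in the post.
sample_lines = [
    '<a href="display.php?f=2359">Random text</a>',
    '<a href="display.php?f=3256">More random text</a>',
]

# The three patterns from the post, as raw strings.
patterns = [
    r'(display.php\?)',
    r'(display.php\?f)',
    r'(display.php\?)(f)',
]

results = {}
for pattern in patterns:
    p = re.compile(pattern)
    # findall yields strings for one group and tuples for several groups.
    results[pattern] = [m for line in sample_lines for m in p.findall(line)]

for pattern, matches in results.items():
    print(pattern, '->', matches)
```

Printing `repr(line)` for a line of the real page that should match would be one way to see how it differs from the sample.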
So I'm guessing there's something important about regular expression syntax that I've been completely missing here. Would anyone mind filling me in?
The final expression I'm looking for has one group that matches display.php?f= followed by any number of digits followed by ">, and then a second group of any number of alphanumeric characters. Which, to me, seems like it would be...
Code:
p = re.compile('(display.php\?f=(\d)*">)(\w*)')
(I don't need to escape the quotation mark in this case, do I?)
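For comparison, here is a hedged Python 3 sketch of how that final expression behaves on one of the sample lines, next to one possible adjustment. In the version above, `(\d)*` repeats a one-digit group, so the group keeps only the last digit, and `\w*` stops at the first space in the link text. The `adjusted` pattern is just one option, not the only way to write it: it escapes the dot, moves the repetition inside the group as `(\d+)`, and uses `[^<]*` for the text, which assumes the link text contains no nested tags. The double quote needs no escaping, because the Python string is delimited by single quotes and `"` has no special meaning in a regex.

```python
import re

# One of the sample lines from the post.
line = '<a href="display.php?f=2359">Random text</a>'

# The expression exactly as written in the post (raw string form).
original = re.compile(r'(display.php\?f=(\d)*">)(\w*)')

# Illustrative variant: escaped dot, all digits in one group,
# link text captured up to the next '<'.
adjusted = re.compile(r'display\.php\?f=(\d+)">([^<]*)')

original_groups = original.search(line).groups()
adjusted_groups = adjusted.search(line).groups()

print(original_groups)  # ('display.php?f=2359">', '9', 'Random')
print(adjusted_groups)  # ('2359', 'Random text')
```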