Python newbie question

Status
Not open for further replies.

bapatla

Programmer
Jun 22, 2010
2
US
Good Day,

I need to parse the following string to extract the Name and Description field values:

<Book_Data Name="LordOfTheRings" Edition="1" Published="1954" Description=" One Ring to rule the other Rings of Power" Publisher="Allen & Unwin "/>

Any nice way to accomplish this? Maybe via a regular expression, or using some XML parsing?

Many thanks,
Bapatla
 
for the name (with s holding the string above; the value ends up in m.group(1)):
>>> m = re.search(r'Name="(\w*)"', s)

for the description:
>>> m = re.search(r'Description="([^"]*)"', s)

_________________
Bob Rashkin
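
A minimal sketch of the two searches above as a standalone script, assuming `s` holds the tag from the question copied verbatim. It also shows why the description uses `[^"]*` rather than `\w*`: the word class cannot cross the spaces inside the value. The variable names are only illustrative.

```python
import re

# Assumption: s holds the tag from the question, copied verbatim.
s = ('<Book_Data Name="LordOfTheRings" Edition="1" Published="1954" '
     'Description=" One Ring to rule the other Rings of Power" '
     'Publisher="Allen & Unwin "/>')

name_match = re.search(r'Name="(\w*)"', s)
desc_match = re.search(r'Description="([^"]*)"', s)

# re.search returns None when nothing matches, so guard before calling group().
name = name_match.group(1) if name_match else None
description = desc_match.group(1) if desc_match else None

# \w* stops at the first space, so the closing quote is never reached.
desc_with_word_class = re.search(r'Description="(\w*)"', s)

print(repr(name))
print(repr(description))
print(desc_with_word_class)
```

Note that the captured description keeps its leading space; call `.strip()` on it if that is not wanted.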
 
Code:
import re

bookstr = '<Book_Data Name="LordOfTheRings" Edition="1" Published="1954" Description=" One Ring to rule the other Rings of Power"  Publisher="Allen & Unwin "/>'

# Name must appear before Description in the tag for this pattern to match
regex_bookname = re.compile(r'Book_Data Name="(?P<name>\w*)".*'
                            r'Description="(?P<description>[^"]*)"')
result = regex_bookname.search(bookstr)
if result is None:
    print("Book Name not found!")
else:
    # strip() removes the stray whitespace inside the quoted values
    name = result.group('name').strip()
    print("Book name = '%s'" % name)
    description = result.group('description').strip()
    print("Book description = '%s'" % description)
Output:
Code:
$ python parse.py
Book name = 'LordOfTheRings'
Book description = 'One Ring to rule the other Rings of Power'
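
The named-group regex above depends on Name coming before Description in the tag. As a possible alternative, here is a sketch that collects every attribute into a dictionary with `re.findall`, so the order does not matter. It assumes attribute values never contain a double quote; `attr_regex` and `attrs` are names made up for this example.

```python
import re

bookstr = ('<Book_Data Name="LordOfTheRings" Edition="1" Published="1954" '
           'Description=" One Ring to rule the other Rings of Power"  '
           'Publisher="Allen & Unwin "/>')

# Each match yields an (attribute name, attribute value) pair.
# Assumption: values are double-quoted and contain no double quotes.
attr_regex = re.compile(r'(\w+)="([^"]*)"')
attrs = dict(attr_regex.findall(bookstr))

print("Book name = '%s'" % attrs['Name'].strip())
print("Book description = '%s'" % attrs['Description'].strip())
```

A missing attribute shows up as a `KeyError` here; `attrs.get('Name')` returns `None` instead if that is easier to handle.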
 
Thanks Bob, mikrom!

Both these solutions work!!

I have another regex question...

The data looks like this:

<Book_Data App="zBook" Name="Type" Value="title" />
<Book_Data App="zBook" Name="Title" Value="Romeo and Juliet" />
<Book_Data App="zBook" Name="Author" Value="William Shakespeare" />
<Book_Data App="zBook" Name="Published" Value="1597" />
<Book_Data App="zBook" Name="Printer" Value="John Danter" />

How do I get the value for the Title and Author fields?

Thanks in advance!
Bapatla
 
Parse the data line by line and search every line for the given values. Something like this:
Code:
import re

# data lines
lines = [
    '<Book_Data App="zBook" Name="Type" Value="title" />',
    '<Book_Data App="zBook" Name="Title" Value="Romeo and Juliet" />',
    '<Book_Data App="zBook" Name="Author" Value="William Shakespeare" />',
    '<Book_Data App="zBook" Name="Published" Value="1597" />',
    '<Book_Data App="zBook" Name="Printer" Value="John Danter" />',
]

# define regexes
regex_author = re.compile(r'Name="Author" Value="(?P<author>[^"]*)"')
regex_title = re.compile(r'Name="Title" Value="(?P<title>[^"]*)"')

book_hash = {}
# parse lines
for line in lines:
    result = regex_title.search(line)
    if result is not None:
        book_hash['title'] = result.group('title').strip()
    result = regex_author.search(line)
    if result is not None:
        book_hash['author'] = result.group('author').strip()

# print the results found, sorted by key so the order is predictable
for key in sorted(book_hash):
    print("%s: '%s'" % (key, book_hash[key]))
Output:
Code:
author: 'William Shakespeare'
title: 'Romeo and Juliet'
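
Instead of one regex per field, a single pattern can capture both the Name and the Value of each line and fill a dictionary with all of them. This is only a sketch, assuming every line carries Name directly followed by Value as in the sample data; `pair_regex` and `fields` are names invented for the example.

```python
import re

lines = [
    '<Book_Data App="zBook" Name="Type" Value="title" />',
    '<Book_Data App="zBook" Name="Title" Value="Romeo and Juliet" />',
    '<Book_Data App="zBook" Name="Author" Value="William Shakespeare" />',
    '<Book_Data App="zBook" Name="Published" Value="1597" />',
    '<Book_Data App="zBook" Name="Printer" Value="John Danter" />',
]

# Assumption: Name comes immediately before Value on every line.
pair_regex = re.compile(r'Name="([^"]*)"\s+Value="([^"]*)"')

fields = {}
for line in lines:
    match = pair_regex.search(line)
    if match is not None:
        key, value = match.groups()
        fields[key] = value.strip()

print("title: '%s'" % fields['Title'])
print("author: '%s'" % fields['Author'])
```

With this layout, picking up another field such as Published needs no new regex, only another dictionary lookup.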
 
The data you are providing looks like XML.
You may want to check out the various XML modules (I haven't used them yet).

As an aside, whoever created that XML file needs shooting!
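
For reference, a rough sketch of the XML route using the standard library's `xml.etree.ElementTree`. Two assumptions are made here: the lines are wrapped in a made-up `<Books>` root element, because an XML document needs a single root, and the shortened tag in `bad_tag` stands in for the first string in this thread. That first string contains a bare `&`, which is not well-formed XML, so the parser rejects it until the ampersand is escaped as `&amp;`.

```python
import xml.etree.ElementTree as ET

data = '''<Book_Data App="zBook" Name="Type" Value="title" />
<Book_Data App="zBook" Name="Title" Value="Romeo and Juliet" />
<Book_Data App="zBook" Name="Author" Value="William Shakespeare" />
<Book_Data App="zBook" Name="Published" Value="1597" />
<Book_Data App="zBook" Name="Printer" Value="John Danter" />'''

# Assumption: <Books> is a hypothetical wrapper, not part of the original data.
root = ET.fromstring('<Books>' + data + '</Books>')

fields = {}
for element in root.iter('Book_Data'):
    fields[element.get('Name')] = element.get('Value')

print(fields['Title'])
print(fields['Author'])

# Shortened version of the first tag in the thread; the bare & breaks parsing.
bad_tag = '<Book_Data Name="LordOfTheRings" Publisher="Allen & Unwin "/>'
try:
    ET.fromstring(bad_tag)
    ampersand_rejected = False
except ET.ParseError:
    ampersand_rejected = True

# Escaping the ampersand makes the tag well-formed.
fixed = ET.fromstring(bad_tag.replace('&', '&amp;'))
print(ampersand_rejected, repr(fixed.get('Publisher')))
```

If the file cannot be fixed at the source, the regex approaches earlier in the thread may be the more practical choice, since they do not care about well-formedness.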
 