Hi guys
I have a question about how to take out/insert a block of
data (nodes) from/into XML documents "without" using any XML parsers.
These are huge XML files, some of them more than 700 MB.
An example file might look like this:
[code]
<Root>
<Node 1>
x
x
x
</Node 1>
<Node 2>
y
y
y
</Node 2>
<Node 1>
a
b
c
</Node 1>
</Root>
[/code]
Here is logic that works well, but it's not elegant at all:
[code]
#!/usr/bin/perl
use strict;
use warnings;

my $source_xml_file = qq(source.xml);
my $output_xml_file = qq(output.xml);

open(my $in, '<', $source_xml_file) or die "Failed opening $source_xml_file: $!\n";
open(my $out, '>', $output_xml_file) or die "Can't open $output_xml_file: $!\n";

my $flag_start = 1;    # 1 = copy lines, 0 = inside a node being removed
while (my $intext = <$in>) {
    chomp $intext;
    if ($intext =~ /<gn:GsmRelation/) {      # Start Node
        $flag_start = 0;
    }
    if ($intext =~ /<\/gn:GsmRelation>/) {   # End Node
        $flag_start = 1;
    }
    # Copy the line unless it is inside the node or is the node's end tag
    print $out "$intext\n" if ($flag_start == 1 && $intext !~ /<\/gn:GsmRelation>/);
}
close $out;
close $in;
[/code]
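For comparison, the same start/end flag logic could probably be written more compactly with Perl's range (flip-flop) operator, still reading line by line so a 700 MB file never has to sit in memory. This is only a sketch: the helper name copy_without_node and the small in-memory sample are assumptions for illustration, and it assumes the start and end tags are recognizable on a per-line basis, as in the data here.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper (not from the original post): copy lines from $in to
# $out, skipping every block that starts at a line matching $start_re and
# ends at a line matching $end_re (both lines included).
sub copy_without_node {
    my ($in, $out, $start_re, $end_re) = @_;
    while (my $line = <$in>) {
        # In scalar context ".." is the flip-flop operator: it turns true on
        # the start line and stays true through the end line.
        next if $line =~ $start_re .. $line =~ $end_re;
        print {$out} $line;
    }
}

# Small in-memory sample standing in for the real file.
my $xml = "<Root>\n<a>1</a>\n<gn:GsmRelation id=\"X\">\n<b>2</b>\n"
        . "</gn:GsmRelation>\n<c>3</c>\n</Root>\n";
my $result = '';
open my $in,  '<', \$xml    or die "Cannot read from string: $!";
open my $out, '>', \$result or die "Cannot write to string: $!";
copy_without_node($in, $out, qr/<gn:GsmRelation\b/, qr{</gn:GsmRelation>});
close $in;
close $out;
print $result;
```

With real files, the two in-memory handles would simply be replaced by handles opened on source.xml and output.xml.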
Data Example
[code]
__DATA__
<es:pOffset1Fach>0</es:pOffset1Fach>
<es:pOffset3Fach>0</es:pOffset3Fach>
<es:administrativeState>1</es:administrativeState>
</es:vsDataFach>
</xn:attributes>
</xn:VsDataContainer>
<gn:GsmRelation id="BQ04106A" modifier = "create"><gn:attributes>
<gn:adjacentCell>SubNetwork=ONRM_ROOT_MO_R,vsDataExternalGsmCell=BQ04106A</gn:adjacentCell>
</gn:attributes>
<xn:VsDataContainer id="BQ04106A" modifier = "create">
<xn:attributes>
<xn:vsDataType>vsDataGsmRelation</xn:vsDataType>
<xn:vsDataFormatVersion>EricssonSpecificAttributes.6.2</xn:vsDataFormatVersion>
<es:vsDataGsmRelation>
<es:qOffset1sn>8</es:qOffset1sn>
<es:mobilityRelationType>0</es:mobilityRelationType>
<es:selectionPriority>10</es:selectionPriority>
</es:vsDataGsmRelation>
</xn:attributes>
</xn:VsDataContainer>
</gn:GsmRelation>
<gn:GsmRelation id="BQ04106C" modifier = "create"><gn:attributes>
<gn:adjacentCell>SubNetwork=ONRM_ROOT_MO_R,vsDataExternalGsmCell=BQ04106C</gn:adjacentCell>
</gn:attributes>
<xn:VsDataContainer id="BQ04106C" modifier = "create">
<xn:attributes>
<xn:vsDataType>vsDataGsmRelation</xn:vsDataType>
<xn:vsDataFormatVersion>EricssonSpecificAttributes.6.2</xn:vsDataFormatVersion>
<es:vsDataGsmRelation>
<es:qOffset1sn>8</es:qOffset1sn>
<es:mobilityRelationType>0</es:mobilityRelationType>
<es:selectionPriority>7</es:selectionPriority>
[/code]
My question: could I use a generic regular expression
to find where a node starts and where it ends, and then take
it out of the file? I would also like a routine
that adds a node to the file, given the end tag of the
preceding node.
My goal is to use just a regex, not an XML parser.
Pseudocode
[code]
sub take_node_out {
    my ($file, $start_tag, $end_tag) = @_;
    local $/ = undef;    # slurp mode, restored when the sub returns
    open(my $fh, '<', $file) or die "Can't open $file: $!\n";
    my $xmldata = <$fh>;
    close $fh;
    $xmldata =~ s/\$start_tag\n(.*)\n\$end_tag//g; # REGEX TO CREATE
    open($fh, '>', $file) or die "Can't write $file: $!\n";
    print $fh $xmldata;
    close $fh;
}
sub insert_node {
    my ($file, $end_tag_fromothernode, $nodetoadd) = @_;
    local $/ = undef;    # slurp mode, restored when the sub returns
    open(my $fh, '<', $file) or die "Can't open $file: $!\n";
    my $xmldata = <$fh>;
    close $fh;
    $xmldata =~ s/\$end_tag_fromothernode\$end_tag_fromothernode\./$nodetoadd/g; # REGEX TO CREATE
    open($fh, '>', $file) or die "Can't write $file: $!\n";
    print $fh $xmldata;
    close $fh;
}
[/code]
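To make the two "REGEX TO CREATE" placeholders more concrete, here is one possible shape for them, working on a string rather than a file. It is a sketch under assumptions, not a definitive answer: remove_nodes and insert_after are hypothetical names, nodes with the given tag are assumed never to be nested inside each other or written as self-closing tags, and the tag names are simplified to Node1/Node2/Node3 for the demo.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: remove every <$tag ...>...</$tag> block from a string.
sub remove_nodes {
    my ($xml, $tag) = @_;
    # \Q..\E quotes the tag name literally; .*? is non-greedy so each match
    # stops at the first closing tag; /s lets "." cross newlines; \n? also
    # removes the newline that follows the closing tag, if any.
    $xml =~ s{<\Q$tag\E\b[^>]*>.*?</\Q$tag\E>\n?}{}gs;
    return $xml;
}

# Hypothetical helper: insert $node right after the first occurrence of the
# literal $end_tag (and the newline following it, if any).
sub insert_after {
    my ($xml, $end_tag, $node) = @_;
    $xml =~ s{(\Q$end_tag\E\n?)}{$1$node};
    return $xml;
}

my $xml = "<Root>\n<Node1>\nx\n</Node1>\n<Node2>\ny\n</Node2>\n"
        . "<Node1>\na\n</Node1>\n</Root>\n";

my $removed = remove_nodes($xml, 'Node1');
my $added   = insert_after($removed, '</Node2>', "<Node3>\nz\n</Node3>\n");
print $added;
```

Note that this approach needs the whole document in one string, so for files of several hundred MB a line-by-line pass is likely to be kinder to memory.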
[/tt]
dmazzini
GSM/UMTS System and Telecomm Consultant