
Split long string into hash pairs

Status
Not open for further replies.

tonykent

IS-IT--Management
Jun 13, 2002
251
GB
After a long time away from Perl I have found myself in a new job where I need to use it again (to manipulate lots of XML files). Hello again!

My problem: After some data extraction with XML::LibXML I am getting long strings of text returned, like this:

4541 PT1M27S 4542 PT1M51S 4543 PT2M16S 4544 PT1M47S 4545 PT0M39S 4546 PT0M50S 4547 PT0M43S 4548 PT1M2S 4549 PT0M41S 4550 PT2M44S 4551 PT2M4S 4552 PT1M5S
4553 PT1M30S 4554 PT1M21S 4555 PT2M10S 4556 PT0M23S 4557 PT1M24S 4558 PT0M35S 4559 PT1M30S 4617 PT0M58S 4561 PT2M27S PT1M0S 4562 PT1M24S 4563 PT2M9S 456
4 PT3M8S 4565 PT0M32S 4566 PT0M50S 4567 PT1M30S 4568 PT0M57S 4569 PT1M7S 4570 PT0M55S 4571 PT1M1S 6592 PT0M18S 4790 PT0M20S 4791 PT0M18S 9254 PT1M45S 479
4 PT0M14S 4795 PT0M19S 4796 PT0M18S 6185 PT0M28S 6186 PT1M20S 6187 PT1M48S 6188 PT0M52S

(that's one line).

This actually represents pairs of values like this:
4541 PT1M27S
4542 PT1M51S
4543 PT2M16S
4544 PT1M47S
4545 PT0M39S
4546 PT0M50S
4547 PT0M43S
4548 PT1M2S
4549 PT0M41S
4550 PT2M44S
4551 PT2M4S etc

and I need to split the line(s) up into such pairs and put them into a hash. I can't figure out how to do this. Can anyone point me in the right direction?

my %hash = split /[\s+]/, $lineofdata;

does not work. It gives me this:

$VAR1 = {
'PT0M28S' => '',
'PT0M50S' => '',
'4545' => 'PT0M39S',
'PT0M32S' => '',
'PT1M7S' => '',
'4564' => 'PT3M8S',
'4557' => 'PT1M24S',
'PT0M19S' => '',
'PT2M9S' => '',
'4566' => 'PT0M50S',
'PT0M58S' => '',
'4543' => 'PT2M16S',
'PT0M20S' => '',
'PT2M44S' => '',
'4541' => 'PT1M27S',
'4561' => 'PT2M27S',
'PT1M47S' => '',
'4549' => 'PT0M41S',
'PT1M45S' => '',
'4568' => 'PT0M57S',
'4794' => 'PT0M14S',
'PT1M21S' => '',
'4551' => 'PT2M4S',
'6188' => 'PT0M52S',
'PT1M2S' => '',
'4796' => 'PT0M18S',
'4553' => 'PT1M30S',
'PT1M48S' => '',
'4570' => 'PT0M55S',
'' => '6187',
'4559' => 'PT1M30S',
'PT1M51S' => '',
'PT1M1S' => '',
'6186' => 'PT1M20S',
'4555' => 'PT2M10S',
'4562' => 'PT1M24S',
'PT1M5S' => '',
'PT0M23S' => '',
'PT0M35S' => '',
'4547' => 'PT0M43S',
'6592' => 'PT0M18S',
'PT1M30S' => '',
'4791' => 'PT0M18S'
};

Any pointers will be appreciated, thanks, Tony

 
Hi

Are you sure the data is correct? The only warning I get is
Perl said:
Odd number of elements in hash assignment at tonykent.pl line 10, <DATA> line 1.
Which is right, as there are two consecutive values without a key between them (PT2M27S is followed directly by PT1M0S):
Tony said:
4541 PT1M27S 4542 PT1M51S 4543 PT2M16S 4544 PT1M47S 4545 PT0M39S 4546 PT0M50S 4547 PT0M43S 4548 PT1M2S 4549 PT0M41S 4550 PT2M44S 4551 PT2M4S 4552 PT1M5S
4553 PT1M30S 4554 PT1M21S 4555 PT2M10S 4556 PT0M23S 4557 PT1M24S 4558 PT0M35S 4559 PT1M30S 4617 PT0M58S 4561 PT2M27S PT1M0S 4562 PT1M24S 4563 PT2M9S 456
4 PT3M8S 4565 PT0M32S 4566 PT0M50S 4567 PT1M30S 4568 PT0M57S 4569 PT1M7S 4570 PT0M55S 4571 PT1M1S 6592 PT0M18S 4790 PT0M20S 4791 PT0M18S 9254 PT1M45S 479
4 PT0M14S 4795 PT0M19S 4796 PT0M18S 6185 PT0M28S 6186 PT1M20S 6187 PT1M48S 6188 PT0M52S

After inserting a dummy key there (2019 in the data below), your code works fine:
Perl:
use strict;
use warnings;
use Dumpvalue;

my $lineofdata = <DATA>;

my %hash = split /[\s+]/, $lineofdata;

Dumpvalue->new->dumpValue(\%hash);

__DATA__
4541 PT1M27S 4542 PT1M51S 4543 PT2M16S 4544 PT1M47S 4545 PT0M39S 4546 PT0M50S 4547 PT0M43S 4548 PT1M2S 4549 PT0M41S 4550 PT2M44S 4551 PT2M4S 4552 PT1M5S 4553 PT1M30S 4554 PT1M21S 4555 PT2M10S 4556 PT0M23S 4557 PT1M24S 4558 PT0M35S 4559 PT1M30S 4617 PT0M58S 4561 PT2M27S 2019 PT1M0S 4562 PT1M24S 4563 PT2M9S 4564 PT3M8S 4565 PT0M32S 4566 PT0M50S 4567 PT1M30S 4568 PT0M57S 4569 PT1M7S 4570 PT0M55S 4571 PT1M1S 6592 PT0M18S 4790 PT0M20S 4791 PT0M18S 9254 PT1M45S 4794 PT0M14S 4795 PT0M19S 4796 PT0M18S 6185 PT0M28S 6186 PT1M20S 6187 PT1M48S 6188 PT0M52S
Code:
2019 => 'PT1M0S'
4541 => 'PT1M27S'
4542 => 'PT1M51S'
4543 => 'PT2M16S'
4544 => 'PT1M47S'
4545 => 'PT0M39S'
4546 => 'PT0M50S'
4547 => 'PT0M43S'
4548 => 'PT1M2S'
4549 => 'PT0M41S'
4550 => 'PT2M44S'
4551 => 'PT2M4S'
4552 => 'PT1M5S'
4553 => 'PT1M30S'
4554 => 'PT1M21S'
4555 => 'PT2M10S'
4556 => 'PT0M23S'
4557 => 'PT1M24S'
4558 => 'PT0M35S'
4559 => 'PT1M30S'
4561 => 'PT2M27S'
4562 => 'PT1M24S'
4563 => 'PT2M9S'
4564 => 'PT3M8S'
4565 => 'PT0M32S'
4566 => 'PT0M50S'
4567 => 'PT1M30S'
4568 => 'PT0M57S'
4569 => 'PT1M7S'
4570 => 'PT0M55S'
4571 => 'PT1M1S'
4617 => 'PT0M58S'
4790 => 'PT0M20S'
4791 => 'PT0M18S'
4794 => 'PT0M14S'
4795 => 'PT0M19S'
4796 => 'PT0M18S'
6185 => 'PT0M28S'
6186 => 'PT1M20S'
6187 => 'PT1M48S'
6188 => 'PT0M52S'
6592 => 'PT0M18S'
9254 => 'PT1M45S'
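
A side note on the pattern itself: /[\s+]/ is a character class, so it matches exactly one whitespace character (or a literal +), not a run of whitespace. With single spaces between the fields that makes no difference, but a doubled space or an embedded newline produces empty fields, which is one plausible source of the '' keys and values in the original dump. A minimal sketch of the difference, using a made-up sample string rather than the real data:

```perl
use strict;
use warnings;

# Made-up sample with a doubled space after the first pair.
my $sample = "4541 PT1M27S  4542 PT1M51S";

# Character class: every single whitespace character is a separator,
# so the doubled space yields an empty field in the middle.
my @by_class = split /[\s+]/, $sample;

# Quantified \s+: a whole run of whitespace counts as one separator.
my @by_run = split /\s+/, $sample;

# The special ' ' pattern splits on whitespace runs like /\s+/
# and additionally skips leading whitespace.
my @by_space = split ' ', "  $sample";

print scalar(@by_class), " fields with the character class\n";
print scalar(@by_run),   " fields with the quantified pattern\n";
print scalar(@by_space), " fields with the ' ' pattern\n";
```

If the real lines can contain runs of whitespace, split ' ' is probably the safest of the three, although it does not help with keys that genuinely have no value or more than one.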

Feherke.
feherke.github.io
 
Hi Feherke, good to see that you are still around! Thanks for pointing that out. I've got thousands of these lines, to the point that I've just about gone word blind and I had not spotted the missing keys (despite the warning). I will take a look at the original XML and see what is going on. Thanks again for taking the trouble to help out.
 
It transpires that a very few of the keys have 2 associated time values. Among the thousands of lines I had not noticed this.
 
Hi

Do I understand you correctly: that entry is actually correct and should be listed like this?
4561 => 'PT2M27S PT1M0S'

If so, then using this ugliness instead of split solves it:
Perl:
my %hash = $lineofdata =~ m/(\d+)\s+(\w*[a-zA-Z]+\w*(?:\s+\w*[a-zA-Z]+\w*)*)\b/g;
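
To see what that match does, here is a small sketch on a shortened sample taken from the data in this thread. It also shows one possible alternative that is not from the thread: walking the tokens and collecting the values per key in an array, under the assumption that keys are purely numeric and values never are. The names %joined and %durations are made up for the illustration.

```perl
use strict;
use warnings;

# Shortened sample from the thread's data; key 4561 has two values.
my $lineofdata = '4617 PT0M58S 4561 PT2M27S PT1M0S 4562 PT1M24S';

# Regex approach: a numeric key, then one or more tokens that each
# contain at least one letter. In list context /g returns all captures
# as a flat list, which is assigned to the hash as key/value pairs.
my %joined = $lineofdata =~ m/(\d+)\s+(\w*[a-zA-Z]+\w*(?:\s+\w*[a-zA-Z]+\w*)*)\b/g;

# Alternative sketch: treat an all-digit token as a new key and push
# every other token onto the array belonging to the current key.
my %durations;
my $key;
for my $token (split ' ', $lineofdata) {
    if ($token =~ /^\d+$/) {
        $key = $token;
        $durations{$key} //= [];
    }
    elsif (defined $key) {
        push @{ $durations{$key} }, $token;
    }
    else {
        warn "Value '$token' appears before any key\n";
    }
}

print "$_ => $joined{$_}\n" for sort keys %joined;
print "$_ => ", join(', ', @{ $durations{$_} }), "\n" for sort keys %durations;
```

The array version keeps multiple values separate instead of joining them into one string, which may be easier to work with later, at the cost of every value being an array reference.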

I should have mentioned this earlier, but if you get that massive concatenated text, then it looks like the XML parsing should be fixed instead.
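
As a rough illustration of that idea: asking XML::LibXML for the text content of a parent node returns the text of all its descendants run together, which is one plausible way such a long string arises. Iterating over the individual elements avoids the problem. The sketch below is only that, a sketch: the real XML was never shown in this thread, so the element names clips, clip and duration and the id attribute are invented for the example.

```perl
use strict;
use warnings;
use XML::LibXML;

# Hypothetical document: element and attribute names are invented
# for illustration and will differ in the real files.
my $xml = <<'XML';
<clips>
  <clip id="4541"><duration>PT1M27S</duration></clip>
  <clip id="4561"><duration>PT2M27S</duration><duration>PT1M0S</duration></clip>
</clips>
XML

my $doc = XML::LibXML->load_xml(string => $xml);

my %durations;
for my $clip ($doc->findnodes('//clip')) {
    my $id = $clip->getAttribute('id');
    # One array entry per <duration> child, so repeated values stay separate.
    $durations{$id} = [ map { $_->textContent } $clip->findnodes('./duration') ];
}

print "$_ => ", join(', ', @{ $durations{$_} }), "\n" for sort keys %durations;
```

With the pairs taken straight from the tree there is nothing to split afterwards, and a key with two values shows up as such instead of shifting every following pair.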


Feherke.
feherke.github.io
 
Thanks again feherke, that works superbly. I will look to see if I can get the data out in a better format.
 