Hi all
I'm after a bit of help again, please.
Every day I have to go through a 300-400 MB XML file correcting errors. Some of these can be fixed with a standard find/replace; however, there is one set of issues where I guess a regex is better placed to help, because I need to identify and replace '/' characters, and of course '/' also appears in every closing tag...
The problem is I'm very new to regex and need some pointers/advice, please.
Being XML, the file contains start/end tags that also contain the '/' character I need to remove, but some of the fields will only accept A-Z0-9 as valid input,
e.g. <Ref>1234/567890</Ref>. The other complication is that the numbers are random...
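Since the field should only ever contain A-Z0-9, one way to see the whole problem up front is to list every <Ref> value that breaks that rule, not just the four-digits-then-slash case. This is a minimal Python sketch (Python swapped in for illustration; the post itself has no code), assuming <Ref> never has attributes or nested tags. The names `REF_RE`, `VALID_RE` and `invalid_refs` are hypothetical.

```python
import re

# Captures whatever sits between an opening <Ref> and the next </Ref>.
# Assumption: <Ref> has no attributes and its content has no nested tags.
REF_RE = re.compile(r"<Ref>(.*?)</Ref>")

# The field only accepts upper-case letters and digits (an empty value fails too).
VALID_RE = re.compile(r"[A-Z0-9]+")

def invalid_refs(xml_text):
    """Return every <Ref> value that contains something other than A-Z0-9."""
    return [value for value in REF_RE.findall(xml_text)
            if not VALID_RE.fullmatch(value)]

sample = "<Ref>1234/567890</Ref><Ref>ABC123</Ref><Ref>12/34/56</Ref>"
print(invalid_refs(sample))  # ['1234/567890', '12/34/56']
```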
I can already identify the tags that contain this error with a find for "<Ref>[0-9]{4}/", and I then correct the data manually.
That's fine if there are 3 or 4, but with 100+ it's a bit of a b****r...
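Before changing anything, the same search pattern from the post can be used to report how many lines need fixing and where they are. A minimal Python sketch, assuming each <Ref> element sits on a single line; `BAD_REF` and `find_bad_refs` are hypothetical names. Because it takes any iterable of lines, an open file object can be passed straight in, so a 300-400 MB file is read one line at a time rather than loaded whole.

```python
import re

# The search pattern from the post: <Ref>, exactly four digits, then a slash.
BAD_REF = re.compile(r"<Ref>[0-9]{4}/")

def find_bad_refs(lines):
    """Yield (line_number, line) for each line containing a bad <Ref>."""
    for number, line in enumerate(lines, start=1):
        if BAD_REF.search(line):
            yield number, line.rstrip("\n")

sample = [
    "<Record>\n",
    "  <Ref>1234/567890</Ref>\n",
    "  <Ref>1234567890</Ref>\n",
    "</Record>\n",
]
for number, line in find_bad_refs(sample):
    print(number, line)
```

With a real file this would be something like `with open("data.xml", encoding="utf-8") as f:` followed by `len(list(find_bad_refs(f)))`, where the file name and encoding are assumptions.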
Knowing that it's always inside the <Ref> tags, what I need to be able to do is remove the /'s between the tags, leaving just the numbers,
e.g. <Ref>1234567890</Ref>
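One way to do the removal is to match the whole <Ref>...</Ref> element and strip '/' only from the captured content, so closing tags elsewhere are never touched. This is a sketch in Python rather than a definitive fix, assuming <Ref> has no attributes, never spans more than one line, and the file is UTF-8; `strip_ref_slashes` and `fix_file` are hypothetical names. `newline=""` keeps the original line endings unchanged.

```python
import re

# Matches a whole <Ref>...</Ref> element and captures tag, content, tag.
# Assumption: <Ref> has no attributes and never spans more than one line.
REF_RE = re.compile(r"(<Ref>)(.*?)(</Ref>)")

def strip_ref_slashes(text):
    """Remove every '/' between <Ref> and </Ref>, leaving the tags alone."""
    return REF_RE.sub(
        lambda m: m.group(1) + m.group(2).replace("/", "") + m.group(3),
        text,
    )

def fix_file(src_path, dst_path):
    """Copy src_path to dst_path one line at a time, fixing <Ref> values.

    Assumes UTF-8; newline="" preserves the original line endings.
    """
    with open(src_path, encoding="utf-8", newline="") as src, \
         open(dst_path, "w", encoding="utf-8", newline="") as dst:
        for line in src:
            dst.write(strip_ref_slashes(line))

print(strip_ref_slashes("<Ref>1234/567890</Ref>"))  # <Ref>1234567890</Ref>
```

Writing to a separate output file rather than editing in place means the original is still there to compare against if anything looks wrong.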
Can anyone help or provide any pointers on how, or indeed whether, this can be done?
Many thanks (apologies if this is in the wrong forum, but this Perl forum has provided a good deal of regex help...)
PaulSc