
Regex of more than 1 character not recognised

Status
Not open for further replies.

ZaccT

Programmer
Sep 7, 2022
1
US
Hello,
I am trying to filter a text file of browsing history generated with ChromeHistoryView. A sample of the format:

==================================================
URL :
Title : Ultra Light Tents, Tarps, Bivys, Packs & Gear | Mountain Laurel Designs | Super Ultra Light Equipment for Outdoor & Wilderness
Visited On : 9/6/2022 1:29:39 AM
Visit Count : 1
Typed Count : 0
Referrer :
Visit Duration :
Visit ID : 62
Profile : Default
URL Length : 34
Transition Type : Link
Transition Qualifiers: Chain Start,Chain End
History File : C:\Users\subla\AppData\Local\Google\Chrome\User Data\Default\History
==================================================

For some reason, no regex longer than one character matches anything in this file. For example, /U/ matches without a problem, but /URL/ matches nothing at all. This seems to be specific to this file: when I cut and paste a few entries into another file, regexes behave as expected. Is there something about the text format here that would cause this? It's very baffling. Thanks.

UPDATE: OK, so this appears to have something to do with the text encoding. Files saved as UTF-16 cause the problem, while files saved as UTF-8 behave as expected. Does anyone know why this is? Cheers.
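
A likely explanation, offered as a sketch rather than a confirmed diagnosis: in UTF-16, every ASCII character occupies two bytes, one of which is 0x00. A tool that reads the file byte by byte therefore sees U, NUL, R, NUL, L, NUL instead of the three adjacent bytes "URL", so a one-character pattern still matches but a longer one does not. The Python snippet below illustrates this at the byte level. The sample line is hypothetical (built from a field name in the post), and little-endian UTF-16 is an assumption about how the file was saved.

```python
import re

# Hypothetical sample line, built from a field name shown in the post.
line = "URL Length : 34\n"

utf8_bytes = line.encode("utf-8")
# Assumption: the file is little-endian UTF-16 (BOM omitted here for clarity).
utf16_bytes = line.encode("utf-16-le")

# In UTF-16, each ASCII character is paired with a 0x00 byte,
# so the bytes for "URL" are no longer adjacent.
print(utf8_bytes[:3])
print(utf16_bytes[:6])

# Byte-oriented searches: roughly how a tool unaware of UTF-16 sees the file.
single_utf16 = re.search(rb"U", utf16_bytes) is not None
multi_utf16 = re.search(rb"URL", utf16_bytes) is not None
multi_utf8 = re.search(rb"URL", utf8_bytes) is not None

# Decoding to text first restores the expected behaviour.
decoded = utf16_bytes.decode("utf-16-le")
multi_decoded = re.search(r"URL", decoded) is not None

print(single_utf16, multi_utf16, multi_utf8, multi_decoded)
```

If that is what is happening here, converting the file to UTF-8 before filtering (for example with `iconv -f UTF-16 -t UTF-8`, or by choosing a UTF-8 option when saving, if the exporting tool offers one) should let multi-character patterns match again.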
 
Which OS are you running awk on (Linux, Windows, Mac)?
 
By the way, you could post the relevant part of the awk script you have written so far.
 