Regex of more than 1 character not recognised

ZaccT · Sep 7, 2022

Hello,
I am trying to filter a text file of browsing history generated using ChromeHistoryView. Here is a sample of the format:

==================================================
URL : https://mountainlaureldesigns.com/

Title : Ultra Light Tents, Tarps, Bivys, Packs & Gear | Mountain Laurel Designs | Super Ultra Light Equipment for Outdoor & Wilderness
Visited On : 9/6/2022 1:29:39 AM
Visit Count : 1
Typed Count : 0
Referrer : https://www.google.com/search?q=mou...69i57j69i65.4419j0j7&sourceid=chrome&ie=UTF-8

Visit Duration :
Visit ID : 62
Profile : Default
URL Length : 34
Transition Type : Link
Transition Qualifiers: Chain Start,Chain End
History File : C:\Users\subla\AppData\Local\Google\Chrome\User Data\Default\History
==================================================

For some reason, no regex longer than one character matches anything in this file. For example, /U/ matches without a problem, but /URL/ matches nothing at all. This seems to be specific to this file: when I cut and paste a few entries into another file, regexes behave as expected. Is there something about the text format here that would cause this? It's very baffling. Thanks.

UPDATE: OK, so this appears to be related to the text encoding. Files encoded as UTF-16 cause the problem, while files encoded as UTF-8 behave as expected. Does anyone know why this is? Cheers.
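
A likely explanation, sketched in Python rather than awk: in UTF-16 every ASCII character occupies two bytes, the second of which is a NUL byte, so the bytes for "URL" are never adjacent in the file. A tool that compares byte by byte can still find a single character such as U, but not a run of them. The sample line, the byte order (UTF-16LE with a byte order mark), and the helper name byte_match are assumptions made for illustration; how a particular awk build treats NUL bytes may differ.

```python
import re

# Hypothetical sample line modelled on the export shown above.
line = "URL : https://mountainlaureldesigns.com/\n"

utf8_bytes = line.encode("utf-8")
# Assumption: the export is UTF-16LE preceded by a byte order mark (FF FE).
utf16_bytes = b"\xff\xfe" + line.encode("utf-16-le")


def byte_match(pattern: bytes, data: bytes) -> bool:
    """Return True if the byte-level regex matches anywhere in data."""
    return re.search(pattern, data) is not None


if __name__ == "__main__":
    # Show the first few raw bytes: each letter is followed by a NUL byte.
    print(utf16_bytes[:8])

    # A single character is still found in the UTF-16 bytes.
    print(byte_match(b"U", utf16_bytes))
    # Three adjacent letters are not, because NUL bytes sit between them.
    print(byte_match(b"URL", utf16_bytes))
    # The same pattern works on the UTF-8 bytes.
    print(byte_match(b"URL", utf8_bytes))

    # Decoding to text first makes the ordinary pattern work again.
    decoded = utf16_bytes.decode("utf-16")
    print(re.search("URL", decoded) is not None)
```

If this is what is happening, converting the file to UTF-8 before filtering (for example with iconv -f UTF-16 -t UTF-8) should let the usual patterns match.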

mikrom · Sep 7, 2022

On what OS are you running awk (Linux, Windows, Mac)?

mikrom · Sep 7, 2022

By the way, could you post the relevant part of the awk script you have written so far?
