FlashMerlot
Programmer
Need advice on the "best-approach" to accomplish this ...
I'm using TEXT var types in a MS SQL 8.0 database. My TEXT fields can contain (literally) up to 2 gb each. I need to find every instance of "Hello World" appearing anyplace in the 2gb text field.
Finding which particular records, happen to contain a TEXT field which happens contains "Hello World" SOMEPLACE ... is actually rather easy. Full-Text search works great to find the record.
HOWEVER ... now let's assume a record is found ... How to I locate EACH INDIVIDUAL INSTANCES of the words "Hello World" burried SOMEPLACE in a 2 gigabyte text bucket? Why? Because I want to return 500charsBEFORE+"Hello World"+AFTERchars500. e.g. not just the words "Hello World" but the CONTEXT-IN-WHICH-THE-WORDS-APPEAR for each instance.
Obviously, using a solution based in the Client is not feasible, since such solution would require over-the-wire-transmissions of theoretically dozens-and-dozens of gigabytes from the server back to the client for additional sub-processing. A solution that works at-the-server is the only reasonable approach? Right?
So ... using TSQL, how do I "walk" a 2 gb text field? Extracting 500+HelloWorld+500 ....
Remember, PATINDEX does not offer a "Start-Looking-Here-Offset" location.
Any advice greatly appreciated!
I'm using TEXT var types in a MS SQL 8.0 database. My TEXT fields can contain (literally) up to 2 gb each. I need to find every instance of "Hello World" appearing anyplace in the 2gb text field.
Finding which particular records, happen to contain a TEXT field which happens contains "Hello World" SOMEPLACE ... is actually rather easy. Full-Text search works great to find the record.
HOWEVER ... now let's assume a record is found ... How to I locate EACH INDIVIDUAL INSTANCES of the words "Hello World" burried SOMEPLACE in a 2 gigabyte text bucket? Why? Because I want to return 500charsBEFORE+"Hello World"+AFTERchars500. e.g. not just the words "Hello World" but the CONTEXT-IN-WHICH-THE-WORDS-APPEAR for each instance.
Obviously, using a solution based in the Client is not feasible, since such solution would require over-the-wire-transmissions of theoretically dozens-and-dozens of gigabytes from the server back to the client for additional sub-processing. A solution that works at-the-server is the only reasonable approach? Right?
So ... using TSQL, how do I "walk" a 2 gb text field? Extracting 500+HelloWorld+500 ....
Remember, PATINDEX does not offer a "Start-Looking-Here-Offset" location.
Any advice greatly appreciated!