Full-text indexing on PDF documents

Status
Not open for further replies.

VitoMark

Programmer
Sep 25, 2001
20
BE
Any idea whether I can use SQL Server 2000 full-text indexing on imported PDF documents?
The documents are imported with ADODB.Stream.

I suppose it's impossible...
Are there any workarounds?
 
You should be able to. I know that you can use full-text indexing on Word and Excel docs, so I would assume that you can also index PDFs. You'll probably need to install Acrobat Reader on the SQL Server for it to work. It all depends on whether SQL Server will pick up the Adobe software and use it to read the binary data.

Here is an article on doing it with Word docs.
Your best bet will probably be to try it out. Be sure to let us know how it goes.

Denny
MCSA (2003) / MCDBA (SQL 2000)

--Anything is possible. All it takes is a little research. (Me)

[noevil]
(Not quite so old any more.)
 
Hi,
I'm now able to index both Office documents and PDF documents.
This is what I did:
- Applied SP3 for SQL Server 2000
- Installed Adobe Acrobat Reader on the SQL Server
- Installed the Adobe filter (cf. you have to put PDFFILT.DLL in, for instance, C:\WINNT
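
For anyone following along, here is a rough sketch of how the full-text index itself might be set up once the filter is in place. The table, column, index and catalog names (Documents, DocID, DocType, DocBody, PK_Documents, DocCatalog) are made up for illustration and are not from my actual database. The key point is that an IMAGE column needs a companion "type" column holding the file extension so the right filter is chosen.

```sql
-- Hypothetical table; all names here are illustrative assumptions.
CREATE TABLE Documents (
    DocID   int        NOT NULL CONSTRAINT PK_Documents PRIMARY KEY,
    DocType varchar(4) NOT NULL,  -- file extension of the stored document, e.g. '.pdf' or '.doc'
    DocBody image      NULL       -- the binary document loaded via ADODB.Stream
)
GO

-- Enable full-text indexing in the current database.
-- Careful: run this only once; re-running it drops and re-creates existing catalogs.
EXEC sp_fulltext_database 'enable'

-- Create a catalog and register the table, using its unique key index.
EXEC sp_fulltext_catalog 'DocCatalog', 'create'
EXEC sp_fulltext_table 'Documents', 'create', 'DocCatalog', 'PK_Documents'

-- Add the IMAGE column, naming DocType as the column that identifies the document type.
EXEC sp_fulltext_column @tabname = 'Documents', @colname = 'DocBody',
                        @action = 'add', @type_colname = 'DocType'

-- Activate the index and start a full population of the catalog.
EXEC sp_fulltext_table 'Documents', 'activate'
EXEC sp_fulltext_catalog 'DocCatalog', 'start_full'
```

The full population runs in the background, so queries will only return complete results once it has finished.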

I did some extra performance tests by converting the PDF documents to TXT files (using FiltDump.EXE, available on MSDK (cfr.
There is (as expected) a big difference in the time needed to build the full catalog between the TXT version (uploaded into a TEXT column) and the PDF version (uploaded into an IMAGE column). In my tests, based on a sample of 50000 PDF docs, building the full catalog for the IMAGE table was 30 times slower.
Differences in Selects (CONTAINS, CONTAINSTABLE) were minimal.
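
As an illustration, the kind of selects I mean look roughly like the sketch below. The table and column names (Documents, DocID, DocBody) and the search terms are hypothetical and only there to show the shape of the queries. The same form works whether the indexed column is TEXT or IMAGE.

```sql
-- Simple predicate search: rows whose document contains the word.
SELECT DocID
FROM Documents
WHERE CONTAINS(DocBody, '"indexing"')

-- Ranked search: CONTAINSTABLE returns [KEY] and [RANK], joined back on the key column.
SELECT d.DocID, k.[RANK]
FROM Documents AS d
JOIN CONTAINSTABLE(Documents, DocBody, '"full-text" AND "PDF"') AS k
    ON d.DocID = k.[KEY]
ORDER BY k.[RANK] DESC
```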
 
The slowdown in building the index is understandable. You are opening each of the PDFs through the Adobe filter and reading through them instead of just reading text in a field.

However, it is good to know that it's working. Congrats.

Denny
 