Full-text indexing on PDF documents

VitoMark · Aug 19, 2005

Any idea whether I can use SQL Server 2000 full-text indexing on imported PDF documents?
The documents are imported with ADODB.Stream.

I suppose it's impossible...
Are there any workarounds?

mrdenny · Aug 19, 2005

You should be able to. I know that you can use full-text indexing on Word and Excel docs, so I would assume that you can also index PDFs. You'll probably need to install Acrobat Reader on the SQL Server for it to work. It all depends on whether SQL Server will pick up the Adobe software and use it to read the binary data.

Here is an article on doing it with Word docs.

http://www.databasejournal.com/features/mssql/article.php/3486331

Your best bet will probably be to try it out. Be sure to let us know how it goes.

Denny
MCSA (2003) / MCDBA (SQL 2000)

--Anything is possible. All it takes is a little research. (Me)

[noevil]

http://www.mrdenny.com

(Not quite so old any more.)

VitoMark · Aug 23, 2005

Hi,
I'm now able to index both Office documents and PDF documents.
This is what I did:
- Applied SP3 for SQL Server 2000
- Installed Adobe Acrobat Reader on the SQL Server
- Installed the Adobe PDF filter (see http://www.adobe.com/support/downloads/thankyou.jsp?ftpID=2611&fileID=2457); you have to put PDFFILT.DLL in, for instance, C:\WINNT
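Once the filter is installed, the table still has to be full-text enabled. The sketch below is not taken from the thread: it is a small Python helper that prints the SQL Server 2000 stored-procedure calls that would typically be involved. The table, index, column, and catalog names are hypothetical. It assumes the documents sit in an IMAGE column and that a second column holds each document's file extension, which the full-text service uses to pick the filter.

```python
def fulltext_setup_script(table, key_index, doc_column, type_column, catalog):
    """Build a T-SQL batch that full-text enables an IMAGE column.

    All object names are supplied by the caller; none come from the thread.
    """
    statements = [
        # Caution: on SQL Server 2000 'enable' drops any existing catalogs
        # in the current database, so run it only on a fresh setup.
        "EXEC sp_fulltext_database 'enable'",
        f"EXEC sp_fulltext_catalog '{catalog}', 'create'",
        # The table needs a unique, single-column, non-nullable index.
        f"EXEC sp_fulltext_table '{table}', 'create', '{catalog}', '{key_index}'",
        # type_column names the column holding the file extension (e.g. pdf).
        f"EXEC sp_fulltext_column '{table}', '{doc_column}', 'add', "
        f"@type_colname = '{type_column}'",
        f"EXEC sp_fulltext_table '{table}', 'activate'",
        # Kick off the full population of the catalog.
        f"EXEC sp_fulltext_catalog '{catalog}', 'start_full'",
    ]
    return "\nGO\n".join(statements) + "\nGO"


if __name__ == "__main__":
    # Hypothetical names, for illustration only.
    print(fulltext_setup_script(
        "dbo.Documents", "PK_Documents", "DocContent", "DocType", "DocCatalog"))
```

The printed script can be pasted into Query Analyzer after checking it against the schema actually in use.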

I did some extra performance tests by converting the PDF documents to TXT files (using FiltDump.EXE, available on MSDN; see http://msdn.microsoft.com/library/default.asp?url=/library/en-us/indexsrv/html/ixufilt_55yd.asp).

There is (as expected) a big difference in the time needed to build the full catalog between the TXT version (uploaded into a TEXT-datatype column) and the PDF version (uploaded into an IMAGE-datatype column); in my tests, based on a sample of 50,000 PDF docs, building the full catalog for the IMAGE table was 30 times slower.
Differences in SELECTs (CONTAINS, CONTAINSTABLE) were minimal.
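For anyone wanting to repeat the comparison, the two query shapes might look like the output of the sketch below. It is an illustration under assumptions, not the queries from the tests: the helper and all table and column names are hypothetical, and it only builds the T-SQL text for a simple phrase search.

```python
def contains_queries(table, key_column, doc_column, phrase):
    """Return (CONTAINS query, CONTAINSTABLE query) for a phrase search."""
    if '"' in phrase:
        raise ValueError("phrase must not contain double quotes")
    # Wrap the phrase in double quotes for full-text search and double any
    # single quotes so the T-SQL string literal stays valid.
    literal = "'\"" + phrase.replace("'", "''") + "\"'"
    contains = (
        f"SELECT {key_column} FROM {table} "
        f"WHERE CONTAINS({doc_column}, {literal})"
    )
    # CONTAINSTABLE returns a rowset with [KEY] and [RANK] columns that is
    # joined back to the base table on its full-text key.
    containstable = (
        f"SELECT t.{key_column}, ft.[RANK] FROM {table} AS t "
        f"INNER JOIN CONTAINSTABLE({table}, {doc_column}, {literal}) AS ft "
        f"ON t.{key_column} = ft.[KEY] ORDER BY ft.[RANK] DESC"
    )
    return contains, containstable


if __name__ == "__main__":
    for query in contains_queries("Documents", "DocId", "DocContent", "full text"):
        print(query)
```

Running the same pair against the TEXT table and the IMAGE table gives a like-for-like timing of the SELECT side.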

mrdenny · Aug 24, 2005

The slowdown in building the index is understandable. You are opening each of the PDFs with the Adobe filter and reading through them instead of just reading text from a column.

However it is good to know that it's working. Congrats.

