Wednesday, August 22, 2007

Full Text Indexing in SQL SERVER 2005 using for PDF files

Objective
Uploading PDF files to SQL Server 2005 databse and Index them using full text index.

Analysis:
Since PDF files can be very big .... the best option is to save the PDF contents in a database table column of TYPE VARBINARY(MAX)

So create a table docs that having a
docs
DocID(int)
DocContents(varbinary(max))
DocType(varchar(10))

Now the third column DocTYpe is needed to tell full text index engine what kind of file we are going to search through.

[Basically in SQL SERVER 2005, full text search engine stores the in-built seaching indexs for known types. What are the in-built file types that can be full text indexed can be fount using a view named "sys.fulltext_document_types"

So Execute "select * from sys.fulltext_document_types" in your database to see what are the in-built available type for FTS. Normally you would not find an entry ".pdf" in the result set if the above mentioned query.
]

You need to install ADOBE PDF IFILTER 6.0 that is compatible to ADOBE PDF 7.0 version.
Then you need to load the features of the ADOBE IFILTER6.0
FTS your table.
Execute the FTS query.

So
STEP 1:
1. Execute the query "select * from sys.fulltext_document_types" to check whether ".pdf" exists in "document_type" column in the resultset.
2. If ".pdf" does not exists in "document_type" column in the resultset then
3. Download the ADOBE IFitler 6.0 and installt it. It helps SQL SERVER 2007 for indexing PDF files.
4. Now we need to load the installed ADOBE IFILTER 6.0 in our SQL SERVER 2005 so that we can FTS PDF files. For that execute the following SPs sequencially.

exec sp_fulltext_service 'verify_signature', 0

exec sp_fulltext_service 'load_os_resources',1

5. Now please execute select * from sys.fulltext_document_types and make sure that an entry ".pdf " is there in "document_type" column in the resultset