Hello Everyone,
What's the best way to set up an employee knowledge base portal? I know in the past there was a DBSearch(?) library that would index all the PDF/TXT files and let you search them without much fuss. I see GoFish; is that what is typically used today to index files and get quick search results from within FoxPro?
Warren
I don't have any good answers for you, other than that you'll need some sort of third-party indexing solution to get decent search performance across text and binary content. There are a number of options out there - .NET has Lucene, which can potentially be automated from FoxPro via wwDotnetBridge. I talked to somebody who'd done some work with Lucene for a project with good success. All these solutions require a bit of up-front work and active updates to keep the indexes current, so it's usually not just a 'drop it in' solution.
GoFish is a code-searching tool, so that doesn't apply.
Let us know what you find...
+++ Rick ---
I found this yesterday: http://www.foxweb.com/fwFullText/
So far I have a sample of text from 30 PDFs that I am searching, and it seems to be pretty much what I needed. It's 100% FoxPro code, and creating the keyword search index is lightning fast.
Interesting. Briefly looking at this, I can't imagine it works well with PDF files beyond just matching the file; it won't find content at a specific location inside it. This looks text based, not context based. IOW, you can find that a document contains the searched string, but not where in the document (at least not accurately).
+++ Rick ---
Sorry, I didn't really provide enough information. Here is what I am doing:
I run FILETOSTR() on each PDF (about 6,000 of them) and look for text that tells me whether the PDF is text searchable. I then CreateObject a WScript.Shell, launch a PDF viewer (sometimes Acrobat Reader, other times Foxit), and open the text-searchable PDF. I send it Ctrl-A and Ctrl-C to grab the text. I add/update a row in my table with the directory, filename, whether it's text searchable, the timestamp of the last file change, and a memo field that holds the grabbed text. I can then run the index builder code, search for keywords in the text I copied from the PDFs, and show users the relevant PDFs.
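To make the table-plus-keyword-index idea above concrete, here is a minimal sketch in Python rather than FoxPro. It is not the fwFullText code and not what Warren actually runs; the `PdfRecord` field names, the `tokenize`/`build_index`/`search` helpers, and the sample rows are all hypothetical, chosen only to mirror the columns described in the post.

```python
import re
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class PdfRecord:
    # Mirrors the table columns described in the post; names are hypothetical.
    directory: str
    filename: str
    searchable: bool
    modified: float  # timestamp of the last file change
    text: str        # stands in for the memo field with the copied text


def tokenize(text):
    """Lowercase the text and split it into alphanumeric keywords."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(records):
    """Map each keyword to the set of record positions that contain it."""
    index = defaultdict(set)
    for pos, rec in enumerate(records):
        for word in tokenize(rec.text):
            index[word].add(pos)
    return index


def search(index, records, query):
    """Return the records that contain every keyword in the query."""
    words = tokenize(query)
    if not words:
        return []
    hits = set.intersection(*(index.get(w, set()) for w in words))
    return [records[pos] for pos in sorted(hits)]


# Made-up sample rows, for illustration only.
records = [
    PdfRecord("hr", "vacation_policy.pdf", True, 0.0,
              "Vacation requests need manager approval."),
    PdfRecord("it", "vpn_setup.pdf", True, 0.0,
              "VPN setup requires manager approval and a token."),
]
index = build_index(records)
print([r.filename for r in search(index, records, "manager approval")])
print([r.filename for r in search(index, records, "vpn")])
```

Note that an index like this only says which documents contain the keywords, not where in the document they occur, which is the limitation Rick points out above.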
I have another job that scans my table for non-searchable PDFs and processes them through ABBYY FineReader Server to convert them from image-based PDFs to text-searchable PDFs.
So I am not trying to read raw PDFs (other than checking whether I can tell if there is text in there). Some of my PDFs are encoded and I am not sure how to decode them, hence I fire off a PDF viewer and do the Ctrl-A and Ctrl-C to grab the actual text after the viewer has decoded it.
I have perhaps overthought this process. There may be easier ways to achieve what I want, but I like keeping things as clean as possible. So it's 90% Visual FoxPro, 8% Web Connect, 2% PDF tools.
Well... I gave up on auto-detecting whether the PDF is searchable... I just fire off the PDF viewer and do Select All / Copy, and if _CLIPTEXT comes back empty or shorter than 100(?) characters, I know I have to send it off to ABBYY to OCR it.
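The length check described here boils down to something like the following Python sketch. The `needs_ocr` name and the `MIN_TEXT_CHARS` constant are hypothetical, the 100-character cutoff is just the tentative guess from the post, and stripping whitespace before measuring is an added assumption rather than something the post specifies.

```python
# Hypothetical cutoff, taken from the tentative "100(?)" figure in the post.
MIN_TEXT_CHARS = 100


def needs_ocr(copied_text, min_chars=MIN_TEXT_CHARS):
    """Return True when the copied text is too short to be a real text layer."""
    # Assumption: ignore surrounding whitespace so blank pages count as empty.
    return len(copied_text.strip()) < min_chars


print(needs_ocr(""))          # nothing was copied from the viewer
print(needs_ocr("Page 1"))    # only a stray fragment was copied
print(needs_ocr("x" * 500))   # enough text to treat the PDF as searchable
```

The cutoff would likely need tuning against real documents, since a short but genuine one-page PDF could fall under it.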