I need to convert a PDF form on a regular basis into something I can consume the data from. I have used iSED Quick PDF in the past, but now when I visit their URL I get redirected to Foxit. I have subscribed to Foxit Software for reading and editing PDFs for many years but never considered using their SDK.
Any thoughts about a good PDF to XLS (or, better, PDF to JSON) converter? I'm working on a limited budget, and Foxit at $3000/year seems like a lot for a single developer license, especially after the application is finished.
Hi Harvey,
I have used ABBYY FineReader for a few years. The version I have (not the latest) does not have an SDK, but it does have a command line that can be invoked programmatically.
I use it to compare PDFs and to convert PDFs to Excel. It is not full automation (the UI is displayed), but it does what I need.
May be worth a look.
Carl
Harvey,
Would Python be an option? This works for pulling the text out of a PDF.
# Install the library first: pip install PyPDF2
# (in a Jupyter notebook: !pip install PyPDF2)
import PyPDF2

pdf_file_path = 'sample.pdf'
text_file_path = 'output.txt'

# Initialize an empty string to store the text
pdf_text = ''

# Open the PDF file; the with-block closes it automatically
with open(pdf_file_path, 'rb') as pdf_file:
    # Create a PDF reader object
    pdf_reader = PyPDF2.PdfReader(pdf_file)
    # Loop through each page and extract text
    for page in pdf_reader.pages:
        pdf_text += page.extract_text()

# Open a text file for writing
with open(text_file_path, 'w', encoding='utf-8') as text_file:
    # Write the extracted text to the text file
    text_file.write(pdf_text)

print(f"PDF content has been successfully exported to '{text_file_path}'.")
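Since the goal is form data as JSON rather than raw page text, here is a rough sketch of how the same library might be used for that. It assumes the PDF is a fillable AcroForm (not a scanned image) and that the installed PyPDF2 version provides PdfReader.get_fields(). The function names read_form_fields and fields_to_json are made up for illustration, and turning every value into a string is a simplification.

```python
import json


def read_form_fields(pdf_path):
    """Return {field_name: value} for a fillable PDF form.

    Assumes an AcroForm PDF; PyPDF2 is imported here so the JSON
    helper below can be used without it.
    """
    from PyPDF2 import PdfReader

    reader = PdfReader(pdf_path)
    # get_fields() returns None when the document has no form fields
    fields = reader.get_fields() or {}
    # "/V" is the PDF dictionary key that holds a field's current value
    return {name: field.get("/V") for name, field in fields.items()}


def fields_to_json(fields):
    """Serialize a {field_name: value} mapping to a JSON string."""
    # Simplification: every non-empty value becomes a string, empty stays null
    cleaned = {
        name: (None if value is None else str(value))
        for name, value in fields.items()
    }
    return json.dumps(cleaned, indent=2, sort_keys=True)


# Example use (hypothetical file name):
#     print(fields_to_json(read_form_fields("sample.pdf")))
```

Checkbox and radio values usually come back as PDF names such as /Yes or /Off, so some cleanup after this step is likely needed for a real form.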