Challenges Converting PDF Files
If you are getting poor parsing results from a PDF, the cause is almost certainly that the PDF looks great when displayed but is corrupt internally.
There is a simple way to find out whether a PDF is corrupt. Open the file in the free Adobe Acrobat Reader, choose File -> Save as Text, and save the file. Then open the saved text file in a text editor such as Notepad or UltraEdit. If the PDF is corrupt, the damage will almost always jump out at you.
Here is an example of a PDF that looks great but is corrupt internally:
The text saved from it, shown below, is a mess, with one word per line and some characters replaced by numbers:
Summary . 6+ years experience in fast-paced agile environments managing mul9ple projects . Strong communica9on skills with ability to ar9culate, [OMITTING THE REMAINDER]
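The manual check can be partly automated once the text has been saved out. The Python sketch below is a rough heuristic, not a Sovren tool: it assumes the text was already extracted (for example with Acrobat Reader's Save as Text) and measures the two symptoms described above, namely lines holding a single word and digits appearing inside words (in the sample, the letters "ti" came out as "9"). The names `corruption_signals` and `looks_corrupt`, the 0.5 line ratio, and the "more than one suspicious word" cut-off are hypothetical choices for illustration only.

```python
import re

# Hypothetical heuristic: a single digit sandwiched between lowercase letters,
# e.g. "mul9ple" in the sample, where the letters "ti" came out as "9".
# Lowercase only, so names such as "B2B" are not flagged.
DIGIT_IN_WORD = re.compile(r"[a-z]\d[a-z]")
PUNCTUATION = ".,;:!?()\"'"


def corruption_signals(text: str) -> dict:
    """Measure the two symptoms of a corrupt PDF in its extracted text."""
    lines = [line.strip() for line in text.splitlines() if line.strip()]
    single_word_lines = sum(1 for line in lines if len(line.split()) == 1)
    # Strip surrounding punctuation so "ar9culate," is reported as "ar9culate".
    words = [w.strip(PUNCTUATION) for w in text.split()]
    suspicious = [w for w in words if DIGIT_IN_WORD.search(w)]
    return {
        # Fraction of non-blank lines that hold exactly one word.
        "single_word_line_ratio": single_word_lines / len(lines) if lines else 0.0,
        # Words in which a digit appears to have replaced letters.
        "digit_in_word": suspicious,
    }


def looks_corrupt(
    text: str, max_single_word_ratio: float = 0.5, max_digit_words: int = 1
) -> bool:
    """Return True if either symptom exceeds its (assumed, arbitrary) threshold."""
    signals = corruption_signals(text)
    return (
        signals["single_word_line_ratio"] > max_single_word_ratio
        or len(signals["digit_in_word"]) > max_digit_words
    )
```

To check a saved file, read it with `open(path, encoding="utf-8", errors="replace").read()` and pass the string to `looks_corrupt`; the encoding of the saved text file is an assumption here, which is why `errors="replace"` is used. A `True` result only suggests that the PDF's internal text is damaged; it does not repair anything.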
This problem is NOT fixable. It is not caused by Sovren software; it is caused by a PDF that looks great but is corrupt internally. To read more about why and how PDFs can be corrupt, see the following articles: