What is a resume/CV parser?
A resume/CV parser is used within human resource software and on recruitment websites, job boards and portals to simplify and accelerate the application process. It does so by extracting and classifying thousands of attributes about the candidate and providing a foundation for the semantic searching of candidate data. The parser identifies hundreds of different kinds of information within a resume or CV and clearly tags each data point (for example: first name, last name, street address, city, educational degrees, employers, skills, etc.). The results can be output in HR-XML or JSON format. Sovren has been building this technology for over two decades.
Aren’t all parsers the same?
No, Sovren is the fastest, most accurate, and most configurable parser available anywhere.
How long does it take to parse a resume?
Median transaction time on our SaaS service is about 500 milliseconds, plus your network latency. By default, our SaaS API allows you to batch process up to 10 concurrent requests; with each request taking about 0.5 seconds, that gives a throughput of approximately 20 resume parses per second. THERE IS NO THROTTLING OR LIMIT ON AD-HOC (NON-BATCH, i.e., UNCONTROLLABLE) CONCURRENT REQUESTS.
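The throughput figure is simple arithmetic: with 10 requests always in flight and each taking about 0.5 seconds, roughly 10 / 0.5 = 20 parses complete per second (before network latency). The following is a minimal Python sketch of a batch client that stays within the 10-request limit by using a bounded thread pool. `parse_resume` is a hypothetical placeholder for your own API call, not part of any Sovren SDK.

```python
import time
from concurrent.futures import ThreadPoolExecutor

MAX_CONCURRENT = 10       # default batch concurrency described above
MEDIAN_LATENCY_S = 0.5    # ~500 ms median transaction time (network excluded)


def expected_throughput(concurrency: int, latency_s: float) -> float:
    """Parses per second when `concurrency` requests are always in flight."""
    return concurrency / latency_s


def parse_resume(path: str) -> str:
    """Hypothetical stand-in for one call to the parsing API."""
    time.sleep(0.05)  # simulate a round trip
    return f"parsed:{path}"


def parse_batch(paths: list[str]) -> list[str]:
    # The pool never runs more than MAX_CONCURRENT calls at once,
    # so a batch job stays within the concurrency limit.
    # pool.map returns results in the same order as `paths`.
    with ThreadPoolExecutor(max_workers=MAX_CONCURRENT) as pool:
        return list(pool.map(parse_resume, paths))


if __name__ == "__main__":
    print(expected_throughput(MAX_CONCURRENT, MEDIAN_LATENCY_S))  # 20.0
    print(parse_batch(["a.pdf", "b.docx"]))
```

Capping the pool size, rather than sleeping between calls, keeps the pipeline full while never exceeding the limit.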
What resume formats can the Sovren resume parsing SaaS process?
Essentially any non-image resume and CV format, including the formats used by all popular job boards and by social and professional networks.
Does the Sovren resume parsing SaaS store my resumes?
No, the Sovren resume parsing SaaS does not store any resumes. All parsing is done in-memory so there is never any data written to a file system or database.
Can I add my own skills to the parser?
Yes, you can use the built-in skills list, or customize a skills list with IDs that correspond directly to your system (see our Customizing Skills documentation for more information).
Can I normalize data points?
Yes, you can normalize company names, position titles, school names and degree types (see our Customizing Normalization documentation for more information).
What can the Sovren parser output?
The parser outputs either JSON or XML following the HROpenStandards.org Resume schema. In addition, the original resume can be converted into other formats such as plain text, HTML, RTF and PDF (see API documentation for more information).
Do I have to tell the parser what language a resume or CV is in?
No, the parser detects the language automatically, so you never have to tell it what language a CV is written in.
Does the Sovren parser add any intelligence to the parsed resume?
Yes, the parser constructs a summary of who the candidate is today, a management summary, the average time spent at each employer, and more.
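To make "average time at each employer" concrete, here is one plausible way to compute it from a work history, written as a Python sketch. The `(employer, start, end)` tuple shape and the rule of summing several positions at the same employer before averaging are assumptions for illustration; Sovren's own calculation may differ.

```python
from datetime import date


def months_between(start: date, end: date) -> int:
    """Whole calendar months from start to end (day of month ignored)."""
    return (end.year - start.year) * 12 + (end.month - start.month)


def average_months_per_employer(positions: list[tuple[str, date, date]]) -> float:
    """Average total tenure per distinct employer, in months.

    Assumption: positions held at the same employer are summed first,
    so a promotion does not count as a new employer.
    """
    totals: dict[str, int] = {}
    for employer, start, end in positions:
        totals[employer] = totals.get(employer, 0) + months_between(start, end)
    if not totals:
        return 0.0
    return sum(totals.values()) / len(totals)


history = [
    ("Acme", date(2015, 1, 1), date(2018, 1, 1)),    # 36 months
    ("Acme", date(2018, 1, 1), date(2019, 1, 1)),    # 12 months, same employer
    ("Globex", date(2019, 1, 1), date(2021, 1, 1)),  # 24 months
]
print(average_months_per_employer(history))  # 36.0
```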
How do I know how many credits I have available in my SaaS account?
You can check your remaining SaaS credit balance by looking at the CreditsRemaining field after each API transaction.
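If you log your balance programmatically, a small helper can pull CreditsRemaining out of each JSON response. This Python sketch searches the whole response rather than hard-coding a path, because the exact nesting shown in `response_text` is a hypothetical example, not a documented response envelope.

```python
import json
from typing import Any, Optional


def find_field(node: Any, name: str) -> Optional[Any]:
    """Depth-first search for the first occurrence of key `name` in parsed JSON."""
    if isinstance(node, dict):
        if name in node:
            return node[name]
        children = list(node.values())
    elif isinstance(node, list):
        children = node
    else:
        return None  # scalars cannot contain the key
    for child in children:
        found = find_field(child, name)
        if found is not None:
            return found
    return None


# Hypothetical response body; check the API documentation for the real layout.
response_text = (
    '{"Info": {"Code": "Success", "CustomerDetails": {"CreditsRemaining": 1234.5}},'
    ' "Value": {}}'
)
credits = find_field(json.loads(response_text), "CreditsRemaining")
print(credits)  # 1234.5
```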
Getting Started FAQs
I just received my SaaS account credentials. How do I get started?
How do I accurately test resume parsing?
To accurately test resume parsing, please follow these rules (read Best Practices to Test Resume Parsing Software for a more in-depth look at each of these rules):
- IMPORTANT: Do not use resumes with fake data such as Company1; Anytown, USA; 713-555-5555. The parser deliberately ignores all such fake data.
- Never accept vendor-supplied resumes.
- Test between 30 and 50 resumes per language or locale.
- Don't test resumes from just one source, industry, or job type.
- Evaluate results by hand, comparing to the actual resume.
- Don't just test for accuracy, test for completeness.
- Test scalability and robustness.
- Test with multiple configurations.
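Parts of the first and third rules can be checked before a test run. The following is a minimal Python sketch, assuming each test resume is available as plain text tagged with its locale. It flags resumes that contain placeholder data like the examples above and reports locales whose count of usable resumes falls outside 30 to 50. The regular expressions are illustrative assumptions, not the parser's own detection logic, and flagged files still deserve a manual look.

```python
import re
from collections import Counter

# Illustrative patterns modeled on the fake-data examples above.
PLACEHOLDER_PATTERNS = [
    re.compile(r"\bcompany\s?\d+\b", re.IGNORECASE),  # Company1, Company 2
    re.compile(r"\banytown\b", re.IGNORECASE),         # Anytown, USA
    re.compile(r"\b\d{3}[-. ]555[-. ]\d{4}\b"),        # 713-555-5555
]


def has_placeholder_data(text: str) -> bool:
    return any(p.search(text) for p in PLACEHOLDER_PATTERNS)


def screen_test_set(resumes: list[tuple[str, str]], low: int = 30, high: int = 50):
    """resumes: list of (locale, text) pairs.

    Returns the indexes of resumes containing placeholder data, and a dict of
    locales whose count of remaining resumes is outside [low, high].
    """
    rejected = [i for i, (_, text) in enumerate(resumes) if has_placeholder_data(text)]
    rejected_set = set(rejected)
    counts = Counter(loc for i, (loc, _) in enumerate(resumes) if i not in rejected_set)
    out_of_range = {loc: n for loc, n in counts.items() if not low <= n <= high}
    return rejected, out_of_range
```

For example, a set containing one resume with "Company1", 30 clean en-US resumes and 5 fr-FR resumes yields `[0]` as rejected and `{"fr-FR": 5}` as under-sampled.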
I am getting unexpected results. What should I check for?
See "How do I accurately test resume parsing?" above and make sure you are following each of its rules.
Look at the ConvertedText: Resumes that look very clear in the original file may still yield unexpected results. In many cases, this is due to the conversion from the original file format to plain text. The most common problem sources are:
- PDF documents
- MS Word Templates
- Images (e.g., scanned resumes)
If you are parsing a PDF document and see jumbled text, the PDF may be corrupted. To verify this, open the file with Adobe Reader, then choose File => Save As Other => (.txt). If the extracted text is also jumbled, the PDF is corrupted and nothing can be done with this file (see Problems With PDF Format).
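Reading the extracted .txt is the real test, but when you have many files a rough screen can help you decide which ones to open first. This Python sketch is a hypothetical heuristic, not Sovren's check: it scores text by the share of replacement, private-use, unassigned or control characters, and by the share of single-letter "words" (letter-by-letter output such as "J o h n"). The 0.3 threshold is an arbitrary assumption to tune on your own files.

```python
import unicodedata


def jumble_score(text: str) -> float:
    """Return a 0..1 score; higher means the text looks more garbled."""
    if not text.strip():
        return 1.0  # nothing extractable at all
    odd = 0
    for ch in text:
        if ch in "\n\r\t":
            continue  # ordinary layout whitespace
        # U+FFFD replacement chars, private-use (Co), unassigned (Cn), control (Cc)
        if ch == "\ufffd" or unicodedata.category(ch) in {"Co", "Cn", "Cc"}:
            odd += 1
    odd_ratio = odd / len(text)
    tokens = text.split()  # non-empty because text.strip() is non-empty
    single_ratio = sum(1 for t in tokens if len(t) == 1 and t.isalpha()) / len(tokens)
    return max(odd_ratio, single_ratio)


def looks_jumbled(text: str, threshold: float = 0.3) -> bool:
    return jumble_score(text) >= threshold
```

Usage: `looks_jumbled(Path("resume.txt").read_text(encoding="utf-8", errors="replace"))`, where the encoding of the saved .txt file is itself an assumption to verify.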
Check the Parser Configuration: The parser has many settings that can be turned on or off, and each of them affects the parsing results. For example, some sections, such as Training, are not parsed by default, but you can enable parsing of Training by setting the ParseTraining flag to true.
If you are parsing a non-English resume (e.g., Chinese) and see values such as "????" in your saved HR-XML, the file was parsed correctly, but the response was read or saved incorrectly. The problem is in how you (or your SOAP library) read or process the HTTP response content: the response is UTF-8 encoded, but somewhere along the way you are reading, transforming, or saving it with an ASCII or ANSI encoding.
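The "????" symptom is easy to reproduce. This Python sketch pushes a simulated UTF-8 response (hypothetical content) through ASCII to show where the question marks come from, then saves it the correct way, decoding and writing as UTF-8. The `Name` field is illustrative only.

```python
import json
import tempfile
from pathlib import Path


def ascii_roundtrip(raw: bytes) -> str:
    """Reproduces the bug: forcing UTF-8 text through ASCII turns CJK into '?'."""
    return raw.decode("utf-8").encode("ascii", errors="replace").decode("ascii")


def save_response(raw: bytes, path: Path) -> None:
    """Correct handling: decode the HTTP body as UTF-8 and save it as UTF-8."""
    path.write_text(raw.decode("utf-8"), encoding="utf-8")


# Simulated response body; ensure_ascii=False keeps the Chinese characters literal.
raw = json.dumps({"Name": "张伟"}, ensure_ascii=False).encode("utf-8")
print(ascii_roundtrip(raw))  # {"Name": "??"}

with tempfile.TemporaryDirectory() as d:
    out = Path(d) / "parsed.json"
    save_response(raw, out)
    print(json.loads(out.read_text(encoding="utf-8"))["Name"])  # 张伟
```

The same rule applies in any language or HTTP library: keep the text as UTF-8 from the raw bytes through every transformation to the saved file.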
Read through Sovren's Tips for Electronic Resumes for more helpful tips.
I am still getting unexpected results. What should I do?
Email firstname.lastname@example.org and be sure to include:
- The original resume.
- The XML or JSON output from the Parser.
- A detailed explanation that specifically describes the problem.