Resume & Job Parser
What is a resume/CV parser?
A resume/CV parser is used within human resource software and on recruitment websites, job boards and portals to simplify and accelerate the application process. It does so by extracting and classifying thousands of attributes about the candidate and providing a foundation for the semantic searching of candidate data. The parser identifies hundreds of different kinds of information within a resume or CV and clearly tags each data point (for example: first name, last name, street address, city, educational degrees, employers, skills, etc.). The results are output in JSON format. Sovren has been building this technology for over two decades.
Aren’t all parsers the same?
No. Sovren is the fastest, most accurate, and most configurable parser available anywhere.
How long does it take to parse a resume?
Median transaction time on our SaaS Service is about 500 milliseconds, to which you would add your network latency. However, by default our SaaS API lets you batch-process concurrent requests, which gives you higher throughput. Learn more about batch parsing here.
What resume formats can the Sovren resume parsing SaaS process?
Essentially any non-image resume and CV format, including all of the popular job board formats and social and professional networks.
Does the Sovren resume parsing SaaS store my resumes?
No, the Sovren resume parsing SaaS does not store any resumes. All parsing is done in-memory so there is never any data written to a file system or database.
What can the Sovren parser output?
The parser outputs JSON. In addition, the original resume can be converted into other formats such as TEXT, HTML, RTF, and PDF (see API documentation for more information).
Do I have to tell the parser what language a resume or CV is in?
No, the parser has auto language detection so you will never have to tell the parser what language a CV is written in.
Does the Sovren parser add any intelligence to the parsed resume?
Yes, the parser constructs a summary of who the candidate is today, a management summary, average time at each employer, etc. (more info)
How is PII protected?
Parsed documents that have been sent to the API to be indexed are scrubbed of ALL personally identifiable information (PII) and then stored in the specified index. This security measure also assists in removing bias from the recruiting process. If you want to simply search for a candidate by name, use your own database.
How do I index data not found on the resume?
By default, only data found on the resume is indexed in AI Matching. However, there are many use cases where you may want to filter on additional data that is only available in your internal database. That information can be indexed using Custom Value Ids.
Account Management FAQs
How do I know how many credits I have available in my SaaS account?
You can check your remaining SaaS credit balance by looking at the CreditsRemaining field returned with each API transaction. Also, you can track usage for your account in the Sovren Portal.
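As a minimal sketch of reading the balance in Python, the function below searches a decoded JSON response for a CreditsRemaining key rather than assuming where it sits. The sample response body and its nesting are assumptions made for illustration only; check the API documentation for the actual response shape.

```python
import json
from typing import Any, Optional


def find_credits_remaining(node: Any) -> Optional[float]:
    """Depth-first search for a 'CreditsRemaining' key in decoded JSON."""
    if isinstance(node, dict):
        if "CreditsRemaining" in node:
            return node["CreditsRemaining"]
        children = list(node.values())
    elif isinstance(node, list):
        children = node
    else:
        return None  # scalars cannot contain the key
    for child in children:
        found = find_credits_remaining(child)
        if found is not None:
            return found
    return None


# Hypothetical response body; the nesting shown here is an assumption.
sample_body = '{"Info": {"CustomerDetails": {"CreditsRemaining": 1234}}, "Value": {}}'
credits = find_credits_remaining(json.loads(sample_body))
```

Searching by key name keeps the sketch independent of the response layout; once you know the real path, a direct lookup is simpler.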
How do I enable or disable AI Matching?
The Accelerator Program for AI Matching provides an incredible one-time introductory discount on credits to allow you to test the Sovren AI Matching Engine. As described in the signup, we provided credits to parse and index 5,000 documents. Turning off AI Matching during the Accelerator program is not allowed, since testing AI Matching was the purpose of the program. When you decide to purchase additional credits, you will be given the option to disable AI Matching. Until then, we encourage you to take full advantage of everything AI Matching has to offer! Read our documentation to learn more about AI Matching.
If you aren't on the Accelerator Program, you can reach out to firstname.lastname@example.org to enable AI Matching for your account. With AI Matching enabled on your account, the cost of each parse is 2 credits. Searching, matching, and index API calls don't cost per transaction, but are subject to the AUP. More information on transaction costs can be found here.
Resume & Job Parser
How do I process batch transactions?
Routine parsing of batches of fewer than 100,000 documents MUST always be done serially (one-at-a-time). To read more about batches and concurrency, see here.
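Serial processing can be sketched in Python as a plain loop that finishes one document before starting the next. In this sketch, parse_document is a hypothetical stand-in for whatever function makes your actual API call; it is not part of any Sovren library, and the demo replaces it with a fake that only records the order of calls.

```python
import tempfile
from pathlib import Path
from typing import Callable, Dict


def parse_serially(folder: Path, parse_document: Callable[[bytes], dict]) -> Dict[str, dict]:
    """Parse every file in `folder` one at a time, in file-name order."""
    results: Dict[str, dict] = {}
    for path in sorted(folder.iterdir()):
        if not path.is_file():
            continue  # skip subdirectories
        # The next document is not read until this call has returned.
        results[path.name] = parse_document(path.read_bytes())
    return results


# Demo with a fake parser (hypothetical) that records the call order.
calls = []


def fake_parse(data: bytes) -> dict:
    calls.append(data)
    return {"size": len(data)}


with tempfile.TemporaryDirectory() as tmp:
    Path(tmp, "b.txt").write_bytes(b"bb")
    Path(tmp, "a.txt").write_bytes(b"a")
    demo = parse_serially(Path(tmp), fake_parse)
```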
Document Last Modified Date (formerly Revision Date)
Document Last Modified Dates are extremely important for accuracy, and required for every resume parsing transaction. DO NOT MESS THIS UP OR ALL PROCESSING WILL BE WRONG AND WASTED $. For more information, go here.
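One possible way to obtain a date for each file is sketched below in Python, using the file system's modification time. Two assumptions apply: that the file system timestamp still reflects when the resume was last modified (copying or downloading a file often resets it), and that a YYYY-MM-DD string is a suitable format. Confirm both against the documentation linked above before relying on this.

```python
import os
import tempfile
from datetime import datetime, timezone
from pathlib import Path


def last_modified_date(path: Path) -> str:
    """Return the file's modification time as a UTC date, YYYY-MM-DD."""
    mtime = path.stat().st_mtime
    return datetime.fromtimestamp(mtime, tz=timezone.utc).date().isoformat()


# Demo: create a file and give it a known modification time.
with tempfile.TemporaryDirectory() as tmp:
    resume = Path(tmp) / "resume.txt"
    resume.write_text("example", encoding="utf-8")
    known = 1577880000  # 2020-01-01 12:00:00 UTC
    os.utime(resume, (known, known))
    demo_date = last_modified_date(resume)
```

If your own database records when each resume was received or last updated, that value is likely a better source than the file system.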
What is the default max number of results returned?
By default, AI Matching returns the top 50 matches in the API response. You can raise this to as many as 100 documents by setting the take parameter in the API request.
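A small Python sketch of guarding that value before sending a request follows. The default of 50 and the limit of 100 come from the answer above; the helper name, the minimum of 1, and the request fragment are hypothetical, since the exact place where take belongs in the request is defined by the API documentation.

```python
DEFAULT_TAKE = 50  # default number of matches, per the answer above
MAX_TAKE = 100     # upper limit, per the answer above


def validated_take(requested: int = DEFAULT_TAKE) -> int:
    """Return `requested` if it is within 1..MAX_TAKE, else raise ValueError."""
    if not 1 <= requested <= MAX_TAKE:
        raise ValueError(f"take must be between 1 and {MAX_TAKE}, got {requested}")
    return requested


# Hypothetical request fragment; where `take` sits in the real request
# body is an assumption.
request_fragment = {"take": validated_take(75)}
```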
Why am I prompted to log in?
When creating a Matching UI session, you should include each user's username as part of the UIOptions. If done correctly, they will never be asked for credentials! This approach is strongly advised!!! You should not share or reuse sessions, and they should be loaded immediately after they are generated. If you choose to use a process where you don't pass the username as part of the UIOptions, users will be asked to log in each time they use the Matching UI. They can use the "Forgot Password" option on the portal login screen (portal.sovren.com) to set up a password. You can also set a password for them from the User Management screen if your process calls for it.
How many documents does Matching UI show?
The Matching UI results are limited to 50 documents. No human looks at more than 50 documents for a single search/match, so there is no reason to include excess data on the screen. If users don't quickly find what they are looking for in the result set, they should tweak the query to better align with what they are looking for.
Getting Started FAQs
I just received my SaaS account credentials. How do I get started?
How do I process a backlog?
When you have a backlog of documents that you need to parse and optionally index, you should use Sovren's Batch Utility. This tool points to a directory of documents on your file system and guides you through the settings to parse and optionally index those documents. If you choose to build this yourself, refer to processing batch transactions for details on concurrency.
How do I accurately test resume parsing?
To accurately test resume parsing, please follow these rules: Best Practices to Test Resume Parsing Software.
I am getting unexpected results. What should I check for?
See Best Practices to Test Resume Parsing Software and make sure you are following each of the rules for accurately testing resume parsing. Also review documents you shouldn't parse to get a better understanding of situations that we know are unlikely to give satisfactory results.
Look at the ConvertedText. Resumes that may seem very clear when looking at the original file may yield unexpected results. In many cases, this is due to the conversion from the original file format to plain text format. The most common issues are:
- PDF documents
- MS Word Templates
- Images (e.g., scanned resumes)
If you are parsing a PDF document, it may be corrupted. To check this, open the file with Adobe Reader, then choose File => Save As Text. Look at that extracted text and if it appears jumbled, then the PDF is corrupted and there is nothing that can be done with this file (see Problems With PDF Format).
If you are parsing a foreign language resume (e.g. Chinese) and seeing values in your saved JSON such as "????", this means the file was parsed correctly but you have handled the reading/saving of the response incorrectly. The problem is in how you (or your REST/HTTP library) are reading or processing the HTTP response content. The response is UTF-8 encoded, but you are reading it (or transforming it or saving it) with an ASCII or ANSI encoding somewhere along the way.
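The encoding problem described above can be sketched in Python as follows. The raw_bytes value is a made-up stand-in for the bytes of an HTTP response body (the field name and value are illustrative, not real Parser output). The point is to decode as UTF-8 and to name UTF-8 explicitly when saving.

```python
import json
import tempfile
from pathlib import Path

# Stand-in for the raw bytes of a UTF-8 encoded response (illustrative data).
raw_bytes = '{"Name": "张伟"}'.encode("utf-8")

# Correct: decode the bytes as UTF-8 before parsing the JSON.
data = json.loads(raw_bytes.decode("utf-8"))

# Incorrect: forcing the text through ASCII replaces each character with "?".
mangled = data["Name"].encode("ascii", errors="replace").decode("ascii")

# Correct: name the encoding explicitly when saving and when reading back.
with tempfile.TemporaryDirectory() as tmp:
    out = Path(tmp) / "parsed.json"
    out.write_text(json.dumps(data, ensure_ascii=False), encoding="utf-8")
    round_tripped = json.loads(out.read_text(encoding="utf-8"))
```

If your HTTP library exposes both a bytes body and a text body, decoding the bytes yourself as UTF-8 avoids relying on the library's guess at the encoding.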
Read through Sovren's Tips for Electronic Resumes for more helpful tips.
I am still getting unexpected results. What should I do?
Email email@example.com and be sure to include:
- The original resume.
- The JSON output from the Parser.
- A detailed explanation that specifically describes the problem.