Version 9.1.2

Improvements

Resume Parser

7% faster.

Parses all past and present LinkedIn resume formats extremely accurately.

Better Swedish date parsing.

More accurate employment parsing.

More accurate education parsing.

Improved resume sectioning.

Document Converter

Better PDF conversions.

Version 9.1.1

Improvements

All SaaS Services

Better error messages for invalid requests.

Resume Parser

Fixed management level output for resumes with no current employment.

AI Matching

Improved Bimetric Scoring in cases where no second-best taxonomy is found.

Better comparison algorithm for job titles that contain prepositions.

Improved languages matching algorithm.

Version 9.1

Improvements

Resume Parser

Greatly improved parsing of grade point averages in Education.

Greatly reduced the number of spurious trailing work history jobs or educational schools.

Thousands of improvements to internal data lists.

Vastly improved LinkedIn parsing. We are now able to capture the hidden LinkedIn urls, and ignore the broken partial LinkedIn urls.

Degrees which are just certifications and not intended to be high school-or-higher degrees are now not output in Education, but rather, are output in Certifications.

Better parsing of school names. Fewer school names with City names hanging on the end (sometimes they need to be left that way; other times they need to be stripped – we do both better now).

Better parsing of Russian, Italian, and Norwegian schools and degrees.

Far more accurate nesting of PositionHistory nodes within EmployerOrg nodes: specifically, far fewer wrongful nesting events, and a few more correct nesting events.

We restored and improved the parsing accuracy for BOTH past and present LinkedIn resumes in all known formats.

Improved Company Name and Position Title accuracy by several percentage points. Improved the ability to distinguish between ambiguous elements.

Document Converter

Better removal of page numbers.

Vastly improved LinkedIn conversions. Conversion to single column format now happens in correct order. Page markers are properly removed. Broken lines are re-connected.

Real formatted HTML output is now available from PDFs.

Improved HTML-to-text conversions. HTML should not contain tabs except within <pre> tags, but some HTML wrongfully does. In the past, these tabs were converted to a single space; now, we convert them to multiple spaces. This ends up allowing the Parser to “see” many more section headers that in the past were invisible because they collided with nearby words.
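The difference between the two tab-handling behaviours can be sketched as follows. This is only an illustration of the idea, not the converter's actual code; the function names, the sample line, and the width of four spaces are assumptions (the notes only say "multiple spaces").

```python
def collapse_tabs(text):
    """Old behaviour as described: each tab becomes a single space."""
    return text.replace("\t", " ")


def expand_tabs(text, width=4):
    """New behaviour as described: each tab becomes several spaces.

    The width of 4 is an assumption made for this sketch.
    """
    return text.replace("\t", " " * width)


# Hypothetical line where a section header follows other text after a tab.
line = "Seattle, WA\tWORK EXPERIENCE"

print(repr(collapse_tabs(line)))  # header runs into the preceding words
print(repr(expand_tabs(line)))    # header is visibly set apart
```

With a single space the header is indistinguishable from the rest of the sentence; the wider gap preserves the visual separation that the tab originally provided.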

LinkedIn URLs

We now report LinkedIn URLs with the Use field set to linkedIn, which makes the URL easier to extract programmatically.

<ContactMethod>
    <Use>linkedIn</Use>
    <Location>onPerson</Location>
    <WhenAvailable>anytime</WhenAvailable>
    <InternetWebAddress>https://www.linkedin.com/in/demo</InternetWebAddress>
</ContactMethod>
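
As a rough sketch of how the Use field could be used to pick out the LinkedIn URL, the following Python reads the fragment above with the standard library. The ContactInfo wrapper element and the function name are assumptions added for illustration, and the sample has no XML namespaces; real parser output may be namespaced, in which case the tag lookups would need to account for that.

```python
import xml.etree.ElementTree as ET

# Fragment copied from the release note. The <ContactInfo> wrapper is an
# assumption added only so the sample has a single root element.
SAMPLE = """
<ContactInfo>
    <ContactMethod>
        <Use>linkedIn</Use>
        <Location>onPerson</Location>
        <WhenAvailable>anytime</WhenAvailable>
        <InternetWebAddress>https://www.linkedin.com/in/demo</InternetWebAddress>
    </ContactMethod>
</ContactInfo>
"""


def linkedin_urls(xml_text):
    """Return every InternetWebAddress whose ContactMethod has Use == linkedIn."""
    root = ET.fromstring(xml_text)
    urls = []
    for method in root.iter("ContactMethod"):
        if method.findtext("Use", default="").strip() == "linkedIn":
            address = method.findtext("InternetWebAddress", default="").strip()
            if address:
                urls.append(address)
    return urls


print(linkedin_urls(SAMPLE))
```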

Bug Fixes

Fixed a bug in the ReservedData section output that would cause an error in scrubbing PII.

We were eliminating some valid URLs. We fixed that so that we now report more URLs.

Version 9.0.2

New Products

Sovren Apply

Provides a candidate portal for ingesting candidate resumes without the need to reinvent the wheel. Uses the latest Sovren Resume Parser and can integrate directly with the AI Matching Engine. For more information visit https://sovren.com/products/apply.

Sovren Sourcing

Quickly source candidates from 3rd party job boards and the web. The results leverage Sovren's Parsing and Scoring to add intelligence on top of the searching provided by those 3rd parties. This provides a quick way to evaluate candidates from multiple sources including your existing candidates. For more information visit https://sovren.com/products/sourcing.

New Features

AI Matching

Added an endpoint to check if a document exists in an index. This can be used when trying to determine if a document has already been indexed and is much lighter weight than retrieving the entire document. More information can be found in the REST API documentation.

Improvements

Better PDF conversions to fix some things intentionally broken by LinkedIn.

Version 9

Upgrade Path

If you are upgrading from version 8.0 or later, switching to version 9 is as simple as changing the url of the service from v8 to v9. Typically, no other changes are needed.

If you are upgrading from version 7.5 or earlier, this version isn't compatible with version 7.5. To upgrade to version 9.0, we recommend the following approach:

  1. If you're using a parser configuration string, regenerate your string in the new human-readable Name=Value pair format. Details on this new configuration string and a conversion tool are documented here.
  2. Parse the Sample.doc file (as well as some of your own documents) in the current version you use, and with 9.0 using our Demo Application and save those results to disk.
  3. Use a document comparison tool to evaluate the differences, specifically the new fields. There is a lot of new metadata provided that could be of high value to integrate in your application. These new fields are detailed below in the New Features section. For a document comparison tool, we really like Beyond Compare.
  4. Remap your API calls to the new 9.0 methods as described in the API Documentation (REST | SOAP), make the desired changes to your implementation to leverage the new metadata, change the URL to point to version 9.0, and enjoy.

New Features

Added an endpoint to scrub the Personally Identifiable Information from a Resume/CV. More information can be found in the REST API documentation.

Improvements

Resume Parser

Improved the skills taxonomies for all languages. We added a new taxonomy/Subtaxonomy for all languages: "No dominant taxonomy → Not enough data". When we cannot determine the taxonomy with confidence because so few (or no) skills were found, we output "No dominant taxonomy → Not enough data".

Improved accuracy on Work History and Education.

Improved sectioning of resumes.

Overall accuracy is up about 3 absolute percentage points, with 99% of the previous speed. Sovren parsing speed is typically at least 5x faster than our nearest competitor’s speed, and we make about 1/3 to 1/10 as many mistakes as our nearest competitor.

AI Matching Engine

Improved the handling of management level queries in Matching when there was no management level data in the source document.

Breaking Changes

Skills

We deprecated the SkillsStyle property because we now have a single canonical way and place to output skills.

Skills are now output only in the resume's UserArea, or job's SkillsTaxonomyOutput. The output is extremely easy to read and understand from both a human and programmatic standpoint. The output taxonomies are sorted in descending order of importance, and skills are alphabetical within the subtaxonomies, and child skills are nested within parent skills.

Also, importantly, we now use the English skills list for non-English skills parsing in addition to the detected language's built-in skills list. This will generally result in more skills being found, with very few false cognates.

DO NOT use/rely on the skill Ids that are output. We reserve the right to modify skill names and to preserve the skill Id when we do so. In some cases, we append a language code to skill Ids so that we can output them alongside another translation of that skill with the same Id. If you are relying on skill Ids, stop!

NOTE FOR CUSTOM SKILLS LISTS: When developing your custom skills lists, you must avoid using ANY Sovren taxonomy or skill Ids. The only way to be certain of that is to prepend or append an alphabetical character to your Ids if they are only integers.
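
One way to apply the advice above is sketched below: any custom Id that consists only of digits gets an alphabetical prefix, and all other Ids are left untouched. The function name and the default prefix "C" are hypothetical choices for this example, not part of any Sovren tooling.

```python
def safe_custom_id(raw_id, prefix="C"):
    """Prefix integer-only Ids with letters so they cannot match a built-in Id.

    The prefix "C" is an arbitrary choice for this sketch; any alphabetical
    prefix (or suffix) would satisfy the rule described in the note.
    """
    if not prefix.isalpha():
        raise ValueError("prefix must be alphabetical")
    text = str(raw_id).strip()
    return prefix + text if text.isdigit() else text


print(safe_custom_id(1042))      # integer-only Id gets the prefix
print(safe_custom_id("ACME-7"))  # already non-integer, left as-is
```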

Other

We deprecated the ParserSettings.OutputFormat.ReportAllCompanyNamesAndPositionTitlesRegardless and ParserSettings.OutputFormat.ContactMethod properties.

We made these properties read-only:

  • ParserSettings.OutputFormat.XmlFormat
  • ParserSettings.OutputFormat.MinimumCompanyNameProbability
  • ParserSettings.OutputFormat.MinimumPositionTitleProbability

We moved the Bimetric Score endpoint from /bimetricanalyzer to /scorer/bimetric.
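
The two URL changes mentioned in these notes (the version moving from v8 to v9, and /bimetricanalyzer moving to /scorer/bimetric) could be applied to stored endpoint URLs roughly as follows. The host name is a placeholder, and the assumption that the version appears as its own path segment is ours; check the API documentation for the actual URL layout.

```python
from urllib.parse import urlsplit, urlunsplit

OLD_BIMETRIC = "/bimetricanalyzer"
NEW_BIMETRIC = "/scorer/bimetric"


def to_v9_url(v8_url):
    """Rewrite a v8 service URL to its v9 form (sketch under stated assumptions)."""
    parts = urlsplit(v8_url)
    # Assumption: the version is a standalone path segment such as ".../v8/...".
    segments = ["v9" if s == "v8" else s for s in parts.path.split("/")]
    path = "/".join(segments)
    # The Bimetric Score endpoint moved, per the breaking-changes list.
    if path.endswith(OLD_BIMETRIC):
        path = path[: -len(OLD_BIMETRIC)] + NEW_BIMETRIC
    return urlunsplit((parts.scheme, parts.netloc, path, parts.query, parts.fragment))


# "api.example.invalid" is a placeholder host, not a real service address.
print(to_v9_url("https://api.example.invalid/v8/bimetricanalyzer"))
print(to_v9_url("https://api.example.invalid/v8/parser/resume"))
```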

Bug Fixes

Fixed an uncommon issue in our JSON output where some arrays were output as objects when they only had a single item.
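
Integrations that also need to read JSON produced before this fix may want a small defensive helper like the one sketched here, which treats a lone object the same as a one-item array. The field name "Items" is hypothetical and used only for illustration.

```python
import json


def as_list(value):
    """Normalize a JSON value that may be a list, a single object, or absent."""
    if value is None:
        return []
    if isinstance(value, list):
        return value
    return [value]


# "Items" is a hypothetical field name, not taken from the actual output.
single = json.loads('{"Items": {"Name": "a"}}')
many = json.loads('{"Items": [{"Name": "a"}, {"Name": "b"}]}')

print([item["Name"] for item in as_list(single.get("Items"))])
print([item["Name"] for item in as_list(many.get("Items"))])
```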

Fixed an issue where the OutputFormat.NormalizeRegions parser setting was being ignored.

Version 8.3

Breaking Changes

AI Matching Engine

Typically, minor releases don't contain breaking changes, but this is a unique case: our AI Matching service hasn't yet crossed into production for our clients, and we wanted to take the opportunity to add some important improvements. Moving forward, breaking changes will be reserved for major releases and will be deployed alongside the existing service at a new URL.

We made the following changes to our AI Matching API:

  • We now strip all personally identifiable information (PII) from the parsed document prior to storage in the index. We don't store this information anywhere in our platform, so the only way to identify a document is by the unique id that is specified to us at time of storage.

Resume Parser

Added ResumeQuality element to the Resume element’s UserArea. These changes are reflected in the SovrenResumeExtensions.xsd. If you use XML validation, this will cause your validation to fail unless you use the new XSD. NEVER EVER use XML schema validation in anything but a QA environment! Never use an intake process that will fail if new nodes appear in the output.

Restored SkillId to the skills output. We still heavily discourage integrators from depending on Sovren SkillId. We also eliminated some of the skills pruning implemented in earlier versions of 8.x, as these reductions to the output were reducing the effectiveness of searching and matching against the output.

New Features

AI Matching Engine

We added the following features:

  • Added support for dashes and underscores in index names. This is helpful when you need to delimit information in the index name.
  • Added support for underscores in document names
  • Improved the Management Level query
  • Added configuration options to the POST /parser/resume endpoint to be able to Geocode from a specified address, or to insert specified latitude/longitude. These options were already a part of the POST /geocode endpoint, but can now all be done from a single api call.
  • Added configuration options to the POST /parser/resume endpoint to be able to specify custom ids for indexing. These options were already a part of the POST /index/{indexId}/documents/{documentId} endpoint, but can now all be done from a single api call.

Resume Parser

We now calculate and output a Resume Quality summary along with related information about known or suspected problems with the resume. This data is output in the ResumeQuality element of the Resume element’s UserArea. This score can help flag resumes that are low quality, so that you can potentially ask the candidate to fix the identified problems. NOTE: never fix the XML. ALWAYS fix the unparsed resume and then re-parse it.

The ResumeQuality outputs details about known and suspected problems with the resume, in decreasing order of severity/importance. If the resume has no detected issues, it will output as follows:

<sov:ResumeQuality>
    <sov:Assessments>
        <sov:Assessment>
            <sov:Level>No Issues Found</sov:Level>
        </sov:Assessment>
    </sov:Assessments>
</sov:ResumeQuality>

Here is sample output that demonstrates the four problem levels:

<sov:ResumeQuality>
    <sov:Assessments>
        <sov:Assessment>
            <sov:Level>Fatal Problems Found</sov:Level>
            <sov:Findings>
                <sov:Information>We had to calculate where the work history section was. The section header was either missing, spanned multiple lines, or unknown. The work history section should have a header 'Work History' before listing the content.</sov:Information>
                <sov:Information>This resume is approximately 9 pages long, and appears to be a curriculum vitae. Such documents are prone to errors due to the use of nonstandard headers and the vast amount of data describing patents, speaking engagements, research, advisory roles, publications, etc. Accordingly, only the first WORK HISTORY section was parsed, as that usually results in far greater accuracy.</sov:Information>
                <sov:Information>Each of the following sections in the resume contain more content than the work history and education sections combined: 'Publikationen', 'Fremdsprache'. This usually indicates a major problem with section headers or formatting within the resume.</sov:Information>
            </sov:Findings>
        </sov:Assessment>
        <sov:Assessment>
            <sov:Level>Major Issues Found</sov:Level>
            <sov:Findings>
                <sov:Information>The following section in the resume contains more content than recommended: 'Publikationen' This section should be less than 10 lines long as it can cause errors in parsing the resume.</sov:Information>
                <sov:Information>The following section types appear multiple times in the resume: LANGUAGES (2 occurrences), SKILLS (2 occurrences), HOBBIES (2 occurrences). Each section should only appear once in a resume.</sov:Information>
            </sov:Findings>
        </sov:Assessment>
        <sov:Assessment>
            <sov:Level>Data Missing</sov:Level>
            <sov:Findings>
                <sov:Information>The following work history position does not have a job title:  POS-1. Every position in a resume should have a job title.</sov:Information>
                <sov:Information>The following educational degrees do not have a degree name: DEG-2, DEG-5, DEG-4, DEG-6. Every degree in a resume should have a name or type associated with it, such as 'BS' or 'MS'.</sov:Information>
            </sov:Findings>
        </sov:Assessment>
        <sov:Assessment>
            <sov:Level>Suggested Improvements</sov:Level>
            <sov:Findings>
                <sov:Information>Skills section found in resume. Skills should not be in a separate section, but instead included in the descriptions of work history or education.</sov:Information>
                <sov:Information>The following work history position has a street level address included: POS-2. Including a street level address for anything other than the primary contact address can lead to unexpected results in the parser output and should be removed.</sov:Information>
            </sov:Findings>
        </sov:Assessment>
    </sov:Assessments>
</sov:ResumeQuality>
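
A consumer could read this structure along the lines of the sketch below, which collects each assessment's Level and its Information findings. It matches elements by local name so that unknown or newly added nodes are simply ignored, in keeping with the advice not to build an intake process that fails when new nodes appear. The namespace URI bound to the sov prefix here is a placeholder, and the sample is shortened from the output above; take the real namespace from your own parser output.

```python
import xml.etree.ElementTree as ET

# Shortened sample. The namespace URI "urn:example:sov" is a placeholder
# assumption needed to make the fragment well-formed on its own.
SAMPLE = """
<sov:ResumeQuality xmlns:sov="urn:example:sov">
    <sov:Assessments>
        <sov:Assessment>
            <sov:Level>Data Missing</sov:Level>
            <sov:Findings>
                <sov:Information>The following work history position does not have a job title: POS-1.</sov:Information>
            </sov:Findings>
        </sov:Assessment>
        <sov:Assessment>
            <sov:Level>Suggested Improvements</sov:Level>
            <sov:Findings>
                <sov:Information>Skills section found in resume.</sov:Information>
            </sov:Findings>
        </sov:Assessment>
    </sov:Assessments>
</sov:ResumeQuality>
"""


def local_name(tag):
    """Strip the '{namespace}' part that ElementTree prepends to tag names."""
    return tag.rsplit("}", 1)[-1]


def read_assessments(xml_text):
    """Return a list of (level, [findings]) tuples in document order."""
    root = ET.fromstring(xml_text)
    results = []
    for node in root.iter():
        if local_name(node.tag) != "Assessment":
            continue
        level, findings = "", []
        for child in node.iter():
            name = local_name(child.tag)
            if name == "Level":
                level = (child.text or "").strip()
            elif name == "Information":
                findings.append((child.text or "").strip())
        results.append((level, findings))
    return results


for level, findings in read_assessments(SAMPLE):
    print(level, len(findings))
```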

Bug Fixes

Resume Parser

Fixed a bug related to skills parsing that omitted some skills.

Accuracy Improvements

Resume Parser

  • Much better Chinese person name parsing
  • Improved Resume Sectioning
  • Improved German language parsing
  • Improved Company Name parsing
  • Improved parsing of educational majors
  • Improved ability to detect and compensate for bad conversions where extra whitespace was inserted between words in sentences

Skills

Resume Parser

We restored the SkillId in the output of skills. We modified some skills and added new ones.

Version 8.2

Breaking Changes

AI Matching Engine

Typically, minor releases don't contain breaking changes, but this is a unique case: our AI Matching service hasn't yet crossed into production for our clients, and we wanted to take the opportunity to add some important improvements.

We made the following changes to our AI Matching API:

  • Cleaned up matching/searching endpoints into Match by Document, Match by DocumentId, Match by Criteria, and Search
  • Moved the RevisionDateRange into FilterCriteria since it is restrictive and acts like a filter
  • Removed MaxRecords and added pagination to searching. We limit each page to 100 records, and allow you to query to the 10th page (1000 records).
  • Renamed MaxRecords to Take to follow the same naming convention as Searching
  • Renamed IndexIds to IndexIdsToSearchInto for Matching and Searching endpoints
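
The paging limits in the list above (100 records per page, up to the 10th page) can be sketched as a helper that plans the requests needed to fetch a given number of results. Take is the name used in these notes; Skip is a hypothetical companion parameter assumed here for illustration, so consult the API documentation for the actual request shape.

```python
PAGE_SIZE = 100  # per-page limit stated in the notes
MAX_PAGES = 10   # deepest page stated in the notes (1000 records in total)


def page_windows(wanted):
    """Plan the paging windows needed to fetch `wanted` records.

    Returns a list of dicts with a hypothetical "Skip" offset and the "Take"
    size for each request, capped at the 1000 records that can be reached.
    """
    if wanted < 0:
        raise ValueError("wanted must be non-negative")
    reachable = min(wanted, PAGE_SIZE * MAX_PAGES)
    windows = []
    for skip in range(0, reachable, PAGE_SIZE):
        windows.append({"Skip": skip, "Take": min(PAGE_SIZE, reachable - skip)})
    return windows


print(page_windows(250))
print(len(page_windows(5000)))
```

Asking for more than 1000 records yields only ten windows, which mirrors the stated limit instead of issuing requests that would be rejected.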

Resume Parser

None.

New Features

AI Matching Engine

We added the following features:

  • Added the Reverse Compatibility Score to the AI Matching responses
  • Added matched terms to the category scores array in AI Matching responses
  • Synced the queries between Bimetric Matching and AI Matching
  • Added Category Weights as an input for AI Matching
  • Added an endpoint to match using a document that's already indexed

Resume Parser

Added Danish language parsing.

Bug Fixes

Resume Parser

Fixed a bug related to skills parsing that omitted some skills.

Accuracy Improvements

Resume Parser

  • Improved resume sectioning
  • Improved Company Name parsing
  • Reduced false positives on some email addresses being reported for the candidate (but which did not actually belong to the candidate)
  • Improved accuracy on Norwegian-language resumes

Version 8.1

New Features

AI Matching Engine

Our new cloud-based matching platform provides a scalable solution to finding a needle in the haystack without the need for countless hours of reviewing resumes/jobs. Take a look at the documentation and the API to find out more.

Underlying Changes

We added several run-time bug fixes.

Version 8.0

Upgrade Path

Version 9 contains major improvements over version 8, and the upgrade process is the same, so we strongly recommend upgrading straight to version 9.

As discussed in the breaking changes section, this version isn't compatible with the prior SaaS versions. To upgrade to version 8.0, we recommend the following approach:

  1. If you're using a parser configuration string, regenerate your string in the new human-readable Name=Value pair format. Details on this new configuration string and a conversion tool are documented here.
  2. Parse the Sample.doc file (as well as some of your own documents) in the current version you use, and with 8.0 using our Demo Application and save those results to disk.
  3. Use a document comparison tool to evaluate the differences, specifically the new fields. There is a lot of new metadata provided that could be of high value to integrate in your application. These new fields are detailed below in the New Features section. For a document comparison tool, we really like Beyond Compare.
  4. Remap your API calls to the new 8.0 methods as described in the API Documentation (REST | SOAP), make the desired changes to your implementation to leverage the new metadata, change the URL to point to version 8.0, and enjoy.

New Features

Metadata

We added some incredibly powerful metadata.

AverageMonthsPerEmployer - We now calculate the candidate's turnover rate; that is, how often they switch employers. Notice that we calculate this turnover per employer, not per job. In other words, if you have three jobs for one employer, stretching over 36 months, your average months per employer is 36, not 12.
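
The per-employer (rather than per-job) averaging described above can be illustrated with a small worked example. This is a sketch of the idea only, not the Parser's actual calculation: it assumes each employer's tenure is the span from its earliest start date to its latest end date, and the employer name and dates are invented.

```python
from datetime import date


def months_between(start, end):
    """Whole calendar months from start to end."""
    return (end.year - start.year) * 12 + (end.month - start.month)


def average_months_per_employer(positions):
    """Average tenure per employer for (employer, start, end) tuples.

    Assumption for this sketch: an employer's tenure runs from the earliest
    start to the latest end of any position held there.
    """
    spans = {}
    for employer, start, end in positions:
        first, last = spans.get(employer, (start, end))
        spans[employer] = (min(first, start), max(last, end))
    if not spans:
        return 0.0
    total = sum(months_between(first, last) for first, last in spans.values())
    return total / len(spans)


# Three consecutive 12-month jobs at one (invented) employer.
jobs = [
    ("Acme", date(2015, 1, 1), date(2016, 1, 1)),
    ("Acme", date(2016, 1, 1), date(2017, 1, 1)),
    ("Acme", date(2017, 1, 1), date(2018, 1, 1)),
]

print(average_months_per_employer(jobs))  # 36.0, not 12.0
```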

FulltimeDirectHirePredictiveIndex - This is a mouthful, but it's an incredibly powerful statistic. Although it is on a scale of 0 to 100, that does not imply that low scores are bad and high scores are good. Think of it as a scale where low scores mean that you are best suited for part time jobs, projects, consulting, etc., and high scores mean you are best suited - and most likely to want - fulltime direct hire jobs. This is an incredibly powerful tool that can be of substantial value-added benefit to every recruiter.

AttentionNeeded - We now have a single area where we output items that need your attention when thinking about how/where to place a candidate. This is where we now place the following notices:

  • Warnings about the Job Objective. In the past, when we calculated that the candidate's job objective seemed like they were wanting to make a career change (different industry, job titles or management level), we noted that within the ExperienceSummary/Description node. Now, we have moved that notice to the AttentionNeeded node.
  • We added a new notice relating to management. If we detect that the candidate was previously in management, but their latest position was not in management, we call this to your attention.

Skills Alternate View

We replaced the old "Competencies" section with a more logical structure that is easier for users to consume. We also output some new data in this section, such as whether a skill is only reported as a parent or was actually found in the resume. View SkillsTaxonomyOutput Documentation for more details.

Normalization

We also added normalization of job titles as a standard feature and this data is always output. Refer to the PositionHistoryUserArea Documentation for more details.

Geocoding

We now supply geocoding through the new web API calling out to a third-party provider (Bing or Google). This geocoding is far more accurate than our old geocoding. The new geocoding is true address-level geocoding, whereas the old geocoding was postcode-level at best, and city-level in other cases, yet still used massive amounts of memory.

Twitter

We now output Twitter handles in the ContactInfo area as well as the ReservedData section of the Resume UserArea:

<ContactMethod>
    <Use>twitterHandle</Use>
    <Location>home</Location>
    <WhenAvailable>anytime</WhenAvailable>
    <InternetWebAddress>@twitQueen</InternetWebAddress>
</ContactMethod>

Breaking Changes

This build is NOT drop-in compatible with any previous build. We made many breaking changes that were necessary in order to simplify the API and eliminate obsolete properties and methods.

In an effort to clean up the existing API and prepare for XML and JSON output, we removed the following API parameters (note: the updated endpoint documentation can be found in the API Documentation (REST | SOAP)):

ParseResume Endpoint

  • ParseResumeRequest
    • FileText
    • OutputXmlDoc
    • OutputWordXml
    • ParserVersion
    • OutputJson
  • ParseResumeResponse
    • XmlDoc
    • WordXml
    • WordXmlCode
    • ParserVersion
    • Xml
    • XmlCode
    • Json

ParseJobOrder Endpoint

  • ParseJobOrderRequest
    • FileText
    • OutputXmlDoc
    • OutputWordXml
    • ParserVersion
    • OutputJson
  • ParseJobOrderResponse
    • XmlDoc
    • WordXml
    • WordXmlCode
    • ParserVersion
    • Xml
    • XmlCode
    • Json

NormalizeResume Endpoint

  • NormalizeResumeRequest
    • Xml
  • NormalizeResumeResponse
    • Xml

We also redesigned our response codes and reduced the number of possible responses. Details can be found in the API Documentation (REST | SOAP).

For this release, we developed a much more readable and less error prone Name=Value pair configuration string. The legacy "01010..." Parser configuration string has been deprecated, so we highly recommend that you browse to the Config String Builder tool and use it to generate your new configuration string. You can copy and paste in your existing config string and the page will prepopulate your current settings on the form and generate the config string in the new format.

Underlying Changes

Accuracy Improvements

By popular demand, we changed how we split data between job titles and company names on LinkedIn profiles. That means that job titles will often be longer, and company names will often be shorter.

Contact info, employment, and education are even more accurate.

Speed

The Parser's average parse time, as measured on our resumeparsing.com SaaS service using worldwide submissions in every language, is approximately 500 milliseconds. If you are not seeing sub-second average parse times, something is probably very wrong. Remember, ParseTime is always output in the Resume UserArea. Contact Sovren Support for help!

Skills

We deprecated the output of skills in the Resume.StructuredXMLResume.Qualifications.Competencies node. The HR-XML schema was simply not informative enough and not user-friendly. We also deprecated the output of BestFitTaxonomies. As mentioned above, both skills and best-fit taxonomies are now output in a hierarchical tree that is intuitive, informative, and easy to use. This output can be found in SkillsTaxonomyOutput.

We added dozens of new IT skills. We removed some skills that seemed to be of little value. We stopped some low-value skills from displaying in the output.

Prior to this release, several languages did not have built-in skills. That is no longer the case. We now have full skills trees in every language supported by the Parser.

Potentially breaking change: All custom skills taxonomies must now follow this pattern:

ParentTaxonomy => SubTaxonomy => Skill [ => Optional ChildSkill ]

In other words, you cannot have a skill tied to a taxonomy unless that taxonomy has a parent taxonomy.
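
A check for the three-or-four-level pattern might look like the sketch below. It treats the "=>" notation from this note as an input format purely for illustration; the actual file format for custom skills lists is not described here, and the function name and sample taxonomy names are hypothetical.

```python
LEVELS = ["ParentTaxonomy", "SubTaxonomy", "Skill", "ChildSkill"]


def parse_skill_path(path):
    """Split 'Parent => Sub => Skill [=> Child]' and validate its depth.

    Raises ValueError when a skill is not nested under both a parent
    taxonomy and a subtaxonomy, or when any level is empty.
    """
    parts = [part.strip() for part in path.split("=>")]
    if len(parts) not in (3, 4) or any(not part for part in parts):
        raise ValueError("expected ParentTaxonomy => SubTaxonomy => Skill [=> ChildSkill]")
    return dict(zip(LEVELS, parts))


# Hypothetical taxonomy and skill names.
print(parse_skill_path("Information Technology => Programming => Python"))
print(parse_skill_path("Information Technology => Programming => Python => Django"))
```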

Output

Choose JSON or XML

By popular demand: JSON!!! Through the new REST API you can now receive JSON that exactly mimics the HROpenStandards.org Resume 2.5 schema output.

Default settings

Please note that default settings have changed for date output formats, for skills output, for what sections are parsed (we now parse and output more sections by default).

Employment History

By popular demand, we replaced the numeric log-scaled CompanyNameProbability and PositionTitleProbability in the PositionHistoryUserArea with two new nodes called PositionTitleProbabilityInterpretation and CompanyNameProbabilityInterpretation. These new fields represent human-readable interpretations of the CompanyNameProbability and PositionTitleProbability. For more information, including sample output, refer to the PositionHistoryUserArea Documentation.

By popular demand, all Employment History output is now reordered into reverse chronological order (i.e., most recent data first, followed by progressively older data), regardless of the resume's ordering of the jobs. Please note that the UserArea for each PositionHistory node still shows you the resume-order for each job.

Education History

By popular demand, all Education History output is now reordered into reverse chronological order, regardless of the resume's ordering of the schools. Please note that the UserArea for each PositionHistory node still shows you the resume-order for each school record.