
Sovren Job Order Parser

Job orders (also known as job descriptions, job postings, etc.) are much more difficult to parse than resumes, because a resume contains information about only one thing: the candidate. A job order, on the other hand, contains:

  • Information about the job
  • Information about the ideal candidate
  • Information about the company
  • Information about corporate culture
  • Information about benefits
  • Information about background checking and testing

The Sovren Job Order Parser is designed to take all of these factors into account. Pass your job order document to our API, and the parser will extract data points such as the employer name, job location, job description, required skills, required degree, and start date (for a full list of fields, please refer to the Job Order Schema). The job order parser can also output the original document in a variety of formats, including HTML, PDF, and RTF.

Sample Output

The following sample XML shows the output from the job order parser along with descriptions for each element:

IMPORTANT: Your integration code should be robustly written to handle the case of missing elements. Most of these XML elements are optional; they are output only if:

1) The data exists in the parsed document, and
2) The parser is able to recognize the data, and
3) Configuration options have enabled parsing of those elements.
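Because almost any element can be absent, every lookup in integration code should be treated as optional. The following is a minimal sketch using Python's standard `xml.etree.ElementTree`; the helper names `text_or_none` and `int_or_none` are hypothetical and are not part of any Sovren SDK.

```python
import xml.etree.ElementTree as ET
from typing import Optional


def text_or_none(parent: ET.Element, path: str) -> Optional[str]:
    """Return the stripped text of the first element matching `path`, or None."""
    node = parent.find(path)
    if node is None or node.text is None:
        return None  # element absent or empty: never assume it exists
    text = node.text.strip()
    return text or None


def int_or_none(parent: ET.Element, path: str) -> Optional[int]:
    """Return the element text as an int, or None if missing or not numeric."""
    text = text_or_none(parent, path)
    if text is None:
        return None
    try:
        return int(text)
    except ValueError:
        return None


# A job order in which the parser found a minimum but no maximum.
root = ET.fromstring("<SovrenData><MinimumYears>5</MinimumYears></SovrenData>")
print(int_or_none(root, "MinimumYears"))              # 5
print(int_or_none(root, "MaximumYears"))              # None
print(text_or_none(root, "JobTitles/MainJobTitle"))   # None
```

Returning `None` instead of raising keeps a single missing field from aborting the processing of an otherwise useful job order.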
<SovrenData xml:lang="en">
       <!-- The xml:lang attribute is the ISO 639-1 two-letter code for the detected language
       of the job order text (see the SourceText element). -->
  
	<!-- The unique identifier for this job order.
       This value is only extracted from a few specialized vendor formats. -->
	<JobOrderId>ABC-123</JobOrderId>

	<!-- DocumentLanguage indicates the language of the job order text.
	     The value is an ISO 639-1 two-letter code.
	     If both this value and xml:lang are specified, this value takes precedence. -->
	<DocumentLanguage>en</DocumentLanguage>

	<!-- "true" when the HighestManagementScore is greater than zero, otherwise "false" -->
	<CurrentJobIsMgmt>true</CurrentJobIsMgmt>

	<!-- A numerical representation of the expected level of management seniority and experience.
	        0 = Not a manager
	        1 to 29 = Lower management
	        30 to 59 = Middle management
	        60 to 99 = Upper management
	        100 = Executive management -->
	<HighestManagementScore>85</HighestManagementScore>

	<!-- The management level as an enumerated string value.
	        none = Not a manager
	        low = Lower management
	        mid = Middle management
	        high = Upper management or executive management -->
	<ManagementLevel>high</ManagementLevel>

	<!-- The type of executive:
	        NONE = Not an executive
	        EXECUTIVE = CEO, President, Chairman
	        ADMIN = Administration
	        ACCOUNTING = Accounting
	        OPERATIONS = Operations
	        FINANCIAL = Financial
	        MARKETING = Marketing
	        BUSINESS_DEV = Business/Product Development
	        IT = Information Technology
	        LEARNING = Learning
	        GENERAL = All others -->
	<ExecutiveType>FINANCIAL</ExecutiveType>

	<JobTitles>
		<!-- The job title determined to be the one most relevant to this job order,
		     from the set of possible JobTitles appearing in the job order. -->
		<MainJobTitle>CFO</MainJobTitle>

		<!-- Each JobTitle element is a possible job title. The sequence of JobTitle elements
		     includes the job title, its variations, and any secondary or alternative job titles.
		     The sequence is ordered from most relevant to least relevant. -->
		<JobTitle>Chief Financial Officer</JobTitle>
		<JobTitle>Financial Officer</JobTitle>

	</JobTitles>

	<EmployerNames>
		<!-- The employer name determined to be the one most relevant to this job order,
         from the set of EmployerNames appearing in the job order. -->
		<MainEmployerName>Bizmoto LLC</MainEmployerName>

		<!-- Each EmployerName element is a possible employer name. The sequence of EmployerName
	       elements includes the employer name, its variations, and any secondary or alternative
	       employer names, divisions or departments. The sequence is ordered from most relevant
	       to least relevant. -->
		<EmployerName>Bizmoto</EmployerName>
		<EmployerName>IT Division</EmployerName>
	</EmployerNames>

	<!-- Each SchoolName element is a school name found in the job order. Unlike job titles
	     and employer names, no variations are listed. -->
	<SchoolNames>
		<SchoolName>Purdue University</SchoolName>
		<SchoolName>University of Pennsylvania</SchoolName>
	</SchoolNames>

	<!-- Each CertificationOrLicense element specifies a desired certification or license. These
	     values are not normalized by type or level. They are reported as they appear within the
	     job order. -->
	<CertificationsAndLicenses>
		<CertificationOrLicense>CPA</CertificationOrLicense>
		<CertificationOrLicense>Certified Financial Planner</CertificationOrLicense>
	</CertificationsAndLicenses>

	<!-- Each LanguageCode element represents desired proficiency in a language.
	     The level of proficiency is not expressed. Each value is an ISO 639-1 two-letter code.
	     For example, "es" for Spanish. -->
	<LanguageCodes>
		<LanguageCode>es</LanguageCode>
		<LanguageCode>fr</LanguageCode>
		<LanguageCode>de</LanguageCode>
	</LanguageCodes>

	<!-- Location of the job or employer. -->
	<CurrentLocation>
		<Municipality>Atlanta</Municipality>
		<Region>GA</Region>
		<PostalCode>30308</PostalCode>
		<CountryCode>US</CountryCode>
		<Latitude inferred="true">33.749</Latitude>
		<Longitude inferred="true">-84.38798</Longitude>
	</CurrentLocation>

	<!-- When the parser setting AddNonTaxonomySkills is true, this sequence of
	     elements contains terms that might be relevant for searching and matching,
	     based on cues such as ProperCase, UPPERCASE, punctuation,
	     and so on. This sequence is intended SOLELY as supplemental data for
	     internal use by searching and matching, and is not otherwise supported. -->
	<TermsOfInterest>
		<TermOfInterest>Tax Software</TermOfInterest>
		<TermOfInterest>BusinessObjects</TermOfInterest>
		<TermOfInterest>Budget Reporting</TermOfInterest>
	</TermsOfInterest>

	<!-- A hierarchical view of the skills, following the Taxonomy > Subtaxonomy > Skill >
		ChildSkill structure that the parser understands. By default, there is only one
		TaxonomyRoot, "Sovren". If you have added custom skills, you may also see your
		custom TaxonomyRoot(s) alongside the "Sovren" TaxonomyRoot. -->
	<sov:SkillsTaxonomyOutput>
		<sov:TaxonomyRoot name="Sovren">
			
			<!-- id: A unique identifier either provided by built-in Sovren taxonomy or provided by you in a custom taxonomy. -->
			<!-- percentOfOverall: The weight of a specific taxonomy/subtaxonomy (and its children) divided by the total of all skill weights across all taxonomies, expressed as a percentage. -->
			<!-- percentOfParentTaxonomy: The weight of a specific subtaxonomy (and its children) divided by the weight of its parent taxonomy, expressed as a percentage. -->
			<sov:Taxonomy name="Information Technology" id="10" percentOfOverall="57">
				<sov:Subtaxonomy name="Database" id="193" percentOfOverall="33" percentOfParentTaxonomy="58">
					
					<!-- existsInText: True if the skill/childskill was actually found in the job order text. False if it is reported only as the parent of a skill that was found. -->
					<!-- required: True if the skill/childskill was listed as a required skill for the job. -->
					<sov:Skill name="DATABASES" existsInText="true" required="true">
						<sov:ChildSkill name="DATABASE" existsInText="true" required="true" ></sov:ChildSkill>
					</sov:Skill>
					<sov:Skill name="MICROSOFT ACCESS" existsInText="false" required="false">
						<sov:ChildSkill name="ACCESS 97" existsInText="true" required="false" ></sov:ChildSkill>
					</sov:Skill>
					<sov:Skill name="MS SQL SERVER" existsInText="false" required="false" >
						<sov:ChildSkill name="SQL SERVER" existsInText="true" required="true" ></sov:ChildSkill>
					</sov:Skill>
					<sov:Skill name="ORACLE" existsInText="true" required="true" ></sov:Skill>
					<sov:Skill name="SQL" existsInText="true" required="true" ></sov:Skill>
				</sov:Subtaxonomy>
				<sov:Subtaxonomy name="Internet" id="196" percentOfOverall="10" percentOfParentTaxonomy="18">
					<sov:Skill name="ASP" existsInText="true" required="false" ></sov:Skill>
					<sov:Skill name="WEB BASED" existsInText="true" required="true" >
						<sov:ChildSkill name="WEB-BASED" existsInText="true" required="true" ></sov:ChildSkill>
					</sov:Skill>
				</sov:Subtaxonomy>
			</sov:Taxonomy>
			...
		</sov:TaxonomyRoot>
	</sov:SkillsTaxonomyOutput>

	<!-- The sequence of Degree elements is sorted in increasing order of degree level.
	     The first Degree element is the minimum level of education expected. All other
	     Degree elements are assumed to be preferred levels of education. -->
	<Education>
		<Degree>
			<!-- DegreeType is one of the following case-sensitive values (defined by
			     the HR-XML standard and not currently extensible):
			       UNSPECIFIED
			       specialeducation
			       someHighSchoolOrEquivalent
			       ged
			       secondary
			       highSchoolOrEquivalent
			       certification
			       vocational
			       someCollege
			       HND_HNC_OrEquivalent
			       associates
			       international
			       bachelors
			       somePostgraduate
			       masters
			       intermediategraduate
			       professional
			       postprofessional
			       doctorate
			       postdoctorate
			-->
			<DegreeType>bachelors</DegreeType>

			<!-- Name of the degree as found in the job order. -->
			<DegreeName>BS</DegreeName>
		</Degree>

		<!-- Additional preferred degrees are listed here. -->
		<Degree>
			<DegreeType>masters</DegreeType>
			<DegreeName>MBA</DegreeName>
		</Degree>

	</Education>

	<!-- Desired minimum number of years of work experience. -->
	<MinimumYears>5</MinimumYears>

	<!-- Desired maximum number of years of work experience. -->
	<MaximumYears>20</MaximumYears>

	<!-- Desired minimum number of years of management experience. -->
	<MinimumYearsManagement>5</MinimumYearsManagement>

	<!-- Desired maximum number of years of management experience. -->
	<MaximumYearsManagement>20</MaximumYearsManagement>

	<!-- The minimum level of education expected.
	     Same as the DegreeType of the first Degree element. -->
	<RequiredDegree>bachelors</RequiredDegree>

	<!-- Start Date. The element text is the raw text. The value attribute
       contains the value normalized to YYYY, YYYY-MM, or YYYY-MM-DD format.
       This value is only extracted from a few specialized vendor formats. -->
	<StartDate value="2013-03-01">3/1/2013</StartDate>

	<!-- End Date. The element text is the raw text.
       The value attribute contains the value normalized to YYYY, YYYY-MM, or YYYY-MM-DD format.
       When inferred is true, the EndDate was inferred from the StartDate and Duration.
       This value is only extracted from a few specialized vendor formats. -->
	<EndDate value="2013-05-31" inferred="true">5/31/2013</EndDate>

	<!-- Duration. The element text is the raw text.
       The unit attribute specifies the unit of time (hour, day, week, month, year).
       The value attribute specifies the number of units.
       When inferred is true, this element was inferred from the StartDate and EndDate. -->
	<!--<Duration value="3" unit="month" inferred="true">3 Months</Duration>-->

	<!-- Bill Rate. The element text is the raw text.
       The amount attribute specifies the bill amount per unit of time.
       The unit attribute specifies the unit of time (hour, day, week, month, year).
       The currency attribute specifies the currency for the amount.
       This value is only extracted from a few specialized vendor formats. -->
	<BillRate amount="45.00" currency="USD" unit="hour">$45.00 hourly</BillRate>

	<!-- Pay Rate. The element text is the raw text.
       The amount attribute specifies the pay amount per unit of time.
       The unit attribute specifies the unit of time (hour, day, week, month, year).
       The currency attribute specifies the currency for the amount.
       This value is only extracted from a few specialized vendor formats. -->
	<PayRate amount="30.00" currency="USD" unit="hour">$30.00 hourly</PayRate>

	<!-- Job Description
       This value is only extracted from a few specialized vendor formats. -->
	<JobDescription>The paragraphs of text that describe the job.</JobDescription>

	<!-- Job Requirements
       This value is only extracted from a few specialized vendor formats. -->
	<JobRequirements>The paragraphs of text that describe the requirements.</JobRequirements>

	<!-- Indicates that this collection of data was produced from a job order. -->
	<TypeOfSource>stJobOrder</TypeOfSource>

	<!-- The text that was parsed to produce this XML. -->
	<SourceText>This is where the full text of the job order will appear.</SourceText>

	<!-- Optional value passed to ParseJobOrder. Can be used for filtering search results
	     or for correlating XML files. -->
	<Owners>
		<Owner>MyCompany</Owner>
	</Owners>

	<!-- Optional values passed to ParseJobOrder. Can be used for filtering search results
	     or for correlating XML files. -->
	<Ids>
		<Id>123</Id>
		<Id>xyz</Id>
	</Ids>

	<!-- Date the job order was parsed. In ShortDate format for version 6.3.1110.0 and earlier;
	     in YYYY-MM-DD format for later versions. -->
	<RevisionDate>2010-11-03</RevisionDate>

	<!-- When present, this element indicates that a timeout occurred during parsing.
	     The element text is the number of milliseconds of parsing that elapsed before
	     the timeout occurred. Attributes:

	       type = "soft" or "hard"
	       marker = name of the parsing stage before which the timeout occurred

	     The XML includes all data that was gathered before the timeout occurred, but
	     beware that some data is cleaned or rearranged in the final stages of parsing,
	     so it is best to discard results that have timed out.
	-->
	<TimedOut type="soft" marker="Employers">30121</TimedOut>
</SovrenData>
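The DocumentLanguage comment in the sample states that DocumentLanguage takes precedence over the xml:lang attribute. The following is a sketch of that rule; `job_order_language` is a hypothetical helper name. Note that ElementTree reports xml:lang under the reserved XML namespace URI, not under a plain "xml:lang" key.

```python
import xml.etree.ElementTree as ET
from typing import Optional

# ElementTree expands the reserved "xml" prefix to this namespace URI.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"


def job_order_language(root: ET.Element) -> Optional[str]:
    """Return the ISO 639-1 code of the job order text.

    DocumentLanguage wins when present; otherwise fall back to the xml:lang
    attribute on the root element. Returns None if neither exists.
    """
    node = root.find("DocumentLanguage")
    if node is not None and node.text and node.text.strip():
        return node.text.strip()
    return root.get(XML_LANG)


both = ET.fromstring(
    '<SovrenData xml:lang="en"><DocumentLanguage>fr</DocumentLanguage></SovrenData>'
)
attr_only = ET.fromstring('<SovrenData xml:lang="en"/>')
print(job_order_language(both))       # fr
print(job_order_language(attr_only))  # en
```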
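The HighestManagementScore and ManagementLevel comments describe the same scale in two ways. As a sketch, assuming each ManagementLevel value corresponds exactly to the listed score ranges (the comments imply this but do not state it as a formula), a client could derive or sanity-check the level from the score. The function names `management_level` and `is_management` are hypothetical.

```python
def management_level(score: int) -> str:
    """Map a HighestManagementScore (0 to 100) to the ManagementLevel vocabulary.

    Assumes the ranges documented in the sample: 0 none, 1-29 low,
    30-59 mid, 60-99 upper and 100 executive (both reported as "high").
    """
    if not 0 <= score <= 100:
        raise ValueError(f"score out of range: {score}")
    if score == 0:
        return "none"
    if score <= 29:
        return "low"
    if score <= 59:
        return "mid"
    return "high"


def is_management(score: int) -> bool:
    """Mirror of CurrentJobIsMgmt: true when the score is greater than zero."""
    return score > 0


print(management_level(85), is_management(85))  # high True
print(management_level(0), is_management(0))    # none False
```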
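Because percentOfOverall and percentOfParentTaxonomy are both ratios of skill weights, a subtaxonomy's percentOfParentTaxonomy can be approximated from the two percentOfOverall values: 100 × 33 / 57 ≈ 58 for Database and 100 × 10 / 57 ≈ 18 for Internet, matching the sample. The sketch below performs that check with ElementTree. The sample uses the sov: prefix without showing its xmlns:sov declaration, so the namespace URI here is a placeholder, not Sovren's actual URI. Since the reported percentages are rounded, a derived value can be off by one point.

```python
import xml.etree.ElementTree as ET

# Hypothetical placeholder: use whatever URI the real output binds to "sov".
SOV_NS = "urn:example:sovren"
NS = {"sov": SOV_NS}

FRAGMENT = f"""
<sov:SkillsTaxonomyOutput xmlns:sov="{SOV_NS}">
  <sov:TaxonomyRoot name="Sovren">
    <sov:Taxonomy name="Information Technology" id="10" percentOfOverall="57">
      <sov:Subtaxonomy name="Database" id="193" percentOfOverall="33" percentOfParentTaxonomy="58"/>
      <sov:Subtaxonomy name="Internet" id="196" percentOfOverall="10" percentOfParentTaxonomy="18"/>
    </sov:Taxonomy>
  </sov:TaxonomyRoot>
</sov:SkillsTaxonomyOutput>
"""


def check_parent_percentages(root: ET.Element):
    """Return (subtaxonomy name, derived %, reported %) for every subtaxonomy."""
    rows = []
    for taxonomy in root.iterfind("sov:TaxonomyRoot/sov:Taxonomy", NS):
        parent_pct = int(taxonomy.get("percentOfOverall"))
        for sub in taxonomy.iterfind("sov:Subtaxonomy", NS):
            # Both percentages share the same denominator (total weight), so
            # their ratio equals weight(subtaxonomy) / weight(taxonomy).
            derived = round(100 * int(sub.get("percentOfOverall")) / parent_pct)
            rows.append((sub.get("name"), derived, int(sub.get("percentOfParentTaxonomy"))))
    return rows


root = ET.fromstring(FRAGMENT.strip())
for name, derived, reported in check_parent_percentages(root):
    print(name, derived, reported)  # Database 58 58, then Internet 18 18
```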
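StartDate and EndDate carry both the raw text (for example "3/1/2013", which is ambiguous between month-first and day-first conventions) and a normalized value attribute. A minimal sketch, assuming only the three documented formats, that parses the value attribute and keeps its precision so that a year-only value such as "2013" is not mistaken for a specific day. `parse_normalized_date` is a hypothetical name.

```python
from datetime import date
from typing import Tuple


def parse_normalized_date(value: str) -> Tuple[date, str]:
    """Parse a value attribute in YYYY, YYYY-MM or YYYY-MM-DD form.

    Returns the earliest date the value can denote, plus its precision
    ("year", "month" or "day").
    """
    parts = value.split("-")
    if len(parts) == 1:
        return date(int(parts[0]), 1, 1), "year"
    if len(parts) == 2:
        return date(int(parts[0]), int(parts[1]), 1), "month"
    if len(parts) == 3:
        return date(int(parts[0]), int(parts[1]), int(parts[2])), "day"
    raise ValueError(f"unexpected date format: {value!r}")


# Values taken from the StartDate and EndDate elements in the sample.
start, _ = parse_normalized_date("2013-03-01")
end, _ = parse_normalized_date("2013-05-31")
print((end - start).days)  # 91
```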
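The TimedOut comment recommends discarding results that have timed out. The following is a minimal sketch of that check; the `Timeout` tuple and `find_timeout` helper are hypothetical names.

```python
import xml.etree.ElementTree as ET
from typing import NamedTuple, Optional


class Timeout(NamedTuple):
    elapsed_ms: int        # element text: milliseconds elapsed before the timeout
    kind: Optional[str]    # type attribute: "soft" or "hard"
    marker: Optional[str]  # parsing stage before which the timeout occurred


def find_timeout(root: ET.Element) -> Optional[Timeout]:
    """Return timeout details if the parse timed out, else None."""
    node = root.find("TimedOut")
    if node is None:
        return None
    return Timeout(int((node.text or "0").strip()), node.get("type"), node.get("marker"))


root = ET.fromstring(
    '<SovrenData><TimedOut type="soft" marker="Employers">30121</TimedOut></SovrenData>'
)
timeout = find_timeout(root)
if timeout is not None:
    # Partial results may not have been cleaned up, so discard rather than store them.
    print(f"discarding result: {timeout.kind} timeout after {timeout.elapsed_ms} ms "
          f"before stage {timeout.marker}")
```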