Essay evaluation and scoring depend on various features such as word choice, grammatical accuracy, sentence structure and clarity of writing. The dilemma with a manual essay screening process lies in the subjectivity of evaluations. It is difficult to pin down the methodology behind essay scoring – do recruiters use different metrics when evaluating candidates? Which aspects of writing do they consider most important? One recruiter may focus on a candidate’s vocabulary, while another may concentrate on how well a message is conveyed. Recruiters may also favour a particular writing style, potentially adding bias to the selection process and resulting in inaccurate hiring decisions.
This raises the question – using datasets of candidate essays and recruiter ratings, can we create an objective list of the metrics recruiters use? Moreover, can we use these metrics to develop an automated essay evaluation system? Our team at impress.ai set out to tackle this problem by analyzing the available metrics according to the different aspects of writing they evaluate.
By scoring essays with these metrics and comparing them with recruiter ratings, we can obtain an objective understanding of how recruiters evaluate essays. Furthermore, we can use these metrics to create an automated feature-engineered scoring system that evaluates essays through the same hand-crafted features that recruiters use. Studies have shown that feature-engineered models are a good fit for automated essay evaluation systems due to their transparency and ease of customization. Their methodology is easier to understand compared to recent developments in deep learning-based natural language processing. Moreover, there is a consensus that hand-crafted features “still play a crucial role for automated essay scoring systems and will continue to do so for the foreseeable future”.
Based on our team’s findings, here are five metrics that had the strongest correlation with recruiter evaluations:
Lexical Diversity
Lexical diversity is a measure of how many different words appear in an essay. It is a good indicator of a candidate’s vocabulary and reflects the variety in their word choices. The most popular measure of lexical diversity is the type-token ratio, calculated as the ratio of the number of unique words in an essay to the total word count. The result is a number between 0 and 1 – a score closer to 1 reflects a well-developed vocabulary.
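As an illustration, here is a minimal Python sketch of the type-token ratio. The simple regex tokenizer is an assumption; a production system would likely use a proper tokenizer and possibly lemmatization so that inflected forms of the same word are not counted as distinct types.

import re

def type_token_ratio(text: str) -> float:
    """Type-token ratio: unique words divided by total words (between 0 and 1)."""
    # Lowercase and split on runs of letters/apostrophes (a rough tokenizer).
    tokens = re.findall(r"[a-z']+", text.lower())
    if not tokens:
        return 0.0
    return len(set(tokens)) / len(tokens)

print(type_token_ratio("The quick brown fox jumps over the lazy dog"))  # 8 unique / 9 total ≈ 0.89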
Lexical Density
Lexical words – nouns, verbs, adjectives and adverbs – carry the informational content of a sentence, in contrast to function words such as articles, prepositions and pronouns. The lexical density of an essay is therefore a measure of how much information it contains. It is calculated as the ratio of the number of lexical words to the total word count; essays with a higher lexical density are said to convey more information.
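One possible sketch of this calculation uses part-of-speech tagging with NLTK (assuming the relevant tagger and tokenizer resources are installed). Treating every noun, verb, adjective and adverb tag as lexical is a simplification – it also counts auxiliary verbs such as “is” and “have” – so a real system may want a finer-grained filter.

import nltk
from nltk import pos_tag, word_tokenize

nltk.download("punkt", quiet=True)
nltk.download("averaged_perceptron_tagger", quiet=True)

# Content (lexical) word tags: nouns, verbs, adjectives, adverbs.
LEXICAL_TAG_PREFIXES = ("NN", "VB", "JJ", "RB")

def lexical_density(text: str) -> float:
    """Share of lexical (content) words among all words in the text."""
    tokens = [t for t in word_tokenize(text) if t.isalpha()]
    if not tokens:
        return 0.0
    tagged = pos_tag(tokens)
    lexical = [word for word, tag in tagged if tag.startswith(LEXICAL_TAG_PREFIXES)]
    return len(lexical) / len(tokens)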
Flesch Reading Ease
A metric widely used by marketers, researchers and writers, Flesch reading ease is calculated with a formula developed by Rudolf Flesch, a consultant with the Associated Press. The readability score is derived from the average number of words per sentence and the average number of syllables per word. The higher the score, the easier the text is to understand.
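The standard Flesch formula is 206.835 − 1.015 × (words per sentence) − 84.6 × (syllables per word). A minimal sketch is shown below; the vowel-group syllable counter and sentence splitter are rough heuristics of our own, and readability libraries use more careful rules or pronunciation dictionaries.

import re

VOWEL_GROUPS = re.compile(r"[aeiouy]+")

def count_syllables(word: str) -> int:
    # Rough heuristic: one syllable per group of consecutive vowels, minimum of one.
    return max(1, len(VOWEL_GROUPS.findall(word.lower())))

def flesch_reading_ease(text: str) -> float:
    """Flesch reading ease: higher scores indicate easier text."""
    sentences = max(1, len(re.findall(r"[.!?]+", text)))
    words = re.findall(r"[A-Za-z']+", text)
    total_words = max(1, len(words))
    total_syllables = sum(count_syllables(w) for w in words)
    return 206.835 - 1.015 * (total_words / sentences) - 84.6 * (total_syllables / total_words)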
Age of Acquisition
Age of acquisition measures the average age at which a word is learned. Simple words tend to be acquired earlier in life, while complex words are acquired later. Several studies have also shown that people recall words learned at an early age more easily than words acquired later in life. Because later-acquired words tend to be more complex, essays with a higher average age of acquisition reflect a richer candidate vocabulary and a higher quality of writing.
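In practice this metric relies on a published lexicon of per-word age-of-acquisition ratings (for example the Kuperman et al. norms). The sketch below assumes a CSV lexicon with hypothetical columns named "word" and "aoa"; adapt the loader to whichever norms file you actually use.

import csv
import re

def load_aoa_lexicon(path: str) -> dict:
    """Load a word -> age-of-acquisition mapping from a CSV lexicon.

    The column names "word" and "aoa" are assumptions about the file format."""
    with open(path, newline="", encoding="utf-8") as f:
        return {row["word"].lower(): float(row["aoa"]) for row in csv.DictReader(f)}

def mean_age_of_acquisition(text: str, lexicon: dict) -> float:
    """Average age of acquisition over the essay words found in the lexicon."""
    words = [w.lower() for w in re.findall(r"[A-Za-z']+", text)]
    ages = [lexicon[w] for w in words if w in lexicon]
    return sum(ages) / len(ages) if ages else 0.0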
Grammar Error Percentage
Grammatical errors in an essay can easily be found with the help of natural language processing. Grammar error percentage is the ratio of grammatical errors to the total number of words in an essay, expressed as a percentage. A higher grammar error percentage reflects weaker writing ability.
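One possible implementation uses the open-source language_tool_python package, which wraps the LanguageTool checker. Note that LanguageTool also flags spelling and style issues, so counting every match as a grammar error is a simplifying assumption; a stricter version would filter matches by rule category.

import language_tool_python

def grammar_error_percentage(text: str) -> float:
    """Flagged issues as a percentage of total words, using LanguageTool."""
    tool = language_tool_python.LanguageTool("en-US")
    try:
        matches = tool.check(text)  # each match is one flagged issue
    finally:
        tool.close()
    words = text.split()
    return 100.0 * len(matches) / len(words) if words else 0.0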
These metrics can be used as the foundation for developing automated essay evaluation and scoring systems that save recruiters time and effort. Because feature-engineered models are easy to customize, the evaluation metrics can also be tailored to a recruiter’s requirements or to the nature of the essay-based question. Automated essay evaluation and scoring systems have the potential to significantly improve recruitment efficiency while ensuring accurate and useful insights.