Educ Inf Technol (Dordr). 2022 Jan 20:1-11. doi: 10.1007/s10639-021-10863-y. Online ahead of print.
Checking essays written by students is a time-consuming task. Apart from spelling and grammar, essays should also be assessed on their semantic content, such as cohesion and coherence. In this study, we focus on one aspect of semantic content: the topic of the essay. Formally, given a prompt or question and an essay, this study aims to predict whether the essay is off-topic using machine learning techniques. With the growth of online learning and assessment platforms, especially during the COVID-19 pandemic, an off-topic detection system can be very useful for checking essays that are primarily submitted online. In this article, we address the question: given a prompt and an essay written in Pakistani English, can off-topic detection be made reliable and fully automated, without human intervention, using currently available tools and techniques? To this end, we explore and implement various embedding techniques proposed in recent years to extract similarity or dissimilarity features between the question and the answer, and compare their performance using 10 benchmark datasets and 6 classifiers. Across the different classifiers, datasets, and embeddings, we conclude that combining Word Mover's Distance, averaged word embeddings, and IDF-weighted word embeddings, and then using a random forest classifier, is the best combination for detecting off-topic essays. The accuracy obtained is 93.5%.
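To make the three feature types concrete, the following is a minimal sketch of how prompt-essay similarity features of this kind might be computed. It is not the authors' implementation. The two-dimensional `TOY_VECTORS` table, the whitespace tokenizer, the smoothed IDF formula, and all function names are assumptions made for illustration. The transport distance shown is the relaxed lower bound of Word Mover's Distance (each word moves all of its mass to its nearest counterpart), not the exact optimal-transport solution.

```python
import math
from collections import Counter

# Toy 2-d word vectors, invented purely for illustration.
# A real system would load pretrained word embeddings instead.
TOY_VECTORS = {
    "school": (1.0, 0.1),
    "teacher": (0.9, 0.2),
    "exam": (0.8, 0.3),
    "cricket": (0.1, 1.0),
    "match": (0.2, 0.9),
    "team": (0.15, 0.95),
}

def tokens(text):
    """Lowercase, split on whitespace, keep only words that have a vector."""
    return [w for w in text.lower().split() if w in TOY_VECTORS]

def idf_table(documents):
    """Smoothed inverse document frequency for each known word (assumed formula)."""
    n = len(documents)
    df = Counter(w for doc in documents for w in set(tokens(doc)))
    return {w: math.log((1 + n) / (1 + c)) + 1.0 for w, c in df.items()}

def mean_vector(words, weights=None):
    """Average of word vectors, optionally weighted per word (e.g. by IDF)."""
    dim = len(next(iter(TOY_VECTORS.values())))
    total, norm = [0.0] * dim, 0.0
    for w in words:
        wt = 1.0 if weights is None else weights.get(w, 1.0)
        for i, x in enumerate(TOY_VECTORS[w]):
            total[i] += wt * x
        norm += wt
    return [x / norm for x in total] if norm else total

def cosine(u, v):
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    nu, nv = math.hypot(*u), math.hypot(*v)
    return sum(a * b for a, b in zip(u, v)) / (nu * nv) if nu and nv else 0.0

def one_sided_rwmd(src, dst):
    """Each source word sends all its mass to its nearest target word."""
    counts = Counter(src)
    return sum(
        (c / len(src))
        * min(math.dist(TOY_VECTORS[w], TOY_VECTORS[t]) for t in dst)
        for w, c in counts.items()
    )

def relaxed_wmd(a, b):
    """Relaxed Word Mover's Distance: the larger of the two one-sided bounds."""
    if not a or not b:
        return math.inf
    return max(one_sided_rwmd(a, b), one_sided_rwmd(b, a))

def features(prompt, essay, idf):
    """Similarity/dissimilarity features for one prompt-essay pair."""
    p, e = tokens(prompt), tokens(essay)
    return {
        "mean_cosine": cosine(mean_vector(p), mean_vector(e)),
        "idf_cosine": cosine(mean_vector(p, idf), mean_vector(e, idf)),
        "relaxed_wmd": relaxed_wmd(p, e),
    }

# Usage: compare an on-topic and an off-topic essay against the same prompt.
prompt = "school teacher exam"
on_topic = "teacher exam school exam"
off_topic = "cricket match team"
idf = idf_table([prompt, on_topic, off_topic])
print(features(prompt, on_topic, idf))
print(features(prompt, off_topic, idf))
```

In a full pipeline, feature vectors like these would be computed for every labeled prompt-essay pair and passed to a classifier such as a random forest. That training step is omitted here.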