The suitability of personalised AI models for ancient language T&L


By: Jackie Baines and Edward A. S. Ross, Department of Classics, School of Humanities, j.baines@reading.ac.uk and edward.ross@reading.ac.uk
A robot and a human hand almost touching.
Photo by Cash Macanaya on Unsplash

Overview

This article outlines the work undertaken in the Department of Classics to test whether personalising GenAI models reduces hallucinations and the time needed to refine outputs. These tests found that personalised models built with OpenAI’s GPTs, Google’s Gems, and Blackboard Ultra’s AI Assistant yielded some efficiency improvements, but personalisation had no impact on reducing hallucinations.

Objectives

  • To test whether personalised GenAI tools can reduce hallucinations related to ancient language vocabulary, and reduce the number of inputs required to achieve an expected output, compared with the equivalent freely available GenAI model.
  • To develop ethical and sustainable methods for training personalised GenAI tools.
  • To collaborate with students to test the GenAI tools from a learner’s perspective.

Context

In previous research on the effectiveness of GenAI tools for supporting ancient language T&L, we found that ChatGPT, Google Gemini, Microsoft Copilot, and Claude all frequently produced hallucinated vocabulary that was not included in the restricted vocabulary lists prescribed in our modules. Because this could cause problems for students without a firm understanding of their vocabulary requirements, we sought to determine whether personalised GenAI models would significantly reduce these hallucinations. Furthermore, for sustainability purposes, we hoped that personalised models with pre-prepared guiding prompts might reduce the number of inputs required to achieve an intended output.

Implementation

  • We developed an exhaustive dataset of all possible Latin words and forms that a student in CL1L1 (Beginners Latin) would be expected to know at the end of the module.
  • This dataset included 21,825 datapoints and took 48 hours to tabulate.
  • We prepared personalised models using OpenAI’s GPTs and Google’s Gems interfaces, where we uploaded the datasets and created guiding prompts based on our previous work developing guiding phrases.
  • Teaching staff and students then tested the personalised models for their effectiveness at supporting ancient language learning in two different tasks: creating and marking vocabulary quizzes and generating additional homework questions.
  • The personalised model outputs were then compared to equivalent outputs from the general versions of ChatGPT and Google Gemini available at the time.
  • Teaching staff then tested these same prompts with Blackboard Ultra AI assistant, which only had access to the prepared datasets and CL1L1’s module materials.
  • Based on the results of these tests, we updated our departmental AI guidance and instructional booklets.
  • At the beginning of the 2025–2026 academic year, we informed students and staff in the Department of Classics of best practices for supporting ancient language T&L with GenAI ethically and effectively.
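The testing workflow above hinged on comparing model-generated vocabulary against the restricted dataset. A minimal sketch of such a check is below; the function name, column-free word lists, and sample data are illustrative assumptions, not the project’s actual CL1L1 dataset or tooling.

```python
# Sketch: flag vocabulary in a GenAI-generated quiz that falls outside
# a module's restricted vocabulary list. All data here is illustrative,
# not the actual CL1L1 dataset.

def find_hallucinated_vocab(quiz_words, allowed_forms):
    """Return quiz words absent from the restricted vocabulary list."""
    allowed = {w.lower() for w in allowed_forms}
    return sorted({w.lower() for w in quiz_words} - allowed)

# Hypothetical restricted list (a real dataset would contain every
# inflected form a beginner is expected to know by the end of the module).
allowed_forms = ["puella", "puellae", "dominus", "domini", "bellum", "belli"]

# Words extracted from a model-generated quiz; "oppidum" is real Latin
# but outside the restricted list, so it counts as a hallucination here.
quiz_words = ["puella", "dominus", "oppidum"]

print(find_hallucinated_vocab(quiz_words, allowed_forms))  # ['oppidum']
```

A check of this kind only catches out-of-list vocabulary; it cannot judge whether the surrounding quiz question or grammatical explanation is itself accurate.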

Impact

The original intention for this project was to reduce the hallucinations present in GenAI outputs related to ancient language vocabulary and thereby reduce the number of prompts required to obtain an accurate desired output. Over the course of this research, we discovered that end-user-friendly GenAI personalisation tools are largely ineffective, and sometimes more problematic, than the equivalent general-use models. Vocabulary hallucinations were just as persistent in the personalised models as in the general-use models. The major issue, however, was that the personalised models would insist that hallucinated vocabulary had been in the original dataset all along, while the general-use models would apologise and try to make the mistake a learning opportunity for the user. There was some reduction in the number of inputs required to obtain an accurate desired output, but the hallucination issues tended to outweigh these improvements. For more details about the effectiveness of OpenAI’s GPTs and Google’s Gems, see Ross and Baines (2025).

Blackboard Ultra’s AI Assistant was able to provide quizzes and extra homework, acting as a tutor. However, despite the tool having access to the vocabulary dataset and module materials, the hallucinated vocabulary issues were also present. When challenged about the presence of out-of-scope vocabulary, the tool took a more balanced approach than OpenAI’s and Google’s models.

Screenshot of a GenAI chat box.
Figure 1. Anthology, Blackboard AI Assistant 3900.121.0, 3 July 2025 version, personal communication, generated 23 July 2025. Prompt: “Write a vocabulary quiz using 2nd declension nouns.”

In the above image (Figure 1), oppidum is a hallucinated noun: it is not included in the vocabulary dataset, although the word does exist in Latin. When an input highlights this issue, Blackboard Ultra’s AI Assistant responds by still providing grammatical details and the opportunity to learn the noun as additional vocabulary. Although this tool produces the same kinds of hallucinations as the other personalised models, it generates outputs that resemble those of a teacher in a classroom.

Reflections

We think that this research is important, despite the lack of positive results. These tests demonstrate that personalisation using general-use AI models like ChatGPT and Google Gemini will not be appropriate for supporting specific language learning tasks, especially for ancient languages. Instead, smaller, independent, bespoke AI models that are trained on restricted datasets would be more effective. However, these models and datasets do not yet exist. Through collaborative work, AI developers and ancient language teachers can create accessible, ethical models to support ancient language T&L.

References and further reading

ChatGPT: A conversational language study tool


By: Jackie Baines and Edward A. S. Ross, Department of Classics, School of Humanities, j.baines@reading.ac.uk and edward.ross@reading.ac.uk
classical Greek/Roman style columns on a classical ruin with a bright futuristic sky background
Photo by Yusuf Dündar on Unsplash

Overview

This article outlines the work undertaken in the Department of Classics to demystify generative artificial intelligence for ancient language staff and students over the 2023–2024 academic year.

Objectives

  • Codify and standardise methods for using conversational AI models (such as ChatGPT, Claude-2, and Google Bard) in ancient language classes.
  • Produce tested guiding phrase documents for students to copy and paste into conversational AI models so that their outputs are standardised to match course expectations.
  • Lead interactive testing sessions in all levels of ancient language classes (i.e. Latin and Ancient Greek) to test these documents and inform students about the ethical considerations for using generative AI.

Context

At the time when this project was instigated, there was a dramatic surge of generative AI development and use at a generally accessible level. This led to extreme anxiety among educators and students alike as to how these tools could impact the known models for teaching, learning, and assessment in classics and beyond. We sought to approach this issue ‘head on’ in order to contextualise the nature and value of generative AI tools for staff and students, dispelling any unwarranted preconceptions and informing them of necessary ethical considerations.

Implementation

  • Surveyed ancient language teaching staff about the necessary elements of their courses.
  • Led sessions with staff to develop the departmental AI guidelines and citation guide in Summer 2023.
  • Led AI ethics information sessions for all undergraduate and postgraduate students in the Department of Classics over the Autumn 2023 term.
  • Held survey sessions with all ancient language students studying Ancient Greek and Latin, gathering data on their views on generative AI before and after the information sessions, in Autumn 2023.
  • Hired three undergraduate research assistants to test guiding phrases on a variety of conversational AI tools to determine the effectiveness of the tools and guiding phrases for supporting various aspects of their ancient language learning.
  • Published the tested guiding phrases as a digital and physical pamphlet for staff and students to freely use in March 2024.
  • Recorded and published a series of tutorial videos on generative AI ethics and digital tools for Classics.
  • Carried out a follow-up survey in Spring 2024 with the same ancient language students that completed the Autumn 2023 survey to gauge the impact generative AI had on their studies over the 2023–2024 academic year.
  • Analysed and published survey data in two academic journal articles (one is currently in press) and on secure data repositories.

Impact

The initial intention for this project was to investigate how effective generative AI tools were for supporting ancient language teaching and learning, but our research and students’ responses led us to focus more on improving general AI literacy among humanities teachers and students. When we gave our ethics presentations, teachers and students were shocked by the ethical considerations behind generative AI, especially the environmental and copyright implications. Once they learned that their own work could be used to train these models, they became much more sceptical of using the tools. At the time of writing, the tutorial videos and guiding phrase pamphlet have each been downloaded around 150 times, and this number continues to grow as we present our tutorials to future course groups.

Reflections

We found this work was successful in many ways, particularly through our collaboration with our undergraduate students. By working with our students, who encounter a wide array of generative AI tools daily, we gained a broader perspective on the impact and use cases of these tools for ancient language teaching and learning. Any research into generative AI and teaching and learning should involve student-teacher collaboration. Our reach was also interdisciplinary in some respects, including presentations for the Modern Languages Department, but there is scope for many more interdisciplinary collaborations in this work. In the future, we intend to continue making ethics tutorials for ancient language students, and the materials developed during this TLEP-funded project will help us illustrate the current issues more effectively.

Further reading

  • Ross, E. A. S., & Baines, J. (2024). Treading water: New data on the impact of AI ethics information sessions in classics and ancient language pedagogy. Journal of Classics Teaching, 25(50), 181–190. https://doi.org/10.1017/S2058631024000412