Large language models harnessed for education


Rumours of the death of homework may have been exaggerated. Large language models (LLMs) such as ChatGPT are disrupting some of the ways in which educators assess the progress of pupils and students, but they are also being used for new methods of assessment, teaching and support that aim to enhance education rather than compromise it.

“It was always a bad idea to say, ‘go write me a five-paragraph essay about this topic and turn it in tomorrow’,” says Robert Harrison, director of education and integrated technology at ACS International Schools, a charity which runs four independent schools in Greater London and Doha, Qatar. “Now, it’s a terrible idea because you can write it in five minutes on your way home from school on your laptop.”

Using short essays to check whether pupils have absorbed information has always been open to abuse, whether that meant asking an older sibling, copying from an encyclopaedia or using a search engine. To get pupils thinking about how LLMs work, ACS has asked some of them to use ChatGPT to draft an essay on a topic, fact-check it for “hallucinations” (factual errors), then rewrite the text to correct them.

Harrison says educators could attempt to train people to work as “AI [artificial intelligence] whisperers” who are adept at writing effective prompts for LLM systems, but such jobs could prove as transient as being paid to design searches when search engines were a novelty. Instead, employers tell ACS they want to hire people who can adopt new technologies intelligently as they appear. “Building resilience, the ability to roll with the punches and manage change at a fast pace – those are dispositions you can practise and teach,” he says.

School homework is an example of what educators call formative assessment, which aims to monitor and reinforce learning during a course. Using LLMs in summative assessment, such as the exams and coursework set at the end of a course, is particularly sensitive, as it could allow students to gain a qualification or a higher grade they have not earned.

Jisc, a not-for-profit agency that provides technology services across UK further and higher education, is researching how LLMs are affecting different types of assessment, and has already found that multiple-choice quizzes are particularly vulnerable. Michael Webb, its director of technology and analytics, says that previously, someone trying to cheat on an online quiz at least had to use a search engine, potentially learning something in the process, but LLMs short-circuit even that. “Anything that has been done before, ChatGPT is really good at,” he says.

Many institutions use plagiarism detection tools that aim to identify material that has appeared previously, such as TurnItIn, run by a California-based company of the same name.

AI detection capability

In March 2023, the company launched an AI detection capability. Webb says that software-generated text can give itself away by being too consistent – it tends to lack human randomness. But detection tools can only generate a probability that an LLM was used, and unlike with straight plagiarism, they cannot point to a likely source.

“None of them can prove conclusively that the text was written by AI,” he says. Furthermore, as LLM tools are incorporated into software such as Microsoft Word, their use may become as routine as spelling and grammar checking is at present.
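As a loose illustration of the consistency signal Webb describes, the toy Python function below scores how much sentence lengths vary within a passage. It is a sketch for intuition only – real detectors such as TurnItIn’s rely on model-based statistics that are not public, and, as Webb notes, even those only produce probabilities.

```python
import re
import statistics

def burstiness(text: str) -> float:
    """Toy heuristic: human prose tends to vary its sentence lengths
    more than LLM output does. Returns the coefficient of variation
    of sentence lengths in words; lower scores suggest more uniform,
    machine-like text. Illustrative only, not a production detector."""
    sentences = [s for s in re.split(r"[.!?]+\s*", text) if s.strip()]
    lengths = [len(s.split()) for s in sentences]
    if len(lengths) < 2:
        return float("nan")  # too short to measure variation
    return statistics.stdev(lengths) / statistics.mean(lengths)
```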

Webb sees three ways for educators to deal with AI in assessment: avoid it, outrun it or embrace it. Avoidance means using highly controlled environments such as exams, but these don’t work for all assessment. Trying to outrun AI by using formats that services can’t handle is likely to be a short-term fix given how fast the technology is developing, with ChatGPT developer OpenAI adding support for diagrams in March.

He reckons that educators will usually be better off embracing LLMs. This can include setting guidance and rules for their use, something many universities have already done, and requiring students to declare they have used them – although if AI becomes ubiquitous in word processing, this could be a temporary measure. Assessments could be based on new data from ongoing or yet-to-be-published academic research, or students could be allowed to use an LLM to generate a presentation that is based on their own data.

Aside from assessment, LLMs look set to provide pupils and students with new ways to learn. Webb says they can help students get past a blank page by providing ideas on how to start a piece of writing, and they are often good at simplifying complex text.

There are more creative options, too. One further education college told a recent Jisc event that it’s using ChatGPT to let English literature students “interview” Lady Macbeth, while another asks students to write prompts for AI image generator Midjourney, which works better with concise and accurately written instructions.
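For readers curious how such an “interview” might be wired up, the sketch below uses the OpenAI Python SDK to hold a role-played conversation. The system prompt, model choice and wording are illustrative assumptions, not the college’s actual setup.

```python
from openai import OpenAI

client = OpenAI()  # reads the OPENAI_API_KEY environment variable

# Conversation history begins with a role-setting system prompt
# (hypothetical wording, not the college's actual prompt).
history = [{
    "role": "system",
    "content": "You are Lady Macbeth. Answer the student's questions "
               "in character, drawing only on the events and language "
               "of Shakespeare's play.",
}]

def interview(question: str) -> str:
    """Send one student question and return the in-character reply."""
    history.append({"role": "user", "content": question})
    reply = client.chat.completions.create(
        model="gpt-3.5-turbo",  # assumed model; any chat model works
        messages=history,
    ).choices[0].message.content
    history.append({"role": "assistant", "content": reply})
    return reply

print(interview("Why did you push your husband to murder the king?"))
```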

Michael Bennett, director of the education curriculum at the Institute for Experiential AI at Boston’s Northeastern University, says LLMs may be particularly useful for students who need to work iteratively – for example, by providing on-demand feedback as they write a large amount of computer code over several months.

More generally, some of the university’s instructors get students to use ChatGPT with GPT-3.5 (which, unlike GPT-4, is free to use) to generate ideas or initial drafts of material. Designing and refining prompts can act as “a source of quickening for their thoughts”, says Bennett, adding: “LLMs become a kind of warm-up for the assignments, as opposed to a threat to the integrity of the education process.”

But he warns that some students, at both universities and schools, are overly trusting of LLM output, based on previous experience of information technology serving up trustworthy answers: “This technology is notorious for hallucinating, creating false citations, for flat-out making things up,” he says.

Without mitigations, “that’s a recipe for catastrophe, educationally speaking”. One answer is to teach students how such systems work so they understand their limitations, but Bennett adds that some of his colleagues are concerned that LLMs could further reduce the size of the common set of knowledge that most people share, a source of cohesion for society.

Learn from past mistakes

Mat Pullen, senior education expert at device management and security specialist Jamf, has worked as both a teacher and a lecturer in education. He agrees with Bennett on the potential for LLMs to help get students started on writing, and is concerned that schools will repeat the mistakes he believes many made by blocking YouTube and Wikipedia when those services first appeared.

Earlier this year, New York City’s Department of Education blocked the use of ChatGPT, although it later reversed this decision. And while Jamf can block the use of LLMs on its customers’ managed devices, Pullen says it has generally received enquiries about whether this is possible rather than instructions to do so.

He says that when schools blocked the use of YouTube, students accessed it covertly elsewhere and the opportunity to use it for self-guided learning was lost. Teachers worried about what students could find on the service. “For me, filtering is a skill to teach children,” says Pullen.

Similarly, it would be better to teach students how to fact-check what they get out of LLMs, including whether it applies to their country, rather than try to stop use entirely.

Language learning

LLMs may have particular benefits when it comes to learning languages. Thailand-based language education service Ling already offers a chatbot that draws on its existing content and accepts predetermined sets of answers. The company is working on using LLMs to extend this, allowing customers to talk about whatever they like and the chatbot to take on all sorts of roles. This could allow fine degrees of personalisation, such as practising for a forthcoming situation or having the chatbot adopt the accent and vernacular of a particular region of a country.
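A minimal sketch of that kind of personalisation, assuming a standard chat-completion API: the persona is parameterised by language, role and region through the system prompt. The template, model name and example values are illustrative guesses, not Ling’s implementation.

```python
from openai import OpenAI

client = OpenAI()

# Hypothetical persona template; Ling's actual controls are not public.
PERSONA = ("You are a {language} conversation partner playing {role}. "
           "Use vocabulary and colloquialisms typical of {region}. "
           "Keep replies brief and gently correct the learner's mistakes.")

def practise_turn(language: str, role: str, region: str, line: str) -> str:
    """Run one turn of a personalised role-play conversation."""
    return client.chat.completions.create(
        model="gpt-3.5-turbo",  # assumed model
        messages=[
            {"role": "system", "content": PERSONA.format(
                language=language, role=role, region=region)},
            {"role": "user", "content": line},
        ],
    ).choices[0].message.content

# e.g. rehearsing a forthcoming situation before a trip
print(practise_turn("Thai", "a street-food vendor", "Bangkok",
                    "How much is the pad thai?"))
```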

LLM-linked services could interact for as long as someone wants to use them, cover a huge range of subjects and languages, work consistently and improve over time based on feedback. Simon Bacher, Ling’s co-founder, thinks these will eventually replace online services that use human tutors. “In five to 10 years I don’t think this industry will exist as it is right now,” he says.

LLMs’ propensity for making things up matters less in fictionalised role-playing: “If it tells me wrong facts, it doesn’t really matter for language learning,” says Bacher. But it can be a problem for linguistic rules. LLMs are best at English, which has relatively few rules and dominates the internet. Bacher, a native German speaker, says ChatGPT is pretty good at his first language but does make linguistic mistakes. “You lose trust,” he says, adding that it is also prone to occasional errors in French.

He also believes LLMs are significantly worse at using languages other than the most popular European ones. Ling, which was established to teach Thai and now covers 23 Asian languages, will for the time being use human reviewers and impose controls on how LLMs contribute to its services, using the third-party tool CustomGPT.ai alongside ChatGPT.

Ling plans to use general-purpose LLMs, but Studiosity, a learning support service based in Sydney, plans to use two decades of its own advice to students as training data. In that time, its staff have reviewed millions of student essays, providing suggestions and feedback rather than corrections, and its services have been purchased by more than 100 universities globally. It expects its custom-built LLM to provide hybrid capabilities that allow it to expand its provision of personalised learning services.

“Our university partners are clearly telling us that LLMs and AI are absolutely something we need to engage with, they need to engage with and they can see great learning capabilities and potential,” says Jack Goodman, Studiosity’s founder and chair, adding that great harm could be caused if this is not done well. Using LLMs trained on trusted material rather than the not entirely accurate contents of the internet may help with factual reliability, although he says that feedback provided in seconds by software may not be taken seriously by students.

And there are bigger problems, given that LLMs are generally designed to accelerate work processes rather than enhance human understanding. Students applying them to homework and assignments could undermine their own learning as they do so, as productivity is not the point. “There are not going to be shortcuts to the learning process – there are going to be better ways to learn and worse ways to learn,” says Goodman. “But you won’t learn to know what you think if you don’t learn to express yourself with language.”


