Kako „razmišljaju“ veliki jezični modeli i možemo li im vjerovati: studija slučaja testiranja ChatGPT-a na zadacima uvodnog statističkog kolegija

Jasminka Dobša

How large language models "think" and can we trust them: a case study of testing ChatGPT on tasks in an introductory statistics course

Author(s): Jasminka Dobša
Subject(s): South Slavic Languages, Higher Education, ICT Information and Communications Technologies, Sociology of Education
Published by: Akademsko politehničko društvo APOLD Rijeka
Keywords: large language models; ChatGPT; statistics; testing; Croatian language

Summary/Abstract: The article aims to identify cases in which large language models behave similarly to human thinking and cases in which they "think" differently, and to point out the opportunities, risks and limits of applying artificial intelligence in teaching, in the context of testing ChatGPT on student tasks in statistics. It analyses the possibilities and limitations of large language models, as well as ways to overcome existing biases and shortcomings in this rapidly growing field. ChatGPT, a chatbot based on the large language model GPT-4, is tested within the introductory statistics course taught to second-year computer science students. The tests were conducted by manually entering 170 statistics quiz questions into ChatGPT through a web browser. The questions fall into three categories: theoretical questions that test reproduction of knowledge, theoretical questions that test understanding of the field, and exercises. The questions were asked in Croatian, and the answers, also given in Croatian, were analysed. The accuracy of students and ChatGPT in solving the quiz questions was compared by question category using the Wilcoxon rank sum test. The results show that ChatGPT performs statistically significantly better than students on the theoretical questions requiring reproduction of knowledge and understanding, while students are more successful at solving the exercises, although that difference in accuracy is not statistically significant (tested at the 0.01 significance level).
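For readers unfamiliar with the Wilcoxon rank sum test named in the abstract, the following is a minimal sketch of how such a comparison could be computed. It is not the author's code: the function names `average_ranks` and `rank_sum_test` are hypothetical, the two-sided p-value uses the normal approximation with a tie correction (an assumption; the abstract does not say which variant was used), and the accuracy values are invented purely for illustration.

```python
import math


def average_ranks(values):
    """Return 1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    pos = 0
    while pos < len(order):
        end = pos
        # Extend the run while the next sorted value equals the current one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[pos]]:
            end += 1
        mean_rank = (pos + end) / 2 + 1
        for k in range(pos, end + 1):
            ranks[order[k]] = mean_rank
        pos = end + 1
    return ranks


def rank_sum_test(x, y):
    """Two-sided Wilcoxon rank sum test (normal approximation, tie-corrected).

    Returns (w, z, p), where w is the sum of the ranks of sample x.
    """
    pooled = list(x) + list(y)
    n1, n2 = len(x), len(y)
    n = n1 + n2
    ranks = average_ranks(pooled)
    w = sum(ranks[:n1])
    mean_w = n1 * (n + 1) / 2
    # Tie correction: sum of (t^3 - t) over groups of t equal values.
    counts = {}
    for v in pooled:
        counts[v] = counts.get(v, 0) + 1
    tie_term = sum(t ** 3 - t for t in counts.values())
    var_w = n1 * n2 / 12 * ((n + 1) - tie_term / (n * (n - 1)))
    if var_w <= 0:
        # All observations are identical, so there is no evidence of a difference.
        return w, 0.0, 1.0
    z = (w - mean_w) / math.sqrt(var_w)
    p = math.erfc(abs(z) / math.sqrt(2))  # equals 2 * (1 - Phi(|z|))
    return w, z, p


# Hypothetical per-question accuracy scores (NOT data from the study).
chatgpt_scores = [1.0, 1.0, 0.0, 1.0, 1.0, 1.0, 0.0, 1.0]
student_scores = [0.62, 0.48, 0.71, 0.55, 0.80, 0.43, 0.66, 0.59]

w, z, p = rank_sum_test(chatgpt_scores, student_scores)
print(f"W = {w:.1f}, z = {z:.3f}, p = {p:.4f}")
```

With small samples or many ties, an exact or permutation-based p-value may be preferable to the normal approximation used here.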


Journal: Politehnika: Časopis za tehnički odgoj i obrazovanje

Issue Year: 7/2023
Issue No: 2
Page Range: 18-25
Page Count: 8
Language: Croatian
