Dataset Restricted Access

mlphys101 - Exploring the performance of Large-Language Models in multilingual undergraduate physics education

Völschow, Marcel; Buczek, P.; Carreno-Mosquera, P.; Mousavias, C.; Reganova, S.; Roldan-Rodriguez, E.; Steinbach, Peter; Strube, A.


Dublin Core Export

<?xml version='1.0' encoding='utf-8'?>
<oai_dc:dc xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://www.openarchives.org/OAI/2.0/oai_dc/ http://www.openarchives.org/OAI/2.0/oai_dc.xsd">
  <dc:creator>Völschow, Marcel</dc:creator>
  <dc:creator>Buczek, P.</dc:creator>
  <dc:creator>Carreno-Mosquera, P.</dc:creator>
  <dc:creator>Mousavias, C.</dc:creator>
  <dc:creator>Reganova, S.</dc:creator>
  <dc:creator>Roldan-Rodriguez, E.</dc:creator>
  <dc:creator>Steinbach, Peter</dc:creator>
  <dc:creator>Strube, A.</dc:creator>
  <dc:date>2024-09-09</dc:date>
  <dc:description>Large-Language Models such as ChatGPT have the potential to revolutionize academic teaching in physics in a similar way the electronic calculator, the home computer or the internet did. AI models are patient, produce answers tailored to a student’s needs and are accessible whenever needed. Those involved in academic teaching are facing a number of questions: Just how reliable are publicly accessible models in answering, how does the question’s language affect the models’ performance, and how well do the models perform with more difficult tasks beyond retrieval? To address these questions, we benchmark a number of publicly available models on the mlphys101 dataset, a new set of 823 university-level MC5 questions and answers released alongside this work. While the original questions are in English, we employ GPT-4 to translate them into various other languages, followed by revision and refinement by native speakers. Our findings indicate that state-of-the-art models perform well on questions involving the replication of facts, definitions, and basic concepts, but struggle with multi-step quantitative reasoning. This aligns with existing literature that highlights the challenges LLMs face in mathematical and logical reasoning tasks. We conclude that the most advanced current LLMs are a valuable addition to the academic curriculum and LLM-powered translations are a viable method to increase the accessibility of materials, but their utility for more difficult quantitative tasks remains limited.</dc:description>
  <dc:description>The dataset is available here in English only and will be removed once the mlphys101 publication has been accepted and released to the public.</dc:description>
  <dc:identifier>https://rodare.hzdr.de/record/3137</dc:identifier>
  <dc:identifier>10.14278/rodare.3137</dc:identifier>
  <dc:identifier>oai:rodare.hzdr.de:3137</dc:identifier>
  <dc:language>eng</dc:language>
  <dc:relation>url:https://www.hzdr.de/publications/Publ-39561</dc:relation>
  <dc:relation>doi:10.14278/rodare.3136</dc:relation>
  <dc:relation>url:https://rodare.hzdr.de/communities/rodare</dc:relation>
  <dc:rights>info:eu-repo/semantics/restrictedAccess</dc:rights>
  <dc:subject>machine learning</dc:subject>
  <dc:subject>deep learning</dc:subject>
  <dc:subject>large language models</dc:subject>
  <dc:subject>chatgpt</dc:subject>
  <dc:subject>blablador</dc:subject>
  <dc:title>mlphys101 - Exploring the performance of Large-Language Models in multilingual undergraduate physics education</dc:title>
  <dc:type>info:eu-repo/semantics/other</dc:type>
  <dc:type>dataset</dc:type>
</oai_dc:dc>
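
The abstract describes benchmarking publicly available models on MC5 (five-option multiple-choice) items. As a rough illustration only, the sketch below scores a chat model on such items via an OpenAI-style client. The file name, the column layout (question, options A–E, answer letter) and the model name are assumptions made for this example; they are not taken from the mlphys101 release and the actual schema may differ.

```python
# Minimal sketch of an MC5 evaluation loop.
# Assumes a hypothetical CSV with columns "question", "A".."E", "answer";
# the real mlphys101 layout may differ.
import csv
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

def ask(model: str, question: str, choices: dict[str, str]) -> str:
    """Ask the model to pick one of five options and return the letter it chose."""
    options = "\n".join(f"{k}) {v}" for k, v in choices.items())
    prompt = f"{question}\n{options}\nAnswer with a single letter (A-E) only."
    reply = client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": prompt}],
        temperature=0,
    )
    return reply.choices[0].message.content.strip()[:1].upper()

correct = total = 0
with open("mlphys101_en.csv", newline="", encoding="utf-8") as f:  # hypothetical file name
    for row in csv.DictReader(f):
        choices = {k: row[k] for k in "ABCDE"}
        pred = ask("gpt-4", row["question"], choices)
        correct += pred == row["answer"].strip().upper()
        total += 1

print(f"Accuracy: {correct / total:.3f} on {total} MC5 items")
```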