mlphys101 - Exploring the performance of Large-Language Models in multilingual undergraduate physics education

Völschow, Marcel; Buczek, P.; Carreno-Mosquera, P.; Mousavias, C.; Reganova, S.; Roldan-Rodriguez, E.; Steinbach, Peter; Strube, A.

doi:10.14278/rodare.3137

September 9, 2024 Dataset Restricted Access

JSON-LD (schema.org) Export

{
  "name": "mlphys101 - Exploring the performance of Large-Language Models in multilingual undergraduate physics education", 
  "keywords": [
    "machine learning", 
    "deep learning", 
    "large language models", 
    "chatgpt", 
    "blablador"
  ], 
  "description": "<p>Large-Language Models such as ChatGPT have the potential to revolutionize academic teaching in physics in much the same way as the electronic calculator, the home computer or the internet did. AI models are patient, produce answers tailored to a student\u2019s needs and are accessible whenever needed. Those involved in academic teaching are facing a number of questions: Just how reliable are publicly accessible models in answering, how does the question\u2019s language affect the models\u2019 performance and how well do the models perform with more difficult tasks beyond retrieval? To address these questions, we benchmark a number of publicly available models on the mlphys101 dataset, a new set of 823 university level MC5 questions and answers released alongside this work. While the original questions are in English, we employ GPT-4 to translate them into various other languages, followed by revision and refinement by native speakers. Our findings indicate that state-of-the-art models perform well on questions involving the replication of facts, definitions, and basic concepts, but struggle with multi-step quantitative reasoning. This aligns with existing literature that highlights the challenges LLMs face in mathematical and logical reasoning tasks. We conclude that the most advanced current LLMs are a valuable addition to the academic curriculum and LLM-powered translations are a viable method to increase the accessibility of materials, but their utility for more difficult quantitative tasks remains limited.</p>\r\n\r\n<p>The dataset is available in English here only and will be removed once the mlphys101 publication has been accepted and released to the public.</p>", 
  "inLanguage": {
    "name": "English", 
    "@type": "Language", 
    "alternateName": "eng"
  }, 
  "datePublished": "2024-09-09", 
  "creator": [
    {
      "name": "V\u00f6lschow, Marcel", 
      "@type": "Person"
    }, 
    {
      "name": "Buczek, P.", 
      "@type": "Person"
    }, 
    {
      "name": "Carreno-Mosquera, P.", 
      "@type": "Person"
    }, 
    {
      "name": "Mousavias, C.", 
      "@type": "Person"
    }, 
    {
      "name": "Reganova, S.", 
      "@type": "Person"
    }, 
    {
      "name": "Roldan-Rodriguez, E.", 
      "@type": "Person"
    }, 
    {
      "name": "Steinbach, Peter", 
      "@type": "Person", 
      "@id": "https://orcid.org/0000-0002-4974-230X"
    }, 
    {
      "name": "Strube, A.", 
      "@type": "Person"
    }
  ], 
  "@id": "https://doi.org/10.14278/rodare.3137", 
  "sameAs": [
    "https://www.hzdr.de/publications/Publ-39561"
  ], 
  "@type": "Dataset", 
  "identifier": "https://doi.org/10.14278/rodare.3137", 
  "@context": "https://schema.org/", 
  "url": "https://rodare.hzdr.de/record/3137"
}
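As a minimal sketch of how the export above might be consumed, the following Python snippet parses an abbreviated copy of the record and pulls out the DOI, the creator names, and any ORCID identifiers. The helper name `summarize_record` and the shortened `RECORD_TEXT` are assumptions made for illustration; they are not part of RODARE or of the dataset itself.

```python
import json

# Abbreviated copy of the JSON-LD export; only a few fields and two of the
# eight creators are kept for brevity (illustrative, not the full record).
RECORD_TEXT = r"""
{
  "datePublished": "2024-09-09",
  "creator": [
    {"name": "V\u00f6lschow, Marcel", "@type": "Person"},
    {"name": "Steinbach, Peter", "@type": "Person",
     "@id": "https://orcid.org/0000-0002-4974-230X"}
  ],
  "@id": "https://doi.org/10.14278/rodare.3137",
  "@type": "Dataset",
  "@context": "https://schema.org/"
}
"""

DOI_PREFIX = "https://doi.org/"


def summarize_record(record: dict) -> dict:
    """Hypothetical helper: collect DOI, creator names, and ORCID iDs."""
    identifier = record.get("@id", "")
    # Strip the resolver prefix so only the bare DOI remains.
    doi = identifier[len(DOI_PREFIX):] if identifier.startswith(DOI_PREFIX) else identifier
    creators = [person["name"] for person in record.get("creator", [])]
    # Only creators that carry an "@id" field have an ORCID in this export.
    orcids = {
        person["name"]: person["@id"]
        for person in record.get("creator", [])
        if "@id" in person
    }
    return {"doi": doi, "creators": creators, "orcids": orcids}


summary = summarize_record(json.loads(RECORD_TEXT))
print(summary["doi"])
print("; ".join(summary["creators"]))
```

Note that `json.loads` decodes the `\u00f6` escape, so the first creator name is returned with the umlaut intact.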

Publication date:

September 9, 2024

DOI:

10.14278/rodare.3137

Keyword(s):

machine learning deep learning large language models chatgpt blablador

Related identifiers:

Identical to:
https://www.hzdr.de/publications/Publ-39561

Communities:

RODARE

Versions

Version 1 10.14278/rodare.3137

Sep 9, 2024

Cite all versions? You can cite all versions by using the DOI 10.14278/rodare.3136. This DOI represents all versions and will always resolve to the latest one.

RODARE DOI Badge

DOI

10.14278/rodare.3137

Markdown

[![DOI](https://rodare.hzdr.de/badge/DOI/10.14278/rodare.3137.svg)](https://doi.org/10.14278/rodare.3137)

reStructuredText

.. image:: https://rodare.hzdr.de/badge/DOI/10.14278/rodare.3137.svg
   :target: https://doi.org/10.14278/rodare.3137

HTML

<a href="https://doi.org/10.14278/rodare.3137"><img src="https://rodare.hzdr.de/badge/DOI/10.14278/rodare.3137.svg" alt="DOI"></a>

Image URL

https://rodare.hzdr.de/badge/DOI/10.14278/rodare.3137.svg

Target URL

https://doi.org/10.14278/rodare.3137
