Dataset Restricted Access
Völschow, Marcel;
Buczek, P.;
Carreno-Mosquera, P.;
Mousavias, C.;
Reganova, S.;
Roldan-Rodriguez, E.;
Steinbach, Peter;
Strube, A.
<?xml version='1.0' encoding='UTF-8'?>
<record xmlns="http://www.loc.gov/MARC21/slim">
<leader>00000nmm##2200000uu#4500</leader>
<controlfield tag="005">20240912092419.0</controlfield>
<datafield tag="653" ind1=" " ind2=" ">
<subfield code="a">machine learning</subfield>
</datafield>
<datafield tag="653" ind1=" " ind2=" ">
<subfield code="a">deep learning</subfield>
</datafield>
<datafield tag="653" ind1=" " ind2=" ">
<subfield code="a">large language models</subfield>
</datafield>
<datafield tag="653" ind1=" " ind2=" ">
<subfield code="a">chatgpt</subfield>
</datafield>
<datafield tag="653" ind1=" " ind2=" ">
<subfield code="a">blablador</subfield>
</datafield>
<datafield tag="245" ind1=" " ind2=" ">
<subfield code="a">mlphys101 - Exploring the performance of Large-Language Models in multilingual undergraduate physics education</subfield>
</datafield>
<datafield tag="100" ind1=" " ind2=" ">
<subfield code="a">Völschow, Marcel</subfield>
</datafield>
<datafield tag="980" ind1=" " ind2=" ">
<subfield code="a">user-rodare</subfield>
</datafield>
<datafield tag="260" ind1=" " ind2=" ">
<subfield code="c">2024-09-09</subfield>
</datafield>
<datafield tag="773" ind1=" " ind2=" ">
<subfield code="a">https://www.hzdr.de/publications/Publ-39561</subfield>
<subfield code="i">isIdenticalTo</subfield>
<subfield code="n">url</subfield>
</datafield>
<datafield tag="773" ind1=" " ind2=" ">
<subfield code="a">10.14278/rodare.3136</subfield>
<subfield code="i">isVersionOf</subfield>
<subfield code="n">doi</subfield>
</datafield>
<datafield tag="520" ind1=" " ind2=" ">
<subfield code="a"><p>Large language models (LLMs) such as ChatGPT have the potential to revolutionize
academic teaching in physics in the same way that the electronic calculator, the home computer,
and the internet did. AI models are patient, produce answers tailored to a student’s needs, and
are accessible whenever needed. Educators in higher education therefore face a number of questions:
How reliable are publicly accessible models at answering physics questions? How does the language
of a question affect model performance? And how well do the models handle more difficult tasks that
go beyond fact retrieval? To address these questions, we benchmark a number of publicly available
models on the mlphys101 dataset, a new set of 823 university-level multiple-choice questions with
five answer options each (MC5), released alongside this work. While the original questions are in
English, we use GPT-4 to translate them into various other languages, followed by revision and
refinement by native speakers. Our findings indicate that state-of-the-art models perform well on
questions that involve reproducing facts, definitions, and basic concepts, but struggle with
multi-step quantitative reasoning. This is consistent with existing literature highlighting the
challenges LLMs face in mathematical and logical reasoning tasks. We conclude that the most advanced
current LLMs are a valuable addition to the academic curriculum and that LLM-powered translation is a
viable way to increase the accessibility of teaching materials, but that their utility for more
difficult quantitative tasks remains limited.</p>
<p>Here, the dataset is available only in English. It will be removed once the mlphys101 publication has been accepted and released to the public.</p></subfield>
</datafield>
<controlfield tag="001">3137</controlfield>
<datafield tag="542" ind1=" " ind2=" ">
<subfield code="l">restricted</subfield>
</datafield>
<datafield tag="700" ind1=" " ind2=" ">
<subfield code="a">Buczek, P.</subfield>
</datafield>
<datafield tag="700" ind1=" " ind2=" ">
<subfield code="a">Carreno-Mosquera, P.</subfield>
</datafield>
<datafield tag="700" ind1=" " ind2=" ">
<subfield code="a">Mousavias, C.</subfield>
</datafield>
<datafield tag="700" ind1=" " ind2=" ">
<subfield code="a">Reganova, S.</subfield>
</datafield>
<datafield tag="700" ind1=" " ind2=" ">
<subfield code="a">Roldan-Rodriguez, E.</subfield>
</datafield>
<datafield tag="700" ind1=" " ind2=" ">
<subfield code="a">Steinbach, Peter</subfield>
<subfield code="0">(orcid)0000-0002-4974-230X</subfield>
</datafield>
<datafield tag="700" ind1=" " ind2=" ">
<subfield code="a">Strube, A.</subfield>
</datafield>
<datafield tag="500" ind1=" " ind2=" ">
<subfield code="a">Here, the dataset is available only in English. It will be removed once the mlphys101 publication has been accepted and released to the public.</subfield>
</datafield>
<datafield tag="024" ind1=" " ind2=" ">
<subfield code="a">10.14278/rodare.3137</subfield>
<subfield code="2">doi</subfield>
</datafield>
<datafield tag="041" ind1=" " ind2=" ">
<subfield code="a">eng</subfield>
</datafield>
<datafield tag="980" ind1=" " ind2=" ">
<subfield code="a">dataset</subfield>
</datafield>
<datafield tag="909" ind1="C" ind2="O">
<subfield code="o">oai:rodare.hzdr.de:3137</subfield>
<subfield code="p">openaire_data</subfield>
<subfield code="p">user-rodare</subfield>
</datafield>
</record>
| | All versions | This version |
|---|---|---|
| Views | 552 | 552 |
| Downloads | 2 | 2 |
| Data volume | 660.5 kB | 660.5 kB |
| Unique views | 498 | 498 |
| Unique downloads | 2 | 2 |