Dataset Restricted Access

mlphys101 - Exploring the performance of Large-Language Models in multilingual undergraduate physics education

Völschow, Marcel; Buczek, P.; Carreno-Mosquera, P.; Mousavias, C.; Reganova, S.; Roldan-Rodriguez, E.; Steinbach, Peter; Strube, A.


JSON Export

{
  "doi": "10.14278/rodare.3137", 
  "conceptrecid": "3136", 
  "conceptdoi": "10.14278/rodare.3136", 
  "links": {
    "badge": "https://rodare.hzdr.de/badge/doi/10.14278/rodare.3137.svg", 
    "doi": "https://doi.org/10.14278/rodare.3137", 
    "conceptbadge": "https://rodare.hzdr.de/badge/doi/10.14278/rodare.3136.svg", 
    "conceptdoi": "https://doi.org/10.14278/rodare.3136", 
    "html": "https://rodare.hzdr.de/record/3137", 
    "latest": "https://rodare.hzdr.de/api/records/3137", 
    "latest_html": "https://rodare.hzdr.de/record/3137"
  }, 
  "metadata": {
    "doi": "10.14278/rodare.3137", 
    "access_conditions": "<p>Access will be granted based on reasonable request until we wait for the publication by the original author of the dataset on another platform.</p>", 
    "publication_date": "2024-09-09", 
    "title": "mlphys101 - Exploring the performance of Large-Language Models in multilingual undergraduate physics education", 
    "access_right": "restricted", 
    "relations": {
      "version": [
        {
          "parent": {
            "pid_type": "recid", 
            "pid_value": "3136"
          }, 
          "count": 1, 
          "last_child": {
            "pid_type": "recid", 
            "pid_value": "3137"
          }, 
          "is_last": true, 
          "index": 0
        }
      ]
    }, 
    "communities": [
      {
        "id": "rodare"
      }
    ], 
    "doc_id": "1", 
    "notes": "The dataset is available in English here only and will be removed, once the mlphys101 publication was accepted and released to the public.", 
    "resource_type": {
      "title": "Dataset", 
      "type": "dataset"
    }, 
    "creators": [
      {
        "name": "V\u00f6lschow, Marcel"
      }, 
      {
        "name": "Buczek, P."
      }, 
      {
        "name": "Carreno-Mosquera, P."
      }, 
      {
        "name": "Mousavias, C."
      }, 
      {
        "name": "Reganova, S."
      }, 
      {
        "name": "Roldan-Rodriguez, E."
      }, 
      {
        "orcid": "0000-0002-4974-230X", 
        "name": "Steinbach, Peter"
      }, 
      {
        "name": "Strube, A."
      }
    ], 
    "related_identifiers": [
      {
        "identifier": "https://www.hzdr.de/publications/Publ-39561", 
        "scheme": "url", 
        "relation": "isIdenticalTo"
      }, 
      {
        "identifier": "10.14278/rodare.3136", 
        "scheme": "doi", 
        "relation": "isVersionOf"
      }
    ], 
    "keywords": [
      "machine learning", 
      "deep learning", 
      "large language models", 
      "chatgpt", 
      "blablador"
    ], 
    "description": "<p>Large-Language Models such as ChatGPT have the potential to revo-<br>\r\nlutionize academic teaching in physics in a similar way the electronic calculator,<br>\r\nthe home computer or the internet did. AI models are patient, produce answers<br>\r\ntailored to a student\u2019s needs and are accessible whenever needed. Those involved<br>\r\nin academic teaching are facing a number of questions: Just how reliable are pub-<br>\r\nlicly accessible models in answering, how does the question\u2019s language affect the<br>\r\nmodels\u2019 performance and how well do the models perform with more difficult tasks<br>\r\nbeyond retrieval? To adress these questions, we benchmark a number of publicly<br>\r\navailable models on the mlphys101 dataset, a new set of 823 university level MC5<br>\r\nquestions and answers released alongside this work. While the original questions<br>\r\nare in English, we employ GPT-4 to translate them into various other languages,<br>\r\nfollowed by revision and refinement by native speakers. Our findings indicate that<br>\r\nstate-of-the-art models perform well on questions involving the replication of facts,<br>\r\ndefinitions, and basic concepts, but struggle with multi-step quantitative reason-<br>\r\ning. This aligns with existing literature that highlights the challenges LLMs face<br>\r\nin mathematical and logical reasoning tasks. We conclude that the most advanced<br>\r\ncurrent LLMs are a valuable addition to the academic curriculum and LLM pow-<br>\r\nered translations are a viable method to increase the accessibility of materials, but<br>\r\ntheir utility for more difficult quantitative tasks remains limited.</p>\r\n\r\n<p>The dataset is available in English here only and will be removed, once the mlphys101 publication was accepted and released to the public.</p>", 
    "language": "eng", 
    "access_right_category": "danger", 
    "pub_id": "39561"
  }, 
  "created": "2024-09-10T09:37:15.570547+00:00", 
  "updated": "2024-09-12T09:24:19.933452+00:00", 
  "revision": 4, 
  "stats": {
    "volume": 0.0, 
    "unique_downloads": 0.0, 
    "version_unique_downloads": 0.0, 
    "unique_views": 94.0, 
    "downloads": 0.0, 
    "version_unique_views": 94.0, 
    "version_views": 110.0, 
    "version_downloads": 0.0, 
    "version_volume": 0.0, 
    "views": 110.0
  }, 
  "owners": [
    156
  ], 
  "id": 3137
}
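The record above can also be retrieved programmatically: the `links.latest` URL points at the repository's REST API, which returns JSON in the same shape as this export. A minimal offline sketch, assuming that shape, that pulls the citation-relevant fields out of such a record (the live fetch via `urllib.request` against `https://rodare.hzdr.de/api/records/3137` is omitted here; `summarize_record` is an illustrative helper, not part of the API):

```python
import json

def summarize_record(record: dict) -> dict:
    """Extract citation-relevant fields from a Rodare/Zenodo-style record."""
    meta = record["metadata"]
    return {
        "doi": meta["doi"],
        "title": meta["title"],
        "creators": [c["name"] for c in meta["creators"]],
        "access_right": meta["access_right"],
    }

# Abbreviated sample with the same structure as the JSON export above.
sample = json.loads("""
{
  "metadata": {
    "doi": "10.14278/rodare.3137",
    "title": "mlphys101 - Exploring the performance of Large-Language Models in multilingual undergraduate physics education",
    "access_right": "restricted",
    "creators": [{"name": "V\\u00f6lschow, Marcel"}, {"name": "Steinbach, Peter"}]
  }
}
""")

summary = summarize_record(sample)
print(summary["doi"])          # 10.14278/rodare.3137
print(summary["creators"][0])  # Völschow, Marcel
```

For the live record, the same function would apply unchanged to the response body of a GET request on the `links.latest` URL.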
