Using Code Generation in LLMs for Automatic Execution of Data Science Tasks

Venko Andonov

Author(s): Venko Andonov
Subject(s): Economy, Business Economy / Management, ICT Information and Communications Technologies, Socio-Economic Research
Published by: University of National and World Economy (UNWE)
Keywords: Data science; LLM; Data processing
Summary/Abstract: Most data science tasks start with data cleaning, preprocessing and feature engineering. The specifics of each step depend on the data itself as well as on the final goal, which is why the process is not straightforward and requires human expertise and domain knowledge. State-of-the-art large language models (LLMs) can understand complex problems and generate programming code. In this paper, these capabilities are evaluated for proprietary and open-source models and interpretation environments, on both public and private datasets in different domains. The results show that the approach is feasible, but it requires human guidance, especially for tasks that depend on specific knowledge about the data, its context and its interpretation.

Book: Selected Papers from the 13th International Conference on Application of Information and Communication Technology and Statistics in Economy and Education (ICAICTSEE - 2023), December 15-16th, 2023, UNWE, Sofia, Bulgaria

Page Range: 136-142
Page Count: 7
Publication Year: 2024
Language: English
