Using Code Generation in LLMs for Automatic Execution of Data Science Tasks

Author(s): Venko Andonov
Subject(s): Economy, Business Economy / Management, ICT Information and Communications Technologies, Socio-Economic Research
Published by: University of National and World Economy (Университет за национално и световно стопанство, УНСС)
Keywords: Data science; LLM; Data processing
Summary/Abstract: Most data science tasks start with data cleaning, preprocessing and feature engineering. The specifics of each step depend on the data itself as well as on the final goal, which is why the process is not straightforward and requires human expertise and domain knowledge. State-of-the-art large language models (LLMs) can understand complex problems and generate programming code. In this paper, these capabilities are evaluated for proprietary and open-source models and code interpretation environments, on both public and private datasets from different domains. The results show that the approach is feasible, but it requires human guidance, especially for tasks that need specific knowledge about the data, its context and its interpretation.
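The workflow summarized above (an LLM writes cleaning code, the code is executed, and a human stays in the loop) can be sketched as follows. This is a minimal illustration under stated assumptions, not the paper's method: `fake_llm_generate` is a hypothetical stub standing in for a real proprietary or open-source model, `run_generated_cleaning` and its `approve` callback are hypothetical names for a human review step, and the `age`/`income` columns are invented sample data.

```python
import csv
import io
from typing import Callable


def fake_llm_generate(prompt: str) -> str:
    """Hypothetical stand-in for an LLM call.

    A real system would send `prompt` to a model and receive generated
    code back; here a fixed cleaning snippet is returned instead.
    """
    return (
        "cleaned = []\n"
        "for row in rows:\n"
        "    if row['age'].strip() == '':\n"
        "        continue  # drop rows with a missing age\n"
        "    row['age'] = int(row['age'])\n"
        "    row['income'] = float(row['income'])\n"
        "    cleaned.append(row)\n"
    )


def run_generated_cleaning(
    raw_csv: str,
    task: str,
    generate: Callable[[str], str],
    approve: Callable[[str], bool],
) -> list:
    """Ask the (hypothetical) model for cleaning code, let a human approve it, then run it."""
    rows = list(csv.DictReader(io.StringIO(raw_csv)))
    columns = list(rows[0].keys()) if rows else []
    # Describe the data and the goal; domain context would also go here.
    prompt = (
        f"Columns: {columns}\nTask: {task}\n"
        "Write Python that reads `rows` and builds a list named `cleaned`."
    )
    code = generate(prompt)
    # Human-in-the-loop check before anything is executed.
    if not approve(code):
        raise RuntimeError("Generated code rejected by the human reviewer")
    namespace = {"rows": rows}
    exec(code, namespace)  # only execute untrusted generated code in a sandbox
    return namespace["cleaned"]


raw = "age,income\n34,52000\n,61000\n29,48000.5\n"
result = run_generated_cleaning(
    raw,
    "Drop rows with a missing age and convert numeric columns.",
    fake_llm_generate,
    lambda code: True,  # a real reviewer would inspect `code` first
)
print(result)
```

The `approve` callback is where the human guidance the abstract calls for would sit: a reviewer who knows the data can reject code that, for example, drops rows that should instead be imputed.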

Toggle Accessibility Mode