TFG: MLSToolbox Code Assessment: A tool for evaluating ML pipeline code

Description: Machine learning is a branch of artificial intelligence that uses data and algorithms to produce models that imitate intelligent human behaviour. A machine learning pipeline is the code that builds such a model. Pipelines are usually defined in the experimentation stage and written in Python using Jupyter Notebook (a client-server application for editing and running notebook documents in a web browser). Because these pipelines are developed rapidly during experimentation and software engineering best practices are seldom applied, they are difficult to maintain, evolve and replicate when building the corresponding machine learning models. As a result, the transition of machine learning pipelines from experimentation to production is currently anecdotal. This Final Degree Project (TFG) aims to support data scientists in assessing ML pipeline code against software design principles (e.g., coupling and SOLID), identifying potential violations, and providing recommendations that improve code quality and make the code easier to maintain in production environments.

Degree: GEI, MEI

Forms of collaboration: TFG, Master's Thesis

Requirements: Software engineering specialization

Technologies: Python and machine learning (both recommended)

Contact: Claudia Ayala, Cristina Gómez, Lidia López

E-mail: claudia.ayala@upc.edu, cristina.gomez@upc.edu, lidia.lopez@upc.edu