(TFG) Supporting the Design of Evolvable and Replicable Machine Learning Pipelines in Jupyter Notebook

Description: Machine learning is a branch of artificial intelligence that focuses on using data and algorithms to produce models that imitate intelligent human behaviour. A machine learning pipeline is the code that builds a machine learning model. Machine learning pipelines are usually defined in the experimentation stage and written in Python using Jupyter Notebook (a client-server application that allows editing and running notebook documents via a web browser). The rapid development of these pipelines during experimentation, together with the lack of software engineering best practices, makes it difficult to maintain, evolve and replicate the pipelines used to build the corresponding machine learning models. As a result, the transition of machine learning pipelines from experimentation to production is currently anecdotal. This Final Degree Project (TFG) aims to improve the maintainability, evolvability and replicability of machine learning pipelines and to ease their transition from experimentation to production. To do so, it proposes using object-oriented concepts to define the architectural elements needed to design maintainable, evolvable and replicable Python machine learning pipelines in Jupyter Notebook. To demonstrate the feasibility of the proposal, we present an example in the domain of tweet sentiment analysis.
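
As a rough illustration of the kind of object-oriented architectural elements described above, the following minimal sketch models a pipeline as a sequence of interchangeable step objects. All class names (PipelineStep, Lowercase, Tokenize, LexiconSentiment, Pipeline) are hypothetical and are not part of the proposal; the word-list classifier is only a toy stand-in for a trained sentiment model.

```python
from abc import ABC, abstractmethod
from typing import Any, Iterable, List


class PipelineStep(ABC):
    """Hypothetical base class: one replaceable stage of a pipeline."""

    @abstractmethod
    def run(self, data: Any) -> Any:
        """Transform the input and return the result for the next step."""


class Lowercase(PipelineStep):
    """Normalises every tweet to lower case."""

    def run(self, data: List[str]) -> List[str]:
        return [tweet.lower() for tweet in data]


class Tokenize(PipelineStep):
    """Splits every tweet on whitespace."""

    def run(self, data: List[str]) -> List[List[str]]:
        return [tweet.split() for tweet in data]


class LexiconSentiment(PipelineStep):
    """Toy stand-in for a trained model: counts words from fixed lists."""

    POSITIVE = {"good", "great", "love"}
    NEGATIVE = {"bad", "awful", "hate"}

    def run(self, data: List[List[str]]) -> List[str]:
        labels = []
        for tokens in data:
            score = sum(t in self.POSITIVE for t in tokens)
            score -= sum(t in self.NEGATIVE for t in tokens)
            if score > 0:
                labels.append("positive")
            elif score < 0:
                labels.append("negative")
            else:
                labels.append("neutral")
        return labels


class Pipeline:
    """Runs its steps in order, feeding each output to the next step."""

    def __init__(self, steps: Iterable[PipelineStep]) -> None:
        self.steps = list(steps)

    def run(self, data: Any) -> Any:
        for step in self.steps:
            data = step.run(data)
        return data


pipeline = Pipeline([Lowercase(), Tokenize(), LexiconSentiment()])
labels = pipeline.run(
    ["I LOVE this phone", "Awful service, bad support", "Just landed"]
)
print(labels)
```

Because every step shares the same interface, a step can be replaced or re-run in isolation without touching the rest of the notebook, which is the property that maintainability, evolvability and replicability depend on in this sketch.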

Degree: GEI.

Requirements: Software engineering specialization

Technologies: Python and machine learning (both recommended)

Contact: Claudia Ayala, Cristina Gómez

e-mail: cayala@essi.upc.edu, cristina@essi.upc.edu

Offer expiration date: 15-9-2023