NLP4SE Feature Extraction from Mobile App Reviews

A set of contributions for extracting mobile app features from user-generated reviews, using both traditional natural language processing methods and large language models (LLMs).

Name: TransFeatEx: a NLP pipeline for feature extraction

Description: Standalone, decoupled pipeline that combines a RoBERTa-based model with consolidated syntactic and semantic techniques to extract mobile app features from texts such as descriptions and reviews. Provides batch processing via API and a local playground.

Scope: RoBERTa-powered linguistic annotation, noun phrase selection and filtering, optional sentiment-based filtering, cleaning and normalization, export of extracted features.

Authors: Agustí Gállego, Quim Motger, Xavier Franch, Jordi Marco

paper · code

demo video
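
To make the "cleaning and normalization" step in the scope above more concrete, here is a minimal sketch of post-processing candidate noun phrases. It is not taken from the TransFeatEx code: the function names `normalize` and `clean_features`, the stopword list, and the `max_words` limit are all hypothetical, and the candidates are assumed to come from an upstream annotation step.

```python
import re

# Illustrative stopword list (an assumption, not the pipeline's actual list).
STOPWORDS = {"the", "a", "an", "this", "that", "my", "your"}

def normalize(phrase: str) -> str:
    """Lowercase, strip punctuation, and drop stopwords from a candidate phrase."""
    phrase = phrase.lower()
    phrase = re.sub(r"[^a-z0-9\s-]", " ", phrase)
    words = [w for w in phrase.split() if w not in STOPWORDS]
    return " ".join(words)

def clean_features(candidates, max_words=3):
    """Normalize candidates, drop empty or overly long ones, and deduplicate."""
    seen, result = set(), []
    for candidate in candidates:
        feature = normalize(candidate)
        if not feature or len(feature.split()) > max_words or feature in seen:
            continue
        seen.add(feature)
        result.append(feature)
    return result

# Hypothetical noun-phrase candidates from an upstream annotation step.
candidates = [
    "The Dark Mode",
    "dark mode!",
    "a",
    "Offline sync",
    "the very long list of random words",
]
print(clean_features(candidates))  # ['dark mode', 'offline sync']
```

The word-count cap is just one plausible filtering heuristic; a real pipeline would more likely filter on syntactic patterns or sentiment, as the scope suggests.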


Name: T-FREX: Transformer-based feature extraction from mobile app reviews

Description: Redefines app feature extraction as a named entity recognition task at token level. Trains transformer models on a crowdsourced ground truth to identify feature spans in review sentences, and evaluates against a syntactic baseline.

Scope: Token classification (B/I/O), evaluation in in-domain and out-of-domain settings, comparison with SAFE-like baselines, human validation of new features.

Authors: Quim Motger, Alessio Miaschi, Felice Dell'Orletta, Xavier Franch, Jordi Marco

paper · code
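
As an illustration of the token-level formulation described above, the sketch below turns B/I/O tags into feature spans. It assumes the plain tag names `B`, `I`, and `O` and a hypothetical helper `decode_bio`. The actual T-FREX label set and decoding code may differ, and the tags here are written by hand rather than predicted by a model.

```python
from typing import List

def decode_bio(tokens: List[str], tags: List[str]) -> List[str]:
    """Collect feature spans from token-level B/I/O tags."""
    if len(tokens) != len(tags):
        raise ValueError("tokens and tags must have the same length")
    features: List[str] = []
    current: List[str] = []
    for token, tag in zip(tokens, tags):
        if tag == "B":
            # A new span starts; close any span that is still open.
            if current:
                features.append(" ".join(current))
            current = [token]
        elif tag == "I" and current:
            # Continue the open span.
            current.append(token)
        else:
            # "O", or a stray "I" with no open span: close any open span.
            if current:
                features.append(" ".join(current))
            current = []
    if current:
        features.append(" ".join(current))
    return features

# Hand-written example; a trained token classifier would predict the tags.
tokens = ["I", "love", "the", "dark", "mode", "and", "offline", "sync"]
tags = ["O", "O", "O", "B", "I", "O", "B", "I"]
print(decode_bio(tokens, tags))  # ['dark mode', 'offline sync']
```

Treating a stray `I` as `O` is one of several possible conventions; some decoders instead start a new span on it.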


Name: FeClustRE: Feature clustering for requirements engineering

Description: Hybrid framework that integrates feature extraction, hierarchical clustering with auto-tuning, and LLM-based semantic labelling to build interpretable taxonomies of app review features.

Scope: Embedding-based clustering, dendrogram cut auto-tuning with internal metrics, semantic tag generation and taxonomy merging, evaluation on public benchmarks and on reviews of LLM assistant apps.

Authors: Max Tiessler, Quim Motger

code and materials
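
The idea of auto-tuning a dendrogram cut with an internal metric can be sketched as follows. This is a self-contained toy version, not the FeClustRE implementation: it assumes average linkage, Euclidean distance, and the mean silhouette coefficient as the internal metric, and it uses hand-made 2D points in place of real feature embeddings. The function names are hypothetical.

```python
import math
from itertools import combinations

def agglomerate(points):
    """Average-linkage agglomerative clustering.

    Returns the partition (list of index lists) at every merge level,
    from all singletons down to a single cluster.
    """
    clusters = [[i] for i in range(len(points))]
    levels = [[c[:] for c in clusters]]
    while len(clusters) > 1:
        best = None
        for i, j in combinations(range(len(clusters)), 2):
            total = sum(math.dist(points[a], points[b])
                        for a in clusters[i] for b in clusters[j])
            d = total / (len(clusters[i]) * len(clusters[j]))
            if best is None or d < best[0]:
                best = (d, i, j)
        _, i, j = best
        merged = clusters[i] + clusters[j]
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)] + [merged]
        levels.append([c[:] for c in clusters])
    return levels

def silhouette(points, partition):
    """Mean silhouette coefficient of a partition with at least two clusters."""
    scores = []
    for ci, cluster in enumerate(partition):
        for p in cluster:
            if len(cluster) == 1:
                scores.append(0.0)  # usual convention for singleton clusters
                continue
            a = sum(math.dist(points[p], points[q])
                    for q in cluster if q != p) / (len(cluster) - 1)
            b = min(sum(math.dist(points[p], points[q]) for q in other) / len(other)
                    for cj, other in enumerate(partition) if cj != ci)
            denom = max(a, b)
            scores.append((b - a) / denom if denom > 0 else 0.0)
    return sum(scores) / len(scores)

def auto_cut(points):
    """Pick the dendrogram level with the highest mean silhouette.

    Needs at least three points so that a level with 2..n-1 clusters exists.
    """
    levels = agglomerate(points)
    candidates = [p for p in levels if 2 <= len(p) <= len(points) - 1]
    return max(candidates, key=lambda p: silhouette(points, p))

# Toy 2D "embeddings" forming three well-separated groups.
points = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1),
          (5.0, 5.0), (5.1, 5.0),
          (9.0, 0.0), (9.1, 0.1)]
best = auto_cut(points)
print(sorted(sorted(c) for c in best))  # [[0, 1, 2], [3, 4], [5, 6]]
```

Scoring every level is affordable only for small inputs. On real review data one would normally rely on a library implementation and restrict the search to a range of candidate cuts.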