Classifier Services (ORCS)

GitHub: https://github.com/OpenReqEU/requirements-classifier

The OpenReq Classifier Services (ORCS) component implements a machine learning multiclass classifier and a multilabel classifier. In the context of OpenReq, these classifiers are used to recommend values for requirement properties.

Because it offers both a multiclass and a multilabel classifier, the component is not restricted to recommending values for requirement properties that accept only a single value (for instance, the priority of a requirement, since it does not make sense to have two priorities for the same requirement). It can also recommend values for properties that accept several values at once.

The idea of a multiclass classifier is that each entity (in our case, a requirement) is classified into exactly one of two or more labels (in the case of the component, a requirement property value). In contrast, a multilabel classifier can assign more than one label to the same entity. To achieve multilabel classification, the component transforms the multilabel problem into a set of binary classifiers (a binary classifier being the simplest case of a multiclass classifier, with just two labels), one for each possible label. As an example, imagine the requirement property domain, which represents the departments in an organization that are responsible for a requirement. It makes sense for domain to be a multi-valued requirement property, since the same requirement can deal with topics that different departments are responsible for. The component would create one binary classifier for each possible value of domain, with the labels Yes (the requirement is related to this domain) and No (the requirement is not related to this domain).
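The transformation described above can be sketched in a few lines of Python. This is only an illustration of the idea, not the component's actual code: the function name to_binary_datasets, the sample requirements, and the domain values are all hypothetical.

```python
from typing import Dict, List, Set, Tuple


def to_binary_datasets(
    requirements: List[Tuple[str, Set[str]]],
    values: List[str],
) -> Dict[str, List[Tuple[str, str]]]:
    """Build one Yes/No training set per possible property value."""
    datasets: Dict[str, List[Tuple[str, str]]] = {}
    for value in values:
        # Each requirement appears in every dataset, labelled Yes only
        # when this value was assigned to it.
        datasets[value] = [
            (text, "Yes" if value in assigned else "No")
            for text, assigned in requirements
        ]
    return datasets


# Hypothetical sample data: requirement text plus its assigned domains.
requirements = [
    ("The system shall encrypt stored invoices", {"Security", "Finance"}),
    ("The system shall export monthly payroll reports", {"Finance"}),
]
domains = ["Security", "Finance", "HR"]

binary = to_binary_datasets(requirements, domains)
```

Each entry of binary would then be used to train its own binary classifier. At prediction time, the values recommended for a requirement are those whose classifier answers Yes.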

To implement the multiclass and multilabel classifiers we have used the Apache Mahout framework, which provides two implementations of Naive Bayes. The component uses the Transformed Weight-normalized Complement Naive Bayes (TWCNB), which performs particularly well on imbalanced datasets [1].
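To give an intuition for why the complement approach copes with imbalanced data, the following Python sketch estimates each class's term weights from the documents that do not belong to that class, normalizes the weights, and picks the class with the lowest score, following the description in [1]. It is a simplified illustration and not Mahout's implementation: it leaves out the TF, IDF, and document-length transforms that are part of full TWCNB, and the names train_cnb and classify as well as the sample data are assumptions made for this example.

```python
import math
from collections import Counter
from typing import Dict, List, Tuple


def train_cnb(
    docs: List[Tuple[List[str], str]], alpha: float = 1.0
) -> Dict[str, Dict[str, float]]:
    """Return weight-normalized complement weights per class and term."""
    vocab = sorted({term for tokens, _ in docs for term in tokens})
    classes = sorted({label for _, label in docs})
    weights: Dict[str, Dict[str, float]] = {}
    for c in classes:
        # Count term occurrences in every document NOT labelled c.
        complement: Counter = Counter()
        for tokens, label in docs:
            if label != c:
                complement.update(tokens)
        total = sum(complement.values()) + alpha * len(vocab)
        # Smoothed log-probability of each term in the complement of c.
        raw = {t: math.log((complement[t] + alpha) / total) for t in vocab}
        # Weight normalization: divide by the sum of absolute weights.
        norm = sum(abs(w) for w in raw.values())
        weights[c] = {t: w / norm for t, w in raw.items()}
    return weights


def classify(weights: Dict[str, Dict[str, float]], tokens: List[str]) -> str:
    """Pick the class whose complement matches the document worst."""
    counts = Counter(tokens)
    return min(
        weights,
        key=lambda c: sum(n * weights[c].get(t, 0.0) for t, n in counts.items()),
    )


# Hypothetical, deliberately imbalanced training data (1 vs. 3 documents).
training = [
    (["login", "password", "encrypt"], "Security"),
    (["invoice", "payment"], "Finance"),
    (["invoice", "tax", "payment"], "Finance"),
    (["payroll", "payment"], "Finance"),
]
model = train_cnb(training)
security_guess = classify(model, ["password", "encrypt"])
finance_guess = classify(model, ["invoice", "payment"])
```

Because the weights for a small class are estimated from all the other (larger) classes, they rest on more data than the class's own few documents would provide, which is the motivation given in [1].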

The models representing the classifiers are stored in an internal database. Each classifier is identified in the database by the name of the company that the requirements belong to and the requirement property that the classifier aims to predict. Therefore, it is not possible to have more than one classifier for the same company and property.
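The uniqueness constraint above can be pictured as a store keyed by the pair (company, property). The sketch below uses an in-memory dictionary as a stand-in for the internal database. The class ModelStore, its methods, and the sample names are hypothetical, and the choice to overwrite an existing model on a second save is an assumption: the description only states that two classifiers cannot coexist for the same pair.

```python
from typing import Any, Dict, Optional, Tuple


class ModelStore:
    """In-memory stand-in for the internal database (hypothetical)."""

    def __init__(self) -> None:
        # The (company, property) pair is the only key, so at most one
        # model can exist for each combination.
        self._models: Dict[Tuple[str, str], Any] = {}

    def save(self, company: str, prop: str, model: Any) -> bool:
        """Store a model; return True if it replaced an existing one."""
        key = (company, prop)
        replaced = key in self._models
        self._models[key] = model  # assumption: a new model overwrites
        return replaced

    def load(self, company: str, prop: str) -> Optional[Any]:
        """Return the stored model, or None if there is none."""
        return self._models.get((company, prop))


store = ModelStore()
first = store.save("ACME", "priority", "model-v1")
second = store.save("ACME", "priority", "model-v2")
other = store.save("ACME", "domain", "model-d1")
```

Under these assumptions, the second save for ACME and priority replaces the first model, while the model for ACME and domain is stored independently.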

 

References

[1] J. Rennie et al., Tackling the Poor Assumptions of Naive Bayes Text Classifiers, in ICML, 2003.