Traditional classification algorithms consider learning problems that contain only one label, i.e., each example is associated with a single nominal target variable characterizing its property. However, the number of practical applications involving data with multiple target variables has increased. To learn from this sort of data, multi-label classification algorithms should be used. The task of learning from multi-label data can be addressed by methods that transform the multi-label classification problem into several single-label classification problems. In this work, two well-known methods based on this approach are used, together with a third method that we propose to overcome some deficiencies of one of them. The three methods are applied in a case study using textual data related to medical findings, which were structured using the bag-of-words approach. The experimental study using these three methods shows an improvement in the results when our proposed multi-label classification method is used.
Classification algorithms usually consider learning problems that contain only a single label, i.e., each example is associated with a single value of the target attribute. However, a growing number of applications involve data associated with multiple target attributes. For these cases, so-called multi-label classification algorithms are used. The task of learning from such data can be solved by methods that transform the problem into several single-label classification problems. In this work, two traditional methods based on this approach are used, as well as a third method that we propose to overcome some deficiencies of those methods. A case study is also carried out using textual data related to medical reports, which were structured using the bag-of-words approach. The experimental study using these three methods shows an improvement in prediction quality obtained by using the multi-label classification method proposed in this work.
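The problem-transformation idea described above can be illustrated with a minimal sketch. The abstract does not name the two well-known methods, so the example assumes binary relevance, one common transformation that trains an independent yes/no classifier per label; it is not necessarily one of the methods used in this work, and it is not the proposed third method. The base learner `OverlapClassifier`, the helper names, and the toy texts and labels are all hypothetical and chosen only for illustration.

```python
from collections import Counter


def bag_of_words(text):
    """Map a text to a term-frequency Counter (lowercased, whitespace-split)."""
    return Counter(text.lower().split())


def binary_datasets(examples, labels):
    """Binary relevance step: build one yes/no dataset per label."""
    return {
        label: [(bag_of_words(text), label in labelset) for text, labelset in examples]
        for label in labels
    }


class OverlapClassifier:
    """Toy single-label learner (hypothetical stand-in for any base classifier).

    Predicts the class of the training example that shares the most tokens
    with the query; ties go to the earliest training example.
    """

    def fit(self, data):
        self.data = data
        return self

    def predict(self, bow):
        # Counter & Counter keeps the minimum count of each shared token.
        best = max(self.data, key=lambda pair: sum((pair[0] & bow).values()))
        return best[1]


def train_binary_relevance(examples, labels):
    """Train one independent binary classifier per label."""
    datasets = binary_datasets(examples, labels)
    return {label: OverlapClassifier().fit(data) for label, data in datasets.items()}


def predict_labels(models, text):
    """Union of all labels whose binary classifier answers 'yes'."""
    bow = bag_of_words(text)
    return {label for label, model in models.items() if model.predict(bow)}


# Invented toy data, not taken from the case study.
examples = [
    ("chest pain and cough", {"cardiac", "respiratory"}),
    ("persistent cough with fever", {"respiratory"}),
    ("irregular heartbeat and chest pain", {"cardiac"}),
    ("normal exam without findings", set()),
]
labels = ["cardiac", "respiratory"]
models = train_binary_relevance(examples, labels)

print(sorted(predict_labels(models, "cough with fever")))      # ['respiratory']
print(sorted(predict_labels(models, "chest pain and cough")))  # ['cardiac', 'respiratory']
```

Because each label gets its own classifier, this transformation ignores any dependence between labels, which is one commonly cited limitation of binary relevance.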
The aim of data mining is to find useful knowledge in databases. In order to extract such knowledge, several methods can be used, among them machine learning (ML) algorithms. In this work we focus on ML algorithms that express the extracted knowledge in a symbolic form, such as rules. This representation may allow us to ''explain'' the data. Rule learning algorithms are mainly designed to induce classification rules that can predict new cases with high accuracy. However, these sorts of rules generally express common sense knowledge, resulting in many interesting and useful rules not being discovered. Furthermore, the domain-independent biases, especially those related to the language used to express the induced knowledge, could induce rules that are difficult to understand. Exceptions might be used in order to overcome these drawbacks. Exceptions are defined as rules that contradict common beliefs. This kind of rule can play an important role in the process of understanding the underlying data as well as in making critical decisions. By contradicting the user's common beliefs, exceptions are bound to be interesting. This work proposes a method to find exceptions. In order to illustrate the potential of our approach, we apply the method to a real-world data set to discover rules and exceptions in the HIV protein cleavage process. A good understanding of the process that generates this data plays an important role in the research of cleavage inhibitors. We believe that the proposed approach may help the domain expert to further understand this process.