Genetics-Based Machine Learning for Rule Induction: Taxonomy, Experimental Study and State of the Art

Alberto Fernández, Salvador García, Julián Luengo, Ester Bernadó-Mansilla and Francisco Herrera

A. Fernández, J. Luengo and F. Herrera are with the Department of Computer Science and Artificial Intelligence, University of Granada, Spain (e-mail: alberto@decsai.ugr.es, julianlm@decsai.ugr.es and herrera@decsai.ugr.es). S. García is with the Department of Computer Science, University of Jaén, Spain (e-mail: sglopez@ujaen.es). E. Bernadó-Mansilla is with the Grup de Recerca en Sistemes Intelligents, Enginyeria i Arquitectura La Salle, Universitat Ramon Llull, Spain (e-mail: esterb@salle.url.edu).

Abstract—The classification problem can be addressed by numerous techniques and algorithms which belong to different paradigms of Machine Learning. In this work, we are interested in evolutionary algorithms, the so-called Genetics-Based Machine Learning algorithms. In particular, we will focus on evolutionary approaches that evolve a set of rules, i.e., evolutionary rule-based systems, applied to classification tasks, in order to provide a state-of-the-art survey of this field.

This study has been done with a double aim: to present a taxonomy of the Genetics-Based Machine Learning approaches for rule induction, and to develop an empirical analysis both for standard classification and for classification with imbalanced data sets.

We also include a comparative study of the GBML methods with some classical non-evolutionary algorithms, in order to observe the suitability and high power of the search performed by evolutionary algorithms and the behaviour of the GBML algorithms in contrast to the classical approaches, in terms of classification accuracy.

Index Terms—Classification, Evolutionary Algorithms, Genetics-Based Machine Learning, Imbalanced data sets, Rule Induction, Learning Classifier Systems, Taxonomy

I. INTRODUCTION

CLASSIFICATION in Machine Learning [1] is a technique that, from a set of n input patterns w_1, ..., w_n characterised by i attributes a_1, ..., a_i, which can include numerical or nominal values, and m classes c_1, ..., c_m, has the objective of obtaining a system that automatically assigns a class label to each pattern. Specifically, a classifier is a mapping function defined over the patterns, A^i → {c_1, ..., c_m} (A stands for the set of attributes), generated by a learning algorithm. This methodology is also known as Pattern Recognition [2].

Learning is the building process of the classifier. In this process, the input patterns are used to train the model so that it describes the attribute space accurately. Supervised learning is carried out when the input training patterns have labeled classes; in this manner, the learning algorithm is able to build a model that matches regions of the attribute space with an associated class.

Over the years, several approaches have been designed for classification tasks, such as decision trees [3], support vector machines [4], instance-based algorithms [5], and probabilistic classifiers (such as Naïve-Bayes [6]), among others (see [7] for an interesting book covering different Machine Learning techniques).

Rule induction algorithms aim at discovering a description for the target concept in the form of explicit rules formulated in terms of tests for certain values of the attributes [8]. The resulting rule set should be able to correctly recognize instances of the target concept and discriminate them from objects that do not belong to it. The use of rule sets as knowledge representation also makes these algorithms very competitive in terms of interpretability, since the rules can be read easily by human experts.
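To make the notion of explicit rules concrete, the following minimal Python sketch (not taken from the paper; all names and the toy data are illustrative) represents a rule as a conjunction of attribute tests mapped to a class, in the condition → class form discussed below.

```python
# A minimal sketch of an explicit classification rule (illustrative only).
# The rule is a conjunction of attribute tests plus a predicted class.

def matches(rule, example):
    """Check whether every attribute test of the rule holds for the example."""
    return all(test(example[attr]) for attr, test in rule["tests"].items())

# Hypothetical rule for a toy weather data set:
# IF outlook = sunny AND humidity <= 70 THEN class = play
rule = {
    "tests": {
        "outlook": lambda v: v == "sunny",   # nominal test: equality
        "humidity": lambda v: v <= 70,       # numerical test: threshold
    },
    "class": "play",
}

example = {"outlook": "sunny", "humidity": 65}
if matches(rule, example):
    print(rule["class"])  # -> "play"
```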
Evolutionary rule-based systems [9] are a type of Genetics-Based Machine Learning (GBML) that use sets of rules as knowledge representation [10]. One of the strengths of these approaches is the use of Evolutionary Algorithms (EAs) as search mechanisms, which allows for efficient searches over complex search spaces [11].

At present, there is a huge set of algorithms proposed under the evolutionary rule-based paradigm. Some of them have been in use for the last decade, such as XCS [12], SIA [13] and GIL [14], and many others have been proposed recently, such as UCS [15] and HIDER [16]. Nevertheless, there is no explicit framework where all these methods can be categorised. Furthermore, there are no exhaustive comparisons of the performance of the methods. In many cases, the new proposals are compared on a limited set of problems and with respect to a small number of algorithms, usually of a given type [17], [18]. There is no exhaustive analysis of the algorithms' performance across different families. Our motivation for this work is to provide a state-of-the-art summary of the GBML algorithms for rule induction in classification tasks, proposing a taxonomy that defines a general framework within which each algorithm can be placed.

Classically, rule-based systems have been classified into two groups: Michigan-style GBML [19], [20] and Pittsburgh-style GBML [21], [22]. But many approaches do not fall easily into one of these categories and are simply called hybrid approaches. We propose a taxonomy based on the representation of the chromosome of the associated EA. The taxonomy preserves the Michigan and Pittsburgh categories and defines other families in this field, hence including all the state-of-the-art evolutionary GBML methods for rule induction.

The main algorithms proposed in the literature have been selected in order to place them within the framework of the proposed taxonomy. This allows the development of an exhaustive and well-supported analysis to compare the performance of all the algorithms among themselves and with respect to some well-known non-evolutionary algorithms such as CART [23], AQ [24], CN2 [25], C4.5 [3], C4.5-Rules [26] and RIPPER [27]. To address this, a methodology based on hierarchical comparisons is proposed: first identifying the best performers inside each category and then selecting them as representatives of the category for the cross-family comparison. This hierarchical comparison has been used in order to simplify the empirical study and to focus on the algorithms with better behaviour.

In addition to standard classification problems, this paper is also concerned with imbalanced data sets, also known as the class imbalance problem, which has been defined as a current challenge of the Data Mining community [28]. This refers to the cases where one class, usually the one that contains the concept to be learnt (the positive class), is underrepresented in the data set [29]. Since this issue is present in many real-world situations, we are particularly interested in analysing the performance of the algorithms in such cases. We will analyse the results using the original data sets and with preprocessing in order to study two facts: how the different algorithms directly deal with the imbalance problem, and the need of using preprocessing algorithms to help improve classification performance in GBML methods.

The experimental framework has been designed so as to provide well-founded conclusions. We use a set of 30 real-world problems, which is usually considered to be a representative sample. Two different sets of real-world problems are employed: the first one is selected randomly from the UCI repository [30], without any prerequisite, and the second one is selected to be representative of imbalanced class problems. The measures of performance are based on the accuracy rate and Cohen's Kappa metric [31], which is less biased. In the case of imbalanced problems, we use the geometric mean of the accuracy rates per class. The significance of the results is supported by the proper statistical tests, as suggested in the literature [32], [33].
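As a hedged illustration of these measures (a sketch, not the paper's implementation), the snippet below computes the accuracy rate, Cohen's Kappa and the geometric mean of the per-class accuracy rates from a confusion matrix. Note how a trivial majority-class predictor can score high accuracy on imbalanced data while its geometric mean collapses to zero.

```python
import numpy as np

def metrics(confusion):
    """Accuracy, Cohen's Kappa and geometric mean of per-class accuracies
    from a square confusion matrix (rows = true class, cols = predicted)."""
    cm = np.asarray(confusion, dtype=float)
    n = cm.sum()
    acc = np.trace(cm) / n
    # Kappa corrects accuracy by the agreement expected from the marginals.
    expected = (cm.sum(axis=0) * cm.sum(axis=1)).sum() / n**2
    kappa = (acc - expected) / (1 - expected)
    per_class = np.diag(cm) / cm.sum(axis=1)      # recall of each class
    gmean = float(np.prod(per_class) ** (1 / len(per_class)))
    return acc, kappa, gmean

# Imbalanced toy problem: 95 negatives, 5 positives, all predicted negative.
print(metrics([[95, 0],
               [5, 0]]))   # accuracy 0.95, kappa 0.0, geometric mean 0.0
```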
Finally, we will discuss some lessons learned and new challenges within this topic, in correspondence with the results obtained in this study.

The rest of this paper is organised as follows. In Section II we present our taxonomy proposal for GBML algorithms for rule induction and we describe the approaches used in this work. Section III introduces the experimental framework, that is, the performance measures used in the study, the benchmark data sets and the parameters, the statistical tests, the classical algorithms employed for comparison, and the software tool in which all the selected methods are implemented and that we have used for the design of the experiments. We develop the empirical analysis for standard classification in Section IV, whereas in Section V this analysis is oriented towards imbalanced data sets. The lessons learned throughout this study are presented in Section VI. Finally, in Section VII we make our concluding remarks.

II. TAXONOMY OF GENETICS-BASED MACHINE LEARNING ALGORITHMS FOR CLASSIFICATION

In this section, we first propose a taxonomy for the different GBML approaches for rule induction, classifying them into different families. Then, we introduce some basic features of the GBML algorithms that enable us to highlight the similarities and differences among the existing methods in this context. Finally, we describe each one of the GBML families, presenting the representative algorithms that we have selected within each category.

A. Taxonomy

Many proposals have been developed under the label of GBML for rule induction. All these approaches share an underlying commonality: the codification of the rule set or rule in a chromosome and the use of an evolutionary algorithm as the search mechanism. However, there are many differences among the algorithms, such as the type of learning scheme (supervised vs. reinforcement learning, or online vs. offline learning), the rule set representation (i.e., a decision list or a set of overlapping rules), the representation of each rule (i.e., the management of numerical and nominal values), or the design of the EA that guides the search.

With the aim of categorising the GBML methods for rule induction presented in the literature, we distinguish among three different families based on the chromosome codification (a toy sketch contrasting the first two codifications is given after this list):

1) Chromosome = Rule. In this type of algorithm, each rule is encoded in a single chromosome and the whole population evolves towards the final rule set.
We consider three subcategories of methods:

• The Michigan approach [34], [35], in which the rule set is incrementally updated through the sequential observation of training examples and their corresponding classification. Eventually, the rule set is improved by the action of the EA.

• The Iterative Rule Learning (IRL) approach [13], where the EA learns rule by rule iteratively and removes from the training set the examples covered (matched) by each new rule, in a divide-and-conquer style.

• The Genetic Cooperative-Competitive Learning (GCCL) approach [36], where all rules/chromosomes are evolved together in the EA. The chromosomes cooperate among themselves to perform the classification task, but the final rule set does not need to include all the rules. Hence, there is a competition to be among the best fitted rules and therefore to remain in the rule base.

2) Chromosome = Rule Set. This category is known as the Pittsburgh approach [22], [37]. These methods are a result of directly extending GAs [38] to supervised learning problems. The system maintains a population of candidate rule sets whose quality is evaluated with a fitness function that considers different aspects, such as the prediction accuracy and the generalization of the rule sets. Once the classifier is trained, the best individual found during the evolutionary process is used to predict the class of unknown examples.

3) Chromosome = Decision Tree or Tree Rule. This type of GBML contains mechanisms that combine decision trees with genetic algorithms (GAs) [46], [47]. The underlying idea is to use the search capability of the GA to find a highly accurate final tree. The tree can then be interpreted as a rule set, formed by the disjunction of the rules that are obtained in each branch of the tree. Thus, the chromosome can encode the full tree or just a tree node/rule. We have defined this family as Hybrid Evolutionary Decision Trees (HEDTs).
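The following toy sketch (illustrative encodings only, not code from any of the surveyed systems) contrasts the two main codifications: a Michigan-style population where each individual is one rule, versus a Pittsburgh-style population where each individual is a whole rule set.

```python
import random

# Toy chromosome codifications (illustrative only, not from the paper).
# A "rule" is a fixed-length list of [min, max] intervals plus a class.
def random_rule(n_attrs, n_classes):
    return {"intervals": [sorted((random.random(), random.random()))
                          for _ in range(n_attrs)],
            "class": random.randrange(n_classes)}

# Chromosome = Rule (Michigan / IRL / GCCL): one rule per individual;
# the population as a whole converges towards the final rule set.
michigan_population = [random_rule(n_attrs=4, n_classes=3) for _ in range(50)]

# Chromosome = Rule Set (Pittsburgh): each individual is a complete
# candidate rule set; the best individual becomes the final classifier.
pittsburgh_population = [[random_rule(4, 3) for _ in range(10)]
                         for _ in range(20)]
```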
In this work, we have considered the algorithms that have been published in the most influential journals on the topic, together with only the classical methods proposed in conference contributions. Genetic Fuzzy Systems [49], methods based on Genetic Programming [50], and distributed approaches [51], [52] are out of the scope of this study.

Table I summarises the list of selected algorithms of each type. All algorithms were implemented by the authors of this paper, except for GASSIST and HIDER, whose implementations were directly supplied by their corresponding authors. Furthermore, these methods are included in the KEEL software [53] (http://www.keel.es).

TABLE I
SUMMARY OF THE ALGORITHMS EMPLOYED IN THIS WORK

Algorithm name                                              | Acronym     | Family     | Reference
XCS                                                         | XCS         | Michigan   | Wilson (1995) [12]
UCS                                                         | UCS         | Michigan   | Bernadó-Mansilla and Garrell (2003) [15]
Supervised Inductive Algorithm                              | SIA         | IRL        | Venturini (1993) [13]
HIerarchical DEcision Rules                                 | HIDER       | IRL        | Aguilar-Ruiz et al. (2007) [16]
CO-Evolutionary Rule Extractor                              | CORE        | GCCL       | Tan et al. (2006) [39]
Organizational Co-Evolutionary algorithm for Classification | OCEC        | GCCL       | Jiao et al. (2006) [40]
COverage-based Genetic INduction                            | COGIN       | GCCL       | Greene and Smith (1993) [36]
Genetic-based Inductive Learning                            | GIL         | Pittsburgh | Janikow (1993) [14]
Pittsburgh Genetic Interval Rule Learning Algorithm         | Pitts-GIRLA | Pittsburgh | Corcoran and Sen (1994) [41]
Data Mining for Evolutionary Learning                       | DMEL        | Pittsburgh | Au et al. (2003) [42]
Genetic Algorithms based claSSIfier sySTem                  | GASSIST     | Pittsburgh | Bacardit et al. (2007) [43]
Ordered Incremental Genetic Algorithm                       | OIGA        | Pittsburgh | Zhu and Guan (2004) [44]
Incremental Learning with Genetic Algorithms                | ILGA        | Pittsburgh | Guan and Zhu (2005) [45]
Hybrid Decision Tree - Genetic Algorithm                    | DT-GA       | HEDT       | Carvalho and Freitas (2004) [46]
Oblique Decision Tree                                       | Oblique-DT  | HEDT       | Cantú-Paz and Kamath (2003) [47]
Tree Analysis with Randomly Generated and Evolved Trees     | TARGET      | HEDT       | Gray and Fan (2008) [48]

B. Preliminaries: Genetics-Based Machine Learning Features

With the aim of giving a general background of the selected algorithms, in this section we describe the main components which are relevant to GBML algorithms and which, at the same time, distinguish them from non-evolutionary learning algorithms. This also gives the paper an educational character. We considered the following characteristics:

1) Chromosome representation: As mentioned before, the underlying commonality of GBML is the use of an EA as the search mechanism. In this respect, the first consideration is the representation of the solution in a population of chromosomes. The choice of the chromosome as codifying a single rule or a rule set largely determines the operation of the algorithm, and that is the motivation for basing our taxonomy on the chromosome representation.

In any case, the rule set as a whole may operate as a set of non-ordered rules, either overlapped or not overlapped, or as a decision list. Also, the inference type (the classification process itself) is very dependent on the type of rule used in the algorithm. The details are as follows (a short inference sketch for the overlapping and ordered cases is given after this list):

• Non-ordered, overlapping rule sets, also called "IF-THEN" rules. Since the rules can overlap, the whole rule set does not necessarily cover the full space of inputs. At classification time, if the example matches more than one rule, a voting method is usually applied, where the output is the class advocated by the majority. Other approaches include weighted voting, or a measure of the distance to the nearest rule [54]. In the case of a tie, some approaches leave the example as non-classified, while others assign the class randomly.

• Non-ordered, non-overlapped rule sets. This is the case of rule sets obtained from decision trees or axis-parallel hyperplanes, which split the search space into a set of non-overlapping regions. In this sense, the rule set covers all the input space. The classification process with these rules is trivial, since only one rule can match the example.

• Ordered rule sets, also called "IF-THEN-ELSE" rules or decision lists [55]. The classification is simply made using the first rule in the list that matches the example. It is very common to use a "default rule" that covers the input space not covered by any of the rules in the set.
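As a hedged sketch of the two non-trivial inference schemes above (illustrative code, not any surveyed system's implementation), the snippet below classifies an example by majority voting over all matching rules and, alternatively, by first match in a decision list with a default rule.

```python
from collections import Counter

# Rules are (predicate, class) pairs; predicates are callables on an example.
# Illustrative inference schemes only; real systems add weights, tie rules, etc.

def vote_inference(rules, example):
    """Non-ordered overlapping rules: majority vote among matching rules."""
    votes = Counter(cls for pred, cls in rules if pred(example))
    if not votes:
        return None                      # uncovered example: non-classified
    return votes.most_common(1)[0][0]    # ties resolved arbitrarily here

def decision_list_inference(rules, example, default_class):
    """Ordered rules ("IF-THEN-ELSE"): the first matching rule wins."""
    for pred, cls in rules:
        if pred(example):
            return cls
    return default_class                 # the "default rule"

rules = [(lambda x: x["humidity"] <= 70, "play"),
         (lambda x: x["outlook"] == "rainy", "no-play")]
print(vote_inference(rules, {"humidity": 60, "outlook": "sunny"}))        # play
print(decision_list_inference(rules, {"humidity": 80, "outlook": "sunny"},
                              "no-play"))                                 # no-play
```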
Each rule has the form condition → class, where the condition specifies the set of input states that the rule matches, i.e., the region of the search space covered by the rule, and the class is the classification that the rule advocates for that region. The condition part is usually represented as a conjunction of tests over the attributes. There are two main schemes:

• "Attribute <operator> value". A test is a condition over an attribute value, which is specified differently depending on whether the attribute is nominal or numerical:

– Nominal attributes: when the attribute can take a value among a list of possible categories, the test over the attribute consists in checking for equality to a given category or membership in a subset of categories. That is, "<operator>" may be in {=, ≠, ∈, ∉}.

– Numerical attributes: if the attribute can take numerical values, the test checks whether the value belongs to a given interval. That is, "<operator>" may be in {<, ≤, =, ≥, >}. Interval-rules take the form "Attribute ∈ [min, max]", which is equivalent to "Attribute ≥ min AND Attribute ≤ max".

Disjunction of attribute values within a condition can also be codified, although this is rarely implemented in GBML algorithms. The disjunction of attribute values can alternatively be codified as the disjunction of several rules. This may enlarge the rule set, but it simplifies the complexity of each rule. We must also remark that some algorithms treat the nominal attributes as ordered numerical values and consequently use the representations designed for numerical attributes.

• Oblique rules: the tests over the attributes are codified by a hyperplane of the form ∑_{i=1}^{d} a_i x_i + a_{d+1} > 0, where the a_i are real-valued coefficients [56], [57] and x_i is the value of the i-th attribute. This is an extension of the axis-parallel hyperplanes in the attribute space used by decision trees.

It is common that not all the attributes are relevant to the classification. Some algorithms explicitly allow the codification of irrelevant attributes with the so-called "don't care" symbol. This favours the generalization of the rules. Some approaches enable the chromosome to represent variable-length conditions. For algorithms using a fixed chromosome length, the representation of the "don't care" condition depends on the selected rule scheme defined above (a combined sketch of both cases is given after this list):

• For the "Attribute <operator> value" rules, we can represent the "don't care" by using a specific special value with the = operator, by including all possible values for ∈ (none for ∉), by using the maximum or minimum range for the <, ≤ and >, ≥ operators respectively, or by applying an inversion of operators for the interval-rules, that is, having the "min" value higher than the "max" one.

• For the oblique rules, one only needs to assign a weight of 0 to those variables that are not included in the antecedent.
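The sketch below (again illustrative, with invented helper names) shows one plausible fixed-length encoding: interval rules where an inverted interval (min > max) encodes "don't care", plus an oblique test where a zero coefficient drops an attribute from the antecedent.

```python
# Illustrative fixed-length encodings for "don't care" (not from the paper).

def interval_test(intervals, x):
    """Interval rule: attribute i must lie in [lo, hi]; an inverted
    interval (lo > hi) encodes "don't care" for that attribute."""
    return all(lo > hi or lo <= xi <= hi          # inverted => always true
               for (lo, hi), xi in zip(intervals, x))

def oblique_test(coeffs, x):
    """Oblique rule: sum_i a_i * x_i + a_{d+1} > 0; a zero coefficient
    effectively removes that attribute from the antecedent."""
    *a, bias = coeffs
    return sum(ai * xi for ai, xi in zip(a, x)) + bias > 0

x = [0.3, 0.9]
print(interval_test([(0.0, 0.5), (1.0, 0.0)], x))  # 2nd attr "don't care" -> True
print(oblique_test([2.0, 0.0, -0.5], x))           # 0.6 + 0.0 - 0.5 > 0 -> True
```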
2) Learning scheme: This feature refers to the procedure by which the algorithm learns the rule set from a set of pre-classified examples.

Depending on how examples are provided, learning can be performed in an incremental mode (online method) or in a batch mode (offline method). The former scheme implies that the knowledge can be updated as new examples arrive at the system; Michigan approaches usually follow this. The latter model does not easily allow updates of the rule set once the algorithm is trained: if new information is available, the whole rule base should be retrained, often from scratch. Training starts with the whole training data set available. The internal operation of the algorithm may decide whether to use the whole data set in each iteration to build the model (as in the Pittsburgh approach), or to incrementally reduce it by deleting the examples already covered, as in those approaches following a "divide-and-conquer" strategy.

Learning can also happen in a supervised, unsupervised or reinforcement learning environment. In a supervised environment, the learning algorithm knows the class of the example. In reinforcement learning, the algorithm gets a reward after predicting the class of the example, and it should build its model based on the positive and negative (or absence of) rewards. Unsupervised classification covers the case where no external feedback is given to the learner; this case does not fall within the scope of the paper.

3) Fitness function: The assessment of the quality of the rule set is essential to evolve an appropriate model from examples. Since the search of the EA is guided by the fitness function, it must include the quality measures that reflect the type of rule set required. The optimal rule set is usually that with high accuracy and low complexity. Accuracy is usually considered as the percentage of correct classifications, although more sophisticated metrics may arise, such as those considering the percentage of correct classifications per class. Complexity is measured with the number of rules or the generalization of the individual rules. We must remark that these measures of quality are applied both at the rule set level and for the individual rules, depending on what the EA evolves.

Fitness may also include niching or sharing techniques, or token competition as in [58], which enable the adjustment of the global fitness and optimise the search towards different areas of the problem space by penalising those solutions/rules which are similar or which cover the same training examples.

4) Design of the EA: As mentioned before, all GBML algorithms have as a common characteristic the use of a search process based on an EA to learn the final rule set. They usually apply a GA, which can be generational or steady-state (a minimal sketch contrasting both replacement schemes is given at the end of this enumeration). In the former case, the population is replaced completely iteration by iteration, possibly with an elitism scheme in which the best individual(s) remain(s) in the new population. The steady-state model only modifies a subset of individuals (normally only two) during each iteration.

5) Specific operators: Each new proposal in the context of GBML algorithms not only implies the choice of representation, learning methodology and design of the EA scheme, but also the use of specific operators for the knowledge discovery. These usually represent the most significant novelty of the approach and accentuate the differences among the methods. These specific operators include the heuristics to initialise the population, the use of covering algorithms, and the crossover and mutation operators. Since they are particular to each algorithm, we will focus on this issue in each single description when needed.
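The difference between the two replacement schemes of item 4 can be sketched with a generic GA skeleton on a toy bit-string problem (hedged, illustrative code; `fitness`, `breed` and `select` are deliberately minimal stand-ins, not any surveyed system's operators):

```python
import random

# Minimal GA skeleton contrasting generational vs steady-state replacement
# on a toy bit-string maximisation problem (illustrative only).

def fitness(ind):                 # toy objective: number of ones
    return sum(ind)

def breed(p1, p2):                # one-point crossover + one bit-flip mutation
    cut = random.randrange(1, len(p1))
    child = p1[:cut] + p2[cut:]
    child[random.randrange(len(child))] ^= 1
    return child

def select(pop):                  # binary tournament selection
    return max(random.sample(pop, 2), key=fitness)

def generational_step(pop, elite=1):
    best = sorted(pop, key=fitness, reverse=True)[:elite]   # elitism
    rest = [breed(select(pop), select(pop)) for _ in range(len(pop) - elite)]
    return best + rest            # whole population replaced each iteration

def steady_state_step(pop):
    child = breed(select(pop), select(pop))
    worst = min(range(len(pop)), key=lambda i: fitness(pop[i]))
    pop[worst] = child            # only one individual replaced per iteration
    return pop

pop = [[random.randint(0, 1) for _ in range(20)] for _ in range(30)]
for _ in range(100):
    pop = generational_step(pop)  # or: pop = steady_state_step(pop)
print(max(map(fitness, pop)))
```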
C. Taxonomy Families and Algorithms' Description

The description of the methods used in this study is summarised in a table for each one of the proposed families, in which we detail the main characteristics with regard to the GBML features stated previously. Therefore, the reader can observe the similarities and differences among the algorithms of the same category, or between any two methods. Afterwards, we present a brief description of the intrinsic operators associated with each one of the algorithms, to provide the specific details of each of them.

1) Michigan approach: Michigan-style algorithms [34], [35] evolve a set of rules called "classifiers", which are incrementally updated through the sequential observation of training examples. The best "classifiers", which are those that correctly cover a higher proportion of training examples, are selected and propagated over other, less useful "classifiers", leading to an improvement in the overall system performance. Some of the first developments of Michigan-style GBML are SCS [59] and NewBoole [60].

In this paper we select XCS, as it is the most influential algorithm in this family of classifiers, and UCS, a system that specialises XCS for supervised learning. The features of both algorithms are summarised in Table II.

TABLE II
FEATURES OF THE MICHIGAN ALGORITHMS

Features                  | XCS                              | UCS
Learning Method           | Online, Reinforcement Learning   | Online, Supervised Learning
Type of Rule-Set          | IF-THEN                          | IF-THEN
Inference Type            | Weighted Voting                  | Weighted Voting
Type of Rule              | Conjunction of Atts.             | Conjunction of Atts.
Nominal Representation    | None: Transformation to          | None: Transformation to
                          | Ordered Numerical Values         | Ordered Numerical Values
Numerical Representation  | Interval-Rule                    | Interval-Rule
"Don't Care" Management   | Complete Interval                | Complete Interval
Fitness                   | Inverse of Prediction Error,     | Accuracy,
                          | Sharing                          | Sharing
EA Type                   | Steady-State GA                  | Steady-State GA
EA Features               | Roulette Wheel Selection,        | Roulette Wheel Selection,
                          | 2-Point Crossover,               | 2-Point Crossover,
                          | Random Mutation                  | Random Mutation
Chromosome Representation | Real Coding normalised to [0,1], | Real Coding normalised to [0,1],
                          | Fixed Length                     | Fixed Length

a) XCS: XCS [12], [61] was designed for reinforcement learning, although it can be used for pattern recognition by considering that a classification problem is a reinforcement problem in which maximum rewards are given to correct classifications and low rewards correspond to incorrect classifications.

Learning takes place incrementally. In each iteration, an example is provided to the system. Then, XCS finds the matching rules and randomly chooses one of the possible classes. If no rules match, covering is applied: a new rule is created with a condition generalized from the attribute values of the example, and a random class. Once the class is decided, a reward is received. This reward is used to update the quality of the rules that advocated that particular class. Eventually, the GA is applied to the rules that proposed the chosen class, as described in [62]. The GA selects two parents (two rules) based on their fitness, which is a measure of the accuracy of the reward prediction relative to the accuracies of the overlapping rules. The rules are crossed and mutated, and the final offspring are introduced into the population, deleting other rules if the population is full. The rules follow the interval rule representation, which was first introduced into XCS in [63].

b) UCS: UCS [15] is a method derived from XCS. It inherits the main features of XCS, but mainly differs in two respects. Firstly, the learning interaction is adjusted to a supervised learning scheme: each example shown to the system comes along with its class, and UCS uses the class information to build the set of correct rules. The rules that belong to this set get their fitness improved, while rules that matched the example but not the class get their fitness reduced. The second main difference with XCS is in fitness: in UCS, fitness is directly the proportion of correct predictions of the rule with respect to the total number of matches, whereas fitness in XCS is a windowed average of the accuracy of the reward prediction.
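As a rough illustration of this last difference (a sketch under simplifying assumptions; real UCS additionally raises this accuracy to a power and applies fitness sharing, both omitted here), UCS-style fitness for a single rule could be estimated as:

```python
# Sketch of UCS-style fitness for one rule (simplified; not the full method).

def ucs_fitness(rule_matches, rule_correct):
    """Proportion of correct predictions among all matches of the rule.

    rule_matches: number of training examples the rule's condition matched
    rule_correct: how many of those also carried the rule's class
    """
    return rule_correct / rule_matches if rule_matches else 0.0

print(ucs_fitness(rule_matches=40, rule_correct=34))  # -> 0.85
```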
2) Iterative Rule Learning approaches: IRL GBML systems use a divide-and-conquer methodology to create an ordered list of rules [13]. The system iteratively invokes an EA, where the best individual returned is added to the end of a list of rules and all the matching examples are removed from the training data set. This process is repeated until the training data set is empty.

In this case, we selected two algorithms: first, a classical approach, the SIA algorithm, and then a recent representative method of this type, HIDER. Their characteristics are shown in Table III.

TABLE III
FEATURES OF THE IRL APPROACHES

Features                  | SIA                              | HIDER
Learning Method           | Batch, Divide-and-conquer        | Batch, Divide-and-conquer
Type of Rule-Set          | IF-THEN                          | IF-THEN-ELSE
Inference Type            | Distance to the Nearest Rule     | Decision List
Type of Rule              | Conjunction of Atts.             | Conjunction of Atts.
Nominal Representation    | Operator "="                     | Operator "="
Numerical Representation  | Interval-Rule                    | Natural Coding Discretization
"Don't Care" Management   | Special Value                    | Complete Interval
Fitness                   | Accuracy and Complexity          | Accuracy and Coverage
EA Type                   | Steady-State GA                  | Generational GA
EA Features               | Random Selection,                | Roulette-Wheel Selection,
                          | Uniform Crossover,               | Specific Crossover and Mutation,
                          | Generalisation Operator          | Elitism
                          | (see description)                |
Chromosome Representation | Real and Binary Coding           | Natural Coding
                          | (see description), Fixed Length  | (see description), Fixed Length
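The IRL control loop itself is compact. A hedged sketch follows, where `run_ea` and `matches` are placeholders for whichever evolutionary search and rule matcher the concrete system (e.g., SIA or HIDER) uses; the toy demo at the end only exercises the loop's structure.

```python
from collections import Counter

# Sketch of the generic IRL (divide-and-conquer) loop; `run_ea` must return
# a rule that covers at least one remaining example, or the loop won't end.

def iterative_rule_learning(training_set, run_ea, matches):
    rule_list = []
    remaining = list(training_set)
    while remaining:
        best_rule = run_ea(remaining)        # EA returns one best rule
        rule_list.append(best_rule)          # appended to the decision list
        remaining = [ex for ex in remaining
                     if not matches(best_rule, ex)]  # covered examples removed
    return rule_list

# Toy demo: "rules" are just class labels; the "EA" returns the majority
# label of the remaining data, and a rule matches the examples of its label.
demo_data = ["a", "a", "b", "b", "b", "c"]
run_ea = lambda data: Counter(data).most_common(1)[0][0]
matches = lambda rule, ex: ex == rule
print(iterative_rule_learning(demo_data, run_ea, matches))  # ['b', 'a', 'c']
```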