Subgroup Discovery Toolkit

The subgroup discovery toolkit for RapidMiner implements two subgroup discovery algorithms, SD and CN2-SD, each as a separate operator:

SD

This algorithm, called SD [1], is a beam search algorithm that looks for the best rules by iteratively adding features to the rules currently in the beam. When a feature is added to a rule, three things are computed for the resulting rule: its support, its relevancy with respect to the other rules in the new beam, and its quality. If the support exceeds the user-defined minimal support, the rule is relevant with respect to the other rules in the beam, and the new beam contains a rule of lower quality, then the worst rule in the beam is replaced by the new rule.
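The beam update described above can be illustrated with a minimal Python sketch. It is not the toolkit's implementation, and several details are assumptions: examples are represented as sets of boolean features plus a target flag; rule quality is taken to be TP / (FP + g) with a hypothetical generalization parameter `g`; support is the fraction of target-class examples a rule covers; and a rule counts as irrelevant if some rule already in the beam covers all of its true positives and only a subset of its false positives. The names `sd`, `beam_width`, `min_support`, `g` and `max_len` are hypothetical.

```python
from typing import FrozenSet, List, Sequence, Tuple

# One example: the features (attribute-value tests) that hold for it,
# and whether it belongs to the target class.
Example = Tuple[FrozenSet[str], bool]
Rule = FrozenSet[str]  # conjunction of features; the empty rule covers everything


def coverage(rule: Rule, data: Sequence[Example]) -> Tuple[FrozenSet[int], FrozenSet[int]]:
    """Indices of covered target (TP) and non-target (FP) examples."""
    tp = frozenset(i for i, (feats, y) in enumerate(data) if y and rule <= feats)
    fp = frozenset(i for i, (feats, y) in enumerate(data) if not y and rule <= feats)
    return tp, fp


def quality(rule: Rule, data: Sequence[Example], g: float) -> float:
    """Assumed quality measure TP / (FP + g)."""
    tp, fp = coverage(rule, data)
    return len(tp) / (len(fp) + g)


def is_relevant(rule: Rule, beam: List[Rule], data: Sequence[Example]) -> bool:
    """Assumed relevancy test: irrelevant if another beam rule covers all of
    this rule's TP while its own FP are a subset of this rule's FP."""
    tp, fp = coverage(rule, data)
    for other in beam:
        o_tp, o_fp = coverage(other, data)
        if tp <= o_tp and o_fp <= fp:
            return False
    return True


def sd(data: Sequence[Example], features: Sequence[str], beam_width: int = 3,
       min_support: float = 0.1, g: float = 1.0, max_len: int = 3) -> List[Rule]:
    n_pos = sum(1 for _, y in data if y)
    if n_pos == 0:
        return []
    beam: List[Rule] = [frozenset()]  # start from the empty rule
    for _ in range(max_len):
        new_beam = list(beam)
        for rule in beam:
            for f in features:
                if f in rule:
                    continue
                cand = rule | {f}  # add one feature to a rule in the beam
                if cand in new_beam:
                    continue
                tp, _ = coverage(cand, data)
                if len(tp) / n_pos <= min_support:  # must exceed minimal support
                    continue
                if not is_relevant(cand, new_beam, data):
                    continue
                if len(new_beam) < beam_width:
                    new_beam.append(cand)
                    continue
                worst = min(new_beam, key=lambda r: quality(r, data, g))
                if quality(cand, data, g) > quality(worst, data, g):
                    new_beam.remove(worst)  # replace the worst rule
                    new_beam.append(cand)
        if set(new_beam) == set(beam):
            break  # beam did not change, so further refinement is pointless
        beam = new_beam
    # Report non-empty rules, best first.
    return sorted((r for r in beam if r), key=lambda r: quality(r, data, g), reverse=True)
```

On a toy data set where feature `a` holds for exactly the target examples, `sd(data, ["a", "b", "c"], min_support=0.3, max_len=2)` keeps only the rule `{a}`: refinements such as `{a, b}` are discarded as irrelevant because `{a}` already covers their true positives with no false positives.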

CN2-SD

This algorithm, called CN2-SD [2], is based on the CN2 algorithm. It repeatedly searches for the best complex (subgroup description) given the current example set and the rules found so far. After the best complex is found, the weights of the examples it covers are multiplicatively decreased, and each covered example's times-covered counter is incremented. An example whose counter exceeds a user-defined parameter is removed from the example set. The algorithm stops when no complex with a positive weighted relative accuracy computed on the example weights (wWRAcc) can be found.
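The weighted covering loop described above can be sketched in Python. This is an illustration under stated assumptions, not the toolkit's code. wWRAcc is assumed to take the form n'(Cond)/N' · (n'(Target·Cond)/n'(Cond) − n'(Target)/N'), where n' denotes sums of example weights. The hypothetical parameter `gamma` is the multiplicative weight factor, `max_covered` is the removal threshold on the times-covered counter, and, as a plainly swapped-in simplification, the best complex is found by exhaustively enumerating conjunctions up to `max_len` features instead of by CN2's beam search. Following the text above, the weights of all covered examples are decreased.

```python
from itertools import combinations
from typing import FrozenSet, List, Sequence, Tuple

Example = Tuple[FrozenSet[str], bool]  # (features that hold, is_target)
Complex = FrozenSet[str]  # conjunction of features


def wwracc(cpx: Complex, data: Sequence[Example], weights: Sequence[float]) -> float:
    """Assumed wWRAcc: n'(Cond)/N' * (n'(Target & Cond)/n'(Cond) - n'(Target)/N')."""
    total = sum(weights)
    pos = sum(w for (_, y), w in zip(data, weights) if y)
    cov = sum(w for (f, _), w in zip(data, weights) if cpx <= f)
    cov_pos = sum(w for (f, y), w in zip(data, weights) if y and cpx <= f)
    if total == 0 or cov == 0:
        return 0.0
    return cov / total * (cov_pos / cov - pos / total)


def cn2_sd(data: Sequence[Example], features: Sequence[str], gamma: float = 0.5,
           max_covered: int = 2, max_len: int = 2) -> List[Tuple[Complex, float]]:
    data = list(data)
    weights = [1.0] * len(data)
    times_covered = [0] * len(data)
    # Simplification: enumerate every complex up to max_len features
    # instead of running CN2's beam search.
    candidates = [frozenset(c) for k in range(1, max_len + 1)
                  for c in combinations(features, k)]
    rules: List[Tuple[Complex, float]] = []
    while data and candidates:
        best = max(candidates, key=lambda c: wwracc(c, data, weights))
        score = wwracc(best, data, weights)
        if score <= 0:
            break  # stopping criterion: no complex with positive wWRAcc
        rules.append((best, score))
        candidates.remove(best)  # do not report the same complex twice
        kept = []
        for (feats, y), w, t in zip(data, weights, times_covered):
            if best <= feats:
                w *= gamma  # multiplicative weight decrease
                t += 1      # increment times-covered counter
            if t <= max_covered:  # remove examples covered too often
                kept.append(((feats, y), w, t))
        data = [e for e, _, _ in kept]
        weights = [w for _, w, _ in kept]
        times_covered = [t for _, _, t in kept]
    return rules
```

Because covered examples are down-weighted rather than deleted immediately, later iterations can still return complexes that overlap earlier ones, which is what lets the weighted covering loop report several overlapping subgroups.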

References

  1. Gamberger, Dragan and Lavrac, Nada (2002) Expert-guided subgroup discovery: methodology and application. Journal of Artificial Intelligence Research, 17:501-527.
  2. Lavrac, Nada et al. (2004) Subgroup discovery with CN2-SD. Journal of Machine Learning Research, 5:153-188.

The software, along with its installation procedure and usage instructions, is available on the following website:
http://kt.ijs.si/petra_kralj/SubgroupDiscovery/rm.html

The Subgroup Discovery Toolkit can also be downloaded from the Rapid-I Marketplace from within RapidMiner.

An example workflow that uses the subgroup discovery operators is available on the myExperiment portal:
http://www.myexperiment.org/workflows/1752.html