Data Characteristics Tool

The data characteristics tool is used by the Meta Miner to extract four types of data characteristics:

  1. statistical measures: number of instances, number of classes, proportion of missing values, proportion of continuous / categorical features, noise-to-signal ratio.
  2. information-theoretic measures: class entropy, mutual information [1].
  3. geometrical and topological measures: non-linearity, volume of overlap region, maximum Fisher’s discriminant ratio, fraction of instances on the class boundary, ratio of average intra/inter class nearest neighbour distance [2].
  4. model-based measures: error rates and pairwise 1 − p values obtained by landmarkers such as 1NN or DecisionStump [3], and histogram weights learned by Relief or SVM.
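
To make a few of the measures listed above concrete, here is a minimal Python sketch of one statistical measure (proportion of missing values), two information-theoretic measures (class entropy and mutual information between a categorical feature and the class), and one model-based measure (the error rate of a 1NN landmarker). It is an illustration only, not the tool's actual implementation: the function names are hypothetical, missing values are assumed to be encoded as None, entropies are assumed to be in bits, and the 1NN error is assumed to be estimated by leave-one-out with Euclidean distance on numeric features.

```python
import math
from collections import Counter


def missing_proportion(rows):
    """Fraction of cells that are missing (assumed to be encoded as None)."""
    cells = [value for row in rows for value in row]
    return sum(value is None for value in cells) / len(cells)


def class_entropy(labels):
    """Shannon entropy, in bits, of the class distribution."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def mutual_information(feature, labels):
    """Mutual information, in bits, between a categorical feature and the class."""
    n = len(labels)
    joint = Counter(zip(feature, labels))  # counts of (value, class) pairs
    p_x = Counter(feature)                 # counts of feature values
    p_y = Counter(labels)                  # counts of classes
    return sum(
        (c / n) * math.log2((c / n) / ((p_x[x] / n) * (p_y[y] / n)))
        for (x, y), c in joint.items()
    )


def one_nn_error(points, labels):
    """Leave-one-out error rate of a 1-nearest-neighbour landmarker."""
    errors = 0
    for i, p in enumerate(points):
        # Index of the closest other point (squared Euclidean distance).
        nearest = min(
            (k for k in range(len(points)) if k != i),
            key=lambda k: sum((a - b) ** 2 for a, b in zip(p, points[k])),
        )
        errors += labels[nearest] != labels[i]
    return errors / len(points)


if __name__ == "__main__":
    labels = ["a", "a", "b", "b"]
    colour = ["red", "red", "blue", "blue"]      # fully determines the class
    points = [(0.0, 0.0), (0.0, 1.0), (5.0, 5.0), (5.0, 6.0)]
    print("class entropy:", class_entropy(labels))
    print("mutual information:", mutual_information(colour, labels))
    print("1NN error:", one_nn_error(points, labels))
```

On a balanced two-class dataset the class entropy is 1 bit, and a feature that fully determines the class carries the same amount of mutual information; a feature that is independent of the class carries none.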

 

[1] http://www.metal-kdd.org

[2] Ho, T. K., & Basu, M. (2006). Data complexity in pattern recognition. Springer.

[3] Pfahringer, B., Bensusan, H., & Giraud-Carrier, C. (2000). Meta-learning by landmarking various learning algorithms. Proc. 17th International Conference on Machine Learning, 743–750.