# Data Characteristics Tool

The data characteristics tool is used by the Meta Miner to extract several types of data characteristics:

- *statistical measures*: number of instances, number of classes, proportion of missing values, proportion of continuous / categorical features, noise-signal ratio.
- *information-theoretic measures*: class entropy, mutual information [1].
- *geometrical and topological measures*: non-linearity, volume of overlap region, maximum Fisher’s discriminant ratio, fraction of instances on the class boundary, ratio of average intra/inter class nearest neighbour distance [2].
- *model-based measures*: error rates and pairwise 1 − p values obtained by landmarkers such as 1NN or DecisionStump [3], and histogram weights learned by Relief or SVM.
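
To make a few of these measures concrete, the following is a minimal Python sketch of some statistical and information-theoretic characteristics computed on a toy dataset. It is not the tool's actual implementation or API: the function names (`basic_characteristics`, `class_entropy`, `mutual_information`), the use of `None` to mark missing values, and the toy data are all assumptions made for illustration, and mutual information is computed here for a single categorical feature only.

```python
import math
from collections import Counter


def basic_characteristics(rows, labels):
    """Simple statistical measures; None marks a missing value (assumption)."""
    n_cells = sum(len(r) for r in rows)
    n_missing = sum(v is None for r in rows for v in r)
    return {
        "n_instances": len(rows),
        "n_classes": len(set(labels)),
        "missing_ratio": n_missing / n_cells if n_cells else 0.0,
    }


def class_entropy(labels):
    """Shannon entropy (in bits) of the class label distribution."""
    n = len(labels)
    counts = Counter(labels)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())


def mutual_information(feature, labels):
    """Mutual information (in bits) between a categorical feature and the class."""
    n = len(labels)
    joint = Counter(zip(feature, labels))  # joint counts of (value, class)
    px = Counter(feature)                  # marginal counts of feature values
    py = Counter(labels)                   # marginal counts of classes
    return sum(
        (c / n) * math.log2((c / n) / ((px[x] / n) * (py[y] / n)))
        for (x, y), c in joint.items()
    )


# Hypothetical toy dataset: one categorical and one continuous feature.
rows = [["a", 1.0], ["a", None], ["b", 2.5], ["b", 3.0]]
labels = ["yes", "yes", "no", "no"]

stats = basic_characteristics(rows, labels)
h = class_entropy(labels)
mi = mutual_information([r[0] for r in rows], labels)
print(stats, h, mi)
```

In this toy example the categorical feature determines the class completely, so its mutual information with the class equals the class entropy.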

[1] http://www.metal-kdd.org

[2] Ho, T. K., & Basu, M. (2006). *Data complexity in pattern recognition*. Springer.

[3] Pfahringer, B., Bensusan, H., & Giraud-Carrier, C. (2000). Meta-learning by landmarking various learning algorithms. *Proc. 17th International Conference on Machine Learning*, 743–750.