Benchmarking

Overview

[Figure: Benchmark experiment visualisations]

In statistical learning, benchmarking is the methodology of comparing learners or algorithms with respect to a chosen performance measure. New benchmark experiments are published almost daily. In the machine learning community in particular, benchmarking is the primary method for evaluating new learning algorithms.

Abstractly, the benchmarking process consists of three levels: Setup, Execution, and Analysis. Different statistical and computational aspects play a role at each level. The aim of our work is to investigate each level in detail and to provide a statistically sound way of comparing learning methods with the smallest possible computational effort.

In addition to theoretical investigations, we are developing an open-source reference implementation in R, an environment for statistical computing and graphics. The implementation will make it easy to run reproducible comparisons of learning algorithms, whether existing or newly developed. The ultimate goal is to establish this toolbox as a quasi-standard for comparing learning methods.
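
To make the three levels (Setup, Execution, Analysis) more concrete, here is a minimal sketch in Python. It is not the project's R implementation. The toy data, the two learners, the bootstrap/out-of-bag scheme, and all function names are hypothetical choices made only for illustration.

```python
import random
import statistics

# --- Setup: data, candidate learners, performance measure ---
def make_data(n, rng):
    """Hypothetical toy regression data: y = 2x + Gaussian noise."""
    xs = [rng.uniform(0, 10) for _ in range(n)]
    return [(x, 2.0 * x + rng.gauss(0, 1)) for x in xs]

def fit_mean(train):
    """Baseline learner: always predicts the training mean of y."""
    m = statistics.fmean(y for _, y in train)
    return lambda x: m

def fit_linear(train):
    """Least-squares line fitted to the training data."""
    mx = statistics.fmean(x for x, _ in train)
    my = statistics.fmean(y for _, y in train)
    sxx = sum((x - mx) ** 2 for x, _ in train)
    sxy = sum((x - mx) * (y - my) for x, y in train)
    slope = sxy / sxx
    return lambda x: my + slope * (x - mx)

def mse(model, test):
    """Performance measure: mean squared error on the test set."""
    return statistics.fmean((model(x) - y) ** 2 for x, y in test)

LEARNERS = {"mean": fit_mean, "linear": fit_linear}

# --- Execution: fit every learner on the same bootstrap samples ---
def run_benchmark(data, n_rep, rng):
    results = {name: [] for name in LEARNERS}
    n = len(data)
    for _ in range(n_rep):
        idx = [rng.randrange(n) for _ in range(n)]   # bootstrap sample
        in_bag = set(idx)
        train = [data[i] for i in idx]
        test = [data[i] for i in range(n) if i not in in_bag]  # out-of-bag
        if not test:
            continue  # skip the (extremely unlikely) empty test set
        for name, fit in LEARNERS.items():
            results[name].append(mse(fit(train), test))
    return results

# --- Analysis: summarise and compare the paired performances ---
def summarize(results):
    return {name: (statistics.fmean(v), statistics.stdev(v))
            for name, v in results.items()}

rng = random.Random(1)  # fixed seed so the experiment is reproducible
data = make_data(100, rng)
results = run_benchmark(data, n_rep=50, rng=rng)
summary = summarize(results)
# Paired differences: both learners saw identical train/test splits.
diffs = [a - b for a, b in zip(results["mean"], results["linear"])]

for name, (avg, sd) in summary.items():
    print(f"{name}: mean MSE = {avg:.3f}, sd = {sd:.3f}")
print("replications where linear beats mean:", sum(d > 0 for d in diffs))
```

Because every learner is evaluated on the same resampled splits, the performance values are paired. A formal analysis could therefore work on the differences `diffs` rather than on the two samples separately.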

LMU Project Members

Resources

Selected publications

  • Exploratory and Inferential Analysis of Benchmark Experiments. Manuel J. A. Eugster, Torsten Hothorn, and Friedrich Leisch. Technical Report 30, Institut für Statistik, Ludwig-Maximilians-Universität München, Germany, 2008.
  • Bench Plot and Mixed Effects Models: First steps toward a comprehensive benchmark analysis toolbox. Manuel J. A. Eugster and Friedrich Leisch. In Paula Brito, editor, Compstat 2008 - Proceedings in Computational Statistics, pages 299-306. Physica Verlag, Heidelberg, Germany, 2008.