Center for Empirical Studies (CEST): Data Analysis, Modelling & Knowledge Discovery in Social Sciences, Economics and Humanities
The Center for Empirical Studies is a research initiative linking empirical and methodological research groups from several faculties. Methodological challenges such as modeling unobservable heterogeneity or measuring latent traits recur in similar form in, for example, economic, sociological and psychological applications. The aim of the Center for Empirical Studies is therefore to enhance the explanatory power of empirical studies by means of new methodological developments. The initiative is organized in three interacting areas: Statistical Learning, Data Mining & Knowledge Discovery; Measurement & Evaluation; and Dynamic Modeling. |
| Coordinators: Margret Oelker, Gerhard Tutz |
Analysis and Modelling of Complex Systems in Biology and Medicine
The collaborative centre Analysis and Modelling of Complex Systems in Biology and Medicine is part of the LMUinnovativ initiative. Fundamental research in the life sciences, technological innovations, and the continuing increase in the understanding of complex biological and biomedical systems generate huge and challenging amounts of data that need to be managed, analyzed, modelled and put into conceptual frameworks. The general aim of this centre is to develop novel methodology motivated by challenging substantive problems in such complex systems and to apply it in collaboration with colleagues from biology and medicine. The centre will constitute the computational, mathematical and statistical backbone for interdisciplinary research in biology and medicine, in particular in molecular life sciences at the LMU. It provides high-performance resources from computer science, statistics, mathematics and physics to collaborators in the natural sciences, initiating a core facility for a Centre of Quantitative Methods in a School of Science. In close interaction with the experimental life sciences, this will contribute to novel concepts for investigating complex biological and medical systems, with a high potential to revolutionise our view and practice of biomedical research and its applications. |
|
Coordinator:
Ludwig Fahrmeir
Principal Investigators: Torsten Hothorn, Friedrich Leisch, Gerhard Tutz |
"Munich Centre for Health Sciences" (MC-Health)
Apart from causes of and connections between health and disease, the project aims at investigating structures, features and processes of health care. All investigations take economic aspects into account. Methodologically, the focus is on interdisciplinary, quantitative and empirical research. The research efforts are concentrated on three major fields:
|
| Ludwig Fahrmeir, Nora Fenske, Michael Höhle |
"Empirical Speech and Language Processing"
The Doctoral Program Empirical Speech and Language Processing (ESP) is at the intersection of the disciplines Computational Linguistics, Computational Statistics, Information Science, and Phonetics and Speech Processing. A characteristic of these disciplines is their increasingly empirical approach to research, and the goal of the ESP is to make use of the synergies between these disciplines and to extend them. ESP is conceived as a research-based and systematically structured training program. Doctoral students will have many opportunities to present and discuss their research in colloquia, thematic workshops and symposia with invited researchers. Internationally acclaimed scientists will contribute to the program, which will draw upon extensive speech and language corpora, well-established infrastructural support and a wide range of research projects in both basic and applied research. |
| Friedrich Leisch |
Analysis and Modelling of Dynamic BioImages with Bayesian Spatial Statistics
Dynamic images, i.e., 2D or 3D images acquired over time, are used in biology and medicine to capture rapid kinetic processes in organisms in vivo and can be obtained with a variety of technologies, including fluorescence microscopy and magnetic resonance imaging (MRI). From a statistical point of view, dynamic images share a similar data structure regardless of the modality with which they were acquired. The signal time curve in each voxel can be described by kinetic models based on the biological processes in the organism. Biological models for dynamic images are often oversimplified to ease parameter fitting. The aim of this project is to develop and to apply advanced spatial statistical models for the analysis of dynamic images. We use Bayesian inference to allow for robust parameter estimation in more realistic biological models. Criteria for the choice between competing local kinetic models will be developed. In contrast to existing model choice approaches, we will account for the fact that local kinetic time curves are not independent. By using spatial prior information, more robust estimators and, additionally, information about the spatial structure will be obtained. In addition, we will develop methods to analyze multiple dynamic images simultaneously. The development of statistical methodology will mainly be driven by problems in two applications of dynamic images, Fluorescence Recovery After Photobleaching (FRAP) and Dynamic Contrast-Enhanced Magnetic Resonance Imaging (DCE-MRI). |
|
Principal Investigator:
Volker Schmid
Staff: Julia Kärcher, Martina Feilke |
Methods to Account for Subject-Covariates in IRT-Models
Item response theory (IRT) comprises a variety of statistical models that link the latent traits of subjects to their reactions to test items or stimuli. One example is the application of the Rasch model to measure latent abilities, where the parameters of all persons and items are represented on a common scale. However, since a common scale often cannot be assumed for different groups of subjects, several methods, including latent-class approaches, have been suggested for incorporating observable subject covariates in the model. The existing approaches have some drawbacks, though: in many cases the information available in the covariates is not fully utilized. Moreover, complex parametric models are hard to interpret for the vast majority of applied scientists. Therefore, the aim of this research project is to develop a flexible yet easy-to-handle range of methods that makes it possible to incorporate subject covariates of all kinds, alone and in combination with latent-class approaches, in a variety of IRT models. Application areas of these methods in psychology and empirical education research include the exploratory modeling of heterogeneity as well as the hypothesis-driven application as a test for the validity of a common model. |
| Principal Investigator: Carolin Strobl |
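For orientation, the dichotomous Rasch model mentioned in the project description above can be written as follows. This is the standard textbook form, not a formula taken from the project itself; the symbols $\theta_p$ (ability of person $p$), $\beta_i$ (difficulty of item $i$) and $X_{pi}$ (response of person $p$ to item $i$) are notation assumed here.

```latex
P(X_{pi} = 1 \mid \theta_p, \beta_i)
  = \frac{\exp(\theta_p - \beta_i)}{1 + \exp(\theta_p - \beta_i)}
```

Because the response probability depends only on the difference $\theta_p - \beta_i$, persons and items are located on one common scale. If the item difficulties differ between groups of subjects, that common scale no longer holds.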
Bayesian Regularisation in Regression Models with High-dimensional Predictors
Regression models with high-dimensional predictors arise in many statistical fields, such as non- and semiparametric regression models based on splines or wavelets and spatio-temporal statistical approaches, but also more generally in bio- and information-technology. In such models the number of parameters to be estimated is typically large compared to the sample size and the resulting inverse problems therefore require some kind of regularisation. In this project, modern Bayesian approaches for regularisation and model choice in regression models with high-dimensional predictors will be investigated. Suitable prior distributions with the desired selection and adaptivity properties will be formulated and inferential procedures will be developed based on MCMC simulation techniques for a large and flexible class of regression models. The developed methodology will be applied in collaboration with researchers from biostatistics and the life sciences. |
|
Principal investigators:
Ludwig Fahrmeir,
Thomas Kneib
Staff: Susanne Konrath, Fabian Scheipl |
Ensemble-methods for the Improvement of Regression-models when the Response is Continuous or Censored
Modelling censored response variables based on potentially high-dimensional covariates is essential for the analysis of studies where the primary endpoint is a time interval. Modern optimization techniques can help to fit such models. The primary aim of the project is the application and investigation of Boosting techniques for fitting parametric and non-parametric survival models. Classical linear models as well as additive and flexible variants are to be formulated in a general framework. In particular, applying linear models to high-dimensional covariate spaces while exploiting the inherent variable selection property of Boosting algorithms is of great practical interest. Both the theoretical and practical properties of the suggested procedures are being investigated, for example in a prognostic factor study on rectal carcinoma survival. |
| Principal investigators:
Torsten Hothorn,
Olaf Gefeller
Staff: Matthias Schmid |
Model Based Feature Extraction and Regularisation in High-dimensional Structures
Feature extraction aims at detecting influential structures or patterns in data. The focus of the project is on model-based feature selection methods, where the predictor space is linked to the target criterion by parametric or semiparametric models and features are extracted with reference to the modelling approaches. The supervised learning techniques that are considered explicitly use the target criterion in the feature selection process. This is in contrast to widely used two-step approaches, where unsupervised learning is applied to extract features in the first step and the features are linked to the target only in the second step. The type of model used depends on the data structure and the objective of modelling. One area of investigation is functional data, where predictors are given as signals. Feature extraction then makes use of the information content in the underlying metric space. These spatial methods tend to show better performance than equivariant methods, where no ordering of predictors is used. For predictors without ordering, the focus is on the selection of variables when groups of highly correlated variables and different types of variables are present. |
| Principal investigator:
Gerhard Tutz
Staff: Jan Gertheiss |
Combining individual and central site measurements of ultrafine particles
Exposure assessment studies have shown that centrally measured particulate air pollution may not adequately represent the personal exposure of individuals. Using centrally measured data as a surrogate for personal exposure can result in systematic errors in the effect estimates and thus diminish the power of the relevant statistical tests. This research project aims to improve the estimates by combining individual and central site measurements.
|
| Principal investigators:
Helmut Küchenhoff,
Annette Peters (Helmholtz-Zentrum),
Josef Cyrys (Universität Augsburg)
Staff: Veronika Fensterer, Verena Maier |
Biclustering
Biclustering (simultaneous clustering of rows and columns) is an important new technique in two-way data analysis. Though the idea has been around for 30 years, there has been a huge development in algorithms since 2000. Many of the proposed algorithms deal with different kinds of bicluster problems, differing especially in the expected outcome or structure. The challenge is therefore not only to find the best algorithm, but also to know which algorithm should be used under which conditions. Our work thus pursues two goals: to develop a general framework for model-based biclustering and to benchmark popular algorithms. |
|
Principal investigator:
Friedrich Leisch
Staff: Sebastian Kaiser |
Estimating Deer-browsing Intensities Utilizing Spatio-temporal Information
In most parts of Germany, the natural or artificial regeneration of forests is difficult due to a high browsing intensity. Young trees suffer from browsing damage, mostly by roe and red deer. In order to estimate the browsing intensity for several tree species, the Bavarian State Ministry of Agriculture and Forestry conducts a survey every three years. The primary aim of this project is to model and predict the probability of deer browsing in Bavaria. Based on geostatistical regression models, a smooth probability function representing the deer browsing intensity in all deer management districts in Bavaria will be derived from data gathered in both the 2006 and 2009 surveys. This model will allow for the identification of areas with high browsing intensities for at least one tree species. This information is valuable for the implementation or modification of deer management plans. |
| Principal investigators:
Torsten Hothorn,
Thomas Knoke
Staff: Jan Ulbricht |