Research Seminar Series on Foundations of Statistics

We discuss a wide range of topics related to the foundations of statistics, such as reasoning and decision making under uncertainty, or theories and applications of imprecise probability.

We meet on Wednesdays at 18:30 in room 245 (Alte Bibliothek) of the Department of Statistics (Ludwigstraße 33, 80539 München). Anyone interested is welcome to attend. Contact Marco Cattaneo for more information.

Tentative Program:

Date / Talk
2 May 2012 Julia Kopf (LMU München): Heterogenität in IRT-Modellen (Heterogeneity in IRT Models)
Many empirical educational studies aim to measure latent traits such as reading competence. For this purpose, statistical models from item response theory (IRT), such as the Rasch model, are used. A central assumption of these models is the invariance of the item parameters. If this assumption is violated, so-called differential item functioning (DIF) is present: even at the same ability level, groups of respondents have different probabilities of solving individual tasks (items). When different groups of persons are compared (e.g. by gender or native language), this can lead to seriously wrong conclusions. The talk presents challenges in the analysis of DIF in the Rasch model and first approaches to solving them.
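A minimal sketch of what DIF means in the Rasch model, where the probability of solving an item depends only on the difference between person ability and item difficulty. The function name and all numerical values are illustrative assumptions and are not taken from the talk.

```python
import math


def rasch_probability(ability: float, difficulty: float) -> float:
    """Rasch model: P(correct) = exp(a - b) / (1 + exp(a - b))."""
    return 1.0 / (1.0 + math.exp(-(ability - difficulty)))


# Two respondents with the same ability (hypothetical value).
ability = 0.5

# Without DIF the item has the same difficulty in both groups.
p_reference = rasch_probability(ability, difficulty=0.0)

# With DIF the same item is harder for the focal group
# (hypothetical shift of one logit).
p_focal = rasch_probability(ability, difficulty=1.0)

print(f"reference group: {p_reference:.3f}, focal group: {p_focal:.3f}")
```

Although both respondents have the same ability, their solution probabilities differ, which is the situation that biases group comparisons.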
16 May 2012 Roland Pöllinger (LMU München): Newcomb's Paradox: Wissen ordnen und erschließen in hybriden Netzen (Newcomb's Paradox: Organizing and Accessing Knowledge in Hybrid Networks)
Referring back to the physicist William NEWCOMB, Robert NOZICK (1969) presents what he calls Newcomb's Problem, a decision-theoretic dilemma in which two principles of rational judgment appear to conflict, at least in a large part of the relevant literature across statistics and philosophy: the Principle of Dominance and the Principle of Maximum Expected Utility recommend diverging strategies in the game situation of the thought experiment. While proponents of Evidential Decision Theory (EDT) seem to be divided on the strategy to apply and on the basic interpretation of both, the literature on Causal Decision Theory (CDT) mostly tends towards the solution that is also recommended by Dominance ("two-boxing").
In this talk I would like to explain the modelling of the paradox in Bayesian causal models as defined by PEARL (1995 or 2000/2009) and used by Wolfgang SPOHN ("Reversing 30 Years of Discussion: Why Causal Decision Theorists Should One-Box") and by MEEK & GLYMOUR (1994) to analyse Newcomb's Problem. In response to these approaches, in the second part of my discussion I would like to present my proposed solution in Causal Knowledge Patterns (an extension of the Bayes net framework with intensional information bridges), in order to finally arrive, closer to intuition and to the original formulation of NOZICK's story, at the solution of "one-boxing".
Keywords: (causal/evidential) decision theory, causal reasoning, epistemic causation, formal epistemology, Bayes nets, interventionist account of causation
References:
Lewis, D.: Prisoners' Dilemma is a Newcomb Problem. Philosophy & Public Affairs, Blackwell Publishing, 1979, 8, 235–240
Meek, C. & Glymour, C.: Conditioning and intervening. British Journal for the Philosophy of Science, 1994, 45 (4), 1001–1021
Nozick, R.: Newcomb's Problem and Two Principles of Choice. In: Rescher, N. (Ed.), Essays in Honor of Carl G. Hempel, Dordrecht: Reidel, 1969, 114–146
Pearl, J.: Causality: Models, Reasoning, and Inference. Cambridge University Press, 2009
Pearl, J.: Causal diagrams for empirical research. Biometrika, 1995, 82, 669–688
Spohn, W.: Reversing 30 Years of Discussion: Why Causal Decision Theorists Should One-Box. Synthese, (forthcoming)

Program (Wintersemester 2011/12, Thursdays at 18:15):

Date / Talk
20 October 2011 Gero Walter (LMU München): On shapes of parameter sets defining sets of conjugate priors in generalized Bayesian inference
Imprecise Bayesian inference aims to generalize and robustify standard Bayesian inference by considering sets of priors instead of a single prior distribution to model (possibly vague) prior information.
Due to tractability requirements, one often resorts to conjugate priors, where the posterior is from the same distribution family as the prior, and so the update step from prior to posterior distribution can be characterized by the change of parameter values.
A central benefit of the imprecise Bayesian approach is that it is able to mirror the quality, or precision, of prior information by the magnitude of the set of priors. Perfect probabilistic knowledge yields a precise prior, whereas vague knowledge can be expressed by a large set of priors. This carries forward to the posterior set, which is reduced in magnitude as more and more data points are used for updating. This generally desirable behaviour should, however, hold only if prior and data information are in accordance (or the prior is overridden by vast amounts of data). Whenever there is a situation of "prior-data conflict" instead, this should inflate the posterior set as compared to the non-conflicting case, thus signalling conflict by giving more cautious posterior inferences.
In imprecise Bayesian inference with conjugate priors as presented in Walley (1991, § 5.4.3) for Bernoulli data, and in the generalization to data from exponential family distributions in Walter and Augustin (2009), the set of priors is characterized by an interval for the pseudocounts parameter n and an interval for the main interest parameter y (for y one-dimensional) of the conjugate prior. This makes the description of the prior set very easy, and leads to simple updating rules with respect to prior-data conflict. Such rectangular prior sets may, however, for several reasons, not be a good representation of prior beliefs, as this set shape poses considerable constraints on the set of priors. I would like to point to several issues arising with rectangular prior sets and explore some ideas about more flexible descriptions of parameter sets.
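The behaviour of a rectangular prior set under updating can be sketched for Bernoulli data. The sketch assumes the standard conjugate update in the (n, y) parameterization of the Beta prior, y_post = (n0 * y0 + s) / (n0 + N), where s is the number of successes in N trials. The function name and the numerical intervals are hypothetical and chosen only for illustration.

```python
from itertools import product


def posterior_y_bounds(n0_bounds, y0_bounds, successes, trials):
    """Bounds of the updated parameter y over a rectangular prior set.

    Uses y_post = (n0 * y0 + successes) / (n0 + trials). For fixed data
    this is monotone in n0 and in y0 separately, so the extremes over
    the rectangle are attained at its four corners.
    """
    values = [
        (n0 * y0 + successes) / (n0 + trials)
        for n0, y0 in product(n0_bounds, y0_bounds)
    ]
    return min(values), max(values)


# Hypothetical prior set: n0 in [1, 5], y0 in [0.7, 0.8].
n0_bounds, y0_bounds = (1.0, 5.0), (0.7, 0.8)

# Data in agreement with the prior (15 successes in 20 trials) ...
agree = posterior_y_bounds(n0_bounds, y0_bounds, successes=15, trials=20)
# ... and data in conflict with it (0 successes in 20 trials).
conflict = posterior_y_bounds(n0_bounds, y0_bounds, successes=0, trials=20)

print("data agree with prior:", agree)
print("prior-data conflict:  ", conflict)
```

With the interval for n0, the posterior interval for y is wider under conflict than under agreement, which is the kind of reaction described above; with a single fixed n0 both intervals would have the same width.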
17 November 2011 Marco Cattaneo (LMU München): Robust regression with imprecise data
We consider the problem of regression analysis with imprecise data (meaning imprecise observations of precise quantities in the form of sets of values). Without distributional assumptions, a likelihood-based approach to this problem leads to a very robust regression method (which can be interpreted as a generalization of the method of least median of squares). We compare this method with other approaches to regression with imprecise data, and apply it to data from a social survey.
Technical report: Robust regression with imprecise data
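For orientation, the classical least median of squares (LMS) method mentioned in the abstract can be sketched for precise data. This is not the likelihood-based method of the talk, only the precise-data special case it is said to generalize. The search over all lines through two data points is a common approximation rather than an exact LMS solver, and the data set is invented.

```python
from itertools import combinations
from statistics import median


def least_median_of_squares(points):
    """Approximate LMS line by searching all lines through two data points.

    Returns (criterion, slope, intercept), where criterion is the median
    of the squared residuals of the selected line.
    """
    best = None
    for (x1, y1), (x2, y2) in combinations(points, 2):
        if x1 == x2:
            continue  # skip vertical lines
        slope = (y2 - y1) / (x2 - x1)
        intercept = y1 - slope * x1
        criterion = median(
            (y - (intercept + slope * x)) ** 2 for x, y in points
        )
        if best is None or criterion < best[0]:
            best = (criterion, slope, intercept)
    return best


# Five points on y = 2x + 1 plus two gross outliers (hypothetical data).
points = [(0, 1), (1, 3), (2, 5), (3, 7), (4, 9), (5, 40), (6, -20)]
criterion, slope, intercept = least_median_of_squares(points)
print(slope, intercept, criterion)
```

Because the criterion is a median rather than a sum, the two outliers do not pull the fitted line away from the majority of the data.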
24 November 2011 Bernhard Haller (TU München): Regression models for failure time data in the presence of competing risks
In the analysis of failure time data, the time to one particular event out of many possible events may be of interest, e.g. in clinical research or engineering. In the presence of so-called competing risks, the joint distribution of times to different types of event cannot be estimated from the observed data without making unverifiable assumptions, since only the time to the first event can be observed for each subject. Furthermore, standard failure time methods such as the "naïve" Kaplan-Meier estimator, which treats competing events as censored observations, lead to biased results, and relationships between hazard rates and event probabilities known from classical survival analysis no longer hold.
In the last three decades a variety of measures and methods for the description and analysis of competing risks data have been introduced. In my talk I will present the most common measures used in the analysis of competing risks data and show pitfalls and problems in the analysis of failure time data with mutually exclusive types of event. I will focus on different regression modelling approaches proposed in the statistical literature. In the presence of competing risks, assessment of covariate effects is not straightforward. On the one hand, regression modelling approaches using different versions of hazard rates as dependent variables were introduced (Prentice et al., 1978; Fine and Gray, 1999); on the other hand, approaches based on factorizations of the joint distribution of event times and types of event were proposed (Larson and Dinse, 1985; Nicolaie et al., 2010). I will discuss the different approaches, focusing on assumptions, applicability and interpretation of the results. All measures and models will be illustrated using data from clinical practice.
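The bias of the naïve Kaplan-Meier approach can be illustrated with a small sketch that compares one minus the naïve Kaplan-Meier estimate with the nonparametric cumulative incidence estimate (Aalen-Johansen type). The function name, the event coding, and the toy data are assumptions made for illustration; they are not taken from the talk.

```python
def naive_km_and_cif(data, cause, horizon):
    """Compare 1 - naive Kaplan-Meier with the cumulative incidence.

    data: list of (time, event), event 0 = censored, 1, 2, ... = cause.
    Returns (1 - naive KM, cumulative incidence) for `cause` at `horizon`.
    """
    event_times = sorted({t for t, e in data if e != 0 and t <= horizon})
    km_naive = 1.0  # KM treating competing events as censored
    surv_all = 1.0  # overall event-free survival just before time t
    cif = 0.0       # cumulative incidence of the cause of interest
    for t in event_times:
        at_risk = sum(1 for s, _ in data if s >= t)
        d_cause = sum(1 for s, e in data if s == t and e == cause)
        d_any = sum(1 for s, e in data if s == t and e != 0)
        cif += surv_all * d_cause / at_risk
        km_naive *= 1.0 - d_cause / at_risk
        surv_all *= 1.0 - d_any / at_risk
    return 1.0 - km_naive, cif


# Hypothetical data: causes 1 and 2 compete, one subject is censored.
data = [(1, 1), (2, 2), (3, 1), (4, 2), (5, 0), (6, 1)]
naive, cif = naive_km_and_cif(data, cause=1, horizon=6)
print(f"1 - naive KM: {naive:.3f}, cumulative incidence: {cif:.3f}")
```

In this toy data set the naïve estimate reaches 1 for cause 1, although a third of the subjects fail from cause 2, whereas the cumulative incidence stays at 2/3.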
1 December 2011 Petra Wolf (TU München): Predictive accuracy in survival analysis: The ROC curve and related measures
Receiver Operating Characteristic (ROC) curves are widely used to evaluate and compare diagnostic tests in case-control studies. To evaluate a prognostic marker in survival analysis, the concept of ROC curves has to be extended. Heagerty (2000) and Heagerty and Zheng (2005) introduced a definition of time-dependent sensitivity and specificity. Following this approach, I will present an enhanced method to calculate ROC curves in the setting of censored data.
Furthermore, I will compare the ROC methodology with other measures of predictive accuracy in survival analysis: besides the classical concept of using the area under the ROC curve (AUC) as a measure of predictive accuracy, there exist related methods for comparing prognostic markers, such as the C-index and a newer proposal, the integrated discrimination index (IDI). In my talk I will show some similarities and also differences between these concepts.
8 December 2011 Manuel Eugster (LMU München): Reproduzierbare Forschung: Wieso? Weshalb? Warum? Und Wie? (Reproducible Research: Why? What For? And How?)
Reproducibility, in science, means the repeatability of experiments, analyses and results. Reproducibility also plays a major role in the computational sciences. Although reproducibility "feels easy to achieve" (make data and source code available), only very few publications are fully reproducible in all steps.
In this talk I would like to discuss this state of affairs. I set out the benefits of reproducibility and present examples of how it improves research on a small scale (one's own research) and on a large scale (the research enterprise in general). I discuss problems and counterarguments concerning the demand for reproducibility and finally present my approach to making my own research reproducible.
15 December 2011 (at 17:15) Thomas Augustin (LMU München): Imprecise measurement error models and partial identification: towards a unified approach for non-idealized data
Some first steps towards a generalized, unified handling of deficient, nay non-idealized, data are considered. The ideas are based on a more general understanding of measurement error models, relying on possibly imprecise error and sampling models. This modelling comprises common deficient data models, including classical and non-classical measurement error, coarsened and missing data, as well as neighbourhood models used in robust statistics. Estimation is based on an eclectic combination of concepts from Manski's theory of partial identification and from the theory of imprecise probabilities. Firstly, measurement error modelling with precise probabilities is discussed, with an emphasis on Nakamura's method of corrected score functions and some extensions. Secondly, error models based on imprecise probabilities are considered, relaxing the rather rigorous assumptions underlying all the common measurement error models. The concept of partial identification is generalized to estimating equations by considering sets of potentially unbiased estimating functions. Some properties of the corresponding set-valued parameter estimators are discussed, including their consistency (in an appropriately generalized sense). Finally, the relation to previous work in the literature on partial identification in linear models is made explicit.

Marco Cattaneo (LMU München): On the implementation of Likelihood-based Imprecise Regression
Likelihood-based Imprecise Regression (LIR) is a new approach to regression allowing the direct consideration of any kind of coarse data (including e.g. interval data, precise data, and missing data). LIR uses likelihood-based decision theory to obtain the regression estimates, which are in general imprecise, reflecting the uncertainty in the coarse data. Here, we address in particular the implementation of LIR, focusing on some important regression problems. From the computational point of view, the possible non-convexity of the estimated set of regression functions poses a considerable challenge.

Gero Walter (LMU München): Generalised Bayesian inference with conjugate priors, and a link to g-priors for Bayesian model selection
In generalised Bayesian inference, sets of priors are considered instead of a single prior, allowing for partial probability specifications and a systematic analysis of sensitivity to the prior. Especially when substantial information is used to elicit the prior, prior-data conflict can occur, i.e., data that are very unlikely from the standpoint of the prior may be observed. This conflict should show up in posterior inferences, alerting the analyst and, e.g., leading to a revision of prior specifications. However, when conjugate priors are used, a reasonable reaction is not guaranteed. Mostly, prior-data conflict is just averaged out, and in Bayesian regression, conflict in one regressor leads only to a non-specific reaction across all regressors. Generalised Bayesian inference can amend this behaviour by encoding the precision of inferences via the magnitude of the posterior set. The simplified natural conjugate prior most suited for generalised Bayesian regression has a link to the so-called g-prior, which is used for model selection in classical Bayesian regression.
12 January 2012 Georg Schollmeyer (LMU München): Necessity-measures and their Möbius inverses in the framework of generalized coherent previsions
In this talk we investigate necessity-measures as special coherent previsions. The motivation for focusing on necessity-measures is a closedness property under a general construction of hierarchical models. To describe this effectively we introduce generalized coherent lower previsions as a framework. Later on we will use the Möbius inversion to find an effective algorithm for calculating the extreme points of the core of a necessity-measure as well as an exact equation for the number of the extreme points. The algorithm can be used to make calculations of unknown total and conditional previsions of the hierarchical model.
Finally, again with the use of the Möbius inversion, we show that there are no non-trivial infimum-preserving coherent lower previsions. Thus, in general only the trivial necessity-measures are closed under the above-mentioned construction. This may call into question either the conventional generalization of classical necessity-logic to necessity-theory or the appropriateness of this construction or parts of it, like the focus on coherence and natural extension.
26 January 2012 Informal presentation and discussion of ongoing research work

Program (Sommersemester 2011, Thursdays at 18:15):

Date / Talk
14 April 2011 Christian Seiler (ifo Institut München): Micro data Imputation and Macro data Implications - Evidence from the Ifo Business Survey
Surveys are commonly affected by nonresponding units, which can produce biases if the missing values cannot be regarded as missing at random (MAR). While many papers have examined the effect of nonresponse in individual or household surveys, much less has been done for business surveys. This paper analyses the missing data in the Ifo Business Survey, whose most prominent result is the Ifo Business Climate Index, a leading indicator of business cycle development in Germany. The missing values are imputed using various imputation approaches for longitudinal data which reflect the underlying latent data generating process. The data are then aggregated and compared with the original indices to evaluate the implications at the macro level.
28 April 2011 Andreas Ströhle (LMU München): Über den Begriff "Zufall" aus ontologischer Perspektive (On the Concept of "Chance" from an Ontological Perspective)
Both in everyday life and in contemporary science it is taken as common knowledge that chance events occur in our world. In science, the experimental results of quantum physics brought about a paradigm shift at the beginning of the 20th century; in the centuries before, determinism was common sense among scientists and natural philosophers. Today, science usually distinguishes between "relative" and "absolute" chance, but it is not further questioned whether the concept of absolute chance makes sense at all from an ontological perspective; instead, its ontically real occurrence is taken for granted.
In my talk I show that the concept of chance in its absolute form is burdened with serious ontological problems, which make it advisable to regard "chance" exclusively as an epistemic concept.
5 May 2011 Roland Pöllinger (LMU München): Strukturen kausalen Wissens (Structures of Causal Knowledge)
The question of the ontology or the description of causal relations keeps revolving around the relationship between determinism and (ontological or descriptive) indeterminism within a theory of causation. Some prominent approaches base causal analysis solely on correlations or on suitably restricted requirements on statistical dependencies. However, numerous pitfalls and counterintuitive counterexamples thwart this endeavour and force further methodological refinement. Judea PEARL (2000/2009) bases his approach to analysis on the representation of statistical dependencies in so-called Bayes nets and formulates his deterministic concept of causality by means of systematic, structural manipulations of such Bayes nets.
The talk, as a work in progress, will address the format of Bayes nets, present automatic procedures for network generation, and relate this to a deterministic understanding of causality which is, at its core, (necessarily) epistemically oriented.
Recommended reading: Pearl, Judea. Causal diagrams for empirical research. Biometrika, 1995, 82, 669-688.
12 May 2011 Martin Gümbel (München): About the probability-field-intersections of Weichselberger and a simple conclusion from least favorable pairs
In the framework of Weichselberger's probability theory there are probability fields and operations on probability fields. We look at the probability-field intersection and present a simple conclusion for this operation in the case where a least favorable pair of probabilities exists.
26 May 2011 Ulrich Pötter (LMU München): Sampling Extended Household and Family Networks and the Use of Simplicial Complexes
One possible solution to the problem of computing inclusion probabilities for families from surveys of individuals is based on counting formulae derived from the theory of simplicial complexes. Here simplicial complexes are seen as unions of sets of subsets subject to certain constraints. After a sketch of this solution I would like to discuss some further areas of statistical applications of these techniques.
9 June 2011 Informal presentation and discussion of ongoing research work
16 June 2011 Manfred Schramm (München): Schließen mit Wahrscheinlichkeiten und maximaler Entropie: Theorie, Implementierung, Anwendung (Reasoning with Probabilities and Maximum Entropy: Theory, Implementation, Application)
Statements about the frequencies of certain phenomena are a simple, easily understood and very widespread kind of information. However, if we try to make decisions on the basis of frequency information, we are surprised by the multitude of possibilities that our information still leaves open. A "logic" based on frequencies will therefore be able to support inferences or decisions only in rare cases. Which additional principles can remedy this? The talk shows, in theory and with practical examples, how the principles of indifference, independence and maximum entropy support each other and combine with the frequency information to form a powerful knowledge-based system. The practical use of such a system is illustrated by a medical application.
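How maximum entropy selects one distribution among the many compatible with limited information can be sketched with a textbook example: a six-sided die of which only the mean is known. This is not the system presented in the talk; the function name, the bisection bounds, and the numbers are illustrative assumptions.

```python
import math


def maxent_distribution(values, target_mean, tol=1e-12):
    """Maximum entropy distribution on `values` with a prescribed mean.

    The solution has the form p_i proportional to exp(lam * v_i). The
    multiplier lam is found by bisection, since the mean increases with
    lam. Assumes min(values) < target_mean < max(values).
    """
    def distribution(lam):
        weights = [math.exp(lam * v) for v in values]
        total = sum(weights)
        return [w / total for w in weights]

    def mean(lam):
        return sum(p * v for p, v in zip(distribution(lam), values))

    low, high = -20.0, 20.0  # assumed search interval for lam
    while high - low > tol:
        mid = (low + high) / 2.0
        if mean(mid) < target_mean:
            low = mid
        else:
            high = mid
    return distribution((low + high) / 2.0)


faces = [1, 2, 3, 4, 5, 6]
uniform_case = maxent_distribution(faces, target_mean=3.5)
skewed_case = maxent_distribution(faces, target_mean=4.5)
print([round(p, 4) for p in uniform_case])
print([round(p, 4) for p in skewed_case])
```

With a mean of 3.5 the result coincides with the uniform distribution suggested by the principle of indifference; with a mean of 4.5 the probabilities increase smoothly towards the higher faces.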
7 July 2011 Informal presentation and discussion of ongoing research work
14 July 2011 Andrea Wiencierz (LMU München): Regression with Imprecise Data: A Robust Approach
We introduce a robust regression method for imprecise data, and apply it to social survey data. Our method combines nonparametric likelihood inference with imprecise probability, so that only very weak assumptions are needed and different kinds of uncertainty can be taken into account. The proposed regression method is based on interval dominance: interval estimates of quantiles of the error distribution are used to identify plausible descriptions of the relationship of interest. In the application to social survey data, the resulting set of plausible descriptions is relatively large, reflecting the amount of uncertainty inherent in the analyzed data set.

Gero Walter (LMU München): On Prior-Data Conflict in Predictive Bernoulli Inferences
By its capability to deal with the multidimensional nature of uncertainty, imprecise probability provides a powerful methodology to sensibly handle prior-data conflict in Bayesian inference. When there is strong conflict between sample observations and prior knowledge, the posterior model should be more imprecise than in the situation of mutual agreement or compatibility. Focusing the presentation on the prototypical example of Bernoulli trials, we discuss the ability of different approaches to deal with prior-data conflict.
We study a generalized Bayesian setting, including Walley's Imprecise Beta-Binomial model and his extension to handle prior-data conflict (called pdc-IBBM here). We investigate alternative shapes of prior parameter sets, chosen to show improved behaviour in the case of prior-data conflict, and their influence on the posterior predictive distribution. Thereafter we present a new approach, consisting of an imprecise weighting of two originally separate inferences, one of which is based on an informative imprecise prior whereas the other one is based on an uninformative imprecise prior. This approach deals with prior-data conflict in a fascinating way.
21 July 2011 (in room 225)
Joint event with the MCMP Colloquium in Mathematical Philosophy:
Jan-Willem Romeijn (Rijksuniversiteit Groningen): Frequencies, Chances, and Undefinable Sets
In this talk I aim to clarify the concept of chance. The talk consists of two parts, concerning the epistemology and metaphysics of chance respectively. In the first part I consider statistical hypotheses and their role in inference. I maintain that statistical hypotheses are best explicated along frequentist lines, following the theory of von Mises. I will argue that the well-known problems for frequentism do not apply in the inferential context.
In the second part of the talk I ask what relation obtains between these frequentist hypotheses and the world. I will show that we can avoid the problem of the reference class, as well as the closely related conflict between determinism and chance, by means of a formal antireductionist argument: events can be assigned meaningful and nontrivial chances if they correspond to undefinable sets of events in the reducing theory.
29 July 2011 (Friday) at 15:15
Joint event with the MCMP Colloquium in Mathematical Philosophy:
Teddy Seidenfeld (CMU Pittsburgh): Three contrasts between two senses of "coherence" (joint work with Mark Schervish and Jay Kadane)
B. de Finetti defended two senses of "coherence" in providing foundations for his theory of subjective probabilities. Coherence_1 requires that when a decision maker announces fair prices for random variables these are immune to a uniform sure-loss — no Book is possible using finitely many fair contracts! Coherence_2 requires that when a decision maker's forecasts for a finite set of random variables are evaluated by Brier Score — squared error loss — there is no rival set of forecasts that dominate with a uniformly better score for sure. De Finetti established these two concepts are equivalent: fair prices are coherent_1 if and only if they constitute a coherent_2 set of forecasts if and only if they are the expected values for the variables under some common (finitely additive) personal probability.
I report three additional contrasts between these two senses of "coherence". One contrast (relating to finitely additive probabilities) favors coherence_2. One contrast (relating to decisions with moral hazard) favors coherence_1. The third contrast relates to the challenge of state-dependent utilities.
Technical reports: The Effect of Exchange Rates on Statistical Decisions, Dominating countably many forecasts
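The second sense of coherence described in the abstract can be illustrated with a small numerical sketch: forecasts for an event A and its complement that do not sum to one are dominated in Brier score by a coherent rival in every state. The forecast values are hypothetical and chosen only for illustration.

```python
def brier_score(forecasts, outcomes):
    """Sum of squared errors between forecasts and 0/1 outcomes."""
    return sum((f - x) ** 2 for f, x in zip(forecasts, outcomes))


# Forecasts for the indicator of A and the indicator of not-A.
incoherent = (0.6, 0.6)  # sums to 1.2, so it is not a probability
coherent = (0.5, 0.5)    # rival forecasts that form a probability

# The two possible states: A occurs, or A does not occur.
states = [(1, 0), (0, 1)]

for outcomes in states:
    print(
        outcomes,
        round(brier_score(incoherent, outcomes), 4),
        round(brier_score(coherent, outcomes), 4),
    )
```

In both states the coherent forecasts achieve the strictly smaller score, so the incoherent forecasts fail coherence in the second sense.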

Program (Wintersemester 2010/11, Wednesdays at 18:30):

Date / Talk
3 November 2010 Atiye Sarabi Jamab (LMU München): An experimental comparative study of the performance of uncertainty measures in Dempster-Shafer theory
Dempster-Shafer theory distinguishes two types of uncertainty: conflict is associated with cases where the information focuses on sets with empty intersections, and non-specificity is associated with cases where the information focuses on sets whose cardinality is greater than one. Several criteria for measuring conflict, non-specificity, or both have been proposed in the literature and could be used for measuring uncertainty in Dempster-Shafer theory, but they might not use all the information in the bodies of evidence. The aim of this talk is to compare the behaviour of some of them as a "distance" between two basic probability assignments.
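The two types of uncertainty can be made concrete with two measures that are commonly cited in the literature: the generalized Hartley measure for non-specificity and Yager's dissonance for conflict. Whether these particular measures were among those compared in the talk is not stated; the function names and the example assignments are illustrative assumptions.

```python
import math


def plausibility(mass, subset):
    """Pl(B): total mass of focal sets that intersect B."""
    return sum(m for focal, m in mass.items() if focal & subset)


def nonspecificity(mass):
    """Generalized Hartley measure: sum of m(A) * log2 |A|."""
    return sum(m * math.log2(len(focal)) for focal, m in mass.items())


def dissonance(mass):
    """Yager's dissonance: -sum of m(A) * log2 Pl(A).

    Assumes every focal set is non-empty and has positive mass.
    """
    return -sum(
        m * math.log2(plausibility(mass, focal)) for focal, m in mass.items()
    )


# Mass on two disjoint singletons: pure conflict, no non-specificity.
conflicting = {frozenset({"a"}): 0.5, frozenset({"b"}): 0.5}
# All mass on the whole frame: pure non-specificity, no conflict.
vacuous = {frozenset({"a", "b"}): 1.0}

for name, mass in [("conflicting", conflicting), ("vacuous", vacuous)]:
    print(name, nonspecificity(mass), dissonance(mass))
```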
15 November 2010 (Monday) at 14:15 Philipp Bleninger (IAB Nürnberg): Remote Data Access und Enthüllungsrisiken für sensible Informationen aus inferentiellen Datenangriffen (Remote Data Access and Disclosure Risks for Sensitive Information from Inferential Data Attacks)
Numerous public administration bodies (the statistical offices, the Federal Employment Agency, the German Pension Insurance, etc.) produce large amounts of data that are also of great interest for research. However, data producers cannot simply pass on their data; they must observe special requirements regarding data protection and anonymity (according to BStatG, SGB X, etc.). The current standard for disseminating data is either on-site access or data alteration. But on-site access is very costly for both the data user and the data producer, and altered data have low acceptance among users, especially as far as inference is concerned.
In the search for suitable modes of data access, remote access appears to be a promising solution. Remote access is granted via a server on which the data user works. The user sees either no data at all or only distorted data, while the analyses are computed on the true data. Nevertheless, this mode of access is not secure either, since the disclosure of entire data vectors as well as of individual information is possible.
This talk deals with the risks of data disclosure in remote access, using the IAB Establishment Panel (IAB Betriebspanel) as an example. The possibilities open to a data attacker for inferential data disclosure are demonstrated using principal component/factor analysis and simple linear regression. These two sources of risk, which are actually easy to prevent, serve as examples of the manifold possibilities available to inventive data attackers.
19 November 2010 (Friday) at 14:15 in room 144 (Seminarraum) Ric Crossman and Frank Coolen (Durham University): Nonparametric predictive inference for ordinal data: multiple comparisons and classification
Nonparametric Predictive Inference (NPI) is a statistical method which uses few modelling assumptions, enabled by the use of lower and upper probabilities to quantify uncertainty. NPI has been presented for many problems in Statistics, Risk and Reliability and Operations Research. The first part of the presentation will give an informal introduction to the basic ideas of NPI and its use for ordered categories, followed by some examples on multiple comparisons. The second part will focus on applications of NPI in classification trees.
17 January 2011 (Monday) Informal presentation and discussion of ongoing research work
4 February 2011 (Friday) at 14:15 in room 144 (Seminarraum) Uwe Saint-Mont (FH Nordhausen): Statistik, empirische Wissenschaften und Wissenschaftstheorie (Statistics, Empirical Sciences and Philosophy of Science)
These fields are currently developing rather independently of one another. This was not always the case, however: only a few decades ago there was a close connection between statistics and the fields surrounding it, personified in particular by R.A. Fisher.
The talk will address two questions: why has statistics become more and more isolated since then, and how could this development be reversed? Besides the general "philosophy-of-science" orientation of statistics, the concept of "information" and the way it is handled play a particularly important role here.
11 February 2011 (Friday) at 14:15 Gerhard Winkler (Helmholtz Zentrum München): Zufall! Zufall? (Chance! Chance?)
In this talk we deal with chance. We leave the mathematics aside. Rather, the talk is a discussion of the development and the many facets of this concept.
Indeed, chance is one of the often misunderstood fundamental concepts. This holds from everyday human life through to the most diverse branches of science. We try to counteract this.
The talk is also suitable for non-mathematicians and non-statisticians.
4 March 2011 (Friday) at 14:15 Sara Kleyer (Universität Bamberg): Regionaler Preisindex (Regional Price Index)
The aim of price statistics is to describe the development of prices and thereby measure inflation. Besides this temporal perspective, however, it is of interest, particularly for the social and economic sciences, to be able to draw regional comparisons as well. For example, wage estimates are biased as long as they cannot be adjusted for the regional price level. Official statistics in Germany, however, offer price indices only down to the level of the federal states (Bundesländer), which is far from sufficient. Various statistical methods are conceivable for determining regional price indices, and these will be presented.

Program (Sommersemester 2010, Wednesdays at 19:15):

Date / Talk
5 May 2010 and 19 May 2010 Thomas Augustin: Imprecise Measurement Error Models and Partial Identification: Towards a Unified Approach for Non-Idealized Data (1st talk, 2nd talk)
The talk presents some first steps towards a generalized, unified handling of deficient, nay non-idealized, data. The ideas are based on a more general understanding of measurement error models, relying on possibly imprecise error and sampling models. This modelling comprises common deficient data models, including classical and non-classical measurement error, coarsened and missing data, as well as neighbourhood models used in robust statistics. Estimation is based on an eclectic combination of concepts from Manski's theory of partial identification and from the theory of imprecise probabilities.
(Not only) as a preparation, the first part of the talk discusses measurement error modelling with precise probabilities. After a brief introduction to the background, I consider one of the most general methods to correct for classical measurement error, namely Nakamura's method of corrected score functions. It is shown how this method of constructing unbiased estimating functions under measurement error can be extended to deal with other types of error models, in particular with deficient dependent variables and with the so-called Berkson error.
The second part of the talk extends consideration to imprecise probabilities, relaxing the rather rigorous assumptions underlying all the common measurement error models. The concept of partial identification is extended to estimating equations by considering sets of potentially unbiased estimating functions. Some properties of the corresponding set-valued parameter estimators are discussed, including their consistency (in an appropriately generalized sense). Finally, the relation to previous work in the literature on partial identification in linear models is made explicit.
12 May 2010 Gero Walter: "Strong happiness" and other properties of certain imprecise probability models when treating samples sequentially
Generalized iLUCK-models, a class of models introduced by Walter and Augustin (2009) as an imprecise probability generalization of conjugate Bayesian inference, have the advantage of reacting adaptively to prior-data conflict. Whereas standard conjugate Bayesian inference is not necessarily sensitive to conflicts between prior and data, generalized iLUCK-models lead to much more cautious inferences if prior and data are in conflict. In this talk, the case of data trickling in as separate portions of observations is investigated, and a number of ideas related to sequential updating of the prior are presented, exploiting the sensitivity to prior-data conflict that generalized iLUCK-models offer. "Strong happiness" is a concept using these ideas for a simple sample size calculation, guaranteeing a certain precision with the possibility of prior-data conflict factored in.
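The reaction to prior-data conflict described in this abstract can be illustrated with a small numerical sketch. It is not taken from the talk. It assumes the usual update rule for conjugate priors in canonical form, y^(n) = (n0 * y0 + n * ybar) / (n0 + n), and a rectangular set of prior parameters (n0, y0). The function name and all numbers are hypothetical. Because the update is monotone in each prior parameter when the other is held fixed, the bounds over the rectangle are attained at its corners.

```python
from itertools import product


def posterior_y_bounds(n0_range, y0_range, n, ybar):
    """Bounds of the updated parameter y^(n) over a rectangular prior set.

    n0_range, y0_range: (lower, upper) bounds of the prior parameters
    (hypothetical values). n: sample size. ybar: sample mean.
    """
    # Evaluate the assumed conjugate update at the four corners of the rectangle.
    values = [
        (n0 * y0 + n * ybar) / (n0 + n)
        for n0, y0 in product(n0_range, y0_range)
    ]
    return min(values), max(values)


n0_range = (1.0, 5.0)   # assumed range of prior strength
y0_range = (0.4, 0.6)   # assumed range of prior mean

# Data in agreement with the prior (sample mean inside the prior range).
agree = posterior_y_bounds(n0_range, y0_range, n=10, ybar=0.5)
# Data in conflict with the prior (sample mean far outside the prior range).
conflict = posterior_y_bounds(n0_range, y0_range, n=10, ybar=1.0)

width_agree = agree[1] - agree[0]
width_conflict = conflict[1] - conflict[0]
print(f"no conflict: {agree}, width {width_agree:.4f}")
print(f"conflict:    {conflict}, width {width_conflict:.4f}")
```

Under these assumptions the interval for the updated parameter is wider in the conflict case, which is the more cautious inference the abstract refers to.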
2 June 2010 Marco Cattaneo: Independence and Combination of Belief Functions
In the theory of belief functions, information about an uncertain value is described by a random set rather than by a random variable. We shall discuss some ideas about the interpretation of belief functions and the fusion of dependent information.
28 June 2010 (Monday) Christina Schneider: Randomness Does Not Exist
Connoisseurs will notice that this title echoes de Finetti's dictum "Probability does not exist". Whereas de Finetti concludes from it that probability should be interpreted subjectivistically, this path will not be taken here.
First, the thesis is approached within the framework of an "objectivist" school of inference, which requires some considerations from the philosophy of science, and then some consequences of this approach are drawn. The most important consequence is to renounce the "claim to truth" of probabilities and probability statements.
The positive consequence is pragmatic in nature: to understand statistical inference as Inference to the Best (Idealized) Description.
30 June 2010 Andrea Wiencierz: The course of well-being over the life span — Restricted Likelihood Ratio Testing (RLRT) in the presence of correlated errors
Tests for zero variance components in general form linear mixed models (LMMs) have been established for different cases where the errors are assumed to be independent and identically distributed (i.i.d.). These tests can be applied to many interesting questions in practice. They make it possible, for example, to test whether a relation between two variables deviates significantly from a polynomial of a given degree.
However, in many real applications the errors are not independent. In economic applications, for example, the errors are often positively autocorrelated. In the case of the ordinary linear model, there is a simple transformation technique to take the correlation into account, known to econometricians as the Generalized Least Squares (GLS) transformation.
Motivated by an economic study about the course of subjective well-being over the life span, the transformation technique is adapted to the case of general form LMMs, and it is investigated whether this technique can be used to extend the established tests for zero variance components to the case of correlated errors.
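For the ordinary linear model, the GLS transformation mentioned in this abstract can be sketched as follows. This is a minimal illustration, not the adaptation to LMMs discussed in the talk. It assumes AR(1) errors with a known autocorrelation rho and unit innovation variance. The function names and the simulated data are hypothetical. Both the response and the design matrix are multiplied by the inverse Cholesky factor of the error covariance, after which ordinary least squares on the transformed data coincides with the GLS estimator.

```python
import numpy as np


def ar1_covariance(n, rho):
    """Covariance matrix of a stationary AR(1) process with unit innovation variance."""
    idx = np.arange(n)
    return rho ** np.abs(idx[:, None] - idx[None, :]) / (1.0 - rho**2)


def gls_transform(y, X, sigma):
    """Whiten y and X with the Cholesky factor L of sigma (sigma = L @ L.T)."""
    L = np.linalg.cholesky(sigma)
    y_star = np.linalg.solve(L, y)   # L^{-1} y
    X_star = np.linalg.solve(L, X)   # L^{-1} X
    return y_star, X_star


def ols(y, X):
    """Ordinary least squares coefficients."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return beta


# Simulated example with hypothetical values.
rng = np.random.default_rng(0)
n, rho = 200, 0.6
X = np.column_stack([np.ones(n), np.linspace(0.0, 1.0, n)])
sigma = ar1_covariance(n, rho)
errors = np.linalg.cholesky(sigma) @ rng.standard_normal(n)
y = X @ np.array([1.0, 2.0]) + errors

# OLS on the transformed data is the GLS estimator.
y_star, X_star = gls_transform(y, X, sigma)
beta_gls = ols(y_star, X_star)
print(beta_gls)
```

In practice rho is unknown and has to be estimated, which this sketch leaves out.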
7 July 2010 Hansjörg Baurecht: Detecting Signals in Genomewide Association Studies
Genomewide data collected to detect statistical associations between SNPs and complex traits are usually analyzed by testing each SNP univariately against the trait. To account for the large number of significance tests carried out, a very stringent p-value threshold is used. This reduces the occurrence of false positives, but it may cause many real associations to be missed. I will discuss an idea for taking into account regions of SNPs in which no single SNP passes the detection threshold. By aggregating the SNPs of such a region, associated regions that have so far gone undetected might be discovered. To this end, I adopted the idea of kernel smoothing to calculate a combined statistic incorporating the genetic distance and the linkage disequilibrium.
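The basic mechanism of kernel smoothing over neighbouring SNPs can be sketched as below. This is not the statistic proposed in the talk. It assumes a Gaussian kernel over genetic position only and ignores linkage disequilibrium. The function name, the bandwidth, and the per-SNP scores (for instance -log10 p-values) are hypothetical.

```python
import numpy as np


def kernel_smooth(positions, stats, bandwidth):
    """Kernel-weighted average of per-SNP statistics over genetic position.

    Uses a Gaussian kernel on the pairwise distances (an assumption of this
    sketch). Linkage disequilibrium is not taken into account.
    """
    positions = np.asarray(positions, dtype=float)
    stats = np.asarray(stats, dtype=float)
    # Pairwise scaled distances between all SNP positions.
    diffs = (positions[:, None] - positions[None, :]) / bandwidth
    weights = np.exp(-0.5 * diffs**2)
    # Weighted average of the statistics around each SNP.
    return weights @ stats / weights.sum(axis=1)


# Hypothetical example: four neighbouring SNPs with moderate scores
# and one isolated SNP with a low score.
positions = np.array([0.0, 1.0, 2.0, 3.0, 50.0])
scores = np.array([3.0, 3.5, 3.2, 3.4, 0.2])
smoothed = kernel_smooth(positions, scores, bandwidth=2.0)
print(smoothed)
```

A smoothed statistic of this kind would need its own detection threshold, since its distribution differs from that of the single-SNP statistics.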
19 July 2010 (Monday) Julia Kopf: Reflecting methods from machine learning with respect to their application in social science, psychology and statistics
Some ideas about the application and interpretation of machine learning methods such as recursive partitioning or association rules are presented. The main focus of the talk is on the statistical validation of Ockham's Razor using model-based recursive partitioning.