Sample reconstruction from summary statistics: assumptions and novel contributions - INRIA - Institut National de Recherche en Informatique et en Automatique
Conference paper. Year: 2021

Abstract

Response distributions in social psychology papers are usually summarized by means and standard deviations (in the absence of raw data), even when the distributions are not necessarily continuous, unbounded, or unimodal (e.g., scores on a small set of Likert-like items). Applying statistical methods that assume normality (e.g., of prediction errors in the general linear model) may lead to inflated type I error rates and, more broadly, to incorrect inferences drawn from the data. This is particularly true in social cognition research on biases that may heavily distort the distributions (e.g., social desirability or utility response bias). Because trusting scientific results requires confirming that summary statistics are correct (to detect errors or fraud) and representative of the underlying distributions, several sample reconstruction techniques have recently been developed to infer probable distributions from a reported mean, standard deviation, and measurement constraints. However, current methods are either heuristic (e.g., iterative in SPRITE; Heathers et al., 2018) or impose strong constraints (e.g., integer measures in CORVIDS; Wilner, Wood, & Simons, 2018), yet they are used to detect reporting errors through probabilistic reasoning. They implicitly assume that the distributions of real-world study samples closely resemble random distributions generated through model-based simulations. Although some specific samples may be highly improbable under deterministic or stochastic simulations, experimental manipulations and study constraints may drastically increase their probability, and such methods cannot estimate this increase. Here we introduce two contributions that study and address the methodological issues of existing approaches:

1) Exact calculation of the number of samples matching a given approximate mean, approximate standard deviation, and set of measurement constraints. Combinatorics makes it possible to assess with certainty the (im)possibility of the given statistics being observed on a real sample satisfying the constraints, as well as the exhaustiveness of the solutions provided by sample reconstruction techniques.
2) A graph-based method for the exhaustive reconstruction of all possible samples given an approximate mean, an approximate standard deviation, and measurement constraints (any finite set of arbitrary values), without relying on any heuristics. The method also makes it possible to relax the assumptions of existing methods, and the resulting graph can be used to generate samples under various additional constraints, to better reflect study constraints and/or to assess the validity of empirical results.

Beyond circumventing the methodological issues of existing approaches, these two contributions more broadly aim to show that all mathematical and computational developments (including sample reconstruction techniques and most statistical methods) rely on a set of assumptions that must be satisfied for the inference to be correct.
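The abstract does not describe how the authors build or traverse their graph, so the following Python code is only a minimal sketch of one plausible way to realize exact counting and exhaustive reconstruction together, not their implementation. It assumes that samples are unordered multisets, that the reported SD is the sample SD (n - 1 denominator), and that both statistics were rounded to `decimals` places under either rounding convention. The names `build_graph`, `path_counts`, `enumerate_samples`, and `random_sample` are hypothetical. Each node records how many draws remain plus the running sum and sum of squares. Counting root-to-leaf paths therefore gives the exact number of matching samples, where zero means the reported statistics are impossible. Walking the paths lists every sample.

```python
# Hypothetical sketch (not the authors' code). Assumptions: unordered samples,
# sample SD with n - 1 denominator, statistics rounded to `decimals` places.
from fractions import Fraction
import random


def build_graph(values, n, mean, sd, decimals):
    """Return (root, graph): a DAG whose root-to-leaf paths are exactly the
    multisets of n values from `values` matching the rounded mean and SD.
    Node (i, k, total, sq): copies of vals[:i] are fixed, k draws remain,
    total/sq are running sum and sum of squares; an edge fixes copies of vals[i]."""
    if n < 2:
        raise ValueError("the sample SD needs n >= 2")
    vals = sorted({Fraction(v) for v in values})  # exact arithmetic, no float error
    half = Fraction(1, 2 * 10**decimals)          # half a unit of the last reported digit
    m, s = Fraction(mean), Fraction(sd)           # pass reported stats as strings, e.g. "3.00"
    # Inclusive intervals, so round-half-up and round-half-even are both accepted.
    sum_lo, sum_hi = n * (m - half), n * (m + half)
    var_lo, var_hi = max(s - half, 0) ** 2, (s + half) ** 2
    graph = {}    # node that reaches a valid leaf -> list of (value, copies, child)
    dead = set()  # nodes shown unable to reach a valid leaf

    def alive(node):
        if node in graph:
            return True
        if node in dead:
            return False
        i, k, total, sq = node
        if i == len(vals):
            # Leaf: all n draws placed and both statistics within their intervals.
            ok = (k == 0 and sum_lo <= total <= sum_hi
                  and var_lo <= (sq - total * total / n) / (n - 1) <= var_hi)
            if ok:
                graph[node] = []
            else:
                dead.add(node)
            return ok
        v = vals[i]
        # Prune: the k remaining draws lie in [v, vals[-1]], bounding the final sum.
        if total + k * v > sum_hi or total + k * vals[-1] < sum_lo:
            dead.add(node)
            return False
        edges = []
        for c in range(k + 1):
            child = (i + 1, k - c, total + c * v, sq + c * v * v)
            if alive(child):
                edges.append((v, c, child))
        if edges:
            graph[node] = edges
        else:
            dead.add(node)
        return bool(edges)

    root = (0, n, Fraction(0), Fraction(0))
    alive(root)
    return root, graph


def path_counts(graph, root):
    """Number of valid samples below each live node (memoized path counting)."""
    counts = {}

    def count(node):
        if node not in counts:
            edges = graph[node]
            counts[node] = 1 if not edges else sum(count(ch) for _, _, ch in edges)
        return counts[node]

    if root in graph:
        count(root)
    return counts


def enumerate_samples(graph, node):
    """Yield every valid sample below `node` as a sorted tuple of values."""
    edges = graph.get(node)
    if edges is None:  # dead node: no valid sample
        return
    if not edges:      # valid leaf
        yield ()
        return
    for v, c, child in edges:
        for rest in enumerate_samples(graph, child):
            yield (v,) * c + rest


def random_sample(graph, counts, root, rng):
    """Draw one valid sample uniformly at random; assumes at least one exists."""
    sample, node = [], root
    while graph[node]:
        edges = graph[node]
        weights = [counts[ch] for _, _, ch in edges]  # completions below each child
        v, c, node = rng.choices(edges, weights=weights)[0]
        sample.extend([v] * c)
    return tuple(sample)
```

For example, on a 1 to 5 scale with n = 4, `build_graph(range(1, 6), 4, "3.00", "1.41", 2)` yields exactly two samples, (1, 3, 4, 4) and (2, 2, 3, 5). Under the same assumptions, the same mean with SD "0.10" yields none. Memoized path counting gives the total without listing samples. Weighting each edge by its child's count makes `random_sample` uniform over all valid samples. Additional study constraints could be imposed by filtering edges before counting.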
No file deposited

Dates and versions

hal-03353361, version 1 (24-09-2021)

Identifiers

  • HAL Id: hal-03353361, version 1

Cite

Jean-Charles Quinton, Annique Smeding. Sample reconstruction from summary statistics: assumptions and novel contributions. ESCON Transfer of Knowledge Conference, Sep 2021, Salzburg, Austria. ⟨hal-03353361⟩