International Workshop on Theory (Re)Construction in the Empirical Social and Behavioral Sciences (TRC2020)
SAT & SUN, 7–8 NOVEMBER 2020 (online or on-site)
Boğaziçi University, Dpt. of Philosophy & Cognitive Science Program, 34342 Bebek/Istanbul, Turkey
CALL FOR ABSTRACTS
It has been repeatedly observed that the Empirical Social and Behavioral Sciences (ESBS) lack well-developed theoretical superstructures, that is, structures that researchers could apply to generate (point-)predictive empirical hypotheses. The MTR project treats this lacuna as an important cause of the ongoing replicability crisis in the ESBS, and hence as a key to explaining and remedying it.
We invite submissions from any scientific field addressing this lacuna via reconstructions of empirical theories (from the ESBS or not), research on frameworks (or methods) for theory reconstruction, synchronic or diachronic work on concept formation/ontology in the ESBS, and explanatory accounts of why this lacuna persists. We particularly invite applied work on how to go about constructing an ESBS theory.
Participation is on-site or online. There are no fees. Please submit an abstract (max. 500 words) plus key references by 15 SEPT 2020.
Replication Under Underdetermination: Introducing Systematic Replications Framework
ABSTRACT: Single empirical tests are always ambiguous in their implications for the theory under investigation, because non-corroborative evidence leaves it underdetermined whether the main theoretical hypothesis or one or more auxiliary hypotheses should bear the burden of falsification. Popperian methodological falsificationism tries to solve this problem by relegating certain kinds of auxiliary hypotheses to “unproblematic background knowledge” and disallowing others. Auxiliary hypotheses regarding operational definitions of theoretical terms are not only permissible but indispensable for increasing the falsifiability of theories. However, decisions to accept such auxiliaries as unproblematic are seldom conclusively justified. This uncertainty is amplified in the social sciences, where operationalizations play a very central role but are much less theory-driven and independently testable. This situation has direct consequences for the assessment of the outcomes of replication attempts. Neither close nor conceptual replications can mitigate underdetermination when they are conducted in isolation. To circumvent this problem, we propose the Systematic Replications Framework (SRF), which organizes subsequent tests into a pre-planned series of logically interlinked close and conceptual replications. SRF aims to decrease underdetermination by disentangling the implications of non-corroborative findings for the main theoretical hypothesis and for the operationalization-related auxiliary hypotheses. SRF can also strengthen hypothesis testing through systematically organized and pre-registered self-replications. We also discuss how applying this framework can scaffold judgments regarding the permissibility of ad hoc hypothesizing, with reference to the Lakatosian notions of progressive and degenerative research programs.
Duygu Uygun Tunc is a PhD Candidate in Philosophy at Heidelberg University (Germany) and the University of Helsinki (Finland) (Disputation expected in summer 2020). Her dissertation Communication and the Origins of Personhood presents a genetic account of personhood from an interdisciplinary perspective that brings together communication theory, (Peircean) semiotics, comparative and developmental psychology and cognitive science. In August 2020 she will begin the research project Extended Scientific Virtue (funded under the EC Horizon 2020 Marie Skłodowska-Curie Actions Cofund program “Co-Funded Brain Circulation Scheme”) at METU and UC, Irvine (supervisors Murat Bac and Duncan Pritchard), which will develop a virtue epistemology that can contribute to the identification, understanding and realization of the epistemic goals and norms of scientific inquiry in the contemporary context of technologically and socially extended scientific knowledge.
Making Trustworthy Science: Some Philosophical and Ethical Puzzles
Brian Earp is Associate Director of the Yale-Hastings Program in Ethics and Health Policy at Yale University and the Hastings Center and a Research Fellow at the Uehiro Centre for Practical Ethics at the University of Oxford.
ABSTRACT: There has been a lot of heated discussion in recent years about a “reproducibility crisis” in science. How did it come about? Is there really a crisis? If there is, what can we do to resolve it? In this talk I will give a philosophical overview of some key puzzles in evaluating the trustworthiness of published studies, and explore the ethical obligations of researchers — individually and collectively — in producing trustworthy science of their own.
ABSTRACT: Effect sizes are the currency of the social and behavioral sciences. They quantify the results of a study to answer the research question and are used to calculate statistical power. They are also a central aspect when the evidence of a study – and thus its practical usefulness – is to be evaluated. Today, effect sizes are also used to evaluate the success of replication studies. However, the meaningfulness and usefulness of effect sizes hinge on a reliable framework that defines how the size of an effect is to be interpreted. This framework – helping define an effect as small, medium, or large – has been guided by the recommendations Jacob Cohen gave in his pioneering writings starting in 1962: either compare an effect with the effects found in past research, or use certain conventional benchmarks. The present analysis shows that neither of these recommendations is currently applicable. From past publications without pre-registration, 900 effects were randomly drawn and compared with some 100 effects from publications with pre-registration, revealing a large difference: effects from the former were much larger than effects from the latter. That is, certain biases, such as publication bias or questionable research practices, have caused a dramatic inflation in published effects, making it difficult to compare an actual effect with the real population effects (as these are unknown). In addition, there were very large differences in the mean effects between psychological sub-disciplines and between different study designs, making it impossible to apply any global benchmarks. Many more pre-registered studies are needed in the future to derive a reliable picture of real population effects. Apart from that, it is outlined how we can arrive at more theory-driven criteria for the interpretation of effects.
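The inflation mechanism described in this abstract can be made concrete with a minimal simulation (the true effect, sample size, and significance filter below are illustrative assumptions, not the study's actual data): when only statistically significant results enter the published record, the mean published effect substantially overstates the true effect, while a record that reports everything, as pre-registration encourages, does not.

```python
import math
import random
import statistics

random.seed(1)

TRUE_D = 0.2   # hypothetical true standardized effect (Cohen's d)
N = 30         # per-group sample size (underpowered for an effect this small)

def one_study():
    """Simulate a two-group study; return (observed d, reached p < .05)."""
    a = [random.gauss(TRUE_D, 1.0) for _ in range(N)]
    b = [random.gauss(0.0, 1.0) for _ in range(N)]
    pooled_sd = math.sqrt((statistics.variance(a) + statistics.variance(b)) / 2)
    d = (statistics.mean(a) - statistics.mean(b)) / pooled_sd
    z = d / math.sqrt(2 / N)      # two-group z-approximation to the t-test
    return d, abs(z) > 1.96

studies = [one_study() for _ in range(5000)]
all_d = [d for d, _ in studies]                 # "pre-registered" record: everything reported
published_d = [d for d, sig in studies if sig]  # biased record: significant results only

print(f"mean effect, all studies:      {statistics.mean(all_d):.2f}")
print(f"mean effect, significant only: {statistics.mean(published_d):.2f}")
```

Filtering on significance keeps only the studies whose sampling error happened to push the observed effect upward, which is exactly the pattern the abstract reports between non-pre-registered and pre-registered publications.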
MTR offers project-based scholarships (MA/PhD/PostDoc) for members of any school/department at BOUN, from now until at least May 2022, at 3500/4500/6000 TRY per month.
MTR provides infrastructure and support in securing short-term training and conference funding.
We support BAP projects at MA level (BOUN internal). Up to five BOUN researchers can join as local collaborators (750 TRY/month).
MTR responds critically to the ongoing replication crisis in the empirical social and behavioral sciences (ESBS). The project concerns the artful combination of Frequentist and Bayesian inferential strategies in order to make (theoretically and empirically) progressive ESBS-research programs possible. The central tasks are:
To explain the replication crisis in the ESBS as the result of having arranged research efforts in ways that misapply and over-interpret our best statistical inference methods.
To develop the research program strategy (RPS) as a superior methodology to overcome the crisis by embedding RPS into qualitative research, by furthering “induction analysis” as an alternative to current meta-analysis, and by providing a toolbox for theory construction.
To disseminate the explanation and the remedy via publications, presentations, and teaching materials among such key players as researchers, funding agencies, and science journalists.
We develop project results jointly, publishing in leading international journals.
Besides decent command of the English language and a strong interest in the topic, we look for background knowledge in:
qualitative and quantitative empirical methods
statistical inference (including meta-analysis)
theory (re)construction, and general philosophy of science.
Subject-specific background in, for instance, empirical psychology, cognitive science, sociology, linguistics, behavioral economics, experimental philosophy, etc., can be a benefit, as is experience in conceptual analysis, programming/coding skills, as well as organizational and administrative or media-related skills.
Please read the project application at http://bit.ly/MTR-2232, particularly Sect. 3.1 of the proposal and the list of deliverables. To express your interest in joining this project, please submit a CV and a statement of motivation at http://bit.ly/MTR-2232-hiring.
We recently notified the BOUN Press & Publications Office about the MTR project; they replied with rather good questions, which are answered below.
2 April 2020
What is the “replication/confidence crisis” in empirical social and behavioral sciences (ESBS) and why are we seeing this crisis in ESBS and not in physical, mathematical and natural sciences?
As the 2019 National Academies of Sciences guideline on “Reproducibility and Replicability in Science” makes clear, reliable scientific knowledge requires empirical research results that are replicated independently. The replication crisis in the ESBS arises because, by current estimates, the majority of research results here fail to replicate. This rightly leads to questioning the confidence one should have in published and current ESBS research results. As a non-empirical science, mathematics simply cannot experience a similar crisis. Empirical sciences such as physics or chemistry, by contrast, are generally better than the ESBS at using available experimental and statistical methods. They also accept greater risks by emphasizing the value of successful theoretical prediction. By contrast, the dominant ESBS standards for collecting and evaluating data look comparatively “aged.” The null-hypothesis significance testing approach (NHST), for instance, is of extremely limited use, because NHST cannot show that data support a substantial theoretical hypothesis. NHST has nevertheless remained the most widely used statistical approach in the ESBS, and its results are regularly over-interpreted. Bayesian methods, which are currently “en vogue,” face other problems that mainly relate to small sample sizes and unreasonable laxness in selecting the prior probability of hypotheses. Before data collection starts, after all, the Bayesian approach allows researchers to favor a hypothesis over rival hypotheses on purely subjective grounds. Of course, unlike when natural scientists work with particles, waves, or DNA strands, achieving well-controlled experimental conditions is more difficult for ESBS researchers, because human responses not only display large heterogeneity but are also subject to feedback effects.
We claim that the ESBS could nevertheless vastly improve the quality of their research results by using larger samples together with better statistical methods, by improving meta-analytical methods, and by engaging in theory construction.
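The point that a statistically significant NHST result need not support a substantial hypothesis can be shown with a quick calculation (a textbook two-group z-test approximation; the sample sizes are arbitrary illustrations): the smallest effect size that reaches p < .05 shrinks towards zero as samples grow, so significance alone says nothing about whether an effect is substantial.

```python
import math

# Smallest standardized mean difference (Cohen's d) that reaches p < .05
# in a two-sided, two-group z-test with n participants per group:
# reject H0 when |d| / sqrt(2/n) > 1.96, i.e. d_crit = 1.96 * sqrt(2/n)
for n in (20, 200, 2000, 20000):
    d_crit = 1.96 * math.sqrt(2 / n)
    print(f"n = {n:>5} per group: any |d| above {d_crit:.3f} is 'significant'")
```

With 20,000 participants per group, an utterly negligible d of about 0.02 already clears the threshold, which is why a significant p-value by itself cannot establish support for a substantial theoretical hypothesis.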
Why is this crisis more apparent in the last 10 years as you addressed in the press release?
Various aspects of the crisis have been known since roughly the 1970s (Meehl, 1978). A veritable crisis only arose as awareness of the problem’s magnitude within the ESBS increased. Highly influential in this respect was John Ioannidis’s 2005 paper “Why Most Published Research Findings Are False,” especially because it focuses on experimental research in medical science. As the largest and most influential ESBS, psychology had its “coming out” with papers by Pashler and Wagenmakers, in 2012, and the Open Science Collaboration, in 2015. Around that time, a politically driven “push-back” against science not only coined terms such as “fake news,” but also eroded public confidence in science. Among other things, this led to science-internal conceptual research on how the crisis may be overcome.
What are the causes of this crisis in terms of the empirical structures of the ESBS? You claim that this crisis is a result of having arranged research efforts in ways that misapply and over-interpret statistical methods. Can we then say that one of the reasons relates to the humans who apply these methods?
Multiple science-internal and science-external causes are working together. As for internal causes, most ESBS research is data-driven, researchers often misapply statistical methods, and they regularly over-interpret or overstate the scientific significance of such empirical results. A related problem arises from a habit of assuming “exactly zero effect of site, experimenter, stimuli, task, instructions, and every other factor except subject” (Yarkoni, 2019), resulting in a mismatch between a general verbal statement of a theoretical hypothesis and its statistical expressions, and leading to a “generalizability crisis.”
As for external causes, a strong preference for merely novel results has led many ESBS researchers to recognize the full value of replication research only recently. In fact, funding, publication, and promotion incentives are still biased towards novelty. In technical terms, statistically significant empirical results broadly remain sufficient for publication, without due emphasis on replication. In sum, the ESBS engage in rather questionable practices that prove slow to change.
You also claim there is a lack of general theory in the ESBS to provide more accurate predictions. Can this kind of theory be developed, bearing in mind that the disciplines in the ESBS are widely differentiated in terms of research methods, and what can the main characteristics of this general theory be?
Part of the challenge in the MTR project is to contribute worked-out answers to this question. The project engages with the few examples of rudimentary theories that the ESBS have developed. We bring (niche-)research from the philosophy of science to bear on these theories, specifically the semantic view of empirical theories. We are confident that our project’s results can improve how ESBS researchers theorize their research areas. As the MTR project has started only very recently, however, it’s simply too early to tell.
Are the current standards in academia concerning publications or funding among the obstacles to reaching more accurate results in the ESBS, and what can some alternatives be in order to change the current system?
Pairing publication pressure with a merely novelty-seeking research culture certainly does not help, for it leads to making one-off “discoveries” without engaging in replication or theory construction. This has downstream effects, for instance where the statistical training of ESBS researchers is tailored accordingly, or where the poor quality of such training leads to an honest but entrenched misunderstanding of fundamental notions (Gigerenzer, 2018). A heavy dose of training in philosophy of science could improve graduate education in the ESBS. Rather than applying a “follow the crowd” heuristic, moreover, PhD students and post-doctoral researchers should critically question what senior researchers find normal today. Even if training, incentives, and statistical techniques became more sensitive to the value of replicating previous experimental results, however, the main obstacle in the ESBS would remain a lack of serious theory-construction efforts. Until the mass of data the ESBS are producing bears on theory construction, there is no reason to hope for theoretically progressive ESBS research anytime soon.
How can “scientific success” be defined with this lack of general theory or should it be defined at all?
Definitions do tend to change, of course. Yet they always add rigor and clarity. For the ESBS, scientific success requires the ability to derive point-specific predictions from empirical theories that subsume well-replicated experimental effects, thus contributing to explaining, and intervening on, focal phenomena. The practical value of theoretical knowledge indeed rests entirely on successful intervention. Point-predicting theories must therefore integrate seemingly disparate data-sets, or differentiate between them. In any case, they must both retrodict old data and predict new data. Without such theories, scientific success is hard to define and hard to achieve.
How can philosophy of science help in developing more trustworthy insights to arrange an ESBS research program?
Scientific knowledge must be replicable and generalizable, but nevertheless open to revision. The philosophy of science has long recognized that induction and prediction are distinct notions. ESBS researchers, however, tend to conflate them, taking it for granted that inductive knowledge learnt from past data substitutes for a theory-based prediction of new data. Bayesian methods in particular embrace this idea. But predictions derive from theoretical knowledge, while induction may succeed practically without theoretical accountability. Induction is the process of arriving at a parameter value; a theory then predicts this value under a specific empirical condition, and the prediction is tested against new data. For this reason, confirmation can only be the confirmation of a theory-derived point-specific prediction. Theories that fail to offer point-specific predictions can therefore never be well-confirmed by data.
Our own approach, the research program strategy (RPS) (Witte & Zenker, 2017; 2018; Krefeld-Schwalb et al., 2018), combines the best available statistical methods from Frequentism and Bayesianism. RPS relies on insights from Lakatos’s (1968; 1970) and Laudan’s (1981) philosophy of scientific research programs, as well as on the semantic approach to reconstructing theoretical structures. The philosophy of science teaches, after all, that theory construction and evaluation are informed by continuous efforts at reconstructing historically older theories, especially those that offer successful predictions. Moreover, RPS stresses the role of collaboration for good science. Collaboration is the obvious way to increase the small sample sizes that are typical in the ESBS, and the replication crisis cannot be overcome without gathering much larger samples. Data sets that an individual lab collects must therefore be integrated into larger sets (Stanley et al., 2018). But current meta-analytical methods focus on statistically significant empirical results, while neglecting various factors that reduce the experimental conditions’ sensitivity. Induction analysis as proposed in RPS estimates such factors and seeks to correct for them. Induction analysis is thus an improved version of meta-analysis (see the next question).
What are the main points of the research program strategy you will develop? You suggest developing induction analysis as an alternative method to meta-analysis, what are the differences between the two?
It should be clear that replicable results are more important than novel results, and that one’s ability to predict the former makes them more important still. RPS accepts this fully. RPS itself is a fairly sophisticated combination of two standard approaches to statistical inference: the frequentist null-hypothesis significance testing (NHST) approach and the Bayesian hypothesis testing (BHT) approach. RPS joins both into an all-things-considered superior approach that combines an optimal way of learning the numerical value of an empirical parameter from data, on the one hand, with an optimal way of confirming theoretical hypotheses by data, on the other. Given the replication crisis, if a standard meta-analysis today combines NHST and BHT results into a global result, then three problematic conditions apply. First, published object-level ESBS studies mostly report non-replicable effects.
Second, published ESBS-studies predominantly report statistically significant effects; the number of unpublished non-significant effects, however, is unknown. (This alone has implications for the public perception of science.) Third, empirically observed effects are heterogeneous and typically arise under different experimental conditions. Given these three conditions, a meta-analytical pooling of object-level studies into a global estimate is prone to misestimate the true effect, because underpowered studies and an inadequate inclusion of unpublished studies with smaller effect sizes must lead to overestimating the global effect size. To clarify the extent of misestimation is part of what our work on induction analysis aims to achieve.
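The resulting overestimation can be sketched in a toy fixed-effect meta-analysis (all numbers below are hypothetical assumptions for illustration; this is not the induction analysis itself): underpowered studies with heterogeneous true effects are filtered through a significance requirement before publication, and the inverse-variance pooled estimate of the published record overshoots the average true effect.

```python
import math
import random

random.seed(7)

MEAN_TRUE = 0.15   # hypothetical average true effect across study populations
K = 2000           # candidate object-level studies

published = []
for _ in range(K):
    true_d = random.gauss(MEAN_TRUE, 0.10)   # heterogeneous true effects (condition 3)
    n = random.choice([20, 30, 50])          # small, underpowered samples (condition 1)
    se = math.sqrt(2 / n)                    # approximate standard error of d
    observed = random.gauss(true_d, se)
    if abs(observed / se) > 1.96:            # only significant results get published (condition 2)
        published.append((observed, se))

# fixed-effect inverse-variance pooling of the published record
weights = [1 / se ** 2 for _, se in published]
pooled = sum(w * d for (d, _), w in zip(published, weights)) / sum(weights)

print(f"average true effect:       {MEAN_TRUE:.2f}")
print(f"pooled published estimate: {pooled:.2f}")
```

Because the unpublished non-significant studies never enter the pool, the global estimate is biased upward; quantifying and correcting such misestimation is the kind of task induction analysis is meant to address.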
Do you think the confidence crisis in the ESBS can also cause a loss of confidence in society towards these sciences and their results?
One should not forget that the ESBS did also uncover experimental effects that are stable enough to have some confidence in them. In fact, such results are applied in various practices today, influencing decision-making for instance in marketing, human resource management, or therapy. Generally, however, a statistically significant result being published in a quality-controlled scientific journal is an insufficient reason to trust the result. Unless the result is well-replicated, one should assign low confidence to it. This message bears repeating. Even Nobel Prize winner Daniel Kahneman admitted in 2017 (Schimmack et al., 2017) that his 2011 bestseller Thinking, Fast and Slow relied on far too many non-replicated results. Until ESBS researchers mostly publish replicable results, one should be cautious about investing confidence.
Lakatos, I. (1968). Criticism and the Methodology of Scientific Research Programmes. Proceedings of the Aristotelian Society, 69, 149–186.
Lakatos, I. (1970). Falsification and the Methodology of Scientific Research Programmes. In I. Lakatos & A. Musgrave (Eds.), Criticism and the Growth of Knowledge (pp. 91–195). London: Cambridge University Press.
Laudan, L. (1981). A confutation of convergent realism. Philosophy of Science, 48(1), 19–49.
Meehl, P. E. (1978). Theoretical risks and tabular asterisks: Sir Karl, Sir Ronald, and the slow progress of soft psychology. Journal of Consulting and Clinical Psychology, 46(4), 806–834. https://doi.org/10.1037//0022-006x.46.4.806
Pashler, H., & Wagenmakers, E. (2012). Editors’ Introduction to the Special Section on Replicability in Psychological Science: A Crisis of Confidence? Perspectives on Psychological Science, 7(6), 528–530. https://doi.org/10.1177/1745691612465253
Stanley, T. D., Carter, E. C., & Doucouliagos, H. (2018). What meta-analyses reveal about the replicability of psychological research. Psychological Bulletin, 144(12), 1325–1346. https://doi.org/10.1037/bul0000169
How to apply cognitive models to deliver externally valid descriptions of cognitive processes
AK-S is a post-doctoral researcher at Columbia Business School. Her main research interest lies in understanding the fundamental cognitive processes of decision making, in order to explain observable and relevant phenomena outside the lab.