Optimization of scientific reasoning : a data-driven approach
Optimizacija zaključivanja u nauci : pristup zasnovan na podacima
Author
Sikimić, Vlasta
Mentor
Perović, Slobodan
Committee members
Perović, Slobodan
Zollman, Kevin J. S.
Adžić, Miloš

Abstract
Scientific reasoning represents complex argumentation patterns that eventually lead
to scientific discoveries. Social epistemology of science provides a perspective on the
scientific community as a whole and on its collective knowledge acquisition. Different
techniques have been employed with the goal of maximizing scientific knowledge
on the group level. These techniques include formal models and computer simulations
of scientific reasoning and interaction. Still, these models have tested mainly abstract
hypothetical scenarios. The present thesis instead presents data-driven approaches in
social epistemology of science. A data-driven approach requires data collection and
curation for its further usage, which can include creating empirically calibrated models
and simulations of scientific inquiry, performing statistical analyses, or employing
data-mining techniques and other procedures.
We present and analyze in detail three co-authored research projects in which the
thesis author was engaged during her PhD. The first project sought to identify optimal
team composition in high energy physics laboratories using data-mining techniques.
The results of this project are published in (Perovic et al. 2016), and indicate that
projects with smaller numbers of teams and team members outperform bigger ones. In
the second project, we attempted to determine whether there is an epistemic saturation
point in experimentation in high energy physics. The initial results from this project
are published in (Sikimic et al. 2018). In the thesis, we expand on this topic by using
computer simulations to test for biases that could induce scientists to invest in projects
beyond their epistemic saturation point. Finally, in previous examples of data-driven
analyses, citations are used as a measure of epistemic efficiency of projects in high
energy physics. In order to additionally justify and analyze the usage of this parameter
in their data-driven research, in the third project Perovic & Sikimic (under revision)
analyzed and compared inductive patterns in experimental physics and biology with
the reliability of citation records in these fields. They conclude that while citations are
a relatively reliable measure of efficiency in high energy physics research, the same does
not hold for the majority of research in experimental biology.
Additionally, the author's contributions published for the first time in
this thesis are: (a) an empirically calibrated model of scientific interaction of research
groups in biology, (b) a case study of irregular argumentation patterns in some pathogen
discoveries, and (c) an introductory discussion of the benefits and limitations of
data-driven approaches to the social epistemology of science. Using computer simulations of
an empirically calibrated model, we demonstrate that having several levels of hierarchy
and division into smaller research sub-teams is epistemically beneficial for researchers in
experimental biology. We also show that argumentation analysis in biology represents
a good starting point for further data-driven analyses in the field. Finally, we conclude
that a data-driven approach is informative and useful for science policy, but requires
careful consideration of data collection, curation, and interpretation.
Reasoning in science is reflected in complex argumentative structures that ultimately
lead to scientific discoveries. The social epistemology of science views science
from the perspective of the entire scientific community and deals with the collective
acquisition of knowledge. Various techniques have been applied with the aim of maximizing
scientific knowledge at the group level. These techniques include formal models and
computer simulations of scientific reasoning and interaction. However, these models have
mostly tested hypothetical scenarios. In contrast, this dissertation presents approaches
in the social epistemology of science that are based on data. A data-driven approach
involves collecting data and systematizing them for further use. This use includes
empirically calibrated models and simulations of the scientific process, statistical
analyses, algorithms for processing large amounts of data, and so on.
In the text we present and analyze in detail three co-authored studies in which
the author of the dissertation took part during her doctoral studies. The first study
aimed to determine the optimal team structure in high energy physics laboratories
using algorithms for processing large amounts of data. The results of this study are
published in (Perovic et al. 2016) and indicate that projects involving fewer teams
and researchers are more efficient than larger ones. In the second study, we tried to
determine whether there is a point of epistemic saturation in high energy physics
experiments. The initial results of this study are published in (Sikimic et al.
2018). In the dissertation we explore this topic further, using computer simulations to
test the bias mechanisms that lead scientists to invest in projects beyond
the point of epistemic saturation. Finally, in the previous examples of data-driven
analyses, citations were used as a measure of the epistemic efficiency of projects in
high energy physics. To further justify the use of this parameter in their analyses,
in the third study Perovic & Sikimic (under revision) examined and compared
inductive patterns in experimental physics and biology with the reliability of citations
as a measure in these fields. They concluded that, although citations are a relatively
reliable measure of efficiency in high energy physics, this is not the case for most
research in experimental biology.
In addition, the author's contributions published for the first time in this dissertation
are: (a) an empirically calibrated model of scientific communication within research
groups in biology, (b) an analysis of unexpected argumentative structures in the
discoveries of some pathogens, and (c) an introductory discussion of the advantages and
limitations of data-driven approaches in the social epistemology of science. Using
computer simulations on empirically calibrated models, we show that stratification and
division into smaller research teams is epistemically beneficial for researchers in
experimental biology. We also show that the analysis of arguments in biology is a good
basis for further data-driven analyses in this field. Finally, we conclude that a
data-driven approach is informative and useful for shaping science policy, but that it
requires careful consideration of how data are collected, sorted, and interpreted.