Confronting the streetlight effect in biology

The act of asking the 'right' scientific question is hailed as one of the most essential skills of a scientific investigator. However, the choice of question, and how it is investigated, is highly dependent on factors such as cost, available tools, and prior experience. Here, Preston sheds light on how these and other forces coalesce to create the “streetlight effect” in science, how this effect shapes the corpus of scientific literature, and which notable discoveries have defied this bias.

Decision-making in scientific projects often hinges on risk-benefit tradeoffs under time and resource constraints. For example, let’s say that we conduct a genome-wide screen for modifiers of cell death in a disease model. We find that the top 10 hits of this screen are poorly studied genes; some have unknown function, others have few reagents available to study them, and yet others have had no prior linkage to human disease. Meanwhile, the eleventh hit (Gene A) is a well-studied gene that has previously been linked to other human diseases and has abundant community resources already developed for studying it. We choose to follow up Gene A for further validation and find a strong mechanistic link with our disease process, leading to a nice paper and further interest in the field in developing Gene A as a drug target.

At major decision points, we often weigh a tradeoff between a path that is “scientifically ideal,” or potentially more interesting or novel, and one that is lower risk. Here, one could argue that the “scientifically ideal” path would be to chase the top-ranked hits. On the other hand, following up Gene A requires far less resource development or functional characterization, thereby saving significant resources and time. Gene A’s prior linkage to human disease likely means there is prior interest in its potential for disease treatment, thereby increasing the likelihood that it would be further developed as a therapeutic target. Furthermore, genetic screens can often be noisy, and even with measures to augment statistical robustness there is the question of whether certain hits, especially those in poorly characterized genes, are spurious. As such, one could compellingly argue that following up Gene A is the best decision after taking real-life considerations into account.

However, it is important to ask what the effects of these decisions are. When we follow the path of least resistance and study already well-characterized genes, potentially interesting but understudied genes continue to be neglected as the rich proteins get richer [1]. For example, 50 or so protein kinases considered ‘hot’ in the 1990s dominated 65% of kinase-related papers in 2009 despite representing a mere 10% of all kinases in the human genome [2]. As recently as 2020, up to 9.6% of human protein-coding genes annotated in the genome lacked high-confidence confirmation by any protein detection method [3]. While it is certainly possible that the most well-studied genes are simply those that are more “important,” the poorly characterized genes also likely include many functionally important genes with disease significance; approximately one-sixth of genes essential for cell survival in a human cell line had no known function in 2015 [4], despite many of them having appeared as hits in genome-wide association studies or genomic screens [5]. As a result, broad swathes of functionally significant biology that could have profound importance for human health remain poorly understood.
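To put the kinase figures in perspective, here is a rough back-of-the-envelope sketch. It takes only the two percentages quoted above (10% of kinases, 65% of papers) and assumes, purely for illustration, that papers are spread evenly within each group; the function name is hypothetical and the calculation is not taken from [2].

```python
def per_gene_attention_ratio(gene_share: float, paper_share: float) -> float:
    """How many times more papers an average 'hot' gene receives than an
    average gene from the rest of the family.

    gene_share:  fraction of the gene family that is 'hot' (e.g. 0.10)
    paper_share: fraction of papers devoted to those genes (e.g. 0.65)
    Assumes papers are spread evenly within each group (an illustrative
    simplification, not a claim from the cited study).
    """
    hot_density = paper_share / gene_share                # papers per unit of 'hot' genes
    rest_density = (1 - paper_share) / (1 - gene_share)   # papers per unit of the rest
    return hot_density / rest_density


ratio = per_gene_attention_ratio(gene_share=0.10, paper_share=0.65)
print(f"An average 'hot' kinase gets about {ratio:.1f}x the papers of an average other kinase")
```

Under these assumptions, the average ‘hot’ kinase receives roughly 17 times as many papers as the average kinase outside that group.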

This sort of phenomenon falls into a broader type of observational bias colloquially known as the “streetlight effect.” The name refers to an old, well-known story:

A drunk man is looking for his keys under a streetlight. A curious passerby stops and helps the man look under the streetlight. After a few minutes, the good Samaritan asks if the man is sure that he lost the keys there. The drunk man replies, “No, I lost them in the park up the street.” When asked why he is searching under the streetlight, then, the man replies, “This is where the light is brightest.”

There are numerous examples of this phenomenon in biomedical research, where we look for answers where the light is brightest rather than where the ‘true’ answer may be more likely to reside. As described above, there is the bias in favor of previously well-studied or well-annotated proteins. There is the fact that most biology research is done with a relatively small number of model organisms and cell lines compared to the immense biodiversity of this planet [6][7]. And there are technical biases where the limitations of our tools influence what we choose to measure. For example, single-cell studies to date have been conducted almost exclusively with RNA, given our current inability to amplify proteins the way we can amplify nucleic acids [8], even though mRNA and protein levels are known to be poorly correlated [9][10]. Furthermore, the vast majority of spectra detected via mass spectrometry cannot be identified in either proteomic (~75%) or metabolomic (~80-98%) studies [11][12][13], suggesting that significant proportions of the proteome and metabolome are either not known or are unreliably quantified.

Of course, walking down well-traveled paths is advantageous and even desired in many situations. In method development papers, it is critical to demonstrate that the new method can replicate elements of known biology, which leads to another light being shone on a well-scrutinized gene or pathway. And in the example of the genetic screen above, why dedicate potentially years of training trying to elucidate the function of an unknown gene and develop new reagents to study it, only for it to potentially end up failing to be a good target for reasons you could not predict from the outset? Furthermore, the more novel an argument, the higher the burden of data for proving its significance. In an era of high throughput, large-scale data generation (see Deborah’s and Carla’s post), one can often find a scientifically fruitful and pragmatically tractable path to take.

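If taken too far, however, this bias in our decision making can limit scientific innovation. For example, the number of foundational scientific discoveries facilitated by work done in atypical model organisms is nearly uncountable, including CRISPR-Cas systems, first encountered as a mysterious repeated sequence in prokaryotes [14]; the anti-diabetic drug exenatide (Byetta), developed from the venom of the Gila monster [15]; telomere biology, worked out in Tetrahymena [16]; and the structure of the nicotinic acetylcholine receptor, studied in Torpedo postsynaptic membranes [17].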

Beyond the discovery of foundational biology and tools in unexpected model organisms, the most novel discoveries often arise from serendipitous observations of unexplained phenomena. Tyrosine phosphorylation, for example, was discovered when it was noticed that a phosphoprotein was migrating just a few millimeters differently from what one would expect from a phosphothreonine (personal correspondence with M. Krieger at MIT, and [18]), and microRNAs were first observed when it was noted that a gene known to control C. elegans larval development unexpectedly did not encode a protein, but rather encoded a pair of small RNA molecules [19]. It is rare that paradigm-changing discoveries come from studying phenomena or genes that are already well-studied. While we need to be mindful of survivorship bias when considering these success stories (there are innumerable examples of failed attempts to go off the beaten path), they demonstrate that many of the discoveries that could transform our understanding of biology or treatment of human disease lie in relatively overlooked parts of biology.

How can we increase the exploration of poorly illuminated aspects of biology? To answer this question, we must first identify the fundamental drivers of the pressures that incentivize us to stick to well-traveled roads. I believe that there are five distinct underlying forces at play.

  1. Resource bottlenecks: at their most basic, these include the available funding, equipment, core facilities, community and commercial resources, and accessible expertise. Defined more broadly, this can also reflect the fact that time is always limited, such as with training timeline constraints, life considerations, and pressures from funding agencies to publish.
  2. Technical limitations: such as the examples mentioned above of using transcriptomics instead of proteomics for single-cell work, and the presence of unidentified spectra in mass spec. These limit what is easiest or even feasible for us to measure and can guide us to, for example, measure RNA levels for a scientific question for which protein-level measurements are more appropriate.
  3. Signal vs. noise: all biological measurements come with noise, and it can be difficult to distinguish true biological signal from spurious noise. These challenges are compounded by the high-throughput nature of data collection, where we often test many thousands of individual hypotheses in parallel (e.g., an RNA-seq experiment identifying differentially expressed genes). Findings that are unexpected or do not make immediate sense are often attributed to noise.
  4. Prior experience bias: we naturally prefer techniques, genes, and pathways that we have previously been exposed to. Given the demands of modern biology research, we often find ourselves increasingly specialized in disciplines or techniques and we are unavoidably shaped by our prior experiences in our decision making.
  5. Paradigm effects: as argued by Thomas Kuhn in The Structure of Scientific Revolutions, much of science is done under a shared framework of theories, facts, and methods (a paradigm), and most scientific decision making and interpretation of results is done under the umbrella of the dominant paradigm [20]. One example of this is the genetic paradigm of cancer, whereby cancer is caused by a stepwise accumulation of somatic driver mutations and clonal expansion. This paradigm has transformed our ability to understand and treat cancers but, as some have argued, may have led to the overshadowing of other aspects of tumor biology [21].
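The multiple-testing problem behind the signal vs. noise item can be made concrete with a small simulation. The sketch below is illustrative only: the numbers of tests and true effects, the p-value distributions, and the function name are all hypothetical assumptions, and the Benjamini-Hochberg procedure is used here as one common way to control the false discovery rate, not as the method of any study cited in this post.

```python
import random


def benjamini_hochberg(p_values, q=0.05):
    """Return the indices of hypotheses rejected at false discovery rate q,
    using the Benjamini-Hochberg step-up procedure."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])  # indices, smallest p first
    cutoff_rank = 0
    for rank, idx in enumerate(order, start=1):
        # Keep the largest rank whose p-value is under its rank-scaled threshold
        if p_values[idx] <= rank / m * q:
            cutoff_rank = rank
    return sorted(order[:cutoff_rank])


# Hypothetical screen: 9,900 genes with no real effect plus 100 true hits
rng = random.Random(0)
n_null, n_real = 9900, 100
p_values = [rng.random() for _ in range(n_null)]           # nulls: uniform p-values
p_values += [rng.random() * 1e-6 for _ in range(n_real)]   # true hits: tiny p-values (assumed)

naive_hits = [i for i, p in enumerate(p_values) if p < 0.05]
bh_hits = benjamini_hochberg(p_values, q=0.05)

print(f"Uncorrected p < 0.05: {len(naive_hits)} hits")
print(f"Benjamini-Hochberg (q = 0.05): {len(bh_hits)} hits")
```

With no correction, about 5% of the 9,900 null genes (several hundred) are expected to pass p < 0.05 by chance alone, swamping the 100 real hits. This is part of why an unexpected hit in a poorly characterized gene is so easy to dismiss as noise.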

Time pressure, in particular, is something that I have grappled with throughout my dual-degree physician-scientist training. Major decisions have been driven as much, and perhaps more, by what is most feasible and achievable as by what is most interesting to me scientifically. And when looking ahead to how to pursue research during and after residency, the time window for obtaining training fellowships (such as a UE5 or K award) and publishing is highly limited. In our discussions leading up to this post, many of us on the Emergent Properties team found that our ambitions for prospective postdoctoral projects are constrained by these pressures.

Tackling all of these drivers of the streetlight effect bias would take another blog post and, in the case of paradigm bias, may represent an unavoidable feature of science. Here, I’d like to highlight some potential efforts that could alleviate the degree to which resource limitations drive us away from studying novel elements of biology. A significant amount of the day-to-day decision-making tends to boil down to factors that largely fit under the umbrella of resource limitations: choosing to ignore genes that are poorly annotated, choosing mice over another mammalian system because of the number of genetic and viral tools available, or following up a specific protein because there are antibodies available. Addressing these resource limitations will likely require a concerted effort by consortiums such as the Understudied Proteins Initiative [5], the Canadian Rare Disease Models and Mechanisms Network [22], and various antibody validation resources and initiatives [23]. Investment and participation in these consortium efforts will reduce the barrier to entry for studying previously understudied genes or model organisms.

Furthermore, transformative findings are often unexpected and initially difficult to interpret, requiring project pivots and long periods of uncertain work before their full impact becomes clear. However, funding agencies are notoriously risk-averse and often favor work with more predictable outcomes, while metrics used for hiring and promotions are driven in large part by the ability to reliably produce positive results (with negative results remaining unpublished). These pressures are all the more acute given the current turbulence in the science policy domain, with extensive threats to biomedical research funding including termination of already approved research grants, attempts to curtail research overhead payments, taxes on university endowments, and draconian proposed cuts to the NIH budget. The net effect is that there exist strong incentives to use one’s limited time and resources to pursue lower-risk lines of inquiry that are likely to be productive but more incremental in their findings.

More flexible funding structures that fund groups of individuals rather than specific projects and that provide funding over longer time horizons will likely alleviate some of these pressures. For example, Y Combinator, a major tech startup incubator that has launched many now-ubiquitous technology companies, is famous for prioritizing strong founder teams and characteristics rather than starting ideas. There already exist a limited number of these sorts of funding programs at the investigator level (e.g., HHMI Investigators, NIH R35, Chan-Zuckerberg Investigators, and ERC Advanced Grants) and at the trainee level (e.g., Hertz Fellowship, NSF Graduate Fellowship, Damon Runyon Postdoctoral Fellowships). These sorts of funding structures are rare, and largely seen as “prestige” positions awarded to previously productive scientists. However, expanding the scope of this type of funding model may provide significantly more flexibility for pursuing unexpected findings and alleviate some of the risks of pursuing high-risk, high-reward projects.

In our first blog post, we wrote about François Jacob’s distinction between “night science” and “day science”, where night science is the messy realm of novel ideas and possible hypotheses [24]. In many ways, the streetlight effect represents the uneasy intersection between night science and day science: the tension between the open-ended curiosity that leads us to want to study the poorly understood dark zones of biology, and the pragmatic constraints that often drive us to study something new but not too unknown or risky. To overcome these pressures, structural reforms are essential, including longer-term funding models that provide flexibility in project direction, and more robust investment in and development of better community resources. Yet these changes will only go so far without a cultural shift: one that values risk-taking, tolerates failure, and recognizes that genuine innovation often begins in the darkness of the park rather than under the light of the street. Building such a culture means giving ourselves the space to wander beyond the streetlight, and to trust that what we find there may one day illuminate biology in ways we cannot yet imagine.

References

1. Haynes WA, Tomczak A, Khatri P. Gene annotation bias impedes biomedical research. Sci Rep. 2018;8: 1362.
2. Edwards AM, Isserlin R, Bader GD, Frye SV, Willson TM, Yu FH. Too many roads not taken. Nature. 2011;470: 163–165.
3. Adhikari S, Nice EC, Deutsch EW, Lane L, Omenn GS, Pennington SR, et al. A high-stringency blueprint of the human proteome. Nat Commun. 2020;11: 5301.
4. Wang T, Birsoy K, Hughes NW, Krupczak KM, Post Y, Wei JJ, et al. Identification and characterization of essential genes in the human genome. Science. 2015;350: 1096–1101.
5. Kustatscher G, Collins T, Gingras A-C, Guo T, Hermjakob H, Ideker T, et al. Understudied proteins: opportunities and challenges for functional proteomics. Nat Methods. 2022;19: 774–779.
6. Russell JJ, Theriot JA, Sood P, Marshall WF, Landweber LF, Fritz-Laylin L, et al. Non-model model organisms. BMC Biol. 2017;15: 55.
7. de Magalhães JP. The big, the bad and the ugly: Extreme animals as inspiration for biomedical research. EMBO Rep. 2015;16: 771–776.
8. Bennett HM, Stephenson W, Rose CM, Darmanis S. Single-cell proteomics enabled by next-generation sequencing or mass spectrometry. Nat Methods. 2023;20: 363–374.
9. Liu Y, Beyer A, Aebersold R. On the dependency of cellular protein levels on mRNA abundance. Cell. 2016;165: 535–550.
10. Cote AJ, McLeod CM, Farrell MJ, McClanahan PD, Dunagin MC, Raj A, et al. Single-cell differences in matrix gene expression do not predict matrix deposition. Nat Commun. 2016;7: 10865.
11. Griss J, Perez-Riverol Y, Lewis S, Tabb DL, Dianes JA, Del-Toro N, et al. Recognizing millions of consistently unidentified spectra across hundreds of shotgun proteomics datasets. Nat Methods. 2016;13: 651–656.
12. El Abiead Y, Rutz A, Zuffa S, Amer B, Xing S, Brungs C, et al. Discovery of metabolites prevails amid in-source fragmentation. Nat Metab. 2025;7: 435–437.
13. da Silva RR, Dorrestein PC, Quinn RA. Illuminating the dark matter in metabolomics. Proc Natl Acad Sci U S A. 2015;112: 12549–12550.
14. Ishino Y, Krupovic M, Forterre P. History of CRISPR-Cas from encounter with a mysterious repeated sequence to genome editing technology. J Bacteriol. 2018;200. doi:10.1128/JB.00580-17
15. Furman BL. The development of Byetta (exenatide) from the venom of the Gila monster as an anti-diabetic agent. Toxicon. 2012;59: 464–471.
16. Blackburn E. Telomeres and Tetrahymena: An interview with Elizabeth Blackburn. Dis Model Mech. 2009;2: 534–537.
17. Unwin N. Nicotinic acetylcholine receptor and the structural basis of neuromuscular transmission: insights from Torpedo postsynaptic membranes. Q Rev Biophys. 2013;46: 283–322.
18. Eckhart W, Hutchinson MA, Hunter T. An activity phosphorylating tyrosine in polyoma T antigen immunoprecipitates. Cell. 1979;18: 925–933.
19. Bartel DP. MicroRNAs: genomics, biogenesis, mechanism, and function. Cell. 2004;116: 281–297.
20. Kuhn TS. The Structure of Scientific Revolutions: 50th Anniversary Edition. Chicago, IL: University of Chicago Press; 2012.
21. Yaffe MB. Why geneticists stole cancer research even though cancer is primarily a signaling disease. Sci Signal. 2019;12: eaaw3483.
22. Yamamoto S, Kanca O, Wangler MF, Bellen HJ. Integrating non-mammalian model organisms in the diagnosis of rare genetic diseases in humans. Nat Rev Genet. 2024;25: 46–60.
23. Kahn RA, Virk H, Laflamme C, Houston DW, Polinski NK, Meijers R, et al. Antibody characterization is critical to enhance reproducibility in biomedical research. Elife. 2024;13: e100211.
24. Yanai I, Lercher M. Night science. Genome Biol. 2019;20: 179.