Mapping the connections between the exposome and phenome


The long-term research goal of Chirag Patel’s Group is to solve problems in human health and disease by developing computational & statistical approaches to reason over large-scale environmental exposure and genomic data.

Untangling exposures, genotypes, and phenotypes

Who we are -- our phenotype -- is determined by the interplay between our genome and exposome. The exposome is a complement of the genome, consisting of the totality of non-genetic factors of behavior and our environment, including dietary nutrients, lifestyle, infectious agents, pollutants, and extreme temperature.

We attempt to dissect the relationship between the genome and exposome in complex phenotypes by applying data science and machine learning approaches to large-scale data such as electronic health records, biobanks, and real-world data. By integrating exposure monitoring and multi-omic measurements, we ultimately aim to create a comprehensive map of human health.

Patel et al, JAMA 2014

Meta-Science: research on research

Are findings from high-throughput exposomic and genomic research clinically or biologically useful? For example, after many years of diet-related investigation, including our own here and here, it is a challenge to navigate findings on what the optimal diet might be. How can we do the best job possible to ensure big data science is translatable?

We develop computationally intensive approaches to make findings based on big data more robust and to accelerate meta-science, or the “science of science”. We use these approaches to study how and why we come to the conclusions that we do, and whether those conclusions are robust. For example, “vibration of effects” attempts to estimate how much arbitrary analytic choices in big data can change the outcome of a study and the decisions that follow from it.
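To make the idea of vibration of effects concrete, here is a minimal sketch under stated assumptions. It is a simplified illustration, not the group's implementation. It refits a linear regression of an outcome on one exposure under every possible subset of adjustment variables and reports how far the exposure coefficient moves. The data are synthetic, and the names `ols_coefficients`, `vibration_of_effects`, `age`, and `income` are hypothetical, chosen only for this example.

```python
import itertools
import random


def ols_coefficients(columns, y):
    """Least-squares coefficients via the normal equations (intercept first)."""
    n = len(y)
    X = [[1.0] + [col[i] for col in columns] for i in range(n)]
    p = len(X[0])
    # Augmented matrix [X'X | X'y]
    A = [
        [sum(X[i][r] * X[i][c] for i in range(n)) for c in range(p)]
        + [sum(X[i][r] * y[i] for i in range(n))]
        for r in range(p)
    ]
    # Gauss-Jordan elimination with partial pivoting
    for k in range(p):
        pivot = max(range(k, p), key=lambda r: abs(A[r][k]))
        A[k], A[pivot] = A[pivot], A[k]
        for r in range(p):
            if r != k:
                f = A[r][k] / A[k][k]
                A[r] = [a - f * b for a, b in zip(A[r], A[k])]
    return [A[r][p] / A[r][r] for r in range(p)]


def vibration_of_effects(exposure, y, adjusters):
    """Map each subset of adjuster names to the fitted exposure coefficient."""
    names = sorted(adjusters)
    estimates = {}
    for size in range(len(names) + 1):
        for subset in itertools.combinations(names, size):
            cols = [exposure] + [adjusters[name] for name in subset]
            # Index 0 is the intercept, index 1 is the exposure
            estimates[subset] = ols_coefficients(cols, y)[1]
    return estimates


# Synthetic data (an assumption for illustration): age confounds the
# exposure-outcome association, income is unrelated to both.
rng = random.Random(0)
n = 500
age = [rng.gauss(0, 1) for _ in range(n)]
income = [rng.gauss(0, 1) for _ in range(n)]
exposure = [0.6 * a + rng.gauss(0, 1) for a in age]
outcome = [0.3 * e + 0.8 * a + rng.gauss(0, 1) for e, a in zip(exposure, age)]

estimates = vibration_of_effects(exposure, outcome, {"age": age, "income": income})
for subset, beta in sorted(estimates.items()):
    print(subset or ("unadjusted",), round(beta, 3))

lo, hi = min(estimates.values()), max(estimates.values())
print("range of exposure effect across model choices:", round(hi - lo, 3))
```

In this toy setting the exposure estimate shifts noticeably depending on whether age is adjusted for. That spread across equally defensible model specifications is the kind of instability the approach is meant to expose. Published analyses typically also track how p-values vary across specifications, which this sketch omits.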


We are part of the Department of Biomedical Informatics at Harvard Medical School. Chirag Patel participates in DBMI’s educational programs.


We are or have been funded by the National Institutes of Health, the National Science Foundation, the Harvard Data Science Institute, and the Harvard-Chan NIEHS Center. We are thankful for the compute and infrastructural support from Sanofi, Amazon Web Services, Microsoft Azure, Oracle Cloud, and Google.