Projects/Publications

Building a genetic diversity predictive framework for conservation policy

Predicting genetic diversity trajectories in the Anthropocene

Genetic diversity is crucial for species' ability to adapt and has been targeted for protection in the UN's Global Biodiversity Framework. However, predicting genetic diversity loss is challenging. Using genetic data from 18 species, we found that genetic diversity declines slowly after habitat loss but continues to decline for decades even if the remaining habitats are protected. Our model, combined with habitat data and conservation indicators, projects significant genetic diversity losses in 13,808 species, with short-term losses of 13-22% and long-term losses of 42-48%. These findings suggest that protecting current habitats alone is not enough to maintain genetic health, emphasizing the need for ongoing genetic monitoring and a new predictive framework for global genetic biodiversity.


An encyclopedia of enhancer-gene regulatory interactions in the human genome

We developed a massive resource of over 13 million enhancer-gene interactions across 352 cell types and tissues to better understand gene regulation and the impact of human genetic variation on disease. Using data from CRISPR experiments, chromatin measurements, and genetic studies, we created a new predictive model, ENCODE-rE2G, which excels at predicting how enhancers regulate their target genes. This model helps build a detailed encyclopedia of enhancer-gene interactions, revealing how these networks function and linking genetic variants to diseases. We discovered that not only enhancer activity and 3D contacts but also promoter types and enhancer synergy play a role in gene regulation. Our resource of genome-wide maps, benchmarking tools, and predictive models will be invaluable for future genetic and regulatory studies.


Power and limitations of the mutations-area relationship (MAR) to assess within-species genetic diversity targets for post-2020 Sustainable Development Goals

To evaluate the United Nations’ preliminary post-2020 sustainable goals on protecting high levels of genetic diversity per species, Exposito-Alonso et al. (2022) proposed a new framework to predict a species’ loss of genetic diversity given its loss of habitat area. This method, called the mutations-area relationship (MAR), is analogous to the species-area relationship (SAR), often used to assess and design species diversity targets. To advise conservation practitioners, here we discuss the power of MAR, its limitations, and potential improvements.
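As a rough illustration of the idea, the sketch below assumes MAR takes the same power-law form usually used for SAR, M = c * A^z, so that the fraction of diversity lost depends only on the fraction of area lost and the exponent z. The function name and the default z = 0.3 are illustrative placeholders chosen here, not values taken from this page.

```python
def fraction_diversity_lost(fraction_area_lost: float, z: float = 0.3) -> float:
    """Fraction of genetic diversity lost under an assumed power law M = c * A**z.

    The constant c cancels when comparing diversity before and after habitat
    loss, leaving 1 - (1 - fraction_area_lost) ** z.
    z = 0.3 is a placeholder exponent used only for illustration.
    """
    if not 0.0 <= fraction_area_lost <= 1.0:
        raise ValueError("fraction_area_lost must be between 0 and 1")
    remaining_area = 1.0 - fraction_area_lost
    return 1.0 - remaining_area ** z


if __name__ == "__main__":
    for lost in (0.1, 0.5, 0.9):
        print(f"area lost {lost:.0%} -> diversity lost "
              f"{fraction_diversity_lost(lost):.1%}")
```

Because z is below 1 in this sketch, the predicted loss of diversity is always smaller than the loss of area, which is the same qualitative behavior the species-area relationship shows for species counts.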


An analytical solution to estimating population divergence

We have developed a method called the G(A|B) method to estimate when different populations split from their common ancestors. By analyzing DNA sequences, this method helps determine these separation times without needing to know the full history of population sizes or other complex assumptions. Our approach builds on previous research and offers a way to test if ancient genomes are directly related to modern human populations. Unlike some past methods, our G(A|B) method doesn't require distinguishing between certain types of genetic variants or making assumptions about population history before they diverged. We also compare our method with two other existing methods and show that our approach can test whether the population history can be represented as a straightforward family tree when applied to three or more groups. We demonstrated the effectiveness of our method using the genomes of two Neanderthals and a Denisovan.


Linking risk variants to disease genes

We've discovered thousands of genetic regions linked to human diseases and traits, most of which affect enhancers, the parts of DNA that control gene activity. Without detailed maps, however, it was unclear which genes these enhancers regulate. Using our Activity-by-Contact (ABC) Model, we've created maps for 131 cell types and tissues to pinpoint which enhancers control which genes. These maps help us understand the genetic basis of diseases like inflammatory bowel disease (IBD). Our ABC Model has connected over 5,000 genetic signals to more than 2,000 genes across 72 diseases and traits, revealing how certain genetic variants impact health. For instance, an IBD-related variant was found to control the PPIF gene, affecting cell energy management. Overall, our research provides new insights into genome regulation and disease mechanisms.
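To make the name concrete, here is a minimal sketch of how an Activity-by-Contact score is commonly described: each candidate enhancer's activity is multiplied by its contact frequency with the gene's promoter, and the product is normalized by the sum of those products over all candidate elements near the gene. The function name, input format, and example numbers are hypothetical; this is not the published pipeline.

```python
from typing import List


def abc_scores(activity: List[float], contact: List[float]) -> List[float]:
    """Normalized Activity x Contact scores for one gene (illustrative only).

    activity[i]: assumed activity measure for candidate element i
    contact[i]:  assumed contact frequency between element i and the promoter
    Returns each element's share of the total activity-by-contact signal.
    """
    if len(activity) != len(contact):
        raise ValueError("activity and contact must have the same length")
    products = [a * c for a, c in zip(activity, contact)]
    total = sum(products)
    if total == 0:
        # No signal at all: give every element a score of zero.
        return [0.0] * len(products)
    return [p / total for p in products]


if __name__ == "__main__":
    # Three hypothetical candidate enhancers for a single gene.
    scores = abc_scores(activity=[2.0, 1.0, 1.0], contact=[0.5, 1.0, 1.0])
    print(scores)
```

In this toy input, a highly active element with weak promoter contact scores the same as a less active element with strong contact, which is the trade-off the model's name refers to.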

Previous Projects

A deep learning approach to predicting gene expression

Improving general baselines and approaches for predicting gene expression using LINCS Consortium data.

Phylo: a novel way to solve the multiple sequence alignment problem

Solving the multiple sequence alignment problem using a human crowd-computing platform.

A Tutorial: Training Generative Adversarial Networks

A brief introduction to GANs and tips on training a Deep Convolutional GAN!

Maxcessibility: A Machine-Learning approach to improving building accessibility

Using real-time data to make buildings more accessible for people with physical disabilities in their communities.

A Classifier for categorizing hand-drawn images

Utilizing CNNs in image recognition.