Mitochondrial DNA (mtDNA) heteroplasmy dynamics

Most mammalian cells contain hundreds to thousands of copies of mtDNA, which is inherited matrilineally and has a higher mutation rate than nuclear DNA. As a result, wildtype and mutant mtDNA can coexist within the same sample, a state known as heteroplasmy. Many pathogenic mtDNA mutations are heteroplasmic, and higher heteroplasmy fractions, i.e. higher proportions of mutant mtDNA in the affected tissue, are often associated with worse phenotypes. I am interested in how heteroplasmy fractions change over time and from mother to offspring, at the level of the organism, of bulk tissue and of single cells. A particular challenge in this area is that heteroplasmy measurements have relatively high variance, and often only a single measurement is obtained per sample.
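To give a feel for why a single measurement can be unreliable, the sketch below simulates repeated measurements of one sample. It assumes, purely for illustration, that the only source of noise is binomial sampling of a finite number of mtDNA copies; real assays have additional error sources. The function name and all parameter values are hypothetical.

```python
import random
import statistics


def simulate_measurements(true_fraction, copies_sampled, n_repeats, seed=0):
    """Simulate repeated heteroplasmy measurements of one sample.

    Assumption: each measurement draws `copies_sampled` mtDNA copies
    independently, and each copy is mutant with probability `true_fraction`.
    """
    rng = random.Random(seed)
    estimates = []
    for _ in range(n_repeats):
        mutant = sum(rng.random() < true_fraction for _ in range(copies_sampled))
        estimates.append(mutant / copies_sampled)
    return estimates


# Hypothetical values: true heteroplasmy 30%, 50 copies sampled per measurement.
estimates = simulate_measurements(true_fraction=0.3, copies_sampled=50, n_repeats=2000)
observed_sd = statistics.stdev(estimates)
binomial_sd = (0.3 * 0.7 / 50) ** 0.5  # sqrt(h * (1 - h) / n)

print(f"mean estimate: {statistics.mean(estimates):.3f}")
print(f"observed sd:   {observed_sd:.3f}")
print(f"binomial sd:   {binomial_sd:.3f}")
```

Under these assumptions the standard deviation of a single measurement is about 0.065, so one measurement of a sample with a true heteroplasmy fraction of 0.3 can plausibly fall anywhere between roughly 0.17 and 0.43.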

Uncertainty in protein-protein interaction networks

Protein-protein interaction (PPI) data is obtained through a range of experimental techniques, each of which is subject to experimental error. When we build and analyse protein interaction networks from such data, we often ignore the associated uncertainty. It is therefore not always clear whether the conclusions we draw from such analyses are biologically relevant or are artefacts of the data. I am generally interested in studying the robustness of network analysis pipelines to data uncertainty.
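One simple way to probe this kind of robustness is to resample the network many times and check how often a conclusion survives. The sketch below is a toy illustration rather than a method described here: it assumes each interaction carries a confidence score that can be read as the probability the edge is real, and asks how often the highest-degree protein in the full network remains a highest-degree protein in the resampled ones. The edge list, scores and function names are all hypothetical.

```python
import random
from collections import Counter


def degrees(edges):
    """Count the degree of every node in an undirected edge list."""
    deg = Counter()
    for u, v in edges:
        deg[u] += 1
        deg[v] += 1
    return deg


def sample_network(weighted_edges, rng):
    """Keep each edge independently with probability equal to its score."""
    return [(u, v) for u, v, p in weighted_edges if rng.random() < p]


def hub_stability(weighted_edges, n_samples=1000, seed=0):
    """Fraction of sampled networks in which the full-network hub is still
    (jointly) the highest-degree node."""
    rng = random.Random(seed)
    full = degrees([(u, v) for u, v, _ in weighted_edges])
    # Sorting first makes tie-breaking deterministic (alphabetical order).
    hub = max(sorted(full), key=lambda node: full[node])
    hits = 0
    for _ in range(n_samples):
        deg = degrees(sample_network(weighted_edges, rng))
        if deg and deg[hub] == max(deg.values()):
            hits += 1
    return hub, hits / n_samples


# Hypothetical interactions: (protein, protein, confidence score).
toy_edges = [
    ("A", "B", 0.9),
    ("A", "C", 0.9),
    ("A", "D", 0.5),
    ("B", "C", 0.6),
    ("D", "E", 0.4),
]
hub, stability = hub_stability(toy_edges)
print(f"hub {hub} stays on top in {stability:.0%} of sampled networks")
```

The same resampling loop can wrap any downstream statistic (clustering, centrality rankings, module detection) in place of the degree comparison.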

Publication bias in protein-protein interaction networks

PPI networks are usually built using data collected from a range of primary sources, including both low- and high-throughput studies. This data is subject to centralised curation and standardisation efforts such as BioGRID. Data curation generally aims to minimise false positives, i.e. erroneous interaction records, but little is done to account for false negatives and missing data.

To date, nearly a quarter of PPI data comes from focussed, low-throughput studies. This means that parts of the proteome of particular research interest, e.g. protein kinases, are overrepresented in interaction databases. Meanwhile, even in model organisms such as S. cerevisiae, many protein pairs do not co-occur in any single study and so may never have been screened for interaction.
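The co-occurrence argument can be made concrete with a small sketch. It assumes we have, for each study, the set of proteins that appear in it, and it counts the protein pairs that never appear together in any study. Co-occurrence is only a necessary condition for a pair having been screened, so the result is a lower bound on the untested fraction. The study names, protein names and function names are hypothetical.

```python
from itertools import combinations


def coscreened_pairs(studies):
    """Return all unordered protein pairs that co-occur in at least one study."""
    pairs = set()
    for proteins in studies.values():
        pairs.update(frozenset(p) for p in combinations(sorted(proteins), 2))
    return pairs


def untested_fraction(studies, proteome):
    """Fraction of all possible protein pairs that never co-occur in a study."""
    all_pairs = {frozenset(p) for p in combinations(sorted(proteome), 2)}
    covered = coscreened_pairs(studies) & all_pairs
    return 1 - len(covered) / len(all_pairs)


# Hypothetical toy data: five proteins, two small studies.
proteome = {"P1", "P2", "P3", "P4", "P5"}
studies = {
    "study_1": {"P1", "P2", "P3"},  # covers 3 pairs
    "study_2": {"P3", "P4"},        # covers 1 pair
}
fraction = untested_fraction(studies, proteome)
print(f"{fraction:.0%} of protein pairs never co-occur in a study")
```

Here 4 of the 10 possible pairs co-occur in some study, so at least 60% of pairs cannot have been tested, and the absence of an edge between them says nothing about whether they interact.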

This publication bias has a structural effect on PPI networks, as proteins of known function and specific research interest will have denser neighbourhoods. I work on developing methods for assessing the effect of this bias on downstream network analysis.