4 minute read

This post is about quickly and painlessly calculating normalised heteroplasmy shift in R. It is intended broadly for lab scientists doing their own data analysis. You can scroll to the end for the relevant lines of code.

One of the key processes we study in the lab and more generally within the mtDNA research community is how heteroplasmy levels change over time or from mother to offspring. As a brief reminder, most of our cells contain hundreds or thousands of mtDNA molecules, and it is possible that some but not all of them carry a particular mutation, such as m.5024C>T in mice or m.3243A>G in humans. This admixture of wildtype and mutant mtDNA is called heteroplasmy, and we quantify it with the proportion of mutant mtDNA in either bulk tissue samples or single cells. Consistent changes in heteroplasmy fractions are indicative of selection pressure. For example, we know that female mice carrying the pathogenic m.5024C>T mutation at high heteroplasmy tend to have lower-heteroplasmy offspring (Zhang et. al, 2021).

How do we quantify this change from mother to offspring? The obvious thing to do is to take an arithmetic difference: if a female m.5024C>T mouse with 70% heteroplasmy had a pup with 60% heteroplasmy, then the overall change is a drop of 10%. Unfortunately, this difference is not very helpful for the study of selection pressure. Here is an example which illustrates why. Suppose instead that the mother has only 5% heteroplasmy to begin with. Even if the exact same selection forces were acting against the mt.5024C>T mutation, there is no way we would see a 10% drop in the pup heteroplasmy, as that would make it negative! The heteroplasmy difference then depends not only on the selection pressure, but also on the baseline, i.e. the mother heteroplasmy.

To avoid this problem - and for other maths reasons which I will not go into today - we instead work with another quantity known as the normalised heteroplasmy shift. What is nice about this measure is that it does not depend on the baseline. This means it allows us to fairly compare between pups born to the 70% mother to pups born to the 5% mother and ask whether they are subject to the same selection pressure. Burgstaller et al. (2018) discuss normalised shift in the context of mtDNA rather extensively. Both in this paper and in others, normalised shift is usually written as

\[nH = log\left(\frac{H_{pup}(1-H_{mother})}{(1-H_{pup})H_{mother}}\right),\]

where \(H_{mother}\) and \(H_{pup}\) are the heteroplasmy fractions of the mother and pup respectively. With some mathematical trickery we can rearrange the above and express it in terms of the heteroplasmy log-odds, also known as the logit function:

\(nH = logit(H_{pup}) - logit(H_{mother})\).

The resulting shift \(nH > 0\) is positive when pups have higher heteroplasmy than their mothers, and negative \(nH < 0\) when pups have lower heteroplasmy. In our experiments we often think of consistent negative shift as evidence for selection against the mtDNA mutation. In addition to mouse mother-pup pairs, we can also use normalised shift to compare changes over a period of time, or across tissues, for example. In all these cases we need two measurements: one in the condition of interest (e.g. the pup, or after treatment with a drug, or at a certain age), and one in the baseline (e.g. the mother, or before treatment with a drug, or at birth).

Why am I telling you this? Over the last year or so I have seen people use different tools to calculate heteroplasmy shift, most often making use of Excel formulae before loading the calculated values into R or Prism to plot. This process can be frustrating and error-prone, particularly if you have to remind yourself of the exact formula with all the logs and fractions every time. Luckily, you can quickly and easily calculate heteroplasmy shift in R! Let’s start with some data.

het_df <- data.frame(
  "H_pup" = c(60, 70, 80, 2, 3, 4),
  "H_mother" = c(70, 70, 70, 5, 5, 5))

Above I have recorded mother and pup heteroplasmies as whole percentages, the same way you would find them in most spreadsheets. We will need to divide these values by 100 to get the heteroplasmy fractions, and we will then use the built-in qlogis() function to calculate the log-odds without any logs and brackets. The complete shift calculation looks like this:

qlogis(het_df$H_pup / 100) - qlogis(het_df$H_mother / 100)

If you want to plot your data or do other analysis with it, you probably want to add the normalised shift as a column in your data frame, for example called nH. You can do it like so:

library("tidyverse")

het_df <- het_df %>% 
  mutate(nH = qlogis(H_pup / 100) - qlogis(H_mother / 100))

Easy-peasy! No complicated fractions and counting brackets only to discover you have missed a division sign somewhere…