2 minute read

A few months ago I wrote a post about calculating normalised heteroplasmy shift in R from scratch. Scince then I have made a minor update to the kimura R package which now includes out-of-the-box functions for calculating both normalised heteroplasmy shift and normalised heteroplasmy variance.

I originally wrote the package to expand on and optimise a Python module by Wei Wei which was itself a (re-?)implementation of the method described by Wonnapinij et al. (2008). If you want to have a play around with it, you can install it from GitHub in the usual way via devtools::install_github("lbozhilova/kimura"). It was designed to serve two purposes: to generate random numbers from the two-parameter Kimura distribution, which can then be used as synthetic heteroplasmy values, as well as to perform goodness-of-fit tests following Wonnapinij et al. (2008). My main contribution in the R implementation was optimising the calculation of the Kimura probability density function, which now allows both simulations and tests to be performed at scale even with limited computational resources. Plus, I like R, and so do most of my colleagues.

I developed kimura as a side-project for personal use, and decided I might as well make it public. Pleasingly, others have found it to be of use too, including Giannakis et al. (2023) and Broz et al. (2022). Recently, I have been tinkering with the package once again. Giannakis et al. (2023) make some interesting points on possible flaws in the original Wonnapinij et al. (2008) hypothesis testing methodology, which has had me noodling on alternative approaches.

While the testing piece is very much a work in progress and not yet public, I think it would be great if kimura served as a generic heteroplasmy analysis tool, especially for non-computational scientists. With this in mind I have now released a minor update with two new functions: shift() and nvar(). As the names suggest, these can be used to calculate normalised heteroplasmy shift, and normalised heteroplasmy variance. Below is a code snipped which shows generating heteroplasmy data H, calculating the shift from a baseline H0, and calculating the variance, which is related to the b parameter of the Kimura distribution as nvar = 1 - b.

> library("kimura")
> # generate some heteroplasmy data
> H <- round(100 * rkimura(20, 0.5, 0.9))
> H0 <- 50
> # calculate the heteroplasmy shift - this should have mean 0
> H_shift <- shift(H, H0)
> mean(H_shift)
[1] 0.06573089
> # calculate normalised variance; this should be 1 - 0.9 = 0.1
> nvar(H)
[1] 0.1037069

Happy coding! And do let me know, either here or on GitHub, if there are other features you would like to see in kimura.