admixture

admixture: A simple accelerated EM implementation of the ADMIXTURE model in R, plus extensions.

The ADMIXTURE software is widely used to estimate population structure from genotype data, in part because the computation scales well to whole-genome genotype data. For example, the more than one million people who have taken the AncestryDNA test have all received their ethnicity estimate from ADMIXTURE.

I have developed a simple, alternative implementation of ADMIXTURE that computes maximum-likelihood estimates of the admixture proportions and population allele frequencies using the expectation-maximization (EM) algorithm. (See admixture.barebones.R and admixture.barebones.demo.R for an extremely simple, or "bare bones", implementation that actually works, albeit slowly!)

Because EM converges very slowly on its own, I use the turboEM library to accelerate it. The ADMIXTURE software is implemented using a quasi-Newton method, and will typically converge to a solution much more quickly than the EM algorithm. I've modified the model to allow for genotype errors, and this seems to help convergence to some extent.

In any case, the hope is that this very simple implementation will facilitate development of extensions to ADMIXTURE. One extension I have developed here is a modification to the optimization (M-step) that encourages sparse admixture estimates.

This code was tested using R version 3.2.2.

The admixture source code repository is free software: you can redistribute it under the terms of the MIT license. All the files in this project are part of admixture. This project is distributed in the hope that it will be useful, but without any warranty; without even the implied warranty of merchantability or fitness for a particular purpose. See file LICENSE for the full text of the license.
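
To give a rough idea of what the EM updates for the admixture proportions and population allele frequencies look like, here is a minimal sketch of the textbook EM iteration for the basic admixture model. It is not the code in admixture.barebones.R: the function names (admix_loglik, admix_em_step, admix_em) and argument names are hypothetical, chosen for illustration only, and the sketch assumes complete genotype data coded as allele counts 0, 1 or 2. It leaves out the turboEM acceleration, the genotype error model, and the sparsity-encouraging M-step.

```r
# Illustrative EM for the basic admixture model. All names here are
# hypothetical; this is not the implementation in admixture.barebones.R.
#
#   G    n x p matrix of genotypes (allele counts 0, 1 or 2; no missing data)
#   Q    n x K matrix of admixture proportions (each row sums to 1)
#   freq p x K matrix of population allele frequencies

# Log-likelihood, up to an additive constant that does not depend on
# Q or freq.
admix_loglik <- function (G, Q, freq) {
  prob <- Q %*% t(freq)   # n x p matrix of per-allele probabilities.
  sum(G * log(prob) + (2 - G) * log(1 - prob))
}

# One EM update of Q and freq.
admix_em_step <- function (G, Q, freq, eps = 1e-6) {
  p    <- ncol(G)
  prob <- Q %*% t(freq)
  A    <- G / prob              # Weights for copies of the counted allele.
  B    <- (2 - G) / (1 - prob)  # Weights for copies of the other allele.

  # Expected number of allele copies assigned to each population at each
  # marker, split by allele type.
  n1 <- freq * (t(A) %*% Q)
  n0 <- (1 - freq) * (t(B) %*% Q)

  # M-step for the allele frequencies, kept away from 0 and 1 so that
  # the logarithms above stay finite.
  freqnew <- n1 / (n1 + n0)
  freqnew <- pmin(pmax(freqnew, eps), 1 - eps)

  # M-step for the admixture proportions: the expected fraction of each
  # individual's 2p allele copies assigned to each population.
  Qnew <- Q * (A %*% freq + B %*% (1 - freq)) / (2 * p)
  list(Q = Qnew, freq = freqnew)
}

# Run a fixed number of EM updates from a random starting point.
admix_em <- function (G, K, numiter = 100) {
  n    <- nrow(G)
  p    <- ncol(G)
  Q    <- matrix(runif(n * K), n, K)
  Q    <- Q / rowSums(Q)
  freq <- matrix(runif(p * K, 0.1, 0.9), p, K)
  loglik <- numeric(numiter)
  for (iter in seq_len(numiter)) {
    out          <- admix_em_step(G, Q, freq)
    Q            <- out$Q
    freq         <- out$freq
    loglik[iter] <- admix_loglik(G, Q, freq)
  }
  list(Q = Q, freq = freq, loglik = loglik)
}
```

With a genotype matrix G in hand, a call such as fit <- admix_em(G, K = 3) returns the estimates in fit$Q and fit$freq; plotting fit$loglik against the iteration number is one way to see how slowly plain EM approaches its final value.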