Accounting for population stratification in genotype samples is important to avoid false inferences in genome-wide association studies. Stratification is usually quantified using model-based ancestry estimation (e.g. ADMIXTURE; Alexander et al., 2009), which has disadvantages with regard to model assumptions and processing time. This article describes a two-step procedure for estimating population stratification. In the first step, a spatial clustering algorithm is used to detect clusters of genetically homogeneous animals. In the second step, genotypes are described as linear functions of within-cluster allele frequencies. The approach was tested on a cattle data set consisting of 11,639 real genotypes from 11 breeds and 5,000 artificially generated cross-bred genotypes (F1 to F5). It outperformed ADMIXTURE in terms of both speed and accuracy.
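To make the second step concrete, the following is a minimal sketch of how a genotype might be expressed as a linear function of within-cluster allele frequencies. It is an illustration under stated assumptions, not the procedure used in this article: genotypes are assumed to be coded as allele counts (0, 1, 2), the fit is an ordinary least-squares fit whose coefficients are clipped at zero and rescaled to sum to one, and the function name `estimate_cluster_proportions` and the toy frequencies are hypothetical.

```python
import numpy as np


def estimate_cluster_proportions(genotype, cluster_freqs):
    """Fit one genotype as a linear combination of cluster allele frequencies.

    genotype      : length-M sequence of allele counts (assumed coding 0/1/2).
    cluster_freqs : K x M array of within-cluster allele frequencies.
    Returns a length-K array of non-negative proportions summing to one.
    """
    # Scale allele counts to the frequency scale [0, 1].
    y = np.asarray(genotype, dtype=float) / 2.0
    # Design matrix with one column per cluster (M x K).
    X = np.asarray(cluster_freqs, dtype=float).T
    # Ordinary least squares (assumption: no constraints during the fit).
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    # Post-hoc constraint handling (assumption): clip negatives, then rescale.
    coef = np.clip(coef, 0.0, None)
    total = coef.sum()
    if total == 0.0:
        # Degenerate fit: fall back to equal proportions.
        return np.full(coef.shape, 1.0 / coef.size)
    return coef / total


# Toy example with two hypothetical clusters and four markers.
toy_freqs = np.array([[0.9, 0.1, 0.8, 0.2],
                      [0.2, 0.7, 0.3, 0.6]])
# Expected allele dosage of an idealised F1 cross between the two clusters.
f1_dosage = 2.0 * (0.5 * toy_freqs[0] + 0.5 * toy_freqs[1])
f1_proportions = estimate_cluster_proportions(f1_dosage, toy_freqs)
```

In this toy case the idealised F1 dosage is an exact equal mixture of the two frequency vectors, so the fit recovers equal proportions for both clusters. Real genotypes are integer-valued and noisy, so the recovered proportions would only approximate the true breed composition.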