Efficient algorithms for using genotypic data

Ferdosi, Mohammad Hossein; van der Werf, Julius; Gondro, Cedric; Tier, Bruce

Title

Publication Date

2016

Author(s)

Ferdosi, Mohammad Hossein

( author )
OrcID: https://orcid.org/0000-0001-5385-4913
Email: mferdos3@une.edu.au
UNE Id une-id:mferdos3

van der Werf, Julius

( supervisor )
OrcID: https://orcid.org/0000-0003-2512-1696
Email: jvanderw@une.edu.au
UNE Id une-id:jvanderw

Gondro, Cedric

( supervisor )
OrcID: https://orcid.org/0000-0003-0666-656X
Email: cgondro2@une.edu.au
UNE Id une-id:cgondro2

Tier, Bruce

Type of document

Thesis Doctoral

Language

en

Entity Type

Publication

UNE publication id

une:19843

Abstract

The aim of this thesis is to explore the specific structure of livestock populations to unravel hidden information in the genomic data, such as recombination events and the parental origin of markers. This information can then be used to improve the accuracy of predicted breeding values, which is one of the main aims of animal breeding. In the first experimental chapter, an efficient method for detecting opposing homozygotes was proposed. This method makes the detection of opposing homozygotes feasible for thousands of individuals and millions of markers. An opposing homozygote matrix can be used to identify Mendelian inconsistencies and to fix pedigree errors. The second experimental chapter used opposing homozygotes between individuals in a half-sib family to identify recombination events in the sire, to impute the sire's haplotypes and to reconstruct the haplotypes of the offspring. The algorithm was compared with other frequently used methods, using both simulated and real data. The accuracy of detecting recombination events and of haplotype reconstruction was higher with this algorithm than with the other algorithms, especially when there were genotyping errors in the dataset. For example, the accuracy of haplotype reconstruction was around 0.97 for a half-sib family size of 4, and the accuracy of sire imputation was 0.75 and 1.00 for half-sib family sizes of 4 and 40, respectively. In the third experimental chapter, hsphase was developed, an efficient R package that implements the algorithms used in the first two chapters. In addition, an algorithm for grouping half-sib families using the opposing homozygote matrix was developed and verified with real datasets. The results show that the algorithm can group half-sib families accurately; however, the accuracy depended on sample size and genetic diversity in the population.
The package includes several diagnostic functions to visualise and check half-sib pedigrees, parentage assignments, and phased haplotypes of offspring in a half-sib family. The fourth experimental chapter used the half-sib population structure to fix switch errors. A switch error is a common problem in many haplotype reconstruction algorithms, where the haplotype phase is locally correct but the paternal and maternal strands are not consistently and correctly assigned across longer segments (or across the entire genome). The algorithm partitions the genome into segments and creates a group matrix, which is used to identify the switch points. The switches are then fixed with a second algorithm. The results showed that this algorithm can fix switch errors efficiently and increase the accuracy of genome-wide phasing. In chapter five, relationship matrices generated from haplotype segments were used to improve the accuracy of predicting breeding values. The haplotypes were partitioned in three ways and into segments of various sizes. The new relationship matrices were evaluated with three sets of real data and with simulated data. In all cases, the accuracy of prediction and the log-likelihood increased significantly, although the size of the increase was trait dependent.
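As a rough illustration of the opposing homozygote matrix mentioned in the abstract, the sketch below counts, for every pair of individuals, the markers at which one individual is homozygous for one allele and the other is homozygous for the alternative allele. It is not the thesis's algorithm or the hsphase implementation. The genotype coding (0, 1, 2 copies of one allele, anything else treated as missing), the bitmask approach, and the function name `opposing_homozygote_matrix` are all assumptions made for this example.

```python
def opposing_homozygote_matrix(genotypes):
    """Count opposing homozygotes for every pair of individuals.

    genotypes: list of equal-length lists, one per individual, coded
    0 / 1 / 2 (assumed coding); any other value is treated as missing.
    """
    n = len(genotypes)
    hom0, hom2 = [], []
    for row in genotypes:
        mask0 = mask2 = 0
        for k, g in enumerate(row):
            if g == 0:
                mask0 |= 1 << k  # bit k set: homozygous for the first allele
            elif g == 2:
                mask2 |= 1 << k  # bit k set: homozygous for the second allele
        hom0.append(mask0)
        hom2.append(mask2)

    oh = [[0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            # A marker is an opposing homozygote when one individual is 0
            # and the other is 2, in either direction.
            opposing = (hom0[i] & hom2[j]) | (hom2[i] & hom0[j])
            oh[i][j] = oh[j][i] = bin(opposing).count("1")
    return oh


# Toy data (invented for illustration): a sire, one offspring, one unrelated animal.
genotypes = [
    [0, 1, 2, 2, 0],  # sire
    [0, 2, 1, 2, 1],  # offspring, never opposes the sire
    [2, 1, 0, 0, 2],  # unrelated individual
]
oh = opposing_homozygote_matrix(genotypes)
```

In this toy example the sire and offspring share no opposing homozygotes, whereas the unrelated individual opposes the sire at four markers. A parent and its offspring should show no opposing homozygotes apart from genotyping errors, which is why such counts can flag Mendelian inconsistencies and pedigree errors.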


Files: