Haplotype metrics and functional data mining from high-throughput SNP genotyping: A bioinformatics approach

Goodswen, Stephen James; Gondro, Cedric; Werf, Julius Van Der; Kadarmideen, Haja

Title

Publication Date

2011

Author(s)

Goodswen, Stephen James

Gondro, Cedric

(supervisor)
OrcID: https://orcid.org/0000-0003-0666-656X
Email: cgondro2@une.edu.au
UNE Id une-id:cgondro2

Werf, Julius Van Der

(supervisor)
OrcID: https://orcid.org/0000-0003-2512-1696
Email: jvanderw@une.edu.au
UNE Id une-id:jvanderw

Kadarmideen, Haja

Type of document

Thesis Masters Research

Language

en

Entity Type

Publication

UNE publication id

une:9424

Abstract

The Human Genome Project led to new techniques and equipment, including high-throughput measurement tools for collecting biological information. One notable tool is the microarray, which can genotype thousands of single nucleotide polymorphisms (SNPs) in one run. The main aim of the thesis was to implement a bioinformatics approach to transform biological data generated from high-throughput SNP genotyping into useful information.

One of the main outcomes of whole genome association studies (WGAS) is a subset of statistically significant SNPs, and a major challenge for a researcher is minimising false positive rates while maintaining the power to identify true positive associations. The need for a SNP annotation tool to help a researcher judge whether a significant SNP could be a causal variant, or in linkage disequilibrium (LD) with a causal variant, provided the motivation to develop FunctSNP. The thesis describes the development of FunctSNP, an R package that provides the user interface to custom-built, species-specific databases. These local relational databases contain SNP data together with functional annotations extracted from online resources. The databases are scheduled for automatic creation or periodic update by a suite of Perl scripts called dbAutoMaker, whose development the thesis also describes. The use of FunctSNP is illustrated with a livestock example.

WGAS relies on the natural phenomenon of LD between SNP markers and causal variants. For WGAS to be applied successfully, the extent and distribution of LD across the entire genome in a population must be understood. The need to know how LD (and haplotype diversity) varies from one region or population to another provided the motivation to develop SNPpattern. The thesis describes the development of SNPpattern, the collective name for a suite of Perl scripts designed to group, count, and compare SNP allele patterns of various block sizes. Differences in SNP allele block frequency are used as a measure of haplotype diversity within and between groups. The use of SNPpattern is illustrated on sheep breeds.
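The idea of grouping, counting, and comparing SNP allele patterns by block can be illustrated with a minimal sketch. It is written in Python for brevity and is not the SNPpattern code. The function names, the made-up haplotype strings, and the choice of half the summed absolute frequency difference as the comparison are all assumptions made for illustration; the abstract does not state which statistic SNPpattern uses.

```python
from collections import Counter


def block_frequencies(haplotypes, start, block_size):
    """Relative frequency of each allele pattern in one block of SNPs."""
    counts = Counter(h[start:start + block_size] for h in haplotypes)
    total = sum(counts.values())
    return {pattern: n / total for pattern, n in counts.items()}


def frequency_difference(freq_a, freq_b):
    """Half the summed absolute frequency difference between two groups.

    0 means identical pattern frequencies, 1 means no shared patterns.
    This statistic is an illustrative assumption, not taken from the thesis.
    """
    patterns = set(freq_a) | set(freq_b)
    return 0.5 * sum(
        abs(freq_a.get(p, 0.0) - freq_b.get(p, 0.0)) for p in patterns
    )


# Made-up phased haplotypes: one string of alleles per chromosome.
breed_a = ["ACGT", "ACGT", "ACTT", "GCGT"]
breed_b = ["ACGT", "GCTT", "GCTT", "GCGA"]

block_size = 2
for start in range(0, len(breed_a[0]), block_size):
    freq_a = block_frequencies(breed_a, start, block_size)
    freq_b = block_frequencies(breed_b, start, block_size)
    # Distinct patterns per group (within-group diversity) and the
    # frequency difference between groups for this block.
    print(start, len(freq_a), len(freq_b), frequency_difference(freq_a, freq_b))
```

Changing `block_size` repeats the comparison at a different block size, which is the kind of variation the abstract describes.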

