Journal of Psychopharmacology, Vol. 20, No. 4 suppl, 19-26 (2006)
Laboratory of Neurogenetics, National Institute on Alcohol Abuse and Alcoholism, National Institutes of Health, Bethesda, MD, USA
Abstract
A serious problem with case-control studies is that population subdivision, recent admixture and sampling variance can lead to spurious associations between a phenotype and a marker locus, or indeed may mask true associations. This is also a concern in therapeutics since drug response may differ by ethnicity. Population stratification can occur if cases and controls have different frequencies of ethnic groups or, in admixed populations, different fractions of ancestry, and if phenotypes of interest, such as disease, drug response or drug metabolism, also differ between ethnic groups.
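As an illustration of how stratification alone can produce a spurious association, the following minimal sketch pools two hypothetical subpopulations. The allele counts, the subpopulation labels and the helper name `chi2_2x2` are invented for this example and are not data from any study. Within each subpopulation the allele frequency is identical in cases and controls, yet the pooled table shows a strong apparent association, because both the allele frequency and the proportion of cases differ between the subpopulations.

```python
def chi2_2x2(a, b, c, d):
    """Pearson chi-square (1 df, no continuity correction) for the table
    [[a, b], [c, d]]: rows = cases/controls, columns = allele 1/allele 2."""
    n = a + b + c + d
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    return n * (a * d - b * c) ** 2 / denom

# Hypothetical allele counts: (case_allele1, case_allele2, ctrl_allele1, ctrl_allele2)
pop_a = (400, 400, 100, 100)  # allele 1 frequency 0.5; mostly cases
pop_b = (20, 180, 80, 720)    # allele 1 frequency 0.1; mostly controls

# Pooling ignores subpopulation membership, as an unstratified analysis would.
pooled = tuple(x + y for x, y in zip(pop_a, pop_b))

chi2_a = chi2_2x2(*pop_a)            # no association within subpopulation A
chi2_b = chi2_2x2(*pop_b)            # no association within subpopulation B
chi2_pooled = chi2_2x2(*pooled)      # large statistic from stratification alone

print(chi2_a, chi2_b, round(chi2_pooled, 2))
```

With these invented counts, both within-subpopulation statistics are zero, while the pooled statistic is about 137 on one degree of freedom.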
Although most genetic variation is inter-individual, there is also significant inter-ethnic variation. The International HapMap Project has provided allele frequencies for approximately three million single nucleotide polymorphisms (SNPs) in Africans, Europeans and East Asians. SNP variation is greatest in Africans. Statistical methods for the detection and correction of population stratification, principally Structured Association and Genomic Control, have recently become freely available. These methods use marker loci spread throughout the genome that are unlinked to the candidate locus to estimate the ancestry of individuals within a sample, and to test for and adjust the ethnic matching of cases and controls.
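The idea behind Genomic Control can be sketched briefly. This is a simplified illustration under stated assumptions, not the implementation found in any particular software package: it assumes that a 1-degree-of-freedom chi-square statistic has already been computed for each unlinked marker, and the function names and input values are hypothetical. The inflation factor is estimated as the median of the null-marker statistics divided by the median of a chi-square distribution with one degree of freedom (approximately 0.4549), and the candidate-locus statistic is then divided by that factor.

```python
from statistics import median

# Median of a chi-square distribution with 1 degree of freedom.
CHI2_1DF_MEDIAN = 0.4549364

def genomic_control_lambda(null_marker_chi2):
    """Estimate the inflation factor from chi-square statistics at
    unlinked (null) markers. Values below 1 are set to 1 by convention."""
    lam = median(null_marker_chi2) / CHI2_1DF_MEDIAN
    return max(lam, 1.0)

def adjust_statistic(candidate_chi2, lam):
    """Deflate the candidate-locus statistic by the inflation factor."""
    return candidate_chi2 / lam

# Hypothetical statistics at five unlinked markers (illustrative values only).
null_stats = [0.2, 0.6, 0.9098728, 1.5, 3.1]
lam = genomic_control_lambda(null_stats)
adjusted = adjust_statistic(8.0, lam)
print(lam, adjusted)
```

In this sketch the median null statistic is twice its expected value, so the candidate statistic is halved. Structured Association takes a different route, inferring each individual's ancestry from the markers and testing for association conditional on it; that approach is not shown here.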
To date, few case-control association studies have incorporated testing for population stratification. This paper focuses on the debate about how many highly informative marker loci are required, and how they should be selected, to characterize populations that vary in substructure or degree of admixture, and discusses how these theoretically desirable approaches can be put into practice effectively.