Studies of human health and disease susceptibility dominated research for much of the twentieth century. Most of these studies focused on identifying single genetic or environmental agents that could explain variation in human health and disease susceptibility. That century was characterized by huge advances in our understanding of Mendelian disorders with severe clinical outcomes.
However, the Mendelian paradigm has failed to clarify the genetic contribution to susceptibility to most common chronic diseases. Researchers know that these diseases have a substantial genetic component because they aggregate in families. Additionally, studies have demonstrated significant heritabilities for these diseases.
Similarly, environmental and social epidemiological studies have been highly successful in demonstrating the importance of many environmental factors, such as exercise, diet, stress, and overall health, for disease susceptibility. However, these environmental factors alone do not fully explain the variation in the likelihood of several diseases across different populations.
Only recently have researchers begun to study in earnest the potential interactions between the genetic and environmental factors that are likely to contribute to a large fraction of disease in most populations. Much can still be done to incorporate features of the social environment into genetic studies and to incorporate genetic measures into social epidemiological studies.
Over the last two decades, progress in identifying the specific genes and mutations that explain genetic susceptibility to common conditions has been relatively slow, for a variety of reasons.
The first reason for this slow progress is that the diseases being studied tend to be complex in their etiology, meaning that different people in a population will develop disease for different genetic or environmental reasons. A single genetic or environmental factor is expected to explain only a very small fraction of disease risk in a population.
Additionally, these factors are expected to interact, and other biological processes, such as epigenetic modifications, are likely to be contributors to the complex puzzle of susceptibility. An accurate phenotypic definition of disease and its subtypes is crucial to identifying and understanding the complexities of disease-specific genetic and environmental causes.
The second reason for these limited advancements is that geneticists have only recently developed the knowledge base and methods needed to measure genetic variations, and to examine their metabolic consequences, with sufficient ease and cost-effectiveness to study the large number of genes thought to be involved.
Since the completion of the Human Genome Project in 2003, many different scientific entities, like the Environmental Genome Project and the International HapMap Consortium, have been working to identify the mutational spectra in human populations. Genetic epidemiologists are just now beginning to understand the extensive nature of common variations within the human genome that could be affecting people’s risk of disease.
These initiatives have allowed SNP data to be generated and made centrally available in a number of public databases, including the National Center for Biotechnology Information’s dbSNP database, the National Cancer Institute’s CGAP Genetic Annotation Initiative SNP Database, and the Karolinska Institute Human Genic Bi-Allelic Sequences Database.
Currently, the largest dataset on human variation is being generated by the International HapMap Project, which is genotyping millions of SNPs in 270 individuals from four geographically separated sites around the world. The International HapMap Project has greatly increased the number of validated SNPs available to the research community for studying human variation. It is also producing a map of genomic haplotypes in four populations with ancestry from parts of Africa, Asia, and Europe.
Additionally, high-throughput methods of genotyping thousands of SNPs in large epidemiological cohorts are only now becoming available. However, the development of high-throughput methods of measuring the environment has not kept a similar pace. For many studies of common disease, one of the main limits on increasing our understanding will continue to be the difficult and costly measurement of environmental factors.
Lastly, progress has been limited by the lack of adequate investment in developing new methods of analysis that can incorporate the high-dimensional biological reality that we can now measure. The complex genetic and environmental architecture of multifactorial diseases is not easily detected or deciphered using traditional statistical modeling methods, which focus on estimating a single overall model of disease for a population. For example, with traditional logistic regression methods it would be impossible to enter all of the hundreds of genetic variations that are thought to be involved in the risk of cardiovascular disease (CVD) or of any of the other common disease complexes currently being studied.
In addition to the obvious issues of power and overdetermination in such a large-scale model, researchers also do not know how to model or interpret interactions among many factors simultaneously. It is also difficult to incorporate the rare, large effects of some genes relative to the common, small effects of others.
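The scale of the overdetermination problem can be made concrete with a rough count. The following is only an illustrative sketch, not a method taken from the studies discussed here: it assumes each variant enters a logistic model as a single coded term, and the function names and the cohort size are hypothetical choices made for the example.

```python
from math import comb


def logistic_parameter_count(n_variants: int, max_order: int = 2) -> int:
    """Count the coefficients in a logistic model that has an intercept,
    one main effect per variant, and every interaction among variants
    up to `max_order` variants at a time (assumes one term per variant)."""
    interaction_terms = sum(comb(n_variants, k) for k in range(1, max_order + 1))
    return 1 + interaction_terms


def cases_per_parameter(n_cases: int, n_params: int) -> float:
    """Number of disease cases available per coefficient to be estimated."""
    return n_cases / n_params


if __name__ == "__main__":
    n_cases = 2_000  # hypothetical cohort size, chosen only for illustration
    for n_variants in (10, 100, 300):
        n_params = logistic_parameter_count(n_variants, max_order=2)
        ratio = cases_per_parameter(n_cases, n_params)
        print(f"{n_variants} variants: {n_params} coefficients, "
              f"{ratio:.2f} cases per coefficient")
```

Under these assumptions, 100 variants with all pairwise interactions already require 5,051 coefficients, more than twice the number of cases in the hypothetical cohort, and allowing three-way interactions raises the count to 166,751.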
New modeling strategies, such as scale-free networks, Bayesian belief networks, and random forest methods, that take advantage of advances in pattern recognition, machine learning, and systems analysis are needed in order to build more comprehensive, predictive models of these etiologically heterogeneous diseases.
Similar to many other areas of study, the field of human genetics is in transition. There is still much to be gained by joining forces with a wide range of other disciplines that are focused on improving prevention and reducing the disease burden in our populations.