What about the human genome?

Nearly two decades ago, the attention of the research community shifted towards deciphering the genetic code of the human body. In 1990, collaborative research teams at the National Institutes of Health (NIH) began work on an initiative known as the Human Genome Project. The Project’s primary goal was to determine the sequences of chemical base pairs that make up human DNA, and to map the genes encoded in this DNA from both a physical and a functional standpoint. The Project was successful: a working draft of the human genome was released in 2000, and a complete version in 2003.

At the same time, the private company Celera Genomics produced an additional working draft of the human genome. This parallel project, led by biotechnologist J. Craig Venter, is described in Venter’s book A Life Decoded. The book is definitely worth reading.

Once these human genome maps were completed, many geneticists advocated what is known as the common disease-common variant hypothesis. It contends that common diseases are influenced by alleles, or specific variants of human genes, that are themselves common in the population, so that variants correlating with the onset of a disease might be detected by comparing the genomes of patients and healthy people. In fact, in the years after the human genome was completed, the research community harbored great hope that identification of such variants might clarify the root causes of most chronic diseases. A great number of genome-wide association studies (GWAS) were initiated.

In 2001, Francis Collins directed the National Human Genome Research Institute, the NIH institute that led the Human Genome Project. His statement was typical of the time: “It should be possible to identify disease gene associations for many common illnesses in the next 5 to 7 years.” In other words, researchers hoped that by dissecting the human genome, patients could be informed that they had “the gene” for breast cancer, sarcoidosis, rheumatoid arthritis, or any of the other inflammatory diagnoses. Targeted gene therapies could then be developed to effectively eradicate these conditions.

Unfortunately, the above promise remains to be realized. There have been few widely successful gene therapies to date, and genome-driven personalized medicine has yet to live up to its early promise. To identify what some researchers refer to as the “missing heritability,” geneticists have tried to see if groups of genetic variants acting together may contribute to disease. But few widely used drugs or treatments have resulted from even these larger analyses. For example, specific inherited mutations in the genes BRCA1 and BRCA2 have been associated with a higher risk of several cancers. However, mutations in BRCA1 and BRCA2 together account for only about 5 to 10 percent of all breast cancers.

To be fair, researchers studying the human genome have identified millions of single nucleotide polymorphisms or SNPs. SNPs are believed to be mutations caused when a single base pair of human DNA is replaced by a different, possibly incorrect, base pair. However, the majority of SNPs cannot be consistently detected. Their incidence varies among individuals with different ethnicities and geographical locations. SNPs in the same individual have even been shown to vary based on the body tissue in which they are detected. For example, Gottlieb and team characterized sequence variations in the BAK1 gene of patients with abdominal aortic aneurysm. The team found that SNPs in BAK1 were different in aortic tissue than in blood samples taken from the same patients. The team concluded that, “Genome-wide association studies were introduced with enormous hype several years ago, and people expected tremendous breakthroughs. Unfortunately, the reality of these studies has been very disappointing, and our [own] discovery certainly could explain at least one of the reasons why.”

In my opinion, the inconsistencies described above, and the many problems associated with SNP-based research in general, stem from a failure on the part of most human geneticists to factor the existence of the human microbiome into their analyses. Most researchers looking for SNPs in patients with a given disease study the human genome in isolation. However, as described here, humans are actually controlled by a metagenome, or a combination of human genes and microbial genes acting in tandem. Because, by one commonly cited estimate, at least 90% of the cells in our bodies are microbial in origin, the contribution of microbial genes to the metagenome is huge. It is more likely, then, that disease stems from dysregulation of the metagenome as a whole, rather than from problems associated solely with our human genes.

This leads to a major consideration. Many genetic samples assumed to be purely “human” may actually be contaminated by large amounts of microbial RNA from the human microbiome. Artist Pablo Picasso once remarked, “Computers are useless. They can only give you answers.” In other words, the data generated by a software program is only useful if it is interpreted correctly. If “human genome” samples are contaminated with microbial RNA, then the software programs charged with reading and interpreting these sequences may assemble them improperly. They may incorrectly interpret base pair differences introduced by microbial RNA as SNPs. Even a minute amount of microbial RNA contamination, differing from the human sequence by just one or two base pairs, is enough to cause very significant errors in this fashion. The fact that so many bacterial and human genes are similar in structure only increases the likelihood of these possible software errors.
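To make this failure mode concrete, here is a minimal sketch in Python. It is a toy, not a real variant-calling pipeline: the reference sequence, the reads, the call_variants function, and the 20% threshold are all invented for illustration. It assumes the reads have already been aligned, and simply counts bases at each position.

```python
from collections import Counter

def call_variants(reference, reads, min_fraction=0.2):
    """Toy pileup caller (hypothetical). Each read is an (offset, sequence)
    pair that is assumed to be already aligned to the reference."""
    pileup = [Counter() for _ in reference]
    for offset, seq in reads:
        for i, base in enumerate(seq):
            pileup[offset + i][base] += 1

    variants = []
    for pos, counts in enumerate(pileup):
        depth = sum(counts.values())
        if depth == 0:
            continue  # no read covers this position
        for base, n in counts.items():
            # Report any non-reference base seen in enough of the reads.
            if base != reference[pos] and n / depth >= min_fraction:
                variants.append((pos, reference[pos], base))
    return variants

# Invented data: three "human" reads that match the reference exactly.
reference = "ACGTACGTACGT"
human_reads = [(0, "ACGTACGT"), (2, "GTACGTAC"), (4, "ACGTACGT")]

# One contaminating "microbial" read from a similar gene. It aligns to the
# same region but differs at reference position 5 (G instead of C).
microbial_reads = [(2, "GTAGGTAC")]

clean_calls = call_variants(reference, human_reads)
contaminated_calls = call_variants(reference, human_reads + microbial_reads)
print(clean_calls)
print(contaminated_calls)
```

In this toy data the clean sample produces no calls, while the single contaminating read is enough to make position 5 look like a C-to-G variant. Real variant callers are far more careful than this, but the underlying risk is the same: a sequence that aligns well and differs by one base is indistinguishable, at the pileup level, from a true variant.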

Luckily, technologies exist that can remove microbial RNA from a “human genome” sample before analysis. As part of its Seed Project, Argonne National Laboratory has developed sophisticated algorithms capable of identifying and removing non-human RNA from a GWAS sample prior to assembly. Any research team can send their data to this group and have microbial sequences removed from their samples free of charge! Watch this talk by Trevor Marshall to better understand how to take advantage of this and other similar resources.
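The general idea of screening out non-human sequences before assembly can be sketched in a few lines of Python. This is not the Argonne algorithm, whose details are not described here. It is a simple k-mer matching approach chosen for illustration, and the function names, sequences, and thresholds are all assumptions.

```python
def kmers(seq, k):
    """Return the set of all length-k substrings of seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}

def filter_reads(reads, microbial_refs, k=8, max_shared=0.5):
    """Split reads into (kept, removed). A read is removed when more than
    max_shared of its k-mers also occur in a known microbial sequence.
    Hypothetical helper; real tools use much larger reference databases."""
    microbial_index = set()
    for ref in microbial_refs:
        microbial_index |= kmers(ref, k)

    kept, removed = [], []
    for read in reads:
        read_kmers = kmers(read, k)
        if not read_kmers:
            kept.append(read)  # read shorter than k: nothing to compare
            continue
        shared = len(read_kmers & microbial_index) / len(read_kmers)
        if shared > max_shared:
            removed.append(read)
        else:
            kept.append(read)
    return kept, removed

# Invented sequences for illustration only.
microbial_refs = ["TTGACCGATAGGCTTAACGGATCCA"]
reads = [
    "ACGTTGCAAGTCCATGAGTC",  # shares no 8-mers with the microbial sequence
    "GACCGATAGGCTTAAC",      # an exact fragment of the microbial sequence
]

kept, removed = filter_reads(reads, microbial_refs)
print(kept)
print(removed)
```

With these made-up sequences, the second read is flagged and removed while the first is kept. The threshold matters: because many bacterial and human genes are similar, a cutoff that is too strict would discard genuine human reads, and one that is too loose would let contamination through.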