The study, by researchers at the National Human Genome Research Institute (NHGRI), part of NIH, and the United States Department of Agriculture (USDA)*, was published March 6, 2017, in Nature Genetics.
"We developed a new technique for reconstructing highly accurate reference genomes and applied it to the domestic goat-a genome we jokingly call the Greatest Of All Time," said Adam M. Phillippy, Ph.D., a researcher in NHGRI's Computational and Statistical Genomics Branch. Accurate reference genomes are important for understanding an organism's biology, for learning about the genetic causes of health and disease and, in animals, for making breeding decisions.
Dr. Phillippy heads the branch's Genome Informatics Section (GIS), which develops and openly releases on GitHub software to enable genomics research. "Now that we've proven these methods produce high-quality genome reconstructions, they can be applied to the study of genetic diseases in individual human genomes and other animals," he said.
DNA used in the study came from a goat named Papadum, a gentle and rare male descended from animals living on San Clemente Island off the coast of San Diego, California. Hundreds of years ago, whalers and merchant ships carried goats for food and allegedly dropped them off on the island before heading to harbor. Because the San Clemente goats were confined to a small island, they mated with each other and are closely related genetically. The researchers reasoned that a genome whose variation was limited by inbreeding would be easier to complete.
"We started with a goat genome because of the animals' importance as a food source in developing countries," said Timothy P. Smith, Ph.D., a molecular geneticist with the USDA Agricultural Research Service in Clay Center, Nebraska, who initiated the project. "A finished, accurate goat genome will eventually allow farmers to select and breed animals with essential traits such as high quality milk and meat and the ability to tolerate extreme environments." The goat genome, funded with a $140,000 grant from the United States Agency for International Development as part of its Feed the Future initiative, is freely available from the National Center for Biotechnology Information.
A goat reference genome had been attempted before, but like many genomes analyzed using past technologies, it was incomplete and highly fragmented. The problem is due, in part, to the massive size of vertebrate genomes, which cannot be sequenced from end to end in a single step. Instead, DNA must first be broken into smaller pieces, with each resulting piece subjected to chemical reactions that allow the identity and order of its base pairs to be deduced. Researchers then read many shorter fragments (of a few hundred or thousand letters each) and "assemble" the genome, like a puzzle, from these smaller pieces. The gaps and errors that remain in the assembled genome must then be corrected through a labor-intensive and expensive finishing process.
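The puzzle-like assembly step described above can be illustrated with a toy example. The sketch below is a simplified greedy overlap merge written for illustration only. It is not the software used in the study, and the function names, the minimum-overlap parameter and the example reads are all assumptions. It repeatedly joins the two fragments that share the longest exact overlap until no overlaps remain. Real assemblers must also handle sequencing errors, repeated sequences and both DNA strands, which this sketch ignores.

```python
def overlap(a: str, b: str, min_len: int = 3) -> int:
    """Return the length of the longest suffix of `a` that is also a prefix
    of `b`, or 0 if that overlap is shorter than `min_len`."""
    for k in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:k]):
            return k
    return 0


def greedy_assemble(reads, min_len: int = 3):
    """Toy assembler: merge the best-overlapping pair of reads until no
    pair overlaps by at least `min_len` letters. Returns the remaining
    sequences (the "contigs")."""
    reads = list(dict.fromkeys(reads))  # drop exact duplicates, keep order
    while True:
        best_len, best_i, best_j = 0, None, None
        for i, a in enumerate(reads):
            for j, b in enumerate(reads):
                if i != j:
                    k = overlap(a, b, min_len)
                    if k > best_len:
                        best_len, best_i, best_j = k, i, j
        if best_len == 0:
            return reads  # nothing left to join
        # Append the non-overlapping tail of the second read to the first.
        merged = reads[best_i] + reads[best_j][best_len:]
        reads = [r for idx, r in enumerate(reads)
                 if idx not in (best_i, best_j)] + [merged]


# Three made-up overlapping fragments of a 15-letter sequence.
fragments = ["ATGGCGTA", "GCGTACGT", "TACGTTAGC"]
contigs = greedy_assemble(fragments)
print(contigs)
```

When fragments do not overlap, the function returns more than one sequence, which is the fragmented outcome the finishing process is meant to repair.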
To develop the goat reference genome, researchers used a variety of methods, including PacBio sequencing, which was released in 2011. Dr. Phillippy and Sergey Koren, Ph.D., senior GIS scientist, were early adopters of this technology, and developed important tools that could assemble thousands of very accurate maps, called contigs, for individual parts of the genome.
"It is like having a great road map of each town," Dr. Phillippy said. "But there's no map showing how all the towns are connected to one another."
To link all the smaller maps of the goat genome together into a full map of each chromosome, researchers used optical mapping to observe the structure of long, single strands of DNA, and the "Hi-C" chromatin conformation capture technique, which can reconstruct the thread-like structure of folded DNA in the nucleus of each cell.
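The idea of linking the smaller maps into chromosome-scale maps can also be sketched in code. The example below is a hypothetical illustration, not the method used in the study: it assumes each pair of contigs has a count of supporting observations (for instance, how often the two were seen close together), and it chains contigs by accepting the strongest links first. The contig names and counts are invented, and the sketch ignores contig orientation and the distances between contigs.

```python
def scaffold(contigs, links):
    """Order contigs into chains by accepting the strongest links first.

    `links` maps a pair (contig_a, contig_b) to a support count.
    A link is accepted only if both contigs still have a free end
    (fewer than two neighbors) and joining them would not close a loop.
    """
    neighbors = {c: [] for c in contigs}
    parent = {c: c for c in contigs}  # union-find, used to detect loops

    def find(c):
        while parent[c] != c:
            parent[c] = parent[parent[c]]
            c = parent[c]
        return c

    for (a, b), _count in sorted(links.items(), key=lambda kv: -kv[1]):
        if len(neighbors[a]) < 2 and len(neighbors[b]) < 2 and find(a) != find(b):
            neighbors[a].append(b)
            neighbors[b].append(a)
            parent[find(a)] = find(b)

    # Walk each chain from one of its free ends.
    chains, seen = [], set()
    for start in contigs:
        if start in seen or len(neighbors[start]) == 2:
            continue
        chain, prev, cur = [], None, start
        while cur is not None:
            chain.append(cur)
            seen.add(cur)
            nxt = [n for n in neighbors[cur] if n != prev]
            prev, cur = cur, (nxt[0] if nxt else None)
        chains.append(chain)
    return chains


# Invented support counts: c1-c2 and c2-c3 are strongly linked, c4 has no links.
example_links = {("c1", "c2"): 50, ("c2", "c3"): 40, ("c1", "c3"): 5}
chains = scaffold(["c1", "c2", "c3", "c4"], example_links)
print(chains)
```

In Dr. Phillippy's analogy, each chain corresponds to a set of town maps placed in order along one road, while an unlinked contig remains a town whose place on the larger map is still unknown.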
"PacBio sequencing, optical mapping and Hi-C have all been used for genome assembly before, but we showed that by combining all of them together, we could get a very comprehensive and accurate map of the genome at an affordable price," Dr. Koren said. "Previous methods have left the chromosomes either incomplete or broken into many pieces, making it very expensive to finish."
In the past decade, scientists have partially decoded the genomes of important agricultural animals like pigs, cows and chickens, but gaps remained. Drs. Phillippy and Koren hope to use the assembly "recipe" demonstrated on the goat to reconstruct complete genomes for these and many other vertebrate species. In doing this, they will contribute to the Genome10K project, which plans to sequence the genomes of at least one individual from each vertebrate genus -- approximately 10,000 genomes -- to better understand genome function and evolution and assist in the conservation of endangered species.
In addition to working on the complete genomes of humans, pigs and cows, Dr. Phillippy and his team are working on complete reference genomes for the mosquitoes that transmit the Zika virus and malaria. They are also investigating emerging "nanopore" genome sequencing methods for the real-time diagnosis of human disease and cancer.
"We will continue driving down the cost and driving up the quality of genome sequencing and assembly," Dr. Phillippy said.
Sadly, Dr. Phillippy said, Papadum, the goat who donated his DNA to the project, passed away in 2015, but his contribution will live on in the world of genomics research and, potentially, in goat herds around the world.
* The three co-first authors are Derek Bickhart, Ph.D. (USDA), Sergey Koren, Ph.D. (NHGRI), and Benjamin Rosen, Ph.D. (USDA). The three co-corresponding authors are Adam Phillippy, Ph.D. (NHGRI), Curtis Van Tassell, Ph.D. (USDA), and Timothy Smith, Ph.D. (USDA).