NHGRI logo

Why was it so difficult to fully complete the human genome sequence?

The Human Genome Project ended in 2003, but genomic researchers had not yet determined every last base (or letter) of the human genome sequence. At that time, they had completed only about 92% of the sequence. Why did they stop there?

Telomere-to-telomere infographic

A woman looks at an incomplete human genome sequence of As, Cs, Ts and Gs that is still missing some bases. She has doubts and says, "Hmm, this doesn't look complete to me."

Reason 1

The human genome contains a massive amount of DNA.

The human genome consists of about 3 billion bases in a precise order, each of which can be represented by a letter (G, A, T or C). A genome's sequence cannot be read out end-to-end. Rather, researchers must first determine the sequence of random pieces of DNA and then use those smaller sequences to put the whole genome sequence back together like a massive puzzle.
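To make the puzzle idea concrete, here is a minimal Python sketch of one simple way to put pieces back together: a greedy overlap approach that repeatedly joins the two pieces whose end and start overlap the most. This is a toy illustration built on our own assumptions (short made-up reads, no sequencing errors, no repeats, no piece contained inside another). The names `overlap` and `greedy_assemble` are hypothetical, and real genome assemblers use far more sophisticated methods.

```python
def overlap(a: str, b: str, min_len: int = 3) -> int:
    """Return the length of the longest suffix of `a` that matches a prefix of `b`.

    Overlaps shorter than `min_len` letters are ignored (0 is returned).
    """
    start = 0
    while True:
        # Look for the first `min_len` letters of b somewhere in a.
        start = a.find(b[:min_len], start)
        if start == -1:
            return 0
        # If everything in a from here on is the beginning of b, the pieces overlap.
        if b.startswith(a[start:]):
            return len(a) - start
        start += 1


def greedy_assemble(reads: list[str], min_len: int = 3) -> list[str]:
    """Repeatedly merge the pair of pieces with the longest overlap until none overlap."""
    pieces = list(reads)
    while len(pieces) > 1:
        best_len, best_i, best_j = 0, -1, -1
        # Compare every ordered pair: the end of pieces[i] against the start of pieces[j].
        for i, a in enumerate(pieces):
            for j, b in enumerate(pieces):
                if i != j:
                    olen = overlap(a, b, min_len)
                    if olen > best_len:
                        best_len, best_i, best_j = olen, i, j
        if best_len == 0:
            break  # No overlaps left: the remaining pieces stay separate.
        merged = pieces[best_i] + pieces[best_j][best_len:]
        pieces = [p for k, p in enumerate(pieces) if k not in (best_i, best_j)]
        pieces.append(merged)
    return pieces


reads = ["AGGCTTC", "GATTACA", "GGCTTCA", "TACAGGC"]  # toy pieces, shuffled
print(greedy_assemble(reads))  # ['GATTACAGGCTTCA']
```

Greedy merging is easy to follow, but it can go wrong when the same stretch of letters appears in more than one place, which is exactly the problem described in Reason 2.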

Telomere-to-telomere infographic - Reason 1

A woman looks at a map of the United States of America that shows the distance from Houston, Texas, to Boston, Massachusetts (1,604 miles). If you printed out the ~3 billion letters of the human genome in 12-point font, they would stretch from Boston to Houston. Note: Only 20 letters of DNA sequence are shown. To imagine what ~3 billion letters of the human genome sequence would look like, multiply this by 153 million (20 × 153 million ≈ 3.06 billion)! Road trip, anyone?

Reason 2

Some parts of our DNA are painfully repetitive.

Some sections of the human genome sequence consist of long, repetitive stretches of letters that are difficult to put in the right place. Over the past two decades, researchers developed new technologies that read much longer stretches of DNA at a time (from only about 500 letters to more than 100,000 letters), which allowed them to assemble the full length of even the most difficult repeats.
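Here is a minimal Python sketch of why repeats cause trouble, using a made-up toy sequence rather than real human DNA. A short read that falls entirely inside a repeat matches several places, while a longer read that spans the whole repeat plus the unique letters on either side matches only one place. The function name `count_placements` and the toy sequence are hypothetical illustrations.

```python
def count_placements(genome: str, read: str) -> int:
    """Count every position where `read` exactly matches `genome` (overlapping matches included)."""
    k = len(read)
    return sum(1 for i in range(len(genome) - k + 1) if genome[i:i + k] == read)


# Toy "genome": unique letters, then the 3-letter unit CTG repeated 4 times, then more unique letters.
genome = "GATTACA" + "CTG" * 4 + "TTAGGC"

short_read = "CTGCTG"     # 6 letters, shorter than the 12-letter repeat
long_read = genome[4:22]  # 18 letters: "ACA" + the whole repeat + "TTA"

print(count_placements(genome, short_read))  # 3 -> ambiguous: where does this go?
print(count_placements(genome, long_read))   # 1 -> only one place it can go
```

The same logic, at a much larger scale, is why reads of more than 100,000 letters helped researchers place the longest repeats.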


Telomere-to-telomere infographic - Reason 2

The lady on the left is holding a short-read sequence of a chromosome (Chromosome 1) and asks, "Where does this go?" The lady on the right is holding a long-read sequence of a chromosome (Chromosome 2) and says, "Mine matches with #1!"

Reason 3

The first 92% was hard. The last 8% was excruciating.

Those DNA repeats and other obstacles stood between the genomic researchers and the final 8% of the human genome sequence until new laboratory and computational technologies were developed. It took almost twice as long to finish the last 8% of the human genome as it did the first 92%!

Telomere-to-telomere infographic - Reason 3

A graph shows the percentage of the human genome sequence released over time, with a steep increase from the late 1990s through the early 2000s. The x-axis shows years from 1990 to 2020 (marked at 1990, 2000, 2010 and 2020), and the y-axis shows percentage from 0% to 100%. In the late 1990s, approximately 10% of the human genome sequence had been released. By the early 2000s, 92% had been released, and that figure held through the 2010s. From the late 2010s through 2020, the remaining 8% of the human genome sequence was released. To the left of the graph, a scientist celebrates with her hands in the air, saying, "Phew, we did it!"

Reason 4

The last 8% needed a generation of dedicated genomic researchers with a vision.

Even with new technologies, genome sequencing is still tough, time-consuming work that requires a lot of skill and dedication. The current generation of genomic researchers are true perfectionists who brought everything together to finally complete the human genome sequence.

Telomere-to-telomere infographic - Reason 4

A jigsaw puzzle consists of four pieces labeled (clockwise, starting at the top left) Complexity, Technology, Cost and Patience, each with its own message. The lady sitting on the Complexity piece is working on her laptop, with the message above her saying, "Be strong, little computer!" A gentleman attempts to join the Technology piece to the Complexity piece (to the left) and the Cost piece (below), with the message behind him saying, "These new methods are so powerful!" Another gentleman below him attempts to join the Cost piece to the Patience piece (to the left), with the message behind him saying, "It's much cheaper to sequence DNA now!" Finally, another lady holds the Patience piece in place, with the message above her saying, "This is tedious!"

Last updated: August 10, 2021