Why do we need a new human pangenome reference?
The original human genome reference sequence is outdated.
The original human genome reference sequence was generated by the Human Genome Project in 2003. Although researchers have regularly updated this reference sequence by fixing errors and filling in missing regions of the genome, it reflects data from only about 20 people, and most of that first reference sequence came from just one person.
The previous human genome reference sequence has missing pieces (i.e., gaps).
The previous human genome reference sequence is only 92% complete, with an estimated 8% of the human genome missing because of gaps in the sequence. Newer DNA sequencing technologies can read longer stretches of DNA at a time, which has allowed researchers to fill in the missing sequences within those gaps, especially in regions that are repetitive and harder to read. The new human pangenome reference is more comprehensive and incorporates the missing 8% of the human genome sequence, adding over 100 million new bases.
A human pangenome reference better reflects human diversity.
The new human pangenome reference includes genomic data from 47 people who collectively represent greater global diversity. Researchers expect that number to reach 350 people by 2024. A human pangenome reference that better reflects genomic variation across all human populations will help ensure that the reference benefits everyone and that genomics advances in an equitable way.
The new human pangenome reference can help researchers make more genomic discoveries.
The new human pangenome reference will help researchers analyze human genome sequences and understand how genomic variants influence human health. Using the human pangenome reference, researchers are already finding genomic variants that had not been identified with the older human genome reference sequences.