Mouse Genome Data Available in Public Databases

Coverage Now Exceeds Two-Thirds of Total Sequence Freely Available; Mouse Data Aids Study of Human Genes

February 2001

BETHESDA, Md. - A public-private effort to accelerate the sequencing of the mouse genome has exceeded its own goal of achieving 66 percent coverage of the genome just three months into the six-month project. At its current pace, the Mouse Sequencing Consortium (MSC) expects to reach its target of three-fold coverage by April of this year.

At the same time, collaborators in the MSC have extended the practice of making sequence data available for the free and unrestricted use of researchers worldwide. A new repository that contains not only the letters of the DNA sequence (as has been customary for previous large-scale sequencing projects), but also raw data, including actual "traces" from sequencing machines, has been established to make the information rapidly and freely available to the scientific community.

The Mouse Sequencing Consortium (MSC) - comprising three private companies, six institutes of the National Institutes of Health and the Wellcome Trust - was formed in October 2000 to work collaboratively to produce a draft sequence of the mouse genome in six months. The availability of these data is considered essential to the further understanding of the human genome.

"Unrestricted access to the mouse sequence should enhance efforts to identify causative genes in mouse models of diseases as well as identify human genes responsible for various disorders," says Arthur Holden, chairman of the MSC. "The rapid progress toward making these data widely available will in turn speed the search for new ways to treat or even prevent disease."

The MSC approach to sequencing the mouse genome takes advantage of the best features of the map-based shotgun and the whole genome shotgun strategies. Sequence data generated by the MSC are in short fragments (500 to 700 base pairs), and these so-called "raw reads" are now deposited weekly in new data repositories. The quality of the deposited data has been checked and found to be very good.

Sequences, quality scores, and traces from sequencing machines are accessible in databases maintained by the National Center for Biotechnology Information (NCBI) and by Ensembl [ensembl.org], a joint project of the European Bioinformatics Institute (EBI) and the Sanger Centre. At present, approximately 6.4 million traces from the MSC's whole genome shotgun sequencing effort have been deposited into the archives.

Researchers now have the opportunity, for example, to compare a sequence of interest against the available mouse traces in the archives using software programs such as "megaBLAST" [ncbi.nlm.nih.gov] and "SSAHA" [sanger.ac.uk]. Matching mouse traces can then be downloaded for further analysis.
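The idea of comparing a sequence of interest against a trace archive can be illustrated with a toy sketch. It is not how megaBLAST or SSAHA are implemented; it assumes a simple exact k-mer index, and the function names, trace identifiers, and sequences are all hypothetical.

```python
from collections import defaultdict


def build_kmer_index(traces: dict[str, str], k: int) -> dict[str, set[str]]:
    """Map every length-k substring to the set of trace IDs containing it."""
    index = defaultdict(set)
    for trace_id, seq in traces.items():
        for i in range(len(seq) - k + 1):
            index[seq[i:i + k]].add(trace_id)
    return index


def rank_matches(query: str, index: dict[str, set[str]], k: int) -> list[tuple[str, int]]:
    """Count, per trace, how many k-mer positions of the query hit that trace.

    Returns (trace_id, hit_count) pairs, best match first; ties are broken
    by trace ID so the order is deterministic.
    """
    hits = defaultdict(int)
    for i in range(len(query) - k + 1):
        for trace_id in index.get(query[i:i + k], ()):
            hits[trace_id] += 1
    return sorted(hits.items(), key=lambda item: (-item[1], item[0]))


# Made-up miniature "archive"; real traces are 500 to 700 bases long.
TOY_TRACES = {
    "trace_1": "ACGTACGTGG",
    "trace_2": "TTTTCCCCAA",
    "trace_3": "GGACGTACAA",
}

toy_index = build_kmer_index(TOY_TRACES, k=4)
for trace_id, hit_count in rank_matches("ACGTAC", toy_index, k=4):
    print(trace_id, hit_count)
```

Traces that share many k-mers with the query would be the candidates to download for closer alignment. Production tools add scoring, tolerance for mismatches, and far more compact indexing.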

Additionally, the EBI-Sanger Ensembl database provides direct views of homologies between the mouse traces and the human genome, which should facilitate interpretation of the human code (for example, mouse sequence matches to the human cystic fibrosis gene).

The draft sequence, when completed in April, will bring the amount of mouse sequence available to about 93 to 95 percent - albeit in small, unordered fragments. The National Human Genome Research Institute (NHGRI) will go on to complete the highly accurate, "finished" sequence of the mouse genome.
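The relationship between fold coverage and the fraction of the genome represented can be sketched with a little arithmetic. The release does not say how its percentages were calculated. The sketch below assumes the standard Lander-Waterman idealization, in which reads land uniformly at random and every base of every read is usable. The function names are illustrative only.

```python
import math

# Figures quoted in the release.
GENOME_SIZE_BP = 3.1e9         # approximate mouse genome size
NUM_TRACES = 6_400_000         # traces deposited so far
READ_LENGTHS_BP = (500, 700)   # stated range of raw read lengths


def fold_coverage(num_reads: int, read_length_bp: float, genome_size_bp: float) -> float:
    """Average number of times each base is sequenced."""
    return num_reads * read_length_bp / genome_size_bp


def expected_fraction_covered(coverage: float) -> float:
    """Assumed idealization: with uniformly random reads, the chance that a
    given base is missed by every read is exp(-coverage)."""
    return 1.0 - math.exp(-coverage)


for read_length in READ_LENGTHS_BP:
    c = fold_coverage(NUM_TRACES, read_length, GENOME_SIZE_BP)
    print(f"{read_length} bp reads: {c:.2f}-fold, "
          f"~{expected_fraction_covered(c):.0%} of bases covered")

print(f"3-fold target: ~{expected_fraction_covered(3.0):.0%} of bases covered")
```

Under these assumptions, 6.4 million reads of 500 to 700 bases correspond to roughly 64 to 76 percent of bases covered, and three-fold coverage corresponds to roughly 95 percent, in line with the figures quoted in the release. Real shotgun data deviate from the model because of cloning bias, repeats, and low-quality read ends.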

Why sequence the mouse genome?

With the working draft of the human genome sequence in hand, scientists in both industry and academia now seek to interpret its meaning.

Not only is the mouse genome about the same size as the human genome (approximately 3.1 billion base pairs), but mice and humans also share virtually the same set of genes. Thus, the DNA sequence of the mouse genome is an essential tool for identifying and studying the function of human genes.

For example, the gene sequences in mice and humans that encode proteins to carry out important biological functions - such as regulation of cell division and development of major organ systems - are shared to a high degree (85 percent sequence identity). Thus, when human and mouse genome sequences are compared, the regions of high similarity are readily apparent and point immediately to protein-coding regions and regulatory sequences.
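As a rough illustration of what a figure such as "85 percent sequence identity" measures, the sketch below computes percent identity for two sequences that have already been aligned. The sequences are invented for the example and are not real human or mouse genes. Skipping gap columns is one common convention among several, chosen here as an assumption.

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percentage of aligned, non-gap positions where the two bases match."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    compared = 0
    matches = 0
    for base_a, base_b in zip(aligned_a, aligned_b):
        if base_a == "-" or base_b == "-":
            continue  # assumption: gap columns are excluded from the count
        compared += 1
        if base_a.upper() == base_b.upper():
            matches += 1
    if compared == 0:
        raise ValueError("no comparable positions")
    return 100.0 * matches / compared


# Hypothetical 20-base fragments differing at 3 positions.
toy_human = "ATGGCCAAGTTCGACCTGAA"
toy_mouse = "ATGGCTAAGTTTGACCTCAA"
print(percent_identity(toy_human, toy_mouse))  # 17 of 20 match -> 85.0
```

Genome-scale comparisons compute this kind of score over alignments produced by dedicated tools, and regions scoring well above the background are the candidates for coding or regulatory sequence.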

In addition to aiding the interpretation of the human genome, the mouse genome sequence will increase the ability of scientists to use the mouse as a model system to study and understand human disease, and to develop and test new treatments in ways that cannot easily be done with humans.

As recommended by scientists studying the mouse, the MSC effort is using a strain of mouse known as C57BL/6J, commonly called "Black 6."

About the Mouse Sequencing Consortium

The MSC is another example of an emerging model for supporting large-scale genomics research in which public and private sector entities join forces to produce publicly available data sets that are crucial for basic biomedical research.

The National Institutes of Health, the Wellcome Trust and three private companies formed the consortium to speed up the determination of the DNA sequence of the mouse genome. The MSC is co-chaired by Arthur Holden [Chairman and CEO, The SNP Consortium Ltd.] and Francis Collins, MD, PhD [Director, NHGRI]. The members of the Mouse Sequencing Consortium are GlaxoSmithKline, the Merck Genome Research Institute, Affymetrix, Inc., the Wellcome Trust, and six of the National Institutes of Health: the National Cancer Institute, the National Human Genome Research Institute, the National Institute on Deafness and Other Communication Disorders, the National Institute of Diabetes and Digestive and Kidney Diseases, the National Institute of Neurological Disorders and Stroke, and the National Institute of Mental Health. Private sector participation in the MSC has been facilitated by the Foundation for the National Institutes of Health, Inc., a non-profit, charitable organization founded to support the NIH in its mission.

MSC funds are supporting mouse genome sequencing at three DNA sequencing laboratories: the Whitehead Institute for Biomedical Research in Cambridge, Mass., Washington University School of Medicine in St. Louis, and the Sanger Centre in the U.K.

Contact for the Consortium:

Mary Prescott
Phone: (312) 397-6604
E-mail: mprescott@bsmg.com

Mouse Sequencing Consortium Members Media Contacts:

GlaxoSmithKline
Graeme P. Holland
Phone: 44-12-7964-4269
E-mail: Graeme_P_Holland@sbphrd.com

Rick Koenig
Phone: (610) 270-5546
E-mail: Rick_M_Koenig@sbphrd.com

Merck Genome Research Institute
Andrea F. Kollath, DVM
Phone: (908) 423-6492
E-mail: andrea_kollath@merck.com

Affymetrix, Inc.
Anne Bowdidge
Phone: (408) 731-5925
E-mail: anne_bowdidge@affymetrix.com

National Cancer Institute
NCI Press Office
Phone: (301) 496-6641

National Human Genome Research Institute
Kathy Hudson, Ph.D.
Phone: (301) 402-0955
E-mail: hudsonk@exchange.nih.gov

National Institute on Deafness and Other Communication Disorders
Marin Allen
Phone: (301) 496-7243
E-mail: marin_allen@nih.gov

National Institute of Diabetes and Digestive and Kidney Diseases
Joan Chamberlain
Phone: (301) 496-3583
E-mail: joan_chamberlain@nih.gov

National Institute of Mental Health
Marilyn Weeks
Phone: (301) 443-4536
E-mail: mweeks@nih.gov

National Institute of Neurological Disorders and Stroke
Margo Warren
Phone: (301) 496-5751
E-mail: mw76v@nih.gov

Wellcome Trust
Noorece Ahmed
Phone: 44-20-7611-8540
E-mail: n.ahmed@wellcome.ac.uk

Genome Sequencing Center Media Contacts:

Whitehead Institute for Biomedical Research
Seema Kumar
Phone: (617) 258-6153
E-mail: kumar@wi.mit.edu

Washington University School of Medicine
Joni Westerhouse
Phone: (314) 286-0120
E-mail: joniw@medicine.wustl.edu

Sanger Centre
Don Powell
Phone: 44-12-2349-4956
E-mail: don@sanger.ac.uk

Foundation for the National Institutes of Health, Inc.
Constance U. Battle, MD
Phone: (301) 402-5311
E-mail: cubattle@fnih.org

Other Contacts:

Arthur Holden
Phone: (847) 317-9230
E-mail: aholden@firstgenetic.net
 

Last updated: September 01, 2006