Last updated: May 01, 2006
Executive Summary of the SNP Meeting
Pooks Hill Marriott
Bethesda, Md.
June 7-8, 1999
The single nucleotide polymorphism (SNP) meeting brought together all the principal investigators (PIs) from the SNP RFA as well as the principals from the SNP Consortium (TSC) to discuss issues related to coordination, SNP quality, resources, and databases. The conclusion of the meeting was that for SNPs to be most useful, several questions need to be addressed and several additional resources need to be provided.
Major scientific questions related to genetic variation that need to be addressed
- Patterns of variation: How much variation and linkage disequilibrium (LD) exist, and how do they vary across the genome and by population? How do other factors affect patterns of variation?
- The number and frequency of SNPs needed: How many SNPs are needed to address various questions? What allele frequencies should they have?
- Comparative analyses: How can comparing patterns of variation within and among species, including other primates and mammals, be used to make inferences about function and selection?
- Function: How can variable sites be related to functional differences, particularly when there is LD across many sites? How important is it to focus on SNPs in functionally important regions?
Assessment of currently available information and resources
- Linkage disequilibrium: In about six months it would be useful to have a meeting to summarize what is known about linkage disequilibrium and to identify gaps in our knowledge.
- Population samples: It would be useful to find out what population samples being collected by the NIH are available, to see whether they would be informative for general population studies.
New resources needed
- Technology for genotyping: Using SNPs to relate genotypes to phenotypes will require much better technologies than currently exist for cheap and large-scale genotyping. More research on novel polymorphism genotyping technologies would be useful.
- Somatic cell hybrids of the DNA Polymorphism Discovery Resource lines, and immortalized complete hydatidiform moles: These lines are useful standards for detecting duplicate genes that are incorrectly assayed as SNPs, for defining haplotypes, and for technology development.
- Primate samples: Standard samples of some species and subspecies would be useful for figuring out which human SNP alleles are ancestral, and comparing variation within and among species.
- Human samples: Much more discussion is needed of the purposes and types of human samples. More research addressing ELSI issues in defined groups would be useful.
- Analytical tools for SNP data: Methods are not yet adequate to analyze the large amount of data soon to be produced. More research on complex trait and SNP analysis would be useful.
SNP quality
- Standards: A working group was set up to make recommendations about standard sets of samples, gene regions and methods to assess SNP quality.
Summary of the SNP Meeting
The SNP meeting brought together all the PIs from the SNP RFA as well as the principals from the SNP Consortium (TSC), to discuss issues related to coordination, SNP quality, resources and databases. The conclusion of the meeting was that for SNPs to be most useful, several questions need to be addressed and several additional resources need to be provided.
Major scientific questions related to genetic variation that need to be addressed
- Patterns of variation: How much variation and linkage disequilibrium (LD) exist, and how do they vary across the genome and by population? How do other factors such as recombination, gene duplication, mutation, gene conversion, selection, population structure, and migration history affect the amount and pattern of variation and LD? How does the age of an allele affect the LD around it? Can all common human haplotypes be discovered?
- The number and frequency of SNPs needed: How many SNPs are needed to address various questions? What allele frequencies should they have? Most SNPs are in all populations, and the common haplotypes are in many populations. One could choose SNPs of similar frequencies across populations to have markers that work for all populations, and see whether this set works for finding genes. On the other hand, population-specific SNPs are rare, so choosing markers based on uniform frequency may not be needed, and some variants more common in some populations may be important for diseases more common in those populations. cSNPs in candidate genes, of any frequency, are of particular interest. Are there special strategies for finding rare alleles in genes?
- Comparative analyses: How can comparing patterns of variation within and among species, including other primates and mammals, be used to make inferences about function, selection, and mutation? What sorts of questions can be addressed by species of differing phylogenetic relatedness?
- Function: How can variable sites be related to functional differences, particularly when there is LD across many sites? For studies relating genotype to phenotype, such as linkage analysis in families and association analysis in populations, what experimental designs under which conditions will provide the most statistical power? How important is it to focus on SNPs in functionally important regions?
The SNPs Needed for Various Questions
Question | Number Needed | Least Frequency of Minor Allele |
---|---|---|
Linkage | 2-3K | 10-20% |
Loss of heterozygosity | 2-10K ?? | 30-40% |
Whole-genome association | | |
Large number of founders | 300-500K ?? | |
Small number of founders | 60-200K ?? | |
Finding disease-associated alleles | Focus on genes | All frequencies |
Population studies | | |
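The questions above about how much linkage disequilibrium exists can be made concrete with the standard pairwise LD measures D, D' and r². The following is a minimal sketch, assuming phased haplotype counts are available for two biallelic sites; the function name `ld_stats` and the count arguments are illustrative and do not come from the meeting.

```python
def ld_stats(n_AB, n_Ab, n_aB, n_ab):
    """Return (D, D_prime, r_squared) for two biallelic sites.

    Arguments are counts of the four haplotypes (hypothetical input format):
    A/a are the alleles at the first site, B/b at the second.
    """
    total = n_AB + n_Ab + n_aB + n_ab
    if total == 0:
        raise ValueError("no haplotypes supplied")

    p_AB = n_AB / total              # frequency of the A-B haplotype
    p_A = (n_AB + n_Ab) / total      # frequency of allele A
    p_B = (n_AB + n_aB) / total      # frequency of allele B

    # D is the excess of the A-B haplotype over its expectation
    # if the two sites were independent.
    D = p_AB - p_A * p_B

    denom = p_A * (1 - p_A) * p_B * (1 - p_B)
    if denom == 0:
        # At least one site is monomorphic, so D' and r^2 are undefined.
        return D, float("nan"), float("nan")

    # D' rescales |D| by the largest value it could take given the allele frequencies.
    if D >= 0:
        d_max = min(p_A * (1 - p_B), (1 - p_A) * p_B)
    else:
        d_max = min(p_A * p_B, (1 - p_A) * (1 - p_B))
    d_prime = abs(D) / d_max

    r_squared = D * D / denom
    return D, d_prime, r_squared
```

With only two of the four haplotypes present in equal numbers, for example `ld_stats(50, 0, 0, 50)`, both D' and r² equal 1; with all four haplotypes equally common they are 0.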
Current SNP discovery
- Coordination of SNP discovery: Some duplication among groups is useful for validating the various methods. The PIs looking for SNPs in known genes (Lander, Chakravarti, Olson, Oefner) feel that informal discussion among themselves will suffice for preventing much overlap. The groups looking for SNPs in random genomic DNA (Cox, Oefner, TSC) are using different methods that do not lend themselves to useful coordination. Since some duplication is useful, and there are many SNPs, coordinating these groups does not seem needed now. It might be useful for SNP producers to have a non-public listing of what genome regions are being worked on.
- SNPs from large-scale sequencing: Many potential mapped SNPs are found when overlapping regions are sequenced from the same or different libraries. Some of the sequencing groups are mining potential SNPs this way. The information on overlapping sequences is not generally deposited in sequence databases and thus is not readily available to everybody. The meeting agreed that Pui-Yan Kwok should coordinate the discovery of SNPs from all large-scale sequence production. He is being provided with additional funding to develop the software to automate this process, coordinate with the various sequencing centers, examine the sequence data, and deposit the putative SNPs in the NCBI SNP database, dbSNP. Most of these SNP alleles will be common ones; most rare alleles will not be found by this method.
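The idea of mining potential SNPs from overlapping sequences can be illustrated with a small sketch. It assumes the two overlapping reads have already been aligned to equal length, with `-` marking gaps and `N` marking uncalled bases; the function name `candidate_snps` and this input format are assumptions for illustration, not the software described at the meeting. A production pipeline would also need base quality scores and a check for duplicated regions.

```python
VALID_BASES = set("ACGT")


def candidate_snps(aligned_a, aligned_b):
    """List positions where two aligned overlapping sequences disagree.

    Returns (position, base_in_a, base_in_b) tuples, with 0-based positions.
    Columns containing a gap or an ambiguous base are skipped because they
    cannot support a single-nucleotide difference.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("sequences must be aligned to the same length")

    hits = []
    for pos, (a, b) in enumerate(zip(aligned_a.upper(), aligned_b.upper())):
        if a in VALID_BASES and b in VALID_BASES and a != b:
            hits.append((pos, a, b))
    return hits
```

The mismatches returned are only putative SNPs: a difference between two clones can also come from sequencing error, which is why the quality checks discussed in this summary matter.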
Issues that need to be addressed to use SNPs and understand their patterns
- Population studies: How do we design sets of samples to assess the amount of variation and LD, and how they vary by population? We could examine 20 regions of 30-50 kb in various populations, including sites with common and with rare minor alleles. Common alleles provide large heterozygosities, but the amount of LD around them may be small because they are generally old alleles. Rarer alleles provide less heterozygosity but may be associated with longer blocks of LD. We need to study well-defined small isolated populations and we need to deal with the complexity of large open populations such as that of the US. However the studies are designed, the same regions should be studied in the same populations, using the same individual samples. Standard samples will allow methods to be compared without having additional differences due to different sets of samples. Standard samples should be widely available, so cell lines will be needed.
- Quality standards for SNPs: Groups use various methods to confirm SNPs and check whether they arise spuriously from gene duplications. These methods include sequencing both DNA strands, detecting SNPs by another method, and genotyping SNPs in multiple individuals. Most SNPs are mapped, either by RH mapping or by looking for SNPs in mapped DNA regions. Many methods for SNP discovery can provide an estimate of the probability that a putative SNP is not a true SNP. It would be useful to choose standard genomic regions in standard samples to compare methods of SNP discovery, and to have groups cross-check to confirm SNPs discovered by different methods and by different groups. A working group was set up to consider SNP quality assessment: Eric Lander (chair), Lisa Brooks, Aravinda Chakravarti, David Cox, Deborah Nickerson, Peter Oefner, Steve Sherry, and David Wang.
- Technology for genotyping: Current initiatives will discover hundreds of thousands of SNPs. Using these SNPs to relate genotype to phenotype will require much better technologies than currently exist for cheap and large-scale SNP genotyping. Different technologies may be most efficient for scoring many SNPs in a few individuals, a few SNPs in many individuals, or many SNPs in many individuals. More research on novel genotyping technologies for SNPs and other forms of polymorphism would be useful.
- SNP information in the database: The first information people want about variation is whether a particular gene has SNPs. The additional information from genotypes is useful for validating SNPs, for assessing allele frequency and Hardy-Weinberg fit, and for inferring haplotypes and linkage disequilibrium. Haplotype information is extremely informative for LD. As genotypes or haplotypes are generated for individuals they should be placed in the database. It is also useful to report regions that have been examined and found to be monomorphic. The database will have filters to allow researchers to choose SNPs of particular frequencies or degrees of validation, so all SNPs should be deposited in the database with information about validation.
- Analytical tools for SNP data: Soon there will be data on hundreds of thousands of SNPs in thousands of individuals typed for hundreds of phenotypes. The analytical tools do not yet exist to deal with this amount of data in a statistically rigorous way. Tools are needed to find associations among alleles at different SNPs and between phenotypes and genotypes. More research on complex trait and SNP analysis would be useful.
- ELSI issues related to group definitions: Some scientific questions have social and cultural implications that need to be considered when designing the research. A current ELSI RFA addresses issues related to genetic variation and populations; more research may be needed to help define how to deal with these issues. Discussions of genetic variation research will be needed with various communities.
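The use of genotypes for assessing allele frequency and Hardy-Weinberg fit, mentioned under SNP information in the database, can be sketched as follows. This is a minimal example assuming genotype counts for one biallelic SNP in one sample; the function name `hardy_weinberg` and its return format are hypothetical.

```python
def hardy_weinberg(n_AA, n_Aa, n_aa):
    """Return (p, expected_counts, chi_square) for one biallelic SNP.

    p is the frequency of allele A estimated from the genotype counts,
    expected_counts are the Hardy-Weinberg expectations for (AA, Aa, aa),
    and chi_square is the goodness-of-fit statistic (1 degree of freedom).
    """
    n = n_AA + n_Aa + n_aa
    if n == 0:
        raise ValueError("no genotypes supplied")

    # Each AA individual carries two copies of A, each Aa individual one.
    p = (2 * n_AA + n_Aa) / (2 * n)
    q = 1 - p

    expected = (p * p * n, 2 * p * q * n, q * q * n)
    observed = (n_AA, n_Aa, n_aa)

    # Skip classes with zero expectation (monomorphic SNP) to avoid dividing by zero.
    chi_square = sum(
        (o - e) ** 2 / e for o, e in zip(observed, expected) if e > 0
    )
    return p, expected, chi_square
```

A chi-square value above about 3.84 (the 5% point with one degree of freedom) suggests departure from Hardy-Weinberg proportions, which for a newly reported SNP can indicate genotyping error or a duplicated locus rather than biology.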
Assessment of currently available information and resources
- Linkage disequilibrium: When more data are generated, in about six months, it would be useful to have a meeting to summarize what is known about linkage disequilibrium, including the questions that LD addresses and the advantages of various methods of analysis. The meeting would identify gaps in understanding patterns of variation and LD and help to design a framework for population sampling.
- Population samples: It would be useful to find out what population samples being collected by the NIH are available, under what consent conditions, to see whether they would be informative for general population studies. It may be possible to use controls from disease studies. Many samples for population studies are chosen for convenience rather than with strong scientific justification.
New resources needed
- Immortalized complete hydatidiform moles: Moles are useful for detecting duplicate genes that are incorrectly assayed as SNPs, for defining haplotypes, and for technology development. There have been some unsuccessful attempts to immortalize mole lines, but nobody has focused on doing it.
- Somatic cell hybrids: Duplicate loci can also be detected using somatic cell hybrids. David Cox plans to produce somatic cell hybrids of the first 24 samples of the DNA Polymorphism Discovery Resource, since so much SNP discovery uses these samples. He discussed creating about 50 hybrid cell lines for each of the first 24 DNA Polymorphism Discovery Resource lines.
- Samples of trios: Parent and offspring sets allow haplotypes to be obtained fairly directly, without much statistical inference, but require some redundant typing.
- Primate samples: Standard samples of some species and subspecies would be useful for figuring out which human SNP variants are ancestral and comparing the amount and pattern of variation within species to the divergence among species. These comparisons allow detection of functional constraints, selective sweeps of alleles to fixation, and maintenance of variation within species. A standard set of primate samples has the same advantages as standard human samples: different researchers can compare methods on the same samples, and information can accumulate on defined samples.
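The point that trios allow haplotypes to be obtained fairly directly can be shown with a short sketch. It assumes unordered genotypes at each site for a child and both parents, given as pairs of allele letters; the function name `phase_child` and the data layout are illustrative assumptions.

```python
def phase_child(child, father, mother):
    """Phase a child's genotypes using both parents' genotypes.

    Each argument is a list of per-site genotypes such as [("A", "G"), ...].
    Returns one entry per site: (paternal_allele, maternal_allele), or None
    when the site cannot be phased from the trio alone.
    """
    phased = []
    for site, (c, f, m) in enumerate(zip(child, father, mother)):
        x, y = c
        options = set()
        if x in f and y in m:
            options.add((x, y))   # x from the father, y from the mother
        if y in f and x in m:
            options.add((y, x))   # y from the father, x from the mother

        if not options:
            raise ValueError(f"genotypes at site {site} are not Mendelian-consistent")

        # Exactly one consistent transmission means the site is phased;
        # two means it is ambiguous (for example, all three are heterozygous).
        phased.append(options.pop() if len(options) == 1 else None)
    return phased
```

Reading the first element of each phased pair along the sites gives the haplotype transmitted by the father, and the second gives the one transmitted by the mother. Sites where the child and both parents are heterozygous remain unresolved and need statistical inference or additional relatives.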
PARTICIPANTS
(Listed in alphabetical order)
- Aravinda Chakravarti, Case Western Reserve University, 10900 Euclid Avenue, Cleveland, OH 44106-4955
- Mark Chee, Illumina, Inc., 9390 Towne Centre Drive, Suite 200, San Diego, CA 92121
- David Cox, Department of Genetics, School of Medicine, Stanford University, Stanford, CA 94305
- Evan Eichler, Case Western Reserve University, 10900 Euclid Avenue, Cleveland, OH 44106-3029
- Daniel Geraghty, Fred Hutchinson Cancer Research Center, 1100 Fairview Avenue N., D2-100, PO Box 19024, Seattle, WA 98109-1024
- Arthur Holden, TSC, 8770 West Bryn Mawr Avenue, Suite 1300, Chicago, IL 60631
- Pui-Yan Kwok, Washington University School of Medicine, 660 S. Euclid Avenue, Campus Box 8123, St. Louis, MO 63110
- Eric Lander, Whitehead Institute Center for Genome Research, One Kendall Square, Bldg. 300, Cambridge, MA 02139
- Charles Langley, Center for Population Biology and Section of Evolution and Ecology, University of California, Davis, One Shields Avenue, 2320 Storer Hall, Davis, CA 95616
- Robert Lipshutz, Affymetrix, Inc., 3380 Central Expressway, Santa Clara, CA 95051
- Jean McEwen, Boston College Law School, 885 Centre Street, Newton, MA 02459
- John McPherson, Washington University School of Medicine, 4444 Forest Park Blvd., St. Louis, MO 63108
- Richard Myers, Department of Genetics, Stanford University School of Medicine, 300 Pasteur Drive, Stanford, CA 94305-5120
- Deborah Nickerson, Department of Molecular Biotechnology, University of Washington, Box 357730, Seattle, WA 98195-7730
- John Nolan, Los Alamos National Laboratory, University of California, PO Box 1663, Los Alamos, NM 87545
- Peter Oefner, Stanford Genome Center, 855 California Avenue, Palo Alto, CA 94304
- Maynard Olson, Genome Center, University of Washington, 225 Fluke Hall on Mason Rd., Seattle, WA 98195
- Michael Silber, Pfizer, Inc., Central Research Division, Eastern Point Rd., Groton, CT 06340
- Barbara Skene, The Wellcome Trust, 183 Euston Rd., London NW1 2BE, United Kingdom, tel. 01716118690
- Lincoln Stein, Cold Spring Harbor Laboratory, One Bungtown Road, Cold Spring Harbor, NY 11724
- Carl Ton, University of Washington School of Medicine, 225 Fluke Hall on Mason Rd., Seattle, WA 98195
- David Wang, Bristol-Myers Squibb Pharmaceutical Research Institute, PO Box 5400, Princeton, NJ 08543-5400
- James Weber, Marshfield Medical Research Foundation, 1000 North Oak Avenue, Marshfield, WI 54449
- Robert Weiss, Department of Human Genetics, University of Utah, 20 S. 2030 E, Room 308, Salt Lake City, UT 84112-9454
NIH Attendees:
- Douglas Bell, NIH/NIEHS, 111 Alexander Drive, Box 12233, Research Triangle Park, NC 27709
- Joy Boyer, NIH/NHGRI, Bldg. 38A, Room 617, 38 Library Drive, Bethesda, MD 20892
- Lisa Brooks, NIH/NHGRI, Bldg. 38A, Room 614, 38 Library Drive, Bethesda, MD 20892
- Ken Buetow, NIH/NCI, Bldg. 9, 1N105, Bethesda, MD 20892
- Jean Cahill, NIH/NHGRI, Bldg. 38A, Room 613, 38 Library Drive, Bethesda, MD 20892
- Peter Chines, NIH/NHGRI, Room 3As.43, 45 Center Drive, Bethesda, MD 20892
- Francis Collins, Director, NIH/NHGRI, Bldg. 31, Room 4B09, 31 Center Drive, Bethesda, MD 20892
- Yasmin Cypel, NIH/NHGRI, Bldg. 38A, Room 614, 38 Library Drive, Bethesda, MD 20892
- Camilla Day, NIH/CSR, 6701 Rockledge Drive, Bethesda, MD 20892
- Mike Erdos, NIH/NHGRI, 49 Convent Drive, Bethesda, MD 20892
- Elise Feingold, NIH/NHGRI, Bldg. 38A, Room 614, 38 Library Drive, Bethesda, MD 20892
- Adam Felsenfeld, NIH/NHGRI, Bldg. 38A, Room 614, 38 Library Drive, Bethesda, MD 20892
- Maria Giovanni, NIH/NEI, Executive Plaza South, Suite 350, 6120 Executive Blvd., Bethesda, MD 20892
- Bettie Graham, NIH/NHGRI, Bldg. 38A, Room 614, 38 Library Drive, Bethesda, MD 20892
- Mark Guyer, NIH/NHGRI, Bldg. 38A, Room 604, 38 Library Drive, Bethesda, MD 20892
- Linda Hall, NIH/NHGRI, Bldg. 38A, Room 613, 38 Library Drive, Bethesda, MD 20892
- Kathy Hudson, NIH/NHGRI, Bldg. 31, Room 4B09, 31 Center Drive, Bethesda, MD 20892
- Karin Jegalian, NIH/NHGRI, 31 Center Drive, Bldg. 31, Room 4B09, Bethesda, MD 20892
- Elke Jordan, NIH/NHGRI, Bldg. 31, Room 4B09, 31 Center Drive, Bethesda, MD 20892
- Robert Karp, NIH/NIAAA, Willco Bldg., 6000 Executive Blvd., Suite 402, Bethesda, MD 20892
- Rochelle Long, NIH/NIGMS, Natcher Bldg., Room 4AS49, 45 Center Drive, Bethesda, MD 20892
- Stephen Mockrin, NIH/NHLBI, Two Rockledge Centre, 6701 Rockledge Drive, Bethesda, MD 20892
- Karen Mohlke, NIH/NHGRI, Bldg. 9, Room 1W108, 9000 Rockville Pike, Bethesda, MD 20892
- Ken Nakamura, NIH/NHGRI, Bldg. 38A, Room 609, 38 Library Drive, Bethesda, MD 20892
- Susan Old, NIH/NHLBI, Two Rockledge Centre, Suite 9150, 6701 Rockledge Drive, Bethesda, MD 20892
- Diane Patterson, NIH/NHGRI, Bldg. 38A, Room 613, 38 Library Drive, Bethesda, MD 20892
- Jane Peterson, NIH/NHGRI, Bldg. 38A, Room 610, 38 Library Drive, Bethesda, MD 20892
- Rudy Pozzatti, NIH/NHGRI, Bldg. 38A, Room 609, 38 Library Drive, Bethesda, MD 20892
- Jerry Roberts, NIH/NHGRI, Bldg. 38A, Room 609, 38A Library Drive, Bethesda, MD 20892
- Jeffery Schloss, NIH/NHGRI, Bldg. 38A, Room 610, 38 Library Drive, Bethesda, MD 20892
- James Selkirk, NIH/NIEHS, 111 Alexander Drive, PO Box 12233, Research Triangle Park, NC 27709
- Vicki Seyfert, NIH/NIAID, Solar Bldg., 4A21, Bethesda, MD 20892
- Grace Shen, NIH/NCI, EPN, Room 501, 6130 Executive Blvd., Bethesda, MD 20892
- Steve Sherry, NIH/NCBI, Bldg. 38A, Room 8N805, 38 Library Drive, Bethesda, MD 20892
- Kaisa Silander, NIH/NHGRI, Bldg. 9, Room 1W108, Bethesda, MD 20892
- Karl Sirotkin, NIH/NCBI, Bldg. 38A, Room 8S810, 38 Library Drive, Bethesda, MD 20892
- Judy Small, NIH/NIDCR, Natcher Bldg., Room 4AN-24J, 45 Center Drive, Bethesda, MD 20892
- Rochelle Small, NIH/NIDCD, Executive Plaza South Bldg., 6120 Executive Blvd., Suite 400-C, Bethesda, MD 20892
- Elizabeth Thomson, NIH/NHGRI, Bldg. 38A, Room 617, 38A Library Drive, Bethesda, MD 20892
- Marjorie Tingle, NIH/NCRR, One Rockledge Centre, Room 6154, 6705 Rockledge Drive, Bethesda, MD 20892
- Jose Velazquez, NIH/NIEHS, PO Box 12233, Research Triangle Park, NC 27709
- Cathy Yarbrough, NIH/NHGRI, Bldg. 31, Room 4B09, 31 Center Drive, Bethesda, MD 20892
- Sally York, NIH/NHGRI, Bldg. 38A, Room 613, 38 Library Drive, Bethesda, MD 20892