International Journal of Plant Genomics, vol. 2012 (2012). DOI: 10.1155/2012/831460

SNP Discovery through Next-Generation Sequencing and Its Applications

Decreasing costs and rapid progress in next-generation sequencing and related bioinformatics computing resources have facilitated large-scale discovery of SNPs in various model and nonmodel plant species. Their large numbers and genome-wide availability make SNPs the marker of choice in partially or completely sequenced genomes. Although excellent reviews have been published on next-generation sequencing, its associated bioinformatics challenges, and the applications of SNPs in genetic studies, a comprehensive review connecting these three intertwined research areas is needed. This paper touches upon various aspects of SNP discovery, highlighting key points in the availability and selection of appropriate sequencing platforms, bioinformatics pipelines, SNP filtering criteria, and applications of SNPs in genetic analyses. The use of next-generation sequencing methodologies in many nonmodel crops, leading to the discovery and implementation of SNPs in various genetic studies, is discussed. The development and improvement of open-source, freely available bioinformatics software have accelerated SNP discovery while reducing the associated cost. Key considerations for SNP filtering and associated pipelines are discussed in specific topics. A list of commonly used software and their sources is compiled for easy access and reference.
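
The SNP filtering criteria mentioned above can be illustrated with a minimal Python sketch. It is not the pipeline used in this paper: the record fields, the function names (`passes_filters`, `filter_snps`), and every threshold are hypothetical values chosen for illustration, and real projects tune them to coverage, ploidy, and platform. The sketch assumes each candidate SNP carries a read depth, an alternate-allele read count, and a Phred-scaled quality, and it discards sites that are too shallow, unusually deep (a possible sign of repeats or paralogs), of low quality, or weakly supported.

```python
from dataclasses import dataclass


@dataclass
class SnpCall:
    """One candidate SNP; the fields are illustrative, not a standard format."""
    chrom: str
    pos: int
    ref: str
    alt: str
    depth: int      # total reads covering the site
    alt_reads: int  # reads supporting the alternate allele
    qual: float     # Phred-scaled call quality


def passes_filters(call, min_depth=8, max_depth=200,
                   min_qual=30.0, min_alt_fraction=0.2):
    """Return True if the call survives all (hypothetical) thresholds."""
    if call.depth == 0:
        return False  # no coverage, and avoids division by zero below
    if not (min_depth <= call.depth <= max_depth):
        return False  # too shallow, or suspiciously deep
    if call.qual < min_qual:
        return False  # low-confidence call
    return call.alt_reads / call.depth >= min_alt_fraction


def filter_snps(calls, **thresholds):
    """Keep only the calls that pass every filter."""
    return [c for c in calls if passes_filters(c, **thresholds)]


calls = [
    SnpCall("chr1", 1042, "T", "C", depth=35, alt_reads=17, qual=60.0),
    SnpCall("chr1", 2210, "A", "G", depth=3, alt_reads=3, qual=45.0),     # too shallow
    SnpCall("chr2", 877, "G", "A", depth=950, alt_reads=400, qual=99.0),  # too deep
    SnpCall("chr2", 1500, "C", "T", depth=40, alt_reads=2, qual=50.0),    # weak support
]
kept = filter_snps(calls)
print([(c.chrom, c.pos) for c in kept])
```

With these illustrative thresholds only the first call is retained; loosening the keyword arguments of `filter_snps` shows how sensitive the retained set is to the chosen criteria.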

Figures
Figure 1: Graphical user interface of Tablet, an assembly visualization program, displaying the reference genome on top and the mapped reads with color-coded SNPs on the bottom.
Figure 2: Validation of a T/C SNP by a KASPar assay (KBiosciences, Herts, England). Genotypes with a “T” are represented by black dots with a white cross clustered in the upper left and those with a “C” by white dots with a black cross in the bottom right cluster. The two black dots near the bottom left are negative controls. No heterozygous individuals were present in this population.
References
  1. K. A. Frazer, D. G. Ballinger, D. R. Cox et al., “A second generation human haplotype map of over 3.1 million SNPs,” Nature, vol. 449, no. 7164, pp. 851–861 (2007).
    • . . . The applications of SNP markers have clearly been demonstrated in human genomics where complete sequencing of the human genome led to the discovery of several million SNPs [1] and technologies to analyze large sets of SNPs (up to 1 million) have been developed . . .
  2. C. H. Brenner and B. S. Weir, “Issues and strategies in the DNA identification of World Trade Center victims,” Theoretical Population Biology, vol. 63, no. 3, pp. 173–178 (2003).
    • . . . SNPs have been applied in areas as diverse as human forensics [2] and diagnostics [3], aquaculture [4], marker assisted-breeding of dairy cattle [5], crop improvement [6], conservation [7], and resource management in fisheries [8] . . .
  3. M. I. McCarthy, G. R. Abecasis, L. R. Cardon et al., “Genome-wide association studies for complex traits: consensus, uncertainty and challenges,” Nature Reviews Genetics, vol. 9, no. 5, pp. 356–369 (2008).
    • . . . SNPs have been applied in areas as diverse as human forensics [2] and diagnostics [3], aquaculture [4], marker assisted-breeding of dairy cattle [5], crop improvement [6], conservation [7], and resource management in fisheries [8] . . .
  4. Z. J. Liu and J. F. Cordes, “DNA marker technologies and their applications in aquaculture genetics,” Aquaculture, vol. 238, no. 1–4, pp. 1–37 (2004).
    • . . . SNPs have been applied in areas as diverse as human forensics [2] and diagnostics [3], aquaculture [4], marker assisted-breeding of dairy cattle [5], crop improvement [6], conservation [7], and resource management in fisheries [8] . . .
  5. L. R. Schaeffer, “Strategy for applying genome-wide selection in dairy cattle,” Journal of Animal Breeding and Genetics, vol. 123, no. 4, pp. 218–223 (2006).
    • . . . SNPs have been applied in areas as diverse as human forensics [2] and diagnostics [3], aquaculture [4], marker assisted-breeding of dairy cattle [5], crop improvement [6], conservation [7], and resource management in fisheries [8] . . .
  6. H. Yu, W. Xie, J. Wang et al., “Gains in QTL detection using an ultra-high density SNP map based on population sequencing relative to traditional RFLP/SSR markers,” PLoS ONE, vol. 6, no. 3, Article ID e17595 (2011).
    • . . . SNPs have been applied in areas as diverse as human forensics [2] and diagnostics [3], aquaculture [4], marker assisted-breeding of dairy cattle [5], crop improvement [6], conservation [7], and resource management in fisheries [8] . . .
  7. J. M. Seddon, H. G. Parker, E. A. Ostrander, and H. Ellegren, “SNPs in ecological and conservation studies: a test in the Scandinavian wolf population,” Molecular Ecology, vol. 14, no. 2, pp. 503–511 (2005).
    • . . . SNPs have been applied in areas as diverse as human forensics [2] and diagnostics [3], aquaculture [4], marker assisted-breeding of dairy cattle [5], crop improvement [6], conservation [7], and resource management in fisheries [8] . . .
  8. C. T. Smith, C. M. Elfstrom, L. W. Seeb, and J. E. Seeb, “Use of sequence data from rainbow trout and Atlantic salmon for SNP detection in Pacific salmon,” Molecular Ecology, vol. 14, no. 13, pp. 4193–4203 (2005).
    • . . . SNPs have been applied in areas as diverse as human forensics [2] and diagnostics [3], aquaculture [4], marker assisted-breeding of dairy cattle [5], crop improvement [6], conservation [7], and resource management in fisheries [8] . . .
  9. B. N. Chorley, X. Wang, M. R. Campbell, G. S. Pittman, M. A. Noureddine, and D. A. Bell, “Discovery and verification of functional single nucleotide polymorphisms in regulatory genomic regions: current and developing technologies,” Mutation Research, vol. 659, no. 1–2, pp. 147–157 (2008).
    • . . . Functional genomic studies have capitalized upon SNPs located within regulatory genes, transcripts, and Expressed Sequence Tags (ESTs) [9, 10] . . .
  10. K. Faber, K. H. Glatting, P. J. Mueller, A. Risch, and A. Hotz-Wagenblatt, “Genome-wide prediction of splice-modifying SNPs in human genes using a new analysis pipeline called AASsites,” BMC Bioinformatics, vol. 12, supplement 4, article S2 (2011).
    • . . . Functional genomic studies have capitalized upon SNPs located within regulatory genes, transcripts, and Expressed Sequence Tags (ESTs) [9, 10] . . .
  11. S. Atwell, Y. S. Huang, B. J. Vilhjálmsson et al., “Genome-wide association study of 107 phenotypes in Arabidopsis thaliana inbred lines,” Nature, vol. 465, no. 7298, pp. 627–631 (2010).
    • . . . Until recently large scale SNP discovery in plants was limited to maize, Arabidopsis, and rice [11–15] . . .
  12. W. B. Barbazuk, S. J. Emrich, H. D. Chen, L. Li, and P. S. Schnable, “SNP discovery via 454 transcriptome sequencing,” Plant Journal, vol. 51, no. 5, pp. 910–918 (2007).
  13. A. Ching, K. S. Caldwell, M. Jung et al., “SNP frequency, haplotype structure and linkage disequilibrium in elite maize inbred lines,” BMC Genetics, vol. 3, article 19 (2002).
  14. T. J. Close, P. R. Bhat, S. Lonardi et al., “Development and implementation of high-throughput SNP genotyping in barley,” BMC Genomics, vol. 10, article 582 (2009).
    • . . . NGS-derived SNPs have been reported in humans [79], Drosophila [80], wheat [81, 82], eggplant [83], rice [84–86], Arabidopsis [87, 88], barley [14, 89], sorghum [90], cotton [91], common beans [78], soybean [92], potato [93], flax [94], Aegilops tauschii [95], alfalfa [96], oat [97], and maize [98] to name a few. . . .
    • . . . Although SNP discovery in complex genomes without a reference genome such as wheat [81, 82], barley [14, 89], oat [97], and beans [78] can be achieved through NGS, several challenges remain in other nonmodel but economically important crops . . .
  15. X. Xu, X. Liu, S. Ge et al., “Resequencing 50 accessions of cultivated and wild rice yields markers for identifying agronomically important genes,” Nature Biotechnology, vol. 30, no. 1, pp. 105–111 (2012).
    • . . . Until recently large scale SNP discovery in plants was limited to maize, Arabidopsis, and rice [11–15] . . .
  16. S. Kaul, H. L. Koo, J. Jenkins et al., “Analysis of the genome sequence of the flowering plant Arabidopsis thaliana,” Nature, vol. 408, no. 6814, pp. 796–815 (2000).
    • . . . Arabidopsis thaliana was the first plant genome sequenced [16] followed soon after by rice [17, 18] . . .
    • . . . The genomes of Arabidopsis thaliana [16], rice [67], and maize [68] were generated using a BAC-by-BAC approach while poplar [69], grape [70], and sorghum [71] genomic sequences were obtained through WGS . . .
  17. S. A. Goff, D. Ricke, T. H. Lan et al., “A draft sequence of the rice genome (Oryza sativa L. ssp. japonica),” Science, vol. 296, no. 5565, pp. 92–100 (2002).
    • . . . Arabidopsis thaliana was the first plant genome sequenced [16] followed soon after by rice [17, 18] . . .
  18. J. Yu, S. Hu, J. Wang et al., “A draft sequence of the rice genome (Oryza sativa L. ssp. indica),” Science, vol. 296, no. 5565, pp. 79–92 (2002).
    • . . . Arabidopsis thaliana was the first plant genome sequenced [16] followed soon after by rice [17, 18] . . .
  19. J. A. Shendure, G. J. Porreca, and G. M. Church, “Overview of DNA sequencing strategies,” Current Protocols in Molecular Biology, chapter 7, no. 81, pp. 7.1.1–7.1.11 (2008).
    • . . . Tremendous improvements in sequencing have led to the generation of large amounts of DNA information in a very short period of time [19] . . .
    • . . . In contrast, the Sanger method, for which Frederick Sanger was awarded his second Nobel Prize in chemistry in 1980, was quickly adopted by the biotechnology industry which implemented it using a broad array of chemistries and detection methods [19]. . . .
    • . . . After image capture, the fluorescent tag is removed and a new set of oligonucleotides is injected into the flow cell to begin the next round of DNA ligation [19] . . .
    • . . . The advantages offered by TGS technology are (i) lower cost, (ii) high throughput, (iii) faster turnaround, and (iv) longer reads [19, 29] . . .
  20. F. Sanger, S. Nicklen, and A. R. Coulson, “DNA sequencing with chain-terminating inhibitors,” Proceedings of the National Academy of Sciences of the United States of America, vol. 74, no. 12, pp. 5463–5467 (1977).
    • . . . The Sanger method is a sequencing-by-synthesis (SBS) method that relies on a combination of deoxy- and dideoxy-labeled chain terminator nucleotides [20] . . .
  21. F. Sanger, G. M. Air, B. G. Barrell et al., “Nucleotide sequence of bacteriophage phiX174 DNA,” Nature, vol. 265, no. 5596, pp. 687–695 (1977).
    • . . . The first complete genome sequencing, that of bacteriophage phi X174, was achieved that same year using this pioneering method [21] . . .
  22. A. M. Maxam and W. Gilbert, “A new method for sequencing DNA,” Proceedings of the National Academy of Sciences of the United States of America, vol. 74, no. 2, pp. 560–564 (1977).
    • . . . The method based on chemical modification followed by cleavage at specific sites, also published in 1977 [22], quickly became the less favored of the two methods because of its technical complexities, use of hazardous chemicals, and inherent difficulty in scale-up . . .
  23. M. Kircher and J. Kelso, “High-throughput DNA sequencing: concepts and limitations,” BioEssays, vol. 32, no. 6, pp. 524–536 (2010).
    • . . . In the last decade, new sequencing technologies have outperformed Sanger-based sequencing in throughput and overall cost, if not quite in sequence length and error rate [23] . . .
  24. M. Ronaghi, M. Uhlén, and P. Nyrén, “A sequencing method based on real-time pyrophosphate,” Science, vol. 281, no. 5375, pp. 363–365 (1998).
    • . . . Pyrosequencing was the first of the new highly parallel sequencing technologies to reach the market [24] . . .
  25. M. Ronaghi, S. Karamohamed, B. Pettersson, M. Uhlén, and P. Nyrén, “Real-time DNA sequencing using detection of pyrophosphate release,” Analytical Biochemistry, vol. 242, no. 1, pp. 84–89 (1996).
    • . . . This pyrophosphate is detected by an enzymatic luminometric inorganic pyrophosphate detection assay (ELIDA) through the generation of a light signal following the conversion of PPi into ATP [25] . . .
  26. T. C. Glenn, “Field guide to next-generation DNA sequencers,” Molecular Ecology Resources, vol. 11, no. 5, pp. 759–769 (2011).
    • . . . Reagent costs are approximately $6,200 per run [26]. . . .
    • . . . Reagent costs are approximately $23,500 per run [26]. . . .
    • . . . This sequencing-by-ligation method in the SOLiD 5500xl platform generates up to 1,410 million PE reads of  nt each with an error rate of 0.01% and a reagent cost of approximately $10,500 per run [26]. . . .
  27. G. Turcatti, A. Romieu, M. Fedurco, and A. P. Tairi, “A new class of cleavable fluorescent nucleotides: synthesis and optimization as reversible terminators for DNA sequencing by synthesis,” Nucleic Acids Research, vol. 36, no. 4, article e25 (2008).
    • . . . The molecules are sequenced by flooding the flow cell with a new class of cleavable fluorescent nucleotides and the reagents necessary for DNA polymerization [27] . . .
  28. J. Shendure, G. J. Porreca, N. B. Reppas et al., “Molecular biology: accurate multiplex polony sequencing of an evolved bacterial genome,” Science, vol. 309, no. 5741, pp. 1728–1732 (2005).
    • . . . The SOLiD system was jointly developed by the Harvard Medical School and the Howard Hughes Medical Institute [28] . . .
  29. E. E. Schadt, S. Turner, and A. Kasarskis, “A window into third-generation sequencing,” Human Molecular Genetics, vol. 19, no. 2, pp. R227–R240 (2010).
    • . . . The advantages offered by TGS technology are (i) lower cost, (ii) high throughput, (iii) faster turnaround, and (iv) longer reads [19, 29] . . .
    • . . . The first (Sanger) and the second (next) generation sequencing technologies have enabled researchers to characterize DNA sequence variation, sequence entire genomes, quantify transcript abundance, and understand mechanisms such as alternative splicing and epigenetic regulation [29]. . . .
  30. T. D. Harris, P. R. Buzby, H. Babcock et al., “Single-molecule DNA sequencing of a viral genome,” Science, vol. 320, no. 5872, pp. 106–109 (2008).
    • . . . The fluorescent tag on the incorporated nucleotide is then chemically cleaved to allow subsequent elongation of DNA [30] . . .
  31. C. S. Pareek, R. Smoczynski, and A. Tretyn, “Sequencing technologies and genome sequencing,” Journal of Applied Genetics, vol. 52, no. 4, pp. 413–435 (2011).
    • . . . Heliscope sequencers can generate up to 28 GB of sequence data per run (50 channels) with maximum read length of 55 bp at 99% accuracy [31] . . .
    • . . . Among the TGS technologies, Pacific Biosciences SMRT and Heliscope tSMS have been used in characterizing bacterial genomes and in human-disease-related studies [31]; however, TGS has yet to be capitalized upon in plant genomes . . .
  32. J. Eid, A. Fehr, J. Gray et al., “Real-time DNA sequencing from single polymerase molecules,” Science, vol. 323, no. 5910, pp. 133–138 (2009).
    • . . . A laser placed below the ZMW excites only the fluorophores of the incorporated nucleotides as the ZMW entraps the light and does not allow it to reach the unincorporated nucleotides above [32] . . .
  33. S. Koren, M. C. Schatz, B. P. Walenz et al., “Hybrid error correction and de novo assembly of single-molecule sequencing reads,” Nature Biotechnology, vol. 30, no. 7, pp. 693–700 (2012).
    • . . . However, their long reads offer a definite advantage to fill gaps in genomic sequences and, at least in bacterial genomes, NGS reads have proven capable of “correcting” the base call errors of this TGS technology [33–36] . . .
  34. F. Ribeiro, D. Przybylski, S. Yin et al., “Finished bacterial genomes from shotgun sequence data,” Genome Research. In press.
  35. A. Bashir, A. A. Klammer, W. P. Robins et al., “A hybrid approach for the automated finishing of bacterial genomes,” Nature Biotechnology, vol. 30, no. 7, pp. 701–707 (2012).
  36. X. Zhang, K. W. Davenport, W. Gu et al., “Improving genome assemblies by sequencing PCR products with PacBio,” BioTechniques, vol. 53, no. 1, pp. 61–62 (2012).
    • . . . However, their long reads offer a definite advantage to fill gaps in genomic sequences and, at least in bacterial genomes, NGS reads have proven capable of “correcting” the base call errors of this TGS technology [33–36] . . .
  37. P. Kothiyal, S. Cox, J. Ebert, B. J. Aronow, J. H. Greinwald, and H. L. Rehm, “An overview of custom array sequencing,” Current Protocols in Human Genetics, no. 61, chapter 7, pp. 7.17.1–17.17.11 (2009).
    • . . . The pre- and postprocessing protocols such as library construction [37] and pipeline development and implementation for data analysis [38] are also important. . . .
  38. J. D. McPherson, “Next-generation gap,” Nature Methods, vol. 6, no. 11, supplement, pp. S2–S5 (2009).
    • . . . The pre- and postprocessing protocols such as library construction [37] and pipeline development and implementation for data analysis [38] are also important. . . .
  39. A. P. M. Weber, K. L. Weber, K. Carr, C. Wilkerson, and J. B. Ohlrogge, “Sampling the Arabidopsis transcriptome with massively parallel pyrosequencing,” Plant Physiology, vol. 144, no. 1, pp. 32–42 (2007).
    • . . . RNA sequencing has been performed on a number of plant species including Arabidopsis [39], soybean [40], rice [41], and maize [42] for transcript profiling and detection of splice variants . . .
  40. M. Libault, A. Farmer, T. Joshi et al., “An integrated transcriptome atlas of the crop model Glycine max, and its use in comparative analyses in plants,” Plant Journal, vol. 63, no. 1, pp. 86–99 (2010).
    • . . . RNA sequencing has been performed on a number of plant species including Arabidopsis [39], soybean [40], rice [41], and maize [42] for transcript profiling and detection of splice variants . . .
  41. T. Lu, G. Lu, D. Fan et al., “Function annotation of the rice transcriptome at single-nucleotide resolution by RNA-seq,” Genome Research, vol. 20, no. 9, pp. 1238–1249 (2010).
    • . . . RNA sequencing has been performed on a number of plant species including Arabidopsis [39], soybean [40], rice [41], and maize [42] for transcript profiling and detection of splice variants . . .
  42. W. B. Barbazuk, S. Emrich, and P. S. Schnable, “SNP mining from maize 454 EST sequences,” Cold Spring Harbor Protocols. In press.
    • . . . RNA sequencing has been performed on a number of plant species including Arabidopsis [39], soybean [40], rice [41], and maize [42] for transcript profiling and detection of splice variants . . .
  43. E. Novaes, D. R. Drost, W. G. Farmerie et al., “High-throughput gene and SNP discovery in Eucalyptus grandis, an uncharacterized genome,” BMC Genomics, vol. 9, article 312 (2008).
    • . . . RNA sequencing has been used in de novo assemblies followed by SNP discovery performed in nonmodel plants such as Eucalyptus grandis [43], Brassica napus [44], and Medicago sativa [45]. . . .
  44. M. Trick, Y. Long, J. Meng, and I. Bancroft, “Single nucleotide polymorphism (SNP) discovery in the polyploid Brassica napus using Solexa transcriptome sequencing,” Plant Biotechnology Journal, vol. 7, no. 4, pp. 334–346 (2009).
    • . . . RNA sequencing has been used in de novo assemblies followed by SNP discovery performed in nonmodel plants such as Eucalyptus grandis [43], Brassica napus [44], and Medicago sativa [45]. . . .
  45. S. S. Yang, Z. J. Tu, F. Cheung et al., “Using RNA-Seq for gene identification, polymorphism detection and transcript profiling in two alfalfa genotypes with divergent cell wall composition in stems,” BMC Genomics, vol. 12, no. 1, article 199 (2011).
    • . . . RNA sequencing has been used in de novo assemblies followed by SNP discovery performed in nonmodel plants such as Eucalyptus grandis [43], Brassica napus [44], and Medicago sativa [45]. . . .
  46. F. Ozsolak, D. T. Ting, B. S. Wittner et al., “Amplification-free digital gene expression profiling from minute cell quantities,” Nature Methods, vol. 7, no. 8, pp. 619–621 (2010).
    • . . . RNA deep-sequencing technologies such as digital gene expression [46] and Illumina RNASeq [47] are both qualitative and quantitative in nature and permit the identification of rare transcripts and splice variants [48] . . .
  47. Z. Wang, M. Gerstein, and M. Snyder, “RNA-Seq: a revolutionary tool for transcriptomics,” Nature Reviews Genetics, vol. 10, no. 1, pp. 57–63 (2009).
    • . . . RNA deep-sequencing technologies such as digital gene expression [46] and Illumina RNASeq [47] are both qualitative and quantitative in nature and permit the identification of rare transcripts and splice variants [48] . . .
  48. H. Xu, Y. Gao, and J. Wang, “Transcriptomic analysis of rice (Oryza sativa) developing embryos using the RNA-Seq technique,” PLoS ONE, vol. 7, no. 2, Article ID e30646 (2012).
    • . . . RNA deep-sequencing technologies such as digital gene expression [46] and Illumina RNASeq [47] are both qualitative and quantitative in nature and permit the identification of rare transcripts and splice variants [48] . . .
  49. J. D. Roberts, B. D. Preston, L. A. Johnston, A. Soni, L. A. Loeb, and T. A. Kunkel, “Fidelity of two retroviral reverse transcriptases during DNA-dependent DNA synthesis in vitro,” Molecular and Cellular Biology, vol. 9, no. 2, pp. 469–476 (1989).
    • . . . This method is, however, prone to error due to (i) the inefficient nature of reverse transcriptases (RTs) [49], (ii) DNA-dependent DNA polymerase activity of RT causing spurious second strand DNA [50], and (iii) artifactual cDNA synthesis due to template switching [51] . . .
  50. U. Gubler, “Second-strand cDNA synthesis: mRNA fragments as primers,” Methods in Enzymology, vol. 152, pp. 330–335 (1987).
    • . . . This method is, however, prone to error due to (i) the inefficient nature of reverse transcriptases (RTs) [49], (ii) DNA-dependent DNA polymerase activity of RT causing spurious second strand DNA [50], and (iii) artifactual cDNA synthesis due to template switching [51] . . .
  51. J. Cocquet, A. Chong, G. Zhang, and R. A. Veitia, “Reverse transcriptase template switching and false alternative transcripts,” Genomics, vol. 88, no. 1, pp. 127–131 (2006).
    • . . . This method is, however, prone to error due to (i) the inefficient nature of reverse transcriptases (RTs) [49], (ii) DNA-dependent DNA polymerase activity of RT causing spurious second strand DNA [50], and (iii) artifactual cDNA synthesis due to template switching [51] . . .
  52. F. Ozsolak, A. R. Platt, D. R. Jones et al., “Direct RNA sequencing,” Nature, vol. 461, no. 7265, pp. 814–818 (2009).
    • . . . Direct RNA sequencing (DRS) developed by Helicos Biosciences Corporation is a high throughput and cost-effective method which eliminates the need for cDNA synthesis and ligation/amplification leading to improved accuracy [52]. . . .
  53. M. J. Solomon, P. L. Larsen, and A. Varshavsky, “Mapping protein-DNA interactions in vivo with formaldehyde: evidence that histone H4 is retained on a highly transcribed gene,” Cell, vol. 53, no. 6, pp. 937–947 (1988).
    • . . . Chromatin immunoprecipitation (ChIP) is a specialized sequencing method that was specifically designed to identify DNA sequences involved in in vivo protein DNA interaction [53] . . .
  54. T. S. Mikkelsen, M. Ku, D. B. Jaffe et al., “Genome-wide maps of chromatin state in pluripotent and lineage-committed cells,” Nature, vol. 448, no. 7153, pp. 553–560 (2007).
    • . . . Deep sequence coverage leading to dense SNP maps permits the identification of transcription factor binding sites and histone-mediated epigenetic modifications [54] . . .
  55. P. Ng, J. J. Tan, H. S. Ooi et al., “Multiplex sequencing of paired-end ditags (MS-PET): a strategy for the ultra-high-throughput analysis of transcriptomes and genomes,” Nucleic Acids Research, vol. 34, no. 12, p. e84 (2006).
    • . . . ChIP-Seq can be performed on serial analysis of gene expression (SAGE) tags or PE using Sanger, 454, and Illumina platforms [55, 56]. . . .
  56. G. Robertson, M. Hirst, M. Bainbridge et al., “Genome-wide profiles of STAT1 DNA association using chromatin immunoprecipitation and massively parallel sequencing,” Nature Methods, vol. 4, no. 8, pp. 651–657 (2007).
    • . . . ChIP-Seq can be performed on serial analysis of gene expression (SAGE) tags or PE using Sanger, 454, and Illumina platforms [55, 56]. . . .
  57. B. Giardine, C. Riemer, R. C. Hardison et al., “Galaxy: a platform for interactive large-scale genome analysis,” Genome Research, vol. 15, no. 10, pp. 1451–1455 (2005).
    • . . . Web-based portals such as Galaxy [57] are tailored to a multitude of analyses, but the requirement to transfer multigigabyte sequence files across the internet can limit their usability to smaller datasets . . .
  58. W. Wang, Z. Wei, T.-W. Lam, and J. Wang, “Next generation sequencing has lower sequence coverage and poorer SNP-detection capability in the regulatory regions,” Scientific Reports, vol. 1, article 55 (2011).
    • . . . A recent review by Wang et al. [58] recommends Linux-based programs because they are often free, not specific to any sequencing platform, and less demanding of computing power and, as a consequence, tend to run faster . . .
    • . . . For example, in aligning 2 × 13,326,195 paired-end reads (76 bp) from The Cancer Genome Atlas project (SRR018643) [64], SHRiMP [65] took 1,065 hrs with a peak memory footprint of 12 gigabytes to map 81% of the reads to the human reference genome, whereas Bowtie used 2.9 gigabytes of memory and ran in 2.2 hrs but achieved only a 67% mapping rate [58] . . .
  59. B. Langmead, C. Trapnell, M. Pop, and S. L. Salzberg, “Ultrafast and memory-efficient alignment of short DNA sequences to the human genome,” Genome Biology, vol. 10, no. 3, article R25 (2009).
    • . . . Linux-based software such as Bowtie [59], BWA [60], and SOAP2/3 [61] has been used widely for the analysis of NGS data . . .
  60. H. Li and R. Durbin, “Fast and accurate short read alignment with Burrows-Wheeler transform,” Bioinformatics, vol. 25, no. 14, pp. 1754–1760 (2009).
    • . . . Linux-based software such as Bowtie [59], BWA [60], and SOAP2/3 [61] has been used widely for the analysis of NGS data . . .
  61. R. Li, C. Yu, Y. Li et al., “SOAP2: an improved ultrafast tool for short read alignment,” Bioinformatics, vol. 25, no. 15, pp. 1966–1967 (2009).
    • . . . Linux-based software such as Bowtie [59], BWA [60], and SOAP2/3 [61] has been used widely for the analysis of NGS data . . .
  62. H. Li and N. Homer, “A survey of sequence alignment algorithms for next-generation sequencing,” Briefings in Bioinformatics, vol. 11, no. 5, Article ID bbq015, pp. 473–483 (2010).
    • . . . For reviews on NGS software, see Li and Homer [62], Wang et al. [58], and Treangen and Salzberg [63] . . .
  63. T. J. Treangen and S. L. Salzberg, “Repetitive DNA and next-generation sequencing: computational challenges and solutions,” Nature Reviews Genetics, vol. 13, no. 1, pp. 36–46 (2012).
    • . . . For reviews on NGS software, see Li and Homer [62], Wang et al. [58], and Treangen and Salzberg [63] . . .
    • . . . It should be noted that a higher percentage of mapped reads is not a strict measure of quality because it may be indicative of a higher level of misaligned reads or reads aligned against repetitive elements, features that are not desirable [63]. . . .
    • . . . The presence of repeat elements, paralogs, and incomplete or inaccurate reference genome sequences can create ambiguities in SNP calling [63] . . .
    • . . . Reviews of SNP calling software have been published [63, 105] . . .
  64. R. McLendon, A. Friedman, D. Bigner et al., “Comprehensive genomic characterization defines human glioblastoma genes and core pathways,” Nature, vol. 455, no. 7216, pp. 1061–1068 (2008).
    • . . . For example, in aligning 2 × 13,326,195 paired-end reads (76 bp) from The Cancer Genome Atlas project (SRR018643) [64], SHRiMP [65] took 1,065 hrs with a peak memory footprint of 12 gigabytes to map 81% of the reads to the human reference genome, whereas Bowtie used 2.9 gigabytes of memory and ran in 2.2 hrs but achieved only a 67% mapping rate [58] . . .
  65. S. M. Rumble, P. Lacroute, A. V. Dalca, M. Fiume, A. Sidow, and M. Brudno, “SHRiMP: accurate mapping of short color-space reads,” PLoS Computational Biology, vol. 5, no. 5, Article ID e1000386 (2009).
    • . . . For example, in aligning 2 × 13,326,195 paired-end reads (76 bp) from The Cancer Genome Atlas project (SRR018643) [64], SHRiMP [65] took 1,065 hrs with a peak memory footprint of 12 gigabytes to map 81% of the reads to the human reference genome, whereas Bowtie used 2.9 gigabytes of memory and ran in 2.2 hrs but achieved only a 67% mapping rate [58] . . .
  66. S. Rounsley, P. R. Marri, Y. Yu et al., “De novo next generation sequencing of plant genomes,” Rice, vol. 2, no. 1, pp. 35–43 (2009).
    • . . . In the absence of a reference genome, de novo assembly of a plant genome is achieved using sequence information obtained through a combination of Sanger and/or NGS of bacterial artificial chromosome (BAC) clones, or by whole genome shotgun (WGS) with NGS [66] . . .
  67. T. Sasaki, “The map-based sequence of the rice genome,” Nature, vol. 436, no. 7052, pp. 793–800 (2005).
    • . . . The genomes of Arabidopsis thaliana [16], rice [67], and maize [68] were generated using a BAC-by-BAC approach while poplar [69], grape [70], and sorghum [71] genomic sequences were obtained through WGS . . .
  68. E. Pennisi, “Plant sciences: corn genomics pops wide open,” Science, vol. 319, no. 5868, p. 1333 (2008).
    • . . . The genomes of Arabidopsis thaliana [16], rice [67], and maize [68] were generated using a BAC-by-BAC approach while poplar [69], grape [70], and sorghum [71] genomic sequences were obtained through WGS . . .
  69. G. A. Tuskan, S. DiFazio, S. Jansson et al., “The genome of black cottonwood, Populus trichocarpa (Torr. & Gray),” Science, vol. 313, no. 5793, pp. 1596–1604 (2006).
    • . . . The genomes of Arabidopsis thaliana [16], rice [67], and maize [68] were generated using a BAC-by-BAC approach while poplar [69], grape [70], and sorghum [71] genomic sequences were obtained through WGS . . .
  70. O. Jaillon, J. M. Aury, B. Noel et al., “The grapevine genome sequence suggests ancestral hexaploidization in major angiosperm phyla,” Nature, vol. 449, no. 7161, pp. 463–467 (2007).
    • . . . The genomes of Arabidopsis thaliana [16], rice [67], and maize [68] were generated using a BAC-by-BAC approach while poplar [69], grape [70], and sorghum [71] genomic sequences were obtained through WGS . . .
  71. A. H. Paterson, J. E. Bowers, R. Bruggmann et al., “The Sorghum bicolor genome and the diversification of grasses,” Nature, vol. 457, no. 7229, pp. 551–556 (2009).
    • . . . The genomes of Arabidopsis thaliana [16], rice [67], and maize [68] were generated using a BAC-by-BAC approach while poplar [69], grape [70], and sorghum [71] genomic sequences were obtained through WGS . . .
  72. C. Feuillet, J. E. Leach, J. Rogers, P. S. Schnable, and K. Eversole, “Crop genome sequencing: lessons and rationales,” Trends in Plant Science, vol. 16, no. 2, pp. 77–88 (2011).
    • . . . A list of current plant genome sequencing projects, their sequencing strategies, and status from standard draft to finished can be found in the review by Feuillet et al. [72]. . . .
    • . . . Numerous plant genomes are now sequenced at various levels of completion and many more are underway [72] . . .
  73. B. Chevreux, T. Pfisterer, B. Drescher et al., “Using the miraEST assembler for reliable and automated mRNA transcript assembly and SNP detection in sequenced ESTs,” Genome Research, vol. 14, no. 6, pp. 1147–1159 , (2004) .
    • . . . Software programs such as Mira [73], SOAPdenovo [74], ABySS [75], and Velvet [76] have been used for de novo assembly . . .
  74. R. Li, Y. Li, X. Fang et al., “SNP detection for massively parallel whole-genome resequencing,” Genome Research, vol. 19, no. 6, pp. 1124–1132 , (2009) .
    • . . . Software programs such as Mira [73], SOAPdenovo [74], ABySS [75], and Velvet [76] have been used for de novo assembly . . .
    • . . . Broadly used SNP calling software include Samtools [103], SNVer [104], and SOAPsnp [74] . . .
  75. J. T. Simpson, K. Wong, S. D. Jackman, J. E. Schein, S. J. M. Jones, and I. Birol, “ABySS: a parallel assembler for short read sequence data,” Genome Research, vol. 19, no. 6, pp. 1117–1123 , (2009) .
    • . . . Software programs such as Mira [73], SOAPdenovo [74], ABySS [75], and Velvet [76] have been used for de novo assembly . . .
  76. D. R. Zerbino and E. Birney, “Velvet: algorithms for de novo short read assembly using de Bruijn graphs,” Genome Research, vol. 18, no. 5, pp. 821–829 , (2008) .
    • . . . Software programs such as Mira [73], SOAPdenovo [74], ABySS [75], and Velvet [76] have been used for de novo assembly . . .
  77. D. Chagné, R. N. Crowhurst, M. Troggio et al., “Genome-wide SNP detection, validation, and development of an 8K SNP array for apple,” PLoS ONE, vol. 7, no. 2, Article ID e31745 , (2012) .
    • . . . The assembly generated by SOAPdenovo can be used for SNP discovery using SOAPsnp as implemented for the apple genome [77] . . .
    • . . . For example, a subset of 144 SNPs from a total of 2,113,120 SNPs were validated using the Goldengate assay on 160 accessions in apple [77] . . .
  78. A. J. Cortés, M. C. Chavarro, and M. W. Blair, “SNP marker diversity in common bean (Phaseolus vulgaris L.),” Theoretical and Applied Genetics, vol. 123, no. 5, pp. 827–845 , (2011) .
    • . . . The most common application of NGS is SNP discovery, whose downstream usefulness in linkage map construction, genetic diversity analyses, association mapping, and marker-assisted selection has been demonstrated in several species [78] . . .
    • . . . Although SNP discovery in complex genomes without a reference genome such as wheat [81, 82], barley [14, 89], oat [97], and beans [78] can be achieved through NGS, several challenges remain in other nonmodel but economically important crops . . .
  79. D. Altshuler, V. J. Pollara, C. R. Cowles et al., “An SNP map of the human genome generated by reduced representation shotgun sequencing,” Nature, vol. 407, no. 6803, pp. 513–516 , (2000) .
    • . . . NGS-derived SNPs have been reported in humans [79], Drosophila [80], wheat [81, 82], eggplant [83], rice [84–86], Arabidopsis [87, 88], barley [14, 89], sorghum [90], cotton [91], common beans [78], soybean [92], potato [93], flax [94], Aegilops tauschii [95], alfalfa [96], oat [97], and maize [98] to name a few. . . .
    • . . . Reduced representation libraries (RRLs), that is, sequencing an enriched subset of a genome by eliminating a proportion of its repetitive fractions [79], reduce the probability of misalignments to repeats and thus potential downstream erroneous SNP calling . . .
  80. J. Berger, T. Suzuki, K. A. Senti, J. Stubbs, G. Schaffner, and B. J. Dickson, “Genetic mapping with SNP markers in Drosophila,” Nature Genetics, vol. 29, no. 4, pp. 475–481 , (2001) .
    • . . . NGS-derived SNPs have been reported in humans [79], Drosophila [80], wheat [81, 82], eggplant [83], rice [84–86], Arabidopsis [87, 88], barley [14, 89], sorghum [90], cotton [91], common beans [78], soybean [92], potato [93], flax [94], Aegilops tauschii [95], alfalfa [96], oat [97], and maize [98] to name a few. . . .
  81. A. M. Allen, G. L. Barker, S. T. Berry et al., “Transcript-specific, single-nucleotide polymorphism discovery and linkage analysis in hexaploid bread wheat (Triticum aestivum L.),” Plant Biotechnology Journal, vol. 9, no. 9, pp. 1086–1099 , (2011) .
    • . . . NGS-derived SNPs have been reported in humans [79], Drosophila [80], wheat [81, 82], eggplant [83], rice [84–86], Arabidopsis [87, 88], barley [14, 89], sorghum [90], cotton [91], common beans [78], soybean [92], potato [93], flax [94], Aegilops tauschii [95], alfalfa [96], oat [97], and maize [98] to name a few. . . .
    • . . . Although SNP discovery in complex genomes without a reference genome such as wheat [81, 82], barley [14, 89], oat [97], and beans [78] can be achieved through NGS, several challenges remain in other nonmodel but economically important crops . . .
  82. D. Trebbi, M. Maccaferri, P. de Heer et al., “High-throughput SNP discovery and genotyping in durum wheat (Triticum durum Desf.),” Theoretical and Applied Genetics, vol. 123, no. 4, pp. 555–569 , (2011) .
    • . . . NGS-derived SNPs have been reported in humans [79], Drosophila [80], wheat [81, 82], eggplant [83], rice [84–86], Arabidopsis [87, 88], barley [14, 89], sorghum [90], cotton [91], common beans [78], soybean [92], potato [93], flax [94], Aegilops tauschii [95], alfalfa [96], oat [97], and maize [98] to name a few. . . .
    • . . . Although SNP discovery in complex genomes without a reference genome such as wheat [81, 82], barley [14, 89], oat [97], and beans [78] can be achieved through NGS, several challenges remain in other nonmodel but economically important crops . . .
  83. L. Barchi, S. Lanteri, E. Portis et al., “Identification of SNP and SSR markers in eggplant using RAD tag sequencing,” BMC Genomics, vol. 12, article 304 , (2011) .
    • . . . NGS-derived SNPs have been reported in humans [79], Drosophila [80], wheat [81, 82], eggplant [83], rice [84–86], Arabidopsis [87, 88], barley [14, 89], sorghum [90], cotton [91], common beans [78], soybean [92], potato [93], flax [94], Aegilops tauschii [95], alfalfa [96], oat [97], and maize [98] to name a few. . . .
  84. F. A. Feltus, J. Wan, S. R. Schulze, J. C. Estill, N. Jiang, and A. H. Paterson, “An SNP resource for rice genetics and breeding based on subspecies Indica and Japonica genome alignments,” Genome Research, vol. 14, no. 9, pp. 1812–1819 , (2004) .
    • . . . NGS-derived SNPs have been reported in humans [79], Drosophila [80], wheat [81, 82], eggplant [83], rice [84–86], Arabidopsis [87, 88], barley [14, 89], sorghum [90], cotton [91], common beans [78], soybean [92], potato [93], flax [94], Aegilops tauschii [95], alfalfa [96], oat [97], and maize [98] to name a few. . . .
  85. K. L. McNally, K. L. Childs, R. Bohnert et al., “Genomewide SNP variation reveals relationships among landraces and modern varieties of rice,” Proceedings of the National Academy of Sciences of the United States of America, vol. 106, no. 30, pp. 12273–12278 , (2009) .
    • . . . SNPs can help to decipher breeding pedigree, to identify genomic divergence of species to elucidate speciation and evolution, and to associate genomic variations to phenotypic traits [85] . . .
  86. T. Yamamoto, H. Nagasaki, J. I. Yonemaru et al., “Fine definition of the pedigree haplotypes of closely related rice cultivars by means of genome-wide discovery of single-nucleotide polymorphisms,” BMC Genomics, vol. 11, no. 1, article 267 , (2010) .
    • . . . NGS-derived SNPs have been reported in humans [79], Drosophila [80], wheat [81, 82], eggplant [83], rice [84–86], Arabidopsis [87, 88], barley [14, 89], sorghum [90], cotton [91], common beans [78], soybean [92], potato [93], flax [94], Aegilops tauschii [95], alfalfa [96], oat [97], and maize [98] to name a few. . . .
    • . . . SNP discovery using NGS is readily accomplished in small plant genomes for which good reference genomes are available such as rice and Arabidopsis [86, 99] . . .
  87. G. Jander, S. R. Norris, S. D. Rounsley, D. F. Bush, I. M. Levin, and R. L. Last, “Arabidopsis map-based cloning in the post-genome era,” Plant Physiology, vol. 129, no. 2, pp. 440–450 , (2002) .
    • . . . NGS-derived SNPs have been reported in humans [79], Drosophila [80], wheat [81, 82], eggplant [83], rice [84–86], Arabidopsis [87, 88], barley [14, 89], sorghum [90], cotton [91], common beans [78], soybean [92], potato [93], flax [94], Aegilops tauschii [95], alfalfa [96], oat [97], and maize [98] to name a few. . . .
  88. X. Zhang and J. O. Borevitz, “Global analysis of allele-specific expression in Arabidopsis thaliana,” Genetics, vol. 182, no. 4, pp. 943–954 , (2009) .
    • . . . NGS-derived SNPs have been reported in humans [79], Drosophila [80], wheat [81, 82], eggplant [83], rice [84–86], Arabidopsis [87, 88], barley [14, 89], sorghum [90], cotton [91], common beans [78], soybean [92], potato [93], flax [94], Aegilops tauschii [95], alfalfa [96], oat [97], and maize [98] to name a few. . . .
  89. R. Waugh, J. L. Jannink, G. J. Muehlbauer, and L. Ramsay, “The emergence of whole genome association scans in barley,” Current Opinion in Plant Biology, vol. 12, no. 2, pp. 218–222 , (2009) .
    • . . . NGS-derived SNPs have been reported in humans [79], Drosophila [80], wheat [81, 82], eggplant [83], rice [84–86], Arabidopsis [87, 88], barley [14, 89], sorghum [90], cotton [91], common beans [78], soybean [92], potato [93], flax [94], Aegilops tauschii [95], alfalfa [96], oat [97], and maize [98] to name a few. . . .
    • . . . Although SNP discovery in complex genomes without a reference genome such as wheat [81, 82], barley [14, 89], oat [97], and beans [78] can be achieved through NGS, several challenges remain in other nonmodel but economically important crops . . .
  90. J. C. Nelson, S. Wang, Y. Wu et al., “Single-nucleotide polymorphism discovery by high-throughput sequencing in sorghum,” BMC Genomics, vol. 12, article 352 , (2011) .
    • . . . NGS-derived SNPs have been reported in humans [79], Drosophila [80], wheat [81, 82], eggplant [83], rice [84–86], Arabidopsis [87, 88], barley [14, 89], sorghum [90], cotton [91], common beans [78], soybean [92], potato [93], flax [94], Aegilops tauschii [95], alfalfa [96], oat [97], and maize [98] to name a few. . . .
  91. R. L. Byers, D. B. Harker, S. M. Yourstone, P. J. Maughan, and J. A. Udall, “Development and mapping of SNP assays in allotetraploid cotton,” Theoretical and Applied Genetics, vol. 124, no. 7, pp. 1201–1214 , (2012) .
    • . . . NGS-derived SNPs have been reported in humans [79], Drosophila [80], wheat [81, 82], eggplant [83], rice [84–86], Arabidopsis [87, 88], barley [14, 89], sorghum [90], cotton [91], common beans [78], soybean [92], potato [93], flax [94], Aegilops tauschii [95], alfalfa [96], oat [97], and maize [98] to name a few. . . .
    • . . . In polyploid genomes such as cotton (allotetraploid), homoeologous sequences can cause similar misalignment [91] . . .
    • . . . Additionally, in polyploid species, separate assembly of homoeologs using stringent mapping parameters is often essential for genome-wide SNP identification to avoid spurious SNP calls caused by erroneous homoeologous read mapping [91]. . . .
    • . . . SNP-based linkage maps have been constructed in many economically important species such as rice [126], cotton [91] and Brassica [127] . . .
  92. D. L. Hyten, S. B. Cannon, Q. Song et al., “High-throughput SNP discovery through deep resequencing of a reduced representation library to anchor and orient scaffolds in the soybean whole genome sequence,” BMC Genomics, vol. 11, no. 1, article 38 , (2010) .
    • . . . NGS-derived SNPs have been reported in humans [79], Drosophila [80], wheat [81, 82], eggplant [83], rice [84–86], Arabidopsis [87, 88], barley [14, 89], sorghum [90], cotton [91], common beans [78], soybean [92], potato [93], flax [94], Aegilops tauschii [95], alfalfa [96], oat [97], and maize [98] to name a few. . . .
  93. J. P. Hamilton, C. N. Hansey, B. R. Whitty et al., “Single nucleotide polymorphism discovery in elite North American potato germplasm,” BMC Genomics, vol. 12, article 302 , (2011) .
    • . . . NGS-derived SNPs have been reported in humans [79], Drosophila [80], wheat [81, 82], eggplant [83], rice [84–86], Arabidopsis [87, 88], barley [14, 89], sorghum [90], cotton [91], common beans [78], soybean [92], potato [93], flax [94], Aegilops tauschii [95], alfalfa [96], oat [97], and maize [98] to name a few. . . .
  94. Y.-B. Fu and G. W. Peterson, “Developing genomic resources in two Linum species via 454 pyrosequencing and genomic reduction,” Molecular Ecology Resources, vol. 12, no. 3, pp. 492–500 , (2012) .
    • . . . NGS-derived SNPs have been reported in humans [79], Drosophila [80], wheat [81, 82], eggplant [83], rice [84–86], Arabidopsis [87, 88], barley [14, 89], sorghum [90], cotton [91], common beans [78], soybean [92], potato [93], flax [94], Aegilops tauschii [95], alfalfa [96], oat [97], and maize [98] to name a few. . . .
  95. F. M. You, N. Huo, K. R. Deal et al., “Annotation-based genome-wide SNP discovery in the large and complex Aegilops tauschii genome using next-generation sequencing without a reference genome sequence,” BMC Genomics, vol. 12, article 59 , (2011) .
    • . . . NGS-derived SNPs have been reported in humans [79], Drosophila [80], wheat [81, 82], eggplant [83], rice [84–86], Arabidopsis [87, 88], barley [14, 89], sorghum [90], cotton [91], common beans [78], soybean [92], potato [93], flax [94], Aegilops tauschii [95], alfalfa [96], oat [97], and maize [98] to name a few. . . .
    • . . . In the absence of a reference genome, this is achieved by comparing reads from different genotypes using de novo assembly strategies [95] . . .
    • . . . NGS platforms have different levels of sequencing accuracies, and this may be the most important factor determining the variation in the validation, from 88.2% for SOLiD followed by Illumina at 85.4% and Roche 454 at 71% [95] . . .
  96. Y. Han, Y. Kang, I. Torres-Jerez et al., “Genome-wide SNP discovery in tetraploid alfalfa using 454 sequencing and high resolution melting analysis,” BMC Genomics, vol. 12, article 350 , (2011) .
    • . . . NGS-derived SNPs have been reported in humans [79], Drosophila [80], wheat [81, 82], eggplant [83], rice [84–86], Arabidopsis [87, 88], barley [14, 89], sorghum [90], cotton [91], common beans [78], soybean [92], potato [93], flax [94], Aegilops tauschii [95], alfalfa [96], oat [97], and maize [98] to name a few. . . .
  97. R. E. Oliver, G. R. Lazo, J. D. Lutz et al., “Model SNP development for complex genomes based on hexaploid oat using high-throughput 454 sequencing technology,” BMC Genomics, vol. 12, no. 1, article 77 , (2011) .
    • . . . NGS-derived SNPs have been reported in humans [79], Drosophila [80], wheat [81, 82], eggplant [83], rice [84–86], Arabidopsis [87, 88], barley [14, 89], sorghum [90], cotton [91], common beans [78], soybean [92], potato [93], flax [94], Aegilops tauschii [95], alfalfa [96], oat [97], and maize [98] to name a few. . . .
    • . . . Although SNP discovery in complex genomes without a reference genome such as wheat [81, 82], barley [14, 89], oat [97], and beans [78] can be achieved through NGS, several challenges remain in other nonmodel but economically important crops . . .
  98. E. Jones, W. C. Chu, M. Ayele et al., “Development of single nucleotide polymorphism (SNP) markers for use in commercial maize (Zea mays L.) germplasm,” Molecular Breeding, vol. 24, no. 2, pp. 165–176 , (2009) .
    • . . . NGS-derived SNPs have been reported in humans [79], Drosophila [80], wheat [81, 82], eggplant [83], rice [84–86], Arabidopsis [87, 88], barley [14, 89], sorghum [90], cotton [91], common beans [78], soybean [92], potato [93], flax [94], Aegilops tauschii [95], alfalfa [96], oat [97], and maize [98] to name a few. . . .
  99. S. Ossowski, K. Schneeberger, R. M. Clark, C. Lanz, N. Warthmann, and D. Weigel, “Sequencing of natural strains of Arabidopsis thaliana with short reads,” Genome Research, vol. 18, no. 12, pp. 2024–2033 , (2008) .
    • . . . SNP discovery using NGS is readily accomplished in small plant genomes for which good reference genomes are available such as rice and Arabidopsis [86, 99] . . .
  100. I. Milne, M. Bayer, L. Cardle et al., “Tablet-next generation sequence assembly visualization,” Bioinformatics, vol. 26, no. 3, pp. 401–402 , (2009) .
    • . . . In assemblies generated allowing single nucleotide variants and insertions/deletions (indels), a list of SNP and indel coordinates is generated and the read mapping results can be visualized using graphical user interface programs such as Tablet [100] (Figure 1), SNP-VISTA [101], or Savant [102] (refer to Table 4 for download information) . . .
  101. N. Shah, M. V. Teplitsky, S. Minovitsky et al., “SNP-VISTA: an interactive SNP visualization tool,” BMC Bioinformatics, vol. 6, no. 1, article 292 , (2005) .
    • . . . In assemblies generated allowing single nucleotide variants and insertions/deletions (indels), a list of SNP and indel coordinates is generated and the read mapping results can be visualized using graphical user interface programs such as Tablet [100] (Figure 1), SNP-VISTA [101], or Savant [102] (refer to Table 4 for download information) . . .
  102. M. Fiume, V. Williams, A. Brook, and M. Brudno, “Savant: genome browser for high-throughput sequencing data,” Bioinformatics, vol. 26, no. 16, Article ID btq332, pp. 1938–1944 , (2010) .
    • . . . In assemblies generated allowing single nucleotide variants and insertions/deletions (indels), a list of SNP and indel coordinates is generated and the read mapping results can be visualized using graphical user interface programs such as Tablet [100] (Figure 1), SNP-VISTA [101], or Savant [102] (refer to Table 4 for download information) . . .
  103. H. Li, B. Handsaker, A. Wysoker et al., “The sequence alignment/map format and SAMtools,” Bioinformatics, vol. 25, no. 16, pp. 2078–2079 , (2009) .
    • . . . Broadly used SNP calling software include Samtools [103], SNVer [104], and SOAPsnp [74] . . .
  104. Z. Wei, W. Wang, P. Hu, G. J. Lyon, and H. Hakonarson, “SNVer: a statistical tool for variant calling in analysis of pooled or individual next-generation sequencing data,” Nucleic Acids Research, vol. 39, no. 19, article e132 , (2011) .
    • . . . Broadly used SNP calling software include Samtools [103], SNVer [104], and SOAPsnp [74] . . .
  105. R. Nielsen, J. S. Paul, A. Albrechtsen, and Y. S. Song, “Genotype and SNP calling from next-generation sequencing data,” Nature Reviews Genetics, vol. 12, no. 6, pp. 443–451 , (2011) .
    • . . . Reviews of SNP calling software have been published [63, 105] . . .
    • . . . Assembly programs such as Novoalign (http://www.novocraft.com/main/index.php) and STAMPY [108], although memory and time intensive, are highly sensitive for simultaneous mapping of short reads from multiple individuals [105]. . . .
  106. R. Ragupathy, R. Rathinavelu, and S. Cloutier, “Physical mapping and BAC-end sequence analysis provide initial insights into the flax (Linum usitatissimum L.) genome,” BMC Genomics, vol. 12, article 217 , (2011) .
    • . . . Large parts of plant genomes consist of repetitive elements [106] which can cause spurious SNP calling by erroneous read mapping to paralogous repeat element sequences . . .
  107. A. McKenna, M. Hanna, E. Banks et al., “The genome analysis toolkit: a MapReduce framework for analyzing next-generation DNA sequencing data,” Genome Research, vol. 20, no. 9, pp. 1297–1303 , (2010) .
    • . . . Read assembly algorithms such as Bowtie and SOAP as well as variant calling/genotyping software such as GATK [107] are rapidly evolving to accommodate an ever increasing number of reads, increased read length, nucleotide quality values, and mate-pair information of PE reads . . .
  108. G. Lunter and M. Goodson, “Stampy: a statistical algorithm for sensitive and fast mapping of Illumina sequence reads,” Genome Research, vol. 21, no. 6, pp. 936–939 , (2011) .
    • . . . Assembly programs such as Novoalign (http://www.novocraft.com/main/index.php) and STAMPY [108], although memory and time intensive, are highly sensitive for simultaneous mapping of short reads from multiple individuals [105]. . . .
  109. R. M. Durbin, “A map of human genome variation from population-scale sequencing,” Nature, vol. 467, no. 7319, pp. 1061–1073 , (2010) .
    • . . . This strategy identifies the most common sources of error and is applied in the 1000 Genomes Project [109] . . .
  110. J. B. Fan, M. S. Chee, and K. L. Gunderson, “Highly parallel genomic assays,” Nature Reviews Genetics, vol. 7, no. 8, pp. 632–644 , (2006) .
    • . . . Usually a small subset of the SNPs is used for validation through assays such as the Illumina Goldengate [110], KBiosciences Competitive Allele-Specific PCR SNP genotyping system (KASPar) (http://www.lgcgenomics.com/) or the High Resolution Melting (HRM) curve analysis . . .
  111. M. R. Garvin, K. Saitoh, and A. J. Gharrett, “Application of single nucleotide polymorphisms to non-model species: a technical review,” Molecular Ecology Resources, vol. 10, no. 6, pp. 915–934 , (2010) .
    • . . . Other validation strategies used in nonmodel organisms are tabulated in Garvin et al. [111] . . .
  112. Z. Tsuchihashi and N. C. Dracopoli, “Progress in high throughput SNP genotyping methods,” Pharmacogenomics Journal, vol. 2, no. 2, pp. 103–110 , (2002) .
    • . . . Detailed analyses of SNP genotyping assays and their features are reviewed in Tsuchihashi and Dracopoli [112], Sobrino and Carracedo [113], Giancola et al. [114], Kim and Misra [115], Gupta et al. [116], and Ragoussis [117] . . .
  113. B. Sobrino and A. Carracedo, “SNP typing in forensic genetics: a review,” Methods in Molecular Biology, vol. 297, pp. 107–126 , (2005) .
    • . . . Detailed analyses of SNP genotyping assays and their features are reviewed in Tsuchihashi and Dracopoli [112], Sobrino and Carracedo [113], Giancola et al. [114], Kim and Misra [115], Gupta et al. [116], and Ragoussis [117] . . .
  114. S. Giancola, H. I. McKhann, A. Bérard et al., “Utilization of the three high-throughput SNP genotyping methods, the GOOD assay, Amplifluor and TaqMan, in diploid and polyploid plants,” Theoretical and Applied Genetics, vol. 112, no. 6, pp. 1115–1124 , (2006) .
    • . . . Detailed analyses of SNP genotyping assays and their features are reviewed in Tsuchihashi and Dracopoli [112], Sobrino and Carracedo [113], Giancola et al. [114], Kim and Misra [115], Gupta et al. [116], and Ragoussis [117] . . .
  115. S. Kim and A. Misra, “SNP genotyping: technologies and biomedical applications,” Annual Review of Biomedical Engineering, vol. 9, pp. 289–320 , (2007) .
    • . . . Detailed analyses of SNP genotyping assays and their features are reviewed in Tsuchihashi and Dracopoli [112], Sobrino and Carracedo [113], Giancola et al. [114], Kim and Misra [115], Gupta et al. [116], and Ragoussis [117] . . .
  116. P. K. Gupta, S. Rustgi, and R. R. Mir, “Array-based high-throughput DNA markers for crop improvement,” Heredity, vol. 101, no. 1, pp. 5–18 , (2008) .
    • . . . Detailed analyses of SNP genotyping assays and their features are reviewed in Tsuchihashi and Dracopoli [112], Sobrino and Carracedo [113], Giancola et al. [114], Kim and Misra [115], Gupta et al. [116], and Ragoussis [117] . . .
  117. J. Ragoussis, “Genotyping technologies for genetic research,” Annual Review of Genomics and Human Genetics, vol. 10, pp. 117–133 , (2009) .
    • . . . Detailed analyses of SNP genotyping assays and their features are reviewed in Tsuchihashi and Dracopoli [112], Sobrino and Carracedo [113], Giancola et al. [114], Kim and Misra [115], Gupta et al. [116], and Ragoussis [117] . . .
  118. J. W. Davey, P. A. Hohenlohe, P. D. Etter, J. Q. Boone, J. M. Catchen, and M. L. Blaxter, “Genome-wide genetic marker discovery and genotyping using next-generation sequencing,” Nature Reviews Genetics, vol. 12, no. 7, pp. 499–510 , (2011) .
    • . . . There have been a number of approaches developed that use complexity reduction strategies to lower the cost and simplify the discovery of SNP markers using NGS, RNA-Seq, complexity reduction of polymorphic sequences (CRoPS), restriction-site-associated DNA sequencing (RAD-Seq), and GBS [118] . . .
  119. R. J. Elshire, J. C. Glaubitz, Q. Sun et al., “A robust, simple genotyping-by-sequencing (GBS) approach for high diversity species,” PLoS ONE, vol. 6, no. 5, Article ID e19379 , (2011) .
    • . . . Briefly, GBS involves digesting the genome of each individual in a population to be studied with a restriction enzyme [119] . . .
  120. J. A. Poland, P. J. Brown, M. E. Sorrells, and J.-L. Jannink, “Development of high-density genetic maps for barley and wheat using a novel two-enzyme genotyping-by-sequencing approach,” PLoS ONE, vol. 7, no. 2, Article ID e32253 , (2012) .
    • . . . Poland et al. [120] recently demonstrated the use of two restriction enzymes to perform GBS in bread wheat, a hexaploid genome. . . .
  121. M. W. Horton, A. M. Hancock, Y. S. Huang et al., “Genome-wide patterns of genetic variation in worldwide Arabidopsis thaliana accessions from the RegMap panel,” Nature Genetics, vol. 44, no. 2, pp. 212–216 , (2012) .
    • . . . NGS and SNP genotyping technologies have made SNPs the most widely used marker for genetic studies in plant species such as Arabidopsis [121] and rice [122] . . .
  122. G. K. Subbaiyan, D. L. E. Waters, S. K. Katiyar, A. R. Sadananda, S. Vaddadi, and R. J. Henry, “Genome-wide DNA polymorphisms in elite indica rice inbreds discovered by whole-genome sequencing,” Plant Biotechnology Journal, vol. 10, no. 6, pp. 623–634 , (2012) .
    • . . . NGS and SNP genotyping technologies have made SNPs the most widely used marker for genetic studies in plant species such as Arabidopsis [121] and rice [122] . . .
  123. J. C. Nelson, “Methods and software for genetic mapping,” in The Handbook of Plant Genome Mapping, pp. 53–74, Wiley-VCH, Weinheim, Germany , (2005) .
    • . . . Genetic maps are essential tools in molecular breeding for plant genetic improvement as they enable gene localization, map-based cloning, and the identification of QTL [123] . . .
  124. A. Rafalski, “Applications of single nucleotide polymorphisms in crop genetics,” Current Opinion in Plant Biology, vol. 5, no. 2, pp. 94–100 , (2002) .
    • . . . SNPs discovered using RNA-Seq and expressed sequence tags (ESTs) have the added advantage of being gene specific [124] . . .
  125. L. Kruglyak, “The use of a genetic map of biallelic markers in linkage studies,” Nature Genetics, vol. 17, no. 1, pp. 21–24 , (1997) .
    • . . . Most SNPs are biallelic thereby having a lower polymorphism information content (PIC) value as compared to most other marker types which are often multiallelic [125] . . .
  126. W. Xie, Q. Feng, H. Yu et al., “Parent-independent genotyping for constructing an ultrahigh-density linkage map based on population sequencing,” Proceedings of the National Academy of Sciences of the United States of America, vol. 107, no. 23, pp. 10578–10583 , (2010) .
    • . . . SNP-based linkage maps have been constructed in many economically important species such as rice [126], cotton [91] and Brassica [127] . . .
  127. F. Li, H. Kitashiba, K. Inaba, and T. Nishio, “A Brassica rapa linkage map of EST-based SNP markers for identification of candidate genes controlling flowering time and leaf morphological traits,” DNA Research, vol. 16, no. 6, pp. 311–323 , (2009) .
    • . . . SNP-based linkage maps have been constructed in many economically important species such as rice [126], cotton [91] and Brassica [127] . . .
  128. E. S. Buckler, J. B. Holland, P. J. Bradbury et al., “The genetic architecture of maize flowering time,” Science, vol. 325, no. 5941, pp. 714–718 , (2009) .
    • . . . The identification of candidate genes for flowering time in Brassica [127] and maize [128] are practical examples of gene discovery through SNP-based genetic maps. . . .
  129. S. A. Flint-Garcia, J. M. Thornsberry, and E. S. Buckler, “Structure of linkage disequilibrium in plants,” Annual Review of Plant Biology, vol. 54, pp. 357–374 , (2003) .
    • . . . Association mapping (AM) panels provide a better resolution, consider numerous alleles, and may provide faster marker-trait association than biparental populations [129] . . .
  130. P. K. Gupta, S. Rustgi, and P. L. Kulwal, “Linkage disequilibrium and association studies in higher plants: present status and future prospects,” Plant Molecular Biology, vol. 57, no. 4, pp. 461–485 , (2005) .
    • . . . AM, often referred to as linkage disequilibrium (LD) mapping, relies on the nonrandom association between markers and traits [130] . . .
  131. M. J. Aranzana, S. Kim, K. Zhao et al., “Genome-wide association mapping in Arabidopsis identifies previously known flowering time and pathogen resistance genes,” PLoS Genetics, vol. 1, no. 5, p. e60 , (2005) .
    • . . . In plants, such a study was first reported in Arabidopsis for flowering time and pathogen-resistance genes [131] . . .
  132. X. Huang, X. Wei, T. Sang et al., “Genome-wide association studies of 14 agronomic traits in rice landraces,” Nature Genetics, vol. 42, no. 11, pp. 961–967 , (2010) .
    • . . . A GWAS performed in rice using ~3.6 million SNPs identified genomic regions associated with 14 agronomic traits [132] . . .
  133. K. L. Kump, P. J. Bradbury, R. J. Wisser et al., “Genome-wide association study of quantitative resistance to southern leaf blight in the maize nested association mapping population,” Nature Genetics, vol. 43, no. 2, pp. 163–168 , (2011) .
    • . . . The genetic structure of northern leaf blight, southern leaf blight, and leaf architecture was studied using ~1.6 million SNPs in maize [133–135] . . .
  134. J. A. Poland, P. J. Bradbury, E. S. Buckler, and R. J. Nelson, “Genome-wide nested association mapping of quantitative resistance to northern leaf blight in maize,” Proceedings of the National Academy of Sciences of the United States of America, vol. 108, no. 17, pp. 6893–6898 , (2011) .
  135. F. Tian, P. J. Bradbury, P. J. Brown et al., “Genome-wide association study of leaf architecture in the maize nested association mapping population,” Nature Genetics, vol. 43, no. 2, pp. 159–162 , (2011) .
    • . . . The genetic structure of northern leaf blight, southern leaf blight, and leaf architecture was studied using ~1.6 million SNPs in maize [133–135] . . .
  136. R. K. Pasam, R. Sharma, M. Malosetti et al., “Genome-wide association studies for agronomical traits in a world wide spring barley collection,” BMC Plant Biology, vol. 12, article 16 , (2012) .
    • . . . SNP-based GWAS was also performed on species such as barley for which a reference genome sequence is not available [136] . . .
  137. B. J. Soto-Cerda and S. Cloutier, “Association mapping in plant genomes,” in Genetic Diversity in Plants, M. Çalişkan, Ed., pp. 29–54, InTech , (2012) .
    • . . . Soto-Cerda and Cloutier [137] have reviewed the concepts, benefits, and limitations of AM in plants. . . .
  138. P. A. Morin, G. Luikart, and R. K. Wayne, “SNPs in ecology, evolution and conservation,” Trends in Ecology and Evolution, vol. 19, no. 4, pp. 208–216 , (2004) .
    • . . . SSRs and mitochondrial DNA have been used in evolutionary studies since the early 1990s [138] . . .
  139. P. W. Hedrick, “Perspective: highly variable loci and their interpretation in evolution and conservation,” Evolution, vol. 53, no. 2, pp. 313–318 , (1999) .
    • . . . However, the biological inferences from results of these two marker types may be misinterpreted due to homoplasy, a phenomenon in which similarity in traits or markers occurs due to reasons other than ancestry, such as convergent evolution, evolutionary reversal, gene duplication, and horizontal gene transfer [139] . . .
  140. A. Vignal, D. Milan, M. SanCristobal, and A. Eggen, “A review on SNP and other types of molecular markers and their use in animal genetics,” Genetics Selection Evolution, vol. 34, no. 3, pp. 275–305, (2002).
    • . . . The advantage of SNPs over microsatellites and mitochondrial DNA resides in the fact that SNPs represent single base nucleotide substitutions and, as such, they are less affected by homoplasy because their origin can be explained by mutation models [140] . . .
  141. S. Konishi, T. Izawa, S. Y. Lin et al., “An SNP caused loss of seed shattering during rice domestication,” Science, vol. 312, no. 5778, pp. 1392–1396, (2006).
    • . . . Seed shattering (or loss thereof) has been associated with an SNP through a GWAS aimed at unraveling the evolution of rice that led to its domestication [141] . . .
  142. O. Wei, Z. Peng, Y. Zhou, Z. Yang, K. Wu, and Z. Ouyang, “Nucleotide diversity and molecular evolution of the WAG-2 gene in common wheat (Triticum aestivum L.) and its relatives,” Genetics and Molecular Biology, vol. 34, no. 4, pp. 606–615, (2011).
    • . . . SNPs have also been used to study the evolution of genes such as WAG-2 in wheat [142] . . .
  143. J. D. Retief, “Phylogenetic analysis using PHYLIP,” Methods in Molecular Biology, vol. 132, pp. 243–258, (2000).
    • . . . Algorithms such as neighbor-joining and maximum likelihood implemented in the PHYLIP [143] and MEGA [144] software are commonly used to generate phylogenetic trees. . . .
  144. K. Tamura, J. Dudley, M. Nei, and S. Kumar, “MEGA4: Molecular Evolutionary Genetics Analysis (MEGA) software version 4.0,” Molecular Biology and Evolution, vol. 24, no. 8, pp. 1596–1599, (2007).
    • . . . Algorithms such as neighbor-joining and maximum likelihood implemented in the PHYLIP [143] and MEGA [144] software are commonly used to generate phylogenetic trees. . . .
  145. M. W. Ganal, T. Altmann, and M. S. Röder, “SNP identification in crop plants,” Current Opinion in Plant Biology, vol. 12, no. 2, pp. 211–217, (2009).
    • . . . Many issues remain to be addressed, such as the ascertainment bias of popular biparental populations and the low validation rate of some array-based genotyping platforms [145] . . .
  146. T. Koepke, S. Schaeffer, V. Krishnan et al., “Rapid gene-based SNP and haplotype marker development in non-model eukaryotes using 3'UTR sequencing,” BMC Genomics, vol. 13, no. 1, article 18, (2012).
    • . . . RNA and ChIP-sequencing projects, similar to RNA-Seq in the nonmodel plant sweet cherry to identify SNPs and haplotypes [146], can be undertaken to study functional genomics . . .