Nature 2011 Vol: 475(7355):189-195. DOI: 10.1038/nature10158

Genome sequence and analysis of the tuber crop potato

Potato (Solanum tuberosum L.) is the world’s most important non-grain food crop and is central to global food security. It is clonally propagated, highly heterozygous, autotetraploid, and suffers acute inbreeding depression. Here we use a homozygous doubled-monoploid potato clone to sequence and assemble 86% of the 844-megabase genome. We predict 39,031 protein-coding genes and present evidence for at least two genome duplication events indicative of a palaeopolyploid origin. As the first genome sequence of an asterid, the potato genome reveals 2,642 genes specific to this large angiosperm clade. We also sequenced a heterozygous diploid clone and show that gene presence/absence variants and other potentially deleterious mutations occur frequently and are a likely cause of inbreeding depression. Gene family expansion, tissue-specific expression and recruitment of genes to new pathways contributed to the evolution of tuber development. The potato genome sequence provides a platform for genetic improvement of this vital crop.

Editor's summary

The potato genome

The genome of the potato (Solanum tuberosum L.), a staple crop vital to food security, has been sequenced. The Potato Genome Sequencing Consortium sequenced a homozygous doubled-monoploid potato clone as well as a heterozygous diploid clone. Genome analysis reveals traces of at least two genome duplication events and genes specific to asterids, a large clade of flowering plants of which the potato is the first to be sequenced. Gene presence/absence variants and other potentially deleterious mutations are frequent and may be the cause of inbreeding depression. The genome sequence will facilitate genetic improvement of the potato, with a view to improving yield and increasing the disease and stress resistance of this crop, which is now a significant component of worldwide food production and is becoming increasingly important in the developing world.

Figures
Figure 1: The potato genome. a, Ideograms of the 12 pseudochromosomes of potato (in Mb scales). Each of the 12 pachytene chromosomes from DM was digitally aligned with the ideogram (the amount of DNA in each unit of the pachytene chromosomes is not in proportion to the scales of the pseudochromosomes). b, Gene density represented as number of genes per Mb (non-overlapping, window size = 1 Mb). c, Percentage of coverage of repetitive sequences (non-overlapping windows, window size = 1 Mb). d, Transcription state. The transcription level for each gene was estimated by averaging the fragments per kb exon model per million mapped reads (FPKM) from different tissues in non-overlapping 1-Mb windows. e, GC content was estimated by the per cent G+C in 1-Mb non-overlapping windows. f, Distribution of the subtelomeric repeat sequence CL14_cons.

Figure 2: Comparative analyses and evolution of the potato genome. a, Clusters of orthologous and paralogous gene families in 12 plant species as identified by OrthoMCL33. Gene family number is listed in each of the components; the number of genes within the families for all of the species within the component is noted within parentheses. b, Genome duplication in dicot genomes as revealed through 4DTv analyses. c, Syntenic blocks between A. thaliana, potato, and V. vinifera (grape) demonstrating a high degree of conserved gene order between these taxa.

Figure 3: Haplotype diversity and inbreeding depression. a, Plants and tubers of DM and RH showing that RH has greater vigour. b, Illumina K-mer volume histograms of DM and RH. The volume of K-mers (y-axis) is plotted against the frequency at which they occur (x-axis). The leftmost truncated peaks at low frequency and high volume represent K-mers containing essentially random sequencing errors, whereas the distribution to the right represents proper (putatively error-free) data. In contrast to the single modality of DM, RH exhibits clear bi-modality caused by heterozygosity. c, Genomic distribution of premature stop, frameshift and presence/absence variation mutations contributing to inbreeding depression. The hypothetical RH pseudomolecules were solely inferred from the corresponding DM ones. Owing to the inability to assign heterozygous PS and FS of RH to a definite haplotype, all heterozygous PS and FS were arbitrarily mapped to the left haplotype of RH. d, A zoom-in comparative view of the DM and RH genomes. The left and right alignments are derived from the euchromatic and heterochromatic regions of chromosome 5, respectively. Most of the gene annotations, including PS and RH-specific genes, are supported by transcript data.

Figure 4: Gene expression of selected tissues and genes. a, KTI gene organization across the potato genome. Black arrows indicate the location of individual genes on six scaffolds located on four chromosomes. b, Phylogenetic tree and KTI gene expression heat map. The KTI genes were clustered using all potato and tomato genes available with the Populus KTI gene as an out-group. The tissue specificity of individual members of the highly expanded potato gene family is shown in the heat map. Expression levels are indicated by shades of red, where white indicates no expression or lack of data for tomato and poplar. c, A model of starch synthesis showing enzyme activities is shown on the left. AGPase, ADP-glucose pyrophosphorylase; F16BP, fructose-1,6-biphosphatase; HexK, hexokinase; INV, invertase; PFK, phosphofructokinase; PFPP, pyrophosphate-fructose-6-phosphate-1-phosphotransferase; PGI, phosphoglucose isomerase; PGM, phosphoglucomutase; SBE, starch branching enzyme; SP, starch phosphorylase; SPP, sucrose phosphate phosphatase; SS, starch synthase; SuSy, sucrose synthase; SUPS, sucrose phosphate synthase; UDP-GPP, UDP-glucose pyrophosphorylase. The grey background denotes substrate (sucrose) and product (starch) and the red background indicates genes that are specifically upregulated in RH versus DM. On the right, a heat map of the genes involved in carbohydrate metabolism is shown. ADP-glucose pyrophosphorylase large subunit, AGPase (l); ADP-glucose pyrophosphorylase small subunit, AGPase (s); ADP-glucose pyrophosphorylase small subunit 3, AGPase 3 (s); cytosolic fructose-1,6-biphosphatase, F16BP (c); granule bound starch synthase, GBSS; leaf type L starch phosphorylase, Leaf type SP; plastidic phosphoglucomutase, pPGM; starch branching enzyme II, SBE II; soluble starch synthase, SSS; starch synthase V, SSV; three variants of plastidic aldolase, PA.
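
The caption of Figure 1, panel e, describes GC content as the per cent G+C in non-overlapping windows. A minimal Python sketch of that kind of calculation is given here for illustration only. The function name, the toy sequence and the choice to ignore ambiguous bases such as N are assumptions and are not taken from the paper, which does not describe its implementation.

```python
def gc_percent_by_window(sequence, window_size):
    """Return the per cent G+C for each non-overlapping window of a sequence.

    Assumption (not from the paper): ambiguous bases such as N are excluded
    from the denominator, and a window with no called bases yields None.
    """
    sequence = sequence.upper()
    percentages = []
    for start in range(0, len(sequence), window_size):
        window = sequence[start:start + window_size]
        # Count only unambiguous bases so that gap positions do not dilute the value.
        called = sum(window.count(base) for base in "ACGT")
        if called == 0:
            percentages.append(None)
            continue
        gc = window.count("G") + window.count("C")
        percentages.append(100.0 * gc / called)
    return percentages


# Toy 24-base sequence split into three 8-base windows (hypothetical data).
toy_sequence = "GGCCAATT" + "ATATATAT" + "NNNNNNNN"
print(gc_percent_by_window(toy_sequence, 8))
```

For a real pseudochromosome the same function would be called with a window size of 1,000,000 to match the 1-Mb windows named in the caption.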
References
  1. Hijmans, R. J. Global distribution of the potato crop Am. J. Potato Res. 78, 403-412 (2001) .
    • . . . Potato occupies a wide eco-geographical range1 and is unique among the major world food crops in producing stolons (underground stems) that under suitable environmental conditions swell to form tubers . . .
  2. Burlingame, B.; Mouillé, B.; Charrondiére, R. Nutrients, bioactive non-nutrients and anti-nutrients in potatoes J. Food Compost. Anal. 22, 494-502 (2009) .
    • . . . The tubers are a globally important dietary source of starch, protein, antioxidants and vitamins2, serving the plant as both a storage organ and a vegetative propagation system . . .
  3. Paz, M. M.; Veilleux, R. E. Influence of culture medium and in vitro conditions on shoot regeneration in Solanum phureja monoploids and fertility of regenerated doubled monoploids Plant Breed. 118, 53-57 (1999) .
    • . . . To overcome the key issue of heterozygosity and allow us to generate a high-quality draft potato genome sequence, we used a unique homozygous form of potato called a doubled monoploid, derived using classical tissue culture techniques3 . . .
  4. Li, R. De novo assembly of human genomes with massively parallel short read sequencing Genome Res. 20, 265-272 (2010) .
    • . . . The genome was assembled using SOAPdenovo4, resulting in a final assembly of 727 Mb, of which 93.9% is non-gapped sequence . . .
    • . . . We generated a high-quality potato genome using the short read assembly software SOAPdenovo4 (Version 1014) . . .
    • . . . These data were filtered using a custom C program and assembled using SOAPdenovo 1.03 (ref. 4) . . .
  5. Arumuganathan, K.; Earle, E. Nuclear DNA content of some important plant species Plant Mol. Biol. Rep. 9, 208-218 (1991) .
    • . . . The 17-nucleotide depth distribution (Supplementary Fig. 1) suggests a genome size of 844 Mb, consistent with estimates from flow cytometry5 . . .
  6. Tang, X. Assignment of genetic linkage maps to diploid Solanum tuberosum pachytene chromosomes by BAC-FISH technology Chromosome Res. 17, 899-915 (2009) .
    • . . . Karyotypes of RH and DM suggested similar heterochromatin content6 (Supplementary Table 6 and Supplementary Fig. 4) with large blocks of heterochromatin located at the pericentromeric regions (Fig. 1) . . .
  7. Peters, S. A. Solanum lycopersicum cv. Heinz 1706 chromosome 6: distribution and abundance of genes and retrotransposable elements Plant J. 58, 857-869 (2009) .
    • . . . However, many predicted genes in heterochromatic regions are expressed, consistent with observations in tomato7 that genic ‘islands’ are present in the heterochromatic ‘ocean’. . . .
  8. Albach, D. C.; Soltis, P. S.; Soltis, D. E. Patterns of embryological and biochemical evolution in the Asterids Syst. Bot. 26, 242-262 (2001) .
    • . . . Potato is the first sequenced genome of an asterid, a clade within eudicots that encompasses nearly 70,000 species characterized by unique morphological, developmental and compositional features8 . . .
  9. Tang, H. Unraveling ancient hexaploidy through multiply-aligned angiosperm gene maps Genome Res. 18, 1944-1954 (2008) .
    • . . . The ancient WGD corresponds to the ancestral hexaploidization (γ) event in grape (Fig. 2b), consistent with a previous report based on EST analysis that the two main branches of eudicots, the asterids and rosids, may share the same palaeo-hexaploid duplication event9 . . .
    • . . . After removing the self and multiple matches, the syntenic blocks (≥5 genes per block) were identified using MCscan9 and i-adhore 3.0 (ref. 50) based on the aligned protein gene pairs (Supplementary Table 8) . . .
  10. Jaillon, O. The grapevine genome sequence suggests ancestral hexaploidization in major angiosperm phyla Nature 449, 463-467 (2007) .
    • . . . The γ event probably occurred after the divergence between dicots and monocots about 185 ± 55 million years ago10 . . .
  11. Fawcett, J. A.; Maere, S.; Van de Peer, Y. Plants with double genomes might have had a better chance to survive the Cretaceous–Tertiary extinction event Proc. Natl Acad. Sci. USA 106, 5737-5742 (2009) .
    • . . . The recent duplication can therefore be placed at ~67 million years ago, consistent with the WGD that occurred near the Cretaceous–Tertiary boundary (~65 million years ago)11 . . .
  12. Lai, J. Genome-wide patterns of genetic variation among elite maize inbred lines Nature Genet. 42, 1027-1030 (2010) .
    • . . . We used this data set to explore the possible causes of inbreeding depression by quantifying the occurrence of premature stop, frameshift and presence/absence variants12, as these disable gene function and contribute to genetic load (Supplementary Tables 13–16) . . .
  13. Ashburner, M. Gene ontology: tool for the unification of biology Nature Genet. 25, 25-29 (2000) .
    • . . . Finally, we identified presence/absence variations for 275 genes; 246 were RH specific (absent in DM) and 29 were DM specific, with 125 and 9 supported by RNA-Seq and/or Gene Ontology13 annotation for RH and DM, respectively (Supplementary Tables 15 and 16) . . .
  14. Gore, M. A. A first-generation haplotype map of maize Science 326, 1115-1117 (2009) .
    • . . . The divergence between potato haplotypes is similar to that reported between out-crossing maize accessions14 and, coupled with our inability to successfully align 45% of the BAC sequences, intra- and inter-genome diversity seem to be a significant feature of the potato genome . . .
  15. Prat, S. Gene expression during tuber development in potato plants FEBS Lett. 268, 334-338 (1990) .
    • . . . Foremost among these were the genes encoding proteinase inhibitors and patatin (15 genes), in which the phospholipase A function has been largely replaced by a protein storage function in the tuber15 . . .
  16. Glaczinski, H.; Heibges, A.; Salamini, R.; Gebhardt, C. Members of the Kunitz-type protease inhibitor gene family of potato inhibit soluble tuber invertase in vitro Potato Res. 45, 163-176 (2002) .
    • . . . KTIs are frequently induced after pest and pathogen attack and act primarily as inhibitors of exogenous proteinases16; therefore the expansion of the KTI family may provide resistance to biotic stress for the newly evolved vulnerable underground organ. . . .
  17. Shannon, J. C.; Pien, F. M.; Liu, K. C. Nucleotides and nucleotide sugars in developing maize endosperms: synthesis of ADP-glucose in brittle-1 Plant Physiol. 110, 835-843 (1996) .
    • . . . This contrasts with the cereal endosperm where carbon is transported into the amyloplast in the form of ADP-glucose via a specific transporter (brittle 1 protein17) . . .
    • . . . In total, we predicted 246 RH specific genes, 34 of which are supported by Gene Ontology annotation17. . . .
  18. Tauberger, E. Antisense inhibition of plastidial phosphoglucomutase provides compelling evidence that potato tuber amyloplasts import carbon from the cytosol in the form of glucose-6-phosphate Plant J. 23, 43-53 (2000) .
    • . . . Carbon transport into the amyloplasts of potato tubers is primarily in the form of glucose-6-phosphate18, although recent evidence indicates that glucose-1-phosphate is quantitatively important under certain conditions19 . . .
  19. Fettke, J. Glucose 1-phosphate is efficiently taken up by potato (Solanum tuberosum) tuber parenchyma cells and converted to reserve starch granules New Phytol. 185, 663-675 (2010) .
    • . . . Carbon transport into the amyloplasts of potato tubers is primarily in the form of glucose-6-phosphate18, although recent evidence indicates that glucose-1-phosphate is quantitatively important under certain conditions19 . . .
  20. Sonnewald, U. Control of potato tuber sprouting Trends Plant Sci. 6, 333-335 (2001) .
    • . . . SP/FT is a multi-gene family (Supplementary Text and Supplementary Fig. 7) and expression of a second FT homologue, SP5G, in mature tubers suggests a possible function in the control of tuber sprouting, a photoperiod-dependent phenomenon20 . . .
  21. Yoo, S. K. CONSTANS activates SUPPRESSOR OF OVEREXPRESSION OF CONSTANS 1 through FLOWERING LOCUS T to promote flowering in Arabidopsis Plant Physiol. 139, 770-778 (2005) .
    • . . . Likewise, expression of a homologue of the A. thaliana flowering time MADS box gene SOC1, acting downstream of FT21, is restricted to tuber sprouts (Supplementary Fig. 8) . . .
  22. Kohler, A. Genome-wide identification of NBS resistance genes in Populus trichocarpa Plant Mol. Biol. 66, 619-636 (2008) .
    • . . . The DM assembly contains 408 NBS-LRR-encoding genes, 57 Toll/interleukin-1 receptor/plant R gene homology (TIR) domains and 351 non-TIR types (Supplementary Table 20), similar to the 402 resistance (R) gene candidates in Populus22 . . .
  23. Ballvora, A. Comparative sequence analysis of Solanum and Arabidopsis in a hot spot for pathogen resistance on potato chromosome V reveals a patchwork of conserved and rapidly evolving genome segments BMC Genomics 8, 112 (2007) .
    • . . . In RH, the chromosome 5 R1 cluster contains two distinct haplotypes; one is collinear with the R1 region in DM (Supplementary Fig. 10), yet neither the DM nor the RH R1 regions are collinear with other potato R1 regions23, 24 . . .
  24. Kuang, H. The R1 resistance gene cluster contains three groups of independently evolving, type I R1 homologues and shows substantial structural variation among haplotypes of Solanum demissum Plant J. 44, 37-51 (2005) .
    • . . . In RH, the chromosome 5 R1 cluster contains two distinct haplotypes; one is collinear with the R1 region in DM (Supplementary Fig. 10), yet neither the DM nor the RH R1 regions are collinear with other potato R1 regions23, 24 . . .
  25. Kuang, H.; Woo, S. S.; Meyers, B. C.; Nevo, E.; Michelmore, R. W. Multiple genetic processes result in heterogeneous rates of evolution within the major cluster disease resistance genes in lettuce Plant Cell 16, 2870-2894 (2004) .
    • . . . Comparison of the DM potato R gene sequences with well-established gene models (functional R genes) indicates that many NBS-LRR genes (39.4%) are pseudogenes owing to indels, frameshift mutations, or premature stop codons including the R1, R3a and Rpi-vnt1.1 clusters that contain extensive chimaeras and exhibit evolutionary patterns of type I R genes25 . . .
  26. Haas, B. J. Genome sequence and analysis of the Irish potato famine pathogen Phytophthora infestans Nature 461, 393-398 (2009) .
    • . . . This high rate of pseudogenization parallels the rapid evolution of effector genes observed in the potato late blight pathogen, Phytophthora infestans26 . . .
  27. Haynes, F. L.; French, E. R. Prospects for the Potato in the Developing World: an International Symposium on Key Problems and Potentials for Greater Use of the Potato in the Developing World, 100-110 (1972) .
  28. van Os, H. Construction of a 10,000-marker ultradense genetic recombination map of potato: providing a framework for accelerated gene isolation and a genomewide physical map Genetics 173, 1075-1087 (2006) .
    • . . . RH is the male parent of the mapping population of the ultra-high-density (UHD) linkage map28 used for construction and genetic anchoring of the physical map using the RHPOTKEY BAC library39 . . .
    • . . . The indirect mapping approach exploited in silico anchoring using the RH genetic and physical map28, 40, as well as tomato genetic map data from SGN (http://solgenomics.net/) . . .
  29. Chen, N. Using RepeatMasker to identify repetitive elements in genomic sequences Curr. Protoc. Bioinformatics 25, 4.10.1-4.10.14 (2004) .
    • . . . Transposable elements (TEs) in the potato genome assembly were identified at the DNA and protein level. RepeatMasker29 was applied using Repbase43 for TE identification at the DNA level . . .
    • . . . At the protein level, RepeatProteinMask29, 44 was used in a WuBlastX36 search against the TE protein database to further identify TEs . . .
  30. Elsik, C. G. Creating a honey bee consensus gene set Genome Biol. 8, R13 (2007) .
    • . . . To predict genes, we performed ab initio predictions on the repeat-masked genome and then integrated the results with spliced alignments of proteins and transcripts to genome sequences using GLEAN30 . . .
  31. Trapnell, C.; Pachter, L.; Salzberg, S. L. TopHat: discovering splice junctions with RNA-Seq Bioinformatics 25, 1105-1111 (2009) .
    • . . . To finalize the gene set, we aligned the RNA-Seq from 32 libraries, of which eight were sequenced with both single- and paired-end reads, to the genome using Tophat31 and the alignments were then used as input for Cufflinks32 using the default parameters . . .
    • . . . The aligned read data were generated by Tophat31 and the selected transcripts used as input into Cufflinks32, a short-read transcript assembler that calculates the fragments per kb per million mapped reads (FPKM) as expression values for each transcript . . .
  32. Trapnell, C. Transcript assembly and quantification by RNA-Seq reveals unannotated transcripts and isoform switching during cell differentiation Nature Biotechnol. 28, 511-515 (2010) .
    • . . . To finalize the gene set, we aligned the RNA-Seq from 32 libraries, of which eight were sequenced with both single- and paired-end reads, to the genome using Tophat31 and the alignments were then used as input for Cufflinks32 using the default parameters . . .
    • . . . The aligned read data were generated by Tophat31 and the selected transcripts used as input into Cufflinks32, a short-read transcript assembler that calculates the fragments per kb per million mapped reads (FPKM) as expression values for each transcript . . .
  33. Li, L.; Stoeckert, C. J., Jr; Roos, D. S. OrthoMCL: identification of ortholog groups for eukaryotic genomes Genome Res. 13, 2178-2189 (2003) .
    • . . . a, Clusters of orthologous and paralogous gene families in 12 plant species as identified by OrthoMCL33 . . .
  34. Li, R. SOAP2: an improved ultrafast tool for short read alignment Bioinformatics 25, 1966-1967 (2009) .
    • . . . For the DM v3.0 assembly, 95.45% of 880 million usable reads could be mapped back to the assembled genome by SOAP 2.20 (ref. 34) using optimal parameters . . .
    • . . . The resulting non-redundant contigs were scaffolded by mapping the RH whole-genome Illumina and 454 mated sequences against these contigs using SOAPalign 2.20 (ref. 34) and subsequently processing these mapping results with a custom Python script . . .
    • . . . RH reads generated by the Illumina GA2 were mapped onto the DM genome assembly using SOAP2.20 (ref. 34) allowing at most four mismatches and SNPs were called using SOAPsnp . . .
  35. Chaisson, M.; Pevzner, P.; Tang, H. Fragment assembly with short reads Bioinformatics 20, 2067-2074 (2004) .
    • . . . Consensus base calling errors in the BAC sequences were corrected using custom Python and C scripts using a similar approach to that described previously35 (Supplementary Text) . . .
  36. Altschul, S. F. Gapped BLAST and PSI-BLAST: a new generation of protein database search programs Nucleic Acids Res. 25, 3389-3402 (1997) .
    • . . . Sequence overlaps between BACs within the same physical tiling path were identified using megablast from BLAST 2.2.21 (ref. 36) and merged with megamerger from the EMBOSS 6.1.0 package37 . . .
    • . . . Amplified fragment length polymorphism markers from the RH genetic map were linked to DM sequence scaffolds via BLAST alignment36 of whole-genome-profiling sequence tags41 obtained from anchored seed BACs in the RH physical map, or by direct alignment of fully sequenced RH seed BACs to the DM sequence . . .
    • . . . At the protein level, RepeatProteinMask29, 44 was used in a WuBlastX36 search against the TE protein database to further identify TEs . . .
  37. Rice, P.; Longden, I.; Bleasby, A. EMBOSS: the European Molecular Biology Open Software Suite Trends Genet. 16, 276-277 (2000) .
    • . . . Sequence overlaps between BACs within the same physical tiling path were identified using megablast from BLAST 2.2.21 (ref. 36) and merged with megamerger from the EMBOSS 6.1.0 package37 . . .
  38. Van Ooijen, J. W.; Kyazma, B. V. JoinMap 4, Software for the Calculation of Genetic Linkage Maps in Experimental Populations (2006) .
    • . . . The data from 2,603 polymorphic STS markers comprising 1,881 DArTs, 393 SNPs and 329 SSR alleles were analysed using JoinMap 4 (ref. 38) and yielded the expected 12 potato linkage groups . . .
  39. Borm, T. J. Construction and Use of a Physical Map of Potato (2008) .
    • . . . RH is the male parent of the mapping population of the ultra-high-density (UHD) linkage map28 used for construction and genetic anchoring of the physical map using the RHPOTKEY BAC library39 . . .
  40. Visser, R. G. F. Sequencing the potato genome: outline and first results to come from the elucidation of the sequence of the world's third most important crop Am. J. Potato Res. 86, 417-429 (2009) .
    • . . . The indirect mapping approach exploited in silico anchoring using the RH genetic and physical map28, 40, as well as tomato genetic map data from SGN (http://solgenomics.net/) . . .
  41. Van der Vossen, E. Whole Genome Profiling of the Diploid Potato Clone RH89-039-16 (2010) .
    • . . . Amplified fragment length polymorphism markers from the RH genetic map were linked to DM sequence scaffolds via BLAST alignment36 of whole-genome-profiling sequence tags41 obtained from anchored seed BACs in the RH physical map, or by direct alignment of fully sequenced RH seed BACs to the DM sequence . . .
  42. Ning, Z.; Cox, A. J.; Mullikin, J. C. SSAHA: a fast search method for large DNA databases Genome Res. 11, 1725-1729 (2001) .
    • . . . The tomato sequence markers from the genetic maps were aligned to the DM assembly using SSAHA2 (ref. 42) . . .
  43. Jurka, J. Repbase Update, a database of eukaryotic repetitive elements Cytogenet. Genome Res. 110, 462-467 (2005) .
    • . . . RepeatMasker29 was applied using Repbase43 for TE identification at the DNA level . . .
  44. Jiang, Z.; Hubley, R.; Smit, A.; Eichler, E. E. DupMasker: a tool for annotating primate segmental duplications Genome Res. 18, 1362-1368 (2008) .
    • . . . At the protein level, RepeatProteinMask29, 44 was used in a WuBlastX36 search against the TE protein database to further identify TEs . . .
  45. Kuang, H. Identification of miniature inverted-repeat transposable elements (MITEs) and biogenesis of their siRNAs in the Solanaceae: new functional implications for MITEs Genome Res. 19, 42-56 (2009) .
    • . . . The potato genome was masked by identified repeat sequences longer than 500 bp, except for miniature inverted repeat transposable elements which are usually found near genes or inside introns45 . . .
  46. Stanke, M.; Steinkamp, R.; Waack, S.; Morgenstern, B. AUGUSTUS: a web server for gene finding in eukaryotes Nucleic Acids Res. 32, W309-W312 (2004) .
    • . . . The software Augustus46 and Genscan47 was used for ab initio predictions with parameters trained for A. thaliana . . .
  47. Burge, C.; Karlin, S. Prediction of complete gene structures in human genomic DNA J. Mol. Biol. 268, 78-94 (1997) .
    • . . . The software Augustus46 and Genscan47 was used for ab initio predictions with parameters trained for A. thaliana . . .
  48. Birney, E.; Clamp, M.; Durbin, R. GeneWise and Genomewise Genome Res. 14, 988-995 (2004) .
    • . . . For similarity-based gene prediction, we aligned the protein sequences of four sequenced plants (A. thaliana, Carica papaya, V. vinifera and Oryza sativa) onto the potato genome using TBLASTN with an E-value cut-off of 1 × 10−5, and then similar genome sequences were aligned against the matching proteins using Genewise48 for accurately spliced alignments . . .
  49. Chen, F.; Mackey, A. J.; Vermunt, J. K.; Roos, D. S. Assessing performance of orthology detection strategies applied to eukaryotic genomes PLoS ONE 2, e383 (2007) .
    • . . . Paralogous and orthologous clusters were identified using OrthoMCL49 using the predicted proteomes of 11 plant species (Supplementary Table 28) . . .
  50. Simillion, C.; Janssens, K.; Sterck, L.; Van de Peer, Y. i-ADHoRe 2.0: an improved tool to detect degenerated genomic homology using genomic profiles Bioinformatics 24, 127-128 (2008) .
    • . . . After removing the self and multiple matches, the syntenic blocks (≥5 genes per block) were identified using MCscan9 and i-adhore 3.0 (ref. 50) based on the aligned protein gene pairs (Supplementary Table 8) . . .
  51. Bailey, T. L.; Elkan, C. The value of prior knowledge in discovering motifs with MEME Proc. Int. Conf. Intell. Syst. Mol. Biol. 3, 21-29 (1995) .
    • . . . Both TIR and LRR domains were validated using NCBI conserved domains and multiple expectation maximization for motif elicitation (MEME)51 . . .
  52. Mun, J. H.; Yu, H. J.; Park, S.; Park, B. S. Genome-wide identification of NBS-encoding resistance genes in Brassica rapa Mol. Genet. Genomics 282, 617-631 (2009) .
    • . . . As previously reported52, Pfam analysis could not identify the CC motif in the N-terminal region . . .
    • . . . CC domains were thus analysed using the MARCOIL53 program with a threshold probability of 90 (ref. 52) and double-checked using paircoil2 (ref. 54) with a P-score cut-off of 0.025 (ref. 55) . . .
  53. Delorenzi, M.; Speed, T. An HMM model for coiled-coil domains and a comparison with PSSM-based predictions Bioinformatics 18, 617-625 (2002) .
    • . . . CC domains were thus analysed using the MARCOIL53 program with a threshold probability of 90 (ref. 52) and double-checked using paircoil2 (ref. 54) with a P-score cut-off of 0.025 (ref. 55) . . .
  54. McDonnell, A. V.; Jiang, T.; Keating, A. E.; Berger, B. Paircoil2: improved predictions of coiled coils from sequence Bioinformatics 22, 356-358 (2006) .
    • . . . CC domains were thus analysed using the MARCOIL53 program with a threshold probability of 90 (ref. 52) and double-checked using paircoil2 (ref. 54) with a P-score cut-off of 0.025 (ref. 55) . . .
  55. Porter, B. W. Genome-wide analysis of Carica papaya reveals a small NBS resistance gene family Mol. Genet. Genomics 281, 609-626 (2009) .
    • . . . CC domains were thus analysed using the MARCOIL53 program with a threshold probability of 90 (ref. 52) and double-checked using paircoil2 (ref. 54) with a P-score cut-off of 0.025 (ref. 55) . . .
  56. Sanseverino, W. PRGdb: a bioinformatics platform for plant resistance gene analysis Nucleic Acids Res. 38, D814-D821 (2010) .
    • . . . Selected genes (±1.5 kb) were searched using BLASTX against a reference R-gene set56 to find a well-characterized homologue . . .
  57. Kanehisa, M.; Goto, S.; Kawashima, S.; Okuno, Y.; Hattori, M. The KEGG resource for deciphering the genome Nucleic Acids Res. 32, D277-D280 (2004) .
    • . . . We identified 35 DM-specific genes, 11 of which are supported by similarity to entries in the KEGG database57 . . .
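
The excerpt quoted under reference 5 states that the 17-nucleotide depth distribution suggests a genome size of 844 Mb. A common heuristic for this kind of estimate divides the total number of k-mer occurrences by the depth at the main peak of the distribution. The Python sketch below illustrates that heuristic on invented numbers. It is not the consortium's procedure, which is described in their Supplementary material, and the function name, the `min_depth` cut-off and the toy histogram are all hypothetical.

```python
def estimate_genome_size(depth_histogram, min_depth):
    """Estimate genome size as (total k-mer occurrences) / (peak k-mer depth).

    depth_histogram maps a k-mer depth to the number of distinct k-mers
    observed at that depth. Depths below min_depth are treated as sequencing
    errors and ignored (an assumed, user-chosen cut-off).
    """
    kept = {d: n for d, n in depth_histogram.items() if d >= min_depth}
    if not kept:
        raise ValueError("no k-mers at or above min_depth")
    # Depth at which the largest number of distinct k-mers occurs.
    peak_depth = max(kept, key=kept.get)
    # Total k-mer occurrences contributed by the retained depths.
    total_kmers = sum(depth * count for depth, count in kept.items())
    return total_kmers / peak_depth


# Toy histogram: an error peak at low depth and a main peak at depth 10.
toy_histogram = {1: 1000, 2: 100, 9: 20, 10: 50, 11: 20}
print(estimate_genome_size(toy_histogram, min_depth=5))
```

The explicit cut-off keeps the sketch short; a real analysis would more likely locate the trough between the error peak and the main peak from the data.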