Overview
Early Online Access to the Assembled Peach Genome for
Browsing and BLASTing
The
International Peach Genome Initiative (IPGI)
would like to welcome you to the early online access to the draft
assembled and annotated peach genome (peach v1.0). Rest assured,
despite the fact that the genome is being made available on April 1, 2010,
this is no joke! Before we talk more about the genome itself, let's do
some housekeeping regarding data access.
As a public service, and in accordance with the
Fort Lauderdale agreement,
the peach genome is being made available by IPGI prior to peer-reviewed
publication of the data. IPGI and its partners are making these data
available with the expectation that they will be able to publish them
within a reasonable time without preemption by other groups.
By accessing these data, you agree not to publish any articles containing
analyses of genes or genomic data on a whole genome or chromosome scale
prior to publication by IPGI and/or its collaborators of a comprehensive
genome analysis ("Reserved Analyses").
"Reserved analyses" include the identification of complete (whole genome)
sets of genomic features such as genes, gene families, regulatory elements,
repeat structures, GC content, or any other genome feature, and whole-genome
or chromosome-scale comparisons with other species.
If you are interested in collaboration on one of these topics involving
the peach genome, please contact one of the project coordinators.
Work towards the publication of the peach genome is underway, and we plan
to submit a manuscript in the coming months.
If you will be using the data for non-reserved analyses, such as cloning
a gene of interest or analyzing a gene family, please feel free to
do so. We only ask that you cite the International Peach Genome
Initiative.
One more disclaimer - peach v1.0 represents an initial draft of the
assembled genome. While we believe peach v1.0 is a very high quality
plant genome, we are aware that it contains both known and unknown errors
and discrepancies that will be addressed in upcoming releases of the genome.
For instance, we are aware of a few cases where
sequences have been correctly assigned to a location, but the orientation
is in question. We hope and believe that any problems arising from these
discrepancies are outweighed by the benefit of the rapid release of the data.
If you believe that you have identified a discrepancy in the data,
please contact IPGI at
peachgenome@bioinfo.wsu.edu
and we will be sure to address your concerns in an upcoming release.
History
Now, back to the interesting part - questions and answers regarding peach
v1.0 and how we got to this point. At the Plant and Animal Genome XV
Meeting on January 16, 2007, Jerry Tuskan from the Joint Genome Institute (JGI)
announced plans to sequence the peach genome.
Since then, an international consortium (IPGI) has formed to do the work
cooperatively. This consortium, under the direction of Drs Bryon Sosinski,
Ignazio Verde and Daniel Rokhsar, includes numerous researchers from
countries around the globe including the US, Italy (Drupomics),
Spain and Chile. The specific roles of the participants will be outlined
in the publication of the peach genome.
Background
Peach (Prunus persica) is considered one of the best genetically
characterized species in the Rosaceae, and it has distinct advantages
that make it suitable as a model genome species for Prunus as well as for
other species in the Rosaceae. While some Prunus species, such as
cultivated plums and sour cherries, are polyploid, peach is a diploid with
n = 8 and has a comparatively small genome currently estimated to be
~220-230 Mbp based upon the peach v1.0
assembly. Peach has a relatively short juvenility period of 2-3 years,
compared with the 6-10 years required by most other fruit tree species.
In addition, a number of genes for fundamentally important traits have
been genetically described in peach, including genes controlling flower
and fruit development, tree growth habit, dormancy, cold hardiness,
and disease and pest resistance.
Annotation and repeats
Transcript assemblies were constructed using PASA from Prunus persica
ESTs (~88K) and ESTs of related species (~424K). Loci were
determined by BLAT alignments of the above transcript assemblies and/or
BLASTX alignments of Arabidopsis thaliana, rice, soybean, grape and poplar
peptides to the repeat-soft-masked P. persica genome.
Gene models were predicted by homology-based predictors, mainly
FGENESH+, with GenomeScan used where FGENESH+ produced no model at
the locus. Predicted genes were UTR-extended and/or improved by PASA.
The final gene set was selected on the basis of EST support or
peptide homology support, after filtering out repeats/transposable
elements.
The relatively small genome size is probably due to the fact that only
62.3 Mb (27.4%) of the genome consists of repeats. Comparison of the
genome with the plant section of RepBase showed that only 19.7 Mb (8.7%)
is similar to other plant repeats, while de novo identification
of repeats within peach showed that a good portion of the repeats is present
in high copy number: 40.3 Mb (17.7%) was found by repeat detection with
ReAS, subsequently projected onto the peach genome, and 27.4 Mb (12.1%),
detected with LTR_finder, consists exclusively of LTR transposable
elements.
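The percentages quoted above can be reproduced with simple arithmetic. The sketch below assumes that the denominator is the total scaffold length of 227.3 Mb given in the Statistics section. That denominator is an assumption, not something the text states, although it does reproduce the quoted figures. The function name is purely illustrative.

```python
# Repeat totals (in Mb) as quoted in the text above.
REPEAT_MB = {
    "all repeats": 62.3,
    "similar to RepBase plant repeats": 19.7,
    "ReAS de novo repeats": 40.3,
    "LTR_finder LTR elements": 27.4,
}

# ASSUMPTION: percentages are taken relative to the total scaffold
# length reported in the Statistics section (227.3 Mb).
ASSEMBLY_MB = 227.3


def percent_of_assembly(mb, total_mb=ASSEMBLY_MB):
    """Return `mb` as a percentage of `total_mb`, rounded to one decimal."""
    return round(100.0 * mb / total_mb, 1)


if __name__ == "__main__":
    for label, mb in REPEAT_MB.items():
        print(f"{label}: {mb} Mb = {percent_of_assembly(mb)}% of the assembly")
```

Note that the ReAS and LTR_finder categories need not be disjoint, so the individual figures are not expected to add up to the 62.3 Mb total.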
Statistics
Genome Size
Approximately 227.3 Mb arranged in 202 scaffolds
Approximately 224.6 Mb arranged in 2730 contigs (~ 1.2% gap)
Scaffold N50 (L50) = 4 (26.8 Mbp)
Contig N50 (L50) = 294 (214.2 Kbp)
21 scaffolds larger than 50 Kbp, with 99.4% of the genome in scaffolds
larger than 50 Kbp
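For readers unfamiliar with these statistics, here is a minimal sketch of how N50/L50 values and the gap percentage are commonly computed. It follows the convention the figures above appear to use, where N50 is the number of sequences and L50 is the length at the 50% point (some tools swap the two names). The function names and the toy lengths are illustrative only and are not taken from the peach assembly.

```python
def n50_l50(lengths):
    """Return (count, length) at the 50% point of the total length.

    Sequences are sorted longest first. `count` is the smallest number of
    sequences whose summed length reaches half of the total, and `length`
    is the length of the last sequence added to reach that point.
    """
    ordered = sorted(lengths, reverse=True)
    half_total = sum(ordered) / 2.0
    running = 0
    for count, length in enumerate(ordered, start=1):
        running += length
        if running >= half_total:
            return count, length
    raise ValueError("lengths must not be empty")


def gap_percent(scaffold_total, contig_total):
    """Percentage of the scaffold length not covered by contigs."""
    return round(100.0 * (scaffold_total - contig_total) / scaffold_total, 1)


if __name__ == "__main__":
    toy_lengths = [50, 40, 30, 20, 10]  # made-up lengths, not peach data
    print(n50_l50(toy_lengths))
    # Totals in Mb as reported above: 227.3 (scaffolds), 224.6 (contigs).
    print(gap_percent(227.3, 224.6))
```

With the totals reported above, the gap calculation gives roughly 1.2%, matching the quoted value.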
Loci
27852 loci containing protein-coding genes
Transcripts
28689 protein-coding transcripts
Genome Facts
Peach v1.0 was generated from DNA from the doubled haploid cultivar 'Lovell',
which means that the genes and intervening DNA are "fixed", or identical, for
all alleles and both chromosomal copies of the genome. This doubled haploid
nature was confirmed by the evaluation of >200 SSRs, and has facilitated a
highly accurate and consistent assembly of the peach genome.
Peach v1.0 currently consists of 8 pseudomolecules (scaffolds) representing
the 8 chromosomes of peach, which are numbered according to their
corresponding linkage groups. The genome was sequenced to
approximately 7.7-fold coverage by whole-genome shotgun sequencing using the
accurate Sanger methodology, and was assembled using Arachne.
The assembled peach scaffolds cover nearly 99% of the peach genome,
with over 92% having confirmed orientation. To further validate the quality
of the assembly, 74,757 Prunus ESTs were queried against the genome at 90%
identity and 85% coverage, and we found that only ~2% were missing.
This is truly a high-quality genome! Gene prediction and annotation
is an ongoing process that may take years to complete, but current
estimates indicate that peach has a typical plant gene repertoire of
approximately 35,000 protein coding sequences.
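The EST-based completeness check described above (90% identity, 85% coverage) can be sketched as a simple filter over alignment records. The record layout, the function names, and the exact definitions of identity and coverage below are assumptions made for illustration. The text does not say which aligner output or formulas were actually used.

```python
from typing import Iterable, NamedTuple


class Alignment(NamedTuple):
    """Hypothetical best-hit record for one EST (not a real tool's format)."""
    est_id: str
    est_length: int      # full length of the EST, in bases
    aligned_bases: int   # EST bases included in the alignment
    matching_bases: int  # aligned bases identical to the genome


def is_present(aln: Alignment, min_identity: float = 0.90,
               min_coverage: float = 0.85) -> bool:
    """True if the alignment meets both thresholds.

    ASSUMPTION: identity = matching / aligned, coverage = aligned / EST length.
    """
    if aln.est_length == 0 or aln.aligned_bases == 0:
        return False
    identity = aln.matching_bases / aln.aligned_bases
    coverage = aln.aligned_bases / aln.est_length
    return identity >= min_identity and coverage >= min_coverage


def missing_fraction(est_ids: Iterable[str],
                     alignments: Iterable[Alignment]) -> float:
    """Fraction of ESTs that have no alignment passing the thresholds."""
    all_ids = set(est_ids)
    found = {a.est_id for a in alignments if is_present(a)}
    return len(all_ids - found) / len(all_ids)


if __name__ == "__main__":
    # Made-up toy records, not real peach data.
    ids = ["e1", "e2", "e3", "e4"]
    hits = [
        Alignment("e1", 1000, 900, 880),  # passes both thresholds
        Alignment("e2", 1000, 800, 790),  # coverage 0.80, too low
        Alignment("e3", 500, 480, 400),   # identity about 0.83, too low
    ]
    print(missing_fraction(ids, hits))
```

Under these definitions, an EST counts as missing either when it has no alignment at all or when its best alignment falls below one of the two thresholds.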
Peach genome browsers are available at
JGI
and the
Genome Database for Rosaceae,
while the Italian version is hosted at the
Istituto di Genomica Applicata (IGA).
Access to the raw sequence data is provided via the GBrowse link at the
top of this page.
Once again, welcome to peach v1.0!
On behalf of IPGI and its collaborators,
Bryon Sosinski, NC State University (sosinski AT ncsu.edu)
Ignazio Verde, Consiglio per la Ricerca e la Sperimentazione in Agricoltura
(ignazio.verde AT entecra.it)
Daniel Rokhsar, DOE Joint Genome Institute (dsrokhsar AT gmail.com)