Genome Sequence for the Apicomplexan Sarcocystis neurona
D. Howe, C. L. Schardl, J.C. Kissinger
Department of Veterinary Sciences
Sarcocystis neurona is a protozoan parasite that is the leading cause of neurologic disease in horses and an emerging pathogen of marine mammals. In addition, S. neurona is closely related to several important human parasites (e.g., Toxoplasma, Plasmodium). The primary goals of this project are to sequence the genome of Sarcocystis neurona, to characterize the genome sequence by comparing it to sequences from other organisms, and to make the information available to the research community. The S. neurona genome sequence will serve as a valuable resource for identifying important parasite genes, and it will allow researchers to better utilize state-of-the-art technologies and experimental approaches to investigate this pathogen. The S. neurona genome sequence will also be compared to the genomes of related human parasites, which may reveal valuable information about this important group of pathogens.
2011 Project Description
The basic sequencing strategy combined shotgun and paired-end (3 kb and 8 kb span) 454 pyrosequencing of DNA fragments with paired-end sequencing of clones from a large-insert (fosmid) S. neurona genomic DNA library by standard Sanger dideoxy-termination chemistry. The bulk of the sequence (~29X coverage) was generated by shotgun and paired-end 454 Titanium pyrosequencing. An additional 0.2X sequence coverage was generated by paired-end Sanger sequencing of 13,724 fosmid clones. Sequences were assembled into contigs and supercontigs (scaffolds). Transcriptome data have been generated using the 454/Roche pyrosequencing platform (673,331 reads from merozoite and schizont stages) and the Illumina platform (paired-end, 480 million reads from merozoite-stage parasites). All available Sanger ESTs and transcripts assembled from the next-generation sequencing data have been mapped to the genome to aid annotation.
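The coverage figures above can be put in concrete terms with a short calculation. The following is a rough sketch only: it takes the ~124 Mb genome size reported for the assembly as the denominator (the report does not say which size estimate the coverage figures were computed against) and assumes that every fosmid clone yielded two usable end reads.

```python
# Rough coverage arithmetic from the figures quoted in the report.
GENOME_SIZE_BP = 124_000_000  # approximate genome size suggested by the assembly
COVERAGE_454 = 29.0           # ~29X from 454 Titanium shotgun and paired-end reads
COVERAGE_SANGER = 0.2         # additional 0.2X from fosmid end sequencing
FOSMID_CLONES = 13_724        # fosmid clones sequenced from both ends


def bases_for_coverage(coverage: float, genome_size_bp: int) -> float:
    """Total sequenced bases implied by a depth (coverage = bases / genome size)."""
    return coverage * genome_size_bp


bases_454 = bases_for_coverage(COVERAGE_454, GENOME_SIZE_BP)
bases_sanger = bases_for_coverage(COVERAGE_SANGER, GENOME_SIZE_BP)

# Assumption (not stated in the report): both end reads succeeded for every clone.
sanger_reads = 2 * FOSMID_CLONES
mean_sanger_read_bp = bases_sanger / sanger_reads

print(f"454 bases:    ~{bases_454 / 1e9:.1f} Gb")
print(f"Sanger bases: ~{bases_sanger / 1e6:.1f} Mb")
print(f"Implied mean Sanger read length: ~{mean_sanger_read_bp:.0f} bp")
```

Under these assumptions the implied mean Sanger read length comes out at roughly 900 bp, which is in the usual range for dideoxy reads and suggests the quoted figures are mutually consistent.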
Mapped ESTs have been used to generate the required training data sets for the Augustus, Twinscan, GlimmerHMM, and SNAP gene finders. A preliminary BLAST-searchable database and a sequence viewer database have been established, and an Apollo instance has been created to display the data needed for annotation. A first-pass annotation of the genome sequence is being conducted, and phylogenomic analyses will be performed with sequences from other members of the Apicomplexa. Access to the preliminary S. neurona genome sequence database (SarcoDB) is available upon request to other investigators interested in the Apicomplexa.
The S. neurona genome assembled into 3,193 contigs that join into 172 scaffolds, suggesting an approximate genome size of 124 Mb. Preliminary analyses based on mapping of transcriptome data to the genome suggest ~8,400 transcripts plus ~2,000 alternatively spliced transcripts, with an average length of ~7,500 bp.
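To give a sense of assembly contiguity, the counts above can be turned into simple averages. This is a minimal sketch, assuming the ~124 Mb estimate approximates the total assembled length; the mean lengths are therefore upper-bound approximations that ignore gaps within scaffolds, and they say nothing about the actual length distribution.

```python
# Simple averages derived from the assembly counts quoted in the report.
N_CONTIGS = 3193
N_SCAFFOLDS = 172
GENOME_SIZE_BP = 124_000_000  # approximate; treated here as total assembled length

contigs_per_scaffold = N_CONTIGS / N_SCAFFOLDS
mean_scaffold_bp = GENOME_SIZE_BP / N_SCAFFOLDS
mean_contig_bp = GENOME_SIZE_BP / N_CONTIGS  # ignores gap sequence in scaffolds

print(f"Contigs per scaffold: ~{contigs_per_scaffold:.1f}")
print(f"Mean scaffold length: ~{mean_scaffold_bp / 1e3:.0f} kb")
print(f"Mean contig length:   ~{mean_contig_bp / 1e3:.1f} kb")
```

Means are sensitive to a few very large or very small sequences, so a length-weighted statistic computed from the real scaffold lengths would be a better contiguity measure where those lengths are available.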
Compared to the closest species with a genome sequence (T. gondii), S. neurona genes contain a similar number of introns (average 4/gene) but the average intron size is nearly twice as large at ~1400 bp. We are also observing "overlapping transcripts", a phenomenon that has been seen in the much smaller and gene-dense apicomplexan genomes of Cryptosporidium and Theileria.
This finding is surprising given the much larger size of the S. neurona genome. A cursory search of the S. neurona genome with orthologs retained in all other sequenced apicomplexan genomes and two ciliate genomes (1,088 genes) revealed that 95% were detectable in S. neurona. Inspection of introns and the culled repeat sequences has not yet provided any insight into the larger genome size of S. neurona.
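The intron and ortholog figures quoted in this description can be combined into a back-of-the-envelope estimate. The sketch below is illustrative only: it treats the ~8400 primary transcripts as a proxy for gene count, ignores the alternatively spliced transcripts, and uses the reported averages as if they applied uniformly, none of which is established by the preliminary annotation.

```python
# Back-of-the-envelope estimates from averages quoted in the report.
INTRONS_PER_GENE = 4          # average introns per gene
MEAN_INTRON_BP = 1400         # approximate average intron size
N_GENES = 8400                # assumption: primary transcripts used as a gene-count proxy
GENOME_SIZE_BP = 124_000_000  # approximate genome size
CONSERVED_ORTHOLOGS = 1088    # orthologs retained in the other genomes searched
DETECTED_FRACTION = 0.95      # fraction detectable in S. neurona

intron_bp_per_gene = INTRONS_PER_GENE * MEAN_INTRON_BP
total_intron_bp = intron_bp_per_gene * N_GENES
intron_fraction = total_intron_bp / GENOME_SIZE_BP
detected_orthologs = round(DETECTED_FRACTION * CONSERVED_ORTHOLOGS)

print(f"Intronic sequence per gene: ~{intron_bp_per_gene} bp")
print(f"Total intronic sequence:    ~{total_intron_bp / 1e6:.0f} Mb")
print(f"Share of genome:            ~{intron_fraction:.0%}")
print(f"Orthologs detected:         ~{detected_orthologs} of {CONSERVED_ORTHOLOGS}")
```

These numbers depend entirely on the stated assumptions and should be revisited once the first-pass annotation provides actual gene models.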