For this tutorial, we will later import a set of genes, and their associated mRNA, CDS, UTRs, etc. Tripal requires that an analysis be associated with all imported features. This has several advantages, including:
- The source of features (sequences) can be traced. Even for features simply downloaded from a database, someone else can see where the features came from.
- Provides a mechanism for describing how the features were created (e.g. whole genome structural and functional annotation description)
- The analysis links all of the features together which can be useful for querying for specific features from an analysis.
To create an analysis for loading our genomic data, navigate to the Add content and click on the link: Analysis
The analysis creation page will appear:
Here you can provide the necessary details to help others understand the source of your data. For this tutorial, enter the following:
- Analysis Name: Whole Genome Assembly and Annotation of Citrus Sinensis (JGI)
- Program, Pipeline Name or Method Name: Performed by JGI
- Program, Pipeline Name or Method Name: v1.0
- Source Name: JGI Citrus sinensis assembly/annotation v1.0 (154)
- Source URI: http://www.phytozome.net/citrus.php
- Time Executed: Feb 1, 2011
- Materials and Methods: (if using CKEditor, click the 'Source' button before pasting)
<p> <strong><em>Note: </em>The following text comes from phytozome.org:</strong></p> <p> <u>Genome Size / Loci</u><br /> This version (v.1) of the assembly is 319 Mb spread over 12,574 scaffolds. Half the genome is accounted for by 236 scaffolds 251 kb or longer. The current gene set (orange1.1) integrates 3.8 million ESTs with homology and ab initio-based gene predictions (see below). 25,376 protein-coding loci have been predicted, each with a primary transcript. An additional 20,771 alternative transcripts have been predicted, generating a total of 46,147 transcripts. 16,318 primary transcripts have EST support over at least 50% of their length. Two-fifths of the primary transcripts (10,813) have EST support over 100% of their length.</p> <p> <u>Sequencing Method</u><br /> Genomic sequence was generated using a whole genome shotgun approach with 2Gb sequence coming from GS FLX Titanium; 2.4 Gb from FLX Standard; 440 Mb from Sanger paired-end libraries; 2.0 Gb from 454 paired-end libraries</p> <p> <u>Assembly Method</u><br /> The 25.5 million 454 reads and 623k Sanger sequence reads were generated by a collaborative effort by 454 Life Sciences, University of Florida and JGI. The assembly was generated by Brian Desany at 454 Life Sciences using the Newbler assembler.</p> <p> <u>Identification of Repeats</u><br /> A de novo repeat library was made by running RepeatModeler (Arian Smit, Robert Hubley) on the genome to produce a library of repeat sequences. Sequences with Pfam domains associated with non-TE functions were removed from the library of repeat sequences and the library was then used to mask 31% of the genome with RepeatMasker.</p> <p> <u>EST Alignments</u><br /> We aligned the sweet orange EST sequences using Brian Haas's PASA pipeline which aligns ESTs to the best place in the genome via gmap, then filters hits to ensure proper splice boundaries.</p>
Note:: Above we entered HTML. This is not the easiest way to enter text, but makes it simple for this tutorial. When the ckeditor module is installed and properly setup the user is provided with editor tools that makes it much easier to add text to any page.
After saving, you should have the following analysis page: