Importing Publications

Tripal provides an interface for automatically and manually adding publications. First we will manually add a new publication. To do this, we must first enable the Tripal Pub module. We have previously used Drush to install modules in this tutorial and the commands to install the Tripal Pub module are similar. The Tripal Contact module is a dependency of the Tripal Pub module, so we must enable both:

cd /var/www/html
drush pm-enable tripal_contact tripal_pub
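
To confirm that both modules were enabled, we can ask Drush for its list of enabled modules. The helper below is only a sketch: `check_modules_enabled` is a hypothetical name, and it assumes a Drush version (8 or earlier) that provides `pm-list` with the `--pipe` option.

```shell
# Report whether each named module appears in Drush's list of enabled modules.
# Assumes Drush 8 or earlier, where `pm-list` and `--pipe` are available.
check_modules_enabled() {
  # --pipe prints machine names only; normalize any whitespace to newlines.
  enabled=$(drush pm-list --type=module --status=enabled --pipe | tr -s '[:space:]' '\n')
  for module in "$@"; do
    if printf '%s\n' "$enabled" | grep -qxF "$module"; then
      echo "$module: enabled"
    else
      echo "$module: NOT enabled"
    fi
  done
}

# Usage (run from the Drupal root):
#   cd /var/www/html
#   check_modules_enabled tripal_contact tripal_pub
```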

You will notice that two jobs were submitted. These jobs will load a contact and publication ontology. The Tripal Contact and Pub ontologies are custom vocabularies used for organizing information about publications and contact information. So, before we can add publications (or contacts) we need to run these jobs:

cd /var/www/html
drush trp-run-jobs --user=administrator
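
Once the jobs finish, one way to check that the vocabularies were loaded is to count their terms in Chado. This is a sketch under assumptions: `count_cv_terms` is a hypothetical helper, Chado is assumed to live in a schema named `chado`, and the vocabulary names `tripal_pub` and `tripal_contact` are assumed, so check the `chado.cv` table on your site if the counts come back as zero.

```shell
# Count the terms loaded into a Chado controlled vocabulary (cv).
# $1: cv name, e.g. tripal_pub (assumed name; verify in chado.cv).
count_cv_terms() {
  drush sql-query "SELECT count(*) FROM chado.cvterm t JOIN chado.cv c ON c.cv_id = t.cv_id WHERE c.name = '$1';"
}

# Usage (run from the Drupal root):
#   cd /var/www/html
#   count_cv_terms tripal_pub
#   count_cv_terms tripal_contact
```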

Note: Always remember to set permissions for any new modules that are installed.

Manually Adding a Publication

Now that the Tripal publication and contact ontologies are loaded we can add publications. First, we will manually add a publication. Click the Add Content link in the administrative menu and then Publication.

Tripal2.0 pub create.png

We will add information about the Tripal publication. Enter the following values:

  • Publication Title: Tripal v1.1: a standards-based toolkit for construction of online genetic and genomic databases.
  • Publication Type: Journal Article
  • Publication Year: 2013
  • Citation: Sanderson LA, Ficklin SP, Cheng CH, Jung S, Feltus FA, Bett KE, Main D. Tripal v1.1: a standards-based toolkit for construction of online genetic and genomic databases. Database, Oct 25 2013. bat075

To further describe the publication we will add all other details as properties. Select the property in the drop-down, add the text and click the add button for each of the following properties:

  • Journal Name: Database
  • Abstract: Tripal is an open-source freely available toolkit for construction of online genomic and genetic databases. It aims to facilitate development of community-driven biological websites by integrating the GMOD Chado database schema with Drupal, a popular website creation and content management software. Tripal provides a suite of tools for interaction with a Chado database and display of content therein. The tools are designed to be generic to support the various ways in which data may be stored in Chado. Previous releases of Tripal have supported organisms, genomic libraries, biological stocks, stock collections and genomic features, their alignments and annotations. Also, Tripal and its extension modules provided loaders for commonly used file formats such as FASTA, GFF, OBO, GAF, BLAST XML, KEGG heir files and InterProScan XML. Default generic templates were provided for common views of biological data, which could be customized using an open Application Programming Interface to change the way data are displayed. Here, we report additional tools and functionality that are part of release v1.1 of Tripal. These include (i) a new bulk loader that allows a site curator to import data stored in a custom tab delimited format; (ii) full support of every Chado table for Drupal Views (a powerful tool allowing site developers to construct novel displays and search pages); (iii) new modules including 'Feature Map', 'Genetic', 'Publication', 'Project', 'Contact' and the 'Natural Diversity' modules. Tutorials, mailing lists, download and set-up instructions, extension modules and other documentation can be found at the Tripal website located at http://tripal.info. DATABASE URL: http://tripal.info/.
  • Publication Date: 2013 Oct 25
  • Authors: Sanderson LA, Ficklin SP, Cheng CH, Jung S, Feltus FA, Bett KE, Main D

Next, to link this publication to its record in PubMed, we need to add an entry in the section titled External References. Add the following:

  • Database: PMID
  • Accession: 24163125
  • Version: leave blank

Click the Add button to add the external reference. To complete the page, click the Save button at the bottom.

Our publication has been added and you should see the following page:

Tripal2.0 pub new.png
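
If you would like to confirm that the publication and its external reference were stored, you can look the record up in Chado by its accession. The sketch below assumes Chado is installed in a schema named `chado`; `find_pub_by_accession` is a hypothetical helper, and the table and column names are those of the standard Chado `pub`, `pub_dbxref`, `dbxref` and `db` tables.

```shell
# Look up publications in Chado by external database name and accession.
# $1: database name (e.g. PMID); $2: accession (e.g. 24163125).
# The values are placed directly into the SQL, so pass trusted input only.
find_pub_by_accession() {
  db_name=$1
  accession=$2
  drush sql-query "SELECT p.pub_id, p.title FROM chado.pub p JOIN chado.pub_dbxref pd ON pd.pub_id = p.pub_id JOIN chado.dbxref x ON x.dbxref_id = pd.dbxref_id JOIN chado.db d ON d.db_id = x.db_id WHERE d.name = '$db_name' AND x.accession = '$accession';"
}

# Usage (run from the Drupal root):
#   cd /var/www/html
#   find_pub_by_accession PMID 24163125
```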

Now we have a publication page, but the title links to the PubMed page for the article. If we want the title to link to the article at the online journal instead, we can edit the publication by clicking the Edit link and adding a new property of type URL with the value:

http://database.oxfordjournals.org/content/2013/bat075.long

After saving the page, the title is now linked to the article on the Journal site rather than the PubMed site. However, the link to PubMed is still found under the Cross References link.

Searching for Publications

By default, Tripal provides simple search tools for many data types (e.g. organisms, analyses, features, etc). These can be found in the menu under Search Data. To search for publications, click the Publications link under Search Data.

On the search form, clicking the search button without providing any criteria will provide a list of all publications. For this tutorial, we only have a single publication:

Tripal2.0 pub search.png

However, you will notice that if you try to select a search criterion, nothing is available. Tripal allows you to set which fields a user can use as criteria. In some cases not all fields will be appropriate given the publications available on the site. All of the properties available when adding a publication can be searched, but some properties, like the URL, may not be necessary for searching. You can specify which fields to use for search criteria by clicking the Publication Module Settings Page link in the administrator information box just above the search form. On the resulting page, scroll until you see the section titled Searching Options:

Tripal2.0 pub search options.png

Here you can select which properties a user can use for searching. For this tutorial, find and check these options:

  • Abstract
  • Authors
  • Journal Name
  • Title

Then click the Save configuration button at the bottom. If we return to the publication search page, we now have criteria for searching.

Import of Publications

Tripal supports importing publications from remote databases such as NCBI PubMed and the USDA National Agricultural Library (AGL). Support for PubMed is built into the Tripal module, but support for AGL requires some additional setup on the server. You can find instructions for preparing the server for AGL on the Tripal > Chado Modules > Publications > Help page. For this tutorial we will create an importer for PubMed.

Creation of an importer is an administrative function. A publication importer is created by the site administrator and consists of a set of search criteria for finding multiple publications at one time. When the importer is run, it will query the remote database, retrieve the publications that match the criteria and add them to the database. Because we loaded genomic data for Citrus sinensis we will create an importer that will find all publications related to this species.

First, navigate to Tripal > Chado Modules > Publications > Publication Importers and click the link New Importer. You will see the following page:

Tripal2.0 pub new importer.png

Enter the following values in the fields:

  • Remote Database: PubMed
  • Loader Name: Pubs for Citrus sinensis
  • Criteria #1:
    • Scope: Abstract/Title
    • Search Terms: Citrus sinensis
    • is Phrase?: checked

Now, click the 'Test Importer' button. This will connect to PubMed and search for all publications that match our provided criteria. On the date this portion of the tutorial was written, 532 publications were found:

Tripal2.0 pub new importer test.png

Now, save this importer. You should see that we have one importer in the list:

Tripal2.0 pub importer list.png

We can use this importer to load all 532 publications related to Citrus sinensis from PubMed into our database (how to load these will be shown later). However, what if new publications are added? We would like an importer to run monthly so that new publications are added automatically as they become available, without reloading these 532 each time. So, we will create a second importer that only finds publications from the last 30 days. To do this, click the New Importer link and enter the following values:

  • Remote Database: PubMed
  • Loader Name: Pubs for Citrus sinensis last 30 days
  • Days since record modified: 30
  • Criteria #1:
    • Scope: Abstract/Title
    • Search Terms: Citrus sinensis
    • is Phrase?: checked

Now, when we test the importer we find only 1 publication that has been added (created) in PubMed in the last 30 days:

Tripal2.0 pub new importer test30.png
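
The counts reported by the two importers can be cross-checked against PubMed directly using NCBI's E-utilities `esearch` service. The following is a minimal sketch, not a description of how Tripal's importer builds its queries: `pubmed_count_url` is a hypothetical helper, the `[tiab]` (Title/Abstract) field tag stands in for the Abstract/Title scope, and `datetype=edat` is an assumed choice of date type that may differ from what Tripal uses.

```shell
# Build an E-utilities esearch URL that counts PubMed records whose title or
# abstract contains an exact phrase.
# $1: phrase to search for
# $2: optional; restrict to records dated within the last N days
pubmed_count_url() {
  phrase=$1
  days=${2:-}
  # Encode spaces as '+' and wrap the phrase in encoded double quotes (%22);
  # %5Btiab%5D is the encoded [tiab] field tag.
  term="%22$(printf '%s' "$phrase" | tr ' ' '+')%22%5Btiab%5D"
  url="https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi?db=pubmed&rettype=count&term=${term}"
  if [ -n "$days" ]; then
    # datetype=edat is an assumption; mdat and pdat are the other allowed values.
    url="${url}&reldate=${days}&datetype=edat"
  fi
  printf '%s\n' "$url"
}

# Usage (requires network access and curl):
#   curl -s "$(pubmed_count_url 'Citrus sinensis')"
#   curl -s "$(pubmed_count_url 'Citrus sinensis' 30)"
```

The XML response contains a Count element. The numbers will have changed since this tutorial was written, and they may not match Tripal's test results exactly if the importer uses a different field tag or date type.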

Save this importer.

Next, there are two ways to import these publications. The first is to import them manually using a Drush command. Return to the terminal and run the following command:

cd /var/www/html
drush trp-import-pubs --username=administrator

You should see output to the terminal that begins like this:

NOTE: Loading of publications is performed using a database transaction. 
If the load fails or is terminated prematurely then the entire set of 
insertions/updates is rolled back and will not be found in the database

Importing: Pubs for Citrus sinensis

The importer will import 100 publications at a time and pause between each set of 100 as it requests more.

Some things to know about the publication importer:

  1. The importer keeps track of publications from the remote database using the publication accession (e.g. PubMed ID).
  2. If a publication with an accession (e.g. PubMed ID) already exists in the local database, the record will be updated.
  3. If a publication in the local database matches by title, journal and year with one that is to be imported, then the record will be updated. You can change the requirement of which fields to match at the Tripal > Chado Modules > Publications > Settings page. On the settings page, look for the Import Settings section.
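
After an import, it can be useful to check whether any records share the same title, journal and year, since those are the fields used for matching by default. The query below is a sketch under assumptions: `list_possible_duplicate_pubs` is a hypothetical helper, Chado is assumed to be in a schema named `chado`, and the journal name is assumed to be stored in the `series_name` column of the Chado `pub` table.

```shell
# List groups of publications that share a title, journal and year.
# Assumes the journal name is stored in chado.pub.series_name.
list_possible_duplicate_pubs() {
  drush sql-query "SELECT title, series_name, pyear, count(*) AS copies FROM chado.pub GROUP BY title, series_name, pyear HAVING count(*) > 1;"
}

# Usage (run from the Drupal root):
#   cd /var/www/html
#   list_possible_duplicate_pubs
```

An empty result suggests that the matching rules described above prevented duplicate records from being created.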

The second way to import publications is to add an entry to the UNIX cron. We did this previously for the Tripal Jobs management system when we first installed Tripal. We will add another entry for importing publications. But first, now that we have imported all of the relevant pubs, we need to return to the importers list at Tripal > Chado Modules > Publications > Publication Importers and disable the first importer we created. We do not want to run that importer again, as we've already imported all historical publications on record at PubMed. Click the edit button next to the importer named Pubs for Citrus sinensis, click the disable checkbox and then save the template. The template should now be disabled.

Tripal2.0 pub impoter disabled.png

Now only the importer titled Pubs for Citrus sinensis last 30 days is enabled. This is the importer we want to run on a regular schedule, and the cron entry will do this for us. In the terminal, open the crontab with the following command:

sudo crontab -e

Now add the following line to the bottom of the crontab:

30 8 1,15 * *  su - www-data -c '/usr/local/drush/drush -r /var/www/html -l http://[site url] trp-import-pubs --report=[your email] > /dev/null'

Where

  • [site url] is the full URL of your site
  • [your email] is the email address of the user that should receive an email containing a list of publications that were imported. You can separate multiple email addresses with a comma.

The cron entry above launches the importer at 8:30am on the first and fifteenth days of each month. We run the importer twice a month so that if one run fails (e.g. the server is down), the next run, whose 30-day window still covers the missed period, picks up the new publications.
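
After saving the crontab, a quick check that the entry was stored can be sketched as follows. `pub_import_cron_entries` is a hypothetical helper that filters crontab text supplied on standard input.

```shell
# Read crontab text on standard input and print any active (uncommented)
# lines that invoke the publication importer.
pub_import_cron_entries() {
  grep -v '^[[:space:]]*#' | grep 'trp-import-pubs'
}

# Usage (the entry was added to root's crontab, hence sudo):
#   sudo crontab -l | pub_import_cron_entries || echo "no importer entry found"
```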