extensions

i5k Workspace@NAL: https://i5k-stage-node1-cbo.nal.usda.gov Registrations

i5k Workspace@NAL

i5k Workspace@NAL: https://i5k.nal.usda.gov Registrations

i5k Workspace@NAL

Introducing the Helium Data Exporter Tripal Module Technical Blog Post

There are two very common problems for which researchers reach out to bioinformaticists for support. First, how do they install software that they just want to try out to see whether it will help with their analyses? Second, once the software is installed, how do they format their data so that the software performs as designed? In an ideal world, researchers dealing with biological data could access a web portal that caters to their field of research, along with all of the analysis tools they could ever need. Those tools would instantaneously pull the necessary data from a meticulously curated database. Thus, no software installation would be needed, and there would be no data formatting headaches either. If you are reading this blog, then you have likely heard of Tripal and how it can provide exactly these benefits and more! However, researchers will occasionally discover software that meets a particular need very well but unfortunately cannot be integrated into an existing web portal. Our team recently encountered this with Helium, a stand-alone pedigree visualization application developed at The James Hutton Institute in Scotland.

 

HELIUM AND THE HELIUM DATA EXPORTER

The Helium Pedigree Visualization System (Shaw et al. 2014) offers users an interactive tool to visualize large-scale pedigree data. Its abundant features include the ability to overlay categorical data such as phenotypes, zoom in or out on a specific line of interest, and configure visual elements to the user's preference, among many other options that make this tool pleasingly seamless and user-friendly. Our researchers have been very excited about the potential of using Helium to enhance their breeding and research, but they were overwhelmed at the prospect of formatting pedigree data that goes back multiple decades, in addition to formatting the large volume of categorical data to go with it. Chances are high that you have encountered difficulties formatting large datasets as input for new software, since this problem extends well beyond biological analyses. For example, user documentation may lack detailed information on what to include in each row and column, or, if you transferred data between different operating systems, your file may even end up with incompatible end-of-line characters (see: https://datacadamia.com/data/type/text/eol).
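As a small illustration of the end-of-line problem mentioned above, the following Python sketch converts Windows (CRLF) and classic Mac (CR) line endings to Unix (LF). It is not part of Helium or the Helium Data Exporter; the function name and the sample column names are assumptions made for this example only.

```python
def normalize_eol(text: str) -> str:
    """Convert CRLF and bare CR line endings to LF."""
    # Replace CRLF first so that the CR in a CRLF pair is not converted twice.
    return text.replace("\r\n", "\n").replace("\r", "\n")


# Hypothetical two-column sample with mixed line endings.
sample = "LineName\tParent\r\nLine-C\tLine-A\rLine-C\tLine-B\n"
cleaned = normalize_eol(sample)
```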

 

Helium Visualization Framework Interface (https://github.com/cardinalb/helium-docs/wiki)

 

Over the years, our team, spearheaded by Dr. Kirstin Bett, has successfully conducted numerous experiments that have produced a significant volume of phenotypic and genotypic data. Tripal enabled us to manage such quantities of data at every stage of research through extension modules with strategic purpose. One of the modules we developed is the Raw Phenotypes module, which is designed to store, retrieve and summarize raw phenotypic measurements. Due to the uncurated nature of these data, we opted to use Tripal Custom Tables as back-end storage linked to germplasm stored in Chado.

 

Raw Phenotypes module showing pages for Upload, Download and Summary

 

The Helium Data Exporter was developed to prevent the data formatting headache holding our researchers back from using Helium by providing an automated export solution for germplasm and phenotypic data stored in our Tripal portal. In other words, this exporter acts as the bridge between the data stored by Tripal (specifically, germplasm and phenotypic data managed by the Raw Phenotypes Module) and Helium. This is the first version of this module and future implementations may extend to other potential datasets available through Tripal (e.g. curated phenotypic datasets within Chado).

For first-time users of either Helium or the Helium Data Exporter, the module’s interface provides a graphical setup guide including relevant links. The guide explains how to get started with installing Helium, downloading datasets of interest, and loading these datasets into Helium, and it can be collapsed when not needed. The form field selection area prompts the user to select a specific phenotypic experiment and includes the ability to filter by germplasm and traits. Moreover, the options shown to users in both the germplasm field and the trait field are based on the experiment selected.

 

Helium Data Exporter interface showing the graphical setup guide (collapsible section) and the form field selection area.

 

PORTING PEDIGREE AND CATEGORICAL DATA FOR VISUALIZATION USING HELIUM

Researchers can prepare pedigree and categorical data for visualization through their Tripal web portal by navigating to the Helium Data Exporter. First, select the experiment you are interested in, and the subsequent fields will repopulate with experiment-specific options. In addition, the selected experiment will become an entry in the Helium file as metadata for context. The Germplasm and Traits fields are equipped with an inline search to find a specific line and with checkboxes to select and deselect items. When the checkbox labeled Parental Relationships Only is selected, only the male and female parents of a line are returned. This option is unchecked by default, meaning that all germplasm relationship types (e.g. is a selection of, is a registered cultivar, etc.) are returned. Although Helium does not currently provide a graphical representation for relationship types other than male and female parents, such relationships between lines can be cross-checked by examining the exported pedigree datafile.

Germplasm relationships when exporting data with the field option Parental Relationships Only unchecked (A) and checked (B).
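The effect of the Parental Relationships Only option can be sketched in a few lines of Python. This is only an illustration of the filtering idea, not the module's actual code (the module itself is a Tripal extension); the `Relationship` record, the `filter_relationships` function, and the relationship type labels are all hypothetical.

```python
from typing import List, NamedTuple


class Relationship(NamedTuple):
    line: str      # germplasm line the relationship describes
    related: str   # the other germplasm in the relationship
    rel_type: str  # relationship type label (hypothetical values)


# Hypothetical labels for the two parental relationship types.
PARENTAL_TYPES = {"female parent", "male parent"}


def filter_relationships(relationships: List[Relationship],
                         parental_only: bool = False) -> List[Relationship]:
    """Return every relationship, or only parental ones when requested."""
    if not parental_only:
        # Default behaviour: the option is unchecked, so nothing is dropped.
        return list(relationships)
    return [r for r in relationships if r.rel_type in PARENTAL_TYPES]


relationships = [
    Relationship("Line-C", "Line-A", "female parent"),
    Relationship("Line-C", "Line-B", "male parent"),
    Relationship("Line-D", "Line-C", "is a selection of"),
]

all_rows = filter_relationships(relationships)
parent_rows = filter_relationships(relationships, parental_only=True)
```

With the option off, all three sample relationships are kept; with it on, only the two parental rows remain.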

 

Next, the user’s filter criteria selections are forwarded to the Tripal Download API to prepare the pedigree and categorical data files for Helium. Both are tab-separated values files with the .helium extension. Helium requires a file with pedigree information, and an additional file with categorical information may be supplied alongside it. Helium also allows three different datafile formats for pedigree information, and we opted for the Pedigree Germinate Format to ensure we maintained any custom relationship types. For the phenotypic data, we opted for Phenotype Format without Type Hints.
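To make the file layout concrete, here is a minimal Python sketch that writes a tab-separated file with the .helium extension. It is not the module's implementation, which relies on the Tripal Download API; the `write_helium_tsv` helper and the column names shown are assumptions for illustration, so consult the Helium documentation for the exact headers each format expects.

```python
import csv
import tempfile
from pathlib import Path


def write_helium_tsv(path, header, rows):
    """Write rows as tab-separated values to a file ending in .helium."""
    path = Path(path).with_suffix(".helium")
    # newline="" lets the csv module control line endings itself.
    with path.open("w", newline="", encoding="utf-8") as handle:
        writer = csv.writer(handle, delimiter="\t", lineterminator="\n")
        writer.writerow(header)
        writer.writerows(rows)
    return path


# Hypothetical column names and values, written to a temporary directory.
out_dir = Path(tempfile.mkdtemp())
pedigree_file = write_helium_tsv(
    out_dir / "pedigree",
    ["LineName", "Parent", "ParentType"],
    [["Line-C", "Line-A", "F"], ["Line-C", "Line-B", "M"]],
)
```

Writing Unix line endings explicitly avoids the end-of-line mismatches that can occur when files move between operating systems.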

Lastly, the user simply saves the files to their computer and imports them into Helium for visualization. This is where the magic begins! The initial configuration of Helium at startup instructs users to upload a pedigree datafile (a friendly reminder that pedigree data must be loaded prior to categorical data). Once that file has been loaded, all panels of the software come alive with pedigree visualization and are fully interactive. Now is a good time to upload the phenotypic datafile provided by the module, which adds valuable context for exploring beautiful pedigrees for hours on end!

DISCLAIMER: We are not responsible for losing all sense of time while tinkering on Helium, leading to possible neglect of other responsibilities ;)
 

This is the workflow for filtering, downloading and visualizing pedigree with phenotypic data in Helium.

 

CONCLUDING THOUGHTS

As the volume and complexity of biological data continue to increase, visualization tools like Helium have become immensely helpful for researchers trying to make sense of it all. Our goal was to simplify the process of exporting data from a database for import into a standalone visualization tool our researchers were interested in. First, we eliminated the tedious process of sifting through the tangled web of pedigree information and finding the corresponding phenotypic data. Then, we took care of the error-prone process of formatting the data to meet software input requirements. We believe that in making the work easier for researchers, we are also encouraging proper data curation and re-use, which opens the door to more exciting discoveries.

If you choose to use the Helium Data Exporter on your own Tripal site, we welcome any and all feedback; please create an issue on GitHub (Helium Data Exporter). For additional information on Helium, you can contact helium[at]hutton.ac.uk.

Job Openings at UTenn Announcement
Job Openings at WSU Announcement
Join us for the 2021 Tripal CodeFest! Basic page

Tripal Codefest 2021

Calling all Tripal Core, Extension Module and Tool Integration Developers!

When: Jan 11-15 2021

Cost:  It's free to participate!

Registration:  Online Registration Form

Where: GatherTown room (please register to receive a link to the room)

Who: Anyone interested in developing core Tripal, extension modules, or integration with Tripal or Tripal dependencies (i.e. Chado). We openly welcome anyone from the GMOD community or other open-source developers.

 

Schedule

Prior to the Event

  • Teams will organize with a team leader to focus on a topic
  • Team leaders will organize the group during December to optimize coding time.
  • Participants should block out time on their calendars during the week. Please try to reserve at least 8 hours, but any amount is welcome!

January 11th Kick off Meetings 

The kick off meeting is meant to officially kick off the Codefest, answer questions, familiarize ourselves with the schedule and conduct any necessary business.  The meeting begins at 17:00 UTC (view for your local area).


Jan 11-15 Schedule

The schedule is not fully completed. Check back closer to the event for a full schedule of official team meetings and times.

January 15th Wrap up Meetings 

The wrap up meeting is meant to showcase what we've done.  The exact time will be set during the Kickoff meeting on Jan 11th.

The following topics currently have group leaders. 

  • Breeding API (BrAPI) -- Upgrade Tripal BrAPI module to current version of BrAPI (https://github.com/tripal/brapi)
  • Phenotyping Filtering -- Make existing MGIS module generic and cooperate with BrAPI - See "Morpholigical descriptor > Selection interface/Display graphical interface" at https://www.crop-diversity.org/mgis/accession-search
  • Glyphs for gene/mRNA pages -- Improve the appearance, usability, and availability of information of default gene pages
  • Tripal Galaxy -- Improvements to the Galaxy project integration with Tripal (https://github.com/tripal/tripal_galaxy)
  • Chado -- Improve efficiency for storing some data types (e.g. Genotyping, Phenotyping) using ElasticSearch style functionality in PostgreSQL.
  • Tripal v4 Core Development -- Continue development of Tripal v4 to support Drupal 8 and 9 (https://github.com/tripal/t4d8)

 

Join us for the 2022 Tripal CodeFest! Basic page

Tripal CodeFest 2022

POSTPONED: 

The 2022 Codefest has been postponed.  Updated scheduling information will be sent via the Tripal Mailing List and on the Slack workspace.

--

Calling all Tripal Core, Extension Module, and Tool Integration Developers!

When: Jan 26-27 2022, 9am-5pm in your time zone.
Cost:  It is free to participate!
Registration:  Registration is still open!  Please fill out the Online Registration Form to attend. 
Where: GatherTown room (please register to receive a link to the room)
Who: Anyone interested in developing core Tripal, extension modules, or integration with Tripal or Tripal dependencies (i.e. Chado). We openly welcome anyone from the GMOD community or other open-source developers.
 

Currently Scheduled Topics

The following topics are currently being organized for the meeting. Consider joining one of these during registration or, if you'd like to lead your own topic, please indicate it during registration.

Group Topic | Description | Group Lead
Implementation of Tripal v4 Fields | The Entity and Fields interfaces are ready, and the next step in the development of Tripal v4 is the creation of specific fields. Examples include the organism, publication, and gene fields, to name a few. Fields form the basis for all data organization including page displays, web services and searching. | Josh Burns / Sean Buehler
Production Docker Tripal v4 | A Tripal v4 Docker image can help reduce the complexity of installing Tripal and its dependencies (e.g., PostgreSQL, Apache, PHP, etc.). Additionally, some users would like a production-level image that is easy to port and maintain. This group will work towards the development of that resource. | Stephen Ficklin
Chado Updates | Chado is the standard on which data is housed in Tripal. In order for Tripal to support newer and emerging data types, new tables or changes to existing tables are necessary. This group will focus on providing suggestions for those updates as well as working with the Chado management committee towards updating Chado. | Lacey Sanderson

Schedule

Prior to the Event

  • Teams will organize with a team leader to focus on a topic
  • Team leaders will organize the group during December to optimize coding time.
  • Participants should block out time on their calendars during the week. Please try to reserve at least 8 hours, but any amount is welcome!

During the Event

Because this is an online virtual event, attendees will be in different time zones during the event.  Group leaders will organize group-specific events but there are two event-wide events. 

  • Kick-off Meeting:  Jan 26th at 9am Pacific; 11am Regina; 12pm Eastern; 1pm Atlantic; 6pm European Central in GatherTown.
  • Wrap-up Meeting:  Jan 27th at 1pm Pacific; 3pm Regina; 4pm Eastern; 5pm Atlantic; 10pm European Central in GatherTown.

Currently, all registered attendees are from North America; therefore, the above meeting times are set to accommodate those time zones. However, meeting times could change to accommodate other time zones if needed.

 

Kiwifruit Genome Database Sites Using Tripal
KnowPulse Sites Using Tripal
KnowPulse: https://knowpulse.usask.ca Registrations

KnowPulse