Introducing the Helium Data Exporter Tripal Module

Subtitle

An additional way to visualize pedigree and phenotypic data stored in a Tripal web portal.

Author Contributions

This post was written by Reynold Tan and Carolyn Caron. The module was developed by Reynold Tan and Lacey-Anne Sanderson, and tested by the entire KnowPulse Web Portal Development Team. Dr. Kirstin Bett contributed to user specifications, design, and funding of the module, and editing of this blog post. Dr. Paul Shaw graciously provided feedback for our module and blog post.

Citations

Shaw, P., Graham, M., Kennedy, J., Milne, I., & Marshall, D. (2014). Helium: visualization of large scale plant pedigrees. BMC Bioinformatics, 15(1), 259. https://doi.org/10.1186/1471-2105-15-259

Zenodo DOI

10.5281/zenodo.6611672

Post Content

There are two very common problems for which researchers reach out to bioinformaticians for support. First, how do they install a piece of software they just want to try out, to see whether it will help with their analyses? And second, once the software is installed, how should they format their data so that the software performs as designed? In an ideal world, researchers working with biological data could access a web portal that caters to their field of research, along with all of the analysis tools they could ever need. Those tools would instantly pull the necessary data from a meticulously curated database, so there would be no software installation and no data-formatting headaches either. If you are reading this blog, you have likely heard of Tripal and how it can provide all of these benefits and more! Occasionally, however, researchers discover software that meets a particular need very well but cannot be integrated into an existing web portal. Our team recently encountered this with Helium, a stand-alone pedigree visualization application developed at The James Hutton Institute in Scotland.

HELIUM AND THE HELIUM DATA EXPORTER

The Helium Pedigree Visualization System (Shaw et al. 2014) offers users an interactive tool for visualizing large-scale pedigree data, with abundant features such as the ability to overlay categorical data like phenotypes, zoom in or out on a specific line of interest, and configure visual elements to their preference, along with many other options that make the tool seamless and user-friendly. Our researchers were very excited about using Helium to enhance their breeding and research, but they were overwhelmed at the prospect of formatting pedigree data spanning multiple decades, in addition to the large volume of categorical data that goes with it. Chances are you have also had difficulty formatting large datasets as input for new software, since this problem extends well beyond biological analyses. For example, user documentation may lack detail about what to include in each row and column, and if you transfer data between operating systems, your file may even end up with incompatible end-of-line characters (see: https://datacadamia.com/data/type/text/eol).

Helium Visualization Framework Interface (https://github.com/cardinalb/helium-docs/wiki)
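To make the end-of-line problem mentioned above concrete, here is a minimal Python sketch (our own illustration, not part of Helium or the module) that converts Windows (CRLF) and classic Mac (CR) line endings to Unix (LF) before a file is handed to another tool. The function names are hypothetical.

```python
from pathlib import Path


def normalize_line_endings(text: str) -> str:
    """Convert Windows (CRLF) and classic Mac (CR) line endings to Unix (LF)."""
    # Replace CRLF first so its CR is not turned into a second, extra newline.
    return text.replace("\r\n", "\n").replace("\r", "\n")


def normalize_file(path: Path) -> None:
    """Rewrite a text file in place with LF-only line endings."""
    # newline="" disables Python's own newline translation on read and write,
    # so the raw CR/LF characters reach normalize_line_endings unchanged.
    with path.open("r", encoding="utf-8", newline="") as handle:
        text = handle.read()
    with path.open("w", encoding="utf-8", newline="") as handle:
        handle.write(normalize_line_endings(text))


if __name__ == "__main__":
    sample = "LineName\tParent\r\nA\tB\rC\tD\n"
    print(repr(normalize_line_endings(sample)))
```

Replacing CRLF before lone CR matters: doing it in the other order would turn every Windows line break into two blank-line-producing newlines.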

Over the years, our team, spearheaded by Dr. Kirstin Bett, has successfully conducted numerous experiments that have produced a significant volume of phenotypic and genotypic data. Tripal has enabled us to manage this data at every stage of research through purpose-built extension modules. One of these is the Raw Phenotypes module, which is designed to store, retrieve and summarize raw phenotypic measurements. Because these data are uncurated, we opted to use Tripal Custom Tables as back-end storage, linked to germplasm stored in Chado.

Raw Phenotypes module showing pages for Upload, Download and Summary
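The Raw Phenotypes table layout is not described in this post, so the following Python sketch uses hypothetical field names. The one idea it carries over from the text is that each raw measurement references a germplasm record in Chado (we assume by its `stock_id`) instead of copying germplasm details, and that raw values can then be summarized, here as a simple mean per germplasm and trait. The values are made up.

```python
from collections import defaultdict
from dataclasses import dataclass
from statistics import mean


@dataclass(frozen=True)
class RawMeasurement:
    """One uncurated measurement row (hypothetical custom-table layout)."""
    stock_id: int  # assumed reference to the germplasm record in Chado
    trait: str
    value: float


def summarize(rows: list[RawMeasurement]) -> dict[tuple[int, str], float]:
    """Average the raw values recorded for each (germplasm, trait) pair."""
    grouped: dict[tuple[int, str], list[float]] = defaultdict(list)
    for row in rows:
        grouped[(row.stock_id, row.trait)].append(row.value)
    return {key: mean(values) for key, values in grouped.items()}


if __name__ == "__main__":
    rows = [
        RawMeasurement(101, "Days to Flower", 42.0),
        RawMeasurement(101, "Days to Flower", 44.0),
        RawMeasurement(102, "Days to Flower", 39.0),
    ]
    print(summarize(rows))
    # {(101, 'Days to Flower'): 43.0, (102, 'Days to Flower'): 39.0}
```

Keeping only a reference to the germplasm means its name, pedigree and other details live in one place, which is what later lets an exporter join raw phenotypes back to curated pedigree records.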

The Helium Data Exporter was developed to remove the data-formatting headache that was holding our researchers back from using Helium, by providing an automated export of germplasm and phenotypic data stored in our Tripal portal. In other words, the exporter acts as a bridge between data stored in Tripal (specifically, germplasm and phenotypic data managed by the Raw Phenotypes module) and Helium. This is the first version of the module, and future versions may extend to other datasets available through Tripal (e.g. curated phenotypic datasets within Chado).

For first-time users of Helium or the Helium Data Exporter, the module’s interface provides a graphical setup guide with relevant links, explaining how to install Helium, download datasets of interest, and load those datasets into Helium. The guide can be collapsed when not needed. The form field selection area prompts the user to select a specific phenotypic experiment and lets them filter germplasm and traits. All of the options shown in the germplasm and trait fields depend on the experiment selected.

Helium Data Exporter interface showing the graphical setup guide (collapsible section) and the form field selection area.
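The idea that the germplasm and trait options depend on the selected experiment can be sketched in a few lines of Python. This is not the module’s code (which is a Drupal/Tripal form); the `Measurement` record and `options_for_experiment` function are hypothetical names used only to show the filtering logic.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Measurement:
    """Hypothetical record: one germplasm measured for one trait in one experiment."""
    experiment: str
    germplasm: str
    trait: str


def options_for_experiment(
    rows: list[Measurement], experiment: str
) -> tuple[list[str], list[str]]:
    """Return the sorted germplasm and trait choices for the selected experiment."""
    selected = [row for row in rows if row.experiment == experiment]
    # Sets drop duplicates, so a line measured for many traits is listed once.
    germplasm = sorted({row.germplasm for row in selected})
    traits = sorted({row.trait for row in selected})
    return germplasm, traits


if __name__ == "__main__":
    rows = [
        Measurement("Exp A", "Line 2", "Height"),
        Measurement("Exp A", "Line 1", "Height"),
        Measurement("Exp A", "Line 1", "Yield"),
        Measurement("Exp B", "Line 3", "Height"),
    ]
    print(options_for_experiment(rows, "Exp A"))
    # (['Line 1', 'Line 2'], ['Height', 'Yield'])
```

Deriving both lists from the same filtered rows guarantees a user can never pick a line or trait that has no data in the chosen experiment.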

PORTING PEDIGREE AND CATEGORICAL DATA FOR VISUALIZATION USING HELIUM

Researchers can prepare pedigree and categorical data for visualization by navigating to the Helium Data Exporter in their Tripal web portal. First, select the experiment you are interested in; the remaining fields will repopulate with experiment-specific options. The selected experiment is also written into the Helium file as metadata for context. The Germplasm and Traits fields include an inline search for finding a specific line and checkboxes for selecting and deselecting items. When the Parental Relationships Only checkbox is checked, only the male and female parents of a line are returned. It is unchecked by default, meaning that all germplasm relationship types (e.g. is a selection of, is a registered cultivar) are returned. Although Helium does not currently display relationship types other than male and female parents, these relationships can be cross-checked by examining the exported pedigree file.

Germplasm relationships when exporting data with the Parental Relationships Only option unchecked (A) and checked (B).
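The effect of the Parental Relationships Only checkbox can be sketched as a simple filter over relationship records. The relationship-type labels below are hypothetical placeholders (a real Tripal site uses its own controlled-vocabulary terms), and each record is read as "subject rel_type target", similar in spirit to the subject/type/object layout of Chado’s `stock_relationship` table.

```python
from dataclasses import dataclass

# Hypothetical relationship-type labels used only for this sketch.
PARENTAL_TYPES = {"is_female_parent_of", "is_male_parent_of"}


@dataclass(frozen=True)
class Relationship:
    """One pedigree edge, read as '<subject> <rel_type> <target>'."""
    subject: str
    rel_type: str
    target: str


def select_relationships(
    relationships: list[Relationship], parental_only: bool = False
) -> list[Relationship]:
    """Mimic the Parental Relationships Only checkbox.

    Unchecked (the default) keeps every relationship type; checked keeps
    only the male and female parent links.
    """
    if not parental_only:
        return list(relationships)
    return [rel for rel in relationships if rel.rel_type in PARENTAL_TYPES]


if __name__ == "__main__":
    pedigree = [
        Relationship("Line A", "is_female_parent_of", "Line C"),
        Relationship("Line B", "is_male_parent_of", "Line C"),
        Relationship("Line D", "is_selection_of", "Line C"),
    ]
    print(len(select_relationships(pedigree)))                      # 3
    print(len(select_relationships(pedigree, parental_only=True)))  # 2
```

Defaulting to "keep everything" mirrors the unchecked checkbox: no relationship information is lost unless the user explicitly asks for parents only.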

Next, the user’s filter selections are passed to the Tripal Download API, which prepares the pedigree and categorical data files for Helium. Both files are tab-separated values with the .helium extension. Helium requires a file with pedigree information, and it can optionally be supplemented with a file of categorical information. Helium also accepts three different file formats for pedigree information; we chose the Pedigree Germinate Format to preserve any custom relationship types. For the phenotypic data, we chose the Phenotype Format without Type Hints.
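For a feel of what the two export files involve, here is a Python sketch that renders a pedigree table and a phenotype table as tab-separated text with LF line endings. It is not the module’s implementation, and the header lines and column names (`# heliumInput = ...`, `LineName`, `Parent`, `ParentType`) are our assumptions about Helium’s layouts; please confirm them against the Helium documentation before relying on them. The `# experiment = ...` line is a hypothetical stand-in for the experiment metadata the module records, and the functions `format_pedigree`, `format_phenotypes` and `save` are illustrative names.

```python
import csv
import io
from collections.abc import Iterable, Mapping, Sequence

# ASSUMPTION: these header lines approximate Helium's pedigree and phenotype
# input layouts; check the Helium documentation for the exact specification.
PEDIGREE_HEADER = "# heliumInput = PEDIGREE"
PHENOTYPE_HEADER = "# heliumInput = PHENOTYPE"


def _tsv(header_lines: Sequence[str], rows: Iterable[Sequence[object]]) -> str:
    """Render '#' header lines, then tab-separated rows, always with LF endings."""
    buffer = io.StringIO()
    for line in header_lines:
        buffer.write(line + "\n")
    writer = csv.writer(buffer, delimiter="\t", lineterminator="\n")
    writer.writerows(rows)
    return buffer.getvalue()


def format_pedigree(experiment: str, relationships: Iterable[tuple[str, str, str]]) -> str:
    """relationships: (line, parent, relationship_type) tuples, one per row."""
    # Hypothetical metadata line recording the selected experiment.
    header = [PEDIGREE_HEADER, f"# experiment = {experiment}"]
    return _tsv(header, [("LineName", "Parent", "ParentType"), *relationships])


def format_phenotypes(
    traits: Sequence[str], values_by_line: Mapping[str, Mapping[str, object]]
) -> str:
    """One row per line and one column per trait; missing values are left blank."""
    rows = [("LineName", *traits)]
    for line, values in values_by_line.items():
        rows.append((line, *(values.get(trait, "") for trait in traits)))
    return _tsv([PHENOTYPE_HEADER], rows)


def save(path: str, text: str) -> None:
    """Write text unchanged; newline='' stops CRLF translation on Windows."""
    with open(path, "w", encoding="utf-8", newline="") as handle:
        handle.write(text)


if __name__ == "__main__":
    # "F" is a placeholder relationship label for this sketch.
    save("pedigree.helium", format_pedigree("Exp A", [("Line C", "Line A", "F")]))
    save("phenotypes.helium", format_phenotypes(["Height"], {"Line C": {"Height": 52}}))
```

Building the text with the `csv` module rather than joining strings by hand means a value that happens to contain a tab is quoted instead of silently shifting every following column.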

Lastly, the user simply saves the files to their computer and imports them into Helium for visualization. This is where the magic begins! At startup, Helium prompts users to load a pedigree file (a friendly reminder that pedigree data must be loaded before categorical data). Once a file has been loaded, all panels of the software come alive with the pedigree visualization and become fully interactive. Now is a good time to load the phenotypic file provided by the module, adding valuable context as you explore beautiful pedigrees for hours on end!

DISCLAIMER: We are not responsible for losing all sense of time while tinkering with Helium, leading to possible neglect of other responsibilities ;)

Workflow for filtering, downloading and visualizing pedigree and phenotypic data in Helium.

CONCLUDING THOUGHTS

As biological data continue to grow in volume and complexity, visualization tools like Helium have become immensely helpful for researchers trying to make sense of it all. Our goal was to simplify exporting data from a database into a standalone visualization tool our researchers were interested in. First, we eliminated the tedious process of sifting through the tangled web of pedigree information and finding the corresponding phenotypic data. Then, we took care of the error-prone process of formatting the data to meet the software’s input requirements. We believe that by making this work easier for researchers, we are also encouraging proper data curation and re-use, which opens the door to more exciting discoveries.

We welcome any and all feedback if you use the Helium Data Exporter on your own Tripal site; please create an issue on GitHub (Helium Data Exporter). For additional information on Helium, contact helium[at]hutton.ac.uk.