Tripal User's Meeting @ San Diego 2017-01-13

Meeting Date: 
Friday, January 13, 2017
Attendees: 
  • Lacey-Anne Sanderson (both days)

  • Katheryn Buble (both days)

  • Valentin Guignon (both days)

  • Nic Herndon (both days)

  • Emily Grau (both days)

  • Eric Wafula (both days, beginner)

  • Prakash Timilsena (both days, beginner)

  • Meg Staton (meeting only)

  • Abdullah Almsaeed (both days)

  • Ming Chen (both days)

  • Steven Cannon (day 1 and first half of day 2)

  • Eliot Cline (both days)

  • Sudhansu Dash (day 1 and first half of day 2; beginner)

  • Chris Childers (both days)

  • Ethy Cannon (1 ½ days)

  • Qiaoshan Lin (both days)

  • Sofia Robb (both days)

  • Taein Lee (both days)

  • Victor Unda (both days)

Agenda: 

Friday, January 13

8am - 12pm Round-table Discussion

12pm - 1pm LUNCH

1pm - 5pm Free Collaboration, continued Round-table Discussion

 
  • Tripal 3 update
  • Full stable release: May 2017
  • Main core functionality is just about finished
  • Uses the semantic web and controlled vocabularies to exchange data between Tripal sites as well as to expose it to the outside world
  • Uses controlled vocabularies for everything: column names are given cvterms as well as relationships, etc.
  • Content is created by type (e.g., gene) rather than by Chado table (e.g., feature)
  • All Chado content is now exposed as fields, which lets you reorder and change the display through the web interface rather than needing a PHP template
  • What you see on the page is what is available on your web services
  • Thus your web services are tailored to your site
  • Beautiful upgrade process that allows you to upgrade immediately while keeping your nodes. Then you can upgrade on a per-content-type basis
  • You can also use your old templates on entities
  • Tripal v2.3
  • Victor has started a unit testing module, which will allow much quicker releases
  • Data type evolution: dealing with NGS type of data with Tripal and what about Chado? (Valentin)
  • Includes: Genotype analysis and viewing tools: use cases and survey of tools now under development (Ethy)

Challenges:

  • How to pick sets of germplasm efficiently
  • Core data storage and extraction problem
  • How to display the data to the user: zoomed in, haplotype viewer, etc.
  • Speed, avoiding server overload (can some analysis be done on client side?)
  • Valentin’s team stores its NGS data in MongoDB
  • Uses Tomcat on top of MongoDB
  • https://www.crop-diversity.org/mgis/gigwa
  • Genotypic data associated with markers
  • Lacey uses PostgreSQL (tested with 5 billion data points: 5 million SNPs x 1,000 germplasm)
  • Ethy: MaizeGDB
  • Mini, incomplete slides on other tools here: https://docs.google.com/presentation/d/1TVt74oQi3EON6DtQlcjZDirJpw5R53_J...
  • HDF5 in the backend (TASSEL)
  • MaizeGDB SNPversity is still having troubles with some of the larger jobs completing
  • Flapjack is great but is a standalone Java application
  • Support for VCF is good (easier for researchers)
  • VCF export? Not wanted so far (matrix formats are preferred for now), but the need is likely to arise
  • Tripal Job launcher (Ethy)
  • Can’t limit the number of parallel jobs
  • This one shouldn’t be too difficult
  • Drupal module cron queue
  • Would be nice if job launcher could send jobs to a separate system
  • Would be nice for the job launcher to respect priority
  • Need a lot more fine-grained control over which jobs can run in parallel and which cannot
  • Cannot cancel jobs through the job interface
  • Job log is not very helpful; date/time for when messages are logged would help
  • About 5 attendees are using the Drush daemon
  • Tripal Elasticsearch launches multiple queues with
  • QueueUI
  • UltimateCron
  • Import/export module: extend Tripal Download API? (Valentin)
  • Sofia uses Perl to write SQL
  • Ethy uses Perl
  • MainGroup lab has a PHP Chado loader: https://www.rosaceae.org/mcl
  • RNAseq has its own internal loader in PHP
  • Data checking as you go, Perl interactive loaders
  • Also need the ability to share these loaders with other groups
  • Perhaps it would be useful for the API to be a class-based PHP library so that it would not need Drupal
  • Mention this on the mailing list
  • Some people have a Tripal site solely for loading data, not for content display
  • Tripal cv loader fails for GO.
  • Probably due to a chado stored procedure
  • Disable stored procedure then run chado’s perl set-cvtermpath script by hand? ISU will test.
  • Tree Visualization of multiple species relationships using the organism and phylo tables (Chris)
  • There is a newick file loader in Tripal 2.1 that populates the phylo* tables
  • This one will pull lineage from NCBI taxonomy; documentation is embedded in the bulk loader documentation. Taxonomy/Organism linker (http://tripal.info/node/109)
  • Can be used to store taxonomy tree
  • Has a visualization of a tree
  • LegumeInfo phylotree module here.
  • This one is based more on gene families
  • How indices of phylonodes are computed: http://archive.oreilly.com/pub/a/network/2002/11/27/bioconf.html
  • Also have a look at the picture there: http://archive.oreilly.com/pub/a/network/2002/11/27/bioconf.html?page=2
  • Tripal 3 and entity permissions based on drupal users and roles (Sofia)
  • Will be very similar but will be on a per-entity basis
  • Would like the permissions moved as well **
  • Will it handle permissions on a per-node basis?
  • Need documentation on how to do this in Tripal 3
  • Some people are using Organic groups
  • Galaxy module and Data Exchange
  • Trying to integrate Tripal with Galaxy so that users see a Tripal interface but run Galaxy workflows
  • Uses webforms to create the interface for workflows
  • Module queries Galaxy and creates the webform for each workflow
  • Then admin can go in and tweak various things like defaults and help text
  • Parameters can be re-arranged and grouped, etc.
  • BLAST Module https://github.com/tripal/tripal_blast
  • How many people are using it?
  • Is it meeting your needs?
  • BLAST module new feature: filter database list by organism. (Sofia)
  • Overview of BLAST at PeanutBase and LegumeInfo here.
  • Slides also show CViTjs
  • Shows features of a gff3 file in the context of the whole genome
  • blast https://docs.google.com/presentation/d/1iKnHgVyeGWe2pE2OFrjTD_FHb7yTIGslWGiPqQc5qX4/edit?usp=sharing
  • BLAST at KnowPulse (current core module)
  • Consumption of Web services planned, especially of CoGe, which has a nice REST API and a way of grouping target databases that are of interest to your users.
  • Question from LegFed meeting: if target is a set of genomes, could multiple genomes be displayed on multiple instances of CViTjs?
  • Meta data submission system (Chris)
  • Drupal forms to control submission of metadata
  • Data is not in chado
  • Chado Multi-chado
  • Needs reviews on Drupal.org. Please help by reviewing this module!
  • Supports multiple chado databases attached at the same time
  • Each user/session can only access one chado at a time
  • Elasticsearch
  • Site-wide search: much faster than Drupal Views
  • Dorrie's group wants to be able to use it with multiple sites on the same server
  • Now available
  • Expression module
  • Creates biomaterials (reflects NCBI's concept)
  • Separately, you can load in your gene expression data, which is shown as a beautiful heatmap visualization
  • Has an independent page where you enter a number of genes
  • Builds a two-dimensional heatmap
  • Basket/cart functionality would be helpful
  • Does anyone want biomaterials separate from expression?
  • Search by normalized values above or below a threshold
  • Might want to store p-values
  • Sofia volunteered to be a tester for search functionality
  • Hackathon produced an implementation of the Tripal Download API to download the expression values for a given node
  • New functionality for BLAST analysis and InterPro, and an upgraded GO module
  • Basket/Cart functionality
  • What are examples of existing carts on biological (or not) sites that people like?
  • Valentin - how is flag module working out? Should we continue to leverage that?
  • Flag module - working well with Musa to hold stock and 3 other entities, can apply actions on cart
  • Long term goal: workspace with multiple user-provided and db-source datasets
  • Think about a cart focused on entities rather than nodes so that it works with Tripal 3
  • Do we want to ensure a cart can only contain one subtype (e.g., only features (Tripal 2) or only genes (Tripal 3))?
  • What do you love/hate about Tripal?
    • Difficult to become proficient
    • Drupal can be challenging to learn.
    • Very large (improved in Drupal 8/Tripal 3?)
    • Reusability of code and functionality
    • Customizable
    • Need a sustainability plan
    • NSF research coordination network grant
    • Stephen has ideas from DIBBS meeting about long term funding goals for NSF
    • Chado (several versions) instantiation/update made easy
    • Online documentation
    • Perhaps develop documentation standards for Tripal modules.
    • Currently have tripal.info; could pull in extension docs as well
    • Is there a way to leverage Read the Docs functionality to combine all the documentation together?
    • Mailing lists
    • Idea: remove all mailing lists except announce. All bugs/requests routed through issue queue on github
    • Seems ok with everyone
    • People can become part of the organization and auto-watch all repos without adding their own repos
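
Note on the phylonode indices raised in the tree visualization discussion above: the following is a minimal sketch of nested-set style indexing, assuming that is the scheme behind the left/right index columns of the Chado phylonode table. The function names and the example tree are hypothetical and are not taken from Tripal or Chado code.

```python
def assign_nested_set_indices(tree, root):
    """Assign (left, right) indices to every node by depth-first traversal.

    tree: dict mapping a node name to the list of its child names
          (hypothetical in-memory form; leaves may be omitted from the dict).
    Returns a dict mapping node name -> (left_idx, right_idx).
    """
    indices = {}
    left = {}
    counter = 1
    # Iterative DFS: each node is pushed once to open it and once to close it.
    stack = [(root, False)]
    while stack:
        node, closing = stack.pop()
        if closing:
            # All descendants are numbered, so the right index can be set.
            indices[node] = (left[node], counter)
            counter += 1
        else:
            left[node] = counter
            counter += 1
            stack.append((node, True))
            # Reverse so children are visited in their listed order.
            for child in reversed(tree.get(node, [])):
                stack.append((child, False))
    return indices


def is_descendant(indices, node, ancestor):
    """A node lies under an ancestor when its interval is strictly nested."""
    node_left, node_right = indices[node]
    anc_left, anc_right = indices[ancestor]
    return anc_left < node_left and node_right < anc_right


if __name__ == "__main__":
    example = {"root": ["A", "B"], "A": ["A1", "A2"]}
    idx = assign_nested_set_indices(example, "root")
    for name, (l, r) in sorted(idx.items(), key=lambda item: item[1][0]):
        print(name, l, r)
    print(is_descendant(idx, "A2", "A"))
```

With indices like these, all nodes under a clade can be selected with a single range comparison on the two index columns instead of a recursive walk, which is the usual reason for storing them.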

Feature requests:

  • InterProScan HTML output: Can we add functionality to incorporate this into the InterProScan feature node/entity tab? (Sofia)
  • This now has visualizations thanks to A. Bretaudau
  • Feature Request: InterProScan module: being able to add terms beyond GO terms, for example Reactome and KEGG. (Sofia)
  • These are native options in IPS; code needs to be added to deal with these new options.
  • KAAS/KEGG loader required output that is no longer available from KEGG. I have hacked a method for downloading the KEGG output with a shell script using curl, and I have modified the KEGG loader to work with this new downloaded data. But I lose all the links (the new downloaded data does not have any links). Would be nice to get links back in. (Sofia)
  • I have an undergraduate student working on this, hope to have something to release within a month or two (Meg)
  • Possible Problems/Bugs:
  • Issues with updating custom cv terms. Might be my fault. Do I/can I reload my OBO files to update terms? I get errors when trying this. (Sofia)

 

Random questions: