Meeting Date
Attendees
- Lacey-Anne Sanderson (both days)
- Katheryn Buble (both days)
- Valentin Guignon (both days)
- Nic Herndon (both days)
- Emily Grau (both days)
- Eric Wafula (both days, beginner)
- Prakash Timilsena (both days, beginner)
- Meg Staton (meeting only)
- Abdullah Almsaeed (both days)
- Ming Chen (both days)
- Steven Cannon (day 1 and first half of day 2)
- Eliot Cline (both days)
- Sudhansu Dash (day 1 and first half of day 2; beginner)
- Chris Childers (both days)
- Ethy Cannon (1 ½ days)
- Qiaoshan Lin (both days)
- Sofia Robb (both days)
- Taein Lee (both days)
- Victor Unda (both days)
Friday, January 13
8am - 12pm Round-table Discussion
12pm - 1pm LUNCH
1pm - 5pm Free Collaboration, continued Round-table Discussion
- Tripal 3 update
- Full stable release: May 2017
- Main core functionality is just about finished
- Uses the semantic web and controlled vocabularies to exchange data between Tripal sites as well as to expose it to the outside world
- Uses controlled vocabularies for everything: column names are given cvterms as well as relationships, etc.
- Content created by type (e.g., gene) rather than by Chado table (e.g., feature)
- All Chado content is now fields, which allows you to use the web interface to reorder and change the display rather than needing a PHP template
- What you see on the page is what is available on your web services
- Thus your web services are tailored to your site
- Beautiful upgrade process that allows you to upgrade immediately while keeping your nodes. Then you can upgrade on a per content type basis
- You can also use your old templates on entities
- Tripal v2.3
- Victor has started a unit testing module, which will allow much quicker releases
- Data type evolution: dealing with NGS type of data with Tripal and what about Chado? (Valentin)
- Includes: Genotype analysis and viewing tools: use cases and survey of tools now under development (Ethy)
Challenges:
- How to pick sets of germplasm efficiently
- Core data storage and extraction problem
- How to display the data to the user: zoomed in, haplotype viewer, etc.
- Speed, avoiding server overload (can some analysis be done on client side?)
- Valentin’s team stores its NGS data in MongoDB
- Uses Tomcat on top of MongoDB
- https://www.crop-diversity.org/mgis/gigwa
- Genotypic data associated with markers
- Lacey uses PostgreSQL (tested with 5 billion data points: 5 million SNPs x 1000 germplasm)
- Ethy: MaizeGDB
- Mini, incomplete slides on other tools here: https://docs.google.com/presentation/d/1TVt74oQi3EON6DtQlcjZDirJpw5R53_…;
- HDF5 in the backend (TASSEL)
- MaizeGDB SNPversity still has trouble completing some of the larger jobs
- Flapjack is great but is a standalone Java application
- Support for VCF is good (easier for researchers)
- VCF export? Not wanted so far (matrix formats are preferred for now), but the need is likely to arise
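For scale, the PostgreSQL test size quoted above can be worked out directly. The sketch below multiplies the two figures from the notes and then estimates the raw payload under two encodings. Both per-call sizes are illustrative assumptions, not measurements from any of the sites discussed, and database overhead (row headers, indexes) is ignored.

```python
# Figures taken from the notes: 5 million SNPs x 1000 germplasm.
N_SNPS = 5_000_000
N_GERMPLASM = 1_000


def genotype_calls(n_snps: int, n_germplasm: int) -> int:
    """Number of cells in a full marker-by-germplasm genotype matrix."""
    return n_snps * n_germplasm


def raw_size_gb(n_calls: int, bits_per_call: int) -> float:
    """Raw payload in gigabytes (10**9 bytes), ignoring database overhead."""
    return n_calls * bits_per_call / 8 / 10**9


total_calls = genotype_calls(N_SNPS, N_GERMPLASM)

# Assumed encodings (hypothetical, for illustration only):
# one byte per call versus two bits per call (packed codes).
size_one_byte = raw_size_gb(total_calls, 8)
size_two_bits = raw_size_gb(total_calls, 2)

print(f"{total_calls:,} calls")
print(f"{size_one_byte:.2f} GB at 8 bits per call")
print(f"{size_two_bits:.2f} GB at 2 bits per call")
```

Even the compact encoding is over a gigabyte of raw data, which is consistent with the concerns above about speed and server load.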
- Tripal Job launcher (Ethy)
- Can’t limit the number of parallel jobs
- This one shouldn’t be too difficult
- Drupal module cron queue
- Would be nice if job launcher could send jobs to a separate system
- Would be nice for the job launcher to respect priority
- Need a lot more fine-grained control over which jobs can be run in parallel and which cannot
- Cannot cancel jobs through the job interface
- Job log is not very helpful - needs the date/time at which messages are logged
- About 5 attendees are using the Drush daemon
- Tripal Elasticsearch launches multiple queues with
- QueueUI
- UltimateCron
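The job launcher wishes listed above (a cap on parallel jobs, priority ordering, cancelling queued jobs) can be sketched as a small in-memory scheduler. This is not Tripal's or Drupal's API: the `JobLauncher` class, its method names, and the convention that a lower priority number runs first are all assumptions made for illustration.

```python
import heapq
from dataclasses import dataclass, field
from itertools import count


@dataclass(order=True)
class Job:
    priority: int                     # assumption: lower number runs first
    seq: int                          # submission order breaks priority ties
    name: str = field(compare=False)


class JobLauncher:
    """Hypothetical launcher that caps the number of concurrently running jobs."""

    def __init__(self, max_parallel: int):
        self.max_parallel = max_parallel
        self._queue = []              # heap of waiting jobs
        self._running = set()         # names of jobs currently running
        self._counter = count()

    def submit(self, name: str, priority: int = 10) -> None:
        heapq.heappush(self._queue, Job(priority, next(self._counter), name))

    def start_ready(self) -> list:
        """Start waiting jobs in priority order until the cap is reached."""
        started = []
        while self._queue and len(self._running) < self.max_parallel:
            job = heapq.heappop(self._queue)
            self._running.add(job.name)
            started.append(job.name)
        return started

    def finish(self, name: str) -> None:
        """Mark a running job as done, freeing one slot."""
        self._running.discard(name)

    def cancel(self, name: str) -> bool:
        """Remove a job that is still waiting; return True if one was removed."""
        before = len(self._queue)
        self._queue = [job for job in self._queue if job.name != name]
        heapq.heapify(self._queue)
        return len(self._queue) < before


launcher = JobLauncher(max_parallel=2)
launcher.submit("load_gff", priority=10)
launcher.submit("blast", priority=1)
launcher.submit("index", priority=5)
launcher.submit("report", priority=5)

first_batch = launcher.start_ready()    # two slots, best priorities first
launcher.finish("blast")                # one slot is free again
cancelled = launcher.cancel("report")   # drop a job that never started
second_batch = launcher.start_ready()
print(first_batch, cancelled, second_batch)
```

Sending jobs to a separate system, as requested above, would replace the in-memory `_running` set with calls to that system. The sketch only covers ordering and the concurrency cap.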
- Import/export module: extend Tripal Download API? (Valentin)
- Sofia uses Perl to write SQL
- Ethy uses Perl
- MainGroup lab has a PHP Chado loader https://www.rosaceae.org/mcl
- RNAseq has its own internal loader in PHP
- Data checking as you go, Perl interactive loaders
- Also need the ability to share these loaders with other groups
- Perhaps it would be useful for the API to be a class-based PHP library so that it would not need Drupal
- Mention this on the mailing list
- Some people have a Tripal site solely for loading data, not for content display
- Tripal cv loader fails for GO.
- Probably due to a Chado stored procedure
- Disable the stored procedure, then run Chado’s Perl set-cvtermpath script by hand? ISU will test.
- Tree Visualization of multiple species relationships using the organism and phylo tables (Chris)
- There is a newick file loader in Tripal 2.1 that populates the phylo* tables
- This one will pull lineage from NCBI Taxonomy; documentation is embedded in the bulk loader documentation. Taxonomy/Organism linker (http://tripal.info/node/109)
- Can be used to store taxonomy tree
- Has a visualization of the tree
- LegumeInfo phylotree module here.
- This one more based on gene families
- How indices of phylonodes are computed: http://archive.oreilly.com/pub/a/network/2002/11/27/bioconf.html
- Also have a look at the picture there: http://archive.oreilly.com/pub/a/network/2002/11/27/bioconf.html?page=2
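As a rough illustration of how phylonode indices can be computed, the sketch below assumes the nested-set scheme: one depth-first walk assigns each node a left index on entry and a right index on exit, so every descendant's interval lies strictly inside its ancestor's. The example tree, the function names, and the mapping onto Chado's phylonode left and right index columns are assumptions for illustration, not taken from the Tripal loader.

```python
def assign_indices(children: dict, root: str) -> dict:
    """Return {node: (left, right)} from a single depth-first traversal.

    `children` maps a node name to the ordered list of its child names.
    Recursive for brevity; very deep trees would need an explicit stack.
    """
    indices = {}
    counter = 0

    def visit(node: str) -> None:
        nonlocal counter
        counter += 1
        left = counter                 # assigned when the node is entered
        for child in children.get(node, []):
            visit(child)
        counter += 1
        indices[node] = (left, counter)  # right index assigned on exit

    visit(root)
    return indices


def descendants(indices: dict, node: str) -> list:
    """All nodes whose interval lies strictly inside the given node's interval."""
    left, right = indices[node]
    return sorted(n for n, (l, r) in indices.items() if left < l and r < right)


# Hypothetical example tree: root has clades A and B; A has leaves a1 and a2.
tree = {"root": ["A", "B"], "A": ["a1", "a2"]}
idx = assign_indices(tree, "root")
print(idx)
print(descendants(idx, "A"))
```

With indices stored this way, fetching a whole subtree becomes a single range comparison instead of a recursive walk over parent links.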
- Tripal 3 and entity permissions based on drupal users and roles (Sofia)
- Will be very similar but will be on a per-entity basis
- Would like the permissions moved as well
- Will it handle permissions on a per-node basis?
- Need documentation on how to do this in Tripal 3
- Some people are using Organic groups
- Galaxy module and Data Exchange
- Trying to integrate Tripal with Galaxy so that users can see a Tripal interface but run Galaxy workflows
- Uses webforms to create the interface for workflows
- Module queries Galaxy and creates the webform for it
- Then admin can go in and tweak various things like defaults and help text
- Parameters can be re-arranged and grouped, etc.
- BLAST Module https://github.com/tripal/tripal_blast
- How many people are using it?
- Is it meeting your needs?
- BLAST module new feature: filter database list by organism. (Sofia)
- Overview of BLAST at PeanutBase and LegumeInfo here.
- Slides also show CViTjs
- Shows features of a gff3 file in the context of the whole genome
- blast https://docs.google.com/presentation/d/1iKnHgVyeGWe2pE2OFrjTD_FHb7yTIGslWGiPqQc5qX4/edit?usp=sharing
- BLAST at KnowPulse (current core module)
- Consumption of Web services planned, especially of CoGe, which has a nice REST API and a way of grouping target databases that are of interest to your users.
- Question from LegFed meeting: if target is a set of genomes, could multiple genomes be displayed on multiple instances of CViTjs?
- Metadata submission system (Chris)
- Drupal forms to control submission of metadata
- Data is not in Chado
- Chado Multi-chado
- Needs reviews on Drupal.org - please help by reviewing this module!
- Supports multiple Chado databases attached at the same time
- Each user/session can only access one Chado at a time
- Elastic search
- Site-wide search - much faster than Drupal Views
- Dorrie’s group wants to be able to use it with multiple sites on the same server
- Now available
- Expression module
- Creates biomaterials (reflects NCBI’s concept)
- Separately you can load in your gene expression data - beautiful heatmap visualization
- Has an independent page where you enter a number of genes
- Builds a two-dimensional heatmap
- Basket/cart functionality would be helpful
- Does anyone want biomaterials separate from expression?
- Search by normalized values above or below a threshold
- Might want to store p-values
- Sofia volunteered to be a tester for search functionality
- Hackathon produced an implementation of the Tripal Download API to download the expression values for a given node
- New functionality for BLAST analysis and InterPro, and an upgraded GO module
- Basket/Cart functionality
- What are examples of existing carts on biological (or not) sites that people like?
- Valentin - how is flag module working out? Should we continue to leverage that?
- Flag module - working well with Musa to hold stock and 3 other entities, can apply actions on cart
- Long term goal: workspace with multiple user-provided and db-source datasets
- Think about a cart focused on entities rather than nodes so that it works with Tripal 3
- Do we want to ensure a cart can only contain one subtype (i.e., only features (Tripal 2) or only genes (Tripal 3))?
- What do you love/hate about Tripal?
- Difficult to become proficient
- Drupal can be challenging to learn.
- Very large (improved in Drupal 8/Tripal 3?)
- Reusability of code and functionality
- Customizable
- Need a sustainability plan
- NSF research coordination network grant
- Stephen has ideas from the DIBBS meeting about long-term funding goals for NSF
- Chado (several versions) instantiation/update made easy
- Online documentation
- Perhaps develop documentation standards for Tripal modules
- Currently have tripal.info - could pull in extension docs as well
- Is there a way to leverage Read the Docs functionality to combine all the documentation together?
- Mailing lists
- Idea: remove all mailing lists except announce. All bugs/requests would be routed through the issue queue on GitHub
- Seems OK with everyone
- People can become part of the organization and auto-watch all repos without adding their own repos
Feature requests:
- InterProScan HTML output: Can we add functionality to incorporate this into the InterProScan feature node/entity tab? (Sofia)
- This now has visualizations thanks to A. Bretaudeau
- Feature request: InterProScan module - being able to add terms other than GO terms, for example Reactome or KEGG. (Sofia)
- These are native options to IPS, need to add code to deal with these new options.
- The KAAS/KEGG loader requires output that is no longer available from KEGG. I have hacked a method for downloading the KEGG output with a shell script using curl, and I have modified the KEGG loader to work with this newly downloaded data. But I lose all the links (the newly downloaded data does not have any links). It would be nice to get the links back in. (Sofia)
- I have an undergraduate student working on this, hope to have something to release within a month or two (Meg)
Possible problems/bugs:
- Issues with updating custom cv terms. Might be my fault. Can I reload my OBO files to update terms? I get errors when trying this. (Sofia)
Random questions:
- Can the feature names in Tripal/Chado contain more than one word? Some NCBI GenBank entries have multi-word names/definitions and do not have a gene symbol. (Sofia)
- Yes they can. Many of mine do :-) Assuming you mean can they contain spaces?
- Thank you!
- Can someone talk about the JBrowse Tripal module? (Sofia)
- Example gene page with JBrowse iframe: https://i5k.nal.usda.gov/OFAS025035
- NOTES: https://docs.google.com/document/d/12EYDxl9gC7nHHJsXkaRAPrpRnFvX1nmyjEE8bzQHDtY/edit?usp=sharing
Meeting Type