Tripal Developer's Meeting 2014-10-07 |
Tripal Community Meeting Notes |
-
Recently funded projects for Tripal development.
-
NSF DIBBS:
-
http://www.nsf.gov/news/news_summ.jsp?cntn_id=132880
-
http://nsf.gov/awardsearch/showAward?AWD_ID=1443040&HistoricalAwards=false
-
API for cross-communication and web services.
-
Integrate Galaxy with Tripal.
-
Software defined networking.
-
NRSP: Breeding resources, Customizable query building, open-source applications for field collections of breeding data and linking with Tripal.
-
NSF PGRP
-
Tripal Workshop at PAG: Call for Extension Modules to be presented
-
Existing modules
-
ND Natural Diversity Genotypes (U Sask)
-
Blast Tripal Modules (U Sask/Univ Iowa/WSU)
-
Tripal Germplasm (U Sask)
-
Functional Annotation Modules (KEGG/InterPro)
-
JBrowse module (U Sask)
-
Tripal Daemon module (U Sask) (forgot to mention in call, added afterwards)
-
What’s coming
-
EImage Module (WSU)
-
QTL Module (Iowa / WSU) (See demo later today at PeanutBase under QTL tab).
-
Main lab Customizations for Genotypes/Searching (WSU).
-
Network Module / Expression data (Discuss with Mara / Kris)
-
Modules included in other talks: Andrew’s Phylotree module (NCGR) in Ethy’s talk.
-
Module Development: Drupal "git vetted user role"
-
Please email mailing list to have other users review your modules. We can review each other’s code.
-
To place your module on Drupal.org and to have permission to release full projects follow the steps on this page: https://www.drupal.org/node/1011196
-
Other ongoing projects
-
GenSAS: automated and manual structural and functional annotation of genomic sequence. Integrates with WebApollo. Probably release Q1 of 2015. With Tripal integration coming later. (WSU)
-
WebApollo Tripal module (needs lots of work) (WSU)
-
JBrowse Tripal module for integrating JBrowse with Tripal (Univ Sask)
-
What else is needed?
-
Nelson: Issues of access permissions for specific data types.
-
Valentin: uses chado controller, works with feature table primarily, and allows you to restrict access to specific records. Thinking about creating a Tripal module for install for chado controller.
-
CHADO Controller URL: http://www.gnpannot.org/content/chado-controller
-
Added Issue to track issue: https://www.drupal.org/node/2352859
-
Feature requests.
-
Mara: More command-line ways to interface with Tripal. Specifically because she has pipelines to interact with Chado database. (i.e. creating organism/feature pages, etc). Issue added at: https://www.drupal.org/node/2314235
-
Mara: Blast loader requires a database cross-reference. Can we handle blasting without specifying a database? Issue added at https://www.drupal.org/node/2352329
-
JBrowse integration example: http://coffee-genome.org/jbrowse
-
Chado
-
Scott: Updated release perhaps by end of 2014.
-
Mara: Group module, new update with Flybase requests.
-
TODO:
-
Next meeting have a shared google doc.
Next Month’s Agenda
-
Inclusion of new extension modules as part of Tripal (if they are generic enough)
- Should Tripal be broken into separate modules?
|
Tripal Developer's Meeting 2014-12-02 |
Tripal Community Meeting Notes |
-
Reminders
-
Module Development: Drupal "git vetted user role"
-
Please email mailing list to have other users review your modules. We can review each other’s code.
-
To place your module on Drupal.org and to have permission to release full projects follow the steps on this page: https://www.drupal.org/node/1011196
-
Funded Projects:
-
NSF DIBBs (PI Ficklin)
-
Tripal Exchange Module: REST web services. Work begins now. First developer’s meeting held mid December. All invited. Watch for forthcoming email.
-
Andrew Farmer implemented API for gene family module (based on phylotree module).
-
Ethy has had requests for web services for QTL module.
-
Tripal Galaxy Module: Work will begin later in 2015.
-
Tripal SDN Module: exploration of SDN technology for faster data transfers between Tripal databases.
-
USDA NRSP (PI Main). Formal list of items coming next month. Started in October.
-
USDA SCRI (PI Main). Formal list of items coming next month. Started in October.
-
NCBI parser/loader/Gene/Gene family module (year 1)
-
Tripal module for RightField (done in year 1)
-
Tripal Module for Field Book (breeding data application for collecting phenotype data (pictures, measurements)) (year 2). Any relation to IBP field book? Will be completely open-source.
-
Customizable Query builder tool (year 1 & 2)
-
Conversion of Breeders Toolbox to be Tripal compatible.
-
breeding data management (year 1-3)
-
other breeding decision tools (year 2-5)
-
Andrew, Ethy, Steve,... NSF PGRP proposal outstanding. Legume database federation.
-
Noble Foundation: Alfalfa. Extension of Chris Town's Medicago database.
-
Job Postings:
-
Post on Tripal Site (with expiration date) if desired.
-
Post on GMOD Site (coordinate with Scott).
-
Current Openings:
-
Web Developer at WSU for DIBBs project.
-
Upcoming Activities:
-
Posted on Tripal Website. Let us know and we will add them.
-
Tripal PAG Workshop. Jan 11th, Sunday at 4:00-6:15pm PST.
-
Extension Modules presented by Lacey:
-
ND Natural Diversity Genotypes (U Sask)
-
Blast Tripal Modules (U Sask / Univ Iowa / WSU)
-
Tripal Germplasm (U Sask)
-
JBrowse module (U Sask)
-
Tripal Daemon module (U Sask)
-
EImage module (if finished in time)
-
Phylotree module ??
-
QTL module ??
-
GMOD PAG Workshop. Jan 14th, Wed 10:30-5:00pm PST.
-
GMOD Online Training / Summer School ?
-
GMOD Annual Meeting ?
-
Others ??
-
BOSC: Dublin Ireland. 2 days before ISMB conference. Provide support for hackathons.
-
DIBBs web services design meetings (Start mid Dec 2014)
-
Any conferences where your work will be presented?
-
Continued Tripal Developer Meetings (change to every other month rather than monthly?)
-
Interleave: a more topic-focused meeting one month and the larger, broader meeting every other month.
-
Extension Modules Under Development
-
QTL Module (Iowa / WSU / USask)
-
Main lab Customizations (WSU).
-
Network Module / Expression data (WSU / Clemson; Discuss with Mara & Kris).
-
GenSAS: automated and manual structural and functional annotation of genomic sequence. Integrates with WebApollo. Probably release Q1 of 2015. With Tripal integration coming later (WSU).
-
WebApollo Tripal module (needs lots of work) (WSU).
-
VCF Loader (WSU).
-
Pedigree/Relationship Viewer (USask).
-
Does anyone need tree-like views of relationship data?
-
Ontologies are configurable via the interface.
-
Using JavaScript InfoViz Toolkit (JIT).
-
Check the phylotree module from Andrew’s group. Visualization based on D3 libraries.
-
ND Genotypes (has a Released version; documentation needed) (USask)
-
Tripal Germplasm & Cross Management (USask)
-
Genbank Parser. Includes search tool. Will implement in group module. Includes bulk loader templates for importing Genbank files (WSU)
-
Phylotree Module (NCGR/Iowa)
-
Shopping Cart style data store (NCGR). Generating lists of items to do operations.
-
Light-weight list generating tool in Tripal and passing those/integrating with iPlant.
-
Uses LightShop Drupal module.
-
Implemented for BioMaterial module already.
-
Tripal Core Future Developments
-
What is core functionality? Should web services be incorporated as core functionality? What data loaders should come with Tripal by default (i.e. VCF, GenBank)? Should only Tripal Core module be Tripal and all other modules be extensions?
-
Reason for the question:
-
Too much stuff, is it overwhelming?
-
Takes a long time to convert to new Drupal version & Release updates.
-
Currently, Tripal is heavy on the whole genome / transcriptomics which may confuse folks who may be looking to construct a resource for storing stocks/germplasm, breeding resources, etc.
-
Maintenance: hard to keep up with all these modules on the same release schedule.
-
Pros for separating
-
Allow for more releases, more often
-
Simplified code base to convert as new versions of Drupal are released.
-
Users only need to download the modules they need. They don’t get a bunch of other modules they don’t need.
-
Forces dependencies to be properly specified (currently all come together).
-
Cons:
-
Increases the difficulty of installation: users must be more aware of dependencies and which modules to install.
-
Concerns:
-
The core modules must remain generic for common usage. By making other modules not “core” (e.g. feature) there is the risk of losing oversight and loss of generic functionality. Who maintains?
-
* TODO: plan out what should go into core, what should be separate (specific to a certain subgroup).
-
RSS feed on Tripal.info that lists extensions available.
-
Add to the Tripal core module a web interface to parse that RSS feed and show users all available modules.
-
Tag modules with categories to help select modules for installation.
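A minimal sketch of how a site might read such a feed and filter extensions by category tag. It assumes a plain RSS 2.0 layout in which each extension is an item carrying one or more category elements; the sample feed, its links, and the function name list_extensions are hypothetical, since these notes do not describe the actual tripal.info feed format.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample feed; the real tripal.info feed may be structured differently.
SAMPLE_FEED = """<rss version="2.0">
  <channel>
    <title>Tripal Extensions (example)</title>
    <item>
      <title>Example BLAST module</title>
      <link>http://example.org/blast</link>
      <category>genomics</category>
    </item>
    <item>
      <title>Example germplasm module</title>
      <link>http://example.org/germplasm</link>
      <category>breeding</category>
      <category>genetics</category>
    </item>
  </channel>
</rss>"""


def list_extensions(feed_xml, category=None):
    """Return (title, link, categories) per feed item, optionally filtered by category."""
    root = ET.fromstring(feed_xml)
    results = []
    for item in root.iter("item"):
        title = item.findtext("title", default="")
        link = item.findtext("link", default="")
        categories = [c.text for c in item.findall("category")]
        # Keep the item when no filter is given or when it carries the requested tag.
        if category is None or category in categories:
            results.append((title, link, categories))
    return results


breeding_modules = list_extensions(SAMPLE_FEED, category="breeding")
```

The same category filter could drive the installation profiles discussed next, with each profile simply naming the tags it wants.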
-
Installation Profiles:
-
RSS feed provides profile setup details.
-
pre-configured set of modules by need (e.g. genomics site, breeding site, genetics site, etc…).
-
functions just like a manually installed Tripal site with the ability to add additional modules, disable core ones, etc.
-
Data access permissions.
-
Proposed expansion of web services to include authentication/access NSF PGRP (PI Main @ WSU)
-
Current solution: chado controller
-
Expanded Drush commands for full command-line management -- no current funding to do this. Must be done step-by-step.
-
Tripal changes for Drupal 8
-
Use of Drupal Entities (object oriented)
-
Provides an API for every chado table.
-
Store results of queries into a “Shopping Cart” style.
-
Chado
-
Updated release
-
List of new tables
-
* Progress getting into git.
-
Group Module
-
iPlant Data Commons (Steve Cannon)
-
A new feature at iPlant. Website development toolkit within iPlant. Tools for sharing, managing, and publishing data to be used with iPlant analysis tools. Connects to workflows, metadata, etc.
-
Looks like a new GMOD-style toolkit. For creating custom websites within iPlant.
-
* Talk with iPlant during development to have design interaction.
-
Perhaps PAG Meeting to discuss.
-
Issues
-
Tripal GFF3 loader doesn’t create polypeptide features.
-
Topics for future meeting
-
Open up discussion of germplasm / cross management / breeding.
- Javascript graphing visualization library.
|
Tripal Developer's Meeting 2015-01-06 |
Tripal Community Meeting Notes |
Gerard: Status of Chado/Tripal.
-
Not sure of Chado status; need to ask Scott.
-
Tripal: exploration of future development for object oriented and restructuring of how modules are delivered and support of other data storage mechanisms.
-
Development of Web Services
-
Integration with Galaxy.
Nathan: ETA for 2.0 stable release. Tripal doesn’t yet have unit testing.
Gerard: Security Issue for D7 sites.
Solutions:
-
For sites without active addition of content: have a staging site for checking before production. Have a scheduled release for updates.
-
For sites with active addition of content: not sure what to do. Need a solution.
-
Stay on top of updates. Install them as quickly as they come out.
Ethy/Steve: Kick-off meeting to construct the Tripal QTL module, at PAG @12pm.
-
To help with conformity between: Ames sites, WSU sites, U. Sask.
-
Blast module: will bring info for that to PAG as well.
Stephen F: Will the Phylotree module be released soon?
Sunday evening dinner. Walking distance. New fresh food restaurant in the mall towards the far end of the mall.
Action Item
-
Ask Scott about current status of Chado.
-
Invite to web services discussion at PAG.
-
Get an ETA for a 2.0 expected release.
- How to restore sites from a potential hack event if content is actively added.
|
Tripal Developer's Meeting 2015-02-03 |
Tripal Community Meeting Notes |
-
Thanks to PAG speakers!
-
Status of Chado
-
Ideas for changes: changes to organism table for infraspecific names; Group module needs some further looking at; adding property tables for existing tables (also how to deal with units); Eimage table; db table in its own module. And some changes requested on the mailing list.
-
Will make document of changes public for other comments.
-
Idea: deal with easy issues (organism table, property tables) for a quick release and more difficult issues in future release.
-
Move to GitHub… request for any further commits.
-
GMOD Summer School (Tripal workshop) + GMOD Meeting
-
GMOD/Google Summer of code
-
“Google Summer of Code is a global program that offers student developers stipends to write code for various open source software projects”
-
http://gmod.org/wiki/GSoC
-
Status of Tripal v2.0 stable release. Look for release in March.
-
Tripal extension modules currently under development
-
QTL (U. Iowa, WSU, U. Sask)
-
Need to pull together a vocabulary and then a list of requirements for feedback from other groups.
-
Waiting on feedback of template file.
-
BLAST (U. Iowa, U. Sask)
-
plan to put development code in GitHub and move to Drupal repository once it’s ready for general use.
-
Separate but somehow combined to reduce confusion.
-
Networks (WSU, Clemson)
-
Web Services (WSU + community)
-
Demo currently online, using AGAVE + HAL format.
-
Need to address data privacy issues within the web services.
-
differential access (different groups have access to different things).
-
Anonymous registration to track usage.
-
Phylogeny (NCGR, U. Iowa)
-
Make public soon.
-
Pedigrees (U Sask)
-
d3 support (U Sask)
-
want to support basic pie/bar charts, feature location diagrams, etc.
-
KEGG module issues (WSU)
-
Ecommerce style shopping cart (NCGR) for collecting data. Currently gene oriented; has some sequence download functionality.
-
Searching?
-
Add a section on how to deal with searching to the online documentation?
-
Valentin Guignon provided a suggestion for integration with ElasticSearch and Solr and provided some example code: https://www.drupal.org/node/2214775
-
Need a single search box capability (google type approach). Perhaps something like Entrez.
-
Setup Improvements?
-
Installation, ease of use, recommended improvements?
-
Installation profiles vs Distribution
-
Installation profiles can be executed after Drupal is setup to download everything needed.
-
Distribution is a full package with everything needed that can contain an installation profile to complete some additional steps.
-
What kind of “Tripal” Distributions/Installation profiles should we offer?
-
Genomics site for whole genomes/transcriptomes?
-
Breeding-specific site?
-
E-commerce/stock site?
-
Need tutorials for these.
-
Other thoughts
-
Not include default organisms, analyses. Or ability to control what data gets preloaded.
-
Ethy offers to provide feedback from her group.
-
Include a tour in distributions: https://www.drupal.org/project/joyride
-
Change to gene/mRNA default page: merge into a single gene page?
-
difficult to traverse feature relationships (mostly gene centric).
-
Yes, combine, at least for genes, into a single gene page.
-
Chris Childers: i5K would be willing to share their efforts as a potential default solution.
-
TODO:
-
Add a section to the Tripal tutorial about permissions… how to make some data private for certain groups.
|
Tripal Developer's Meeting 2015-03-03 |
Tripal Community Meeting Notes |
-
VCF “non” loader.
-
Associating files with data.
-
Use Cases:
-
BAM files for RNA-Seq / integrated with Expression Modules
-
VCF
-
Whole genomes: FASTA, GFF, Excel.
-
Multi-Chado and Chado Controller for Tripal: some updates (about the projects) for the community (by VG).
-
View initial design: https://drive.google.com/open?id=1q4rWcncmqvDseV3Np6NldoYKFIikXMhNqvXAUQEzbTc&authuser=0
-
Chado Controller
-
Sandbox: https://www.drupal.org/sandbox/guignonv/2428743
-
Enforces constraints at the PostgreSQL level (not Drupal)
-
Chado history: extension of Chado Audit module.
-
Access restrictions: limits read/write access at a granular level.
-
Works on feature table at the moment but will support other tables soon.
-
‘feature’ table is renamed to ‘feature_data’, and a ‘feature’ view takes the place of the feature table. PostgreSQL user accounts are not allowed to query the ‘feature_data’ table but can query the ‘feature’ view.
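The rename-and-view arrangement can be sketched as follows. This is only an illustration under stated assumptions: it uses SQLite as a stand-in for PostgreSQL, so the per-account permissions that block direct access to ‘feature_data’ are not reproduced, and the is_public column used as the filter is hypothetical, since the notes do not say how Chado Controller decides which records an account may see.

```python
import sqlite3


def build_restricted_feature_view(conn):
    """Rename feature to feature_data and expose a filtered view named feature."""
    conn.execute("ALTER TABLE feature RENAME TO feature_data")
    # Assumption: a hypothetical is_public flag decides visibility. In PostgreSQL
    # the view would be granted to users while feature_data itself is not.
    conn.execute(
        "CREATE VIEW feature AS "
        "SELECT feature_id, name FROM feature_data WHERE is_public = 1"
    )


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE feature ("
    "feature_id INTEGER PRIMARY KEY, name TEXT, is_public INTEGER)"
)
conn.executemany(
    "INSERT INTO feature (name, is_public) VALUES (?, ?)",
    [("gene_a", 1), ("gene_b", 0), ("gene_c", 1)],
)
build_restricted_feature_view(conn)

# Existing queries against "feature" keep working but only see permitted rows.
visible = [row[0] for row in conn.execute("SELECT name FROM feature ORDER BY feature_id")]
total = conn.execute("SELECT COUNT(*) FROM feature_data").fetchone()[0]
```

Because the view keeps the original table name, code that queries ‘feature’ does not need to change, which is presumably why the restriction can live at the database level rather than in Drupal.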
-
Annotation Inspector
-
Allows for review of annotations made by reviewers.
-
Allows automating some changes according to the data that has been modified by somebody (e.g. auto-change a color property according to a gene annotation state).
-
Still some redesign to be done and it will be renamed into “Chado Inspector”.
-
Use cases:
-
could be run by an annotator on his favorite genes to check his work.
-
could be run on the whole database by an admin to have statistics and see what’s good and not so good.
-
automate tasks.
-
force some changes regardless of what the users asked; for instance the “owner” property of a gene, so the annotator who changed the gene can’t use a different name than his user name (i.e. sometimes “John Smith”, “JOHN”, “jsmith”, whatever... only his real user name will be used regardless of what he entered as a value for the “owner” property).
-
Performance?
-
Audit module has no noticeable impact.
-
Annotation inspector is not optimized, and causes slowness depending on the number of rules and elements on which the rules are checked.
-
Access restrictions, not fully explored in the context of Tripal usage. May have noticeable impact. Impact might only be at the first connection time.
-
Available as an “alpha” Drupal Sandbox module.
-
Multi-Chado access (separate from Chado Controller)
-
Sandbox: https://www.drupal.org/sandbox/guignonv/2429515
-
Needs updates to core module to support this.
-
Access restriction will support multiple chado instances.
-
Use Cases:
-
a public and a private version of the same Chado database in two different schemas, “chado” and “chado_private”; the Tripal site is public; anonymous people will only see what’s in the “chado” schema; users that are logged in will see what’s inside “chado_private”;
-
Different funding sources necessitate separating database.
-
Possible use for access control for web services as long as the access can be limited through the PostgreSQL access layer (GRANT/REVOKE) at the PostgreSQL user level (e.g. “REVOKE ALL ON TABLE feature FROM some_user;”).
-
Improvements / Changes to standard modules.
-
Instead of a TOC have all data in a single page
-
Publication module: Need viewers for all linking tables.
-
Stock images attach to stock pages and link to GRIN.
-
Need to accommodate genetic maps created from crosses of multiple species (some may be synthetic).
-
multi-species and synthetic crosses in stock table
-
required organism field in stock table points to a genus organism record
-
individuals in stock table are linked to multiple organisms (perhaps via a new stock_organism linker table)
-
featuremap records are linked to stocks via a custom linker table, featuremap_stock.
-
features (e.g. markers, linkage groups) in the genetic map are assigned to the genus organism record since the organism field must be set in the feature table.
-
need perhaps an accommodation in Chado to support this.
-
Additional use case: interspecific crosses (e.g. Lens culinaris by Lens ervoides) cannot currently be stored in Chado since you have to pick a single organism for each stock.
-
Stable 2.0 version still planned for March.
-
Next meeting Apr. 7: a formal agenda meeting.
|
Tripal Developer's Meeting 2015-04-14 |
Tripal Community Meeting Notes |
-
Status of Tripal stable release v2.0 -- May 31st.
-
Changes to tripal.info site.
-
Future Updates for Tripal
-
Entities for Drupal 8.
-
Installation profiles.
-
Make links for these notes online at tripal.info.
-
Lacey has a generic installation profile. It’s close to being usable already. Lacey will reinvigorate efforts on this. She’ll explore including this in the stable release.
-
Changes to default pages.
-
Addition of a Tripal module repository resource on tripal.info to make finding extensions easier.
-
Add a list of modules being worked on to help reduce duplication of effort.
-
Folks should also list new modules on listserv
-
Should add a document for how to share modules on tripal.info.
-
Integration of Andrew Farmer’s (NCGR) Phylonode module.
-
Unit Testing (to speed testing for release).
-
Documentation Updates
-
Add instructions for setting permissions on page level
-
Add instructions for integrating non-Drupal Search
-
Is the documentation useful? How to improve?
-
Tripal default page interface
-
Ethy, Meg: don’t like the collapsible sidebar.
-
Ethy suggested wrapping panels under the sidebar if space is available.
-
Ethy likes the breadcrumb trail. But how can we design this?
-
Steve mentioned the circular pages to gene/mRNA pages.
-
On gene pages give this information for the primary transcript by default.
-
Chris: i5K folks will be willing to share styles and templates.
-
Need to add a place to share templates on tripal.info, e.g. as an extension theme.
-
Need a way to more easily enhance templates on a “type” basis (for features, stocks, etc...)
-
Need to add missing templates for the pub linker (and other linker) tables.
-
Any new work? Questions?
-
Status of Extension Modules
-
QTL (U. Iowa, WSU, U. Sask)
-
Sook and Ethy are trying to get it started up again. Nice to have more involvement. Need someone to implement it as there isn’t much time; perhaps in the summer. Goal is to have a working draft by PAG 2016.
-
Perhaps expand into genetic module to include markers and maps.
-
Meg expressed interest in being involved.
-
Lacey expressed interest in keeping genetic markers separate from QTLs.
-
BLAST (U. Iowa, U. Sask)
-
Networks (WSU, Clemson)
-
Web Services (WSU + community)
-
Phylotree (NCGR, U. Iowa)
-
Pedigrees/d3 support (U Sask)
-
https://www.drupal.org/project/tripald3
-
There is a current production release at the URL above
-
Future plans include moving pedigrees out to a separate module (perhaps in the same download) and further developing a Tripal D3 API for developers allowing for consistent themeing, pop-ups, etc. between modules.
-
KEGG module issues (WSU)
-
Chado Controller (Bioversity International)
-
Multi-Chado (Bioversity International)
-
Ecommerce style shopping cart (NCGR) for collecting data. Currently gene oriented; has some sequence download functionality.
-
File module (associating images too). (WSU)
-
Daemon API/Drush Daemon API / Tripal Daemon (USask)
-
Others?
- Next Meeting May 5th, an informal agenda.
|
Tripal Developer's Meeting 2015-05-05 |
Tripal Community Meeting Notes |
-
Meeting notes are now archived on the Tripal.info site: http://tripal.info/meetings/developers/summary
-
Strategies for dealing with large numbers of features: memory and slowness issues. (EC)
-
Related to Tripal issue https://www.drupal.org/node/2463211.
-
Problematic pages:
-
feature admin.
-
Timeout in the first instance has been traced to very slow queries to count, retrieve, and sort all features.
-
FIX:
-
change the select boxes to text boxes in the view.
-
disabling the sorting will speed up the query.
-
loose index scan is faster than a select DISTINCT. See notes posted by Nathan below for solution to code.
-
The pagination does one query to get the count and a second to populate the page.
-
We can setup the view so we only display results when the user presses the submit/filter button
-
organisms with many features
-
Feature browser: not very useful (Ethy). Maybe have some sort of example page to replace the feature page. Perhaps a block.
-
Feature summary.
-
maps with many features.
-
possible solution would be to limit the nested objects that are returned from chado_generate_var().
-
Problematic processes:
-
Sync-ing (it is possible to go directly to sync page, which by-passes the time-out on the feature list pages).
-
Updating URLs (though much improved now that it no longer runs out of memory).
-
Want to select a subset of features (by type, organism, or simply a count).
-
Removing orphaned features.
-
Runs out of memory.
-
Partition feature table some way? Or the queries? Alternative methods for counting records?
-
Suggest using a bigserial for primary keys in Chado.
-
Need to explore table partitioning in Chado. Valentin will be potentially exploring this.
-
Multiple schema? (E.g. one per organism) But how to deal with foreign keys across schema? Could feature be replaced with a materialized view?
-
For questions please contact Valentin for how this could be done.
-
Statistics for the database: Taein Lee is working on a module to show the number and amount of data types. To provide information about structure in the database in relation to classes.
-
Sync-ing a large number of features/stocks in a bulk fashion.
-
memory problems on syncing stocks.
-
drush command. **
-
You can limit the number of stocks you want to sync at one time
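The idea of capping how many stocks are synced per run can be sketched as below. This is not the drush command itself, whose options are not recorded in these notes; the function name sync_limited and the sync_one callback are hypothetical stand-ins for whatever performs the actual sync.

```python
def sync_limited(pending_ids, limit, sync_one):
    """Sync at most `limit` ids from pending_ids and return the ids still pending."""
    if limit < 1:
        raise ValueError("limit must be at least 1")
    to_sync, remaining = pending_ids[:limit], pending_ids[limit:]
    for stock_id in to_sync:
        # sync_one is a placeholder for the real per-record sync step.
        sync_one(stock_id)
    return remaining


synced = []
remaining = sync_limited([101, 102, 103, 104, 105], limit=2, sync_one=synced.append)

# Repeating the call until nothing is pending processes everything in bounded runs.
runs = 1
while remaining:
    remaining = sync_limited(remaining, limit=2, sync_one=synced.append)
    runs += 1
```

Keeping each run small bounds the work held in memory at once, at the cost of needing several invocations for a large table.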
-
Search Engine for Chado: https://www.drupal.org/node/2214775
-
feedback on the use of elasticsearch search API module (by Valentin)
http://www.crop-diversity.org/mgis/accession-search
-
The ElasticSearch Drupal module ( https://www.drupal.org/project/search_api_elasticsearch) is new and under development.
-
More details: https://www.drupal.org/node/2214775#comment-9895567
-
Has anyone worked on incorporating ontologies into searches for Tripal? (ADF).
-
A project is in the works at Ames to take advantage of ontology hierarchy to search for data objects attached to ontology terms. Stephen is doing something similar with REST services. Will collaborate.
-
Add to the Tripal User’s Guide a section on options for setup of Searching.
-
USDA/ARS is working on integrating Solr into the Tripal site to index Drupal/Chado and other file types.
-
Two modules: Drupal Solr (well supported) and an additional module to add file support (not well supported)
-
Mailing lists (gmod-tripal vs gmod-tripal-devel).
-
Updates to Extension page on tripal.info. Forthcoming interface on Tripal sites for easier installation.
-
Issues we didn’t cover, but will be on agenda for next month’s meeting:
-
Downloads: it would be nice to have downloads built in to Tripal, especially downloads of search results, configurable as to which fields from the record pages to include. In the meantime, what have other groups done? (EC) NB: this relates to an issue I raised for our site regarding the views-data-export module listed as a "Highly Recommended Module" in the README that comes with tripal (ADF)
-
Permissions: having to manually reset permissions for modules on install seems to trip us up frequently during development/deployment. Is there a reason that modules should not have some default permission settings and if not, is there a way to effect this? (ADF)
Action Items
-
Stephen will fix the slowness on the admin pages and memory issues for stable release v2.0
-
Stephen will experiment with bigserial for pkeys in Chado.
Loose Index Scan Notes:
It turns out that the primary bottleneck in the 3rd query is the "SELECT DISTINCT":
SELECT distinct(type_id) FROM chado.feature
This is apparently implemented by doing a sequential scan on the chado.feature table:
drupal=> explain SELECT distinct(type_id) FROM chado.feature;
QUERY PLAN
------------------------------------------------------------------------
HashAggregate (cost=176753.74..176753.85 rows=11 width=4)
-> Seq Scan on feature (cost=0.00..165521.79 rows=4492779 width=4)
I stumbled upon the following optimization, called a "loose indexscan", which works well when there are many rows containing few distinct values:
https://wiki.postgresql.org/wiki/Loose_indexscan
Using that to implement a faster equivalent to the 3rd query results in a run time of ~5 milliseconds (vs over 13 _seconds_):
LOG: statement: WITH RECURSIVE t AS (
SELECT MIN(type_id) AS type_id FROM chado.feature
UNION ALL
SELECT (SELECT MIN(type_id) FROM chado.feature WHERE type_id > t.type_id)
FROM t WHERE t.type_id IS NOT NULL
)
SELECT cvterm_id, name FROM chado.cvterm WHERE cvterm_id IN
(SELECT type_id FROM t WHERE type_id IS NOT NULL) ORDER BY cvterm.name ASC;
LOG: duration: 4.762 ms
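To show why the recursive query is so much cheaper, here is a small sketch of the same idea outside the database. It assumes the index can be modelled as a sorted Python list and uses binary search to jump past each run of duplicates, so the work scales with the number of distinct values rather than the number of rows. The function name distinct_by_skipping is made up for this illustration, and this is an analogy, not how PostgreSQL implements the query.

```python
from bisect import bisect_right


def distinct_by_skipping(sorted_values):
    """Return the distinct values of a sorted list, jumping past each run of duplicates."""
    distinct = []
    pos = 0
    while pos < len(sorted_values):
        value = sorted_values[pos]
        distinct.append(value)
        # Binary search for the first entry greater than value. This mirrors
        # "SELECT MIN(type_id) ... WHERE type_id > t.type_id" probing the index.
        pos = bisect_right(sorted_values, value, lo=pos)
    return distinct


# A toy stand-in for an index on chado.feature.type_id: many rows, few distinct values.
type_id_index = [7] * 1000 + [12] * 5000 + [31] * 20 + [44] * 300
distinct_types = distinct_by_skipping(type_id_index)
```

With only a handful of distinct type_id values the loop runs a handful of times, whereas a plain DISTINCT has to visit every row.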
I think we should update tripal_views_handler_filter_select_cvterm.inc to use the new query.
--- a/tripal_views/views/handlers/tripal_views_handler_filter_select_cvterm.inc
+++ b/tripal_views/views/handlers/tripal_views_handler_filter_select_cvterm.inc
@@ -67,7 +67,7 @@ class tripal_views_handler_filter_select_cvterm extends tripal_views_handler_fil
$where = ' AND ' . implode(' AND ', $where_clauses);
}
- $sql = "SELECT cvterm_id, name FROM {cvterm} WHERE cvterm_id IN (SELECT distinct(" . $this->field . ") FROM {" . $this->table . "}) " . $where . ' ORDER BY cvterm.name ASC';
+ $sql = "WITH RECURSIVE t AS (SELECT MIN(" . $this->field . ") AS col FROM {" . $this->table . "} " . ($where == '' ? '' : "WHERE " . $where) . " UNION ALL SELECT (SELECT MIN(" . $this->field . ") FROM {" . $this->table . "} WHERE " . $this->field . " > col " . $where . ") FROM t WHERE col IS NOT NULL) SELECT cvterm_id, name FROM {cvterm} WHERE cvterm_id IN (SELECT col FROM t where col IS NOT NULL) ORDER BY cvterm.name ASC";
$resource = chado_query($sql);
$cvterms = array();
|
Tripal Developer's Meeting 2015-06-02 |
Tripal Community Meeting Notes |
-
Tripal v2.0 stable release: released June 1st!
-
Overview/Progress of changes since last meeting
-
Admin pages are faster.
-
Forms don’t perform search (show data) unless submitted
-
Drop-downs populated with loose index scan (patch supplied by Nathan).
-
Feature admin page has d3 bar graph summary of features.
-
URL setting will have a ‘type’ filter setting but it is not yet there.
-
Memory leaks have been fixed for removing orphaned features and syncing features.
-
Feature browser is now deprecated in v2.0, but is still available in the event someone is using it. It will go away in some release in the future.
-
Add a drush command for syncing: This didn’t make it in the v2.0 release but an issue was added. https://www.drupal.org/node/2498995
-
Chado v1.3 should have bigserial for the pkey fields to help with large tables.
-
There should be a Chado v1.3 release soon!
-
Tripal will be Chado v1.3 compatible shortly thereafter. A new Tripal release will be made at that point.
-
Ask Scott about the type_id on the organism table.
-
Andrew: how to handle features associated at higher taxonomic ranks than genus.
-
Steve & Ethy: how to handle features at the genus level (or synthetic crosses). How to link featuremaps (currently using stocks for these): the featuremap is linked to a stock record, that stock record is linked to multiple stocks, and those secondary stocks are linked to features. Create an organism record to represent multiple species.
-
Sook suggested: markers are assigned to the organism from which the source DNA was extracted. Then the marker feature can be associated with any maps even if organism doesn’t match.
-
Lacey: for features that are for various species use ‘spp’ in species field of organism table.
-
Add / update your extensions on the tripal.info site. http://tripal.info now supports logins.
-
Upcoming Changes / Features / Events
-
Phylogeny module release.
-
Other modules with upcoming releases?
-
Contemplating migration of Tripal to GitHub for improved code sharing and forking
-
A synced copy of the code will continue to remain on Drupal.org
-
Drupal.org will continue to provide releases.
-
Possibility of splitting the core Tripal modules into separate packages for easier code sharing (e.g. phylogeny module).
-
Web services design meeting.
-
Needs?
-
Valentin: Is there a way to remove (or replace) data that was added via the bulk loader. For instance, for the stock_cvterm table. I have a stock linked to the cvterm ‘in vitro collection’ (through stock_cvterm table) and I’d like to have it linked (during an update job) to the cvterm ‘cryo-preserved’ instead. It means either removing previous relationship and adding the new one or updating the stock_cvterm.cvterm_id. Those 2 terms belong to the same CV but other terms from that CV might be used for current stock and should not be removed.
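The second option Valentin describes, updating stock_cvterm.cvterm_id in place, can be sketched in plain SQL as below. This is not a bulk loader feature; it is an illustration run against SQLite with heavily simplified tables (the real Chado cvterm and stock_cvterm tables have more columns and constraints). The function name replace_stock_term, the stock id, and the third term ‘field collection’ are hypothetical.

```python
import sqlite3


def replace_stock_term(conn, stock_id, old_term, new_term):
    """Point one stock's link from old_term to new_term; return the number of rows changed."""
    row = conn.execute("SELECT cvterm_id FROM cvterm WHERE name = ?", (new_term,)).fetchone()
    if row is None:
        raise ValueError("unknown cvterm: " + new_term)
    cur = conn.execute(
        """
        UPDATE stock_cvterm
           SET cvterm_id = :new_id
         WHERE stock_id = :stock
           AND cvterm_id = (SELECT cvterm_id FROM cvterm WHERE name = :old)
        """,
        {"new_id": row[0], "stock": stock_id, "old": old_term},
    )
    return cur.rowcount


conn = sqlite3.connect(":memory:")
# Simplified stand-ins for the Chado tables, with example data only.
conn.executescript(
    """
    CREATE TABLE cvterm (cvterm_id INTEGER PRIMARY KEY, name TEXT UNIQUE);
    CREATE TABLE stock_cvterm (stock_id INTEGER, cvterm_id INTEGER);
    INSERT INTO cvterm (cvterm_id, name) VALUES
        (1, 'in vitro collection'), (2, 'cryo-preserved'), (3, 'field collection');
    INSERT INTO stock_cvterm (stock_id, cvterm_id) VALUES (10, 1), (10, 3);
    """
)

changed = replace_stock_term(conn, 10, "in vitro collection", "cryo-preserved")
terms = [
    r[0]
    for r in conn.execute(
        "SELECT c.name FROM stock_cvterm sc "
        "JOIN cvterm c ON c.cvterm_id = sc.cvterm_id "
        "WHERE sc.stock_id = 10 ORDER BY c.name"
    )
]
```

Because the WHERE clause names the old term explicitly, other terms from the same CV that are linked to the stock are left untouched.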
-
Andrew: Need a publicly accessible way to view terms with information about parent terms (a tree to show relationships).
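A rough sketch of the kind of tree view Andrew describes, in Python. It assumes the relationships have already been fetched as (child, parent) name pairs, for example from the subject and object columns of Chado's cvterm_relationship table; the function name and the input format are hypothetical.

```python
from collections import defaultdict


def render_term_tree(relationships, root):
    """Return indented lines showing `root` and all of its descendant terms.

    relationships: iterable of (child_name, parent_name) pairs.
    """
    children = defaultdict(list)
    for child, parent in relationships:
        children[parent].append(child)

    lines = []

    def walk(term, depth, path):
        lines.append("  " * depth + term)
        for child in sorted(children.get(term, [])):
            # Skip terms already on the current path to guard against cycles.
            if child not in path:
                walk(child, depth + 1, path | {child})

    walk(root, 0, {root})
    return lines
```

Calling `"\n".join(render_term_tree(pairs, "some root term"))` gives plain text that could be dropped into a page; a real public view would more likely render nested HTML lists.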
-
Sook/Jing: How to bulk load publications that are not available in PubMed or Agricola? How about a BibTeX loader? Lacey may make a bulk loader template. Will add an issue to the Tripal issue queue.
-
Steve Cannon: EndNote exports to flat files and a Perl script is used to load those pubs. Will be happy to share that script. Can post on Tripal.info site: 1) Documentation for how to dump from EndNote and convert to flat file. 2) Use Perl loader to import.
-
Valentin: How can we integrate with the Drupal Biblio module?
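To give a feel for what the first step of a BibTeX loader might involve, here is a small Python sketch that turns simple BibTeX entries into dictionaries. It is not the planned loader: it only handles flat `field = {value}` or `field = "value"` pairs without nested braces, and the function name is hypothetical.

```python
import re

# Matches the start of an entry, e.g. "@article{somekey,"
ENTRY_RE = re.compile(r"@(\w+)\s*\{\s*([^,\s]+)\s*,")
# Matches field = {value} or field = "value" (no nested braces supported)
FIELD_RE = re.compile(r"(\w+)\s*=\s*(?:\{([^{}]*)\}|\"([^\"]*)\")")


def parse_bibtex(text):
    """Parse simple BibTeX text into a list of dicts, one per entry."""
    entries = []
    starts = list(ENTRY_RE.finditer(text))
    for i, match in enumerate(starts):
        # An entry's body runs until the next entry starts (or end of text).
        end = starts[i + 1].start() if i + 1 < len(starts) else len(text)
        body = text[match.end():end]
        record = {"type": match.group(1).lower(), "key": match.group(2)}
        for field in FIELD_RE.finditer(body):
            value = field.group(2) if field.group(2) is not None else field.group(3)
            # Collapse line breaks and repeated spaces inside values.
            record[field.group(1).lower()] = " ".join(value.split())
        entries.append(record)
    return entries
```

Each resulting dictionary could then be mapped onto publication records by whatever loader is eventually written; real-world BibTeX files (nested braces, string macros) would need a proper parser.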
-
Items from last meeting agenda that were not discussed
-
Downloads: it would be nice to have downloads built into Tripal, especially downloads of search results, configurable as to which fields from the record pages to include. In the meantime, what have other groups done? (EC) NB: this relates to an issue I raised for our site regarding the views-data-export module listed as a "Highly Recommended Module" in the README that comes with Tripal (ADF)
-
Lacey uses the Drupal module views_data_export, but she’s not very happy with it: it can be very slow and doesn’t clean up interrupted downloads.
-
Lacey is willing to share her cleanup script.
-
Ex. of use: http://www.crop-diversity.org/mgis/bulk-export
-
Chun-huai has a custom search module with a download function, which runs plain SQL and uses PHP to generate a CSV file for users to download.
-
Stephen: will add an issue to the Tripal issue queue. Update the documentation to indicate the issues with views_data_export.
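The plain-SQL-to-CSV approach Chun-huai describes can be sketched roughly as follows. This is an illustration in Python with SQLite rather than his PHP module, and the function name, table and columns in the usage note are hypothetical.

```python
import csv
import io
import sqlite3


def query_to_csv(conn, sql, params=()):
    """Run a SELECT and return the result as CSV text with a header row."""
    cur = conn.execute(sql, params)
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    # cursor.description holds one tuple per result column; item 0 is its name.
    writer.writerow([col[0] for col in cur.description])
    # Stream rows straight from the cursor instead of loading them all first.
    writer.writerows(cur)
    return buf.getvalue()
```

For example, `query_to_csv(conn, "SELECT uniquename, name FROM stock WHERE name LIKE ?", ("A%",))` would return text that a download handler could send with a `text/csv` content type. Passing search values as parameters rather than building them into the SQL string avoids injection problems.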
-
Permissions: having to manually reset permissions for modules on install seems to trip us up frequently during development/deployment. Is there a reason that modules should not have some default permission settings and if not, is there a way to effect this? (ADF)
-
Lacey: setting permissions by default may make security assumptions that other sites don't want.
-
Chun-huai: add a global setting to set default module permissions.
-
Andrew: other site-specific settings… a topic to discuss for more use cases. Stephen: add an issue so we don’t forget it.
-
Any other items?
-
Sudhansu: Help/user-guide feature for modules (e.g., Phylotree module). Need a way to provide help for users. Is working on a module (a generic include file for now) for adding ‘Help’ tabs to each content type page. The include file creates a small block at the top of a target page with a short message and an optional link to a more detailed help page. The detailed help page is included as an HTML file in the module directory, so the code is not dispersed and stays with the module for portability.
-
Lacey: Drupal is working towards interactive site tours. She is creating this for KnowPulse. The tool is a JavaScript library called Shepherd. Another JavaScript library is Joyride, but Lacey feels it wasn’t as accurate at highlighting the exact element she wanted.
-
http://github.hubspot.com/shepherd/docs/welcome/ shows an example (obviously color is configurable :) )
-
Stephen: Can use a collapsible fieldset that contains the help text.
-
Maybe meet later in July - August for a separate meeting on help.
-
Next meeting July 7th.
-
Action Items:
-
Get input on admin charts (email sent to mailing list, poll on tripal.info)
-
URL/title setting needs a type filter: https://www.drupal.org/node/2470789
-
Follow-up with Scott:
-
Make space available on tripal.info for sharing Steve Cannon et al.’s Perl script for loading publications
-
Update the documentation that lists Views Data Export as “Highly Recommended”
-
Lacey: Share drush script for cleaning up cancelled views data export tables (Mentioned in https://www.drupal.org/node/2499297 )
-
Add Issues:
-
Feature Request (Valentin): allow bulk loader to delete records
-
Feature Request (Andrew): public cvterm description pages (showing hierarchy would be helpful)
-
Feature Request (Sook/Jing, Lacey): method to bulk load publications (look into developing a BibTeX loader, explore integration with Drupal Biblio module)
-
Feature Request: Bulk Data Export. Document problems with views data export.
-
Feature Request: setting default permissions for Tripal; explore the addition of a global setting making it easier to set these permissions.
-
Task: Come up with best practices for module help & user guides. Collaborate with Sudhansu & Lacey.
-
Feature Request (Andrew): More site-specific settings on module pages
-
Feature Request (Andrew): Be able to display new feature admin chart to users.
https://www.drupal.org/node/2499881
|