ND Genotypes (create marker/stock) Bulk Loader Template

This template loads marker-by-germplasm genotype matrices (see GenotypeSampleData.txt for an example) into Chado. It creates a Natural Diversity experiment for each genotype (i.e., each element in the matrix) and links it to the corresponding stock and marker. This template requires the sequence and relationship ontologies to be loaded.
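As a rough illustration of "one genotype per matrix element", the Python sketch below reads a marker-by-germplasm matrix and yields one (marker, stock, allele call) triple per cell. Each triple corresponds to one genotype record and one Natural Diversity experiment. It assumes a tab-delimited file whose header row lists the germplasm names after a first marker column; `GenotypeCall` and `iter_genotype_calls` are hypothetical names, and the sample rows are invented rather than taken from GenotypeSampleData.txt.

```python
import csv
import io
from typing import Iterator, NamedTuple


class GenotypeCall(NamedTuple):
    """One element of the marker-by-germplasm matrix."""
    marker: str       # value from column 1 (would become feature.uniquename)
    stock: str        # germplasm name taken from the header row
    allele_call: str  # matrix cell (would become genotype.description)


def iter_genotype_calls(text: str) -> Iterator[GenotypeCall]:
    """Yield one GenotypeCall per matrix cell.

    Assumption: tab-delimited text whose header row is
    <marker label> <germplasm 1> <germplasm 2> ... and whose
    remaining rows start with the marker name.
    """
    rows = csv.reader(io.StringIO(text), delimiter="\t")
    header = next(rows)
    stocks = header[1:]  # every column after the first is one germplasm
    for row in rows:
        if not row:
            continue  # skip blank lines
        marker, calls = row[0], row[1:]
        for stock, call in zip(stocks, calls):
            yield GenotypeCall(marker, stock, call)


if __name__ == "__main__":
    # Hypothetical 2 x 2 matrix: 2 markers by 2 germplasm -> 4 genotypes.
    sample = "Marker\tLineA\tLineB\nM1\tAA\tAB\nM2\tBB\tAA\n"
    for call in iter_genotype_calls(sample):
        print(call)
```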

Bulk Loader Template name: 

ND Genotypes (create marker/stock)

Authors: 

Lacey Sanderson (Univ. Saskatchewan)

This data can be stored in Chado without any modification to the schema. The table below shows which Chado table each type of data is stored in.

Type of data | Chado table | Additional notes
Genotypes | genotype | The actual allele call is stored in genotype.description.
Markers | feature | The feature.residues field can be left NULL or can contain the marker's sequence from the current genome build, if one is available.
Genetic stock | stock | Best practice is for this to be the actual DNA sample that was assayed by the marker. A separate stock record, which is not directly attached to the genotype, is then created for the line/variety/individual and related back to the DNA stock via the stock_relationship table.
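The DNA-sample/variety split described for the stock table can be sketched with an in-memory SQLite database. This is a hypothetical, heavily simplified stand-in for the Chado stock and stock_relationship tables (plain-text type columns instead of cvterm foreign keys). Treating the DNA sample as the subject and using "derives_from" as the relationship type are assumptions for illustration, not a convention stated by this template.

```python
import sqlite3

# Hypothetical, simplified subset of the Chado stock tables; the real tables
# have more columns (organism_id, dbxref_id, value, rank, ...).
SCHEMA = """
CREATE TABLE stock (
    stock_id INTEGER PRIMARY KEY,
    uniquename TEXT UNIQUE NOT NULL,
    kind TEXT NOT NULL              -- stand-in for stock.type_id
);
CREATE TABLE stock_relationship (
    stock_relationship_id INTEGER PRIMARY KEY,
    subject_id INTEGER NOT NULL REFERENCES stock(stock_id),
    object_id INTEGER NOT NULL REFERENCES stock(stock_id),
    type TEXT NOT NULL              -- stand-in for stock_relationship.type_id
);
"""


def add_dna_sample(conn, variety, sample, rel_type="derives_from"):
    """Create a DNA-sample stock and relate it to its variety stock.

    Genotypes would attach to the DNA sample; the variety is only
    reachable through stock_relationship. The subject/object direction
    and rel_type are illustrative assumptions.
    """
    conn.execute(
        "INSERT OR IGNORE INTO stock (uniquename, kind) VALUES (?, 'variety')",
        (variety,),
    )
    sample_id = conn.execute(
        "INSERT INTO stock (uniquename, kind) VALUES (?, 'genomic_DNA')",
        (sample,),
    ).lastrowid
    variety_id = conn.execute(
        "SELECT stock_id FROM stock WHERE uniquename = ?", (variety,)
    ).fetchone()[0]
    conn.execute(
        "INSERT INTO stock_relationship (subject_id, object_id, type) VALUES (?, ?, ?)",
        (sample_id, variety_id, rel_type),
    )
    return sample_id


def variety_of(conn, sample):
    """Return the variety a DNA sample is related to, or None."""
    row = conn.execute(
        """SELECT v.uniquename
           FROM stock s
           JOIN stock_relationship r ON r.subject_id = s.stock_id
           JOIN stock v ON v.stock_id = r.object_id
           WHERE s.uniquename = ?""",
        (sample,),
    ).fetchone()
    return row[0] if row else None
```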

Genotypes are linked directly to the marker that produced them through a single entry in the feature_genotype table. In addition to the feature_id and genotype_id, this linking record requires a cvterm_id that further describes the relationship between the feature and the genotype. At a minimum this could be the "derives_from" term from the relationship ontology distributed with Tripal, indicating that the genotype was derived from the marker. It would be better, however, to create a term in the Tripal controlled vocabulary along the lines of "produced by" or "sequence_variant_of".
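A minimal sketch of the feature_genotype link, using a hypothetical, simplified SQLite stand-in for the relevant Chado tables (the real tables carry more columns, such as feature.organism_id, feature.type_id and feature_genotype.chromosome_id). The genotype unique name is built as the marker name, an underscore and the allele call, mirroring the regex in the template export; `get_or_create`, `link_genotype` and `calls_for_marker` are hypothetical helpers.

```python
import sqlite3

# Hypothetical, simplified subset of the Chado tables involved in the link.
SCHEMA = """
CREATE TABLE cvterm (cvterm_id INTEGER PRIMARY KEY, name TEXT UNIQUE NOT NULL);
CREATE TABLE feature (feature_id INTEGER PRIMARY KEY, uniquename TEXT UNIQUE NOT NULL);
CREATE TABLE genotype (
    genotype_id INTEGER PRIMARY KEY,
    uniquename TEXT UNIQUE NOT NULL,
    description TEXT                -- the allele call
);
CREATE TABLE feature_genotype (
    feature_genotype_id INTEGER PRIMARY KEY,
    feature_id INTEGER NOT NULL REFERENCES feature(feature_id),
    genotype_id INTEGER NOT NULL REFERENCES genotype(genotype_id),
    cvterm_id INTEGER NOT NULL REFERENCES cvterm(cvterm_id),
    rank INTEGER NOT NULL DEFAULT 0,
    cgroup INTEGER NOT NULL DEFAULT 0,
    UNIQUE (feature_id, genotype_id, cvterm_id, rank, cgroup)
);
"""


def get_or_create(conn, table, key_col, key, **extra):
    """Return the id of the row whose key_col equals key, inserting it if absent.

    Relies on key_col having a UNIQUE constraint.
    """
    cols = [key_col, *extra]
    placeholders = ", ".join("?" for _ in cols)
    conn.execute(
        f"INSERT OR IGNORE INTO {table} ({', '.join(cols)}) VALUES ({placeholders})",
        [key, *extra.values()],
    )
    return conn.execute(
        f"SELECT {table}_id FROM {table} WHERE {key_col} = ?", (key,)
    ).fetchone()[0]


def link_genotype(conn, marker, allele_call, link_term="derives_from"):
    """Create (or reuse) the marker, genotype and feature_genotype rows."""
    feature_id = get_or_create(conn, "feature", "uniquename", marker)
    genotype_id = get_or_create(
        conn, "genotype", "uniquename", f"{marker}_{allele_call}",
        description=allele_call,
    )
    cvterm_id = get_or_create(conn, "cvterm", "name", link_term)
    # rank and cgroup default to 0, as in the template export.
    conn.execute(
        "INSERT OR IGNORE INTO feature_genotype (feature_id, genotype_id, cvterm_id) "
        "VALUES (?, ?, ?)",
        (feature_id, genotype_id, cvterm_id),
    )
    return genotype_id


def calls_for_marker(conn, marker):
    """Return the distinct allele calls linked to a marker, sorted."""
    return [r[0] for r in conn.execute(
        """SELECT g.description FROM feature f
           JOIN feature_genotype fg ON fg.feature_id = f.feature_id
           JOIN genotype g ON g.genotype_id = fg.genotype_id
           WHERE f.uniquename = ? ORDER BY g.description""",
        (marker,),
    )]
```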

The link between genotypes and the genetic stock they describe is a little more convoluted and goes through the Chado Natural Diversity module. Here, a record is created in the nd_experiment table for each genotype (i.e., each stock assayed by each marker). This experiment is linked to the genotype through the nd_experiment_genotype table and to the genetic stock through the nd_experiment_stock table, which effectively links the genotype to the stock it describes. The additional experiment record also allows you to record further information about the experiment, such as the protocol used, the location where it was done and who did it, although only the location is actually required. The nd_experiment type should be "genotype_assay" (a custom cvterm); note that the template export below uses the "genotype" term from the sequence ontology instead. The nd_experiment_stock type is meant to indicate how the stock is related to the experiment. In this case it could be "participates_in" from the relationship ontology or a more descriptive custom cvterm (the export below uses "has_participant").
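The indirect genotype-to-stock path through the Natural Diversity tables can be sketched the same way. The schema below is a hypothetical simplification: type columns are plain text holding the "genotype" and "has_participant" terms used by the template export, and nd_geolocation.description is made unique so one location can be reused. Each call to the hypothetical `record_genotype_assay` creates one nd_experiment for one genotype call, and `stocks_with_genotype` follows the links back to the stocks.

```python
import sqlite3

# Hypothetical, simplified subset of the Chado Natural Diversity tables.
SCHEMA = """
CREATE TABLE stock (stock_id INTEGER PRIMARY KEY, uniquename TEXT UNIQUE NOT NULL);
CREATE TABLE genotype (
    genotype_id INTEGER PRIMARY KEY,
    uniquename TEXT UNIQUE NOT NULL,
    description TEXT
);
CREATE TABLE nd_geolocation (nd_geolocation_id INTEGER PRIMARY KEY, description TEXT UNIQUE);
CREATE TABLE nd_experiment (
    nd_experiment_id INTEGER PRIMARY KEY,
    nd_geolocation_id INTEGER NOT NULL REFERENCES nd_geolocation(nd_geolocation_id),
    type TEXT NOT NULL
);
CREATE TABLE nd_experiment_genotype (
    nd_experiment_id INTEGER NOT NULL REFERENCES nd_experiment(nd_experiment_id),
    genotype_id INTEGER NOT NULL REFERENCES genotype(genotype_id)
);
CREATE TABLE nd_experiment_stock (
    nd_experiment_id INTEGER NOT NULL REFERENCES nd_experiment(nd_experiment_id),
    stock_id INTEGER NOT NULL REFERENCES stock(stock_id),
    type TEXT NOT NULL
);
"""


def _get_or_insert(conn, table, col, value):
    """Return the id of the row with col = value, inserting it if needed."""
    conn.execute(f"INSERT OR IGNORE INTO {table} ({col}) VALUES (?)", (value,))
    return conn.execute(
        f"SELECT {table}_id FROM {table} WHERE {col} = ?", (value,)
    ).fetchone()[0]


def record_genotype_assay(conn, stock, marker, allele_call, location):
    """Create one nd_experiment linking a single genotype call to its stock."""
    stock_id = _get_or_insert(conn, "stock", "uniquename", stock)
    geo_id = _get_or_insert(conn, "nd_geolocation", "description", location)
    name = f"{marker}_{allele_call}"
    conn.execute(
        "INSERT OR IGNORE INTO genotype (uniquename, description) VALUES (?, ?)",
        (name, allele_call),
    )
    genotype_id = conn.execute(
        "SELECT genotype_id FROM genotype WHERE uniquename = ?", (name,)
    ).fetchone()[0]
    exp_id = conn.execute(
        "INSERT INTO nd_experiment (nd_geolocation_id, type) VALUES (?, 'genotype')",
        (geo_id,),
    ).lastrowid
    conn.execute("INSERT INTO nd_experiment_genotype VALUES (?, ?)", (exp_id, genotype_id))
    conn.execute(
        "INSERT INTO nd_experiment_stock VALUES (?, ?, 'has_participant')",
        (exp_id, stock_id),
    )
    return exp_id


def stocks_with_genotype(conn, genotype_uniquename):
    """Follow genotype -> nd_experiment_genotype -> nd_experiment_stock -> stock."""
    return [r[0] for r in conn.execute(
        """SELECT s.uniquename
           FROM genotype g
           JOIN nd_experiment_genotype eg ON eg.genotype_id = g.genotype_id
           JOIN nd_experiment_stock es ON es.nd_experiment_id = eg.nd_experiment_id
           JOIN stock s ON s.stock_id = es.stock_id
           WHERE g.uniquename = ?
           ORDER BY s.uniquename""",
        (genotype_uniquename,),
    )]
```

Because genotypes with the same marker and allele call share one record in this sketch, `stocks_with_genotype` can return several stocks for a single genotype, each reached through its own experiment.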

As usual, you need to create a Tripal Bulk Loading Job (Content > Add Content > Bulk Loading Job) for each file of genotypes you wish to load. When creating the job, make sure to indicate that the file has a header. Keeping track of inserted IDs is NOT recommended, since the large number of records inserted will slow the job down drastically. Once the job page is created, you need to add a constant set for each column in your matrix (i.e., for each germplasm) you wish to load. For example, to load the Sample Genotype Data provided you would need to enter the following constant sets:
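Since each germplasm column needs its own constant set, the sets can be worked out mechanically from the matrix header. This Python sketch is hypothetical: it builds one dictionary per germplasm column, keyed by the exposed field titles in the template export (Genus, Species, the stock Unique Name, Allele Call, Unique Name (Generated) and Location Description). It does not interact with Tripal; the values would still be entered through the job page.

```python
from typing import Dict, List


def build_constant_sets(
    header: List[str], genus: str, species: str, location: str
) -> List[Dict[str, object]]:
    """Return one constant set per germplasm column of the matrix header.

    Column numbers are 1-based, as the template's field descriptions
    require. Column 1 holds the marker names, so germplasm start at 2.
    The same column number goes into both genotype fields.
    """
    sets = []
    for col, stock in enumerate(header[1:], start=2):
        sets.append({
            "Organism: Genus": genus,
            "Organism: Species": species,
            "Stock: Unique Name": stock,
            "Genotype: Allele Call": col,
            "Genotype: Unique Name (Generated)": col,
            "Geolocation: Location Description": location,
        })
    return sets
```

Because genus and species are supplied per set, nothing stops you from calling this separately for columns that belong to different species and concatenating the results.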

Although this means entering some constants repeatedly, it keeps the template as flexible as possible. For example, it supports germplasm from different species in the same file and lets you load a file with any number of columns.

Once you have entered the constant sets, click "Submit Job" and run the job from the command line as you would any other Tripal job.

 

Bulk Loader Export: 

[{"table":"organism","record_id":"Organism","fields":[{"type":"constant","title":"Genus","field":"genus","required":0,"constant value":"","exposed":1,"exposed_validate":0},{"type":"constant","title":"Species","field":"species","required":0,"constant value":"","exposed":1,"exposed_validate":0}],"mode":"select_once","select_if_duplicate":0,"update_if_duplicate":0,"select_optional":0,"disable":0,"optional":0},{"table":"cv","record_id":"Sequence CV","fields":[{"type":"constant","title":"Name","field":"name","required":0,"constant value":"sequence","exposed":0,"exposed_validate":1}],"mode":"select_once","select_if_duplicate":0,"update_if_duplicate":0,"select_optional":0,"disable":0,"optional":0},{"table":"cv","record_id":"Relationship CV","fields":[{"type":"constant","title":"Name","field":"name","required":0,"constant value":"relationship","exposed":0,"exposed_validate":1}],"mode":"select_once","select_if_duplicate":0,"update_if_duplicate":0,"select_optional":0,"disable":0,"optional":0},{"table":"cvterm","record_id":"Marker Type","fields":[{"type":"constant","title":"Name","field":"name","required":0,"constant value":"genetic_marker","exposed":0,"exposed_validate":1},{"type":"foreign key","title":"CV","field":"cv_id","show_all_records":0,"foreign key":"Sequence CV","foreign field":"cv_id","required":0}],"mode":"select_once","select_if_duplicate":0,"update_if_duplicate":0,"select_optional":0,"disable":0,"optional":0},{"table":"feature","record_id":"Marker","fields":[{"type":"table field","title":"Unique Name","field":"uniquename","required":0,"spreadsheet column":"1","exposed":0,"exposed_description":""},{"type":"foreign key","title":"Organism","field":"organism_id","show_all_records":0,"foreign key":"Organism","foreign field":"organism_id","required":0},{"type":"foreign key","title":"Type","field":"type_id","show_all_records":0,"foreign key":"Marker Type","foreign field":"cvterm_id","required":0}],"mode":"insert","select_if_duplicate":1,"update_if_duplicate":0,"select_optional":0,"disable":0,"optional":0},{"table":"cvterm","record_id":"Stock Type","fields":[{"type":"constant","title":"Name","field":"name","required":0,"constant value":"genomic_DNA","exposed":0,"exposed_validate":1},{"type":"foreign key","title":"CV","field":"cv_id","show_all_records":0,"foreign key":"Sequence CV","foreign field":"cv_id","required":0}],"mode":"select_once","select_if_duplicate":0,"update_if_duplicate":0,"select_optional":0,"disable":0,"optional":0},{"table":"stock","record_id":"Stock","fields":[{"type":"constant","title":"Unique Name","field":"uniquename","required":0,"constant value":"","exposed":1,"exposed_validate":0},{"type":"foreign key","title":"Type","field":"type_id","show_all_records":0,"foreign key":"Stock Type","foreign field":"cvterm_id","required":0},{"type":"foreign key","title":"Organism","field":"organism_id","show_all_records":0,"foreign key":"Organism","foreign field":"organism_id","required":0}],"mode":"insert","select_if_duplicate":1,"update_if_duplicate":0,"select_optional":0,"disable":0,"optional":0},{"table":"cvterm","record_id":"Genotype Type","fields":[{"type":"constant","title":"Name","field":"name","required":0,"constant value":"genotype","exposed":0,"exposed_validate":1},{"type":"foreign key","title":"CV","field":"cv_id","show_all_records":0,"foreign key":"Sequence CV","foreign field":"cv_id","required":0}],"mode":"select_once","select_if_duplicate":0,"update_if_duplicate":0,"select_optional":0,"disable":0,"optional":0},{"table":"genotype","record_id":"Genotype","fields":[{"type":"table field","title":"Allele Call","field":"description","required":0,"spreadsheet column":"2","exposed":1,"exposed_description":"Enter the column number that contains the allele calls for the current stock (the first column is 1)."},{"type":"table field","title":"Unique Name (Generated)","field":"uniquename","required":0,"spreadsheet column":"2","exposed":1,"exposed_description":"Enter the column number that contains the allele calls for the current stock (the first column is 1).","regex":{"pattern":["\/^(.*)$\/"],"replace":["<#column:1#>_\\1"]}},{"type":"foreign key","title":"Type","field":"type_id","show_all_records":0,"foreign key":"Genotype Type","foreign field":"cvterm_id","required":0}],"mode":"insert","select_if_duplicate":1,"update_if_duplicate":0,"select_optional":0,"disable":0,"optional":0},{"table":"cvterm","record_id":"Genotype Feature Link Type","fields":[{"type":"constant","title":"Name","field":"name","required":0,"constant value":"derives_from","exposed":0,"exposed_validate":1},{"type":"foreign key","title":"CV","field":"cv_id","show_all_records":0,"foreign key":"Relationship CV","foreign field":"cv_id","required":0}],"mode":"select_once","select_if_duplicate":0,"update_if_duplicate":0,"select_optional":0,"disable":0,"optional":0},{"table":"feature_genotype","record_id":"Genotype => Feature Link","fields":[{"type":"foreign key","title":"Genotype","field":"genotype_id","show_all_records":0,"foreign key":"Genotype","foreign field":"genotype_id","required":0},{"type":"foreign key","title":"Feature (Marker)","field":"feature_id","show_all_records":0,"foreign key":"Marker","foreign field":"feature_id","required":0},{"type":"foreign key","title":"Type","field":"cvterm_id","show_all_records":0,"foreign key":"Genotype Feature Link Type","foreign field":"cvterm_id","required":0},{"type":"constant","title":"Rank","field":"rank","required":0,"constant value":"0","exposed":0,"exposed_validate":0},{"type":"constant","title":"CGroup","field":"cgroup","required":0,"constant value":"0","exposed":0,"exposed_validate":0}],"mode":"insert","select_if_duplicate":1,"update_if_duplicate":0,"select_optional":0,"disable":0,"optional":0},{"table":"cvterm","record_id":"ND Experiment Type","fields":[{"type":"constant","title":"Name","field":"name","required":0,"constant value":"genotype","exposed":0,"exposed_validate":1},{"type":"foreign key","title":"CV","field":"cv_id","show_all_records":0,"foreign key":"Sequence CV","foreign field":"cv_id","required":0}],"mode":"select_once","select_if_duplicate":0,"update_if_duplicate":0,"select_optional":0,"disable":0,"optional":0},{"table":"nd_geolocation","record_id":"Geolocation","fields":[{"type":"constant","title":"Location Description","field":"description","required":0,"constant value":"","exposed":1,"exposed_validate":0}],"mode":"insert_once","select_if_duplicate":1,"update_if_duplicate":0,"select_optional":0,"disable":0,"optional":0},{"table":"nd_experiment","record_id":"ND Experiment","fields":[{"type":"foreign key","title":"Type","field":"type_id","show_all_records":0,"foreign key":"ND Experiment Type","foreign field":"cvterm_id","required":0},{"type":"foreign key","title":"Geolocation","field":"nd_geolocation_id","show_all_records":0,"foreign key":"Geolocation","foreign field":"nd_geolocation_id","required":0}]},{"table":"nd_experiment_genotype","record_id":"ND Experiment => Genotype","fields":[{"type":"foreign key","title":"ND Experiment","field":"nd_experiment_id","show_all_records":0,"foreign key":"ND Experiment","foreign field":"nd_experiment_id","required":0},{"type":"foreign key","title":"Genotype","field":"genotype_id","show_all_records":0,"foreign key":"Genotype","foreign field":"genotype_id","required":0}]},{"table":"cvterm","record_id":"Nd Experiment Stock Link Type","fields":[{"type":"constant","title":"Name","field":"name","required":0,"constant value":"has_participant","exposed":0,"exposed_validate":1},{"type":"foreign key","title":"CV","field":"cv_id","show_all_records":0,"foreign key":"Relationship CV","foreign field":"cv_id","required":0}],"mode":"select_once","select_if_duplicate":0,"update_if_duplicate":0,"select_optional":0,"disable":0,"optional":0},{"table":"nd_experiment_stock","record_id":"ND Experiment => Stock","fields":[{"type":"foreign key","title":"ND Experiment","field":"nd_experiment_id","show_all_records":0,"foreign key":"ND Experiment","foreign field":"nd_experiment_id","required":0},{"type":"foreign key","title":"Stock","field":"stock_id","show_all_records":0,"foreign key":"Stock","foreign field":"stock_id","required":0},{"type":"foreign key","title":"Type","field":"type_id","show_all_records":0,"foreign key":"Nd Experiment Stock Link Type","foreign field":"cvterm_id","required":0}]}]
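To check which fields of the export a constant set can fill in, you could parse it and list every exposed field. The sketch below is a hypothetical helper that relies only on keys visible in the export (`record_id`, `fields`, `title`, `type`, `exposed`); `exposed_fields` is not part of Tripal.

```python
import json
from typing import List, Tuple


def exposed_fields(export_json: str) -> List[Tuple[str, str, str]]:
    """List (record_id, field title, kind) for every exposed field.

    kind is "column" for exposed 'table field' entries (a spreadsheet
    column number is expected) and "value" for exposed constants.
    Foreign keys carry no 'exposed' key and are skipped.
    """
    records = json.loads(export_json)  # strict by default: raw newlines in strings fail
    result = []
    for record in records:
        for field in record.get("fields", []):
            if field.get("exposed") == 1:
                kind = "column" if field["type"] == "table field" else "value"
                result.append((record["record_id"], field["title"], kind))
    return result
```

Run over the full export, it should report six fields: Genus and Species (Organism), Unique Name (Stock), Allele Call and Unique Name (Generated) (Genotype), and Location Description (Geolocation). `json.loads` also raises an error if the export was copied with line breaks inside its strings.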