Working with Large Record Sets

When expanding tables it is sometimes possible to return hundreds or thousands (or more) of records. There are two problems that can occur when large datasets are returned

  1. When an object is expanded to include tables in a one-to-many relationship using the chado_expand_var() function it will automatically recurse through all of the one-to-one foreign-key relationships and expand the object. However, recursion through all of their foreign-key relationships may dramatically slow the page load.
  2. Thousands of records can be too much to display on the browser, making the page load slowly or not at all. Therefore, limiting the number of result and providing a pager is desired.

Limiting Recursion During Object Expansion

To limit recursion during expansion of an object, we must pass additional options to the chado_expand_var() function.  To demonstrate the use of these options an example using the featuremap table of Chado will be used.  The featuremap table of Chado is used for storing maps such as genetic maps.  A map may have many features (e.g. genetic markers) located in various positions along landmark features (e.g. linkage groups).  The features that are located in positions along the map are housed in the featurepos table of Chado.  The Tripal Map module of Chado handles display of records in the featuremap table of Chado.  In this module, the tripal_featuremap_featurepos.tpl.php template tries to expand the featuremap object to include the features that are part of the map.  To do this, it uses the chado_expand_var() function to include the featurepos table.  The featurepos table of Chado has the following structure:

F-Key Name Type Description
  featurepos_id serial PRIMARY KEY
  mappos double precision NOT NULL

feature

feature_id integer NOT NULL

feature

map_feature_id integer NOT NULL

map_feature_id links to the feature (map) upon which the feature is being localized.

featuremap

featuremap_id serial NOT NULL

If the Tripal Map module were to simply call the chado_expand_var() function, and if the map had thousands of features, then recursion would occur for the these fields:  feature_id, map_feature_id, and featuremap_id . (The map_feature_id indicates the landmark feature, the feature_id indicates the feature on the map, and the featuremap_id indicates the featuremap record, and the position of the feature is stored in the mappos field).   We can limit the amount of recursion to include just information we need to display on the page.  To do this we can pass a set of options to the chado_expand_var() function as shown in the following code example:

$featuremap = $variables['node']->featuremap;
$feature_positions = array();

// get the features aligned on this map
$options = array(  
  'return_array' => 1,
  'order_by' => array('map_feature_id' => 'ASC'),
  'include_fk' => array(
    'map_feature_id' => array(
      'type_id' => 1,
      'organism_id' => 1,
    ),
    'feature_id' => array(
      'type_id' => 1,
    ),
    'featuremap_id' => array(
       'unittype_id' => 1,
    ),
  ),
);

$featuremap = chado_expand_var($featuremap, 'table', 'featurepos', $options);
$feature_positions = $featuremap->featurepos;

In the options passed to the tripal_core_expand_chado_vars function, notice the include_fk element. Here we specify which fields of the featurepos table we want to recurse. We want the type and organism information for the landmark feature of the map (map_feature_id), we want the type information from the feature on the map (feature_id field) and we want the map units (featuremap_id field).  When the featurepos table is expanded, it will recurse using the map_feature_id and add the feature object. Then it will continue to recurse further into the feature object and add an object for the cvterm table via the type_id and also add an organism object via the organism_id. It will not recurse further than those tables. Normally, the record for the cv table would be added as well via the cv_id FK relationships in the cvterm table. But, because we did not specify to expand that relationship it is not expanded.

Using a Pager for Large Data Sets

You can use a pager, to limit the number of records returned when expanding an object.  The pager limits the results shown to a specified number and provides links below to allow the user to navigate through "pages" of results.  To create the pager we must add some additional elements to the options passed to the chado_expand_var() function.   The following code builds on the example shown previously for limiting recursion for maps.  Below is the same options array but with new elements added for defining the pager:

// expand the featuremap object to include the records from the featurepos table
// specify the number of features to show by default and the unique pager ID
$num_results_per_page = 25;
$featurepos_pager_id = 0;

// get the features aligned on this map
$options = array(  
  'return_array' => 1,
  'order_by' => array('map_feature_id' => 'ASC'),
  'pager' => array(
    'limit' => $num_results_per_page,
    'element' => $featurepos_pager_id
  ),
  'include_fk' => array(
    'map_feature_id' => array(
      'type_id' => 1,
      'organism_id' => 1,
    ),
    'feature_id' => array(
      'type_id' => 1,
    ),
    'featuremap_id' => array(
       'unittype_id' => 1,
    ),
  ),
);

To create a pager, the pager element is added to the options.   It has two subkeys the limit (or the number of records to return) and the element which is a unique integer identifying the pager. If you have multiple pagers on a page then each one must have a unique element number.  The chado_expand_var() function will now return only the number of records specified and will manage data access via a pager.  The pager can be added to the page using the following Drupal theme function call:

  // the $pager array values that control the behavior of the pager.  For
  // documentation on the values allows in this array see:
  // https://api.drupal.org/api/drupal/includes!pager.inc/function/theme_pager/7
  // here we add the paramter 'block' => 'features'. This is because
  // the pager is not on the default block that appears. When the user clicks a
  // page number we want the browser to re-appear with the page is loaded.
  $pager = array(
    'tags' => array(),
    'element' => $featurepos_pager_id,
    'parameters' => array(
      'block' => 'featurepos'
    ),
    'quantity' => $num_results_per_page,
  );
  print theme_pager($pager);

To see a working example, visit the Tripal demo site at this page: http://demo.tripal.info/2.x/node/3637?block=featurepos