Controlling which fields get indexed

By default, Tripal indexes the name, uniquename and all foreign keys (i.e. organism_id > organism, type_id > cvterm, dbxref_id > dbxref) for all node types whose machine name is prefixed by "chado_". When indexing the foreign keys, it attempts to make a readable representation using a combination of name, genus/species/common_name, and accession/db.name. These fields were chosen because they are the most likely to be used and meaningful for all Tripal sites without being too large or complicated to index efficiently. However, we recognize that this won't suit everyone's needs since all Tripal sites are meant to be different, so we exposed a number of ways to customize which parts of your chado database get indexed. The only restriction (and one enforced by Drupal) is that you must be able to associate the chado content with a node.

Option #1: Remove fields from index or change their relevance in search results.

To remove fields from your search index or change their relevance/rating in the search results, simply configure the search index fields. This can be done by going to Configure > Search API in the administrative toolbar and clicking on the arrow beside the edit button for the "Default node index". This opens up a drop-down from which you should select "Fields".

This opens up a long listing of all the fields available to be indexed. Simply deselect any fields you would like to remove and click "Save settings". You can also change the "Boost" value: nodes with keywords matched in a field with a higher boost value should appear higher in the result listing than nodes matched in fields with a lower boost value.

 

Option #2: Add fields to the index that are not already available.

This option requires that you create a custom module but don't let that scare you off! It really is just a matter of creating a yourmodule.info file (Drupal.org Tutorial) in the correct location to tell Drupal about your module and a yourmodule.module file (Drupal.org Tutorial) which will contain the following code:

/**
 * Implements hook_search_include_chado_fields().
 *
 * This hook allows Tripal Admin/modules to specify which chado fields should be indexed
 * for searching in a simple manner.
 *
 * @return
 *   An array of chado fields you would like available for indexing. Each element should
 *   be the name of the table followed by the field, separated by a period. For example,
 *   feature.uniquename indicates the uniquename field from the feature table.
 */
function yourmodule_search_include_chado_fields() {
  return array(
    'organism.comment',
    'organism.abbreviation',
  );
}

In the above code snippet we are telling Tripal to also index the comment and abbreviation for our organism nodes. Once you define the hook above, simply clear the cache, go to the Search API Index Fields configuration as shown in Option #1, and select the newly added fields from the list before clicking "Save settings". Don't forget to re-index!
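To make the table.field convention from the docblock concrete, here is a minimal sketch of a sanity check you could run on your own list before returning it from the hook. The helper name yourmodule_parse_chado_fields() is hypothetical and not part of Tripal; it only illustrates how each entry breaks down into a table and a field.

```php
<?php
/**
 * Hypothetical helper (not part of Tripal): splits 'table.field' entries
 * into their table and field parts and rejects malformed entries.
 */
function yourmodule_parse_chado_fields(array $fields) {
  $parsed = array();
  foreach ($fields as $entry) {
    $parts = explode('.', $entry);
    // A valid entry has exactly one period with text on both sides.
    if (count($parts) !== 2 || $parts[0] === '' || $parts[1] === '') {
      throw new InvalidArgumentException("Expected 'table.field', got '$entry'.");
    }
    $parsed[] = array('table' => $parts[0], 'field' => $parts[1]);
  }
  return $parsed;
}

// Usage with the two fields from the example above.
$parsed = yourmodule_parse_chado_fields(array('organism.comment', 'organism.abbreviation'));
// $parsed[0] is array('table' => 'organism', 'field' => 'comment').
```

A typo such as 'organism' (no field) or 'organism..comment' would throw an exception here instead of silently producing a field that never shows up in the list.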

 

Option #3: Add anything you want!

If neither of the above options allows you to index the data you want, then this option is for you. This option also requires that you create a custom module by creating a yourmodule.info file (Drupal.org Tutorial) in the correct location to tell Drupal about your module and a yourmodule.module file (Drupal.org Tutorial). For this option you are going to modify the node entity properties to describe the data to be indexed. As an example, I'm going to show you how to index feature properties of a specific type. The first step is to implement hook_tripal_search_properties_alter(&$info) in your custom module and alter the node bundle properties for the content type you would like the search results to link to (chado_feature in our example). This is done in yourmodule.module as shown in the following code snippet:

/**
 * Implements hook_tripal_search_properties_alter().
 */
function yourmodule_tripal_search_properties_alter(&$info) {
  //dpm($info, 'Entity Properties');
}

Quick Tip: When working on a custom module, I always install the Devel module since it provides a number of very useful functions for development. The most useful function is dpm($var, 'my message'); which allows you to print the contents of any variable to the screen in a very readable way. The dpm function was used to generate the following screenshot.
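Because dpm() only exists while the Devel module is enabled, a leftover call can cause a fatal error on a site where Devel is turned off. One way around that, sketched below, is a small wrapper that checks for the function first. The name yourmodule_debug() is hypothetical; only dpm() itself comes from Devel.

```php
<?php
/**
 * Hypothetical debugging wrapper: calls Devel's dpm() only when it exists.
 *
 * @return
 *   TRUE if the variable was printed, FALSE if Devel is not available.
 */
function yourmodule_debug($var, $label = NULL) {
  if (function_exists('dpm')) {
    dpm($var, $label);
    return TRUE;
  }
  // Devel is not enabled, so quietly do nothing.
  return FALSE;
}
```

You could then call yourmodule_debug($info, 'Entity Properties'); inside your hook without worrying about whether Devel is enabled on a given site.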

Next, you need to alter the $info associative array to add your custom field. The screenshot above shows the $info variable (using the dpm function provided via the devel module) with the chado_feature content type expanded to show the feature.name field definition for search indexing. When creating our own custom field we need to include all of the elements that the feature.name field does except perhaps the schema field depending upon the getter callback. The following code snippet creates a custom field to index our feature properties:

/**
 * Implements hook_tripal_search_properties_alter().
 */
function yourmodule_tripal_search_properties_alter(&$info) {
  //dpm($info, 'Entity Properties');
  
  $field_definition = array(
    // The following two elements define how your field will appear in the Search API Index Fields list.
    'label' => 'Feature Properties',
    'description' => 'A custom field to index my feature properties',
    // This defines the type of field. I recommend keeping this as text so keywords from
    // this field can be searched via the main search box.
    'type' => 'text',
    // The name of a function that you will define to return the value of your
    // field for a given node.
    'getter callback' => 'yourmodule_feature_properties_getter_callback'
  );
  $info['node']['bundles']['chado_feature']['properties']['feature_properties'] = $field_definition;
}

After you clear the cache, the above field should appear in the fields list of the "Default node index" (Configure > Search API on the administrative toolbar). Don't forget to check that all-important little checkbox, or your field will not get indexed and you will not see any dpms from your getter callback ;-). Next, you need to define the getter callback to tell the Search API how to index your field. In the following code snippet, we define a getter callback that expands a feature node to include its feature properties and then returns the type name and value of every property to be indexed.

/**
 * A callback to return the feature properties for a given node.
 *
 * @param $data
 *   The entity object (i.e. the node we need to retrieve feature properties for).
 * @param $options
 *   An array of options passed along to the getter (not used in this example).
 * @param $field_name
 *   The key you defined under properties in hook_tripal_search_properties_alter().
 *   In our example, this will be feature_properties.
 * @param $type
 *   The type of entity passed in $data (in our example, this should be node).
 * @param $info
 *   The full field definition you defined in hook_tripal_search_properties_alter().
 *
 * @return
 *   A string representing the feature properties of the given node, or NULL
 *   if the node has none.
 */
function yourmodule_feature_properties_getter_callback($data, $options, $field_name, $type, $info) {
  $keywords = array();

  // First check that we are dealing with a feature node.
  if ($data->type == 'chado_feature') {

    // First expand the featureprop table for the given node.
    $feature = chado_expand_var($data->feature, 'table', 'featureprop');

    // Then iterate through each property and grab out keywords...
    if (is_array($feature->featureprop)) {
      foreach ($feature->featureprop as $prop) {

        // You could easily filter by type at this point if you only wanted
        // to index a given set of property types.

        // ... like the actual value of the property ;-) ...
        $keywords[] = $prop->value;
        // ... and the human-readable type of property...
        $keywords[] = $prop->type_id->name;

        // You might also want to process the value for keywords if you have
        // something unreadable like XML blast results ;-).
      }
    }
  }

  // Then just concatenate all the keywords together separated with spaces
  // and let your search api database deal with them from here.
  if (!empty($keywords)) {
    return implode(' ', $keywords);
  }
  else {
    // Make sure you return NULL and not just an empty string!
    return NULL;
  }
}

Quick Tip: You will only see changes and dpms from hook_tripal_search_properties_alter() after you clear the cache, and you will only see changes and dpms from your custom getter callback after you re-index your content.
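The comments in the getter callback mention that you could filter by property type. A minimal sketch of that idea follows, written as a standalone helper so it can be tried outside of Drupal. The helper name yourmodule_filter_feature_properties() and the type names in the usage example are hypothetical; the only assumption about the data is that each property exposes value and type_id->name, as in the callback above.

```php
<?php
/**
 * Hypothetical helper: builds a keyword string from featureprop-like records,
 * keeping only the property types listed in $allowed_types.
 */
function yourmodule_filter_feature_properties(array $props, array $allowed_types) {
  $keywords = array();
  foreach ($props as $prop) {
    $type_name = $prop->type_id->name;
    // Skip any property whose type is not on the allowed list.
    if (in_array($type_name, $allowed_types, TRUE)) {
      $keywords[] = $prop->value;
      $keywords[] = $type_name;
    }
  }
  // Return NULL rather than an empty string, as the getter callback does.
  return empty($keywords) ? NULL : implode(' ', $keywords);
}

// Usage with mock records standing in for $feature->featureprop.
$props = array(
  (object) array(
    'value' => 'IPSUM.1 is my favourite gene',
    'type_id' => (object) array('name' => 'comment'),
  ),
  (object) array(
    'value' => 'some unreadable blob',
    'type_id' => (object) array('name' => 'raw_result'),
  ),
);
$indexed = yourmodule_filter_feature_properties($props, array('comment'));
// $indexed is 'IPSUM.1 is my favourite gene comment'.
```

Inside the real getter callback you would pass $feature->featureprop and your own list of type names, then return the result directly.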

And that's it! After re-indexing your content you should be able to search for a word stored in the featureprop table and be told which feature node contains it. For example, below I have a comment feature property indicating that IPSUM.1 is my favourite gene :-).