Views module has long been the killer feature of Drupal, making it easy for a site builder or skilled administrator to essentially create complex SQL queries through a web interface, without knowing SQL. All kinds of things are possible through views - relationships, filters, sorting, access control, aggregation, argument handling, and more.

What makes it even more powerful are views handlers. If there is some corner case you need to address, it's not that hard for a skilled developer to create a custom views handler to insert the SQL you need into the query, while exposing a nice configuration form for the site builder. There are tutorials and instructions all over the place to guide you in creating various kinds of field handlers, and you can also alter the query views generates through some "hook" code.

Yet certain problems can be really hard to solve with views, and recently I came across two that were very similar:

  1. A company holds multi-day seminars, and stores each day as a separate date range entry in a multi-value field, but wanted to list upcoming seminars with a single date range formatted with no duplication -- e.g. instead of "April 20, 2020 - April 20, 2020, April 21, 2020 - April 21, 2020, April 22, 2020 - April 22, 2020" they wanted it to just say "April 20 - 22, 2020". And not have the same class appear 3 times.
  2. For each event, they wanted a report showing how many people have registered to attend each upcoming seminar.

These seem like simple, obvious requests -- and yet they are nearly impossible to implement in views without some serious behind the scenes coding.

Deciding upon an approach

How do you solve a problem like this? There are many different ways to go about this. As an experienced Drupal developer, I came up with 4 different ways, and I'm sure there's others:

  1. Create an aggregate view
  2. Create a custom views field handler
  3. Alter the views query
  4. Use a views subquery join handler and create an aggregate field.

Here's a brief run-down of these, with the one I chose and why.

Create an aggregate view

This is the obvious answer for creating reports: Check the box under "Advanced" to make the entire view an aggregate view. This changes each field to either be aggregated by some SQL aggregation function, or be used as a unique row.

This approach is ok for summary reports, but it feels a bit like a sledgehammer, especially for the date range field -- it's a lot of extra work to set up an aggregate view, and it's extremely tricky to get right, especially when you go to filter out data to match exactly what you want. And, I have the tingly sense that I've tried this before and failed -- there are certain things you cannot do when you aggregate the entire view -- I can't necessarily put my finger on just what doesn't work in these cases, but I know it's not going to work in the end.

Create a custom views field handler

Drupal Console makes it really easy to generate a plugin for Drupal 8, and views handlers are all plugins. It's actually really easy to make a custom views handler, especially if you can base it on an existing handler and just override whatever it is you need to make custom.

This seemed at first to be the way to go. For years, taxonomy fields were multi-valued -- e.g. multiple terms can be associated with a single node -- and the taxonomy field handlers have options to select whether you want an entirely new row for each term, or to collapse all the terms into a single field, perhaps separated by a comma.

You would think this would work well for the date field, and it can certainly give us a count of attendees to an event with a little magic sql under the hood.

I whipped up a field handler for the date, grabbing the earliest start date, the latest end date, and doing some ugly but workable code to show what the client wanted. And hit the problem with this approach: We didn't need "just" a field, but also a filter and a sort.

We only wanted to show future seminars, and we wanted them sorted by date. But as soon as I added a date filter or sort, because there were multiple values, each seminar ended up with multiple rows, one for each of the date range values. Crap.

I could make a filter handler. And a sort handler. But to get them to all use the same query started to feel -- dirty. Lots of repeating code, lots of altering queries or checking on whether the other handlers were already instantiated, lots of stuff that just felt wrong.

Especially when I was in the views_data structures trying to hook them up -- it seemed entirely too hard.

Alter the views query

I probably spent upwards of 12 hours scouring the Internet, and the Drupal codebase for help or example code that might get me there. This seemed to be a problem that people just rolled their sleeves up and hacked their way through -- usually using a views_query_alter.

No doubt, this is a fast way to get the job done -- just hook into the generated query, and alter it as you see fit.

The biggest problem is, years down the road when you need to change something, you'll look at the view and say WTF? How does this thing even work? You end up with a views UI that just plain lies to you -- it does not make it clear what is really happening.

I'll use a views_query_alter in a pinch, to get the job done -- but if you're reaching for this, it's usually because something wasn't done right.

Use a subquery join

Scouring the codebase, I found a views join handler called "subquery". That's exactly what I want to do -- create a small aggregate subquery, and join the main query to that. This solves both my problems -- I can create essentially a few aggregate fields while keeping the main sql query non-aggregated.

The problem is, I don't think this handler even works! The details are here: Issue #3125146 on Drupal.org. I could not find a single place in the Drupal code base, or on the Internet, where this plugin was used. And it does not make sense to me how the SQL it generates is useful.

But the concept is extremely useful -- having a subquery join makes a lot of sense. The challenge was, how to create and use it.

Implementing an aggregate "dummy" views field

So how to do it? The plugin/handler itself was the easy part. What was extremely hard was to figure out how to hook it up.

In Drupal 8, the views_data structure basically is a registry of all the views handlers, mapped to the data structures they can handle. It is populated through the use of old-style Drupal hooks -- to add items to the views_data structure, you implement a "hook_views_data()" function in your module, and to change existing items, you implement a "hook_views_data_alter()" function. While I found snippets here and there on what to add here, it wasn't until I found a series of posts by Oleksandr Trotsenko about Drupal Views for Developers that I figured out how to hook this all up.

Go read that series first to learn how to create handlers and the views data structure. But there's some huge missing pieces about this data structure that I hope to illuminate here. Because that's where the true power lies...

The subquery join handler

First, the join handler that should be in core:

<?php

namespace Drupal\rng\Plugin\views\join;

use Drupal\Core\Plugin\ContainerFactoryPluginInterface;
use Drupal\views\Plugin\views\join\JoinPluginBase;
use Symfony\Component\DependencyInjection\ContainerInterface;

/**
 * Join handler for relationships that join with a subquery as a table.
 *
 * For example:
 *
 * @code
 * LEFT JOIN (SELECT subquery_fields[], subquery_expressions[]
 *   WHERE subquery_where GROUP BY subquery_groupby) table
 * ON base_table.left_field = table.field
 * @endcode
 *
 * Join definition: same as \Drupal\views\Plugin\views\join\JoinPluginBase,
 * plus:
 * - subquery_fields[]
 * - subquery_expressions[]
 * - subquery_where
 * - subquery_groupby
 *
 * See https://www.drupal.org/project/drupal/issues/3125146.
 *
 * @ingroup views_join_handlers
 * @ViewsJoin("rng_subquery")
 */
class Subquery extends JoinPluginBase implements ContainerFactoryPluginInterface {

  /**
   * @var \Drupal\Core\Database\Connection
   */
  protected $database;

  /**
   * {@inheritDoc}
   */
  public static function create(ContainerInterface $container, array $configuration, $plugin_id, $plugin_definition) {
    $instance = new static($configuration, $plugin_id, $plugin_definition);
    $instance->database = $container->get('database');
    return $instance;
  }

  /**
   * Builds the SQL for the join this object represents.
   *
   * @param \Drupal\Core\Database\Query\SelectInterface $select_query
   *   The select query object.
   * @param string $table
   *   The base table to join.
   * @param \Drupal\views\Plugin\views\query\QueryPluginBase $view_query
   *   The source views query.
   */
  public function buildJoin($select_query, $table, $view_query) {
    $alias = $this->configuration['subquery_alias'];
    $subquery = $this->database->select($this->configuration['subquery_table'], $alias);

    if (!empty($this->configuration['subquery_fields'])) {
      foreach ($this->configuration['subquery_fields'] as $field_alias=>$field) {
        $subquery->addField($alias, $field, $field_alias);
      }
    }
    if (!empty($this->configuration['subquery_expressions'])) {
      foreach ($this->configuration['subquery_expressions'] as $field_alias=>$expression) {
        $subquery->addExpression($expression, $field_alias);
      }
    }
    if (!empty($this->configuration['subquery_groupby'])) {
      $subquery->groupBy($this->configuration['subquery_groupby']);
    }
    if (!empty($this->configuration['subquery_where'])) {
      foreach ($this->configuration['subquery_where'] as $condition) {
        $subquery->where($condition);
      }
    }

    $right_table = $subquery;

    $left_table = $view_query->getTableInfo($this->leftTable);
    $left_field = "$left_table[alias].$this->leftField";

    // Add our join condition, using a subquery on the left instead of a field.
    $condition = "$left_field = $table[alias].$this->field";
    $arguments = [];

    // Tack on the extra.
    // This is just copied verbatim from the parent class, which itself has a
    //   bug: https://www.drupal.org/node/1118100.
    if (isset($this->extra)) {
      if (is_array($this->extra)) {
        $extras = [];
        foreach ($this->extra as $info) {
          // Figure out the table name. Remember, only use aliases provided
          // if at all possible.
          $join_table = '';
          if (!array_key_exists('table', $info)) {
            $join_table = $table['alias'] . '.';
          }
          elseif (isset($info['table'])) {
            $join_table = $info['table'] . '.';
          }

          $placeholder = ':views_join_condition_' . $select_query->nextPlaceholder();

          if (is_array($info['value'])) {
            $operator = !empty($info['operator']) ? $info['operator'] : 'IN';
            // Transform from IN() notation to = notation if just one value.
            if (count($info['value']) == 1) {
              $info['value'] = array_shift($info['value']);
              $operator = $operator == 'NOT IN' ? '!=' : '=';
            }
          }
          else {
            $operator = !empty($info['operator']) ? $info['operator'] : '=';
          }

          $extras[] = "$join_table$info[field] $operator $placeholder";
          $arguments[$placeholder] = $info['value'];
        }

        if ($extras) {
          if (count($extras) == 1) {
            $condition .= ' AND ' . array_shift($extras);
          }
          else {
            $condition .= ' AND (' . implode(' ' . $this->extraOperator . ' ', $extras) . ')';
          }
        }
      }
      elseif ($this->extra && is_string($this->extra)) {
        $condition .= " AND ($this->extra)";
      }
    }
    $select_query->addJoin($this->type, $right_table, $table['alias'], $condition, $arguments);
  }
}

This is basically what the subquery join handler in core should look like.

The "standard" join handler does have a "table formula" configuration that looks like is meant to provide the functionality we're trying to add here. That works, if you create a join handler programmatically and assign a DB Select query object to the "table formula" configuration -- but that's not possible to do from a views_data structure, because query objects aren't something you can serialize into configuration. And at the very bottom of the buildJoin method, the $select_query->addJoin method's second parameter is treated as a table name if it's a string -- and only added as a subquery if it's a Select object.

So the only way to join a subquery is to build it inside this buildJoin() method, which means we need to pass everything needed to create that subquery in as configuration strings.

Views Data Structure

Here's the meat of the entire thing: Views Data. With a join handler like above, you can create all kinds of aggregate fields, without having to create a handler -- you can just let the subquery handle the aggregation and then use existing handlers for the field, filter, and sort.

This ends up being much more elegant, and less confusing, than any of the other alternatives. However this structure is really lacking documentation, so that's what I hope to illuminate here. Read the comments inline...

/**
 * Implements hook_views_data().
 */
function mymodule_views_data() {
  // The top level key is usually the entity type, or a database table, but can actually be anything.
  // Inside that top level key, you need to specify a literal 'table' element, and any number of "fields"
    $data['custom_registrants']['table'] = [
      // The crucial thing to add is the 'join' key
      'join' => [
        // Under the join, the top level key is the "base table" -- e.g. what table is the left side of our join.
        // THIS IS IMPORTANT! Make sure this is the base table for your entity type -- if this base table is in
        // the view, then any fields defined as siblings to the 'table' key will be available in the view.
        'commerce_product_field_data' => [
          // IMPORTANT! 'join_id' must match the Plugin ID defined by the join plugin -- e.g. in the @ViewsJoin annotation.
          'join_id' => 'rng_subquery',
          // The following are configurations available inside the handler.
          // subquery_alias is used inside the subquery for the main table.
          'subquery_alias' => 'mymodule_summary',
          'subquery_table' => 'registrant',
          // the key is the alias for the field, the value is the database column name.
          'subquery_fields' => [
            'target_id' => 'event__target_id',
          ],
          // Expressions can use aggregate functions. The key is the alias for the expression.
          'subquery_expressions' => [
            'num_registrants' => 'count(id)',
          ],
          'subquery_where' => [
            "event__target_type = 'commerce_product'",
          ],
          'subquery_groupby' => 'target_id',
          // This is the id field of the base table (the key for the parent array).
          'left_field' => 'product_id',
          // This field should be an alias of a field inside the subquery.
          'field' => 'target_id',
          // This is the right side of the join -- will be the alias for the entire subquery, and must match
          // the root key of this array structure.
          'table' => 'custom_registrants',
        ],
      ],
    ];
    // This is a sibling of the "table" -- num_registrants is considered a "dummy" field on the "custom_registrants"
    // table -- which itself is the alias of the subquery.
    $data['custom_registrants']['num_registrants'] = [
      'title' => t('Attendee Count'),
      'help' => t('Count of attendees registered for an event'),
      'group' => t('Product'),
      'field' => [
        'title' => t('Attendee count'),
        'help' => t('Count of current attendees'),
        // What type of entity is this field available on
        'entity_type' => 'commerce_product',
        // This is one of the field aliases inside the subquery
        'field_name' => 'num_registrants',
        // This is the field handler plugin to use
        'id' => 'standard',
      ],
      'filter' => [
        'id' => 'numeric',
        'title' => t('Num Registrants'),
        'help' => t('Filter based on the count of attendees'),
        'entity_type' => 'commerce_product',
        'field_name' => 'num_registrants',
      ],
      'sort' => [
        'id' => 'standard',
        'title' => t('Num Registrants'),
        'help' => t('Sort based on count of attendees'),
        'real field' => 'num_registrants',
      ],
    ];

    return $data;
}

Now, that may seem like a lot of boilerplate. But the solution is elegant -- 6 lines for an array that makes it so this field can be added to the available columns to sort is all it takes! Now on a report that uses it, you can click the column header to sort by the most attended course to the least attended (or vice versa) without needing a custom handler. And the filter was just as easy!

 

Permalink

Good job, Thank you!

Add new comment

The content of this field is kept private and will not be shown publicly.

Filtered HTML

  • Web page addresses and email addresses turn into links automatically.
  • Allowed HTML tags: <a href hreflang> <em> <strong> <blockquote cite> <cite> <code> <ul type> <ol start type> <li> <dl> <dt> <dd> <h1> <h2 id> <h3 id> <h4 id> <h5 id> <p> <br> <img src alt height width>
  • Lines and paragraphs break automatically.