Test Driving Database Indexes

Database indexes are conceptually very simple, but in practice, I’ve found that it’s hard to predict when they’ll get used and what indexes a given table needs. On a project at work I came up with the idea to test-drive my database indexes, just like I test-drive the rest of my code. I’d like to share the approach I came up with.

The Goal

Conceptually, this is how I wanted to write a test:

my_model_spec.rb
describe MyModel do
  include CheckForFullTableScans

  it 'performs a complex query and returns data' do
    check_for_full_table_scans(MyModel.db) do
      results = MyModel.for_timeframe('current_year').widget_details
      results.should eq(whatever)
    end
  end
end

Essentially, just wrap a block of code in check_for_full_table_scans. Besides running the provided block, I wanted check_for_full_table_scans to run an explain plan on each underlying query, keep track of all of them, and, at the completion of the block, raise an error if any of the explain plans included full table scans. In this way, I could start with a failing test, add an index, and watch the test pass after the change–just like with normal TDD. In addition, the tests would provide regression coverage to protect against future schema changes introducing full table scans.

The Code

Here’s what I came up with:

check_for_full_table_scans.rb
module CheckForFullTableScans
  FullTableScanError = Class.new(StandardError)

  def check_for_full_table_scans(db)
    extension = FullTableScanChecker::DatabaseExtension
    unless db.singleton_class.ancestors.include?(extension)
      db.extend extension
    end

    FullTableScanChecker.check { yield }
  end

  class FullTableScanChecker
    attr_reader :explain_plans

    ExplainPlan = Struct.new(:sql, :results, :backtrace) do
      def formatted_backtrace
        ([''] + backtrace).join("\n       ")
      end
    end

    def initialize
      @explain_plans = []
    end

    def self.current_instance
      Thread.current[:__full_table_scan_checker]
    end

    def self.check
      instance = new
      Thread.current[:__full_table_scan_checker] = instance
      yield.tap { instance.verify_no_full_table_scans }
    ensure
      Thread.current[:__full_table_scan_checker] = nil
    end

    def explain_sql(sql)
      return sql if sql.start_with?("EXPLAIN ")
      "EXPLAIN " + sql
    end

    def verify_no_full_table_scans
      bad_plans = explain_plans.select { |ep| has_full_table_scan?(ep) }
      return if bad_plans.empty?

      msg_parts = ["expected the executed queries would have no full table scans, but had #{bad_plans.size}:"] +
        bad_plans.map do |plan|
          "  - SQL: #{plan.sql}\n    Explain Results: #{plan.results.inspect}" +
          "\n    Called From:#{plan.formatted_backtrace}"
        end

      raise FullTableScanError, msg_parts.join("\n\n")
    end

    def has_full_table_scan?(explain_plan)
      explain_plan.results.any? do |result|
        # http://dev.mysql.com/doc/refman/5.0/en/explain-output.html#jointype_all
        # http://dev.mysql.com/doc/refman/5.0/en/explain-output.html#explain-extra-information
        result.fetch(:type) == "ALL" ||
        result.fetch(:Extra) =~ /full scan/i
      end
    end

    module DatabaseExtension
      def _execute(connection, sql, opts, &block)
        full_table_scan_checker = FullTableScanChecker.current_instance

        if full_table_scan_checker && opts[:type] == :select
          explain_sql = full_table_scan_checker.explain_sql(sql)
          backtrace = caller
          super(connection, explain_sql, opts) do |result|
            ep = ExplainPlan.new(sql, result.enum_for(:each).to_a, backtrace)
            full_table_scan_checker.explain_plans << ep
          end
        end

        super(connection, sql, opts.dup, &block)
      end
    end
  end
end

Let’s take it a piece at a time:

check_for_full_table_scans.rb
module DatabaseExtension
  def _execute(connection, sql, opts, &block)
    full_table_scan_checker = FullTableScanChecker.current_instance

    if full_table_scan_checker && opts[:type] == :select
      explain_sql = full_table_scan_checker.explain_sql(sql)
      backtrace = caller
      super(connection, explain_sql, opts) do |result|
        ep = ExplainPlan.new(sql, result.enum_for(:each).to_a, backtrace)
        full_table_scan_checker.explain_plans << ep
      end
    end

    super(connection, sql, opts.dup, &block)
  end
end

DatabaseExtension is the piece that hooks into each query to run the explain plan. On this project we’re using Sequel, which runs every query through an _execute method, so that’s where we hook in. You’ll notice this method has two calls to super; that’s because when we have a current FullTableScanChecker instance (which indicates we’re in a check_for_full_table_scans block) and it is a SELECT query, we want to run the query twice: once prepended with EXPLAIN to get the explain plan, and once as normal to get the real results. We store the SQL statement, the explain plan results, and the backtrace in a list of explain plans on the current FullTableScanChecker instance.

This module gets extended onto the database instance in check_for_full_table_scans if it has not already been extended with the module:

check_for_full_table_scans.rb
def check_for_full_table_scans(db)
  extension = FullTableScanChecker::DatabaseExtension
  unless db.singleton_class.ancestors.include?(extension)
    db.extend extension
  end

  FullTableScanChecker.check { yield }
end

This also delegates to FullTableScanChecker to do the actual checking for full table scans. That class uses a thread local to store the current instance:

check_for_full_table_scans.rb
def self.current_instance
  Thread.current[:__full_table_scan_checker]
end

def self.check
  instance = new
  Thread.current[:__full_table_scan_checker] = instance
  yield.tap { instance.verify_no_full_table_scans }
ensure
  Thread.current[:__full_table_scan_checker] = nil
end

As you can see, check yields, then checks all of the collected explain plans using verify_no_full_table_scans:

check_for_full_table_scans.rb
def verify_no_full_table_scans
  bad_plans = explain_plans.select { |ep| has_full_table_scan?(ep) }
  return if bad_plans.empty?

  msg_parts = ["expected the executed queries would have no full table scans, but had #{bad_plans.size}:"] +
    bad_plans.map do |plan|
      "  - SQL: #{plan.sql}\n    Explain Results: #{plan.results.inspect}" +
      "\n    Called From:#{plan.formatted_backtrace}"
    end

  raise FullTableScanError, msg_parts.join("\n\n")
end

def has_full_table_scan?(explain_plan)
  explain_plan.results.any? do |result|
    # http://dev.mysql.com/doc/refman/5.0/en/explain-output.html#jointype_all
    # http://dev.mysql.com/doc/refman/5.0/en/explain-output.html#explain-extra-information
    result.fetch(:type) == "ALL" ||
    result.fetch(:Extra) =~ /full scan/i
  end
end

This looks for full table scans based on some things I found mentioned in the MySQL docs. There are probably other things I could check for as well.

The Output

Here’s what the output from this code looks like:

CheckForFullTableScans::FullTableScanError:
   expected the executed queries would have no full table scans, but had 1:

     - SQL: SELECT * FROM `top_rankings` INNER JOIN `ranking_deltas` ON ((`ranking_deltas`.`top_ranking_id` = `top_rankings`.`id`) AND (`ranking_deltas`.`type` = 'previous_week')) WHERE (`top_rankings`.`competitor_name` = 'Some Competitor') ORDER BY `ranking_deltas`.`sort_order` DESC
       Explain Results: [{:id=>1, :select_type=>"SIMPLE", :table=>"top_rankings", :type=>"ALL", :possible_keys=>"PRIMARY", :key=>nil, :key_len=>nil, :ref=>nil, :rows=>4, :Extra=>"Using where; Using temporary; Using filesort"}, {:id=>1, :select_type=>"SIMPLE", :table=>"ranking_deltas", :type=>"eq_ref", :possible_keys=>"ranking_deltas_top_ranking_id_type_index", :key=>"ranking_deltas_top_ranking_id_type_index", :key_len=>"771", :ref=>"vanguard_test.top_rankings.id,const", :rows=>1, :Extra=>"Using where"}]
       Called From:
          <full backtrace>

Conclusion

We’ve been using this for a couple of months, and it’s working really well for us so far. As I mentioned above, we’re using Sequel; I’m sure you could use a similar technique for ActiveRecord, but you’d have to hook into each query in a different manner (I suspect most of FullTableScanChecker could remain the same, though).

I’d be curious to hear if anyone has tried something like this before!

blog comments powered by Disqus

About Me

Husband and father, musician, software engineer at SEOmoz, open source developer specializing in Ruby and Rails, world traveler and Christian.