Database indexes are conceptually very simple, but in practice, I’ve found that it’s hard to predict when they’ll get used and what indexes a given table needs. On a project at work I came up with the idea to test-drive my database indexes, just like I test-drive the rest of my code. I’d like to share the approach I came up with.
The Goal
Conceptually, this is how I wanted to write a test:
my_model_spec.rb
describeMyModeldoincludeCheckForFullTableScansit'performs a complex query and returns data'docheck_for_full_table_scans(MyModel.db)doresults=MyModel.for_timeframe('current_year').widget_detailsresults.shouldeq(whatever)endendend
Essentially, just wrap a block of code in check_for_full_table_scans. Besides running the provided block, I wanted check_for_full_table_scans to run an explain plan on each underlying query, keep track of all of them, and, at the completion of the block, raise an error if any of the explain plans included full table scans. In this way, I could start with a failing test, add an index, and watch the test pass after the change–just like with normal TDD. In addition, the tests would provide regression coverage to protect against future schema changes introducing full table scans.
The Code
Here’s what I came up with:
check_for_full_table_scans.rb
moduleCheckForFullTableScansFullTableScanError=Class.new(StandardError)defcheck_for_full_table_scans(db)extension=FullTableScanChecker::DatabaseExtensionunlessdb.singleton_class.ancestors.include?(extension)db.extendextensionendFullTableScanChecker.check{yield}endclassFullTableScanCheckerattr_reader:explain_plansExplainPlan=Struct.new(:sql,:results,:backtrace)dodefformatted_backtrace(['']+backtrace).join("\n ")endenddefinitialize@explain_plans=[]enddefself.current_instanceThread.current[:__full_table_scan_checker]enddefself.checkinstance=newThread.current[:__full_table_scan_checker]=instanceyield.tap{instance.verify_no_full_table_scans}ensureThread.current[:__full_table_scan_checker]=nilenddefexplain_sql(sql)returnsqlifsql.start_with?("EXPLAIN ")"EXPLAIN "+sqlenddefverify_no_full_table_scansbad_plans=explain_plans.select{|ep|has_full_table_scan?(ep)}returnifbad_plans.empty?msg_parts=["expected the executed queries would have no full table scans, but had #{bad_plans.size}:"]+bad_plans.mapdo|plan|" - SQL: #{plan.sql}\n Explain Results: #{plan.results.inspect}"+"\n Called From:#{plan.formatted_backtrace}"endraiseFullTableScanError,msg_parts.join("\n\n")enddefhas_full_table_scan?(explain_plan)explain_plan.results.any?do|result|# http://dev.mysql.com/doc/refman/5.0/en/explain-output.html#jointype_all# http://dev.mysql.com/doc/refman/5.0/en/explain-output.html#explain-extra-informationresult.fetch(:type)=="ALL"||result.fetch(:Extra)=~/full scan/iendendmoduleDatabaseExtensiondef_execute(connection,sql,opts,&block)full_table_scan_checker=FullTableScanChecker.current_instanceiffull_table_scan_checker&&opts[:type]==:selectexplain_sql=full_table_scan_checker.explain_sql(sql)backtrace=callersuper(connection,explain_sql,opts)do|result|ep=ExplainPlan.new(sql,result.enum_for(:each).to_a,backtrace)full_table_scan_checker.explain_plans<<ependendsuper(connection,sql,opts.dup,&block)endendendend
DatabaseExtension is the piece that hooks into each query to run the explain plan. On this project we’re using Sequel, which runs every query through an _execute method, so that’s where we hook in. You’ll notice this method has two calls to super; that’s because when we have a current FullTableScanChecker instance (which indicates we’re in a check_for_full_table_scans block) and it is a SELECT query, we want to run the query twice: once prepended with EXPLAIN to get the explain plan, and once as normal to get the real results. We store the SQL statement, the explain plan results, and the backtrace in a list of explain plans on the current FullTableScanChecker instance.
This module gets extended onto the database instance in check_for_full_table_scans if it has not already been extended with the module:
This also delegates to FullTableScanChecker to do the actual checking for full table scans. That class uses a thread local to store the current instance:
As you can see, check yields, then checks all of the collected explain plans using verify_no_full_table_scans:
check_for_full_table_scans.rb
defverify_no_full_table_scansbad_plans=explain_plans.select{|ep|has_full_table_scan?(ep)}returnifbad_plans.empty?msg_parts=["expected the executed queries would have no full table scans, but had #{bad_plans.size}:"]+bad_plans.mapdo|plan|" - SQL: #{plan.sql}\n Explain Results: #{plan.results.inspect}"+"\n Called From:#{plan.formatted_backtrace}"endraiseFullTableScanError,msg_parts.join("\n\n")enddefhas_full_table_scan?(explain_plan)explain_plan.results.any?do|result|# http://dev.mysql.com/doc/refman/5.0/en/explain-output.html#jointype_all# http://dev.mysql.com/doc/refman/5.0/en/explain-output.html#explain-extra-informationresult.fetch(:type)=="ALL"||result.fetch(:Extra)=~/full scan/iendend
This looks for full table scans based on some things I found mentioned in the MySQL docs. There are probably other things I could check for as well.
The Output
Here’s what the output from this code looks like:
CheckForFullTableScans::FullTableScanError:
expected the executed queries would have no full table scans, but had 1:
- SQL: SELECT * FROM `top_rankings` INNER JOIN `ranking_deltas` ON ((`ranking_deltas`.`top_ranking_id` = `top_rankings`.`id`) AND (`ranking_deltas`.`type` = 'previous_week')) WHERE (`top_rankings`.`competitor_name` = 'Some Competitor') ORDER BY `ranking_deltas`.`sort_order` DESC
Explain Results: [{:id=>1, :select_type=>"SIMPLE", :table=>"top_rankings", :type=>"ALL", :possible_keys=>"PRIMARY", :key=>nil, :key_len=>nil, :ref=>nil, :rows=>4, :Extra=>"Using where; Using temporary; Using filesort"}, {:id=>1, :select_type=>"SIMPLE", :table=>"ranking_deltas", :type=>"eq_ref", :possible_keys=>"ranking_deltas_top_ranking_id_type_index", :key=>"ranking_deltas_top_ranking_id_type_index", :key_len=>"771", :ref=>"vanguard_test.top_rankings.id,const", :rows=>1, :Extra=>"Using where"}]
Called From:
<full backtrace>
Conclusion
We’ve been using this for a couple of months, and it’s working really well for us so far. As I mentioned above, we’re using Sequel; I’m sure you could use a similar technique for ActiveRecord, but you’d have to hook into each query in a different manner (I suspect most of FullTableScanChecker could remain the same, though).
I’d be curious to hear if anyone has tried something like this before!