SQL EXPLAIN

Description

In BBj 23.00 and higher, The EXPLAIN statement is used to analyze and provide information about the execution plan of a query. It helps to understand how BBj’s SQL engine will execute a query and provides insights into the optimization process. By examining the execution plan, you can identify potential bottlenecks, inefficient queries, and make informed decisions to optimize your queries.

By analyzing the output of the EXPLAIN statement, you can identify areas where indexes can be added or modified, or where query optimizations can be made to improve the performance of your queries.

Understanding the Results

The following is an example of a simple EXPLAIN statement using the sample ChileCompany database:

EXPLAIN SELECT cust_num, order_num, order_date, first_name, last_name
FROM order_header oh INNER JOIN customer c ON oh.cust_num = c.cust_num
WHERE last_name = 'Baldrake'

Note:

The SELECT statement used in conjunction with EXPLAIN can be arbitrarily complex.

Results

The output of the EXPLAIN statement resembles a standard ResultSet, similar to a SELECT statement. Each row offers valuable insights into the execution plan of the SELECT statement.

Explanation

TABLE

This references the single table related to the data for that row. Note that rows 1 and 2 have <null> in this column. This is because the data in those rows refer to table order rather than a single table (see TABLE_ORDER for more information).

Note:

The other relevant columns when there is a non-null value in this column are USABLE_KEY, USABLE_KNUM, ACTUALLY_USED, and SCORE. The TABLE_ORDER column does not apply.

TABLE_ORDER

Rows 1 and 2 refer to table order, so they have non-null data in the TABLE_ORDER column and null data in the TABLE column. Rows referring to table order provide information about the automatic reordering of tables that the SQL engine does during optimization (when possible).

Note:

When there is a non-null value in this column, the two relevant columns are ACTUALLY_USED and SCORE. The other columns in the result data do not apply.

USABLE_KEY

This column references the columns that make up the index/key being considered for use during optimization. Each key on a BBx data file such as an MKEYED, XKEYED, or VKEYED file corresponds to an SQL index.

Note:

The key number on the file is represented by the value in USABLE_KNUM.

The order of the columns in the key play an important role during optimization. For example, if you have an index with the segments defined as LAST_NAME, FIRST_NAME and a WHERE clause is used that only refers to the FIRST_NAME, the SQL engine cannot use this for optimization. However, if the WHERE clause only refers to the LAST_NAME column, it could be used since LAST_NAME is the first segment in the index/key.

So, when you see an entry in this column it indicates that this key meets the criteria necessary to be considered for optimization. However, the SCORE will determine which one the SQL engine ultimately chooses.

USABLE_KNUM

This column is simply the key number on the underlying data file for the table that corresponds to the segments shown in the USABLE_KEY column. See USABLE_KEY for an explanation of how the SQL engine uses this information.

ACTUALLY_USED

This column contains true or false, indicating whether or not the SQL engine determined to use the key number shown in USABLE_KNUM for optimization. When optimizing queries, the SQL engine must choose a single key for iteration. The primary role of the optimization strategy is to determine which key (index in SQL terminology) will result in the fewest file-read operations. This determination is made based on the value in the SCORE column. The lower the score, the better.

SCORE

The score is a rough estimate of the number of file read operations required to gather and process the information requested by the query. For example, in the example above, row 1 shows a score of 157 when the table order is ORDER_HEADER followed by CUSTOMER. Row 2 shows that the score is 61 if the order is swapped. To understand this, take a look at the query.

The ON clause tells the SQL engine how to join the two tables together. The join occurs on the CUST_NUM column. Regardless of what key the optimizer chooses, the join is going to be quite fast (i.e. it looks up a record in the table on the left, then immediately locates the record or records matching the CUST_NUM in the table on the right).

However, the WHERE clause provides some additional information. It also indicates that we’re only interested in customers with the LAST_NAME of Baldrake. If the ORDER_HEADER table is placed on the left, since there is no limiting criteria on that table in the WHERE clause, it would need to iterate over every record in the ORDER_HEADER table, look up its corresponding record in the CUSTOMER table, check to see if the LAST_NAME matches, then move on to the next record in the ORDER_HEADER table. The sample database is small so you would not see any performance differences. However, imagine a real-world scenario where the ORDER_HEADER table contains 1,000,000 records. That would involve a minimum of 2,000,000 reads to return the results (i.e. 1 read from the left, then 1 read from the right).

If the optimizer simply changes the order and puts the CUSTOMER table on the left, you will see a dramatic increase in performance. This is because due to the indexing of the CUSTOMER table, the SQL engine can limit the number of records it reads from the CUSTOMER table to only those that match the WHERE clause. Returning to our large table example, imagine that there are 1,000 records in the customer table that match with a LAST_NAME of Baldrake. Placing the CUSTOMER table first means that the SQL engine can iterate using KNUM 1 (LAST_NAME and CUST_NUM combined) thus only looking at 1,000 records in the table on the left and thus only looking up matches for 1,000 records in the right table. This results in a total of 2,000 read operations instead of 2,000,000.

Summary

The SQL EXPLAIN statement doesn't provide specific recommendations for optimizing query performance. However, it does offer valuable information that helps understand how the SQL engine optimizes a query. This information is beneficial for developers and administrators in identifying reasons behind slower-than-expected queries. By considering additional filtering criteria and indexing of tables, it becomes possible to improve performance and potentially address any slowdowns.

ClosedVersion History

  • BBj 23.00: SQL EXPLAIN statement added.

See Also

Advantage Article: Unleashing the Power of SQL Explain

SQL Introduction

SQL Grammar - BBj