Understand process order in a SQL query and why it is important

Stefan Graf
3 min readMar 18, 2022

--

Understanding the process order of SQL queries can be very beneficial

Different from most other programming languages, the order in which SQL code is processed is not the same as it’s written. In fact, the SELECT part, which is typed first, gets processed almost as last. Understanding this order can be very useful to understand SQL.

But first, let’s start with a short explanation why SQL is so different from other languages.

SQL originates from SEQUEL, where the first E stands for English. The intention behind this language was to use a declarative language, similar to Exnglish, to define your instructions.

In English you would say something like “Bring me the knife from the kitchen”. You specify first the object you want, before you indicate the location.

Likewise a SQL statement starts first with the SELECT clause, with the columns you need, and later the FROM clause defines where the needed columns are located.

Technical order

Your query processing is internally handled by virtual tables. Each step consumes the virtual table from the previous step and generates a new one, which will be used by the following step. If a step is not defined it gets simply skipped. Also, these virtual tables are not visible to the user or application, only the final result is usable at the end.

The written order does not necessarily indicate the logical one

But now let’s take a look at the processing order:

  1. From: This phase defines the source tables used in this query. Additionally, clauses like JOIN, PIVOT or APPLY are used. These lead to subphases. The JOIN clause has for example the following 3 Subphases. 1.1 Cartesian Product, 1.2 ON Predicate, 1.3 Adding Outer Rows. This phase generates the first Virtual Table (V1).
  2. WHERE: In this phase the filters are being applied, which are defined by the WHERE clause. The result of this filtering gets loaded into the second Virtual Table (V2).
  3. GROUP BY: This phase groups rows from V2, that have the same values, into summary rows, generating V3.
  4. HAVING: Similar to the WHERE phase, specifies a search condition for a group or an aggregate and only inserts into V4 where the search condition is met.
  5. SELECT: Finally, we come to the clause which we are typing first, when implementing a query. Yet another virtual table (V5) gets generated from the previous one, according to everything you defined in your SELECT clause. Subphases are again a thing in this phase. First in 5.1 all expressions get evaluated. After that the distinct subphase 5.2 removes all duplicates.
  6. ORDER BY: This phase is used to order the results coming from V5 into V6
  7. TOP/OFFSET-FETCH: Finally, as a last phase, TOP or OFFSET-FETCH are applied to the V6 table.

So what’s the deal?

Understanding this order of processing enables you to gain a deeper understanding how queries work and why they behave a certain way.

This order is for example the reason why you cant use your alias from the SELECT clause inside the WHERE clause, because WHERE gets applied before SELECT. Therefore while executing WHERE the alias can’t be known already.

Also the knowledge of this order gives you insights into how you should structure your tables before you join them. If you want to use a WHERE clause on one table, it could be beneficial to do this prior to the JOIN phase. This way the tables are often a lot smaller and won’t require as much performance to join them. This can be achieved by using CTEs or subqueries.

These were just two examples how the understanding of this can improve your SQL skills. But you should got a grasp by now, why this is so beneficial.

--

--

Stefan Graf
Stefan Graf

Written by Stefan Graf

Data Engineer Consultant @Microsoft — Data and Cloud Enthusiast

No responses yet