After working frequently with open-source databases such as MySQL and PostgreSQL, I recently had the opportunity to work on a SQL Server project and discovered a subtle but important difference in the SQL landscape. I noticed that in SQL Server I was unable to GROUP BY ordinal positions (GROUP BY 1, 2, 3…), a feature I frequently used in other databases, particularly for quick testing.
This discovery led me to explore several nuances of both database systems, particularly the SQL execution order, which will be the focus of this article.
Why does this matter? When working with database systems, understanding the subtle differences can greatly impact your workflow and improve your productivity. It can save you significant troubleshooting time. Also, by understanding the SQL execution order of various databases, you can craft more optimal SQL queries for the system you are working with.
In this article, we will look into one main use case where this behaviour occurs, GROUP BY, and investigate why. However, this insight can be applied to HAVING, WHERE or any other SQL clause.
Let's look at the example query below. It will not work in SQL Server even though it works in MySQL:
SELECT
    DATEPART(year, day) AS order_date,
    SUM(value) AS value
FROM orders -- assumed table name
GROUP BY 1;
If you run this, you will probably get an error like this:
Each GROUP BY expression must contain at least one column that is not an outer reference.
However, the revised query below works after replacing the GROUP BY ordinal reference with the explicit expression. You will also notice that you can reference ordinal positions in the ORDER BY clause, which I found strange:
SELECT
    DATEPART(year, day),
    SUM(value) AS value
FROM orders -- assumed table name
GROUP BY DATEPART(year, day)
ORDER BY 1;
In SQL Server, I quickly learned that I had to use explicit column names or expressions in the GROUP BY clause. This is considered a best practice as it makes the code easier to understand. However, I was curious about why this behaviour differed between databases. Furthermore, I found it interesting that the ORDER BY clause in SQL Server works with ordinal positions, which further piqued my curiosity.
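To see the two GROUP BY forms side by side in an engine that accepts ordinals, here is a minimal sketch using Python's built-in sqlite3 module. SQLite is only a stand-in I picked because it runs anywhere; it is not one of the databases discussed in this article. The orders table, its rows, and the strftime call (SQLite has no DATEPART) are all assumptions made for illustration.

```python
import sqlite3

# Hypothetical table and data, invented for illustration only.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (day TEXT, value REAL)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?)",
    [("2022-03-01", 10.0), ("2022-07-15", 5.0), ("2023-01-20", 7.5)],
)

# Grouping by ordinal position: 1 refers to the first item in the SELECT list.
# strftime('%Y', ...) stands in for DATEPART(year, ...), which SQLite lacks.
by_ordinal = conn.execute(
    "SELECT strftime('%Y', day) AS order_year, SUM(value) AS total "
    "FROM orders GROUP BY 1 ORDER BY 1"
).fetchall()

# Grouping by the explicit expression, the form SQL Server requires.
by_expression = conn.execute(
    "SELECT strftime('%Y', day) AS order_year, SUM(value) AS total "
    "FROM orders GROUP BY strftime('%Y', day) ORDER BY 1"
).fetchall()

print(by_ordinal)
print(by_expression)
```

Both queries should return the same rows here, which suggests the ordinal is just shorthand for the expression in engines that accept it.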
To find out, let's look at the SELECT statement execution/processing order for SQL Server vs. other databases. It's important to note that in SQL databases, the clauses of a query are logically processed in a defined sequence, and that sequence differs from the order in which the query is written.
In SQL Server, for instance, we can see from the image below and from the Microsoft docs that the FROM clause is the first to be evaluated. Furthermore, the SELECT clause runs after the GROUP BY clause. That is why we were unable to reference a column's position or even its alias in the GROUP BY clause in our first example!
However, we are free to reference the ordinal position and/or alias in the ORDER BY clause, as that is evaluated after the SELECT clause. The SELECT clause tells the database which columns will be returned, so the positions are known at this point. Cool, right?
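One way to picture this ordering is a toy pipeline in plain Python. This is only a mental model under my own assumptions, not how SQL Server is implemented: the rows, the year_of helper and the variable names are all hypothetical.

```python
from collections import defaultdict

# Hypothetical rows standing in for a table; names are illustrative only.
rows = [
    {"day": "2022-03-01", "value": 10.0},
    {"day": "2022-07-15", "value": 5.0},
    {"day": "2023-01-20", "value": 7.5},
]


def year_of(row):
    """Toy stand-in for DATEPART(year, day)."""
    return row["day"][:4]


# Step 1: FROM produces the source rows.
source = rows

# Step 2: GROUP BY runs next. Only source columns and expressions over them
# exist at this point; there is no SELECT list yet, so "column 1" and any
# alias have nothing to refer to.
groups = defaultdict(list)
for row in source:
    groups[year_of(row)].append(row)

# Step 3: SELECT builds the output columns, fixing their order and aliases.
selected = [
    (year, sum(r["value"] for r in members))  # output position 1, position 2
    for year, members in groups.items()
]

# Step 4: ORDER BY runs after SELECT, so "ORDER BY 1" can mean output position 1.
ordered = sorted(selected, key=lambda out: out[0])

print(ordered)
```

In this model the grouping step never sees the output tuple, while the sorting step does, which mirrors why the ordinal works in ORDER BY but not in GROUP BY.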
SQL Server execution order
In MySQL, however, I found it difficult to find clear documentation stating the order of execution of a SQL query. The execution order seems to depend on the contents of the query and what the query optimiser defines as the best path.
But the MySQL docs offer a clue that shows us what the execution order might be and suggests that the SELECT clause is evaluated before the GROUP BY clause:
For GROUP BY or HAVING clauses, it searches the FROM clause before searching in the select_expr values. (For GROUP BY and HAVING, this differs from the pre-MySQL 5.0 behavior that used the same rules as for ORDER BY.)
If we also look at the docs for GoogleSQL (formerly Standard SQL), the dialect used in Google BigQuery, you will see a similar deviation from the way queries are executed in SQL Server:
GROUP BY and ORDER BY can also refer to a third group: Integer literals, which refer to items in the SELECT list. The integer 1 refers to the first item in the SELECT list, 2 refers to the second item, etc.
As you can see, this behaviour is not supported in SQL Server's GROUP BY clause. The Google docs also mention that GROUP BY, ORDER BY, and HAVING can refer to aliases from the SELECT list.
With that, we can conclude with high probability that the execution order for these other databases follows a path similar to the image below:
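Alias references can be tried out the same way. The sketch below again uses Python's built-in sqlite3 module purely as a convenient stand-in that happens to accept SELECT aliases in GROUP BY and HAVING; it is not BigQuery, and the orders table, its rows and the alias names are assumptions for illustration.

```python
import sqlite3

# Hypothetical table and data, invented for illustration only.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (day TEXT, value REAL)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?)",
    [("2022-03-01", 10.0), ("2022-07-15", 5.0), ("2023-01-20", 7.5)],
)

# order_year and total are aliases defined in the SELECT list, yet they are
# referenced in GROUP BY, HAVING and ORDER BY.
by_alias = conn.execute(
    "SELECT strftime('%Y', day) AS order_year, SUM(value) AS total "
    "FROM orders "
    "GROUP BY order_year "
    "HAVING total > 10 "
    "ORDER BY order_year"
).fetchall()

print(by_alias)
```

In SQL Server, the equivalent query would need the full expression repeated in GROUP BY and the aggregate repeated in HAVING.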
MySQL, PostgreSQL & BigQuery possible execution order
This was a short post where we looked at how the execution order in MySQL, GoogleSQL and other databases' SQL dialects differs from SQL Server's, based on observed behaviours and documentation. SQL Server emphasises explicitness in the GROUP BY clause for code readability, whereas MySQL's execution order most likely evaluates the SELECT clause before the GROUP BY clause, allowing us to reference ordinal positions in it.
Feel free to share your thoughts on this subject, and catch you in the next one.