Fixing bad queries and resolving performance problems can involve hours (or days) of research and testing. Sometimes we can quickly cut that time by identifying common design patterns that are indicative of poorly performing TSQL. Developing pattern recognition for these easy-to-spot eyesores can allow us to immediately focus on what is most likely to be the problem. Whereas performance tuning can often be composed of hours of collecting extended events, traces, execution plans, and statistics, being able to identify potential pitfalls quickly can short-circuit all of that work. While we should perform our due diligence and prove that any changes we make are optimal, knowing where to start can be a huge time saver! For more information about query optimization, see the SQL Query Optimization - How to Determine When and If It’s Needed article.

Tips and tricks

OR in the Join Predicate/WHERE Clause Across Multiple Columns

SQL Server can efficiently filter a data set using indexes via the WHERE clause or any combination of filters that are separated by an AND operator. By being exclusive, these operations take data and slice it into progressively smaller pieces, until only our result set remains. OR is a different story: because it is inclusive, SQL Server cannot process it in a single operation. Instead, each component of the OR must be evaluated independently. When this expensive operation is completed, the results can then be concatenated and returned normally.

The scenario in which OR performs worst is when multiple columns or tables are involved. We not only need to evaluate each component of the OR clause, but need to follow that path through the other filters and tables within the query. Even if only a few tables or columns are involved, the performance can become mind-bogglingly bad. Here is a very simple example of how an OR can cause performance to become far worse than you’d ever imagine it could be:

The query is simple enough: two tables and a join that checks both ProductID and rowguid. Even if none of these columns were indexed, our expectation would be a table scan on Product and a table scan on SalesOrderDetail. Expensive, but at least something we can comprehend. Here is the resulting performance of this query:

We did scan both tables, but processing the OR took an absurd amount of computing power: 1.2 million reads were made in this effort! Considering that Product contains only 504 rows and SalesOrderDetail contains 121,317 rows, we read far more data than the full contents of each of these tables. In addition, the query took about 2 seconds to execute on a relatively speedy SSD-powered desktop.

The take-away from this scary demo is that SQL Server cannot easily process an OR condition across multiple columns. The best way to deal with an OR is to eliminate it (if possible) or break it into smaller queries. Breaking a short and simple query into a longer, more drawn-out query may not seem elegant, but when dealing with OR problems, it is often the best choice:

In this rewrite, we took each component of the OR and turned it into its own SELECT statement. UNION concatenates the result sets and removes duplicates. The execution plan got significantly more complex, as we are querying each table twice now, instead of once, but we no longer needed to play pin-the-tail-on-the-donkey with the result sets as we did before. The reads have been cut down from 1.2 million to 750, and the query executed in well under a second, rather than in 2 seconds. Note that there are still a boatload of index scans in the execution plan, but despite the need to scan tables four times to satisfy our query, performance is much better than before.

Use caution when writing queries with an OR clause. Test and verify that performance is adequate and that you are not accidentally introducing a performance bomb similar to what we observed above. This is an easy-to-identify query pattern that will often lead to poor performance. If you are reviewing a poorly performing application and run across an OR across different columns or tables, then focus on that as a possible cause.

Wildcard String Searches

String searching efficiently can be challenging, and there are far more ways to grind through strings inefficiently than efficiently. For frequently searched string columns, we need to ensure that indexes are present on those columns and that those indexes can actually be used. If not, can we use hashes, n-grams, or some other solution? Without use of additional features or design considerations, SQL Server is not good at fuzzy string searching. That is, if I want to detect the presence of a string in any position within a column, getting that data will be inefficient:
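As an illustration of the inefficient any-position search described above, here is a sketch against the AdventureWorks sample database (the table and search string are assumptions for illustration, not necessarily the article's exact demo). A leading wildcard prevents SQL Server from seeking an index on the column, so every row must be scanned and compared:

```sql
-- Hypothetical example: the leading % means an index on LastName
-- cannot be used for a seek; the whole column must be scanned.
SELECT PersonType, FirstName, LastName
FROM Person.Person
WHERE LastName LIKE '%For%';
```

By contrast, a search anchored at the start of the string (LIKE 'For%') can be satisfied by an index seek on LastName, because the predicate defines a contiguous key range.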
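For reference, the OR join pattern and its UNION rewrite discussed in the OR section can be sketched as follows, assuming the AdventureWorks Production.Product and Sales.SalesOrderDetail tables; the exact column list of the original demo is an assumption:

```sql
-- The problem pattern: an OR across two different join columns forces
-- SQL Server to evaluate each branch for every candidate row pairing.
SELECT Product.ProductID, Product.Name
FROM Production.Product
INNER JOIN Sales.SalesOrderDetail
    ON Product.ProductID = SalesOrderDetail.ProductID
    OR Product.rowguid = SalesOrderDetail.rowguid;

-- The rewrite: one SELECT per branch of the OR, concatenated by UNION.
SELECT Product.ProductID, Product.Name
FROM Production.Product
INNER JOIN Sales.SalesOrderDetail
    ON Product.ProductID = SalesOrderDetail.ProductID
UNION
SELECT Product.ProductID, Product.Name
FROM Production.Product
INNER JOIN Sales.SalesOrderDetail
    ON Product.rowguid = SalesOrderDetail.rowguid;
```

Note that UNION (unlike UNION ALL) removes duplicate rows, which preserves the semantics of the original single query with OR.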
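Read counts and timings like those quoted above can be captured with standard T-SQL session settings (whether the article's figures were gathered this way is an assumption, but this is the usual approach):

```sql
-- Report logical reads plus CPU and elapsed time for each statement
-- in this session; output appears on the Messages tab in SSMS.
SET STATISTICS IO ON;
SET STATISTICS TIME ON;

-- ... run the query under test here ...

SET STATISTICS IO OFF;
SET STATISTICS TIME OFF;
```

Comparing the "logical reads" figures before and after a rewrite is a quick, repeatable way to confirm an improvement such as the drop from 1.2 million reads to 750.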