Introduction (the good)
This blog post is going to discuss the pros and cons of linked servers as well as how to reduce the cons using an OPENQUERY.
Linked server is effectively removing the need for a solution that will move and synchronize the data. You just to setup a Linked Server (see Fig #2) and add a Linked Server reference in front of the 3 part-naming convention (see Fig #3) as well security context with other minor things. And that is all. A small price to pay for lots of convenience.
Problem (the bad)
While Linker Server feature is making it easy to join tables between two or more different servers, it’s not free and it comes with a price tag – performance overhead. While joining few small tables might not add any noticeable pressure to the server, joining 3-5 fairly large remote tables might introduce locking and blocking and increase the run-time into minutes. The main problem occurs when you are running queries against a remote server, which is not healthy.
When SQL Server runs query with Linked Server, it will use the least optimal execution plan due to lack of knowledge of those remote tables. Meaning, your local SQL Server is clueless on remote table indexes and statistics, so it might use incorrect joining mechanism and might be grossly inefficient.
For example, if were tasked to join all January 2013 orders between OLTP and OLAP tables and compare revenue per product while showing top 5 contributors, we might build the following query (see Fig #4) to achieve that goal.
Now if were to review Statistics IO (see Fig #5) and execution plan details (see Fig #6), this is what we are going to see.
As you might notice, 93% of the query execution cost goes to a “mysterious” remote query. Needless to say, it will very hard to improve the query with that execution plan.
Solution (a way to make it perfect)
One of the easiest ways to fix performance problems with a Linked Server is to run it via OPENQUERY.
What is OPENQUERY? According to Microsoft, OPENQUERY executes the specified pass-through query on the specified linked server. This server is an OLE DB data source. OPENQUERY can be referenced in the FROM clause of a query as if it were a table name. OPENQUERY can also be referenced as the target table of an INSERT, UPDATE, or DELETE statement. This is subject to the capabilities of the OLE DB provider. Although the query may return multiple result sets, OPENQUERY returns only the first one. https://docs.microsoft.com/en-us/sql/t-sql/functions/openquery-transact-sql?view=sql-server-ver15
One of the main advantages of OPENQUERY is remote execution, meaning that the query is sent from the local server to remote server while having knowledge of those remote tables that are now local to the query. By the way, remote execution also enables the use of native syntax of the remote server, so you can take advantage of other RDBMS system’s performance tricks.
Here is how our original query will look like with an OPENQUERY (see Fig #7). It’s still using the same linked server, but it happens indirectly now with an OPENQUERY.
Now if were to review Statistics IO (see Fig #8) and execution plan details (see Fig #9), this is what we are going to see.
Let’s compare Statistics IO and execution plans to see the differences between direct query linked server usage vs. an OPENQUERT linked server.
The most noticeable differences are:
- WorkFile 0 scans/reads vs. 15/1064 in Statistics IO (see Fig #10)
- Remote Query 93% cost vs. most distributed cost in Execution Plan (see Fig #11) and appearance of Remote Scans.
The main reason why OPENQUERY will usually perform better is the knowledge of the table in terms of indexes and stats, knowledge that a direct linked server doesn’t have.
As you can see, we have managed to cut down the run-time from 22 seconds (using direct link server) down to 2 seconds (using OPENQUERY). Additionally, we can add indexes specific to the query to make OPENQUERY even faster.
Since nothing is certain in this life other than taxes and death, your mileage may vary (“YMMV”). Please test the before and after (preferably using DBCC FREEPROCCACHE) prior to deployment. In addition to tested query above that reduced run-time by 11 times, I also witnessed a drastic performance gain due to the use of OPENQUERY in one of the clients. Replacing all 3 direct linked server references, reduced run-time by 10 times, to just few seconds.
The only few caveats with OPENQUERY since it can’t accept parameters are the use of dynamic SQL and “fighting the quotes” – making sure to escape the single quotes that are internally used by a SQL Server to identify literal, i.e. text.