Introduction (the good)

If you are not familiar with a Linked Server concept in SQL Server, don’t worry, you probably didn’t have a need for that yet. You are fortunate to source all your data needs from one single database server and that is fine or maybe you are consuming your SQL Server needs from Azure Single Database or AWS RDS (both public cloud solutions don’t support linked server out of the box). Most likely this is going to change in a future (Azure VM and AWS EC2 have full support) and you will be tasked to join data between multiple database servers or even between different RDBMS systems, like for example: all the transactional sales data is stored in SQL Server, but all the analytical sales data is stored on another SQL Server server (this could be even MySQL). This is where Linked Server comes handy, especially at the data discovery phase where building a prototype needs to happen quick and there no time to adhere to the best practices.

This blog post is going to discuss the pros and cons of linked servers as well as how to reduce the cons using an OPENQUERY.

According to Microsoft, Linked servers enable the SQL Server Database Engine and Azure SQL Database Managed Instance to read data from the remote data sources and execute commands against the remote database servers (for example, OLE DB data sources) outside of the instance of SQL Server. – https://docs.microsoft.com/en-us/sql/relational-databases/linked-servers/linked-servers-database-engine?view=sql-server-ver15

Linked Server is a pure convenience feature. With very little to nothing code changes, you can suddenly join multiple tables between local and remote/cloud servers. If were trying to join a local table in WideWorldImporters transactional database

Figure #1

Linked server is effectively removing the need for a solution that will move and synchronize the data. You just to setup a Linked Server (see Fig #2) and add a Linked Server reference in front of the 3 part-naming convention (see Fig #3) as well security context with other minor things. And that is all. A small price to pay for lots of convenience.

Problem (the bad)

While Linker Server feature is making it easy to join tables between two or more different servers, it’s not free and it comes with a price tag – performance overhead. While joining few small tables might not add any noticeable pressure to the server, joining 3-5 fairly large remote tables might introduce locking and blocking and increase the run-time into minutes. The main problem occurs when you are running queries against a remote server, which is not healthy.

When SQL Server runs query with Linked Server, it will use the least optimal execution plan due to lack of knowledge of those remote tables. Meaning, your local SQL Server is clueless on remote table indexes and statistics, so it might use incorrect joining mechanism and might be grossly inefficient.

For example, if were tasked to join all January 2013 orders between OLTP and OLAP tables and compare revenue per product while showing top 5 contributors, we might build the following query (see Fig #4) to achieve that goal.

Fig #4

Now if were to review Statistics IO (see Fig #5) and execution plan details (see Fig #6), this is what we are going to see.

Fig #5
Fig #6

As you might notice, 93% of the query execution cost goes to a “mysterious” remote query. Needless to say, it will very hard to improve the query with that execution plan.

Solution (a way to make it perfect)

One of the easiest ways to fix performance problems with a Linked Server is to run it via OPENQUERY.

What is OPENQUERY? According to Microsoft, OPENQUERY executes the specified pass-through query on the specified linked server. This server is an OLE DB data source. OPENQUERY can be referenced in the FROM clause of a query as if it were a table name. OPENQUERY can also be referenced as the target table of an INSERT, UPDATE, or DELETE statement. This is subject to the capabilities of the OLE DB provider. Although the query may return multiple result sets, OPENQUERY returns only the first one. https://docs.microsoft.com/en-us/sql/t-sql/functions/openquery-transact-sql?view=sql-server-ver15

One of the main advantages of OPENQUERY is remote execution, meaning that the query is sent from the local server to remote server while having knowledge of those remote tables that are now local to the query. By the way, remote execution also enables the use of native syntax of the remote server, so you can take advantage of other RDBMS system’s performance tricks.

Here is how our original query will look like with an OPENQUERY (see Fig #7). It’s still using the same linked server, but it happens indirectly now with an OPENQUERY.

Fig #7

Now if were to review Statistics IO (see Fig #8) and execution plan details (see Fig #9), this is what we are going to see.

Fig #8
Fig #9

Let’s compare Statistics IO and execution plans to see the differences between direct query linked server usage vs. an OPENQUERT linked server.

The most noticeable differences are:

  1. WorkFile 0 scans/reads vs. 15/1064 in Statistics IO (see Fig #10)
  2. Remote Query 93% cost vs. most distributed cost in Execution Plan (see Fig #11) and appearance of Remote Scans.
Fig. #10
Fig. #11

The main reason why OPENQUERY will usually perform better is the knowledge of the table in terms of indexes and stats, knowledge that a direct linked server doesn’t have.

As you can see, we have managed to cut down the run-time from 22 seconds (using direct link server) down to 2 seconds (using OPENQUERY). Additionally, we can add indexes specific to the query to make OPENQUERY even faster.

Disclaimer

Since nothing is certain in this life other than taxes and death, your mileage may vary (“YMMV”). Please test the before and after (preferably using DBCC FREEPROCCACHE) prior to deployment. In addition to tested query above that reduced run-time by 11 times, I also witnessed a drastic performance gain due to the use of OPENQUERY in one of the clients. Replacing all 3 direct linked server references, reduced run-time by 10 times, to just few seconds.

The only few caveats with OPENQUERY since it can’t accept parameters are the use of dynamic SQL and “fighting the quotes” – making sure to escape the single quotes that are internally used by a SQL Server to identify literal, i.e. text.