SQL Server Query Store Overview

Query Store arrived in SQL Server 2016 as the next evolution of Microsoft's efforts to provide valuable performance data. In this post you're going to learn a bit about the history of native monitoring in SQL Server, why you should be using Query Store, and how to view the information it makes available. This post is an overview meant to get you started with the feature; it is rich enough that someone could easily present on it for an entire day.

A Very Brief History of Native Monitoring in SQL Server

With SQL Server 2005, the Dynamic Management Views (DMVs) and Dynamic Management Functions (DMFs) were a giant leap forward in providing data about what was going on inside SQL Server. Performance tuning with those objects opened up a whole new world of insight, and the "Waits and Queues" methodology for performance tuning rapidly sprang up.

The Data Collection and Management Data Warehouse features, new in SQL Server 2008, collected both server- and database-level performance metrics. They could be deployed to a single instance of SQL Server or across multiple instances, and the collected data could then be centralized and viewed through a series of reports.

Extended Events also came out in SQL Server 2008. Although intended as a lighter-weight replacement for SQL Server Profiler, Extended Events was a bit unwieldy and had no UI, so adoption was slow.

In SQL Server 2012, Microsoft built a UI for Extended Events, and that helped improve adoption. Extended Events opened up a wealth of information that Profiler simply cannot offer.

Then in SQL Server 2016, the Query Store came on the scene. Now that we’ve arrived at this feature, let’s talk about it!

Why Use SQL Server Query Store?

 

1. Query Store solves the challenge of persisting performance data over time.

The feature collects and persists information about query performance by gathering the queries that are run in a particular time interval, their query plans, and aggregated runtime performance data for those queries. To do this without Query Store, you would need to run Extended Events sessions continuously and capture a lot of data, and that data would then need to be persisted in a way that allows for easy analysis later. Capturing query text, plans, and other data about query performance isn't free, even with something like Extended Events. Also, query plans are pushed out of memory as time goes on and, of course, every time you restart the SQL Server, valuable performance data is lost. Query Store solves this problem.
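If you want to see the raw data behind the feature, the persisted information is exposed through catalog views. Below is a minimal sketch that joins the captured query text, plans, and aggregated runtime statistics; the database name [MyDB] is only a placeholder, and durations in these views are reported in microseconds.

USE [MyDB];
GO
SELECT TOP (25)
    qt.query_sql_text,
    q.query_id,
    p.plan_id,
    rsi.start_time,
    rs.count_executions,
    rs.avg_duration,          -- microseconds
    rs.avg_cpu_time,          -- microseconds
    rs.avg_logical_io_reads
FROM sys.query_store_query_text AS qt
INNER JOIN sys.query_store_query AS q
    ON q.query_text_id = qt.query_text_id
INNER JOIN sys.query_store_plan AS p
    ON p.query_id = q.query_id
INNER JOIN sys.query_store_runtime_stats AS rs
    ON rs.plan_id = p.plan_id
INNER JOIN sys.query_store_runtime_stats_interval AS rsi
    ON rsi.runtime_stats_interval_id = rs.runtime_stats_interval_id
ORDER BY rsi.start_time DESC, rs.avg_duration DESC;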

2. Persisted data in Query Store allows for detecting issues related to query plan changes.

The feature allows the DBA to track queries across time and displays all the different query plans associated with each query, up to the default limit of 200 execution plans per query. When a plan changes, the Query Store reports bubble this information up. The information also includes the aggregated runtime data associated with each plan. Detecting this sort of issue without Query Store is much more challenging.
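To see this outside of the reports, a sketch like the one below lists queries that have accumulated more than one plan in Query Store. If one of those plans clearly performs better, sp_query_store_force_plan can pin it; the query_id and plan_id values in the comment are examples only.

SELECT
    q.query_id,
    qt.query_sql_text,
    plan_counts.plan_count
FROM
(
    SELECT query_id, COUNT(DISTINCT plan_id) AS plan_count
    FROM sys.query_store_plan
    GROUP BY query_id
    HAVING COUNT(DISTINCT plan_id) > 1
) AS plan_counts
INNER JOIN sys.query_store_query AS q
    ON q.query_id = plan_counts.query_id
INNER JOIN sys.query_store_query_text AS qt
    ON qt.query_text_id = q.query_text_id
ORDER BY plan_counts.plan_count DESC;

-- If one plan is clearly the better performer, it can be forced (example values):
-- EXEC sp_query_store_force_plan @query_id = 42, @plan_id = 7;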

3. With Query Store you can see query performance across time and from a variety of perspectives.

Microsoft added query wait statistics to the feature in SQL Server 2017. Bar graphs in the Query Wait Statistics report show wait categories such as CPU, Memory, and Lock, and the queries that contribute to each category are visible in the report, giving the DBA valuable insight into the waits experienced by queries. The report can be based on Total Wait Time, as below, or on Avg, Min, Max, and Standard Deviation. Clicking on a wait category shows the queries contributing to that wait type.

 

Example Query Store Wait Statistics report
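If you prefer T-SQL over the report, the same information lives in sys.query_store_wait_stats (SQL Server 2017 and later). The sketch below totals wait time by wait category and shows the statements contributing to each category; treat it as a starting point rather than a polished report.

SELECT
    w.wait_category_desc,
    w.total_wait_time_ms,
    w.query_id,
    qt.query_sql_text
FROM
(
    SELECT
        ws.wait_category_desc,
        q.query_id,
        q.query_text_id,
        SUM(ws.total_query_wait_time_ms) AS total_wait_time_ms
    FROM sys.query_store_wait_stats AS ws
    INNER JOIN sys.query_store_plan AS p
        ON p.plan_id = ws.plan_id
    INNER JOIN sys.query_store_query AS q
        ON q.query_id = p.query_id
    GROUP BY ws.wait_category_desc, q.query_id, q.query_text_id
) AS w
INNER JOIN sys.query_store_query_text AS qt
    ON qt.query_text_id = w.query_text_id
ORDER BY w.total_wait_time_ms DESC;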

With the built-in Query Store reports, the DBA can see expensive, poorly performing queries with the highest CPU, duration, logical reads, and more. This data can be based on totals, averages, min/max, or standard deviation. In this way, Query Store offers insight into the most expensive, painful queries hurting performance during any particular time slice and across the days of history retained in Query Store.
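As a rough T-SQL equivalent of the Top Resource Consuming Queries report, the sketch below sums CPU and logical reads across all intervals retained in Query Store and returns the ten most expensive queries by total CPU. Times are in microseconds, and the totals are approximations built from the averages the feature stores.

SELECT TOP (10)
    totals.query_id,
    qt.query_sql_text,
    totals.total_executions,
    totals.total_cpu_time,        -- microseconds
    totals.total_logical_reads
FROM
(
    SELECT
        q.query_id,
        q.query_text_id,
        SUM(rs.count_executions) AS total_executions,
        SUM(rs.avg_cpu_time * rs.count_executions) AS total_cpu_time,
        SUM(rs.avg_logical_io_reads * rs.count_executions) AS total_logical_reads
    FROM sys.query_store_runtime_stats AS rs
    INNER JOIN sys.query_store_plan AS p
        ON p.plan_id = rs.plan_id
    INNER JOIN sys.query_store_query AS q
        ON q.query_id = p.query_id
    GROUP BY q.query_id, q.query_text_id
) AS totals
INNER JOIN sys.query_store_query_text AS qt
    ON qt.query_text_id = totals.query_text_id
ORDER BY totals.total_cpu_time DESC;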

4. You can view missing index recommendations for expensive queries using Query Store. Now, you shouldn't go out and blindly apply the recommendations, but this does offer you the chance to analyze existing queries alongside the missing index recommendation to see whether index cleanup and consolidation needs to happen. This process can benefit not only the query with the missing index recommendation, but other queries in the workload as well.
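One way to find those candidates is to look for captured plans whose showplan XML contains a missing-index element. The sketch below does a simple string filter against the stored plan text; it is only a first pass, and you would still open each plan to read the actual recommendation.

SELECT TOP (20)
    q.query_id,
    qt.query_sql_text,
    p.plan_id,
    TRY_CONVERT(xml, p.query_plan) AS query_plan  -- open in SSMS to see the recommendation
FROM sys.query_store_plan AS p
INNER JOIN sys.query_store_query AS q
    ON q.query_id = p.query_id
INNER JOIN sys.query_store_query_text AS qt
    ON qt.query_text_id = q.query_text_id
WHERE p.query_plan LIKE N'%<MissingIndexes>%'
ORDER BY q.last_execution_time DESC;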

How To Enable the SQL Server Query Store

Enabling the Query Store is simple enough with two lines of T-SQL, but there are some defaults and options that you should understand before you enable the feature. Not all defaults are going to be desirable and you’ll learn which ones you should adjust as we progress. I will be showing you this in SQL Server 2017.

Below is what the Query Store UI looks like when it is initially opened and Query Store is not enabled. To get there, open SSMS and connect to a SQL Server instance that is at least version 2016. Then drill down into the Databases folder, right-click a database name, and click the "Properties" option. Then click Query Store on the left in the "Select a Page" area.

Initial Query Store screen before enabling

When you click the drop-down menu for Operation Mode (Requested), you get options for Read Only and Read Write. To enable Query Store and have it capture performance data, select Read Write. You will then have a screen that looks like the one below. Please note the two options I have bracketed in red; you will want to change them, and we will discuss why as I explain what each option does.

Defaults for the Query Store

Query Store writes collected data asynchronously rather than synchronously. The "Data Flush Interval (Minutes)" value controls how long collected query data sits in memory before being written to disk. There is a trace flag that controls whether the data is written to disk prior to a shutdown, as well as one that controls whether Query Store data is loaded before the database comes online.

The "Statistics Collection Interval", known in T-SQL as INTERVAL_LENGTH_MINUTES, determines the time period over which the data is aggregated. Valid values are 1, 5, 10, 15, 30, 60, and 1440 minutes. The smaller the value, the more granular your data will be and the more space the feature will use.
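Both of these values can also be set with T-SQL. Here is a quick sketch, using [MyDB] as a placeholder database; note that the flush interval option takes seconds even though the SSMS dialog shows minutes.

ALTER DATABASE [MyDB] SET QUERY_STORE
(
    DATA_FLUSH_INTERVAL_SECONDS = 900,  -- 15 minutes
    INTERVAL_LENGTH_MINUTES = 30
);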

The defaults for the following two options should be changed. "Max Size (MB)" controls how large the Query Store can grow and works in conjunction with a couple of other settings. The default is 100 MB, which is simply too small to store data on a SQL Server with much of a workload. Erin Stellato of SQLSkills.com recommends 2048 MB as a starting point, and size can be monitored from there.

The "Query Store Capture Mode" setting defaults to "All" in SQL Server 2017. This captures even queries with virtually no impact on performance metrics like CPU, duration, and logical reads. It is unlikely that you need to capture every single query that runs, so it is probably better to set this to "Auto".

The amount of space used by this feature is also influenced by the "Size Based Cleanup Mode" setting. The default is Auto, and it should be left that way. When Query Store's storage size gets close to the maximum set in "Max Size (MB)", a cleanup process kicks in and removes the oldest data.

The space used is also influenced by the "Stale Query Threshold (Days)" setting. The default is 30 days, and this controls how much query performance history is persisted. The default is a good starting point; depending on your needs, you may want to keep more so you can see changes over a longer period of time. You might choose to do this if you have no Dev or QA environment to speak of at your employer and most development is done in production. You know you shouldn't be doing that, but I also know that many DBAs don't work in the ideal, perfect environment.

As mentioned earlier, the "Wait Statistics Capture Mode" option is available and on by default. If you leave it on, which is recommended, you get the benefits discussed earlier. You will also have a larger footprint for your Query Store, so this is another option that influences size.

Once you have all of these options configured, simply press OK in the SSMS dialog.

If you prefer to use T-SQL, below is a sample script to enable this feature and change the two settings discussed above that are not great defaults.

USE [master];
GO
ALTER DATABASE [MyDB] SET QUERY_STORE = ON;
GO
ALTER DATABASE [MyDB] SET QUERY_STORE
(
    OPERATION_MODE = READ_WRITE,
    MAX_STORAGE_SIZE_MB = 2048,
    QUERY_CAPTURE_MODE = AUTO
);
GO
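After enabling the feature, you can confirm which settings are actually in effect by querying sys.database_query_store_options in the database itself. A quick sketch:

USE [MyDB];
GO
SELECT
    actual_state_desc,
    desired_state_desc,
    current_storage_size_mb,
    max_storage_size_mb,
    flush_interval_seconds,
    interval_length_minutes,
    stale_query_threshold_days,
    size_based_cleanup_mode_desc,
    query_capture_mode_desc,
    wait_stats_capture_mode_desc  -- SQL Server 2017 and later
FROM sys.database_query_store_options;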

Next Steps to Take

  1. Review your current monitoring solutions and consider making Query Store part of your approach to performance tuning.
  2. If you like reading books, Tracy Boggiano and Grant Fritchey have a great book on Query Store.
  3. If you would like help with anything in this post, or with something else related to SQL Server, reach out to me here, or on Twitter, and I’ll be glad to offer assistance.

Correlating IO Wait with Queries

Have you ever looked at sys.dm_io_virtual_file_stats and seen horrible latency for your database files? Generally speaking, you want to see latency below 20 ms on your database files. Much above that, and there are problems somewhere that need tuning.
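If you want a quick check of per-file latency before digging further, here is a minimal sketch; it divides the cumulative stall time by the number of reads and writes recorded since the instance started.

SELECT
    DB_NAME(vfs.database_id) AS database_name,
    mf.physical_name,
    vfs.io_stall_read_ms  / NULLIF(vfs.num_of_reads, 0)  AS avg_read_latency_ms,
    vfs.io_stall_write_ms / NULLIF(vfs.num_of_writes, 0) AS avg_write_latency_ms
FROM sys.dm_io_virtual_file_stats(NULL, NULL) AS vfs
INNER JOIN sys.master_files AS mf
    ON mf.database_id = vfs.database_id
    AND mf.file_id = vfs.file_id
ORDER BY avg_write_latency_ms DESC;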

But, how do you know what queries are causing your issue? What’s causing those IO waits?

How to Find Queries Associated with Pending IO Requests

I responded to a question on SQLServerCentral.com about performance related to these types of scenarios. The poster wanted to know if there was a way to use the DMV sys.dm_io_pending_io_requests to correlate pending I/O requests with the queries that were experiencing the waits.

This was a well-timed question that got my attention, because I was working through my own scenario with a SQL Server having IO issues. I had set up a custom data collector set in PerfMon and analyzed the data. It showed very high Avg. Disk Queue Length, low Page Life Expectancy, and high Page Reads/sec for sustained periods of time. Correlating that data made it easy to see that one or more queries were hitting the disk to retrieve data, loading it into memory, and causing other pages to be flushed out. The forum question, and my own scenario, sent me off to do additional investigation, and I arrived at the query below.

Reviewing SQL Server IO Latency

SELECT
    ipir.io_type,
    ipir.io_pending,
    ipir.scheduler_address,
    ipir.io_handle,
    os.scheduler_id,
    os.cpu_id,
    os.pending_disk_io_count,
    ER.session_id,
    ER.command,
    ER.cpu_time,
    ST.text
FROM sys.dm_io_pending_io_requests AS ipir
INNER JOIN sys.dm_os_schedulers AS os
    ON ipir.scheduler_address = os.scheduler_address
INNER JOIN sys.dm_exec_requests AS ER
    ON os.scheduler_id = ER.scheduler_id
CROSS APPLY sys.dm_exec_sql_text(ER.sql_handle) AS ST;

I began by looking at sys.dm_io_pending_io_requests in Books Online. I noticed that this DMV has scheduler_address as one of its outputs. I quickly realized I could join to sys.dm_os_schedulers on scheduler_address and then make an additional join to get to sys.dm_exec_requests. From there, I was only a CROSS APPLY away from getting the SQL statement associated with the IO waits.

sys.dm_os_schedulers has its own column that tracks pending IO, called pending_disk_io_count. As a result, you could eliminate sys.dm_io_pending_io_requests from the above query if you were truly interested only in the pending IO count. However, using both DMVs provides a more detailed picture.
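For that scheduler-level view only, a sketch like this is enough:

SELECT
    scheduler_id,
    cpu_id,
    pending_disk_io_count
FROM sys.dm_os_schedulers
WHERE status = N'VISIBLE ONLINE'  -- only the schedulers running user work
ORDER BY pending_disk_io_count DESC;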

Performance Tuning IO Waits

When doing performance tuning, keep in mind that observed issues may only be symptoms. In my case, after reviewing execution plans and data from SET STATISTICS IO ON, I could see that the high IO wasn't due to poorly performing storage, but to queries that were performing millions of physical reads. I examined the query plan and could see that an index was missing. I applied the index, and the physical reads dropped to almost zero. I used the results from the above query to bolster my case to the application vendor that the index was needed, because extremely high physical IO was killing the server.

Try out these DMVs and see what other interesting gems you can get from them.

Next Steps to Take

  1. If you want to track queries causing IO latency, you could create a table to hold the output of the query in this post and schedule the query to run in a SQL Agent job on an interval. Then examine your table over time; a sketch of this approach follows this list.
  2. Be sure to review execution plans related to these queries to look for other performance tuning opportunities.
  3. If you would like help with anything in this post, or with something else related to SQL Server, reach out to me here, or on Twitter, and I’ll be glad to offer assistance.
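Here is a minimal sketch of the idea in step 1. The table and column names are just examples; the SELECT is the same query shown earlier, trimmed down and stamped with a collection time so the history can be reviewed later.

-- Example table to hold the snapshots (names are hypothetical).
CREATE TABLE dbo.PendingIoSnapshot
(
    collected_at          datetime2(0)   NOT NULL DEFAULT (SYSDATETIME()),
    io_type               nvarchar(256)  NULL,
    scheduler_id          int            NULL,
    pending_disk_io_count int            NULL,
    session_id            smallint       NULL,
    command               nvarchar(32)   NULL,
    cpu_time              int            NULL,
    query_text            nvarchar(max)  NULL
);
GO

-- Run this on a schedule, for example from a SQL Agent job.
INSERT INTO dbo.PendingIoSnapshot
    (io_type, scheduler_id, pending_disk_io_count, session_id, command, cpu_time, query_text)
SELECT
    ipir.io_type,
    os.scheduler_id,
    os.pending_disk_io_count,
    ER.session_id,
    ER.command,
    ER.cpu_time,
    ST.text
FROM sys.dm_io_pending_io_requests AS ipir
INNER JOIN sys.dm_os_schedulers AS os
    ON ipir.scheduler_address = os.scheduler_address
INNER JOIN sys.dm_exec_requests AS ER
    ON os.scheduler_id = ER.scheduler_id
CROSS APPLY sys.dm_exec_sql_text(ER.sql_handle) AS ST;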

3 Reasons Not to Install Databases On the C Drive

We're probably all familiar with the fact that SQL Server, by default, will install not only the program binaries but all your databases on the C drive as well. Accidental DBAs, i.e., System Administrators and Developers who inherited the maintenance of a database, may not be aware of how bad it can be if someone simply clicked the "Next" button all the way through the SQL Server install. There are surely many reasons not to accept this default during the install, but I will start with three.

First, from a purely practical standpoint, installing to the default path makes backup and restore scripts unduly difficult. Which is easier to remember, "C:\Program Files\Microsoft SQL Server\MSSQL12.MSSQLSERVER\MSSQL\Backup" or something simple like "D:\SQLBackups"? Which one would you rather type or copy from Windows Explorer after browsing the path? With the second one, you don't have to spend time browsing and copying; you just quickly type it out. For the restore script, would you rather type "C:\Program Files\Microsoft SQL Server\MSSQL12.MSSQLSERVER\MSSQL\Data\SomeDB.mdf" or "D:\SQLData\SomeDB.mdf"? The same point applies here: simplicity. Why make it hard on yourself?

Second, if you don't put your databases on the C drive, then database growth can't crash the entire computer by filling up the C drive. We all know that databases grow and that growth has to be monitored and managed. If the System Administrator misses those emails from Database Mail or her favorite monitoring tool, she will be awakened first thing in the morning by an end user who can't do their regular work, only to discover that the entire machine is paralyzed. If the database files are not on the C drive with the OS, then at least database growth will not be the culprit of server paralysis from a full C drive.

Third, if you install all your databases to the C drive, you are inviting performance issues. Databases and the OS will be fighting for drive resources, and disks only spin so fast. If you have the OS and 10, 14, 50, or more database files all trying to read and write to the same drive, then your C drive is going to be spinning like a yo-yo. Not only is there contention between the OS and the database files, but the database files will be fighting each other for disk access too.

Let me provide some real numbers from an environment I have seen. I had a system with 12 databases, all on the C drive: 4 system databases and 8 user databases. Latency was in the thousands for several of the databases. I moved the user databases and TempDB off to another drive and left the system databases on the C drive. I know I can move those too, but one step at a time. See the comparisons below.

Database FileName   Overall Latency   Overall Latency Next Morning   Overall Latency 3 Weeks Later
Master.mdf          8330              8287                           4
Mastlog.ldf         2409              1607                           0
Model.mdf           8129              No Change                      3
Modellog.ldf        2881              No Change                      1
msdbdata.mdf        32446             16031                          5
msdblog.ldf         31270             22969                          0
TempDB.mdf          254               Moved to Fusion IO card        Moved to Fusion IO card
Templog.ldf         155               Moved to Fusion IO card        Moved to Fusion IO card

 

In some cases there was an immediate impact on latency. Clearly, removing the activity of the eight user databases made a huge difference in performance. As you can imagine, the user databases performed much better from a latency perspective as well, especially since I put them on a Fusion IO card.

To see latency for your drives you can run the following query from Paul Randal:

 

/* How to examine IO subsystem latencies from within SQL Server */

SELECT
    [ReadLatency] =
        CASE WHEN [num_of_reads] = 0
            THEN 0 ELSE ([io_stall_read_ms] / [num_of_reads]) END,
    [WriteLatency] =
        CASE WHEN [num_of_writes] = 0
            THEN 0 ELSE ([io_stall_write_ms] / [num_of_writes]) END,
    [Latency] =
        CASE WHEN ([num_of_reads] = 0 AND [num_of_writes] = 0)
            THEN 0 ELSE ([io_stall] / ([num_of_reads] + [num_of_writes])) END,
    [AvgBPerRead] =
        CASE WHEN [num_of_reads] = 0
            THEN 0 ELSE ([num_of_bytes_read] / [num_of_reads]) END,
    [AvgBPerWrite] =
        CASE WHEN [num_of_writes] = 0
            THEN 0 ELSE ([num_of_bytes_written] / [num_of_writes]) END,
    [AvgBPerTransfer] =
        CASE WHEN ([num_of_reads] = 0 AND [num_of_writes] = 0)
            THEN 0 ELSE
                (([num_of_bytes_read] + [num_of_bytes_written]) /
                ([num_of_reads] + [num_of_writes])) END,
    LEFT ([mf].[physical_name], 2) AS [Drive],
    DB_NAME ([vfs].[database_id]) AS [DB],
    [mf].[physical_name]
FROM sys.dm_io_virtual_file_stats (NULL, NULL) AS [vfs]
JOIN sys.master_files AS [mf]
    ON [vfs].[database_id] = [mf].[database_id]
    AND [vfs].[file_id] = [mf].[file_id]
-- WHERE [vfs].[file_id] = 2 -- log files
-- ORDER BY [Latency] DESC
-- ORDER BY [ReadLatency] DESC
ORDER BY [WriteLatency] DESC;
GO

Well, I hope this has given you something to think about.  Your homework is to find out the latency in your environment, do some research, make changes as needed and then follow up to see if the changes you made helped.  Let me know how the homework turns out.