Wednesday
Apr 30, 2014

Understanding your baselines - Perfmon Batch Requests/sec and Transactions/sec

This information is supplemental to the post found here that details a method to establish a performance baseline for your SQL Servers.

After collecting Perfmon baselines for a while on a number of my servers, an eagle-eyed QA colleague of mine pointed out what looked like a rather interesting anomaly: the maximum value for Transactions/sec was consistently higher than the corresponding reading for Batch Requests/sec.

All squares are rectangles, but not all rectangles are squares.  In a similar sense, many more statements qualify as "batches" (e.g. ad hoc SELECT statements) than participate in, and are classified as, transactions.  Why, then, would the maximum reading for Transactions/sec be higher?

I looked at other servers with varied roles like SSIS, OLE DB, and those hosting predominantly read-only workloads.  The behavior was the same!  What was going on?

An example of the behavior is in the screenshot below of select output taken from my Perfmon baseline solution:

Note that the average readings for Batch Requests/sec are indeed higher.  To me that's intuitive and to be expected.  But why the higher maximum values for Transactions/sec?

Officially, Transactions/sec is a measurement of transactions started in the last second, but it would seem to also count transactions whose duration surpasses the capture interval.  My data is captured at 5-minute intervals.  Any transaction whose duration surpasses an interval is counted in each interval it extends through (so a long-running transaction is potentially counted multiple times).

The behavior is perhaps better explained in this thread.

The definition of Batch Requests/Sec can be found here.  Whereas more information on Transactions/sec can be found here.


I'm not affiliated with the external links in this article, but I'm grateful to those responsible for sharing them with the public.  Mystery solved.

Friday
Jan 24, 2014

Baseline your SQL Servers - PerfMon counters

I wanted a daily e-mailed report of the performance counters exposed in the sys.dm_os_performance_counters DMV sliced by day-of-week and time-of-day. So I came up with the following solution.


Note that I'm reporting on 21 different counters, which represent either cumulative or "actual observed" values. There are other counters, and other types of counters, you can capture as well. Depending on the type of counter you may just be able to grab the current value (e.g. Page life expectancy), and that will tell you what the Page life expectancy reading is at the time you took the sample. For other types (e.g. Page reads/sec) you need to take the difference of two consecutive readings and divide the result by the number of seconds between the two readings.
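As a concrete illustration of the "two readings" approach for a cumulative counter, here's a minimal sketch; the ten-second delay and the choice of Page reads/sec are just illustrative:

-- Per-second rate for a cumulative counter: take two snapshots of the DMV and
-- divide the difference by the number of seconds between them.
DECLARE @FirstReading BIGINT, @SecondReading BIGINT;
DECLARE @SecondsBetween INT = 10;

SELECT @FirstReading = cntr_value
FROM sys.dm_os_performance_counters
WHERE counter_name = N'Page reads/sec'
  AND object_name LIKE N'%Buffer Manager%';

WAITFOR DELAY '00:00:10';

SELECT @SecondReading = cntr_value
FROM sys.dm_os_performance_counters
WHERE counter_name = N'Page reads/sec'
  AND object_name LIKE N'%Buffer Manager%';

SELECT (@SecondReading - @FirstReading) / @SecondsBetween AS [Page reads per second (observed)];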


All cntr_types and the methods for evaluating them are described in this great article (with which I'm not affiliated in any way).


If you include other cntr_types you'll need to modify the [PerfMonArchivePopulate] SPROC and possibly add columns to the [PerfmonDelta] table outlined below. Capturing values of different counters will likewise require changes to the same SPROCs.


If you decide to capture counter data more or less frequently than every 300 seconds, it simply requires a change to the @SecondsBetweenCaptures parameter in the [PerfMonArchivePopulate] SPROC.


Finally I'm basing the report on the last seven days' worth of data and retaining the data for 60 days. Increasing the scope of the data evaluated for the report will probably add duration and resource utilization to your daily report.


I'm assuming you have a database created to store performance data like this. In the examples below I'm using one called [Baseline].


Here are the 21 counters I want to report on:

Create a table for these values [PerfmonDelta]:
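The original DDL isn't embedded here; a minimal sketch of what such a table could look like (the column names and types are illustrative) is:

CREATE TABLE dbo.PerfmonDelta
(
    PerfmonDeltaID INT IDENTITY(1,1) NOT NULL
        CONSTRAINT PK_PerfmonDelta PRIMARY KEY CLUSTERED,
    ObjectName     NVARCHAR(128) NOT NULL,
    CounterName    NVARCHAR(128) NOT NULL,
    CntrType       INT           NOT NULL  -- distinguishes cumulative from point-in-time counters
);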

Populate [PerfmonDelta] with the counters you wish to track:
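Again as a sketch only, the population step can pull the counters of interest straight from the DMV. The handful of counter names below stand in for the full list of 21:

INSERT INTO dbo.PerfmonDelta (ObjectName, CounterName, CntrType)
SELECT DISTINCT RTRIM(object_name), RTRIM(counter_name), cntr_type
FROM sys.dm_os_performance_counters
WHERE RTRIM(counter_name) IN (N'Batch Requests/sec',
                              N'Transactions/sec',
                              N'Page life expectancy',
                              N'Page reads/sec');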

Create a table to store the captured data in:
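A sketch of the archive table, again with illustrative names:

CREATE TABLE dbo.CaptureArchive
(
    CaptureArchiveID BIGINT IDENTITY(1,1) NOT NULL
        CONSTRAINT PK_CaptureArchive PRIMARY KEY CLUSTERED,
    CaptureDate      DATETIME      NOT NULL,
    ObjectName       NVARCHAR(128) NOT NULL,
    CounterName      NVARCHAR(128) NOT NULL,
    InstanceName     NVARCHAR(128) NULL,
    CntrValue        BIGINT        NOT NULL  -- the observed or calculated per-second value
);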

Consider the following indexes for the [CaptureArchive] table. These are in addition to the clustered index based on the PK:
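Something along these lines would cover queries that slice by capture time and counter name; the index names and included columns are illustrative:

CREATE NONCLUSTERED INDEX IX_CaptureArchive_CaptureDate
    ON dbo.CaptureArchive (CaptureDate)
    INCLUDE (CounterName, CntrValue);

CREATE NONCLUSTERED INDEX IX_CaptureArchive_CounterName
    ON dbo.CaptureArchive (CounterName, CaptureDate)
    INCLUDE (CntrValue);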

Create a SPROC to capture the data. Note that you'll need to modify this if you decide to report on different counters and/or a different frequency for how often data is captured:
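The original procedure isn't embedded here; the sketch below shows only the general shape. Point-in-time counters are stored as read, while cumulative counters are differenced against the previous run's raw readings and divided by @SecondsBetweenCaptures. The [PerfmonRawSnapshot] helper table and the two cntr_type constants are assumptions of this sketch, not taken from the original solution:

-- Helper table (assumed for this sketch): the raw readings from the previous run.
CREATE TABLE dbo.PerfmonRawSnapshot
(
    ObjectName   NVARCHAR(128) NOT NULL,
    CounterName  NVARCHAR(128) NOT NULL,
    InstanceName NVARCHAR(128) NOT NULL,
    CntrValue    BIGINT        NOT NULL,
    CntrType     INT           NOT NULL
);
GO

CREATE PROCEDURE dbo.PerfMonArchivePopulate
    @SecondsBetweenCaptures INT = 300
AS
BEGIN
    SET NOCOUNT ON;
    DECLARE @Now DATETIME = GETDATE();

    -- Current raw readings for the tracked counters only.
    SELECT RTRIM(pc.object_name)                AS ObjectName,
           RTRIM(pc.counter_name)               AS CounterName,
           ISNULL(RTRIM(pc.instance_name), N'') AS InstanceName,
           pc.cntr_value                        AS CntrValue,
           pc.cntr_type                         AS CntrType
    INTO #Current
    FROM sys.dm_os_performance_counters AS pc
    JOIN dbo.PerfmonDelta AS pd
      ON  RTRIM(pc.counter_name) = pd.CounterName
      AND RTRIM(pc.object_name)  = pd.ObjectName;

    -- Cumulative (per-second) counters: archive the rate since the previous snapshot.
    INSERT INTO dbo.CaptureArchive (CaptureDate, ObjectName, CounterName, InstanceName, CntrValue)
    SELECT @Now, c.ObjectName, c.CounterName, c.InstanceName,
           (c.CntrValue - p.CntrValue) / @SecondsBetweenCaptures
    FROM #Current AS c
    JOIN dbo.PerfmonRawSnapshot AS p
      ON  c.ObjectName   = p.ObjectName
      AND c.CounterName  = p.CounterName
      AND c.InstanceName = p.InstanceName
    WHERE c.CntrType = 272696576;  -- PERF_COUNTER_BULK_COUNT

    -- Point-in-time counters: archive the observed value as-is.
    INSERT INTO dbo.CaptureArchive (CaptureDate, ObjectName, CounterName, InstanceName, CntrValue)
    SELECT @Now, ObjectName, CounterName, InstanceName, CntrValue
    FROM #Current
    WHERE CntrType = 65792;        -- PERF_COUNTER_LARGE_RAWCOUNT

    -- Replace the raw snapshot so the next run has something to diff against.
    TRUNCATE TABLE dbo.PerfmonRawSnapshot;
    INSERT INTO dbo.PerfmonRawSnapshot (ObjectName, CounterName, InstanceName, CntrValue, CntrType)
    SELECT ObjectName, CounterName, InstanceName, CntrValue, CntrType
    FROM #Current;
END;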

Create a SPROC to clean up old captures:
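A sketch of the cleanup, keyed to the 60-day retention mentioned above (the procedure body here is an illustration, not the original):

CREATE PROCEDURE dbo.PerfMonStatsCleanup
    @RetentionDays INT = 60
AS
BEGIN
    SET NOCOUNT ON;

    DELETE FROM dbo.CaptureArchive
    WHERE CaptureDate < DATEADD(DAY, -@RetentionDays, GETDATE());
END;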

Create a SPROC to send your daily e-mail. As written it'll handle 21 counters and the scope of the report will be the last seven days. I've noted in the comments where to modify either of these:
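The original report SPROC isn't embedded here; the sketch below shows only the general shape. The Database Mail profile, the recipient address, and the single flat result set are placeholders (a fuller report would format one section per counter):

CREATE PROCEDURE dbo.PerfmonStatsAlertHistoric
    @ReportDays INT = 7   -- scope of the report; widen with care
AS
BEGIN
    SET NOCOUNT ON;

    DECLARE @Subject NVARCHAR(255) =
        N'Perfmon baseline: ' + @@SERVERNAME + N' - ' + CONVERT(NVARCHAR(10), GETDATE(), 120);

    -- Min/avg/max per counter, sliced by day of week and hour of day.
    DECLARE @Query NVARCHAR(MAX) = N'
        SELECT CounterName,
               DATENAME(WEEKDAY, CaptureDate) AS [DayOfWeek],
               DATEPART(HOUR, CaptureDate)    AS [HourOfDay],
               MIN(CntrValue) AS MinValue,
               AVG(CntrValue) AS AvgValue,
               MAX(CntrValue) AS MaxValue
        FROM Baseline.dbo.CaptureArchive
        WHERE CaptureDate >= DATEADD(DAY, -' + CAST(@ReportDays AS NVARCHAR(10)) + N', GETDATE())
        GROUP BY CounterName, DATENAME(WEEKDAY, CaptureDate), DATEPART(HOUR, CaptureDate)
        ORDER BY CounterName, [DayOfWeek], [HourOfDay];';

    EXEC msdb.dbo.sp_send_dbmail
        @profile_name = N'Default',          -- placeholder Database Mail profile
        @recipients   = N'dba@example.com',  -- placeholder recipient
        @subject      = @Subject,
        @query        = @Query;
END;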

Finally, schedule the SPROCs to run (a sketch of the job setup is below).

Schedule your [PerfMonArchivePopulate] SPROC to run every, say, five minutes.
Schedule your e-mail report to run daily, followed immediately by your cleanup SPROC (the [PerfmonStatsAlertHistoric] and [PerfMonStatsCleanup] SPROCs, respectively).
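As an illustration of the scheduling, here is a sketch using the msdb job procedures. The job, schedule, and database names are made up, and the daily report job would be built the same way with a daily schedule:

USE msdb;
GO

-- Five-minute capture job (names are illustrative).
EXEC dbo.sp_add_job         @job_name = N'Baseline - Perfmon capture';
EXEC dbo.sp_add_jobstep     @job_name = N'Baseline - Perfmon capture',
                            @step_name = N'Capture counters',
                            @subsystem = N'TSQL',
                            @database_name = N'Baseline',
                            @command = N'EXEC dbo.PerfMonArchivePopulate;';
EXEC dbo.sp_add_jobschedule @job_name = N'Baseline - Perfmon capture',
                            @name = N'Every 5 minutes',
                            @freq_type = 4,             -- daily
                            @freq_interval = 1,
                            @freq_subday_type = 4,      -- units of minutes
                            @freq_subday_interval = 5;
EXEC dbo.sp_add_jobserver   @job_name = N'Baseline - Perfmon capture';
GO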

Tuesday
Apr 30, 2013

Baseline your SQL Servers - Wait Stats

I've been working to establish performance baselines for my SQL databases. I'm trying to build a scalable solution that will support a variety of information (e.g. Wait Statistics, long-running queries, index analysis/usage, et cetera).

I settled on storing this information in a table with an XML column. I figured that way the schema would be versatile and that I could store a lot of information about a lot of different stuff I care about. I could then pluck out the information specific to what I was looking for and send a daily e-mail out with trending information. Though I would hope this same XML data could eventually be used to populate more sophisticated reporting and KPI displays.

One of the first things I want to track is Wait Statistics. I want to see what the server has been waiting on today and what it typically waits on during certain days of the week and at certain times of day. I populate a table with the last 30 days of Wait Statistics, then blat out the top waits for the day, the top waits across day of week, and the top waits across timeslots at 3-hour intervals.

I run a SQL Agent job every three hours that captures Wait Statistics, commits them to a table, then clears Wait Statistics. It also cleans up Wait Statistics I've archived that are older than six months. Another SQL Agent job collects this and formats it in an e-mail. At the end of the day I send out a report that grabs the last 30 days' worth of Wait Statistics and dimensions them on today, the day of the week, and the time of day (regardless of day of the week).

I managed this in just a few short scripts and the two SQL Agent jobs alluded to above. I've found it's good to clear Wait Statistics prior to putting this in place - otherwise if your server has been up for a while your top waits might be a bit atypical of a capture done over the last three hours. It would then take a while for the data to balance itself out.

The following creates a table to store your Wait Statistics in. There's also DDL for a procedure that establishes when your Wait Statistics were last cleared then commits the Wait Statistics and the time it's capturing them to the table.
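The original DDL isn't embedded here; a minimal sketch of the idea follows. The table, column, and procedure names are illustrative, and the "last cleared" time is approximated by the previous capture date (falling back to instance start-up):

CREATE TABLE dbo.WaitStatsArchive
(
    WaitStatsArchiveID INT IDENTITY(1,1) NOT NULL
        CONSTRAINT PK_WaitStatsArchive PRIMARY KEY CLUSTERED,
    WaitsSince  DATETIME NOT NULL,  -- when the waits were last cleared
    CaptureDate DATETIME NOT NULL,  -- when this sample was taken
    WaitStats   XML      NOT NULL   -- the contents of sys.dm_os_wait_stats
);
GO

CREATE PROCEDURE dbo.WaitStatsCapture
AS
BEGIN
    SET NOCOUNT ON;

    DECLARE @Now DATETIME = GETDATE();

    -- Waits accumulate from the later of the last clear (the previous capture) or start-up.
    DECLARE @WaitsSince DATETIME =
        ISNULL((SELECT MAX(CaptureDate) FROM dbo.WaitStatsArchive),
               (SELECT sqlserver_start_time FROM sys.dm_os_sys_info));

    INSERT INTO dbo.WaitStatsArchive (WaitsSince, CaptureDate, WaitStats)
    SELECT @WaitsSince,
           @Now,
           (SELECT wait_type, waiting_tasks_count, wait_time_ms, signal_wait_time_ms
            FROM sys.dm_os_wait_stats
            WHERE wait_time_ms > 0
            FOR XML PATH('wait'), ROOT('waits'), TYPE);
END;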

This next bit of T-SQL creates a procedure to clear Wait Statistics and delete any archived data that is older than six months. Again - best to clear Wait Statistics on a known schedule. Three hours may be a bit excessive.
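A sketch of that procedure; the name is illustrative, while the six-month retention comes from the description above:

CREATE PROCEDURE dbo.WaitStatsClear
    @RetentionMonths INT = 6
AS
BEGIN
    SET NOCOUNT ON;

    -- Reset the cumulative wait statistics so the next capture covers a known window.
    DBCC SQLPERF (N'sys.dm_os_wait_stats', CLEAR);

    -- Purge archived captures older than the retention window.
    DELETE FROM dbo.WaitStatsArchive
    WHERE CaptureDate < DATEADD(MONTH, -@RetentionMonths, GETDATE());
END;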

This final bit creates a procedure that gathers the last 30 days' worth of archived Wait Statistics. It then sends an e-mail out to whomever you like with three tables: one with the day's waits, one with the top waits by day of the week, and one with the typical top waits for the time of day.
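The reporting procedure itself isn't embedded here; the sketch below shows just its core. It shreds the archived XML from the last 30 days and ranks today's top waits; the day-of-week and time-of-day tables follow the same pattern with different GROUP BY expressions, and the three result sets would then be formatted as HTML and sent with msdb.dbo.sp_send_dbmail:

-- Shred the archived XML and rank today's top waits (a simplified sketch).
;WITH Shredded AS
(
    SELECT  wsa.CaptureDate,
            w.value('(wait_type)[1]',    'nvarchar(60)') AS wait_type,
            w.value('(wait_time_ms)[1]', 'bigint')       AS wait_time_ms
    FROM dbo.WaitStatsArchive AS wsa
    CROSS APPLY wsa.WaitStats.nodes('/waits/wait') AS x(w)
    WHERE wsa.CaptureDate >= DATEADD(DAY, -30, GETDATE())
)
SELECT TOP (10)
       wait_type,
       SUM(wait_time_ms) AS total_wait_time_ms
FROM Shredded
WHERE CaptureDate >= CAST(GETDATE() AS DATE)   -- today's captures only
GROUP BY wait_type
ORDER BY total_wait_time_ms DESC;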

The results in the e-mail will look a little something like this:

Wednesday
Sep 19, 2012

What's hanging my SQL Server right now?

When figuring out why SQL Server is slow (right now!) I like to identify waits that are occurring on the system presently and any activities that are accumulating those waits.  I also like to see all the blocked sessions.

The following query gets session information (session id, wait duration, SQL text, blocker) for any blockers and non-background activities that are incurring waits the system is currently waiting on.
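The query itself isn't embedded here, but a simplified sketch along those lines, built on sys.dm_exec_requests, sys.dm_exec_sessions, and sys.dm_exec_sql_text, would be:

-- Current waits for non-background sessions, with the statement text and any
-- blocking session. A fuller version would also surface the blockers themselves,
-- including idle sessions holding open transactions.
SELECT  r.session_id,
        r.wait_type,
        r.wait_time            AS wait_time_ms,
        r.blocking_session_id,
        t.text                 AS sql_text
FROM sys.dm_exec_requests AS r
JOIN sys.dm_exec_sessions AS s
  ON s.session_id = r.session_id
CROSS APPLY sys.dm_exec_sql_text(r.sql_handle) AS t
WHERE s.is_user_process = 1
  AND (r.wait_type IS NOT NULL OR r.blocking_session_id <> 0);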

A description of one of the DMVs is here.  Another, more robust tool I like to use is here.  I'm not affiliated with either of these two links.