Dashboards
Use dashboards to monitor instance health metrics
Using the taskbar on the left, click the topmost button to open the list of available dashboards
that you can use to monitor your SQL Server instances.
QMonitor uses Grafana dashboards: Grafana is a powerful data analytics platform that provides advanced dashboarding
capabilities and represents a de facto standard for monitoring and observability applications.
All the data in the dashboards can be filtered using the time filter in the top right corner: it offers predefined quick time ranges, such as
“Last 5 minutes”, “Last 1 hour”, “Last 7 days” and so on. These are usually the easiest way to select the time range.
If you prefer, you can use absolute time ranges, which you can select with the calendar on the left side of the time picker popup.
Use the calendar buttons on the From and To fields to pick a date, or enter the time range manually.
1 - Global Overview
An overall view of your SQL Server estate
The Global Overview dashboard is your entry point to the SQL Server infrastructure: it provides an at-a-glance view of all the instances,
along with useful performance metrics.
At the top left of the dashboard, KPIs show the total number of monitored instances, divided between on-premises and Azure instances.
At the top right, the same KPIs show the total number of monitored databases, again divided between on-premises and Azure.
The middle of the dashboard contains the Instances Overview table, with the following information:
- SQL Instance: The name of the instance. For on-premises SQL Servers, this corresponds to the name returned by @@SERVERNAME, except that
the backslash is replaced by a colon in named instances (you see SERVER:INSTANCE instead of SERVER\INSTANCE; see the sketch after this list).
For Azure SQL Managed Instances and Azure SQL Databases, the name is the network name of the logical instance.
- Database: for Azure SQL Databases, the name of the database
- Elastic Pool: for Azure SQL Databases, the name of the elastic pool if in use, <No Pool> otherwise.
- Database Count: the number of databases in the instance
- Edition: the edition of SQL Server (Enterprise, Standard, Developer, Express). For Azure SQL Databases it is “Azure SQL Database”.
For Azure SQL Managed Instances, it can be GeneralPurpose or BusinessCritical.
- Version: The version of SQL Server. For Azure SQL Database it contains the service tier (Basic, Standard, Premium…)
- Last CPU: the last value captured for CPU usage in the selected time interval
- Average CPU: the average CPU usage in the time interval
- Lowest disk space %: the percent of free space left in the disk that has the least space available. For Azure SQL Databases and
Azure SQL Managed Instances the percentage is calculated on the maximum space available for the current tier.
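As a quick illustration of the naming rule in the SQL Instance column above, this minimal T-SQL sketch derives the QMonitor-style name from @@SERVERNAME (the REPLACE is ours, for illustration only):

```sql
-- Illustration: map a named instance (SERVER\INSTANCE) to QMonitor's SERVER:INSTANCE form.
SELECT @@SERVERNAME AS server_name,
       REPLACE(@@SERVERNAME, N'\', N':') AS qmonitor_style_name;
```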
At the bottom of the dashboard, you have a detailed view of the disk space available on all instances. The table contains the following information:
- SQL Instance: the name of the instance, Azure SQL Database or Azure SQL Managed Instance.
- Database: for Azure SQL Databases, the name of the database
- Elastic Pool: for Azure SQL Databases, the name of the elastic pool if in use, <No Pool> otherwise.
- Volume: drive letter or mount point of the volume
- Free %: Percentage of free space in the volume
- Available Space: Available space in the volume. The unit of measure is included in the value.
- Used Space: Used space in the volume
- Total Space: Size of the Volume (Used space + Available space)
2 - Instance Overview
Detailed information about the performance of a SQL Server instance
This dashboard is one of the main sources of information to control the health and performance of a SQL Server instance.
It contains the main performance metrics that describe the behavior of the instance over time.
At the top you can find the Instance Info section, where the properties of the instance are displayed. You have information
about the name, version, edition of the instance, along with hardware resources available (Total Server CPUs and Total Server Memory).
You also have KPIs for the number of databases, with the counts for different states (online, corrupt, offline, restoring, recovering and recovery pending).
At the bottom of the section, you have a summary of the state of any configured Always On Availability Groups.
CPU & Wait Stats
At the top of this section you have the chart that represents the percent CPU usage for the SQL Server process
and for other processes on the same machine.
The second chart represents the percent CPU usage by resource pool. This chart helps you understand which parts of the workload are
consuming the most CPU, according to the resource pools that you defined on the instance. On an Azure SQL Managed Instance or
an Azure SQL Database, you will see the predefined resource pools available from Azure, while on an Enterprise or Developer edition
you will see the user-defined resource pools. On a Standard Edition, this chart only shows the internal pool.
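For reference, a minimal sketch of defining a user resource pool with Resource Governor (Enterprise or Developer edition only; pool and group names are illustrative):

```sql
-- Illustrative user-defined resource pool and workload group.
CREATE RESOURCE POOL ReportingPool WITH (MAX_CPU_PERCENT = 30);
CREATE WORKLOAD GROUP ReportingGroup USING ReportingPool;
ALTER RESOURCE GOVERNOR RECONFIGURE;
```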
The Wait Stats (by Category) chart represents the average wait time (per second) by wait category. Individual wait types are not
shown on this chart, which only represents wait categories: to inspect the individual wait types, go to the
Geek Stats dashboard.
Memory
This section contains charts that display the state of the instance with respect to memory usage. The chart at the top left is
called “Server Memory”, and shows Target Server Memory vs Total Server Memory. The former represents the ideal amount of memory that
the SQL Server process should be using; the latter is the amount of memory currently allocated to the SQL Server process. When the
instance is under memory pressure, the target server memory is usually higher than the total server memory.
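You can cross-check the two values with the standard performance counters DMV; a minimal sketch:

```sql
-- Target vs Total Server Memory, converted from KB to MB.
SELECT counter_name, cntr_value / 1024 AS value_mb
FROM sys.dm_os_performance_counters
WHERE counter_name IN (N'Target Server Memory (KB)', N'Total Server Memory (KB)');
```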
The second chart shows the distribution of the memory between the memory clerks. A healthy SQL Server instance allocates most of the
memory to the Buffer Pool memory clerk. Memory pressure could show on this chart as a fall in the amount of memory allocated to the Buffer Pool.
Another aspect to keep under control is the amount of memory used by the SQL Plans memory clerk. If SQL Server allocates too much
memory to SQL Plans, it is possible that the cache is polluted by single-use ad-hoc plans.
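The same breakdown is visible in the memory clerks DMV; a minimal sketch:

```sql
-- Top memory clerks; MEMORYCLERK_SQLBUFFERPOOL is the Buffer Pool.
SELECT TOP (5) type, SUM(pages_kb) / 1024 AS memory_mb
FROM sys.dm_os_memory_clerks
GROUP BY type
ORDER BY memory_mb DESC;
```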
The third chart displays Page Life Expectancy (PLE). This counter is defined as the amount of time that a database page is expected to
live in the buffer cache before it is evicted to make room for other pages coming from disk. A very old recommendation from Microsoft
was to keep this counter above 300 seconds (5 minutes) for every 4 GB of RAM, but this threshold was identified at a time when most servers had mechanical
disks and much less RAM than today.
Instead of focusing on a specific threshold, you should interpret this counter as the level of volatility of your buffer cache: a very low
PLE may be accompanied by elevated disk activity and higher disk read latency.
Next to the PLE you have the Memory Grants chart, which represents the number of memory grants outstanding and pending. At any time, having
Memory Grants Pending greater than zero is a strong indicator of memory pressure.
Lazy Writes / sec is a counter that represents the number of writes performed by the lazy writer process to eliminate dirty pages from the
Buffer Pool outside of a checkpoint, in order to make room for other pages from disk. A very high number for this counter may indicate
memory pressure.
Next you have the chart for Page Splits / sec, which represents how many page splits are happening on the instance every second. A page
split happens every time there is not enough space in a page to accommodate new data and the original page has to be split into two pages.
Page splits are not desirable and have a negative impact on performance, especially because split pages are not completely full, so more
pages are required to store the same amount of information in the Buffer Cache. This reduces the amount of data that can be cached, leading
to more physical I/O operations.
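If you trace heavy splitting to a specific index, a common mitigation is rebuilding it with a lower fill factor so that pages keep free space for new rows. A sketch with a hypothetical index name:

```sql
-- Hypothetical example: leave 10% free space per page to absorb inserts.
ALTER INDEX IX_Orders_OrderDate ON dbo.Orders REBUILD WITH (FILLFACTOR = 90);
```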
Activity
This section contains charts that display multiple SQL Server performance counters.
First you have the User Connections chart, which displays the number of active connections from user processes. This number should be
consistent with the number of people or processes hitting the database and should not increase indefinitely (a steady increase suggests a connection leak).
Next, we have the number of Compilations/sec vs Recompilations/sec. A healthy SQL Server database caches most of its execution plans for
reuse, so that it does not need to compile a plan again: compiling plans is a CPU-intensive operation and SQL Server tries to avoid it as
much as it can. A rule of thumb is to keep the number of compilations per second below 10% of the number of Batch Requests per second.
A workload that contains a high number of ad-hoc queries will generate a higher rate of compilations per second.
Recompilations are very similar to compilations: SQL Server finds a cached plan whose base objects have changed
and sends the plan back to the optimizer to be recompiled.
Compiles and recompiles are expensive operations and you should look for excessively high values for these counters if you suffer from CPU
pressure on the instance.
The Access Methods chart displays Full Scans/sec vs Index Searches/sec. A typical OLTP system should show a low number of scans and a high
number of Index Searches. On the other hand, a typical OLAP system will produce more scans.
The Transactions/sec panel displays the number of transactions per second, broken down by database. This allows you to identify which databases are under the
highest load, compared to the ones that are not heavily utilized.
TempDB
This section contains panels that describe the state of the tempdb database, a shared system database that is crucial
for SQL Server performance.
The Data Used Space panel displays the allocated file size compared to the actual used space in the database. Observing these metrics over time
allows you to plan the size of your tempdb database, avoiding autogrow events. It also helps you size the database correctly, to avoid wasting
too much disk space on a data file that is never entirely used by actual database pages.
The Log Used Space panel does the same for the log files.
Active Temp Tables shows the number of temporary tables in tempdb. This is not only the number of temporary tables created explicitly from the
applications (table names with the # or ## prefix), but also worktables, spills, spools and other temporary objects used by SQL Server during
the execution of queries.
The Version Store Size panel shows the size of the Version Store inside tempdb. The Version Store holds row versions that implement optimistic concurrency
by taking transaction-consistent snapshots of table data instead of imposing locks. If you see the size of the Version Store growing
continuously, you may have one or more open transactions that are never committed or rolled back: in that case, look for long-standing
sessions with open transactions.
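A minimal sketch, using standard DMVs, for checking the version store size and spotting sessions that hold open transactions:

```sql
-- Version store size in tempdb (pages are 8 KB).
SELECT SUM(version_store_reserved_page_count) * 8 / 1024 AS version_store_mb
FROM tempdb.sys.dm_db_file_space_usage;

-- Sessions holding open transactions.
SELECT session_id, login_name, open_transaction_count
FROM sys.dm_exec_sessions
WHERE open_transaction_count > 0;
```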
Database & Log Size
Database Volume I/O
Queries
2.1 - Query Detail
Detailed information about a specific SQL query
The Query Detail dashboard displays details for a single SQL query.
The top panel shows the query text as QMonitor captured it. Queries generated by
ORMs or written on a single line can be hard to read. Click “Format” to apply
readable SQL formatting.
Click “Copy to Clipboard” to copy the query for running or analysis in external
tools (such as SSMS). Use “Download” to save the query as a .sql file.
The table below lists all executions of this query within the selected time range.
QMonitor captures a sample every 15 seconds: long-running queries will produce
multiple samples, and queries running at the instant of capture will produce a
sample as well.
Samples alone may not fully reflect a query’s resource usage or execution time.
For a complete impact analysis, rely on the data in the Query Stats dashboard.
3 - SQL Server Events
Events analysis
The Events dashboard shows the number of events that occurred on the SQL
Server instance during the selected time range.
The top chart breaks events down by type:
- Errors
- Deadlocks
- Blocking
- Timeouts
Expand a row to view a chart for that event type by database and a list of
individual events. Click a row’s hyperlink to open a detailed dashboard for
that event type, where you can inspect the event details.
3.1 - Errors
Details about errors occurring on the instance
Expand the “Errors” row to see a chart that shows the number of errors per database over time.
Below the chart, a table lists individual error events with these columns: timestamp,
database name, application name, host name, username, severity, error number, and error message.
Only errors with severity >= 16 are included. Error number 17830 is excluded because it
can occur very frequently.
Use the filter controls in the column headers to filter the table. Click a column header to
sort by that column: each click cycles through ascending, descending, and no sort.
Click the link in the Event Sequence column to open the error details dashboard.
It shows the full error message and, when available, the SQL statement that caused the error.
The SQL text may be unavailable for some error types.
3.2 - Deadlocks
Information on deadlocks
Expand the “Deadlocks” row to view a chart that shows the number of deadlocks
for each database.
SQL Server detects a deadlock when two or more sessions block each other so none
can proceed. To resolve the conflict, SQL Server selects a victim and terminates
that session’s statement. QMonitor captures deadlock events and stores the deadlock
graph (XML) for analysis.
Under the chart is a list of deadlock events with columns for time, sequence,
database, and user name. Use the column filters and sort controls to filter and
sort the table.
Click a row to open the Deadlock detail dashboard.
The Deadlock XML panel displays the deadlock graph in XML format. That
graph contains nodes for processes, resources, execution stacks and inputbuf; documenting
every node is beyond the scope of this documentation.
The XML includes a <victim-list> node that identifies the victim and a
<process-list> node with the participating processes, along with details about the resources and SQL
statements involved. Use the graph and statements to identify the conflict and to
find candidate fixes (indexing, query changes, or retry logic).
The bottom grid lists sessions that were running around the event time, giving a
quick overview of related activity. Use the buttons above the grid to set the
time window around the event from 1 to 15 minutes.
3.3 - Blocking
Blocking Events
Expand the “Blocking” row to view a chart that shows the number of blocking events
for each database.
Blocking events are raised by SQL Server when a session waits on a lock longer than
the blocked process threshold. By default the threshold is 0 (no events). The
QMonitor setup script sets the threshold to 10 seconds as a recommended starting
point. If you see too many events, increase the threshold. After you’ve resolved most
blocking events, you can experiment with lowering the threshold.
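For reference, the threshold can be adjusted with sp_configure; the 10-second value below mirrors the QMonitor default described above:

```sql
-- Set the blocked process threshold (seconds); 0 disables blocking events.
EXEC sp_configure 'show advanced options', 1;
RECONFIGURE;
EXEC sp_configure 'blocked process threshold', 10;
RECONFIGURE;
```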
Under the chart is a list of blocking events with columns for time, sequence,
database, object ID, duration, lock mode, and resource owner type. Use the column
filters and sort controls to filter and sort the table.
Click a row to open the Blocking detail dashboard.
The top table in the detail view shows the same event information as the events
dashboard. The Blocking XML panel displays the blocked process report in XML format.
That report contains many nodes; documenting every node is beyond the scope of
this documentation.
The XML includes one or more <blocked-process> nodes and one or more
<blocking-process> nodes. These nodes identify the blocked and blocking processes
and provide details about the resources the blocked process was waiting on.
The bottom grid lists sessions that were running around the event time, giving a
quick overview of blocking and blocked processes. Use the buttons above the grid
to set the time window around the event from 1 to 15 minutes.
3.4 - Timeouts
Information on query timeouts
Expand the “Timeouts” row to view a chart that shows the number of timeouts
for each database.
Query timeouts occur when a client, application, or command exceeds its
configured timeout before the operation completes. QMonitor captures timeout
events and records the error text, session details, and, when available, the
SQL text.
Under the chart is a list of timeout events with columns for time, sequence,
database, duration, application, and username.
Use the column filters and sort controls to filter and sort the table.
Click a row to open the Timeout detail dashboard.
The top table in the detail view shows the same event information as the
events dashboard. The Timeout Statement panel displays the SQL statement found
in the timeout event. The SQL text may be unavailable for some event types.
The bottom grid lists sessions that were running around the event time, giving
a quick overview of related activity. Use the buttons above the grid to set
the time window around the event from 1 to 15 minutes.
Investigate frequent timeouts by reviewing the SQL, execution plan, and wait
types. Consider query tuning or indexing, increase client timeouts only after
identifying the root cause, or add retry logic where appropriate.
4 - SQL Server Agent Jobs
Check the job activity
The SQL Server Agent Jobs dashboard provides a compact view of job activity
and execution history so you can monitor health, spot failures, and investigate
scheduling or duration issues.
Jobs Overview
- KPIs show totals for the selected interval and instances:
- Total Job Executions: total runs observed.
- Jobs Succeeded: completed successfully.
- Jobs Failed: finished with an error.
- Jobs Retried: runs that retried after transient failures.
- Jobs Canceled: runs canceled manually or programmatically.
- Jobs In Progress: currently running jobs.
- Use these KPIs for a quick health check and to detect elevated failure or
retry rates that need attention.
Job Summary
- A summary table groups executions by job and highlights aggregate stats:
- Job Name
- Total Executions
- Average Duration
- Max Duration
- Last Executed At
- Last Outcome
- Last Duration
- Sort and filter the table to prioritize jobs with long or frequently varying
durations, or with recent failures.
Job Execution Timeline
- A Gantt-style timeline plots each job as a row with start/end bars for
individual executions across the selected interval.
- Bars are color-coded by status (succeeded, failed, in progress) so you can
quickly see scheduling conflicts, overlapping runs, and periods with failures.
- Zoom and pan to inspect specific windows and correlate with other metrics.
Job Execution Details
- A detailed table lists individual executions with full context:
- Job Name
- Job ID
- Job Duration
- Start Time
- End Time
- Job Status
- Execution Type (Scheduled, Manual, etc.)
- Error Message (when available)
- Click a row to view step-level output, error details, and the execution log.
Investigation tips
- Filter by instance, owner, or outcome to isolate problematic jobs.
- Correlate job failures and long durations with CPU, I/O, and blocking at the
same timestamps to find root causes.
- For recurring transient failures, consider retry logic or schedule changes to
avoid resource contention windows.
- Use the timeline to detect overlapping schedules; stagger long-running jobs
to reduce contention.
5 - Query Stats
General Workload analysis
The Query Stats dashboard summarizes workload characteristics and surfaces high
cost queries so you can prioritize tuning and capacity decisions.
At the top:
- Worker Time by Database: a chart that shows query worker time
(CPU) attributed to each database. Use this to spot databases driving CPU
usage and to compare trends over time.
- Logical Reads by Database: a chart that shows logical page reads
per database. High or rising reads indicate I/O pressure or inefficient
query plans.
Below the charts are three sections that drill into queries and their history.
Query Stats by Database and Query
- Shows top queries grouped by database and text (or normalized text).
- Columns include cumulative worker time, logical reads, duration, and execution
count. Use this view to find the heaviest queries within a specific database.
- Use filters and sorting to narrow by database, application, or host, then
click a row to open the query detail dashboard.
Query Stats by Query
- Aggregates statistics across databases for identical or normalized query
text. This helps identify widely-used queries or shared code paths that
impact multiple databases.
- Columns include totals and averages (CPU, reads, duration, executions) and
can be used to detect candidates for indexing, parameterization, or query
rewrite.
Query Regressions
- Highlights queries with significant changes in performance over the selected
time window (for example, increases in duration, CPU, or logical reads).
- Uses baseline comparisons and historical plan information to flag likely
regressions caused by plan changes, data distribution shifts, or blocking.
- Click a query hash to inspect historical plans, execution stats, and
complete query text.
Query Store and data sources
- Query statistics are gathered from QMonitor event capture and, when enabled,
from SQL Server Query Store. Query Store provides persisted plan history and
runtime aggregates that are useful for regression analysis and plan forcing.
- If Query Store is disabled or unavailable, QMonitor falls back to its own captured Query Stats data.
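If you want regression analysis to benefit from persisted plan history, Query Store can be enabled per database; a minimal sketch with a placeholder database name:

```sql
-- Enable Query Store (SQL Server 2016+). YourDatabase is a placeholder.
ALTER DATABASE YourDatabase SET QUERY_STORE = ON;
ALTER DATABASE YourDatabase SET QUERY_STORE (OPERATION_MODE = READ_WRITE);
```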
Investigating queries
- Drill into a query to view SQL text, execution plans, and example runtime
statistics. Compare current and historical plans to identify plan changes.
- Check execution counts — high cumulative cost with many executions may be
fixed via caching or tuning; high single-execution cost may need plan-level
fixes.
- Use execution plans, wait types, and logical reads to decide between indexing,
query rewrite, statistics updates, or parameter sniffing mitigations.
Controls and tips
- Use the time-range selectors and per-column filters to focus analysis on the
relevant interval and workloads.
- Sort by cumulative worker time or logical reads to prioritize the biggest
opportunities.
- When investigating regressions, capture a longer history where possible to
distinguish transient spikes from sustained regressions.
5.1 - Query Stats Detail
Detailed statistics about a specific SQL server query
The Query Stats Detail dashboard focuses on a single query (text or normalized
text) and shows how each compiled plan for that query performed over the
selected interval.
Top: Query text
- The full SQL text (or normalized text) is shown at the top for context.
- Use the copy and download controls to move the text into SSMS or a local editor.
- Use the format button to view a formatted version of the query text.
Plans summary table (Totals by plan)
- A compact table that lists each plan compiled for this query with totals and
averages:
- Execution count
- Worker time (total)
- Total time (total)
- Averages (worker time / exec, total time / exec)
- Rows returned
- Memory grant (total)
- Use the table to quickly identify the highest-cost plans and their relative
impact.
Totals section
- This section contains a time-series table of 5-minute aggregated samples for
the selected time frame. Each row represents a 5-minute bucket and includes:
- Sample time
- Execution count
- Logical reads
- Logical writes
- Memory
- Physical reads
- Rows
- Total time
- Worker time
- Plan hash (clickable)
- Plan hash actions:
- Click a plan hash to download the plan as a .sqlplan file.
- Open the downloaded .sqlplan in SSMS to inspect the graphical plan and
operator costs.
- Charts in Totals:
- Execution count by plan
- Memory by plan
- Total time by plan
- Worker time by plan
- Use these charts to compare plan behavior over time and to spot periods where
a specific plan dominated resource usage.
Averages section
- Similar to Totals, but values are averaged per sample or per execution:
- The table shows the same 5-minute samples with averaged metrics (per-exec
averages where applicable).
- Columns mirror the Totals table (execution count, reads, writes, memory,
physical reads, rows, total time, worker time, plan hash).
- Charts in Averages:
- Total time (avg) by plan
- Worker time (avg) by plan
- Use averages to find plans with high per-execution cost even if execution
counts are low.
Controls and investigation tips
- Use the dashboard time-range controls to focus on the interval of interest.
- Filter or sort the tables by plan hash, execution count, or cost metrics to
prioritize investigation.
- Download and open .sqlplan files in SSMS to review operator costs, warnings,
and estimated vs actual row counts.
- Compare Totals and Averages: high totals with low averages usually indicate
frequent cheap executions; high averages suggest expensive single executions.
6 - Capacity Planning
An overall view of resource consumption to plan resource upgrades
This dashboard presents historical CPU capacity and utilization for your SQL
Server instances so you can spot trends, judge current load, and predict when
additional resources will be needed. Use the KPIs, charts, and summary table to
compare differently sized servers on a common scale, identify sustained or
growing load, and prioritize upgrades, VM rightsizing, or consolidation.
CPU History
- The CPU History section begins with KPIs that summarize the selected
instances: total available cores, the average CPU utilization expressed on a
per-core basis (normalized to one core), and the average number of cores
effectively used. Normalizing lets you compare a small VM and a large host on
the same footing — for example, 20% average CPU on a 4‑core host becomes 80%
when scaled to a single-core equivalent (20 * 4).
- “Cores used” converts that normalized percentage into an estimated count of
cores in use (Avg CPU% * Total Server Cores / 100). These KPIs provide a
quick sense of both intensity (per-core pressure) and absolute demand.
Charts and trends
- SQL Server CPU Usage (normalized to 1 core) is a time-series view that
scales each instance’s CPU to a single-core equivalent. Use it to compare
intensity across instances and to see how load changes over time.
- SQL Server Core Usage shows estimated cores in use over time. This chart
helps you understand aggregate core demand, spot sustained increases, and
evaluate capacity headroom.
Summary table
- The table ties the charts to individual instances and provides:
- SQL Instance name
- Avg CPU Usage % over the selected interval
- Total Server Cores assigned to the host
- CPU Usage % (Normalized) = Avg CPU% * Total Server Cores
- Cores Used (Normalized) = (Avg CPU% * Total Server Cores) / 100
- Use the table to rank instances by absolute CPU usage and to identify machines
that would benefit from deeper investigation or workload redistribution. A worked
example of the normalization math follows this list.
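Here is that worked sketch of the two formulas, written as a query over a hypothetical table of CPU samples (table and column names are ours, for illustration):

```sql
-- Hypothetical samples table: (instance_name, cpu_percent, total_cores).
SELECT instance_name,
       AVG(cpu_percent)                            AS avg_cpu_pct,
       MAX(total_cores)                            AS total_cores,
       AVG(cpu_percent) * MAX(total_cores)         AS cpu_pct_normalized,
       AVG(cpu_percent) * MAX(total_cores) / 100.0 AS cores_used_normalized
FROM dbo.cpu_samples
GROUP BY instance_name;
```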
Interpreting results and next steps
- Look for rising trends in normalized CPU% or sustained high cores-used values;
these indicate growing pressure that should be addressed before performance
degrades.
- For sustained high per-core utilization, investigate top CPU queries,
parallelism settings, or workload placement. For high aggregate core demand,
consider adding cores, resizing VMs, or offloading noncritical workloads.
- Correlate CPU trends with memory, I/O, and wait-type dashboards to form a
complete capacity plan; allow headroom (commonly 20–30%) for growth and
transient spikes unless autoscaling is available.
Controls and tips
- Adjust the dashboard time range to reveal hourly, daily, or weekly trends
depending on your planning horizon. Filter by environment (prod/test) or
cluster (applications or domain) to focus analysis.
Combine these views with query stats dashboards to identify
root causes before changing capacity.
Data & Log Size
This section tracks historical data and log file growth so you can spot
rapidly growing databases and plan storage capacity or maintenance.
Top: Data & Log KPIs
- Initial Data Size: database data file size at the start of the selected
interval (GB).
- Latest Data Size: most recent data file size (GB).
- Data Growth: growth between initial and latest sizes.
- Initial Log Size: log file size at the start of the interval (GB).
- Latest Log Size: most recent log file size (GB).
- Log Growth: growth for the log file.
Charts
- Data Size over time: time-series chart showing data file size changes for
selected servers. Use it to detect steady growth or sudden jumps.
- Log Size over time: time-series chart showing log file size trends.
Summary table
- Columns:
- SQL Instance
- Database
- Initial Data Size
- Latest Data Size
- Data Growth
- Initial Log Size
- Latest Log Size
- Log Growth
- Use the table to rank databases by growth, find candidates for archive,
compression, index maintenance, or retention policy changes, and to plan
storage purchases or quota adjustments.
Interpretation and actions
- Rapid, sustained data growth may indicate new workloads, retention changes,
or missing cleanup jobs — investigate recent deployments and ETL processes.
- Large or growing log files often point to long-running transactions, infrequent
log backups (in full recovery), or heavy bulk operations — review backup and
recovery settings and transaction patterns (a quick check follows this list).
- Use time-range filters to focus on growth windows (daily, weekly, monthly)
and sort the table by growth to prioritize action.
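A quick way to check why a log file cannot truncate, using a standard catalog view:

```sql
-- The log_reuse_wait_desc column names the reason the log cannot be reused.
SELECT name, recovery_model_desc, log_reuse_wait_desc
FROM sys.databases;
```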
Disk Usage
This section shows historical disk latency and throughput so you can spot
I/O bottlenecks and capacity issues affecting database performance.
Top: Disk KPIs
- Avg Read Latency: average read latency observed over the selected interval
(ms).
- Avg Read Bytes/sec: average read throughput (bytes/sec).
- Avg Write Latency: average write latency observed over the selected interval
(ms).
- Avg Write Bytes/sec: average write throughput (bytes/sec).
Charts
- Avg Read/Write Latency: time-series chart showing read and write latency
over time. Use this to spot periods of elevated latency and correlate with
changes and trends in the workload.
- Total Throughput: time-series chart of Read Bytes/sec and Write Bytes/sec.
Use this to identify sustained high throughput or bursts that may saturate
the storage subsystem.
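Per-file latency can be cross-checked with the standard virtual file stats DMV; a minimal sketch:

```sql
-- Average read/write latency per database file, cumulative since instance startup.
SELECT DB_NAME(vfs.database_id) AS database_name,
       mf.physical_name,
       vfs.io_stall_read_ms  / NULLIF(vfs.num_of_reads, 0)  AS avg_read_latency_ms,
       vfs.io_stall_write_ms / NULLIF(vfs.num_of_writes, 0) AS avg_write_latency_ms
FROM sys.dm_io_virtual_file_stats(NULL, NULL) AS vfs
JOIN sys.master_files AS mf
  ON mf.database_id = vfs.database_id AND mf.file_id = vfs.file_id;
```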
Disk Usage Summary table
- Columns:
- SQL Server Instance
- Avg Read Latency
- Read Bytes/sec
- Avg Write Latency
- Write Bytes/sec
- The table summarizes latency and throughput per instance so you can rank
servers by I/O pressure and prioritize investigation or remediation.
Interpretation and actions
- Elevated read or write latency often points to storage contention, slow
disks, or high queue depth; correlate with throughput and queue metrics.
- High read throughput with low latency may indicate healthy caching; high
throughput with rising latency suggests the cache is saturated or the
storage tier is overloaded.
- For write-heavy workloads, review transaction patterns, log placement,
and backup frequency; consider faster storage or write-optimized tiers.
- Use instance filters and time-range controls to isolate problematic windows
and correlate with query, CPU, and I/O dashboards before changing hardware
or storage tiers.
Memory Usage
This section shows memory demand and headroom so you can detect pressure on
the buffer pool and plan memory changes or workload placement.
Top: Memory KPIs
- Avg Allocated Memory: average memory allocated to SQL Server over the
selected interval (MB or GB).
- Max Allocated Memory: peak memory allocation observed during the interval.
- Avg Target Memory: average target memory SQL Server attempted to obtain
(based on internal heuristics / memory clerk targets).
- Max Target Memory: peak target memory during the interval.
Charts
- Memory usage over time: time-series chart showing Avg Allocated Memory,
Max Allocated Memory, Avg Target Memory, and Max Target Memory. Use this
to spot sustained allocation near target (indicating pressure) or sudden
spikes in demand.
Summary table
- Columns:
- SQL Instance
- Avg Allocated Memory
- Max Allocated Memory
- Avg Target Memory
- Max Target Memory
- The table provides a per-instance snapshot to help rank servers by
memory consumption and to identify candidates for resizing or investigation.
Interpretation and actions
- Large gaps between Target and Allocated Memory may indicate internal limits;
investigate OS signals, memory clerks, and max memory configuration.
- Correlate memory trends with CPU and I/O dashboards to determine whether
adding memory will reduce I/O or improve overall performance before changing
server sizing.
- Use time-range filters to identify daily or weekly patterns and to size for
peak demand with appropriate headroom.
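To review the configured memory limit mentioned above, a minimal sketch against the configurations catalog view:

```sql
-- The instance-level memory cap; value is in MB.
SELECT name, value_in_use
FROM sys.configurations
WHERE name = N'max server memory (MB)';
```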
7 - Index Analysis
Missing Indexes and Possible bad Indexes
The Index Analysis dashboard helps you prioritize index work by showing
high-value missing index suggestions and identifying existing indexes that
may be hurting performance because they attract writes but provide little read
benefit.
The values in the dashboard are based on a snapshot of the last 24 hours.
Missing Indexes (top)
- This table lists optimizer-suggested indexes and helps estimate potential
query benefit:
- Object Name: the table or view that would benefit from the index.
- Advantage: a short description of the expected improvement (e.g.,
reduced logical reads).
- Impact: an estimate of the relative benefit across the workload.
- Equality columns: columns recommended for equality predicates (key columns).
- Inequality columns: columns recommended for range/inequality predicates.
- Included columns: non-key columns suggested for covering the queries.
- User Seeks: number of seeks that could use the suggested index.
- Last Seek: timestamp of the last observed seek opportunity.
- Unique compiles: compilation counts that referenced the missing index.
- User cost: estimated cumulative cost reduction if the index were present.
- Use this table to find high-impact index candidates. Review the SQL that
benefits, confirm selectivity and cardinality, and weigh expected read
savings against write and storage costs before creating an index.
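A hedged sketch of turning a suggestion into DDL; every object and column name here is a placeholder, and the suggestion should be validated before creating anything:

```sql
-- Hypothetical index from a missing-index suggestion:
-- equality columns first, then inequality columns; non-key columns go in INCLUDE.
CREATE NONCLUSTERED INDEX IX_Orders_CustomerId_OrderDate
ON dbo.Orders (CustomerId, OrderDate)
INCLUDE (TotalAmount);
```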
Possible Bad Indexes (below)
- This table surfaces existing indexes that may be candidates for removal,
consolidation, or rebuild because they incur write overhead without enough
read benefit:
- Database: database name containing the index.
- Schema: schema name.
- Table: table name.
- Index: index name.
- Total Writes: cumulative writes (inserts/updates/deletes) affecting the index.
- Total Reads: cumulative reads (scans/seeks) that used the index.
- Difference: Total Writes minus Total Reads (high positive values suggest
write-heavy, low-used indexes).
- Fill factor: configured fill factor for the index (indicates fragmentation risk).
- Disabled: whether the index is disabled.
- Hypothetical: whether the index is hypothetical (not actually created).
- Filtered: whether the index uses a filter predicate.
- Sort and filter the table to find indexes with large write cost and minimal
read benefit. Investigate usage patterns and test removal or consolidation in
a non-production environment before dropping indexes.
Controls and investigation tips
- Filter by database, schema, or table to narrow scope.
- Click a row to open index details (definition, sample queries, historical
usage) and to download index DDL for review.
- Consider the write cost, storage, and maintenance impact when acting on
missing index suggestions. Missing-index estimates are heuristic — validate
with query tests and estimated vs. actual execution plans.
- For possible bad indexes, evaluate whether a filtered index, different
key/included columns, or index consolidation would preserve read benefits
while reducing write overhead.
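When testing the removal of a write-heavy, rarely read index, disabling it is a reversible first step; a sketch with a hypothetical index name:

```sql
-- Disable first and observe the workload before dropping.
ALTER INDEX IX_Orders_LegacyLookup ON dbo.Orders DISABLE;
-- Rebuilding re-enables the index if it is still needed.
ALTER INDEX IX_Orders_LegacyLookup ON dbo.Orders REBUILD;
```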
8 - Custom Metrics
Custom Metrics
This dashboard displays custom metric measurements pulled from the selected
measurement source.
Top controls
- Measurement selector: a dropdown to choose which measurement to query.
- Time filter: applies the selected time window to the query.
Data table
- The table below the selector shows the measurement data for the chosen
measurement and time range. Typical columns include timestamp, value, and
any measurement tags or labels.
- The dashboard retrieves up to 10,000 data points for the selected query.
If the time range or measurement produces more points, results are truncated
to this limit.
Usage notes
- Select the desired measurement, choose an appropriate time window, and
refresh the view to populate the table.
- Narrow the time range or apply server/instance filters when results are
truncated due to the 10,000‑point limit.
- Export or copy table rows for offline analysis if needed.
Future capability
- We are working on support for creating custom dashboards directly from
these measurements. This feature is in development and will be available
soon.
9 - Geek Stats
Geek Stats
The Geek Stats dashboard exposes low-level contention and synchronization
metrics so you can diagnose waits and spinlock behavior that can impact
throughput or cause unexpected CPU consumption. Use these views to find
hotspots, correlate with higher-level symptoms, and guide targeted fixes.
Not all users are interested in these metrics, so we decided to dedicate a
separate dashboard to them instead of including this information in the Instance
Overview dashboard.
Wait Stats
- Wait Stats (by category): a top-level chart that groups waits into the same
categories used on the Instance Overview. This view helps you quickly see
which broad wait families (I/O, CPU-related, latch/lock, network, etc.)
dominate during the selected interval.
- Wait Stats (by type): a detailed chart that shows individual SQL Server
wait types (the same names you see in SQL Server DMVs). Use this to
identify specific waits such as PAGEIOLATCH, CXPACKET, SOS_SCHEDULER_YIELD,
or ASYNC_NETWORK_IO and to track their trend over time. Filter and sort
to surface the top contributing wait types and drill into the time windows
where they spike.
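The wait types on these charts carry the same names exposed by the standard waits DMV; a minimal sketch to list the top waits yourself:

```sql
-- Top waits, cumulative since instance startup (not interval-based like the charts).
SELECT TOP (10) wait_type, wait_time_ms, waiting_tasks_count
FROM sys.dm_os_wait_stats
ORDER BY wait_time_ms DESC;
```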
Spinlock Stats
- The Spinlock section presents four charts that measure spinlock activity:
- Collisions: how often threads encountered a collision on a spinlock.
- Spins: total spin attempts observed.
- Spins per collision: average number of spins required for each collision,
indicating how costly each contention event is.
- Backoffs: counts of threads backing off (yielding) after spinning.
- Interpret spin metrics together: high collisions with high spins-per-collision
imply frequent, costly busy-waiting and wasted CPU. High backoffs suggest
threads are repeatedly yielding and retrying.
- Use these charts to correlate spinlock pressure with CPU spikes or scheduling
issues, and to prioritize micro-level fixes (e.g., addressing hot memory
structures, reducing shared-state contention, or applying product fixes).
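The four chart names map directly to columns of the spinlock statistics DMV; a minimal sketch:

```sql
-- Cumulative spinlock statistics since instance startup.
SELECT TOP (10) name, collisions, spins, spins_per_collision, backoffs
FROM sys.dm_os_spinlock_stats
ORDER BY spins DESC;
```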
Investigation tips
- When you see elevated waits or spinlock activity, correlate timestamps with
query CPU, I/O, and scheduler metrics to find root causes.
- High category-level waits point you to broad problem areas; the by-type view
helps you pinpoint exact resources or operations involved.
- For spinlock issues, examine workload patterns that touch shared caches,
global counters, or frequently-updated metadata. Changes such as reducing
contention hotspots, batching updates, or upgrade/patches may help.
10 - Always On Availability Groups
Check High Availability of databases
This dashboard provides an at-a-glance overview of Always On Availability
Groups (AGs) across the selected instances. Use it to verify AG health,
replica roles, and synchronization status, and to quickly locate groups that
need attention.
Availability Groups table
- Availability Group: AG name (click to open the AG detail dashboard).
- Primary Replica: the current primary replica host for the AG.
- Secondary Replicas: comma-separated list of configured secondary replicas.
- Total Nodes: number of replicas configured in the AG.
- Online Nodes: replicas currently online and reachable.
- N. Databases: number of databases protected by the AG.
- Synchronization Health: overall sync state (Healthy, Not Healthy)
based on replica synchronization and failover readiness.
- Listener DNS Name: cluster listener DNS name, if configured.
- Listener IP: listener IP address or addresses.
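The same health information can be cross-checked with the standard Always On DMVs; a minimal sketch:

```sql
-- Replica roles and synchronization health per availability group.
SELECT ag.name AS availability_group,
       ar.replica_server_name,
       ars.role_desc,
       ars.synchronization_health_desc
FROM sys.availability_groups AS ag
JOIN sys.availability_replicas AS ar
  ON ar.group_id = ag.group_id
JOIN sys.dm_hadr_availability_replica_states AS ars
  ON ars.replica_id = ar.replica_id;
```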
Usage
- Click an AG name to view the detail dashboard for per-replica metrics,
database synchronization progress, and failover readiness.
- Filter or sort the table to find AGs with offline nodes, unsynchronized
databases, or other anomalies that require investigation.
10.1 - Always On Availability Group Detail
Check the state of a High Availability Group
This dashboard shows detailed health and replication telemetry for a single
Always On Availability Group (AG). Use it to verify replica roles, track
failovers, monitor data movement, and identify databases that need attention.
Top: AG summary
- Availability Group: AG name
- Primary Replica: current primary replica host
- Secondary Replicas: configured secondaries
- Total Nodes: count of configured replicas
- Online Nodes: replicas currently online and reachable
- N. Databases: number of databases in the AG
- Synchronization Health: overall sync state for the AG
- Listener Name: cluster listener DNS name (if configured)
- Listener IP: listener IP address(es)
Primary Replica Failovers timeline
- A timeline that shows which replica was primary at each point in time.
- Use it to review recent failovers and to correlate role changes with events
or performance anomalies.
Availability Group Nodes table
- Replica Instance: instance name for each replica
- Replica role: Primary / Secondary
- Sync. Health: per-replica synchronization status
- Availability Mode: synchronous / asynchronous
- Failover Mode: automatic / manual
- Seeding Mode: automatic / manual
- Secondary Allow Connections: read-intent settings for secondaries
- Backup Priority: priority used for backup routing
- Endpoint URL: data movement endpoint
- R/O Routing URL: read-only routing address (if configured)
- R/W Routing URL: read-write routing address (if configured)
Nodes KPIs and online history
- KPIs: Total Nodes and Offline Nodes for quick situational awareness.
- Online Nodes chart: time-series showing the number of online replicas over
the selected interval to spot outages or flapping nodes.
Transfer rates and queues
- Transfer Rates chart: Send Rate (how fast the primary sends changes) and
Redo Rate (how fast secondaries apply changes). Use to spot slow secondaries
or network saturation.
- Transfer Queue Size chart: Send Queue Size and Redo Queue Size. Growing
queues indicate replication lag or bottlenecks that may affect failover
readiness.
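The send and redo figures correspond to columns of the database replica states DMV; a minimal sketch:

```sql
-- Send/redo queue sizes (KB) and rates (KB/sec) per database replica.
SELECT DB_NAME(drs.database_id) AS database_name,
       drs.log_send_queue_size, drs.log_send_rate,
       drs.redo_queue_size, drs.redo_rate
FROM sys.dm_hadr_database_replica_states AS drs;
```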
Health history charts
- Online node history: online vs total nodes over time to visualize availability
trends.
- Database Health History: healthy databases vs total databases to track when
databases become unsynchronized or unhealthy.
Databases Replication Status table
- SQL Instance: instance hosting the database replica
- Database Name: database name
- Sync. Health: synchronization status for the database
- Is Primary Replica: indicates whether this row is the primary
- Availability Mode: database-level availability mode (inherits from AG)
Usage and investigation tips
- Correlate failover times with primary timeline and with performance metrics
(CPU, I/O) to find causes of role changes.
- Increasing send/redo queues or sustained low redo rates often point to
network, disk, or resource contention on secondaries — investigate those
hosts before initiating failover or taking corrective action.