Dashboards
Use dashboards to monitor instance health metrics
Using the taskbar on the left, click the topmost button to open the list of available dashboards
that you can use to monitor your SQL Server instances.
QMonitor uses Grafana dashboards: Grafana is a powerful data analytics platform that provides advanced dashboarding
capabilities and represents a de facto standard for monitoring and observability applications.
All the data in the dashboards can be filtered using the time filter in the top right corner: it offers predefined quick time ranges, such as
“Last 5 minutes”, “Last 1 hour”, “Last 7 days” and so on. These are usually the easiest way to select the time range.
If you prefer, you can use absolute time ranges, which you can select with the calendar on the left side of the time picker popup.
Use the calendar buttons on the From and To fields to pick a date, or enter the time range manually.
1 - Global Overview
An overall view of your SQL Server estate
The Global Overview dashboard is your entry point to the SQL Server infrastructure: it provides an at-a-glance view of all the instances,
along with useful performance metrics.
At the top left of the dashboard, KPIs show the total number of monitored instances, divided between on-premises and Azure instances.
At the top right, the same KPIs show the total number of monitored databases, again divided between on-premises and Azure.
The middle of the dashboard contains the Instances Overview table, with the following information:
- SQL Instance: The name of the instance. For on-premises SQL Servers, this corresponds to the name returned by @@SERVERNAME, except that
the backslash is replaced by a colon in named instances (you see SERVER:INSTANCE instead of SERVER\INSTANCE; see the sketch after this list).
For Azure SQL Managed Instances and Azure SQL Databases, the name is the network name of the logical instance.
- Database: for Azure SQL Databases, the name of the database
- Elastic Pool: for Azure SQL Databases, the name of the elastic pool if in use, <No Pool> otherwise.
- Database Count: the number of databases in the instance
- Edition: the edition of SQL Server (Enterprise, Standard, Developer, Express). For Azure SQL Databases it is “Azure SQL Database”.
For Azure SQL Managed Instances, it can be GeneralPurpose or BusinessCritical.
- Version: The version of SQL Server. For Azure SQL Database it contains the service tier (Basic, Standard, Premium…)
- Last CPU: the last value captured for CPU usage in the selected time interval
- Average CPU: the average CPU usage in the time interval
- Lowest disk space %: the percent of free space left in the disk that has the least space available. For Azure SQL Databases and
Azure SQL Managed Instances the percentage is calculated on the maximum space available for the current tier.
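As a quick illustration of the naming rule in the SQL Instance column above, this minimal T-SQL sketch derives the QMonitor-style name from @@SERVERNAME (the REPLACE is ours, for illustration only):

```sql
-- Illustration: map a named instance (SERVER\INSTANCE) to QMonitor's SERVER:INSTANCE form.
SELECT @@SERVERNAME AS server_name,
       REPLACE(@@SERVERNAME, N'\', N':') AS qmonitor_style_name;
```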
At the bottom of the dashboard, you have a detailed view of the disk space available on all instances. The table contains the following information:
- SQL Instance: the name of the instance, Azure SQL Database or Azure SQL Managed Instance.
- Database: for Azure SQL Databases, the name of the database
- Elastic Pool: for Azure SQL Databases, the name of the elastic pool if in use, <No Pool> otherwise.
- Volume: drive letter or mount point of the volume
- Free %: Percentage of free space in the volume
- Available Space: Available space in the volume. The unit of measure is included in the value.
- Used Space: Used space in the volume
- Total Space: Size of the Volume (Used space + Available space)
2 - Instance Overview
Detailed information about the performance of a SQL Server instance
This dashboard is one of the main sources of information to control the health and performance of a SQL Server instance.
It contains the main performance metrics that describe the behavior of the instance over time.
At the top you can find the Instance Info section, where the properties of the instance are displayed. You have information
about the name, version, edition of the instance, along with hardware resources available (Total Server CPUs and Total Server Memory).
You also have KPIs for the number of databases, with the counts for different states (online, corrupt, offline, restoring, recovering and recovery pending).
At the bottom of the section, you have a summary of the state of any configured Always On Availability Groups.
CPU & Wait Stats
At the top of this section you have the chart that represents the percent CPU usage for the SQL Server process
and for other processes on the same machine.
The second chart represents the percent CPU usage by resource pool. This chart helps you understand which parts of the workload are
consuming the most CPU, according to the resource pools that you defined on the instance. On an Azure SQL Managed Instance or
an Azure SQL Database, you will see the predefined resource pools available from Azure, while on an Enterprise or Developer edition
you will see the user-defined resource pools. On a Standard Edition, this chart only shows the internal pool.
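For reference, a minimal sketch of defining a user resource pool with Resource Governor (Enterprise or Developer edition only; pool and group names are illustrative):

```sql
-- Illustrative user-defined resource pool and workload group.
CREATE RESOURCE POOL ReportingPool WITH (MAX_CPU_PERCENT = 30);
CREATE WORKLOAD GROUP ReportingGroup USING ReportingPool;
ALTER RESOURCE GOVERNOR RECONFIGURE;
```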
The Wait Stats (by Category) chart represents the average wait time (per second) by wait category. Individual wait types are not
shown on this chart, which only represents wait categories: to inspect the individual wait types, go to the
Geek Stats dashboard.
Memory
This section contains charts that display the state of the instance with respect to memory usage. The chart at the top left is
called “Server Memory”, and shows Target Server Memory vs Total Server Memory. The former represents the ideal amount of memory that
the SQL Server process should be using; the latter is the amount of memory currently allocated to the SQL Server process. When the
instance is under memory pressure, the target server memory is usually higher than the total server memory.
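You can cross-check the two values with the standard performance counters DMV; a minimal sketch:

```sql
-- Target vs Total Server Memory, converted from KB to MB.
SELECT counter_name, cntr_value / 1024 AS value_mb
FROM sys.dm_os_performance_counters
WHERE counter_name IN (N'Target Server Memory (KB)', N'Total Server Memory (KB)');
```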
The second chart shows the distribution of the memory between the memory clerks. A healthy SQL Server instance allocates most of the
memory to the Buffer Pool memory clerk. Memory pressure could show on this chart as a fall in the amount of memory allocated to the Buffer Pool.
Another aspect to keep under control is the amount of memory used by the SQL Plans memory clerk. If SQL Server allocates too much
memory to SQL Plans, it is possible that the cache is polluted by single-use ad-hoc plans.
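The same breakdown is visible in the memory clerks DMV; a minimal sketch:

```sql
-- Top memory clerks; MEMORYCLERK_SQLBUFFERPOOL is the Buffer Pool.
SELECT TOP (5) type, SUM(pages_kb) / 1024 AS memory_mb
FROM sys.dm_os_memory_clerks
GROUP BY type
ORDER BY memory_mb DESC;
```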
The third chart displays Page Life Expectancy (PLE). This counter is defined as the amount of time that a database page is expected to
live in the buffer cache before it is evicted to make room for other pages coming from disk. A very old recommendation from Microsoft
was to keep this counter above 300 seconds (5 minutes) for every 4 GB of RAM, but this threshold was identified at a time when most servers had mechanical
disks and much less RAM than today.
Instead of focusing on a specific threshold, you should interpret this counter as the level of volatility of your buffer cache: a very low
PLE may be accompanied by elevated disk activity and higher disk read latency.
Next to the PLE you have the Memory Grants chart, which represents the number of memory grants outstanding and pending. At any time, having
Memory Grants Pending greater than zero is a strong indicator of memory pressure.
Lazy Writes / sec is a counter that represents the number of writes performed by the lazy writer process to eliminate dirty pages from the
Buffer Pool outside of a checkpoint, in order to make room for other pages from disk. A very high number for this counter may indicate
memory pressure.
Next you have the chart for Page Splits / sec, which represents how many page splits are happening on the instance every second. A page
split happens every time there is not enough space in a page to accommodate new data and the original page has to be split into two pages.
Page splits are not desirable and have a negative impact on performance, especially because split pages are not completely full, so more
pages are required to store the same amount of information in the Buffer Cache. This reduces the amount of data that can be cached, leading
to more physical I/O operations.
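If you trace heavy splitting to a specific index, a common mitigation is rebuilding it with a lower fill factor so that pages keep free space for new rows. A sketch with a hypothetical index name:

```sql
-- Hypothetical example: leave 10% free space per page to absorb inserts.
ALTER INDEX IX_Orders_OrderDate ON dbo.Orders REBUILD WITH (FILLFACTOR = 90);
```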
Activity
This section contains charts that display multiple SQL Server performance counters.
First you have the User Connections chart, which displays the number of active connections from user processes. This number should be
consistent with the number of people or processes hitting the database and should not increase indefinitely (a steady increase suggests a connection leak).
Next, we have the number of Compilations/sec vs Recompilations/sec. A healthy SQL Server database caches most of its execution plans for
reuse, so that it does not need to compile a plan again: compiling plans is a CPU-intensive operation and SQL Server tries to avoid it as
much as it can. A rule of thumb is to keep the number of compilations per second below 10% of the number of Batch Requests per second.
A workload that contains a high number of ad-hoc queries will generate a higher rate of compilations per second.
Recompilations are very similar to compilations: SQL Server finds a cached plan whose base objects have changed
and sends the plan back to the optimizer to be recompiled.
Compiles and recompiles are expensive operations and you should look for excessively high values for these counters if you suffer from CPU
pressure on the instance.
The Access Methods chart displays Full Scans/sec vs Index Searches/sec. A typical OLTP system should show a low number of scans and a high
number of Index Searches. On the other hand, a typical OLAP system will produce more scans.
The Transactions/sec panel displays the number of transactions per second, broken down by database. This allows you to identify which databases are under the
highest load, compared to the ones that are not heavily utilized.
TempDB
This section contains panels that describe the state of the tempdb database, a shared system database that is crucial
for SQL Server performance.
The Data Used Space panel displays the allocated file size compared to the actual used space in the database. Observing these metrics over time
allows you to plan the size of your tempdb database, avoiding autogrow events. It also helps you size the database correctly, to avoid wasting
too much disk space on a data file that is never entirely used by actual database pages.
The Log Used Space panel does the same for the log files.
Active Temp Tables shows the number of temporary tables in tempdb. This is not only the number of temporary tables created explicitly from the
applications (table names with the # or ## prefix), but also worktables, spills, spools and other temporary objects used by SQL Server during
the execution of queries.
The Version Store Size panel shows the size of the Version Store inside tempdb. The Version Store holds row versions that implement optimistic concurrency
by taking transaction-consistent snapshots of table data instead of imposing locks. If you see the size of the Version Store growing
continuously, you may have one or more open transactions that are never committed or rolled back: in that case, look for long-standing
sessions with open transactions.
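A minimal sketch, using standard DMVs, for checking the version store size and spotting sessions that hold open transactions:

```sql
-- Version store size in tempdb (pages are 8 KB).
SELECT SUM(version_store_reserved_page_count) * 8 / 1024 AS version_store_mb
FROM tempdb.sys.dm_db_file_space_usage;

-- Sessions holding open transactions.
SELECT session_id, login_name, open_transaction_count
FROM sys.dm_exec_sessions
WHERE open_transaction_count > 0;
```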
Database & Log Size
Database Volume I/O
Queries
2.1 - Query Detail
Detailed information about a specific SQL query
The Query Detail dashboard displays details for a single SQL query.
The top panel shows the query text as QMonitor captured it. Queries generated by
ORMs or written on a single line can be hard to read. Click “Format” to apply
readable SQL formatting.
Click “Copy to Clipboard” to copy the query for running or analysis in external
tools (such as SSMS). Use “Download” to save the query as a .sql file.
The table below lists all executions of this query within the selected time range.
QMonitor captures a sample every 15 seconds: long-running queries will produce
multiple samples, and queries running at the instant of capture will produce a
sample as well.
Samples alone may not fully reflect a query’s resource usage or execution time.
For a complete impact analysis, rely on the data in the Query Stats dashboard.
3 - SQL Server Events
Events analysis
The Events dashboard shows the number of events that occurred on the SQL
Server instance during the selected time range.
The top chart breaks events down by type:
- Errors
- Deadlocks
- Blocking
- Timeouts
Expand a row to view a chart for that event type by database and a list of
individual events. Click a row’s hyperlink to open a detailed dashboard for
that event type, where you can inspect the event details.
3.1 - Errors
Details about errors occurring on the instance
Expand the “Errors” row to see a chart that shows the number of errors per database over time.
Below the chart, a table lists individual error events with these columns: timestamp,
database name, application name, host name, username, severity, error number, and error message.
Only errors with severity >= 16 are included. Error number 17830 is excluded because it
can occur very frequently.
Use the filter controls in the column headers to filter the table. Click a column header to
sort by that column: each click cycles through ascending, descending, and no sort.
Click the link in the Event Sequence column to open the error details dashboard.
It shows the full error message and, when available, the SQL statement that caused the error.
The SQL text may be unavailable for some error types.
3.2 - Deadlocks
Information on deadlocks
Expand the “Deadlocks” row to view a chart that shows the number of deadlocks
for each database.
SQL Server detects a deadlock when two or more sessions block each other so none
can proceed. To resolve the conflict, SQL Server selects a victim and terminates
that session’s statement. QMonitor captures deadlock events and stores the deadlock
graph (XML) for analysis.
Under the chart is a list of deadlock events with columns for time, sequence,
database, and user name. Use the column filters and sort controls to filter and
sort the table.
Click a row to open the Deadlock detail dashboard.
The Deadlock XML panel displays the deadlock graph in XML format. That
graph contains nodes for processes, resources, execution stacks and inputbuf; documenting
every node is beyond the scope of this documentation.
The XML includes a <victim-list> node that identifies the victim and a
<process-list> node with the participating processes, along with details about the resources and SQL
statements involved. Use the graph and statements to identify the conflict and to
find candidate fixes (indexing, query changes, or retry logic).
The bottom grid lists sessions that were running around the event time, giving a
quick overview of related activity. Use the buttons above the grid to set the
time window around the event from 1 to 15 minutes.
3.3 - Blocking
Blocking Events
Expand the “Blocking” row to view a chart that shows the number of blocking events
for each database.
Blocking events are raised by SQL Server when a session waits on a lock longer than
the blocked process threshold. By default the threshold is 0 (no events). The
QMonitor setup script sets the threshold to 10 seconds as a recommended starting
point. If you see too many events, increase the threshold. After you’ve resolved most
blocking events, you can experiment with lowering the threshold.
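For reference, the threshold can be adjusted with sp_configure; the 10-second value below mirrors the QMonitor default described above:

```sql
-- Set the blocked process threshold (seconds); 0 disables blocking events.
EXEC sp_configure 'show advanced options', 1;
RECONFIGURE;
EXEC sp_configure 'blocked process threshold', 10;
RECONFIGURE;
```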
Under the chart is a list of blocking events with columns for time, sequence,
database, object ID, duration, lock mode, and resource owner type. Use the column
filters and sort controls to filter and sort the table.
Click a row to open the Blocking detail dashboard.
The top table in the detail view shows the same event information as the events
dashboard. The Blocking XML panel displays the blocked process report in XML format.
That report contains many nodes; documenting every node is beyond the scope of
this documentation.
The XML includes one or more <blocked-process> nodes and one or more
<blocking-process> nodes. These nodes identify the blocked and blocking processes
and provide details about the resources the blocked process was waiting on.
The bottom grid lists sessions that were running around the event time, giving a
quick overview of blocking and blocked processes. Use the buttons above the grid
to set the time window around the event from 1 to 15 minutes.
3.4 - Timeouts
Information on query timeouts
Expand the “Timeouts” row to view a chart that shows the number of timeouts
for each database.
Query timeouts occur when a client, application, or command exceeds its
configured timeout before the operation completes. QMonitor captures timeout
events and records the error text, session details, and, when available, the
SQL text.
Under the chart is a list of timeout events with columns for time, sequence,
database, duration, application, and username.
Use the column filters and sort controls to filter and sort the table.
Click a row to open the Timeout detail dashboard.
The top table in the detail view shows the same event information as the
events dashboard. The Timeout Statement panel displays the SQL statement found
in the timeout event. The SQL text may be unavailable for some event types.
The bottom grid lists sessions that were running around the event time, giving
a quick overview of related activity. Use the buttons above the grid to set
the time window around the event from 1 to 15 minutes.
Investigate frequent timeouts by reviewing the SQL, execution plan, and wait
types. Consider query tuning or indexing, increase client timeouts only after
identifying the root cause, or add retry logic where appropriate.
4 - SQL Server Agent Jobs
Check the job activity
The SQL Server Agent Jobs dashboard provides a compact view of job activity
and execution history so you can monitor health, spot failures, and investigate
scheduling or duration issues.
Jobs Overview
- KPIs show totals for the selected interval and instances:
- Total Job Executions: total runs observed.
- Jobs Succeeded: completed successfully.
- Jobs Failed: finished with an error.
- Jobs Retried: runs that retried after transient failures.
- Jobs Canceled: runs canceled manually or programmatically.
- Jobs In Progress: currently running jobs.
- Use these KPIs for a quick health check and to detect elevated failure or
retry rates that need attention.
Job Summary
- A summary table groups executions by job and highlights aggregate stats:
- Job Name
- Total Executions
- Average Duration
- Max Duration
- Last Executed At
- Last Outcome
- Last Duration
- Sort and filter the table to prioritize jobs with long or frequently varying
durations, or with recent failures.
Job Execution Timeline
- A Gantt-style timeline plots each job as a row with start/end bars for
individual executions across the selected interval.
- Bars are color-coded by status (succeeded, failed, in progress) so you can
quickly see scheduling conflicts, overlapping runs, and periods with failures.
- Zoom and pan to inspect specific windows and correlate with other metrics.
Job Execution Details
- A detailed table lists individual executions with full context:
- Job Name
- Job ID
- Job Duration
- Start Time
- End Time
- Job Status
- Execution Type (Scheduled, Manual, etc.)
- Error Message (when available)
- Click a row to view step-level output, error details, and the execution log.
Investigation tips
- Filter by instance, owner, or outcome to isolate problematic jobs.
- Correlate job failures and long durations with CPU, I/O, and blocking at the
same timestamps to find root causes.
- For recurring transient failures, consider retry logic or schedule changes to
avoid resource contention windows.
- Use the timeline to detect overlapping schedules; stagger long-running jobs
to reduce contention.
5 - Query Stats
General Workload analysis
The Query Stats dashboard summarizes workload characteristics and surfaces high
cost queries so you can prioritize tuning and capacity decisions.
At the top:
- Worker Time by Database: a chart that shows query worker time
(CPU) attributed to each database. Use this to spot databases driving CPU
usage and to compare trends over time.
- Logical Reads by Database: a chart that shows logical page reads
per database. High or rising reads indicate I/O pressure or inefficient
query plans.
Below the charts are three sections that drill into queries and their history.
Query Stats by Database and Query
- Shows top queries grouped by database and text (or normalized text).
- Columns include cumulative worker time, logical reads, duration, and execution
count. Use this view to find the heaviest queries within a specific database.
- Use filters and sorting to narrow by database, application, or host, then
click a row to open the query detail dashboard.
Query Stats by Query
- Aggregates statistics across databases for identical or normalized query
text. This helps identify widely-used queries or shared code paths that
impact multiple databases.
- Columns include totals and averages (CPU, reads, duration, executions) and
can be used to detect candidates for indexing, parameterization, or query
rewrite.
Query Regressions
- Highlights queries with significant changes in performance over the selected
time window (for example, increases in duration, CPU, or logical reads).
- Uses baseline comparisons and historical plan information to flag likely
regressions caused by plan changes, data distribution shifts, or blocking.
- Click a query hash to inspect historical plans, execution stats, and
complete query text.
Query Store and data sources
- Query statistics are gathered from QMonitor event capture and, when enabled,
from SQL Server Query Store. Query Store provides persisted plan history and
runtime aggregates that are useful for regression analysis and plan forcing.
- If Query Store is disabled or unavailable, QMonitor falls back to its own captured Query Stats data.
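If you want regression analysis to benefit from persisted plan history, Query Store can be enabled per database; a minimal sketch with a placeholder database name:

```sql
-- Enable Query Store (SQL Server 2016+). YourDatabase is a placeholder.
ALTER DATABASE YourDatabase SET QUERY_STORE = ON;
ALTER DATABASE YourDatabase SET QUERY_STORE (OPERATION_MODE = READ_WRITE);
```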
Investigating queries
- Drill into a query to view SQL text, execution plans, and example runtime
statistics. Compare current and historical plans to identify plan changes.
- Check execution counts — high cumulative cost with many executions may be
fixed via caching or tuning; high single-execution cost may need plan-level
fixes.
- Use execution plans, wait types, and logical reads to decide between indexing,
query rewrite, statistics updates, or parameter sniffing mitigations.
Controls and tips
- Use the time-range selectors and per-column filters to focus analysis on the
relevant interval and workloads.
- Sort by cumulative worker time or logical reads to prioritize the biggest
opportunities.
- When investigating regressions, capture a longer history where possible to
distinguish transient spikes from sustained regressions.
5.1 - Query Stats Detail
Detailed statistics about a specific SQL server query
The Query Stats Detail dashboard focuses on a single query (text or normalized
text) and shows how each compiled plan for that query performed over the
selected interval.
Top: Query text
- The full SQL text (or normalized text) is shown at the top for context.
- Use the copy and download controls to move the text into SSMS or a local editor.
- Use the format button to view a formatted version of the query text.
Plans summary table (Totals by plan)
- A compact table that lists each plan compiled for this query with totals and
averages:
- Execution count
- Worker time (total)
- Total time (total)
- Averages (worker time / exec, total time / exec)
- Rows returned
- Memory grant (total)
- Use the table to quickly identify the highest-cost plans and their relative
impact.
Totals section
- This section contains a time-series table of 5-minute aggregated samples for
the selected time frame. Each row represents a 5-minute bucket and includes:
- Sample time
- Execution count
- Logical reads
- Logical writes
- Memory
- Physical reads
- Rows
- Total time
- Worker time
- Plan hash (clickable)
- Plan hash actions:
- Click a plan hash to download the plan as a .sqlplan file.
- Open the downloaded .sqlplan in SSMS to inspect the graphical plan and
operator costs.
- Charts in Totals:
- Execution count by plan
- Memory by plan
- Total time by plan
- Worker time by plan
- Use these charts to compare plan behavior over time and to spot periods where
a specific plan dominated resource usage.
Averages section
- Similar to Totals, but values are averaged per sample or per execution:
- The table shows the same 5-minute samples with averaged metrics (per-exec
averages where applicable).
- Columns mirror the Totals table (execution count, reads, writes, memory,
physical reads, rows, total time, worker time, plan hash).
- Charts in Averages:
- Total time (avg) by plan
- Worker time (avg) by plan
- Use averages to find plans with high per-execution cost even if execution
counts are low.
Controls and investigation tips
- Use the dashboard time-range controls to focus on the interval of interest.
- Filter or sort the tables by plan hash, execution count, or cost metrics to
prioritize investigation.
- Download and open .sqlplan files in SSMS to review operator costs, warnings,
and estimated vs actual row counts.
- Compare Totals and Averages: high totals with low averages usually indicate
frequent cheap executions; high averages suggest expensive single executions.
6 - Capacity Planning
An overall view of resource consumption to plan resource upgrades
This dashboard presents historical CPU capacity and utilization for your SQL
Server instances so you can spot trends, judge current load, and predict when
additional resources will be needed. Use the KPIs, charts, and summary table to
compare differently sized servers on a common scale, identify sustained or
growing load, and prioritize upgrades, VM rightsizing, or consolidation.
CPU History
- The CPU History section begins with KPIs that summarize the selected
instances: total available cores, the average CPU utilization expressed on a
per-core basis (normalized to one core), and the average number of cores
effectively used. Normalizing lets you compare a small VM and a large host on
the same footing — for example, 20% average CPU on a 4‑core host becomes 80%
when scaled to a single-core equivalent (20 * 4).
- “Cores used” converts that normalized percentage into an estimated count of
cores in use (Avg CPU% * Total Server Cores / 100). These KPIs provide a
quick sense of both intensity (per-core pressure) and absolute demand.
Charts and trends
- SQL Server CPU Usage (normalized to 1 core) is a time-series view that
scales each instance’s CPU to a single-core equivalent. Use it to compare
intensity across instances and to see how load changes over time.
- SQL Server Core Usage shows estimated cores in use over time. This chart
helps you understand aggregate core demand, spot sustained increases, and
evaluate capacity headroom.
Summary table
- The table ties the charts to individual instances and provides:
- SQL Instance name
- Avg CPU Usage % over the selected interval
- Total Server Cores assigned to the host
- CPU Usage % (Normalized) = Avg CPU% * Total Server Cores
- Cores Used (Normalized) = (Avg CPU% * Total Server Cores) / 100
- Use the table to rank instances by absolute CPU usage and to identify machines
that would benefit from deeper investigation or workload redistribution. A worked
example of the normalization math follows this list.
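Here is that worked sketch of the two formulas, written as a query over a hypothetical table of CPU samples (table and column names are ours, for illustration):

```sql
-- Hypothetical samples table: (instance_name, cpu_percent, total_cores).
SELECT instance_name,
       AVG(cpu_percent)                            AS avg_cpu_pct,
       MAX(total_cores)                            AS total_cores,
       AVG(cpu_percent) * MAX(total_cores)         AS cpu_pct_normalized,
       AVG(cpu_percent) * MAX(total_cores) / 100.0 AS cores_used_normalized
FROM dbo.cpu_samples
GROUP BY instance_name;
```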
Interpreting results and next steps
- Look for rising trends in normalized CPU% or sustained high cores-used values;
these indicate growing pressure that should be addressed before performance
degrades.
- For sustained high per-core utilization, investigate top CPU queries,
parallelism settings, or workload placement. For high aggregate core demand,
consider adding cores, resizing VMs, or offloading noncritical workloads.
- Correlate CPU trends with memory, I/O, and wait-type dashboards to form a
complete capacity plan; allow headroom (commonly 20–30%) for growth and
transient spikes unless autoscaling is available.
Controls and tips
- Adjust the dashboard time range to reveal hourly, daily, or weekly trends
depending on your planning horizon. Filter by environment (prod/test) or
cluster (applications or domain) to focus analysis.
Combine these views with query stats dashboards to identify
root causes before changing capacity.
Data & Log Size
This section tracks historical data and log file growth so you can spot
rapidly growing databases and plan storage capacity or maintenance.
Top: Data & Log KPIs
- Initial Data Size: database data file size at the start of the selected
interval (GB).
- Latest Data Size: most recent data file size (GB).
- Data Growth: growth between initial and latest sizes.
- Initial Log Size: log file size at the start of the interval (GB).
- Latest Log Size: most recent log file size (GB).
- Log Growth: growth for the log file.
Charts
- Data Size over time: time-series chart showing data file size changes for
selected servers. Use it to detect steady growth or sudden jumps.
- Log Size over time: time-series chart showing log file size trends.
Summary table
- Columns:
- SQL Instance
- Database
- Initial Data Size
- Latest Data Size
- Data Growth
- Initial Log Size
- Latest Log Size
- Log Growth
- Use the table to rank databases by growth, find candidates for archive,
compression, index maintenance, or retention policy changes, and to plan
storage purchases or quota adjustments.
Interpretation and actions
- Rapid, sustained data growth may indicate new workloads, retention changes,
or missing cleanup jobs — investigate recent deployments and ETL processes.
- Large or growing log files often point to long-running transactions, infrequent
log backups (in full recovery), or heavy bulk operations — review backup and
recovery settings and transaction patterns (a quick check follows this list).
- Use time-range filters to focus on growth windows (daily, weekly, monthly)
and sort the table by growth to prioritize action.
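A quick way to check why a log file cannot truncate, using a standard catalog view:

```sql
-- The log_reuse_wait_desc column names the reason the log cannot be reused.
SELECT name, recovery_model_desc, log_reuse_wait_desc
FROM sys.databases;
```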
Disk Usage
This section shows historical disk latency and throughput so you can spot
I/O bottlenecks and capacity issues affecting database performance.
Top: Disk KPIs
- Avg Read Latency: average read latency observed over the selected interval
(ms).
- Avg Read Bytes/sec: average read throughput (bytes/sec).
- Avg Write Latency: average write latency observed over the selected interval
(ms).
- Avg Write Bytes/sec: average write throughput (bytes/sec).
Charts
- Avg Read/Write Latency: time-series chart showing read and write latency
over time. Use this to spot periods of elevated latency and correlate with
changes and trends in the workload.
- Total Throughput: time-series chart of Read Bytes/sec and Write Bytes/sec.
Use this to identify sustained high throughput or bursts that may saturate
the storage subsystem.
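Per-file latency can be cross-checked with the standard virtual file stats DMV; a minimal sketch:

```sql
-- Average read/write latency per database file, cumulative since instance startup.
SELECT DB_NAME(vfs.database_id) AS database_name,
       mf.physical_name,
       vfs.io_stall_read_ms  / NULLIF(vfs.num_of_reads, 0)  AS avg_read_latency_ms,
       vfs.io_stall_write_ms / NULLIF(vfs.num_of_writes, 0) AS avg_write_latency_ms
FROM sys.dm_io_virtual_file_stats(NULL, NULL) AS vfs
JOIN sys.master_files AS mf
  ON mf.database_id = vfs.database_id AND mf.file_id = vfs.file_id;
```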
Disk Usage Summary table
- Columns:
- SQL Server Instance
- Avg Read Latency
- Read Bytes/sec
- Avg Write Latency
- Write Bytes/sec
- The table summarizes latency and throughput per instance so you can rank
servers by I/O pressure and prioritize investigation or remediation.
Interpretation and actions
- Elevated read or write latency often points to storage contention, slow
disks, or high queue depth; correlate with throughput and queue metrics.
- High read throughput with low latency may indicate healthy caching; high
throughput with rising latency suggests the cache is saturated or the
storage tier is overloaded.
- For write-heavy workloads, review transaction patterns, log placement,
and backup frequency; consider faster storage or write-optimized tiers.
- Use instance filters and time-range controls to isolate problematic windows
and correlate with query, CPU, and I/O dashboards before changing hardware
or storage tiers.
Memory Usage
This section shows memory demand and headroom so you can detect pressure on
the buffer pool and plan memory changes or workload placement.
Top: Memory KPIs
- Avg Allocated Memory: average memory allocated to SQL Server over the
selected interval (MB or GB).
- Max Allocated Memory: peak memory allocation observed during the interval.
- Avg Target Memory: average target memory SQL Server attempted to obtain
(based on internal heuristics / memory clerk targets).
- Max Target Memory: peak target memory during the interval.
Charts
- Memory usage over time: time-series chart showing Avg Allocated Memory,
Max Allocated Memory, Avg Target Memory, and Max Target Memory. Use this
to spot sustained allocation near target (indicating pressure) or sudden
spikes in demand.
Summary table
- Columns:
- SQL Instance
- Avg Allocated Memory
- Max Allocated Memory
- Avg Target Memory
- Max Target Memory
- The table provides a per-instance snapshot to help rank servers by
memory consumption and to identify candidates for resizing or investigation.
Interpretation and actions
- Large gaps between Target and Allocated Memory may indicate internal limits;
investigate OS signals, memory clerks, and max memory configuration.
- Correlate memory trends with CPU and I/O dashboards to determine whether
adding memory will reduce I/O or improve overall performance before changing
server sizing.
- Use time-range filters to identify daily or weekly patterns and to size for
peak demand with appropriate headroom.
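To review the configured memory limit mentioned above, a minimal sketch against the configurations catalog view:

```sql
-- The instance-level memory cap; value is in MB.
SELECT name, value_in_use
FROM sys.configurations
WHERE name = N'max server memory (MB)';
```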
7 - Index Analysis
Missing Indexes and Possible bad Indexes
The Index Analysis dashboard helps you prioritize index work by showing
high-value missing index suggestions and identifying existing indexes that
may be hurting performance because they attract writes but provide little read
benefit.
The values in the dashboard are based on a snapshot of the last 24 hours.
Missing Indexes (top)
- This table lists optimizer-suggested indexes and helps estimate potential
query benefit:
- Object Name: the table or view that would benefit from the index.
- Advantage: a short description of the expected improvement (e.g.,
reduced logical reads).
- Impact: an estimate of the relative benefit across the workload.
- Equality columns: columns recommended for equality predicates (key columns).
- Inequality columns: columns recommended for range/inequality predicates.
- Included columns: non-key columns suggested for covering the queries.
- User Seeks: number of seeks that could use the suggested index.
- Last Seek: timestamp of the last observed seek opportunity.
- Unique compiles: compilation counts that referenced the missing index.
- User cost: estimated cumulative cost reduction if the index were present.
- Use this table to find high-impact index candidates. Review the SQL that
benefits, confirm selectivity and cardinality, and weigh expected read
savings against write and storage costs before creating an index.
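A hedged sketch of turning a suggestion into DDL; every object and column name here is a placeholder, and the suggestion should be validated before creating anything:

```sql
-- Hypothetical index from a missing-index suggestion:
-- equality columns first, then inequality columns; non-key columns go in INCLUDE.
CREATE NONCLUSTERED INDEX IX_Orders_CustomerId_OrderDate
ON dbo.Orders (CustomerId, OrderDate)
INCLUDE (TotalAmount);
```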
Possible Bad Indexes (below)
- This table surfaces existing indexes that may be candidates for removal,
consolidation, or rebuild because they incur write overhead without enough
read benefit:
- Database: database name containing the index.
- Schema: schema name.
- Table: table name.
- Index: index name.
- Total Writes: cumulative writes (inserts/updates/deletes) affecting the index.
- Total Reads: cumulative reads (scans/seeks) that used the index.
- Difference: Total Writes minus Total Reads (high positive values suggest
write-heavy, low-used indexes).
- Fill factor: configured fill factor for the index (indicates fragmentation risk).
- Disabled: whether the index is disabled.
- Hypothetical: whether the index is hypothetical (not actually created).
- Filtered: whether the index uses a filter predicate.
- Sort and filter the table to find indexes with large write cost and minimal
read benefit. Investigate usage patterns and test removal or consolidation in
a non-production environment before dropping indexes.
Controls and investigation tips
- Filter by database, schema, or table to narrow scope.
- Click a row to open index details (definition, sample queries, historical
usage) and to download index DDL for review.
- Consider the write cost, storage, and maintenance impact when acting on
missing index suggestions. Missing-index estimates are heuristic — validate
with query tests and estimated vs. actual execution plans.
- For possible bad indexes, evaluate whether a filtered index, different
key/included columns, or index consolidation would preserve read benefits
while reducing write overhead.
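When testing the removal of a write-heavy, rarely read index, disabling it is a reversible first step; a sketch with a hypothetical index name:

```sql
-- Disable first and observe the workload before dropping.
ALTER INDEX IX_Orders_LegacyLookup ON dbo.Orders DISABLE;
-- Rebuilding re-enables the index if it is still needed.
ALTER INDEX IX_Orders_LegacyLookup ON dbo.Orders REBUILD;
```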
8 - Custom Metrics
Custom Metrics
This dashboard displays custom metric measurements pulled from the selected
measurement source.
Top controls
- Measurement selector: a dropdown to choose which measurement to query.
- Time filter: applies the selected time window to the query.
Data table
- The table below the selector shows the measurement data for the chosen
measurement and time range. Typical columns include timestamp, value, and
any measurement tags or labels.
- The dashboard retrieves up to 10,000 data points for the selected query.
If the time range or measurement produces more points, results are truncated
to this limit.
Usage notes
- Select the desired measurement, choose an appropriate time window, and
refresh the view to populate the table.
- Narrow the time range or apply server/instance filters when results are
truncated due to the 10,000‑point limit.
- Export or copy table rows for offline analysis if needed.
Future capability
- We are working on support for creating custom dashboards directly from
these measurements. This feature is in development and will be available
soon.
9 - Geek Stats
Geek Stats
The Geek Stats dashboard exposes low-level contention and synchronization
metrics so you can diagnose waits and spinlock behavior that can impact
throughput or cause unexpected CPU consumption. Use these views to find
hotspots, correlate with higher-level symptoms, and guide targeted fixes.
Not all users are interested in these metrics, so we decided to dedicate a
separate dashboard to them instead of including this information in the Instance
Overview dashboard.
Wait Stats
- Wait Stats (by category): a top-level chart that groups waits into the same
categories used on the Instance Overview. This view helps you quickly see
which broad wait families (I/O, CPU-related, latch/lock, network, etc.)
dominate during the selected interval.
- Wait Stats (by type): a detailed chart that shows individual SQL Server
wait types (the same names you see in SQL Server DMVs). Use this to
identify specific waits such as PAGEIOLATCH, CXPACKET, SOS_SCHEDULER_YIELD,
or ASYNC_NETWORK_IO and to track their trend over time. Filter and sort
to surface the top contributing wait types and drill into the time windows
where they spike.
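The wait types on these charts carry the same names exposed by the standard waits DMV; a minimal sketch to list the top waits yourself:

```sql
-- Top waits, cumulative since instance startup (not interval-based like the charts).
SELECT TOP (10) wait_type, wait_time_ms, waiting_tasks_count
FROM sys.dm_os_wait_stats
ORDER BY wait_time_ms DESC;
```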
Spinlock Stats
- The Spinlock section presents four charts that measure spinlock activity:
- Collisions: how often threads encountered a collision on a spinlock.
- Spins: total spin attempts observed.
- Spins per collision: average number of spins required for each collision,
indicating how costly each contention event is.
- Backoffs: counts of threads backing off (yielding) after spinning.
- Interpret spin metrics together: high collisions with high spins-per-collision
imply frequent, costly busy-waiting and wasted CPU. High backoffs suggest
threads are repeatedly yielding and retrying.
- Use these charts to correlate spinlock pressure with CPU spikes or scheduling
issues, and to prioritize micro-level fixes (e.g., addressing hot memory
structures, reducing shared-state contention, or applying product fixes).
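The four chart names map directly to columns of the spinlock statistics DMV; a minimal sketch:

```sql
-- Cumulative spinlock statistics since instance startup.
SELECT TOP (10) name, collisions, spins, spins_per_collision, backoffs
FROM sys.dm_os_spinlock_stats
ORDER BY spins DESC;
```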
Investigation tips
- When you see elevated waits or spinlock activity, correlate timestamps with
query CPU, I/O, and scheduler metrics to find root causes.
- High category-level waits point you to broad problem areas; the by-type view
helps you pinpoint exact resources or operations involved.
- For spinlock issues, examine workload patterns that touch shared caches,
global counters, or frequently-updated metadata. Changes such as reducing
contention hotspots, batching updates, or upgrade/patches may help.
10 - Always On Availability Groups
Check High Availability of databases
This dashboard provides an at-a-glance overview of Always On Availability
Groups (AGs) across the selected instances. Use it to verify AG health,
replica roles, and synchronization status, and to quickly locate groups that
need attention.
Availability Groups table
- Availability Group: AG name (click to open the AG detail dashboard).
- Primary Replica: the current primary replica host for the AG.
- Secondary Replicas: comma-separated list of configured secondary replicas.
- Total Nodes: number of replicas configured in the AG.
- Online Nodes: replicas currently online and reachable.
- N. Databases: number of databases protected by the AG.
- Synchronization Health: overall sync state (Healthy, Not Healthy)
based on replica synchronization and failover readiness.
- Listener DNS Name: cluster listener DNS name, if configured.
- Listener IP: listener IP address or addresses.
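The same health information can be cross-checked with the standard Always On DMVs; a minimal sketch:

```sql
-- Replica roles and synchronization health per availability group.
SELECT ag.name AS availability_group,
       ar.replica_server_name,
       ars.role_desc,
       ars.synchronization_health_desc
FROM sys.availability_groups AS ag
JOIN sys.availability_replicas AS ar
  ON ar.group_id = ag.group_id
JOIN sys.dm_hadr_availability_replica_states AS ars
  ON ars.replica_id = ar.replica_id;
```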
Usage
- Click an AG name to view the detail dashboard for per-replica metrics,
database synchronization progress, and failover readiness.
- Filter or sort the table to find AGs with offline nodes, unsynchronized
databases, or other anomalies that require investigation.
10.1 - Always On Availability Group Detail
Check the state of a High Availability Group
This dashboard shows detailed health and replication telemetry for a single
Always On Availability Group (AG). Use it to verify replica roles, track
failovers, monitor data movement, and identify databases that need attention.
Top: AG summary
- Availability Group: AG name
- Primary Replica: current primary replica host
- Secondary Replicas: configured secondaries
- Total Nodes: count of configured replicas
- Online Nodes: replicas currently online and reachable
- N. Databases: number of databases in the AG
- Synchronization Health: overall sync state for the AG
- Listener Name: cluster listener DNS name (if configured)
- Listener IP: listener IP address(es)
Primary Replica Failovers timeline
- A timeline that shows which replica was primary at each point in time.
- Use it to review recent failovers and to correlate role changes with events
or performance anomalies.
Availability Group Nodes table
- Replica Instance: instance name for each replica
- Replica role: Primary / Secondary
- Sync. Health: per-replica synchronization status
- Availability Mode: synchronous / asynchronous
- Failover Mode: automatic / manual
- Seeding Mode: automatic / manual
- Secondary Allow Connections: read-intent settings for secondaries
- Backup Priority: priority used for backup routing
- Endpoint URL: data movement endpoint
- R/O Routing URL: read-only routing address (if configured)
- R/W Routing URL: read-write routing address (if configured)
Nodes KPIs and online history
- KPIs: Total Nodes and Offline Nodes for quick situational awareness.
- Online Nodes chart: time-series showing the number of online replicas over
the selected interval to spot outages or flapping nodes.
Transfer rates and queues
- Transfer Rates chart: Send Rate (how fast the primary sends changes) and
Redo Rate (how fast secondaries apply changes). Use to spot slow secondaries
or network saturation.
- Transfer Queue Size chart: Send Queue Size and Redo Queue Size. Growing
queues indicate replication lag or bottlenecks that may affect failover
readiness.
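The send and redo figures correspond to columns of the database replica states DMV; a minimal sketch:

```sql
-- Send/redo queue sizes (KB) and rates (KB/sec) per database replica.
SELECT DB_NAME(drs.database_id) AS database_name,
       drs.log_send_queue_size, drs.log_send_rate,
       drs.redo_queue_size, drs.redo_rate
FROM sys.dm_hadr_database_replica_states AS drs;
```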
Health history charts
- Online node history: online vs total nodes over time to visualize availability
trends.
- Database Health History: healthy databases vs total databases to track when
databases become unsynchronized or unhealthy.
Databases Replication Status table
- SQL Instance: instance hosting the database replica
- Database Name: database name
- Sync. Health: synchronization status for the database
- Is Primary Replica: indicates whether this row is the primary
- Availability Mode: database-level availability mode (inherits from AG)
Usage and investigation tips
- Correlate failover times with primary timeline and with performance metrics
(CPU, I/O) to find causes of role changes.
- Increasing send/redo queues or sustained low redo rates often point to
network, disk, or resource contention on secondaries — investigate those
hosts before initiating failover or taking corrective action.