I am trying to create a report showing buffer cache used by each database on a server. I looked at creating a new metric but none of the categories made sense to me. Do I need to make a separate metric for each database on the server? This would be a pain to maintain as databases come and go.
Here is the basic query.
select db_name(database_id) as [Database Name],
       convert(numeric(8, 2), count(page_id) / 128.0) as [MB]
from sys.dm_os_buffer_descriptors with (nolock)
where database_id != 32767  -- exclude the resource database
group by database_id
order by count(page_id) desc;
Custom metrics have to return a single numeric value for DPA to handle them correctly.
In this case, it seems likely that you'd have to create a metric for each database involved.
If this metric doesn't fit into one of the existing categories, you can create your own custom category via the custom metric wizard.
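For example, a per-database variant of the query above that returns the single numeric value DPA expects might look like the following sketch. The database name 'SalesDB' is just a placeholder, not from the original post; you would create one metric per database and substitute each name:

```sql
-- Sketch of a single-value custom metric query for one database.
-- 'SalesDB' is a placeholder; substitute each database you want to track.
select convert(numeric(8, 2), count(page_id) / 128.0) as [Buffer Cache MB]
from sys.dm_os_buffer_descriptors with (nolock)
where database_id = db_id('SalesDB');
```

The where clause collapses the result to one row and one number, which is the shape the custom metric wizard needs.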
Once you define the custom category and save the metric, it will show up as a new tab in the Resources view in the GUI, and the metrics will also appear in the Add Resource Chart drop-down on the trend screens.
Let me know if you need further details.
Yes I eventually figured that out but it only partially solves my problem.
1) When you create a metric this way, it seems you cannot restrict it to a single server: to get a single number back I have to restrict it by database name, so now it runs across all instances. Not ideal, but I think I can live with that.
2) Now I need to create a report showing each of these custom metrics together in one graph so I can tell at a glance which database uses the most buffer cache.
What I am trying to do is get the Page Life Expectancy (PLE) chart and the Buffer Cache by Database chart onto a single page to show my developers how their code competes with other teams' code for buffer cache. That way, when PLE drops below 300 on a server, the affected dev team(s) can yell at the offending dev team(s) to fix their code, and as a DBA I am out of the middle. 🙂
Since I have not heard anything back, I assume what I want to do is not possible in DPA. So I added a collection of PLE to my servers, but when you look at the graph everything appears OK until you drill down into the daily view; if I look very closely at individual readings, I can see that PLE is actually very low for much of the work day. Of course it is much higher when no one is using the database, and that obscures the problem.

My fix is to collect the data points via a PowerShell script. It is too bad I cannot use the data DPA already collects, but its database is quite complex to extract from. Anyway, I collect the data every 5 minutes on weekdays only, between 8am and 3pm, since this is when our users are active. I take a count of the points on each server, then a count of the points > 600, and store both counts daily in a reporting table. My report shows (count of points > 600) / (total count of points) * 100, which gives me a number between 0 and 100: a rough percentage of the time PLE on the server was above the 600-second threshold. DPA lists 600 as the default warning level, which seems reasonable depending on the number of buffer pages on the server.

So my report graphs the daily percentage of time PLE is > 600. I feel that any server consistently below 80% should be considered unhealthy, and the dev teams should spend some time fixing queries/indexes to bring PLE back over the 80% mark. I also show, for each of the lowest servers, a graph of the top databases in buffer cache; these databases are likely the first that should be fixed. The idea is to show the graph at the monthly managers/directors operations meeting where they go over various operational data. I am hoping this visibility will put pressure on the teams to address poor queries and structures.
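As a sanity check on the arithmetic, the daily health number is (points above threshold) divided by (total points), times 100. A minimal sketch of the report query under that formula follows; the table name PleDaily and columns ServerName, SampleDate, CountAll, and CountAbove600 are my own assumed names, not from the actual reporting table:

```sql
-- Sketch only: PleDaily and its column names are assumptions.
-- CountAbove600 / CountAll * 100 = percent of samples with PLE > 600.
select ServerName,
       SampleDate,
       convert(numeric(5, 2),
               100.0 * CountAbove600 / nullif(CountAll, 0)) as [Pct Time PLE > 600]
from PleDaily
order by ServerName, SampleDate;
```

The nullif guards against a divide-by-zero on days with no samples, and multiplying by 100.0 before dividing keeps the math in decimal rather than integer arithmetic.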
The teams would hopefully use DPA to quickly find the bad queries and fix them.
Is this a reasonable way to calculate PLE to show memory pressure?
Is there a better way to calculate server health that can be put into a KPI type of report?
deannich, DPA does not track buffer cache usage by database; we track at the instance level. So your assumption about the limitation at the database granularity is correct.
Regarding PLE, your algorithm to show percentage of time that PLE is healthy (above 600) seems reasonable. I'd say that going after inefficient queries is always a good endeavor, but at some point, you may just be running up against a resource limitation and need to add memory to the host. I like to think of PLE as an indicator that my system memory is under pressure. From there, you can look at what queries are running at that time causing pages to be aged out quickly.
Overall though, I think your methods are reasonable. 8 )