We are monitoring both the SQL server and the SQL instance using AppInsight for SQL with Server & Application Monitor. From there we're able to tell what's going through the nic, switch (monitoring with NPM) and the overall performance of the SQL instance. We've run into this a couple of times and found that the SQL instance was starving the server of memory due to it's configuration. We were also able to show what was taking the most time to run in terms of queries to show that it wasn't necessarily the network at play.
As stated above we also use AppInsight for SQL that will most likely say that the box is starved for resources.
You could install the trial of SAM, it will include appinsight for SQL. Then you can show the SQL team that their issues are stemming from poorly written queries.
(When they ask what that means, just point at the screen and scream "POORLY WIRTTEN QUERIES") they will go away after a bit.
I am interested in what others say. adatole is busy recording a lab today but I bet he could provide value to this thread.
Sorry, I have to post this video I made a while back, similar process as to what is being discussed here.
Definitely if you have SAM, apply the AppInsight for SQL against your server / cluster - you won't regret it.
Ours contains many Dbs on it - and you can point the DBA's to the exact query / issue for the delays on the server impacting Solarwinds itself, or what ever other Application/DBs users are complaining about..
In regards to the term "SQL Team", are these proper DBAs? Most, worth their salt, have the tools and skills to track down poorly written queries, missing indexes, and things like bad disk performance. If they don't have the tools, then SAM will be your friend in the beginning, at least to start pointing in the right direction, another would also be the Database Performance Analyzer.
You might ask them to define what "latency" means to them, and their users. To a network guy, that's the time a packet takes to get from one location to another, but an application developer might consider latency the time it takes to execute a query, the results to come back, and their application to draw the results. There are 3 layers of "stuff" to troubleshoot in that case, network, SQL, and application. Knowing the difference means knowing where to start looking for issues.
Other things to consider, and usually ways to potentially eliminate network:
1. Where is the data stored on the SQL box (local disk, fiber attached disks, iscsi attached disks, or NFS)?
2. Are there any disk performance issues?
3. Can they [DBAs] reproduce the "latency" by executing the queries locally on the SQL box?
4. Assuming Microsoft SQL server, what does the actually execution plan show you? This is usually a really good basic tool to point out where query bottlenecks might be hiding out (missing indexes, poorly designed indexes, CPU bound issues such as sorting massive data, etc)
Filming went well today, and we have some amazing lab episodes to show for it. (and thanks fcpsolaradmin for mentioning it!)
While I can't disagree with all the talk about AppInsight, I have to point out two other things:
- This is EXACTLY the question that DPI and Quality of Experience dashboard in NPM 11 was meant to resolve. You can either deploy a sensor on a windows box connected to a span port on a switch between the users and the server, or on the server itself, but either way you should identify definitively whether the problem is application or network based.
- Database Performance Analyzer will also go a long way to nailing down this issue. Think of it as AppInsight on steroids.
Both of those are solid tools to give you a clear answer quickly.
Beyond that, all the advice above is sound, albeit it will involve a bit more guesswork.