Here are some posts I’ve found that help with quantifying the cost of downtime.
Hourly cost for computer networks - $42,000: http://www.networkworld.com/careers/2004/0105man.html?page=1
Cost per hour of datacenter downtime for large organizations - $1.13M: http://www.stratus.com/~/media/Stratus/Files/Library/AnalystReports/Aberdeen-4-Steps-Budget-Downtime.pdf
Average cost of a datacenter outage - $505,502: http://emersonnetworkpower.com/en-US/Brands/Liebert/Documents/White%20Papers/data-center-costs_24659-R02-11.pdf
Large enterprises lose more than $1M each year due to IT failures: http://www.informationweek.com/storage/disaster-recovery/it-downtime-costs-265-billion-in-lost-re/229625441
TechRepublic Article – How to calculate the cost of downtime: http://www.techrepublic.com/article/how-to-calculate-and-convey-the-true-cost-of-downtime/
I’m curious: what is your calculated hourly cost of downtime, and what factors went into your calculation?
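A common back-of-the-envelope way to answer that question is to add up, per hour, the revenue lost, the cost of staff who cannot work, and any recovery costs. The Python sketch below illustrates that approach. The breakdown into these three terms, the field names, and the example figures are illustrative assumptions, not numbers or formulas taken from the studies linked above.

```python
from dataclasses import dataclass


@dataclass
class DowntimeInputs:
    """Hypothetical inputs for a back-of-the-envelope downtime estimate."""
    revenue_per_hour: float             # revenue flowing through the affected system each hour
    revenue_loss_fraction: float        # share of that revenue lost while it is down (0..1)
    employees_affected: int             # staff who cannot work normally during the outage
    loaded_cost_per_hour: float         # average fully loaded cost of one employee-hour
    productivity_loss_fraction: float   # share of their work that stops (0..1)
    recovery_cost_per_hour: float = 0.0  # overtime, contractors, expedited parts, etc.


def hourly_downtime_cost(inputs: DowntimeInputs) -> float:
    """Lost revenue + lost productivity + recovery cost, per hour of downtime."""
    lost_revenue = inputs.revenue_per_hour * inputs.revenue_loss_fraction
    lost_productivity = (
        inputs.employees_affected
        * inputs.loaded_cost_per_hour
        * inputs.productivity_loss_fraction
    )
    return lost_revenue + lost_productivity + inputs.recovery_cost_per_hour


def outage_cost(inputs: DowntimeInputs, hours: float) -> float:
    """Total cost of an outage, assuming the hourly rate stays constant."""
    return hourly_downtime_cost(inputs) * hours


# Illustrative numbers only.
example = DowntimeInputs(
    revenue_per_hour=20_000,
    revenue_loss_fraction=0.5,
    employees_affected=200,
    loaded_cost_per_hour=50,
    productivity_loss_fraction=0.5,
    recovery_cost_per_hour=1_000,
)
print(hourly_downtime_cost(example))    # 16000.0
print(outage_cost(example, hours=2.5))  # 40000.0
```

A constant hourly rate is the simplest assumption; real outages often cost more per hour the longer they last, so treat the result as a floor.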
With the recently launched release of SolarWinds Network Performance Monitor (NPM), you can now get a view into applications’ quality of experience (QoE) using Deep Packet Inspection (DPI) analysis. When an application issue occurs, whether it affects availability or overall performance, we tend to blame the network first.
To know whether your application is really the culprit, you need to monitor essential metrics such as network response time and application response time (time to first byte). These metrics are broken out by application, so you can correlate them at a glance and identify the source of application issues. If it is indeed an application issue, you can drill down further into the health of the server hardware where the application runs, and monitor server response time and other metrics to confirm that the application is the one having issues.
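To make the two metrics concrete, here is a rough client-side approximation in Python: the time for the TCP connect to complete stands in for network response time, and the time from sending an HTTP request to receiving the first response byte stands in for application response time. This is only an illustration under those assumptions. NPM’s QoE feature derives its figures passively from captured packets (DPI) rather than by sending its own requests, and `measure_response_times` is a hypothetical helper, not part of any SolarWinds API.

```python
import socket
import time
from urllib.parse import urlsplit


def measure_response_times(url: str, timeout: float = 5.0) -> tuple[float, float]:
    """Return (network_ms, application_ms) for a plain-HTTP URL.

    network_ms:     time for socket.create_connection() to finish, i.e. the TCP
                    three-way handshake (plus a DNS lookup if host is a name).
    application_ms: time from sending the GET request until the first byte of
                    the response arrives (time to first byte on an open socket).
    """
    parts = urlsplit(url)
    if parts.scheme != "http":
        raise ValueError("this sketch only handles plain http:// URLs")
    host = parts.hostname or "localhost"
    port = parts.port or 80
    path = parts.path or "/"
    if parts.query:
        path += "?" + parts.query
    host_header = host if parts.port is None else f"{host}:{port}"

    start = time.perf_counter()
    sock = socket.create_connection((host, port), timeout=timeout)
    network_ms = (time.perf_counter() - start) * 1000
    try:
        request = (
            f"GET {path} HTTP/1.1\r\n"
            f"Host: {host_header}\r\n"
            "Connection: close\r\n\r\n"
        )
        sent = time.perf_counter()
        sock.sendall(request.encode("ascii"))
        # Blocks until the first response byte arrives (or EOF / timeout).
        if not sock.recv(1):
            raise ConnectionError("server closed the connection without responding")
        application_ms = (time.perf_counter() - sent) * 1000
    finally:
        sock.close()
    return network_ms, application_ms
```

For example, `measure_response_times("http://intranet.example/")` (a placeholder host) returns two millisecond values: if the first is small and the second is large, the delay is on the server side rather than in the network.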
You may be using NPM to monitor the quality of service for a critical application like SQL Server. NPM can give you insight into a few server resource issues, such as CPU and memory performance. Even if those look good, there may still be an issue with your SQL Server. Here is an example of how you could use Server & Application Monitor (SAM) to drill down further and identify database performance issues.
2. Drilling further into the SQL Server with SAM shows that lock requests/sec is higher than it should be.
3. Clicking the metric or the component tells you that the value is high and that high wait time could be the reason for the poor quality of service. SAM’s expert knowledge for this metric provides remediation guidance on how to fix the problem.
Whether you’re using NPM to monitor file transfer apps, web services, social networking apps, messaging, email, databases, or other applications, you can leverage out-of-the-box templates in SAM, or templates on thwack, to monitor complete application performance. This document has a list of applications that you can monitor using both Server & Application Monitor and the QoE feature of Network Performance Monitor.
If you are using Network Performance Monitor to look at the performance of your Windows servers, an alert that CPU or memory is starting to max out tells you there is an issue with the server.
Now what? Server & Application Monitor is the perfect tool to help you troubleshoot the root cause of server performance issues more quickly. After installation, the node details view shows additional management tools: the Real-Time Process Explorer, the Real-Time Event Log Viewer, the Service Control Manager, and a Reboot button.
When you launch the Real-Time Process Explorer, you can see which processes are consuming the most resources. Right from this view, you can kill processes, or start monitoring a process so that you are alerted when it uses too many resources.
As a network admin, how many times do you hear “the app is down”? Server & Application Monitor has other troubleshooting tools to help you determine the cause of the problem and fix it. The Service Control Manager gives you a quick view of services that have stopped; from this view you can restart a service, or stop it if it is hanging. When system performance changes for the worse, you can often dig into the log files to determine whether there was a recent change, a security event, and so on. Again, it’s as easy as launching the Real-Time Event Log Viewer and sorting by log type and severity.
The following is an actual account, in his own words, of a conversation one of our engineers, Matt Quick, had with a customer.
A customer using the evaluation copy of SolarWinds Server and Application Monitor (SAM) called us wanting an extension on the trial because “SAM was broke, keeps alerting a component when we know everything was fine.” I asked to take a look at the customer’s historical data. The component in question was actually from the Windows 2003-2012 Services and Counters template; specifically, “Pages/sec” was going critical. I’d seen this before, and it always relates back to the disk.
“But this VM is backended by a NetApp!! It can do 55,000 IOPS!!!” Yeah, I was suspicious of that, so I asked them, “OK, do you have Storage Manager (STM) or another storage monitoring product installed so we can check?” Sure do, and he promptly informed me that NetApp’s Balancepoint was telling him that while he averaged about 860 IOPS per day, during that hour he spiked to 1350 IOPS, still well within his supposed “55,000” IOPS limit.
OK, so I go into SolarWinds Storage Manager, hit search in the upper right, and find the VM with the component in question. I go to the storage tab, go into Logical mapping, and find which LUN and aggregate it belongs to. Next, I go into the NetApp and look at the RAID report to see how many IOPS he can do. A quick calculation later, I estimate about 3,500 IOPS total. The customer then realizes the original number of “55,000 IOPS” probably is not real in his specific setup. Then I look at the volume IOPS report on the NetApp during the same timeframe. Sure enough, March 1st @ 8:30 pm, a 3,500 IOPS spike.
“But Balancepoint says I got 1350 at that time!” So I ask him to open it up, and sure enough, 1350 @ 9 pm. I ask him to look at the next data point: 800 IOPS @ 11 pm. He was looking at a two-hour aggregate; sure enough, if you aggregate the 8 pm hour, you get 1350. And we couldn’t figure out how to zoom in within NetApp’s software. At this point the customer is speechless, as he realizes his current tools were giving him incomplete information.
Then I ask him if Virtualization Manager (VMan) is installed, and sure enough it is. I look in STM at which datastores are on that aggregate in NetApp, add all of them to a performance chart in VMan for the same timeframe, and isolate the problem to a single datastore. From there I add all the VMs on that datastore, and boom, we found the culprit VM: apparently someone was running some kind of backup every day @ 8:30 pm.
All this from what looked like an ‘erroneous’ SAM alert.
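The Balancepoint discrepancy in this story is a general property of coarse roll-ups: averaging a short spike together with quieter samples flattens it. The sketch below uses made-up 30-minute IOPS samples (not the customer’s data) to show how a two-hour average can hide a 3,500 IOPS spike, and how keeping the per-bucket peak alongside the average preserves it. The `rollup` helper is hypothetical and is not how Balancepoint or Storage Manager compute their reports.

```python
def rollup(samples: list[float], bucket_size: int) -> list[tuple[float, float]]:
    """Group consecutive samples into buckets; return (average, peak) per bucket."""
    buckets = []
    for start in range(0, len(samples), bucket_size):
        chunk = samples[start:start + bucket_size]
        buckets.append((sum(chunk) / len(chunk), max(chunk)))
    return buckets


# Hypothetical IOPS samples taken every 30 minutes, 8:00 pm through 11:30 pm.
iops = [900, 3500, 900, 800, 800, 800, 800, 800]

# Four 30-minute samples make one two-hour bucket.
for avg, peak in rollup(iops, bucket_size=4):
    print(f"average {avg:.0f} IOPS, peak {peak} IOPS")
# average 1525 IOPS, peak 3500 IOPS
# average 800 IOPS, peak 800 IOPS
```

Storing only the average is cheaper, but recording the peak (or a high percentile) for each bucket costs one extra number per bucket and keeps short spikes like an 8:30 pm backup visible.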
This story exemplifies the value of an integrated set of tools that gives you visibility across the extended application stack, from the application and its processes and services through the underlying infrastructure so that you can identify the root cause and then solve hard problems. The following video gives an overview of how we are making this possible with the integration of Server and Application Monitor, Virtualization Manager and Storage Manager to provide extended application stack visibility.
If you have used SAM, STM, VMan, and Database Performance Analyzer to find the root cause of tricky problems or to prevent problems, please share your story (your story is worth a cool 50 thwack points)!
We have expanded the Content Exchange for Web Help Desk to now include FAQ articles and helpdesk articles for IT staff. Now through the end of October you can earn an extra 50 thwack points per article (totaling 100 points per article)! New sections added:
- Share the most common and repetitive help desk questions and their workarounds.
- Share tips & tricks on getting things done, fast! Help your peers leverage your knowledge, for example with Windows 8 Tips & Tricks or a workflow for VMware troubleshooting.
This blog contains the answer to a question in this month's thwack mission (week 4). Enter this week to win a Slingbox 500!
I had the pleasure of speaking with Thomas Löfstrand of Nethouse, a SolarWinds business partner located in Sweden. Thomas recently evaluated the new features of Server & Application Monitor.
TL: I was really excited about this feature when I first heard it was coming out last fall. We are both a SolarWinds customer and a partner, and we have been using SolarWinds products for the past 9 years. My main focus is on the Network Performance Monitor and Server & Application Monitor products. I spend about 50% of my time as a SQL DBA, helping troubleshoot the performance of both Nethouse databases and our customers’ databases.
I really like the complete, specialized view of SQL performance you get with AppInsight for SQL. This was previously lacking in Server & Application Monitor (SAM): you could view only some performance metrics of SQL Server itself. Now that you can see all the SQL metrics in one view, it will be much easier for our customers to understand what is going on with their SQL databases.
TL: I used to create scripts in SAM to monitor SQL Agent jobs to see whether they were running. I also wrote scripts to monitor user connections. Now, with AppInsight, I can see this kind of information immediately, because it’s built into the product. This spring I had an incident related to user connections that could have been avoided with the features now available in SAM 6.0. A user connection locked some tables in SQL Server, which caused the application to stall so other users could not access it. It took 2 hours to find this problem. If I had had SAM 6.0 at the time, I could have seen right away in the SQL dashboard that this user shouldn’t have been there.
One of the other things I look at all the time is disk space usage and through SQL commands I can go into each database to see database space usage. Before, with Server & Application Monitor, I could see disk space usage but not for each database and not within the database files. Now I can get an immediate view of available space for each database.
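The blocking incident Thomas describes, one connection holding locks while other sessions wait, is usually diagnosed by finding the head blocker. In SQL Server, `sys.dm_exec_requests` exposes a `blocking_session_id` column for each executing request (0 when it is not blocked). The Python sketch below assumes you have already fetched those (session_id, blocking_session_id) pairs into a dict and walks the chains to find head blockers. The helper name and the sample session IDs are hypothetical, and this is not a description of how AppInsight works internally.

```python
def head_blockers(blocked_by: dict[int, int]) -> dict[int, int]:
    """Return {head_blocker_session_id: number_of_sessions_waiting_behind_it}.

    blocked_by maps session_id -> blocking_session_id, with 0 meaning "not
    blocked", mirroring the blocking_session_id column of sys.dm_exec_requests.
    A head blocker blocks others but is not itself blocked. It may be absent
    from blocked_by entirely: an idle session with an open transaction keeps
    its locks but has no row in sys.dm_exec_requests.
    """
    counts: dict[int, int] = {}
    for session in blocked_by:
        seen = set()
        current = session
        # Follow the chain of blockers until we reach a session that is not blocked.
        while blocked_by.get(current, 0) != 0 and current not in seen:
            seen.add(current)
            current = blocked_by[current]
        # Skip unblocked sessions, and cycles (deadlocks), which SQL Server's
        # deadlock monitor resolves on its own.
        if current != session and current not in seen:
            counts[current] = counts.get(current, 0) + 1
    return counts


# Hypothetical snapshot: sessions 52, 53 and 54 are all stuck behind idle session 60.
snapshot = {51: 0, 52: 60, 53: 52, 54: 60, 55: 0}
print(head_blockers(snapshot))  # {60: 3}
```

Counting waiters per head blocker points straight at the one connection worth investigating, rather than at the many sessions that are merely its victims.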
TL: I manage a lot of databases for our customers. I do not perform the day-to-day responsibilities for all customers but help with troubleshooting activities. I have a very big customer I work with to help troubleshoot SQL performance issues. Before AppInsight, I had to run traces for 12 to 24 hours to see how the server was performing over time and identify the top CPU-intensive queries.
Now, once they have AppInsight installed, it takes a matter of 5 minutes. SAM is also good at showing historical data for problem analysis, which is very helpful when working with customers to troubleshoot application issues.
TL: I have tried the inventory dashboard to get a complete view of our hardware and software assets. You can import and export inventory data to CMDBs. Today, we use Microsoft Excel as our CMDB. It is easy to start with the asset inventory dashboard if you don’t have any CMDB or asset management tool.
SolarWinds is really a complete platform for running your entire IT environment. You get monitoring, asset management, and now specialized SQL Server monitoring. You don’t need specialized tools like Red-Gate or Idera; you get everything with SolarWinds.
If you are interested in seeing a deep demo of the new features of SAM 6.0, check out this webcast replay.
Server & Application Monitor (SAM) has made major strides in the last two years with the introduction of hardware health monitoring, Java application and hypervisor monitoring, remediation, and a truckload of other capabilities. With the SAM 6.0 release, the product has expanded well beyond application and server monitoring and now includes specialized SQL monitoring and IT asset inventory management.
The response we have seen to this release so far makes this product marketer want to cry... with tears of joy. Thank you, aLTeReGo!
If you would like to view the webcast replay for a deep dive on some of the new features of SAM 6.0, check out the video below. The video covers these features: AppInsight for SQL, the Threshold Baseline Calculator, Asset Inventory Dashboard and the Real-Time Event Log Viewer. If you would rather download SAM 6.0 and try it out, you can sign up here if you are an existing SAM customer on maintenance. If not, you can contact email@example.com to obtain access to these new features.
If you only have a couple minutes, I encourage you to take a look at this very short video which highlights some of the problems you can solve with AppInsight for SQL.
Below are the questions asked during the Sneak Peek Webcast and the responses.
Q: Will SAM 6.0 get EoL data from the various server hardware vendors (e.g., HP, IBM, Dell)?
A: Yes. SAM 6.0 queries each vendor’s internet web service for warranty status information. This requires the Orion server to have internet access.
Q: Also, is SAM available as a standalone product or only as a module? I am concerned that this app might cause my database to explode once we add our more than 8,000 servers to it.
A: SAM has been available as a standalone product since APM v4.2. You may want to consider leveraging AppInsight for SQL to analyze the performance of your Orion SQL Server. Performance could probably be improved by adding more spindles to the array where the Orion database or tempDB is located. As always with MSSQL, more memory helps.
Q: Can a report be generated to show all expired hardware in a list?
A: Yes. SAM 6.0 even includes a new out-of-the-box report, built with the new Web Based Report Writer, that contains this information.
Q: Will you need to add Asset Inventory to see iLOs and DRACs?
A: Yes. Reporting of out-of-band management cards such as Dell DRACs and HP iLOs is included as part of SAM 6.0’s Asset Inventory.
Q: What versions of HP support tools or IBM Director are needed? The latest and greatest, or previous versions?
A: Hardware information collected by Asset Inventory requires the following software provided by the hardware vendor:
• Dell PowerEdge servers with OpenManage Server Administrator 7.2 or later
• HP ProLiant servers with HP Systems Insight Manager v6.2 or later
• IBM xSeries servers with IBM Director (Common Agent, v6.3 or later)
Q: No other software needs to be installed on the server, like Log Forwarder?
A: Asset Inventory requires vendor-specific software to be installed on the server for physical hardware components only. General server inventory information is available for all nodes, including virtual guests.
Q: If there are multiple disks, would it break out the disk statistics per disk or as a whole?
A: Per disk. AppInsight for SQL shows all files that make up the database or transaction log, and the disk I/O for each drive those files are stored on.
Q: How do you handle Microsoft Server Clusters in AppInsight for SQL? Would you monitor the cluster virtual node or the real server?
A: It’s recommended that AppInsight for SQL be applied to the cluster VIP. You should also have each cluster member node managed/monitored in Orion.
Q: Is the custom asset information editable in the web interface? What permissions would someone need to edit it?
A: Custom Asset Information requires node management rights to create or modify.
Q: Do you get all the hardware information if you are only using SNMP to monitor Windows servers or does it require WMI?
A: For Linux/AIX hosts, yes. For VMware ESX/ESXi hosts, it’s recommended that you poll those hosts directly using the “Poll for VMware” option for the highest level of detail. Windows hosts can be polled via SNMP, though some information is only available when the host is managed via WMI.
Q: Does this polling impact SQL?
A: AppInsight for SQL has very little impact on monitored SQL servers. Components considered higher impact, such as index fragmentation, have fully configurable polling intervals, and by default they are not polled at the standard 5-minute interval. Index fragmentation, for example, is polled once per hour.
Q: What is the default interval for AppInsight?
A: AppInsight for SQL uses SAM’s standard 5-minute polling interval, though some information is polled as infrequently as once an hour to limit AppInsight’s impact on the monitored SQL server.
Q: I understand that AppInsight is the beginning of a new era for SAM. What's in your roadmap for other applications?
A: AppInsight for SQL is the first of many applications we’d like to support, though no specific roadmap has been defined yet. As with all features, the order in which we add application support will be dictated by user demand.
Q: Are there any new features for Exchange Server monitoring? And does SAM support Exchange 2013?
A: SAM 6.0 does not include any new Exchange specific features, but we do have pre-release Exchange 2013 application monitoring templates available on Thwack. When they’re officially released they will be available for download through the Content Exchange.
Q: Any changes to other AMs, like Exchange?
A: SAM 6.0 does not include any new Exchange specific features, but we do have pre-release Exchange 2013 application monitoring templates available on Thwack. When they’re officially released they will be available for download through the Content Exchange.
Q: Is there anything new for Oracle or SAP applications?
A: No changes have been made in this release for Oracle. If you’re looking for SAP support I recommend you check out SAPOrion.
Q: Is there a minimum release of NPM you have to be running to upgrade from SAM 5.0 to 6.0?
A: If you’re running NPM on the same server as SAM, you’ll first need to upgrade to NPM 10.6 before upgrading to SAM 6.0. You can upgrade directly from SAM 5.0 to 6.0.
Q: Upgrade path: can you go from 5.0 to 6.0 without having to go to 5.5?
A: Yes, you can upgrade directly from SAM 5.0 to 6.0.
Q: Will there be any licensing or pricing changes with the release of 6.0?
A: No licensing or pricing changes are planned for the SAM 6.0 release.
Q: Is AppInsight only for SQL? or does it work with Oracle?
A: AppInsight for SQL supports only Microsoft SQL at this time.
Q: I would like to see this type of webinar for SAM itself, not just AppInsight. Is this available for release 6?
A: If you currently own SAM, you can sign up to download the SAM 6.0 Release Candidate. If you don’t currently own SAM, you can contact SolarWinds Sales and they can provide you with links to download the SAM 6.0 pre-release.
Q: I would also like to know the GA release date for SAM 6.
A: A release date has not yet been made official, though you can sign up to download the SAM 6.0 Release Candidate, which is fully supported and can be upgraded directly to the GA release once it’s available.
Q: So is the Real-Time Event Log viewer going to be an additional cost or is it included in SAM 6?
A: There is no additional cost associated with the Real-Time Event Log Viewer. It’s included as part of SAM 6.0, similar to the Real-Time Process Explorer and Windows Service Control Manager.
Q: Is AppInsight part of SAM?
A: Yes. AppInsight for SQL is included as part of SAM 6.0
Q: Is it possible to update warranty info that is auto-populated? We buy support for our HP servers through a third party.
A: Typically, HP support purchased through 3rd parties is the same support purchased directly. If your support status is accurately reflected on HP’s warranty status website, then it will be shown correctly in SAM’s Asset Inventory.
Q: Does the warranty information come from HP/Dell?
A: Correct. Warranty information is polled directly from Dell, HP, and IBM’s internet web services.
Q: Do you need SQL credentials to get the SQL data?
A: AppInsight does require valid credentials to connect to the SQL server via the SQL protocol to collect performance information related to the SQL server and databases. Both local SQL and Windows credentials are supported by AppInsight for SQL.
Q: How does AppInsight affect license usage? Does it count as a single component?
A: AppInsight for SQL consumes 50 component monitor licenses for each monitored SQL Server instance.
Q: Will the Asset and Inventory data points be available in Custom Reports?
A: Yes, Asset Inventory information is available for Custom Reports. In fact, several out-of-the-box reports are included in SAM 6.0 that are built on Orion’s new Web Based Report Writer.
Q: Is an AppInsight module for Oracle on the roadmap?
A: We are currently considering adding AppInsight support for several different applications but as with all things at SolarWinds, user demand will dictate the roadmap.
Q: Do you have to have Patch Manager?
A: No SAM 6.0 features are dependent upon Patch Manager.
Q: When is the 6.0 release date?
A: A release date has not yet been made official, though you can sign up to download the SAM 6.0 Release Candidate, which is fully supported and can be upgraded directly to the GA release once it’s available.
Q: So this will not be useful for SQL Server 2005 Enterprise?
A: AppInsight for SQL supports SQL 2008, 2008R2, and SQL 2012. While AppInsight has been reported to work with SQL 2005, it is not officially supported.
Q: We run SAM 5.0, can we upgrade right to 6.0?
A: Yes, you can upgrade directly from SAM 5.0 to 6.0.
Q: Can you add devices that are in stock and not on the network?
A: Asset Inventory in SAM 6.0 is tied to nodes managed in Orion. Theoretically, you could temporarily manage a device that’s in stock and then leave that node unmanaged indefinitely to store and report on that asset. It is not, however, possible to manually key in node/asset information for a node that has never been managed by Orion.
Q: Sorry, how does warranty status get populated again?
A: SAM 6.0 queries each vendor’s internet web service for warranty status information. This requires the Orion server to have internet access.
Q: Will disk queue or I/O require that the node be monitored by WMI specifically, or does applying the WMI template allow access to those metrics?
A: AppInsight collects the majority of its information via SQL, though some information, such as disk I/O, requires that the node be managed via WMI.
Q: Can advanced alerts be set up for asset inventory data?
A: Alerts based on warranty information can be created. In fact, we include this alert pre-configured out-of-the-box.
Q: Can baseline thresholds be used for CPU and memory, or just SAM elements?
A: The threshold baseline calculator is available for any application component monitor that returns a statistical value that can have a warning or critical threshold defined.
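The idea behind baseline thresholds can be illustrated with a small statistical sketch: compute the mean and standard deviation of a component’s historical values, then set the warning and critical thresholds a fixed number of standard deviations above the mean. The 2σ and 3σ multipliers below are an assumption chosen for illustration, not a statement of how SAM’s Threshold Baseline Calculator is configured, and the sample history is made up.

```python
from statistics import mean, stdev


def baseline_thresholds(history: list[float], warn_sigmas: float = 2.0,
                        crit_sigmas: float = 3.0) -> tuple[float, float]:
    """Return (warning, critical) as mean + k * sample standard deviation of history."""
    if len(history) < 2:
        raise ValueError("need at least two samples to estimate a baseline")
    mu = mean(history)
    sigma = stdev(history)  # sample (n - 1) standard deviation
    return mu + warn_sigmas * sigma, mu + crit_sigmas * sigma


def classify(value: float, warning: float, critical: float) -> str:
    """Classify a reading for a metric where higher values are worse."""
    if value >= critical:
        return "critical"
    if value >= warning:
        return "warning"
    return "ok"


# Tiny hypothetical history for a statistic such as lock requests/sec.
history = [10, 20, 30]
warning, critical = baseline_thresholds(history)
print(warning, critical)                # 40.0 50.0
print(classify(45, warning, critical))  # warning
```

This only makes sense for metrics where higher is worse; for something like free disk space, the thresholds would sit below the mean instead.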
I recently had the opportunity to interview Cole Lavallee of Waters Corporation. At Waters, Cole and his team use Server & Application Monitor, Log & Event Manager and DameWare to monitor and troubleshoot hundreds of servers, critical applications, and sites worldwide to reduce downtime and increase customer satisfaction.
JK: What are some of the challenges you face every day in your job?
CL: We currently have about 80 offices worldwide, and it’s crucial that we receive alerts when there is a problem in our environment. Since we are in the life sciences business, FDA regulations and guidelines require us to keep log data for certain amounts of time. This includes important files on many different servers and requires periodic validation.
JK: How did you find out about SolarWinds?
CL: Working in IT, I’ve known SolarWinds for years. When I came on board at Waters a few months back, we already had Server & Application Monitor (SAM) up and running.
I am in the group that manages all corporate IT, and we monitor over a hundred servers in our datacenter using SAM. We use VMware for our virtual infrastructure, the majority of our servers are virtual, and we use SAM to monitor them. We also monitor Active Directory, which is a big thing for us. We monitor SQL servers, UNIX and Linux systems, and IIS web servers.
We monitor the Waters website and all the servers that go along with it, which is a lot of servers. One website lets customers communicate with customer support, and that’s crucial to them. If any of those sites go down, we have a big issue, so it’s important that they stay up.
We use Lotus Domino internally, and because of that we use Log & Event Manager to manage Active Directory accounts in case of lockouts. This is really important for us to monitor because, unlike in other environments, with Lotus Domino you actually have to change your password on your phone. With Log & Event Manager, we can automatically reset users’ passwords when their accounts get locked, which saves a lot of time for us and the end user. This probably saves us 5+ hours a week, minimum.
JK: In terms of business benefits, what is the outcome of using SolarWinds products?
CL: We really try to stay with SolarWinds for anything we do. Our team at Waters is really happy with SolarWinds, and the products are easy to use. We’ve never really had problems with the tools reporting something incorrectly or missing something going down. I’ve used a lot of server monitoring software and seen how awful some of it is; SolarWinds is one of the easier ones to use. It definitely works for us.
Any growing organization needs some sort of directory service or database management to maintain a smooth IT environment. As the IT environment grows, it becomes hard to maintain and manage users, systems, and servers. Windows Active Directory® (AD) is a distributed directory service designed by Microsoft® to help manage the IT environment. Active Directory is designed as a scalable, multi-master database, and it helps administrators maintain their entire IT environment from a single source, from creating new users to updating user systems and securing user logins. There are other directory services, such as Novell Directory Services (NDS), but directory services generally share the same features and benefits. Because Windows Server is the most widely used, let’s focus on Microsoft Active Directory.
Why implement Active Directory?
Active Directory was first introduced in Windows 2000. It acts as a central hub for user data and network resources, and it lets you connect different directories together for an integrated IT environment. Without Active Directory, administrators may find it hard to manage a large IT environment. In a growing organization, administrators need some kind of directory service to meet growing IT needs such as:
• Active Directory provides a single top-down view of the entire IT infrastructure and a single link to all users, groups, computers, printers, servers, and applications.
• Active Directory acts as a management framework for all domain controllers in the domain, and as a bridge between the various domain controllers and domains in the organization.
• Active Directory provides secure login access for all users on the network. It allows administrators to allocate resources to users, administer email, and manage users and groups using Group Policy.
Beyond Active Directory
• Many other IT tools and applications depend on Active Directory, which makes it a foundation for a robust IT infrastructure.
• Active Directory can be used beyond centralized IT management; it can also serve as a reliable tool for monitoring domain controllers.
• Application listings in Active Directory help administrators calculate and allocate appropriate resources to users.
• User directory services help administrators understand how many users are logged in at a particular time. Combined with an event management tool, Active Directory can be used to monitor user activity.
• Active Directory, together with a help desk tool, helps resolve ticketing and support issues.
• WSUS and SCCM use Active Directory to inventory user systems and push regular updates.
Active Directory is like a secondary root for a tree. As a tree grows, it needs additional roots to support it; similarly, as an IT environment grows, directory services are needed to support and connect all the elements of a growing network. My next post on Active Directory will discuss what is needed to adequately support and monitor Active Directory.
I recently had the opportunity to interview Jim Shank of Douglas County School District in Castle Rock, CO. Jim is part of Douglas County’s infrastructure team, which uses SolarWinds Server & Application Monitor to proactively monitor the schools’ servers and databases.
JK: What are some SolarWinds products you’re currently using and how do you use them?
JS: We started using Orion (NPM) to monitor network switches and the performance of operating systems. We’ve also been using Server and Application Monitor (SAM) for over a year now. It’s definitely giving us great insight into how our servers and databases are performing. SAM helps us know when our databases are busy or whether there is an abnormal memory condition, and it alerts us when something goes wrong within our infrastructure.
We’re also using Alert Central for alert management and escalations, and I like how it’s integrated with SAM, so we don’t have to watch the dashboard all the time. When an alert is raised, it’s automatically routed to the appropriate team and they see it immediately.
JK: What was your initial reaction after using AppInsight (SQL monitoring feature of SAM)?
JS: With AppInsight, we’ve been able to drill down specifically to which database instance is having an issue, which one is taking up a lot of RAM, which queries are being sensitive, and so on. So it’s been huge for us. We also get requests from the software team saying there is a network problem affecting database performance. Having AppInsight allows our team to tell them the exact query that is slowing down the database. It eliminates the finger pointing and allows us to show where the problem is occurring and the reasons for it.
As a result of having AppInsight, we’re able to be proactive. We share console access with various teams, and it alerts them when an issue comes up. The database team can now take care of the databases and monitor them proactively, before a user reports an issue with the app.
Another fun thing we’ve been able to do with AppInsight is we’re able to look at slow procedures that are really sloppy. We’re able to bring this to the attention of the off-the-shelf software vendors, like software that helps with our student information system. We can tell them a particular query or a stored procedure built within the product is taking a very long time to load. If they tell us it’s a server or a network or a memory issue, we immediately tell them the specific query in their software program that was not built very efficiently, and that it is likely causing the problem. This also helps our staff because they don’t have to chase the vendor to try to fix the problem. We just look at the stored procedure that’s causing the delay. When we call the vendor, we can now tell them that the stored procedure data index student is taking 800 seconds to load. That’s a huge difference in getting the call moved to the right person in the vendor organization.
JK: The SAM 6.0 release candidate (fully supported in production) is now available, which means you can have deep SQL visibility. Check out the details and sign up for the SAM 6.0 RC here.
Last week I interviewed Joe Kline of Maritz. Joe is a Senior Infrastructure Specialist and manages the Network Management Group, which is responsible for deploying new IT solutions, tools, upgrades, and more. Joe and his team use SolarWinds Web Performance Monitor to manage hundreds of web applications and websites.
Jennifer: What SolarWinds products do you use to monitor your environment and how do you use them?
Joe: We have Server & Application Monitor (SAM), Web Performance Monitor, Network Performance Monitor, Network Configuration Manager, and VoIP & Network Quality Manager.
We previously used HP SiteScope and their Business Availability Center (BAC) products. These products were pretty expensive, and the SolarWinds Orion toolset offered more flexible licensing options.
Orion offers a lot of flexibility for monitoring. I especially like how flexible SAM is; we can pretty much do whatever we want with it. It’s basically limited only by what we put our heads together to work on. HP kind of limited what you could do.
We use Network Performance Monitor (NPM) for the core CPU, memory, and disk space for pretty much everything. We use Server & Application Monitor (SAM) to cover a lot of application specific monitoring like Windows services, processes, and some performance monitoring counters. We use Web Performance Monitor (WPM) to monitor all our web applications, which is quite a few.
Jennifer: Could you explain how you monitor your websites, what you’re looking for, issues you uncover?
Joe: We currently monitor about 315 websites with WPM, and that number could easily double if WPM’s scalability improves. For SAM, we probably have about 25,000 component monitors deployed across roughly 1,700 servers. We are several business units operating under one big parent company, so we have different application development groups supporting each line of business. We deal with a lot of different architectures when it comes to web applications. We are very heavy on IIS and .NET, but we also have a pretty sizeable JBoss installation. Our business units under Maritz all develop applications that you probably use today if you have a credit card with any kind of points reward program. That’s the kind of thing we do. We host a lot of these applications at our HQ in Missouri, but we also have onsite deployments, and we are looking at venturing into the cloud.
We have hundreds and hundreds of these applications and SLAs we have to adhere to. Some of those SLAs, especially in the financial sector, are very strict, with significant financial penalties if we don’t meet them.
With our website monitoring, we primarily look at availability. We look at performance on a limited basis, and if we see a performance issue, we bring it to the business unit’s attention. For example, we look at every step in the transaction and add content matches where it makes sense. We find issues with the websites every day: at least 10 to 15 alerts a day.
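Content matches matter because a page can return HTTP 200 while actually displaying an error message, so a status check alone would report it as available. The Python sketch below illustrates the idea on responses that have already been fetched; the StepResult type, the step names, and the expected strings are hypothetical, and this is not how WPM implements content matching.

```python
from dataclasses import dataclass


@dataclass
class StepResult:
    name: str         # hypothetical step label, e.g. "login"
    status_code: int  # HTTP status the step returned
    body: str         # response body text


def check_step(step: StepResult, expected_text: str) -> tuple[bool, str]:
    """A step passes only if the status is 2xx/3xx AND the expected text is present."""
    if not 200 <= step.status_code < 400:
        return False, f"{step.name}: HTTP {step.status_code}"
    if expected_text not in step.body:
        return False, f"{step.name}: content match failed for {expected_text!r}"
    return True, f"{step.name}: OK"


def check_transaction(steps: list[StepResult], expectations: dict[str, str]) -> list[str]:
    """Check every step of a transaction; an empty list means the transaction is available."""
    failures = []
    for step in steps:
        ok, message = check_step(step, expectations[step.name])
        if not ok:
            failures.append(message)
    return failures


if __name__ == "__main__":
    steps = [
        StepResult("login", 200, "<h1>Welcome back</h1>"),
        # Returns 200 OK but shows an error page: only the content match catches it.
        StepResult("points balance", 200, "<p>Service temporarily unavailable</p>"),
    ]
    expectations = {"login": "Welcome back", "points balance": "Your balance"}
    print(check_transaction(steps, expectations))
```

Here the second step would be reported as a content-match failure even though its HTTP status looks healthy.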
Jennifer: If you didn’t have web application monitoring in your environment, would you be getting a lot more calls?
Joe: Over the last 4 years, we’ve really matured as a Network Operations Center in how we monitor everything. I think the customers whose applications we monitor rely on us a lot more comfortably than they used to. Five years ago, monitoring web applications was kind of an afterthought; we did it as requested. Now we monitor pretty much everything we know about.
The obvious benefit is that we notify clients of issues before they discover them themselves. We provide the business with a lot of internally built reporting driven off the Orion data, largely monthly availability and performance. We also correlate infrastructure changes from our change management process with downtime events and adjust the availability metrics around those windows, so our SLA figures are more realistic and exclude maintenance windows.
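Excluding maintenance windows from availability comes down to interval arithmetic: remove the scheduled windows from the measured period, and don’t count any downtime that falls inside them. The following is a minimal Python sketch under stated assumptions (outages do not overlap each other, and maintenance windows do not overlap each other); the dates are made up, and this is not the internal Maritz reporting or an Orion feature.

```python
from datetime import datetime, timedelta


def overlap(*intervals: tuple[datetime, datetime]) -> timedelta:
    """Length of the common intersection of [start, end) intervals (zero if empty)."""
    start = max(i[0] for i in intervals)
    end = min(i[1] for i in intervals)
    return max(end - start, timedelta(0))


def availability_excluding_maintenance(period, outages, maintenance) -> float:
    """Percent availability over `period`, with maintenance windows removed from
    both the measured time and the downtime.

    Assumes outages do not overlap each other and maintenance windows do not
    overlap each other.
    """
    maint_time = sum((overlap(period, m) for m in maintenance), timedelta(0))
    measured = (period[1] - period[0]) - maint_time
    downtime = timedelta(0)
    for outage in outages:
        # Count the outage only where it falls inside the reporting period...
        downtime += overlap(period, outage)
        # ...and not where it falls inside a scheduled maintenance window.
        downtime -= sum((overlap(period, outage, m) for m in maintenance), timedelta(0))
    return 100.0 * (1 - downtime / measured)


if __name__ == "__main__":
    june = (datetime(2024, 6, 1), datetime(2024, 7, 1))
    maintenance = [(datetime(2024, 6, 10, 2), datetime(2024, 6, 10, 6))]
    outages = [
        (datetime(2024, 6, 10, 5), datetime(2024, 6, 10, 6, 30)),  # 1 of 1.5 hours is in maintenance
        (datetime(2024, 6, 20, 12), datetime(2024, 6, 20, 13)),
    ]
    print(f"{availability_excluding_maintenance(june, outages, maintenance):.2f}%")
```

With this sample data, 2.5 hours of outage minus the 1 hour inside the maintenance window leaves 1.5 hours counted against 716 measured hours, or about 99.79% availability.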
Images definitely enhance the look and feel of a website. They significantly contribute to the overall user experience. If images don’t load when you visit a website for the very first time, chances are it’s going to affect your perception of that site and its content. You might never return to that site. Your users are going to be thinking the same thing when they visit your website. Sometimes images that tell an important story about your product or service won’t load, leaving the page with just text. Let’s look at why this issue occurs.
• The image loads but looks incorrect. Sometimes images on a website don’t look the way they do in other browsers. This can happen if you’re using Web accelerator software, which can reduce image quality.
• Plug-in issues. Some browser plug-ins allow images to load only on the very first viewing. The images may not load on successive visits to the same page, even after refreshing.
• Cache and cookies. A corrupt cache file or cookies can sometimes prevent images from loading.
• Image permissions. Some browsers prevent certain websites from loading images just to increase load speed.
• Internet Security. Antivirus, firewall, and other security programs may block images and prevent them from loading.
• Pathnames to image files. Images whose URLs contain backslashes might not display correctly, and the behavior can vary from browser to browser.
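The last cause above, backslashes in image paths, is easy to check for automatically. Below is a minimal sketch using Python’s standard html.parser module to flag img tags whose src contains a backslash; it only inspects the HTML you pass in and does not fetch anything, and the sample markup is made up.

```python
from html.parser import HTMLParser


class BackslashImageFinder(HTMLParser):
    """Collects the src of every <img> tag whose path contains a backslash."""

    def __init__(self) -> None:
        super().__init__()
        self.suspect_srcs: list[str] = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag names; attrs is a list of (name, value) pairs.
        if tag == "img":
            src = dict(attrs).get("src") or ""
            if "\\" in src:
                self.suspect_srcs.append(src)


def find_backslash_images(html: str) -> list[str]:
    finder = BackslashImageFinder()
    finder.feed(html)
    finder.close()
    return finder.suspect_srcs


if __name__ == "__main__":
    page = r'<p>Logo:</p><img src="images\logo.png"><img src="/images/banner.png"/>'
    print(find_backslash_images(page))  # flags only the backslash path
```

Self-closing tags such as <img ... /> are also covered, because HTMLParser’s default handling of start-end tags calls handle_starttag.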
4 Tips for Monitoring Image Issues
1. Record your Web transaction. Recording a transaction shows how well your applications are performing. You can then compare the results to your baseline and identify which page element is causing the issue.
2. Use image matching. With image matching, you can define how many seconds an image should take to load. Monitoring this tells you whether the image loaded within the specified time, and therefore whether the transaction passed or failed.
Set thresholds to monitor image loading times
3. Monitor page load times. Establish a baseline for how long your applications should take to load, then monitor the load time of each step in the transaction. If a step loads slowly or fails to load, you should receive an alert about the problem.
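Image matching and page load monitoring both reduce to comparing each step’s measured load time against a limit. Here is a hedged Python sketch, assuming you already have per-step timings from recorded runs: the baseline is the median of past runs, a step alerts when it is slower than an assumed 1.5x its baseline, and image steps can carry their own fixed limit in seconds. The step names and numbers are illustrative only.

```python
from statistics import median


def baseline(history: dict[str, list[float]]) -> dict[str, float]:
    """Baseline load time per step: the median of past runs, in seconds."""
    return {step: median(times) for step, times in history.items() if times}


def evaluate_run(run, base, tolerance=1.5, image_limits=None) -> list[str]:
    """Return alert messages for one recorded transaction run.

    run          -- {step name: load seconds, or None if the step failed}
    base         -- {step name: baseline seconds}, e.g. from baseline()
    tolerance    -- alert when a step is slower than tolerance x baseline (assumed)
    image_limits -- optional {step name: max seconds} for image-match steps
    """
    image_limits = image_limits or {}
    alerts = []
    for step, seconds in run.items():
        if seconds is None:
            alerts.append(f"{step}: failed to load")
            continue
        limit = image_limits.get(step)
        if limit is not None and seconds > limit:
            # Image-match step: a fixed limit decides pass/fail.
            alerts.append(f"{step}: image took {seconds:.1f}s (limit {limit:.1f}s)")
        elif step in base and seconds > tolerance * base[step]:
            # Any step: much slower than its own baseline.
            alerts.append(f"{step}: {seconds:.1f}s vs baseline {base[step]:.1f}s")
    return alerts


if __name__ == "__main__":
    base = baseline({"home page": [1.0, 1.2, 1.1], "product image": [0.4, 0.5, 0.6]})
    run = {"home page": 2.0, "product image": 3.0, "checkout": None}
    for alert in evaluate_run(run, base, image_limits={"product image": 2.0}):
        print(alert)
```

A median is used rather than a mean so that one unusually slow historical run does not inflate the baseline.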
Efficiently & Effectively Monitor Websites
JK: How did you come to know SolarWinds for your network management needs?
TD: It happened a few years ago, when daylight saving time changed in California. We have a large array of older Cisco products that did not natively support the new daylight saving time dates, so we were going to need to manually reconfigure about 500 network devices. That would have taken several days. Our integrator recommended we use SolarWinds to automate the process. We downloaded and tried Network Configuration Manager (NCM) and were able to push a script out to all 500 devices in about 10 minutes. This product changed my job!
JK: Aside from saving weeks of time on configuration changes, how else do you use Network Configuration Manager?
TD: Whenever we have a major project, we put it out to bid, and sometimes we have multiple firms helping us with our network. Network Configuration Manager gives me peace of mind: if one of these firms makes a configuration change that results in a loss of service, I can easily compare the configurations on all devices, pinpoint what happened in my environment, see who made the change, and resolve it quickly.
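Comparing a device’s current configuration with a saved baseline is essentially a text diff. NCM provides this comparison itself; the sketch below only illustrates the underlying idea with Python’s standard difflib, assuming configs have already been downloaded as text. The device names and config lines are hypothetical, and identifying who made a change requires audit or login records, which this sketch does not cover.

```python
import difflib


def config_diff(device: str, baseline_cfg: str, current_cfg: str) -> list[str]:
    """Unified diff of a device's saved baseline config against its current config."""
    return list(difflib.unified_diff(
        baseline_cfg.splitlines(),
        current_cfg.splitlines(),
        fromfile=f"{device} (baseline)",
        tofile=f"{device} (current)",
        lineterm="",
    ))


def changed_devices(baselines: dict[str, str], current: dict[str, str]) -> dict[str, list[str]]:
    """Diffs for only the devices whose current config differs from the baseline."""
    changes = {}
    for device, cfg in current.items():
        diff = config_diff(device, baselines.get(device, ""), cfg)
        if diff:
            changes[device] = diff
    return changes


if __name__ == "__main__":
    baselines = {
        "core-sw1": "hostname core-sw1\ninterface Gi0/1\n switchport access vlan 10\n",
        "edge-rtr1": "hostname edge-rtr1\n",
    }
    current = {
        "core-sw1": "hostname core-sw1\ninterface Gi0/1\n switchport access vlan 20\n",
        "edge-rtr1": "hostname edge-rtr1\n",
    }
    for device, diff in changed_devices(baselines, current).items():
        print("\n".join(diff))
```

Only core-sw1 is reported, with the removed and added switchport lines marked "-" and "+".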
JK: What other SolarWinds products do you use?
TD: We use most of the products in the Orion suite – Network Performance Monitor, Server & Application Monitor, VoIP & Network Quality Manager, Integrated Virtual Infrastructure Monitor and NetFlow Traffic Analyzer.
JK: Why did you choose SolarWinds for your network and server performance monitoring needs?
TD: Previously we used a product called InterMapper, but we needed better support for the number of network devices, applications, and server resources we manage. We were so happy with NCM, and we had seen online demos of Network Performance Monitor, so based on that good experience we decided to buy a suite of SolarWinds products and signed a 5-year maintenance agreement.
JK: What are the benefits of using Network Performance Monitor?
TD: We can manage our bandwidth: we see the top talkers and where the traffic is going, and we have visibility into how the network is performing. With Orion’s flexible interface, this visibility can be shared across all teams. For example, I have a customized Orion portal that looks at key network components, interfaces, and utilization metrics; my co-worker who manages the server infrastructure has a custom view of server, application, and virtual environment metrics; and we also have a custom view for the desktop support team. This is very beneficial because the help desk doesn’t always have to rely on me to understand what is going on in the environment. They have visibility into network and server status and can often diagnose and remediate network issues without having to wait for me! It just allows us to provide better customer service.
Being proactive is also a huge time saver. With so many aging facilities, one of our biggest challenges is power outages. Because of the proactive alerts, I’m often the first to see an outage and report it to Maintenance, so the problem is usually fixed well before our teaching staff starts their day.
JK: As a K-12 school district, you are probably facing BYOD/BYOA challenges. How are you dealing with them?
TD: The California Department of Education has a new mandate requiring all state testing to be performed online for all students starting in 2014. This is one of the forces moving us away from thick textbooks and toward putting portable devices in students’ hands, with access to educational portals (textbooks, testing, etc.).
We are building out a large wireless network, and we will need to monitor this environment to ensure there is adequate bandwidth, that the devices are up and running and healthy, and so on. Right now our major challenge is figuring out how to support a wide array of devices and operating systems on the network, specifically authenticating these devices to the network and to the various academic systems we support.
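"Top talkers" simply means the hosts responsible for the most traffic. Given flow records (for example, NetFlow exports summarized as source, destination, and byte count), the ranking is a grouped sum. A minimal Python sketch follows; the record format and addresses are assumptions, and this is not how NetFlow Traffic Analyzer computes its views.

```python
from collections import Counter


def top_talkers(flows, n=5):
    """Rank source hosts by total bytes sent.

    flows -- iterable of (src_ip, dst_ip, byte_count) tuples (assumed format)
    """
    totals = Counter()
    for src, _dst, byte_count in flows:
        totals[src] += byte_count
    return totals.most_common(n)


def top_conversations(flows, n=5):
    """Rank (source, destination) pairs by bytes: shows where the traffic is going."""
    totals = Counter()
    for src, dst, byte_count in flows:
        totals[(src, dst)] += byte_count
    return totals.most_common(n)


if __name__ == "__main__":
    flows = [
        ("10.0.0.5", "203.0.113.10", 500),
        ("10.0.0.7", "10.0.0.1", 1200),
        ("10.0.0.5", "198.51.100.20", 900),
    ]
    print(top_talkers(flows, 2))
    print(top_conversations(flows, 1))
```

Host 10.0.0.5 ranks first as a talker (1,400 bytes across two destinations), while the single largest conversation is 10.0.0.7 to 10.0.0.1.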
How Third-Party Content Affects Website Performance
• Content view: Some webpages use Java applets for interactive content like online games. If the Java plug-in isn’t installed, the browser can’t run the applet, and the interactive content won’t load.
• Ads: Live ads that pull data from a third-party site can cause website performance issues, because the page depends on a server you don’t control.
• Live content: Live third-party content, like game scores or a stock ticker, refreshes constantly with the latest updates. This can affect other page elements and slow their load times.
How to Monitor for Issues with Third-Party Content
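One practical way to monitor third-party content is to split a page’s resource load times by host and see how much time goes to domains you don’t control. Here is a Python sketch under these assumptions: you already have a list of (URL, load seconds) pairs from a recorded page load, and you supply the set of domains you own. The host names in the example are made up.

```python
from urllib.parse import urlparse


def third_party_load_time(resources, first_party_domains):
    """Total load seconds per third-party host, slowest first.

    resources           -- list of (url, load_seconds) from one recorded page load
    first_party_domains -- domains you control, e.g. {"example.com"}
    """
    totals: dict[str, float] = {}
    for url, seconds in resources:
        host = urlparse(url).hostname or ""  # urlparse already lowercases the hostname
        # A host is first-party if it equals an owned domain or is a subdomain of it.
        if any(host == d or host.endswith("." + d) for d in first_party_domains):
            continue
        totals[host] = totals.get(host, 0.0) + seconds
    return dict(sorted(totals.items(), key=lambda item: item[1], reverse=True))


if __name__ == "__main__":
    resources = [
        ("https://www.example.com/", 0.5),
        ("https://cdn.example.com/logo.png", 0.25),
        ("https://ads.adnetwork.test/banner.js", 1.25),
        ("https://ticker.feeds.test/quotes", 0.75),
        ("https://ads.adnetwork.test/pixel.gif", 0.25),
    ]
    print(third_party_load_time(resources, {"example.com"}))
```

Matching on "." + domain keeps a look-alike host such as notexample.com from being mistaken for a subdomain of example.com.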
Stay Ahead of the Game with Web Performance Monitor
SolarWinds Web Performance Monitor (WPM) empowers you to monitor third-party issues that affect your website performance.
Try a fully functional, free 30-day trial of Web Performance Monitor today!
Are your Microsoft servers running slow? Read this post for best practices on the top metrics to monitor.
• When you experience an application or server performance issue, perhaps the most obvious metric to check first is the affected server’s CPU utilization. This metric shows how much load is being placed on the server’s processor at any given time. Sustained high CPU utilization may indicate underperforming hardware that needs to be replaced or upgraded. If the server is virtual, it may suggest that the virtual machine has been allocated insufficient resources. If the machine provides multiple services and functions, you may also want to consider distributing those roles among other servers in your environment to spread the load more evenly.
• Another likely culprit behind poorly performing applications and sluggish servers is the machine’s physical memory consumption. RAM is where the operating system stores the information it is actively using to service the applications running on the host. When a server doesn’t have enough memory to run both the operating system and its applications, the OS begins temporarily moving less-used blocks of memory to virtual memory on disk. This is commonly referred to as paging. As demand for memory increases, more paging occurs. Because disks are significantly slower than RAM, this introduces a bottleneck that can significantly impact overall server performance. If this condition persists for a prolonged period, consider adding RAM to the physical or virtual server.
• As virtual memory consumption increases, hundreds of megabytes of information are constantly moving from RAM to disk and back to RAM again. This puts tremendous strain on the physical disks where the swap file is located. It’s always best to place your operating system’s swap file on a different drive than the operating system, to prevent swap file fragmentation and to ensure paging doesn’t impact other disk I/O-intensive operations such as databases.
• Disk performance is often the leading cause of server and application performance issues today. Big data and virtualization have only compounded this problem by placing ever-increasing strain on servers’ disk I/O subsystems. As such, it’s important to keep close tabs on your server’s queued I/O and disk latency to understand how storage performance is affecting your applications. Disk latency above 100 ms for any period of time likely indicates a storage performance issue, and the same can be said of a sustained high disk queue length. If your server suffers from poor storage I/O performance, consider changing your RAID type, adding more physical disks to your array, upgrading to a storage controller with a larger cache, or replacing older, slower disks with solid-state or 15K RPM SAS drives. Alternatively, you may be able to distribute your applications’ disk I/O load more evenly by moving databases, applications, temp files, etc. across multiple disks.
• Finally, server monitoring should include keeping an eye on the hardware of your Windows servers. If there is an underlying problem with the hardware, the application may not function correctly, and an unforeseen hardware failure (hard drive, fan, etc.) can take your application down without any warning.
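Most of the metrics above share a pattern: alert when a counter stays above a threshold, rather than on a single spike. Below is a minimal Python sketch of that pattern. The Windows Performance Monitor counter names are real, but the thresholds and sample counts other than the 100 ms latency figure from the text are assumptions to tune for your environment, and the code does not collect the counters itself.

```python
from dataclasses import dataclass


@dataclass
class Rule:
    counter: str      # Performance Monitor counter path
    threshold: float  # a sample above this value counts as "high"
    samples: int      # consecutive high samples required before alerting


# Thresholds below are illustrative assumptions, except the 100 ms latency figure.
# Avg. Disk sec/Transfer is reported in seconds, so 100 ms is 0.100.
DEFAULT_RULES = [
    Rule(r"\Processor(_Total)\% Processor Time", 90.0, 5),
    Rule(r"\Memory\Pages/sec", 1000.0, 5),
    Rule(r"\PhysicalDisk(_Total)\Avg. Disk sec/Transfer", 0.100, 1),
    Rule(r"\PhysicalDisk(_Total)\Avg. Disk Queue Length", 2.0, 5),
]


def sustained_breach(values, threshold, samples):
    """True if at least `samples` consecutive values exceed `threshold`."""
    run = 0
    for value in values:
        run = run + 1 if value > threshold else 0
        if run >= samples:
            return True
    return False


def check_server(readings, rules=DEFAULT_RULES):
    """Return the counters that breached their rule; readings maps counter -> samples."""
    return [
        rule.counter
        for rule in rules
        if sustained_breach(readings.get(rule.counter, []), rule.threshold, rule.samples)
    ]


if __name__ == "__main__":
    readings = {
        r"\Processor(_Total)\% Processor Time": [95, 96, 97, 98, 99],
        r"\PhysicalDisk(_Total)\Avg. Disk sec/Transfer": [0.012, 0.015],
    }
    print(check_server(readings))
```

Requiring several consecutive samples is a simple way to separate sustained load from brief spikes; a time-window average would be an equally reasonable choice.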
Learn how to monitor and manage other aspects of your Microsoft environment – www.solarwinds.com/gotmicrosoft