9 Replies Latest reply on Jan 23, 2015 9:53 AM by designerfx

    Oracle database monitoring with Orion - doing it on an enterprise scale!

    qbarry

      So we're a NPM/SAM/NTA/WPM house and we're looking at replacing our current Oracle and SQL database monitoring solution (HP OMW DBSPI) with what Orion can do. So we've had a look at the standard Oracle monitoring templates that come with SAM and they look great. All the different components we can monitor look good - very similar to what we can do with the existing tool. However, we've one small niggle. We've got 100s of databases globally. Now the default templates as they come with SAM seem to suggest that for each Oracle DB instance you want to monitor, you need to put the SID (or service name) of the database into the template and then deploy that template to the node. Isn't that going to mean that we're going to need a template per database? Isn't that going to get very big very quickly when you consider this at an enterprise level? Are we approaching this the wrong way?


      Very intersted in hearing peoples thoughts on this and how they've managed the issue of referencing each SID etc

       

       

      Same deal for SQL

          • Re: Oracle database monitoring with Orion - doing it on an enterprise scale!
            qbarry

            Bump

             

            So seeing as this hasn't generated many responses, I'll document what we found in relation to monitoring Oracle (and SQL Server) with Orion:

            - Our (SQL Server) DBAs like AppInsight,but the inability to turn off certain components, turns them off! Maybe they don't like it when all their views show red, I don't know, I'm not a DBA. But mostly they'll tell me that even though the item is red, the DB is running fine. One approach which helped here was to set the thresholds high on some of the components they were 'less interested in'. That improved the views but still hasn't led to widespread adoption. Maybe one approach would be to dynamic baselines on the thresholds? Might improve what they see.

             

            - The template approach then is what we ended up focusing on for both Oracle/SQL. So, we ended up defining a couple of golden templates with similar components eg: one for production, one for dev/test databases. That worked well and essentially the steps to add monitoring for a new database just become - copy the relevant golden template, rename it and put in edit it to insert the correct instance/database name and credentials. That's fine for all the new requests we get but we had to retrofit for the already existing database measuring in the many many 100s for us. The fact that we have to insert the database/instance name in each component in each template became an issue for us. Yes, there is the multi-edit option so you can edit all the components (of the same type) in a single template in one go, but try doing this x 100s! We ended up here creating a PowerShell script that takes an exported copy of our golden template, a CSV list of our server/database pairs and outputs a new template file for each server/instance with the relevant bits modified in the file. This we can then upload to Orion (albeit one by one because there is no mass template upload option at present - feature request has been logged!) and then you just apply the new templates to the relevant server.

             

            - On the Oracle side in particular, we played with getting Orion to run the complex SQL queries we need for each component and while it could happily do this, the problem came when the DBAs wanted to modify the queries. They wanted a bit more control over this so they ended up just creating functions in each database and distributing these to each of the DBs. Orion then just calls these functions and reads the return values and operates on those as the statistic values. Not saying this is good or bad, but it's the approach our DBAs preferred.

             

            - Adding removing components to templates that are already deployed is still proving to be a little sticky. So the scenario here is where we've deployed templates/components en-masse to the estate and the DBAs want to modify something. Maybe it's just our inexperience with the tool or doing this on a grand scale showing here, but this we found tricky. Because each database was monitored by a different template (owing to the fact that each template had to reference a unique database/instance name), if we wanted to modify a component across all of them, we first had to remove the existing deployed component. Then we had to create a new component (of the same name) and then add copy that component to each of the templates for each database. That's the easy bit - you then still need to modify each new component again and apply the correct database/instance name to that component. - ditto on the credential for that component.

             

            - Speaking of credentials, we also found that it's easy to apply credentials for a template when deploying to a node, but if you wanted to supply a new set of credentials there was no way to apply a set of credentials to more than one template at a time. So yes, you can modify the account name or password used by an existing template, but if wanted to retain those details and make the template use a new set of creds, the only option seemed to be remove the template from the node and re-deploy it or modify each template one by one. A mass apply option would be better here (SW take note).

             

            - Log monitoring (alert log/error log) wasn't great and we ended up using Splunk to give us what we needed given that we don't have the Orion LEM module. Interestingly the Oracle DBAs were more into what the alert logs were showing than the SQL DBAs. I'm not a DBA, but the way it was explained to me, that's just a difference in the two database technologies and what they output to their logs.

             

            - SQL DBAs seemed to be surprised that there was no built-in component to monitor whether individual databases were up. Yes, the instance availability was there, but not individual DBs. They're currently playing with their own query to perform this function.

             

            - The table space usage queries were next to useless because they didn't take auto-extend into account (or the space available on the file system that table space is using). Again we used our own query here.

             

            - The SQL Query component was only useful up to a point. The fact that it expects only one row (and up to two columns) to be returned was only semi-useful and was another reason the DBAs ended up using their own functions. Much better here would have been if the query could interpret multiple rows returned. So take for example, a table space query - our function returns a 0 if all is ok, a 1 if a single table is over it's space limit and a 2 if there are more than one tables over the limit and in the second column it gives the name of the table and the % space used. A better scenario here would be to return multiple rows, one for each table space that's over the limit.

             

             

            The whole thing for us is still a work in progress and we're learning all the time. It seems doing things en masse in Orion/SAM with templates is a little challenging at times. Things I'd like to see would be:

            - Ability to change credentials used by large groups of templates without having to re-deploy

            - Ability to apply/edit the instance/database used by all components in a template at the template level one time - without having to edit all the components  ... or

            - Ability to apply a generic database template to a node and then run a tool against it to set up which database to monitor on it (almost a mix of AppInsight and template)

            - Ability to interpret or return multi row queries

            - Ability to swap out or replace a component in a template while retaining some of the settings from the old component

            - More flexibility in the AppInsight options

            - AppInsight for Oracle

             

            Of course if there's better ways of doing any of the above, we'd love to hear it!

              • Re: Oracle database monitoring with Orion - doing it on an enterprise scale!
                rharland2012

                Great post and very informative - thank you!

                Have you taken a look at a trial of the formerly-Confio DPA? I'd be interested to hear if this product might address some of the challenges you faced. I have no experience of the product but am interested to hear from people who use it.

                • Re: Re: Oracle database monitoring with Orion - doing it on an enterprise scale!
                  HolyGuacamole

                  hi qbarry,

                   

                  Thanks for documenting your experience in great detail.  See comments inline

                  So seeing as this hasn't generated many responses, I'll document what we found in relation to monitoring Oracle (and SQL Server) with Orion:

                  - Our (SQL Server) DBAs like AppInsight,but the inability to turn off certain components, turns them off! Maybe they don't like it when all their views show red, I don't know, I'm not a DBA. But mostly they'll tell me that even though the item is red, the DB is running fine. One approach which helped here was to set the thresholds high on some of the components they were 'less interested in'. That improved the views but still hasn't led to widespread adoption. Maybe one approach would be to dynamic baselines on the thresholds? Might improve what they see.

                   

                  I'm not sure which component was a turn off for your DBAs, but the most common one I have heard is about indexes for which a configurable option was introduced in v6.1.1

                  https://thwack.solarwinds.com/ideas/2794

                   

                  - The template approach then is what we ended up focusing on for both Oracle/SQL. So, we ended up defining a couple of golden templates with similar components eg: one for production, one for dev/test databases. That worked well and essentially the steps to add monitoring for a new database just become - copy the relevant golden template, rename it and put in edit it to insert the correct instance/database name and credentials. That's fine for all the new requests we get but we had to retrofit for the already existing database measuring in the many many 100s for us. The fact that we have to insert the database/instance name in each component in each template became an issue for us. Yes, there is the multi-edit option so you can edit all the components (of the same type) in a single template in one go, but try doing this x 100s! We ended up here creating a PowerShell script that takes an exported copy of our golden template, a CSV list of our server/database pairs and outputs a new template file for each server/instance with the relevant bits modified in the file. This we can then upload to Orion (albeit one by one because there is no mass template upload option at present - feature request has been logged!) and then you just apply the new templates to the relevant server.

                   

                  A quick note on the terminology. There are Application Monitor Templates and then there are Application Monitors. Once a template is assigned to a node, it is called an Application Monitor and both can the template and the assigned application monitor can be modified independently.  So, really there is no need to create 1 template per node. You can do a multi-edit in the 'Assigned Application Monitors' tab of the 'Manage Templates' settings to define the node specific inputs like connection strings, etc.

                   

                  ** response edit start **

                   

                  On further thought, if you were to use PowerShell monitors instead of SQL user experience monitors, you can use Application Custom properties and pass the application specific details dynamically to the script body. So, you should be able to define everything you need in just the template itself.

                  Invoke-Sqlcmd cmdlet

                  APM script variables

                   

                  ** response edit end **

                  - On the Oracle side in particular, we played with getting Orion to run the complex SQL queries we need for each component and while it could happily do this, the problem came when the DBAs wanted to modify the queries. They wanted a bit more control over this so they ended up just creating functions in each database and distributing these to each of the DBs. Orion then just calls these functions and reads the return values and operates on those as the statistic values. Not saying this is good or bad, but it's the approach our DBAs preferred.

                   

                  That's an interesting approach!

                   

                  - Adding removing components to templates that are already deployed is still proving to be a little sticky. So the scenario here is where we've deployed templates/components en-masse to the estate and the DBAs want to modify something. Maybe it's just our inexperience with the tool or doing this on a grand scale showing here, but this we found tricky. Because each database was monitored by a different template (owing to the fact that each template had to reference a unique database/instance name), if we wanted to modify a component across all of them, we first had to remove the existing deployed component. Then we had to create a new component (of the same name) and then add copy that component to each of the templates for each database. That's the easy bit - you then still need to modify each new component again and apply the correct database/instance name to that component. - ditto on the credential for that component.

                   

                  Since you don't have to create 1 template per node,  now we are talking about create a new component monitor type to your golden template. You then need to simply copy that component to all the Assigned Application Monitors that used the golden template.

                  SAM-Manage-Component-Monitors.png

                   

                  - Speaking of credentials, we also found that it's easy to apply credentials for a template when deploying to a node, but if you wanted to supply a new set of credentials there was no way to apply a set of credentials to more than one template at a time. So yes, you can modify the account name or password used by an existing template, but if wanted to retain those details and make the template use a new set of creds, the only option seemed to be remove the template from the node and re-deploy it or modify each template one by one. A mass apply option would be better here (SW take note).

                   

                  If you want to use a new credential alias, you are right - there is no easy way to apply the new credentials to multiple templates. however, if you are simply changing the user name and password, simply go SAM Settings > Credentials Library and edit the credential that needs an update!

                   

                  - Log monitoring (alert log/error log) wasn't great and we ended up using Splunk to give us what we needed given that we don't have the Orion LEM module. Interestingly the Oracle DBAs were more into what the alert logs were showing than the SQL DBAs. I'm not a DBA, but the way it was explained to me, that's just a difference in the two database technologies and what they output to their logs.

                  I haven't attempted this myself but I see no reason why the components in these 2 out of the box templates cannot be used here so that you have a single place to look at the Oracle health.

                   

                  - SQL DBAs seemed to be surprised that there was no built-in component to monitor whether individual databases were up. Yes, the instance availability was there, but not individual DBs. They're currently playing with their own query to perform this function.

                   

                  I guess it's not an issue for MS SQL as it's covered by AppInsight for SQL

                  http://oriondemo.solarwinds.com/Orion/APM/SqlBlackBox/SqlBlackBoxDatabaseDetails.aspx?NetObject=ABSD:13

                   

                  - The table space usage queries were next to useless because they didn't take auto-extend into account (or the space available on the file system that table space is using). Again we used our own query here.

                  Please post your template to the context exchange section of thwack. This is very useful

                   

                  - The SQL Query component was only useful up to a point. The fact that it expects only one row (and up to two columns) to be returned was only semi-useful and was another reason the DBAs ended up using their own functions. Much better here would have been if the query could interpret multiple rows returned. So take for example, a table space query - our function returns a 0 if all is ok, a 1 if a single table is over it's space limit and a 2 if there are more than one tables over the limit and in the second column it gives the name of the table and the % space used. A better scenario here would be to return multiple rows, one for each table space that's over the limit.

                  It is a matter for constructing the query so that you can leverage the Message field. Below is a good example.

                  Oracle database account status checking

                   

                  The whole thing for us is still a work in progress and we're learning all the time. It seems doing things en masse in Orion/SAM with templates is a little challenging at times. Things I'd like to see would be:

                  - Ability to change credentials used by large groups of templates without having to re-deploy

                  - Ability to apply/edit the instance/database used by all components in a template at the template level one time - without having to edit all the components  ... or

                  - Ability to apply a generic database template to a node and then run a tool against it to set up which database to monitor on it (almost a mix of AppInsight and template)

                  - Ability to interpret or return multi row queries

                  - Ability to swap out or replace a component in a template while retaining some of the settings from the old component

                  - More flexibility in the AppInsight options

                  - AppInsight for Oracle

                   

                  Of course if there's better ways of doing any of the above, we'd love to hear it!

                  I hope the above helped!


                    • Re: Oracle database monitoring with Orion - doing it on an enterprise scale!
                      qbarry


                      Thanks for the reply. A few points in response then:

                       

                      > I'm not sure which component was a turn off for your DBAs, but the most common one I have heard is about indexes for which a configurable option was introduced in v6.1.1

                      Yeah, certainly our DBAs wanted more control than that. But its a step in the right direction I suppose.

                       

                      >A quick note on the terminology. There are Application Monitor Templates and then there are Application Monitors. Once a template is assigned to a node, it is called an Application Monitor and both can the template >and the assigned application monitor can be modified independently.  So, really there is no need to create 1 template per node. You can do a multi-edit in the 'Assigned Application Monitors' tab of the 'Manage >

                      >Templates' settings to define the node specific inputs like connection strings, etc.

                       

                      Here's the problem when you use a single template all the time. Take a situation where you have a server running multiple Oracle databases - so you assign multiple application monitors to the node and you edit them to modify the SID and credentials. Now you've got multiple App monitors deployed to one server, all with the same name and you don't know which one is which. You can't tell from the App monitor name - only thing you can do is delve into each component and check. That's not easy to do on an enterprise scale with big consolidated databases servers. Add to that you've no way to construct a meaningful view that shows a big green light beside all of the monitored databases on that node. Alerting also becomes an issue because if one of the components in the app monitors fails, you can't report on the database connect string or SID in that component. It doesn't seem to be exposed. Not saying we did it the right way, but using multiple templates seemed like the only way to us - or one template and renaming the app monitors when they're deployed.

                       

                      >On further thought, if you were to use PowerShell monitors instead of SQL user experience monitors, you can use Application Custom properties and pass the application specific details dynamically to the script body. >So, you should be able to define everything you need in just the template itself.

                       

                      Haven't really looked at that, but will investigate further.

                       

                      > It is a matter for constructing the query so that you can leverage the Message field. Below is a good example.

                       

                      We made good use of passing info back into the Message field (ie: the 2nd column in the query results) and we passed many items into that field, but then you can't do anything with the contents of that field like splitting it up or parsing it - particular in alerts. So for example, our table space % free component would return a list of all table spaces over the threshold in the second column of the query, but then we couldn't divide up the contents of that field. So we'd have to send our alert based on all of the table spaces that we're over the limit rather than one by one.