
Geek Speak

Leon Adato

Convention Season

Posted by Leon Adato May 8, 2015

Convention season is upon us. I know that conventions happen throughout the year, but it seems like April is when things kick into high gear.

 

As anyone who has been in IT for more than a month can tell you, there are so many incredible opportunities to get out there and network, learn, and see what is heading down the pipeline. It can really be overwhelming both to the senses and the budget.

 

The Head Geeks try very hard to find opportunities to meet up with customers, fellow thwack-izens, and like-minded IT professionals. But, like you, we have only so many days in the month and dollars in the budget.

 

I took a quick poll of the other Geeks to find out:

 

  1. Which shows we are GOING to be attending this year.
  2. Which ones we know we SHOULD be attending, but can’t due to other constraints.
  3. Which ones we WISH we could attend, even if it’s a little off the beaten path.

 

Here’s what I’d like from you: In the comments, let us know which shows YOU are going to be attending, and which ones you would like to see US attend next year. That will help us justify our decisions (and budget!) and (hopefully) meet up with you!

 

Attending:

Tom: PASS Summit, VMworld, Ignite

Kong: MS Ignite, VMworld, SpiceWorld Austin, Philadelphia VMUG USERCON, and Carolina VMUG USERCON

Patrick: Cisco Live, Ignite

Leon: Cisco Live

 

Should Attend:

Tom: SpiceWorld, VMworld (Barcelona)

Kong: “Are you insane?!?! Did you see what I’m already going to?”

Patrick: VMworld

Leon: Interop, Ignite, SpiceWorld

 

Wish We Could Attend:

Tom: SXSW, AWS re:Invent 2015

Kong: AWS re:Invent

Patrick: RSA, AWS re:Invent 2015

Leon: Interop, DefCon, RSA

 

Like I said, let us know in comments where YOU are going to be, and we’ll start to make plans to be there the next time around.

In the first part of this series, I described the four (ok, really five) questions that monitoring professionals are frequently asked. You can read that introduction here, along with information on the first question (Why did I get this alert?). You can get the low-down on the second question (Why DIDN'T I get an alert?) here. And the third question (What is monitored on my system?) is here.

 

My goal in this post is to give you the tools you need to answer the fourth question: Which of the existing alerts will potentially trigger for my system?

 

Reader's Note: While this article uses examples specific to the SolarWinds monitoring platform, my goal is to provide information and techniques which can be translated to any toolset.

 

Riddle Me This, Batman...

It's 3:00pm. You can't quite see the end of the day over the horizon, but you know it's there. You throw a handful of trail mix into your face to try to avoid the onset of mid-afternoon nap-attack syndrome and hope to slide through the next two hours unmolested.

 

Which, of course, is why you are pulled into a team meeting. Not your team meeting, mind you. It's the Linux server team. On the one hand, you're flattered. They typically don't invite anyone who can't speak fluent Perl or quote every XKCD comic in chronological order. On the other...well, team meeting.

 

The manager wrote:

            kill `ps -ef | grep -i talking | grep -v grep | awk '{print $2}'`

on the board, eliciting a chorus of laughter from everyone but me. My silence gave the manager the perfect opportunity to focus the conversation on me.

 

“We have this non-trivial issue, and we're hoping you can grep out the solution for us,” he begins. “We're responsible for roughly 4,000 systems...”

 

Unable to contain herself, a staff member interjected, “4,732 systems. Of which 200 are physical and the remainder are virtualized...”

 

Unimpressed, her manager said, “Ms. Deal, unless I'm off by an order of magnitude, there's no need to correct me.”

 

She replied, “Sorry boss.”

 

“As I was saying,” he continued. “We have a...significant number of systems. Now how many alerts currently exist in the monitoring system which could generate a ticket?”

 

“436, with 6 currently in active development,” I respond, eager to show that I'm just as on top of my systems as they are of theirs.

 

“So how many of those affect our systems?” the manager asked.

 

Now I'm in my element. I answer, “Well, if you aren't getting tickets, then none. I mean, if nothing has a spiked CPU or RAM or whatever, then it's safe to say all of your systems are stable. You can look at each node's detail page for specifics, although with 4,000 of them I can see why you would want a summary. We can put something together to show the current statistics, or the average over time, or...”

 

“You misunderstand,” he cuts me off. “I'm fully cognizant of the fact that our systems are stable. That's not my question. My question is: should one of my systems become unstable, how many of your... what was the number? Oh, right: how many of your 436-soon-to-be-442 alerts WOULD trigger for my systems?”

 

“As I understand it, your alert logic does two things: it identifies the devices which could trigger the alert (all Windows systems in the 10.199.1 subnet, for example) and at the same time specifies the conditions under which an alert is triggered (say, when the CPU goes over 80% for more than 15 minutes).”

 

“So what I mean,” he concluded, “is this: can you create a report that shows me the devices which are included in the scope of an alert's logic, irrespective of the trigger condition?”

 

Your Mission, Should You Choose to Accept it...

 

As with the other questions we've discussed in this series, the specifics of HOW to answer this question are less critical than knowing that you will be asked it.

 

In this case, it's also important to understand that this question is actually two questions masquerading as one:

  1. For each alert, tell me which machines could potentially trigger it.
  2. For each machine, tell me which alerts could potentially trigger.

Why is this such an important question, perhaps the most important of the Four Questions in this series? Because it determines the scale of the potential notifications monitoring may generate. It's one thing if 5 alerts apply to 30 machines. It's entirely another when 30 alerts apply to 4,000 machines.

 

The answer to this question has implications for staffing, shift allocation, pager rotation, and even the number of alerts a particular team may approve for production.

 

The way you go about building this information is going to depend heavily on the monitoring solution you are using.

 

In general, agent-based solutions are better at this, because trigger logic (in the form of named alerts) is usually pushed down to the agent on each device, and thus can be queried from both directions (“Hey, node, which alerts are on you?” and “Hey, alert, which nodes have you been pushed to?”).

 

That's not to say that agentless monitoring solutions are intrinsically unable to get the job done. The more full-featured monitoring tools have options built in.

 

Reports that look like this:

[Screenshot: a sample report listing each alert and the nodes in its scope]

 

Or even resources on the device details page that look like this:

[Screenshot: a device details resource listing the alerts that apply to that node]

 

 

Houston, We Have a Problem...

 

What if it doesn't, though? What if you have pored through the documentation, opened a ticket with the vendor, visited the online forums, asked the greatest gurus up on the mountain, and come back with a big fat goose egg? What then?

 

Your choices at this point still depend largely on the specific software, but generally speaking there are three options:

 

  • Reverse-engineer the alert trigger and remove the actual trigger part


Many monitoring solutions use a database back-end for the bulk of their metrics, and alerts are simply a query against this data. The alert trigger queries may exist in the database itself, or in a configuration file. Once you have found them, you will need to go through each one, removing the parts which comprise the actual trigger (e.g., CPU_Utilization > 80%). This will likely necessitate learning the back-end query language for your tool. Difficult? Probably, yes. Will it increase your street cred with the other users of the tool? Undoubtedly. But once you've done it, running a report for each alert becomes extremely simple.
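To make that concrete, here's a minimal Python sketch of the idea. The table and column names below (Nodes, Vendor, IP_Address, CPULoad) are hypothetical stand-ins, not any product's real schema; the point is simply that the scope-only query is the trigger query with the metric comparison removed.

    # Hypothetical alert trigger: scope AND condition.
    trigger_where = (
        "Vendor = 'Windows' AND IP_Address LIKE '10.199.1.%' "  # scope: which devices
        "AND CPULoad > 80"                                       # condition: when to fire
    )

    # The scope-only version drops the metric comparison, keeping device selection.
    scope_where = "Vendor = 'Windows' AND IP_Address LIKE '10.199.1.%'"

    # Running the scope-only query answers "which nodes COULD this alert fire on?"
    report_query = f"SELECT Caption, IP_Address FROM Nodes WHERE {scope_where}"
    print(report_query)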

 

  • Create duplicate alerts with no trigger

 

If you can't export the alert triggers, another option is to create a duplicate of each alert that has the “scope” portion but not the trigger elements (so the “Windows machines in the 10.199.1.x subnet” part but not the “CPU_Utilization > 80%” part). The only recipient of that alert should be you, and the alert action should be something like writing a very simple string to a logfile (“Alert x has triggered for Device y”). Every so often (every month or quarter), fire off those alerts and then tally up the results so that recipient groups can slice and dice them.
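Here's a minimal Python sketch of that tallying step, assuming the trigger-less duplicates write lines like “Alert HighCPU has triggered for Device web01” to a logfile. The filename and message format are assumptions, so adjust the pattern to whatever your alert action actually writes.

    import re
    from collections import defaultdict

    # Assumed log format: "Alert HighCPU has triggered for Device web01"
    pattern = re.compile(r"Alert (\S+) has triggered for Device (\S+)")
    alert_to_devices = defaultdict(set)

    with open("alert_scope.log") as log:  # hypothetical logfile name
        for line in log:
            match = pattern.search(line)
            if match:
                alert, device = match.groups()
                alert_to_devices[alert].add(device)

    # Question 1: for each alert, which machines are in scope?
    for alert, devices in sorted(alert_to_devices.items()):
        print(f"{alert}: {len(devices)} devices -> {', '.join(sorted(devices))}")

Inverting the dictionary (device to alerts) answers the second question, which machines each alert applies to, from the same logfile.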

 

  • Do it by hand


If all else fails (and the inability to answer this very essential question doesn't cause you to re-evaluate your choice of monitoring tool), you can start documenting by hand. If you know up-front that you are in this situation, then it's simply part of the ongoing documentation process. But most times it's going to be a slog through the existing alerts, writing down the trigger information. Hopefully you can take that trigger info and turn it into an automated query against your existing devices. If not, then I would seriously recommend looking at another tool. Because in any decent-sized environment, this is NOT the kind of thing you want to spend your life documenting, and it's also not something you want to live without.

 

What Time Is It? Beer o’clock

After that last meeting (not to mention the whole day) you are ready to pack it in. You successfully navigated the four impossible questions that every monitoring expert is asked (on more or less a daily basis): Why did I get that alert? Why didn't I get that alert? What is being monitored on my systems? And what alerts might trigger on my systems? Honestly, if you can do that, there's not much more that life can throw at you.

 

Of course, the CIO walks up to you on your way to the elevator. “I'm glad I caught up to you,” he says. “I just have a quick question...”

 

Stay tuned for the bonus question!


Related Resources

SolarWinds Lab Episode 24 - Web-based Alerting + Wireless Heat Maps, Duplex Mismatch Detection & More

http://www.youtube.com/watch?v=nE4kpmhKG4s&CMP=THW-TAD-GS-WhatsNew-NPM-PP-fourquestions_4

 

Tech Tip:  How To Create Intelligent Alerts Using Network Performance Monitor

http://cdn.swcdn.net/creative/v13.0/pdf/techtips/how_to_create_intelligent_alerts_with_npm.pdf?CMP=THW-TAD-GS-TechTip_Alerts-NPM-PP-fourquestions_4

 

New Features & Resources for NPMv11.5

http://www.solarwinds.com/network-performance-monitor/whats-new.aspx?CMP=THW-TAD-GS-WhatsNew-NPM-PP-fourquestions_4

 

Recommended Download: Network Performance Monitor

  http://www.solarwinds.com/register/registrationb.aspx?program=607&c=70150000000Dlbw&CMP=THW-TAD-GS-rec_DL-NPM-DL-fourquestions_4

Hello, thwack community,

 

For the month of May, I will be the Ambassador for the Systems Management Community.

 

First off, I would like to provide some background about myself. My name is Jan Schwoebel; I'm on Twitter as @MindTheVirt, and I write the blog www.MindTheVirt.com - Mind The Virtualization. I scored my first job in IT back in 2007, starting out as a junior consultant, managing customer systems and providing decision-making support. Over the last 4+ years I have spent time in technical support positions, specializing in virtualization and storage systems.

 

Today, I would like to start a discussion with you about managing virtualized systems. As the years progress, virtualization has become mainstream, and today many servers and applications are virtualized. An increasing number of companies are even starting to run 100% of their systems on VMware ESXi and Microsoft Hyper-V. The reasons to virtualize 100% of servers and applications, or your whole datacenter, range from going green and reducing the carbon footprint to the ease of deploying new servers and systems.

 

However, as it becomes easier to deploy new servers, switches, and applications, it becomes more complex to manage all these systems efficiently and stay aware of any issues which might arise. Often, we are not aware of how many snapshots a VM has, whether we need to run a snapshot consolidation, how scalable the current infrastructure is, or which application is creating a bottleneck. Every other week a new company appears with a product promising to simplify server and data management.

 

Since I work in technical support, I only hear from customers once it is too late and they have hit some issue or limitation. As Kevin O’Leary on Shark Tank always says: “There must be a better way.”

Indeed, there must be a better way and I would love to hear from you. What are you doing to avoid support calls? How do you manage your virtualized infrastructure efficiently? What products, workflows and techniques are you using and why?

jdanton

Is Your Shop Stuck in 2008?

Posted by jdanton May 6, 2015

Last week at their Build developer conference, and this week at Ignite, Microsoft introduced a broad range of new technologies. In recent years, Microsoft has become a more agile and dynamic company. In order for you and your organization to take advantage of this rapid innovation, your organization needs to keep up with the change and quickly adapt to new versions of technology, like Windows 10 or SQL Server 2016. Or maybe you work with open source software like Hadoop and are missing out on some of the key new projects like Spark or the newer non-MapReduce solutions. Or perhaps you are using a version of Oracle that doesn’t support online backups. It’s not your fault; it’s what management has decided is best.

 

As an IT professional it is important to keep your skills up to date. In my career as a consultant, I have the good fortune to be working with software vendors, frequently on pre-release versions, so it is easy for me to stay up to date on new features. However, in past lives, especially when I worked in the heavily regulated health care industry, it was a real challenge to stay on top of new features and versions. I recently spoke with a colleague there and they are still running eight-year-old operating systems and RDBMSs.

 

So how do you manage these challenges in your environment? Do you do rogue side projects (don’t worry, we won’t share your name)? Or do you just keep your expert knowledge of old software? Do you pay for training on your own? Attend a SQL Saturday or Code Camp? What do your teammates do? Do you have tips to share for everyone on staying current when management thinks “we are fine with old technology”?

Last week, Omri posted a blog titled, What Does APM Mean to You? Personally, I think it means several things, but it really got me thinking about security issues related to APM, and how they are of high concern in today’s IT world. Systems and application environments are especially prone to denial-of-service attacks, malware, and resource contention issues caused by remote attacks or other miscellaneous security issues.

     

I've always looked at continuous application or systems monitoring as something that goes hand-in-hand with security monitoring. If SysAdmins are able to provide security insights, along with systems and application performance, it will only benefit the security and operations teams. After all, IT as a whole works best when teams interface and collaborate with each other.

     

It’s not ideal to rely on application performance monitoring software for IT security, but such tools are certainly designed with some basic features that deliver security-related capabilities to complement your existing IT security software.

     

Here are some key security-related use cases you get visibility into using application and systems monitoring software.

    

Check for important updates that should be applied

Forgetting to install an OS or hardware update may put your servers and apps at risk. Your apps may be prone to attacks from malicious software and other vulnerabilities. OS updates ensure such vulnerabilities are corrected soon after they are discovered. In addition, you should report on the number of critical, important, and optional updates that are not yet applied to the server. Remember, you can also view when updates were last installed and correlate that time period to performance issues. Sometimes these updates cause unexpected performance impacts.

[Screenshot: Windows Server update status]

         

Keep an eye on your antivirus program

Monitor the status of your antivirus: whether it is installed, whether it is up and running, and whether key files are out of date. If you fail to run scans or to monitor whether the antivirus is running, you increase your chances of security issues.

              

Ensure your patch updates are installed

Collect information related to patch updates and answer questions like: are they installed, what’s their severity, and by whom and when were they installed? You install patches so that security issues, programs, and system functionality can be fixed and improved. If you fail to apply patches once an issue has been detected and fixed, hackers can leverage this publicly available information and create malware for an attack.

[Screenshot: OS updates report]

          

View event logs for unusual changes

Monitor event logs, looking for and alerting on potential security events of interest. For example, you can look for account lockouts, logon failures, or other unusual changes. If you don’t have other mechanisms for collecting log data, you can leverage some basic log collection, such as event logs, syslog, and SNMP traps. You can also use these for troubleshooting.

[Screenshot: event log viewer]
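As a small illustration of the logon-failure idea, here's a hedged Python sketch that scans a standard Linux auth log for failed SSH logons. The log path and the five-failure threshold are assumptions to adapt to your environment; on Windows, you'd watch for the equivalent security event IDs instead.

    import re
    from collections import Counter

    # OpenSSH writes lines like:
    #   "Failed password for invalid user bob from 203.0.113.9 port 4242 ssh2"
    pattern = re.compile(r"Failed password for (?:invalid user )?(\S+) from (\S+)")
    failures = Counter()

    with open("/var/log/auth.log") as log:  # assumed path; varies by distro
        for line in log:
            match = pattern.search(line)
            if match:
                failures[match.groups()] += 1  # key: (user, source IP)

    # Flag anything that looks like a brute-force attempt.
    for (user, source), count in failures.most_common():
        if count >= 5:  # arbitrary threshold; tune for your environment
            print(f"ALERT: {count} failed logons for {user!r} from {source}")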

           

Diagnose security issues across your IT infrastructure

Troubleshoot security issues by identifying other systems that may have common applications, services, or operating systems installed. Say a security issue with an application or website occurs; you can quickly identify which systems were affected by searching for all servers related to the website or application.

[Screenshot: AppStack view of related systems]

           

While these are just a few use cases, tell us how you use your APM software: do you use it to monitor key system and app logs, do you signal your IT security teams when you see something abnormal, or do you rely on an APM tool for basic security monitoring? Whatever the case, we’re curious to learn from you.

My first experience in the IP domain was a shock!

 

I had moved from the optical transport domain in an operator to the IP department.

 

As an optical guy, I used a network management system (NMS) for all tasks, including configuration, fault management, and performance measurement. Above all, I liked the nice graphical user interface (GUI) of the NMS.

 

However, I found that in the IP world, the command line interface (CLI) is used for everything, from provisioning to troubleshooting. CLI rules in the IP domain.

 

“CLI is the tool for engineers,” I was told.

 

OK, fine! This may have something to do with my personal preference (I do not like the CLI's user interface), or with the fact that, coming from an optical background, this stuff seemed strange to me.

 

Irrespective of the user interface, and with all the functionality that CLI provides, from my perspective CLI is not the ideal tool for configuration. First, it focuses on a single box, i.e., configuring box by box, which is cumbersome. Second, it is prone to human error, and because of errors, troubleshooting sometimes takes considerable time. And lastly, it is vendor-specific, so changing a vendor's box requires a totally different skill set to configure it.

 

Therefore, as an operator, in my view there is a need for a more flexible way of configuring and provisioning services. The focus should move from “box configuration” towards “network configuration.” Also, in this age of emerging technologies like SDN and NFV, where the NMS is the primary focus, CLI will simply block innovation.

 

Network configuration is a major part of an operator's OPEX. Studies put it at around 45% of the total TCO of the network.

 

CLI has a place today because the alternative management protocol, SNMP, is itself not ideal for service provisioning. That is why operators use SNMP primarily for monitoring purposes, not for configuration.

 

Neither CLI nor SNMP supports another important requirement for large, complex service provider networks: a transactional mode for network configuration.

 

A transaction enables multiple configuration changes to take place as one unit, or fail completely (all or none). To clarify this very important point, take the example of an IPTV service that involves configuring one router, two switches, two firewalls, and a billing system. A transactional protocol enables the configuration on all involved network elements, or none. This is beneficial because if configuration validation fails on even one network element, the configuration fails on all the other network elements. This means the configuration is never implemented partially, on only some network elements. This is the essence of the “network configuration” we talked about earlier.

 

So do we have to live with SNMP and CLI for network configuration, forever?

 

No!

 

The NETCONF/YANG protocol suite, developed by the IETF for network management, has a single focus: making network configuration as easy as possible. The IETF learned from the experience of SNMP what could be improved, and approached the new protocol from the ground up. It is purpose-built for configuring networks.

 

NETCONF is the management protocol, used primarily for network configuration, while YANG is a text-based modeling language designed to be used with NETCONF. Both are needed for complete, flexible service provisioning in IP networks.

 

There are FOUR main features of NETCONF/YANG:

 

  1. Support for transactionality: configurations can be applied to multiple network elements as one transaction that either succeeds or fails as a whole (a minimal sketch follows this list).
  2. A get-configuration feature: this is a distinct advantage compared to SNMP. With SNMP, a backup config is available, but it is polluted with operational data (alarms, statistics); with NETCONF, one can retrieve just the configuration data.
  3. Vendor device independence: NETCONF can be used as a standard configuration protocol for any vendor. The vendor's box sequences the configurations and executes them; this sequencing is internal to the vendor's box, and NETCONF does not need to be aware of it.
  4. Multiple network elements can be configured at one time, saving time when configuring the network.
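To make the transactional idea concrete, here is a minimal Python sketch using the open-source ncclient library: stage the change in each device's candidate datastore, and commit only if every edit succeeded. The hosts, credentials, and config payload are placeholders, the devices must support the NETCONF :candidate capability, and this is an approximation of network-wide all-or-none rather than a production implementation.

    from ncclient import manager

    HOSTS = ["router1.example.net", "switch1.example.net", "fw1.example.net"]
    CONFIG = """<config>
      <!-- device-specific configuration payload goes here -->
    </config>"""

    sessions = []
    try:
        for host in HOSTS:
            m = manager.connect(host=host, port=830, username="admin",
                                password="secret", hostkey_verify=False)
            sessions.append(m)
            m.edit_config(target="candidate", config=CONFIG)  # staged, not live yet
        for m in sessions:
            m.commit()  # only reached if every staged edit succeeded
    except Exception as err:
        for m in sessions:
            m.discard_changes()  # roll back the staged config on every device
        print(f"Transaction aborted; no device was changed: {err}")
    finally:
        for m in sessions:
            m.close_session()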

 

Therefore, in summary, NETCONF is the right solution for solving network management issues in a standard way. It is the next generation of network management protocol, and it will reduce the time to provision services for an operator and help apply multiple configurations to multiple network elements at one time.

 

Now it is your turn to tell me:

 

  1. How do you feel about CLI as a network configuration tool? Would you like to live with it forever?
  2. What issues, if any, do you face using CLI?
  3. Do you think NETCONF can deliver better than SNMP/CLI?

 

Would love to hear your opinion!

cxi

Checkbox vs Checkbook Security

Posted by cxi May 4, 2015

Happy Month of May everyone!

I wanted to talk to you about a larger topic in the realm of IT Security, Network Security, or the general purpose 'security' space as it were...

The image below is a slide I stole from myself (thanks, me!) from a presentation I've delivered at some conferences over the past few months, titled “Is Your IT Department Practicing Security Theater?”

You might remember a similarly titled post I did back in January, “Are You Practicing Security Theater in IT?”

And just as that post was not a panacea for all matters of security, it certainly did inspire both the presentation I delivered and some of the points contained here.

 

So, let's discuss for a moment...

 

[Slide: Checkbox vs. Checkbook Security]

 

What exactly is Checkbox vs Checkbook Security?

 

The way I see it, most organizations, especially budget-constrained or regulation-driven ones, are faced with the delicate decision to 'check a box,' whether the answer solves their problem or not.

 

An example: organizations that are required to implement logging and monitoring solutions. Oftentimes they'll just get some run-of-the-mill syslog server, have it collect all of the data, and then archive it. Someone will pretend to review the logs every now and then, and they can officially check the box saying WE HAVE LOGGING AND MONITORING!

Sure, they TECHNICALLY do, but do they really? Will they be able to go back through the history should an event occur and correlate it? Perhaps. Will they be able to detect something happening in flight and mitigate it? Yea, no. Does that make it right? It does not, but does it technically check the box? Absolutely... 'sort of,' depending upon the rules they're required to follow.

 

But what does that mean for you and me? I mean, I checked the box within a reasonable budget; even if merely checking the box doesn't provide any real value to the organization, what is the long-term impact?

The rub is exactly that... a checkbox without efficacy will definitely require you to open your checkbook later on, whether to really resolve the problem or due to loss of business, money, or otherwise.

 

That's why I broke this list down as a series of 'checkbox' vs. 'checkbook' scenarios. It's not to say that adopting something in the checkbook column will cost more than the checkbox (sometimes it MAY, but it doesn't have to).

It really comes down to figuring out a strategy that works best for you and your business.

 

Like all things, this is not a panacea, nor is it an exhaustive list of 'vice versa' possibilities. I'd love your insight into whether you agree with these approaches, situations where you've seen this be effective (I love personal stories! I have a fair share of my own), and other situations which aren't included here but should be addressed.

Share the love, spread the knowledge, let's all be smarter together!

 

Great to be back Thwack Community!

 

Ambassador @cxi signing off! <3

IP space management has become increasingly complex -- stemming from the building of new and secure network environments and a surge in the use of IP-enabled devices. Sniffing out problems early and remedying them before damage is done is the core of effective network management. IP space management is an integral part of network management and demands the same level of monitoring, quick troubleshooting, and remediation mechanisms.

 

IP alerting and relevant real-time information help you avoid:

  • Assigning an IP that’s already in use
  • Failing to replicate IP address status changes to DHCP and DNS servers
  • Erroneous DHCP configuration changes and IP conflicts caused by DHCP scope overlaps
  • Unwarranted downtime spent troubleshooting network issues and IP conflicts
  • Over- or under-provisioning IP addresses, DHCP scope, and split-scope address depletion
  • Errors during DNS record creation

 

Let’s take a look at some of the top IP alerts/data that give admins a heads-up, so they can avoid unexpected network downtime.


IP Conflict! Find and fix it before connectivity issues arise

 

The ‘IP conflict’ is a well-known problem in every network, and there are many things that can cause one. The outcome is usually network issues and loss of user productivity. DHCP server errors, duplicate DHCP servers, BYOD, bad IP documentation, human error, inadequate network segmentation, etc., are all reasons for IP conflicts in a network. Manually troubleshooting IP conflicts can be a very time-consuming process; in turn, users experience significant downtime. Obstacles that contribute to this include identifying the issues caused by IP conflicts, locating the problematic systems, and finally taking the conflicting system off the network.


DHCP subnets are reaching high utilization -- time to provision more IP addresses!

 

When DHCP address pools are exhausted, new devices will not be able to connect to the network. In many cases, the administrator is unaware of full DHCP scopes, that is, that there are no IP addresses left for assignment. In other cases the admin over-provisions, leaving IP addresses unused and hindering the optimal usage of the IP address space. Further, if IP documentation is not kept up to date, unused static or reserved DHCP addresses will linger. For example, IPs may have valid leases but no longer be active. All of this again means non-availability of IP addresses, leading to interruptions in network connectivity and user productivity.


What IP addresses are in use/available?

 

One of the main IP management challenges admins face is finding IP addresses that are available for use. A frequently used method is to ping an IP, find one that doesn’t respond, assume that it is available, and then use it (a quick sketch of this approach follows the list below). But this has its own downsides. Some examples:

  • users pinging for an available IP wouldn’t know if the IP address is a static or a dynamic one
  • IPs used for test purposes are left as such and, even though technically not in use, will still be unavailable
  • any conflict with an IP assigned to a critical server can cause serious downtime
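For illustration only, here is roughly what that naive ping-and-assume method looks like as a Python sketch. The subnet is a placeholder and the flags are Linux ping flags; note that it inherits every downside listed above.

    import subprocess

    subnet = "10.199.1"  # placeholder subnet
    for host in range(1, 255):
        ip = f"{subnet}.{host}"
        # '-c 1' sends one probe, '-W 1' waits one second (Linux ping flags)
        result = subprocess.run(["ping", "-c", "1", "-W", "1", ip],
                                capture_output=True)
        if result.returncode != 0:
            print(f"{ip} did not respond -- 'available' (maybe)")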

Even in cases where IP documentation is manually and separately maintained, most of the time this data is incomplete or obsolete.


Looks like the DNS data entered was incorrect...

 

The creation of DNS records is a standard task for administrators. Forward DNS mapping points a domain name to an IP address. Conversely, reverse DNS maps an IP address to a domain name. The two are distinct and separate lookups; just because a forward lookup of a domain resolves to an IP address doesn’t mean that a reverse lookup of the same IP address will resolve to the same domain.
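A quick way to see this independence is with Python's standard library; the domain below is just a placeholder.

    import socket

    domain = "www.example.com"  # placeholder domain

    ip = socket.gethostbyname(domain)              # forward: name -> IP
    print(f"{domain} resolves to {ip}")

    try:
        ptr_name, _, _ = socket.gethostbyaddr(ip)  # reverse: IP -> PTR name
        if ptr_name != domain:
            print(f"Mismatch: {ip} reverse-resolves to {ptr_name}")
    except socket.herror:
        print(f"No PTR record found for {ip}")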

Reverse DNS is also commonly used when establishing outbound e-mail server connections. It helps trace the origin of an e-mail and adds credibility to the e-mail server itself. In turn, many incoming mail servers will not accept messages from an IP address that does not identify itself with a PTR record in a reverse DNS zone, making it very important to ensure these records are error-free.

 

To make matters worse, the advent of IPv6 and the increase in the number of heterogeneous devices have further contributed to the complexity of IP space management. Administrators have come to the realization that using manual methods and spreadsheets is simply not sufficient. What mechanism do you have in place for timely warnings about your IP address data?

omri

What Does APM Mean to You?

Posted by omri Apr 29, 2015

Many of us have heard the term APM, and just as many of us are confused as to what it truly means. Gartner put together their own definition; you’ll find it here. When writing this post I was thinking about two of the five “functional dimensions” Gartner outlines as making up a proper APM solution. These are:

 

  • Runtime application architecture discovery, modeling and display: As I like to think about it, this means discovering and providing useful metrics on the full range of paths that an application can take between software and hardware in your environment as part of proper execution.
  • User defined transaction profiling: Similar to the above, but more focused on how real users are using the application, and thus the paths their actual application requests take through the same hardware and software topology. Think about it like choosing those paths which are most critical to your actual users. This allows the solution to provide real-time metrics surrounding user experience and satisfaction.


We all have a number of web-based apps running in our IT world, many of whose topology we may not fully understand. Capabilities such as the above can be helpful in identifying, troubleshooting, and isolating the cause of user issues with those apps. After all, the issue could be anywhere in the various layers of our environment, and no one wants to start a guess-and-check game across servers, databases, etc., wasting valuable time. At the same time, we’re all curious how exactly users are interacting with our environment, but we don’t possess the sixth sense to tell us just which DB has the most users hitting it at any given time. With those two capabilities combined, you can begin to imagine being given visibility into which issues are impacting the most users at any given time (and thus where the most helpdesk tickets will come from), as well as the components along their application request path that could be the root cause, all in real time.

 

In thinking about all the above, I’m curious about the following:

  1. Do you or your company currently use any software today that helps provide this sort of information?
  2. If you do, what sort of problems has it helped you to solve? With what type of applications? What do you feel is missing?
  3. Would knowing your various applications’ topology be interesting to you or your company? How about the real user transaction paths within those same applications?
  4. What sort of problems do you think could be solved if you knew all of that?
  5. In general, do you wish you knew more about how your real users’ actions affect the IT environment you work hard to monitor and maintain?

 

Don’t be shy and comment! I know I’m not the only one that struggles with this and, after all, misery loves company.

In 1999 and 2000 I worked as Tier 3 UNIX Support for a colocation company. One of our largest customers had recently moved in so much new equipment that there wasn’t enough power in one rack. In what was to be a temporary solution, a power cord was left running across the aisle about two feet off the floor.

 

One day as I was going into the data center with one of our Tier 2 folks, the inevitable happened – his foot caught the cord as he stepped over it, pulling it free and crashing the customer’s server. He scrambled to plug it back in as quickly as possible, but I stopped him and instead had us find a longer cord which we then ran underneath the raised floor.

 

The outage triggered an automatic trouble ticket, which was assigned to me. My manager advised me to list the root cause as “momentary loss of power.” This in turn triggered a series of daily 8:30AM “post mortem” meetings, as the customer – quite reasonably – wanted to know why it had taken so long to get the server back online. I was instructed not to say anything that could make it appear that the outage was our fault in any way.

 

After six weeks, I couldn't take it anymore and, with my manager waving me off, I said, “I tripped over your power cord.” I then proceeded to tell the customer what had happened, making it sound like I had been alone in the data center.

 

The customer’s first response was, “Oh. You should have just said so in the first place.”

 

With the explanation that we had set things up so no one could trip over the cord again, the customer was satisfied with the resolution.

 

This incident led me to determine the three things any customer wants to know when there’s an incident:

  1. What happened.
  2. What you did to fix it.
  3. What you’re doing to make sure it doesn't happen again.

 

If you can provide (and follow through on) these three things, you’ll have satisfied customers nearly every time.

 

What are the key things you do to ensure your customers are happy with the resolution of their problems?

Based on some of the responses to my last post, it is good to see that there are some who take backing up their network configurations seriously. I would like to build on that post and discuss some ideas around network configuration management, specifically solutions and automation to handle some of the tasks required.

Several of you stated that you use SolarWinds NCM, which I personally use in my environment. SolarWinds NCM is extremely easy to set up and configure to “save your bacon,” as one comment put it. NCM is a perfect example of a solution which has the ability to track changes, roll back changes, and handle reporting and auditing, as well as many other use cases. There are also many open source products that I have used previously, including the routerconfigs plugin for Cacti, simple TFTP jobs set up on each router/switch to back up nightly, and numerous other solutions such as rConfig, another open source product that is very solid.

However, you may have the solution in place, but how do you make sure that each and every network switch/router is in a consistent state? Or is configured to have nightly scheduled backups of its configs? Do you still do this manually? Or do you use automation tools such as Ansible, Chef, or Puppet to stamp out configurations?

I have personally begun the journey of using Ansible to build playbooks that stamp out configurations in a consistent manner, as well as to create dynamic configurations using templates. This is also a great way to ensure that when a network device fails, you have a solid way to start rebuilding it from a somewhat consistent state. I would still leverage a nightly backup to restore configurations which may have changed since deployment, but hopefully, as changes are made, your automation deployment configurations are also modified to reflect them.
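For anyone who wants a starting point, here is a minimal Python sketch of the nightly-backup job using the open-source Netmiko library. The device list, credentials, and file naming are placeholders; in practice the inventory would come from your automation tool rather than a hard-coded list.

    from datetime import date
    from netmiko import ConnectHandler  # pip install netmiko

    DEVICES = [
        {"device_type": "cisco_ios", "host": "10.0.0.1",
         "username": "backup", "password": "secret"},
        # ...one entry per switch/router...
    ]

    for device in DEVICES:
        conn = ConnectHandler(**device)                    # SSH to the device
        config = conn.send_command("show running-config")  # pull the config
        conn.disconnect()
        filename = f"{device['host']}-{date.today()}.cfg"
        with open(filename, "w") as f:                     # one dated file per device
            f.write(config)
        print(f"Saved {filename}")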

If you are heading to Chicago next week for Microsoft Ignite, come by booth #543 and say hello to me, kong.yang, and patrick.hubbard. We'd love to talk with you about, well, anything! I'm looking forward to lots of conversations about databases, performance tuning, and bacon (not necessarily in that order).

 

Click here to let us know you’re going to be at MS Ignite and see a full schedule of our activities.

 

See you there!

Brien Posey

Come Meet me at Ignite!

Posted by Brien Posey Apr 28, 2015

Microsoft Ignite is quickly approaching and although I am not going to be presenting a session this year, I am going to be doing an informal presentation on May 4th in the SolarWinds booth at the exhibit hall.

 

My presentation will focus on the challenges of server virtualization. As we all know, server virtualization provides numerous benefits, but it also introduces a whole new set of challenges. As such, I plan to spend a bit of time talking about some of the virtualization related challenges that keep administrators up at night.

 

I don’t want to give away too much information about my presentation before Ignite even starts, but I will tell you that some of the challenges that I plan to talk about are related to IP address management. Microsoft has tried to make things better for its customers by introducing a feature in Windows Server 2012 and Windows Server 2012 R2 called IP Address Management (IPAM).

 

The Windows Server IPAM feature has its good points and it has its bad points. In some situations, Microsoft IPAM works flawlessly. In other situations, it simply is not practical to use Microsoft IPAM. Unfortunately, some administrators probably do not find out about the IPAM feature’s limitations until they have taken the time to install and configure IPAM. That being the case, I plan to spend some time talking about what you can realistically expect from Microsoft IPAM.

 

So if you are interested in learning more about IP address management, or if you just want to meet me in person, then please feel free to stop by the SolarWinds booth (#543) for my presentation on May 4th at 2pm. I am expecting a big crowd, so you may want to arrive early.

 

 

Click here to let us know you’re going to be at MS Ignite and see a full schedule of our activities.

 

 

 

[Image: SolarWinds at Microsoft Ignite]

Are you attending Microsoft Ignite in Chicago from May 4th-8th? Come by Booth #543 for the swag -- IT's most-sought-after buttons, stickers, and t-shirts. Stay for the in-booth demos and conversations highlighting the breadth and depth of the SolarWinds IT management product portfolio. Chat with SolarWinds subject matter experts -- Product Managers, Systems Engineers, and Head Geeks patrick.hubbard, Microsoft MVP sqlrockstar, and me.

  • Bring your IT management problems and challenges, because we're bringing our IT management know-how.
  • Learn best-practice tips for application stack management, from the app through the virtualization and web layers, extended through the physical servers and storage arrays connected via the network.
  • Delve into the IT learning experience via the SolarWinds in-booth theater presentations.
  • [UPDATED] Play SysMan Hero for fabulous prizes.
  • [UPDATED] Click here to let us know you're going to be at MS Ignite and see a full schedule of our activities.


Ignite will showcase Microsoft's vision for mobility and cloud. Expect exciting sessions and discussions on Windows 10, Windows Server vNext, and Microsoft Azure that include hot tech trends such as DevOps, security, hybrid cloud, containers, and big data analytics, highlighting continuous application delivery and continuous service integration. And SolarWinds can certainly help IT pros manage continuous app cycles with connected context.


 

Bonus round:

We invite you to meet one-on-one with Michael Thompson, SolarWinds Director of Systems Management Strategy. Mike can share SolarWinds' vision for helping IT pros drive and deliver the performance that their businesses or organizations need, as well as discuss your individual IT management needs. If you are attending Microsoft Ignite and would like to speak with Mike for 30 minutes, please RSVP today.


We look forward to meeting and talking to you there!

The need for disruptive innovation is driving businesses to seek better, faster, and cheaper alternatives to internal IT. This effort to find the best-fit technology is putting the squeeze on IT departments, IT budgets, and IT pros. Furthermore, new technologies are disrupting older, monolithic technologies and IT processes at a higher frequency and on a grander scale. Alternatives are arriving with more velocity, more critical mass, or both. This abundance of choice is putting IT pros in a scramble-to-come-up-to-speed-on-the-changing-tech-landscape-or-find-yourself-out-of-the-IT-profession mode.

 

IT generalists may find themselves on the endangered list. A generalist is an IT pro with limited knowledge across many domains: think broad, but not deep in any one IT area. Generalists are being treated as replaceable commodities, with reductions in force coming either through automation and orchestration or through future tech constructs like AI and machine learning.

 

But all is not lost; there are two paths IT pros can take to differentiate themselves from their fellow generalist clones: IT versatilist or IT specialist.

 

IT versatilists are embracing the challenges by leveraging their inner IT force, born of experience and expertise, to meet the speed of business. IT specialists are emerging out of the data center shadows to show off the mastery of their one skill in the era of big data and security ops.

 

A versatilist is fluent in multiple IT domains. This level of expertise across many technology disciplines bridges the utility between IT ops and business ops. They are sought after to define the processes and policies that enable automation and orchestration at speed and scale.

 

A specialist is a master of one specific IT discipline. It can be an application like databases or a domain like security, storage, or networking. Businesses are looking to them to transform big data into profit-turning, tunable insights and actions. Businesses are also looking to them to protect their data from all the connected entities in the Internet of things.

 

So which IT pro are you? The revolution has already started: hybrid cloud, software-defined data center, big data, containers, microservices, and more. Have you made preparations to evolve into a specialist or a versatilist? Awaken your inner IT force; there are plenty of opportunities to expand your IT career. Paid well, you will be.
