
Geek Speak


In my last post, "THERE MUST BE A BETTER WAY TO MANAGE VIRTUALIZED SYSTEMS", we talked about what systems are out there and which ones everyone is using. Ecklerwr1 posted a nice chart from VMware which compares VMware vRealize Operations to SolarWinds Virtualization Manager and a few others.


vmw-scrnsht-virtualization-practice_lg.jpg

 

Based on the discussion, it seems like many people are using some kind of software to get things sorted in their virtual environment. In my previous job, I was responsible for parts of the lab infrastructure. We hosted 100+ VMs for customer support, so that our employees could reproduce customer issues or use them for training.

 

While managing the lab and making sure we always had enough resources available, I found it difficult to identify which VMs were actively being used and which VMs had been idle for some time. Another day-to-day activity was hunting down snapshots which consumed a massive amount of space.

Back then, we wrote some vSphere CLI scripts to get the job done. Not really efficiently, but done. However, using SolarWinds Virtualization Manager now, I see how easy my life could have been.

 

My favorite features are the ability to view idle VMs and monitor the VM snapshots disk usage. Both features could have saved me lots of hours in my previous job. 
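To make those two checks concrete, here is a minimal Python sketch of the kind of filtering our old scripts were after. It is not the vSphere CLI and not how Virtualization Manager works internally; the `VM` record, its fields, the thresholds, and the sample lab inventory are all made up for illustration.

```python
from dataclasses import dataclass


@dataclass
class VM:
    """Hypothetical inventory record; field names are illustrative only."""
    name: str
    avg_cpu_pct: float      # average CPU over the review window
    days_since_login: int   # days since anyone last logged in
    snapshot_gb: float      # total disk space held by snapshots


def find_idle(vms, cpu_threshold=5.0, idle_days=30):
    """Return names of VMs that look idle: low CPU and no recent logins."""
    return [vm.name for vm in vms
            if vm.avg_cpu_pct < cpu_threshold and vm.days_since_login >= idle_days]


def largest_snapshots(vms, min_gb=50.0):
    """Return (name, snapshot_gb) pairs above min_gb, biggest first."""
    hits = [vm for vm in vms if vm.snapshot_gb >= min_gb]
    hits.sort(key=lambda vm: vm.snapshot_gb, reverse=True)
    return [(vm.name, vm.snapshot_gb) for vm in hits]


# Made-up sample data standing in for a real lab inventory.
LAB = [
    VM("support-repro-01", 1.2, 45, 120.0),
    VM("training-02", 35.0, 1, 10.0),
    VM("support-repro-07", 2.5, 90, 0.0),
    VM("build-agent-03", 60.0, 0, 75.5),
]

if __name__ == "__main__":
    print("Idle VMs:", find_idle(LAB))
    print("Large snapshots:", largest_snapshots(LAB))
```

What counts as "idle" is a judgment call. The 5% CPU and 30-day thresholds here are placeholders you would tune for your own lab.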

I am curious to know which features are saving you time on a regular basis. Or are there any features which we are all missing but just don’t know it yet? As Jfrazier mentioned, maybe Virtual Reality Glasses?

If you are an Oracle DBA and reading this, I am assuming all of your instances run on *nix and you are a shell scripting ninja. For my good friends in the SQL Server community, if you haven’t gotten up to speed on PowerShell, you really need to this time. Last week, Microsoft introduced the latest version of Windows Server 2016, and it does not come with a GUI. Not like, click one thing and you get a GUI, more like run through a complex set of steps on each server and you eventually get a graphical interface. Additionally, Microsoft has introduced an extremely minimal server OS called Windows Nano, that will be ideal for high performing workloads that want to minimize OS resources.

 

One other thing to consider is automation and cloud computing. If you live in a Microsoft shop, this is all done through PowerShell, or maybe DOS (yes, some of us still use DOS for certain tasks). So my question for you is: how are you learning scripting? In a smaller shop the opportunities can be limited, so I highly recommend the Scripting Guy’s blog. Also, doing small local operating system tasks via the command line is a great way to get started.

I was watching a recent webcast titled “Protecting AD Domain Admins with Logon Restrictions and Windows Security Log” with Randy Franklin Smith, where he discussed (and demonstrated) at length techniques for protecting and keeping an eye on admin credential usage. As he rightfully pointed out, no matter how many policies and compensating controls you put into place, at some point you really are trusting your fellow IT admins to do their job (but not more) with the level of access we grant and entrust to them.

 

However, there’s a huge catch-22: as an IT admin I want to know you trust me to do my job, but I also have a level of access that could really do some damage (like the San Francisco admin that changed critical device passwords before he left). On top of that, tools that help me and my fellow admins do our jobs can be turned into tools that help attackers access my network, like the jump box in Randy’s example from the webcast.

 

Now that I’ve got you all paranoid about your fellow admins (which is part of my job responsibilities as a security person), let’s talk techniques. The name of the game is: “trust, but verify.”

 

  1. Separation of duties: a classic technique which really sets you up for success down the road. Use dedicated domain admin/root access accounts separate from your normal everyday logon. In addition, use jump boxes and portals rather than flat out providing remote access to sensitive resources.
  2. Change management: our recent survey of federal IT admins showed that the more senior you are, the more you crave change management. Use maintenance windows, create and enforce change approval processes, and leave a “paper” trail of what’s changing.
  3. Monitor, monitor, monitor: here’s your opportunity to “verify.” You’ve got event and system logs, use them! Watch for potential misuse of your separation of duties (accidental OR malicious), unexpected access to your privileged accounts, maintenance outside of expected windows, and changes performed that don’t follow procedure.
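As a rough illustration of the “verify” step, the Python sketch below flags privileged-account activity that falls outside a maintenance window or has no change ticket attached. Everything in it is an assumption made for the example: the event dictionaries, the `adm-` account names, the `change_ticket` field, and the 22:00 to 02:00 window. A real log source will have its own format.

```python
from datetime import datetime, time

# Hypothetical maintenance window; it wraps past midnight.
MAINTENANCE_START = time(22, 0)
MAINTENANCE_END = time(2, 0)


def in_window(ts, start=MAINTENANCE_START, end=MAINTENANCE_END):
    """True if the timestamp falls inside the (possibly wrapping) window."""
    t = ts.time()
    if start <= end:
        return start <= t < end
    return t >= start or t < end


def review(events, admin_accounts):
    """Return (account, action, reason) findings for privileged activity."""
    findings = []
    for event in events:
        if event["account"] not in admin_accounts:
            continue  # only privileged accounts are in scope here
        when = datetime.fromisoformat(event["time"])
        if not in_window(when):
            findings.append((event["account"], event["action"],
                             "outside maintenance window"))
        if not event.get("change_ticket"):
            findings.append((event["account"], event["action"],
                             "no change ticket"))
    return findings


# Made-up events standing in for parsed event log entries.
EVENTS = [
    {"time": "2015-05-12T23:15:00", "account": "adm-jane",
     "action": "gpo_update", "change_ticket": "CHG-101"},
    {"time": "2015-05-13T14:05:00", "account": "adm-bob",
     "action": "password_reset", "change_ticket": None},
    {"time": "2015-05-13T14:30:00", "account": "jane", "action": "logon"},
]

if __name__ == "__main__":
    for finding in review(EVENTS, {"adm-jane", "adm-bob"}):
        print(finding)
```

The point is not the specific rules but that each technique in the list leaves something you can check: dedicated admin accounts give you a set to filter on, change management gives you tickets and windows to compare against.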

 

The age-old battle of security vs. ease-of-use rages on, but in the real world, it’s crucial to find a middle ground that helps us get our jobs done but still respects the risks at hand.

 

How do you handle the challenge of dealing with admin privileges in your environment?

 

Recommended Resources

 

REVIEW - UltimateWindowsSecurity Review of Log & Event Manager by Randy Franklin Smith

 

VIDEO – Actively Defending Your Network with SolarWinds Log & Event Manager

 

RECOMMENDED DOWNLOAD – Log & Event Manager

Throughout my previous blog posts, I talked about thin provisioning, approaches to moving from fat to thin, and the practice of over-committing. Everything I covered was about how they work: their advantages, pluses and minuses, methodology, drawbacks, and so on. I also talked about constant monitoring of your storage as the solution to many of those drawbacks. This article talks about how to apply a storage monitoring tool to your infrastructure to monitor your storage devices. When you select a tool, make sure that you pick one which has alerting options too. I will walk you through SolarWinds Storage Resource Monitor (SRM for short), which is one such storage monitoring tool, and along the way I will talk about the features that any storage monitoring tool requires to overcome the weaknesses of thin provisioning.

 

Introduction to SRM:

SRM is SolarWinds' storage monitoring product. SRM monitors, reports, and alerts on SAN and NAS devices from vendors like Dell, EMC, NetApp and so on. For a detailed list check here. In addition, SRM helps to manage and troubleshoot storage performance and capacity problems.

You can download SRM from the link below:

Storage Resource Monitor

Once you have installed SRM, you will need to add your storage device. The steps for adding a storage device differ by vendor. Visit the page below for instructions on how to add storage devices from different vendors.

How to add storage devices

 

Once you have installed SRM and added your storage devices, you will have instant visibility into all storage layers, extending to virtualization and applications with the Application Stack Environment Dashboard. Using SRM, troubleshooting storage problems across your application infrastructure is a cakewalk. Let’s start with SRM’s dashboard.

 

dashboard.png

 

The dashboard gives you a bird's-eye view of any issues on your storage infrastructure. Further, the dashboard displays all storage devices monitored by SRM, classified by product, and the relevant status of each layer of storage, such as storage arrays, storage pools, and LUNs.

 

SRM and Thin Provisioning:

Moving on to thin provisioning, SRM allows you to manage thin provisioned LUNs more effectively. And when thin provisioning is managed and monitored accurately, over-provisioning or over-committing can be done efficiently. SRM helps you view, analyze and plan thin provisioning deployments by collecting and reporting detailed information about virtual disks, so you can manage the level of over-commitment on your datastores.

 

LUN.png

 

This resource presents a grid of all LUNs using thin provisioning in the environment.

The columns are:

  • LUN: Shows the name of the LUN and its status
  • Storage Pool: Shows which storage pool the LUN belongs to
  • Associated Endpoint: The server volume or the datastore using the LUN
  • Total Size: The total user size of the LUN
  • Provisioned Capacity: Amount of capacity currently provisioned

There are also columns that show the provisioned percentage, File System Used Capacity, and File System Used Capacity percentage for the concerned LUN.

 

A tooltip appears when you hover over a LUN or storage pool, giving you a quick snapshot of performance and capacity. This helps you decide if you need to take action. When you hover over a storage pool, the tooltip also shows the pool’s usable capacity summary: the total usable capacity (i.e., the total amount of storage capacity that a user can actually use), the remaining capacity (the storage that has not yet been consumed) and the over-subscribed capacity (the amount by which this storage pool is over-committed).

 

hoverover storage pool _ 2.png

 

A drill-down on a specific storage pool presents important key/value pairs of information for that pool, including detailed information on:

  • Total Usable Capacity
  • Total Subscribed Capacity
  • Over-Subscribed Capacity
  • Provisioned Capacity
  • Projected Run-Out Time: the approximate time it will take to fully utilize this storage pool
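
To show how these figures relate to each other, here is a small Python sketch with made-up numbers. It is only an illustration of the arithmetic. The function name, its parameters, and the straight-line growth assumption behind the run-out estimate are mine, and SRM's own projection method may well differ.

```python
def pool_summary(usable_gb, used_gb, subscribed_gb, growth_gb_per_day):
    """Summarize a thin-provisioned pool from hypothetical inputs."""
    remaining_gb = usable_gb - used_gb
    # Over-subscription: how far the promised (subscribed) capacity
    # exceeds what the pool can actually hold.
    over_subscribed_gb = max(0.0, subscribed_gb - usable_gb)
    over_subscribed_pct = 100.0 * over_subscribed_gb / usable_gb
    # Naive linear projection; None means no growth, so no run-out date.
    if growth_gb_per_day > 0:
        days_to_run_out = remaining_gb / growth_gb_per_day
    else:
        days_to_run_out = None
    return {
        "remaining_gb": remaining_gb,
        "over_subscribed_gb": over_subscribed_gb,
        "over_subscribed_pct": over_subscribed_pct,
        "days_to_run_out": days_to_run_out,
    }


# Example: a 1,000 GB pool with 600 GB written and 1,300 GB promised to LUNs,
# growing by roughly 8 GB per day.
summary = pool_summary(usable_gb=1000.0, used_gb=600.0,
                       subscribed_gb=1300.0, growth_gb_per_day=8.0)
print(summary)
```

With these example inputs the pool has 400 GB remaining, is over-subscribed by 300 GB (30%), and would fill up in about 50 days if growth stays constant.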

 

drill down on storage pool.png

 

In addition, Active Alerts displays the alerts related to this storage pool. It shows the alert name, a short alert message, the name of the LUN for which the alert was triggered, and the time it was triggered.

Learn how to create an alert in SRM.

 

Alerting helps proactive monitoring:

Storage performance issues can happen anytime, and you cannot literally watch how your storage is performing every second. This is why you need alerts. They help by warning you before a problem occurs. By setting up alerts based on criteria, you will gain complete visibility into your storage. You have to set up each alert to anticipate a particular situation that can cause issues with storage performance.

 

all active alerts.png

 

Provided below is a list of example alerts that you can use for LUNs while doing thin provisioning:

  • Alert when usable space in the LUN goes below a particular % (e.g., 20%)
  • Alert when usable space in a storage pool goes below a particular %
  • Alert when the storage pool's oversubscribed % goes higher than a particular % (e.g., 10%)

The % values can only be decided by you, as they will differ based on your infrastructure. Some organizations can add more storage in days, whereas in many others it might take months to get approval for additional storage. Therefore, only you can set the right percentages.
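
To make the three example alerts concrete, here is a minimal Python sketch of the threshold logic. It is not SRM's alert engine or its API. The function, the dictionary layout, and the sample figures are invented for the example, and the default percentages are simply the ones used in the list above.

```python
def check_thresholds(luns, pools,
                     lun_free_min_pct=20.0,
                     pool_free_min_pct=20.0,
                     pool_oversub_max_pct=10.0):
    """Return (alert_kind, object_name) pairs for every breached threshold.

    luns:  {lun_name: free_pct}
    pools: {pool_name: {"free_pct": ..., "oversub_pct": ...}}
    """
    alerts = []
    for name, free_pct in luns.items():
        if free_pct < lun_free_min_pct:
            alerts.append(("lun_low_space", name))
    for name, stats in pools.items():
        if stats["free_pct"] < pool_free_min_pct:
            alerts.append(("pool_low_space", name))
        if stats["oversub_pct"] > pool_oversub_max_pct:
            alerts.append(("pool_oversubscribed", name))
    return alerts


# Made-up sample readings.
LUNS = {"lun-a": 35.0, "lun-b": 12.0}
POOLS = {
    "pool-1": {"free_pct": 40.0, "oversub_pct": 25.0},
    "pool-2": {"free_pct": 15.0, "oversub_pct": 0.0},
}

if __name__ == "__main__":
    for kind, name in check_thresholds(LUNS, POOLS):
        print(f"{kind}: {name}")
```

If procurement takes months in your organization, you would raise the free-space thresholds so the alert fires earlier. If you can add storage in days, tighter values may be fine.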

 

Once you have alerts in place, you can just sit back and relax, and spend the time you used to spend monitoring thin provisioning and over-committing on other endeavors.

I have been selected as a THWACK ambassador for the month of May, and every week I will write on a topic related to network management. I work for an operator and also write on my blog at www.telecomlighthouse.com.

 

Well, my last blog generated quite a lot of interest and discussion on the use of CLI for box configuration.

 

As a follow-up, I want to write on a related topic. It may generate some difference of opinion here, but my goal is to generate a wider discussion on this topic.

 

OK, in my last post I said that CLI is cumbersome; it takes a while to get used to and the worst thing is that if something goes wrong, the troubleshooting takes ages, sometimes.

I also said that technologies like NETCONF and YANG would really make configuration easier and more intelligent, and move the focus from box configuration to network configuration, in addition to making configuration GUI friendly.

 

I want to bring a new dimension to this discussion.

 

Let’s see if Cisco would really like to give you a better user interface and a better configuration tool.

 

Although I write Cisco here, it can mean any vendor that gives a CLI experience, for example Juniper.

 

OK, to start with, let’s agree on the fact that using CLI is a skill; rather, an expert skill. This skill is required to configure a box and additionally to troubleshoot networking issues. Not only do you need to know how to move around with CLI, but you should be able to do it with speed. Isn’t that so?

 

This skill requires training and certification. If one has expert certification, it means that he is not only intelligent but he is a command guru. Correct?

 

Cisco certification is a big money-making industry. If not a billion dollars, it must be generating hundreds of millions of dollars of revenue for Cisco (I contacted Cisco to get real figures, but it seems these are not public). Cisco makes money by charging for exams and selling training. Then there is a whole ecosystem of Cisco learning partners, where Cisco makes money by combining its products with training services and selling through them.

 

It costs to get expert level certifications. There is a cost if one passes, and there is more cost, if one fails.

 

An engineer may end up paying thousands of dollars for training and exams. We are talking about huge profits for Cisco here, just because of the popularity of certifications. There is one for everyone: from beginner to expert, from operations staff to architects.

 

Besides creating experts, Cisco is winning here from three angles:

 

  1. It is making its customers used to CLI, as customers feel at home using the commands they are trained on.
  2. It is creating loyal users and customers, as they would recommend products they already know very well.
  3. It is generating big revenue (and big margins, as it is a service).

 

For sure, it is a win-win for Cisco here.

 

From my perspective, therefore, a difficult-to-operate switch or router is in the direct interest of Cisco, as Cisco needs experts to run its products and the experts need certifications.

Cisco, therefore, would NOT be very motivated to make networks easy to operate and configure. I have even seen the GUI of one of Cisco's products; it simply sucks. It seems to me it is not one of their focuses.

 

Thus, this raises an important question here:

 

Why would Cisco take steps to make the network more programmable and easier to operate with newer tools, and take CLI out of its central focus? Wouldn’t it rather stick with difficult-to-operate products and keep on making more money?

 

Would you agree with me?

 

I would like to hear whether you agree or disagree, and why.

cxi

The IT Approach to Security

Posted by cxi May 11, 2015

Hello again! Welcome to my next installment, with various slides I've stolen from my own presentations delivered at conferences.

If you read last week's installment, Checkbox vs Checkbook Security, you probably know by now that security is an area which is personally important to me.

 

With that said, let's dive a little deeper into what is often the IT Approach to security...

Screen Shot 2015-05-10 at 8.43.04 PM.png

How many times have you heard someone say "I'm not a big enough target"? Heck, maybe you've even heard yourself say that.

Certainly, in a heavily targeted world where threat actors strike to stop you from publishing what was otherwise a horrible movie (Sony), where credit card and customer data is stolen for the purpose of stealing money or for other uses (JPMC/Chase), or where hundreds of millions of dollars are stolen from hundreds of banks (too many sources to count), it is easy to land on, "I'm not a big enough target, why would anyone bother with me!"

 

Let's not forget for a moment here, though, that the security landscape is not hard and fast... attack scripts and threat engines are indiscriminate in their assault at times. A perfect example is taken from the old war-dialing days: just as we'd dial entire banks of phone numbers looking for modems to connect into, there are attackers who will cycle through entire IP ranges while trying to exploit the latest zero-day attack on the horizon. Most WordPress sites that are hacked on a regular basis are hacked not because they were targeted, but because they were vulnerable.

 

Or, if this analogy helps: more people are likely to take something from a car with its windows open or its top down than from one which is all locked up.

 

 

What is it that makes us, irrespective of size, a target?

Screen Shot 2015-05-10 at 8.58.46 PM.png

I included this image here from my own threatmap to give you a sense of just what kinds of things can and do happen.

So the question then arises: what exactly makes something 'targetable'?


You are a target if:


  • You are connected to a network
  • You run a service which is accessible via a network protocol (TCP, IP, UDP, ICMP, Token-Ring...;))
  • You run an application, server, service which has a vulnerability in it, whether known or unknown
    • I just want to mention for a moment... Shellshock, the Bash vulnerability disclosed 24SEP2014, had been exploitable since September 1989; just food for thought

 

So you're pretty much a target if you... Exist, Right? Wow that leaves us all warm and fuzzy I imagine...

But it doesn't have to be that way! You don't have to run in terror and shut everything down for fear of it being hacked.  But in the same breath, we need not stick our head in the sand assuming that we are invincible and invulnerable because no one would ever attack us, or steal our data, or whatever other lies we tell ourselves to sleep at night.

 

Do you see a future with fewer zero-day attacks, or with more critical ones being discovered, whether they had existed for 25 years before discovery (à la Shellshock) or were introduced in the recent past (such as Heartbleed)?


You know I love your insight! So you tell me... How are you a target, or NOT a target!   What other ways do you see people being a target? (I haven't even touched the mobile landscape...)

 

I look forward to your thoughts on this matter, Thwack Community!

Microsoft Ignite 2015 concluded its inaugural event with 20k+ attendees. The SolarWinds team united in the Windy City, Chicago, to provide the single point of truth in IT monitoring for the continuous delivery and integration era with Ignite attendees. SolarWinds also teamed up with Lifeboat Distribution and hosted a Partner Meet and Greet during Microsoft Ignite at Chicago's Smith & Wollensky covering steaks and application stack management. Ignite lit IT from start to finish.

 

Microsoft Announcements at Ignite

There were plenty of announcements made by Microsoft, and they've been covered extensively, especially on Microsoft's Channel 9 program. The announcements involved SoCoMo (social, cloud and mobility), with Office 365, Azure, and Windows OS taking the front and center roles. The Edge name beat out the Project Spartan name by a brow...ser to become Internet Explorer's named successor. Other notable news included Windows 10 being the last version of Windows and showcase demos of some of the "software defined" roles of Windows Server 2016 (aka Windows Server Technical Preview 2), especially Active Directory, Docker containers, RMS, and Hyper-V. And there was something about Office 365 and its E3 subscription, which includes the core Office application suite plus cloud-based Exchange, SharePoint, and Skype for Business. Exchange, SharePoint, and Unified Communications admins were put on notice, and the consensus was that they had to broaden and deepen their skills in other areas, especially cloud.

 

From the Expo Floor

The SolarWinds booth saw non-stop traffic throughout Ignite. The conversations ranged from the latest and greatest Microsoft announcements to solutions that were two or three generations behind. But regardless of the environment, whether it be on-premises, colo, private/public cloud or hybrid, it was clear that the application was on the minds of IT Ops AND that it required monitoring along with database, security, log & patch management. Conversations also included a healthy dose of the Dev side of the DevOps equation. And yes, Devs need monitoring as well. Without baselines and trends, there can be no truth on what "good" should be.

 

Enjoy some of the moments from SolarWinds' Microsoft Ignite booth.

 

Ignite Pictures: booth.jpg, booth presentation.PNG, demo.jpg, swag bag.PNG, geek out.PNG, thwack hammer.PNG

 

Thank you Ignite

Thank you, Ignite attendees, for the conversations, from those of us who attended and represented the SolarWinds family! Fantastic job, SolarWinds team! See you next year.

booth-staff.png

Pictured: 1st row - Ryan Albert Donovan, Brian Flynn, Troy Lehman, Danielle Higgins, Aaron Searle, Wendy Abbott. 2nd row - Matthew Diotte, Kong Yang, Mario Gomez, Patrick Hubbard, Michael Thompson. 3rd row - Dan Balcauski, Karlo Zatylny, Cara Prystowsky, Ash Recksiedler. Not pictured (because of flight times): Thomas LaRock, Jennifer Kuvlesky, Jon Peters.

Leon Adato

Convention Season

Posted by Leon Adato May 8, 2015

Convention season is upon us. I know that conventions happen throughout the year, but it seems like April is when things kick into high gear.

 

As anyone who has been in IT for more than a month can tell you, there are so many incredible opportunities to get out there and network, learn, and see what is heading down the pipeline. It can really be overwhelming both to the senses and the budget.

 

The Head Geeks try very hard to find opportunities to meet up with customers, fellow thwack-izens, and like-minded IT Professionals. But like you, there are only so many days in a month and dollars in the budget.

 

I took a quick poll of the other Geeks to find out:

 

  1. Which shows we are GOING to be attending this year.
  2. Which ones we know we SHOULD be attending, but can’t due to other constraints.
  3. Which ones we WISH we could attend, even if it’s a little off the beaten path.

 

Here’s what I’d like from you: In the comments, let us know which shows YOU are going to be attending, and which ones you would like to see US attend next year. That will help us justify our decisions (and budget!) and (hopefully) meet up with you!

 

Attending:

Tom: PASS Summit, VMworld, Ignite

Kong: MS Ignite, VMworld, SpiceWorld Austin, Philadelphia VMUG USERCON, and Carolina VMUG USERCON

Patrick: Cisco Live, Ignite

Leon: Cisco Live

 

Should Attend:

Tom: Spiceworks, VMworld (Barcelona)

Kong: “Are you insane?!?! Did you see what I’m already going to?”

Patrick: VMworld

Leon: Interop, Ignite, SpiceWorld

 

Wish We Could Attend:

Tom: SXSW, AWS re:Invent 2015

Kong: AWS re:Invent

Patrick: RSA, AWS re:Invent 2015

Leon: Interop, DefCon, RSA

 

Like I said, let us know in comments where YOU are going to be, and we’ll start to make plans to be there the next time around.

In the first part of this series, I described the four (ok, really five) questions that monitoring professionals are frequently asked. You can read that introduction here along with information on the first question (Why did I get this alert). You can get the low-down on the second question (Why DIDN'T I get an alert) here. And the third question (What is monitored on my system) is here.

 

My goal in this post is to give you the tools you need to answer the fourth question: Which of the existing alerts will potentially trigger for my system?

 

Reader's Note: While this article uses examples specific to the SolarWinds monitoring platform, my goal is to provide information and techniques which can be translated to any toolset.

 

Riddle Me This, Batman...

It's 3:00pm. You can't quite see the end of the day over the horizon, but you know it's there. You throw a handful of trail mix into your face to try to avoid the onset of mid-afternoon nap-attack syndrome and hope to slide through the next two hours unmolested.

 

Which, of course, is why you are pulled into a team meeting. Not your team meeting, mind you. It's the Linux server team. On the one hand, you're flattered. They typically don't invite anyone who can't speak fluent Perl or quote every XKCD comic in chronological order. On the other...well, team meeting.

 

The manager wrote:

            kill `ps -ef | grep -i talking | awk '{print $2}'`

on the board, eliciting a chorus of laughter from everyone but me. My silence gave the manager the perfect opportunity to focus the conversation on me.

 

“We have this non-trivial issue, and are hoping you can grep out the solution for us,” he begins. “We're responsible for roughly 4,000 systems...”

 

Unable to contain herself, a staff member followed by stating, “4,732 systems. Of which 200 are physical and the remainder are virtualized...”

 

Unimpressed, her manager said, “Ms. Deal, unless I'm off by an order of magnitude, there's no need to correct.”

 

She replied, “Sorry boss.”

 

“As I was saying,” he continued. “We have a...significant number of systems. Now how many alerts currently exist in the monitoring system which could generate a ticket?”

 

“436, with 6 currently in active development,” I respond, eager to show that I'm just as on top of my systems as they are of theirs.

 

“So how many of those affect our systems?” the manager asked.

 

Now I'm in my element. I answer, “Well, if you aren't getting tickets, then none. I mean, if nothing has a spiked CPU or RAM or whatever, then it's safe to say all of your systems are stable. You can look at each node's detail page for specifics, although with 4,000 I can see where you would want a summary. We can put something together to show the current statistics, or the average over time, or...”

 

“You misunderstand,” he cuts me off. “I'm fully cognizant of the fact that our systems are stable. That's not my question. My question is…should one of my systems become unstable, how many of your... what was the number? Oh, right: How many of your 436-soon-to-be-442 alerts WOULD trigger for my systems?”

 

“As I understand it, your alert logic does two things: it identifies the devices which could trigger the alert (all Windows systems in the 10.199.1 subnet, for example) and at the same time specifies the conditions under which an alert is triggered (say, when the CPU goes over 80% for more than 15 minutes).”

 

“So what I mean,” he concluded, “is this: can you create a report that shows me the devices which are included in the scope of an alert logic irrespective of the trigger condition?”

 

Your Mission, Should You Choose to Accept it...

 

As with the other questions we've discussed in this series, the specifics of HOW to answer this question are less critical than knowing you will be asked it.

 

In this case, it's also important to understand that this question is actually two questions masquerading as one:

  1. For each alert, tell me which machines could potentially trigger it
  2. For each machine, tell me which alerts may potentially be triggered

Why is this such an important question, perhaps the most important of the Four Questions in this series? Because it determines the scale of the potential notifications monitoring may generate. It's one thing if 5 alerts apply to 30 machines. It's entirely another when 30 alerts apply to 4,000 machines.
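
The two-questions-in-one idea can be sketched in a few lines of Python. This is a toy model, not how any particular monitoring tool stores its alerts. The node list, the alert definitions, and the idea of keeping "scope" and "trigger" as two separate functions are all assumptions made for illustration.

```python
import ipaddress

# Made-up inventory.
NODES = [
    {"name": "win-web-01", "os": "Windows", "ip": "10.199.1.15"},
    {"name": "lin-db-01", "os": "Linux", "ip": "10.199.1.20"},
    {"name": "win-app-02", "os": "Windows", "ip": "10.200.4.8"},
]

SUBNET = ipaddress.ip_network("10.199.1.0/24")

# Each hypothetical alert keeps its scope (which nodes) apart from
# its trigger (which condition). Only the scope is used below.
ALERTS = [
    {"name": "High CPU (Windows, 10.199.1.x)",
     "scope": lambda n: n["os"] == "Windows"
     and ipaddress.ip_address(n["ip"]) in SUBNET,
     "trigger": lambda m: m["cpu_pct"] > 80},
    {"name": "Low disk (all Linux)",
     "scope": lambda n: n["os"] == "Linux",
     "trigger": lambda m: m["disk_free_pct"] < 10},
]


def scope_report(alerts, nodes):
    """Answer both questions at once, ignoring trigger conditions."""
    by_alert = {}
    by_node = {n["name"]: [] for n in nodes}  # keeps nodes with no alerts visible
    for alert in alerts:
        members = [n["name"] for n in nodes if alert["scope"](n)]
        by_alert[alert["name"]] = members
        for member in members:
            by_node[member].append(alert["name"])
    return by_alert, by_node


by_alert, by_node = scope_report(ALERTS, NODES)
print(by_alert)
print(by_node)
```

Note that `win-app-02` ends up with an empty list: it is monitored by nothing in this toy setup, which is exactly the kind of gap this report is meant to surface.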

 

The answer to this question has implications for staffing, shift allocation, pager rotation, and even the number of alerts a particular team may approve for production.

 

The way you go about building this information is going to depend heavily on the monitoring solution you are using.

 

In general, agent-based solutions are better at this because trigger logic, in the form of an alert name, is usually pushed down to the agent on each device, and thus can be queried (both “Hey, node, what alerts are on you?” and “Hey, alert, which nodes have you been pushed to?”).

 

That's not to say that agentless monitoring solutions are intrinsically unable to get the job done. The more full-featured monitoring tools have options built-in.

 

Reports that look like this:

part5_2.png

 

Or even resources on the device details page that look like this:

part5_1.png

 

 

Houston, We Have a Problem...

 

What if it doesn't, though? What if you have pored through the documentation, opened a ticket with the vendor, visited the online forums and asked the greatest gurus up on the mountain, and come back with a big fat goose egg? What then?

 

Your choices at this point still depend largely on the specific software, but generally speaking there are 3 options:

 

  • Reverse-engineer the alert trigger and remove the actual trigger part


Many monitoring solutions use a database back-end for the bulk of their metrics, and alerts are simply a query against this data. The alert trigger queries may exist in the database itself, or in a configuration file. Once you have found them, you will need to go through each one, removing the parts which comprise the actual trigger (e.g., CPU_Utilization > 80%). This will likely necessitate your learning the back-end query language for your tool. Difficult? Probably, yes. Will it increase your street cred with the other users of the tool? Undoubtedly. But once you've done it, running a report for each alert becomes extremely simple.
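
Here is a small, self-contained Python sketch of that idea using an in-memory SQLite database. The `nodes` table, its columns, and the sample rows are invented for the example. Your tool's real schema and query language will look different, but the move is the same: keep the scope clause, drop the trigger clause.

```python
import sqlite3

# Hypothetical back-end: one table of nodes with their latest CPU reading.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE nodes (name TEXT, vendor TEXT, ip TEXT, cpu_utilization REAL);
INSERT INTO nodes VALUES
    ('win-web-01', 'Windows', '10.199.1.15', 35.0),
    ('win-web-02', 'Windows', '10.199.1.16', 92.0),
    ('lin-db-01',  'Linux',   '10.199.1.20', 88.0);
""")

# The two halves of a (made-up) alert definition.
SCOPE = "vendor = 'Windows' AND ip LIKE '10.199.1.%'"
TRIGGER = "cpu_utilization > 80"

# The alert as the tool would run it: scope AND trigger.
alert_query = f"SELECT name FROM nodes WHERE {SCOPE} AND {TRIGGER} ORDER BY name"
# The reverse-engineered report: same scope, trigger removed.
scope_query = f"SELECT name FROM nodes WHERE {SCOPE} ORDER BY name"

firing_now = [row[0] for row in conn.execute(alert_query)]
in_scope = [row[0] for row in conn.execute(scope_query)]

print("Would fire right now:", firing_now)
print("Could ever fire for: ", in_scope)
```

The first list is what the ticket queue shows you today. The second is the answer to the manager's question.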

 

  • Create duplicate alerts with no trigger

 

If you can't export the alert triggers, another option is to create a duplicate of each alert that has the “scope” portion, but not the trigger elements (so the “Windows machines in the 10.199.1.x subnet” part but not the “CPU_Utilization > 80%” part). The only recipient of that alert will be you, and the alert action should be something like writing to a logfile with a very simple string (“Alert x has triggered for Device y”). Every so often (every month or quarter), fire off those alerts and then tally up the results into a report that recipient groups can slice and dice.
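
Tallying that logfile is the easy part. A minimal Python sketch, assuming each line follows the “Alert x has triggered for Device y” pattern suggested above (the sample alert and device names are made up):

```python
import re
from collections import defaultdict

# Matches the simple log string suggested above.
LINE = re.compile(r"^Alert (?P<alert>.+) has triggered for Device (?P<device>.+)$")


def tally(lines):
    """Build alert -> devices and device -> alerts maps from log lines."""
    by_alert = defaultdict(set)
    by_device = defaultdict(set)
    for line in lines:
        match = LINE.match(line.strip())
        if not match:
            continue  # skip anything that is not one of our scope-only alerts
        by_alert[match["alert"]].add(match["device"])
        by_device[match["device"]].add(match["alert"])
    return by_alert, by_device


# Stand-in for the contents of the logfile.
SAMPLE_LOG = [
    "Alert HighCPU-scope has triggered for Device win-web-01",
    "Alert HighCPU-scope has triggered for Device win-web-02",
    "Alert LowDisk-scope has triggered for Device win-web-01",
    "some unrelated line",
]

by_alert, by_device = tally(SAMPLE_LOG)
for alert, devices in sorted(by_alert.items()):
    print(f"{alert}: {len(devices)} device(s)")
```

Sets are used so that firing the same duplicate alert twice in one period does not inflate the counts.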

 

  • Do it by hand


If all else fails (and the inability to answer this very essential question doesn't cause you to re-evaluate your choice of monitoring tool), you can start documenting by hand. If you know up-front that you are in this situation, then it's simply part of the ongoing documentation process. But most times it's going to be a slog through existing alerts, writing down the trigger information. Hopefully you can take that trigger info and turn it into an automated query against your existing devices. If not, then I would seriously recommend looking at another tool. Because in any decent-sized environment, this is NOT the kind of thing you want to spend your life documenting, and it's also not something you want to live without.

 

What Time Is It? Beer:o’clock

After that last meeting (not to mention the whole day), you are ready to pack it in. You successfully navigated the four impossible questions that every monitoring expert is asked (on more or less a daily basis): Why did I get that alert? Why didn't I get that alert? What is being monitored on my systems? And what alerts might trigger on my systems? Honestly, if you can do that, there's not much more that life can throw at you.

 

Of course, the CIO walks up to you on your way to the elevator. “I'm glad I caught up to you,” he says, “I just have a quick question...”

 

Stay tuned for the bonus question!


Related Resources

SolarWinds Lab Episode 24 - Web-based Alerting + Wireless Heat Maps, Duplex Mismatch Detection & More

http://www.youtube.com/watch?v=nE4kpmhKG4s?CMP=THW-TAD-GS-WhatsNew-NPM-PP-fourquestions_4

 

Tech Tip:  How To Create Intelligent Alerts Using Network Performance Monitor

http://cdn.swcdn.net/creative/v13.0/pdf/techtips/how_to_create_intelligent_alerts_with_npm.pdf?CMP=THW-TAD-GS-TechTip_Alerts-NPM-PP-fourquestions_4

 

New Features & Resources for NPMv11.5

http://www.solarwinds.com/network-performance-monitor/whats-new.aspx?CMP=THW-TAD-GS-WhatsNew-NPM-PP-fourquestions_4

 

Recommended Download: Network Performance Monitor

  http://www.solarwinds.com/register/registrationb.aspx?program=607&c=70150000000Dlbw&CMP=THW-TAD-GS-rec_DL-NPM-DL-fourquestions_4

Hello Thwack-community,

 

For the month of May, I will be the Ambassador for the Systems Management Community.

 

First off, I would like to provide some background about me. My name is Jan Schwoebel, I'm on Twitter as @MindTheVirt, and I write the blog www.MindTheVirt.com - Mind The Virtualization. I scored my first job in IT back in 2007, starting out as a junior consultant, managing customer systems and providing decision-making support. Over the last 4+ years I have spent time in technical support positions, specializing in virtualization and storage systems.

 

Today, I would like to start a discussion with you regarding managing virtualized systems. As the years have progressed, virtualization has become mainstream, and today many servers and applications are virtualized. An increasing number of companies are even starting to run 100% of their systems on VMware ESXi and Microsoft Hyper-V. The reasons to virtualize 100% of servers and applications, or your whole datacenter, range from being green and reducing the carbon footprint to ease of deployment of new servers and systems.

 

However, as it becomes easier to deploy new servers, switches and applications, it becomes more complex to manage all these systems efficiently and be aware of any issues which might arise. Often, we are not aware of how many snapshots a VM has, whether we need to run a snapshot consolidation, how scalable the current infrastructure is, or what application is creating a bottleneck. Every other week a new company appears with a product promising to simplify server and data management.

 

Since I work in technical support, I only hear from customers once it is too late and they have hit some issue or limitation. As Kevin O’Leary on Shark Tank always says: “There must be a better way”.

Indeed, there must be a better way and I would love to hear from you. What are you doing to avoid support calls? How do you manage your virtualized infrastructure efficiently? What products, workflows and techniques are you using and why?

jdanton

Is Your Shop Stuck in 2008?

Posted by jdanton May 6, 2015

Last week at their Build Developer Conference and this week at Ignite, Microsoft introduced a broad range of new technologies. In recent years, Microsoft has become a more agile and dynamic company. In order for you and your organization to take advantage of this rapid innovation, your organization needs to keep up with the change and quickly adapt to new versions of technology, like Windows 10 or SQL Server 2016. Or maybe you work with open source software like Hadoop and are missing out on some of the key new projects like Spark or the newer non-MapReduce solutions. Or perhaps you are using a version of Oracle that doesn’t support online backups. It’s not your fault; it’s what management has decided is best.

 

As an IT professional it is important to keep your skills up to date. In my career as a consultant, I have the good fortune to be working with software vendors, frequently on pre-release versions, so it is easy for me to stay up to date on new features. However, in past lives, especially when I worked in the heavily regulated health care industry, it was a real challenge to stay on top of new features and versions. I recently spoke with a colleague there and they are still running eight-year-old operating systems and RDBMSs.

 

So how do you manage these challenges in your environment? Do you do rogue side projects (don’t worry, we won’t share your name)? Or do you just keep your expert knowledge of old software? Do you pay for training on your own? Attend a SQL Saturday or Code Camp? What do your teammates do? Do you have tips to share for everyone on staying current when management thinks “we are fine with old technology”?

Last week, Omri posted a blog titled, What Does APM Mean to You? Personally, I think it means several things, but it really got me thinking about security issues related to APM and how they are of high concern in today’s IT world. Systems and application environments are specifically prone to denial of service attacks, malware, and resource contention issues caused by remote attacks or other miscellaneous security issues.

     

I've always looked at continuous application or systems monitoring as something that goes hand-in-hand with security monitoring. If SysAdmins are able to provide security insights, along with systems and application performance, it will only benefit the security and operations team.  After all, IT as a whole works best when teams interface and collaborate with each other.

     

It’s not ideal to rely on application performance monitoring software for IT security, but such tools do include some basic features that address security-related use cases and complement your existing IT security software.

     

Here are some key security-related use cases you get visibility into when using application and systems monitoring software.

    

Check for important updates that should be applied

Forgetting to install an OS or hardware update may put your servers and apps at risk. Your apps may be prone to attacks from malicious software and other vulnerabilities. OS updates will ensure such vulnerabilities are corrected immediately when they are discovered. In addition, you should report on the number of critical, important, and optional updates that are not yet applied to the server.  Remember, you can also view when updates were last installed and correlate that time period to performance issues.  Sometimes these updates cause unexpected performance impacts.

Windows Server.png

         

Keep an eye on your antivirus program

Monitor the status of your antivirus: whether it is installed or not, and whether key files are out of date. When you fail to monitor whether your antivirus software is up and running, you increase your chances of security issues.

              

Ensure your patch updates are installed

Collect information related to patch updates and answer questions like: Are they installed? What’s their severity? By whom and when were they installed? You install patches so that security issues, programs, and system functionality can be fixed and improved. If you fail to apply a patch once an issue has been detected and fixed, hackers can leverage this publicly available information to create malware for an attack.

OS Updates.png

          

View event logs for unusual changes

Monitor event logs, and look for and alert on potential security events of interest. For example, you can look for account lockouts, logon failures, or other unusual changes. If you don’t have other mechanisms for collecting log data, you can simply leverage some basic log collection, such as event logs, syslog, and SNMP traps. You can also use these for troubleshooting.
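As a rough illustration of alerting on logon failures or account lockouts, here is a minimal Python sketch. It assumes the log records have already been collected and parsed into simple dictionaries; the field names, the event type strings, and the threshold of three are hypothetical, not taken from any specific product.

```python
from collections import Counter


def accounts_over_threshold(events, event_type="logon_failure", threshold=3):
    """Return accounts that produced at least `threshold` events of the given type."""
    counts = Counter(e["account"] for e in events if e["event"] == event_type)
    return sorted(account for account, seen in counts.items() if seen >= threshold)


# Hypothetical, already-parsed records; a real collector would supply these.
SAMPLE_EVENTS = [
    {"account": "alice", "event": "logon_failure"},
    {"account": "alice", "event": "logon_failure"},
    {"account": "alice", "event": "logon_failure"},
    {"account": "bob", "event": "logon_failure"},
    {"account": "carol", "event": "account_lockout"},
]
```

Running `accounts_over_threshold(SAMPLE_EVENTS)` flags only the account with repeated failures, which is the kind of signal worth passing to the security team.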

Logs.png

           

Diagnose security issues across your IT infrastructure

Troubleshoot security issues by identifying other systems that may have common applications, services, or operating systems installed. Say a security issue with an application or website occurs: you can quickly identify which systems were in fact affected by searching for all servers that are related to the website or application.

Appstack.png

           

While these are just a few use cases, tell us how you use your APM software. Do you use it to monitor key system and app logs? Do you signal your IT security teams when you see something abnormal? Or do you rely on an APM tool for basic security monitoring? Whatever the case is, we’re curious to learn from you.

My first experience in the IP domain was that of a shock!

 

I had moved from the optical transport domain in an operator to the IP department.

 

As an optical guy, I used a Network Management System (NMS) for all tasks, including configuration, fault and performance measurements. Above all, I liked the nice Graphical User Interface (GUI) of the NMS.

 

However, I found that in the IP world, Command Line (CLI) is used for everything; from provisioning to troubleshooting. CLI rules in the IP domain.

 

“CLI is the tool for Engineers”, I was told.

 

OK, fine! Maybe this stuff seemed strange to me because I personally do not like the user interface of CLI, or because I came from an optical background.

 

Irrespective of the user interface, and with all the functionality that CLI provides, from my perspective CLI is not the ideal tool for configuration. First, it focuses on a single box, i.e. configuring box by box, which is cumbersome. Second, it is too prone to human error, and because of errors troubleshooting sometimes takes considerable time. And lastly, it is vendor specific, so changing a vendor box requires a totally different skill set to configure it.

 

Therefore, as an operator, in my view, there is a need for a more flexible way of configuring and service provisioning. The focus should move away from “box configuration” towards “network configuration”. Also, in this age of emerging technologies like SDN and NFV, where NMS is the primary focus, CLI will simply block innovation.

 

Network configuration is a major part of the operators' OPEX. Studies put it at around 45% of the network's TCO.

 

CLI has a place today because the management protocol, SNMP, is itself not ideal for service provisioning. That is why operators use SNMP primarily for monitoring, not for configuration.

 

Also, neither CLI nor SNMP supports another important requirement for large, complex service provider networks: a transactional mode for network configuration.

 

A transaction enables multiple configurations to take place as one unit or fail completely (all or none). To clarify this very important point, take the example of an IPTV service that involves configuring one router, two switches, two firewalls and a billing system. A transactional protocol applies the configuration on all involved network elements or on NONE. This is beneficial because if configuration validation fails on even one network element, the configuration is not applied on any of the others. This means that configuration would never be implemented partially on some network elements. This is the essence of “network configuration” as we discussed earlier.
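The all-or-none behaviour can be sketched in a few lines of Python. This is only an illustration of the idea under simplifying assumptions: the `Device` class and `apply_transaction` function are hypothetical stand-ins, and this is not how NETCONF itself implements transactions across network elements.

```python
class Device:
    """Hypothetical stand-in for a network element that holds a running config."""

    def __init__(self, name, valid=True):
        self.name = name
        self.valid = valid  # whether this device will accept the change
        self.config = {}

    def apply(self, change):
        if not self.valid:
            raise ValueError(f"{self.name} rejected the change")
        self.config.update(change)


def apply_transaction(devices, change):
    """Apply the change to every device, or restore all of them if any one fails."""
    snapshots = [(device, dict(device.config)) for device in devices]
    try:
        for device in devices:
            device.apply(change)
    except ValueError:
        # Roll every device back to the configuration it had before we started.
        for device, saved in snapshots:
            device.config = saved
        return False
    return True
```

If one firewall in the IPTV example rejects its part of the change, `apply_transaction` returns `False` and the router and switches are left exactly as they were.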

 

So do we have to live with SNMP and CLI for network configuration, forever?

 

No!

 

NETCONF and YANG, developed by the IETF for network management, have a single focus: making network configuration as easy as possible. The IETF learned from the experience of SNMP what could be improved and approached the new protocol from the ground up. It is purpose-built for configuring networks.

 

NETCONF is the management protocol, primarily for network configuration, while YANG is a text-based modeling language designed to be used with NETCONF. Both are needed for complete, flexible service provisioning in IP networks.

 

There are FOUR main features of NETCONF YANG:

 

  1. Support of transactionality. Configurations can be applied to multiple network elements as one transaction that either succeeds everywhere or is not applied at all.
  2. Get configuration feature. This is a distinct advantage compared to SNMP. With SNMP, a backup of the configuration is available, but it is polluted with operational data (alarms, statistics); with NETCONF one can retrieve just the configuration data.
  3. Vendor device independence. NETCONF can be used as a standard configuration protocol for any vendor. The vendor’s box will sequence the configurations and execute them. This sequencing is internal to the vendor’s box, and NETCONF does not need to be aware of it.
  4. Multiple network elements can be configured at one time, thus saving time when configuring the network.
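To give a feel for the "get configuration" feature, here is a small Python sketch that builds the XML for a NETCONF `<get-config>` request using only the standard library. The function name `build_get_config` is made up for illustration, and the sketch only constructs the message; opening a session to a device and sending the message are not shown.

```python
import xml.etree.ElementTree as ET

# Base NETCONF XML namespace.
NETCONF_NS = "urn:ietf:params:xml:ns:netconf:base:1.0"


def build_get_config(message_id, source="running"):
    """Build a <get-config> RPC asking for one datastore (configuration data only)."""
    rpc = ET.Element(f"{{{NETCONF_NS}}}rpc", {"message-id": str(message_id)})
    get_config = ET.SubElement(rpc, f"{{{NETCONF_NS}}}get-config")
    source_el = ET.SubElement(get_config, f"{{{NETCONF_NS}}}source")
    ET.SubElement(source_el, f"{{{NETCONF_NS}}}{source}")
    return ET.tostring(rpc, encoding="unicode")
```

The same request shape works regardless of which vendor built the box, which is the vendor independence point from the list above.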

 

Therefore, in summary, NETCONF is the right solution to solve network management issues in a standard way. It is the next generation of network management protocol, which will reduce the time to provision services for an operator and will help in applying multiple configurations to multiple network elements at one time.

 

Now it is your turn to tell me:

 

  1. How do you feel about CLI as a network configuration tool, would you like to live with it forever?
  2. What issues, if any, do you face using CLI?
  3. Do you think NETCONF can deliver better than SNMP/CLI?

 

Would love to hear your opinion!

cxi

Checkbox vs Checkbook Security

Posted by cxi May 4, 2015

Happy Month of May everyone!

I wanted to talk to you about a larger topic in the realm of IT Security, Network Security, or the general purpose 'security' space as it were...

The image below was a slide I stole from myself (thanks me!) from a presentation I've delivered at some conferences over the past few months, titled, "Is your IT Department Practicing Security Theater"

You might remember a similarly titled post I did back in January "Are you Practicing Security Theater in IT"

And while that post itself was not the panacea to solve all matters of security, it certainly did inspire both the presentation I delivered as well as some of the points contained here.

 

So, let's discuss for a moment...

 

Screen Shot 2015-05-01 at 10.38.25 AM.png

 

What exactly is Checkbox vs Checkbook Security?

 

The way I was looking at it initially is that most organizations, especially budget-constrained or regulatory-driven ones, are faced with the delicate decision to 'check a box', whether the answer solves their problem or not.

 

One example is organizations that are required to implement logging and monitoring solutions. Oftentimes they'll just get some run-of-the-mill Syslog server, have it collect all of the data and then archive it. Someone will pretend to go and review the logs every now and then, and they can officially check the box saying WE HAVE LOGGING AND MONITORING!

While sure, they TECHNICALLY do, but do they really? Will they be able to provide a backtrack history should an event occur and correlate it? Perhaps.  Will they be able to detect something happening inflight and mitigate it? Yea, no. Does that make it right? It does not, but does it check the box technically? Absolutely 'sort of' depending upon the rules they're required to follow.

 

But what does that mean for you and me? I mean, I checked the box within a reasonable budget; even if merely checking the box doesn't provide any real value to the organization, what is the long-term impact?

The rub there is exactly that...  A checkbox without efficacy will definitely require you to open your Checkbook later on, whether to really resolve the problem, or due to loss of business, money or otherwise.

 

That's why I broke this list down in this scenario as a series of the 'checkbox' vs the 'checkbook'.  It's not to say that by adopting something in the Checkbook column it will cost more than in the checkbox (Sometimes it MAY, but it doesn't have to)

It really comes down to figuring out a strategy that works best for you and your business.

 

But like all things, this is not a panacea, and it is also not an exhaustive list of 'vice versa' possibilities. I'd love your insight into whether you agree with these approaches, and into situations where you've seen this be effective (I love personal stories! I have a fair share of my own). Also let me know if there are other situations which aren't included here but should be addressed.

Share the love, spread the knowledge, let's all be smarter together!

 

Great to be back Thwack Community!

 

Ambassador @cxi signing off! <3

IP space management has become increasingly complex -- stemming from the building of new and secure network environments and a surge in the use of IP-enabled devices. Sniffing out problems early and remedying them before damage is done is the core of effective network management. IP space management is an integral part of network management and demands the same level of monitoring, quick troubleshooting, and remediation mechanisms.

 

IP alerting and relevant real-time information helps you avoid:

  • Assigning an IP that’s already in use
  • Failure to replicate IP address status changes to DHCP and DNS servers
  • Erroneous DHCP configuration changes and IP conflicts caused by DHCP scope overlaps
  • Unwarranted downtime due to troubleshooting of network issues and IP Conflicts
  • Over or under provisioning IP addresses, DHCP scope, and split scope address depletion
  • Errors during DNS record creation

 

Let’s take a look at some of the top IP alerts/data that give admins a heads-up, so they can avoid unexpected network downtime.


IP Conflict! Find and fix it before connectivity issues arise

 

The ‘IP conflict’ is a well-known problem in every network, and there are many reasons that can cause one. The outcome is usually network issues and loss of user productivity. DHCP server errors, duplicate DHCP servers, BYOD, bad IP documentation, human errors, inadequate network segmentation, etc., are various reasons for IP conflicts in a network. Manually troubleshooting IP conflicts can be a very time-consuming process. In turn, users experience significant downtime. Some obstacles that contribute to this include: identifying issues caused by IP conflicts, locating problematic systems, and finally taking the conflicting system off the network.
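One simple way to spot a conflict is to look for an IP address that shows up with more than one MAC address. The Python sketch below assumes you already have (IP, MAC) pairs from somewhere, such as ARP tables gathered from your switches; the function name and the input format are hypothetical.

```python
from collections import defaultdict


def find_ip_conflicts(arp_entries):
    """Return {ip: sorted MACs} for every IP seen with more than one MAC address."""
    macs_by_ip = defaultdict(set)
    for ip, mac in arp_entries:
        macs_by_ip[ip].add(mac.lower())  # normalize case so duplicates collapse
    return {ip: sorted(macs) for ip, macs in macs_by_ip.items() if len(macs) > 1}
```

The MAC addresses in the result point you at the systems to locate and take off the network, which is usually the slow part of manual troubleshooting.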


DHCP Subnets are reaching high utilization -- time to provision for more IP addresses!

 

When DHCP address pools are exhausted, new devices will not be able to connect to the network. In many cases, the administrator is unaware that DHCP scopes are full and that there are no IP addresses left for assignment. In some cases the admin over-provisions, leaving IP addresses unused and hindering the optimal usage of IP address space. Further, if IP documentation is not updated, unused static or reserved DHCP addresses will exist. For example, IPs may have valid leases but no longer be active. All this again means non-availability of IP addresses, leading to interruption in network connectivity and user productivity.
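A utilization check for DHCP scopes can be as simple as the following Python sketch. The scope names, the lease counts, and the 90% and 10% thresholds are hypothetical; pick values that match how quickly your own pools fill up.

```python
def scope_utilization(leased, pool_size):
    """Percentage of the pool that is currently leased."""
    if pool_size <= 0:
        raise ValueError("pool_size must be positive")
    return 100.0 * leased / pool_size


def scopes_needing_attention(scopes, high=90.0, low=10.0):
    """Flag scopes that are close to exhaustion or look over-provisioned.

    `scopes` maps a scope name to a (leased, pool_size) tuple.
    """
    flagged = {}
    for name, (leased, pool_size) in scopes.items():
        used = scope_utilization(leased, pool_size)
        if used >= high:
            flagged[name] = "near exhaustion"
        elif used <= low:
            flagged[name] = "over-provisioned"
    return flagged
```

Flagging both ends covers the two problems described above: pools that are about to run dry and pools where addresses sit unused.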


What IP addresses are in use/available?

 

One of the main IP management challenges admins face is finding IP addresses that are available for use. A frequently used method is to ping for an IP, find one that doesn’t respond, and assume that it is available and then use it. But then this has its own downsides. Some examples are –

  • users pinging for an available IP wouldn’t know if the IP address is a static or a dynamic one
  • the IPs used for test purposes are left as such and even though technically not in use will still be unavailable
  • any conflict with an IP assigned to a critical server can cause serious downtime

Even in cases where IP documentation is manually and separately maintained, most of the time this data is incomplete or obsolete.
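If the documentation of used and reserved addresses is kept current, finding a free address becomes a lookup rather than a round of pings. Here is a minimal Python sketch using the standard `ipaddress` module; the function name and the idea of passing in-use and reserved addresses as plain lists are assumptions for illustration.

```python
import ipaddress


def free_addresses(subnet, in_use, reserved=()):
    """List host addresses in the subnet that are neither in use nor reserved."""
    network = ipaddress.ip_network(subnet)
    taken = {ipaddress.ip_address(addr) for addr in (*in_use, *reserved)}
    return [str(host) for host in network.hosts() if host not in taken]
```

Treating reserved addresses separately is what protects test IPs and static server addresses that would not necessarily answer a ping.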


Looks like the DNS data entered was incorrect...

 

The creation of DNS records is a standard task for administrators. Forward DNS mapping points a domain name to an IP address. Conversely, reverse DNS maps an IP address to a domain name. The two are distinct and separate lookups; just because a forward lookup of a domain resolves to an IP address, it doesn’t mean that a reverse lookup of the same IP address will resolve to the same domain.

Reverse DNS is also commonly used for establishing outbound e-mail server connections. It helps trace the origin of an e-mail and adds credibility to the e-mail server itself. In turn, many incoming mail servers will not accept messages from an IP address that has no matching PTR record in a reverse DNS zone, making it very important to ensure these records are error free.
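A consistency check between forward and reverse records might look like the Python sketch below. To keep it self-contained, it works on two plain dictionaries that stand in for exported zone data; the function name, the dictionaries, and the result strings are all hypothetical.

```python
def check_forward_reverse(name, forward, reverse):
    """Compare a name's forward record with the PTR record of the IP it points to.

    `forward` maps names to IPs and `reverse` maps IPs to names.
    """
    ip = forward.get(name)
    if ip is None:
        return "no forward record"
    ptr = reverse.get(ip)
    if ptr is None:
        return "no PTR record"
    # DNS names are case-insensitive and may carry a trailing dot.
    if ptr.rstrip(".").lower() != name.rstrip(".").lower():
        return "PTR mismatch"
    return "ok"
```

Running this over every mail server name in your zone data is a quick way to catch the missing or mismatched PTR records that cause mail to be rejected.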

 

To make matters worse, the advent of IPv6 and the increase in the number of heterogeneous devices have further contributed to the complexity of IP space management. Administrators have come to the realization that using manual methods and spreadsheets is simply not sufficient. What mechanism do you have in place for timely warnings of your IP address data?
