

 

Are you ready for another IT Blogger Spotlight? We sure are! This month, we managed to catch up with Brandon Carroll, who blogs over at Global Config and can be found on Twitter where he goes by @brandoncarroll. Here we go...

 

SWI: Brandon, Global Config is much more than a blog. How would you describe it in a nutshell?

 

BC: Global Config actually started out as just a blog when I was working on the CCIE Security lab exam. As time went by, though, and as I started doing training on my own I converted the blog into the frontend for my company, Global Config Technology Solutions, which is now a Cisco Learning Partner.

 

SWI: So then, what’s the Global Config blog all about these days?

 

BC: Well, I cover a number of different topics, primarily around network security and Cisco products. I also try to get some exposure to other vendors. For example, I do tutorial posts on how to do certain things with SolarWinds IP Address Manager and Engineer’s Toolset. Really, I only blog about products that I’m able to get my hands on and use personally. Still, since it’s my company, every now and then I sneak in a post about productivity and Mac, iPad and iPhone apps I think are particularly neat or handy.

 

SWI: So, you started blogging back when you were working on the CCIE Security lab exam, but what are some of your favorite topics to blog about?

 

BC: Honestly, I just like to blog about whatever fascinates me. And I know that sounds weird, but sometimes I’m fascinated by a product and other times I’m fascinated by a concept or a topic that is covered in one of the courses I teach. Really, the most enjoyable things to write about are the topics that I come up with rather than the topics that somebody asks me to write about. That doesn’t mean that when my students ask me a question I don’t enjoy blogging the answer, because I really do enjoy that as well. But there’s just something about taking a thought and putting it down in a blog post and then knowing that other people are reading it and finding value in it, knowing that it might be helping them solve a problem.

 

SWI: Interesting. Do you find certain types of posts end up being more popular than others?

 

BC: Typically, my most popular posts are the tutorial posts. I also see quite a bit of interest in posts related to the Cisco ASA. Sometimes I’ll also do posts about great consumer products that end up being pretty popular. For example, last year I did a post about a D-Link product and using it for IPv6 connectivity. It ended up being one of my most popular posts.

 

SWI: So, how’d you get into IT in the first place?

 

BC: When I was 18 or 19, I was trying to become a firefighter. In fact, I joined the Air Force in hopes that I would become a firefighter. After I left the Air Force, however, I found it was very difficult to get a job in that line of work, so I ended up working a number of odd jobs. During that time, I applied for a job at the phone company GTE and was hired as a field technician. As field technicians, not only did we install circuits for customers, but we also had laptops. When they would break, somebody had to fix them, and after some time that somebody ended up being me. I would spend part of my day fixing laptops for the other technicians that I worked with. From there, I transferred into a group called EPG, or the Enhanced Products Group, and it was there that I learned how frame relay, frame relay switches, and ATM switches worked, and I was also introduced to the world of Cisco routers and Cisco networking. That must've been right around 1998 or 1999.

 

SWI: Well, you’ve come a long way since then. As the experienced IT pro you are, what are some of your favorite tools?

 

BC: Oh, there are too many to list. It’s actually really hard to pick my favorites. One thing I tend to do is jump from tool to tool depending on what I’m trying to accomplish. I like SolarWinds Network Topology Mapper quite a bit because as a consultant I can quickly get a map of a customer’s network and compare it to what they tell me they have. I also like SecureCRT and ZOC6, which are terminal applications for the Mac. Of course, there’s Nmap and Wireshark to name a few more.

 

SWI: OK, time for the tough question: What are the most significant trends you’re seeing in the IT industry right now and what’s to come?

 

BC: Software Defined Networking. I think that a controller-based solution will ultimately be what everybody ends up using, or at least how everybody implements their technology, and we’re going to see less and less of this hard-coded configuration of data and control planes on individual devices. I think we’re also going to see a lot more virtualization. I don’t think we’re anywhere near being close to seeing the end of innovation there, and I believe a lot of the newer products that we see in the virtual space are going to be security products. Overall, I think we are in a major transition right now, so being in IT or even starting out in IT at this point in time is going to be very interesting over the next couple of years.

 

SWI: OK, last question: I’m sure running Global Config keeps you pretty busy, but what do you like to do in your spare time?

 

BC: Well, I’m a family man and I like to do things with the kids, so when I’m not working or blogging we like to go camping and ride dirt bikes. We recently bought a truck and a fifth wheel trailer, so we’ve been visiting some local campgrounds. It’s an opportunity to disconnect the phones and teach my young ones what it’s like to play a board game for a couple hours. I don’t think people do that enough anymore.

Centralizing globally distributed corporate IT departments is a large challenge for enterprises. A distributed system not only taxes enterprise resources, but also threatens to impede the efficiency of the growing company. In such organizations, it’s the responsibility of the IT team to manage the infrastructure, technology solutions, and services spread out among thousands of employees across the globe.

 

In addition, IT teams must support all employee-facing technology including networks, servers, help desk, asset management, and more. In short, it’s a tough job to support distributed site locations of varying sizes, mostly because of the large number of networking devices and several hundred systems and datacenters housing both virtualized and physical systems.


Disparate IT Management vs. Holistic IT Infrastructure Management


Large enterprises often end up operating individually by region with each location managing its own infrastructure. This fragmented approach consumes a huge amount of resources and diminishes operational efficiency.

 

Some additional consequences to this approach can include:

 

  • Regional accountability for individual growth versus the company as a whole
  • Absence of global alignment around service availability and monitoring
  • Duplication of efforts—multiple administrators performing the same tasks

 

A better, more efficient approach to managing globally distributed IT departments is to build and unify a team equipped to scale as the company grows. This can only be accomplished by adopting a holistic approach towards IT infrastructure management.

 

A holistic management method provides teams with greater global visibility, alert notification and management, capacity management, service availability, and the ability to measure metrics beyond ‘just uptime’. The key to achieving operational efficiency is to maintain a central access point for all information required for complete visibility, stability and capacity across the global IT infrastructure.

 

In a network of numerous multi-vendor devices and device types, it’s vital that monitoring and maintenance are centralized. It’s important to leverage end-to-end visibility by prioritizing and focusing efforts to understand the nodes that need attention, those that are nearing full capacity, and those that can be left alone for now.

 

By unifying monitoring capabilities, IT teams can increase operational efficiency and function efficiently as one unit as their organizations grow and evolve. Some early advantages of a unified approach include:

 

  • Shared monitoring information that provides faster response to downtime
  • Greater visibility into how changes to business critical devices impact the network
  • Ability to monitor and manage multi-vendor devices under one management umbrella
  • Successfully and confidently meeting end-user SLAs

 

Holistic IT infrastructure management can be achieved by investing in a software solution or tool. It’s important to choose a tool that helps meet the goals of the IT team, is cost effective, and requires minimal training.

 

NetSuite is a leading provider of cloud-based business management software covering enterprise resource planning (ERP), customer relationship management (CRM), e-commerce, inventory and more to over 20,000 organizations and subsidiaries worldwide. NetSuite uses SolarWinds Solution Suite to centrally manage its globally distributed IT organization.

The Heartbleed survey results are in – and the good news is that the vast majority of SolarWinds thwack users are in the know and on top of it (of course, this comes as no surprise!).

 

Here are the results:

 

 

  • Of the 61 respondents surveyed, only 6.6% were not sure whether they were affected by the Heartbleed vulnerability, and 100% were aware of the vulnerability.

  • When asked if their organization had a clear action plan to address Heartbleed, a whopping 81% were either not vulnerable or had fully addressed the vulnerability. Only 5% were still trying to identify steps or didn’t know what to do.

 

 

 

  • The cleanup of Heartbleed has made an impact on IT, but there is confidence in fast remediation for the most part.  Almost half of respondents said it only took hours to address.  30% said days, 11% said weeks and 10% were not sure.

 

 

 

 

 

  • When asked about the overall effort/cost of tasks associated with the cleanup, the largest cited effort was following up with vendors to determine if products were affected. Replacing digital certificates was cited in second place, and addressing customer concerns about the privacy of data was third. Surprisingly, addressing internal concerns about the vulnerability was ranked last, which is either a promising indicator of fast and clear communication as part of the incident response process or a lack of security awareness.

 

 

 

  • And finally, in terms of cleanup effort – the answers were pretty even across operating systems, websites and third party applications.

 

So, to sum up, it’s great to see that in spite of all the hype, while it may have been painful, it wasn’t devastating to most. We put a lot of effort into providing a fast vendor response on our end, and we hope that made it a bit easier for our beloved IT pros out there.

We know mobile devices are must-have tools for your end users and they’re growing more accustomed to having options when it comes to picking their corporate-connected mobile devices. Two end user mobile enablement strategies seem to be leading the pack: BYOD (bring your own device) and CYOD (choose your own device). BYOD, of course, involves end users providing their own device(s) from a virtually unlimited set of possibilities and connecting it to the corporate network and resources. CYOD, on the other hand, involves end users selecting their soon-to-be corporate-connected mobile devices from a defined list of devices that IT supports and can have more control over. The idea being that the burden on you is lessened because you don’t have to be prepared to manage every mobile device under the sun.

 

So we’re curious, has your organization settled on one of these strategies over the other? If so, we’d love to hear about your first-hand experience implementing either of these policies—or a hybrid approach—into your organization, and how your company arrived at the decision to go the route they did. If you have implemented a CYOD policy, what benefits have you seen? Was it successful or did your employees revolt and turn to smuggling in their own devices anyway? I'm looking forward to hearing your feedback.

 

And if you haven’t already taken our BYOD vs. CYOD poll, you can find it here.

Just like network admins keep watch on the configuration changes of networking hardware, it is also important to monitor config changes in a virtual environment. While network configuration errors are definitely a nightmare and can cause the network to go down, config errors in a virtualized host or VM can impact all the servers and applications dependent on them. In a virtual environment, most of the repair time is spent investigating what changed in a system. Unlike network devices, whose config files we want to keep in a good state and not change unless required, the virtual environment is dynamic by nature: VMs migrate between hosts and clusters, and resources are provisioned and reallocated constantly. All of this results in configuration changes. Config changes are also possible during routine processes such as software updates, security patches, hot fixes, and memory, CPU and disk upgrades. If we are not keeping track of what is changing in the virtual layer, we won’t be able to diagnose and troubleshoot performance problems easily.

 

HOW TO DETECT VIRTUALIZATION CONFIGURATION CHANGES?


The best solution is to map the state of your virtual environment and all its components (such as clusters, VMs, hosts and datastores) over time, and maintain a historical record of that map as it evolves and encounters changes. You need to be in a position to compare the current and historical configuration state of a specific VM between different dates and times, and also to compare one VM’s configuration with another over a specific time period. In this way you’ll be able to see what has changed and get visibility for troubleshooting config changes. The checks below (and the short sketch that follows the list) cover the typical cases:

  • Compare configuration of VMs before and after vMotion or live migration
  • Monitor whether configuration of a VM has changed over time
  • Monitor resource allocation (CPU, memory, disk and network) to VMs as this directly impacts VM configuration
  • Monitor VM workload: To meet growing workload the hypervisor can provision more resources to a VM and this could result in a config change
  • VM Sprawl: Zombie/stale/inactive VMs present in the virtual environment can cause resource contention amongst the active VMs and, in turn, cause config changes at the host level
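As a rough illustration of the comparison idea above, here is a minimal Python sketch that diffs two point-in-time snapshots of a VM's configuration. The snapshot dictionaries and field names are hypothetical stand-ins for whatever your hypervisor API or management tool actually returns at each polling interval.

```python
# A minimal sketch (not a product feature): diff two point-in-time snapshots
# of a VM's configuration. The snapshot dictionaries and field names below are
# hypothetical stand-ins for whatever your hypervisor API or management tool
# actually returns.

def diff_vm_config(old, new):
    """Return the fields that were added, removed, or changed between snapshots."""
    added = {k: new[k] for k in new.keys() - old.keys()}
    removed = {k: old[k] for k in old.keys() - new.keys()}
    changed = {k: (old[k], new[k]) for k in old.keys() & new.keys() if old[k] != new[k]}
    return {"added": added, "removed": removed, "changed": changed}

# Example: snapshots taken before and after a vMotion plus a memory upgrade.
before = {"host": "esx-01", "vcpu": 2, "memory_mb": 4096, "datastore": "ds-prod-01"}
after = {"host": "esx-02", "vcpu": 2, "memory_mb": 8192, "datastore": "ds-prod-01"}

for kind, fields in diff_vm_config(before, after).items():
    for field, value in fields.items():
        print(f"{kind}: {field} -> {value}")
```

Running this against the two hypothetical snapshots reports the host and memory changes, which is exactly the kind of before/after answer you want on hand when troubleshooting.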

 

Virtualization config changes are part of the package and there are bound to be changes as VMs keep moving and migrating. This doesn’t mean VM migration is risky. Flexibility of VM movement is one of the benefits achieved with server virtualization and we should leverage it. Employ the right virtualization management tool to ensure your virtualization environment is mapped, and configuration changes are captured and reported!


Like everyone else, I am very busy at my job, and I tend to be a jack of all trades. I find myself running from fire to fire, getting things under control. I wish I had time to be more proactive and do everything according to the textbook. But the reality of a short-staffed global manufacturing company doesn’t always allow for such luxuries.


Sometimes things are insanely busy and I don’t know how I get through the day, but every once in a while there is a lull in the action, and it’s during these times I try to focus on making my network management station better and catch up on a lot of proactive maintenance. I know that in the long run, every hour I spend on the NMS will be multiplied many times over in stress savings and performance improvements.


These lulls don’t happen very often, and when they do, I don’t have months to learn an overly complex, high-priced NMS that requires a degree in computer programming in order to get the simplest tasks done. I know what I need to get done. I just need to translate that into a task in the NMS.


That’s why I really like the SolarWinds suite of network management products: they are cost effective, easy to use, easy to install, and very easy to run and maintain. And if I ever have an issue, I can just jump on over to the thwack community and find what I’m looking for quickly and efficiently. There are so many great products from SolarWinds.


They are a company that truly gets it. They understand that I don’t have a lot of time to spend on learning SQL programming, Oracle databases, or Perl reports. I just want to monitor my network with a user-friendly graphical user interface. A few clicks here and there and I am literally up and running.


Network management is not the main part of my job; in fact, it’s a rather small part. I need to run and protect the business and keep the lights on and the machines running. There is such a huge return on investment with Orion; it has saved my bacon many times. While nothing is perfect, I have to say that I am very happy and sleep better at night knowing I’ve got someone watching my back. I hate being woken up by the pager at 3am, but that happens a lot less nowadays. I know now what the problem is and get to work on fixing it, instead of fumbling around wondering what happened and having people ask me for status every 5 minutes.


How are you using your NMS? Is it your full-time job? I think most people are like me: they have a lot of other things to get done. I am so thankful the product is so easy to use; it really makes my job that much better.


It’s hard to fix something if you don’t know what is broken, and my NMS is my eye in the sky. Those help desk tickets won’t stop coming in. The phone doesn’t stop ringing, but hey, I’ve got this. How about you?

So far this month, we've talked about the difficulty of monitoring complex, interconnected systems; the merging of traditional IT skills; and tool sprawl. You've shared some great insights into these common problems. I'd like to finish up my diplomatic tenure with yet another dreaded reality of life in IT: Root Cause Analysis.

 

Except... I've seen more occurrences of Root Cause Paralysis lately. I'll explain. I've seen a complex system suffer a major outage because of a simple misconfiguration on an unmonitored storage array. And that simple misconfiguration in turn revealed several bad design decisions that were predicated on the misconfiguration. Once the incident was resolved, management demanded a root cause analysis to determine the exact cause of the outage, and to implement a permanent corrective action. All normal, reasonable stuff.

 

The Paralysis began when representatives from multiple engineering groups arrived at the RCA meeting. It was the usual suspects: network, application, storage, and virtualization. We began with a discussion of the network, and the network engineers presented a ton of performance and log data from the morning of the outage to indicate that all was well in Cisco-land. (To their credit, the network guys even suggested a few highly unlikely scenarios in which their equipment could have caused the problem.) We moved to the application team, who presented some SCOM reports that showed high latency just before and during the outage. But when we got to the virtualization and storage components, all we had was a hearty, "everything looked good." That was it. No data, no reports, no graphs to quantify "good."

 

So my final questions for you:

 

  1. Has this situation played out in your office before?
  2. What types of information do you bring with you to defend your part of the infrastructure?
  3. Do you prep for these meetings, or do you just show up and hope for the best?

 

Go!

Before we discuss the “how” aspect of cleaning up the firewall rule base, let’s understand the “why” that creates the need for clean-up in the first place.

  • Firewall Performance Impact: The firewall rule base is something that always tends to grow as network admins and security admins keep adjusting it to address firewall policy changes. If left unchecked, your firewall rule base can swell to hundreds or even thousands of rules, which makes it harder for the firewall to process and leads to reduced performance.
  • Firewall Configuration Errors: With complex rule sets, there is the possibility of unused rules and duplicate rules causing config errors. Given the massive size of the rule base, it becomes more difficult for the administrator to figure out the cause of the error and rectify it.
  • Security Vulnerability: An unmanaged and unchecked firewall rule base can contain rules and objects that open up a security gap in your network. You may not have intended them to be there, and you may never know these old and unused rules in your firewall pose a threat to your network access control.
  • Regulatory Compliance Requirements: Compliance policies such as PCI DSS require cleaning up of unused firewall rules and objects. According to PCI DSS 3.0 requirement 1.1.7, firewall and router rule sets have to be reviewed at least every six months.

 

So, it falls to the administrator to identify redundant, duplicate, old, unused, and shadowed rules and remove them from the rule base to achieve optimized firewall performance. Let’s discuss how you can do this.

 

STRUCTURAL REDUNDANCY ANALYSIS

Structural redundancy needs no additional data and is based on identifying rules that are covered by other rules and have the same action (redundant rules), or the opposite action (shadowed rules). In either case, a rule that is redundant or shadowed is a candidate for elimination. You can employ an automated firewall management tool to conduct a structural redundancy analysis to identify redundant rules. Automated tools help you generate a report and even a clean-up script. In addition to the redundant and shadowed rules, you should also find the rules that cause their redundancy, unreferenced objects, time inactive rules, disabled rules, and so on.
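To make the "covered by another rule" idea concrete, here is a minimal Python sketch of a structural redundancy check. It assumes a deliberately simplified rule model (one source, one destination, and one port field per rule, with "any" as a wildcard); real firewall analysis tools evaluate full address and service object groups, but the covered-by-an-earlier-rule logic is the same idea.

```python
# A minimal sketch of a structural redundancy check over a simplified rule
# model. Rules are evaluated top-down: a later rule fully covered by an earlier
# rule is redundant (same action) or shadowed (opposite action).

from dataclasses import dataclass

@dataclass
class Rule:
    name: str
    src: str      # e.g. "10.1.1.0/24" or "any"
    dst: str
    port: str     # e.g. "443" or "any"
    action: str   # "permit" or "deny"

def covers(a, b):
    """True if field value a matches everything field value b matches."""
    return a == "any" or a == b

def analyze(rules):
    for i, later in enumerate(rules):
        for earlier in rules[:i]:
            if all(covers(getattr(earlier, f), getattr(later, f)) for f in ("src", "dst", "port")):
                verdict = "redundant" if earlier.action == later.action else "shadowed"
                print(f"{later.name} is {verdict} (covered by {earlier.name})")
                break

analyze([
    Rule("R1", "any", "10.0.0.10", "443", "permit"),
    Rule("R2", "10.1.1.0/24", "10.0.0.10", "443", "permit"),  # redundant: R1 already permits this
    Rule("R3", "10.2.2.0/24", "10.0.0.10", "443", "deny"),    # shadowed: R1 permits it first
])
```

In this toy rule base, R2 is flagged as redundant and R3 as shadowed, which are exactly the candidates a clean-up report would surface.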

  

LOG USAGE ANALYSIS

Log usage analysis identifies rules and objects that can be eliminated based on zero usage, as determined from log data. Firewall management tools generally use two techniques to gather log data: the first uses log data files, the second sets up log data collection directly from the device or management server. Here again, a report and clean-up script are generated.

  

For both cases, you can run the script to remove the identified rules and objects from the firewall rule base. It’ll be more effective if you conduct the log usage analysis first, and clean up unnecessary rules.  The cleaned up rules may be removed from the configuration or disabled. Then the structural cleanup report can be generated to identify additional rules that can be removed.

 

WHAT RULES TO CLEAN UP?

  • Redundant or duplicate rules slow firewall performance because they require the firewall to process more rules in its sequence
  • Orphaned or unused rules make rule management more complex, which creates a security risk by opening up a port or VPN tunnel
  • Shadowed rules can leave other critical rules unimplemented
  • Conflicting rules may create backdoor entry points
  • Unnecessarily bloated firewall rules can complicate firewall security audits
  • Erroneous or incorrect rules with typographical or specification inaccuracies can cause rules to malfunction

 

LEARN MORE

FSM WP.png

A database admin has responsibilities to fulfill around the clock, from ensuring databases are backed up, attending to breakdowns of applications affecting database performance, and verifying the accuracy of information within the organization’s databases, to constantly monitoring the entire database server. Fulfilling all these responsibilities is what makes a DBA one of the most valuable players in an organization. On any given day, database admins have a set of routine tasks to attend to. These include:

          

SQL Server® Logs

DBAs view SQL logs to see whether SQL Agent jobs have completed all required operations. If a job status is incomplete, this can lead to errors within the database. Looking at SQL logs regularly ensures an issue or a database error doesn’t go unnoticed for an extended time period. Login failures, failed backups, database recovery time, etc. are key fields a DBA looks for in SQL logs. Looking up SQL logs is beneficial, especially when you have critical databases in your environment.
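As a rough illustration, the Python sketch below scans the current SQL Server error log for a few of those key phrases. The server name and connection string are hypothetical, and xp_readerrorlog is an undocumented (though widely used) stored procedure, so treat this as a starting point rather than a supported interface.

```python
# A rough sketch, not a supported interface: scan the current SQL Server error
# log for a few key phrases. The connection details are hypothetical, and the
# column layout returned by xp_readerrorlog can vary by SQL Server version.

import pyodbc

conn = pyodbc.connect(
    "DRIVER={ODBC Driver 17 for SQL Server};SERVER=sql-prod-01;"
    "DATABASE=master;Trusted_Connection=yes;"
)
cursor = conn.cursor()

# Args: log number (0 = current) and log type (1 = SQL Server error log).
cursor.execute("EXEC master.dbo.xp_readerrorlog 0, 1")

watch_phrases = ("Login failed", "BACKUP failed", "Recovery of database")
for log_date, process_info, text in cursor.fetchall():
    if any(phrase in text for phrase in watch_phrases):
        print(f"{log_date}  {process_info}  {text}")

conn.close()
```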

       

Performance Tuning

In order to fully maximize the potential of the database server and also ensure applications don’t have downtime due to a SQL issue, it’s become a best practice for DBAs to monitor SQL server performance and metrics. Whether an issue is due to an expensive query, fragmented indexes, or database capacity, DBAs can set up optimum baseline thresholds. In turn, they’re notified whenever a metric is about to reach the threshold. In addition, it helps to glance through these metrics to see workloads and throughput so you can adjust your database accordingly.
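One simple way to think about a baseline threshold is "typical value plus a few standard deviations." The sketch below shows that idea with hypothetical hourly samples; real monitoring tools build baselines in far more sophisticated, time-of-day-aware ways, so the metric, values, and sigma multiplier here are illustrative only.

```python
# A minimal sketch of a baseline-derived threshold: mean plus three standard
# deviations of recent history. All values are hypothetical.

from statistics import mean, pstdev

def baseline_threshold(samples, sigmas=3.0):
    return mean(samples) + sigmas * pstdev(samples)

history = [412, 395, 430, 401, 388, 420, 415, 398]  # hypothetical hourly averages
threshold = baseline_threshold(history)

latest = 497
if latest > threshold:
    print(f"ALERT: latest value {latest} exceeds baseline threshold {threshold:.1f}")
else:
    print(f"OK: latest value {latest} is within baseline threshold {threshold:.1f}")
```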

            

Database Backup

DBAs have to regularly test backups and make sure they can be restored. This keeps them free from issues pertaining to application and user backups if those have been deployed to a different host, server, or datacenter. DBAs also regularly test backups because this helps them verify that they’re staying within their SLAs.

             

Reporting & Dashboard

As the size of a database grows, the complexity of maintaining and monitoring it also grows. Database issues have to be addressed as soon as possible. Therefore, DBAs need real-time data on SQL performance before a disaster occurs. For this reason, having up-to-date information in the form of reports and dashboards provides visibility into the server, hardware resources, SQL queries, etc. DBAs need access to reports for database size, availability and performance, expensive queries, transaction logs, database tables by size, and so on.

           

Activities such as database maintenance, data import/export, running troubleshooting scripts, etc. are other areas DBAs focus on and spend their time. To manage and optimize SQL server, it’s essential to consider using a performance monitoring tool which comprehensively monitors vital metrics within your SQL server and simplifies your “to do” list of activities.

           

Read this whitepaper to learn more about how to monitor your SQL Server.

I had an opportunity recently to interview a long-time SolarWinds Server & Application Monitor (SAM) customer, Prashant Sharma, IT Manager, Fresenius Kabi (India).

   

KR: As an IT manager, what are your primary roles and responsibilities?

PS: I’m in charge of the whole of IT where I manage the IT environment, look at IT security, data center performance, monitoring, and also application development.

        

KR: What other SolarWinds products do you currently own other than SAM and how are you using SAM?

PS: Other than SAM, we currently have NPM and NTA. For monitoring IT security, we use RSA, but I’m not happy with RSA as it’s too expensive to maintain, and we are looking at replacing it with LEM. We have around 98 nodes, and we use SAM to monitor the performance of servers. We monitor critical applications like SQL, SharePoint, IIS, and AD, and we also have custom and out-of-the-box applications we monitor for R&D using SAM. Since we have a huge R&D center, we monitor applications in both development and production environments. We use the built-in integrated virtualization module within Orion to monitor our virtual environment, as we are a VMware shop.

        

KR: Why did you choose SAM and what other products did you look at before narrowing down on SAM?

PS: We chose SolarWinds products because they are easy to implement and troubleshoot. We were actually able to set up in about an hour. We have also never had to reach out to SolarWinds for any issues in the last 4 years of owning the product. SAM is cost-effective software, and we have all the features that we want in the product. We also evaluated other products from CA and WhatsUp Gold, but ended up going with SolarWinds as it was fairly simple and straightforward. We also own NPM and NTA, and are able to monitor for issues easily with those as well.

           

KR: How was it before using SAM and how have things changed right now?

PS: SolarWinds is the only company we use for monitoring. Before SAM we didn't know how to troubleshoot issues; we didn't know where to go to look for issues and solve them. Now we are able to identify performance issues more easily with SAM and are able to save a lot of time and money. I also leverage thwack to find answers to anything I need from other IT pros.

 

Learn more about Server & Application Monitor

I work at a global automotive manufacturer with offices all over the globe, and one of the ways we use our network management system is for capacity planning. For example, if I have a plant in Beijing, China, I can set an alert to let me know when a circuit is going over a predefined threshold, and when it does, I can start the process of ordering a new circuit. As you might imagine, the logistics of ordering a circuit in China are full of all kinds of red tape, and it is going to take a while to get into place. I can use all the head start I can get, and Orion gives me the lead time I need to get things done.
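For anyone curious what that threshold check looks like under the hood, here is a minimal Python sketch that turns two interface octet counter samples into a utilization percentage and compares it to a capacity-planning threshold. The sample values, polling interval, and 70% threshold are all hypothetical; a real poller reads ifHCInOctets and ifSpeed over SNMP on a fixed schedule rather than hard-coding numbers.

```python
# A minimal sketch (hypothetical values throughout): compute circuit
# utilization from two octet counter samples and flag it against a
# capacity-planning threshold.

def utilization_pct(octets_prev, octets_now, interval_sec, if_speed_bps, counter_bits=64):
    delta = (octets_now - octets_prev) % (2 ** counter_bits)  # tolerate counter wrap
    return (delta * 8 * 100.0) / (interval_sec * if_speed_bps)

# Two samples taken 300 seconds apart on a 10 Mbps circuit.
util = utilization_pct(1_250_000_000, 1_330_000_000, 300, 10_000_000)

THRESHOLD_PCT = 70.0
if util > THRESHOLD_PCT:
    print(f"Circuit above {THRESHOLD_PCT}% ({util:.1f}%): time to start the ordering process")
else:
    print(f"Circuit at {util:.1f}% utilization")
```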

 

Another thing I really like an NMS for is being able to monitor manufacturing plants all over the globe, where the main job is to build cars, not information technology. Any outage can cost thousands to millions of dollars, and the pressure is on. Orion helps me do my job, so I can help the workers do their job. This way we can deliver enterprise-class IT support with a skeleton crew at the actual site.

 

Some network management systems are hard to use, cost hundreds of thousands of dollars, and require a PhD to figure out. Orion is so easy to use, and there is so much help from the online community: just click here, input a bit of management information, and you’re up and running. It’s not all roses and chocolate, but you can be up and running in a few hours or a few days. This is in contrast to a few months or years with other overly complicated network management systems that require plenty of well-paid consultants eating up the project’s budget.

 

One particular incident I can recall recently: the switching blades in our chassis switches turned out to have a manufacturing defect in them. This defect only manifested itself above a certain temperature. When the defect hit, the blade just shut down. Production stopped, pagers went off and phones rang; everyone wanted to know when the network was going to be back up again, and why it went down. I got the blade RMA’ed to the manufacturer, and the problem was solved.

 

I then found out this was a widespread defect, and the only way to tell if a blade was affected was to send its serial number to the manufacturer. Network management to the rescue: I was able to gather all the serial numbers from all the blades on all the switches globally. Let me tell you, hundreds of blades were part of this problem, and they are now being scheduled for replacement instead of waiting for them to fail. The NMS saved my hide this time; a massive failure like that would not have been fun. Orion has paid for itself so many times over, it’s just a no-brainer, a necessary tool of the trade.

 

To summarize, network management is good for capacity planning, monitoring sites without a full IT staff, and network inventory, so you know how your network is put together. And finally, Orion is just so easy to use. Just set it up and let it do its job. I couldn’t imagine how I would do my job without it and stay sane.

 

It’s so much better to be proactive than reactive. While nothing is perfect, an NMS can make your job a lot easier and maybe even a bit of fun. It’s the shock absorber that helps you over the potholes of an IT career, so relax and enjoy; it’s going to be a smoother ride from here on out.


Too Many Tools

Posted by michael stump, Apr 21, 2014

I'll make an assumption: if you're a Thwack user and you're reading this post, you've got an interest in systems and applications monitoring. Or maybe you just really want those points. Or that Jambox. Whatever's cool with me.

 

But if you're a tools geek, this post is for you.

 

Tool suites aspire to have the ability to monitor everything in your infrastructure, from hardware, through networking and virtualization, right up to the user. But to be honest, I've never seen a single solution that is capable of monitoring the entire environment. Many times this is not a technological problem; organizational structures often encourage multiple solutions to a single problem (e.g., every team purchases a different tool to monitor their systems, when a single tool (or a subset of tools) would suffice). Other times, tool sprawl is the result of staff or contractor turnover; everyone wants to bring the tools they know well with them to a new job, right?

 

Tool sprawl is a real problem for IT shops. The cost alone of maintaining multiple tools can be staggering. But you also run into the problem of too many tools. You'll likely end up monitoring some systems many times over, and will most certainly miss other systems because there's confusion over who is monitoring what. As a result, you'll end up with more tools than you need, and the job still won't be done.

 

How do you manage or prevent tool sprawl at work? Do you lean on a single vendor, like SolarWinds, to address every monitoring need that arises? Or do you seek out best-of-breed, point solutions for each monitoring need, and accept the reality that there is no single pane of glass?

It's Good Friday here in the US and many companies are on holiday today. Some will have Easter Monday as a day off instead. I suspect much of the working world is enjoying at least a three day weekend.

 

And that got me thinking about how we should all have longer weekends.

 

I'm not suggesting that we only work four days a week (but if my bosses think that's a great idea, then I'll take credit for it). No, what I am thinking about is how often my life has been interrupted during a weekend by someone needing help. As a production DBA for 7+ years, with 5 of those years as the team lead, I lost many, many hours on weekends due to "emergencies" that were anything but. It wore me down, and wore me out.

 

So that's going to be my number one goal here at SolarWinds: to give customers longer weekends. Everything I do here at SolarWinds will be with that one purpose in mind.

 

I want our customers to have what I didn't: the opportunity to spend uninterrupted time with their families when they are away from the office.

 

Enjoy the long weekend!

The Heartbleed bug is a vulnerability that’s compromising Internet applications like Web, email, and instant message communication. However, recent revelations indicate that there’s more to this threat. Heartbleed has recently been found to also affect connected devices that rely on the OpenSSL encryption library, including network hardware like routers and switches. Networking vendors such as Cisco, Juniper Networks, F5 Networks, and Fortinet have all issued security alerts indicating this risk.

 

OpenSSL is a widely used software library that implements SSL (Secure Sockets Layer) and TLS (Transport Layer Security) encryption. OpenSSL 1.0.1 before 1.0.1g does not properly handle Heartbeat extension packets, which allows remote attackers to obtain sensitive information. This information can include private keys, usernames and passwords, or encrypted traffic from process memory, via crafted packets that trigger a buffer over-read. This creates a huge vulnerability that allows attackers to infiltrate even large networks.

 

Heartbleed Remediation in Your Network

 

OpenSSL, being a widely used implementation of SSL, is difficult to fully remediate. However, your immediate action should be to update affected systems to the fixed version, i.e., 1.0.1g or newer. (A quick way to confirm where a given host stands is sketched after the steps below.)

 

To remediate Heartbleed in 3 simple steps:

  1. Change passwords for all devices (before & after patching, to be absolutely sure that no attacker sneaks in...)
  2. Patch your network operating system for all perimeter hosts
  3. Purge bad OpenSSL versions from your entire infrastructure
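Here is a minimal Python sketch of that version check for a single host, using the standard library's ssl module to report which OpenSSL the local runtime is linked against. It only covers one copy of OpenSSL on one machine; appliances and applications that bundle their own OpenSSL still have to be checked with their vendors.

```python
# A minimal sketch: report the OpenSSL version the local Python runtime is
# linked against and whether it falls in the Heartbleed-affected range
# (1.0.1 through 1.0.1f). Other OpenSSL copies on the host are not covered.

import ssl

print("Linked OpenSSL:", ssl.OPENSSL_VERSION)

major, minor, fix, patch, _status = ssl.OPENSSL_VERSION_INFO
# 1.0.1 with no letter has patch 0; 1.0.1a-f are patch 1-6; the fixed 1.0.1g is 7.
affected = (major, minor, fix) == (1, 0, 1) and patch < 7

print("Potentially Heartbleed-affected:", affected)
```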

 

It’s important to contact the vendors of your devices that connect to the Internet. You need to find out if those devices rely on OpenSSL and ask if there is a patch available. In addition, refrain from using any affected applications or devices, and apply any updates as soon as possible.

 

Junos OS affected by OpenSSL "Heartbleed" issue – Juniper


Cisco has also released a list of affected and vulnerable products.

 

For a network with 100s to 1000s of devices, it’s no small task to push network OS and firmware updates/patches in bulk. Using an automated tool to quickly take action and apply software fixes on all devices in the network will definitely save the network admin time and enable quicker turn-around times (TATs) to address sudden vulnerabilities such as Heartbleed.

 

Also, note that most vendors are still working on providing fixed versions of code for their products, so patching your devices with new updates should be on your to-do list for quite some time.

 

Note for SolarWinds customers: Please take a look at this table to check the Heartbleed vulnerability against the product(s) you use.

 

Quick tip: Click here to download IOS upgrade templates from thwack. Just type 'upgrade' in the 'filter by tag' box.

 


Application performance monitoring (APM) is a broad subject which looks at how businesses use enterprise-level applications. These applications help meet end-user requirements and must continuously maintain high availability to ensure optimal performance. Furthermore, organizations that depend on APM technology to scale certain areas within their business must understand that innovation plays a vital role. After all, CIOs and other decision makers will want to look at what the ROI is over a period of time.

      

The Role of APM Software

Organizations with a sizeable IT infrastructure need APM software to effectively manage IT assets, to ensure they last a certain amount of time, deliver from the time they’re set up, and so on. For example, say you’re an enterprise with 1000+ physical and virtual servers. These servers host mission-critical applications that need to support a certain business group. As IT pros, it’s your duty to ensure the server hardware stays healthy and applications are always available without downtime. Managing this manually isn’t an option since there are several nuts and bolts you’ll have to look at. Moreover, you also have to support and manage other areas within the environment.

   

Having an APM tool means you can automate availability and performance management for your servers and applications. In addition, APM tools offer various benefits, for example, getting automatically notified when something goes wrong with servers and apps. Within minutes, APM tools help pinpoint where an issue originates, and they can monitor application performance in phases before applications go live in a production environment. In turn, you can fix minor issues before the end user starts pointing them out, and much more.

      

Where APM Fits

APM as a technology has evolved from primarily monitoring only the set of applications that IT uses. Today, this technology has a significant impact on various groups in an organization. For example, industry-specific users leverage APM to manage their business needs and address everyday challenges, while IT pros look for tools that go deeper to manage the performance of business-critical applications like Exchange and SQL Server®.

     

Manage Critical Applications: Several times a week, IT pros are asked to help users by unlocking their accounts. APM tools these days not only monitor Active Directory® metrics, but also have built-in functionality to manage logs that are generated by critical applications.

     

Manage Custom Applications: Industries like healthcare and financial services are largely dependent on APM tools to help improve customer support, help streamline auditing and compliance processes, manage large amounts of customer data, and so on. For example, the banking industry may have poor customer satisfaction when sites are slow to respond to user requests. Monitoring online transactions based on response time, traffic, etc. will help business groups streamline their system.

    

Manage Overall IT Infrastructure: It’s not enough for IT personnel to only know the performance of a network. To really identify the root of the issue, IT needs APM to figure out whether the network is at fault or whether the problem has to do with inadequate hardware resources or whether an app failure has caused end-user issues.

      

Mobile IT Management: Beyond accessing email via a smartphone, IT organizations and business groups feel the need to have critical applications at their fingertips. Using an APM solution on the go means that instant notifications about components with issues can be routed to the right teams so they can be fixed in real time, in a matter of minutes.

       

Role of APM in Analytics: An APM tool gives you different types of information on the performance of your servers and hardware, operating systems, critical applications, databases, etc. Making sense of this data is essential to be able to determine problems that may arise. 

       

Manage Web Applications: Getting visibility into the performance of your websites and Web applications can help you quickly pinpoint and resolve the root cause of issues. APM helps you determine if there are constraints on resources, such as Web server, application server, or databases.

          

Manage Virtual Environments: Organizations may have virtual admins managing the health of virtual appliances. Virtual admins also need visibility into how applications running in VMs are performing. APM also allows you to plan capacity management for a given application and its underlying resources.

        

Whether it’s analytics, cloud-based apps, or managing various assets within your IT infrastructure, APM fits well and more often than not, provides assistance in managing your IT environment.

A question often encountered by new users of Patch Manager concerns the purpose and uses of the Managed Computers node in the console.


The Managed Computers node is the collection of all Patch Manager servers, registered WSUS servers, and any machines that have been targeted for an inventory task, regardless of whether the machine was successfully inventoried. As the inventory task obtains the list of machines from the target container, a record is created in the Managed Computers list for that machine. When the inventory is successfully completed, a number of attributes are displayed for the machine in the Managed Computers node.

 

The Managed Computers node is especially useful for accessing basic diagnostic information about the status of computers and the inventory process. In the Computer Details tab for each machine, five state results are provided that describe the results of the inventory connection attempt.

 

When an inventory task is initiated, the Patch Manager server queries the container object that the inventory task has been targeted to. This may be a domain, subdomain, organizational unit, workgroup, WSUS Target Group, or Patch Manager Computer Group. Regardless of the type of container, the Patch Manager server obtains a list of machine names from the identified container.

 

Failed Inventory Connections

An entry in the Managed Computers node with an icon containing a red circle indicates a machine that failed the most recent inventory connection attempt.

 

DNS resolution attempt reports the status of the attempt to resolve the computer name obtained from the container. If the name was resolved, the IP Address is captured and stored in the computer record and the status is reported as “Success”. If the name was not resolvable, the status is reported as “Failed”.

 

ARP resolution attempt reports the status of the attempt to resolve the IP Address obtained from the DNS resolution attempt. If the ARP resolution attempt is successful, the MAC address is captured and stored in the computer record, and the status is reported as “Success”. If the ARP resolution attempt was not successful, the status is reported as “Failed”.

 

ARP is a broadcast-based network technology and generally does not cross broadcast boundaries, which include routers, bridges, gateways and VLANs. As such, when performing ARP resolution for IP Addresses on the other side of a gateway, it’s important to note that the gateway will respond with its own MAC Address, as the purpose of ARP is to identify where network packets should be addressed to get the packet on the correct pathway to its destination. Patch Manager knows whether a MAC address returned is the MAC address of a boundary device or the actual targeted device. When a boundary device is identified as the owner of a resolved MAC Address, Patch Manager will not record that MAC address and will report the ARP resolution as “Failed”. Thus, it is a normal indication for machines on remote networks to have a status of “Failed” for the ARP resolution attempt, except where an Automation Role server is physically present on that remote network. (See Patch Manager Architecture - Deploying Automation Role Servers and How-To: Install and Configure a Patch Manager Automation Role Server for more information about the use of Automation Role servers.)

 

Endpoint Mapper connect attempt reports the status of the attempt to connect to the RPC Endpoint Mapper on port 135. When the status of this event is reported as “Failed”, and the status of DNS and ARP resolution events are reported as “Success”, this is generally the result of an intervening firewall blocking traffic on port 135.

 

File and Printer Sharing connect attempt reports the status of the attempt to establish a file sharing session on port 445 using SMB over IP. When the status of this event is reported as “Failed”, either an intervening firewall is blocking port 445, or the File and Printer Sharing service may not be enabled. Comparing the results of the Endpoint Mapper connect attempt can shed additional light on the situation. It’s also important to note that File and Printer Sharing is only needed to deploy/update the Patch Manager WMI Providers. If the WMI Providers are deployed, a failure here will not negatively impact the completion of the inventory task.

 

WMI connect attempt reports the status of the attempt to establish the WMI session. When this event is reported as “Failed”, you should check firewall configurations as well as the credentials configured in the assigned credential ring. If using a local account to access the machine, the password stored for the credential may not match the password configured on the machine’s local account; also confirm that the chosen credential does have local Administrator privileges on the target machine.
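For readers who want to spot-check the first few of these connection steps outside of Patch Manager, here is a minimal Python sketch that performs DNS resolution and then tests TCP reachability of the RPC Endpoint Mapper (port 135) and File and Printer Sharing (port 445) for a list of hosts. The hostnames are hypothetical, and the ARP and WMI steps the product performs are omitted because they need OS- or library-specific support.

```python
# A minimal sketch mirroring the first few inventory connection checks:
# DNS resolution, then TCP reachability of ports 135 and 445.

import socket

def check_host(name, ports=(135, 445), timeout=3.0):
    result = {"host": name, "dns": "Failed", "ip": None}
    try:
        result["ip"] = socket.gethostbyname(name)
        result["dns"] = "Success"
    except socket.gaierror:
        return result  # name didn't resolve, nothing further to test
    for port in ports:
        try:
            with socket.create_connection((result["ip"], port), timeout=timeout):
                result[port] = "Success"
        except OSError:
            result[port] = "Failed"
    return result

for target in ("wkstn-042.corp.example.com", "srv-files-01.corp.example.com"):
    print(check_host(target))
```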

 

Partially Successful Inventory Connections

A machine with a yellow triangle icon indicates a machine that was successfully inventoried, but one or more issues occurred while attempting to access a specific datasource or object. The specific objects that were impacted will be listed at the bottom of the Computer Details tab.

 

Tabs & Graphics

There are four tabs provided on the Managed Computers node that display the statistical results of various steps in the inventory collection process. Double-clicking on any graph segment will launch the corresponding report listing the machines affected.

 

The Computer Inventory Details tab shows the specific datasources collected, and the timestamp of the last successful collection.

 

The Connectivity Summary graph shows the number of systems targeted for inventory, the number of systems accessible via WMI, the number of systems presumed to be powered off (or otherwise not reachable), and the number of systems that are reachable, but could not be accessed with WMI.

 

The Connectivity Failure Summary graph shows the number of systems which failed at any of four of the five steps of the connection process: DNS resolution, ARP resolution, RPC Endpoint Mapper connectivity, and File and Printer Sharing connectivity. It also contains results for a NetBIOS connection attempt which is also performed during the inventory connection process.

 

The WMI Failure Summary graph shows the number of systems which failed WMI connectivity for the three most commonly occurring reasons: [1] Access Denied, [2] Firewall blocked or WMI disabled, and [3] other WMI failures that are unidentifiable or not attributable to any other known cause.

 

In addition, the Managed Computers node also provides a Discovery Summary graph which shows the number of machines that were accessible on selected ports tested during a Discovery event. Discovery is a process by which devices and machines are identified by IP address, and network accessibility is identified by TCP port availability.

 

For more information about the features of Patch Manager, or to download your own 30-day trial, please visit the Patch Manager product page at SolarWinds.

Now that I’ve had a chance to settle in after five weeks on the road, I wanted to take a moment to write about our visit to Interop in Las Vegas this year.  While SolarWinds always exhibits at Cisco Live including a couple of international editions, and a host of other events, we haven’t been to Interop since 2007.  In a way it’s an anniversary of sorts for me; my first time staffing our booth was at Interop 2007.
It Was Smaller but Better Than 2007
Interop was a bit smaller than during our previous visit, but the majority of the absent vendors weren’t really missed, putting the focus back on real admins solving issues. After working CeBIT this spring, if I never have to walk past another row of 10x10s featuring exactly the same-looking WICs from vendors I’ve never heard of, I’ll be happy. With many of the single-issue solution vendors gone, most attending were companies like SolarWinds that know what they’re doing, have been around for a while, and solve real problems without vaporware. There were a few startups looking to snag their first whale, but by and large it was a great environment for attendees to take their time and engage vendors in low-pressure conversations about... well, geek stuff, like how products really work, not the pretty graphics on the booth.

Another outstanding feature of the show was the network. I’ll put this in bold because it’s earned. The Interop Network Team (InteropNet) delivered the greatest, most stable and highest performing network of any tradeshow I’ve ever connected the booth to: 60 down, 12 up, with <12 ms latency and almost no jitter. DHCP terminated into Fe0/0 on my trusty 2800, done and done. I don’t understand why everywhere else exhibitors have to pay $1900 for ADSL speeds. That it’s staffed largely by volunteers is all the more amazing. Great job guys!

Mandalay Bay – Most Improved Hotel Internet

Once upon a time the casinos did everything they could to keep you inside, including rumors of cell phone interference and certainly poor WiFi. Apparently the word is out that BYOD isn’t just for enterprises, and that humans prefer to hang out where they have good internet. For my last six shows there, Mandalay Bay has been a black hole of connectivity, especially for geek events when we arrive en masse to pound the APs. This year was completely different.

BYOD rule #1: it has to be stupid easy to use. For example, if you own a huge structure hundreds or thousands of feet from other networks, you don’t need passwords on your SSIDs. Great improvement there. Also, use your controllers and thin clients to shut down rogue networks. My Dell, Air and phone were all online and happy in moments. BYOD rule #2: create an edge where the network is available but access is controlled, as long as it’s still pretty easy to use. In all the areas outside the rooms, including the casino floor, they now offer free 3m/1m with your name and room number. Smarter, they also upsell to even faster speeds. Provide visitors with bandwidth or we go elsewhere. They got the message.

NSX Was (REDACTED), More from Cisco Live

I’m a huge ESX fan and love the idea of a combined console that virtualizes everything.  I’ll talk about this more in another column because it’s really important, but want to chat more at Cisco Live and balance that discussion with Cisco ACI.  I’m not saying I was unimpressed with the technology for SMBs, but more research is needed before prognostication about the future of the non-huge datacenter.
The IT Beast Is Dead, Long Live the IT Beast

With all the great customer conversations, and special sights in the booth, like watching customers spontaneously demo our products to non-customers on the big screen, I'm a bit sad to see our current booth theme retired.  We typically keep a theme about a year and the big, green, betentacled beast has been an eye-catching conversation starter in booths from Cisco Live to RSA.  Interop was his last outing.  His piercing red eyes and sharp teeth will be missed.

However, we’re bringing all new fun to Cisco Live in San Francisco, so be there to check it out. It’s your first chance to get this year’s new t-shirt before anyone else does.

Users describe call experience as ‘good’, ‘ok’, ‘poor’, ‘bad’, or ‘terrible’. And call experience is defined by the elements of voice quality and the network factors that affect them. Elements of voice quality are loudness, distortion, noise, fading and crosstalk, whereas the network factors that affect them are latency, jitter, packet loss, voice activity detection, echo and echo canceller performance.

 

How do these factors affect voice quality?


  • Latency: is the delay or the time it takes for the speech to get from one designated point to another. Very long latency results in delay of hearing the speaker at the other end.
  • Jitter: is the variance of inter-packet delay. When multiple packets are sent consecutively from source to destination and there are delays in the network, such as queuing or packets arriving through alternate routes, the variation in arrival delay between packets is the jitter value (a short sketch after this list shows one way to compute it from arrival timestamps). For delay-sensitive applications like VoIP, a jitter value of 0 is ideal.
  • Packet loss: occurs when data packets are discarded at a given moment when a device is overloaded and unable to accept incoming data. You need to keep packet loss to the lowest value possible. For VoIP, packet loss causes parts of the conversation to be lost.
  • Voice Activity Detection: is used in VoIP to reduce bandwidth consumption. When this technology is used, the beginnings and ends of words tend to be clipped off, especially the "T" and "S" sounds at the end of a word.
  • Echo: is the sound of the speaker's voice returning to and being heard by the speaker. Echo is a problem of long round-trip delay. The longer the round-trip delay, the more difficult it is for the speaker to ignore the echo.
  • Echo Canceller Performance: The echo canceller remembers the waveform sent out and, for a certain period of time, looks for a returning waveform that it can correlate to the original signal.   How well the echo is cancelled depends on the quality of the echo canceller. If the return signal (echo) arrives too late, the echo canceller won't be able to correlate and cancel it properly.
  • CODEC: short for coder-decoder, a CODEC converts an audio signal into digital form for transmission and then back into an audio signal for replay. CODECs also compress the packet to gain maximum efficiency from the network. How well the CODEC converts speech to digital packets and back again affects voice quality. Choosing the right codec for the network depends on the required sound quality, available network bandwidth and so on. Some networks use more than one codec, but this again may impact call quality.
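As referenced above, here is a minimal Python sketch that estimates jitter from packet arrival timestamps as the average deviation of the inter-arrival gaps from the expected packetization interval (20 ms here, a hypothetical value). RFC 3550 defines a smoothed interarrival jitter estimator that real tools use; this simplified version just illustrates the idea.

```python
# A minimal sketch: estimate jitter from packet arrival timestamps as the
# average deviation of inter-arrival gaps from the expected 20 ms interval.
# All timestamps below are hypothetical.

def jitter_ms(arrival_times_ms, expected_gap_ms=20.0):
    gaps = [b - a for a, b in zip(arrival_times_ms, arrival_times_ms[1:])]
    deviations = [abs(g - expected_gap_ms) for g in gaps]
    return sum(deviations) / len(deviations)

# Hypothetical arrival times (in ms) of consecutive RTP packets.
arrivals = [0.0, 20.1, 41.5, 60.2, 83.0, 101.9]
print(f"Average jitter: {jitter_ms(arrivals):.2f} ms")
```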

  

The index to measure the call quality using network data is called the Mean Opinion Score (MOS).

  

Mean Opinion Score (MOS)


MOS is a benchmark used to determine the quality of sound produced by specific codecs: listeners' opinion scores are averaged to provide the mean for each codec sample, so MOS is commonly used to assess the performance of the codecs that compress the audio. It is always preferable to have a MOS of 4 or 5 for your VoIP calls; when the MOS drops to 3.5 or below, users find the voice quality unacceptable.
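To make the link between those network metrics and MOS concrete, here is a minimal sketch (in Python) of the commonly cited simplified E-model calculation, which derives an R-factor from one-way latency, jitter, and packet loss and converts it into a MOS-style score. The constants below are illustrative approximations for a G.711-type codec, not the exact formula any particular tool uses.

```python
def estimate_mos(latency_ms: float, jitter_ms: float, loss_pct: float) -> float:
    """Rough MOS estimate from network metrics (simplified E-model sketch)."""
    # Treat jitter as extra buffering delay and add ~10 ms for codec processing.
    effective_latency = latency_ms + 2 * jitter_ms + 10

    # Start from the default R-factor and subtract a delay impairment.
    if effective_latency < 160:
        r = 93.2 - effective_latency / 40
    else:
        r = 93.2 - (effective_latency - 120) / 10

    # Subtract an impairment for packet loss and clamp to a valid range.
    r = max(0.0, min(100.0, r - 2.5 * loss_pct))

    # Convert the R-factor to a MOS value (roughly 1.0 - 4.5).
    return 1 + 0.035 * r + 7.1e-6 * r * (r - 60) * (100 - r)


if __name__ == "__main__":
    # 40 ms latency, 5 ms jitter, 1% loss -> roughly 4.3, a "good" call
    print(round(estimate_mos(40, 5, 1), 2))
```

A result near 4.3 is about the ceiling for a narrowband codec, while anything drifting toward 3.5 or below lines up with the "unacceptable" threshold described above.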

  

Measuring VoIP performance using MOS


Test Infrastructure Readiness for VoIP Traffic

When implementing VoIP, it is good practice to test the network for its readiness to carry voice traffic. But how can you accomplish this without spending CAPEX on VoIP infrastructure? Cisco devices can use IP SLA to generate synthetic VoIP traffic and collect data to measure metrics like latency, jitter, packet loss, and MOS. The MOS score is an indication of what voice quality to expect, so you can start by troubleshooting the network devices along the VoIP call path whose MOS scores are lower than 3. However, configuring IP SLA operations requires good knowledge of the CLI (Command Line Interface).


Troubleshoot Poor VoIP Performance In the Network

Manually troubleshooting VoIP issues involves collecting performance metrics like jitter, latency, packet loss, etc., from various nodes in the network such as switches, routers, or call managers. But this does not provide a standard for comparing and understanding VoIP call quality across the network; MOS acts as that standard for measuring call quality. For a network experiencing poor VoIP performance, you can pinpoint the root cause based on the MOS scores measured for a specific codec, at a particular time, or for a particular department or location.


In summary, MOS scores are good indicators for troubleshooting VoIP performance issues in the network. Tools like SolarWinds VoIP and Network Quality Manager (VNQM) let you enable IP SLA operations on your devices without knowledge of CLI commands, and help you avoid time-consuming manual configuration on multiple devices.


SolarWinds VNQM also provides automated reports with locations and MOS scores, comparisons of MOS scores per codec for each call, comparisons of call performance metrics between departments or call managers, and email notifications in case of bad calls.


Reduce time needed to evaluate network readiness and troubleshoot VoIP performance issues from hours to minutes!


Learn More:

Network management is good, but there are some benefits that are not so apparent at first. It gives you the proof you need when that person calls the help desk and says "the network is slow." At this point you have to clarify the issue with the user and find out exactly what the problem is. Once you have obtained the details, you can check your NMS to find out if there is an issue, and one of two things is going to happen: either you find a problem and correct it, or you don't find a problem and can prove the user is incorrect.

 

The thing you won't have is "I don't know if there is a problem," and that is the one that will drive you crazy, because the user is sure there is a problem, and unless you can offer authoritative proof, the finger-pointing cycle will begin, and that is a place no one wants to be. You know there is not a problem with the network, but the user needs proof. SolarWinds Orion is what gives you this proof, which you can take to the client and say: it's not my opinion that the network is not slow, here are the facts. Circuit xyz has a utilization of 20% and is nowhere near being saturated.

 

I had a user call into the help desk and claim the network was slow. He placed a ticket saying that his mission-critical system was slow and this was causing a production problem. The help desk created a high-priority ticket and sent it over to my team. I looked at the ticket and checked my Orion server, and for the problem area that was reported I saw no issues, so I created a report to back up my findings. I called the user and explained that, based on the facts I had obtained, there was no problem.

 

The user told me that there was indeed a problem: he had just moved from building 3 to building 1, and a speedtest.net test he had run in his old building was five times faster. He even showed me some screenshots of his findings.

 

In reality, there was no problem, but if I didn't have the facts to back me up, the finger pointing would start and it would just spiral out of control, and at that point nobody would be happy. Like they always say, "the truth shall set you free," and it sure did in this case.

 

Orion has saved so much time and frustration, and I sleep better at night knowing that yes, there will still be network problems, but I will be able to answer the question: Is there a problem? If yes, then I fix it; if not, I can prove it. But I never have to deal with "I don't know," and that, my friends, is a wonderful thing. Bye-bye finger pointing, hello productive day. How much is your time worth?

Last week, we had a great conversation on finger-pointing, and some of you shared real-world advice on how to avoid it. Most of the comments described a work environment that was still tied to the stove-piped organizational structure from ten years ago, when network, server, and storage were discrete disciplines with effectively zero relation to one another. This approach, however, is no longer valid.


Virtualization, specifically the abstraction of physical resources, makes isolated engineering teams dysfunctional. It’s not enough to pursue skills that exist exclusively in the confines of network, server, and storage. For example, it’s not surprising to hear that someone has a few VMware certifications, and a CCNA. That makes sense, since you can’t do a whole lot with vSphere unless it’s connected to your network. But for those of us who have been doing IT work for a long time, you certainly remember a time when having a Microsoft cert AND a Cisco cert was unheard of.


So, a few questions for you:


  1. Are you part of a siloed team at work? If so, how do you support virtualization (or other technologies that consume resources from multiple teams)?
  2. Do any of you have multiple vendor certifications that extend beyond the network | server | storage silos? How have they helped your career?
  3. Do you think having some primitive coding skills can help engineers in any discipline?
  4. Is there a future for engineers who focus on a single skill-set?


And here's a hint: the answer to number 4 is no. Discuss.

File sharing is common; file sharing is critical; and file sharing is sometimes complex. We share files all the time in the organization, either peer-to-peer via email, instant message, internal shared drives, etc., or by using FTP transfers and cloud services. File sharing has always been a simple concept, but when it comes to the security of data in transit and in storage, that's when the doubt seeps in. What could happen during the transfer process? Can someone else intercept it and steal or modify the data? It is possible, and it happens many times even in large organizations, not just in file transfers within the organization, but also with parties outside the corporate network. Security is not guaranteed in any of these forms of file transfer. Then, there's the issue of process complexity.

 

Secure File Transfer

There’s an alternative: secure file sharing using a managed file transfer (MFT) server that is secured by FTPS/SFTP/HTTPS and any other organizational security policies and permissions. With the data storage and FTP server infrastructure inside the perimeter of your own network, you can transfer files safely and securely between FTP clients (including both computers and handheld devices). That takes the security concern away. Now, to address complexity.
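To illustrate just the secure transfer step, here is a minimal sketch using the Python paramiko library to push a file to an SFTP endpoint. The hostname, credentials, and paths are placeholders, and a full MFT deployment layers its own authentication, permissions, and auditing on top of a plain transfer like this.

```python
import paramiko  # assumes the paramiko package is installed

HOST = "sftp.example.internal"   # placeholder SFTP server inside your perimeter
USERNAME = "fileshare-user"      # placeholder credentials
PASSWORD = "change-me"


def upload_file(local_path: str, remote_path: str) -> None:
    """Upload a single file over SFTP (file transfer over SSH)."""
    client = paramiko.SSHClient()
    # For a real deployment, verify host keys instead of auto-adding them.
    client.set_missing_host_key_policy(paramiko.AutoAddPolicy())
    client.connect(HOST, username=USERNAME, password=PASSWORD)
    try:
        sftp = client.open_sftp()
        sftp.put(local_path, remote_path)
        sftp.close()
    finally:
        client.close()


if __name__ == "__main__":
    upload_file("quarterly-report.xlsx", "/uploads/quarterly-report.xlsx")
```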

 

Ad Hoc File Sharing

What you want in a file sharing process is the simplicity to send and receive files without a complicated process or manual labor. Enter ad hoc file sharing: whenever-you-want, wherever-you-want file transfer is a few clicks away!

  • Sending a File: When a user wants to send a file, all they need to do is upload the file to a secure FTP server and email the link (with or without password protection) to the recipient. The recipient (whether inside or outside the enterprise network) receives the email with the link to download the file. If password security is enabled, the recipient has to enter the password before opening the file. Sending a file cannot get any simpler over secure FTP.
  • Requesting & Receiving a File: If you'd like to request a file to be transferred, just use the FTP client in the MFT server interface to send an email with a secure upload link to the sender (with or without password protection). Once the sender receives this link in his email, he can use it to upload the file to the FTP server, and you'll get an email notification of the completed file transfer. Now you, the recipient, can just click the link in your email and download the file, or use an FTP client interface to do the same thing.

FTP.png  

Managed File Transfer

In addition to security and the ease of file sharing, automation and control of the file transfer process will result in more simplicity and operational efficiency. Employing a third-party MFT server will help you gain additional features like event-driven automation, scheduled file transfer, multiple file transfer, large file transfer, file synchronization, side-by-side and drag-and-drop transfer, reporting, virtual folder access, AD sync for permissions, multi-factor authentication, more intuitive FTP Clients and more.

 

Alongside simplifying and securing your file transfer process, you can also make it powerful and robust enough to support your organization's growing file transfer needs. Indeed, file transfer is fun when you have the right tool to facilitate it. Just remember to play it safe!

According to Forrester, the SaaS application and software market is expected to reach $75 billion in 2014. Forrester goes on to note that the “browser-based access model for SaaS products works better for collaboration among internal and external participants than behind-the-firewall deployments.” When you think about it, today’s users spend most of their time accessing various “smart applications.” Whether it’s Office 365 or Salesforce, the user base accessing and using these applications is growing tremendously.

      

Monitoring the performance of these applications makes a huge difference considering more and more users are adopting SaaS and cloud-based applications. Monitoring server load, user experience, and bottlenecks is crucial to optimizing overall performance, whether the application is hosted on-premises, in a public cloud, or using a hybrid approach. If your organization is using several SaaS-based applications, consider the following when you monitor their performance and availability.

         

Monitor User Experience: Since users are going to be accessing the application extensively, you should monitor overall user experience and users’ interaction with the application. This allows you to analyze performance from the end-user’s perspective.  Slow page load times or image matching issues can be a first indication that there’s an issue with the application.  By drilling in deeper, you can determine whether the problem is related to a specific page and location.  Ultimately, monitoring user experience allows you to improve and optimize application performance, which results in improved conversion.
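As a rough illustration of what a synthetic user-experience check does, the sketch below simply times a page request and flags slow responses. The URL and threshold are made-up assumptions; a real monitoring product replays full multi-step transactions from several locations rather than issuing a single GET.

```python
import time

import requests  # assumes the requests package is installed

URL = "https://app.example.com/login"  # placeholder SaaS login page
THRESHOLD_SECONDS = 3.0                # illustrative "slow page" threshold


def check_page_load(url: str) -> None:
    """Time one page load and flag it if it is slow or returns an error."""
    start = time.monotonic()
    response = requests.get(url, timeout=30)
    elapsed = time.monotonic() - start

    status = "OK" if response.ok and elapsed <= THRESHOLD_SECONDS else "SLOW/ERROR"
    print(f"{url}: HTTP {response.status_code}, {elapsed:.2f}s -> {status}")


if __name__ == "__main__":
    check_page_load(URL)
```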

     

You could also look at this in two ways: from the perspective of the service provider, and from the perspective of the service consumer.

     

Service providers need to focus on:

  1. User experience: It’s likely service providers have SLAs with end users and they need to demonstrate they are meeting uptime and other SLA considerations.
  2. Infrastructure: There are many factors that can cause a service failure, therefore all aspects of the infrastructure must be monitored. These aspects include applications, servers, virtual servers, storage, network performance, etc.
  3. Integration services (web services): Services provided are dependent on other SaaS providers or internal apps.

        

Service consumers need to focus on: 

  1. User experience: If part of your web application consumes web services, this can be the first indication of a problem.
  2. Web service failures: This can help identify a failure in communication.

     

Focusing on these aspects is essential when you’re monitoring SaaS applications. These key considerations help IT admins take proactive measures to ensure applications don’t suffer downtime during crucial business hours. At the same time, continuous monitoring keeps each application optimized, improving overall efficiency.

            

Check out the online demo of Web Performance Monitor!

Ok, fellow thwackers... we are getting down to the wire.  The next 12 hours of voting will determine which games go on to compete to win the HIGH SCORE or go GAME OVER.

 

If you haven't voted for your favorite game, you still have time on the clock. The final four round closes at MIDNIGHT tonight.  Call of Duty is facing Grand Theft Auto in a first-person shooter match up, while the other side of the bracket is a match up between two iconic games which many of you cut your gaming teeth on, Zelda v. Doom. 

 

Get out there and campaign for your favorites, vote early (but not often). There is still time on the clock to change the outcome of this round.

 

So, get to it. We will see you for the final boss battle.

 

Game on, gamers!

This is a very common predicament that most SQL developers and DBAs face in their day-to-day database encounters – regardless of the relational database platform being used. “Why is my database slow?” This could be for many reasons, with one of the hard-to-isolate reasons being slow query processing and longer wait times.

 

Reasons for Slow Database Performance

  • Network: There could be network connection issues
  • Server: The workload on the server hosting the database could be high, making database processing slower
  • Database/Query: There may be redundant query lines, complex or looping syntax, query deadlocks, lack of proper indexing, improper partitioning of database tables, etc.
  • Storage: Slow storage I/O operations, or data striping issues with RAID

 

While network issues and server workload can be easily measured with typical network monitoring and server monitoring tools, the real complexity arises with the understanding of the following database & query-related questions:

  • What query is slow?
  • What is the query wait time?
  • Why is the query slow?
  • What was the time of the day/week of the performance impact?
  • What should I do to resolve the issue?

  

Query response time analysis is the process of answering the above questions by monitoring and analyzing query processing time and wait time, and exploring the query syntax to understand what makes the query complex. We can break down query response time into 2 parts:

  1. Query processing time – the actual time the database takes to run the query. This includes measuring all the steps involved in the query operation and analyzing which step is causing the processing delay.
  2. Query wait time – the time a database session spends waiting for resources to become available, such as a lock, a log file, or any of hundreds of other wait events or wait types

 

Response Time = Processing Time + Waiting Time

DPA.png

Query wait time is determined with the help of wait time metrics called wait types or wait events, which indicate the amount of time sessions spend waiting on each database resource.

  • In SQL Server®, wait types represent the discrete steps in query processing where a query waits for resources as the instance completes the request (a minimal query sketch follows this list). Check out this blog to view the list of common SQL Server wait types.
  • In Oracle®, queries pass through hundreds of internal database processes called Oracle wait events, which help you understand the performance of SQL query operations. Check out this blog to view the list of common Oracle wait events.
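As a minimal sketch of where the SQL Server numbers come from, the Python snippet below reads the cumulative wait statistics DMV over ODBC. The connection string is a placeholder, and because the counters accumulate from the last instance restart, a real monitoring tool samples them repeatedly and works with the deltas.

```python
import pyodbc  # assumes the pyodbc package and a SQL Server ODBC driver are installed

# Placeholder connection string; point it at your own SQL Server instance.
CONN_STR = (
    "DRIVER={ODBC Driver 17 for SQL Server};"
    "SERVER=sql01.example.internal;DATABASE=master;Trusted_Connection=yes;"
)

# Top wait types by accumulated wait time since the instance last restarted.
QUERY = """
SELECT TOP 10
       wait_type,
       waiting_tasks_count,
       wait_time_ms,
       signal_wait_time_ms
FROM sys.dm_os_wait_stats
ORDER BY wait_time_ms DESC;
"""

with pyodbc.connect(CONN_STR) as conn:
    for row in conn.cursor().execute(QUERY):
        print(f"{row.wait_type:<40} waits={row.waiting_tasks_count:>10} "
              f"wait_ms={row.wait_time_ms:>12}")
```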

 

A multi-vendor database performance monitoring tool such as SolarWinds Database Performance Analyzer will help you monitor all your database sessions and capture query processing and wait times to be able to pinpoint bottlenecks for slow database response time. You can view detailed query analysis metrics alongside physical server and virtual server workload and performance for correlating database and server issues. There are also out-of-the-box database tuning advisors to help you fix common database issues.

A recent survey commissioned by Avaya reveals that network vulnerabilities are causing more business impacts than most realize, resulting in revenue and job loss.


  • 80% of companies lose revenue when the network goes down; on average, companies lost $140,000 USD as a result of network outages
  • 1 in 5 companies fired an IT employee as a result of network downtime

 

And.....


  • 82% of those surveyed experienced some type of network downtime caused by IT personnel making errors when configuring changes to the core of the network
  • In fact, the survey found that one-fifth of all network downtime in 2013 was caused by core errors

 

Cases of Device Misconfigurations Leading to Network Downtime


Real-world scenario 1: Company Websites Down, Reason Unknown

Soon after a software giant launched a big advertising campaign, with major incoming Web traffic expected, their websites went down. Because no one could pinpoint the actual cause of the downtime as a configuration change made earlier, the websites remained unreachable for a few hours. With the time it took to identify the issue and re-establish connectivity, the organization suffered huge losses in revenue on top of the millions of dollars spent on the promotional campaign.


Troubleshooting: Given the situation, all thoughts pointed to a core router failure or a DoS attack. After checking and confirming that all critical devices were ‘Up’, the next assumption was that the network was the victim of a DoS attack. But again, with no traffic flood on the network, the root cause had to be something else. After hours of troubleshooting and individually checking core and edge device configurations, it was finally found that the WAN router had a wrong configuration. The admin who made the configuration change, instead of blocking access to a specific internal IP subnet on port 80, ended up blocking port 80 for a wider subnet that also included the public Web servers. This completely cut off Web server connectivity for inbound traffic, a typo that cost the company millions!
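One way to catch this class of typo before it ships is a simple pre-change check that compares the subnet in a proposed "deny port 80" rule against the subnets that must stay reachable. The sketch below uses Python's ipaddress module with made-up example subnets; a real NCCM workflow would run a check like this as part of the change approval step discussed later.

```python
import ipaddress

# Subnets that must never lose port 80 reachability (illustrative values).
PROTECTED_SUBNETS = [
    ipaddress.ip_network("192.0.2.0/28"),  # public web servers
]


def change_is_safe(proposed_block: str) -> bool:
    """Return False if the proposed deny rule would also cover a protected subnet."""
    blocked = ipaddress.ip_network(proposed_block)
    return not any(blocked.overlaps(net) for net in PROTECTED_SUBNETS)


if __name__ == "__main__":
    print(change_is_safe("192.0.2.128/25"))  # True  -> the intended, narrow block
    print(change_is_safe("192.0.2.0/24"))    # False -> the typo: also covers the web servers
```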


Real-world scenario 2: Poor VoIP Performance, Hours of Deployment Efforts Wasted


A large trading company uses voice and video for inter-branch and customer communication. To prioritize voice and video traffic and ensure quality at all times, QoS policies are configured across all edge devices over a weekend. However, following the change, the VoIP application begins to experience very poor performance.

Troubleshooting: QoS monitoring suggests that VoIP and video have been allocated lower priority than required. Instead of marking VoIP traffic to EF (Expedited Forwarding) priority, the administrator ended up marking VoIP packets to DF (default class), resulting in the poor performance of VoIP and video traffic. Correcting the VoIP traffic setting to EF on all edge devices meant many more hours of poor performance and lost business.


Remediation


The network downtime in the above two cases could have been avoided via simple change notification and approval systems.


In the first case, notifying other stakeholders about the change would have helped correlate and identify the recent change as a possible cause of the issue. Troubleshooting would have been faster and normalcy restored by quickly rolling back the erroneous change.


In the second case, a huge change involving critical edge devices should have gone through an approval process. Having the configuration approved by a senior administrator before deployment can help identify and prevent errors that can bring the network down.


Both cases reflect poorly on the administrators. Bringing down the network was clearly not intentional!


Human errors are expected to occur in daily network administration. However, considering the impact a bad change can have on both the company and the person, it’s imperative that there are NCCM processes put in place. To reduce human errors and network downtime, use a tool that supports NCCM processes such as change notification and approvals.


Check out this paper for more tips on reducing configuration errors in your network.

Tech Tip Banner Image.jpg

It's amazing,


I used to think my network was running pretty well, a few hiccups now and then, but by and large, I got by and thought everything was business as usual. The boss would ask me, "Craig, how is the network today?" and I would say, "It's fine, boss." Flash ahead to fiscal year's end, and there is some money left in the budget and I am given fifteen minutes to decide what tools to buy, otherwise the money doesn't get spent. Most people wouldn't be prepared for this, but I was ready. I pulled out my wish list and said I want SolarWinds Orion NPM. After a few frantic calls to the vendor of choice, it was all set. I'm thinking, this is great that I got this, but I probably won't use it that much, because we have no problems... and it's going to take forever to install, and the learning curve will be immense. In reality, the software was really easy and intuitive to install: point here, click here, answer a few questions, and it was done. But why was there so much red everywhere? This must be a software bug, because my network has no problems... I spent the rest of the day tweaking and twiddling about, and I have to say, it was like turning the light on in a dark room. I was able to solve a lot of long-standing problems, some of them I didn't even know I had.

 

There was the switch with a bad blade that always had problems intermittently, but never failed, so no alarm was tripped. After being alerted to this, I had the blade replaced and things began to run clean. Sometimes the network was slow but I never could attribute it to any single cause, it usually coincided with a home game at the local ballpark. Turns out a lot of non-work related web streaming was going on, and some other folks were enjoying Netflix.

There was the router that went down even though it had redundant power supplies; no one ever saw when the first one went down, but people sure noticed when the second one failed. I set up an alert to monitor this and several other things. The major cost of IT where I work is not so much the hardware or software, it's the cost of actually scheduling time with the union and paying for the lift truck; just the logistics were mind-boggling and labor-intensive, and nobody wanted downtime. I am now able to easily automate and monitor my network, and do a lot more proactive monitoring and forecasting. I am just as busy as I was before; the difference is now I have a better view of what is going on with the network and I can act proactively instead of reactively. I have a lot less stress. I have lost fifty pounds and I have a corner office... lol... just kidding... but I do get to sleep through the weekends without the pager going off at 3am, and I still go to the same amount of meetings, but now they are more about future planning instead of postmortems.

 

What about you guys? Can anyone share a general process of things you might monitor and proactively forecast? Any tips and tricks pertaining to procedure are greatly appreciated!

I am now a believer in network performance management. It has really paid for itself many times over.

Hi there! I’m _stump, a technology consultant with a keen focus on virtualization and a strong background in systems and application monitoring. I hope to spark some discussion this month on these topics and more.


Last month, I published a post on my personal blog about the importance of end-to-end monitoring. To summarize, monitoring all of the individual pieces of a virtualization infrastructure is important, but it does not give you all of the information you need to identify and correct performance and capacity problems. Just because each individual resource is performing well doesn’t mean that the solution as a whole is functioning properly.


This is where end-to-end monitoring comes in. You’re likely familiar with all of the technical benefits of e2e monitoring. But let’s talk about the operational benefits of this type of monitoring: reducing finger-pointing.


In the old days of technology, the battle lines between server and network engineers were well-understood and never crossed. But with virtualization, it’s no longer clear where the network engineer’s job ends and the virtualization engineer’s job begins. And the storage engineer’s work is now directly involved in both network and compute. When a VM starts to exhibit trouble, the finger-pointing begins.


“I checked the SAN, it’s fine.”

“I checked the network, it’s fine.”

“I checked vSphere, it’s fine.”


Does this sound familiar? Do you run into this type of finger-pointing at work? If so, share a story with us. How did you handle the situation? Does end-to-end monitoring help this problem?

Whether it’s Hyper-V® or VMware® or any other virtual environment, growth in virtual machines (VMs) and workload is inevitable in any data center. IT teams always want to know how many VMs can be created on a physical host, and how much more VM workload their host resources can support. Especially for Hyper-V environments, Microsoft® has augmented and expanded the limits of VM capacity with Hyper-V 2012.

 

According to this post on Petri, these are the capacity and scalability limits of Hyper-V VMs in Windows Server 2012 – a drastic improvement over Windows Server 2008 and 2008 R2.

  • Virtual processors per VM: 64
  • Logical processors in hardware: 320
  • Physical memory per host: 4 TB
  • Memory per VM: 1 TB
  • Nodes in a cluster: 64
  • VMs in a cluster: 8000
  • Active VMs: 1,024

So, what happens when all these limits are reached? You just need to add more VMs, and that's not an easy job for the IT admin: you have to figure out the budget, procure host resources, and carry out the actual VM creation and assignment. But this is NOT the smart and cost-effective way to scale your VM environment.

 

Capacity planning is the process of monitoring VM and host resource utilization while being able to predict when the VMs will run out of resources and how much more workload can be added to them. The benefit is that you will be able to optimize your Hyper-V environment, chart usage trends, reallocate unused resources to critical VMs, identify and control VM sprawl, and right-size the entire VM environment, rather than simply making a case for more resource procurement.

 

The proactive capacity planning approach would be to identify capacity bottlenecks so that you’re in a position to make an informed decision about VM expansion.

 

Top Reasons for Capacity Bottlenecks

  • Uncontrolled VM sprawl
  • Enabling HA without accounting for failover
  • Increase in VM reservation
  • Resource pool config changes
  • Natural resource utilization growth
  • Workload changes

 

Capacity Management: “What If” Analysis

The next step is to perform “what if” analysis to determine how much more load existing VMs can sustain and how many more VMs can be created for a specified workload. Third-party virtualization management tools such as SolarWinds Virtualization Manager provide dedicated capacity management functionality that allows you to perform VM capacity estimations and understand possible expansion.
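As a back-of-the-envelope illustration of the “what if” math, the sketch below estimates how many more VMs of a given profile fit into a cluster's remaining headroom. The capacity figures and per-VM averages are made-up assumptions; tools like Virtualization Manager also account for peaks, failover reserves, and growth trends rather than simple averages.

```python
import math

# Illustrative cluster headroom (total capacity minus current usage).
HEADROOM = {
    "cpu_ghz": 64.0,       # spare CPU capacity
    "memory_gb": 512.0,    # spare RAM
    "storage_gb": 8000.0,  # spare disk space
    "iops": 20000.0,       # spare storage IOPS
}

# Illustrative "medium VM" profile: average resource demand per VM.
MEDIUM_VM = {
    "cpu_ghz": 2.0,
    "memory_gb": 8.0,
    "storage_gb": 100.0,
    "iops": 250.0,
}


def vms_that_fit(headroom: dict, profile: dict) -> int:
    """The most constrained resource determines how many more VMs fit."""
    return min(math.floor(headroom[r] / profile[r]) for r in profile)


if __name__ == "__main__":
    print(f"Room for roughly {vms_that_fit(HEADROOM, MEDIUM_VM)} more medium VMs")
```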

VMan 1.png   VMan 2.png

   

Key Questions to be Answered While Performing Capacity Planning

  • How can I detect capacity bottlenecks?
  • How can I predict capacity bottlenecks before they happen?
  • How many VMs can I fit within my current footprint?
  • What if I add more resources (VMs, hosts, storage, network, etc.) to my environment?
  • Which cluster is the best place for my new VM?
  • When will I run out of capacity?
  • How much resource is my average SQL Server®, Exchange, etc. VM using?
  • How much more resources do I need to buy and when?
  • How can I right-size my VMs to optimize existing capacity?

 

The capacity planning dashboard in Virtualization Manager tracks and trends CPU, storage IOPS, memory, network throughput, and disk space, and provides details on how many more large, medium, and small VMs you can add to your Hyper-V and other clusters.

 

Benefits of Capacity Planning

  • Monitor Hyper-V capacity operations and resource utilization, and forecast resource depletion
  • Optimize IT resources with business requirements and make informed purchase decisions on host resource procurement, VM creation, and overall budget planning
  • Gain insight into VM placement between or within Hyper-V clusters to deploy VMs across clusters efficiently
  • Pinpoint zombie or rogue virtual machines and over or under-allocated VMs to right-size your Hyper-V environment
  • Determine when and where Hyper-V bottlenecks will occur and identify the solutions

 

Read this TechNet post to learn more about Microsoft Windows Server 2012 Hyper-V Scalability limits.

 

Watch this short video to learn more about capacity planning and management - explained by Eric Siebert (vExpert)

 

 

It’s another year and another 5 stars for SolarWinds Log and Event Manager in SC Magazine’s SIEM group test!  The reviewers tested every aspect of our SIEM, with a dual focus on log and event management as well as strong attention to usability, scalability, reporting, third-party support, and ease of implementation.

 

The verdict? “This is a solid product, worthy of consideration.”


“SolarWinds has put together another outstanding product. The SolarWinds Log & Event Manager (LEM) offers a quality set of log management, event correlation, search and reporting facilities. This gives organizations the ability to collect large volumes of data from virtually any device on a network in real time and then correlate the data into actionable information. The company does this by paying attention to the need for real-time incident response and effective forensics, as well as security and IT troubleshooting issues. Another winning set of features are the quality regulatory compliance management and ready-made reporting functions.”

 

With the increase of attacks on compliant companies, the previously separate focuses of security and compliance are converging.  At the same time, attack methods are growing more sophisticated and harder to detect.  At SolarWinds, we are dedicated to providing companies of every size with the situational awareness and visibility previously available only to large enterprises.  We are pleased that SC Magazine saw the results of our efforts.

 

The one weakness they noted is something we can easily address: “Consider a Ticket Management System for smaller companies.”  We offer Alert Central as a free ticket management system that easily integrates with LEM.  For those that need more robust reporting and tracking, we also offer Web Help Desk as a low-cost alternative.

 

To read the review, visit http://www.scmagazine.com/solarwinds-log--event-manager-v57/review/4153/

When Windows 8 launched, I wrote this scathing review, "Microsoft, have you lost your mind again?"  It was a bloodbath for Microsoft that day. Two years later, I just finished installing/tweaking Windows 8.1 here at the office. (It wasn't my choice.)

 

Windows 8 vs. 8.1

You can read my full review of Windows 8 at the link provided above, and I stand by it. Now let's examine the tweaks Microsoft has made to v8.1:

  • Lo and behold, the Start button/menu is back! (Sorta). Back to the way things were. An improvement, I guess.
    start.png
  • Aero glass effect, which I liked, needed to be installed using a third-party app. Still got it though.
  • Another "improvement:" I can now launch into desktop mode on boot (something previous versions did naturally) bypassing those ugly and useless tiles.
  • Icon spacing. This tweak was available in Windows 7 and earlier through the UI. In 8.1 I had to implement a registry hack, as evidenced by my MRU list in the Start menu above.
  • I'm experiencing a lag when typing versus what I see on the screen from time to time. Annoying but this does not happen too often, although it is happening as I write this.
  • OS seems a little sluggish. Time, and benchmarks, will tell.
  • Compatibility: Surprisingly, everything seems to work fine. Good job!
  • I've also learned that you can mount and unmount ISOs through the OS. No third party app needed. Sweet.
  • The shell graphics are more appealing and informative as well, but I think this may take away from performance. I still need to tinker more just to make sure.

 

Overall, I cannot complain about Windows 8.1. Let's slow down though. I won't praise it either. I still prefer Windows 7 any day of the week and twice on Sunday. (Funny, it's like the VPs over at Microsoft actually read part one of this article and listened! Go figure.) There is still work to be done though. The "working" part of the OS needs to be refined more to perform more like Windows 7 IMHO. At least this is a step in the right direction.

 

Office 2013

Office 2013 was also part of my transition. Just want to say a few words while I'm here:

  • The display is very flat. No appearance of texture. Difficult to distinguish between the "draggable" top portion of the app and the rest of it. And all the apps look the same. Very bland. See the pics below for comparison:
    What my Outlook used to look like - Outlook 2007 (Note: This is a random pic from Google.)
    old.png
    My current version of Outlook 2013 - Flat, no 3-D texture or feel. Looks like paper.
    inbox.png
  • Another observation was that they changed the way VBA understands VB. In other words, I had to re-write some of my code and register some older ActiveX controls to get my apps and macros working again. Took some time but I got it done.

Again, nothing terribly bad here, but I think we could all do without those ribbons. The real estate they chew up is just too valuable.

 

The Verdict.

Overall, not bad, but don't rush to upgrade just yet.

 

My Motto

"If you're happy with your OS, you can keep your OS. If you like your version of Office, you can keep your version of Office, period. End of story." (Wait. Why are you kicking me off of my current OS and Office versions and forcing me to "upgrade"? I was happy and liked what I had. You said over and over that I can keep what I liked! Is this better for me, or better for you?) Hmmm...see what I did there?

The battle is on, Round 1 is complete and the community has spoken. 

 

  • Halo falls to Call of Duty
  • Ms PacMan, too much like PacMan.  Donkey Kong prevails!
  • Time invested in WoW creates a higher level of commitment than Baldur's Gate
  • The all-out melee of Smash Bros may just have beat Punch Out by virtue of the plethora of favorite characters NOT represented in the bracket
  • Golden Eye 007? Huh?

 

All of this means that we are down to 16 gaming heavyweights, and now the match ups get a little more complicated, given the fact that MOST of these games have very little in common. How will you judge Madden NFL versus Galaga? Half-Life versus Mario Kart? Can Mortal Kombat stand against the game that spawned a movie about the competition for a HIGH SCORE?

 

Head here to view all of the match ups and cast your vote to see who will move on for the honor of representing each of the four bracket divisions.

 

We are getting close to the end, don't miss your chance to chime in and push your favorites to victory.

 

And, while we are at it, let us know who you think will reign supreme...

 

Round 2 VOTING is HERE.  And remember, you have to be logged in to comment and vote. This round ends tomorrow (April 2) at MIDNIGHT.

 

Oh, and by the way, I was informed that I unfairly worded the Zork question which was confirmed by zachmuchler's IMDB post. While that game is not specifically Zork, it was based on Zork (rights can be hard to secure, you know).  For that reason, we will award an extra 50 points to the following thwack members:  crippsb, zachmulcher and bradkay.  Congrats, and use those points wisely.

 

If you are just joining us, you can catch up here (Let's LEVEL UP!) and here (The Cheat).
