
Geek Speak


Shellshock is the name given to a vulnerability in Bash that allows attackers to remotely compromise vulnerable systems and gain unauthorized access to information. Ever since news of the bug broke, and the original fix turned out not to actually fix the issue, attackers have been using ‘masscan’ to find vulnerable systems on the Internet. This means network-based attacks against *nix-based servers and devices, through web requests or other programs that use Bash, are already happening. Check Robert Graham’s blog here to learn more.

 

Your first step should be to test whether your version of Bash is vulnerable by typing the following at your command line:

env x='() { :;}; echo vulnerable' bash -c "echo this is a test"


If the system is vulnerable, the output would be:

 

vulnerable

this is a test

 

That means you need to patch your server’s Bash as soon as possible. If your network devices are vulnerable, contact your vendor. For Cisco’s list, check the link here, and for SolarWinds’ list, check this blog.

 

My first thought was: because the access vector for Shellshock is the network, would the network show signs of an attack leveraging the Bash bug?

 

Here is some info from the redhat.com blog:

“The vulnerability arises from the fact that you can create environment variables with specially-crafted values before calling the Bash shell. These variables can contain code, which gets executed as soon as the shell is invoked.”

 

In short, the Bash shell allows function definitions to be passed using environment variables that share the name of the function, and the string "() { :;};" marks a function declaration. So, the initial attack vector will always include (or start with?) the “() {“ sequence, and that should be the signature for detecting Bash attacks.
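To picture where that signature shows up on the wire, here is a hypothetical exploit attempt against a CGI script (the host names and payload below are made up for illustration): the attacker plants the function definition in any HTTP header the web server copies into an environment variable, such as User-Agent, Cookie, or Referer.

GET /cgi-bin/status HTTP/1.1
Host: victim.example.com
User-Agent: () { :;}; /bin/ping -c 1 attacker.example.com
Cookie: () { :;}; echo vulnerable

If the target’s CGI handler passes those header values to Bash, the trailing commands run on the server, which is why inspecting HTTP headers for “() {“ is a reasonable place to start.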

 

My next thought was, if you don’t have an IDS or IPS on which you can define the signature, can your other network devices detect the “() {“ signature in the HTTP header and help you mitigate an attack?

 

Let us talk ‘Cisco’ here. Cisco devices have a couple of options for HTTP header inspection. One is NBAR, but NBAR’s HTTP header inspection is limited to three fields as far as client-to-server requests are concerned, namely ‘user-agent’, ‘referrer’, and ‘from’, none of which will hold the signature “() {“.

 

The second option I found for HTTP header inspection is the Zone-Based Policy Firewall (ZFW), which Cisco states is available on Cisco routers and switches from IOS 12.4(6)T onward. ZFW supports application-layer (Layer 7) inspection, including HTTP headers, which can then be used to block traffic. ZFW lets you use the Class-Based Policy Language (remember QoS?) to define what traffic has to be matched and what action has to be taken.

 

With ZFW, you can inspect and block HTTP traffic that includes the regex “\x28\x29\x20\x7b” in the header. If you are wondering why “\x28\x29\x20\x7b”, that is the hex encoding of “() {“. Refer to the chart here to see how we converted our signature to a hex regex.

 

Back to the Bash bug and ZFW: based on Cisco configuration guides, a sample configuration for Bash attack mitigation should look like the example below, but the supported commands could change depending on the IOS version.

 

Define a parameter map to capture the signature we were looking for:

parameter-map type regex bash_bug_regex

pattern "\x28\x29\x20\x7b"


Create the class map to identify and match traffic:

class-map type inspect http bashbug_classmap

   match req-resp header regex bash_bug_regex

 

Put the class under a policy map to apply a reset action to the traffic matched by the class map:

policy-map type inspect http bashbug_policymap

   class type inspect http bashbug_classmap

      reset
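For the policy to take effect, the Layer 7 HTTP policy above has to be nested inside a Layer 4 inspect policy and applied to a zone pair. Here is a rough sketch of that last step; the zone names, the Layer 4 class and policy names, and the interfaces are placeholders for your own design, and as with the rest of this example the exact commands depend on your IOS release.

class-map type inspect match-any l4_http_class
 match protocol http
!
policy-map type inspect l4_inspect_policy
 class type inspect l4_http_class
  inspect
  ! attach the Layer 7 policy defined above
  service-policy http bashbug_policymap
!
zone security INSIDE
zone security OUTSIDE
zone-pair security OUT-TO-IN source OUTSIDE destination INSIDE
 service-policy type inspect l4_inspect_policy
!
interface GigabitEthernet0/0
 zone-member security OUTSIDE
!
interface GigabitEthernet0/1
 zone-member security INSIDE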

 

While HTTP header inspection may cause CPU usage to shoot up, ZFW would still be a good option if you cannot apply the available patches right now. ZFW is also an extensive topic and can have implications for your network traffic if not implemented properly. Read up on ZFW with configuration examples here:

http://www.cisco.com/c/en/us/support/docs/security/ios-firewall/98628-zone-design-guide.html

http://www.cisco.com/c/en/us/td/docs/ios-xml/ios/sec_data_zbf/configuration/xe-3s/sec-data-zbf-xe-book/sec-zone-pol-fw.html#GUID-AD5C510A-ABA4-4345-9389-7E8C242391CA

 

And are there any alternatives to ZFW for network-level mitigation of Bash bug attacks?

I briefly touched on IP SLA in one of my previous posts, and I wanted to spend a bit more time on this topic simply because IP SLA is a very powerful tool for monitoring the health of your VoIP network, and one of my favorites! (Plus, it's not just for your VoIP infrastructure.)

 


A few types of IP SLA operations that are useful in a VoIP deployment:

UDP Jitter - This one goes without saying, as it is probably the most common IP SLA operation deployed in a VoIP network. After all, keeping track of the jitter within your network could be the first sign of a circuit/WAN/LAN issue, or possibly a QoS policy that needs to be looked at. (A sample jitter operation is sketched just after this list.)

DHCP - This one is not specific to the VoIP infrastructure, but hear me out. The VoIP phones probably won't be rebooting too often, but if the endpoints are not able to receive their proper IP addresses in a timely fashion, you will definitely receive some tickets from the end users. Without an IP address, those IP phones are not really IP phones, are they?

DNS - Like DHCP, this one is not specific to your VoIP infrastructure, but your IP phones may be configured to perform DNS lookups to find their Call Manager or to utilize specific services. Your VoIP infrastructure is more than likely dependent on DNS, so keeping an eye on DNS performance could definitely give you a troubleshooting edge.
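As promised, here is a minimal sketch of a UDP jitter operation on a Cisco router. The destination address, port, and operation number are placeholders, and the far-end router has to run the IP SLA responder so it can timestamp the probes.

! On the far-end router that answers the probes:
ip sla responder
!
! On the source router:
ip sla 100
 udp-jitter 10.20.30.1 16384 codec g711ulaw
 ! tos 184 = DSCP EF, so the probes ride in the same queue as real voice
 tos 184
 frequency 60
ip sla schedule 100 life forever start-time now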


Historical IP SLA Information can save you!

Having historical information for IP SLA statistics can be just as useful as the IP SLA tool itself. After all, having basic monitoring to prove network availability is one thing; being able to provide performance statistics at the application level is another useful level entirely.

The historical information can also be used to identify and/or track peak usage times within the network, especially if you can see a performance degradation every day at a specific time.


Yes, you can have too much of a good thing!

IP SLAs can be great for monitoring a network and provide a lot of valuable troubleshooting information; however, you will definitely want to keep in mind the amount of SLA traffic you are generating (especially if you are marking all your UDP jitter operations as EF/DSCP 46). This could create issues in itself by wasting precious space in your priority queues if you designed your queues pretty lean.


Something else to consider: if you are terminating all your IP SLAs on a single router, you may want to keep a close eye on the resource utilization of that router. The IP SLA process doesn't utilize a lot of resources, but multiply that by 50 or more and it will definitely be a performance hit, and it could even throw off the IP SLA statistics/results. In worse cases, if the CPU/memory spikes often enough, you could start seeing issues in your data-plane forwarding.




Like everything else in life, it is a matter of finding a good middle ground that provides the monitoring you need without having any negative effects. So, to what extent are you utilizing IP SLA in your network, and what operation types are you relying on? Are you even a fan of IP SLAs? (I know I've definitely got some horror stories, but I also have a lot of good ones.)

mwpreston

To the cloud!

Posted by mwpreston Sep 28, 2014

Private, Public, Hybrid, Infrastructure as a Service, Database as a Service, Software Defined Datacenter; call it what you will, but for the sake of this post I’m going to sum it all up as cloud. When virtualization started to become mainstream, we saw a lot of enterprises adopt a “virtualization first” strategy, meaning new services and applications introduced to the business would first be considered for virtualization unless a solid case for acquiring physical hardware could be made. As the IT world shifts, we are seeing this strategy move toward a “cloud first” strategy. Companies are asking themselves questions such as “Are there security policies stating we must run this inside of our datacenter?”, “Will cloud provide a more highly available platform for this service?”, and “Is it cost effective for us to place this service elsewhere?”

 

Honestly, for a lot of services the cloud makes sense! But is your database environment one of them? From my experience, I’ve seen database environments stay relatively static. The database sat on its own physical hardware and watched us implement our “virtualization first” strategies. We long ago virtualized the web front ends, the application servers, and all the other pieces of our infrastructure, but we have yet to make the jump on the database. Sometimes it’s simply due to performance, but with the advances in hypervisors as of late we can’t necessarily blame it on metrics anymore. And now we are seeing cloud solutions such as DBaaS and IaaS present themselves to us. Most of the time, the database is the heart of the company: the main revenue driver for our business and customers, and it gets so locked up in change freezes that we have a hard time touching it. But today, let’s pretend that the opportunity to move “to the cloud” is real.

 

When we look at running our databases in the cloud, we really have two main options: DBaaS (database functionality delivered directly to us) and IaaS (the same database functionality, but with control over a portion of the infrastructure underneath it). No matter which choice we make, to me the whole “database in the cloud” scenario is one big trade-off. We trade away our control and ownership of the complete stack in our datacenters to gain the agility and mobility that cloud can provide.

 

Think about it! Currently, we have the ability to monitor the complete stack that our database lives on. We see all traffic coming into the environment and all traffic going out; we can monitor every single switch, router, and network device inside our four datacenter walls. We can make BIOS changes to the servers our database resides on. We have complete and utter control over how our database performs (with the exception of closed vendor code). In a cloudy world, we hand over that control to our cloud provider. Sure, we can usually still monitor performance metrics based on the database operations, but we don’t necessarily know what else is going on in the environment. We don’t know who our “neighbors” are or whether what they are doing is affecting us in any way. We don’t know what changes or tweaks might be going on below the stack that hosts our database. On the flip side, though, do we care? We’ve paid good money for these services and SLAs and put our trust in the cloud provider to take care of this for us.

In return, we get agility. We get functionality such as faster deployment times. We aren’t waiting anymore for servers to arrive or storage to be provisioned. In the case of DBaaS, we get embedded best practices. A lot of DBaaS providers do one thing and one thing alone: make databases efficient, fast, resilient, and highly available. Sometimes the burden of DR and recovery is taken care of for us. We don’t need to buy two of everything. Perhaps the biggest advantage, though, is the fact that we only pay for what we use. As heavy resource peaks emerge, we can burst and scale up, automatically. When those periods of time are over, we can retract and scale back down.

 

So, thoughtful remarks for the week: which side of the “agility vs. control” trade-off do you or your business take? Have you already made a move to hosting a database in the cloud? What do you see as the biggest benefit/drawback to using something like DBaaS? How has cloud changed the way you monitor and run your database infrastructure?

 

There are definitely no right or wrong answers this week; I’m really just looking for stories. And these stories may vary depending on your cloud provider of choice. For some providers, this trade-off may not even exist. For those doing private cloud, you may have the best of both worlds.

 

As this is my fourth and final post with the ambassador title, I just wanted to thank everyone for the comments over the past month. This is a great community with tons of engagement, and you can bet that I won’t be going anywhere, ambassador or not!

You likely have two common tasks at, or near the top of your IP address management To-Do list:

  • Quickly find and allocate available IP addresses
  • Reclaim unused IP addresses


These tasks seem simple enough. But IP address management is anything but simple. With IP documentation that runs into rows of data and no information about who made updates or when, you could find yourself dishing out incorrect IP address assignments. If that isn’t enough, you might also contend with IP address conflicts, loss of connectivity, loss of user productivity, network downtime, and more. IP address management and troubleshooting can be time consuming and require a lot of manual effort.


With all the tasks and teams involved with network administration, an organized and efficient IP address management process is vital in preventing network downtime. When you oversee hundreds or thousands of IP addresses, you need to stay on top of your IP address management efforts. Identifying and effectively mapping IP address space is a critical piece of the IP address management puzzle.


Accurate mapping of IP address space is important to clearly see your IP address usage stats. IP address space is mainly divided into three units:

  • IP address blocks - a large chunk of IP addresses that are used to organize an IP address space
  • IP address ranges - small chunks of IP addresses that correspond to a DHCP scope
  • Individual IP addresses - map to a single IP address range


When you map IP address space, you might consider using one IP address block for private IP addresses and another block for public IP addresses. Similarly, you can create smaller IP address blocks based on location, department, vendor, devices, etc. However, you do not deploy and manage these IP address blocks on the network like you would IP address ranges or individual IP addresses.


Rules for IP Address Ranges and blocks:

  • IP address ranges are mapped to IP address blocks
  • Multiple IP address ranges can be mapped to a single IP address block, but not to multiple IP address blocks
  • IP address ranges mapped to the same block cannot overlap
  • When an IP address is mapped to a range, actions like adding, updating, or deleting IP address fields on the range will affect all the IP addresses in that range

 

Once you define your IP address blocks and ranges, be sure to clearly document and inventory them.

 

When you manage a network with hundreds or thousands of IP addresses spread over different locations with multiple departments and projects, IP requirements commonly change. Under these circumstances, manual IP address management is difficult and inefficient because IP addresses are prone to duplication and assignment issues, making troubleshooting even more difficult.

 

The alternative is to use an IP address management tool that automates the entire process of IP address discovery and management. These tools simplify the process of quickly finding an available IP address or reclaiming an unused IP.

 

Top 3 benefits of using an IP address management tool to manage your IP space:

  • Better control and management of both IPv4 and IPv6 addresses: Easily organize IP ranges and individual addresses into logical groups.
  • Reduce manual errors and downtime due to IP address issues: Maintain a centralized inventory of your IP addresses and track changes.
  • Proactive management and planning: Close monitoring of IP space usage and easy identification of both over-used and under-used IP address space.


 

Last month, we shined our IT blogger spotlight on Michael Stump, who was one of the delegates at the recent Tech Field Day Extra at VMworld. This month, I figured why not keep it up? So, I tracked down the renowned Mr. Ethan Banks (@ecbanks), who also participated in the event. Without further ado, here’s what he had to say.

 

SW: First of all, it seems like I see your name just about everywhere. When it comes to blogging, where do you call home, so to speak?

 

EB: I blog in several places these days. My personal hub site is Ethan Banks on Networking, which is devoted to networking and closely related topics. I also write many of the blog posts that accompany the podcast I co-host over at Packet Pushers, a media company that discusses the networking industry with engineers and architects from around the globe. On top of that, I blog roughly once a month or so for Network Computing. Somewhat less frequently, an article of mine will be published by Network World or one of the TechTarget sites. If you poke around, you can find a few other places my writing has appeared as well, including here on thwack.

 

SW: Wow! It sounds like you’re staying busy. And someplace in there I’m sure you find time for a day job, and not to mention a hobby or two, I hope.

 

EB: I have two day jobs, actually. I’m lucky enough that one of them is my blogging. I’m known to many in the networking industry as a writer, podcaster, and co-founder of Packet Pushers. In addition, I’m also the senior network architect for Carenection. Carenection is a technology company that connects medical facilities to medical services such as real-time, video-over-IP language translation via our ever-expanding network covering the US.

 

As far as hobbies go, I enjoy backpacking in the wilderness very much. I do my best to get out on the trails three or four times a month and stomp out some scenic miles in the mountains. I’m lucky enough to live in New Hampshire where there is a great outdoor culture and rich heritage of trails—over 1,400 miles of them in the White Mountain National Forest. My goal is to hike all of those miles. I’ve bagged over 22 percent so far!


SW: It’s fantastic that you’re able to count your writing and podcast efforts as a day job. How did that all get started?

 

EB: I started blogging in January 2007 when I committed to Cisco’s notoriously difficult CCIE program. Blogging was part of my study process. I’d read or lab, then blog about the important information I was learning. Blogging forced me to take the information in, understand it and then write it down in a way that someone else could understand it.

 

SW: And I guess it just grew from there. What are some of your most popular topics?

 

EB: The most popular post I’ve written this year was about my home virtualization lab. The post described in detail my choice of server and network gear, and offered pictures and links so that folks could jump off of my experience to explore their own server builds. Reddit found the article early on and has continued to drive an incredible amount of interest months later.

 

Other popular articles are related to careers. People like to know what the future might hold for them in the networking space, which has been changing rapidly in recent years.

 

Yet other popular articles are “how to” explanations of common technical tasks. For example, I've spent some time with Juniper network devices running Junos, which are very different to configure than Cisco devices running IOS or NX-OS. These articles do well simply because of SEO—people with the same pain point I had find my article via Google, and can use it to help them with their configuration tasks.

 

SW: In between it all, are there any other bloggers you find the time to follow?

 

EB: There are far too many to name, to be fair. I subscribe to several dozen blogs, and usually spend the first 60-90 minutes of my day reading new content. A few that are worth Googling are Etherealmind (broad, insightful networking perspectives by my friend and podcast co-host Greg Ferro), the CloudFlare blog (these guys are doing some amazing things and describe how they push the envelope), Keeping It Classless (my friend Matt Oswalt is on the cutting edge of networking and writes outstanding content), Network Heresy (a blog by some folks working in the networking industry and thinking outside the box), and The Borg Queen (networking perspectives from Lisa Caywood, one of the most interesting people in IT I know).

 

SW: So, we talked about how you got started with blogging, but how did a life in IT begin for you?

 

EB: In a sense, I got into IT out of desperation. I have a CS degree that was focused on programming, but my early jobs out of college were not doing development work. Instead, I spent a year as a school teacher and a year in banking. After a cross-country move to be closer to family, I couldn't find a job in banking in the area I'd moved to. At that time, the banking industry was consolidating, and getting work was very hard. So, I took out a loan and enrolled in a school that taught official Novell Netware training. I quickly became a Certified Netware 3 Administrator, landed a contract supporting a company in my area, and never looked back.

 

SW: Being an IT management company, I of course always like to ask guys like you who’ve been in IT for a good while about what tools they can’t live without. What are some of yours?

 

EB: Any tool that can track historical routing table topology information is a favorite of mine. I’m sometimes called on to find out what changed in the middle of the night that caused that 10 second blip. That’s impossible to do without the right tool. Packet Design’s Route Explorer, a product I admittedly haven’t used in a few years as I’ve changed jobs, is such a tool that knows exactly the state of the network, and could rewind to any historical point in time. Fabulous tool.

 

Over the years, I’ve also used SolarWinds NPM, NTA, NCM, VNQM, Kiwi CatTools, and the Engineer’s Toolset. I’ve also spent time with SAM and UDT. My favorites have to be the tools that let me get at any sort of SNMP OID I want. So, NPM is the SolarWinds tool I’ve spent the most time with and gotten the most from, including NPM’s Universal Device Poller feature and Network Atlas. Along the same lines, the Engineer’s Toolset is a great resource. I’ve saved myself lots of time with the Switchport Mapper and also caught bandwidth events in action using the real-time gauges. These are simple tools, but reliably useful and practical.

 

SW: To finish us off, tell me a few of the things you’re seeing happen in the industry that will impact the future of IT.

 

EB: There are three trends that I think are key for IT professionals to watch over the next several years.

 

First, hyperconvergence. Entrants like VMware’s EVO:RAIL are joining the fray with the likes of upstarts Nutanix and Simplivity, and with good reason. The promise of an easy-to-deploy, fully integrated IT platform is resonating well with enterprises. Hyperconvergence makes a lot of sense, obscuring many of the details of complex IT infrastructure, making it easier to deliver applications to an organization.

 

Second, automation. Configuring IT systems by hand has been on the way out for a long time now, with networking finally heading into the automated realm. Automation is pushing IT pros to learn scripting, scheduling, APIs, and orchestration. The trick here is that automation is bringing together IT silos so that all engineers from all IT disciplines work together to build unified systems. This is not the way most IT has been building systems, but it appears to be the standard way all IT systems will be built in the coming years.

 

Finally, the move from public to private cloud. There’s been lots of noise about organizations moving their internal IT resources out to the public cloud, right? But another trend that's starting to show some legs is the move back. Issues of cost and security in the public cloud are causing organizations to take a second look at building their own private clouds instead of outsourcing entire IT applications. This bodes well for IT folks employed by enterprises, but also means that they need to skill up. Building a private cloud is a different sort of infrastructure than the traditional rack and stack enterprise.

No matter what bandwidth monitoring solution you use to enhance network availability, you will find the network is often riddled with issues pertaining to latency, packet loss, and jitter. To keep unwanted traffic from interrupting key processes in your network, look for a way to safeguard your business-critical applications by prioritizing them. Implementing and monitoring Quality of Service (QoS) policies can ensure optimal application and network performance, and help network admins weed out unwanted traffic consuming the bulk of your bandwidth.

Why Should You Implement and Monitor QoS?

Quality of Service (QoS) is a set of policies that help network administrators prioritize network traffic for applications based on business impact and guarantee enough bandwidth to ensure high network performance. It’s a mechanism internal to network devices that determines which traffic gets preferential access to network resources.


Additionally, QoS is fundamental to handling traffic efficiently. A network that doesn’t have QoS policies runs with best-effort delivery, meaning all traffic is routed with equal priority. In times of low network traffic there typically won’t be any problems; however, what happens when traffic is heavy and congested?


Without a QoS policy, all network traffic packets have an equal chance of being dropped. A QoS policy will prioritize specific traffic according to its relative importance and application type, and use congestion avoidance to help ensure its delivery. For instance, under congestion a network device might queue the traffic of applications that are more latency-tolerant, allowing the traffic of less latency-tolerant applications, such as streaming media/video, IPTV, and VoIP, to be forwarded immediately to the next network device. A sketch of what such a policy can look like follows.
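As a rough illustration on a Cisco IOS router (the class names, percentages, and interface below are placeholders, not recommendations), voice marked EF could get a strict priority queue while other classes share the remaining bandwidth:

class-map match-any VOICE-RTP
 match dscp ef
class-map match-any CRITICAL-DATA
 match dscp af31 af32
!
policy-map WAN-EDGE-OUT
 class VOICE-RTP
  ! strict low-latency queue, capped at 20% of the link
  priority percent 20
 class CRITICAL-DATA
  bandwidth percent 30
 class class-default
  fair-queue
!
interface GigabitEthernet0/0
 service-policy output WAN-EDGE-OUT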


How does QoS Help You in Troubleshooting?


Most network administrators implement QoS policies to make sure their business-critical applications always receive the highest priority. Additionally, QoS monitoring enhances your network monitoring ability, allowing you to adjust your policies based on your priorities, and can also aid in troubleshooting.

Advanced network monitoring software allows you to view network traffic segmented by class of service by monitoring NetFlow data. By using Class-Based Quality of Service (CBQoS) data in your network monitoring software, you can measure the effectiveness of your QoS policies and quantify bandwidth consumption by class map.



 

Avoid a network riddled with latency, packet loss, and jitter issues. Implement QoS to ensure sufficient network bandwidth is available for business critical IT applications. More importantly, keep a lid on unwanted traffic that could possibly consume your network.

 

Learn More

Continuous Monitoring: Managing the Unpredictable Human Element of Cybersecurity

Deep Packet Inspection for Quality of Experience Monitoring

Let’s face it, there is always a possibility of networks being affected by worms and viruses. If it happens, they can replicate at an alarming rate and slow your network considerably. While you may be trying to resolve the issue as quickly as possible, the fact is your organization is experiencing downtime. The bottom-line impact of downtime can be devastating, especially in terms of monetary loss. And it is time-consuming and tedious to troubleshoot without proper stats on the network or router/switch ports.


Say you just received an alert that one of your access switches is down. Naturally, your performance monitoring tool will show that node in red, but you notice that some of the pings are still getting through. So, you run a web-based traceroute, which shows that some of the nodes in the path are reporting higher response times than usual, while the target device replies sporadically.


Having historical visibility into erring ports and port utilization/errors is good, but it’s immensely helpful to visualize these in real time. When you visualize the interfaces in a chart, you can easily see the high utilization of the uplink interface and the associated access switchport that is generating the high traffic. Now, all you have to do is SSH to the switch and shut down the port (a minimal example follows), so that the traceroute response times and bandwidth utilization values return to normal.
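For example, shutting down the offending access port from the switch CLI takes only a couple of commands (the interface name here is a placeholder):

configure terminal
 interface GigabitEthernet0/12
  shutdown
 end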


To find out if a router or switch is routing significant amounts of traffic, you need real-time troubleshooting software that can isolate exactly which port is generating the traffic. For example, using the Engineer’s Toolset’s Interface Monitor, you can get real-time statistics by capturing and analyzing SNMP data from multiple routers and switches simultaneously. You can watch live monitoring statistics for received and transmitted traffic (Rx, Tx, or Rx + Tx) from a set of statistic groups such as percent utilization, bandwidth, total bytes transferred, error packets, and discarded packets. You can also set warning and critical thresholds for specific interfaces.


To do this, select the interfaces you want to monitor and configure the polling interval, metrics, and thresholds in Interface Monitor based on your requirements. You can set the polling interval to collect statistics as frequently as every 5 seconds.



Cyber-attacks can hit your network hard and keep you awake at night. So why not avoid the onslaught of viruses and worms with a properly maintained IT infrastructure and effective monitoring? Reacting to threats quickly and keeping your network ticking with optimal performance is what keeps network admins on their toes. The ability to respond quickly to bandwidth and network performance issues using the right tools can save time and money, and increase the overall productivity of the users on the network.



Let's face it, you cannot talk about VoIP without hearing about QoS (Quality of Service); for many companies, a VoIP deployment is the only reason they implement QoS. Thinking about it for a while, I realize 90% of the companies I've deployed QoS for were preparing for, or trying to improve, a voice deployment. The first question I used to get is, 'Why do I need to deploy QoS? I have 1Gb links; that's more than enough bandwidth.' Well, let's go back to basics. In my mind, voice is a pretty stable and timid IP stream; it's the rest of the non-VoIP IP traffic that is bursty and rude. So, from my perspective, it's not always a case of managing low-bandwidth links for VoIP traffic; it's a matter of protecting the VoIP RTP streams from all the other day-to-day data traffic. Plus, we also have to consider that not every company can afford 1Gb+ private WAN links at every site, so in that case it does become a matter of reserving bandwidth for VoIP traffic.

 

QoS is definitely one of my favorite topics to discuss and design for, especially because it's one of those topics that every company does differently, and they usually have different goals for the QoS implementation. I'll kick it off with a few points I like to mention out of the gate.

 

Don't queue TCP & UDP traffic together! This is definitely one of my favorites, I've seen many people out there lump up a bunch of applications together and throw them in a single queue, it sounds like a good idea but remember how TCP & UDP fundamentally behave when packet loss occurs. If the congestion avoidance mechanisms (RED/WRED) kick in and a UDP packet is dropped the flow continues like nothing happened. Where-as if a TCP packet is dropped the stream decreasing the window size and less data gets transferred over time until the endpoints negotiate the window size back up to where it was. You might find yourself in a situation where TCP throughput is suffering but the UDP applications function like normal because they have essentially taken up the whole queue. This is a rather tough situation to troubleshoot.

 

Get sign-off from management - This may sound odd or trivial at first, but it is usually best to work with the business (was that layer 8 or 9 again? I always confuse those two) to determine what traffic allows the company to bring in the money. You might also want to take that a step further and ask that same management/business team to put a priority on those business applications, so they can decide which applications can/should be dropped first if bandwidth is not sufficient. After all, the last thing you want to do is explain to your own management or business teams why you are dropping business-critical traffic. It is a good idea to make sure they are standing behind your QoS configuration.

 

Trust boundaries - Deciding where you place your trust boundary can change your configuration and design drastically. If you decide to place your trust boundary on a site's edge/WAN router, then you only need to worry about the queuing outbound on the WAN and the inbound markings. However, if you set up your trust boundary on your access switches, then you may also need to consider Layer 2 QoS mechanisms and the queuing from the Layer 2 device to the upstream Layer 3 WAN router. (A small access-switch sketch follows.)
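For instance, on an older Catalyst access switch (3560/3750-style CLI; the VLANs and interface below are placeholders), extending the trust boundary to the IP phone might look roughly like this:

mls qos
!
interface GigabitEthernet0/5
 switchport access vlan 10
 switchport voice vlan 110
 ! trust the phone's markings only when a Cisco phone is detected on the port
 mls qos trust device cisco-phone
 mls qos trust cos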



Those are a few of the considerations I take into account when working on a QoS design. What else do you consider when deploying QoS in your environment?

Problem management is a crucial part of IT service management that requires support teams to diagnose the ‘root cause of incidents’ (identified as problems) and determine the resolution to these problems. This is not an easy task, and specifically for mid-size to large organizations, where the number of incidents logged is considerably high, it becomes harder to handle this process. Typically, problem management has been reactive in nature, i.e., getting into inspection mode after an incident has occurred. While incident management helps restore the service temporarily, problem management comes afterwards and ensures there is a permanent fix, making sure the incident will not recur.

 

It is also important to look at problem management from a proactive standpoint. Here, IT pros analyze past incidents, extrapolate trends, and investigate whether any specific conditions in the IT framework will cause problems to occur. Proactive problem management overlaps with risk management, as we have to constantly study the IT infrastructure, identify risks, and mitigate them before they turn into problems and affect service delivery.

 

The help desk plays a vital role in both types of problem management.

  • In reactive problem management, a help desk ensures incidents are recorded properly and easily tied to problems, while also supporting customizable workflows to handle incident and problem tickets. Help desk integration with remote control tools speeds up reactive problem management, allowing admins to quickly and remotely solve the end-user desktop issues that are causing problems.
  • In proactive problem management, a help desk provides data about various entities of the service management model (operations, infrastructure, people, processes, and service requests), and helps you get better at understanding and identifying risks. If your help desk can integrate with IT infrastructure management tools, such as network and server monitoring, to associate network, application, and server issues with incident tickets, it will help you identify trends for infrastructure-related problems.

 

It is important for IT departments to decide on and plan, in advance, a feasible problem management methodology that can be applied to known problems easily and is also flexible enough to adjust and apply to new problems. Instead of siding with either the reactive or the proactive approach, IT should strategize for both and be prepared to fix problems fast.

 

Share with us how you handle problems as part of your IT service support process.

For the most part, database performance monitoring tools do a great job at real-time monitoring; by that I mean alerting us when certain counter thresholds are reached, such as Page Life Expectancy dropping below 300 or Memory Pages per Second being too high. Although this is definitely crucial to have set up within our environment, having hard alerts does pose a problem of its own. How do we know that reaching a page life expectancy of 300 is a problem? Maybe this is normal for a certain period of time, such as month-end processing.

 

This is where the baseline comes into play. A baseline, by definition, is a minimum or starting point used for comparisons. In the database performance analysis world, it’s a snapshot of how our databases and servers are performing when not experiencing any issues at a given point in time. We can then take these performance snapshots and use them as a starting point when troubleshooting performance issues. For instance, take into consideration a few of the following questions…

 

  1. Is my database running slower now than it was last week?
  2. Has my database been impacted by the latest disk failure and RAID rebuild?
  3. Has the new SAN migration impacted my database services in any way?
  4. Has the latest configuration change/application update impacted my servers in any way?
  5. How has the addition of 20 VMs into my environment impacted my database?

 

With established baselines we are able to quickly see by comparison the answer to all of these questions.  But, let’s take this a step further, and use question 5 in the following scenario.

 

Jim is currently comparing how his database server is performing now against a baseline he took a few months back, before adding 20 new VMs into his environment. He concludes, with the data to back him up, that his server is indeed running slower. He is seeing increased read/write latency and increased CPU usage. So is the blame really to be placed on the newly added VMs? Well, that all depends. What if something else is currently going on that is causing the latency to increase? Say month-end processing and backups are happening now and weren't during the snapshot of the older baseline.

 

We can quickly see that baselines, while important, are really only as good as the time at which you take them. Comparing a period of increased activity to a baseline taken during a period of normal activity is not very useful at all.

 

So this week I ask you to simply tell me about how you tackle baselines.

  1. Do you take baselines at all?  How many?  How often?
  2. What counters/metrics do you collect?
  3. Do you baseline your applications during peak usage?  Low usage?  Month end?
  4. Do you rely solely on your monitoring solution for baselining?  Does it show you trending over time?
  5. Can your monitoring solution tell you, based on previous data, what is normal for this period of time in your environment?

 

You don’t have to stick to these questions – let's just have a conversation about baselining!

It seems like every organization is looking at what can be moved—or should be moved—to the cloud. However, the cloud is clearly not for everything; as with any technology there are benefits and tradeoffs. As such, it is important for all IT professionals to understand when and how the cloud is advantageous for their applications.

 

In this evaluation process, and in the migration planning for moving applications to the cloud, databases are usually the most difficult element to understand. Of course, data is the heart of every application, so knowing how databases can reliably work in the cloud is key. Here are a few ideas and recommendations to keep in mind when considering moving databases to the cloud:

 

1. It starts with performance. If I had a penny for every time I have heard, “the cloud is too slow for databases,” I might have enough for a double venti latte. Performance uncertainty is the key concern that stops professionals from moving databases to virtualized environments or the cloud. However, this concern is often unfounded as many applications have performance requirements that are easy to meet in a number of different cloud architectures. Cloud technology has evolved over the past three years to offer multiple deployment options for databases, some of them with very high performance capabilities.

 

2. Visibility can help. The easiest way to solve performance problems is to throw hardware at them, but that is obviously not a best practice and is not very cost effective. A database monitoring tool can help you understand the true database and resource requirements of your application, such as:

    • CPU, Storage, memory, latency and storage throughput (IOPS can be deceiving)
    • Planned storage growth and backup requirements
    • Resource fluctuation based on peak application usage or batch processes
    • Data connection dependencies—aside from application connectivity there may be other application data interchange requirements, backups or flow of incoming data

One of the advantages of the cloud is the ability to dynamically scale resources up and down. So, rather than being the source of performance uncertainty, the cloud can actually give you peace of mind that the right amount of resources can be allocated to your applications to ensure adequate performance. The key, however, is knowing what those requirements are. You can use Database Performance Analyzer (there is a 14-day free trial) to understand these requirements.

 

3. Take a test drive. One of the obvious benefits of the cloud is low cost and accessibility. Even if you don’t have a migration plan in the works yet, it is a good idea to play with cloud databases to become familiar, experiment and learn. In an hour of your time, you can get a database running in the cloud. Set one up, play with it and kill it. The cost is minimal. With a bit more time and a few more dollars, you can even move a copy of a production database to the cloud and test deployment options and learn how things specific to your application and database will work in the cloud.


4. Carefully plan your deployment model. The cloud offers multiple deployment options that should be considered. For example, Database as a Service (DBaaS) provides simplicity in deployment, automation, and a managed service. Leveraging Infrastructure as a Service (IaaS) is an alternative for running database instances on cloud servers that provides more control and that looks and feels like a traditional on-premises deployment. There are also various storage options, including block storage, SSD drives, guaranteed IOPS, dedicated connections, and database-optimized instances. As the cloud is mostly a shared environment, it is also important to understand and test for performance consistency and variability, not just peak theoretical performance.

 

5. Make the move. There is no single migration plan that covers all use cases. Rather than trying to use some formula for making the move to the cloud, I recommend talking to your cloud provider, explaining your environment, and getting their guidance. It is also usually a good idea to create a duplicate environment in the cloud and verify it runs well before switching over the production application. And in addition to your data recovery and backup requirements, it is also important to consider replication or standby servers in a different region than where your primary servers are located.

 

6. Monitor and optimize. Just like with on-premises deployments, it is important to monitor and optimize your cloud environment once it is up and running. Database optimization tools that offer wait-time analysis and resource correlation can speed database operations significantly, alert you to issues before they become big problems, increase application performance, and monitor resources to help with planning. Database administrators, developers, and IT operations can benefit from a performance analysis tool like SolarWinds DPA that allows them to write good code and pinpoint the root cause of whatever could be slowing down the database, whether that be queries, storage events, server resources, etc.

 

The cloud is evolving quickly. It is getting better, more reliable and more flexible all the time. Just like five years ago when most of us could not envision just how transformative the cloud would be today, we should expect the technology to continue evolving at the same pace over the next five years. This is one more reason to start experimenting with the cloud today. It is a journey that requires breaking some paradigms and shifting your mindset, but also a journey that can provide significant benefits for your applications and your job.

We caught an article this week over on Bank Info Security's website about The Future of PCI. The PCI Security Standards Council revealed some of their thinking about where PCI needs to go during a recent PCI Community Meeting in Orlando, Florida. Some of the highlights, as we see them:

  1. "We really need to have a risk-based dialogue versus a compliance-based approach" - sounds a little bit like we're all on the same page when it comes to "compliance ≠ security".  He also acknowledges the ongoing challenge that retailers are interested in more prescriptive guidance, but threats are continually evolving: "merchants and the payments industry have to be committed to long-range security planning" and not just focusing on the current big breach. This is tough for the rest of us, who are really heads down in the day to day job. We may need the PCI Council to help us move along the spectrum, otherwise we'll keep focusing on table stakes security with the limited resources (people, money, and time) that we have.
  2. "When it comes to ensuring ongoing PCI compliance, it's critical that organizations regularly track the effectiveness of the controls and technologies they put in place, Leach says." - the reality of audit-driven compliance is that it's a once-a-year kind of deal. It's hard to keep the focus on something year in and year out when there's no pressing need. Theoretically with #1 (compliance better aligned with good security practices) it becomes easier to be able to answer "are we compliant TODAY, not just on audit day?" We see continuous compliance/monitoring becoming a trend across industries and segments, so I'm not surprised to see PCI thinking the same way. They sum it up pretty well: "Ongoing PCI is a challenge. It's very, very complicated and has many situation-specific qualities to it. ... We have to work with these organizations and make them realize the risks and then help them find solutions that work."
  3. "The very old, very basic kind of security flaws still remain - weak passwords, insecure remote access, lack of security patches, things like that that in some cases have been almost deliberately set up to make it easy for that reseller or that POS support person to do the maintenance" - a lot of us really are still fighting common security stuff. The security industry is constantly focusing on detecting the next big threat with new products and services - but the reality is a lot of us still need help making sure that our bases are fully covered in constantly evolving environments where balancing security and convenience is still a huge challenge.

 

There's more over in the article, and we'll keep our eyes peeled for more on how the PCI Council may turn this into actual material changes.

 

We've talked a little on Thwack before about whether compliance = security (or some variation of that truth - check out the discussion here: Does Compliance Actually Make you More Secure?). Do you think this news will change anything? Are your IT, compliance, and security teams moving toward more ongoing compliance instead of just point in time, or is an audit still a scramble? Let us know what you think about all things PCI in the comments.

Because healthcare organizations are commonly a prime target for security breaches, they need to do their part in protecting the privacy of patient information and records. The federal government has “acted” on that notion by setting standards for protecting sensitive information and requiring companies that handle sensitive patient data to comply with those standards.

 

First, there was the Health Insurance Portability and Accountability Act (HIPAA) of 1996, which defined rules for securing processes such as saving, transmitting, and accessing patient information. Then there was the Health Information Technology for Economic and Clinical Health (HITECH) Act, enacted as part of the American Recovery and Reinvestment Act (ARRA) of 2009. This act was designed to strengthen the privacy and security protections established under HIPAA.

 

With the upswing in attacks on hospitals and other healthcare providers, IT security has become a high priority for these organizations. HIPAA has always been the defining baseline for securing information from nefarious entities that target the healthcare industry. But it wasn’t the final word on protecting medical and personal information. In January 2013, the Department of Health and Human Services (HHS) released the Omnibus Final Rule (Final Rule) to assist in interpreting and implementing various provisions of the HITECH Act and the Genetic Information Nondiscrimination Act of 2008 (GINA). A deadline for full compliance by September 23, 2013 was announced at the same time. While a number of organizations were allowed to delay updating their Business Associate Agreements (BAAs) to meet compliance guidelines, all organizations were required to comply with the Final Rule by that date.

 

The Omnibus Final Rule modified the HIPAA standards used by the healthcare industry to determine whether a breach transpired in relation to protected health information (PHI). This is an important amendment to the HIPAA Security Rule and the compliance deadline meant that organizations needed to make significant changes to their security processes in a short amount of time.

 

Key Policies of the Omnibus Final Rule include:

  • Healthcare Organizations (including business associates and subcontractors) are directly liable for compliance and the costly penalties for all violations.
  • In the event of a breach, organizations must notify patients, HHS, and the media within 60 days of discovery. The exception is if the organization conducts a risk assessment and can demonstrate a low probability that the PHI has been compromised.
  • The focus of the risk assessment is not on harm to the patient but on whether information has been compromised.
  • Previous exceptions for breaches of limited data sets (data that does not contain birth dates or zip codes) are no longer allowed. Breaches to this kind of data must be treated like all other breaches of PHI.

 

The Omnibus Final Rule imposed many changes to HIPAA and HITECH, but some items remained, such as the safe harbor exception, which lists 18 identifiers that must be removed from data before it can be shared with an outside party. The rule states that an unauthorized disclosure only rises to the level of a breach, and only triggers the notification requirements of the HITECH Act, if the PHI disclosed is unsecured. Many of the other breach-related provisions brought in by the interim rules also remained. These include access to PHI by a workforce member without further disclosure, inadvertent disclosure to an authorized person, and cases where there is a good-faith belief that disclosure is necessary to prevent or lessen a threat to the health or safety of the patient or others.

 

In response to the growing number and sophistication of attacks, many organizations are seeing the necessity of increasing their security posture. Some of the regulations require updates across the enterprise to ensure continued compliance with the Final Rule.

In my last post we discussed implementing voice into a new environment, so now I figured we would discuss troubleshooting that environment after its initial deployment. Only seems natural, right?

 

Now, due to its nature, troubleshooting VoIP issues can be quite different from troubleshooting your typical data TCP/UDP applications, and more often than not we will have another set of tools to troubleshoot VoIP-related issues. (And hopefully some of those tools are integrated with our day-to-day network management tools.)


IP SLA/RPM Monitoring*:

This is definitely one of my favorite networking tools. IP SLA monitoring allows me to see what the network looks like from a different perspective (usually a perspective closer to the end user). There are a few different IP SLA operations we can use to monitor the performance of a VoIP network. UDP jitter is one particular operation that allows us to get deeper insight into VoIP performance. Discovering variances in jitter could point to an incorrectly sized voice queue or possible WAN/transit-related issues. Have you ever considered implementing a DNS or DHCP IP SLA monitor? (A quick sketch follows the note below.)

*Keep in mind that IP SLA monitoring can also be used outside of the VoIP infrastructure; other operations support TCP/HTTP/FTP/etc., so you can get the user's perspective for other mission-critical applications.
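On the DNS and DHCP question above: as a rough sketch (the server addresses, hostname, and operation numbers are placeholders), those operations only take a few lines each on a Cisco router:

! Time a DNS lookup against a specific name server:
ip sla 201
 dns cucm1.example.local name-server 10.0.0.10
 frequency 300
! Time a DHCP lease acquisition from a specific server:
ip sla 202
 dhcp 10.0.0.20
 frequency 300
ip sla schedule 201 life forever start-time now
ip sla schedule 202 life forever start-time now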


NetFlow/JFlow/IPFIX:

Another great tool to have in the arsenal. NetFlow is an easy way to see a breakdown of traffic from the interface perspective. This allows you to verify that your signaling and RTP streams are being marked correctly. It also provides you with the ability to verify that other applications/traffic flows are not being marked into the voice queue unintentionally. Different vendors run their own variations of NetFlow, but at the end of the day they all provide very similar information; many of the newer NetFlow versions allow more granular control over what information is collected and where it is collected from. (A minimal export configuration is sketched below.)
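As a minimal sketch using traditional NetFlow on a Cisco IOS router (the collector address/port and interface are placeholders; Flexible NetFlow or another vendor's flavor will look different):

! Export flow records to the collector:
ip flow-export destination 10.0.0.50 2055
ip flow-export version 9
ip flow-export source Loopback0
!
! Collect flows on the interface(s) of interest:
interface GigabitEthernet0/1
 ip flow ingress
 ip flow egress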


MOS Scores:

While this one is not an actual tool in itself, keeping an eye on your MOS scores can quickly identify trouble spots (if the end users don't report them first, that is) by flagging poor-quality calls.


 

A good old analog phone:
Wait a second, this is a VoIP deployment, right? But if we do have a few analog lines for backup/AAR/E911 services, we may find ourselves in a situation where we need to troubleshoot one of those analog lines, possibly for static or basic functionality.

Polling more specific information:

 

Depending on what you are trying to troubleshoot, you can definitely get some great insight from polling more specific information using the UnDP (Universal Device Poller):

(Some of these will only be manageable for smaller locations/deployments)

  • Number of registered phones - displayed in a graph format so you can easily spot drops in registered phones
  • CME Version - Specific for Cisco routers running CME, but keeping track of the CME versions could help isolate issues to a specific software set.
  • Below are a few others I have created as well, along with a sample VoIP dashboard.


A lot of times, as administrators or infrastructure people, we all too often get stuck “keeping the lights on.” What I mean by this is that we have our tools and scripts in place to monitor all of our services and databases, we have notifications set up to alert us when they are down or experiencing trouble, and we have our troubleshooting methodologies and exercises that we go through in order to get everything back up and running.


The problem is, that's where our job usually ends.  We simply fix the issue and move on to the next issue in order to keep the business up.  Not often do we get the chance to research a better way of monitoring or a better way of doing things.  And when we do get that time, how do we get these projects financially backed by a budget?

 

Throughout my career there have been plenty of times when I have mentioned the need for better or faster storage, more memory, more compute, and different pieces of software to better support me in my role.  However, the fact of the matter is that without proof of how these upgrades or greenfield deployments will impact the business, or better yet, how the business will be impacted without them, there's a pretty good chance that the answer will always be no.

 

So I’m constantly looking for that silver bullet, if you will: something that I can take to my CTO/CFO in order to validate my budget requests.  The problem is that most performance monitoring applications spit out reports dealing with very technical metrics.  My CTO/CFO does not care about the average response time of a query.  They don’t care about table locking and blocking numbers.  What they want to see is how what I’m asking for can either save them money or make them money.

 

So this is where I struggle, and I’m asking you, the thwack community, for your help on this one: leave a comment with your best tip or strategy on using performance data and metrics to get budgets and projects approved.  Below are a few questions to help you get started.

 

  • How do you present your case to a CTO/CFO?  Do you have some go to metrics that you find they understand more than others?
  • Do you correlate performance data with other groups of financial data to show a bottom line impact of a performance issue or outage?
  • Do you map your performance data directly to SLA’s that might be in place?  Does this help in selling your pitch?
  • Do you have any specific metrics or performance reports you use to show your business stakeholders the impact on customer satisfaction or brand reputation?

 

Thanks for reading – I look forward to hearing from you.
