
Geek Speak


Let's face it, there is always a possibility that your network will be hit by worms or viruses. When it happens, they can replicate at an alarming rate and slow your network considerably. While you are trying to resolve the issue as quickly as possible, your organization is experiencing downtime, and the bottom-line impact of downtime can be devastating, especially in monetary terms. Troubleshooting is also time-consuming and tedious without proper statistics for the network and its router/switch ports.


Say you just received an alert that one of your access switches is down. Naturally, your performance monitoring tool shows that node in red, but you notice that some pings are still getting through. So you run a web-based traceroute, which shows that some nodes in the path are reporting higher response times than usual, while the target device replies only sporadically.


Having historical visibility into erring ports and port utilization/errors is good, but it's immensely helpful to visualize these in real time. When you chart the interfaces, you can easily see the high utilization on the uplink interface and the associated access switchport generating the traffic. Now all you have to do is SSH to the switch and shut down that port so that traceroute response times and bandwidth utilization return to their normal values.


To find out if a router or switch is routing significant amounts of traffic, you need real-time troubleshooting software that can isolate exactly which port is generating the traffic. For example, using Engineer's Toolset's Interface Monitor, you can get real-time statistics by capturing and analyzing SNMP data from multiple routers and switches simultaneously. You can watch live monitoring statistics for received and transmitted traffic (Rx, Tx, or Rx + Tx) across a set of statistic groups such as percent utilization, bandwidth, total bytes transferred, error packets, and discarded packets. You can also set warning and critical thresholds for specific interfaces.


To do this, select the interfaces you want to monitor and configure the polling interval, metrics, and thresholds in Interface Monitor based on your requirements. You can set the polling interval to collect statistics as frequently as every 5 seconds.

[Image: interface-Mon-graph.png]
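
Interface Monitor does this math for you, but if you ever want to sanity-check a utilization figure by hand, the arithmetic is simple. Here is a minimal Python sketch of it (not tied to any SolarWinds product); the 64-bit counters, 5-second interval, and 1 Gbps speed are illustrative assumptions.

    # Percent utilization from two samples of an interface's octet counters.
    COUNTER_MAX = 2 ** 64  # assumes 64-bit ifHCInOctets/ifHCOutOctets counters

    def delta(newer, older, counter_max=COUNTER_MAX):
        """Counter difference that tolerates a single wrap between polls."""
        return newer - older if newer >= older else newer + (counter_max - older)

    def utilization_pct(rx1, rx2, tx1, tx2, interval_s, if_speed_bps):
        """Rx + Tx utilization over one polling interval, as a percentage."""
        bits = (delta(rx2, rx1) + delta(tx2, tx1)) * 8
        return 100.0 * bits / (interval_s * if_speed_bps)

    # Example: two polls 5 seconds apart on a 1 Gbps uplink (made-up counter values).
    print(utilization_pct(10_000_000_000, 10_480_000_000,
                          20_000_000_000, 20_030_000_000,
                          interval_s=5, if_speed_bps=1_000_000_000))  # -> 81.6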


Cyber-attacks can hit your network hard and keep you awake at night. So why not blunt the onslaught of viruses and worms with a properly maintained IT infrastructure and effective monitoring? Reacting to threats quickly and keeping your network ticking along at optimal performance keeps network admins on their toes, but the ability to respond fast to bandwidth and performance issues with the right tools saves time and money and increases the overall productivity of everyone on the network.


[Image: tech-tip-RTM Interface.png]

Let's face it, you cannot talk about VoIP without hearing about QoS (Quality of Service); for many companies, a VoIP deployment is the only reason they implement QoS. Thinking back, I'd say 90% of the companies I've deployed QoS for were either preparing for a voice deployment or trying to improve a previous one. The first question I used to get was, 'Why do I need to deploy QoS? I have 1Gb links; that's more than enough bandwidth.' Well, let's go back to basics. In my mind, voice is a pretty stable and timid IP stream; it's the rest of the non-VoIP traffic that is bursty and rude. So from my perspective it's not always a case of managing low-bandwidth links for VoIP traffic, it's a matter of protecting the VoIP RTP streams from all the other day-to-day data traffic. Plus, not every company can afford 1Gb+ private WAN links at every site, and in that case it really does become a matter of reserving bandwidth for VoIP traffic.

 

QoS is definitely one of my favorite topics to discuss and design for, especially because it's one of those topics that every company does differently, and they usually have different goals for the QoS implementation.  I'll kick it off with a few points I like to mention out of the gate.

 

Don't queue TCP & UDP traffic together! This is definitely one of my favorites. I've seen many people lump a bunch of applications together and throw them into a single queue. It sounds like a good idea, but remember how TCP and UDP fundamentally behave when packet loss occurs. If the congestion avoidance mechanisms (RED/WRED) kick in and a UDP packet is dropped, the flow continues like nothing happened. Whereas if a TCP packet is dropped, the sender decreases its window size and less data gets transferred over time, until the endpoints negotiate the window size back up to where it was. You might find yourself in a situation where TCP throughput is suffering but the UDP applications function as normal because they have essentially taken over the whole queue. This is a rather tough situation to troubleshoot.
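
To make that interaction concrete, here is a deliberately crude Python simulation of my own (not how any router implements WRED): one constant-rate UDP flow and one AIMD-style TCP flow share a fixed-rate queue, and every interval the queue is oversubscribed the TCP sender halves its rate while UDP keeps sending.

    # Toy model: an AIMD TCP flow and a constant-rate UDP flow sharing one queue.
    QUEUE_BPS = 100.0  # queue service rate in Mbps (illustrative)
    UDP_BPS = 85.0     # UDP offered load in Mbps; it never slows down on loss
    tcp_bps = 50.0     # TCP offered load in Mbps; it halves on loss, grows otherwise

    for second in range(1, 11):
        if UDP_BPS + tcp_bps > QUEUE_BPS:   # oversubscribed: drops occur this interval
            tcp_bps = max(tcp_bps / 2, 1.0)     # TCP reacts to the loss
        else:
            tcp_bps += 5.0                      # additive increase while there is headroom
        print(f"t={second:2}s  UDP still offers {UDP_BPS:.0f} Mbps, TCP is down to {tcp_bps:.1f} Mbps")

Run it and TCP oscillates around whatever scraps UDP leaves behind, which is exactly the hard-to-troubleshoot symptom described above.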

 

Get sign-off from management - This may sound odd or trivial at first, but it is usually best to work with the business (was that layer 8 or 9 again? I always confuse those two) to determine what traffic allows the company to bring in the money. You might also want to take that a step further and ask that same management/business team to put a priority on those business applications, so they can decide which applications can or should be dropped first if bandwidth is not sufficient. After all, the last thing you want to do is explain to your own management or business teams why you are dropping business-critical traffic. It is a good idea to make sure they are standing behind your QoS configuration.

 

Trust boundaries - Deciding where you place your trust boundary can change your configuration and design drastically. After all, if you place the trust boundary on a site's edge/WAN router, you only need to worry about outbound queuing on the WAN and the inbound markings. However, if you set the trust boundary on your access switches, you may also need to consider layer 2 QoS mechanisms and the queuing from the layer 2 device to the upstream layer 3 WAN router.


[Image: Trust Boundary.PNG]

Those are a few considerations I take into account when working with QoS. What else do you consider when deploying QoS in your environment?

Problem management is a crucial part of IT service management that requires support teams to diagnose the root cause of incidents (identified as problems) and determine the resolution to those problems. This is not an easy task, especially for mid-size to large organizations, where the number of incidents logged is considerably high and the process becomes harder to handle. Typically, problem management has been reactive in nature, i.e., going into inspection mode after an incident has occurred. While incident management helps restore the service temporarily, problem management comes afterwards and ensures there is a permanent fix so the incident will not recur.

[Image: proactive.jpg]

 

It is also important to look at problem management from a proactive behavioral standpoint. Here, IT pros analyze past incidents, extrapolate trends and investigate whether any specific conditions in the IT framework will cause problems to occur. Proactive problem management overlaps with risk management as we have to constantly keep studying the IT infrastructure, identify risks and mitigate them before they turn into problems and affect service delivery.

 

The help desk plays a vital role in both types of problem management.

  • In reactive problem management, a help desk ensures incidents are recorded properly and easily tied to problems, while also supporting customizable workflows to handle incident and problem tickets. Help desk integration with remote control tools speeds up reactive problem management by allowing admins to quickly and remotely solve the end-user desktop issues causing problems.
  • In proactive problem management, a help desk provides data about the various entities of the service management model (operations, infrastructure, people, processes, and service requests) and helps you get better at understanding and identifying risks. If your help desk integrates with IT infrastructure management tools such as network and server monitoring to associate network, application, and server issues with incident tickets, it will help you identify trends in the infrastructure-related problems behind those tickets.

 

It is important for IT departments to decide and plan in advance a feasible problem management methodology that can be applied to known problems easily and is also flexible enough to adjust and apply to new problems. Instead of siding with either the reactive or the proactive approach, IT should plan for both and be prepared to fix problems fast.

 

Share with us how you handle problems as part of your IT service support process.

For the most part, database performance monitoring tools do a great job at real-time monitoring – by that I mean alerting us when certain counter thresholds are reached, such as Page Life Expectancy dropping below 300 or Memory Pages per Second climbing too high.  Although this is definitely crucial to have set up within our environment, hard alerts do pose a problem of their own.  How do we know that reaching a page life expectancy of 300 is a problem?  Maybe that is normal for a certain period of time, such as month-end processing.

 

This is where the baseline comes into play.  A baseline, by definition, is a starting point used for comparisons.  In the database performance analysis world, it's a snapshot of how our databases and servers perform when they are not experiencing any issues at a given point in time.  We can then take these performance snapshots and use them as a starting point when troubleshooting performance issues.  For instance, consider a few of the following questions…

 

  1. Is my database running slower now than it was last week?
  2. Has my database been impacted by the latest disk failure and RAID rebuild?
  3. Has the new SAN migration impacted my database services in any way?
  4. Has the latest configuration change/application update impacted my servers in any way?
  5. How has the addition of 20 VMs to my environment impacted my database?

 

With established baselines, we can quickly answer all of these questions by comparison.  But let's take this a step further and use question 5 in the following scenario.
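
The comparison itself doesn't have to be fancy. As a trivial illustration, here is a hedged Python sketch that flags any counter deviating from a stored baseline by more than a chosen percentage; the metric names and values are made up.

    # Compare current samples against a stored baseline (all values illustrative).
    baseline = {"page_life_expectancy_s": 4200, "avg_read_latency_ms": 6.0, "cpu_pct": 35.0}
    current  = {"page_life_expectancy_s": 310,  "avg_read_latency_ms": 22.0, "cpu_pct": 78.0}
    THRESHOLD_PCT = 50.0  # flag anything that moved more than 50% from its baseline

    for metric, base in baseline.items():
        change_pct = 100.0 * (current[metric] - base) / base
        flag = "  <-- investigate" if abs(change_pct) > THRESHOLD_PCT else ""
        print(f"{metric:24} baseline={base:>8}  now={current[metric]:>8}  change={change_pct:+7.1f}%{flag}")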

 

Jim is comparing how his database server performs now against a baseline he took a few months back, before adding 20 new VMs to his environment.  He concludes, with the data to back him up, that his server is indeed running slower: he is seeing increased read/write latency and increased CPU usage.  So is the blame really to be placed on the newly added VMs?  Well, that depends – what if something else is going on right now that is causing the latency to increase?  Say month-end processing and backups are happening now and weren't happening during the snapshot for the older baseline.

 

We can quickly see that baselines, while important, are really only as good as the time at which you take them.  Comparing a period of increased activity to a baseline taken during a period of normal activity is not very useful at all.

 

So this week I ask you to simply tell me about how you tackle baselines.

  1. Do you take baselines at all?  How many?  How often?
  2. What counters/metrics do you collect?
  3. Do you baseline your applications during peak usage?  Low usage?  Month end?
  4. Do you rely solely on your monitoring solution for baselining?  Does it show you trending over time?
  5. Can your monitoring solution tell you, based on previous data, what is normal for this period of time in your environment?

 

You don’t have to stick to these questions – let's just have a conversation about baselining!

It seems like every organization is looking at what can be moved—or should be moved—to the cloud. However, the cloud is clearly not for everything; as with any technology there are benefits and tradeoffs. As such, it is important for all IT professionals to understand when and how the cloud is advantageous for their applications.

 

In this evaluation process and the migration planning for moving applications to the cloud, databases are usually the more difficult element to understand. Of course, data is the heart of every application, so knowing how databases can reliably work in the cloud is key. Here are a few ideas and recommendations to keep in mind when considering moving databases to the cloud:

 

1. It starts with performance. If I had a penny for every time I have heard, “the cloud is too slow for databases,” I might have enough for a double venti latte. Performance uncertainty is the key concern that stops professionals from moving databases to virtualized environments or the cloud. However, this concern is often unfounded as many applications have performance requirements that are easy to meet in a number of different cloud architectures. Cloud technology has evolved over the past three years to offer multiple deployment options for databases, some of them with very high performance capabilities.

 

2. Visibility can help. The easiest way to solve performance problems is to throw hardware at them, but that is obviously not a best practice and is not very cost effective. A database monitoring tool can help you understand the true database and resource requirements of your application. Things such as:

    • CPU, storage, memory, latency, and storage throughput (IOPS can be deceiving)
    • Planned storage growth and backup requirements
    • Resource fluctuation based on peak application usage or batch processes
    • Data connection dependencies—aside from application connectivity there may be other application data interchange requirements, backups or flow of incoming data

One of the advantages of the cloud is the ability to dynamically scale resources up and down. So, rather than being the source of performance uncertainty concerns, it can actually give you peace of mind that the right amount of resources can be allocated to your applications to ensure adequate performance. The key, however, is knowing what those requirements are. You can use Database Performance Analyzer (there is a 14 day free trial) to understand these requirements.

 

3. Take a test drive. One of the obvious benefits of the cloud is low cost and accessibility. Even if you don’t have a migration plan in the works yet, it is a good idea to play with cloud databases to become familiar, experiment and learn. In an hour of your time, you can get a database running in the cloud. Set one up, play with it and kill it. The cost is minimal. With a bit more time and a few more dollars, you can even move a copy of a production database to the cloud and test deployment options and learn how things specific to your application and database will work in the cloud.


4. Carefully plan your deployment model. The cloud offers multiple deployment options that should be considered. For example, Database as a Service (DBaaS) provides simplicity in deployment, automation and a managed service. Leveraging Infrastructure as a Service (IaaS) is an alternative for running database instances on cloud servers that provides more control and that looks and feels like a traditional on-premise deployment. There are also various storage options, including block storage, SSD drives, guaranteed IOPS, dedicated connections and database-optimized instances. As the cloud is mostly a shared environment, it is also important to understand and test for performance consistency and variability, not just peak theoretical performance.

 

5. Make the move. There is no single migration plan that covers all use cases. Rather than trying to use some formula for making the move to the cloud, I recommend talking to your cloud provider, explaining your environment and getting their guidance. It is also usually a good idea to create a duplicate environment in the cloud and verify it runs well before switching the production application. And in addition to your data recovery and backup requirements, it is also important to consider replication or standby servers in a different region than where your primary servers are located.

 

6. Monitor and optimize. Just like with on-premise deployments, it is important to monitor and optimize your cloud environment once it is up and running. Database optimization tools that offer wait time analysis and resource correlation can speed up database operations significantly, alert you to issues before they become big problems, increase application performance, and monitor resources to help with planning. Database administrators, developers and IT operations can all benefit from a performance analysis tool like SolarWinds DPA that helps them write good code and pinpoint the root cause of whatever is slowing down the database, whether that is queries, storage events, server resources, etc.

 

The cloud is evolving quickly. It is getting better, more reliable and more flexible all the time. Just like five years ago when most of us could not envision just how transformative the cloud would be today, we should expect the technology to continue evolving at the same pace over the next five years. This is one more reason to start experimenting with the cloud today. It is a journey that requires breaking some paradigms and shifting your mindset, but also a journey that can provide significant benefits for your applications and your job.

We caught an article this week over on Bank Info Security's website about The Future of PCI. The PCI Security Standards Council revealed some of their thinking about where PCI needs to go during a recent PCI Community Meeting in Orlando, Florida. Some of the highlights, as we see them:

  1. "We really need to have a risk-based dialogue versus a compliance-based approach" - sounds a little bit like we're all on the same page when it comes to "compliance ≠ security".  He also acknowledges the ongoing challenge that retailers are interested in more prescriptive guidance, but threats are continually evolving: "merchants and the payments industry have to be committed to long-range security planning" and not just focusing on the current big breach. This is tough for the rest of us, who are really heads down in the day to day job. We may need the PCI Council to help us move along the spectrum, otherwise we'll keep focusing on table stakes security with the limited resources (people, money, and time) that we have.
  2. "When it comes to ensuring ongoing PCI compliance, it's critical that organizations regularly track the effectiveness of the controls and technologies they put in place, Leach says." - the reality of audit-driven compliance is that it's a once-a-year kind of deal. It's hard to keep the focus on something year in and year out when there's no pressing need. Theoretically with #1 (compliance better aligned with good security practices) it becomes easier to be able to answer "are we compliant TODAY, not just on audit day?" We see continuous compliance/monitoring becoming a trend across industries and segments, so I'm not surprised to see PCI thinking the same way. They sum it up pretty well: "Ongoing PCI is a challenge. It's very, very complicated and has many situation-specific qualities to it. ... We have to work with these organizations and make them realize the risks and then help them find solutions that work."
  3. "The very old, very basic kind of security flaws still remain - weak passwords, insecure remote access, lack of security patches, things like that that in some cases have been almost deliberately set up to make it easy for that reseller or that POS support person to do the maintenance" - a lot of us really are still fighting common security stuff. The security industry is constantly focusing on detecting the next big threat with new products and services - but the reality is a lot of us still need help making sure that our bases are fully covered in constantly evolving environments where balancing security and convenience is still a huge challenge.

 

There's more over in the article and we'll keep our eyes peeled for more on how the PCI council may turn this into actual material changes.

 

We've talked a little on Thwack before about whether compliance = security (or some variation of that truth - check out the discussion here: Does Compliance Actually Make you More Secure?). Do you think this news will change anything? Are your IT, compliance, and security teams moving toward more ongoing compliance instead of just point in time, or is an audit still a scramble? Let us know what you think about all things PCI in the comments.

Because healthcare organizations are commonly a prime target for security breaches, they need to do their part in protecting the privacy of patient information and records. The federal government has “acted” on that notion by setting standards for protecting sensitive information and requiring companies that handle sensitive patient data to comply with those standards.

 

First, there was the Health Insurance Portability and Accountability Act (HIPAA) of 1996, which defined rules for securing processes such as saving, transmitting, and accessing patient information. Then there was the Health Information Technology for Economic and Clinical Health (HITECH) Act, enacted as part of the American Recovery and Reinvestment Act (ARRA) of 2009. This act was designed to strengthen the privacy and security protections established under HIPAA.

 

With the upswing in attacks on hospitals and other healthcare providers, IT security has become a high priority for these organizations. HIPAA has always been the defining baseline for securing information from nefarious entities that target the healthcare industry. But it wasn’t the final word on protecting medical and personal information. In January 2013, the Department of Health and Human Services (HHS) released the Omnibus Final Rule (Final Rule) to assist in interpreting and implementing various provisions of the HITECH Act and the Genetic Information Nondiscrimination Act of 2008 (GINA). A deadline for full compliance by September 23, 2013 was announced at the same time. While a number of organizations were allowed to delay updating their Business Associate Agreements (BAAs) to meet compliance guidelines, all organizations were required to comply with the Final Rule by that date.

 

The Omnibus Final Rule modified the HIPAA standards used by the healthcare industry to determine whether a breach transpired in relation to protected health information (PHI). This is an important amendment to the HIPAA Security Rule and the compliance deadline meant that organizations needed to make significant changes to their security processes in a short amount of time.

 

Key Policies of the Omnibus Final Rule include:

  • Healthcare organizations (including business associates and subcontractors) are directly liable for compliance and the costly penalties for all violations.
  • In the event of a breach, organizations must notify patients, HHS, and the media within 60 days of discovery. The exception is when the organization conducts a risk assessment and can demonstrate a low probability that the PHI has been compromised.
  • The focus of the risk assessment is not on harm to the patient but on whether information has been compromised.
  • Previous exceptions for breaches of limited data sets (data that does not contain birth dates or zip codes) are no longer allowed. Breaches of this kind of data must be treated like all other breaches of PHI.

 

The Omnibus Final Rule imposed many changes to HIPAA and HITECH, but some items remained, such as the safe harbor exception that lists 18 identifiers that must be removed from data before it can be shared with an outside party. The rule states that an unauthorized disclosure only rises to the level of a breach, and only triggers the notification requirements of the HITECH Act, if the PHI disclosed is unsecured. Many of the other breach definitions brought in by the interim rules also remained. These include access to PHI by a workforce member without further disclosure, inadvertent disclosure to an authorized person, and cases where there is a good-faith belief that disclosure is necessary to prevent or lessen a threat to the health or safety of the patient or others.

 

In response to the growing number and sophistication of attacks, many organizations are seeing the necessity of increasing their security posture. Some of the regulations require updates across the enterprise to ensure continued compliance with the Final Rule.

In my last post we discussed implementing voice in a new environment, so now I figured we would discuss troubleshooting that environment after its initial deployment. Only seems natural, right?

 

Now, by its nature, troubleshooting VoIP issues can be quite different than troubleshooting your typical TCP/UDP data applications, and more often than not we will have a separate set of tools for VoIP-related issues. (And hopefully some of those tools are integrated with our day-to-day network management tools.)


IP SLA/RPM Monitoring*:

This is definitely one of my favorite networking tools. IP SLA monitoring allows me to see what the network looks like from a different perspective (usually one closer to the end user). There are a few different IP SLA operations we can use to monitor the performance of a VoIP network; UDP Jitter is one operation in particular that gives us deeper insight into VoIP performance. Discovering variances in jitter could point to an incorrectly sized voice queue or possible WAN/transit-related issues. Have you ever considered implementing a DNS or DHCP IP SLA monitor?

*Keep in mind that IP SLA monitoring can also be used outside the VoIP infrastructure; other operations support TCP/HTTP/FTP and similar protocols, so you can get the user's perspective on other mission-critical applications.
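
To show what a jitter number actually measures, here is a small Python sketch of the interarrival-jitter estimator from RFC 3550 (the statistic RTP receivers report; IP SLA's UDP Jitter numbers are computed somewhat differently but capture the same idea). The timestamp samples are fabricated for illustration.

    # RFC 3550 interarrival jitter estimate from (send_time, receive_time) pairs, in ms.
    samples = [(0.0, 40.0), (20.0, 61.0), (40.0, 84.0), (60.0, 101.0), (80.0, 139.0)]

    jitter = 0.0
    prev_transit = None
    for sent, received in samples:
        transit = received - sent              # one-way transit time of this packet
        if prev_transit is not None:
            d = abs(transit - prev_transit)    # change in transit time between packets
            jitter += (d - jitter) / 16.0      # RFC 3550 smoothing: J += (|D| - J) / 16
        prev_transit = transit

    print(f"smoothed interarrival jitter: {jitter:.2f} ms")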


NetFlow/JFlow/IPFIX:

Another great tool to have in the arsenal. NetFlow is an easy way to see a breakdown of traffic from the interface perspective. It lets you verify that your signaling and RTP streams are being marked correctly, and that other applications and traffic flows are not being marked into the voice queue unintentionally. Different vendors run their own variations of NetFlow, but at the end of the day they all provide very similar information, and many of the newer versions allow more granular control over what information is collected and where it is collected from.


MOS Scores:

While this one is not an actual tool itself, keeping an eye on your MOS scores can quickly identify trouble spots (if the end users don't report them first, that is) by flagging poor-quality calls.

[Image: MOS.PNG]
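
If you are curious how a latency/jitter/loss measurement turns into a MOS figure, below is a Python sketch of a commonly cited simplification of the ITU-T E-model (R-factor mapped to MOS). It is only an approximation for illustration – vendors compute their scores their own way – and the sample inputs are invented.

    # Rough MOS estimate from latency, jitter and loss, via a simplified E-model.
    def estimate_mos(latency_ms, jitter_ms, loss_pct):
        effective_latency = latency_ms + 2 * jitter_ms + 10.0
        if effective_latency < 160:
            r = 93.2 - effective_latency / 40.0
        else:
            r = 93.2 - (effective_latency - 120.0) / 10.0
        r -= 2.5 * loss_pct                      # penalty per percent of packet loss
        r = max(0.0, min(r, 100.0))
        return 1 + 0.035 * r + 0.000007 * r * (r - 60) * (100 - r)

    print(round(estimate_mos(latency_ms=80, jitter_ms=10, loss_pct=0.5), 2))   # a healthy call
    print(round(estimate_mos(latency_ms=400, jitter_ms=60, loss_pct=5.0), 2))  # users will complain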

 

A good old analog phone:
Wait a second, this is a VoIP deployment, right? But if we do have a few analog lines for backup/AAR/E911 services, we may find ourselves in a situation where we need to troubleshoot one of those analog lines, possibly for static or basic functionality.

Polling more specific information:

 

Depending on what you are trying to troubleshoot, you can definitely get some great insight by polling more specific information using the UnDP (Universal Device Poller):

(Some of these will only be manageable for smaller locations/deployments)

  • Number of registered phones - displayed in graph format so you can easily spot drops in registered phones.
  • CME Version - Specific to Cisco routers running CME, but keeping track of CME versions can help isolate issues to a specific software set.
  • A few others I have created are shown in the sample VoIP dashboard below (a bare-bones SNMP polling sketch follows the screenshot).

[Image: Custom VoIP Pollers2.png]
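
For anyone wondering what a custom poller boils down to under the hood, the sketch below is a single SNMP GET in Python using the classic pysnmp 4.x high-level API. The target address, community string, and OID are placeholders (sysUpTime is used only as a stand-in); swap in the real OID your gear exposes for registered phones or CME version.

    # Minimal pysnmp GET for one OID (all target details are placeholders).
    from pysnmp.hlapi import (SnmpEngine, CommunityData, UdpTransportTarget,
                              ContextData, ObjectType, ObjectIdentity, getCmd)

    TARGET = "10.0.0.1"        # hypothetical voice gateway / call-control device
    OID = "1.3.6.1.2.1.1.3.0"  # sysUpTime.0 as a stand-in; use your own custom OID here

    error_indication, error_status, error_index, var_binds = next(
        getCmd(SnmpEngine(),
               CommunityData("public", mpModel=1),            # SNMPv2c community (placeholder)
               UdpTransportTarget((TARGET, 161), timeout=2, retries=1),
               ContextData(),
               ObjectType(ObjectIdentity(OID))))

    if error_indication or error_status:
        print("poll failed:", error_indication or error_status.prettyPrint())
    else:
        for name, value in var_binds:
            print(name.prettyPrint(), "=", value.prettyPrint())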

As administrators or infrastructure people, we all too often get stuck "keeping the lights on".  What I mean by this is that we have our tools and scripts in place to monitor all of our services and databases, we have notifications set up to alert us when they are down or experiencing trouble, and we have our troubleshooting methodologies and exercises that we go through to get everything back up and running.


The problem is, that's where our job usually ends.  We simply fix the issue and move on to the next one in order to keep the business up.  We rarely get the chance to research a better way of monitoring or a better way of doing things.  And when we do get that time, how do we get these projects financially backed by a budget?

 

Throughout my career there have been plenty of times when I have mentioned the need for better or faster storage, more memory, more compute, and different pieces of software to better support me in my role.  However, the fact of the matter is that without proof of how these upgrades or greenfield deployments will impact the business, or better yet, how the business will be impacted without them, there's a pretty good chance the answer will always be no.

 

So I'm constantly looking for that silver bullet, if you will – something I can take to my CTO/CFO to validate my budget requests.  The problem is, most performance monitoring applications spit out reports dealing with very technical metrics.  My CTO/CFO do not care about the average response time of a query.  They don't care about table locking and blocking numbers.  What they want to see is how what I'm asking for can either save them money or make them money.

 

So this is where I struggle and I’m asking you, the thwack community for your help on this one – leave a comment with your best tip or strategy on using performance data and metrics to get budgets and projects approved.  Below are a few questions to help you get started.

 

  • How do you present your case to a CTO/CFO?  Do you have some go-to metrics that you find they understand better than others?
  • Do you correlate performance data with other groups of financial data to show a bottom line impact of a performance issue or outage?
  • Do you map your performance data directly to SLAs that might be in place?  Does this help in selling your pitch?
  • Do you have any specific metrics or performance reports you use to show your business stakeholders the impact on customer satisfaction or brand reputation?

 

Thanks for reading – I look forward to hearing from you.

How fast are you at troubleshooting?


Quick. If Storage Array X goes into a Warning state, what applications are affected? Oh great, now applications are going critical. Here come the phone calls. Where do these critical applications live?! You need answers now! Time’s up. Sorry, you lose. Thanks for playing.


Ho, ho. What’s this?


The Environment view you say? What’s that? Please explain.


Sure. But first, let's imagine you work for a company called SolarBronx. Here at SolarBronx, software is being developed at break-neck speed. We employ many people who work in various departments. Where do you fit in? You're smart. You work in Engineering!

[Image: AppStack1 - Copy.png]

Now let’s imagine you get sick and end up in the hospital for two weeks. Who will that affect at SolarBronx? Let’s have a look:

[Image: AppStack2 - Copy.png]

As you can see, certain employees in various departments will be affected, one way or another. Some may need to pick up the slack in your absence. Let’s remove the clutter and just focus on who is affected.

[Image: AppStack3 - Copy.png]

And there it is. Look at all the people your illness affected. This is unacceptable. SolarBronx determines you, the problem that’s costing the company time and money, must be removed. Sorry buddy, you’re fired. Hit the bricks! Now SolarBronx is running smoothly once again without you mucking up the works by costing the company a fortune after only three days on the job.

 

The Environment View

The Environment View is Orion’s way of telling the same story about your IT environment, only without the twisted humor. Take a look below at Orion’s interpretation:

[Image: e1.png]

Here in this environment, Storage Array CX3-10c is in a Warning state. Clicking this Storage Array will show me everything that it’s related to and affecting in my environment, like so:

[Image: e2.png]

Objects not affected or related to the object selected are shown as faded. To get a cleaner view of the related objects, click Spotlight to remove all unrelated objects. Voila!

[Image: e3.png]

Pretty slick troubleshooting tool, wouldn't you agree? And it will be coming soon to a web console near you!


A recap of the week's notable tech news items, blog posts, white papers, videos, events, etc. - for the week ending Friday, Aug 29th, 2014.


News

 

OpenFlow Supports IPv6 Flows

Initially, OpenFlow did not have any definition for handling IPv6 communications. Now, newer OpenFlow versions have IPv6 capabilities and more vendors are deploying products that use the newer OpenFlow versions.

 

If packet sniffing won't work in SDN, then what?

Why Wireshark and other packet-sniffing tools aren’t ready for virtualized Layer 4-7 networks, but monitoring application server NICs could be the answer.

 

IPv6: An answer to network vulnerabilities?

On Aug. 15, 2012, in one of the most devastating cyberattacks ever recorded, politically motivated hackers wiped the memory of more than 30,000 computers operated by the Saudi Aramco oil company in an attempt to stop the flow of oil and gas to local and international markets.

 

Addressing 100G Network Visibility and Monitoring Challenges

Napatech's recent announcement of the availability of its 100G accelerator addresses industry concerns about managing high-traffic networks in an increasingly mobile world. When combined with the end-to-end Napatech Software Suite, the company significantly reduces the complexity of managing multiple accelerators with one "single pane of glass" view.

 

Blogger Exploits

 

SVoIP and VoSIP: What's the Difference?

VoIP's wireless connectivity capabilities mean that voice signals can be transmitted over networks without the expense of cabling, allowing businesses and residences to communicate at significant cost savings. The final consideration is whether VoSIP or SVoIP is more secure for transmitting voice communications.

 

Cisco revamps CCNP for IoT, other market transitions

There’s no company in technology that’s taken advantage of market transitions more often than Cisco. In almost every presentation that CEO John Chambers gives, he opines about how catching market transitions at the right time has been the primary vehicle for Cisco’s growth over the past few decades.

 

Webinars & Podcasts

 

PMACCT: THE TRAFFIC ANALYSIS TOOL WITH UNPRONOUNCEABLE NAME

SDN evangelists talking about centralized traffic engineering flow steering or bandwidth calendaring sometimes tend to gloss over the first rule of successful traffic engineering: Know Thy Traffic.

 

Food for Thought

 

10 Networking Blogs You Must Follow

So many network technology blogs, so little time. Get started with this list of blogs by networking experts and top Interop speakers.

 

How To Find A Lost Article In Google's Cache – Ethan Banks

I accounted for everything – Cat6 cabling, a fiber-ready router, 3-tier architecture, failover at Layer 3, segmentation with VLANs, and many more features that sounded great but that we probably never needed. I was proud it was not 'One Big Flat Network' (OBFN).

 

The first change request was raised on day 1, followed by at least one every week. With each request, I almost always found things I could have done differently during design or implementation. Thinking about it now, here is my list:

 

Too many cables:

Every network starts with a single cable. Then you keep adding more until you have no idea which cable connects to what. It happened to me. As the number of devices increased, so did the number of cables, and because I had not planned my cable schematics, my rack ended up looking almost like this:

 

[Image: cablemess-1-600x450.jpg]

 

If you have to trace cables every time something has to be changed or there is an issue, rework your cable management plan. Use different colors based on what the cables are for: clients to switches, access or trunk ports, router to other devices, etc. Group similar cables and don't forget labels. Pick up a few cable management tips from here.

 

How tall are you?

I thought the heavy core switch and the small firewall would never have to be moved. You get the idea.

 

Place your devices where they are reachable for maintenance – neither too high nor anywhere the 'OFF' switch can be kicked or the power cable yanked.

 

Backup from Day Zero

During implementation, a NAT and a few ACLs later, I realized that RDP was not connecting. It took me multiple reconfigurations and hours of reading to realize that my original configuration was fine and I had simply forgotten to account for the address translation while trying RDP. Not a big deal, until I realized that the delta between my current non-working config and my last working config was everything except 'hostname Router2951'.

 

Back up your configurations the minute you see ICMP replies coming in, whether northbound or southbound. Power failures don't wait for a network to be fully up before bringing it down.

 

127.0.0.1/?

Because every 'networking 101' course teaches you how to subnet, I added quite a few subnets to my network. Users, servers, wireless, management, or any department with at least 2 hosts had its own subnet. Believing that small is beautiful, I loved class C and /28 for 8 months, until I realized 30 hosts would fit better and settled on /26 after a year.

 

Plan for the next year or the year after. It is also fine to start with a /24 or even a /23 if you don't have too many hosts. Combine a couple of departments in a larger subnet rather than starting with smaller ones and recalculating subnets every six months. OBFN is not that bad.
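
Python's standard ipaddress module makes it easy to sanity-check this kind of sizing decision before you commit to it; the address space in the sketch below is made up.

    # Subnet sizing with the standard library (address space is illustrative).
    import ipaddress

    for prefix in (28, 27, 26, 25, 24, 23):
        net = ipaddress.ip_network(f"10.10.0.0/{prefix}")
        print(f"/{prefix}: {net.num_addresses - 2} usable hosts")

    # Carving one /24 into /26 department blocks:
    for subnet in ipaddress.ip_network("10.20.30.0/24").subnets(new_prefix=26):
        print(subnet)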

 

Complexity is not a Necessity

I added technologies I thought were great for the network. Take VLANs, for example. Most SMBs, especially those using VoIP, have VLANs even though they don't fill a /24. Why? Because we have been taught about VLANs and broadcast domains. VLANs are great for isolation and management but not always good for performance. In a small SMB network, VLANs can simply add complexity.

 

VLANs are one example; SANs are another, and there are more. The rule of thumb is: use a technology only if it actually solves a problem for you.

 

Quality matters

New admins dread QoS – either they don't use it or they overdo it, and I was the former. With no QoS to police the network, HTTP, FTP, and some random port-protocol combos teamed up to slow down RTP and RDP during peak hours.

 

Use QoS to provide priority, but only where required – it should not become the norm, causing all your other apps to fail while only VoIP gets through.

 

What hit me?

One day our firewall decided to drop packets because it had too many rules. Another time it was an end-user deciding that dancing pigs were better than the security warning. Either way, we ended up with downtime.

 

Whether you have 10 or 10,000 devices, and whether you use open-source, free, or paid tools, network monitoring should be part of your network design. That is how you get insight into what will go down and who will bring it down.

 

So after all the learning, there we are now! [Exaggerated visual]

[Image: neat-data-cabling-network.jpg]

 

We could agree or disagree. But Pimiento or Pure Capsaicin, Padawan or Master, what is in your list?

Here are some posts I’ve found that help with quantifying the cost of downtime.

 

Hourly cost for computer networks - $42,000: http://www.networkworld.com/careers/2004/0105man.html?page=1

Cost per hour of datacenter downtime for large organizations - $1.13M: http://www.stratus.com/~/media/Stratus/Files/Library/AnalystReports/Aberdeen-4-Steps-Budget-Downtime.pdf

Average cost of a datacenter outage - $505,502: http://emersonnetworkpower.com/en-US/Brands/Liebert/Documents/White%20Papers/data-center-costs_24659-R02-11.pdf

Large enterprises lose more than $1M each year due to IT failures: http://www.informationweek.com/storage/disaster-recovery/it-downtime-costs-265-billion-in-lost-re/229625441 

TechRepublic Article – How to calculate the cost of downtime: http://www.techrepublic.com/article/how-to-calculate-and-convey-the-true-cost-of-downtime/
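
The arithmetic behind figures like these is simple enough to sketch for your own environment. Here is a hedged Python example; every input is a placeholder you would replace with your own revenue, staffing, and outage numbers.

    # Back-of-the-envelope hourly downtime cost (every input is a placeholder).
    revenue_per_hour      = 25_000.0  # revenue that depends on the affected systems
    affected_employees    = 120       # people who cannot work during the outage
    loaded_cost_per_hour  = 45.0      # average fully loaded hourly cost per employee
    productivity_loss_pct = 60.0      # how much of their work actually stops
    recovery_labor_per_hr = 500.0     # IT staff and contractor time spent restoring service

    hourly_cost = (revenue_per_hour
                   + affected_employees * loaded_cost_per_hour * productivity_loss_pct / 100.0
                   + recovery_labor_per_hr)
    print(f"Estimated cost per hour of downtime: ${hourly_cost:,.0f}")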

 

Curious, what is your calculated hourly cost of downtime? What factors went into your calculation?

I was thinking a good place to start off a VoIP discussion was with the topic of deploying VoIP! As many of us know, voice has become a common staple in many IP networks and is typically designed into the network from day 1. However, there are definitely still networks out there that are not currently running VoIP and are looking to implement VoIP and mobility services. If you have had the opportunity to integrate voice into an existing IP network, you know there is some prep work that has to be done. After all, if implementing VoIP in a network were simple, we wouldn't get paid the big bucks, right?

 

Like many implementations and deployments, after you have done a couple they become easier and you learn how to avoid the more common pitfalls and setbacks. Many people have gone so far as to create a 'checklist' of sorts to make VoIP deployments easier and provide a simpler workflow to follow. Below are a few bullets from my own checklist:

 

Power - How do you plan on powering the VoIP equipment (handheld phones, conference phones, desktop phones, etc.)? Sure, you can use power bricks, or you can rely on PoE (Power over Ethernet) switches so you don't have to worry about the additional overhead of power bricks (ordering, inventory, cost, shipping, etc.). You may also have to consider the power budget of those switches, depending on what other PoE-capable devices (if any) are already receiving power. Remember, many wireless-N and -AC access points require more power to fully operate than the older 802.3af standard could provide, and perhaps you also have PoE-capable surveillance systems. If you are deploying wireless handheld phones, do you need larger multi-device charging docks, or should you account for spare batteries?
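
A quick way to answer the PoE question is to add up worst-case per-device draw against the switch's PoE budget. The Python sketch below does exactly that; the wattages and the 370 W budget are assumptions, so substitute your vendor's real numbers.

    # Rough PoE budget check (device wattages and switch budget are assumptions).
    SWITCH_POE_BUDGET_W = 370.0  # hypothetical 48-port PoE+ switch budget

    devices = {
        "desk phone (~7 W)":           {"watts": 7.0,  "count": 36},
        "conference phone (~13 W)":    {"watts": 13.0, "count": 2},
        "11ac access point (~25.5 W)": {"watts": 25.5, "count": 4},
        "IP camera (~13 W)":           {"watts": 12.9, "count": 3},
    }

    total = sum(d["watts"] * d["count"] for d in devices.values())
    print(f"Worst-case draw: {total:.1f} W of a {SWITCH_POE_BUDGET_W:.0f} W budget")
    print("OK" if total <= SWITCH_POE_BUDGET_W
          else "Over budget -- plan a bigger power supply or spread the load")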

 

Guides for the end users - This may sound trivial, but we can't forget about the end users. Making sure users know how to access their voicemail and use their IP phones the way they need to means fewer questions for us, which makes everyone's life easier.

 

QoS (Quality of Service) – This is definitely a very important aspect of every VoIP network. Without QoS to protect the signaling and RTP audio streams from the rest of the IP data traffic, the VoIP system could be rendered completely unusable. Where should you create your trust boundary, and how should you allocate your queues?

 

IP Addressing/VLAN Assignment - Depending on the size of your deployment, you might have to take extra care with your address assignments and DHCP scope creation (do you need to create any special DHCP options?). If your routing schema is summarizable and hierarchical toward the core of your network, how can you maintain that standardization?

 

Authentication/Security - How are the IP phones going to authenticate to the network? Will you need to deploy certificates or require the phones to participate in 802.1X authentication? Do you need to consider security compliance with any governing bodies (PCI, HIPAA, CJIS, SOX, etc.)?

 

Site Surveys - This one is aimed more toward VoWLAN (Voice over Wireless LAN) deployments. The RF spectrum is the layer-1 medium for wireless networks, so performing a site survey and knowing the RF environment of your WLAN gives you an amazing advantage from a support and maintenance perspective. If you are aware of the extent of your coverage and what interferers exist in your environment, you can either eliminate those obstacles or work around them, heading off issues before they even arise.

 

 

Those are just a few topics. What items are on your VoIP deployment/readiness checklist, and why do you consider them important aspects of a VoIP deployment?

IT infrastructure is, at its core, a collection of hardware and software assets – of course, there are tons of configurations, connections, and customizations that go into making IT work for the business. But in essence, all the elements of the IT infrastructure are assets: all the network devices and storage hardware in the server room, all the computers employees use on your network, and all the communication equipment including printers, telephones, and projectors – almost anything and everything IT. Don't leave out software, because that's an IT asset too, and it goes through most of the same phases of the asset lifecycle as hardware.

[Image: ITAM 2.png]

 

As IT admins and help desk managers manage IT assets through procurement, assignment, problem resolution, change, and disposal, they face many challenges in implementing a smooth and seamless asset management process. Let's list some of the top challenges IT pros face with both hardware and software asset management.

#1 LACK OF ORDER = MANAGEMENT CHAOS

The biggest challenge small organizations face is that they don't have a good asset inventory. Without an inventory, you are missing vital information about what exists on your network and where, who owns what assets, etc. You could try spinning this up in an Excel sheet or with home-grown tools, but without a comprehensive asset management practice, managing asset information becomes extremely challenging and leads to management disarray.

 

#2 EVER-CHANGING NATURE OF CHANGE

Any piece of IT equipment, whether hardware or licensed software, has an expiry date. Either hardware goes kaput from usage over time, or there's a better, upgraded version of the software or hardware that must replace the incumbent. When asset management lacks organization, tracking and keeping abreast of changes in equipment, changes in asset assignment, and repairs or changes made to assets becomes almost impossible.

 

#3 PROCUREMENT PREDICAMENT AND BUDGETARY BLINDNESS

When you do not know what exists in your IT infrastructure, it becomes harder to determine what to buy and how much to buy, because you don't know what's in stock. IT teams end up spending money on new purchases without knowing the real need for the expenditure. Only when you know what you have, who has what, what's in use, and how old or expired existing assets are, can you make informed decisions about procuring new equipment and software.

 

#4 ALL THAT LOST TIME FOR WHAT – DISCOVERING ASSETS?

Where IT admins tend to lose valuable time each day is trying to manually discover and take stock of assets. As your network and user base grow, it becomes physically impossible to inventory computer equipment and software manually on a one-by-one basis. What you need is an automated asset discovery process that discovers computer assets for you, tells you what exists and where, and keeps the inventory up to date with periodic discovery updates.
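
As a bare-bones illustration of what "automated discovery" means (nothing close to a real asset management product), the Python sketch below sweeps a small, made-up subnet and records which addresses answer a ping. It assumes a Unix-like host where ping accepts -c and -W.

    # Naive discovery sweep (illustrative subnet; assumes Unix-style ping flags).
    import ipaddress
    import subprocess
    from concurrent.futures import ThreadPoolExecutor

    SUBNET = "192.0.2.0/28"  # documentation range used as a stand-in

    def is_alive(ip):
        result = subprocess.run(["ping", "-c", "1", "-W", "1", str(ip)],
                                stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL)
        return result.returncode == 0

    hosts = list(ipaddress.ip_network(SUBNET).hosts())
    with ThreadPoolExecutor(max_workers=16) as pool:
        inventory = [str(ip) for ip, alive in zip(hosts, pool.map(is_alive, hosts)) if alive]

    print(f"{len(inventory)} responding hosts:", inventory)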

 

#5 THE HELP DESK & IT SERVICE MANAGEMENT CHALLENGE

All the aforementioned points impact the IT service and support process. Most trouble tickets are related to IT assets (either hardware or software). Without sufficient information about an asset's history, configuration, previous IT issues, etc., you will have to spend time figuring this out each and every time. This slows down service request resolution and makes the help desk management process more challenging. IT asset management software helps you save time managing assets and lets you focus on actual IT administration and support.

 

One of the best use cases for IT asset management is a help desk or infrastructure monitoring tool that allows you to discover assets, maintain inventory, report on changes, and alert on license/warranty expiry.

 

If you face any IT asset management challenges, or have any tips to overcome them, please do share your views.

 

Read this free white paper to learn more about the benefits of and best practices for IT asset management.
