
Geek Speak



 

Like many of you, I suspect, I am firmly under the thumb of my corporate security group. When they want to know what’s going on on the network, I’ll put in as many taps as they want; but sometimes they want more than that, especially where NAT is involved, IPs and ports change from one side of a device to the other, and it’s otherwise difficult to track sessions.

 

 


High Speed Logging

 

Have you heard of High Speed Logging (HSL)? I’d like to say that it’s a single standard that everybody plays by, but it seems like more of a concept that people choose to interpret in their own way. HSL is the idea of logging every single session, and in particular every single NAT translation.

 

So who offers HSL? Well, in theory, if you can log session initiation to syslog, you can do High Speed Logging, but some vendors explicitly support it as a named feature.

 

 

 


Potential Road Blocks

 

High Speed Logging sounds simple enough, but there are a few things to consider before configuring it everywhere.

 

 

 

  1. What is the impact on your device when you enable High Speed Logging?
  2. Is it possible that the volume of logging messages generated might exceed the capacity of your device’s management port? One solution to this is to use an in-band destination for HSL instead.
  3. With multiple devices logging at high speed, can your management network cope with the aggregate throughput?
  4. Once you have the logs, how long do you have to keep those logs? Do you have the necessary storage available to you?
  5. For those logs to be useful, any system doing analysis, filtering, or searching needs to be fast enough to cope with the volume of data it has to look through.
  6. Will a DDoS attack also DoS your logging infrastructure?
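
To put items 2 through 4 into perspective, here’s a rough back-of-envelope calculation you can adapt. The session rate, message size, and retention period below are made-up assumptions, not vendor figures; plug in your own numbers.

```python
# Rough back-of-envelope estimate of HSL syslog volume.
# All inputs below are illustrative assumptions, not vendor figures.

sessions_per_second = 50_000      # new sessions/sec across all devices (assumed)
messages_per_session = 2          # e.g., one message at session start, one at end
avg_message_bytes = 300           # typical syslog line including headers (assumed)
retention_days = 90               # how long policy says the logs must be kept

bytes_per_second = sessions_per_second * messages_per_session * avg_message_bytes
mbps_on_the_wire = bytes_per_second * 8 / 1_000_000
storage_tb = bytes_per_second * 86_400 * retention_days / 1_000_000_000_000

print(f"Logging throughput: ~{mbps_on_the_wire:,.0f} Mbps sustained")
print(f"Raw storage for {retention_days} days: ~{storage_tb:,.0f} TB (before compression)")
```

Even these modest assumptions add up to hundreds of megabits per second of syslog and hundreds of terabytes of retained data, which is exactly why the questions above matter.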

 

 

How Powerful Is Your Engine?

 

The question of whether a device is capable of supporting HSL should not be underestimated. At one company I worked for, the security team logged every session start and end on the firewalls. To accommodate this, the firewall vendor had to provide a special logging-optimized code build, presumably at the cost of some throughput.

In another case I’ve seen a different firewall vendor’s control CPU knocked sideways just by asking it to count hits on every rule, which one would imagine should be way less intensive than logging every session.

 

 

Navigation

 

Assuming you manage to get all those logs from device to log server, what are you going to do with them? What’s your interface for searching this rapidly growing pile of log statements, undoubtedly in different formats, in a timely fashion? Are you analyzing in real time and looking for threats? Do you pre-process all the logs into a common, searchable format, or do you only process them when a search is initiated?

 

 

Bicycles Are Nice Too

 

The broader question, perhaps, is whether High Speed Logging is actually the right solution to the problem of network visibility. Should we be doing something else using taps or SPAN ports instead? Having worked in an environment that logged every session on the firewalls, I know it is tremendously useful when troubleshooting connectivity issues, but that doesn't make it the best or only solution.

 

 

What do you think? Have you tried High Speed Logging and succeeded? Or failed? Do you think your current log management products (or perhaps even just the hardware you run them on) are capable of coping with an onslaught of logs like that? What alternatives do you use to get visibility? Thanks in advance for sharing your thoughts!

In our second visit back to our 2015 pro-dictions, let’s take a look at the evolution of the IT skill set. It appears Patrick Hubbard was spot-on in his prediction about IT pros needing to broaden their knowledge bases to stay relevant. IT generalists who stick with what they know best are being left behind, while IT versatilists and specialists are paving the way to the future.



Earlier this year, kong.yang wrote a blog addressing this very topic, which he dubbed the Stack Wars. There, he lightly touched on generalists, versatilists, and specialists. Read the following article to get a deeper look at each of the avenues IT pros can pursue: “Why Today’s Federal IT Managers May Need to Warm Up to New Career Paths.”

 

Fans of spy fiction (of which there are many in the government ranks) might be familiar with the term “come in from the cold.” It refers to someone who has been cast as an outsider and now wishes to abandon the past, be embraced, and become relevant again.

 

It’s a term that reflects the situation that many federal IT managers find themselves in as government agencies begin focusing on virtualization, automation and orchestration, cloud computing, and IT-as-a-Service. Those who were once comfortable in their position as jacks-of-all-IT-trades are being forced to come in from the cold by choosing new career paths to remain relevant.

 

Today, there’s very little room for “IT generalists.” A generalist is a manager who possesses limited knowledge across many domains. They may know how to tackle basic network and server issues, but may not understand how to design and deploy virtualization, cloud, or similar solutions that are becoming increasingly important for federal agencies.

 

And yet, there’s hope for IT generalists to grow their careers and become relevant again. That hope lies in choosing between two different career paths: that of the “IT versatilist” or “IT specialist.”

 

The IT Versatilist

An IT versatilist is someone who is fluent in multiple IT domains. Versatilists have broadened their knowledge base to include a deep understanding of several of today’s most buzzed-about technologies. Versatilists can provide their agencies with the expertise needed to architect and deliver a virtualized network, cloud-based services, and more. They also have the opportunity to have a larger voice in their organization’s overall strategic IT direction, simply by being able to map out a future course based on their familiarity with deploying innovative and flexible solutions.

 

The IT Specialist

Like versatilists, IT specialists have become increasingly valuable to agencies looking for expertise in cutting edge technologies. However, specialists focus on a single IT discipline. This discipline is usually directly tied to a specific application. Still, specialists have become highly sought-after in their own right. A person who’s fluent in an extremely important area, like network security, will find themselves in-demand by agencies starved for security experts.

 

Where does that leave the IT generalist?

Put simply – on the endangered list.

Consider that the Department of Defense (DoD) is making a major push toward greater network automation. Yes, this helps take some items off the plates of IT administrators – but it also minimizes the DoD’s reliance on human intervention in its technologies. While the DoD is not necessarily actively trying to replace the people who manage its networks and IT infrastructure, it stands to reason that those who have traditionally been “keeping the lights on” might be considered replaceable commodities in this type of environment.

 

If you’re an IT generalist, you’ll want to expand your horizons to ensure that you have a deep knowledge and expertise of IT constructs in at least one relevant area. Relevant is defined as a discipline that is considered critically important today. Most likely, those disciplines will center on things like containers, virtualization, data analytics, OpenStack, and other new technologies.

 

Whatever the means, generalists must become familiar with the technologies and methodologies that are driving federal IT forward. If they don’t, they risk getting stuck out in the cold – permanently.

 

**Note: This article was originally published in Technically Speaking.**

 

There’s no doubt that over the past couple of years the cloud has gone from a curiosity to a core component of many companies’ IT organizations. Today, Amazon AWS and Microsoft Azure are well-known commodities, and the cloud-based Office 365 has proven popular for businesses as well as consumers. It’s even common for many business applications to be cloud based. For instance, Salesforce.com is a popular Software-as-a-Service (SaaS) application, and many organizations have moved to cloud-based email. However, one notable holdout has been database applications. While there certainly are cloud-based database options, businesses have been more than a little reticent to jump aboard the cloud for their databases.

 

Why the reluctance to move to the cloud?

 

The fact of the matter is that for most organizations, relational databases are the core of the IT infrastructure. Business-critical applications are built on top of those databases, and availability is paramount. While businesses can tolerate some downtime in their email, connectivity problems with their relational databases are unacceptable. And while there’s no doubt that internet and cloud connectivity is better than at any point in the past, this past year’s well-publicized outages have shown that it’s far from perfect.

 

And then, of course, there are the control and security issues. Many organizations are just uncomfortable moving their data off premises. While the security of most cloud providers exceeds that of the average IT organization, putting your critical data into someone else’s hands requires a level of trust that many organizations are not willing to extend. When your data is on-premises and backed up, you know you can restore it; the control remains within your own organization. That’s not the case if your data is in the cloud. There you need to depend on the cloud provider.

 

Another key issue for many international organizations is data sovereignty. In many countries, such as Canada, businesses are required by law to keep their data within their country’s borders. In the past this has been a hurdle for many cloud providers, as cloud servers could be located anywhere and are not typically aligned with national boundaries. This is beginning to change as some cloud providers start to support national data boundaries.

 

Where Do Cloud Databases Fit Best?

 

So does this all mean that databases will never make it to the cloud? The answer is clearly no. While established medium-sized and enterprise businesses may have reservations about moving to the cloud, the cloud can make a lot of sense for smaller businesses and startups. Using the cloud can result in considerable capital savings for new businesses. SMBs that may be faced with considerable hardware upgrade costs could also find the cloud to be a compelling alternative. Just as important is the fact that cloud databases move many of the maintenance tasks, like patching and upgrades, into the hands of the cloud vendor, freeing the business from them.

 

The move to cloud databases is far from inevitable, but cost and labor savings make it a compelling option for new businesses and SMBs. In the next post I’ll look at what happens if you do make that jump to the cloud.

 

There are many areas in which existing monitoring and orchestration fall short. The discussion in response to my previous post raised some interesting and appropriate concerns.

 

The added complexity that the hybrid cloud brings to monitoring means that hypervisor, hardware, and connectivity choices make reliance on a specific platform a quandary requiring careful consideration before undertaking such a purchase. If you accommodate one of these things, but not all of them, you’ll find yourself with a tool that simply doesn’t do what you need it to. Let’s say you’ve chosen a tool that relies on a specific hardware platform, but then choose a third-party provider that uses different switching gear or different host servers. What would your solution be in this case? The answer is that your solution won’t solve anything, and it becomes shelfware. So what becomes necessary is a full evaluation of your potential solution, along with an eye toward the future growth of your environment.

 

Can your management layer provide insight into the hybrid data center? If so, does your solution rely on standard MIBs, via SNMP? Will it give accurate predictive monitoring of server/switch/storage through this tunnel? How does it handle the migration of workloads or does it at all? What if you’re trying to monitor variable hypervisor stacks? Will you be able to interact from your internal hypervisor to an external OpenStack environment? What about the storage on the other end? Will you gain insight to your own storage? How about that on the other end?   

 

As I’ve stated, any monitoring solution that relies on a specific hardware platform rather than APIs or SNMP, or on literally anything proprietary, will only work if your systems comply with those standards. You’ll need to be willing to accept lock-in in order for these systems to work for you. This is a key consideration in any purchasing decision. Of course, today’s architecture may fit into a proprietary rule set, but what about the future? Does management want to invest in such a massive undertaking (not only the investment in software, but systems and manpower) with short-sightedness? Remember, this is a capex as well as an opex investment. Often, the operational expenses can far outstrip the capital investment.

 

In my opinion, the future of the cloud world is still uncertain. OpenStack has changed the game, yet its future, too, is uncertain. The fragmentation in this nascent technology leaves me wondering where its future may lie. And again, I’m still not sold on the cost model. This particular business needs to solidify and adopt deeper standards, while allowing full-stack OpenStack models to be created with a support model that doesn’t leave the customer in the lurch.

 

I wonder what this means for the future of these products. Should an organization invest in a single stack? I wonder…


DARK SIDE OF THE ENCRYPTION

Posted by mfmahler Jul 20, 2015

“Encryption makes it more difficult for governments and other third parties to monitor your traffic. It also makes it harder for Internet Service Providers (ISPs) to censor access to specific Wikipedia articles and other information”. - Wikimedia blog, June 12, 2015

 

“and that by the end of 2016 65-70% of traffic will be encrypted in most markets”. - Sandvine Global Internet Phenomena Spotlight

 

 

I recently met a bright young man who works for Google in Europe.

Me: It’s nice to meet you!

Him: It’s nice to meet you, too!

Me: I have to say that Google’s “HTTPS everywhere” makes my life harder as a network security professional.

Him: Well… This is the way to make things more secure.

Me: I agree, especially from the user’s perspective. But my job is still harder…

 

 

GOOGLE I/O 2014

Pierre Far and Ilya Grigorik gave an awesome Google I/O session to developers, titled HTTPS everywhere, in June 2014. They evangelized that ALL web communications should be secure always and by default, in order to protect users’ privacy. To prevent Man-In-The-Middle (MITM) attacks, all web communications should be secured by HTTPS, which is HTTP over TLS. Pierre and Ilya stated that HTTPS not only would provide encryption of the client-server communications, but also authentication and data integrity. They later demonstrated the best practices of setting up a secure web site and its indexing signals for Googlebot.

 

 

EFF’s HTTPS EVERYWHERE

Google’s increasing use of HTTPS inspired the Electronic Frontier Foundation (EFF) to introduce HTTPS Everywhere (with an uppercase E in Everywhere), with version 1.0 released in 2011. HTTPS Everywhere is an open source web browser extension for Firefox, Chrome, and Opera. If the target website supports HTTPS, the browser extension automatically makes the web browsing connection over HTTPS. As far as I understand, IE users can install the non-EFF Zscaler Tools HTTPS Everywhere; Safari users need to type https:// manually.

 

 

65-70% INTERNET TRAFFIC ENCRYPTED BY 2016

Canadian network policy control company Sandvine conducted a study with a North American fixed access network service provider in April 2015 to understand the adoption of encryption in internet traffic. The study found that 29% of that provider's downstream traffic was encrypted. The majority of the encrypted traffic came from YouTube, with BitTorrent following.

 

For the unencrypted traffic, Netflix contributed a 35% share (surprise, surprise, surprise not). This was an interesting finding because in April 2015, Netflix announced in its quarterly earnings letter that it would move to HTTPS to stream movies in the next year, in addition to the existing encrypted log-in and other sensitive data pages. With Netflix's transition to secure content delivery, Sandvine predicted that almost two-thirds of that North American ISP's traffic would be encrypted.

 

More and more websites are moving to HTTPS. For example, the Wikimedia Foundation announced in a blog post in June 2015 that it was in the process of encrypting all Wikimedia content with HTTPS and that it would use HTTP Strict Transport Security (HSTS) to protect against MITM attacks.
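
If you’re curious whether a given site publishes an HSTS policy, a quick look at the response header is enough. This is just a minimal sketch using the third-party requests library; the URL is only an example.

```python
# Quick check of whether a site publishes an HSTS policy.
# Uses the third-party "requests" package; the URL is only an example.
import requests

def check_hsts(url: str) -> None:
    response = requests.get(url, timeout=10)
    hsts = response.headers.get("Strict-Transport-Security")
    if hsts:
        print(f"{url} sends HSTS: {hsts}")
    else:
        print(f"{url} does not send an HSTS header")

check_hsts("https://www.wikipedia.org")
```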

 

 

CHALLENGES OF MONITORING ENCRYPTED TRAFFIC

My team has recently been working on a project to migrate our perimeter firewalls to Next Generation Firewalls (NGFW). Before putting them inline, we set them up in monitor mode. What did we observe? Over 95% of our DMZ inbound traffic was encrypted. That's no surprise, because our company's website enforces HTTPS connections. About 60% of our outbound web traffic was encrypted. And of course, in monitor mode alone, our NGFW found zero threats in the encrypted traffic.

 

How do you monitor the activities in the encrypted traffic? You may say you can implement SSL Interception. SSL Interception is a beautiful term that we in information security use for what we do, but in the end, it's basically a MITM attack (OK, one wearing a white hat).

 

Even though we have the blessing of the executives to implement SSL interception for DLP, IPS, IDS, etc., we certainly cannot provide 100% customer satisfaction to our employees. NGFW and web proxy vendors provide a list of applications affected when SSL interception is implemented. The list includes Microsoft Update, the iTunes Store, GoToMeeting, and Dropbox. Besides the high cost (money and manpower) of implementing SSL interception for visibility and control, I wonder how many companies are blind to the encrypted traffic on their network.

 

Lastly, I would like to point out that Jacob Thompson of Independent Security Evaluators proposed a method to defeat SSL interception. He demoed it at DerbyCon 4 in 2014. My point is that the half-million to million-dollar NGFW/IPS may not be able to give you the 100% visibility you expect.

 

Do you encounter challenges detecting threats as encrypted traffic on your infrastructure increases? Do you have any success or failure stories to share? I would like to hear from you.

IaaS, SaaS, PaaS... It seems like everything can be consumed as a service these days. Databases are no different, thanks to the cloud providers out there, but my question to you... the IT admin / DBA / programmer ... is, are you ready to use them? After all, most things out there have a pretty similar adoption cycle, but databases seem to be a bit of a touchy subject. Throughout my years working at technical consulting companies, the one thing that always seemed static is that most IT departments don't get too crazy with their databases, meaning not many were ready to be trendsetters in how they manage or operate databases. Why? Well, most of the time I found that not many of them considered themselves "database people". Now, with that said, some customers who had full-time DBAs were much more liberal about upgrading things and pushing the limits ... but most mid-market companies didn't have a full-time DBA.


So getting back to my question. Are cloud databases ready for primetime?


My guess is that most mid-market companies would shy away from answering this question or even the thought of putting their databases on any cloud as a service... but I could be wrong.


Think about it this way... people have had email hosted by their ISP, or Hotmail, or Yahoo, etc. for decades... yet hosted enterprise email has really only taken off in the last few years, even though people know how it works and know it can be trusted in terms of reliability. So I think to answer my question I should first ask: "What do you know about cloud hosted databases?" and "If you knew more about them, would you be ready to sign up?"


To help you answer these questions I should probably help explain why DBaaS is attractive. Like anything (XaaS) the best part of consuming these services is that you don't need to worry about the hardware or the platform software. So in a sense, DBaaS is perfect for the exact market that would shy away from it, because all of the things that people don't know about databases are the things that are taken care of for you when using DBaaS. All you need to do as the consumer is connect your app to the service and start consuming.
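
To illustrate how little is left on the consumer’s plate, here’s a minimal sketch of connecting an application to a hosted PostgreSQL instance using the psycopg2 driver. The endpoint, database name, and credentials are placeholders, not a real service.

```python
# Minimal sketch: connecting an application to a hosted (DBaaS) PostgreSQL instance.
# The endpoint, database, and credentials below are placeholders, not a real service.
import psycopg2

conn = psycopg2.connect(
    host="mydb.example-dbaas-provider.com",  # endpoint the provider hands you
    port=5432,
    dbname="appdb",
    user="appuser",
    password="example-password",
    sslmode="require",                       # most providers require TLS connections
)

with conn, conn.cursor() as cur:
    cur.execute("SELECT version();")
    print(cur.fetchone()[0])

conn.close()
```

That really is the whole story from the application's side: point your connection string at the provider's endpoint, and the hardware, patching, and platform care-and-feeding are someone else's problem.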


So with that said I'm thinking we should review some of the more mainstream DBaaS offerings as well as some that you might not know and along the way do some how to get started posts.


On the list of DBaaS offerings to review I have: Azure SQL, Amazon RDS (which really includes MySQL, Oracle, MS SQL, and Postgres), Google Cloud SQL, HP Helion Relational DB (MySQL), and OpenStack Trove.

Also, just as a side note, I'm not trying to endorse one service over another, or even over services that I haven't listed. These are just the ones I know about; I am, however, open to suggestions.


So stay tuned, hopefully we will all learn a little more about DBaaS after checking these out.


If you recall, to kick off the New Year, the Head Geeks made predictions forecasting how they thought the year in IT would unfold. Now that we’re past the midpoint of 2015, I thought it would be fun to revisit some of their predictions over the coming weeks.


So, to kick things off, let’s start with the following prediction:


It’s safe to say that this prediction from adatole holds true. Security issues can be devastating for a company, so let’s take a look at this related article: “Preventing a minor, insider accident from becoming a security catastrophe.”

 

There are accidents – and then there are accidents.


A dog eating a kid’s homework is an accident. Knocking over a glass of water is an accident. A fender-bender at a stop sign is an accident.

The incorrect use of personal devices or the inadvertent corruption of mission-critical data by a government employee can turn out to be more than simple accidents, however. These activities can escalate into threats that can result in national security concerns.


These types of accidents happen more frequently than one might expect, and they’ve got DOD IT professionals worried, because for all of the media’s and the government’s focus on external threats (hackers, terrorists, foreign governments, etc.), the biggest concern continues to be threats from within.


As a recent survey by my company, SolarWinds, points out, administrators are especially cognizant of the potential for fellow colleagues to make havoc-inducing mistakes. Yes, it’s true: DOD technology professionals are just as concerned about the person next to them making a mistake as they are about an external Anonymous-style group or a rogue hacker.

So, what are agencies doing to tackle internal mistakes? Primarily, they’re bolstering federal security policies with their own security policies for end-users.

While this is a good initial approach, it’s not nearly enough.


IT professionals need more than just intuition and intellect to address compromises resulting from internal accidents. Any monitoring of potential security issues should include the use of technology that allows IT administrators to pinpoint threats as they arise, so they may be addressed immediately and without damage.


Thankfully, there are a variety of best practices and tools that address these concerns and nicely complement the policies and training already in place, including:

  • Monitoring connections and devices on the network and maintaining logs of user activity to track: where on the network certain activity took place, when it occurred, what assets were on the network, and who was logged into those assets.
  • Identifying what is or was on the network by monitoring network performance for anomalies, tracking devices, offering network configuration and change management, managing IT assets, and monitoring IP addresses.
  • Implementing tools identified as critical to preventing accidental insider threats, such as those for identity and access management, internal threat detection and intelligence, intrusion detection and prevention, SIEM or log management, and Network Admission Control.

Our survey respondents called out each of these tools as useful in preventing insider threats. Together and separately, they can assist in isolating and targeting network anomalies. Log and event management tools, for example, can monitor the network, detect any unauthorized (or, in this case, accidental) activity, and generate instant analyses and reports. They can help IT professionals correlate a problem — say, a network outage — directly to a particular user. That user may or may not have inadvertently created an issue, but it doesn’t matter. The software, combined with the policies and training, can help administrators attack it before it goes from simple mistake to “Houston, we have a problem.”
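
As a toy illustration of the kind of correlation these tools automate, the sketch below scans a list of change events for anything a user did in the half hour before an outage. The event records are invented sample data.

```python
# Toy illustration of correlating an outage to recent user activity.
# The event records below are invented sample data.
from datetime import datetime, timedelta

events = [
    {"time": datetime(2015, 7, 20, 9, 55), "user": "jsmith", "action": "config change on core-rtr-01"},
    {"time": datetime(2015, 7, 20, 10, 2), "user": "bjones", "action": "login to dns-srv-02"},
    {"time": datetime(2015, 7, 20, 8, 15), "user": "jsmith", "action": "password reset"},
]

outage_start = datetime(2015, 7, 20, 10, 5)
window = timedelta(minutes=30)

# Anyone active in the 30 minutes before the outage is worth a closer look.
suspects = [e for e in events if outage_start - window <= e["time"] <= outage_start]
for e in sorted(suspects, key=lambda e: e["time"]):
    print(f"{e['time']:%H:%M} {e['user']}: {e['action']}")
```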

The fact is, data that’s accidentally lost can easily become data that’s intentionally stolen. As such, you can’t afford to ignore accidental threats, because even the smallest error can turn into a very large problem.


**Note: This article was originally published by Defense Systems.**

Every time an organization decides to adopt a new technology or expand its business operations, the IT network is where the process begins: changes and additions to the existing network to accommodate new technologies and users. But making changes to the existing IT infrastructure is one thing most network admins think twice about. Even a small configuration error can lead to network downtime, a security breach, or data loss, which in turn may even cause the organization to vanish from the business map.

 

GNS3 is the solution to a network admin’s reluctance to experiment with the production network. With GNS3, a network admin can emulate their actual network to try out configuration changes they otherwise would have had to perform on the production network. The benefits don’t end there: an admin can design, build, and test complex networks using GNS3 before even spending capex to procure actual physical hardware.

 

Now, networks don’t run forever once configured. A network monitoring solution is critical to maintain network uptime and ensure business continuity. And because monitoring is such a critical component, the best possible tool has to be chosen for the task. If you are a network admin who does not like to run trial software in the production network, you should check out GNS3 and their community, the Jungle. And to work with it, there is now enterprise-class monitoring from SolarWinds. Network monitoring products from SolarWinds, including SolarWinds Network Performance Monitor, can be installed on any virtual network created using GNS3 and used to monitor it. Welcome to virtual reality!

 

GNS3 is a strategic SolarWinds partner and to help you get to know them better, we bring you the newly created GNS3 group in Thwack! Log into thwack, register for ThwackCamp and join the GNS3 group so you’re in the know!

“Oh, the farmer and the cowman should be friends” – Rodgers & Hammerstein, “Oklahoma!”

 

The modern office environment has its fair share of rivalries, competitions, and friction. In some companies, interdepartmental politics abound,  project teams put “frenemies” in direct contact with each other, or a heated exchange can impact an entire career. This affects IT Professionals as much as any other career area (some would say more).

 

There’s one IT rivalry I have seen consistently in almost every organization, and that’s between the team responsible for security (InfoSec) and the team in charge of network monitoring and operations (NetOps). In many companies, the dynamic is almost co-predatory, with each side attempting to completely obliterate the efficacy and credibility of the other.

 

The classic characterization is that

1) the Monitoring team wants/needs complete access to all systems in order to pull reliable and accurate metrics, while

2) the InfoSec team wants to lock everyone out of all systems in the name of keeping things “secure.”

 

But it’s patently not true. At ThwackCamp 2015, security industry powerhouse Charisse Castagnoli (c1ph3r_qu33n here on Thwack) and I sat down for a frank talk about the pressures of our respective roles, and then brainstormed ways to get InfoSec and NetOps/Monitoring working together rather than in opposition.

 

One of the things we hit on was the good old lunch-and-learn. A lot of the friction between security and monitoring comes from a good old communication disconnect. Not knowing about the current pressures, priorities, and projects on the other side of the cube wall typically leads to frustration and loathing. The solution is to regularly sit down to hash it out, and find ways to augment, rather than short-circuit, each other’s efforts.

 

During our talk Charisse and I challenged viewers to set a table, along with a meeting request, and record notes of how the conversation went (You had a food fight? We want to see pics or it never happened!). Post those notes (and pictures) below, and we’ll select some of the best ones to receive 500 thwack points.

Network admins definitely play the role of network gumshoe. Dealing with daily network issues like bandwidth hogs, IP conflicts, rogue users, and more, administrators spend a considerable amount of time investigating and resolving network problems. But are they really equipped for this kind of troubleshooting? Is there a specific troubleshooting process for finding problematic users/devices while ensuring minimal downtime?

 

In a network, employees come in with devices pre-configured with IP addresses from prior Internet connections (home or elsewhere). This could result in an IP conflict with a critical application server, causing an interruption of services. In other cases, IP conflicts happen when a network admin accidentally assigns a duplicate IP address, or when a rogue DHCP server operating in the network hands out IP addresses at will. Bandwidth issues creep up in the presence of a YouTube hog, or when someone misuses company resources for unofficial purposes. Finally, rogue users who’ve somehow gained entry to the network may attempt to access confidential data or restricted networks. All of these frequently occurring incidents threaten to upset a smoothly functioning network.

 

In any case, the primary goal of a network admin is to fix an issue with minimal downtime and take steps to ensure that it doesn’t happen again. For issues associated with problematic users/devices in a network, here are four simple steps to follow when troubleshooting:

  • Quickly identify and investigate the problematic user/device.
  • Locate the problematic user/device.
  • Immediately remediate the problematic user/device.
  • Take steps to prevent the same situation from happening again.


 

  1. To quickly detect problems in the network, it’s best to have a monitoring tool in place. Depending on which specific area of the network needs monitoring, admins can set up timely alerts and notifications. Specific monitoring tools are available to help, including those that let you see the up/down status of your devices, IP address space, user/device presence in the network, etc. Once the bandwidth hog, IP conflict, or rogue DHCP is identified, the first step of the troubleshooting process is complete.
  2. The next critical step is determining whether the user/device in question actually caused the problem. You need to look at detailed data that reveals the amount of bandwidth used, who used it, and for what application. You should also look at details on devices in an IP conflict and determine what type of conflict it was, look for the presence of unauthorized devices in the network, and so on. This investigation should also provide data on the location of the user/device in the network, including details like switch port information, or the Wireless Access Point (WAP), if it’s a wireless device.
  3. The third step is remediation. Whatever caused the network interruption needs to be fixed. Knowing the location of the problem, as mentioned in the previous step, is very helpful for taking immediate action. Admins can either physically locate the device and unplug its network access, or use tools that enable the remote shutdown of devices. The remote option is especially helpful for admins working with networks spread over large areas or multiple locations. The critical point here is that network access needs to be revoked immediately.
  4. Finally, take steps to prevent the same problem from happening again. If it’s a case of a problematic user/device, make sure these systems are blocked from entering the network, or that you’re notified when they do. Create checkpoints and monitoring mechanisms so that you can take proactive measures and prevent unauthorized users from entering your network.
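
As a rough illustration of the investigation in steps 1 and 2, the snippet below flags IP addresses claimed by more than one MAC address, using (IP, MAC) pairs you might pull from your switches’ or routers’ ARP tables. The sample entries are invented.

```python
# Flag possible IP conflicts: one IP address claimed by more than one MAC.
# The (IP, MAC) pairs below are invented; in practice you'd pull them from
# your switches' or routers' ARP tables.
from collections import defaultdict

arp_entries = [
    ("10.1.1.20", "00:1a:2b:3c:4d:5e"),
    ("10.1.1.20", "00:aa:bb:cc:dd:ee"),   # a second MAC claiming the same IP
    ("10.1.1.21", "00:11:22:33:44:55"),
]

macs_by_ip = defaultdict(set)
for ip, mac in arp_entries:
    macs_by_ip[ip].add(mac)

for ip, macs in macs_by_ip.items():
    if len(macs) > 1:
        print(f"Possible IP conflict on {ip}: {', '.join(sorted(macs))}")
```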

 

What troubleshooting processes do you follow in your organization? Feel free to share your experiences, which fellow network admins might find useful.

If you're running a network performance monitoring system, I'll bet you think you have visibility into your network.

 

I say you don't – or at least that your vision may be a bit more blurry than you realized.

 

Gathering Statistics

 

There are three kinds of lies: lies, d***ed lies, and statistics.

 

In reality there's nothing wrong with the statistics presented by a network performance management system so long as you understand the implications of the figures you're looking at, and don't take them as a literal truth. For example, when I set a utilization alarm threshold at 90% on a set of interfaces, what does 90% actually represent? If an application sends a burst of data and the interface is used at 95% of its capacity for 3 seconds, should that trigger an alarm? Of course not. How about if it's at 95% utilization for 3 minutes while a file transfer takes place; should that trigger an alarm? Maybe rather than triggering alarms on specific short term utilization peaks I should be forecasting on an hourly average utilization instead; that would even out the short term peaks while still tracking the overall load on the link.

 

And thus we touch on the murky world of network performance management and the statistical analysis that takes place in order to present the user with a single number to worry about. Each product developer will make their own decisions about how to process the numbers which means that given the same underlying data, each product on the market will likely generate slightly different results.

 

Garbage In, Garbage Out

 

Before any statistical analysis can take place, data must be gathered from the devices. "GIGO" implies that if the data are bad, the outputs will be bad, so what data are we gathering, and how good are they?

 

Monitoring systems will typically grab interface statistics every 5 minutes, and a standard MIB-II implementation can grab information such as:

 

  • ifSpeed (or ifHighSpeed): the speed of the interface
  • ifInOctets / ifHCInOctets (received octets, or bytes)
  • ifOutOctets / ifHCOutOctets (sent octets, or bytes)

 

Since there is no "current utilization" MIB entry, two polls are required to determine interface utilization. The first sets a baseline for the current in/out counters, and the second can be used to determine the delta (change) in those values. Multiply the deltas by 8 (to convert bytes to bits), divide by the polling interval in seconds, and I have bits-per-second values which I can use in conjunction with the interface speed to determine the utilization of the interface. Or rather, I can determine the mean utilization for that time period. If the polling interval is five minutes, do I really know what happened on the network in that time?
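
To make the arithmetic concrete, here's the calculation with two made-up poll samples, including the counter-wrap check you need when working with 32-bit ifInOctets/ifOutOctets counters.

```python
# Mean interface utilization from two SNMP polls of ifInOctets/ifOutOctets.
# Counter values and interface speed below are made-up sample numbers.

POLL_INTERVAL_SECONDS = 300          # 5-minute polling
IF_SPEED_BPS = 1_000_000_000         # ifHighSpeed of 1000 = 1 Gbps
COUNTER_MAX = 2**32                  # 32-bit counters (use 2**64 for the ifHC* versions)

def delta(first: int, second: int) -> int:
    """Counter delta, allowing for a single wrap between polls."""
    return second - first if second >= first else second + COUNTER_MAX - first

in_octets_poll1, in_octets_poll2 = 3_200_000_000, 410_000_000    # wrapped between polls
out_octets_poll1, out_octets_poll2 = 1_000_000_000, 1_900_000_000

in_bps = delta(in_octets_poll1, in_octets_poll2) * 8 / POLL_INTERVAL_SECONDS
out_bps = delta(out_octets_poll1, out_octets_poll2) * 8 / POLL_INTERVAL_SECONDS

print(f"Mean in utilization:  {100 * in_bps / IF_SPEED_BPS:.1f}%")
print(f"Mean out utilization: {100 * out_bps / IF_SPEED_BPS:.1f}%")
```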

 

The charts below represent network interface utilization measured every 10 seconds over a five minute period:

 

[Four interface utilization charts omitted; each shows a different traffic pattern that averages 50% over the five minutes.]

 

All four charts have a mean utilization of 50% over that five minutes, so that's what a 5-minute poll would report. Do you still think you have visibility into your network?

 

Compromises

 

Network performance management is one big, bad set of compromises, and here are a few of the issues that make it challenging to get right:

 

  • Polling more often means more (and better resolution) data
  • More data means more storage
  • More data means more processing is required to "roll up" data for trending, or wide data range views, and so on.
  • How long should historical data be kept?
  • Is it ok to roll up data over a certain age and reduce the resolution? e.g. after 24 hours, take 1-minute polls and average them into 5-minute data points, and after a week average those into 15-minute data points, to reduce storage and processing?
  • Is the network management system able to cope with performing more frequent polling?
  • Can the end device cope with more frequent polling?
  • Can I temporarily get real-time high-resolution data for an interface when I'm troubleshooting?

 

What Do You Do?

 

There is no correct solution here, but what do you do? If you've tried 1-minute poll intervals, how did that work out in terms of the load on the polling system and on the devices being polled? Have storage requirements been a problem? Do you have any horror stories where the utilization on a chart masked a problem on the network (I do, for sure)? How do you feel about losing precision on data older than a day (for example), or should data be left untouched? Do you have a better way to track utilization than SNMP polling? I'm also curious if you simply hadn't thought about this before; are you thinking about it now?

 

I'd love to hear from you what decisions (compromises?) you have made or what solutions you deployed when monitoring your network, and why. Maybe I can learn a trick or two!

The month of July is always special for various reasons. We officially welcome the summer heatwave, and it’s also one of those times where we look forward to taking a break and spending time with family. Another reason why July is special is because it’s time to give thanks to your friendly Systems Administrator, yes, the person you always call for help when you get locked out of your computer, or when your email stops working, or when the internet is down, or when your mouse stops responding, or when you need just about anything.

        

To all the SysAdmins and IT superheroes out there, SysAdmin Day is fast approaching. And this year, just like every year, we at SolarWinds dedicate the entire month of July to SysAdmins across the world and we invite you to join the festivities as we celebrate SysAdmin Day in the biggest possible way.

       

SysAdmin Day can mean a lot of things to IT pros and organizations. But, what I think makes SysAdmin Day a day to remember is being able to share your journey with fellow SysAdmins and IT pros. So, in the comment section, share why you chose this career path, the meaning behind those defining moments, or remind us about the day you knew you were going to be a SysAdmin. Take this time to also narrate funny instances or end-user stories that made you laugh or challenging situations you successfully dealt with in your very own server rooms.


We’re super thrilled about this year’s SysAdmin Day because we will have fun weekly blogs on Geek Speak to celebrate, as well as a thwack monthly mission that offers weekly contests, exciting giveaways, and some sweet prizes.

            

Now it’s time to get the party started. Visit the July thwack monthly mission page today!


What do we mean today when we talk about managing our environments in the cloud? In the old physical server days, we had diverse systems to manage the network, the storage, and the server infrastructure. As time moved on, these systems began to merge into products like Spectrum and OpenView. There came to be many players in a space that quite often involved a vendor-specific tool: your server manufacturer would often tie you to a specific management tool.

 

Again, as time moved on, we began to see third-party tools built to specifications that used SNMP traps and APIs no longer unique to particular vendors, which furthered the ability to monitor hardware for faults and alert on high utilization or failures. This helped our abilities extensively. But were these adequate to handle the needs of a virtual environment? Well, in enterprises, we had our virtual management tools to give us good management for that infrastructure as well. However, we still had to dig into our storage and our networks to find hot spots, so this was not going to allow us to expand our infrastructure to hybrid and secondary environments.

 

This whole world changed drastically as we moved things to the cloud. Suddenly, we needed to manage workloads that weren’t necessarily housed on our own infrastructure, we needed to be able to move them dynamically, and we needed to make sure that connectivity and storage in these remote sites, as well as our own, could be monitored within the same interface. Too many “panes of glass” were simply too demanding for our already overtaxed personnel. In addition, we were still in monitor-but-not-remediate mode. We needed tools that could not only alert us to problems, but also help us diagnose and repair the issues that arose, as they inevitably did, quickly and accurately. It was no longer enough to monitor our assets. We needed more.

 

Today, with workloads sitting in public, managed, and private spaces, yet all ours to manage, we find ourselves in a quandary. How do we move them? How do we manage our storage? What about using new platforms like OpenStack or a variety of different hypervisors? These tools are getting better every day; they’re moving toward a model wherein your organization will be able to use whatever platform, whatever storage, and whatever networking you require to manage your workloads, your data, and your backups, and move them about freely. We’re not there yet on any one, yet many are close.

 

In my opinion, the brass ring will be when we can live-migrate workloads regardless of location, virtualization platform, etc. To be sure, there are tools that will allow us to do clones and cutovers, but to move workloads live, with no data loss and no impact to our user base, to AWS, to our preferred provider, or in and out of our own data centers as we desire is truly the way of the future.

If you asked Michael Jordan why he was so successful, he’d probably tell you it's because he spent four hours each day practicing free throws. The fundamental basics are everything.

 

“You can practice shooting eight hours a day, but if your technique is wrong, then all you become is very good at shooting the wrong way. Get the fundamentals down and the level of everything you do will rise.”

- Michael Jordan


This can be extended to all things, and planning your storage environment is no exception. As a storage administrator, it's obvious that you consider important parameters like device performance, storage consumption, and total cost of ownership when writing a storage strategy. But have you given thought to basic things like understanding data growth or the business importance of data? They have a large impact on day-to-day storage operations and thus the business. In this post, I will touch upon two points that you can consider in your storage strategy blueprint.


Analyze Your Data in More Ways than One 

 

Data forms the crux of your storage. So, before you draft your storage strategy, you need to go through your data with a fine-tooth comb. You should have a basic understanding of where your data comes from, where it will reside, which data will occupy what kind of storage, etc. It is widely believed that in most enterprises, 80% of data is not frequently accessed by business users. If that is the case, why does that data need to reside on high-performing storage arrays? Normally, only 20% of data is regularly needed by the business and is considered active. This allows you to place the other 80% on a lower-cost solution that provides enough performance, and reserve your high-performing storage for active data.

 

Another overlooked factor is the business value of data. An employee leave-balance record is normally not as important as a quarterly financial projection. Understanding the significance of your data can help you assign storage accordingly.

 

The last step is understanding the life cycle of data. Information that is critical today may lose its importance in the long run. A regular audit of the data lifecycle will help you understand what data needs to be archived, in turn allowing you to save storage space and budget. Having a good understanding of your data landscape will help you plan your future storage requirements more accurately.
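
A crude way to start that audit is to look at file access times. The sketch below totals up data that hasn't been read in a while; the path and the 180-day threshold are arbitrary examples, and bear in mind that atime isn't reliably updated on every filesystem.

```python
# Crude data-lifecycle audit: total up data not accessed in N days (archive candidates).
# The path and threshold are arbitrary examples; note that atime is not reliably
# updated on every filesystem or mount option.
import os
import time

ROOT = "/data/shared"            # example path, replace with your own
THRESHOLD_DAYS = 180
cutoff = time.time() - THRESHOLD_DAYS * 86_400

stale_bytes = 0
for dirpath, _dirnames, filenames in os.walk(ROOT):
    for name in filenames:
        path = os.path.join(dirpath, name)
        try:
            st = os.stat(path)
        except OSError:
            continue                 # file vanished or is unreadable; skip it
        if st.st_atime < cutoff:
            stale_bytes += st.st_size

print(f"Data not accessed in {THRESHOLD_DAYS} days: {stale_bytes / 1e9:.1f} GB")
```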


Collaborate with Your Business Units Frequently

 

As a storage expert, running out of disk space is not an option, so staying on top of storage capacity is truly your top priority. But you may not be able to achieve this unless you frequently collaborate with the business units in your organization. With a storage monitoring tool, you can accurately predict when you will run out of free space, but that might not be sufficient. Why?

 

Here is an example: say you are planning for 50 TB of data growth across your 5 business units over the next year, 10 TB each, based on the previous year's storage consumption for each unit. Then your company decides to acquire another company that needs an additional 30 TB of storage. In this scenario, you will be forced to make a quick storage purchase, which will strain your limited budget.
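
To show the arithmetic behind that kind of projection, here's a back-of-envelope sketch. The growth and acquisition figures restate the scenario above; the total capacity and current usage are invented for illustration, and real planning would use your monitoring tool's trend data.

```python
# Back-of-envelope capacity projection for the scenario above.
# Growth and acquisition figures restate the example; capacity and current
# usage are invented purely for illustration.

capacity_tb = 120                      # assumed total usable capacity
used_tb = 60                           # assumed current consumption
organic_growth_tb_per_year = 50        # 5 business units x 10 TB each
acquisition_tb = 30                    # unplanned requirement from the acquisition

headroom_tb = capacity_tb - used_tb
months_organic = 12 * headroom_tb / organic_growth_tb_per_year
months_with_acquisition = 12 * (headroom_tb - acquisition_tb) / organic_growth_tb_per_year

print(f"Free space lasts ~{months_organic:.0f} months on organic growth alone")
print(f"With the 30 TB acquisition landing now: ~{months_with_acquisition:.0f} months")
```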

 

By having a better understanding of each business unit's plans, you could have planned to accommodate the additional storage requirements. In fact, in larger organizations, legal and compliance teams play an important role in shaping the storage strategy. These functions rely heavily on storage teams to meet many mandatory regulatory requirements. Frequent collaboration with your company's business units will equip you with knowledge of how data is expected to grow in the future. This will allow you to understand your future storage needs and plan your budgets accordingly.

 

These are just a couple of the many aspects that contribute to a successful storage strategy. The two points above are subtle ones that can easily be missed if not considered or not fully understood. What other important factors do you take into account when mapping out your storage strategy? What common pitfalls have you faced while drafting a storage strategy? Share your experience.

TRIBUTE TO 'GEEK SPEAK AMBASSADORS'

I want to share with the thwack community the latest e-book that we developed at SolarWinds. Why is this so special and rewarding to thwack? Well, because we have picked out some great contributions by our esteemed Geek Speak ambassadors, put a creative spin to them, and presented them as an e-book with an instructive story and some fetching visuals and graphics.

 

AN ENGAGING E-BOOK TO BEHOLD

Outline: Meet Cameron (he thinks of himself as an average, run-of-the-mill IT guy, but in reality is much more than that), who was called upon to manage the IT team of a mid-sized call center. This is a company that was growing rapidly in staff and business operations, but kept itself quite conservative in adopting new trends and technologies.

 

In Cameron’s journal, you will get to read 4 chapters about how he confidently explores new avenues for changing the work culture in his IT team, and how he plans to implement new processes and practices for greater productivity, teamwork, and successful help desk operations.

 

CAMERON’S CHRONICLES

This e-book is available in 4 chapters. Just click on each topic to read the contents of that chapter.

Chapter 1: Building a Positive and Encouraging Help Desk Culture

Read now »


Chapter 2: Defining SLA for Successful Help Desk Operations

Read now »


Chapter 3: Developing Workflows Using Help Desk Data

Read now »


Chapter 4: How to Fence Off Time for a Help Desk

Read now »


 

Tell us what you thought about this e-book and whether you have other ideas for using and repurposing some of your awesome thwack posts and thought contributions.

 

You can also download the full e-book as a PDF from the link below.
