
Geek Speak


When:           August 2005

Where:          Any random organization experimenting with e-commerce

 

Employee:         I can’t access the CRM! We might lose a deal today if I can’t send that quote.

Admin:               Okay, check the WAN, check the LAN, and the server crm1.primary.

Junior:                All fine.

Admin:               Great. Restart the application service. That should solve it.

Junior:                Yes, Boss! The app service was down. I restarted it and now CRM is back up.

 

When:             August 2015

Where:            Any random organization that depends on e-commerce

 

Sales Rep:           Hey! The CRM is down! I can’t see my data. Where are my leads! I’m losing deals!

Sales Director:     Quick! Raise a ticket, call the help desk, email them, and cc me!

Help desk:           The CRM is down again! Let me assign it to the application team.

App team:            Hmm. I’ll reassign the ticket to the server guys and see what they say.

SysAdmin:           Should we check the physical server? Or the VM instance? Maybe the database is down.

DB Admin:           Array, disk, and LUN are okay. There are no issues with queries. I think we might be fine.

Systems team:     Alright, time to blame the network!

Net Admin:           No! It’s not the network. It’s never the network. And it never will be the network!

Systems team:     Okay, where do we start? Server? VM? OS? Apache®? App?

 

See the difference?

 

App deployment today

Today’s networks have changed a lot. There are no established points of failure like there were when networks were flat. Today’s enterprise networks are bigger, faster, and more complex than ever before. While current network capabilities provide more services to more users more efficiently, they have also increased the time it takes to pinpoint the cause of a failure, let alone resolve it.

 

For example, let’s say a user complains about failed transactions. Where would you begin troubleshooting? Keep in mind that you’ll need to check for Web transaction failures, make sure the server is not misbehaving, and confirm that the database is good. Don’t forget the hypervisors, VMs, OS, and the network. Then add the switching between multiple monitoring tools, windows, and tabs, trying to correlate the information, finding what is dependent on what, collaborating with various teams, and more. All of this increases the mean time to repair (MTTR), which means increased service downtime and lost revenue for the enterprise.

 

Troubleshoot the stack

Applications are not standalone entities installed on a Windows® server anymore. Application deployment relies on a system of components that must perform in unison for the application to run optimally. A typical app deployment in most organizations looks like this:

app stack.png

When an application fails to function, any of these components could be blamed for the failure. When a hypervisor fails, you must troubleshoot multiple VMs and the multiple apps they host that may have also failed. Where would troubleshooting begin under these circumstances?

 

Ideally, the process would start with finding out which entity in the application stack failed or is in a critical state. Next, you determine the dependencies of that entity with other components in the application stack. For example, let’s say a Web-based application is slow. A savvy admin would begin troubleshooting by tracking Web performance, move to the database, and on to the hosting environment, which includes the VM, hypervisor, and the rest of the physical infrastructure.

 

To greatly reduce MTTR, begin troubleshooting at the application stack. This will help move your organization closer to the magic three nines of availability. To make stack-based troubleshooting easier, admins can adopt monitoring tools that support correlation and mapping of dependencies, also known as the AppStack model of troubleshooting.

 

Learn more at Microsoft Convergence 2015.

If you would like to learn more, or see the AppStack demo, SolarWinds will be at booth 116 at Microsoft Convergence in Barcelona.


Every time an organization decides to adopt a new technology or expand its business operations, the IT department is where the process begins: changes and additions to the existing network to accommodate new technologies and users. But making changes to the existing IT infrastructure is one thing most network admins think twice about. Even a small configuration error can lead to network downtime, a security breach, or data loss, which in turn may even cause the organization to vanish from the business map.

 

GNS3 is the solution to a network admin’s reluctance to experiment with the production network. With GNS3, a network admin can emulate their actual network to try out configuration changes they otherwise would have had to perform on the production network. The benefits don’t end there: an admin can design, build, and test complex networks using GNS3 before even spending capex to procure actual physical hardware.

 

Now, networks don’t run forever once configured. A network monitoring solution is critical to maintain network uptime and ensure business continuity. And because monitoring is such a critical component, the best possible tool has to be chosen for the task. If you are a network admin who does not like to run trial software in the production network, you should check out GNS3 and their community, the Jungle. And to go with it, there is now enterprise-class monitoring from SolarWinds: network monitoring products from SolarWinds, including SolarWinds Network Performance Monitor, can be installed on any virtual network created using GNS3 and used to monitor it. Welcome to virtual reality!

 

GNS3 is a strategic SolarWinds partner, and to help you get to know them better, we bring you the newly created GNS3 group on Thwack! Log into Thwack, register for ThwackCamp, and join the GNS3 group so you’re in the know!

Information security is important to every organization, but when it comes to government agencies, security can be considered the priority. A breach or loss of information held by federal agencies can have major consequences, even affecting national and economic security.


The Defense Information Systems Agency (DISA) is a combat support agency that provides support to the Department of Defense (DoD), including some of its most critical programs. In turn, this means that DISA must maintain the highest possible security for networks and systems under its control. To achieve this, DISA developed the Security Technical Implementation Guides (STIGs), a methodology for secure configuration and maintenance of IT systems, including network devices. The DISA STIGs have been used by the DoD for IT security for many years.

 

In 2002, Congress felt civilian agencies weren’t making IT security a priority, so to help them secure their IT systems, it created the Federal Information Security Management Act (FISMA). This act requires each agency to implement information security safeguards, audit them, and report to the President’s Office of Management and Budget (OMB), which in turn prepares an annual compliance report for Congress.

 

FISMA standards and guidelines are developed by the National Institute of Standards and Technology (NIST). Under FISMA, every federal civilian agency is required to adopt a set of processes and policies to aid in securing data and ensure compliance.

 

Challenges and Consequences:

 

Federal agencies face numerous challenges when trying to achieve or maintain FISMA and DISA STIG compliance. For example, routinely examining configurations from hundreds of network devices and ensuring that they comply with controls can be daunting, especially for agencies with small IT teams managing large networks. Challenges also arise from user errors, such as employees inadvertently exposing critical configurations, not changing defaults, or holding more privileges than required. Non-compliance can have fatal consequences: not just sanctions, but weakened national security, disruption of crucial services used by citizens, and significant economic losses. There are multiple examples of agencies where non-compliance has resulted in critical consequences. For example, a cyber-espionage group named APT1 compromised more than 100 companies across the world and stole valuable organizational data, including business plans, agendas and minutes from meetings involving high-ranking officials, manufacturing procedures, and emails, as well as user credentials and network architecture information.

 

Solution:

 

With all that said, NIST FISMA and DISA STIG compliance for your network can be achieved through three simple steps.

 

1. Categorize Information Systems:

An inventory of all devices in the network should be created, and the devices must then be assessed for compliance. You should also bring non-compliant devices to a compliant baseline configuration and document the policies applied.

 

2. Assess Policy Effectiveness:

Devices should be continuously monitored and tracked to ensure that security policies are followed and enforced at all times. Regular audits using configuration management tools help uncover policy violations. Further, penetration testing can help evaluate the effectiveness of the policies enforced.

 

3. Remediate Risks and Violations:

After all security risks or policy violations are listed, apply a baseline configuration that meets recommended policies or close each open risk after it has been reviewed and approved. Once again, the use of a tool to automate review and approval can speed the process of remediation.

 

In addition to following these steps, using a tool for continuous monitoring of network devices for configuration changes and change management adds to security and helps achieve compliance.

 

If you are ready to start with your NIST FISMA or DISA STIG implementation and need an even deeper understanding of how to achieve compliance, as well as how to automate these processes with continuous monitoring, download the following SolarWinds white paper: “Compliance & Continuous Cybersecurity Monitoring.”

 

But for those of you who would like to test a tool before deploying it into the production network, SolarWinds Network Configuration Manager is a configuration and change management tool that can integrate with GNS3, a network simulator. For details, refer to this integration guide:

https://community.gns3.com/docs/DOC-1903

 

Happy monitoring!

Security is an aspect to which every organization should give the utmost priority. Ideally, every employee, from the end user to top-level management, is educated about the impact of network security failure. Accordingly, organizations spend significant capex on securing the network. Yet despite all the investment in intrusion detection devices, firewalls, and access control rules, hackers and their threats continue to succeed: data is stolen, critical services are brought down, and malware manages to sneak into secured networks.

 

Akamai released their fourth quarter “State of the Internet” report last month, which provides valuable insights into, well…obviously, the state of the Internet! The security section of the report discusses the top originating country for attack traffic (no points for guessing), the most targeted port, and information about DDoS attacks.

 

As per the report, the most targeted port for attacks is the good old Telnet port. In fact, Port 23 remains the most targeted port for the third consecutive quarter, and attacks against it have increased to 32%, up from 12% in Q3 2014! This despite the fact that most enterprises I know have shifted from Telnet to SSH to enhance security. Most of the attacks can be attributed to bots scanning for devices with port 23 open and then trying the default username and password, or mounting a brute-force attack to gain access to the target network.


most attacks.png

Source: Akamai State of the Internet report


While the data in the report reminds the network admin not to leave unused ports open, it also shows that HTTP and HTTPS, both of which are open in most enterprise networks, are targeted too. Then again, neither port 23 nor any of the top 10 ports listed might be the one used to target your network. It could be a different random port which you left open inadvertently, or had to leave open to facilitate a business service. And of course, it is not possible to block all ingress traffic from the WAN to your network.

 

Firewalls and Intrusion Detection/Prevention Systems (IDS/IPS) enhance your network’s security and are a necessity. But they may not successfully protect your network every time. To name a few examples, everyone remembers what happened to Sony, Home Depot, and Target! These organizations definitely had security measures in place to protect against malware and other threats, but despite their efforts, the breaches still occurred. This shows that malware and other network threats are getting smarter every day, and that traditional security using firewalls and IDS/IPS alone is not sufficient. The workaround?

 

A New Security Layer:

 

In addition to firewalls and intrusion detection systems, add a 3rd layer of security that can detect threats and attacks that have breached your defense. A layer that looks at the behavior of network traffic to detect anomalies, such as malware, hacking, data theft, and DDoS attacks.

 

With Network Behavior Anomaly Detection, or NBAD, it is possible to detect anomalies that get past the firewall and IDS/IPS. NBAD tracks traffic behavior and alerts you when there is unusual activity. For example, traffic originating from invalid IP addresses, traffic on one port from one system to many, and TCP or UDP packets smaller than the minimum expected size are all network behavior anomalies. NBAD is further enhanced when individual systems in the network are monitored for behavior anomalies.

 

Enterprises can get started with NBAD on their own using traffic flow data, network performance data, and log analysis.

 

Flow technologies, such as NetFlow, sFlow, J-Flow, and IPFIX, carry information about IP conversations, with details like source and destination IP addresses, ports, protocol, volume, number of packets, etc. The data can then be used to track behavior anomalies, such as bursts of packets, traffic from invalid IP addresses, malformed packets, and so on.

 

Network performance data can also help discover network anomalies. Sudden voice call drops, for example, could be due to fully utilized links, which in turn could indicate a DDoS attack.

 

While flow-based analysis of traffic is the most widely used method for NBAD, log analysis from various elements in the network, including user systems, can add value to network behavior analysis. With a log analysis tool that correlates and extrapolates information from logs, the admin can pinpoint the source of threats within the network and take preventive measures before major damage occurs.

 

While you are still waiting to find a dedicated NBAD tool that really does what you need, leverage existing technologies and tools for your own network behavior analysis engine. So, what are you starting with? NetFlow or log analysis?

If you’ve worked in IT for any amount of time, you are probably aware of this story: an issue arises; the application team blames the database, the database admin blames the systems, the systems admin blames the network, and the network team blames the application. A classic tale of finger pointing!

 

But it’s not always the admin’s fault. We can’t forget about the users, often the weakest link in the network.

 

Over the years, I think I’ve heard it all. Here are some interesting stories that I’ll never forget:

 

Poor wireless range


User:     Since we moved houses, my laptop isn’t finding my wireless signal.

Me:        Did you reconfigure your router at the new location?

User:     Reconfigure…what router?

 

The user had been using their neighbor’s signal at their previous house. I guess they just assumed they had free Wi-Fi? In fairness, this was almost a decade ago, when people were unaware that they could secure their Wi-Fi.

 

Why isn’t my Wireless working?


User:     So, I bought a wireless router and configured it, but my desktop isn’t picking up the signal.

Me:        Alright, can you go to ‘Network Connections’ and check if your wireless adapter is enabled?

User:     Wait, I need a wireless adapter?

 

Loop lessons


I was at work with one of my coworkers…let’s call him the hyper-enthusiastic newbie. The test lab was under construction, lab devices were being configured, and the production network wasn’t connected to the lab yet. After hours of downtime, the hyper-enthusiastic newbie came to me and said:

 

Newbie:               I configured the switch, and then I wanted to test it.

Me:                        And?

Newbie:               I connected port 1 from our lab switch to a port on the production switch. It worked.

Me:                        Great.

Newbie:               And then to test the 2nd port, I connected it to another port on the production switch.

 

This was a practical lesson on what switching loops can do to a network.

 

Not your average VoIP trouble


A marketing team member’s VoIP phone went missing. An ARP lookup showed that the phone was on a sales rep’s desk. The user had decided to borrow the phone for her calls because hers wasn’t working. Like I said, not your average VoIP trouble.

 

One of my personal favorites: Where's my email?


User:     As you can see, I haven’t received any email today.

Admin: Can you try expanding the option that says ‘Today’?

 

Well, at least it was a simple fix.


Dancing pigs over reading warning messages


So, a user saw wallpaper of a ‘cute dog’ online. They decided to download and install it despite the 101 warning signs their system threw at them. Before they knew it…issues started to arise: malware, data corruption, and soon every system was down. Oh my!

 

Bring your own wireless


The self-proclaimed techie user plugs in his wireless travel router, which also has DHCP enabled. This rogue DHCP server can respond to a client’s request for an IP before the legitimate one does. As you all know, this can lead to complete mayhem and is very difficult to troubleshoot.

 

Excuse me, the network is slow


I hear it all the time and for a number of reasons:

 

Me:        What exactly is performing slowly?

User:     This download was fine. But, after I reached the office, it has stopped.

Me:        That is because torrents are blocked in our network.

 

That was an employee with very high expectations.

 

Monitor trouble!


Often, our office provides a larger monitor to users who are not happy with their laptop screen size. That said:

User:     My extra monitor displays nothing but the light is on.

Me:       Er, you need to connect your laptop to the docking station.

User:     But I am on wireless now!

 

Because of all these incidents, user education has been a priority at work. However, these situations still happen. What are your stories? We’d love to hear them.

The holidays are over and your workforce is back at their desks, blaming the IT team for whatever isn’t working or is slow. While most application performance issues can be blamed on the application itself, there can be other factors too, like an edge or core device behaving badly, server faults, or just another low bandwidth issue. And there are times when even good old FTP fails and you have no idea what’s wrong.

 

Here is a quick list of things to check for when you are at a dead end:

IP conflicts: Unfortunately, your system does not always notify you of an IP conflict. If the conflicting device is a rogue network device or an OS that cannot resolve an IP conflict by itself, the result is intermittent connectivity. This happens because all devices with the same IP respond to ARP requests sent by a switching or routing device. So, during a data transfer, some of the conversation packets go to one device while a few go to the other, resulting in intermittent connectivity.

 

Solution? Use a ‘user device’ tracking tool or an IP conflict detection tool.


MTU: This is the largest PDU the communication layer can pass forward; for Ethernet version 2 networks, the frame size is 1518 bytes, which leaves the familiar 1500-byte MTU. When a router receives a packet that is too large, it will either fragment the packet, or drop it if the DF (Don’t Fragment) bit has been set and send an ICMP error back to the transmitter about the packet being too large. If your application chooses to ignore the error, or your network somewhere blocks the ICMP error sent by the router, the application will keep sending large PDUs, impacting performance. The issue is usually seen in scenarios where VPN is involved, because the encapsulation overhead causes the MTU to exceed 1500 bytes.

 

Solution? Use ping or traceroute to find the MTU your router interface can forward and set that MTU on your device. And don’t forget to make sure that the ICMP error messages about MTU are not being blocked anywhere in your network.
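
For a quick sketch of this on a Cisco router, ping with the DF bit set and step the packet size down until the replies succeed (the target address here is hypothetical; the size and df-bit keywords are available on most IOS versions):

Rtr#ping 192.0.2.1 size 1500 df-bit

Rtr#ping 192.0.2.1 size 1400 df-bit

If the 1500-byte probe times out but the 1400-byte one gets replies, the path MTU lies somewhere between the two; keep narrowing until you find it.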


Auto-negotiation or duplex mismatch: This one can be controversial, but still, here we go. While there are network admins who always hard set the speed and duplex on each interface of their networking devices, there are others who believe that auto-negotiation issues are a myth and will never happen. In reality, auto-negotiation can fail. When? When the cabling is bad, when the devices in question are obsolete or cheap, or simply because one of the devices is set to auto-negotiate and the other is forced. Hard setting the duplex on an interface can also cause an issue when two connected devices are set to different duplexes. The end result is a performance hit from packet retransmissions and a high number of errors on the affected ports.

 

Solution? Check for errors and retransmissions with an NMS or use auto-negotiation on all your devices. And don’t forget, when it comes to Gigabit Ethernet, auto-negotiation must be used.
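
As a minimal sketch on a Cisco switch (the interface name is hypothetical), check a suspect port first:

Sw#show interfaces FastEthernet0/1

Late collisions and CRC errors in the output are the classic mismatch symptoms. To return the port to auto-negotiation:

Sw(config)#interface FastEthernet0/1
Sw(config-if)#speed auto
Sw(config-if)#duplex auto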


TCP windowing: When there is a slow connectivity issue, the first step for many organizations is to throw expensive bandwidth at the problem. But there is also the TCP window size, which many admins forget about. While window scaling is an available solution, some routers and firewalls do not properly implement TCP window scaling, causing a user's connection to malfunction intermittently for a few minutes. When transferring large files between two systems, if the connection is slower than it should be, or intermittent, it could be an issue with a low TCP window size on the receiving system.

 

Solution? As stated, most systems do support TCP window scaling, but when things slow down and you don’t know what is wrong, make sure that TCP window scaling is functioning properly or try increasing the TCP receive buffer. Again, make use of an NMS tool for troubleshooting.


Flow control: The flow control mechanism allows an overloaded Ethernet device to send ‘pause’ frames to other devices that are sending data to it. Without flow control, the overloaded device drops packets, causing a major performance impact. But when it comes to the backbone or network core, flow control can cause congestion in areas that would otherwise have transmitted without issues. For example, a switch sends a pause frame to a transmitting device because a switch port is unable to match the transmitter’s speed. When the pause frame is received, the transmitting device pauses its transmission for a few milliseconds. But the traffic to all other switch ports that do have the bandwidth to handle the speed is paused as well.

 

Solution? Use flow control on computers, but don’t have your switches send out pause frames. Instead, implement QoS in the backbone to prioritize packets based on their criticality. You can find flow control best practices in this whitepaper.
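
On a Catalyst switch, that roughly translates to the sketch below: honor pause frames from hosts on an access port, and keep the core-facing uplink from sending them. Port numbers are hypothetical, and the send option is only exposed on some platforms:

Sw(config)#interface GigabitEthernet0/1
Sw(config-if)#flowcontrol receive on
Sw(config)#interface TenGigabitEthernet1/1
Sw(config-if)#flowcontrol send off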


Right QoS: And that brings us to QoS. Network admins use QoS because it can prioritize business applications and drop unwanted traffic. But a few network admins overdo QoS by applying it to every type of traffic that passes through the device. This can result in a few business applications performing well all of the time, while other applications continue to act up most of the time.

 

Solution? Use QoS only when it is absolutely necessary. You do not have to set priority or a QoS action for all traffic that passes through your network. Prioritize what is important and set best effort or queuing for everything else. Assign bandwidth for very critical applications whose data delivery is important to business continuity.
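
A minimal MQC sketch of that idea (the class name, the NBAR protocol, and the percentages are illustrative, not a recommendation): guarantee bandwidth for one critical application and leave everything else on best effort with fair queuing:

Rtr(config)#class-map match-any critical-apps
Rtr(config-cmap)#match protocol sqlnet
Rtr(config)#policy-map wan-out
Rtr(config-pmap)#class critical-apps
Rtr(config-pmap-c)#bandwidth percent 40
Rtr(config-pmap)#class class-default
Rtr(config-pmap-c)#fair-queue
Rtr(config)#interface Serial0/0
Rtr(config-if)#service-policy output wan-out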

 

It’s important to understand the various reasons for data delivery failure, even the uncommon ones. By doing so, you will have a better idea of where to search when issues arise. If you’ve faced data delivery problems, where did your issue stem from and how did you resolve it?

The modern-day network handles a higher volume of data and more applications than ever before. Many of these applications are sensitive to delay and latency. Under such conditions, network engineers need QoS to prioritize delay-sensitive business apps over others, or to drop non-business traffic.

 

A QoS implementation method used to classify and mark applications or protocols in the network is Modular QoS CLI (MQC). With MQC QoS, the traffic you need to prioritize or drop is grouped into a class-map. The class-map is then assigned to a policy-map that performs QoS actions. If you are not familiar with QoS, check out this blog to get started with MQC QoS.

 

An option available under MQC to group traffic into a class-map is the “match protocol” statement. This statement allows users to match a desired application or protocol, such as FTP or HTTP, into a class-map and then perform QoS actions on it. Here, the ‘protocol’ keyword can refer either to regular protocols like bgp, citrix, dhcp, etc., or to Network Based Application Recognition (NBAR) recognized protocols.
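
For instance, a class-map that groups plain and secure web traffic might look like this (the class name is arbitrary):

Rtr(config)#class-map match-any web-traffic
Rtr(config-cmap)#match protocol http
Rtr(config-cmap)#match protocol secure-http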


What is NBAR?

 

NBAR is a classification technology from Cisco that can identify and classify applications and protocols, including those that use dynamic port numbers. NBAR goes beyond TCP/UDP port numbers and can inspect the payload to identify a protocol. NBAR classifies applications using the default Packet Description Language Modules (PDLM) available in the IOS.

 

Cisco also has NBAR2, which is the next generation version of NBAR that enhances the existing NBAR functionality to classify even more applications. It also provides additional classification capabilities, such as field extraction and attributes-based categorization. Cisco routinely releases updated protocol packs for NBAR2, which can be accessed from the NBAR protocol library for new signatures, signature updates, and bug fixes.

 

Conveniently, Cisco NBAR is supported on most Cisco IOS devices and NBAR2 is supported on devices such as ISR-G2, ASR1K, ASA-CX, and Wireless LAN controllers. And to make it easy, NBAR2 configuration is exactly the same as NBAR.


Why NBAR

 

Many network engineers use Access Control Lists (ACL) for application classification when defining their QoS policies. But sometimes, NBAR is a better choice than ACLs because of NBAR’s ability to automatically recognize applications and protocols which otherwise would have to be defined manually.

 

NBAR is also easier to configure than ACLs, and provides collection statistics (if you need them) via the NBAR protocol discovery MIB for each application it identifies.
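
If you want those statistics, protocol discovery is enabled per interface, and the collected numbers can also be read straight from the CLI. A minimal sketch, with a hypothetical interface name:

Rtr(config)#interface FastEthernet0/1
Rtr(config-if)#ip nbar protocol-discovery

Rtr#show ip nbar protocol-discovery top-n 10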

Finally, the biggest advantage of NBAR is that it can be used for custom protocol identification.


Custom Protocols with NBAR

 

There are many applications that are designed to use dynamic port numbers. Such dynamic changes in port numbers can make it difficult to identify applications with regular monitoring tools, and sometimes even with NBAR. While NBAR2 does have signatures for various applications, there is a good chance you are using an internally built application not defined in NBAR2, which is a good reason to define your own custom protocol for NBAR.

 

NBAR custom protocol support is quite extensive, too. You can define custom protocols to be identified by the NBAR engine based on IP address, port, and transport protocol, and even by inspecting specific bytes of the payload for keywords.

 

Another is the HTTP advantage. Every network allows ingress and egress HTTP, which also makes it the protocol used by many non-business applications, rogue applications, and even malware to gain access to the enterprise. With custom protocol matching, NBAR can classify HTTP traffic based on URL, host, MIME type, or even HTTP header fields. So imagine the advantages: allow HTTP traffic from specific sources and block everything else, stop unwanted HTTP traffic and allow all business applications, block only YouTube but not Salesforce, or allow only Salesforce and block everything else, and many more permutations.

 

So, here it is. You do not have to explicitly enable NBAR on your device to use it with QoS policies unless you need either NBAR protocol discovery or NBAR custom protocol identification. There are two commands that Cisco reference sites mention for defining custom NBAR protocols, depending on your IOS version: ip nbar custom, and ip nbar custom name transport. The syntax for both is provided below:

 

ip nbar custom name [offset [format value]] [variable field-name field-length] [source | destination] [tcp | udp ] [range start end | port-number]

 

In the above command, offset refers to the byte location in the payload for inspection. The format and its value can be a term (when used with ascii format), a hexadecimal value (used with hex format), or a decimal value (used with decimal format). For complete information on what each option refers to, check this link:

http://www.cisco.com/c/en/us/td/docs/ios/qos/command/reference/qos_book/qos_i1.html#wp1022849

 

Another command, mostly referred to with NBAR2 or newer IOS is:

 

ip nbar custom name transport {tcp | udp} {id id} ip {address ip-address | subnet subnet-ip subnet-mask} | ipv6 {address ipv6-address | subnet subnet-ipv6 ipv6-prefix} | port {port-number | range start-range end-range} | direction {any | destination | source}

 

Check the link below for a reference on the above command:

http://www.cisco.com/c/en/us/td/docs/ios-xml/ios/qos/command/qos-cr-book/qos-i1.html#wp1207545360

 

Once you have your custom protocol defined with NBAR, create a class-map and use the match protocol statement with your custom protocol name to classify matching traffic into the class-map. You can then prioritize, drop, or police the traffic based on your requirements.
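
To make that concrete, here is a hypothetical sketch following the first syntax form above: an in-house app that always carries the ASCII string “SALES” five bytes into its TCP payload on port 4550. The protocol name, string, offset, port, and DSCP value are all made up for illustration:

Rtr(config)#ip nbar custom internal_app 5 ascii SALES tcp 4550
Rtr(config)#class-map match-all internal-app-class
Rtr(config-cmap)#match protocol internal_app
Rtr(config)#policy-map business-apps
Rtr(config-pmap)#class internal-app-class
Rtr(config-pmap-c)#set dscp af31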

 

Well, I hope this information eases your implementation of NBAR. More importantly, I hope you enjoy the many benefits of NBAR and a trouble-free network!

It’s the best of times. It’s the worst of times. Well...not quite! However, it certainly is the age of technology offering immediate ROI, sky-high cost savings, and even magic that can add to an organization’s bottom line. It’s also a time when new technology wreaks havoc on data delivery if implemented without considering the additional traffic load it adds to the network. Come to think of it, global IP traffic is expected to increase 8-fold before the end of 2015. All of this is making it trickier to deliver data to the cloud, a remote site, or even just out of the edge router.

 

When network engineers need to police and drop unwanted traffic, prioritize business traffic, and ensure data delivery, the answer is QoS or Quality of Service. QoS can provide preferential treatment to desired traffic within your LAN, at the network edge, and even over the WAN if the ISP respects your QoS markings. ISPs have always used QoS to support their own (preferred) services or to offer better chances of delivery at a premium. While ‘end-to-end QoS’ in its real sense (from a system in your LAN, over the WAN, peered links and multiple Autonomous Systems to an end-point sitting thousands of miles away) is challenging, it’s wise to use QoS to ensure that your data at least reaches the PE device without packet loss, jitter, and errors.

 

Alright, now comes the fun part: implementing Cisco QoS! Some network engineers and SMBs are wary of implementing QoS for fear of breaking something that already works. But fear not, here is some help for beginners getting started with Cisco QoS and its design and implementation strategies.

 

QoS Design and Implementation:

QoS design consists of 3 strategies:

  • Best Effort: Default design with no differentiation or priority for any traffic. All traffic works under the best effort.
  • IntServ: A signaling protocol such as RSVP is used to signal to the routers along a path that an application or service needs QoS. This reserves bandwidth for the application, and that bandwidth cannot be re-allocated even when the application is not in use.
  • DiffServ: The most widely used option. Allows a user to group traffic packets into classes and provide a desired level of service.

 

The choices for QoS implementation range from traditional CLI and MQC to AutoQoS. For a beginner, the easiest would be to start with a DiffServ design strategy and use Cisco’s MQC (Modular QoS CLI) for implementation. MQC based QoS configuration involves:

  • Class-Maps: Used to match and classify your traffic into groups, say web, peer-to-peer, business-critical, or however you think it should be classified. Traffic is classified into class-maps using match statements.
  • Policy-Maps: Describes the action to be taken on the traffic classified using class-maps. Actions can be to limit the bandwidth used by a class, queue the traffic, drop it, set a QoS value, and so forth.
  • Service-Policy: The last stage is to attach the policy-map to an interface on whose traffic you wish to perform the QoS actions defined earlier. The actions can be set to act on either Ingress or Egress traffic.

MQC QoS structure.png

Now, I would like to show you a sample configuration to put unwanted traffic and a business app in two different classes and  set their priorities using IP precedence.

 

Creating class-maps to group traffic based on user requirements:

Rtr(config)#class-map match-any unwanted

Rtr(config-cmap)#match protocol ftp

Rtr(config-cmap)#match protocol gnutella

Rtr(config-cmap)#match protocol kazaa2

 

Rtr(config)#class-map videoconf

Rtr(config-cmap)#match protocol rtp

 

Associating the class-map to a policy and defining the action to be taken:

Rtr(config)#policy-map business

Rtr(config-pmap)#class unwanted

Rtr(config-pmap-c)#set precedence 0

Rtr(config-pmap)#class videoconf

Rtr(config-pmap-c)#set precedence 5

 

Assigning the policy to an interface:

Rtr(config)#interface Se0/0

Rtr(config-if)#service-policy output business

 

QoS Validation:

After implementation, your next thoughts should be on how to make sure the QoS policies you created are working: are they dropping the traffic they are supposed to, or are they affecting the performance of your business applications?

 

This is where Cisco’s Class-Based QoS MIB, better known as CBQoS, steps in. SNMP-capable monitoring tools can collect information from the CBQoS MIB to report the pre- and post-policy statistics for every QoS policy on a device. CBQoS reports help determine the volume of traffic dropped or queued, and confirm that the classification and marking of traffic is working as expected.
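
The same counters the CBQoS MIB exposes over SNMP can also be spot-checked from the CLI while you validate a new policy (the interface name is hypothetical):

Rtr#show policy-map interface Serial0/0

The output lists, per class, the offered rate, drop rate, and queue depth: the pre- and post-policy view for each policy attached to the interface.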

 

Well, that completes the basics of QoS using MQC and some implementation ideas for your network. While we covered QoS configuration using classification and marking in this blog, there are more options, such as congestion management, congestion avoidance, and shaping, which we have not explored because they can be complex when starting out. Once you have the hang of QoS configuration using MQC, be sure to explore all the options for classifying and marking traffic from here before your first QoS implementation.

 

Good luck creating a better network!

 

Shellshock is the name given to a vulnerability in Bash that allows attackers to remotely compromise vulnerable systems, allowing for unauthorized disclosure of information. Ever since news of the bug came out (and the original fix turned out not to fix the issue), attackers have been using ‘masscan’ to find vulnerable systems on the Internet. This means network-based attacks against *nix-based servers and devices, through web requests or other programs that use Bash, are happening. Check Robert Graham’s blog here to learn more.

 

Your first step should be to test if your version of bash is vulnerable by typing the following in your command line:

env x='() { :;}; echo vulnerable' bash -c "echo this is a test"


If the system is vulnerable, the output would be:

 

vulnerable

this is a test

 

That means you need to patch your server’s Bash as soon as possible. In case your network devices are vulnerable, contact your vendor. For Cisco’s list, check the link here, and for SolarWinds’ list, check this blog.

 

My first thought was, because the access vector for shellshock is the network, would the network show signs of an attack leveraging the bash bug?

 

Here is some info from redhat.com blog:

“The vulnerability arises from the fact that you can create environment variables with specially-crafted values before calling the Bash shell. These variables can contain code, which gets executed as soon as the shell is invoked.”

 

In short, the Bash shell allows function definitions to be passed using environment variables that share the name of the function, where the string "() { :;};" marks a function declaration. So, the initial attack vector will always include (or start with?) the “() {“ sequence, and that should be the signature for detecting Bash attacks.

 

My next thought was, if you don’t have an IDS or IPS on which you can define the signature, can your other network devices detect the “() {“ signature in the HTTP header and help you mitigate an attack?

 

Let us talk ‘Cisco’ here. Cisco devices have a couple of options for HTTP header inspection. One is NBAR, but NBAR’s HTTP header inspection is limited to 3 fields as far as client-to-server requests are concerned, namely ‘user-agent’, ‘referrer’, and ‘from’, none of which will hold the signature “() {“.

 

The second option I found for HTTP header inspection is the Zone-Based Policy Firewall (ZFW), which Cisco states is available on Cisco routers and switches from IOS 12.4(6)T onwards. ZFW supports application layer (Layer 7) inspection, including HTTP headers, which can then be used to block traffic. ZFW lets you use class-based policy language (remember QoS?) to define what traffic has to be matched and what action has to be taken.

 

With ZFW, you can inspect and block HTTP traffic that includes the regex “\x28\x29\x20\x7b” in the header. If you are wondering why “\x28\x29\x20\x7b”: that is the hex format for “() {“. Refer to the chart here to see how we converted our signature to a hex regex.

 

Back to the Bash bug and ZFW: based on Cisco configuration guides, a sample configuration for Bash attack mitigation should look like the one below, but supported commands can change depending on the IOS version.

 

Define a parameter map to capture the signature we were looking for:

parameter-map type regex bash_bug_regex

pattern "\x28\x29\x20\x7b"


Create the class map to identify and match traffic:

class-map type inspect http bashbug_classmap

   match req-resp header regex bash_bug_regex

 

Put the class under a policy to apply a reset action to the traffic that was matched by the class map.

policy-map type inspect http bashbug_policymap

   class type inspect http bashbug_classmap

      reset
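
Note that the Layer 7 policy above takes effect only once it is nested inside a Layer 4 inspect policy that is applied to a zone-pair. A minimal sketch of that plumbing, with hypothetical zone, class, and interface names:

class-map type inspect match-any l4_http_class
   match protocol http

policy-map type inspect l4_policy
   class type inspect l4_http_class
      inspect
      service-policy http bashbug_policymap

zone security inside
zone security outside
zone-pair security in2out source inside destination outside
   service-policy type inspect l4_policy

interface GigabitEthernet0/0
   zone-member security inside
interface GigabitEthernet0/1
   zone-member security outside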

 

While the HTTP header inspection may cause a CPU spike, ZFW would still be a good option if you cannot apply the available patches right now. ZFW is also an extensive topic and can have implications for your network traffic if not implemented properly. Read up on ZFW with configuration examples here:

http://www.cisco.com/c/en/us/support/docs/security/ios-firewall/98628-zone-design-guide.html

http://www.cisco.com/c/en/us/td/docs/ios-xml/ios/sec_data_zbf/configuration/xe-3s/sec-data-zbf-xe-book/sec-zone-pol-fw.html#GUID-AD5C510A-ABA4-4345-9389-7E8C242391CA

 

Are there any alternatives to ZFW for network-level mitigation of Bash bug based attacks?

I accounted for everything: Cat6 cabling, a fiber-ready router, 3-tier architecture, failover at Layer 3, segmentation with VLANs, and many more features that sounded great but we probably never needed. I was proud it was not ‘One Big Flat Network’ (OBFN).

 

The first change request was raised on day 1, followed by at least one every week. With each request, I almost always found things I could have done differently during design or implementation. Thinking about it now, here is my list:

 

Too many cables:

Every network starts with a single cable. Then you keep adding more until you have no idea which cable connects what. It happened to me. As the number of devices increased, so did the number of cables. And because I had not planned my cable schematics, my rack ended up looking almost like this:

 

cablemess-1-600x450.jpg

 

If you have to trace cables every time something has to be changed or there is an issue, rework your cable management plan. Use different colors based on what the cables are for: clients to switches, access or trunk ports, router to other devices, etc. Group similar cables and don’t forget labels. Pick up a few cable management tips here.

 

How tall are you?

I thought that the heavy core switch and the small firewall would never have to be moved. You get the idea?

 

Place your devices where they are reachable for maintenance – neither too high nor any place from where the ‘OFF’ switch can be kicked or the power cable can be yanked.

 

Backup from Day Zero

During implementation, a NAT and a few ACLs later, I realized that RDP was not connecting. It took me multiple reconfigurations and hours of reading to realize that my original configuration was fine and I had simply forgotten to account for the address translation while trying RDP. Nothing too bad, until I realized that the delta between my current non-working config and my last working config was everything except ‘hostname Router2951’.

 

Back up your configurations the minute you see ICMP replies coming in, whether northbound or southbound. Power failures don’t wait for a network to be up before bringing it down.
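
On Cisco IOS, the built-in archive feature can even do this for you. A minimal sketch, assuming IOS 12.3(4)T or later and a TFTP server at a hypothetical address:

! Save a copy whenever the config is written, and once a day regardless
archive
   path tftp://192.0.2.50/$h-config
   write-memory
   time-period 1440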

 

127.0.0.1/?

Because every ‘networking 101’ teaches you how to subnet, I added quite a few subnets to my network. Users, servers, wireless, management, or any department with at least 2 hosts had its own subnet. Believing small is beautiful, I loved Class C and /28 for 8 months, until I realized 30 hosts (/27) would do better, before finally settling on /26 after a year.

 

Plan for the next year, or the year after. It is also fine to start with a /24 or even a /23 if you don’t have too many hosts. Club a couple of departments together in a larger subnet rather than starting with smaller ones and recalculating subnets every six months. OBFN is not bad.

 

Complexity is not a Necessity

I added technologies I thought were great for the network. Take VLANs, for example. Most SMBs, especially those using VoIP, have VLANs even though they don’t fill out a /24. Why? Because we have been taught about VLANs and broadcast domains. VLANs are great for isolation and management, but not always good for performance. In an SMB network, VLANs often only add to the complexity.

 

VLANs are one example. SAN is another, and there are more. The rule of thumb is: use a technology only if it actually solves a problem for you.

 

Quality matters

New admins dread QoS: either they don’t use it or they overdo it, and I was the former. With no QoS to police the network, HTTP, FTP, and some random port-protocol combos teamed up in my network to slow down RTP and RDP during peak hours.

 

Use QoS to provide priority, but only when required. It should not be the norm that all your other apps fail while only VoIP goes through.

 

What hit me?

One day our firewall decided to drop packets because it had too many rules. Another time it was an end-user deciding that dancing pigs were better than the security warning. Either way, we ended up with downtime.

 

Whether you have 10 or 10,000 devices, and whether you use open-source, free, or paid tools, network monitoring should be part of your network design. That is how you get insights into what will go down and who will bring it down.

 

So after all the learning, there we are now! [Exaggerated visual]

neat-data-cabling-network.jpg

 

We could agree or disagree. But Pimiento or Pure Capsaicin, Padawan or Master, what is in your list?

VoIP has been widely adopted by enterprises for the cost savings it provides, but it is also one of the most challenging applications for a network administrator. Some enterprises choose to run VoIP on their existing IP infrastructure with no additional investment, bandwidth upgrades, or preferential marking for voice packets. But because VoIP is a delay-sensitive application, the slightest increase in latency, jitter, or packet loss affects the quality of a VoIP call.

 

The Story:

A medium-sized business with its HQ in Austin, US, and a branch office in Chennai, India, used VoIP for sales and customer support as well as for communication between offices. IP phones and VoIP gateways were deployed at both Austin and Chennai, while the call manager and the PSTN trunk for external calls were in Austin. Austin and Chennai were connected over the WAN, and voice calls from Chennai used the same path as data.

 

network dgm.png

The Problem:

Tickets were raised by users in Chennai about VoIP issues such as poor call quality and even call drops when calling Austin and customers around the globe.

 

The network admin had the NOC team check the health and performance of the network. The network devices in the path of the calls were analyzed for health issues, route flaps, etc., with the help of an SNMP-based monitoring tool. After confirming that the network health was fine, the team leveraged a few free Cisco technologies for VoIP troubleshooting.

 

The Solution:

  1. Analysis with Call Detail Records (CDR) and Cisco VoIP IPSLA
  2. Root cause with Cisco NetFlow
  3. Resolution with Cisco QoS

 

Analysis with Call Detail Records (CDR) and Cisco VoIP IPSLA

When call drops were first reported, the NOC team quickly set up a tool with which they could analyze both Call Detail Records (CDRs) and Cisco IPSLA operations. The Cisco call manager was configured to export CDR data to the tool and the edge Cisco routers at both locations were added to the tool for IPSLA monitoring. CDR data was analyzed to find details about all failed calls and IPSLA was used to measure MOS, jitter and latency for VoIP traffic between the locations. IPSLA reports were correlated with CDR information to confirm the affected location, subnet and set of users.
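
For reference, a minimal sketch of such an IPSLA operation on IOS 12.4(4)T or later (the address and operation number are hypothetical; older releases use ip sla monitor instead): a udp-jitter probe from the Chennai edge router simulating a G.711 voice stream every 60 seconds, with the responder enabled on the Austin router:

ip sla 10
   udp-jitter 192.0.2.1 16384 codec g711alaw
   frequency 60
ip sla schedule 10 life forever start-time now

And on the Austin side:

ip sla responder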

failed calls.png

mos score.png

Root cause with Cisco NetFlow

IPSLA confirmed high packet loss, jitter, and latency for VoIP conversations originating from Chennai, and this put suspicion on the available WAN bandwidth. The network admin verified the link utilization using SNMP. Though WAN bandwidth was being utilized to the max, it was not to the extent that packets should be dropped or latency should be high.

 

The second free technology to be used was NetFlow. Most routing and switching devices from major vendors support NetFlow or similar flow formats, like J-Flow, sFlow, IPFIX, NetStream, etc. NetFlow was enabled on the WAN interfaces at both Austin and Chennai and set to export every minute to a centralized flow analysis tool that provided real-time bandwidth analysis.
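
A minimal sketch of that NetFlow setup on each edge router (the collector address, port, and interface name are hypothetical):

! Capture flows entering the WAN interface
interface Serial0/0
   ip flow ingress

! Export version 5 records to the collector, expiring active flows every minute
ip flow-export version 5
ip flow-export destination 192.0.2.50 2055
ip flow-cache timeout active 1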

 

The network admin checked the top applications in use and did not find VoIP in the top applications list. ToS analysis of the NetFlow data showed that VoIP conversations from India did not have the preferred QoS priority. A configuration change on the router had given backup traffic a higher priority than VoIP traffic, so backup traffic was being delivered while VoIP traffic was dropped or buffered whenever WAN link utilization was high. The admin also found that a few scavenger applications had high priority as well.

 

top apps.png       EF-top apps.png

Resolution with Cisco QoS

With reports from the flow analyzer tool, the network admin identified the applications and IP addresses hogging the WAN bandwidth and redesigned the QoS policies to give preferential marking to VoIP and mission-critical applications, putting everything else under “Best Effort”. Bandwidth-hogging applications were either policed or dropped. Traffic analysis with NetFlow confirmed that VoIP now had the required DSCP priority (EF) and that other applications were no longer hogging the WAN bandwidth. Because Cisco devices support QoS reporting over SNMP, the QoS policies on the edge Cisco devices were monitored to confirm that QoS drops and queuing behaved as desired.
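
The redesigned policy would have looked something along these lines (a sketch with illustrative names and numbers, not the actual configuration from the story): mark and prioritize voice with low-latency queuing, and leave everything else on best effort:

class-map match-any voice
   match protocol rtp audio
policy-map wan-edge
   class voice
      set dscp ef
      priority percent 30
   class class-default
      fair-queue
interface Serial0/0
   service-policy output wan-edge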

 

EF priority for VoIP.png  CbQoS drops.png

 

Cisco IPSLA and CDR analysis confirmed that VoIP call performance was back to normal: no more VoIP calls had a poor MOS score or were being dropped. We had a smart network admin, and that was the day we were taught to be proactive rather than reactive.

 

The question I now have is:

Have you been in a similar soup?

 

Are there alternative methods we could have used, and how would you have gone about it?

There is a technology available through most of the United States capable of providing net bit rates in the range of terabits per second with extremely low latency. Though big data enterprises like Google, Microsoft, and Facebook are already using it for their data transfers, not many Fortune 500 enterprises have considered it. The technology is known as ‘dark fiber’.

 

What is dark fiber?

 

During the best years of the dot-com bubble in the late ’90s, telecom and other large utility companies, foreseeing an exponential demand for network access, laid more fiber optic cable than needed. One reason: they expected readily available network capacity would help them capture the market. Another: more than 70% of the cost of laying fiber optic cable goes toward labor and other infrastructure development1. It made more sense to lay extra fiber up front, saving on future labor expenses, than to lay new cable as and when needed. But two factors left most of this fiber unused:

  1. The development of Wavelength-division Multiplexing (I refuse to explain that, but you can read it up here) increased the capacity of existing optical fiber cables by a factor of 100!
  2. The dot.com bubble burst and the demand for network connectivity died down.

In fiber optic communication, light pulses carry the information, so when data is being transmitted, the fiber lights up. Any fiber that is not transmitting data remains unlit and is called ‘dark fiber’. Today, the term mostly refers to fiber optic cables that were laid in expectation of demand but are now unused or abandoned. There are thousands of miles of dark fiber2 available throughout the United States, sold or leased out by the companies that built them or purchased them from bankrupt telecoms.

 

Should you consider dark fiber?

 

Dark fiber is suitable for enterprises that need low-latency, high-speed connectivity with zero interference from service providers, and that have the capex to invest. Here are a few scenarios where dark fiber can help.

 

Point-to-point connections, such as those to network-neutral data centers, cloud providers, and DR and backup sites, would do better with gigabit or even terabit transfer speeds. Fiber optic cables are capable of exactly that: a single fiber pair can transfer gigabits of data per second.

 

There are enterprises whose bandwidth speed requirements can change from a few Gbps to an unpredictably high limit. Optical fiber is capable of virtually unlimited data speeds allowing it to meet high and unpredictable bandwidth demands.

 

Enterprises, especially those involved in stock trading or online gaming, and those using newer communication technologies such as HD video conferencing and VoIP, need the ultra-low latency connections that optical fiber is capable of providing.

 

Dark fiber also provides net neutrality. If you are purchasing Gold-class QoS from your ISP for priority data delivery to your data center or branches, dark fiber needs none of that. With dark fiber, data is delivered over your privately owned cable, and because of its high bandwidth capabilities, there is no need for traffic prioritization either.

 

Finally, you get the ability to transfer data from your offices to data centers or the cloud without having to worry about the data being read, modified or stolen. Dark fiber is your private connection where only you have access to both the data as well as the fiber that transmits the data.

 

And a few more facts:

 

Dark fiber is an excellent choice if you already have a router that supports fiber connections, thereby ensuring last mile high speed data delivery. But before you consider buying or leasing dark fiber, make sure you have a real business requirement. Here are a few more facts to consider:

  • Renting or buying dark fiber is cheap but you still need to invest in hardware and other equipment needed to light up the cables.
  • Optical fiber is a physical asset that needs maintenance. Consider the costs involved in maintaining the fiber and related infrastructure.
  • The time needed to identify and resolve outages is much higher than with Ethernet. On the other hand, issues such as cable cuts happen very rarely with fiber optic cables due to the manner in which they are laid.

 

If dark fiber costs are prohibitive, consider alternatives such as ‘wavelength services’ where instead of the whole fiber, you lease a specific wavelength on the fiber based on requirements.

Still sounds like hype? Trust copper!

Quality of Service (QoS) is used in enterprise networks to ensure that business-critical applications have the required priority and are not bogged down by non-business traffic when passing through the enterprise WAN link or even when traversing the Internet.

 

Cisco devices support a QoS model where packets can be treated with priority even by intermediate systems, depending on their DSCP value. Based on a packet's DSCP value, the traffic is put into a specific service class, and traffic conditioning functions such as marking, shaping, and policing are applied to it. To ensure priority for preferred packets even after they leave the network, DSCP markings are applied to outbound traffic at the edge.

 

Take a traffic conversation moving from the LAN to the WAN with the default priority:

 

 

Source IP    | Source Interface | Destination IP | Destination Interface | Port / Protocol | DSCP Value
192.168.1.10 | FastEthernet 0/1 | 74.125.224.68  | Serial 1/1            | 2654 TCP        | Default

 

To preserve service delivery when this conversation moves over the WAN, a DSCP-based QoS policy that changes the packet’s DSCP marking from ‘default’ to the high-priority ‘EF’ is applied outbound on the serial interface.

 

 

Most enterprises use NetFlow for traffic analytics because it provides detail without being resource-intensive on either the device or the bandwidth. When enabling NetFlow on a Cisco device, the options available are Ingress NetFlow and Egress NetFlow, and a majority of network admins use Ingress NetFlow. With Ingress NetFlow, the IN traffic across an interface is captured. Because NetFlow data also records the interface through which an IP conversation exited the device, the same conversation can be attributed as the OUT traffic of the exit interface. So all NetFlow reporting tools can construct the OUT traffic for an interface from the information captured by Ingress NetFlow.

 

For the TCP conversation we discussed, Ingress NetFlow captures IN traffic at the Fa 0/1 interface, where no QoS policy is applied and the DSCP marking is "default". The conversation exits the router through Serial 1/1, so the same conversation is attributed as the OUT traffic of Serial 1/1.

 

And that is the downside. Because the traffic was captured by NetFlow inbound on Fa 0/1, where there was no QoS policy, the conversation was recorded while its DSCP marking was still 'default'. When that same conversation is attributed as the outbound traffic of Serial 1/1, it is still shown with a 'default' DSCP marking, even though the packets were remarked to 'EF' as they exited Se 1/1. This is the behavior with any NetFlow reporting tool.

 

Then there is Egress NetFlow. Egress NetFlow captures the OUT traffic from an interface and from this OUT traffic, the IN traffic for the entry interface is constructed.

 

 

Source IP       Source Interface     Destination IP    Destination Interface    Port / Protocol    DSCP Value
192.168.1.10    FastEthernet 0/1     74.125.224.68     Serial 1/1               TCP 2654           EF

 

In our example, Egress NetFlow captures the traffic as it exits the Serial 1/1 interface, with the correct outbound DSCP marking of EF. This way, your NetFlow reporting tool can report on IP conversations with the modified DSCP marking rather than the pre-QoS-policy marking.

 

Egress NetFlow has other advantages, too. For example, if you use WAN compression, Egress NetFlow captures traffic after compression rather than at its original size, so you see the actual volume of traffic that exited your device and not pre-compression traffic volumes.

 

To apply Egress NetFlow on your interfaces, use the command "ip flow egress" (traditional NetFlow) or "ip flow monitor monitor_name output" (Flexible NetFlow), and you are ready to capture traffic with the correct DSCP values. If you have not yet used NetFlow, try it with a network traffic monitor both to monitor traffic and to validate your QoS policy performance.
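Here is what the two options look like at the interface level (you would use one or the other, and the flow monitor name is illustrative):

interface Serial1/1
 ip flow egress                                     \\ Traditional NetFlow, egress direction

or:

interface Serial1/1
 ip flow monitor WAN-MONITOR output                 \\ Flexible NetFlow; WAN-MONITOR is an illustrative monitor name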

 

 


Summary: How big the botnet problem is, how it can affect your network, and how traffic and log analysis can help you slay the botnets in your network.

 

As a network administrator, you may have implemented security measures to stop DDoS attacks and upped the ante against malware. You may have firewalls, ACLs, and intrusion detection and prevention systems in place to protect your network from attacks originating from the Internet.

 

But have you thought about a scenario where your network is hosting a DDoS attack or sending out spam? In other words, a scenario where your network is contributing to an attack rather than under attack.

 

That can happen if computers in your network have been compromised and are part of a botnet. Other than possible legal issues and the blacklisting of your public IP addresses, you may also incur huge bandwidth charges because bots in your network are sending countless spam emails or taking part in high-traffic DDoS attacks. For example, one record-breaking DDoS attack reached 400 Gbps at its peak!

 

What is a Botnet?


A botnet is a network of compromised computers, called bots, which are controlled by a bot master through a Command and Control center (C&C center). Bots can be remotely instructed to perform DDoS attacks, email spamming, click fraud, and malware distribution. The number of hosts or bots in a botnet can range from a few thousand to millions (as with Zeus, Conficker, or Mariposa).

 

The C&C center is the interface through which the bot master manages his bots, mostly from behind Tor, and the communication methods used include IRC channels, peer-to-peer protocols, social media and now, even the cloud. Statistics show that more than 1,000 DDoS attacks occur each day1 and between 40 and 80 billion spam emails are sent2. Botnets are responsible for almost all DDoS attacks and for more than 80% of the total spam sent worldwide3.

 

Detecting Botnets through Analytics:


To stop bots, you first have to detect them. Bots can lie dormant for months and become active only when they have to take part in a DDoS attack. That doesn't mean bots are undetectable. It is possible to detect bots by analyzing event logs and network traffic behavior. So, let's take a look at some common bot behaviors that can help with detection.

 

IRC is one of the methods a C&C center uses to communicate with its botnet, and the communication is kept as short as possible to avoid a noticeable impact on the network. If you cannot block IRC in your network, analyze your network traffic and check for port-protocol combinations that match IRC traffic. If you see multiple short IRC sessions, make sure to scan for botnet presence in your network. You can also scan your system logs to find out whether any new programs have been installed, whether there has been unexpected creation, modification, or deletion of files, and whether registry entries have been modified. If any of this smells like IRC, you know what to look for next.

 

A C&C center is what controls the botnet; if the C&C center is taken down, the botnet itself is useless. For resilience, a C&C center has two options. One, known as fast flux, involves constantly changing the IP addresses associated with the FQDN that hosts the C&C center. The other, domain flux, creates multiple FQDNs each day and allocates them to the IP address of the C&C center. Because of this, the bots have to perform a number of DNS lookups to locate their C&C center. Analyze egress/outbound traffic from your network, or your DNS-related logs, and if you find more DNS lookups than expected or lookups for strange, machine-generated domain names, it could be a bot.

 

Like malware, bots search for other vulnerable hosts to infect. To find open ports on a host, a burst of packets with the SYN flag set is sent, either to a single host on multiple ports or to multiple hosts on a single port. If the target port is open, the system responds with a SYN-ACK packet. So, if you see too many conversations from one host to other hosts with the SYN flag set, or an increase in packet count with no matching increase in traffic volume, you are possibly looking at a port scan by bots.

 

Remember the statistic about more than 80% of the total spam being sent by botnets? Something worse than receiving spam is being slammed with a huge bandwidth bill for spam emails sent by bots from your network, along with possible blacklisting of your IP addresses and other legal troubles. Because spam email has to travel from your network to the outside, SMTP has to be used. If you see an unexpectedly high volume of SMTP traffic originating from your network to the outside, especially from random endpoints, bad news - you are hosting a spam bot!
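One low-effort way to spot rogue SMTP senders at the edge is an outbound ACL that permits your sanctioned mail server and logs SMTP from everything else. This is only a sketch; the addresses, names, and interface are illustrative:

ip access-list extended SMTP-WATCH
 permit tcp host 192.168.1.25 any eq smtp           \\ The sanctioned mail server (illustrative address)
 permit tcp any any eq smtp log                     \\ Log SMTP attempts from any other internal host
 permit ip any any                                  \\ Let all other traffic through untouched

interface Serial1/1
 ip access-group SMTP-WATCH out

Any hits on the logging line point at hosts worth scanning for bot infections.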

 

SYN flooding is one of the methods bots use to carry out a DDoS attack. Bots send SYN messages with spoofed source IP addresses to their target so that the server's ACK message never reaches the original source. The server keeps the connection open waiting for a reply while it receives more SYN messages, all leading to a denial of service. Watch your outbound traffic for conversations with only the SYN flag set and no return conversation with an ACK flag. Also check for egress traffic whose source carries an invalid IP address, such as an IANA-reserved IP or a broadcast IP. Both behaviors can point to bots taking part in a DDoS attack.
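Spoofed-source traffic, in particular, can be caught with simple egress filtering: permit only packets sourced from your own address space and log the rest. A sketch, with illustrative addressing and naming:

ip access-list extended EGRESS-ANTISPOOF
 permit ip 192.168.0.0 0.0.255.255 any              \\ Your internal address space (illustrative)
 deny ip any any log                                \\ Log and drop spoofed or invalid sources

interface Serial1/1
 ip access-group EGRESS-ANTISPOOF out

A sudden stream of hits on the deny line is a strong hint that a bot is spraying spoofed SYN packets from inside your network.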

 

While the patterns discussed here can also occur in legitimate IP traffic, keeping an eye open for anything out of the ordinary and comparing that information with your baseline data or normal network behavior will help you minimize false positives.

 

There are a number of mature, easy-to-use options for network behavior analysis, such as packet capture, flow analysis with technologies like NetFlow, or an SIEM tool. Options such as NetFlow and syslog export are already built into your routers, switches, and systems. You only have to turn them on and use a reporting or SIEM tool, such as a NetFlow analyzer or log analyzer. Such solutions are cost-effective and do not need complex configuration or setup. So start your traffic and log analysis and slay those botnets.

 

 

 

 

Reference:

  1. Number of DDoS attacks per day, as per the Arbor ATLAS report: http://atlas.arbor.net/summary/dos
  2. Spam email volumes (the Guardian puts the peak at 200 billion): http://www.guardian.co.uk/technology/2011/jan/10/email-spam-record-activity
  3. 88% of all spam emails are sent by botnets: http://www.techrepublic.com/blog/10-things/the-top-10-spam-botnets-new-and-improved/

Further reading: http://searchsecuritychannel.techtarget.com/feature/Virtual-honeypots-Tracking-botnets

The Cisco Catalyst 3850 is a fixed, stackable GE (Gigabit Ethernet) access layer switch that converges wired and wireless within a single platform. The switch is based on Cisco's programmable ASIC, the Unified Access Data Plane (UADP), which supports this convergence and allows for the deployment of SDN and Cisco ONE (Cisco's version of SDN).


The Catalyst 3850 can stack and route, supports PoE, offers higher throughput and larger TCAMs, can act as your Wireless LAN Controller (supporting up to 50 APs and 2,000 clients), and, importantly, supports Flexible NetFlow export. And why is NetFlow important? Over the years, NetFlow has become the de facto standard for bandwidth monitoring and traffic analytics thanks to its ability to report on the 'who, what, when, and where' of your network traffic.


Flexible NetFlow configuration for Cisco Catalyst 3850 Switch:

The Cisco 3850 needs either an IP Base or IP Services license to support Flexible NetFlow (FNF) export.
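If you are unsure which license your switch is running, the license level appears in the output of 'show version'; on the 3850's IOS XE releases, 'show license right-to-use' should also list the activated level (command availability varies by software version):

show version                                        \\ License level is listed in the output
show license right-to-use                           \\ Right-to-use license details on the Catalyst 3850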


Flexible NetFlow configuration involves creating a Flow Record, a Flow Exporter, and a Flow Monitor. The Flow Monitor is the NetFlow cache, whose components include the Flow Exporter and the Flow Record; the monitor also holds the cache timeouts for active and inactive flows. The Flow Exporter carries the information needed for export, such as the destination IP address for the flows, the UDP port for export, and the interface through which NetFlow packets are exported. The Flow Record defines the actual information captured about the network traffic, which your NetFlow analyzer then uses to generate bandwidth and traffic reports. Some of the fields in a Flow Record are source and destination IP address, source and destination port, transport protocol, source and destination L3 interface, ToS, DSCP, bytes, and packets.


So, here is a sample configuration for enabling Flexible NetFlow on a Cisco Catalyst 3850 and exporting it to your flow analyzer such as SolarWinds NTA.


Flow Record:

We start by creating the flow record. From 'global configuration' mode, apply the following commands.

 

flow record NetFlow-to-Orion           \\ You can use a custom name for your flow-record

match ipv4 source address                               

match ipv4 destination address

match ipv4 protocol

match transport source-port

match transport destination-port

match ipv4 tos

match interface input

collect interface output

collect counter bytes long        \\ Though "long" is an optional command, readers have stated that NetFlow reporting works only when "long" is used

collect counter packets long


Flow Exporter:

And next for the flow exporter, again from the 'global config' mode.

 

flow exporter NetFlow-to-Orion       \\ You can use a custom name for your flow-exporter

destination 10.10.10.10                     \\ Use the IP Address of your flow analyzer server

source GigabitEthernet1/0/1            \\ Opt for an interface that has a route to the flow analyzer server

transport udp 2055                             \\ The UDP port to reach the server. SolarWinds NTA listens on 2055

 

Flow Monitor:

Now to associate the flow record and exporter to the flow monitor.

 

flow monitor NetFlow-to-Orion          \\ Again, you can use a custom name

record NetFlow-to-Orion                  \\ Use the same name as your flow record

exporter NetFlow-to-Orion               \\ Use the same name as your flow exporter

cache timeout active 60                  \\ Interval at which active conversations are exported - in seconds

cache timeout inactive 15                \\ Interval at which inactive conversations are exported - in seconds

 

Enabling on an Interface:

And finally, associate the flow monitor with all the interfaces you want to monitor with your flow analyzer. Go to 'interface config' mode for each interface and apply the command:

 

ip flow monitor NetFlow-to-Orion input          \\ Or use the name of your custom flow monitor

 

The above command attaches the flow monitor to the selected interface, after which the ingress traffic that passes across the interface is captured and sent to your flow analyzer for reporting.
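If you also want egress capture on an interface, so that reports show post-QoS DSCP values and post-compression volumes as discussed earlier, the same flow monitor can be attached in the output direction as well:

ip flow monitor NetFlow-to-Orion output          \\ Attach the same flow monitor in the egress direction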


For a trouble-free setup, ensure that your firewalls or ACLs are not blocking the NetFlow packets exported on UDP 2055, and that there is a route from the interface you selected under the flow exporter to the flow analyzer server. And then you are all set. Happy monitoring!

 

 

 


