
Geek Speak

8 Posts authored by: glenkemp

In my last post, I took a look at the DNS protocol with tcpdump; and as it turns out, you can do some really useful stuff with the embedded protocol decoders. So, how far can we take troubleshooting with tcpdump? Pretty far; but in troubleshooting you have to decide whether the fastest resolution will come from the tools you have to hand, or from grabbing the capture and using something better. Whilst you *can* definitely do some analysis of HTTP, as we’ll see, once the initial handshake is complete it gets messy, real quick.


ASCII - Still a thing


The most useful switch for debugging HTTP is -A, which decodes the traffic as ASCII and makes it (kinda) human readable. To kick off a capture on our server we run:


[~] # tcpdump -i eth0 -pnA port 80


Capture 0.PNG


For sanity’s sake, I've snipped out the initial handshake. After a few packets we can see the client's request, the ACK, and the server’s response. Probably the most interesting parts are highlighted in yellow:


  • The HTTP GET request from the client (GET / HTTP/1.1)
  • The server HTTP 200 return code (HTTP/1.1 200 OK)
  • The Content-encoding (gzip)
  • The content type returned (text/html)


Sorry, I've got no HEAD


Various other headers are displayed, but they're not usually that useful. Beyond that, it’s HTML:


Capture 1.PNG


But that doesn't look anything like HTML. There are no lovely <HEAD> or <HTML> tags. The clue is in the client and server headers. Whilst the session is not encrypted, with gzip compression enabled, for a human it may as well be. You can’t see the conversation between the client and server once the TCP and HTTP parameters are established. However, we can divine the following:


  1. The URL the client requested
  2. The server was happy to accept the request
  3. The parameters/session features enabled (such as gzip compression)
  4. But not much else


Somewhere in this packet exchange, a redirect sends the client to a different TCP port. However, from the tcpdump alone we can’t see that. There *may* be an arcane way of getting tcpdump to decompress gzip on the fly, but I’ll be darned if I can figure it out. As a workaround, you could disable compression in your browser, or use a CLI web tool such as cURL. However, changing things just to troubleshoot is never a good idea, and that wouldn't help if your problem is with gzip compression itself.
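If you do go down the cURL route, here's a minimal sketch (assuming curl is installed on the box; the URL is just a placeholder). curl doesn't advertise gzip support unless you pass --compressed, and forcing Accept-Encoding: identity makes that explicit, so the response body stays readable in a parallel tcpdump capture:


[~] # curl -v -H 'Accept-Encoding: identity' http://172.16.10.220/ -o /dev/null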


404 - Not Found


Another example shows a less than healthy connection:


Capture 2.PNG


This time, the client is requesting reallybadfile.exe, so the server returns a 404 Not Found error. Random clients attempting to request executables is of course a sign of a virus or other malicious activity. Many firewalls can filter this stuff out at the edge, but this is a job best suited to a load balancer or Application Delivery Controller (posh load balancer).


If you are just interested in the negative status codes, you can of course just pipe the output to grep:


[~] # tcpdump -i eth0 -pnA port 80 | grep '404\|GET'


Capture 3.PNG


This is an especially quick and dirty method; of course you could pipe in multiple status codes, or use egrep and a regex, but from the CLI you run a pretty big risk of missing something.
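For example, a slightly broader sketch (the regex is an assumption about what the request and status lines look like; the -l flag makes tcpdump line-buffer its output so the pipe doesn't hold onto it):


[~] # tcpdump -l -i eth0 -pnA port 80 | egrep 'GET|POST|HTTP/1\.[01] [45][0-9][0-9]'


That would catch any 4xx or 5xx response along with the GET and POST requests that triggered them, but the same caveat applies: from the CLI it's easy to miss something.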


(insert Wireshark pun)


Sometimes it’s best to admit defeat and make a permanent capture for analysis elsewhere. To do this we use the -w switch to write the packets to disk. The verbose switch is also helpful here, as it reports the number of packets received during the capture, so you know you've actually caught something.


[~] # tcpdump -i eth0 -pnAv -w httpcapture.pcap port 80


Capture 3.PNG
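Getting the capture off the box is usually the easy part; for example, run from your workstation (assuming SSH access, with the username and IP as placeholders):


[~] # scp admin@172.16.10.220:httpcapture.pcap .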


The session can then be analysed in Wireshark with minimal effort. Just grab the file any way you can and open it; the built-in decoders will work their magic. Once loaded, go to Analyze and then Follow TCP Stream. This will show you a cleaned-up version of the capture we took at the beginning, but with the payload still encoded.


Capture5.PNG


No problem; just go and find the HTTP response packet, dig through it, and the payload should be in the clear:


Capture 6.PNG

And here, now highlighted in yellow, we can see a tiny piece of JavaScript that redirects my HTTP session to another location. Because the redirect doesn't use an HTTP/1.1 3xx status code, it is much harder (if not impossible) to realise what is going on using tcpdump alone. With some elite-ninja regex skills and a perfect recollection of the HTTP protocol, maybe you could have figured this out without bringing in the Wireshark heavy artillery. However, for mere mortals such as myself, it’s just a case of knowing when to bring in the really big hammer.


So, that’s my little treatise on deep packet analysis, with some practical applications. Please let me know your thoughts in the comments, along with any tips you can share for tcpdump, Wireshark, or any of the other excellent tools out there.


In my last post I looked at how flags can pull useful information out of packets that we might otherwise struggle to see. This time, we’re going to use tcpdump to look into the actual applications.


The first application I'm going to look at is the humble Domain Name Service (DNS), the thing that needs to work flawlessly before any other application can get out of bed. Because DNS lookups are typically embedded in the OS IP stack, a packet capture is often the only way to get any visibility.


The NS in a Haystack


In my scenario, I suspect that something is amiss with DNS, but I'm not sure what. To pick up just the DNS traffic, we only need a simple filter:


[~] # tcpdump -i eth0 -pn port 53


Capture.PNG


In the above example, even with minimal options selected, we can see some really useful information. The built-in decoder pulls out the transaction ID from the client (equivalent to a session ID), the query type (A record) and the FQDN we are looking for. What is unusual in this example is that we can see not one but two queries, about 5 seconds apart. Given that we are filtering on all port 53 traffic, we should have seen a reply. It would appear that my local DNS proxy (172.16.10.1) for some reason failed to respond. The client-side resolver timed out and tried the Google Public DNS. This may be a one-time event, but it certainly bears monitoring. If the client configuration has an unresponsive or unreliable DNS server as its first port of call, at the very least this will manifest as a frustrating browsing experience.


Selection of the Fittest (and Fastest)


Selection of DNS servers is pretty important; I hadn't realised that my test Linux box was using Google as a secondary resolver. Whilst it is reliable, it’s actually four hops and a dozen milliseconds further away than my ISP's service. When your broadband is as crappy as mine, every millisecond counts.
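If you want to put numbers on that, a quick and dirty comparison with dig (assuming dig is installed; the first IP is the local DNS proxy from the capture above):


[~] # dig @172.16.10.1 google.co.uk | grep 'Query time'

[~] # dig @8.8.8.8 google.co.uk | grep 'Query time'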


Anyway, as you can see, Google returns eight A records for google.co.uk; any of them should be fine.


Another thing to look for is what happens when we make an invalid query, or there is no valid response:


Capture2.PNG


In this case we get an NXDomain (non-existent domain) error. This one is an obvious typo on my part, but if we turn up the logging with the very verbose (-vv) switch, the response is still interesting:


[~] # tcpdump -i eth0 -pnvv port 53


Capture3.PNG


Highlighted above is the SOA (start of authority) record for the domain .ac.uk; this is as far as the server was able to chase the referral before it got the NXDomain response.


Edit - Contributor JSwan pointed out a small mistake; I've fixed it in the version below.


Whilst a bunch of stuff is revealed with very verbose enabled, not all of it is useful. One thing to look for is the IP time to live (TTL); this shows how many hops the packet has made since leaving the source. If this number is low, it can be an indicator of routing problems or high latency (I did say it wasn't very useful!).


Furthermore, the DNS protocol-specific TTL can be seen highlighted in yellow, after the serial number in date format. The DNS TTL specifies how long the client (or referring server) should cache the record before checking again. For static services such as mail, TTLs can be 24 hours or more. However, for dynamic web services this can be as low as 1 second. TTLs that low are not a great idea; they generate HUGE amounts of DNS traffic, which can snowball out of control. The moral is: make sure the TTLs you are getting (or setting) are appropriate to your use-case. If you fail over to your backup data centre with a DNS TTL of a week, it will be a long time before all the caches are flushed.
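A quick way to sanity-check the TTLs you are actually being handed, independent of tcpdump (again assuming dig is available; +noall +answer strips the output down to the answer section, where the second column is the remaining TTL in seconds):


[~] # dig +noall +answer google.co.uk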


As JSwan points out in the comments, if you use the very, very verbose switch (-vvv), tcpdump will display the DNS TTL for A records in hours, minutes and seconds:


Capture4.PNG


Apparently Google has a very short TTL. Interestingly, tcpdump doesn't print the DNS TTL for an NXDOMAIN result, although it is still visible in the capture.

 

Why is capturing in context important?


Imagine trying to troubleshoot connectivity for a network appliance. You have configured IP addressing, routing and DNS, and yet it cannot dial home to its cloud service. Unless the vendor has been really thorough in documenting their error messages, a simple DNS fault can leave you stumped. Once again, tcpdump saves the day, even on non-TCP traffic. The built-in protocol decoder gives vital clues as to what may be breaking an apparently simple communication.
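In that situation, something as simple as the following sketch will usually tell you whether the lookup ever succeeds, and whether the appliance then even attempts to connect out (port 443 is an assumption about the cloud service; adjust to suit):


[~] # tcpdump -i eth0 -pn 'port 53 or port 443'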


In my next and final blog of this series, I’m going to look at another common protocol, HTTP.





In my last post I talked through some of the reasons why mastering tcpdump is useful. Building on our previous example, in this post I’ll focus on using TCP flags for troubleshooting.


Even with our cleaned-up filter, we can still see quite a lot of traffic that we don’t care about. When troubleshooting connectivity issues, the first packet is the hardest, especially when you start involving firewalls. As I’m sure you will recall, a TCP packet flagged with SYN is the first sent when a client tries to establish a layer-7 connection to a remote host. On a LAN this is simple, but when the destination is on a different network, there is more to go wrong with the inevitable NAT and routing.


Troubleshooting approaches differ, but I prefer to jump on the target console and work backwards, as traffic is usually only policed on ingress. We need to know whether our packet is reaching our target. That way we can look down the stack (at routing and firewalling) or up to the application itself.


This is where a working knowledge of TCP flags and a copy of tcpdump is helpful. 


Signalling with Flags


Each TCP packet contains the source and destination port, the sequence and acknowledgement numbers, as well as a series of flags (or control bits) which indicate one or more properties of the packet. Each flag is a single bit flipped on or off. The diagram below shows the fields in a TCP packet. The source and destination ports are 16-bit integers (which is where the maximum of 65535 comes from), but if you look at offset 96, bits 8 through 15 are individual flags: SYN, ACK, RST, etc.



Capture 0.PNG


When a connection is set up between a client and server, the first three packets will have the SYN and ACK flags set. If you look at them in any packet analyser, it’ll look something like this:


Client -> Server (SYN bit set)

Server -> Client (SYN and ACK bits set)

Client -> Server (ACK bit set)


To make sure we are actually getting the traffic, we want to see the first packet in the connection handshake. To capture packets where only the SYN flag is set, we use the following command from the server console:


[~] # tcpdump -i eth0 -n 'tcp[tcpflags] & (tcp-syn) != 0 and port not 22'


The tcp[tcpflags] expression above filters on specific fields within the packet header. In this case we are using tcpdump’s built-in shorthand to look at the bit associated with TCP SYN (the ‘Flags [S]’ in yellow).
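Incidentally, tcp-syn is just a readable alias for a bit in byte 13 of the TCP header (the SYN flag is the 0x02 bit), so on a build that doesn't understand the tcpflags shorthand, the same filter can be written numerically; a sketch:


[~] # tcpdump -i eth0 -n 'tcp[13] & 2 != 0 and port not 22'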


The other thing we are introducing is using ‘single’ quotes on the filter. This prevents the shell from trying to interpret anything within brackets.


In the below example we can see three packets sent ~500ms apart (also highlighted in yellow). Bear in mind that almost everything on the wire has been filtered out; we are only hearing one side of the conversation.


Capture1.PNG


Three packets with the same source and destination ports, transmitted 500ms apart, is a clue to what is happening. This is the typical behaviour of a client connection that received no response, assumed the packet was lost in transit, and tried twice more.


What does the flag say?


Having taken this capture from the server, we know the client's packets are making it to us, so it's unlikely that an intermediate firewall is causing a problem. My hunch is that the server is not completing the connection handshake for some reason. A quick and dirty approach is to check for TCP reset (RST) packets; the host's blunt way of refusing or tearing down a connection. Hosts will respond with a TCP reset when there is no application listening: the lights are on, but no-one is home.


[~] # tcpdump -i eth0 -n 'tcp[tcpflags] & (tcp-rst) != 0 and port not 22'


Capture2.PNG


I took the above captures a few minutes apart, but for every TCP SYN there is a TCP reset from the server. Whether the server is actually listening on the target destination port (3333), on any interface, is easily confirmed with:


[~] # netstat -na | grep 3333


If no results are returned, the service ain't running. When taking captures from a firewall, you should expect different behaviour: in 99% of cases, if a packet doesn't match a policy it will be dropped silently, without an acknowledgement (ACK) or reset (RST) packet.
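If you want to prove the point the other way around, one option is to stand up a throwaway listener and watch the handshake complete; a sketch, assuming a netcat build that accepts -l -p (and obviously think twice before doing this on a production box):


[~] # nc -l -p 3333 &

[~] # tcpdump -i eth0 -n port 3333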


These are not the flags you are looking for


With the tcpflags option we can also combine matches. For example, we can look for all traffic where either the tcp-syn or tcp-ack flag is set:


[~] # tcpdump -i eth0 -n 'tcp[tcpflags] & (tcp-syn|tcp-ack) != 0 and port not 22'


However, everything with SYN or ACK set doesn’t constitute much of a filter; you are going to be picking up a lot of traffic.


Rather than just filtering on source/destination ports, why do I care about TCP flags? Because I'm looking for behaviour rather than specific sources or destinations. If you were troubleshooting two back-to-back LAN PCs, you wouldn't bother with all the vexillology. However, with hosts either side of a firewall, you can’t take much for granted. When firewalls and dynamic routing are involved, traffic may cross NAT zones or enter on an unexpected interface.


It’s easy to switch to source/destination filtering once you've “found” the flows you are looking for; I try to avoid making assumptions when troubleshooting.
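Once you have found the interesting flow, narrowing the capture down is trivial; for example (the client IP here is just a placeholder):


[~] # tcpdump -i eth0 -n 'host 172.16.10.50 and port 3333'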


In my next post I’ll dig into the payload of two common protocols to see what we can learn about layer 7 using only tcpdump.





Well hello there; returning like a bad penny, I am here to talk again about Deep Packet Analysis. In my last series of blogs I talked about the use-cases for Deep Packet Analysis, but conspicuous by its absence was any real-world application. This time I thought I would dust off my old-timey packet analysis skills and share some practical applications. I’ll focus on troubleshooting, but no doubt we’ll wander into security and performance as well.


Whilst SolarWinds has some excellent tools for network performance management, there will be occasions when they won’t be available; for example, when troubleshooting remote systems without a full desktop, or with limited privileges. We’d like to think that those expensive “AcmeFoo Widget 5000” appliances use a custom-built operating system. However, instead of an OS written in assembly by virgin albino programmers, it’s usually a headless Linux distribution with a fancy web GUI. As a result, the tools are pretty universal. Wireshark is the kind of tool that most administrator-types have on their desktop. It has no end of fancy GUI knobs; click randomly for long enough and you are bound to find something noteworthy. However, when you don’t have access to a desktop or can’t export a complete dump, working with the available tools may be your only option. One of the most basic, and most powerful, is of course tcpdump. Available on most platforms, many vendors use it for native packet capture with a CLI or GUI wrapper.


How does Packet Capturing work?


A packet capture grabs packets from a network interface card (NIC) so they can be reviewed, either in real time or dumped to a file. The analysis is usually performed after the event in something like Wireshark. By default, all traffic entering or leaving the NICs of your host will be captured. That sounds useful, right? Well, SSH or Telnet (you fool) onto a handy Linux box and run:


[~] # tcpdump

or if you are not root:

[~] # sudo tcpdump


And BWAAA. You get a visit from the packet inception chipmunk.

 

tcpdump 1.PNG


Filling your screen is a summary of all the traffic flowing into the host (press Ctrl+C to stop this, BTW). This mostly consists of the SSH traffic from your own workstation, which carries the capture output, which generates more SSH traffic, and so on. In the days of multicore everything, this is not so much of a problem; but on a feeble or marginal box, an unfettered tcpdump can gobble cycles like some sort of Dutch personal transport recycling machine. To make what is flying past your screen readable, and to stop your CPU from getting caned, a filter is needed.


So, to capture everything except the SSH traffic, use the following:


[~] # tcpdump not port 22

tcpdump 2.PNG

And BWAAA. Your second visit from the packet inception chipmunk (it’s slightly smaller than the first; to simulate this, just turn down the volume).


This time, you can see all the traffic heading into the NIC, and there is probably more than you thought. What’s not obvious is that tcpdump and other packet interception technologies operate in promiscuous mode. This is a feature of the NIC designed to assist in troubleshooting. A tonne of the network traffic arriving at the NIC is not destined for the host; to save host CPU cycles it is silently ignored and not passed up the stack. If your NIC were connected to a hub (or Ethernet bridge) there would be a lot of ignoring. Even on a switched network there is a lot of broadcast noise from protocols such as ARP, NetBIOS, Bonjour, uPnP, etc. Promiscuous mode picks up everything on the wire and sends it up the stack to be processed, or in our case, captured.


Troubleshooting with all this rubbish floating around is difficult, but not impossible. If you intend to analyse the traffic in Wireshark, filtering after the event is easy. However, in our pretend scenario we can’t export files, but we can filter in place with the --no-promiscuous-mode (or -p) option.


[~] # tcpdump -p not port 22


tcpdump 3.PNG


You’ll still see a lot of traffic from ARP and the like, but it should be much cleaner. Still firmly in the realm of marmot-family filmic tropes, the process of packet capturing actually generates a lot of traffic itself. By default, tcpdump will try to reverse-map IPs to hostnames in the traffic it collects. You could take the not port filter a bit further and exclude your DNS servers, or the DNS protocol entirely, with:


[~] # tcpdump -p not port 22 and not port 53

or

[~] # tcpdump -p not port 22 and not host 8.8.8.8


But both of those are a bit clumsy, as whatever we are trying to fix may be DNS-related. When tcpdump attempts this resolution, a lot of secondary traffic can be generated. This may alarm a firewall administrator, as the host may not normally generate outbound traffic. Furthermore, DNS is sometimes used for data exfiltration; a host suddenly generating a lot of queries (caused by an “innocent” tcpdump) could cause unnecessary panic. DNS resolution can also play tricks: it’s showing you what the DNS server thinks the source or destination is, not what the packet headers actually say. The better option is to just turn the darn thing off with the -n option:

 

[~] # tcpdump -pn not port 22

 

tcpdump 4.PNG


So there we have a nice clean real-time output of what is going on, but it’s still a bit vague. Don’t worry if you don’t understand what you are actually seeing here; we will come to that.


Again, the default behaviour is not that helpful. Unless you tell it otherwise, tcpdump will pick a NIC to capture on, usually defaulting to eth0. If you are dealing with a host with a single NIC, this is a good guess. However, on a server or firewall, the traffic direction and source NIC matter.


The final option I shall mention is to specify the interface on which to capture. The command:


[~] # ifconfig

eth0      Link encap:Ethernet  HWaddr 00:08:9B:BD:CC:9F 

          inet addr:172.16.10.220  Bcast:172.16.10.255  Mask:255.255.255.0

          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1

          RX packets:15590165 errors:0 dropped:104804 overruns:0 frame:0

          TX packets:14783138 errors:0 dropped:0 overruns:0 carrier:0

          collisions:0 txqueuelen:532

          RX bytes:1908396023 (1.7 GiB)  TX bytes:906356383 (864.3 MiB)

          Interrupt:11


lo        Link encap:Local Loopback 

          inet addr:127.0.0.1  Mask:255.0.0.0

          UP LOOPBACK RUNNING  MTU:16436  Metric:1

          RX packets:28772 errors:0 dropped:0 overruns:0 frame:0

          TX packets:28772 errors:0 dropped:0 overruns:0 carrier:0

          collisions:0 txqueuelen:0

          RX bytes:14490908 (13.8 MiB)  TX bytes:14490908 (13.8 MiB)



will show you a list of configured interfaces. To tell tcpdump to capture on only one, just use the -i switch followed by the logical interface name.

 


[~] # tcpdump -i eth0 -pn not port 22


tcpdump 5.PNG


If you don’t know which interface the traffic is arriving on, use the any interface to capture on them all. This disables promiscuous mode at the same time, so you don’t need the -p option. However, I've found this to be less than universal; it's not supported by every OS or build of tcpdump.


[~] # tcpdump -i any -n not port 22
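If -i any isn't supported on your build, tcpdump can at least tell you which interfaces it thinks it can capture on with the -D flag (support for this also varies between builds, so treat it as a nice-to-have):


[~] # tcpdump -D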


So, there we have it. A cleanish packet capture which shows us what’s going on with the upper layer protocols.  In my next post I’ll dig a bit deeper with some filters and advanced switches for “on the spot” troubleshooting.




In my last post, I talked about some of the limitations of DPI and the kinds of tools used to work around them. However, DPI is useful in a couple of other scenarios, and in this final post I’ll cover them too.


Traffic Optimisation


As traffic flows across a network, it needs to be managed as it crosses points of contention. The most common points of (serious) contention are WAN links, but even inter-switch links can come under pressure. Simple source/destination/port policies are often used to match protocols against prioritisation queues. However, for much the same reason that port matching is not good enough for security (lack of specificity), it’s not really good enough for traffic optimisation.


In an enterprise network, consider a branch office that accesses applications in a server farm at a remote data centre. Creating a class for HTTP alone isn’t much use, as it’s used by many distinct applications. Creating a class based upon destination subnet alone isn’t going to be much cop either; in virtualized or VDI environments, server IPs change often. IP/port classes are helpful when you need to pull protocols (such as Citrix or Oracle) to the top of the pile, but that isn’t good enough in the context of highly contended or long-haul WAN links.


DPI is used by packet-shaping and WAN optimization devices to identify traffic flows down to the molecular level: the individual user and/or Citrix/Virtual Desktop application. This is necessary for two reasons:


  1. So that the administrator may apply granular policies to individual applications
  2. To identify traffic that may be optimized; either as simple priority access to bandwidth, or something more invasive such as Layer 4-7 protocol optimization. 


In the context of protocol optimisation (read: WAN optimization or acceleration), correctly identifying traffic flows is critical. As an example, many years ago Citrix moved the default port for sessions from TCP 1494 to TCP 2598. Many bandwidth management policies identified Citrix by the TCP port alone. When the port moved, unless the network or Citrix administrator was paying particular attention, latency-sensitive traffic was thrown in with the “best effort” class. Unsurprisingly, this usually resulted in an overwhelmingly negative user experience.
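A port-based class is at least easy to sanity-check from the CLI; a tcpdump sketch along these lines (interface name assumed) would have shown Citrix sessions turning up on the new port while the shaping policy was still watching the old one:


[~] # tcpdump -i eth0 -n 'tcp port 1494 or tcp port 2598'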


Troubleshooting


Deep packet inspection is incredibly useful when it comes to troubleshooting network behaviour. DPI can be used to identify applications on the network (such as the Citrix example above), but also to identify the behaviour of applications, and it is the final tool for identifying “something bad” happening on the network.

Just to recap, here is a summary of the “something bad” scenarios that DPI technologies can address:


  1. Firewall PAD: Filtering out a deliberately malformed packet that would crash a web server
  • Firewall Signature: Identifying an otherwise correctly formatted packet that produces bad behaviour on a web server and would otherwise lead to an exploit (such as dropping a rootkit onto a vulnerable host)
  3. Firewall SSL Inspection: Looking into an encrypted session for either of the previous two attacks.
  • Traffic Optimisation: Identifying and limiting applications that wish to use excessive amounts of bandwidth
  • Identifying applications that are behaving badly despite being “clean” from a security perspective and correctly optimized.


Making this a bit more real-world, consider the following scenario. External users complain that a web application you host is performing very badly. Firewall PAD and signatures show that the traffic between the client and the web server is clean, and there are no other apparent attacks. Traffic optimization ensures this critical application has priority access to bandwidth between the web server and its database server. However, performance still sucks. DPI tools can be used to analyze the flows between the client, web, and database server. With access to the entire application flow, poorly optimized SQL queries may be identified. This cascade effect can only be spotted by tools that understand the application flows not from a security or bandwidth perspective, but in their native context. In my view, these kinds of tools are the least well implemented and understood; their widespread and proper use could massively improve time-to-resolution on faults, and surface many issues that previously went unnoticed.


Deep Packet Inspection techniques are used in many places on the network to address a variety of challenges; I’ve focused on security, but there are many other applications of DPI technology. Hopefully I’ve made clear that the correct identification of network traffic is critical to the proper operation of networks and to application management in general.


This is my last post in Geek Speak, I'd like to thank everyone who's read and commented. I've really enjoyed writing these articles for this community, and the feedback has steered the conversation in a really interesting direction.  I've no doubt that I'll continue lurking here for some time to come!


Peace


Glen Kemp (@ssl_boy)




In my last post I talked (briefly) about Protocol Anomaly Detection (PAD) and signatures. My original intention was to talk about other topics, but given that the security aspect has been so popular, I'm going to talk more about security and (quickly) circle back to optimisation and performance management later.



Obfuscation


Like any security inspection technology, there are limitations and trade-offs to Deep Packet Inspection (DPI). DPI can only make security decisions on visible traffic, and that is becoming an increasingly common problem. Encrypting traffic with SSL or TLS was traditionally reserved for banking applications and the like, but the exploits of the NSA have led many organisations to adopt an “encrypt everything” approach, including for simple web traffic. Applying PAD or protocol signature checks to network flows after they have been encrypted is difficult, but not impossible. The size of the problem is influenced by the direction of the traffic you are trying to inspect. For example:


  • Inspecting your organisation's inbound traffic (e.g. heading towards a public web server) is the easier case. For a server under your administrative control you should have access to the private key used for encryption. In this case, the firewall (or other network security device) can decrypt the traffic flows on the fly and make a decision based on the payload. Alternatively, if you have a server load balancer that supports SSL offload (and I can’t think of any that don’t), you may choose to inspect the traffic after it has been decrypted.
  • Inspecting traffic as it leaves your network (e.g. headed to a 3rd-party website) is trickier. When the firewall sees a client-to-server SSL session being established (after the TCP 3-way handshake), the firewall takes over and acts as a “man in the middle” (MITM). The client establishes an SSL session to the firewall, and in turn the firewall creates a separate session to the outside server. This allows the firewall (or other secure-proxy device) to inspect traffic transiting the network, looking for either malicious or undesirable content (such as credit card numbers heading out of the network). The drawback is that this method totally breaks the browser trust model (well, breaks it even more). When the SSL session is set up, the server proves its identity by sending a public key: a certificate counter-signed by a Certificate Authority known to (or trusted by) the browser. This mechanism is designed to prevent exactly this kind of MITM attack. So when the firewall intercepts the session, the user’s browser would spot the mismatch between the identity of the real server and the one provided by the firewall. Encryption will still work, but the user will get a very visible and very necessary “nag” screen. The trick to preventing the “nag” is to mess with the trust model further by creating a “wildcard” certificate that allows the firewall to impersonate any SSL-enabled site. For this to work, the CA key for this bogus certificate is placed in the “Trusted Root Certification Authorities” list on any device that needs to connect through your inspection point (firewall, web proxy, etc.); you can spot the interception from the client side with the quick check after this list. If your network consists of Windows 2000-8.x domain-member workstations, this is about 5 minutes’ work. If you have almost anything else connected, it can be a significant logistical exercise. In fact, a significant part of the challenge around “Bring Your Own Device” (BYOD) policies is establishing client-to-network trust, and most solutions have to mess about with certificates to do it.
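As a quick client-side check for this kind of interception, you can look at who actually signed the certificate you were handed; a sketch with openssl (the site is a placeholder), where an intercepted connection will show your firewall's bogus CA as the issuer rather than a public one:


[~] # openssl s_client -connect www.example.com:443 -servername www.example.com </dev/null 2>/dev/null | openssl x509 -noout -issuer -subject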


Performance


As several thread commenters have mentioned, applying these advanced protocol-level defences has an impact on performance.

  • PAD features are relatively lightweight and are often implemented in hardware, as they deal with a limited selection of parameters. There is only a limited definition of what protocol-compliant traffic should look like, so matching it as “good” or “bad” is relatively cheap. I would expect basic PAD features to be enabled on almost all firewalls.
  • Protocol pattern matching is a bit more difficult. Each packet that comes in has to be matched against a potentially huge list of “known bad” stuff before it can be classed as good. For common services such as HTTP, there are many thousands of signatures that have to be processed, even when the traffic has been successfully identified. Inescapably, this takes time and processor cycles. A few firewall vendors use specialized ASICs to perform the inspection (fast but costly), but most use conventional x86 processor designs and take the hit. This is why having an appropriate and up-to-date suite of patterns attached to your security policy is critical. Telling a firewall (or other security device) to match against all known vulnerabilities is a fool's errand; it is far better to match against the most recent vulnerabilities and the ones that are specific to your environment. For example, inspecting traffic heading to an Apache server looking for IIS vulnerabilities wastes processing resources and increases latency.
  • SSL inspection creates a significant workload on the firewall or secure web proxy; as a result, many hardware devices use dedicated SSL processors to perform the encrypt/decrypt functions. Any given device has a finite capacity, which makes it critical to decide up front what, and how much, encrypted traffic you want to inspect.


A “good” firewall is one that is properly matched to your environment and properly maintained. All the fancy application identification and protocol security techniques won’t help if the box barfs under production load the first time you turn them on, or if you fail to regularly review your policy.


In my next and final post, I shall touch on the performance management and troubleshooting aspects of DPI. Thanks for reading, and thank you even more to those who participated in the little survey at the end of my last post!




In my last blog, I introduced some of the basics of network security and why matching traffic on the standard port isn't enough to determine what’s in the payload. I've talked mostly about port 80 and HTTP, but the ports associated with DNS (UDP 53) are also often abused.


In the context of firewalls, Deep Packet Inspection (DPI) is often the first line of defence against port tunnelling or applications that obfuscate their intent. Different security vendors call it different things, but the differences amount to marketing and implementation. Firewall DPI looks beyond the simple source/destination/port three-tuple and attempts to understand what is actually going on. There are two ways this is commonly achieved: protocol anomaly detection (PAD) and signature checks.


Protocol Anomaly Detection


Commonly used protocols usually have an IETF RFC associated with them (the ones that don’t tend to be closed implementations of client/server apps). Each RFC defines the rules that two hosts must follow if they are to communicate successfully. As an example, the specification for HTTP 1.1 is defined in RFC 2616*. It lays out the standard actions for GET, POST, DELETE, etc. PAD inspects the traffic flowing through the firewall (or IDS device, for that matter) and compares it with a literal definition of the RFCs (plus any common customizations). If a TCP packet contains an HTTP payload but is using the ports typically associated with DNS, then clearly something is amiss. With PAD enabled, some applications that attempt to tunnel over any open port (Skype and VPN clients are common culprits) may be stopped by the firewall. Additionally, it prevents some vendor-implementation attacks; for example, if bounds checking isn't properly implemented, a malformed string may cause a process to crash, or allow arbitrary code execution. A nice, tight PAD engine should pick this up and protect a vulnerable server. Code Red was the classic example, and to this day, very occasionally, I see a signature match in my firewall logs as some lonely, un-patched IIS server trawls the net looking for someone to infect.
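Circling back to the tcpdump theme, a crude way to spot that particular mismatch yourself is to watch port 53 for anything that looks like HTTP; a sketch (it only catches plain-text HTTP riding over TCP 53):


[~] # tcpdump -l -i eth0 -pnA 'tcp port 53' | grep 'HTTP/1\.'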


PAD has its limitations, though:


  • Some protocols change often and are not that well documented; valid traffic can sometimes be blocked by an out-of-date or too restrictive implementation of PAD
  • It’s not that hard to write an application that appears to conform to the RFC but still allows the exchange of data


As a result, PAD is often combined with another common defence: protocol signatures.


Protocol Signatures


Protocol signatures are analogous to desktop antivirus signatures. Every time something “bad” is detected (let's take Heartbleed as a recent example), security vendors scramble to create a pattern that matches it so it can be identified and blocked. I've written about this elsewhere before, but signatures are also an imperfect measure of intention. There are often shades of grey (and no, there are not 50 of them, you filthy people) between what is definitely good traffic and definitely bad. But it’s not all bad: as a side-effect, signatures can also be used to provide fine control over users' web activities.

This is not an exact science, and vendor implementations vary greatly in what they can offer. For example, identifying Facebook over HTTP is straightforward enough, but blocking it outright is unlikely to win many friends, especially at the executive level. Apparently a lot of business is conducted on Facebook, so draconian policies are effectively impossible. As a result, one has to disappear down the rabbit-hole of Facebook applications. Some vendors boast of being able to identify thousands of individual applications, but trying to establish a meaningful corporate security policy for each one would be futile. The best firewall implementations break it down into categories, and enable the creation of a policy along these lines:


  • Permit Chat applications, but warn the user they are logged (and not just by the NSA)
  • Block Games outright, except after 6pm
  • Otherwise Allow Facebook


This is not perfect, and indeed determined users will always find a way past, but it might deter that idiot in Sales from abusing his ex-girlfriend in Accounts on company time and equipment.


In most cases, both PAD and signatures are enabled on the firewall. Signatures are good for identifying stuff that is known to be bad, whilst PAD can mop up *some* of the unknown stuff and prevent some protocol tunnelling attacks.


For my next post, I’m going to move on to the “making things go faster” aspects of DPI. Edit: Sorry, after the popular vote I've covered the security limitations of DPI instead. Plenty more to come!


* During the research for this post I discovered that RFC 2616 is essentially deprecated, having been rewritten as a series of more precise RFCs, 7230-7235. I stumbled across this blog by one of the revision authors, so I thought it would be nice to share.





In this series of blogs I’m going to explore some of the use-cases for deep packet inspection on the network and the types of devices that use it. I’ll also cover some of the common caveats for each specific use-case and possible resolutions.


There are (at least in my estimation) three common(ish) use-cases for Deep Packet Inspection (DPI):


  • Keeping bad guys out - using DPI to identify “bad” traffic on a Firewall or Intrusion Detection / Prevention system
  • Making things go faster - using DPI to identify traffic flows that should be given priority access to bandwidth, or are candidates for protocol optimisation/compression
  • Trying to fix something - DPI is very useful when trying to understand why an application, or indeed an entire network, is running slowly

 

The security functions and limitations of DPI are also discussed here.


There are of course overlaps; some network devices can use DPI to achieve all of the above. If I’ve forgotten or missed any use-cases, I’m sure you’ll let me know in the comments.


As is my wont, I’ll start with my own primary use-case for DPI: keeping the bad guys out. But first, it’s worth visiting some security fundamentals.


Trust in your network


A network of any size will usually have at least two zones of trust: Trusted (i.e. the internal network) and Untrusted (i.e. the dirty internet). Things start getting murky when you consider zones of partial or incomplete trust, such as guest Wi-Fi, links to 3rd parties, or Internet-facing services. In most cases, traffic between these zones of trust is policed by a common or garden firewall. Traffic that wants to be somewhere else is routed to the firewall's network interface; a security policy is checked before the firewall forwards the traffic to its next hop. The policy is matched against the simplest parameters in the standard IP headers: where it is coming from (source address), where it wants to go (destination address), and the apparent protocol (destination port). There are of course exceptions; some people use conventional servers as security enforcement points and/or try to enforce security on layer-2 broadcast domains, but those people are mostly crazy. The non-crazy exception is when you need security between virtualized hosts, but that's a can of worms which I’ll come back to.


The Evolution of HTTP


Most security policies are straightforward enough: allow this server access to the internet; prevent the internet from telneting onto my core router; that kind of thing. Where it gets a bit trickier is when you need to start making policy decisions based upon something as ambiguous as HTTP. Allowing any group of users unfettered access to this protocol is right up there with dousing yourself in petrol and taking up chainsaw juggling. In a fireworks factory. In North Korea. With Courtney Love. This is because HTTP has evolved beyond its intended purpose into a transport that is used for (off the top of my head): voice chat, video chat, text chat, email, large file transfers, Remote Desktop/Terminal Services, every flavour of social media, and, very occasionally, displaying a web page. The point is, allowing port 80 through the firewall in any direction (either from a group of your trusted users to the Internet, or from the Internet to a server you have anything less than total ambivalence towards) is a sure-fire way for bad things to happen. Whilst you can infer things like probable country of origin from the IP address, or the application in use from the headers, you are far from being able to establish intent, or whether the HTTP payload is actually something you’d want transiting your network.


In the next post, I’ll dig deeper into how DPI on firewalls can help you avoid the whole Petrol/Chainsaw/North Korea/Love paradigm.
