Deep Packet Analysis - Untangling the Web

In my last post, I took a look at the DNS protocol with tcpdump, and as it turns out you can do some really useful stuff with the embedded protocol decoders. So, how far can we take troubleshooting with tcpdump? Pretty far, but in troubleshooting you have to decide whether the fastest resolution will come from the tools you have to hand or from grabbing the capture and using something better. Whilst you *can* definitely do some analysis of HTTP, as we’ll see, once the initial handshake is complete it gets messy, real quick.


ASCII - Still a thing


The most useful switch for debugging HTTP is -A, which prints each packet's payload as ASCII and so makes plain-text protocols more or less human readable. To kick off a capture on our server we run:


[~] # tcpdump -i eth0 -pnA port 80


Capture 0.PNG


For sanity’s sake, I've snipped out the initial handshake. After a few packets we can see the client's request, the ACK, and the server’s response. Probably the most interesting parts are highlighted in yellow:


  • The HTTP GET request from the client (GET / HTTP/1.1)
  • The server HTTP 200 return code (HTTP/1.1 200 OK)
  • The Content-Encoding (gzip)
  • The content type returned (text/html)
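
If you only want those highlighted lines and not the whole dump, a small filter can pull them out. This is a rough sketch, not a tcpdump feature: http_summary is a name I've made up, and the sample text fed to it is invented for illustration. It assumes GNU or BSD grep with the -o switch.

```shell
# http_summary (hypothetical helper): reads decoded text such as
# `tcpdump -A` output on stdin and prints only the request line, the
# status line, and the Content-Encoding / Content-Type headers.
http_summary() {
    # Strip carriage returns first, then print just the matching part of
    # each line (-o), since -A prefixes the payload with header bytes.
    tr -d '\r' | grep -aoiE \
        '(GET|HEAD|POST|PUT|DELETE) [^ ]+ HTTP/1\.[01]|HTTP/1\.[01] [0-9]{3}.*|Content-(Encoding|Type): .*'
}

# Invented sample standing in for a capture; no network access needed.
printf 'E..@.GET / HTTP/1.1\r\nHost: example.test\r\n\r\nHTTP/1.1 200 OK\r\nContent-Encoding: gzip\r\nContent-Type: text/html\r\n' \
    | http_summary
```

On a live capture you would pipe tcpdump into it instead, for example `tcpdump -i eth0 -pnlA port 80 | http_summary`, where -l makes tcpdump's output line buffered.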


Sorry, I've got no HEAD


Various other headers are displayed, but they're not usually that useful. Beyond that, it’s HTML:


Capture 1.PNG


But that doesn't look anything like HTML. There are no lovely <HEAD> or <HTML> tags. The clue is in the client and server headers. Whilst the session is not encrypted, with gzip compression enabled it may as well be as far as a human reader is concerned. You can’t read the conversation between the client and server once the TCP and HTTP parameters are established. However, we can divine the following:


  1. The URL the client requested
  2. The server was happy to accept the request
  3. The parameters/session features enabled (such as gzip compression)
  4. But not much else


Somewhere in this packet exchange, a redirect sends the client to a different TCP port. However, from just the tcpdump output we can’t see that. There *may* be an arcane way of getting tcpdump to decompress gzip on the fly, but I’ll be darned if I can figure it out. As a workaround, you could disable compression in your browser, or use a CLI web tool such as cURL. However, changing things just to troubleshoot is never a good idea, and that wouldn't help if your problem is with gzip compression itself.
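
To see why the body is opaque in -A output, here is a small offline sketch with no tcpdump involved. The HTML body and the example.test address are made up for illustration. The idea is simply that text you can grep for in the clear stops being searchable once it has been through gzip, and comes back once it is decompressed.

```shell
# An invented response body containing a script redirect (illustrative only).
body='<html><script>window.location="http://example.test:8080/";</script></html>'

# Compress it the way a server would for Content-Encoding: gzip, then count
# the lines where the redirect text is still visible in the raw bytes.
# `|| true` stops a zero-match grep from aborting a `set -e` shell.
raw_hits="$(printf '%s\n' "$body" | gzip -c | grep -ac 'window.location' || true)"

# Decompress first and the same search works again.
clear_hits="$(printf '%s\n' "$body" | gzip -c | gzip -dc | grep -c 'window.location' || true)"

echo "matches in compressed bytes: $raw_hits"
echo "matches after decompression: $clear_hits"
```

The first count should come back as zero, which is exactly the situation we're in when squinting at the compressed payload in the terminal.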


404 - Not Found


Another example shows a less than healthy connection:


Capture 2.PNG


This time, the client is requesting reallybadfile.exe, so the server returns a 404 Not Found error. Random clients requesting executables that don't exist is often a sign of a virus or other malicious activity. Many firewalls can filter this stuff out at the edge, but this is a job best suited to a load balancer or Application Delivery Controller (posh load balancer).


If you are just interested in the error status codes, you can of course just pipe the output to grep:


[~] # tcpdump -i eth0 -pnA port 80 | grep '404\|GET'


Capture 3.PNG


This is an especially quick and dirty method. Of course you could match multiple status codes, or use egrep and a regex, but from the CLI you run a pretty big risk of missing something.
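
As a rough idea of what the egrep-and-regex version might look like, here is a sketch that keeps request lines plus any 4xx or 5xx status line. The http_errors name and the sample input are my own inventions, and the pattern is a heuristic: any payload text that happens to look like a status line will match too.

```shell
# http_errors (hypothetical helper): reads decoded text on stdin and keeps
# request lines together with any 4xx/5xx status line.
http_errors() {
    grep -aE '(GET|HEAD|POST) [^ ]+ HTTP/1\.[01]|HTTP/1\.[01] [45][0-9]{2} '
}

# Invented sample standing in for a capture; only the first two lines
# should survive the filter.
printf 'GET /reallybadfile.exe HTTP/1.1\nHTTP/1.1 404 Not Found\nHTTP/1.1 200 OK\n' \
    | http_errors
```

Against a live capture this would be something like `tcpdump -i eth0 -pnlA port 80 | http_errors`, with -l so that tcpdump doesn't sit on its output buffer while you wait.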


(insert Wireshark pun)


Sometimes, it’s best to admit defeat and make a permanent capture for analysis elsewhere. To do this we use the -w switch to write the packets to disk. The verbose switch is also helpful here, as it reports the number of packets received during the capture so you know you've actually caught something.


[~] # tcpdump -i eth0 -pnAv port 80 -w httpcapture.pcap


Capture 3.PNG


Then the session can be analysed in Wireshark with minimal effort. Just grab the file any way you can and open it; the built-in decoders will work their magic. Once loaded, go to Analyze and Follow TCP Stream. This will show you a cleaned-up version of the capture we took in the beginning, but with the payload still encoded.


Capture5.PNG


No problem, just go and find the HTTP response packet, dig through it, and the payload should be in the clear:


Capture 6.PNG

And here, now highlighted in yellow, we can see a tiny piece of JavaScript that redirects my HTTP session to another location. By not using an HTTP/1.1 3xx status code (such as 301 or 302) to redirect the session, it became much harder (if not impossible) to realise what was going on just using tcpdump. With some elite-ninja regex skills, and a perfect recollection of the HTTP protocol, maybe you could have figured this out without bringing in the Wireshark heavy artillery. However, for mere mortals such as myself, it’s just a case of knowing when to bring in the really big hammer.
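
To make the difference between the two kinds of redirect concrete, here is a rough sketch that classifies a response once you have it decoded and decompressed (copied out of Wireshark, say). The redirect_kind name, the sample responses and the patterns are all my own assumptions, and the script check is a heuristic that will miss anything more inventive than a plain location assignment or a meta refresh.

```shell
# redirect_kind (hypothetical helper): reads one decoded, already
# decompressed HTTP response on stdin and reports how it redirects.
redirect_kind() {
    response="$(tr -d '\r')"
    if printf '%s\n' "$response" | head -n 1 | grep -qE '^HTTP/1\.[01] 3[0-9]{2}'; then
        # A proper 3xx status line, visible even in plain tcpdump -A output.
        echo "http-redirect"
    elif printf '%s\n' "$response" \
        | grep -qiE '(window\.|document\.)?location(\.href)? *=|http-equiv="?refresh'; then
        # A 200 response whose body does the redirecting; hidden by gzip.
        echo "in-page-redirect"
    else
        echo "none"
    fi
}

# Invented samples: a header-based redirect and a script-based one.
printf 'HTTP/1.1 302 Found\r\nLocation: http://example.test/\r\n\r\n' | redirect_kind
printf 'HTTP/1.1 200 OK\r\n\r\n<script>window.location="http://example.test:8080/";</script>\n' | redirect_kind
```

The first case is the one you can spot in the terminal; the second is the one that needed the capture file and the big hammer.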


So, that’s my little treatise on deep packet analysis, with some practical applications. Please let me know your thoughts in the comments, along with any tips you can share on tcpdump, Wireshark, or any of the other excellent tools out there.


  • I considered introducing the -X switch as part of this post, but I struggled to find anything useful that the standard decoders with either -v or -vv wouldn't tell you. The -A switch will at least show you any human-readable parts easily; I've found that if I'm worrying about what's going on in hex, then it's time to open Wireshark, because at that point it's not a quick or easy fix.