    Tracking Web Sites Visited with NTA


      In an effort to "increase productivity" I have been asked to track employee web site usage during business hours.  I have been using NTA to monitor and track network usage as a whole in terms of protocols and have a pretty good indication of top users, top protocols, etc.  However, when looking into web usage I find I get a little lost.  I am tracking top conversations, top transmitters and receivers.

      When I look at the lists of external hosts I see DNS info that I am having a hard time sorting out.  Entries like www.facebook.com are pretty easy to understand, and for entries like 80.mtl-mg07.streamtheworld.net I can assume it is streaming audio via a web browser.  But then I see entries like a63-80-4-73.deploy.akamaitechnologies.com or cdn-68-142-93-133.sea2.llnw.net and I'm not sure what to do with those.  These are all HTTP conversations, using port 80.

      I'm not really sure how to take that info and map it to a web site.  I've tried browsing to those IPs on port 80, but most of the time I get nothing.  Any hints?

      Thanks in advance.

          You can't get accurate statistics at the HTTP level with NetFlow. This isn't a limitation of NTA: most websites are hosted on servers that have multiple sites associated with the same IP address, and the sites are differentiated only by the Host header in the HTTP request. Flow-level data doesn't contain HTTP headers. Furthermore, many sites are hosted on content delivery networks like Akamai and Limelight (llnw.net), which serve content for many different sites from the same addresses and make flow-based tracking extremely difficult.

          You need an HTTP-level inspection tool to get accurate information about websites visited.
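
          To illustrate the point about Host headers, here is a rough Python sketch. The two requests, the site names, and the 192.0.2.10 address are made up for illustration. Both requests go to the same IP and port, so a flow record sees them as the same conversation; only something that reads the HTTP request itself can tell the sites apart.

```python
from typing import Optional


def host_from_request(raw: bytes) -> Optional[str]:
    """Return the Host header value from a raw HTTP/1.x request, or None."""
    # Only the header section (before the blank line) is of interest.
    head = raw.split(b"\r\n\r\n", 1)[0].decode("iso-8859-1")
    # Skip the request line, then look for the Host header (case-insensitive).
    for line in head.split("\r\n")[1:]:
        name, sep, value = line.partition(":")
        if sep and name.strip().lower() == "host":
            return value.strip()
    return None


# Hypothetical example: two different sites served from one address.
flow_key = ("192.0.2.10", 80)  # all that flow-level data can key on
req_a = b"GET / HTTP/1.1\r\nHost: www.example.com\r\nUser-Agent: demo\r\n\r\n"
req_b = b"GET / HTTP/1.1\r\nHost: intranet.example.org\r\n\r\n"

for req in (req_a, req_b):
    print(flow_key, "->", host_from_request(req))
```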

            Akamai (akamaitechnologies.com) and Limelight Networks (llnw.net) are two Content Delivery Networks (CDNs) that host a lot of the streaming media (both video and audio) going over the Internet. There are many other CDNs but I believe these may be the most popular, based on the information I'm seeing in NTA. These CDNs have agreements/deals/partnerships with different kinds of web sites ranging from YouTube and Facebook to CNN and MSNBC.

            The problem here is that not many of these sites disclose who their CDN is, and it seems the CDN can change from year to year. Some might even use more than one CDN, so you can see how it could be very difficult to map a CDN to a web site or vice versa.
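
            About the most you can do with the hostnames NTA shows is label which CDN a host belongs to. A minimal Python sketch, assuming only the two domain suffixes mentioned in this thread (the table and function names are my own, and you would have to extend the table by hand for other CDNs). Note that it identifies the CDN, not the web site behind it.

```python
# Domain suffixes taken from the hostnames in this thread; not a complete list.
CDN_SUFFIXES = {
    "akamaitechnologies.com": "Akamai",
    "llnw.net": "Limelight Networks",
}


def cdn_for_hostname(hostname):
    """Return the CDN name for a reverse-DNS hostname, or None if unknown."""
    name = hostname.rstrip(".").lower()
    for suffix, cdn in CDN_SUFFIXES.items():
        # Match the domain itself or any host underneath it.
        if name == suffix or name.endswith("." + suffix):
            return cdn
    return None


for host in (
    "a63-80-4-73.deploy.akamaitechnologies.com",
    "cdn-68-142-93-133.sea2.llnw.net",
    "www.facebook.com",
):
    print(host, "->", cdn_for_hostname(host))
```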

            It sounds like jswan's suggestion would be the best option for you.

              Having some of the same issues.  I've been chasing today.  Owned by Global Crossing but no DNS assigned to any of the hosts in that range even though all of them show ports 80, 25, 443, and 21 open.  I'm guessing they are Gmail or Hotmail, but I'm seeing a lot of traffic moving between my LAN and that range which makes me nervous.  We use Fortinet firewalls for content filtering.  You might look into those.  Depending on the size of your company, you can purchase one for a modest price.  It includes AV filtering, IDS, IPS, Web Content Filtering, VPN, etc.  Makes it very easy to say no adult content, streaming media, etc. can be viewed at work.  Ticks off the users but they get over it.  Can't recommend them highly enough.

                Take a look at running the Squid proxy in transparent mode. I am running a pfSense firewall, which actually includes a prebuilt package.

                All HTTP traffic gets transparently run through a Squid cache which tracks all the requests (URLs).
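
                Once Squid is logging, something like the Python sketch below could turn the log into a per-client list of sites. It assumes Squid's default native access.log layout (timestamp, elapsed, client, result/status, bytes, method, URL, ...); the sample lines and addresses are invented, and a custom logformat would need different field positions.

```python
from collections import Counter
from urllib.parse import urlsplit


def site_hits(log_lines):
    """Count requests per (client IP, site hostname) from Squid native log lines."""
    counts = Counter()
    for line in log_lines:
        fields = line.split()
        if len(fields) < 7:
            continue  # blank or truncated line
        client, url = fields[2], fields[6]
        host = urlsplit(url).hostname  # None if the field is not a full URL
        if host:
            counts[(client, host)] += 1
    return counts


# Invented sample lines in the assumed native format.
sample = [
    "1286536309.450    921 192.168.1.20 TCP_MISS/200 5120 GET http://www.facebook.com/home.php - DIRECT/192.0.2.10 text/html",
    "1286536310.112     45 192.168.1.20 TCP_HIT/200 2048 GET http://www.facebook.com/logo.png - NONE/- image/png",
    "1286536311.007    310 192.168.1.31 TCP_MISS/200 9000 GET http://www.cnn.com/ - DIRECT/192.0.2.20 text/html",
]

for (client, host), hits in site_hits(sample).most_common():
    print(client, host, hits)
```

                On a real box you would pass the open log file instead of the sample list; the path depends on the install.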

                Let me know if you need more details.