File Analysis on NAS (NetApp, Celerra, etc.) and Virtual Machines

A few posts ago we reviewed how to turn on file analysis for drives on your servers in Profiler, but many of you have centralized file shares on NAS devices (especially NetApps and Celerras) and may also want to do file analysis on Virtual Machines.  In this post, we will explore how to set up file analysis for these devices.

Before we start, you may ask: why would you want to do this?  Simply to know who has files, what kind of files they are, and how old they are, on each share you select.  This gives you the size and details of each share, which can be used for quota enforcement, compliance reporting (no MP3s!!), and storage recovery.  You can also group shares into logical collections for chargeback. One of our largest customers processes 50,000 shares and some 500+ million files a week, which are grouped into departments and used for chargeback reports.

Let's chat about the prerequisites before going through the steps:

  • First, we need a list of shares from the device. For NetApp, this is done automatically through the API.  For Celerra, Virtual Machines, and other NAS devices, you assign this work to an agent. If you are not seeing a list of shares, then most likely you have a permissions issue. See the FAQ at the bottom for more details.  
  • An agent that is assigned the work must be able to access the share.  For Windows, this means changing the Service Account to a domain user that has access to those CIFS shares.  For Unix/Linux, it means the root user should have access to the NFS shares.
  • You must turn file analysis to "On" on the agent doing the work.  The previous post  shows you how to do that.
  • All CIFS scanning is done by agents on Windows servers, and all NFS scanning by agents on Unix/Linux servers.

OK, so let's focus on assigning out shares for a NetApp.

  1. Go to Settings > Assign Remote Shares and choose the following (changes in selection will refresh the screen):

    nas-fa-1.jpg

    1. Assign By: NAS or VM - lets you choose if you want to see NAS devices or Virtual Machines
    2. [NAS/VM] Device: Select one device or all devices.
    3. Share Type: CIFS or NFS.  This will automatically change the available agents to match the protocol.

  2. Once you have set the above criteria, you can then select the shares you want:

    nas-fa-4.jpg

    1. If you have lots of shares, you can narrow it down by using the regular expression filter.
    2. Select the shares you want.  You can select multiple shares by using the CTRL or SHIFT key. 
    3. Now select which agent is going to do the work:
      1. Resource: select which agent you want to do the work
      2. Share Depth: Leave this at zero for now - more on depth later
      3. Move: Press the down arrow to assign the share, up arrow to unassign shares.

  3. Once you have assigned the shares to the agents you want, press Save at the bottom.  This will immediately assign out the shares to that agent, and file analysis will be performed at the next scheduled file analysis start time for that agent.  If you want to change the start time, the previous post shows you how to do that. 
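The regular expression filter in step 2 narrows a long share list down to the names that match a pattern. Here is a minimal Python sketch of that idea. The share names and the `filter_shares` helper are hypothetical, and the case-insensitive, match-anywhere behavior is an assumption; Profiler's own filter may treat case and anchoring differently.

```python
import re

def filter_shares(share_names, pattern):
    """Return the share names that match the regular expression.

    Assumption: matching is case-insensitive and may start anywhere in
    the name unless the pattern is anchored with ^ or $.
    """
    regex = re.compile(pattern, re.IGNORECASE)
    return [name for name in share_names if regex.search(name)]

# Hypothetical share list, for illustration only.
shares = ["Users", "Finance$", "Eng_Builds", "Eng_Docs", "HR"]

# Keep only the shares whose names start with "eng_".
print(filter_shares(shares, r"^eng_"))
```

A pattern such as `^eng_` is handy when shares follow a naming convention, since you can select a whole department's shares in one pass.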

So what happens from here?  If file analysis is scheduled to run at 1:00am, then at that time the agent will connect to the first share, perform file analysis, then connect to the next share, and so on, until there are no more shares to do. As it completes each share, the data will appear in Profiler.
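Because the agent works through its shares one after another, you can get a rough feel for when each share's data will show up. The sketch below is only an estimate under stated assumptions: the share names and file counts are made up, and the 40,000 files per minute rate is simply picked from the typical range quoted in the FAQ below. Real throughput varies by environment.

```python
from datetime import datetime, timedelta

def estimate_completion(start, shares, files_per_min):
    """Walk the shares in order and return (share, estimated finish time) pairs.

    shares: list of (share name, file count) in the order the agent scans them.
    """
    clock = start
    results = []
    for name, file_count in shares:
        # Each share starts when the previous one finishes.
        clock += timedelta(minutes=file_count / files_per_min)
        results.append((name, clock))
    return results

# Hypothetical example: file analysis scheduled for 1:00am.
start = datetime(2024, 1, 1, 1, 0)
shares = [("Users", 1_200_000), ("Projects", 600_000)]

for name, done in estimate_completion(start, shares, 40_000):
    print(name, done.strftime("%H:%M"))
```

If one agent's queue runs too long, spreading the shares across more agents is the obvious lever, since each agent works through its own list.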

FAQ:

  • Where will I see reports?
    • NetApp: From the NetApp console, under the NAS Shares tab, and the vFiler Files tab. 
    • Celerra: From a Datamover console, under the NAS Shares tab.
    • Windows VM: From the VM console, under the Local Shares tab.
    • Reports: Dozens of predefined reports for shares, users, file types, file age, etc. (Reports > Quick Reports)
  • What kind of reports will I see?
    • Shares - how much space is used by shares and how fast they are growing
    • Users - how much space is used by each user
    • File Type Groups - how much space is used by different file types
    • File Age Categories - how old are my files (ex: 30, 60, 90, 365 days)
    • File Rules - find specific files, like the 100 biggest files, or orphaned files, or files created in the last 24 hours
  • How fast will it do file analysis? 
    • This varies from environment to environment, ranging from 10K to 90K files per minute, with most environments falling in the range of 30K-50K files per minute.
  • What is Share Depth? 
    • Say you had a share called "Users" with 100 user directories underneath it (sound familiar?) and you wanted to see how much space each person is using.  Assigning out 100 directories one by one (not to mention having to assign out the 101st in the future) is a real pain. Depth allows you to pick a share and tell Profiler to subdivide it at directory level X (the depth).  So in our example, if I chose the "Users" share and depth "1", the data generated would not be for "Users" but for "Users\Brian", "Users\Craig", "Users\Denny" and so on.  It does this dynamically each time file analysis executes, so it automatically picks up when a user is added or removed.
  • Where are my Linux VMs?
    • File analysis is available for Windows VMs only at this time.
  • What else can I do with File Analysis?
    • Rules (Settings > File Analysis Rules) - find specific files you are interested in (Ex: Find all MP3)
    • File Type Groups (Settings > File Type Groups) - Group file types for reporting (default grouping of 700+ file types)
    • Share Groups (Settings > Resource Groups) - Group shares into logical collections for chargeback.
    • Reporting (Reports > Quick Reports) - Dozens of predefined reports for shares, users, file types, file age, etc.
  • How do I get a list of shares?
    • NetApp - automatically gets shares from the API
    • Celerra and other NAS devices - when you are configuring Profiler to monitor the device, you can select a Windows and Unix/Linux agent to get the list of shares (CIFS/NFS respectively).
    • VMs - go to Settings > Discover VM Targets and select which Windows agents to get the list of shares. 
  • If it isn't working, what should I look for?
    • If you don't have a list of shares, make sure the account you are using has permission to access the shares.
    • If you assigned out shares but see no data, you may need to turn on file analysis for that agent.
    • If you are missing some shares, you may have a permissions issue or the share may be empty.
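To make the Share Depth answer above concrete, here is a small Python sketch of the roll-up idea: file sizes are summed per directory at the chosen depth below the share. This is an illustration only, not Profiler's implementation. The `usage_by_depth` helper and the sample files are hypothetical, and the handling of files that sit above the chosen depth (they are counted against the deepest directory available) is an assumption.

```python
from collections import defaultdict
from pathlib import PureWindowsPath

def usage_by_depth(share, files, depth):
    """Sum file sizes per reporting unit at the given depth below the share.

    files: iterable of (path relative to the share, size in bytes).
    depth 0 reports the share itself; depth 1 reports each first-level
    directory; and so on.
    """
    totals = defaultdict(int)
    for rel_path, size in files:
        # Directory components only; the last part is the file name.
        dirs = PureWindowsPath(rel_path).parts[:-1]
        # Assumption: a file shallower than `depth` rolls up to the
        # deepest directory it actually lives in.
        unit = "\\".join((share,) + dirs[:depth])
        totals[unit] += size
    return dict(totals)

# Hypothetical files under the "Users" share.
files = [
    (r"Brian\docs\a.txt", 100),
    (r"Brian\b.mp3", 50),
    (r"Craig\c.doc", 70),
]

print(usage_by_depth("Users", files, 0))  # one total for the whole share
print(usage_by_depth("Users", files, 1))  # one total per user directory
```

Because the grouping is computed from whatever directories exist at scan time, a new user directory shows up in the results without any reassignment, which matches the dynamic behavior described in the FAQ.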

Thanks for taking the time to read the blog, and as always, please let us know what you think!

WARNING:  You will find files you did not know about, and you will find files you don't want found!
