Managing files has always been a challenge, and even in today's world of virtualization, automatic tiering and migration, finding out what files are really out there is still difficult - and more important than ever. In Profiler, file analysis rules allow you to find specific files across all your storage: local, SAN and NAS.  If you need a refresher before we get started, please read "Files, Files, Everywhere..." and "File Analysis on NAS (NetApp, Celerra, etc.) and Virtual Machines".

Once you have file analysis turned on, Profiler will start showing you interesting summary data like "You have 24.5 GB of mp3" or "17% of your disk space hasn't been accessed in a year".  If you are curious like me, those files immediately become "files of interest" that you want to track down.  Alas, Profiler only keeps the summary data by default; it does not store information about every file it encounters in the database.

That's where file analysis rules come in. They let you get all the details of those "files of interest" into the database so you can easily view them from the comfort of your browser.  Find that stash of MP3s, then delete them and reclaim that valuable storage for your virtualization environment. Identify old files that can be deleted or moved to another tier of storage.

Let's get started finding some files.  First, go to Settings > File Analysis Rules to see the list of current rules.  Click Add New Rule and click File Analysis Rules.

The page for defining rules is very long, so we will take it in sections.

  • Rule Name:  Simply the name of the rule.  Rule names need to be unique.

This next set of parameters allows you to set the criteria for size, file age and the number of files to return.

  • Find: This sets how many files each rule will return (e.g., 500) and how to rank those results (by age or size).
  • Size: Define the minimum, maximum or range of the size of the files.
  • Accessed Age: Define the minimum, maximum or range of the last accessed time.
  • Modified Age: Define the minimum, maximum or range of the last modified time.
  • Created Age: Define the minimum, maximum or range of the creation time (Windows only).
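
To make the interaction of these criteria concrete, here is a minimal Python sketch of a "largest N files" rule. It is not Profiler's implementation: the FileInfo record, the top_files function and all of its parameters are hypothetical names chosen for illustration, and it assumes the filters are combined with AND before the results are ranked and cut off at the limit.

```python
from dataclasses import dataclass
from typing import Iterable, List, Optional


@dataclass
class FileInfo:
    """Hypothetical record for one file seen during file analysis."""
    path: str
    size_bytes: int
    accessed_age_days: int


def top_files(files: Iterable[FileInfo],
              limit: int,
              min_size_bytes: int = 0,
              max_size_bytes: Optional[int] = None,
              min_accessed_age_days: int = 0) -> List[FileInfo]:
    """Keep files that pass every filter, then return the `limit` largest."""
    matches = [
        f for f in files
        if f.size_bytes >= min_size_bytes
        and (max_size_bytes is None or f.size_bytes <= max_size_bytes)
        and f.accessed_age_days >= min_accessed_age_days
    ]
    # Rank by size, biggest first, and keep only the requested number.
    matches.sort(key=lambda f: f.size_bytes, reverse=True)
    return matches[:limit]


# Made-up sample data, for illustration only.
sample = [
    FileInfo("C:\\Users\\Brian\\music.mp3", 8_000_000, 400),
    FileInfo("C:\\Users\\Brian\\notes.txt", 2_000, 500),
    FileInfo("C:\\Users\\Matt\\video.avi", 700_000_000, 30),
    FileInfo("C:\\Users\\Matt\\backup.zip", 900_000_000, 800),
]

# Largest 2 files of at least 1 MB that have not been accessed in a year.
for f in top_files(sample, limit=2, min_size_bytes=1_000_000,
                   min_accessed_age_days=365):
    print(f.path, f.size_bytes)
```

In this sample the recently accessed video and the tiny text file are filtered out first, so the ranking only ever sees files that already satisfy the size and age criteria.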

Profiler allows you to define the file path using a regular expression, in case you want to limit your results to just certain directories. For example, .*[Uu]sers?.* should find paths containing "User", "user", "Users" or "users".

  • File Path Regular Expression: Enter a regular expression to filter the path of the file.
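
If you want to sanity-check a path expression before putting it in a rule, a quick test outside Profiler can help. The sketch below uses Python's re module, which may not behave identically to the regular expression engine Profiler uses, and it assumes the expression has to match the entire path. The sample paths are made up.

```python
import re

# Pattern quoted from the post. Assumption: it must match the whole path.
pattern = re.compile(r".*[Uu]sers?.*")

paths = [
    r"C:\Users\Brian\report.doc",
    "sharename/users/Matt/song.mp3",
    r"D:\User\shared\notes.txt",
    r"C:\Windows\System32\drivers\etc\hosts",
    r"E:\USERS\archive.zip",
]

for path in paths:
    matched = pattern.fullmatch(path) is not None
    print(f"{path}: {'match' if matched else 'no match'}")
```

Under these assumptions the first three paths match and the last two do not. Note that an all-uppercase "USERS" directory is not matched, because only the first letter is allowed to vary in case.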

You can also select file types as criteria.  The list of file types is generated from files previously encountered during file analysis in the environment.  I highly recommend using the file type regex filter to narrow the list, since there can be tens of thousands of types here.

  • File Type: This filter allows you to pick the file types (file extensions).  Select one or more file types on the left and click the right arrow.

You can also select file owners as criteria. This list is generated from the owners previously encountered during file analysis in your environment.

  • File Owners:  This filter allows you to pick the owners you are interested in.   Select one or more owners on the left and click the right arrow.

OK, you created your rule, so what's next?  First, you have to apply the rule to a policy, so go to Settings > Policies.  You have to apply your rule to each policy for it to be used during file analysis.  If you want it applied to your servers and VMs, use the OS policy.  Also, the agent doing the work should have file analysis turned on and scheduled; see the links at the beginning of the post for more details.

Select the rules and press the down arrow, then press Save.  When you get back to the Policy page, press the Push button - that will push the configuration to the agent doing the work.

So let's look at a specific example of a file analysis rule.  Let's say I want to find the largest files each user owns in their home directory (e.g., C:\Users\Brian or sharename/users/Matt).  I can build the following rule and apply it to the desired policies.

Once I apply this to the OS policy, I get the following report:

And voilà - I found a bunch of files that I can now go delete to reclaim space.

File analysis is a really powerful feature of Profiler.  It is harder to use than we would like (and we will make that better), but the results are worth it.

Notes:

  • File Analysis works on local file systems on all agents, CIFS and NFS shares on NAS devices, and CIFS shares on Virtual Machines.
  • The number of files is limited in the rule to keep Profiler working well.  That said, the limit applies per target: a rule limited to 100 files returns up to 100 files per target.  A target is a file system (C:\, D:\) or a share on a VM or NAS (C:\users$, /user/brian).  If you have 1000 targets, that means the rule would return information on up to 100,000 files.
  • File analysis is driven by the schedule on the agent that the file analysis is assigned to.
  • If file analysis has previously been run, when you push out a new rule, it will be evaluated immediately on the historical data stored on the agent.
  • There are a few default file rules - Biggest files, oldest files, and new files.
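
The per-target limit in the notes above is worth checking before you push a rule to a large policy. A minimal sketch of the arithmetic follows; the function name is hypothetical, and the result is an upper bound, since a target with fewer matching files returns fewer rows.

```python
def max_files_returned(limit_per_target: int, target_count: int) -> int:
    """Upper bound on the files one rule can return across all targets."""
    return limit_per_target * target_count


# The example from the notes: a 100-file rule applied to 1000 targets.
print(max_files_returned(100, 1000))  # 100000
```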

As always, let us know your thoughts, suggestions and experiences with File Analysis.