SolarWinds NetFlow Traffic Analyzer 4.X (Flow Storage)

Version 3

    This template assesses the performance of the SolarWinds NetFlow Traffic Analyzer Flow Storage 4.X by retrieving performance data from NTA Flow Storage performance counters.

    Note: If NTA and the NTA Flow Storage Database are installed on different servers, this template should be assigned to the server hosting the NTA Flow Storage Database.


    Prerequisites: WMI access to the target server.

    Credentials: User with administrative privileges on the target server.
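
    For reference, the following sketch (Python with the third-party "wmi" package; the host name and credentials are placeholders, not values from this template) shows one way to verify that the target server accepts WMI connections and exposes Win32_PerfFormattedData_* counter classes before assigning this template.

        # Minimal WMI connectivity check; not part of the template itself.
        # Requires: pip install wmi (Windows only). Host and credentials below
        # are placeholders and must be replaced with real values.
        import wmi

        conn = wmi.WMI(
            computer="FLOW-STORAGE-01",        # placeholder: server hosting the NTA Flow Storage Database
            user=r"DOMAIN\monitoring_admin",   # placeholder: account with administrative privileges
            password="********",
        )

        # List the formatted performance counter classes exposed over WMI; the
        # NTA Flow Storage counters are expected to appear among them once NTA is installed.
        for name in sorted(c for c in conn.classes if c.startswith("Win32_PerfFormattedData")):
            print(name)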


    Monitored Components

    Data appending: Active Appending Threads

    This monitor returns the number of threads executing a data append. At certain times, the threads may block one another from continuing execution using locks. This counter should be used to determine whether an append was running at the time under investigation, and how intensively.


    Data appending: Records in High Priority Queue

    This monitor returns the number of high priority records currently queued for storing. This counter should be used to determine if Flow Storage is able to store data in time.

    The returned value should be 0 - 100k and never exceed 900k.


    Data appending: Records in Low Priority Queue

    This monitor returns the number of low priority records currently queued for storing. This counter should be used to determine if Flow Storage is able to migrate data in time.

    The returned value should be 0 - 100k and never exceed 900k.


    Data appending: Records in High Priority Queue Above Threshold

    This monitor returns the number of high priority records in the queue that exceed the limit beyond which low priority records are no longer stored. This counter should be used to determine whether Flow Storage is migrating data.

    The returned value should be zero.


    Data appending: Records in Flush Avg

    This monitor returns the average number of records per flush. The average is calculated as an EMA with alpha = 0.2. This counter should be used to diagnose slow appending when the disk is suspected to be the bottleneck and the number of appended record bytes does not correspond with the maximum disk speed.

    The returned value should be 15k-30k.
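
    As a point of reference, the EMA with alpha = 0.2 mentioned above can be computed as in the following sketch (illustrative Python with made-up sample values): each new flush's record count contributes 20% to the running average.

        # Exponential moving average with alpha = 0.2, as described for this counter.
        def ema_update(previous_avg, new_value, alpha=0.2):
            return alpha * new_value + (1 - alpha) * previous_avg

        avg = 20000.0                                  # illustrative starting estimate
        for records_in_flush in (25000, 18000, 30000): # illustrative per-flush record counts
            avg = ema_update(avg, records_in_flush)
            print(f"Records in Flush Avg: {avg:.0f}")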


    Data appending: Flushes count/sec

    This monitor returns the number of explicit disk flushes per sampling interval that are performed to ensure durability during data appending. It does not include flushes from indexing. This counter should be used to diagnose slow appending when the disk is suspected to be the bottleneck and the number of appended record bytes does not correspond with the maximum disk speed.

    The returned value should be 0-2.


    Data appending: Records appended/sec

    This monitor returns the number of records appended per time interval. A low number in comparison to the disk throughput speed indicates a problem. Check whether Flushes count is too high, indicating that time is spent on record durability, or whether Records in Flush Avg is too low, indicating that records are appended in chunks that are too small.

    The returned value should be 0-100k.


    Data appending: Written Data Bytes/sec

    This monitor returns the number of data file bytes written per time interval. (The raw useful data written.) If only appending is active (indexing is on another thread and another disk) then this number should correspond with the theoretical maximum throughput of the disk minus time required for seeks in flushes.


    Data appending: Rows received/sec

    This monitor returns the number of rows received per time interval. This counter should be used to diagnose a problem in communication between storage and receiver.

    The returned value should be the same as Records appended.

    Note: This monitor is available beginning in NTA 4.0.2 and is disabled by default.


    Data appending: Time to append 100k flows in high priority queue (ms)

    This monitor returns the time in milliseconds required for storing 100 thousand high priority flows. This counter should be used to determine how many high priority flows Flow Storage is able to store in one second.

    Returned values should be as low as possible.


    Data appending: Time to append 100k flows in low priority queue (ms)

    This monitor returns the time in milliseconds required for storing 100 thousand low priority flows. This counter should be used to determine how many low priority flows Flow Storage is able to store in one second.

    Returned values should be as low as possible.


    Updating: Processor Time spent by Update (%)

    This monitor returns the percentage of single core processor time spent updating data and index files, that is, the percentage of load that a single processor could execute within the time interval (e.g. 400% = 4 CPUs x 100%). This counter should be used to determine what percentage of consumed CPU belongs to data updating and whether updating may be causing system overload.

    Returned values should be 0 - 100 x CPUs.
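
    If it helps to relate this counter to overall machine load, the following sketch (illustrative Python, not part of the template) normalizes the single-core-relative percentage to a share of total CPU capacity.

        # 400% on a 4-CPU server means updating saturates the whole machine.
        def share_of_total_cpu(counter_percent, cpu_count):
            return counter_percent / (100.0 * cpu_count)

        print(share_of_total_cpu(400.0, 4))    # 1.0    -> 100% of a 4-CPU machine
        print(share_of_total_cpu(50.0, 8))     # 0.0625 -> ~6% of an 8-CPU machine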


    Updating: Updated Index Bytes/sec

    This monitor returns the number of index file bytes written per time interval during update. This counter should be used to determine what percentage of disk I/O load is caused by indexing and whether indexing may be causing system overload.


    Updating: Updates Running

    This monitor returns the number of elementary updates being executed. Currently, only a single bulk update runs at a time; however, many elementary updates can run at once. This counter should be used to determine whether an update was running at the time under investigation, and how intensively.

    Returned values should be 0-1.


    Updating: Records pending to be updated

    This monitor returns the number of records known to need updating. The algorithm goes through each partition and first determines what needs to be done. Thus, during updating, this counter follows a sawtooth pattern, with each peak indicating that processing of a new partition has started. This counter should be used to isolate performance problems in updating and to detect how long it took to process particular partitions.


    Updating: Updated Values/sec

    This monitor returns the number of values that were updated per time interval. This counter should be used to verify that the number of values updated on an overloaded system corresponds with the disk I/O throughput.


    Updating: Updated Bitvectors/sec

    This monitor returns the number of index bit vectors modified per second as a result of data updates. Bit vectors are the elementary building blocks of index files, and updates need to modify them. A low returned value on a system otherwise overloaded by updating means that index updating is the bottleneck.


    Indexing: Processor Time spent by Indexing (%)

    This monitor returns the percentage of single core processor time spent indexing. This is the percentage of load that a single processor could execute within the time interval. This counter should be used to determine what percentage of consumed CPU belongs to indexing and whether indexing may be causing system overload.

    Returned values should be 0 - 100 x CPUs.


    Indexing: Written Index Bytes/sec

    This monitor returns the number of index file bytes written per time interval during an append. This counter should be used to determine what percentage of disk I/O load is caused by indexing and whether indexing may be causing system overload.


    Indexing: Index values merged in append Avg

    This monitor returns the number of index values merged as a result of new data being appended. Indices with large numbers of records need to merge more values from time to time. The average is calculated as an EMA with alpha = 0.2. This counter should be used to determine whether CPU consumption may be caused by processing indices with many records (e.g. IP addresses) when indexing is triggered by new data.


    Indexing: Index values merged in finish Avg

    This monitor returns the number of index values merged as a result of optimizing indexing structures (a.k.a. partition finishing). Indices with large numbers of records need to merge more values from time to time. This counter should be used to determine whether CPU consumption may be caused by processing indices with many records when indexing is triggered by optimizing indexing structures.


    Indexing: Index values merged in slot joining Avg

    This monitor returns the average number of values merged per slot merge. Indices with widely spread cardinality need to merge more values. This counter should be used to determine whether CPU consumption may be caused by processing indices with large cardinalities (e.g. IP addresses).


    Querying: Processor Time spent by Querying (%)

    This monitor returns the percentage of single core processor time spent querying. This is the percentage of load that a single processor could execute within the time interval. This counter should be used to determine what percentage of consumed CPU belongs to querying and whether querying may be causing system overload.

    Returned values should be 0 - 100 x CPUs.


    Querying: Active Query Threads

    This monitor returns the number of threads executing a query. At certain times, the threads may block one another from continuing execution using locks. This counter should be used to determine whether a query was running at the time under investigation, and how intensively.

    Returned values should be one thread per query.


    Querying: Records Scanned by Query (%)

    This monitor returns the percentage of records accessed by scanning raw data. The total is the sum of scanned and indexed record accesses. Note that the majority of records in the entire database should be eliminated by the TimeStamp index, and therefore even a small value may indicate a problem. The value is scaled by 1,000,000, so the range is [0; 1,000,000]. This counter should be used to diagnose slow queries that should be fast because of indices.
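
    Assuming the scaling works as described above (scanned accesses divided by scanned plus indexed accesses, multiplied by 1,000,000), the calculation looks like the following sketch (illustrative Python with made-up figures).

        # Share of record accesses served by scanning raw data, scaled to [0; 1,000,000].
        def scanned_ratio_scaled(scanned_records, indexed_records):
            total = scanned_records + indexed_records
            return 0 if total == 0 else round(1_000_000 * scanned_records / total)

        print(scanned_ratio_scaled(500, 999_500))   # 500 -> only 0.05% of accesses were scans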


    Querying: Intermediate Records Count in Query

    This monitor returns the number of intermediate records loaded into memory during querying. The records are reported when processing of a single partition is almost finished, but before GROUP BY is applied. This counter should be used to determine why private memory gets allocated during processing.


    Memory management: Private memory

    This monitor returns the number of bytes of private memory consumed and managed by the Flow Storage Memory Manager. This counter should be used to detect memory leaks and detect excessive memory consumption problems.

    Returned values should be 0 - system memory.


    Memory management: Private mapped memory

    This monitor returns the number of private memory bytes currently mapped into the address space and managed by the Flow Storage Memory Manager. This counter should be used to detect memory usage peaks and possible RAM requirements.

    Returned values can be greater than private memory.


    Memory management: Mapped memory

    This monitor returns the number of bytes of data currently mapped into the address space and managed by the Flow Storage Memory Manager (both private and shared). This is the consumption of the address space. This counter should be used to detect memory usage peaks and possible RAM requirements.

    Note: This monitor is available beginning with NTA 4.0.1 and is disabled by default.


    Memory management: Mapped memory blocks

    This monitor returns the number of allocations in Mapped memory.

    Note: This monitor is available beginning with NTA 4.0.2 and is disabled by default.


    Service: SolarWinds NetFlow Storage

    This monitor returns the CPU and memory usage of the SolarWinds NetFlow Storage Service. This service stores and manages the NetFlow database.

    Note: By default, this monitor is disabled.
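
    If a quick manual check of the service is needed, the following sketch (Python with the third-party "wmi" package; the host name is a placeholder and the DisplayName filter is an assumption based on the service name used in this template) queries its state over WMI.

        # Manual spot check of the service state; not part of the template itself.
        import wmi

        conn = wmi.WMI(computer="FLOW-STORAGE-01")   # placeholder host
        for svc in conn.Win32_Service(DisplayName="SolarWinds NetFlow Storage Service"):
            print(svc.Name, svc.State, svc.StartMode)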

     

    Last updated 6/27/2014