Cassandra

Version 1

    This template retrieves the status of a Cassandra server installed on a Linux or Unix computer. This template was tested on Cassandra 2.0.12.


    Prerequisites:
    SSH and Perl installed on the target server.

    SNMP installed on the target server and permission to monitor the Java process.


    Credentials: Root credentials on the target server.

    Note: All script monitors except Specific Table Statistic require the following arguments:

    perl ${SCRIPT} nodetool_path cassandra_login cassandra_password
    where
    nodetool_path – Full path to the nodetool command.
    cassandra_login – Login name for the nodetool command. If a login and password are not required, leave this argument unchanged.
    cassandra_password – Password for the nodetool command. If a login and password are not required, leave this argument unchanged.

    Below is an example using the Scripts Arguments field (based on example configuration above):
    perl ${SCRIPT} /usr/bin/nodetool casuser caspass
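
    The way these three arguments could map onto a nodetool invocation is sketched below in Python. This is not the template's Perl script; the function name build_nodetool_command is hypothetical, and skipping the credentials when either value is empty is an assumption made for illustration. The -u and -pw options are nodetool's own username and password options.

```python
from typing import List, Optional


def build_nodetool_command(nodetool_path: str,
                           login: Optional[str],
                           password: Optional[str],
                           subcommand: str) -> List[str]:
    """Assemble an argument list for running one nodetool subcommand."""
    command = [nodetool_path]
    # Assumption: credentials are passed only when both values are non-empty.
    if login and password:
        command += ["-u", login, "-pw", password]
    command.append(subcommand)
    return command


if __name__ == "__main__":
    print(build_nodetool_command("/usr/bin/nodetool", "casuser", "caspass", "info"))
```

    Returning a list rather than a single string lets the caller run the command without a shell, so a password containing spaces or quotes needs no escaping.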


    Monitored Components

    Network

    This monitor returns network statistics. The returned values are as follows:

         Commands Active – This component returns the number of active network commands.

         Commands Pending – This component returns the number of pending network commands.

         Commands Completed – This component returns the number of completed network commands.

         Responses Active – This component returns the number of active network responses.

         Responses Pending – This component returns the number of pending network responses.

         Responses Completed – This component returns the number of completed network responses.

         Attempted – This component returns the number of successfully completed read repair operations.

         Mismatch Blocking – This component returns the number of read repair operations since server restart that blocked a query.

         Mismatch Background – This component returns the number of read repair operations since server restart performed in the background.
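
    A minimal Python sketch of how values like these might be extracted from nodetool netstats output follows. The sample text is hypothetical and only approximates the Cassandra 2.0 layout, the function name parse_netstats is an assumption, and the template's own script may work differently.

```python
import re
from typing import Dict, Optional

# Hypothetical sample; real output varies by Cassandra version.
SAMPLE_NETSTATS = """\
Mode: NORMAL
Not sending any streams.
Read Repair Statistics:
Attempted: 12
Mismatch (Blocking): 3
Mismatch (Background): 1
Pool Name                    Active   Pending      Completed
Commands                        n/a         0            250
Responses                       n/a         2            975
"""

REPAIR_LINE = re.compile(r"(Attempted|Mismatch \((?:Blocking|Background)\)):\s*(\d+)$")


def parse_netstats(text: str) -> Dict[str, Optional[int]]:
    """Return a flat mapping of component name to value (None for "n/a")."""
    stats: Dict[str, Optional[int]] = {}
    for line in text.splitlines():
        match = REPAIR_LINE.match(line.strip())
        if match:
            # "Mismatch (Blocking)" becomes "Mismatch Blocking".
            name = match.group(1).replace("(", "").replace(")", "")
            stats[name] = int(match.group(2))
            continue
        fields = line.split()
        if len(fields) == 4 and fields[0] in ("Commands", "Responses"):
            for label, value in zip(("Active", "Pending", "Completed"), fields[1:]):
                stats[f"{fields[0]} {label}"] = int(value) if value.isdigit() else None
    return stats


if __name__ == "__main__":
    for name, value in parse_netstats(SAMPLE_NETSTATS).items():
        print(name, value)
```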

     

    Node Status

    This monitor returns node statistics. The returned values are as follows:

         Status – This component indicates whether the node is functioning (1) or not (0).

         State – This component returns the state of the node in relation to the cluster.
          Possible values:
          0 – Leaving.
          1 – Moving.
          2 – Joining.
          3 – Normal.

         Load – This component returns the amount of file system data under the Cassandra data directory after excluding all content in the snapshots subdirectories. Because all SSTable data files are included, any data that is not cleaned up (such as TTL-expired cells or tombstoned data) is counted.

         Owns – This component returns the percentage of the data owned by the node per data center times the replication factor. For example, a node can own 33% of the ring, but show 100% if the replication factor is 3.
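
    The numeric Status and State codes above can be derived from the two-letter flags that nodetool status prints at the start of each node line (U or D, followed by N, L, J or M). The Python sketch below illustrates that mapping; the sample line, the field positions and the name parse_status_line are assumptions for illustration, not the template's actual logic.

```python
from typing import Dict

STATUS_CODES = {"U": 1, "D": 0}                   # Up / Down
STATE_CODES = {"L": 0, "M": 1, "J": 2, "N": 3}    # Leaving / Moving / Joining / Normal

# Hypothetical node line; the host ID is a placeholder.
SAMPLE_STATUS_LINE = "UN  10.0.0.1  47.66 KB  256  33.3%  host-id-placeholder  rack1"


def parse_status_line(line: str) -> Dict[str, object]:
    """Decode one node line into the components described above."""
    fields = line.split()
    if not fields:
        raise ValueError("empty status line")
    flags = fields[0]
    if len(flags) != 2 or flags[0] not in STATUS_CODES or flags[1] not in STATE_CODES:
        raise ValueError(f"unrecognized status flags: {flags!r}")
    # Assumption: the ownership column is the first field ending in "%".
    owns = next((f for f in fields if f.endswith("%")), None)
    return {
        "status": STATUS_CODES[flags[0]],
        "state": STATE_CODES[flags[1]],
        "load": " ".join(fields[2:4]),  # value and unit, e.g. "47.66 KB"
        "owns_percent": float(owns.rstrip("%")) if owns else None,
    }


if __name__ == "__main__":
    print(parse_status_line(SAMPLE_STATUS_LINE))
```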

     

    Info

    This monitor returns additional node statistics. The returned values are as follows:

         Gossip Active – This component indicates whether the gossip protocol is functioning (1) or not (0).

         Thrift Active – This component indicates whether the thrift framework is functioning (1) or not (0).

         Native Transport Active – This component indicates whether the native transport is functioning (1) or not (0).

         Heap Memory Usage – This component returns the percentage of heap memory usage.

         Status Active – This component indicates whether the binary protocol is functioning (1) or not (0).
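
    As a rough illustration, the 1/0 flags and the heap percentage can be computed from the "key : value" lines of nodetool info. The Python sketch below uses a hypothetical sample and hypothetical helper names (parse_info, as_flag, heap_usage_percent); it assumes heap memory is reported as "used / total" in MB.

```python
from typing import Dict

# Hypothetical excerpt of nodetool info output.
SAMPLE_INFO = """\
Gossip active    : true
Thrift active    : false
Native Transport active: true
Heap Memory (MB) : 256.00 / 1024.00
"""


def parse_info(text: str) -> Dict[str, str]:
    """Split each "key : value" line at the first colon."""
    fields: Dict[str, str] = {}
    for line in text.splitlines():
        key, separator, value = line.partition(":")
        if separator:
            fields[key.strip()] = value.strip()
    return fields


def as_flag(value: str) -> int:
    """Map "true" to 1 and anything else to 0."""
    return 1 if value.lower() == "true" else 0


def heap_usage_percent(value: str) -> float:
    """Convert "used / total" into a percentage of heap in use."""
    used, total = (float(part) for part in value.split("/"))
    return 100.0 * used / total


if __name__ == "__main__":
    info = parse_info(SAMPLE_INFO)
    print(as_flag(info["Gossip active"]), heap_usage_percent(info["Heap Memory (MB)"]))
```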

     

    Key and Row Cache

    This monitor returns key cache and row cache statistics. The returned values are as follows:

         Cache Size – This component returns cache size.

         Cache Capacity – This component returns cache capacity.

         Cache Hits – This component returns the number of cache hits.

         Cache Requests – This component returns the number of cache requests.

         Cache Recent Hit Rate – This component returns the recent cache hit rate.

         Row Size – This component returns row size.

         Row Capacity – This component returns row capacity.

         Row Hits – This component returns the number of row hits.

         Row Requests – This component returns the number of row requests.

         Row Recent Hit Rate – This component returns the recent row hit rate.
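
    In Cassandra 2.0, nodetool info reports each cache on a single line, and the five values above can be pulled out with one regular expression. The Python sketch below is an illustration only: the sample value is hypothetical, the pattern assumes the 2.0-style wording, and the lifetime hit ratio (hits divided by requests) is an extra derived figure, not one of the template's components.

```python
import re
from typing import Dict

CACHE_PATTERN = re.compile(
    r"size (\d+) \(bytes\), capacity (\d+) \(bytes\), "
    r"(\d+) hits, (\d+) requests, (\S+) recent hit rate"
)

# Hypothetical value of the "Key Cache" line.
SAMPLE_KEY_CACHE = ("size 1234 (bytes), capacity 52428800 (bytes), 80 hits, "
                    "100 requests, 0.800 recent hit rate, 14400 save period in seconds")


def parse_cache_value(value: str) -> Dict[str, float]:
    """Extract size, capacity, hits, requests and recent hit rate."""
    match = CACHE_PATTERN.search(value)
    if match is None:
        raise ValueError("unrecognized cache line")
    size, capacity, hits, requests = (int(group) for group in match.groups()[:4])
    return {
        "size": size,
        "capacity": capacity,
        "hits": hits,
        "requests": requests,
        # float() also accepts "NaN", which may appear when there are no requests.
        "recent_hit_rate": float(match.group(5)),
        "lifetime_hit_ratio": hits / requests if requests else 0.0,
    }


if __name__ == "__main__":
    print(parse_cache_value(SAMPLE_KEY_CACHE))
```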

     

    Thread Pool: Read Stage Tasks

    This component returns active, pending, completed, blocked and all time blocked tasks for the ReadStage thread pool. This pool is responsible for local reads.

     

    Thread Pool: Request Response Stage Tasks

    This component returns active, pending, completed, blocked and all time blocked tasks for the RequestResponseStage thread pool. This pool handles responses from other nodes.

     

    Thread Pool: Mutation Stage Tasks

    This component returns active, pending, completed, blocked and all time blocked tasks for the MutationStage thread pool. This pool is responsible for local writes.

     

    Thread Pool: Read Repair Stage Tasks

    This component returns active, pending, completed, blocked and all time blocked tasks for the ReadRepairStage thread pool. This pool is responsible for digest queries and updates of replicas of a key.

     

    Thread Pool: Replicate On Write Stage Tasks

    This component returns active, pending, completed, blocked and all time blocked tasks for the ReplicateOnWriteStage thread pool. This pool is responsible for counter writes and for replication after a local write.

     

    Thread Pool: Gossip Stage Tasks

    This component returns active, pending, completed, blocked and all time blocked tasks for the GossipStage thread pool. This pool handles gossip rounds every second.

     

    Thread Pool: Cache Cleanup Executor Tasks

    This component returns active, pending, completed, blocked and all time blocked tasks for the CacheCleanupExecutor thread pool.

     

    Thread Pool: Anti Entropy Stage Tasks

    This component returns active, pending, completed, blocked and all time blocked tasks for the AntiEntropyStage thread pool. This pool is responsible for repairing consistency.

     

    Thread Pool: Migration Stage Tasks

    This component returns active, pending, completed, blocked and all time blocked tasks for the MigrationStage thread pool. This pool is responsible for schema changes.

     

    Thread Pool: Memtable Post Flusher Tasks

    This component returns active, pending, completed, blocked and all time blocked tasks for the MemtablePostFlusher thread pool. This pool is responsible for flushing the commit log and other operations after flushing the memtable.

     

    Thread Pool: Memory Meter Tasks

    This component returns active, pending, completed, blocked and all time blocked tasks for the MemoryMeter thread pool. This pool is responsible for measuring actual object memory, including JVM overhead.

     

    Thread Pool: Flush Writer Tasks

    This component returns active, pending, completed, blocked and all time blocked tasks for the FlushWriter thread pool. This pool is responsible for flushing the memtable to disk; its counters reflect the status of the sort and write-to-disk operations.

     

    Thread Pool: Validation Executor Tasks

    This component returns active, pending, completed, blocked and all time blocked tasks for the ValidationExecutor thread pool.

     

    Thread Pool: Misc Stage Tasks

    This component returns active, pending, completed, blocked and all time blocked tasks for the MiscStage thread pool. This pool is responsible for miscellaneous operations.

     

    Thread Pool: Pending Range Calculator Tasks

    This component returns active, pending, completed, blocked and all time blocked tasks for the PendingRangeCalculator thread pool. This pool is responsible for calculating pending ranges for bootstraps and departed nodes.

     

    Thread Pool: Compaction Executor Tasks

    This component returns active, pending, completed, blocked and all time blocked tasks for the CompactionExecutor thread pool.

     

    Thread Pool: Commit Log Archiver Tasks

    This component returns active, pending, completed, blocked and all time blocked tasks for the commitlog_archiver thread pool. This pool is responsible for saving the commit logs.

     

    Thread Pool: Internal Response Stage Tasks

    This component returns active, pending, completed, blocked and all time blocked tasks for the InternalResponseStage thread pool. This pool is responsible for responding to non-client initiated messages, including bootstrapping and schema checking.

     

    Thread Pool: Hinted Handoff Tasks

    This component returns active, pending, completed, blocked and all time blocked tasks for the HintedHandoff thread pool. This pool is responsible for sending missed mutations to other nodes.

     

    Thread Pool: Dropped Messages

    This monitor returns the number of dropped messages in thread pools. It returns statistics for these message types: Range Slice, Read Repair, Paged Range, Binary, Read, Mutation, Trace, Request Response and Counter Mutation.
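
    The thread pool counters and the dropped message counts both come from nodetool tpstats, which prints a pool table followed by a message table. The Python sketch below shows one way such output could be split into the two groups; the sample text is hypothetical, and the name parse_tpstats and the lowercase column keys are assumptions made for illustration.

```python
from typing import Dict, Tuple

POOL_COLUMNS = ("active", "pending", "completed", "blocked", "all_time_blocked")

# Hypothetical, shortened tpstats output.
SAMPLE_TPSTATS = """\
Pool Name                    Active   Pending      Completed   Blocked  All time blocked
ReadStage                         0         0            115         0                 0
MutationStage                     1         4           2304         0                 2

Message type           Dropped
RANGE_SLICE                  0
READ_REPAIR                  0
MUTATION                     7
"""


def parse_tpstats(text: str) -> Tuple[Dict[str, Dict[str, int]], Dict[str, int]]:
    """Return (pool counters by pool name, dropped counts by message type)."""
    pools: Dict[str, Dict[str, int]] = {}
    dropped: Dict[str, int] = {}
    for line in text.splitlines():
        fields = line.split()
        # Header lines contain non-numeric columns and match neither branch.
        if len(fields) == 6 and all(f.isdigit() for f in fields[1:]):
            pools[fields[0]] = dict(zip(POOL_COLUMNS, map(int, fields[1:])))
        elif len(fields) == 2 and fields[1].isdigit():
            dropped[fields[0]] = int(fields[1])
    return pools, dropped


if __name__ == "__main__":
    pools, dropped = parse_tpstats(SAMPLE_TPSTATS)
    print(pools["MutationStage"], dropped)
```

    A steadily growing pending count or a non-zero dropped count is usually the value worth alerting on, since completed counts only ever increase.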

     

    Specific Table Statistic

    This monitor returns statistics for a specific table.

         The returned values are as follows:

         SSTable Count – This component returns the number of SSTables containing data from the table.

         Space Used – This component returns the disk space used by the table. How this value is measured depends on the operating system.

         SSTable Compression Ratio – This component returns the fraction of data-representation size resulting from compression.

         Memtable Data Size – This component returns the size of the memtable data.

         Local Read Latency – This component returns the round trip time in milliseconds to complete a request to read the specified table.

         Local Write Latency – This component returns the round trip time in milliseconds to complete an update to the specified table.

         Pending Tasks – This component returns the number of read, write, and cluster operations that are pending.

         Bloom Filter False Ratio – This component returns the fraction of all bloom filter checks resulting in a false positive.

         Compacted Partition Mean Bytes – This component returns the average size of compacted table rows.

         Note: By default, this monitor is disabled.

         Note: This monitor requires the following arguments:

         perl ${SCRIPT} nodetool_path cassandra_login cassandra_password keyspace.table
         where
         nodetool_path – Full path to the nodetool command.
         cassandra_login – Login name for the nodetool command. If a login and password are not required, leave this argument unchanged.
         cassandra_password – Password for the nodetool command. If a login and password are not required, leave this argument unchanged.
         keyspace.table – Target keyspace and table, separated by a dot.

         Below is an example using the Scripts Arguments field (based on example configuration above):
         perl ${SCRIPT} /usr/bin/nodetool casuser caspass system_traces.events
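
    Because the fourth argument combines two names, a script has to split it before use. The Python sketch below shows one possible validation of the keyspace.table form; the function name split_table_argument is hypothetical, and rejecting values with more than one dot is an assumption rather than documented behavior.

```python
from typing import Tuple


def split_table_argument(argument: str) -> Tuple[str, str]:
    """Split "keyspace.table" into its two parts, rejecting malformed input."""
    keyspace, separator, table = argument.partition(".")
    # Assumption: exactly one dot, with a non-empty name on each side.
    if not separator or not keyspace or not table or "." in table:
        raise ValueError(f"expected keyspace.table, got {argument!r}")
    return keyspace, table


if __name__ == "__main__":
    print(split_table_argument("system_traces.events"))
```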

     

    Process: Cassandra

    This monitor returns CPU and memory usage of the Cassandra (java) process.

    Note: On Enterprise Linux systems, the Cassandra service runs as a java process. On Debian systems, the Cassandra service runs as a jsvc process.

     

    TCP Port: Cluster Communication

    This component monitor tests the ability of Cassandra to accept inter-node cluster requests. It monitors TCP port 7000.

    Note: By default, this monitor is disabled.

     

    TCP Port: Client Port

    This component monitor tests the ability of Cassandra to accept incoming requests. It monitors TCP port 9042.

    Note: By default, this monitor is disabled.

     

    TCP Port: Client Port (Thrift)

    This component monitor tests the ability of Cassandra to accept incoming requests. It monitors TCP port 9160.

    Note: By default, this monitor is disabled.
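
    When troubleshooting the three TCP port monitors, the same kind of check can be reproduced by hand. The Python sketch below simply attempts a TCP connection and reports whether it succeeded; the function name port_is_open and the 3-second timeout are assumptions, and it does not reflect how the monitoring product implements its port monitors.

```python
import socket


def port_is_open(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port can be established."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and unreachable hosts.
        return False
```

    For example, calling port_is_open with the node's address and each of the ports 7000, 9042 and 9160 shows which listeners are reachable from the polling machine. A successful connection only proves that something is listening; it does not confirm that Cassandra can serve requests.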

     

    Portions of this template are based on the following. Copyright 2015:
    http://www.datastax.com/documentation/cassandra/2.0/cassandra/tools/toolsNodetool_r.html

    Last updated: 2/11/2015