
Kiwi Syslog Server High CPU Utilization - Messages Seem to be behind

The CPU on my Kiwi Syslog Server is pegged, and messages seem to be arriving late. Here is the diagnostic info file from the server.

 

Kiwi Syslog Server [Registered] Version 9.0.3


///       Kiwi Syslog Server Statistics         ///
---------------------------------------------------
24 hour period ending on: Wed, 08 Sep 2010 14:44:34
Syslog Server started on: Wed, 08 Sep 2010 13:37:39
Syslog Server uptime:     1 hour, 7 minutes
---------------------------------------------------

+ Messages received - Total:          1098753
+ Messages received - Last 24 hours:  1098753
+ Messages received - Since Midnight: 1098753
+ Messages received - Last hour:      996804
+ Message queue overflow - Last hour: 416654
+ Messages received - This hour:      101949
+ Message queue overflow - This hour: 12336
+ Messages per hour - Average:        996804

+ Messages forwarded:                 769810
+ Messages logged to disk:            1194581

+ Errors - Logging to disk:           0
+ Errors - Invalid priority tag:      0
+ Errors - No priority tag:           2
+ Errors - Oversize message:          309

+ Disk space remaining on drive E:    41554 MB

    Breakdown of Syslog messages by severity  
+--------------------+------------+------------+
| Message Level      |  Messages  | Percentage |
+--------------------+------------+------------+
| 0 - Emerg          |         0  |      0.00% |
| 1 - Alert          |      2753  |      0.25% |
| 2 - Critical       |       496  |      0.05% |
| 3 - Error          |      5745  |      0.52% |
| 4 - Warning        |    103603  |      9.43% |
| 5 - Notice         |     42938  |      3.91% |
| 6 - Info           |    775902  |     70.62% |
| 7 - Debug          |    167316  |     15.23% |
+--------------------+------------+------------+

Custom statistics
-----------------
CustomStats01: 0
CustomStats02: 0
CustomStats03: 0
CustomStats04: 0
CustomStats05: 0
CustomStats06: 0
CustomStats07: 0
CustomStats08: 0
CustomStats09: 0
CustomStats10: 0
CustomStats11: 0
CustomStats12: 0
CustomStats13: 0
CustomStats14: 0
CustomStats15: 0
CustomStats16: 0

End of Report.


DNS Cache size  20000
DNS Cache entries 2
Entries in queue 0
DNS Cache hits  0
DNS Cache misses 0
DNS Cache TTL  1440 minutes
Total DNS Lookups 0
Successful cache hits 0%


IP Address Hostname TTL (minutes)
127.0.0.1       localhost Static
::1             localhost Static


Message Buffer Information
==========================
Message Queue Max Size: 20000
Message Queue overflow: 428990
Message Count:          19932
Message Count Max:      20000
Percentage free:        1

 

E-mail Buffer Information
==========================
Message Queue Max Size: 1000
Message Queue overflow: 0
Message Count:          0
Message Count Max:      13
Percentage free:        100

  • Hi Mike,

    Here's your problem: Message Queue overflow: 428990
    This means messages are being lost because the server is overloaded.

    Check out this page from our online help: "How to increase the Message Buffer Size"
    http://www.kiwisyslog.com/help/syslog/adv_reg_msg_buffer_size.htm
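
    To watch these counters over time rather than by eye, something like the sketch below could pull the numbers out of a saved diagnostic file. The function name and the regular expression are illustrative, not part of Kiwi, and the ratio is simply overflow divided by received for the same hour; the report does not say whether overflowed messages are included in the received count.

```python
import re


def parse_diagnostics(text):
    """Collect 'label: integer' lines from a diagnostic dump into a dict.

    Labels that repeat (the message and e-mail buffers share names) keep
    the first value seen, which in the dump above is the message buffer.
    """
    stats = {}
    for line in text.splitlines():
        match = re.match(r"^\W*(.+?):\s+(\d+)\s*$", line)
        if match:
            stats.setdefault(match.group(1).strip(), int(match.group(2)))
    return stats


# A few lines copied from the report posted above.
SAMPLE = """\
+ Messages received - Last hour:      996804
+ Message queue overflow - Last hour: 416654
Message Queue Max Size: 20000
Message Queue overflow: 428990
Message Count:          19932
"""

stats = parse_diagnostics(SAMPLE)
overflow_share = (
    stats["Message queue overflow - Last hour"]
    / stats["Messages received - Last hour"]
)
queue_fill = stats["Message Count"] / stats["Message Queue Max Size"]

print(f"overflow vs. received, last hour: {overflow_share:.1%}")
print(f"queue fill: {queue_fill:.1%}")
if stats["Message Queue overflow"] > 0:
    print("queue has overflowed since startup; messages were dropped")
```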


    Kind Regards,

  • Mikeyjay, we have a lot of experience here with pushing Kiwi to the max:

     

    + Messages received - Last 24 hours:  134,712,626 (commas added for readability)

     

    From my experience, two things primarily control Kiwi's throughput:

        DISK THROUGHPUT

        - Larger message volumes require a RAID array to increase write throughput

        - NTFS compression works wonders for increasing DISK throughput, but hits the CPU a little more

        - Antivirus scanning on your log directory will send things towards the toilet fast. Exclude your log directories from on-demand scans

     

        CPU (processes the logic of your Rules)

        - If your CPU is maxed out because it is old or burdened with other necessary functions, there is not much to do there.

        - Kiwi is a single-threaded process.  Multiple CPUs help only insofar as other processes get spread onto the other CPUs.  Windows provides ZERO CPU management, unlike OSes where you can dedicate a processor to a single program/process.  CPU features like hyperthreading HURT Kiwi.  Turning hyperthreading off buys you a little headroom, but not much.

        - The biggest savings is in the construction of your rules and filters. 

    Try reordering your rules so that the earliest rules handle the greatest volume of messages. 

    After you write the msg to disk, include a STOP PROCESSING action.  Otherwise every message will still be matched against every rule that follows.  It takes some careful logic, depending upon what you are doing, but you want to STOP processing a message as early in the rules as possible.

    Use Simple filters rather than Complex or RegExGrep filters.  Each added level of complexity costs more CPU.
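
    The effect of rule order and STOP can be shown with a toy model. This is only a sketch: the traffic mix and the three filters are made up, and it counts filter evaluations rather than measuring Kiwi itself.

```python
def filter_checks(rules, messages, stop_on_match=True):
    """Count filter evaluations when each message walks the rules top to bottom."""
    checks = 0
    for msg in messages:
        for matches in rules:
            checks += 1
            if matches(msg) and stop_on_match:
                break  # the STOP PROCESSING action: skip all later rules
    return checks


def is_noise(severity):
    return severity >= 4  # Warning through Debug


def is_alert(severity):
    return severity == 1


def is_critical(severity):
    return severity == 2


# Hypothetical traffic: 980 Info messages and 20 Alerts, given as severities.
messages = [6] * 980 + [1] * 20

busy_first = [is_noise, is_alert, is_critical]
busy_last = [is_alert, is_critical, is_noise]

print(filter_checks(busy_first, messages))                       # 1020
print(filter_checks(busy_last, messages))                        # 2960
print(filter_checks(busy_first, messages, stop_on_match=False))  # 3000
```

    With the high-volume rule first and STOP in place, the same 1,000 messages need roughly a third of the filter evaluations.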

     

     

    Hope those touch on something that will help.

  • RoyalIEF,

    Can you provide some details on the server hardware that you are using?

    Is it physical or virtual?

    Is this a single server?

    If more than one, are you load balancing between servers?

    What type of rules are you doing on the messages coming in?

    We are looking at logging to disk (no other rules or actions) ~350,000,000 msg/day and I'm looking for guidance on what hardware it would take.

    Thanks,

    Michael

  • We currently have 3 syslog servers, all physical.  This IT shop has poor skills when it comes to disk and SAN performance, and maintenance and capacity planning are nonexistent.  I would never trust a virtual server here for a high-performance system.  If you have a tight handle on how single-threaded CPU processing and disk performance behave within the virtual server, you could make it work for these peak demands.  Another issue is that when other virtual servers are created, they can silently reduce the capacity available to a hungry app like this.

    One server is dedicated to a FW that handles all internet browsing traffic (10 million msgs per hour peak).  The second handles all other firewalls (DMZs, server farms).  The third handles all other devices (over 1,000).

    The pain is the CPU.  Kiwi is a single-threaded app.  Most benchmarks now emphasize multi-core/multi-CPU performance, not single-thread performance, so it can be tough to find CPU benchmarks that focus on a single thread.  I can't find the link I had (one of the few I ever found) with single-thread benchmarks.  As always with hardware, it is about evaluating that day's offerings within the context of your budget.

    More CPUs/cores do not increase Kiwi performance.  The Core 2 Duo E8600 @ 3.3 GHz outperformed all quad-cores for a good year or two.  Hyper-threading should be turned off.

    Here's the hardware we are using today:

    Intel Xeon X5677 @ 3.47 GHz, hyperthreading disabled
    12 GB memory (overkill, standard in this shop)
    Data drive: RAID 5 array of four 600 GB 15K RPM SAS 3.5" disks (Seagate ST3600057SS)

    The data drive is a compromise for budget.  RAID 1+0/0+1 is always better than 5, but we were looking to significantly up the capacity.  Disk throughput is not an issue for us, even when we are grepping through hundreds of megs of firewall logs.

    If you don't have the disk throughput you need, you can effectively improve it by using NTFS compressed folders.  For a small CPU hit you get much greater disk throughput, because syslogs compress about 4 to 1: each 1 MB read through the disk controller becomes 4 MB in memory.  It paid off greatly on our older hardware, which didn't have great disk throughput.
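
    To see what ratio your own logs might reach before turning compression on, a rough check along these lines may help. It is a sketch with assumptions: it uses Python's zlib, which is not the algorithm NTFS uses, so the ratio is only indicative; the sample lines are synthetic; and the 50 MB/s disk figure is a placeholder.

```python
import zlib


def compression_ratio(data: bytes) -> float:
    """Uncompressed size divided by compressed size for one sample."""
    return len(data) / len(zlib.compress(data))


def effective_read_rate(physical_mb_per_s: float, ratio: float) -> float:
    """Log data delivered per second when each MB read from disk expands by `ratio`."""
    return physical_mb_per_s * ratio


# Made-up lines shaped loosely like firewall logs; substitute a real sample.
sample = "".join(
    f"2010-09-08 14:{i % 60:02d}:00 Local3.Info 10.0.{i % 250}.{i % 200} "
    f"Built outbound TCP connection {100000 + i * 7}\n"
    for i in range(5000)
).encode()

print(f"zlib ratio on synthetic sample: {compression_ratio(sample):.1f} to 1")
print(f"50 MB/s physical at 4 to 1: {effective_read_rate(50, 4):.0f} MB/s effective")
```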

    We use Windows Server 2008 Standard SP2 (32-bit).  According to Microsoft's docs, a 64-bit OS would run the 32-bit application with about 2% overhead, so we opted to stay at 32-bit.

    We do no load balancing, other than isolating the biggest FW (actually an ASA pair) on one dedicated server.

    Ruleset has been optimized.  I use DNS cached names in the directory name.  I've tested it with/without and there was no performance hit that I could detect.  I provide Kiwi with a file of cached DNS names.

    I design my rules around a "log and STOP" methodology, which is a process of elimination.

    I have eleven rules.  Everything gets logged to the main logs, which are one file per minute.  A series of Exception logs is created for messages that should be reviewed or are known to appear in some quantity.  To have such exception logs, you must first eliminate the noisy messages that always exist but don't require attention.  Each rule STOPs processing on any match, steadily eliminating messages.  Alerts are sent for the top three (0-2) priority messages.  Our FW sends messages with Local3.

    99.98% of ASA syslogs are processed and STOP @ rule 1

    01) Local 3,Priority 4-7;  Log to MAIN LOG [D:\syslog\%HostName\%DateISO\%DateISO_%TimeHH-%TimeMM %HostName (%IPAdd4).txt]; STOP processing
    02) Local 3,"-1-10500"   %ASA-1-10500-FailOver testing (1-ALERT);   Log to MAIN LOG;  STOP processing
    03) Local 3,"-1-106021:" %ASA-1-106021-RevPath (1-ALERT);  Log to MAIN LOG;  Log to Exception Logs [d:\syslog\P1\!Critical-ReversePath-%DateISO.txt];  STOP processing
    04) Local 3,PRIORITY 1-ALERT;  Log to MAIN LOG; Log to Exception Logs [D:\syslog\P1\! ALERT %DateISO %HostName (%IPAdd4).txt];  STOP processing
    05) Local 3,"-2-10600"   %ASA-2-10600 - basic DENY; (2-CRITICAL);   Log to MAIN LOG;  STOP processing
    06) Local 3,"-2-106020"|"-2-106016"  %ASA-2-106016,20-IDS  (2-CRITICAL);  Log to MAIN LOG;   Log to Exception Logs [d:\Syslog\P1\!Critical-IDS-%DateISO.txt]; STOP processing
    07) PRIORITY 2-CRITICAL;  EMAIL Network Staff;  Log to MAIN LOG;   Log to Exception Logs [d:\Syslog\P1\!Critical-%DateISO.txt]; STOP processing
    08) Local 3,"-3-30500" NO TRANSLATIONS (3-ERROR);  Log to MAIN LOG;   Log to Exception Logs [D:\syslog\noTranslations\%DateISO_noTranslations.txt]; STOP processing
    09) Local 3,PRIORITY 3-ERROR;  Log to MAIN LOG;   Log to Exception Logs [D:\syslog\P1\!Errors %DateISO %HostName (%IPAdd4).txt]; STOP processing
    10) PRIORITY 0-EMERGENCY;  EMAIL Network Staff;  Log to MAIN LOG;   Log to Exception Logs [d:\Syslog\P1\! EMERGENCY-%DateISO.txt]; STOP processing
    11) (no filter, default rule);  Log to root of Syslog folders [D:\syslog\%DateISO Kiwi-Needs-Rules-for %IPAdd4.txt]

    Rules 2 - 4 process ALERT level, which is the next largest volume.  We eliminate failover and Reverse Path messages, then log the rest to a generic Alert log for review.
    Rules 5 - 7 process CRITICAL level.  Basic deny msgs and noisy IDS messages are eliminated, and anything else is emailed to the team.
    Rules 8 & 9 process ERROR level.  I track "no translations" because on an ASA it warns you that private IPs have no established NATs for accessing the Internet.
    Rule 10 emails all emergency messages to the team.
    Rule 11 is a catch-all that documents any devices, by IP, that are sending syslogs we didn't anticipate.
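
    For anyone testing filters like these outside Kiwi, the facility and severity that rule 01 matches on both come from the number in angle brackets at the start of each syslog packet: PRI = facility x 8 + severity, and Local3 is facility 19. The helper names and the sample packets below are illustrative only.

```python
import re

FACILITY_LOCAL3 = 19  # local0..local7 are facilities 16..23


def decode_pri(raw):
    """Split the leading <PRI> value into (facility, severity).

    PRI = facility * 8 + severity, so divmod by 8 recovers both parts.
    Returns None when the packet does not start with a <PRI> tag.
    """
    match = re.match(r"<(\d{1,3})>", raw)
    if match is None:
        return None
    return divmod(int(match.group(1)), 8)


def matches_rule_1(raw):
    """Rule 01 from the list above: Local3 with severity 4-7 (Warning..Debug)."""
    decoded = decode_pri(raw)
    if decoded is None:
        return False
    facility, severity = decoded
    return facility == FACILITY_LOCAL3 and 4 <= severity <= 7


# Illustrative packets: 158 = 19 * 8 + 6 (Local3.Info), 153 = 19 * 8 + 1 (Local3.Alert).
info_packet = "<158>%ASA-6-302013: Built outbound TCP connection"
alert_packet = "<153>%ASA-1-106021: Deny reverse path check"

print(decode_pri(info_packet), matches_rule_1(info_packet))    # (19, 6) True
print(decode_pri(alert_packet), matches_rule_1(alert_packet))  # (19, 1) False
```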

  • I think 350 million a day is going to be tough to do without overcoming the CPU limitation.  I don't know what your peak is.  350 million / 24 hours = about 15 million per hour on average.  You probably don't have a flat msg rate around the clock, so your peak might be 20 million/hour.

     

    I know I've seen 225 million here.  We handle 9 million/hour every day (9 x 24 hours = 216 million).  We've peaked above the 10 million/hour mark (10 x 24 = 240 million).  After 10 million we've seen buffering.  (BTW we set Message Queue Max Size = 500000.)  We had a RAID issue recently that might have lowered our throughput, so maybe we can handle more on this new hardware.  A single CPU hits 100% when we max out.  The others are idle.

     

    I'm certain we can't handle 20mill/hour with this setup.  I'd be looking at some other setup (likely a multi-threaded product).
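
    The same arithmetic as a small script, in case you want to plug in your own numbers. The 350 million/day target and the 10 million/hour ceiling are the figures from this thread; everything else is plain unit conversion.

```python
def per_hour(msgs_per_day):
    """Average hourly rate for a daily total, assuming a flat 24-hour load."""
    return msgs_per_day / 24


def per_second(msgs_per_hour):
    """Convert an hourly rate to messages per second."""
    return msgs_per_hour / 3600


target_daily = 350_000_000            # the requirement quoted in this thread
observed_ceiling_hourly = 10_000_000  # where buffering started on our hardware

avg_hourly = per_hour(target_daily)
load_factor = avg_hourly / observed_ceiling_hourly

print(f"average needed:   {avg_hourly:,.0f} msgs/hour ({per_second(avg_hourly):,.0f}/s)")
print(f"observed ceiling: {observed_ceiling_hourly:,} msgs/hour "
      f"({per_second(observed_ceiling_hourly):,.0f}/s)")
print(f"average alone is {load_factor:.2f}x the ceiling, before any peaks")
```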

     

    I'm not a VMware/ESX guy, so I don't know how much you can virtualize to overcome such a limit.  Could you virtualize multiple CPU cores into a single mega-core, so that the single syslogd process would essentially be spread across multiple cores?  I don't know if virtualization has that capability yet.

  • On each of your servers, how much of the 12GB of RAM is being used?

    How big are your log files?  On your server that logs the internet traffic, what is the size for each of the 1 minute log files?  I've been doing some calculations based on what we are logging now, which is nowhere near what we will be doing, and I'm sure it is off quite a bit.  So I just wanted to get an idea of what a real world number would be.

    Thanks

    Michael

  • On our most worked box, which is running Kiwi 9.2 as of two days ago:

        Windows 2008 32-bit (so only 4 GB available)
        Physical memory usage: 1.57 GB

    We increased the Overflow Queue many years ago.  When Kiwi maxes out the CPU, that buffer fills and the syslogd process shows an additional 200 MB of memory usage.  Normally it consumes only 36 MB.

     

    At the end of this message is a day's distribution of minute-by-minute log file sizes (in megs).  They range from 4 MB to 39 MB.  This is for a Tuesday, when we have classes (we are a university) as well as residences creating demand.  The total for this day was 47.5 GB.

    Our slowest period is typically 3 AM to 6 AM.  Our busiest periods are 9 AM-12PM, 1PM-5PM.

    One caveat, which I've forgotten in these discussions: we elected NOT to log URLs five and a half years ago.  We do not perform any WebSense-style content filtering.  If you syslog the URL on your FW, your data files will balloon far beyond ours.  A single URL can be 500-1000 characters, so it can get intensive quickly.  We disabled this message on our ASAs because it did not add value to our troubleshooting, and because it is one of the most privacy-invasive things you can log.  We have residences here, and what people do in the privacy of their rooms is an area where we have to tread carefully.

    4000000 Count    3
    5000000 Count    94
    6000000 Count    47
    7000000 Count    57
    8000000 Count    34
    9000000 Count    23
    10000000 Count    37
    11000000 Count    37
    12000000 Count    32
    13000000 Count    27
    14000000 Count    38
    15000000 Count    62
    16000000 Count    71
    17000000 Count    99
    18000000 Count    117
    19000000 Count    68
    20000000 Count    34
    21000000 Count    28
    22000000 Count    18
    23000000 Count    23
    24000000 Count    48
    25000000 Count    44
    26000000 Count    30
    27000000 Count    26
    28000000 Count    29
    29000000 Count    54
    30000000 Count    52
    31000000 Count    64
    32000000 Count    60
    33000000 Count    35
    34000000 Count    21
    35000000 Count    7
    36000000 Count    6
    37000000 Count    5
    38000000 Count    5
    39000000 Count    4
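
    A quick way to sanity-check a listing like this is to total it. The sketch assumes each row means "file size in bytes, rounded down to a whole megabyte" followed by the number of one-minute files of that size; the listing itself does not spell that out, so treat the byte totals as bounds under that assumption.

```python
# Rows copied from the listing above: size bucket in bytes, then file count.
HISTOGRAM = """\
4000000 Count 3
5000000 Count 94
6000000 Count 47
7000000 Count 57
8000000 Count 34
9000000 Count 23
10000000 Count 37
11000000 Count 37
12000000 Count 32
13000000 Count 27
14000000 Count 38
15000000 Count 62
16000000 Count 71
17000000 Count 99
18000000 Count 117
19000000 Count 68
20000000 Count 34
21000000 Count 28
22000000 Count 18
23000000 Count 23
24000000 Count 48
25000000 Count 44
26000000 Count 30
27000000 Count 26
28000000 Count 29
29000000 Count 54
30000000 Count 52
31000000 Count 64
32000000 Count 60
33000000 Count 35
34000000 Count 21
35000000 Count 7
36000000 Count 6
37000000 Count 5
38000000 Count 5
39000000 Count 4
"""

buckets = []
for row in HISTOGRAM.splitlines():
    size_bytes, _, count = row.split()
    buckets.append((int(size_bytes), int(count)))

total_files = sum(count for _, count in buckets)
# Each file is at least its bucket size and less than one megabyte above it.
lower_bound_bytes = sum(size * count for size, count in buckets)
upper_bound_bytes = lower_bound_bytes + total_files * 1_000_000

print(f"files: {total_files} (a day has 1440 minutes)")
print(f"total: {lower_bound_bytes / 1e9:.1f} to {upper_bound_bytes / 1e9:.1f} GB")
```

    The 1,439 files line up with one file per minute for a day.  Under the bucketing assumption the listed files add up to somewhere between about 27.5 and 28.9 GB, so the 47.5 GB daily figure presumably counts something this listing does not; the post does not say what.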

  • Do these stats look bad?  The message queue overflow seems high.

    Kiwi Syslog Server [Registered] Version 9.0.3


    ///       Kiwi Syslog Server Statistics         ///
    ---------------------------------------------------
    24 hour period ending on: Fri, 25 Mar 2011 13:00:02
    Syslog Server started on: Sat, 23 Oct 2010 15:38:25
    Syslog Server uptime:     152 days, 19 hours, 4 minutes
    ---------------------------------------------------

    + Messages received - Total:          2593467624
    + Messages received - Last 24 hours:  21222978
    + Messages received - Since Midnight: 12207601
    + Messages received - Last hour:      1213039
    + Message queue overflow - Last hour: 10011
    + Messages received - This hour:      104810
    + Message queue overflow - This hour: 0
    + Messages per hour - Average:        879924

    + Messages forwarded:                 8471406
    + Messages logged to disk:            12207733

    + Errors - Logging to disk:           0
    + Errors - Invalid priority tag:      0
    + Errors - No priority tag:           92
    + Errors - Oversize message:          4874

    + Disk space remaining on drive E:    38277 MB

    ---------------------------------------------------


         Breakdown of Syslog messages by sending host 
    +--------------------------+------------+------------+
    | Top 20 Hosts             |  Messages  | Percentage |
    +--------------------------+------------+------------+
    | 10.131.0.3               |   4626290  |     37.90% |
    | 10.5.101.5               |   2272092  |     18.61% |
    | 10.5.33.10               |   1728762  |     14.16% |
    | 10.5.33.12               |    725170  |      5.94% |
    | 10.5.33.11               |    614549  |      5.03% |
    | 10.33.1.8                |    290712  |      2.38% |
    | 10.33.1.5                |    259071  |      2.12% |
    | 10.33.1.7                |    242798  |      1.99% |
    | 10.33.1.11               |    188978  |      1.55% |
    | 10.33.1.9                |    147855  |      1.21% |
    | 10.33.1.6                |    125941  |      1.03% |
    | 10.55.5.5                |    106657  |      0.87% |
    | 10.6.41.10               |    101783  |      0.83% |
    | 10.6.41.11               |     97989  |      0.80% |
    | 10.131.1.25              |     77319  |      0.63% |
    | 10.45.1.5                |     75323  |      0.62% |
    | 10.33.1.12               |     72554  |      0.59% |
    | 10.33.1.10               |     65966  |      0.54% |
    | 10.33.52.5               |     57346  |      0.47% |
    | 10.33.1.13               |     45710  |      0.37% |
    | All others (185)         |    284736  |      2.33% |
    +--------------------------+------------+------------+


        Breakdown of Syslog messages by severity  
    +--------------------+------------+------------+
    | Message Level      |  Messages  | Percentage |
    +--------------------+------------+------------+
    | 0 - Emerg          |         0  |      0.00% |
    | 1 - Alert          |      6419  |      0.05% |
    | 2 - Critical       |     51755  |      0.42% |
    | 3 - Error          |    121884  |      1.00% |
    | 4 - Warning        |   1672726  |     13.70% |
    | 5 - Notice         |    872894  |      7.15% |
    | 6 - Info           |   9153617  |     74.98% |
    | 7 - Debug          |    328306  |      2.69% |
    +--------------------+------------+------------+

    Custom statistics
    -----------------
    CustomStats01: 0
    CustomStats02: 0
    CustomStats03: 0
    CustomStats04: 0
    CustomStats05: 0
    CustomStats06: 0
    CustomStats07: 0
    CustomStats08: 0
    CustomStats09: 0
    CustomStats10: 0
    CustomStats11: 0
    CustomStats12: 0
    CustomStats13: 0
    CustomStats14: 0
    CustomStats15: 0
    CustomStats16: 0

    End of Report.


    DNS Cache size  20000
    DNS Cache entries 2
    Entries in queue 0
    DNS Cache hits  0
    DNS Cache misses 0
    DNS Cache TTL  1440 minutes
    Total DNS Lookups 0
    Successful cache hits 0%


    IP Address Hostname TTL (minutes)
    127.0.0.1       localhost Static
    ::1             localhost Static


    Message Buffer Information
    ==========================
    Message Queue Max Size: 20000
    Message Queue overflow: 142905674
    Message Count:          2808
    Message Count Max:      20000
    Percentage free:        86

     

    E-mail Buffer Information
    ==========================
    Message Queue Max Size: 1000
    Message Queue overflow: 0
    Message Count:          143
    Message Count Max:      241
    Percentage free:        86

     

     


    End of Diagnostics report

  • Mikeyjay,

    An overflow value of anything other than 0 is bad as it means that syslog messages are being dropped.  The overflow is likely because your Queue Max Size is 20 thousand.  If you upgrade to Kiwi Syslog Server 9.2.1, this will automatically be increased to 500 thousand, which should avoid any overflows in the future.
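
    As a rough illustration of what the larger queue buys, the sketch below estimates how long a burst can last before an initially empty queue overflows. The arrival and processing rates are hypothetical placeholders, not measurements from this server, and a queue of any size will still overflow if the sustained arrival rate stays above what the server can process.

```python
def seconds_until_overflow(queue_size, arrival_per_s, service_per_s):
    """How long a burst can last before an initially empty queue overflows."""
    surplus = arrival_per_s - service_per_s
    if surplus <= 0:
        return float("inf")  # the server keeps up; the queue never fills
    return queue_size / surplus


# Hypothetical burst: 3,000 msgs/s arriving while 2,500 msgs/s are processed.
old_queue = seconds_until_overflow(20_000, 3000, 2500)
new_queue = seconds_until_overflow(500_000, 3000, 2500)

print(f"queue of 20,000:  {old_queue:.0f} s of burst absorbed")
print(f"queue of 500,000: {new_queue:.0f} s of burst absorbed")
```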

    Sincerely,

    Chris Foley | Support Specialist
    SolarWinds | IT Management, Inspired By You
    Support:866.530.8040 || Fax:512.857.0125
