Kiwi Syslog Server High CPU Utilization - Messages Seem to be behind

mikeyjay over 13 years ago

The CPU on my Kiwi Syslog Server is pegged. Here is the diagnostic info file from the server.

Kiwi Syslog Server [Registered] Version 9.0.3

/// Kiwi Syslog Server Statistics ///
---------------------------------------------------
24 hour period ending on: Wed, 08 Sep 2010 14:44:34
Syslog Server started on: Wed, 08 Sep 2010 13:37:39
Syslog Server uptime: 1 hour, 7 minutes
---------------------------------------------------

+ Messages received - Total:          1098753
+ Messages received - Last 24 hours: 1098753
+ Messages received - Since Midnight: 1098753
+ Messages received - Last hour:      996804
+ Message queue overflow - Last hour: 416654
+ Messages received - This hour:      101949
+ Message queue overflow - This hour: 12336
+ Messages per hour - Average:        996804

+ Messages forwarded: 769810
+ Messages logged to disk: 1194581

+ Errors - Logging to disk:           0
+ Errors - Invalid priority tag:      0
+ Errors - No priority tag:           2
+ Errors - Oversize message:          309

+ Disk space remaining on drive E:    41554 MB

    Breakdown of Syslog messages by severity
+--------------------+------------+------------+
| Message Level      |   Messages | Percentage |
+--------------------+------------+------------+
| 0 - Emerg          |          0 |      0.00% |
| 1 - Alert          |       2753 |      0.25% |
| 2 - Critical       |        496 |      0.05% |
| 3 - Error          |       5745 |      0.52% |
| 4 - Warning        |     103603 |      9.43% |
| 5 - Notice         |      42938 |      3.91% |
| 6 - Info           |     775902 |     70.62% |
| 7 - Debug          |     167316 |     15.23% |
+--------------------+------------+------------+

Custom statistics
-----------------
CustomStats01: 0
CustomStats02: 0
CustomStats03: 0
CustomStats04: 0
CustomStats05: 0
CustomStats06: 0
CustomStats07: 0
CustomStats08: 0
CustomStats09: 0
CustomStats10: 0
CustomStats11: 0
CustomStats12: 0
CustomStats13: 0
CustomStats14: 0
CustomStats15: 0
CustomStats16: 0

End of Report.

DNS Cache size  20000
DNS Cache entries 2
Entries in queue 0
DNS Cache hits  0
DNS Cache misses 0
DNS Cache TTL  1440 minutes
Total DNS Lookups 0
Successful cache hits 0%

IP Address Hostname TTL (minutes)
127.0.0.1 localhost Static
::1 localhost Static

Message Buffer Information
==========================
Message Queue Max Size: 20000
Message Queue overflow: 428990
Message Count:          19932
Message Count Max:      20000
Percentage free:        1

E-mail Buffer Information
==========================
Message Queue Max Size: 1000
Message Queue overflow: 0
Message Count:          0
Message Count Max:      13
Percentage free:        100

MarieB over 13 years ago

Hi Mike,
Marked for the PM.
M

Kuz over 13 years ago

Hi Mike,
Here's your problem: Message Queue overflow: 428990
That means messages are being dropped because the server is overloaded.
Check out this topic from our online help: "How to increase the Message Buffer Size"
http://www.kiwisyslog.com/help/syslog/adv_reg_msg_buffer_size.htm
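For anyone checking a report like the one above, here is a minimal Python sketch that pulls the overflow counter out of the diagnostics text. It assumes the plain-text layout shown in this thread (section headings ending in "Buffer Information", followed by "Key: value" lines); the function names are hypothetical and this is not a Kiwi API. The percentage also assumes that "Messages received - Total" includes the messages that later overflowed, which the report does not state.

```python
from __future__ import annotations

import re


def parse_buffer_sections(report: str) -> dict[str, dict[str, int]]:
    """Collect the integer fields listed under each '... Buffer Information' heading."""
    sections: dict[str, dict[str, int]] = {}
    current = None
    for raw in report.splitlines():
        line = raw.strip()
        if line.endswith("Buffer Information"):
            current = line
            sections[current] = {}
        elif current is not None and ":" in line:
            key, _, value = line.partition(":")
            if value.strip().isdigit():
                sections[current][key.strip()] = int(value.strip())
    return sections


def overflow_summary(report: str) -> tuple[int, float | None]:
    """Return (message queue overflow count, overflow as a fraction of messages received)."""
    overflow = parse_buffer_sections(report)["Message Buffer Information"]["Message Queue overflow"]
    match = re.search(r"Messages received - Total:\s*(\d+)", report)
    # Assumption: "Messages received - Total" also counts messages that later overflowed.
    fraction = overflow / int(match.group(1)) if match else None
    return overflow, fraction


# Excerpt of the report above, trimmed to the relevant lines.
sample = """\
+ Messages received - Total:          1098753
Message Buffer Information
==========================
Message Queue Max Size: 20000
Message Queue overflow: 428990
E-mail Buffer Information
==========================
Message Queue overflow: 0
"""

overflow, fraction = overflow_summary(sample)
if overflow:
    print(f"Overflow is {overflow}: messages are being dropped (~{fraction:.0%} of received)")
```

Parsing by section keeps the e-mail buffer's own "Message Queue overflow" line from being mistaken for the syslog message queue.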

Kind Regards,

RoyalEF over 13 years ago

Mikeyjay, we have a lot of experience here with pushing Kiwi to the max:

+ Messages received - Last 24 hours: 134,712,626 (commas added for readability)

In my experience, two things primarily control Kiwi's throughput:
    DISK THROUGHPUT
    - Larger volumes require a RAID array to increase write throughput.
    - NTFS compression works wonders for increasing disk throughput, but hits the CPU a little more.
    - Antivirus scanning on your log directory will send things towards the toilet fast. Exclude your log directories from on-demand scans.

    CPU (processes the logic of your rules)
    - If your CPU is maxed out because it is old, or burdened with other necessary functions... well, not much to do there.
    - Kiwi is a single-threaded process. Multiple CPUs only help insofar as other processes can be spread onto the other CPUs. Windows provides very little CPU management, unlike OSes where you can dedicate a processor to a single program/process. CPU features like hyperthreading HURT Kiwi. Turning hyperthreading off buys you a little headroom, but not much.
    - The biggest savings is in the construction of your rules and filters.
Try reordering your rules so that the earliest rules handle the greatest volume of messages.
After you write the message to disk, include a STOP PROCESSING action. Otherwise every message will still be matched against every rule that follows. It takes some careful logic, depending on what you are doing, but you want to STOP processing a message as early in the rules as possible.
Use Simple filters rather than Complex or RegExGrep filters. Each added level of complexity adds CPU load.
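To make the cost of rule order and STOP concrete, here is a toy model (not Kiwi's actual engine) that counts filter evaluations. `Rule`, `matches` and `stop` are hypothetical stand-ins for a Kiwi rule, its filter, and its Stop processing action, and the traffic mix is synthetic, loosely skewed toward low-severity messages (priority 4-7) like the breakdowns in this thread.

```python
from __future__ import annotations

from dataclasses import dataclass, replace
from typing import Callable


@dataclass(frozen=True)
class Rule:
    name: str
    matches: Callable[[dict], bool]  # stands in for a Kiwi filter
    stop: bool = True                # stands in for a "Stop processing" action


def filter_checks(rules: list[Rule], messages: list[dict]) -> int:
    """Count how many filters are evaluated as each message walks the rules in order."""
    checks = 0
    for msg in messages:
        for rule in rules:
            checks += 1
            if rule.matches(msg) and rule.stop:
                break  # later rules never see this message
    return checks


# Synthetic traffic: 998 Info messages (priority 6) and 2 Alerts (priority 1).
messages = [{"priority": 6}] * 998 + [{"priority": 1}] * 2

bulk = Rule("priority 4-7", lambda m: 4 <= m["priority"] <= 7)
alert = Rule("priority 1", lambda m: m["priority"] == 1)
catch_all = Rule("catch-all", lambda m: True)

orderings = {
    "high-volume rule first, with STOP": [bulk, alert, catch_all],
    "rare rule first, with STOP": [alert, bulk, catch_all],
    "high-volume rule first, no STOP": [replace(r, stop=False) for r in (bulk, alert, catch_all)],
}
for label, rules in orderings.items():
    print(f"{label}: {filter_checks(rules, messages)} filter checks")
```

With STOP and the busiest rule first, almost every message costs a single filter check; putting the rare rule first roughly doubles the work, and dropping STOP makes every message pay for every rule.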

Hope those touch on something that will help.

michael248363 over 13 years ago in reply to RoyalEF

RoyalEF,
Can you provide some details on the server hardware that you are using?
Is it physical or virtual?
Is this a single server?
If more than one, are you load balancing between servers?
What type of rules are you doing on the messages coming in?
We are looking at logging to disk (no other rules or actions) ~350,000,000 msg/day and I'm looking for guidance on what hardware it would take.
Thanks,
Michael

RoyalEF over 13 years ago in reply to michael248363

We currently have 3 syslog servers, all physical. This IT shop has poor skills when it comes to disk & SAN performance. Maintenance and capacity planning are nonexistent. I would never trust a virtual server here for a high-performance system. If you have a tight handle on how single-threaded CPU processing and disk performance behave within the virtual server, you could make it work for these peak demands. Another issue is that when other virtual servers are created, they can silently reduce the capacity available to a hungry app like this.
One server is dedicated to a FW that handles all internet browsing traffic (10 million msg per hour peak). The second handles all other firewalls (DMZs, server farms). The third handles all other devices (over 1,000).
The pain is the CPU. Kiwi is a single-threaded app. Most benchmarks now emphasize multi-core/multi-CPU performance, not single-thread performance, so it can be tough to find CPU benchmarks that focus on single-thread performance. I can't find the link I had to single-thread benchmarks (one of the only ones I found). As always with hardware, it is about evaluating that day's offerings within the context of your budget.
More CPUs/cores do not increase Kiwi performance. The Core 2 Duo E8600 @ 3.3 GHz outperformed all quad-cores for a good year or two. Hyperthreading should be turned off.
Here's the hardware we are using today:
Intel Xeon X5677 @ 3.47 GHz, hyperthreading disabled
12 GB memory (overkill, standard in this shop)
Data drive: RAID 5 array of four 600 GB 15K RPM SAS 3.5" drives (Seagate ST3600057SS)
The data drive is a budget compromise. RAID 1+0/0+1 always performs better than RAID 5, but we were looking to significantly increase capacity. Disk throughput is not an issue for us, even when we are grepping through hundreds of megabytes of firewall logs.
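The capacity-versus-performance trade-off behind that choice can be sketched with the usual rule-of-thumb write-penalty model: 4 back-end I/Os per small random write on RAID 5, 2 on RAID 10. The 175 IOPS per 15K drive is an assumed ballpark figure, not a measurement from this setup, and the model covers small random writes only; Kiwi's appends to log files are largely sequential, so it likely overstates the RAID 5 penalty for this workload.

```python
from dataclasses import dataclass

# Assumed rule-of-thumb random IOPS for one 15K RPM SAS drive (not a measurement).
DISK_IOPS = 175


@dataclass
class RaidEstimate:
    level: str
    usable_gb: int
    random_write_iops: float


def estimate(level: str, disks: int, disk_gb: int, disk_iops: int = DISK_IOPS) -> RaidEstimate:
    """Usable capacity and random-write IOPS under the classic write-penalty model."""
    if level == "RAID 5":
        usable = (disks - 1) * disk_gb  # one disk's worth of capacity holds parity
        penalty = 4                     # read data, read parity, write data, write parity
    elif level == "RAID 10":
        usable = disks // 2 * disk_gb   # every block is mirrored
        penalty = 2                     # each write goes to both halves of a mirror
    else:
        raise ValueError(f"unsupported level: {level}")
    return RaidEstimate(level, usable, disks * disk_iops / penalty)


for level in ("RAID 5", "RAID 10"):
    print(estimate(level, disks=4, disk_gb=600))
```

With four 600 GB drives, RAID 5 gives 1800 GB usable against 1200 GB for RAID 10, at half the modeled random-write rate.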
If you don't have the disk throughput you need, you can effectively improve it by using NTFS compressed folders. For a small CPU hit you get much greater disk throughput, because syslogs compress 4 to 1: 1 MB of data read off the disk controller becomes 4 MB in memory. It paid off greatly on our older hardware, which didn't have great disk throughput.
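The arithmetic is simply: effective read rate = raw disk rate × compression ratio. The sketch below shows it, using Python's `zlib` only as a stand-in to measure a ratio on synthetic ASA-style lines; NTFS uses its own LZNT1 compression, so real ratios will differ, and the 100 MB/s disk figure is hypothetical.

```python
import zlib


def compression_ratio(data: bytes, level: int = 6) -> float:
    """Uncompressed size divided by zlib-compressed size."""
    return len(data) / len(zlib.compress(data, level))


def effective_read_mb_s(disk_mb_s: float, ratio: float) -> float:
    """Logical MB/s of log text delivered when each MB read from disk expands by `ratio`."""
    return disk_mb_s * ratio


# Synthetic ASA-style lines (illustrative only, not real traffic).
lines = [
    f"%ASA-6-302013: Built outbound TCP connection {100000 + i} for "
    f"outside:203.0.113.{i % 250}/443 (203.0.113.{i % 250}/443) to "
    f"inside:10.1.1.{i % 200}/{50000 + i % 15000} (192.0.2.10/{30000 + i % 20000})\n"
    for i in range(5000)
]
sample = "".join(lines).encode()

ratio = compression_ratio(sample)
print(f"zlib ratio on the synthetic sample: {ratio:.1f}:1")
print(f"100 MB/s of disk reads -> about {effective_read_mb_s(100, ratio):.0f} MB/s of log text")
print(f"at the 4:1 ratio quoted above -> {effective_read_mb_s(100, 4):.0f} MB/s")
```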
We use Windows Server 2008 Standard SP2 (32-bit). According to Microsoft's documentation, 64-bit Windows runs 32-bit applications with about 2% overhead. We opted to stay at 32-bit.
We do no load balancing, other than isolating the biggest FW (actually an ASA pair) on one dedicated server.
The ruleset has been optimized. I use DNS-cached names in the directory name; I've tested with and without, and there was no performance hit that I could detect. I provide Kiwi with a file of cached DNS names.
I design my rules around a "log and STOP" methodology, which is a process of elimination.
I have eleven rules. Everything gets logged to the main logs, which are one file per minute. A series of exception logs is created for messages that should be reviewed or are known to appear in some quantity. To have such exception logs, you must first eliminate some noisy messages that always exist but don't require attention. Each rule STOPs processing on a match, steadily eliminating messages. Alerts are sent for the top three (0-2) priority messages. Our FW sends its messages with facility Local3.

99.98% of ASA syslogs are processed and STOP @ rule 1

01) Local 3,Priority 4-7; Log to MAIN LOG [D:\syslog\%HostName\%DateISO\%DateISO_%TimeHH-%TimeMM %HostName (%IPAdd4).txt]; STOP processing
02) Local 3,"-1-10500"   %ASA-1-10500-FailOver testing (1-ALERT);   Log to MAIN LOG; STOP processing
03) Local 3,"-1-106021:" %ASA-1-106021-RevPath (1-ALERT); Log to MAIN LOG; Log to Exception Logs [d:\syslog\P1\!Critical-ReversePath-%DateISO.txt]; STOP processing
04) Local 3,PRIORITY 1-ALERT; Log to MAIN LOG; Log to Exception Logs [D:\syslog\P1\! ALERT %DateISO %HostName (%IPAdd4).txt]; STOP processing
05) Local 3,"-2-10600"   %ASA-2-10600 - basic DENY; (2-CRITICAL);   Log to MAIN LOG; STOP processing
06) Local 3,"-2-106020"|"-2-106016" %ASA-2-106016,20-IDS (2-CRITICAL); Log to MAIN LOG;   Log to Exception Logs [d:\Syslog\P1\!Critical-IDS-%DateISO.txt]; STOP processing
07) PRIORITY 2-CRITICAL; EMAIL Network Staff; Log to MAIN LOG;   Log to Exception Logs [d:\Syslog\P1\!Critical-%DateISO.txt]; STOP processing
08) Local 3,"-3-30500" NO TRANSLATIONS (3-ERROR); Log to MAIN LOG;   Log to Exception Logs [D:\syslog\noTranslations\%DateISO_noTranslations.txt]; STOP processing
09) Local 3,PRIORITY 3-ERROR; Log to MAIN LOG;   Log to Exception Logs [D:\syslog\P1\!Errors %DateISO %HostName (%IPAdd4).txt]; STOP processing
10) PRIORITY 0-EMERGENCY; EMAIL Network Staff; Log to MAIN LOG;   Log to Exception Logs [d:\Syslog\P1\! EMERGENCY-%DateISO.txt]; STOP processing
11) (no filter, default rule); Log to root of Syslog folders [D:\syslog\%DateISO Kiwi-Needs-Rules-for %IPAdd4.txt]
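Rule 01 produces one file per host per minute because its path embeds the date, hour and minute. Here is a sketch of how such a template expands, assuming %DateISO renders as YYYY-MM-DD and %TimeHH/%TimeMM as zero-padded 24-hour values (Kiwi's actual formats may differ); the host name and IP address are hypothetical.

```python
from datetime import datetime, timedelta

# Rule 01's MAIN LOG path, copied from the rule list above.
MAIN_LOG = r"D:\syslog\%HostName\%DateISO\%DateISO_%TimeHH-%TimeMM %HostName (%IPAdd4).txt"


def expand(template: str, host: str, ip: str, when: datetime) -> str:
    """Fill in the placeholders used by the MAIN LOG path (formats are assumptions)."""
    values = {
        "%HostName": host,
        "%DateISO": when.strftime("%Y-%m-%d"),  # assumed YYYY-MM-DD
        "%TimeHH": when.strftime("%H"),         # assumed zero-padded 24-hour clock
        "%TimeMM": when.strftime("%M"),         # assumed zero-padded minutes
        "%IPAdd4": ip,
    }
    for token, value in values.items():
        template = template.replace(token, value)
    return template


# Hypothetical host name and documentation-range IP address.
host, ip = "asa-fw1", "192.0.2.1"
midnight = datetime(2011, 3, 22)
day_files = {expand(MAIN_LOG, host, ip, midnight + timedelta(minutes=m)) for m in range(24 * 60)}

print(len(day_files), "distinct files per host per day")
print(expand(MAIN_LOG, host, ip, datetime(2011, 3, 22, 13, 5)))
```

Because %HostName and %IPAdd4 are part of the name, each sending device gets its own set of 1,440 files per day.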

Rules 2-4 process ALERT level, which is the next largest volume. We eliminate failover and reverse-path messages, then log everything else to a generic Alert log for review.
Rules 5-7 process CRITICAL level. Basic deny messages and noisy IDS messages are eliminated, and anything else is emailed to the team.
Rules 8 & 9 process ERROR level. I track "no translation" messages because on an ASA they warn you that there is no established NAT for private IPs to access the Internet.
Rule 10 emails all emergency messages to the team.
Rule 11 is a catch-all that documents any devices, by IP, that are sending syslogs we didn't anticipate.
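The first-match, log-and-STOP ordering of rules 01-11 can be modeled as an ordered table. This is a simplified sketch, not Kiwi configuration: it only reports which rule would claim a message, ignores the log and e-mail actions, treats the quoted strings as plain substring filters, and uses facility number 19 for Local3 (standard syslog facility numbering). The sample message texts are illustrative.

```python
LOCAL3 = 19  # syslog facility number for local3

# (rule, facility or None = any, allowed priorities or None = any, substrings or None = any)
RULES = [
    ("01", LOCAL3, range(4, 8), None),
    ("02", LOCAL3, None, ("-1-10500",)),
    ("03", LOCAL3, None, ("-1-106021:",)),
    ("04", LOCAL3, {1}, None),
    ("05", LOCAL3, None, ("-2-10600",)),
    ("06", LOCAL3, None, ("-2-106020", "-2-106016")),
    ("07", None, {2}, None),
    ("08", LOCAL3, None, ("-3-30500",)),
    ("09", LOCAL3, {3}, None),
    ("10", None, {0}, None),
]


def first_matching_rule(facility: int, priority: int, text: str) -> str:
    """Return the number of the first rule whose filters all match."""
    for rule, fac, priorities, needles in RULES:
        if fac is not None and facility != fac:
            continue
        if priorities is not None and priority not in priorities:
            continue
        if needles is not None and not any(n in text for n in needles):
            continue
        return rule  # rules 01-10 all end with STOP processing
    return "11"      # default rule: a device that still needs rules


print(first_matching_rule(LOCAL3, 6, "%ASA-6-302013: Built outbound TCP connection"))
print(first_matching_rule(LOCAL3, 1, "%ASA-1-106021: Deny TCP reverse path check"))
print(first_matching_rule(4, 6, "router: interface up"))
```

Order matters: a priority 1 message with "-1-106021:" is claimed by rule 03 before the generic ALERT rule 04 can see it.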

RoyalEF over 13 years ago in reply to michael248363

I think 350 million a day is going to be tough to do without overcoming the CPU limitation. I don't know what your peak is. 350 million / 24 hours = roughly 15 million per hour on average. You probably don't have a flat message rate around the clock, so your peak might be 20 million/hour.

I know I've seen 225 million in a day here. We handle 9 million/hour every day (9 x 24 hours = 216 million). We've peaked above the 10 million/hour mark (10 x 24 = 240 million). Above 10 million/hour we've seen buffering. (BTW, we set Message Queue Max Size = 500000.) We had a RAID issue recently that might have lowered our throughput, so maybe we can handle more on this new hardware. A single CPU hits 100% when we max out; the others are idle.

I'm certain we can't handle 20 million/hour with this setup. I'd be looking at some other setup (likely a multi-threaded product).
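The same arithmetic can be expressed per second and compared against the roughly 10 million/hour point where buffering started on this hardware. This is a sketch only: the 20 million/hour peak is the guess from above, not a measurement, and the ceiling depends on the CPU and ruleset. A ratio above 1 means the peak exceeds what a single Kiwi instance handled here.

```python
from __future__ import annotations

# Assumed ceiling: buffering was seen above ~10M msgs/hour on the hardware described above.
OBSERVED_CEILING_PER_HOUR = 10_000_000


def capacity(msgs_per_day: float, peak_per_hour: float) -> dict[str, float]:
    """Average and peak message rates, plus the peak relative to the observed ceiling."""
    avg_per_hour = msgs_per_day / 24
    return {
        "avg_per_hour": avg_per_hour,
        "avg_per_sec": avg_per_hour / 3600,
        "peak_per_sec": peak_per_hour / 3600,
        "peak_vs_ceiling": peak_per_hour / OBSERVED_CEILING_PER_HOUR,
    }


for name, value in capacity(350_000_000, 20_000_000).items():
    print(f"{name:>15}: {value:,.2f}")
```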

I'm not a VMware/ESX guy, so I don't know how much you can virtualize to overcome such a limit. Could you virtualize multiple CPU cores into a single mega-core, so that the single syslogd process would essentially be spread across multiple cores? I don't know if virtualization has that capability yet.

michael248363 over 13 years ago in reply to RoyalEF

On each of your servers, how much of the 12GB of RAM is being used?
How big are your log files? On your server that logs the internet traffic, what is the size of each of the 1-minute log files? I've been doing some calculations based on what we are logging now, which is nowhere near what we will be doing, and I'm sure they are off quite a bit. So I just wanted to get an idea of what a real-world number would be.
Thanks
Michael

RoyalEF over 13 years ago in reply to michael248363

On our busiest box, which was upgraded to Kiwi 9.2 two days ago:
    Windows 2008 32-bit (so only 4 GB available)
    Physical memory usage: 1.57 GB
We increased the Overflow Queue many years ago. When Kiwi maxes out the CPU, that buffer fills and the syslogd process shows an additional 200 MB of memory usage. Normally it consumes only 36 MB.

At the end of this message is a day's distribution of minute-by-minute log file sizes (in megs). They range from 4 MB to 39 MB. This is for a Tuesday, when we have classes (we are a university) as well as residences creating demand. The total for the day was 47.5 GB.
Our slowest period is typically 3 AM to 6 AM. Our busiest periods are 9 AM-12 PM and 1 PM-5 PM.
One caveat, which I've forgotten in these discussions: we elected NOT to log URLs five and a half years ago. We do not perform any WebSense-style content filtering. If you syslog the URL on your FW, your data files will balloon far beyond ours. A single URL can be 500-1000 characters, so it can get intensive quickly. We disabled this message on our ASAs because it did not add value to our troubleshooting, and because it is one of the most privacy-invasive things you can log. We have residences here, and what people do in the privacy of their rooms is an area where we have to tread carefully.
4000000 Count    3
5000000 Count    94
6000000 Count    47
7000000 Count    57
8000000 Count    34
9000000 Count    23
10000000 Count    37
11000000 Count    37
12000000 Count    32
13000000 Count    27
14000000 Count    38
15000000 Count    62
16000000 Count    71
17000000 Count    99
18000000 Count    117
19000000 Count    68
20000000 Count    34
21000000 Count    28
22000000 Count    18
23000000 Count    23
24000000 Count    48
25000000 Count    44
26000000 Count    30
27000000 Count    26
28000000 Count    29
29000000 Count    54
30000000 Count    52
31000000 Count    64
32000000 Count    60
33000000 Count    35
34000000 Count    21
35000000 Count    7
36000000 Count    6
37000000 Count    5
38000000 Count    5
39000000 Count    4
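A small sketch for summarizing the distribution above, with the bucket labels converted from bytes to MB (4000000 becomes 4); `HISTOGRAM` and the helper functions are illustrative names. It counts 1,439 files, roughly one per minute of the day, and the median minute falls in the 18 MB bucket.

```python
from __future__ import annotations

# Minute-by-minute file-size distribution from the post: MB bucket -> number of files.
HISTOGRAM = {
    4: 3, 5: 94, 6: 47, 7: 57, 8: 34, 9: 23, 10: 37, 11: 37, 12: 32,
    13: 27, 14: 38, 15: 62, 16: 71, 17: 99, 18: 117, 19: 68, 20: 34,
    21: 28, 22: 18, 23: 23, 24: 48, 25: 44, 26: 30, 27: 26, 28: 29,
    29: 54, 30: 52, 31: 64, 32: 60, 33: 35, 34: 21, 35: 7, 36: 6,
    37: 5, 38: 5, 39: 4,
}


def median_bucket(hist: dict[int, int]) -> int:
    """Smallest bucket whose cumulative count reaches half of all files."""
    total = sum(hist.values())
    running = 0
    for bucket in sorted(hist):
        running += hist[bucket]
        if running * 2 >= total:
            return bucket
    raise ValueError("empty histogram")


def files_at_least(hist: dict[int, int], mb: int) -> int:
    """Number of one-minute files in buckets of `mb` MB or more."""
    return sum(count for bucket, count in hist.items() if bucket >= mb)


print("files:", sum(HISTOGRAM.values()))  # about one per minute (a day has 1,440)
print("median bucket (MB):", median_bucket(HISTOGRAM))
print("minutes at 30 MB or more:", files_at_least(HISTOGRAM, 30))
```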

mikeyjay over 13 years ago in reply to Kuz

Do these stats look bad? It seems like the Message Queue overflow is high.
Kiwi Syslog Server [Registered] Version 9.0.3

///       Kiwi Syslog Server Statistics         ///
---------------------------------------------------
24 hour period ending on: Fri, 25 Mar 2011 13:00:02
Syslog Server started on: Sat, 23 Oct 2010 15:38:25
Syslog Server uptime:     152 days, 19 hours, 4 minutes
---------------------------------------------------
+ Messages received - Total:          2593467624
+ Messages received - Last 24 hours: 21222978
+ Messages received - Since Midnight: 12207601
+ Messages received - Last hour:      1213039
+ Message queue overflow - Last hour: 10011
+ Messages received - This hour:      104810
+ Message queue overflow - This hour: 0
+ Messages per hour - Average:        879924
+ Messages forwarded:                 8471406
+ Messages logged to disk:            12207733
+ Errors - Logging to disk:           0
+ Errors - Invalid priority tag:      0
+ Errors - No priority tag:           92
+ Errors - Oversize message:          4874
+ Disk space remaining on drive E:    38277 MB
---------------------------------------------------

     Breakdown of Syslog messages by sending host
+--------------------------+------------+------------+
| Top 20 Hosts             |   Messages | Percentage |
+--------------------------+------------+------------+
| 10.131.0.3               |    4626290 |     37.90% |
| 10.5.101.5               |    2272092 |     18.61% |
| 10.5.33.10               |    1728762 |     14.16% |
| 10.5.33.12               |     725170 |      5.94% |
| 10.5.33.11               |     614549 |      5.03% |
| 10.33.1.8                |     290712 |      2.38% |
| 10.33.1.5                |     259071 |      2.12% |
| 10.33.1.7                |     242798 |      1.99% |
| 10.33.1.11               |     188978 |      1.55% |
| 10.33.1.9                |     147855 |      1.21% |
| 10.33.1.6                |     125941 |      1.03% |
| 10.55.5.5                |     106657 |      0.87% |
| 10.6.41.10               |     101783 |      0.83% |
| 10.6.41.11               |      97989 |      0.80% |
| 10.131.1.25              |      77319 |      0.63% |
| 10.45.1.5                |      75323 |      0.62% |
| 10.33.1.12               |      72554 |      0.59% |
| 10.33.1.10               |      65966 |      0.54% |
| 10.33.52.5               |      57346 |      0.47% |
| 10.33.1.13               |      45710 |      0.37% |
| All others (185)         |     284736 |      2.33% |
+--------------------------+------------+------------+

    Breakdown of Syslog messages by severity
+--------------------+------------+------------+
| Message Level      |   Messages | Percentage |
+--------------------+------------+------------+
| 0 - Emerg          |          0 |      0.00% |
| 1 - Alert          |       6419 |      0.05% |
| 2 - Critical       |      51755 |      0.42% |
| 3 - Error          |     121884 |      1.00% |
| 4 - Warning        |    1672726 |     13.70% |
| 5 - Notice         |     872894 |      7.15% |
| 6 - Info           |    9153617 |     74.98% |
| 7 - Debug          |     328306 |      2.69% |
+--------------------+------------+------------+
Custom statistics
-----------------
CustomStats01: 0
CustomStats02: 0
CustomStats03: 0
CustomStats04: 0
CustomStats05: 0
CustomStats06: 0
CustomStats07: 0
CustomStats08: 0
CustomStats09: 0
CustomStats10: 0
CustomStats11: 0
CustomStats12: 0
CustomStats13: 0
CustomStats14: 0
CustomStats15: 0
CustomStats16: 0
End of Report.

DNS Cache size  20000
DNS Cache entries 2
Entries in queue 0
DNS Cache hits  0
DNS Cache misses 0
DNS Cache TTL  1440 minutes
Total DNS Lookups 0
Successful cache hits 0%

IP Address Hostname TTL (minutes)
127.0.0.1      localhost Static
::1            localhost Static

Message Buffer Information
==========================
Message Queue Max Size: 20000
Message Queue overflow: 142905674
Message Count:          2808
Message Count Max:      20000
Percentage free:        86

E-mail Buffer Information
==========================
Message Queue Max Size: 1000
Message Queue overflow: 0
Message Count:          143
Message Count Max:      241
Percentage free:        86

End of Diagnostics report

Fodome over 13 years ago

Mikeyjay,
An overflow value of anything other than 0 is bad, as it means that syslog messages are being dropped. The overflow is likely because your Queue Max Size is 20 thousand. If you upgrade to Kiwi Syslog Server 9.2.1, it will automatically be increased to 500 thousand, which should avoid any overflows in the future.
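A larger queue buys time during bursts; if messages keep arriving faster than Kiwi can process them, any queue size eventually overflows. Here is a back-of-the-envelope sketch, assuming an initially empty queue, a constant processing rate of about 10 million messages/hour (the ceiling RoyalEF reported earlier in this thread, which depends on hardware and rules), and a hypothetical burst of 3,000 messages/second.

```python
def seconds_until_overflow(queue_size: int, arrival_per_sec: float, service_per_sec: float) -> float:
    """Seconds until an initially empty queue fills during a sustained burst."""
    excess = arrival_per_sec - service_per_sec
    if excess <= 0:
        return float("inf")  # processing keeps up, so the queue never fills
    return queue_size / excess


service = 10_000_000 / 3600  # assumed: ~10M msgs/hour processing rate
arrival = 3_000              # hypothetical burst rate in msgs/second

for size in (20_000, 500_000):
    seconds = seconds_until_overflow(size, arrival, service)
    print(f"Queue Max Size {size:,}: full after about {seconds:,.0f} s")
```

Under these assumptions the 20,000-entry queue fills in about a minute and a half, while 500,000 entries absorb the same burst for over half an hour.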
Sincerely,
Chris Foley | Support Specialist
SolarWinds | IT Management, Inspired By You
Support:866.530.8040 || Fax:512.857.0125
---------------------------------------
explore our IT management solutions for:
networks | applications | storage | virtualization