Console not responsive and keeps crashing

harrijs over 10 years ago

I am experiencing horrible performance in the console.

The console session will crash my browser session.

This is interesting because the appliance has been running for 8 weeks during a demo with no issues. The day we applied our license everything got borked.

I started to troubleshoot the problem and am coming up empty-handed.

The interesting thing I am noticing is a negative number of alerts waiting in memory. The console is also almost a day behind on processing events.

cmc::acm# diskusage

Checking Disk Usage (this could take a moment)... ....oo.oo.oo.oo.oo.oo.oo.

Partition Disk Usage:

LEM: 34% (965M/3.0G)

OS: 40% (1.1G/3.0G)

Logs/Data: 43% (95G/234G)

Temp: 4% (179M/5.9G)

Database Queue(s): 4.0K (No alerts queued, -4241689380 alerts waiting in memory)

Rules Queue: 2.1M (0 alerts queued, 0 alerts waiting in memory)

Console Queue: 2.1M (0 alerts queued, 0 alerts waiting in memory)

DataCenter Queue: 2.1M (0 alerts queued, 0 alerts waiting in memory)

EPIC Rules Queue: 2.1M (0 alerts queued, 0 alerts waiting in memory)

Forensic Database Queue: 2.1M (0 data queued, 0 data items waiting in memory)

Logs: 64G

Tool Profiles Message Queue: 2.1M (0 alerts queued, 0 alerts waiting in memory)
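
For anyone who wants to spot this kind of anomaly without reading the whole dump, here is a minimal sketch that scans pasted diskusage text for queues reporting a negative "waiting in memory" count. It is not part of the appliance tooling; the function name `negative_queues` and the regular expression are my own assumptions, based only on the line format shown above.

```python
import re

# Matches queue lines as they appear in the diskusage output above, e.g.
#   "Rules Queue: 2.1M (0 alerts queued, 0 alerts waiting in memory)"
# The pattern is an assumption derived from this one sample, not a documented format.
QUEUE_LINE = re.compile(
    r"^(?P<name>[^:]+):\s+\S+\s+\(.*?"
    r"(?P<waiting>-?\d+) (?:alerts|data items) waiting in memory\)$"
)

def negative_queues(diskusage_output):
    """Return {queue name: waiting count} for queues with a negative count."""
    flagged = {}
    for line in diskusage_output.splitlines():
        match = QUEUE_LINE.match(line.strip())
        if match and int(match.group("waiting")) < 0:
            flagged[match.group("name")] = int(match.group("waiting"))
    return flagged

sample = """\
Database Queue(s): 4.0K (No alerts queued, -4241689380 alerts waiting in memory)
Rules Queue: 2.1M (0 alerts queued, 0 alerts waiting in memory)
Forensic Database Queue: 2.1M (0 data queued, 0 data items waiting in memory)
"""
print(negative_queues(sample))
```

On the lines pasted above, only the Database Queue(s) entry is flagged.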

cmc::cmm# viewsysinfo
Collecting general system information......... done.
                  CMC version: 4737
                  The time is: 2014/03/18 17:07:43
            Machine uptime is: 58 min
             Linux version is: 3.2.0-3-amd64
      Machine architecture is: 64-bit
         Physical Memory info:
                              MemTotal:       16474176 kB
                              MemFree:         6024856 kB
               Number of CPUs: 4
                    CPU model: Intel(R) Xeon(R) CPU E5-2420 0 @ 1.90GHz
       Memory Overcommit is default.
      ---------------------------------
    TriGeo manager version is: 5.7.0
      TriGeo manager build is: release
      TriGeo upgrade build is: 520398
          Product Support Key: XXXXXXXXXXXXXXXXXXXXX
                 License Type: Commercial
Total Number of Node Licenses: 50
           Available Licenses: 40
   Manager heap configuration: Initial heap size is 5300M and maximum heap size is 5300M
    Max # of alerts in memory: 600000
              Flow configured: false

    Virtualization Platform: VMware
    ------------------------------------------
    Clock
      Synchronization : Enabled
      Hypervisor Time : 18 Mar 2014 16:59:29
      Guest Time      : Tue Mar 18 17:07:44 2014

    CPU
      Speed           : 1900 MHz
      Reservation     : 2000 MHz
      Limit           : Unlimited

    Memory
      Reservation     : 16384 MB
      Limit           : Unlimited
      Swapped         : 0 MB
      Ballooned       : 0 MB

cmc::acm# top
Press <enter> to view manager CPU/memory statistics with "top" (use q to quit)
top - 17:09:49 up 1:00, 1 user, load average: 0.98, 0.85, 1.69
Tasks: 74 total,   1 running, 72 sleeping,   0 stopped,   1 zombie
Cpu(s): 58.5%us, 2.2%sy, 0.0%ni, 39.2%id, 0.0%wa, 0.0%hi, 0.1%si, 0.0%st
Mem: 16474176k total, 10224316k used, 6249860k free,    36092k buffers
Swap:   995996k total,        0k used,   995996k free, 4520524k cached

PID USER      PR NI VIRT RES SHR S %CPU %MEM    TIME+ COMMAND
1328 trigeo    20   0 13.5g 5.3g 43m S 241 33.5 116:47.35 java
915 root      20   0 49812 2584 1868 S    2 0.0   2:02.18 syslog-ng

Top Replies

curtisi over 10 years ago

   Clock
      Synchronization : Enabled
      Hypervisor Time : 18 Mar 2014 16:59:29
      Guest Time      : Tue Mar 18 17:07:44 2014

You have a discrepancy of more than 8 minutes between the host and guest clocks, and this could cause problems. Can you go to APPLIANCE in the CMC shell and run DATECONFIG? Press Enter 4 times to see the current time, and re-run it to correct the time.
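
To put a number on the gap, here is a small sketch that computes the difference between the two timestamps quoted above. It assumes both values are in the same time zone (the output does not say) and that Python is running with an English locale, since `%a` and `%b` are locale-dependent. The function name `clock_skew_seconds` is just for illustration.

```python
from datetime import datetime

# Timestamps copied from the "Clock" section of the viewsysinfo output.
# Assumption: both are expressed in the same time zone.
hypervisor = datetime.strptime("18 Mar 2014 16:59:29", "%d %b %Y %H:%M:%S")
guest = datetime.strptime("Tue Mar 18 17:07:44 2014", "%a %b %d %H:%M:%S %Y")

def clock_skew_seconds(hypervisor_time, guest_time):
    """Positive result means the guest clock is ahead of the hypervisor."""
    return (guest_time - hypervisor_time).total_seconds()

skew = clock_skew_seconds(hypervisor, guest)
minutes, seconds = divmod(int(skew), 60)
print(f"Guest is ahead by {minutes} min {seconds} s")  # Guest is ahead by 8 min 15 s
```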

harrijs over 10 years ago

Thanks for pointing this out. I didn't notice it before.
Before I ran the dateconfig command as you suggested, I ran viewsysinfo again to see what the disparity was between the hypervisor and the guest. It was a lot closer than yesterday.
cmc::cmm# viewsysinfo
Collecting general system information......... done.
                  CMC version: 4737
                  The time is: 2014/03/19 16:26:48
            Machine uptime is: 1 day, 17 min
             Linux version is: 3.2.0-3-amd64
      Machine architecture is: 64-bit
         Physical Memory info:
                              MemTotal:       16474176 kB
                              MemFree:         2650488 kB
               Number of CPUs: 4
                    CPU model: Intel(R) Xeon(R) CPU E5-2420 0 @ 1.90GHz
       Memory Overcommit is default.
      ---------------------------------
    TriGeo manager version is: 5.7.0
      TriGeo manager build is: release
      TriGeo upgrade build is: 520398
          Product Support Key: R3WME-BSRB6-92AA-YLNG-DBPRN-X7A2R
                 License Type: Commercial
Total Number of Node Licenses: 50
           Available Licenses: 40
   Manager heap configuration: Initial heap size is 5300M and maximum heap size is 5300M
    Max # of alerts in memory: 600000
              Flow configured: false
    Virtualization Platform: VMware
    ------------------------------------------
    Clock
      Synchronization : Enabled
      Hypervisor Time : 19 Mar 2014 16:26:27
      Guest Time      : Wed Mar 19 16:26:49 2014
    CPU
      Speed           : 1900 MHz
      Reservation     : 2000 MHz
      Limit           : Unlimited
    Memory
      Reservation     : 16384 MB
      Limit           : Unlimited
      Swapped         : 0 MB
      Ballooned       : 0 MB
After running the command from the appliance prompt, I received the following output.
cmc::acm# dateconfig
Press <enter> to update your manager's current date and time
Enter the current date in month/day/year format. (MM/DD/YYYY)>
Enter the current time in hour:minute format. (hh:mm)>
setting date to ...
Wed Mar 19 16:28:51 CDT 2014
The problem is that this is almost 10 minutes behind the time my NTP server is showing. I am going to check my VMware configuration for NTP settings to make sure they are working correctly.

harrijs over 10 years ago

After a little more investigating, we found that the VM setting for synchronizing guest time with the host was checked. This is not a setting that we modify on VM guests, so I have to assume it was set in the original VM image we loaded from SolarWinds. I am attaching an image with the setting location. The setting needs to be unchecked, after which I had to run ntpconfig again from the CLI. Now my time is at least synchronized. I will continue to monitor the performance to see if this resolved the issue. Thanks again for pointing me in the correct direction.
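
If you have access to the VM's .vmx file, the same checkbox should show up there as the `tools.syncTime` key. Below is a rough sketch for checking it from a copy of that file's text. It assumes you can read the .vmx from the datastore; the helper name `host_time_sync_enabled` and the sample `displayName` value are made up for illustration.

```python
import re

def host_time_sync_enabled(vmx_text):
    """Return True/False for tools.syncTime, or None if the key is absent."""
    match = re.search(
        r'^\s*tools\.syncTime\s*=\s*"([^"]*)"',
        vmx_text,
        re.MULTILINE | re.IGNORECASE,
    )
    if match is None:
        return None
    return match.group(1).strip().upper() == "TRUE"

# Hypothetical excerpt of a .vmx file; only tools.syncTime matters here.
sample_vmx = 'displayName = "lem-appliance"\ntools.syncTime = "TRUE"\n'
print(host_time_sync_enabled(sample_vmx))  # True
```

A result of `True` corresponds to the checked state described above.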

FormerMember over 10 years ago in reply to harrijs

Just wanted to confirm - the default setting on the virtual appliance is indeed to sync guest time with host (this seems to work GENERALLY well without having to do any additional work on the customer's part, since the hypervisor tends to be NTP synced itself). When you have that setting configured, LEM will ignore the NTP configuration, but as soon as you turn it off you can configure an actual NTP host.
Hopefully this helps with your unresponsiveness issue. It sounds awfully coincidental so far, but I'll take it.