0 Replies Latest reply on Apr 26, 2013 11:58 AM by mcoupe

    Manager Service dying

    mcoupe

      I've got an issue with my manager service dying.  I've worked two different tickets with tech support, and both times I have basically been told I need to increase the reserved resources for my VM.  I've got four 2.0 GHz procs and 16 GB of memory reserved.  I am doing raw logging in addition to the normalized database, and I have 67 Windows servers and 95 Cisco devices logging to the LEM.  It doesn't seem to me that I'm overworking the box.

       

      I did notice the messages below in the manager log while poking around and wonder if they could be related.  At first I thought it was a resource issue too, because the process seemed to die after a few days (and then after about 10 days once I made the resource reservations).  Now I'm not so sure.  If anyone has any thoughts or ideas, I'm more than willing to listen.  The product doesn't do me much good if the manager process won't stay running.

       

      Thanks,

      -Mark

       

       

       

      (Tue Sep 25 06:36:14 PDT 2012) II:NOTICE [com.trigeo.manager.database.vertica.Partitions v24468] {pool-3-thread-1:182} Partition maintenance cancelled. The data server may be unavailable. Message:

      Code 08006 : An I/O error occured while sending to the backend.;

      (Tue Sep 25 06:36:17 PDT 2012) II:INFO [FASTCenter-EPS] {CiscoFirewalls-Cisco Firewalls:127} (75/1909.28/12238) +/- 2784.78 eps;

      (Tue Sep 25 06:36:25 PDT 2012) EE:ERR [com.trigeo.core.database.repository.VerticaSQL v23476] {BBS:DequeueToDB-1:55} postBufferData failure on alert stream with executeCopyIn  [DEFAULT DB CODE=53200]

      com.vertica.util.PSQLException: ERROR: malloc of 1020799360 bytes for Sort Buffer failed EXCEPTION: com.vertica.util.PSQLException: ERROR: malloc of 1020799360 bytes for Sort Buffer failed

              at com.vertica.core.v3.QueryExecutorImpl.receiveErrorResponse(Unknown Source)

              at com.vertica.core.v3.QueryExecutorImpl.processResults(Unknown Source)

              at com.vertica.core.v3.QueryExecutorImpl.executeWithStream(Unknown Source)

              at com.vertica.jdbc2.AbstractJdbc2Statement.executeWithStream(Unknown Source)

              at com.vertica.jdbc3.AbstractJdbc3Statement.executeCopyIn(Unknown Source)

              at com.trigeo.core.database.repository.VerticaSQLBase.writeAlerts(VerticaSQLBase.java:1149)

              at com.trigeo.core.database.repository.VerticaSQLBase.postBufferArrayData(VerticaSQLBase.java:921)

              at com.trigeo.core.database.repository.VerticaSQLBase.postBufferArrayData(VerticaSQLBase.java:620)

              at com.trigeo.core.AlertToDB.postBufferArrayToDatabase(AlertToDB.java:970)

              at com.trigeo.core.AlertToDB.postBufferArrayAlerts(AlertToDB.java:811)

              at com.trigeo.core.AlertToDB.access$1800(AlertToDB.java:53)

              at com.trigeo.core.AlertToDB$AlertBufferReader.readAllAvailableBuffers(AlertToDB.java:309)

              at com.trigeo.core.AlertToDB$AlertBufferReader.call(AlertToDB.java:222)

              at com.trigeo.core.AlertToDB$AlertBufferReader.call(AlertToDB.java:203)

              at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)

              at java.util.concurrent.FutureTask.run(FutureTask.java:138)

              at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:885)

              at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:907)

              at java.lang.Thread.run(Thread.java:619)

      ;

      (Tue Sep 25 06:36:26 PDT 2012) EE:ERR [com.trigeo.util.io.EncoderInputStream v0] {DecoderEncoder internal processor:18915} EXCEPTION: java.io.InterruptedIOException
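
In case it helps anyone compare notes, here is a rough Python sketch for sifting a manager log for entries like the ones above. The line layout in the regex is only inferred from this excerpt, not from any documented format, and `parse_entry` and `failed_mallocs` are names made up for this sketch rather than anything that ships with the LEM. It also converts the failed allocation size, which works out to roughly 973.5 MiB for the Sort Buffer request shown above.

```python
import re
from collections import Counter

# Assumed layout, inferred from the excerpt only:
#   (timestamp) XX:LEVEL [component] {thread} message
ENTRY_RE = re.compile(
    r"^\((?P<timestamp>[^)]+)\)\s+"
    r"(?P<code>[A-Z]{2}):(?P<level>[A-Z]+)\s+"
    r"\[(?P<component>[^\]]+)\]\s+"
    r"\{(?P<thread>[^}]+)\}\s*"
    r"(?P<message>.*)$"
)

# Matches the Vertica message text seen in the excerpt.
MALLOC_RE = re.compile(r"malloc of (\d+) bytes for (.+?) failed")


def parse_entry(line):
    """Return the header fields of a log entry, or None for continuation lines."""
    match = ENTRY_RE.match(line.strip())
    return match.groupdict() if match else None


def failed_mallocs(lines):
    """Return (bytes, MiB, purpose) for the first failed malloc on each line."""
    results = []
    for line in lines:
        match = MALLOC_RE.search(line)
        if match:
            size = int(match.group(1))
            results.append((size, size / 2**20, match.group(2)))
    return results


# Lines copied from the excerpt above (stack trace frames omitted).
SAMPLE = [
    "(Tue Sep 25 06:36:14 PDT 2012) II:NOTICE [com.trigeo.manager.database.vertica.Partitions v24468] {pool-3-thread-1:182} Partition maintenance cancelled. The data server may be unavailable. Message:",
    "(Tue Sep 25 06:36:17 PDT 2012) II:INFO [FASTCenter-EPS] {CiscoFirewalls-Cisco Firewalls:127} (75/1909.28/12238) +/- 2784.78 eps;",
    "(Tue Sep 25 06:36:25 PDT 2012) EE:ERR [com.trigeo.core.database.repository.VerticaSQL v23476] {BBS:DequeueToDB-1:55} postBufferData failure on alert stream with executeCopyIn  [DEFAULT DB CODE=53200]",
    "com.vertica.util.PSQLException: ERROR: malloc of 1020799360 bytes for Sort Buffer failed",
    "(Tue Sep 25 06:36:26 PDT 2012) EE:ERR [com.trigeo.util.io.EncoderInputStream v0] {DecoderEncoder internal processor:18915} EXCEPTION: java.io.InterruptedIOException",
]

if __name__ == "__main__":
    entries = [e for e in map(parse_entry, SAMPLE) if e]
    # Tally entries per severity level.
    print(Counter(e["level"] for e in entries))
    for size, mib, purpose in failed_mallocs(SAMPLE):
        print(f"{purpose}: {size} bytes (~{mib:.1f} MiB)")
```

Pointing the same two functions at the full log file (read line by line) should show whether the failed allocations line up with the times the manager process dies.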