5 Replies Latest reply on Feb 13, 2015 4:17 PM by rarunachalam

    Duplicate not connected agents eating up licences

    rarunachalam

      I have a strange issue for a while now and I have a support case with Solarwinds and they havent been able to help me or even point in the right direction. I have a few windows 2012 R2 servers that keep creating agents on the LEM over and over again until my licences run out. They never connect. I have tried everything, removed anti-virus, disabled firewalls, uninstalled, reinstalled, upgraded to 6.0.1 which supposedly fixes agent connectivity issues and I am still having this issue. Has anyone seen anything like this? The agents show up on the LEM and doesnt show any info about it other than the IP, name, install date and last connected. Last connected is always 1900 Dec 31st 1969. There are other 2003 and 2008 machines on the same network that do not have any issues. The log says....

       

      (Fri Dec 12 11:37:29 EST 2014) II:INFO [Communications] {ComModuleSpop:21} We are not installed yet, certificates missing;

      (Fri Dec 12 11:37:29 EST 2014) WW:WARNING [Communications] {ComModuleSpop:21} Max number of agent install attempt property is not a numerical value, default to 10;

      (Fri Dec 12 11:37:29 EST 2014) II:INFO [FastCenter v25767] {Initialize Connectors:13} Loaded connector pack : C:\WINDOWS\SysWOW64\ContegoSPOP\tools\vistasecurity.xml;

      (Fri Dec 12 11:37:29 EST 2014) II:INFO [FastCenter v25767] {Initialize Connectors:13} Online;

      (Fri Dec 12 11:37:29 EST 2014) DD:DEBUG [NetworkAddressUtilities v25767] {ComModuleSpop:21} Retrieved 24 interfaces for local host;

      (Fri Dec 12 11:37:29 EST 2014) II:NOTICE [NioComNetworkParent v24745] {ComModuleSpop:21} Making install request to: 10.240.1.51;

      (Fri Dec 12 11:37:29 EST 2014) DD:DEBUG TASK [NioComNetworkParent v24745] {ComModuleSpop:21} Sending data via command channel to manager at 10.240.1.51 remoteport 37890 for message 0;

      (Fri Dec 12 11:37:29 EST 2014) DD:DEBUG TASK [LocalPortRange v23873] {ComModuleSpop:21} ports remaining: [37893, 37894];

      (Fri Dec 12 11:37:29 EST 2014) DD:DEBUG TASK [LocalPortRange v23873] {ComModuleSpop:21} Next available port in list is: 37893;

      (Fri Dec 12 11:37:30 EST 2014) DD:DEBUG MODE [NioComNetworkParent v24745] {ComModuleSpop:21} bound to local port: 37893;

      (Fri Dec 12 11:37:30 EST 2014) DD:DEBUG TASK [NioComNetworkParent v24745] {ComModuleSpop:21} Command channel opened to manager at 10.240.1.51 remoteport 37890 localport 37893 for message 0;

      (Fri Dec 12 11:37:30 EST 2014) DD:DEBUG TASK [NioComNetworkParent v24745] {ComModuleSpop:21} new ObjectOutputStream;

      (Fri Dec 12 11:37:35 EST 2014) DD:DEBUG TASK [NioComNetworkParent v24745] {ComModuleSpop:21} ObjectOutputStream flush header;

      (Fri Dec 12 11:37:35 EST 2014) DD:DEBUG TASK [NioComNetworkParent v24745] {ComModuleSpop:21} new ObjectInputStream;

      (Fri Dec 12 11:37:35 EST 2014) DD:DEBUG TASK [NioComNetworkParent v24745] {ComModuleSpop:21} ObjectOutputStream write object;

      (Fri Dec 12 11:37:35 EST 2014) DD:DEBUG TASK [NioComNetworkParent v24745] {ComModuleSpop:21} ObjectOutputStream flush object;

      (Fri Dec 12 11:37:35 EST 2014) DD:DEBUG TASK [NioComNetworkParent v24745] {ComModuleSpop:21} ObjectInputStream: waiting for response;

      (Fri Dec 12 11:38:02 EST 2014) DD:DEBUG [Signals] {SPOP:8} java.lang.IllegalArgumentException: Signal already used by VM or OS: SIGINT;

      (Fri Dec 12 11:38:02 EST 2014) DD:DEBUG [Signals] {SPOP:8} java.lang.IllegalArgumentException: Unknown signal: HUP;

      (Fri Dec 12 11:38:02 EST 2014) II:NOTICE [Contego] {SPOP:8} Starting TriGeo Agent (Release 6.0.0) build [release];

      (Fri Dec 12 11:38:02 EST 2014) II:NOTICE [SpopModule v25768] {SPOP:8} build server version string: 6.0.391784.44;

      (Fri Dec 12 11:38:02 EST 2014) II:INFO [Contego] {SPOP:8} Creating config dir: spop;

      (Fri Dec 12 11:38:02 EST 2014) DD:DEBUG [UAIDGenerator v25756] {SPOP:8} Instantiated and attempting to check for manager: spop;

      (Fri Dec 12 11:38:02 EST 2014) II:NOTICE [InDepthConfigProps v25342] {SPOP:8} nDepth enabled via default because InDepthEnable not present;

      (Fri Dec 12 11:38:02 EST 2014) II:NOTICE [InDepthConfigProps v25342] {SPOP:8} indepth.conf not found at C:\WINDOWS\SysWOW64\ContegoSPOP\indepth.conf;

      (Fri Dec 12 11:38:02 EST 2014) II:NOTICE [RawDataClient v25751] {SPOP:8} Status Inactive;

      (Fri Dec 12 11:38:02 EST 2014) II:INFO [UpdateClient v22734] {SPOP:8} OS signature: Windows Server 2012;6.2;x86;

      (Fri Dec 12 11:38:02 EST 2014) II:INFO [UpdateClient v22734] {SPOP:8} Update Bootstrap initialized;

      (Fri Dec 12 11:38:02 EST 2014) II:NOTICE [Contego] {SPOP:8} Initializing database;

      (Fri Dec 12 11:38:02 EST 2014) II:NOTICE [Contego] {SPOP:8} Database Initialized;

      (Fri Dec 12 11:38:02 EST 2014) II:NOTICE [Contego] {Initialize Communications:10} Initializing Agent communications;

      (Fri Dec 12 11:38:02 EST 2014) II:NOTICE [Contego] {Initialize Connectors:13} Initializing ConnectorAPI;

      (Fri Dec 12 11:38:02 EST 2014) II:NOTICE [Contego] {Initialize Connectors:13} Initializing FAST;

      (Fri Dec 12 11:38:02 EST 2014) II:INFO [FastCenter] {Initialize Connectors:13} Initializing;

      (Fri Dec 12 11:38:02 EST 2014) II:INFO [BuffBytesOneReaderOneWriter v25275] {Initialize Communications:10} CommDataQueue BBS configured to queue directory: spop\q\CommDataQueue;

      (Fri Dec 12 11:38:02 EST 2014) II:NOTICE [Communications] {Initialize Communications:10} Operating System == Windows Server 2012;6.2;x86;

      (Fri Dec 12 11:38:02 EST 2014) DD:DEBUG MODE [DatabaseManager] {Initialize Communications:10} File used: spop\communications.xml;

      (Fri Dec 12 11:38:02 EST 2014) II:INFO [ComStoreInfo v24890] {Initialize Communications:10} store values: alias: null trustedStore: spop\hierarchy.trigeo privateStorespop\private.trigeo;

      (Fri Dec 12 11:38:02 EST 2014) DD:DEBUG [FastCenter v25767] {Initialize Connectors:13} Successfully configured reader Vista Security;

      (Fri Dec 12 11:38:02 EST 2014) DD:DEBUG [FastCenter v25767] {Initialize Connectors:13} Successfully configured reader Windows System;

      (Fri Dec 12 11:38:02 EST 2014) DD:DEBUG [FastCenter v25767] {Initialize Connectors:13} Successfully configured reader Windows Application;

      (Fri Dec 12 11:38:02 EST 2014) II:INFO [FastCenter v25767] {Initialize Connectors:13} Loaded connector pack : C:\WINDOWS\SysWOW64\ContegoSPOP\tools\ntapplication.xml;

      (Fri Dec 12 11:38:02 EST 2014) II:INFO [FastCenter v25767] {Initialize Connectors:13} Loaded connector pack : C:\WINDOWS\SysWOW64\ContegoSPOP\tools\ntsecurity.xml;

      (Fri Dec 12 11:38:02 EST 2014) II:INFO [ComStoreInfo v24890] {Initialize Communications:10} store values: alias: null trustedStore: spop\hierarchy.trigeo privateStore: spop\private.trigeo;

      (Fri Dec 12 11:38:02 EST 2014) II:INFO [FastCenter v25767] {Initialize Connectors:13} Loaded connector pack : C:\WINDOWS\SysWOW64\ContegoSPOP\tools\ntsystem.xml;

      (Fri Dec 12 11:38:02 EST 2014) II:INFO [FastCenter v25767] {Initialize Connectors:13} Loaded connector pack : C:\WINDOWS\SysWOW64\ContegoSPOP\tools\vistasecurity.xml;

      (Fri Dec 12 11:38:02 EST 2014) II:INFO [FastCenter v25767] {Initialize Connectors:13} Online;

      (Fri Dec 12 11:38:02 EST 2014) DD:DEBUG [USBUtilities] {Initialize Communications:10} USB Returning false;

      (Fri Dec 12 11:38:02 EST 2014) II:INFO [Communications] {ComModuleSpop:20} We are not installed yet, certificates missing;

      (Fri Dec 12 11:38:02 EST 2014) WW:WARNING [Communications] {ComModuleSpop:20} Max number of agent install attempt property is not a numerical value, default to 10;

      (Fri Dec 12 11:38:02 EST 2014) DD:DEBUG [NetworkAddressUtilities v25767] {ComModuleSpop:20} Retrieved 24 interfaces for local host;

      (Fri Dec 12 11:38:02 EST 2014) II:NOTICE [NioComNetworkParent v24745] {ComModuleSpop:20} Making install request to: 10.240.1.51;

      (Fri Dec 12 11:38:03 EST 2014) DD:DEBUG TASK [NioComNetworkParent v24745] {ComModuleSpop:20} Sending data via command channel to manager at 10.240.1.51 remoteport 37890 for message 0;

      (Fri Dec 12 11:38:03 EST 2014) DD:DEBUG TASK [LocalPortRange v23873] {ComModuleSpop:20} ports remaining: [37893, 37894];

      (Fri Dec 12 11:38:03 EST 2014) DD:DEBUG TASK [LocalPortRange v23873] {ComModuleSpop:20} Next available port in list is: 37893;

      (Fri Dec 12 11:38:03 EST 2014) DD:DEBUG MODE [NioComNetworkParent v24745] {ComModuleSpop:20} bound to local port: 37893;

      (Fri Dec 12 11:38:03 EST 2014) DD:DEBUG TASK [NioComNetworkParent v24745] {ComModuleSpop:20} Command channel opened to manager at 10.240.1.51 remoteport 37890 localport 37893 for message 0;

      (Fri Dec 12 11:38:03 EST 2014) DD:DEBUG TASK [NioComNetworkParent v24745] {ComModuleSpop:20} new ObjectOutputStream;

      (Fri Dec 12 11:38:08 EST 2014) DD:DEBUG TASK [NioComNetworkParent v24745] {ComModuleSpop:20} ObjectOutputStream flush header;

      (Fri Dec 12 11:38:08 EST 2014) DD:DEBUG TASK [NioComNetworkParent v24745] {ComModuleSpop:20} new ObjectInputStream;

      (Fri Dec 12 11:38:08 EST 2014) DD:DEBUG TASK [NioComNetworkParent v24745] {ComModuleSpop:20} ObjectOutputStream write object;

      (Fri Dec 12 11:38:08 EST 2014) DD:DEBUG TASK [NioComNetworkParent v24745] {ComModuleSpop:20} ObjectOutputStream flush object;

      (Fri Dec 12 11:38:08 EST 2014) DD:DEBUG TASK [NioComNetworkParent v24745] {ComModuleSpop:20} ObjectInputStream: waiting for response;

      (Fri Dec 12 11:54:05 EST 2014) EE:ERR [NioComNetworkParent v24745] {ComModuleSpop:20} EXCEPTION: java.net.SocketException: Connection reset

        at java.net.SocketInputStream.read(Unknown Source)

        at java.net.SocketInputStream.read(Unknown Source)

        at sun.security.ssl.InputRecord.readFully(Unknown Source)

        at sun.security.ssl.InputRecord.read(Unknown Source)

        at sun.security.ssl.SSLSocketImpl.readRecord(Unknown Source)

        at sun.security.ssl.SSLSocketImpl.readDataRecord(Unknown Source)

        at sun.security.ssl.AppInputStream.read(Unknown Source)

        at sun.security.ssl.AppInputStream.read(Unknown Source)

        at java.io.ObjectInputStream$PeekInputStream.peek(Unknown Source)

        at java.io.ObjectInputStream$BlockDataInputStream.readBlockHeader(Unknown Source)

        at java.io.ObjectInputStream$BlockDataInputStream.refill(Unknown Source)

        at java.io.ObjectInputStream$BlockDataInputStream.skipBlockData(Unknown Source)

        at java.io.ObjectInputStream.skipCustomData(Unknown Source)

        at java.io.ObjectInputStream.readNonProxyDesc(Unknown Source)

        at java.io.ObjectInputStream.readClassDesc(Unknown Source)

        at java.io.ObjectInputStream.readOrdinaryObject(Unknown Source)

        at java.io.ObjectInputStream.readObject0(Unknown Source)

        at java.io.ObjectInputStream.readObject(Unknown Source)

        at com.trigeo.core.communications.common.ComNetworkParent.writeMessageToCommandChannel(ComNetworkParent.java:977)

        at com.trigeo.core.communications.common.ComNetworkParent.sendParentViaCommandChannelForResponse(ComNetworkParent.java:295)

        at com.trigeo.core.communications.common.ComNetworkParent.installRequest(ComNetworkParent.java:226)

        at com.trigeo.core.communications.nio.client.NioComNetworkParent.installRequest(NioComNetworkParent.java:107)

        at com.trigeo.core.communications.common.ComModule.autoInstall(ComModule.java:550)

        at com.trigeo.core.communications.common.ComModule.setUp(ComModule.java:364)

        at com.trigeo.core.communications.spop.ComModuleSpop.run(ComModuleSpop.java:172)

        at java.lang.Thread.run(Unknown Source)

        at com.trigeo.util.TriGeoThread.run(TriGeoThread.java:57)

      ;

      (Fri Dec 12 11:54:05 EST 2014) DD:DEBUG TASK [ComNetworkParent v25767] {ComModuleSpop:20} closing ccd:0;

      (Fri Dec 12 11:54:05 EST 2014) II:NOTICE [NioComNetworkParent v24745] {ComModuleSpop:20} Install request completed (not installed);

      (Fri Dec 12 11:54:35 EST 2014) II:NOTICE [NioComNetworkParent v24745] {ComModuleSpop:20} Making install request to: 10.240.1.51;

      (Fri Dec 12 11:54:35 EST 2014) DD:DEBUG TASK [NioComNetworkParent v24745] {ComModuleSpop:20} Sending data via command channel to manager at 10.240.1.51 remoteport 37890 for message 0;

      (Fri Dec 12 11:54:35 EST 2014) DD:DEBUG TASK [LocalPortRange v23873] {ComModuleSpop:20} ports remaining: [37893, 37894];

      (Fri Dec 12 11:54:35 EST 2014) DD:DEBUG TASK [LocalPortRange v23873] {ComModuleSpop:20} Next available port in list is: 37893;

      (Fri Dec 12 11:54:35 EST 2014) DD:DEBUG MODE [NioComNetworkParent v24745] {ComModuleSpop:20} bound to local port: 37893;

      (Fri Dec 12 11:54:35 EST 2014) DD:DEBUG TASK [NioComNetworkParent v24745] {ComModuleSpop:20} Command channel opened to manager at 10.240.1.51 remoteport 37890 localport 37893 for message 0;

      (Fri Dec 12 11:54:35 EST 2014) DD:DEBUG TASK [NioComNetworkParent v24745] {ComModuleSpop:20} new ObjectOutputStream;

      (Fri Dec 12 11:54:40 EST 2014) DD:DEBUG TASK [NioComNetworkParent v24745] {ComModuleSpop:20} ObjectOutputStream flush header;

      (Fri Dec 12 11:54:40 EST 2014) DD:DEBUG TASK [NioComNetworkParent v24745] {ComModuleSpop:20} new ObjectInputStream;

      (Fri Dec 12 11:54:41 EST 2014) DD:DEBUG TASK [NioComNetworkParent v24745] {ComModuleSpop:20} ObjectOutputStream write object;

      (Fri Dec 12 11:54:41 EST 2014) DD:DEBUG TASK [NioComNetworkParent v24745] {ComModuleSpop:20} ObjectOutputStream flush object;

      (Fri Dec 12 11:54:41 EST 2014) DD:DEBUG TASK [NioComNetworkParent v24745] {ComModuleSpop:20} ObjectInputStream: waiting for response;

        • Re: Duplicate not connected agents eating up licences
          curtisi

          It almost looks like a Java problem.  What version of Java do you have on the working servers vs the not-working servers?

          Can you telnet from those servers to the LEM on 37890?

          You could also use license recycling to regularly have the LEM go through and clean up those dead duplicate nodes.  This option is under "Manage --> Appliances"

           

          2014-12-19 14_46_45-SolarWinds Log and Event Manager Console.png

          • Re: Duplicate not connected agents eating up licences
            nicole pauls

            The way the initial handshake works is:

            1. Agent detects it doesn't have any certificates (first line in the log) and needs to get itself signed by a manager appliance.
            2. Agent connects to the given manager appliance IP on port 37890 (outgoing port 37893, there's basically a pool and that's the first one it chooses)
            3. Manager signs agent's certificate and the agent creates the keystores to store that info
            4. Agent connects to actually transfer data on 37892 (outgoing port from the pool, 37893-6)

             

            It looks like the manager thinks it's signing the agent's cert, but the agent isn't storing that for some reason, so it keeps sending a new request to get its certificate signed, repeat, repeat repeat.

             

            Is it failing with the "Connection Reset" exception every time? Do they all fail the same way?

             

            Do you see any events related to those agents in the SolarWinds Events filter (under "Default Filters" in Monitor)? Looking for possible "InternalUnknownAgent" or "InternalDuplicateAgent" or error/warning alerts that might give us some insight to what's happening on the appliance side.

             

            Lastly, do you have a support case # on this one, maybe I can push from that end, too?

              • Re: Duplicate not connected agents eating up licences
                rarunachalam

                Yes, I have a support case (712756) and solarwinds has tried everything. The ticket is actually closed now and I am in the process of trying to recreate that environment within our network to eliminate any firewalls and security in between. I see duplicate agent events and warnings. See below

                 

                Event FieldInformation
                Event NameInternalWarning
                EventInfoAgent simon has a duplicate agent of the same name.

                 

                Yes, the connect reset is happening every time it duplicates itself. I have been trying to recreate the issue with other agents. In fact, by pure luck, agents that connected fine before had the same issue, but resetting the spop folder and restarting the service fixed the issue. This one set of agents, I just cannot find out what the issue is. The server is a VM, can you think of anything on an ESX host that can prevent the agent from coming online?