We recently upgraded Orion NPM directly from 8.5 to 9.1 SP5. We have the following servers:
DB server (W2k3 server, SQL 2000 Std)
SLX Polling engine/webserver (W2k3 server)
Additional web server (W2k3 server)
After the upgrade, we were unable to view any node details on the additional web server. The summary page (Top XX views) worked fine, and the web site on the main polling engine also worked fine. Interestingly, the 'Polling Engine' option on the 'Admin' page showed the engine as Down when viewed on the additional web server, but Up when viewed on the main polling engine server.
The error returned when looking at node details was as follows:
Orion Website Error
An error has occurred with the Orion website.
Additional Information
System.ServiceModel.Security.MessageSecurityException: An unsecured or incorrectly secured fault was received from the other party. See the inner FaultException for the fault code and detail. ---> System.ServiceModel.FaultException: An error occurred when verifying security for the message.
--- End of inner exception stack trace ---
Server stack trace:
at System.ServiceModel.Channels.SecurityChannelFactory`1.SecurityRequestChannel.ProcessReply(Message reply, SecurityProtocolCorrelationState correlationState, TimeSpan timeout)
at System.ServiceModel.Channels.SecurityChannelFactory`1.SecurityRequestChannel.Request(Message message, TimeSpan timeout)
at System.ServiceModel.Security.SecuritySessionSecurityTokenProvider.DoOperation(SecuritySessionOperation operation, EndpointAddress target, Uri via, SecurityToken currentToken, TimeSpan timeout)
at System.ServiceModel.Security.SecuritySessionSecurityTokenProvider.GetTokenCore(TimeSpan timeout)
at System.IdentityModel.Selectors.SecurityTokenProvider.GetToken(TimeSpan timeout)
at System.ServiceModel.Security.SecuritySessionClientSettings`1.ClientSecuritySessionChannel.OnOpen(TimeSpan timeout)
at System.ServiceModel.Channels.CommunicationObject.Open(TimeSpan timeout)
at System.ServiceModel.Channels.ServiceChannel.OnOpen(TimeSpan timeout)
at System.ServiceModel.Channels.CommunicationObject.Open(TimeSpan timeout)
at System.ServiceModel.Channels.CommunicationObject.Open()
Exception rethrown at [0]:
at System.Runtime.Remoting.Proxies.RealProxy.HandleReturnMessage(IMessage reqMsg, IMessage retMsg)
at System.Runtime.Remoting.Proxies.RealProxy.PrivateInvoke(MessageData& msgData, Int32 type)
at System.ServiceModel.ICommunicationObject.Open()
at SolarWinds.NPM.Common.NPMBusinessLayerProxy.Connect(String host, String port) in C:\Source\OrionNPM\DEV\Release\Orion\Core\9.1\Src\Lib\SolarWinds.NPM.Common\NPMBusinessLayerProxy.cs:line 263
at SolarWinds.NPM.Common.NPMBusinessLayerProxy..ctor(String host, String port, HandleBusinessLayerException exceptionDelegate) in C:\Source\OrionNPM\DEV\Release\Orion\Core\9.1\Src\Lib\SolarWinds.NPM.Common\NPMBusinessLayerProxy.cs:line 214
at SolarWinds.NPM.Common.NPMBusinessLayerProxy.TryToConnect(String host, String port, HandleBusinessLayerException exceptionHandler) in C:\Source\OrionNPM\DEV\Release\Orion\Core\9.1\Src\Lib\SolarWinds.NPM.Common\NPMBusinessLayerProxy.cs:line 192
at SolarWinds.NPM.Common.NPMBusinessLayerProxy.CreateNPMBusinessLayerProxy(HandleBusinessLayerException exceptionHandler) in C:\Source\OrionNPM\DEV\Release\Orion\Core\9.1\Src\Lib\SolarWinds.NPM.Common\NPMBusinessLayerProxy.cs:line 161
at Orion_NetPerfMon_Controls_VMInfo.Page_Load(Object sender, EventArgs e)
at System.Web.Util.CalliHelper.EventArgFunctionCaller(IntPtr fp, Object o, Object t, EventArgs e)
at System.Web.Util.CalliEventHandlerDelegateProxy.Callback(Object sender, EventArgs e)
at System.Web.UI.Control.OnLoad(EventArgs e)
at System.Web.UI.Control.LoadRecursive()
at System.Web.UI.Control.LoadRecursive()
at System.Web.UI.Control.LoadRecursive()
at System.Web.UI.Control.LoadRecursive()
at System.Web.UI.Control.LoadRecursive()
at System.Web.UI.Control.LoadRecursive()
at System.Web.UI.Control.LoadRecursive()
at System.Web.UI.Control.LoadRecursive()
at System.Web.UI.Page.ProcessRequestMain(Boolean includeStagesBeforeAsyncPoint, Boolean includeStagesAfterAsyncPoint)
I logged a ticket with SW support & tried many things, mainly uninstalling & re-installing the website & services (on both servers). We also checked permissions repeatedly. NB - I was told a few times to make sure that the Internet Guest Account & Solarwinds_Website users were local admins, but this does not seem to be necessary.
In the course of this, I found that the servers were unable to contact the CAs to download CRLs (certificate revocation lists). Specifically, they were attempting to connect to pages at:
crl.microsoft.com
crl.thawte.com
crl.verisign.com
Once I had set the WinHTTP proxy with the proxycfg utility and added these sites to the authentication bypass list on our proxy server, the servers could reach them, but that did not fix the problem.
The root of the problem was the time on the servers. Although we had set NTP servers on all 3, the clock on the additional web server had drifted because it could not reach its NTP server. It was only a few minutes out, but as soon as I had corrected the NTP issue and the time was back in sync, the error disappeared. That fits the error text: WCF message security checks message timestamps, so clock drift between the two servers can make verification fail. It didn't help that our servers are in a workgroup rather than a domain, so they don't get time sync from a domain controller.
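For anyone who wants to rule out clock drift early, here is a minimal sketch of a check that could be run on each server. It sends a single SNTP request and reports how far the local clock is from the NTP server. The function names, the example server name and the 300-second tolerance are my own assumptions, not anything from Orion or SW support; the tolerance is based on the usual 5-minute default for WCF's MaxClockSkew, which may differ in your setup.

```python
import socket
import struct
import time

NTP_EPOCH_OFFSET = 2208988800  # seconds between 1900-01-01 (NTP) and 1970-01-01 (Unix)
MAX_SKEW_SECONDS = 300         # assumption: WCF's default MaxClockSkew of 5 minutes


def parse_transmit_time(packet: bytes) -> float:
    """Return the server transmit timestamp of an NTP reply as Unix time."""
    if len(packet) < 48:
        raise ValueError("NTP reply too short")
    # Bytes 40-47 hold the transmit timestamp: 32-bit seconds, 32-bit fraction.
    seconds, fraction = struct.unpack("!II", packet[40:48])
    return seconds - NTP_EPOCH_OFFSET + fraction / 2**32


def query_offset(server: str, timeout: float = 5.0) -> float:
    """Approximate offset (server minus local clock) in seconds via one SNTP request."""
    request = b"\x1b" + 47 * b"\0"  # LI=0, version 3, mode 3 (client)
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.settimeout(timeout)
        sent = time.time()
        sock.sendto(request, (server, 123))
        reply, _ = sock.recvfrom(512)
        received = time.time()
    # Compare the server's time with the midpoint of the round trip.
    return parse_transmit_time(reply) - (sent + received) / 2


def within_tolerance(offset: float, max_skew: float = MAX_SKEW_SECONDS) -> bool:
    """True if the absolute clock offset is inside the allowed skew."""
    return abs(offset) <= max_skew


if __name__ == "__main__":
    # "ntp.example.local" is a placeholder; use your own NTP server.
    offset = query_offset("ntp.example.local")
    status = "OK" if within_tolerance(offset) else "OUT OF TOLERANCE"
    print(f"offset {offset:+.1f}s: {status}")
```

Running this on the polling engine and on the additional web server and comparing the two offsets would show whether they have drifted apart. If the request times out, that in itself suggests the server cannot reach NTP, which was our actual fault.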
I'd only spent a week looking at this & thought that I'd post here to help others who may experience the same issue - hope it helps someone!