Level 13

IPAM 4.6 is broken

TL;DR: I've had major problems with my IPAM install since upgrading to IPAM 4.6. The following cases are still unresolved, the oldest dating back to November 17th:

-Case #1361094 - "IPAM Polling getting stuck"

-Case # - 00019818 DHCP server addition

-Case # - 00027114 - Manage credentials on IPAM settings is blank.

Let me preface this rant by saying that this isn't typical of SolarWinds Support or of my past experiences with other products' development teams when it comes to using release candidate products.

My company acquired a large project that, due to technical requirements, cannot be routed to the subnet our primary polling engine is on. This necessitated installing an additional polling engine (APE) on a subnet at that location with access to the location's subnets. This APE accesses devices at the site with NPM, SAM, NCM, NTA, and IPAM.

Support for APEs was introduced with 4.6 RC1:

Additional Polling Engine (APE) support

You can now install Additional Polling Engines to IPAM. If you have a large number of subnets, adding polling engines can reduce the time it takes to scan your network. Use the Orion Web Console to assign a polling engine to scan individual subnets. The DHCP server properties dialog now shows the name of the polling engine assigned to that server, and a status indicator shows if the polling engine is online and available. 

I updated my instance of IPAM to 4.6 RC1 on Sunday, November 12th. I upgraded to 4.6 RC2 on Thursday, November 16th. Over the next couple days, I noticed a number of issues in 4.6 and contacted support on Friday, November 17th to report these issues.

The first issue I encountered was that my IPAM polling jobs are queuing up but not finishing, regardless of whether they run from the primary or the additional polling engine, and regardless of whether they were automated or manually started. The "Last Discovery" field isn't updating, and the "Status" field for most subnets shows a start time of 5000+ minutes ago.

pastedImage_1.png

pastedImage_2.png

I contacted support on Friday, November 17th and spoke to Alejandro Jay Realo. On this call he was not familiar with the additional polling engine feature introduced with IPAM 4.6. He warned me against installing release candidate versions in production environments as, in his words, "they're basically beta software". I responded that RC versions are fully supported by SolarWinds support and that I required the APE feature introduced in this RC for the new location my employer acquired. The tech connected to my primary polling engine via Webex, ran the configuration wizard, changed the logging level to debug on most services, deleted a number of file caches, and ran and saved a copy of diagnostics. "Jay" created Case #1361094 - "IPAM Polling getting stuck" for this issue, requested I upload the diagnostics, and let me know they would be investigated and that I would be contacted when more information was available.

On Wednesday, November 22nd I requested an update on this case as I had not heard back on it.

Jay responded with the following:

Update for Case #1361094 - "IPAM Polling getting stuck"

Hello Josh,

I apologize, I was out for 2 days due to influenza.  I have consulted this to my internal team and they do not think this is a bug.

They told me that this could be caused by DHCP and DNS Failures.

Kindly go to DHCP and DNS management.  Edit the servers one by one and then test the credentials.  Make note of the failing servers.

Sincerely,

Alejandro Jay Realo

Solarwinds Technical Support| SolarWinds – Unexpected Simplicity

Office: 866.530.8040

Our Products: Network | Systems and Applications | Virtualization | Storage | SIEM

I did not understand how this would contribute to my problem, as we don't manage our DNS within IPAM and DHCP server polling worked without issue in IPAM 4.5.2.

I did notice that the "Polling Engine" field under the "Automatic Scanning" option in "Edit Subnet Properties" was blank for all our subnets.

I replied back to Jay with the following on Wednesday, November 22nd:

I disagree.

Per the following THWACK posting there are other individuals experiencing this same issue: https://thwack.solarwinds.com/thread/116013

I queried all my subnets in the "IPAM.GroupNode" table and I see that 8,247 of the subnets have auto scanning enabled but, no EngineID is assigned. These subnets were successfully being scanned on IPAM 4.5.2 (before multiple IPAM polling engines existed) and now fail to scan on IPAM 4.6RC2.
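(Aside for anyone wanting to run the same check on their own export of the subnet table: below is a rough sketch of the filter, not the exact query I ran. The field names `DisableAutoScanning`, `EngineID`, and `FriendlyName` and the sample rows are assumptions for illustration only; map them to whatever your "IPAM.GroupNode" rows actually contain.)

```python
from typing import Any, Iterable, List, Mapping


def unassigned_autoscan_subnets(
    rows: Iterable[Mapping[str, Any]],
) -> List[Mapping[str, Any]]:
    """Return rows that have auto scanning on but no polling engine assigned.

    Field names are hypothetical; adjust them to match your own data.
    """
    result = []
    for row in rows:
        # Treat a missing flag as "auto scanning enabled".
        auto_scan_on = not row.get("DisableAutoScanning", False)
        engine = row.get("EngineID")
        # An empty, zero, or missing engine ID counts as "not assigned".
        if auto_scan_on and engine in (None, 0, ""):
            result.append(row)
    return result


# Made-up sample rows standing in for an export of the subnet table.
sample_rows = [
    {"FriendlyName": "10.0.0.0/24", "DisableAutoScanning": False, "EngineID": None},
    {"FriendlyName": "10.0.1.0/24", "DisableAutoScanning": False, "EngineID": 1},
    {"FriendlyName": "10.0.2.0/29", "DisableAutoScanning": True, "EngineID": None},
]

stuck = unassigned_autoscan_subnets(sample_rows)
print(len(stuck), [row["FriendlyName"] for row in stuck])
```

Only the first sample row is flagged: the second has an engine assigned and the third has auto scanning turned off.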

Jay did not reply until Monday, November 27th.  He replied with the following:

Hello Josh,

OK, however, you did not specify that you are encountering the same error message.  I had a customer that has that message and what I did was to have him install rc3.

https://downloads.solarwinds.com/solarwinds/OnlineInstallers/RC/IPAM/Solarwinds-Orion-IPAM-4.6-RC3.e...

However, I’m not sure if your issue falls under that.  Kindly check each and every DHCP and DNS and check the credentials and note any failures.

Sincerely,

Alejandro Jay Realo

I confirmed that my credentials work on all of my managed DHCP servers in IPAM and replied to Jay within an hour with the following:

I have successfully checked all my DHCP (I don't have an DNS polling) credentials and they were successful.

What would be the next step?

Jay replied with the following:

Hello Josh,

Go to IPAM Settings  >> SNMP Credentials.


Take a screenshot.

Sincerely,

Alejandro Jay Realo

I did so and followed up with another finding the same day:

pastedImage_12.png

I have updated my environment with IPAM 4.6.0HF3 and the issue remains.

If I run a manual scan on any of the "stuck" subnets, they complete successfully (however the "Last Discovery" date doesn't change).

pastedImage_14.png

The fact that the manual scan works and that the auto scan jobs worked prior to installing 4.6RC I would think that there isn't any issues with my DHCP/DNS credentials.

Please advise what else can be done or if you'd like to schedule another webmeeting to look into this.

Thanks,

**NOTE: When I said "IPAM 4.6.0HF3" I was actually referring to IPAM 4.6RC3

Jay conducted another Webex session on the main polling engine and the APE on Wednesday, November 29th, during which he repaired the Orion installation without making any progress.

Jay followed up on Wednesday, November 29th with the following:

Hello Joshua,

I got a hold of our Application Engineer, he wanted to do a remote session on Friday, 10 AM US Central Time.  Would that be ok?

Sincerely,

Alejandro Jay Realo

I accepted this:

Yes, that works for me.

On Friday, December 1st I had a multi-hour Webex session with an application engineer, Matthew Lamb.

No progress was made on the issue and Matt requested I submit diagnostics from the session.

On Wednesday, December 6th I updated to IPAM 4.6 GA and the issues persisted.

I requested an update on the case:

Alejandro, Can this case be escalated? This has been going on for over a week and the issue persists even now that IPAM 4.6 is a GA version.

I received the following response:

Hello Josh,

This has been escalated.  The Application Engineer is the Highest support and your case was undoubtedly consulted to the development team.  They are in the process of investigating this issue.

Sincerely,

Alejandro Jay Realo

On Thursday, December 7th I noticed I was unable to poll DHCP servers from my APE.

I called support for this new issue and Jay created Case # - 00019818 DHCP server addition for this issue.

He created this case without performing any troubleshooting of the issue. He also questioned whether this feature exists in IPAM 4.6 and asked that I submit screenshot proof of where this feature is documented.

Here's the information I submitted for this case:

Node # 4954 is being monitored in NPM by the additional polling engine *removed* It is a Cisco Catalyst 4500 that is also setup in NCM with credentials that have full "enable" access.

pastedImage_38.png

pastedImage_39.png

When I attempt to test it using the same credentials that NCM uses in IPAM, I receive an error that states "Test Failed. Node 4954 is not on the Main poller. Please move node to the Main poller and try again."

pastedImage_40.png

Per the IPAM 4.6 release notes here: https://support.solarwinds.com/Success_Center/IP_Address_Manager_(IPAM)/release_notes

"The DHCP server properties dialog now shows the name of the polling engine assigned to that server, and a status indicator shows if the polling engine is online and available."

pastedImage_41.png

I have also uploaded the diagnostics as requested.

Please let me know when we can work to resolve this asap as adding DHCP server from the APE is a critical feature for us.

Thanks,

Matthew Lamb replied to case #1361094 with the following:

Hello Josh,

I wasn't aware that you had upgraded IPAM to the fully released version of 4.6. That changes some things that development need to look at. If you have upgraded IPAM to 4.6 GA, can you do the following for me? It will assist development with their investigation:

- On Each poller, go to Start > Run and Type in LogAdjuster

- Scroll down to IP Address Manager and set the Control for each of the 3 options in there to DEBUG

- Click on apply

- Wait 1 hour

- Create a set of diags on each poller by going to Start > Run and type in Orion Diag

- Leave the default options selected for the diags and just continue through for creation.

Once the diags are completed on each poller, please upload them to the link below:

https://Share.SolarWinds.com/?ShareToken=*removed*

Best regards,

Matthew Lamb

In frustration, on Thursday, December 7th I sent the following to Connie Dowdle (ding, IPAM Project Manager), Jay, and "technicalsupportfeedback@solarwinds.com":

Hi Connie,

​I hate to contact you directly however, I have had 2 critical issues since IPAM 4.6 RC that still persist with the GA version. I have been in contact with support over the past 4 weeks and they said they've been trying to arrange a remote session with the IPAM dev team to get this resolved.  This step of the troubleshooting process seems to be delayed and the support rep assigned to my cases (Alejandro "Jay" Realo) has said the delay is with the IPAM dev team.

Case # 1361064 has to do with IPAM scan jobs getting "stuck" and first occurred on November 17th with IPAM 4.6 RC2. Initially, Jay chastised me for installing an RC version in a production environment as in his words "it's like a beta version and may be broken". I let him know I was fully aware of this and that RC versions are still fully supported by SolarWinds support and that the features introduced in IPAM 4.6 are critical to my company's environment.

I completed 3 remote sessions to troubleshoot this issue with Jay with the latest one including an IPAM application engineer lasting for a few hours with no progress on the issue.

Case # 00019818 was just opened today. This issue concerns trying to add a DHCP server to IPAM that is being polled from an additional polling engine(APE). When I attempt to add the node that is already being polled by the APE in NPM as a DHCP server in IPAM 4.6 GA, I receive an error that says "Test Failed Node 4956 is not on the Main poller. Please move the node to the Main poller and try again."

pastedImage_48.png

I recevied Jay again when calling support. He immediately instructed me to submit diagnostics and to email a screenshot of the error without doing any troubleshooting or a Webex session.  He said that this issue will be sent directly to the dev team for resolution once I submitted the requested items.

The process for getting these issues resolved has been very frustrating and I'm not accustomed to SolarWinds support, which has always been very responsive and helpful in the past, taking so long to get such major issues resolved.  When I recommended SolarWinds products in our company's current use case as a solution I was confident I could get it to do what we need it to do and vouched for that in meetings to request purchase of the licenses needed for this deployment.

I don't want to have to go into my update meeting next week and present that I'm not able to proceed due to unresolved technical issues that have existed for almost a month now.

Thanks,

On Thursday, December 7th I responded to Matt's request:

Matt,

Changed the logging on both my polling engines as requested. I'll wait until 12:15 PM CST and create/send the diag files.

Thanks,

Then Matt:

Josh,

Excellent, thank you. I would also ask that when you send the diags in, to provide me all the times that you can meet Monday through Friday next week during the morning hours (preferably between 8-11am CST).

I've already asked development to look at this directly, but I won't hear back from them until tomorrow morning on availability, which would be too late. Gathering diags like you are doing so now is to prep them before the meeting. They really need to see this themselves, same as I needed, so I'm pushing for that to happen asap.

Best regards,

Matthew Lamb

Then Me:

Diags are uploading now.  Monday at 8am would work for me.

Then Matt:

Josh,

I have received them, thank you. I'll go ahead and schedule the meeting for Monday, 12/11 @ 8am CST:

https://sw.webex.com/sw/j.php?MTID=*removed*

Meeting number (access code): *removed*

Join by phone

*removed* United States Toll

*removed* United States Toll Free

Best regards,

Matthew Lamb

On Monday, December 11th I had a multi-hour Webex session with Matt and 2 members of the IPAM development team. During this Webex, most Orion components were uninstalled and reinstalled. The "(server name)\private$\solarwinds/collector/processingqueue/ipam.dhcp.subnet.polling" message queue was cleared, as it had accumulated over 20,000 messages that were not being processed.

pastedImage_65.png

Logs were generated and uploaded.  The devs weren't sure how to resolve the issue and said they would follow up after reviewing the logs.

On Monday, December 11th Jay replied to Case # - 00019818 DHCP server addition with the following:

Hello Josh,

Upon checking, it seems that is the expected behavior.

To verify, can you create custom whitelist for Hostname, IP and MAC and then put asterisk.

Just check if you would still get rogue alerting.

Sincerely,

Alejandro Jay Realo

I wasn't sure what Jay was talking about as his instructions didn't seem to apply to my issue.

I spoke to Matt about this and he said this is a known issue in IPAM 4.6 GA and that he would make sure I was alerted when it's resolved.

On Thursday, December 14th I requested an update on this:

Hi, Are there any updates on this request? Thanks

Matt responded with:

Josh,

Not at this time. The core devs are currently working with the IPAM devs to reproduce the issue. Once they can reproduce it, then they can root cause it and determine fix. I'll update you as soon as I receive word.

Best regards,

Matthew Lamb

On Friday, December 15th Matt sent the following:

Josh,

Not at this time. The core devs are currently working with the IPAM devs to reproduce the issue. Once they can reproduce it, then they can root cause it and determine fix. I'll update you as soon as I receive word.

Best regards,

Matthew Lamb

I uploaded the files as requested and let Matt know.

On Monday, December 18th I sent the following to Matt, Connie Dowdle, and technicalsupportfeedback@solarwinds.com:

Hi Matt,

Would it be possible to schedule a session this week to get this resolved?  Should I escalate this with a member of the Core team to get this resolved?

Please let me know what can be done to get this taken care of, regardless of which product is responsible.  This issue started when IPAM 4.6RC1 was installed on Sunday, November 12th.  I contacted SolarWinds support to open this case on Friday, November 18th.  This case has now been open for over a month, I've worked with support over the course of 4 Webex meeting with over 6 hours on those meetings, and IPAM 4.6 GA is out and no progress has been made on this.

Thanks,

Matt responded with:

Hello Josh,

Unfortunately a meeting would not be productive until development has a possible fix or needs specific information directly. At this time, they have determined WHAT is happening, but now are trying to determine WHY.

I have already let our management know of the request for escalation, as well as Connie with the product management group and will update you with what I hear back from them and development asap.

Best regards,

Matthew Lamb

On Thursday, December 21st I noticed that the "Credentials for scope scans" section under "Manage Credentials" is blank and shows "No credentials have been added." I currently have a number of credentials assigned to numerous scope scans, and I have even tried adding new credentials, but none of them show up. I contacted support and Case # - 00027114 - "Manage credentials on IPAM settings is blank" was created for this.

pastedImage_79.png

On Friday, December 22nd I conducted a Webex with Dave Roallos for this issue and he had me export and upload diagnostics for this.

On Tuesday, December 26th I received the following update for Case # - 00027114 - "Manage credentials on IPAM settings is blank":

Hi Joshua,

Good day.

Thank you for uploading your Diagnostics for IPAM and UDT, I already consult this case and mention with our Application Engineer that you also have a case that's already with our DevTeam.

Regards,

Earl Chrys B. Caranguian

I apologize for the huge amount of info, but I needed to unload this to see if anyone else is seeing these issues, and in the hope that I can get this resolved before 2018.

dingwabbott

133 Replies
Level 13

Concluded my call at 12:30 PM CST. The vast majority of the issues remain unresolved. As part of the "IPAM Polling getting stuck" issue, IPAM was starting automated subnet scans for a lot of subnets until it could no longer handle all the queued jobs. Before IPAM 4.6, the maximum number of simultaneous subnet scan jobs was set to 5 (or could be manually changed via the web console). This part of the issue seems to be resolved with a buddy drop and a lot of manual changes.

-There are still scans failing and then "getting stuck" in the job status window.

-The IPAM "Manage Credentials" window is still blank.

-I'm still receiving an error when trying to add a DHCP server to an APE for IPAM scanning.

It was mentioned on a previous call that there may be an issue with the collector service requiring assistance from the "collector dev team". The call ended today because the collector dev team was no longer in the office and, despite this issue having been mentioned in the past, they weren't included in this call for some reason.

I'm in central time and have arranged for a 4AM support call tomorrow to accommodate the schedules of the SolarWinds dev teams to troubleshoot this as I can't continue to sacrifice 3+ hours of my work days on SolarWinds support calls.

I'm very angry and frustrated at the multiple missed opportunities to troubleshoot these issues completely much earlier when the issues were first reported.

I really hope I can close this thread for good tomorrow...

0 Kudos

Any word on resolution yet?

0 Kudos

I was looking forward to the IPAM 4.6 upgrade until I read this. It sounds a bit like the growing pains from the NPM 11/11.5 upgrades. As much as I was planning an NPM 12.2 multi-module/multi-poller upgrade, I think it may be wiser to wait until this gets resolved. ding​ - can you follow up on this mega-thread if a new hotfix is posted, as it sounds like that may be forthcoming.

The only other thought I have is wondering if those with issues are upgrading from an older NPM 10.x or 11.x Orion system (upgraded to 12.01, then 12.1 etc.), instead of a freshly installed 12.x system. I have noticed that newer additional pollers 'seem' to behave nicer as they are brought online after a couple of upgrades have occurred on the rest of the servers. That's more of a Windows thing than anything else. I don't recommend rebuilding an entire environment for every upgrade (but it has crossed my mind - deploy current versions on new servers, then upgrade).

Marc

0 Kudos

My issues have been resolved with a buddy fix during a long support session with the dev team on Friday.

I'm travelling this week and I'm hoping to get time to write a proper summary later. This took a lot longer than I hoped it would, but hats off to the SolarWinds staff for getting this resolved.

I'm not sure when a proper hotfix will be released for the subnet polling, but I would imagine it would be soon.


When will a hotfix be released for this fix? I really need this fixed for me ASAP.

0 Kudos

tasmar85 , do I understand correctly that support resolved your issue yesterday??

0 Kudos

ding​, yes you are correct. I was provided a buddy drop to fix the issue with subnet scanning, however I do have another issue I see with DNS zone transfer scanning as it is not running at all. I have updated my ticket and hopefully will get that fixed, but the biggest concern for me is fixed and that is with the subnet scans.

0 Kudos

I'm not sure when a proper hotfix with what was done to resolve the issues on my system will be released.

ding, can you provide any advice on this?

0 Kudos

It's in the process of being published as we speak!

Level 13

I've been on a call with support and the IPAM dev team since 9:30AM CST.  They're still working on the issues, I'll update with what happened when we conclude.

0 Kudos

Right now, I am unable to update/upgrade several of my remote Polling Engines (#Case# 00028732) because the Orion-Installer is unable to pull down the required data.  I will have to wait on the IPAM Hotfix until then.  But I am having several issues with IPAM as well.  I also noticed that the "Simultaneous Scans" number line item is missing. 

Eric

Can you let us know when your case has been resolved CourtesyIT​? We have an upgrade scheduled for next week.

0 Kudos
Level 13

Update on this.

After installing IPAM 4.6 HF1, all the documented issues remain; none of them were resolved. I spent 2 hours and 40 minutes on the phone with support attempting to continue to resolve this issue after the hotfix was installed.

As of now, over 1,800 subnet scan jobs are being queued up, maxing out the resources on my primary polling engine. The maximum subnet scan setting is set to 5 subnets in the DB, but IPAM 4.6 is completely ignoring it. (In fact, there used to be a setting in the web interface to set the maximum number of simultaneous subnet scans yourself, but it's been removed. I'm told this is a feature in 4.6 and not a bug.)

As a temporary fix until development can get a fix in place, support assisted me in disabling all subnet scans. The other option was to disable IPAM altogether; however, we need IPAM, even in a static form, too much to disable it.

I would like to think I've been extremely patient with the issues related to IPAM 4.6 and I've been very transparent with how my support experience has been.

Most of our maintenance license renewals are coming due in the next few months. Given that this issue has persisted for nearly 2 months, that the handoff of my issues from 1st level support to development was fumbled during RC, and that 4.6 was released into GA anyway even though I'm clearly not the only user with these issues, I'm not very confident in the value that having active support provides.

At this point I don't think it's unfair to expect someone senior within SolarWinds to contact me to review my support experience so far, figure out how to prevent this from happening again (expedited support for RC version users!?!), and to finally fix this issue for good.

ding

I wouldn't hold your breath

0 Kudos

Is there a thread in NPM for this? I would like to follow this discussion.

0 Kudos

I have also applied the 4.6 HF1 from the customer portal on both my Main Polling Engine and Additional Polling Engine, and the scans are still broken for me as well. The one thing that did work for me was the feature to add a DHCP server that is being polled on my Additional Polling Engine. This is very frustrating: I have not had any accurate scans working for well over 1 month. If this isn't fixed soon, I might also have to consider not renewing the IPAM licenses we have for two separate Orion instances, and I was planning on purchasing a third for when we deploy a third instance of SolarWinds. This really needs to be priority number 1, and if development needs to be on a call with me to find a fix then I am all for that.

Level 13

Unfortunately, none of my outstanding issues mentioned above have been addressed.

I installed the hotfix about 30 minutes ago.  None of the scans queued up have completed including the small "/29" subnet that queued up first.

My "Manage Credentials" for IPAM scope is still blank. I tried adding test credentials but they didn't appear either.

Hi,
Regarding the scans, could you please try clearing JobEngine35.sdf as described in the support article "IPAM scans not running" (SolarWinds Worldwide, LLC. Help and Support) and then rerun the scans?

0 Kudos

I've been on with support for an hour and 30 minutes since installing the hotfix. The rep performed the task you suggested and this hasn't resolved the issue.

I presume you're a SolarWinds employee yarl​?

0 Kudos

Since installing HF3, my job scans list in the web interface is completely empty and it tells me "No subnets are configured for reoccurring scans", despite subnets being configured for automatic scanning.

Level 11

Am I missing something? I downloaded the offline installer (Solarwinds-Orion-HotFix-2017.3-OfflineInstaller.exe) from the customer portal, that indicates it was released/updated today but when I run it all I get is "All products are up-to-date" and I can't progress.

2018-01-03_18h17_05.png

2018-01-03_18h16_18.png

Also, being very pedantic: the release notes refer to the hotfix as SolarWinds-IPAM-v3.3.0-HotFix1.msp, when I assume they actually mean SolarWinds-IPAM-v4.6.0-HotFix1.msp?