
IPAM 4.6 is broken

TL;DR: I've had major problems with my IPAM install since upgrading to IPAM 4.6. The following issues remain unresolved, the oldest since November 17th:

- Case #1361094 - "IPAM Polling getting stuck"

- Case #00019818 - "DHCP server addition"

- Case #00027114 - "Manage credentials on IPAM settings is blank."

Let me preface this rant by saying that this isn't typical of SolarWinds Support, or of my past experiences with other products' development teams when using release candidates.

My company acquired a large project that, due to technical requirements, cannot be routed to the subnet our primary polling engine is on. That necessitated installing an additional polling engine (APE) on a subnet at the location with access to the location's subnets. This APE accesses devices at the site with NPM, SAM, NCM, NTA, and IPAM.

IPAM 4.6 RC1 introduced support for APEs:

Additional Polling Engine (APE) support

You can now install Additional Polling Engines to IPAM. If you have a large number of subnets, adding polling engines can reduce the time it takes to scan your network. Use the Orion Web Console to assign a polling engine to scan individual subnets. The DHCP server properties dialog now shows the name of the polling engine assigned to that server, and a status indicator shows if the polling engine is online and available. 

I updated my instance of IPAM to 4.6 RC1 on Sunday, November 12th, and upgraded to 4.6 RC2 on Thursday, November 16th. Over the next couple of days I noticed a number of issues in 4.6, and I contacted support on Friday, November 17th to report them.

The first issue I encountered was that my IPAM polling jobs queue up but never finish, regardless of whether they run from the primary or the additional polling engine, and regardless of whether they were started automatically or manually. The "Last Discovery" field isn't updating, and the "Status" field for most subnets shows a start time of 5000+ minutes ago.

pastedImage_1.png

pastedImage_2.png

I contacted support on Friday, November 17th and spoke to Alejandro Jay Realo. On this call he was not familiar with the additional polling engine feature introduced with IPAM 4.6. He warned me against installing release candidate versions in production environments as, in his words, "they're basically beta software". I responded that RC versions are fully supported by SolarWinds support and that I required the APE feature introduced in this RC for the new location my employer acquired. The tech connected to my primary polling engine via Webex, ran the configuration wizard, changed the logging level to debug on most services, deleted a number of file caches, and ran and saved a copy of diagnostics. "Jay" created Case #1361094 - "IPAM Polling getting stuck" for this issue, asked me to upload the diagnostics, and told me they would be investigated and that I would be contacted when more information was available.

On Wednesday, November 22nd I requested an update on this case as I had not heard back on it.

Jay responded with the following:

Update for Case #1361094 - "IPAM Polling getting stuck"

Hello Josh,

I apologize, I was out for 2 days due to influenza.  I have consulted this to my internal team and they do not think this is a bug.

They told me that this could be caused by DHCP and DNS Failures.

Kindly go to DHCP and DNS management.  Edit the servers one by one and then test the credentials.  Make note of the failing servers.

Sincerely,

Alejandro Jay Realo

Solarwinds Technical Support| SolarWinds – Unexpected Simplicity

Office: 866.530.8040

Our Products: Network | Systems and Applications | Virtualization | Storage | SIEM

I did not understand how this would contribute to my problem, as we don't manage our DNS within IPAM and DHCP server polling worked without issue in IPAM 4.5.2.

I did notice that the "Polling Engine" field under the "Automatic Scanning" option in "Edit Subnet Properties" was blank for all our subnets.

I replied to Jay with the following on Wednesday, November 22nd:

I disagree.

Per the following THWACK posting there are other individuals experiencing this same issue: https://thwack.solarwinds.com/thread/116013

I queried all my subnets in the "IPAM.GroupNode" table and I see that 8,247 of the subnets have auto scanning enabled but, no EngineID is assigned. These subnets were successfully being scanned on IPAM 4.5.2 (before multiple IPAM polling engines existed) and now fail to scan on IPAM 4.6RC2.
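For anyone who wants to run the same kind of check, here's a rough sketch in Python. Only the "IPAM.GroupNode" table name and the EngineID field come from my query above. The other column names (SubnetName, AutoScanEnabled), the assumption that "no engine assigned" is stored as NULL, and the in-memory SQLite database standing in for the real Orion database are all made up for illustration, so adapt it to your own schema before relying on it.

```python
import sqlite3


def build_demo_db():
    """Create an in-memory stand-in for the subnet table.

    Hypothetical schema: only the table name (IPAM.GroupNode in the post)
    and the EngineID column are taken from the thread. SubnetName and
    AutoScanEnabled are illustrative names, not the real Orion columns.
    """
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE GroupNode ("
        "GroupId INTEGER PRIMARY KEY, "
        "SubnetName TEXT, "
        "AutoScanEnabled INTEGER, "
        "EngineID INTEGER)"
    )
    rows = [
        (1, "10.0.1.0/24", 1, None),  # scanning on, no engine assigned
        (2, "10.0.2.0/24", 1, 1),     # scanning on, engine assigned
        (3, "10.0.3.0/24", 0, None),  # scanning off, so not of interest
        (4, "10.0.4.0/24", 1, None),  # scanning on, no engine assigned
    ]
    conn.executemany("INSERT INTO GroupNode VALUES (?, ?, ?, ?)", rows)
    return conn


def orphaned_subnets(conn):
    """Return subnets that have auto scanning on but no polling engine."""
    cur = conn.execute(
        "SELECT SubnetName FROM GroupNode "
        "WHERE AutoScanEnabled = 1 AND EngineID IS NULL "
        "ORDER BY GroupId"
    )
    return [name for (name,) in cur.fetchall()]


if __name__ == "__main__":
    demo = build_demo_db()
    for subnet in orphaned_subnets(demo):
        print(subnet)
```

If unassigned engines are stored as 0 rather than NULL in your database, the WHERE clause would need to change accordingly.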

Jay did not reply until Monday, November 27th.  He replied with the following:

Hello Josh,

OK, however, you did not specify that you are encountering the same error message.  I had a customer that has that message and what I did was to have him install rc3.

https://downloads.solarwinds.com/solarwinds/OnlineInstallers/RC/IPAM/Solarwinds-Orion-IPAM-4.6-RC3.exe

However, I’m not sure if your issue falls under that.  Kindly check each and every DHCP and DNS and check the credentials and note any failures.

Sincerely,

Alejandro Jay Realo

I confirmed that my credentials work on all of my managed DHCP servers in IPAM and replied to Jay within an hour with the following:

I have checked all my DHCP credentials (I don't have any DNS polling) and they were successful.

What would be the next step?

Jay replied with the following:

Hello Josh,

Go to IPAM Settings  >> SNMP Credentials.


Take a screenshot.

Sincerely,

Alejandro Jay Realo

I did so and followed up with another finding the same day:

pastedImage_12.png

I have updated my environment with IPAM 4.6.0HF3 and the issue remains.

If I run a manual scan on any of the "stuck" subnets, they complete successfully (however the "Last Discovery" date doesn't change).

pastedImage_14.png

Given that the manual scan works and that the auto scan jobs worked prior to installing 4.6RC, I would think that there aren't any issues with my DHCP/DNS credentials.

Please advise what else can be done or if you'd like to schedule another webmeeting to look into this.

Thanks,

NOTE: When I said "IPAM 4.6.0HF3" I was actually referring to IPAM 4.6RC3.

On Wednesday, November 29th Jay conducted another Webex session on the main polling engine and the APE, during which he repaired the Orion installation. No progress was made.

Jay followed up on Wednesday, November 29th with the following:

Hello Joshua,

I got a hold of our Application Engineer, he wanted to do a remote session on Friday, 10 AM US Central Time.  Would that be ok?

Sincerely,

Alejandro Jay Realo

I accepted this:

Yes, that works for me.

On Friday, December 1st I had a multi-hour Webex session with an application engineer, Matthew Lamb.

No progress was made on the issue and Matt requested I submit diagnostics from the session.

On Wednesday, December 6th I updated to IPAM 4.6 GA and the issues persisted.

I requested an update on the case:

Alejandro, Can this case be escalated? This has been going on for over a week and the issue persists even now that IPAM 4.6 is a GA version.

I received the following response:

Hello Josh,

This has been escalated.  The Application Engineer is the Highest support and your case was undoubtedly consulted to the development team.  They are in the process of investigating this issue.

Sincerely,

Alejandro Jay Realo

On Thursday, December 7th I noticed I was unable to poll DHCP servers from my APE.

I called support for this new issue and Jay created Case # - 00019818 DHCP server addition for this issue.

He created this case without performing any troubleshooting on the issue. He also questioned whether this feature exists in IPAM 4.6 and asked that I submit a screenshot as proof of where the feature is documented.

Here's the information I submitted for this case:

Node # 4954 is being monitored in NPM by the additional polling engine *removed*. It is a Cisco Catalyst 4500 that is also set up in NCM with credentials that have full "enable" access.

pastedImage_38.png

pastedImage_39.png

When I attempt to test it using the same credentials that NCM uses in IPAM, I receive an error that states "Test Failed. Node 4954 is not on the Main poller. Please move node to the Main poller and try again."

pastedImage_40.png

Per the IPAM 4.6 release notes here: https://support.solarwinds.com/Success_Center/IP_Address_Manager_(IPAM)/release_notes

"The DHCP server properties dialog now shows the name of the polling engine assigned to that server, and a status indicator shows if the polling engine is online and available."

pastedImage_41.png

I have also uploaded the diagnostics as requested.

Please let me know when we can work to resolve this asap as adding DHCP server from the APE is a critical feature for us.

Thanks,

Matthew Lamb replied to case #1361094 with the following:

Hello Josh,

I wasn't aware that you had upgraded IPAM to the fully released version of 4.6. That changes some things that development need to look at. If you have upgraded IPAM to 4.6 GA, can you do the following for me? It will assist development with their investigation:

- On Each poller, go to Start > Run and Type in LogAdjuster

- Scroll down to IP Address Manager and set the Control for each of the 3 options in there to DEBUG

- Click on apply

- Wait 1 hour

- Create a set of diags on each poller by going to Start > Run and type in Orion Diag

- Leave the default options selected for the diags and just continue through for creation.

Once the diags are completed on each poller, please upload them to the link below:

https://Share.SolarWinds.com/?ShareToken=*removed*

Best regards,

Matthew Lamb

In frustration on Thursday, December 7th I sent the following to Connie Dowdle (ding, IPAM Project Manager), Jay, and "technicalsupportfeedback@solarwinds.com":

Hi Connie,

I hate to contact you directly however, I have had 2 critical issues since IPAM 4.6 RC that still persist with the GA version. I have been in contact with support over the past 4 weeks and they said they've been trying to arrange a remote session with the IPAM dev team to get this resolved.  This step of the troubleshooting process seems to be delayed and the support rep assigned to my cases (Alejandro "Jay" Realo) has said the delay is with the IPAM dev team.

Case # 1361094 has to do with IPAM scan jobs getting "stuck" and first occurred on November 17th with IPAM 4.6 RC2. Initially, Jay chastised me for installing an RC version in a production environment as in his words "it's like a beta version and may be broken". I let him know I was fully aware of this and that RC versions are still fully supported by SolarWinds support and that the features introduced in IPAM 4.6 are critical to my company's environment.

I completed 3 remote sessions to troubleshoot this issue with Jay with the latest one including an IPAM application engineer lasting for a few hours with no progress on the issue.

Case # 00019818 was just opened today. This issue concerns trying to add a DHCP server to IPAM that is being polled from an additional polling engine (APE). When I attempt to add the node that is already being polled by the APE in NPM as a DHCP server in IPAM 4.6 GA, I receive an error that says "Test Failed Node 4956 is not on the Main poller. Please move the node to the Main poller and try again."

pastedImage_48.png

I received Jay again when calling support. He immediately instructed me to submit diagnostics and to email a screenshot of the error without doing any troubleshooting or a Webex session.  He said that this issue will be sent directly to the dev team for resolution once I submitted the requested items.

The process for getting these issues resolved has been very frustrating and I'm not accustomed to SolarWinds support, which has always been very responsive and helpful in the past, taking so long to get such major issues resolved.  When I recommended SolarWinds products in our company's current use case as a solution I was confident I could get it to do what we need it to do and vouched for that in meetings to request purchase of the licenses needed for this deployment.

I don't want to have to go into my update meeting next week and present that I'm not able to proceed due to unresolved technical issues that have existed for almost a month now.

Thanks,

On Thursday, December 7th I responded to Matt's request:

Matt,

Changed the logging on both my polling engines as requested. I'll wait until 12:15 PM CST and create/send the diag files.

Thanks,

Then Matt:

Josh,

Excellent, thank you. I would also ask that when you send the diags in, to provide me all the times that you can meet Monday through Friday next week during the morning hours (preferably between 8-11am CST).

I've already asked development to look at this directly, but I won't hear back from them until tomorrow morning on availability, which would be too late. Gathering diags like you are doing so now is to prep them before the meeting. They really need to see this themselves, same as I needed, so I'm pushing for that to happen asap.

Best regards,

Matthew Lamb

Then Me:

Diags are uploading now.  Monday at 8am would work for me.

Then Matt:

Josh,

I have received them, thank you. I'll go ahead and schedule the meeting for Monday, 12/11 @ 8am CST:

https://sw.webex.com/sw/j.php?MTID=*removed*

Meeting number (access code): *removed*

Join by phone

*removed* United States Toll

*removed* United States Toll Free

Best regards,

Matthew Lamb

On Monday, December 11th I had a multi-hour Webex session with Matt and 2 members of the IPAM development team. During this Webex, most Orion components were uninstalled and reinstalled. The "(server name)\private$\solarwinds/collector/processingqueue/ipam.dhcp.subnet.polling" message queue was cleared, as more than 20,000 messages had queued up in it without being processed.

pastedImage_65.png

Logs were generated and uploaded.  The devs weren't sure how to resolve the issue and said they would follow up after reviewing the logs.

On Monday, December 11th Jay replied to Case # - 00019818 DHCP server addition with the following:

Hello Josh,

Upon checking, it seems that is the expected behavior.

To verify, can you create custom whitelist for Hostname, IP and MAC and then put asterisk.

Just check if you would still get rogue alerting.

Sincerely,

Alejandro Jay Realo

I wasn't sure what Jay was talking about as his instructions didn't seem to apply to my issue.

I spoke to Matt about this. He said it is a known issue in IPAM 4.6 GA and that he would make sure I am alerted when it's resolved.

On Thursday, December 14th I requested an update on this:

Hi, Are there any updates on this request? Thanks

Matt responded with:

Josh,

Not at this time. The core devs are currently working with the IPAM devs to reproduce the issue. Once they can reproduce it, then they can root cause it and determine fix. I'll update you as soon as I receive word.

Best regards,

Matthew Lamb

On Friday, December 15th Matt sent the following:

Josh,

Not at this time. The core devs are currently working with the IPAM devs to reproduce the issue. Once they can reproduce it, then they can root cause it and determine fix. I'll update you as soon as I receive word.

Best regards,

Matthew Lamb

I uploaded the files as requested and let Matt know.

On Monday, December 18th I sent the following to Matt, Connie Dowdle, and technicalsupportfeedback@solarwinds.com:

Hi Matt,

Would it be possible to schedule a session this week to get this resolved?  Should I escalate this with a member of the Core team to get this resolved?

Please let me know what can be done to get this taken care of, regardless of which product is responsible.  This issue started when IPAM 4.6RC1 was installed on Sunday, November 12th.  I contacted SolarWinds support to open this case on Friday, November 17th.  This case has now been open for over a month, I've worked with support over the course of 4 Webex meetings totaling over 6 hours, IPAM 4.6 GA is out, and no progress has been made on this.

Thanks,

Matt responded with:

Hello Josh,

Unfortunately a meeting would not be productive until development has a possible fix or needs specific information directly. At this time, they have determined WHAT is happening, but now are trying to determine WHY.

I have already let our management know of the request for escalation, as well as Connie with the product management group and will update you with what I hear back from them and development asap.

Best regards,

Matthew Lamb

On Thursday, December 21st I noticed that "Credentials for scope scans" under "Manage Credentials" is blank and shows "No credentials have been added." I currently have a number of credentials assigned to numerous scope scans, and new credentials I tried to add don't show up either. I contacted support, and Case # - 00027114 - "Manage credentials on IPAM settings is blank." was created for this.

pastedImage_79.png

On Friday, December 22nd I had a Webex with Dave Roallos about this issue, and he had me export and upload diagnostics.

On Tuesday, December 26th I received the following update for Case # - 00027114 - "Manage credentials on IPAM settings is blank.":

Hi Joshua,

Good day.

Thank you for uploading your Diagnostics for IPAM and UDT, I already consult this case and mention with our Application Engineer that you also have a case that's already with our DevTeam.

Regards,

Earl Chrys B. Caranguian

I apologize for the huge amount of info, but I needed to unload this to see if anyone else is seeing these issues, and in the hope that I can get this resolved before 2018 :(

dingwabbott

  • Josh,

    I am also seeing the issue of the subnet scan being stuck, and never completing.  I also have a case open with support, but we are not making any progress as far as I can tell.

    Since I do not need any of the new features in 4.6 I will most likely roll back to 4.5 and stay there until they get this straightened out.

    Andy

  • Wow Josh... I'm not currently even using IPAM but I read through your entire fiasco and geesh I really feel your pain!

  • Josh, I've reviewed my e-mail history and discovered I was mistaken about IPAM and SQL version.  It's the new NTA beta that requires MSSQL 2016, not IPAM.  I think you should be good to go with the SQL versions listed in the IPAM installer guide:  IP Address Manager 4.6 system requirements - SolarWinds Worldwide, LLC. Help and Support

    Please forgive me, and accept my apologies for steering you wrong there.

    Rick Schroeder

  • Hi Josh, thanks for your clear information about the problems encountered on this IPAM 4.6.

    Please share the info as soon as Solarwinds has fixed the problem.

    We are very interested in this new "DHCP Failover Support" of 4.6 ;)

  • I'm glad you found the documentation, but just in case others need to see it here is the IPAM requirements documentation. SQL 2012, 2014 and 2016 are supported:

    IPAM system requirements - SolarWinds Worldwide, LLC. Help and Support

  • josh.haberman​, I'm sorry about the issues you've been experiencing with the newest release. Your cases are now in the hands of the Application Engineer for IPAM who is working closely with Engineering to get you up and functional. We have resolved a couple of these issues and will be delivering a hotfix in the near future.

    I just want to be clear that the following information you were given is wrong: "He warned me against installing release candidate versions in production environments as in his words "they're basically beta software"."

    For the information of anyone reading this, RCs are fully supported in production environments and are by no means beta versions. I will take your experiences and the information you were given to support management.

  • josh.haberman, I apologize for the lack of communication – I know it seems like you are getting the same update, but that is because all of the work is being done on our side to prepare the hotfix to get this issue resolved.  While I can’t comment publicly on a timeline, we should be doing a better job of getting information to you and I will make sure we do that in the future. 

    This issue was not isolated until post-GA; while we do have a handful of other customers experiencing this, it is not affecting the majority of people who have upgraded to IPAM 4.6.  This does not lessen its priority for us to get it fixed, of course, but it was not something that was common during the RC phase. 

    Again I apologize for the experience you have had; while we strive to make every Release Candidate production ready I can certainly understand why this has shaken your confidence.  I will be reaching out to you via private e-mail to discuss further how we can help you.

  • I have to agree with Josh. I am running into the same exact issues Josh is facing with IPAM. I upgraded from 4.5 to 4.6 RC2 to find that it completely broke the IPAM product for me rendering IPAM completely useless. I have opened a ticket on the issue over a month ago and was also told not to install RC versions in production environment. For me as well this isn't the first time an update broke the product for me. I have pretty much refused to work with level 1 support as they brought no value on troubleshooting the issue, and when I asked multiple times to escalate my issue to an Application Engineer or Development it never happened and it seems that they want to keep my case stuck with them. So far I am disappointed with how this issue is being treated and would expect better as my company has been a very loyal customer of 16 years.

    If you need to reference my case number, it's

    1364032

  • I'm also having scanning issues since upgrading to IPAM 4.6 GA. I wish I had seen this thread earlier, as we just upgraded yesterday morning. I was wondering if you have seen any changes in resource utilization on your APEs and Main Polling Engine? Since the upgrade, our APEs have had a noticeable reduction in utilization while our main polling engine has been grinding it out more than usual.

  • tasmar85​, I'll review your case and reach out offline.