How a Million-Dollar Prize Problem Popped up in Orion Configuration Wizard

Almost 10 years ago, we implemented a packaging framework in the Orion® Platform improving processes around installing, upgrading, and removing product components. The packaging framework proved its value and quickly became popular and used by SolarWinds® engineers. With the packaging framework, we introduced flexible rules for package resolution—we could configure it so one package depended on others or define conflicting packages that couldn’t be installed together.

After several years, we started observing the upgrade process of a single Orion Platform machine for some customers took many hours, and Configuration Wizard was hanging in the package resolution phase. After some investigation, we found out the root cause was the package resolution algorithm. The algorithm scaled poorly—by adding each new package, the algorithm was approximately twice as slow.1 We projected as the number of packages grew, the package resolution algorithm would become unusable. It was clear we needed a more sophisticated algorithm to more quickly resolve packages.

We analyzed the problem and found out this problem is “hard.” Unfortunately, there’s no fast algorithm known2 for this problem, and it’s believed a fast algorithm doesn’t even exist. The question of whether there’s a fast algorithm for "hard" problems is called "P vs. NP"—the problem isn’t solved, and there’s a million-dollar award ready for the discoverer of the solution.

The best-known example of “hard” problems is the travelling salesman problem—there are cities with distances between them, and the salesman is asked to visit all the cities while traveling the shortest distance. The exact algorithm always giving the correct answers is slow—it’ll give you a quick answer for 10 cities, but it’ll take longer than the universe has been around to give you an answer for 1,000,000 cities. This is slow compared to “easy” problems where fast algorithms are known—a good example is sorting, as we can sort a billion values very quickly.3

Many years ago, we created a flexible packaging framework, and it became slow with an increased number of packages. On top of this, there was no fast algorithm available. We tried to simplify the problem so it could be solved by fast algorithms. By removing the rule defining conflicting packages, we could resolve packages with fast algorithms.4 Unfortunately, though, we couldn’t remove rules such as "package A can’t be installed with package B," as they were already used by many packages, and removing such rules would break customer installations.

This is one of the lessons we’ve learned—always analyze the scalability of your software design or used algorithms and project how it’ll perform with increased input. Computational complexity analyses and Big O Notation are good tools for performance evaluation and projection.

We found ourselves in a situation where we had to speed up package resolution with no fast algorithm available. Fortunately, there’s a lot of research going on in this area. Some researchers are working on optimizations that work well and allow us to solve "hard" problems on big inputs in a reasonable time. Other research streams are working on approximation algorithms that don’t find an exact solution but find one that’s close enough very quickly. Software engineers are implementing these ideas and creating software libraries capable of being used to solve "hard" problems by leveraging state-of-the-art technologies.

We examined the latest research and available software libraries and found the open-source library OR-Tools implemented by Google. This library implements sophisticated optimizations for solving "hard” problems. We leveraged this library in the following way:

  1. We transformed our package resolution problem to a set of constraints that served as inputs for the OR-Tools library
  2. We used ConstraintSolver from Google OR-Tools library to find a solution
  3. We transformed the solution found by ConstraintSolver back to the package resolution problem

The idea of problem reduction or transformation is common in theories about "hard" problems, and the reduction technique allowed us to solve our packaging resolution problem by solving the constraint problem.

When we integrated the library into the product, we ran performance tests. The performance test results showed the package resolution for a current number of packages took a couple of seconds. We wanted to project the performance to the future, so we also ran the test for thousands of packages, and this took a couple of minutes. We were satisfied with the test results.

We released an Orion Platform hotfix with a new package resolution algorithm and verified environments where resolution took many hours before. After customers deployed the hotfix, the package resolution took a couple of seconds. We implemented this improvement in 2017, and since 2017, we haven’t seen any performance issues in package resolution. 

We learned from this experience. We analyze performance and run performance tests to evaluate the performance complexity and characteristics of our design decisions and algorithms. We’ve also started working on removing the existing complex package framework, so we will no longer need to solve “hard” problem of complex package resolution. Additionally, we’ve found powerful tools like Google OR-Tools to solve “hard” problems in a reasonable amount of time or with reasonable accuracy. This library can be used for many "hard" optimization problems, such as the travelling salesman problem, the knapsack problem, or constraint optimization problems like sudoku.

 

The complexity of our algorithm was exponential, O(2n) in Big O Notation.

We reduced the known NP-complete SAT problem to our package resolution problem. No polynomial algorithm for NP-complete problems is known.

3 The travelling salesmen problem algorithm has complexity O(n22n)—when we increase input by ten (+10) items, it takes ~1,000 times longer to find a solution. Sorting can be done in O(n log n), increasing input 10 times (10x)—doing so will take only 10 to 15 times longer to find a solution.

4 We can simplify the problem by removing the rule “package A can’t be installed with package B.” Such a problem is no longer NP-complete and can be solved in polynomial time.

Anonymous
  • I love learning about 'under the bonnet' mechanic/mechanisms, so this was fascinating to see that something which appears to be a straightforward system check is actually very challenging to accomplish. Thanks for the post.