Wednesday, February 03, 2010

OSGi resolution is NP-Complete. So what?

At last we have a proof, thanks to Robert Dunne, that the OSGi R4 resolution problem is NP-Complete. He showed that it is possible to represent a solution to the NP-Complete boolean satisfaction problem 3-SAT as a wiring of a carefully constructed set of OSGi bundles. After the idea arose of using a SAT solver to construct an OSGi resolver, some of us have been discussing the possibility of a SAT-based proof. The hard step was how to represent a boolean OR in terms of resolution, which Robert has now provided.

NP-Completeness tallies with the experience of those who have written an OSGi R4 resolver: it takes about a fortnight of intensive hacking, combined with sleepless nights, to crack the problem followed by a much longer period of bug fixing and optimisation. Now we know why: NP-Complete problems are hard to solve efficiently.

I'm partly to blame as I led the spec work on OSGi RFC 79 which added considerable function, and with it complexity, to the resolution algorithm. I presented some of the background at the 2004 OSGi Congress. The RFC 79 spec was one of the more interesting that the OSGi Core Platform Expert Group has worked on. There was a tension between addressing the basic use cases and creating something that was too hard to implement.

Fortunately, we developed a prototype resolver in the Equinox incubator. This provided useful feedback to the spec work, particularly when features were dropped from the spec which made the implementation more tractable. The team which created the prototype included Simon Burns, Steve Poole, and Tom Watson. Simon wrote the initial R4 resolver. Steve created a rigorous model-based test based on a slow, but functionally corrrect, resolver of his own. Tom provided much advice on the R3 resolver and eventually integrated the new resolver into Equinox and has been improving it ever since.

So what features emerged from this process that made the resolver NP-Complete? Well, crucial to the proof are the uses constraint and, its evil twin, optional imports. Steve Powell observes that the proof does not depend on the transitivity of the uses constraint. (Other, arguably much more evil, features in R4, require-bundle and fragments, also did not contribute to the resolver's intrinsic complexity.)

This suggests some practical ways round the problem. Examples occasionally crop up when the time to resolve a set of bundles becomes unacceptable. How can we work around such issues if we are willing to put in the effort?

Firstly, uses constraints can be avoided by preventing types from one package leaking out through the interface of another package. This is an application of the Law of Demeter.

Secondly, in some cases it may be possible to remove, or at least cut down, optional imports by using mandatory imports instead and splitting the function which has those dependencies across bundles so that the user's dependencies determine which bundles are needed.

These approaches are feasible only for new code. Bundles generated from popular Java libraries tend to exhibit both uses constraints and optional imports.

Another possibility we've been considering is a "strong" form of optionality in which the optional import cannot be discarded if it is locally satisfiable, i.e. before uses constraints are applied. Certainly the current definition of optional import is critical to Robert's proof. The challenge is to convince ourselves that R4 resolution with strong optionality in place of the current definition is no longer NP-Complete or, conversely, to prove the opposite.

I'm undecided. Strong optionality drastically reduces the search space in some cases. On the other hand, I feel there may be a proof based on 3-SAT in which multiple bundles import, but do not export, packages.


Tom said...

I believe with a finite set of installed bundles the algorithm can be made fast enough for almost all cases even if the problem is NP-Complete. I have been in discussions with Richard Hall (Felix) about an improved algorithm that is much smarter about selecting a valid solution from the potentially large set of possible solutions.

We are hopeful that for the runtime solving the NP-Complete problem is reasonable and can be made to be fast. I think we get into different issues when dealing with provisioning agents that have huge repositories of bundles to query and find a valid deployment solution. This is where SAT solvers have become important to the p2 project at Equinox.


Glyn said...

That's encouraging Tom. I chatted with Richard this afternoon and found out about the progress on the resolver under development for Felix 3.0. It turns out Felix has always interpreted optional imports as "strong" unlike Equinox. I believe there is enough latitude in the spec to permit this, so I'd be interested in your views. I may post a short blog on this topic.

Huge repositories are another interesting area. The design point for dm Server is to make minimal indices of remote repositories available to the repository "client" and containing basic metadata, like bundle manifests, and secure hashes for caching. Then we run the Equinox resolver in a side state to determine which bundles to download. If some of this calculation is offloaded to a remote repository, then it will need to allow for the current wiring which probably boils down to returning a superset of the necessary bundles so that accurate resolution can be completed on the repository client side. Again material for another blog, especially as Virgo gets established and needs to integrate with p2.

Rob Harrop said...

Tom, I'm interested in the notion of selecting _a_ valid solution. What if there are many valid solutions? Can the user know which of these solutions will be chosen in advance? Will there be some optimally valid solution that will always be chosen?

Glyn said...

Rob, The question seems to presume the existence of an ordering "more optimal than". But what if that ordering allows there to be more than one equally optimal valid solution? Presumably the implementation would then be free to return any one of these and the user wouldn't care which.


OSGi (130) Virgo (59) Eclipse (10) Equinox (9) dm Server (8) Felix (4) WebSphere (3) Aries (2) GlassFish (2) JBoss (1) Newton (1) WebLogic (1)