Friday, September 30, 2011

Extenders: Pattern or Anti-Pattern?

The extender pattern has become popular in recent years and has even been utilised in OSGi standards such as the Blueprint service and the Web Applications specification. In Virgo, we've been working with extenders from the start, but in spite of their advantages, they have some significant downsides. Since the OSGi Alliance is considering using extenders in other specifications, I agreed to document some of the issues.

The first difficulty is knowing when an extender has finished processing a bundle. For example, a bundle containing a blueprint XML file will transition to ACTIVE state as soon as any bundle activator has been driven. But that's not the whole story. Administrators are interested in when the bundle is ready for use and so the management code in Virgo tracks the progress of the extender and presents an amalgamated state for the install artefact representing the bundle. The install artefact stays in STARTING state until the application context has been published, at which point it transitions to ACTIVE. Without such additional infrastructure, an administrator cannot tell when a bundle processed by an extender really is ready for business.
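The amalgamated state described above can be sketched in a few lines. This is only an illustration of the idea, not Virgo's code: the names `ArtefactState`, `amalgamated_state`, `needs_extender` and `context_published` are hypothetical.

```python
from enum import Enum


class ArtefactState(Enum):
    RESOLVED = "RESOLVED"
    STARTING = "STARTING"
    ACTIVE = "ACTIVE"


def amalgamated_state(bundle_state, needs_extender, context_published):
    """Return the state shown to the administrator (hypothetical helper).

    A bundle that the framework reports as ACTIVE is still presented as
    STARTING until its extender has published the application context.
    """
    if (bundle_state is ArtefactState.ACTIVE
            and needs_extender
            and not context_published):
        return ArtefactState.STARTING
    return bundle_state
```

A bundle with no extender processing simply reports its framework state, so only extended bundles pay for the extra tracking.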

That's the successful case, but there are complications in error cases too. The first complication is that an extender runs in a different thread from the one that installed the bundle, so if the extender throws an exception, it is not propagated to the code which installed the bundle. So the installer needs somehow to check for errors. Therefore Virgo has infrastructure to detect such errors and propagate them back to the thread which initiated deployment of the bundle: the deployment operation fails with a stack trace indicating what went wrong.
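To make the thread hand-off concrete, here is a minimal sketch of propagating an extender failure back to the deploying thread. It is a model under assumptions, not Virgo's implementation: `deploy`, `DeploymentError` and the `extend` callback are hypothetical names, and the extender thread is started by `deploy` itself purely to keep the example self-contained (a real extender reacts to a bundle event instead).

```python
import queue
import threading


class DeploymentError(Exception):
    """Raised in the deploying thread when extender processing fails."""


def deploy(bundle_name, extend, timeout=5.0):
    """Deploy a bundle and wait for the outcome of extender processing."""
    outcome = queue.Queue(maxsize=1)

    def run_extender():
        # Runs in a separate thread, as an extender would.
        try:
            extend(bundle_name)
            outcome.put(None)      # success
        except Exception as exc:   # capture the failure for the deployer
            outcome.put(exc)

    threading.Thread(target=run_extender, daemon=True).start()

    try:
        failure = outcome.get(timeout=timeout)
    except queue.Empty:
        raise DeploymentError(
            f"extender did not finish processing {bundle_name}") from None
    if failure is not None:
        # Chain the original exception so its stack trace is preserved.
        raise DeploymentError(
            f"extender failed for {bundle_name}") from failure
```

The caller sees a single failed deployment operation, with the extender's original exception available as the cause.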

The other error complication is where there is a (possibly indefinite) delay in an extender processing a bundle. For this kind of error Virgo tracks the progress of extender processing and issues warnings to the event log (intended for the administrator's eyes) saying which pieces of processing have been delayed and in some common situations, for example when a blueprint is waiting for a dependency, what is causing the delay.

Extenders need to see bundle lifecycle events, so for systems that partition the framework it is necessary to install each extender into multiple partitions. On the flip side, it is crucial to prevent multiple instances of an extender from ever seeing the same bundle event, otherwise each will attempt to extend the bundle.

Another issue with extenders is the need to keep them running and healthy as there is little indication that an extender is down or sickly other than bundles not being processed by the extender. Virgo takes care to ensure its extenders are correctly started and its infrastructure for detecting delays helps to diagnose extender crashes or sickness (both of which are extremely rare situations).

There is also an issue in passing parameters to an extender to affect its behaviour. This is typically done by embedding extender configuration in the bundles being processed or by attaching a fragment containing configuration to the extender bundle. But since the extender is not driven by an API, the normal approach of passing parameters on a call is not available. Essentially, an extender model implies that the programming model for deployment is restricted to BundleContext.installBundle.

With considerable investment in additional infrastructure, Virgo has managed to support the Blueprint and Spring DM extenders reasonably well. But in the case of the Web Applications extender, Virgo couldn't make this sufficiently robust and so it drives the underlying web componentry directly from the Virgo deployment pipeline to avoid the above issues.

I understand at least one other server runtime project has encountered similar issues with extenders, so Virgo is not alone. There is a trade-off between loosely coupling the installer from the resource-specific processing, the main strength of the extender pattern (but far from unique to that pattern), and providing a robust programming model and usable management view -- crucial features of a server runtime -- which is far more straightforward without extenders.

Thursday, September 22, 2011

OSGi Subsystems in Virgo

A public draft(*) of the OSGi subsystems RFC (152) should soon emerge from the OSGi Alliance. A subsystem is a multi-bundle application, not dissimilar to a PAR or plan in Virgo. IBM is leading the spec work and a number of other vendors, including SpringSource/VMware, are contributing. Quite a few projects have multi-bundle application constructs, so it makes sense to agree a standard form.

After going through the RFC again with my implementer's hat on, I listed the features necessary to support subsystems in Virgo. Most of the changes are in the Virgo kernel, although I hope to structure the support into non-subsystem specific generalisations of the kernel plus subsystem-specific code running in the user region.

Currently, it is not possible to deploy a plan which contains artefacts that are already deployed. This will need generalising so that we can support subsystems using a common data structure of deployed artefacts: a directed acyclic graph (DAG) rather than the current collection of trees.

The switch to a DAG has interesting implications for lifecycle management of shared subgraphs. With today's tree, when a node is stopped, started, or uninstalled, any subtrees are also stopped, started, or uninstalled, respectively. With a DAG, shared subgraphs need to be sensitive to all their parents. This boils down to keeping each shared subgraph at the maximum state required by any of its parents. States are ordered: ACTIVE > RESOLVED > UNINSTALLED.
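The "maximum state required by any parent" rule can be sketched as follows. This is an illustrative model only: the `parents` and `requested` dictionaries and the function `required_state` are assumptions made for the example, not the planned Virgo data structures.

```python
from enum import IntEnum


class State(IntEnum):
    # Ordered as in the text: ACTIVE > RESOLVED > UNINSTALLED
    UNINSTALLED = 0
    RESOLVED = 1
    ACTIVE = 2


def required_state(node, parents, requested):
    """Return the state at which a node of the DAG must be kept.

    parents:   maps each node to the list of its parent nodes
    requested: maps each root node to the state the user asked for
    """
    own_parents = parents.get(node, [])
    if not own_parents:
        # A root is held at whatever state was explicitly requested.
        return requested.get(node, State.UNINSTALLED)
    # A shared node is held at the maximum state any parent requires.
    return max(required_state(p, parents, requested) for p in own_parents)
```

For example, a bundle shared by two plans stays ACTIVE while either plan is ACTIVE, and drops to RESOLVED only when both plans have been stopped.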

Lifecycle management will also get interesting when a shared subgraph belongs to one or more atomic subgraphs as then lifecycle changes in the common subgraph will propagate to all the containing atomic subgraphs. I think that will just "work", but users might need to be careful in their use of atomic plans if they want to avoid management operations on one application affecting other applications.

I'm also considering using garbage collection as a means of uninstalling artefacts which are no longer needed. Given the number of types of dependency that are possible, this is likely to be more reliable than the alternative of maintaining reference counts.
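As a rough sketch of the idea, garbage collection here amounts to a reachability walk from the artefacts that were explicitly deployed. The function `find_dead` and its `roots`, `dependencies` and `installed` parameters are hypothetical and only model the approach.

```python
def find_dead(roots, dependencies, installed):
    """Return the installed artefacts that no root can reach.

    roots:        artefacts explicitly deployed by a user
    dependencies: maps an artefact to the artefacts it depends on,
                  whatever the type of dependency
    installed:    all artefacts currently installed
    """
    live = set()
    stack = list(roots)
    while stack:
        node = stack.pop()
        if node in live:
            continue
        live.add(node)
        stack.extend(dependencies.get(node, ()))
    return set(installed) - live
```

One property of this approach is that unreachable artefacts which depend on each other in a cycle are still reported as dead, whereas plain reference counts would never drop to zero for them.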

I'm a little concerned about a possible race between garbage collection detecting an artefact as dead and a new dependency being created on the artefact just before garbage collection goes ahead and uninstalls the artefact, but there would be a similar concern for reference counting. The basic issue in this race is that a dead artefact may be found by a live bundle and a new dependency created before the dead artefact can be uninstalled. For instance, a dead bundle may be found by using the OSGi API to list all bundles. It may be possible to use some technique such as a special region in the region digraph to isolate dead bundles, although this issue is probably something to discuss among those working on the RFC.

Anyway, there's plenty of work to be getting on with. I haven't done detailed estimates of the features identified so far, but I guess there's a person-year or so of effort needed, so I'm initially targeting Virgo 4.0. If you feel like lending a hand, please get in touch on virgo-dev.

* - RFC 152 has changed quite a bit since the version in the Enterprise spec early access draft dated 16 May 2011. A new draft is being prepared.
