Content Distribution

Overview

The main goal of the Sling Content Distribution module is to allow distribution of content (Sling resources) among different Sling instances. Here "distribution" means the ability to pick one or more resources on one Sling instance and to copy and persist them onto another Sling instance. The Sling Content Distribution module is able to distribute content by:

  • "pushing" from Sling instance A to Sling instance B
  • "pulling" from Sling instance B to Sling instance A
  • "synchronizing" Sling instances A and B via a (third) coordinating instance C

Bundles

The Sling Content Distribution module consists of the following bundles:

  • org.apache.sling.distribution.api: this is where the APIs are defined
  • org.apache.sling.distribution.core: this is where the basic infrastructure for distributing content is implemented
  • org.apache.sling.distribution.kryo-serializer: Kryo based distribution package serializer
  • org.apache.sling.distribution.avro-serializer: Apache Avro based distribution package serializer
  • org.apache.sling.distribution.sample: this is a set of sample configurations and implementations for demo purposes
  • org.apache.sling.distribution.it: this is the integration testing suite

Design

The Sling Content Distribution module aims to be reliable, simple and extensible.

Reliability means that the system should keep working in the presence of failures regarding I/O, network, etc. For example, when pushing content from instance A to instance B fails because B is unreachable, instance A should be able to keep pushing (pulling, etc.) content to other instances seamlessly. As another example, when delivery of a certain content package fails too many times, the distribution module should be able to either drop it or move it into a different "bucket" of failed items.

Simplicity means that this module should accomplish its tasks by providing clear, minimal and easy to use APIs together with smart but not overly complicated or "hacky" implementations (see "Simple software is hard").

Extensibility means that the Sling Content Distribution module provides a set of APIs for distributing resources where each component that comes into play during the distribution lifecycle can be extended or totally replaced.

A distribution request represents the need to aggregate some resources and copy them from / to another Sling instance. Such requests are handled by agents, which are the main entry point for working with the distribution module. Each agent distributes content from one or more sources to one or more targets. Such distribution can be triggered by:

  • "pushing" the content to the (remote) target instances
  • "pulling" content from the (remote) source instances
  • "coordinating" instances, that is, synchronizing multiple instances by using them as both sources and targets

An agent handles a distribution request by creating one or more packages of resources from the source(s), dispatching those packages to one or more queues, and processing the queued packages by persisting them into the target instance(s).

The process of creating one or more packages is called exporting. It may happen either locally to the agent (the "push" scenario) or remotely (the "pull" scenario).

The process of persisting one or more packages is called importing. It may happen either locally (the "pull" scenario) or remotely (the "push" scenario).

In order to properly handle a large number of requests against the same agent, each agent is provided with queues to which the exported packages are sent. The agent then takes care of processing each queue in order to import each package.
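The export, queue and import steps described above can be sketched in a few lines of Python. This is only an illustrative model, not the module's implementation: the names `Package`, `Agent`, `handle` and `process_queue`, and the use of plain dictionaries as stand-ins for Sling instances, are assumptions made for the example.

```python
from collections import deque
from dataclasses import dataclass


@dataclass(frozen=True)
class Package:
    """Hypothetical distribution package: an action plus the resources it carries."""
    action: str       # "ADD" or "DELETE"
    resources: tuple  # (path, content) pairs; content is None for DELETE


class Agent:
    """Toy agent: exports from a source dict, queues, then imports into a target dict."""

    def __init__(self, source, target):
        self.source = source
        self.target = target
        self.queue = deque()

    def handle(self, action, paths):
        # Export: build a package out of the requested resources and enqueue it.
        resources = tuple(
            (path, self.source.get(path) if action == "ADD" else None)
            for path in paths
        )
        self.queue.append(Package(action, resources))

    def process_queue(self):
        # Import: persist each queued package into the target, in FIFO order.
        while self.queue:
            package = self.queue.popleft()
            for path, content in package.resources:
                if package.action == "ADD":
                    self.target[path] = content
                else:
                    self.target.pop(path, None)


author = {"/content/sample1": "hello"}   # stand-in for the source instance
publish = {}                             # stand-in for the target instance
agent = Agent(source=author, target=publish)
agent.handle("ADD", ["/content/sample1"])
agent.process_queue()
print(publish)  # {'/content/sample1': 'hello'}
```

In this model the export and the import both run in the same process. In the "push" scenario only the export would be local to the agent, and in the "pull" scenario only the import.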

Distribution agents configuration

Distribution agents configurations are proper OSGi configurations (backed by nodes of type sling:OsgiConfig in the repository).

There are specialized factories for each supported scenario.

For example, a "forward" agent can be defined by specifying:

  • The name of the agent (name property)
  • The sub service name used to access content and build packages (serviceName property)
  • The endpoints where the packages are to be imported (packageImporter.endpoints property)

The sample package contains endpoints for exposing configuration for distribution agents. The DistributionConfigurationResourceProviderFactory is used to expose agent configurations as resources.

{
  "jcr:primaryType": "sling:OsgiConfig",
  "provider.roots": [ "/libs/sling/distribution/settings/agents" ],
  "kind" : "agent"
}

Distribution agents' configurations can be retrieved via HTTP GET:

$ curl -u admin:admin http://localhost:8080/libs/sling/distribution/settings/agents/{agentName}.json

Distribution agents services

Each distribution agent is an OSGi service and is resolved using a Sling Resource Provider that locates it under /libs/sling/distribution/services/agents.

The DistributionConfigurationResourceProviderFactory allows one to configure HTTP endpoints to access distribution OSGi configurations. The sample package contains endpoints for exposing distribution agents. The DistributionServiceResourceProviderFactory is used to expose agent services as resources.

{
  "jcr:primaryType": "sling:OsgiConfig",
  "provider.roots": [ "/libs/sling/distribution/services/agents" ],
  "kind" : "agent"
}

Distribution agents can be triggered by sending HTTP POST requests to

http://$host:$port/libs/sling/distribution/services/agents/{agentName}

with HTTP parameters action and path.
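As a sketch of what such a request looks like, the Python snippet below builds (but does not send) the POST. The helper name `build_distribution_request` is hypothetical, and the host, port and `admin:admin` credentials are assumptions borrowed from the curl examples in this document.

```python
import base64
from urllib.parse import urlencode
from urllib.request import Request

AGENTS_ROOT = "/libs/sling/distribution/services/agents"


def build_distribution_request(agent_name, action, path=None,
                               base_url="http://localhost:8080",
                               user="admin", password="admin"):
    """Build a form-encoded POST for a distribution agent (hypothetical helper)."""
    params = {"action": action}
    if path is not None:
        params["path"] = path
    body = urlencode(params).encode("ascii")
    # HTTP Basic authentication, the equivalent of curl's -u option.
    credentials = base64.b64encode(f"{user}:{password}".encode("utf-8")).decode("ascii")
    return Request(
        url=f"{base_url}{AGENTS_ROOT}/{agent_name}",
        data=body,
        headers={"Authorization": f"Basic {credentials}"},
        method="POST",
    )


request = build_distribution_request("publish", "ADD", "/content/sample1")
print(request.get_method(), request.full_url)
print(request.data.decode("ascii"))
```

Passing the resulting object to `urllib.request.urlopen` would actually send it, which requires a running Sling instance with a matching agent.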

Distribution queues

In Memory queue

That's a draft implementation using an in-memory blocking queue together with a Sling scheduled processor which periodically fetches the first item of each queue and triggers a distribution of that item. It's not suitable for production because it's currently not persisted: restarting the bundle / platform would lose the queue together with its items.
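The periodic processing described above can be modelled as a single "tick" function. The sketch below is not the actual implementation: the function name, the `distribute` callback and the choice to leave a failed item at the head of its queue are assumptions made for illustration.

```python
from collections import deque


def process_queues_once(queues, distribute):
    """One scheduler tick: try the first item of each queue, remove it on success."""
    for name, queue in queues.items():
        if not queue:
            continue
        item = queue[0]             # peek at the head without removing it
        if distribute(name, item):  # hypothetical callback, returns True on success
            queue.popleft()         # delivered, so drop it from the queue


queues = {
    "default": deque(["/content/a", "/content/b"]),
    "priority": deque(["/content/c"]),
}
delivered = []
process_queues_once(queues, lambda name, item: delivered.append((name, item)) or True)
print(delivered)  # [('default', '/content/a'), ('priority', '/content/c')]
```

Because the queues live only in process memory, everything still in `queues` is lost when the process stops, which is the limitation noted above.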

Sling Job Handling based queue

That's a queue implementation based on the queues and jobs provided by the Sling Event bundle. Each item added to a queue triggers the creation of a Sling job which handles the processing of that item in the queue. By default, Sling queues for distribution have the following options:

  • ordered
  • with max priority
  • with infinite retries
  • keeping job history

Distribution of packages among queues

Each distribution agent uses a specific queue distribution mechanism, specified via a 'queue distribution strategy', which defines how packages are routed into agent queues. The currently available distribution strategies are:

  • single: the agent has only one queue and all the items are routed there
  • priority path: the agent can route a configurable set of paths (note that this configuration is currently global for the system, not per agent) to a dedicated priority queue while all the others go to the default queue
  • error aware: the agent has one default queue for all the items; items that fail a configurable number of times are either dropped or moved to an error queue (depending on configuration)
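The three strategies can be illustrated with a small Python model. It is a sketch under assumptions: the function names, the queue names ("default", "priority", "error") and the `max_failures` and `drop` parameters are invented for the example and do not correspond to the module's actual configuration keys.

```python
def route_single(path):
    """Single: every item goes to the one and only queue."""
    return "default"


def route_priority_path(path, priority_paths):
    """Priority path: items at or under a configured path go to the priority queue."""
    for prefix in priority_paths:
        if path == prefix or path.startswith(prefix.rstrip("/") + "/"):
            return "priority"
    return "default"


def next_queue_error_aware(failures, max_failures, drop=False):
    """Error aware: after too many failures an item is dropped or moved aside.

    Returns the queue the item should be in next, or None if it is dropped.
    """
    if failures < max_failures:
        return "default"  # keep retrying from the default queue
    return None if drop else "error"


print(route_single("/content/a"))                                     # default
print(route_priority_path("/content/news/today", ["/content/news"]))  # priority
print(route_priority_path("/content/newsletter", ["/content/news"]))  # default
print(next_queue_error_aware(failures=3, max_failures=3))             # error
```

The prefix check compares whole path segments, so /content/newsletter is not treated as a child of /content/news.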

Use cases

Forward distribution

In order to configure the "forward" distribution workflow, that transfers content from an author instance to a publish instance:

  • configure a remote importer on publish
  • configure a "forward" agent on author pointing to the url of the importer on publish

Send an HTTP POST request to http://localhost:8080/libs/sling/distribution/services/agents/publish with parameters action=ADD and path=/content

Create/update content

$ curl -v -u admin:admin http://localhost:8080/libs/sling/distribution/services/agents/publish -d 'action=ADD' -d 'path=/content/sample1'

Delete content

$ curl -v -u admin:admin http://localhost:8080/libs/sling/distribution/services/agents/publish -d 'action=DELETE' -d 'path=/content/sample1'

Reverse distribution

In order to configure the "reverse" distribution workflow, which transfers content from a publish instance to an author instance:

  • configure a queue agent on publish to hold the packages that need to be distributed to author
  • configure a remote exporter on publish that exports packages from the queue agent
  • configure a "reverse" agent on author pointing to the URL of the exporter on publish

Send an HTTP POST request to http://localhost:8080/libs/sling/distribution/services/agents/publish-reverse with parameter action=PULL

Create/update content

$ curl -u admin:admin http://localhost:8081/libs/sling/distribution/services/agents/reverse -d 'action=ADD' -d 'path=/content/sample1'
$ curl -u admin:admin http://localhost:8080/libs/sling/distribution/services/agents/publish-reverse -d 'action=PULL'

Sync distribution

In order to configure the "sync" distribution workflow, which transfers content between two publish instances via an author instance:

  • configure a remote exporter on each publish instance
  • configure a remote importer on each publish instance
  • configure a "sync" agent on author pointing to the URLs of the exporters and importers on publish

Send an HTTP POST request to http://localhost:8080/libs/sling/distribution/services/agents/pubsync with parameter action=PULL

Create/update content

$ curl -u admin:admin http://localhost:8081/libs/sling/distribution/services/agents/reverse-pubsync -d 'action=ADD' -d 'path=/content/sample1'
$ curl -u admin:admin http://localhost:8080/libs/sling/distribution/services/agents/pubsync -d 'action=PULL'

Installation

  • install the dependency bundles on all Sling instances
  • install the Sling Distribution api, core and sample bundles on all Sling instances

HTTP API

API Requirements

We need to expose APIs for configuring, commanding and monitoring distribution agents.

  • Configuration API should allow:
      • CRUD operations for agent configs
  • Command API (possibly issued to multiple agents at once) should allow:
      • triggering a distribution request on a specific agent
      • explicitly creating and exporting a package
      • explicitly importing a previously created package
  • Monitoring API should allow:
      • inspection of the internal queues of distribution agents
      • inspection of the command history

API endpoints

Configuration API

  • Create config - POST /libs/sling/distribution/settings/agents
  • Read config - GET /libs/sling/distribution/settings/agents/{agentName}
  • Update config - PUT /libs/sling/distribution/settings/agents/{agentName}
  • Delete config - DELETE /libs/sling/distribution/settings/agents/{agentName}

Command API

  • Distribute - POST /libs/sling/distribution/services/agents/{agentName}
  • Import package - POST /libs/sling/distribution/services/importers/{importerName}
  • Export package - POST /libs/sling/distribution/services/exporters/{exporterName}

Monitoring API

  • Distribution history - GET /libs/sling/distribution/services/agents/{agentName}/log
  • Agent queue inspection - GET /libs/sling/distribution/services/agents/{agentName}/queues

Java API

There is a single entry point for triggering a distribution workflow: the Distributor API.

Distributor.distribute(agentName, resourceResolver, distributionRequest)

Extensions

The following extensions for Apache Sling Content Distribution exist.

Apache Avro serializer

The org.apache.sling.distribution.avro-serializer bundle contains a DistributionContentSerializer based on Apache Avro.

Kryo serializer

The org.apache.sling.distribution.kryo-serializer bundle contains a DistributionContentSerializer based on Kryo.

Ideas for future developments

  • distributed configuration
  • pushing to / pulling from JMS (pros: established pattern for producer/consumer problems, cons: other libraries / systems involved as a possible point of failure)
  • WebSocket support (pros: once established it's bidirectional and therefore also publish can directly push stuff to author)
  • asynchronous import of packages (pros: parallel transport and import, cons: complex management of multiple queues on different publish instances)