Output Rewriting Pipelines (org.apache.sling.rewriter)

The Apache Sling Rewriter is a module for rewriting the output generated by the usual Sling rendering process. Possible use cases include rewriting or checking all links in an HTML page, manipulating the HTML page, or using the generated output as the basis for further transformation. An example of further transformation is using XSLT to transform rendered XML into an output format such as HTML, or into XSL-FO for generating PDF.

To support these use cases, the rewriter uses the concept of a processor. The processor is a component that is injected into the response through a servlet filter. By implementing the Processor interface, you can rewrite the whole response in one go. A more convenient way of processing the output is to use a so-called pipeline. The Apache Sling rewriter uses essentially the same concept as Apache Cocoon: an XML-based pipeline for post-processing the output. The pipeline is based on SAX events.

SAX Pipelines

The rewriter lets you configure a pipeline for post-processing the generated response. Depending on how the pipeline is assembled, the rewriting process might buffer the whole output in order to post-process it properly. For example, buffering is required if an HTML response is "transformed" to XHTML or if XSLT is used to process the response.

As the pipeline is based on SAX events, there needs to be a component that generates these events and sends them through the pipeline. By default the Sling rendering scripts write to an output stream, so there is a need to parse this output and generate the SAX events.

The first component in the pipeline, which generates the initial SAX events, is called a generator. The generator gets the output from Sling, generates SAX events (XML), and streams these events into the pipeline. The counterpart of the generator is the serializer, which forms the end of the pipeline. The serializer collects all incoming SAX events and transforms them into the required response by writing to the output stream of the response.

Between the generator and the serializer, so-called transformers can be placed in a chain. A transformer receives SAX events from the previous component in the pipeline and sends SAX events to the next component in the pipeline. A transformer can remove events, change events, add events, or just pass the events on.
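The generator, transformer, and serializer roles described above can be illustrated with the plain SAX classes of the JDK. The following is only a minimal sketch and does not use the Sling rewriter API: the class names `SaxPipelineSketch`, `LinkPrefixer`, and `SimpleSerializer` are hypothetical, the JDK XML parser stands in for the generator, and the serializer does no escaping.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.AttributesImpl;
import org.xml.sax.helpers.DefaultHandler;
import org.xml.sax.helpers.XMLFilterImpl;

class SaxPipelineSketch {

    /** Transformer: changes href attributes and passes all other events on unchanged. */
    static class LinkPrefixer extends XMLFilterImpl {
        private final String prefix;

        LinkPrefixer(String prefix) {
            this.prefix = prefix;
        }

        @Override
        public void startElement(String uri, String localName, String qName, Attributes atts)
                throws SAXException {
            AttributesImpl copy = new AttributesImpl(atts);
            int index = copy.getIndex("href");
            if (index >= 0) {
                copy.setValue(index, prefix + copy.getValue(index));
            }
            super.startElement(uri, localName, qName, copy);
        }
    }

    /** Serializer: writes incoming events back out as markup (no escaping, for brevity). */
    static class SimpleSerializer extends DefaultHandler {
        final StringWriter out = new StringWriter();

        @Override
        public void startElement(String uri, String localName, String qName, Attributes atts) {
            out.append('<').append(qName);
            for (int i = 0; i < atts.getLength(); i++) {
                out.append(' ').append(atts.getQName(i))
                   .append("=\"").append(atts.getValue(i)).append('"');
            }
            out.append('>');
        }

        @Override
        public void endElement(String uri, String localName, String qName) {
            out.append("</").append(qName).append('>');
        }

        @Override
        public void characters(char[] ch, int start, int length) {
            out.write(ch, start, length);
        }
    }

    /** Assembles generator -> transformer -> serializer and runs the input through it. */
    static String run(String xml, String prefix) {
        try {
            // Generator: the JDK parser turns the rendered text into SAX events.
            XMLReader generator = SAXParserFactory.newInstance().newSAXParser().getXMLReader();
            LinkPrefixer transformer = new LinkPrefixer(prefix);
            SimpleSerializer serializer = new SimpleSerializer();
            transformer.setParent(generator);
            transformer.setContentHandler(serializer);
            transformer.parse(new InputSource(new StringReader(xml)));
            return serializer.out.toString();
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(run("<p><a href=\"/a\">x</a></p>", "/ctx"));
    }
}
```

Each stage only knows the next content handler, which is what makes it possible to insert, remove, or reorder transformers without touching the generator or the serializer.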

Sling contains a default pipeline which is executed for all HTML responses: it starts with an HTML generator, parsing the HTML output and sending events into the pipeline. An HTML serializer collects all events and serializes the output.

You can override this configuration or contribute more specific configurations as outlined below in Configuring a Processor. Only one pipeline is picked for a response: the matching configuration with the highest order.

Default Pipeline

The default pipeline is configured for the text/html mime type and the html extension. It consists of the html-generator as the generator and the html-serializer for generating the final response. As the HTML generated by Sling is not required to be valid XHTML, the HTML generator uses an HTML parser to create valid SAX events. To do this, the generator needs to buffer the whole response first.

Implementing Pipeline Components

Each pipeline component type has a corresponding Java interface (Generator, Transformer, and Serializer) together with a factory interface (GeneratorFactory, TransformerFactory, and SerializerFactory). When implementing such a component, both interfaces need to be implemented. The factory has only one method which creates a new instance of that type for the current request. The factory has to be registered as a service. For example if you're using the Maven SCR plugin, it looks like this:

@scr.component metatype="no"
@scr.service interface="TransformerFactory"
@scr.property name="pipeline.type" value="validator"

The factory needs to implement the corresponding interface and should be registered as a service for this factory interface (this is a plain service and not a factory service in the OSGi sense). Each factory gets a unique name through the pipeline.type property. The pipeline configuration in the repository just references this unique name (like validator).

Extending the Pipeline

With the possibilities from above, it is possible to define new pipelines and add custom components to the pipeline. However, in some cases it is required to just add a custom transformer to the existing pipeline. Therefore the rewriting can be configured with pre and post transformers that are simply added to each configured pipeline. This allows a more flexible way of customizing the pipeline without changing/adding a configuration in the repository.

The approach here is nearly the same. A transformer factory needs to be implemented, but instead of giving this factory a unique name, this factory is marked as a global factory:

@scr.component metatype="no"
@scr.service interface="TransformerFactory"
@scr.property name="pipeline.mode" value="global"
@scr.property name="service.ranking" value="RANKING" type="Integer"

RANKING is an integer value specifying where to add the transformer in the pipeline (don't forget the type attribute, otherwise the ranking is interpreted as zero!). If the value is less than zero, the transformer is added at the beginning of the pipeline, right after the generator. If the ranking is greater than or equal to zero, the transformer is added at the end of the pipeline, before the serializer.
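The placement rule can be sketched in a few lines of plain Java. This is an illustration only, not the rewriter's implementation: `GlobalTransformerPlacement` and `Global` are hypothetical names, and the sketch simply keeps the given order within each group, since the text above does not say how several global transformers in the same group are ordered relative to each other.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class GlobalTransformerPlacement {

    /** Hypothetical pairing of a global transformer name with its service.ranking. */
    static final class Global {
        final String name;
        final int ranking;

        Global(String name, int ranking) {
            this.name = name;
            this.ranking = ranking;
        }
    }

    /** Returns the component names in the order in which events flow through them. */
    static List<String> assemble(String generator, List<String> configured,
                                 String serializer, List<Global> globals) {
        List<String> pipeline = new ArrayList<>();
        pipeline.add(generator);
        for (Global g : globals) {
            if (g.ranking < 0) {
                pipeline.add(g.name); // negative ranking: right after the generator
            }
        }
        pipeline.addAll(configured); // transformers from the pipeline configuration
        for (Global g : globals) {
            if (g.ranking >= 0) {
                pipeline.add(g.name); // zero or positive ranking: before the serializer
            }
        }
        pipeline.add(serializer);
        return pipeline;
    }

    public static void main(String[] args) {
        List<Global> globals = Arrays.asList(new Global("pre", -10), new Global("post", 10));
        System.out.println(assemble("html-generator", Arrays.asList("link-rewriter"),
                "html-serializer", globals));
    }
}
```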

The TransformerFactory interface has just one method, which returns a new transformer instance. If you plan to use other services in your transformer, you can declare the references on the factory and pass the service instances into the newly created transformer.

Since the transformer carries information about the current response it is not advisable to reuse the same transformer instance among multiple calls of TransformerFactory.createTransformer.
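The pattern of a long-lived factory that hands a shared service to short-lived transformers might look roughly like the following. This is a sketch with simplified stand-in types, not the real Sling interfaces: `LinkChecker`, `LinkCountingTransformer`, and `LinkCountingTransformerFactory` are hypothetical, and the stand-in transformer receives link values directly instead of SAX events.

```java
import java.util.Arrays;

class TransformerFactorySketch {

    /** Hypothetical shared service that the factory holds a reference to. */
    interface LinkChecker {
        boolean isValid(String link);
    }

    /** Simplified stand-in for a transformer; it only sees link values, not SAX events. */
    static class LinkCountingTransformer {
        private final LinkChecker checker; // shared service, passed in by the factory
        private int invalidLinks;          // state that belongs to one response only

        LinkCountingTransformer(LinkChecker checker) {
            this.checker = checker;
        }

        void link(String href) {
            if (!checker.isValid(href)) {
                invalidLinks++;
            }
        }

        int invalidLinks() {
            return invalidLinks;
        }
    }

    /** Simplified stand-in for a transformer factory. */
    static class LinkCountingTransformerFactory {
        private final LinkChecker checker;

        LinkCountingTransformerFactory(LinkChecker checker) {
            this.checker = checker;
        }

        /** Always returns a fresh instance so no per-response state is shared. */
        LinkCountingTransformer createTransformer() {
            return new LinkCountingTransformer(checker);
        }
    }

    /** Simulates two responses and returns the invalid link count of each. */
    static int[] demo() {
        LinkCountingTransformerFactory factory =
                new LinkCountingTransformerFactory(link -> link.startsWith("/"));
        LinkCountingTransformer first = factory.createTransformer();
        LinkCountingTransformer second = factory.createTransformer();
        first.link("broken");
        first.link("/ok");
        second.link("/fine");
        return new int[] {first.invalidLinks(), second.invalidLinks()};
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(demo()));
    }
}
```

If the factory returned the same instance twice, the counter of the first response would leak into the second one, which is the kind of problem the advice above is meant to prevent.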

Implementing a Processor

A processor must conform to the Java interface org.apache.sling.rewriter.Processor. It is initialized (method init) with the ProcessingContext. This context contains all necessary information for the current request, in particular the output writer to which the rewritten content is written. The getWriter method should return a writer that the rendered output is written to. When the output has been written or an error has occurred, finished is called.
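The life cycle described above (init, getWriter, finished) can be sketched with simplified stand-in types. The nested `Processor` and `ProcessingContext` interfaces below are hypothetical reductions that only mirror the method names mentioned in the text; the real Sling interfaces carry more information and their exact signatures should be taken from the API documentation. Passing the buffered output through unchanged on error is a design choice of this sketch.

```java
import java.io.PrintWriter;
import java.io.StringWriter;

class ProcessorSketch {

    /** Stand-in context: only exposes the writer of the real response. */
    interface ProcessingContext {
        PrintWriter getWriter();
    }

    /** Stand-in processor interface with the three methods named in the text. */
    interface Processor {
        void init(ProcessingContext context);
        PrintWriter getWriter();
        void finished(boolean errorOccurred);
    }

    /** Buffers the whole output and rewrites it in one go when rendering is done. */
    static class HttpsRewritingProcessor implements Processor {
        private ProcessingContext context;
        private final StringWriter buffer = new StringWriter();
        private final PrintWriter writer = new PrintWriter(buffer);

        @Override
        public void init(ProcessingContext context) {
            this.context = context;
        }

        @Override
        public PrintWriter getWriter() {
            return writer; // the rendering process writes into the buffer
        }

        @Override
        public void finished(boolean errorOccurred) {
            writer.flush();
            String output = buffer.toString();
            if (!errorOccurred) {
                output = output.replace("http://", "https://");
            }
            PrintWriter target = context.getWriter();
            target.write(output);
            target.flush();
        }
    }

    /** Simulates one request: render into the processor, then finish. */
    static String render(String renderedOutput) {
        StringWriter response = new StringWriter();
        PrintWriter responseWriter = new PrintWriter(response);
        Processor processor = new HttpsRewritingProcessor();
        processor.init(() -> responseWriter);
        processor.getWriter().write(renderedOutput);
        processor.finished(false);
        return response.toString();
    }

    public static void main(String[] args) {
        System.out.println(render("<a href=\"http://example.org\">x</a>"));
    }
}
```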

Like the pipeline components, a processor is generated by a factory, which has to be registered as a service, like this:

@scr.component metatype="no"
@scr.service interface="ProcessorFactory"
@scr.property name="pipeline.type" value="uniqueName"

Configuring a Processor

The pipelines can be configured in the repository as a child resource of /apps/<APPNAME>/config/rewriter/* (or /libs/<APPNAME>/config/rewriter/*). (In fact the configured search paths of the resource resolver are observed.) Each resource can have the following properties:

Property	Type	Description	Example Value	Mandatory
`generatorType`	String	The type of the generator. Identifies the generator being registered via service property `pipeline.type` of a service implementing a `GeneratorFactory`	`html-generator`	yes
`transformerTypes`	String[]	The types of the transformers. Identifies the transformers being registered via service property `pipeline.type` of a service implementing a `TransformerFactory`	`link-rewriter` (Sling itself does not contain any TransformerFactories)	no
`serializerType`	String	The type of the serializer. Identifies the serializer being registered via service property `pipeline.type` of a service implementing a `SerializerFactory`	`html-serializer`	yes
`paths`	String[]	The paths this pipeline should run on (content paths). Only if the request's resource path starts with one of the given `paths` or one of the given paths is `*` the pipeline configuration is considered.	`/content/`	no
`contentTypes`	String[]	The content types this pipeline should be used for. If no explicit content type is set on the response yet, `text/html` is assumed. May contain `*` values which match for all content types. Only if the response has one of the given content types the pipeline configuration is considered.	`text/html`	no
`extensions`	String[]	The extensions this pipeline should be used for. Only if the request's extension is equal to one of the given extensions the pipeline configuration is considered.	`html`	no
`resourceTypes`	String[]	The resource types this pipeline should be used for. Only if the request's resource type is equal (via `ResourceResolver.isResourceType(<request's resource>, <given resource type>)`) to one of the given resourceTypes the pipeline configuration is considered.	`myapp/customresourcetype`	no
`unwrapResources`	Boolean	Check resource types of unwrapped resources as well if this is set to `true`. Available since 1.1.0 (SLING-5012).	`false`	no
`selectors`	String[]	A set of selectors the pipeline should be used for. Each value is a single selector (i.e. must not contain `.`). Only if the request contains at least one selector which is equal to one of the given selectors, this pipeline configuration is considered. Available since 1.1.0 (SLING-3511)	`myselector`	no
`order`	Long	The configurations are sorted by this value, which must be greater than or equal to 0. The configuration with the highest order is tried first. Default value (if not set): 0	100	no
`enabled`	Boolean	Is this configuration active? (default yes)	`false`	no
`processError`	Boolean	Error responses are processed by this pipeline configuration only if this is set to `true`. Default `true`	`true`	no

As you can see from the configuration there are several possibilities to define when a pipeline should be used for a response, like paths, extensions, content types, or resource types. It is possible to specify several of them at once. In this case all conditions must be met.
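How several conditions combine, and how the order decides between matching configurations, can be sketched as follows. This is a reduced model and not the rewriter's code: `PipelineSelectionSketch`, `Config`, and the sample configuration names are hypothetical, only three of the conditions from the table are modelled, a missing property is assumed to mean "no restriction", and ties in order are resolved by taking the first match.

```java
import java.util.Arrays;
import java.util.List;

class PipelineSelectionSketch {

    /** A reduced pipeline configuration; a null list means "no restriction" here. */
    static final class Config {
        final String name;
        final List<String> paths;
        final List<String> extensions;
        final List<String> contentTypes;
        final long order;

        Config(String name, List<String> paths, List<String> extensions,
               List<String> contentTypes, long order) {
            this.name = name;
            this.paths = paths;
            this.extensions = extensions;
            this.contentTypes = contentTypes;
            this.order = order;
        }

        /** All configured conditions must hold for the configuration to be considered. */
        boolean matches(String path, String extension, String contentType) {
            // If the response has no content type yet, text/html is assumed.
            String type = contentType == null ? "text/html" : contentType;
            return matchesPath(path)
                    && (extensions == null || extensions.contains(extension))
                    && (contentTypes == null || contentTypes.contains("*")
                            || contentTypes.contains(type));
        }

        private boolean matchesPath(String path) {
            if (paths == null) {
                return true;
            }
            for (String p : paths) {
                if (p.equals("*") || path.startsWith(p)) {
                    return true;
                }
            }
            return false;
        }
    }

    /** Returns the name of the matching configuration with the highest order, or null. */
    static String select(List<Config> configs, String path, String extension,
                         String contentType) {
        Config best = null;
        for (Config c : configs) {
            if (c.matches(path, extension, contentType)
                    && (best == null || c.order > best.order)) {
                best = c;
            }
        }
        return best == null ? null : best.name;
    }

    /** Two hypothetical configurations: a general one and a more specific one. */
    static List<Config> sampleConfigs() {
        return Arrays.asList(
                new Config("default", null, Arrays.asList("html"),
                        Arrays.asList("text/html"), 0),
                new Config("myapp", Arrays.asList("/content/myapp/"), Arrays.asList("html"),
                        Arrays.asList("text/html"), 100));
    }

    public static void main(String[] args) {
        System.out.println(select(sampleConfigs(), "/content/myapp/page", "html", null));
    }
}
```

With the sample data, a request below /content/myapp/ is handled by the more specific configuration because of its higher order, while other HTML requests fall back to the general one.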

If a component needs a configuration, the configuration is stored in a child node whose name is {componentType}-{name}. For example, to configure the HTML generator (named html-generator), the node should have the name generator-html-generator. If the pipeline contains the same transformer several times, the configuration child node should have the format {componentType}-{index}, where index is the position of the transformer in the pipeline, starting with 1. For example, if you have a pipeline with the transformers xslt, html-cleaner, xslt, link-checker, then the configuration nodes should be named transformer-1 (for the first xslt), transformer-html-cleaner, transformer-3 (for the second xslt), and transformer-link-checker.
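The naming rule for transformer configuration nodes can be written down as a small helper. This is only a sketch of the rule as described above, not code taken from the rewriter; `ConfigNodeNames` and `transformerNodeNames` are hypothetical names.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class ConfigNodeNames {

    /** Derives the configuration child node name for each transformer in the pipeline. */
    static List<String> transformerNodeNames(List<String> transformerTypes) {
        // Count how often each transformer type occurs in the pipeline.
        Map<String, Integer> occurrences = new HashMap<>();
        for (String type : transformerTypes) {
            occurrences.merge(type, 1, Integer::sum);
        }
        List<String> names = new ArrayList<>();
        for (int i = 0; i < transformerTypes.size(); i++) {
            String type = transformerTypes.get(i);
            boolean repeated = occurrences.get(type) > 1;
            // Repeated types use their 1-based position, unique types use their name.
            names.add("transformer-" + (repeated ? String.valueOf(i + 1) : type));
        }
        return names;
    }

    public static void main(String[] args) {
        System.out.println(transformerNodeNames(
                Arrays.asList("xslt", "html-cleaner", "xslt", "link-checker")));
    }
}
```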