Sling Health Checks are deprecated - please migrate to Felix Health Checks - see migration guide for instructions
This documentation is kept temporarily for projects that have not yet migrated
Based on simple HealthCheck OSGi services, the Sling Health Check Tools ("hc" in short form) are used to check the health of live Sling systems, based on inputs like JMX MBean attribute values, OSGi framework information, Sling requests status, etc.
Health checks are easily extensible either by configuring the supplied default HealthCheck services, or by implementing your own HealthCheck services to cater for project specific requirements.
However for simple setups, the out of the box health checks are often sufficient. Executing Health Checks is a good starting point to run existing checks and to get familiar with how health checks work.
See also:
Generally health checks have two high level use cases:
The strength of Health Checks is to surface internal Sling state for external use:
The health check subsystem uses tags to select which health checks to execute so you can for example execute just the performance or security health checks once they are configured with the corresponding tags.
The out of the box health check services also allow for using them as JMX aggregators and processors, which take JMX attribute values as input and make the results accessible via JMX MBeans.
HealthCheck
A HealthCheck is just an OSGi service that returns a Result.
public interface HealthCheck {
    /** Execute this health check and return a {@link Result}
     *  This is meant to execute quickly, access to external
     *  systems, for example, should be managed asynchronously.
     */
    public Result execute();
}
Where Result is a simple immutable class that provides a Status (OK, WARN, CRITICAL etc.) and one or more log-like messages that can provide more info about what, if anything, went wrong.
public class Result implements Iterable<ResultLog.Entry> {
    public boolean isOk() {
        return getStatus().equals(Status.OK);
    }
    public Status getStatus() {
        return resultLog.getAggregateStatus();
    }
    @Override
    public Iterator<ResultLog.Entry> iterator() {
        return resultLog.iterator();
    }
    // ... details omitted
}
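The `getStatus()` method above delegates to an aggregate status computed over the log entries. The following is a minimal sketch of that idea, assuming the aggregate is simply the most severe status logged and that an empty log counts as OK. `ResultLogSketch` and its nested types are hypothetical stand-ins written for this illustration, not the classes from `org.apache.sling.hc.api`.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in, NOT the real org.apache.sling.hc.api.ResultLog.
class ResultLogSketch {

    // Assumed severity order, least severe first (only the statuses named in the text).
    enum Status { OK, WARN, CRITICAL }

    static final class Entry {
        final Status status;
        final String message;

        Entry(Status status, String message) {
            this.status = status;
            this.message = message;
        }
    }

    private final List<Entry> entries = new ArrayList<>();

    ResultLogSketch add(Status status, String message) {
        entries.add(new Entry(status, message));
        return this;
    }

    // Assumption: the aggregate is the most severe status seen so far, OK if nothing was logged.
    Status getAggregateStatus() {
        Status worst = Status.OK;
        for (Entry entry : entries) {
            if (entry.status.compareTo(worst) > 0) {
                worst = entry.status;
            }
        }
        return worst;
    }

    boolean isOk() {
        return getAggregateStatus() == Status.OK;
    }
}
```

Under these assumptions, a single WARN entry among several OK entries is enough to make `isOk()` return false.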
The SlingHealthCheck annotation makes it easier to specify the required HealthCheck service properties.
Here's an example from the samples module - see the annotations module for more details.
@SlingHealthCheck(
    name="Annotated Health Check Sample",
    mbeanName="annotatedHC",
    description="Sample Health Check defined by a java annotation",
    tags={"sample","annotation"})
public class AnnotatedHealthCheckSample implements HealthCheck {
    @Override
    public Result execute() {
        // ...health check code
    }
}
Health Checks can be executed via a webconsole plugin, the health check servlet or via JMX. HealthCheck services can be selected for execution based on their hc.tags multi-value service property.
The HealthCheckFilter utility accepts positive and negative tag parameters, so that -security,sling selects all HealthCheck services having the sling tag but not the security tag, for example.
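The positive and negative tag selection described above can be sketched in plain Java. This is an illustration only, not the HealthCheckFilter implementation: `TagSelectionSketch` and `matches` are hypothetical names. The sketch also assumes that every positive tag must be present when several are given, and that an empty selector matches everything.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

// Hypothetical helper illustrating tag selection; not part of the Sling API.
class TagSelectionSketch {

    // selector: comma-separated tags, a leading '-' marks a tag that must be absent.
    // checkTags: the values of a check's hc.tags property.
    static boolean matches(String selector, String... checkTags) {
        Set<String> tags = new HashSet<>(Arrays.asList(checkTags));
        for (String raw : selector.split(",")) {
            String token = raw.trim();
            if (token.isEmpty()) {
                continue;
            }
            if (token.startsWith("-")) {
                // Negative tag: reject the check if it carries this tag.
                if (tags.contains(token.substring(1))) {
                    return false;
                }
            } else if (!tags.contains(token)) {
                // Positive tag (assumption: all positive tags are required).
                return false;
            }
        }
        return true;
    }
}
```

With the selector `-security,sling`, a check tagged `sling` and `jvm` is selected, while a check tagged `sling` and `security` is not.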
For advanced use cases it is also possible to use the API directly by using the interface org.apache.sling.hc.api.execution.HealthCheckExecutor.
The Health Check subsystem consists of the following bundles:
- org.apache.sling.hc.api which provides the API and org.apache.sling.hc.core which provides some utility classes and some generally useful HealthCheck services (e.g. the health check executor)
- org.apache.sling.hc.support provides more Sling-specific HealthCheck services.
- org.apache.sling.hc.webconsole provides the Webconsole plugin described below.
- org.apache.sling.junit.healthcheck provides a HealthCheck service that executes JUnit tests in the server-side OSGi context.
- org.apache.sling.hc.samples provides sample OSGi configurations and HealthCheck services. The sample configurations are provided as Sling content, so the Sling Installer is required to activate them.
- org.apache.sling.hc.junit.bridge makes selected Health Checks available as server-side JUnit tests. See below for more info.
HealthCheck services
The following default HealthCheck services are provided by the org.apache.sling.hc.core bundle:
The org.apache.sling.hc.samples bundle provides OSGi configurations that demonstrate them.
- JmxAttributeHealthCheck checks the value of a single JMX attribute and supports ranges like between 12 and 42.
- ScriptableHealthCheck evaluates an expression written in any scripting language that Sling supports, and provides bindings to access JMX attributes.
- CompositeHealthCheck executes a set of HealthCheck services selected by tags, useful for creating higher-level checks.
A few more Sling-specific ones are provided by the org.apache.sling.hc.support bundle:
- SlingRequestStatusHealthCheck checks the HTTP status of Sling requests.
- DefaultLoginsHealthCheck can be used to verify that the default Sling logins fail.
- ThreadUsageHealthCheck can be used to monitor for deadlocks using the JRE ThreadMXBean (see SLING-6698).
A bridge to server-side OSGi-aware JUnit tests is provided by the JUnitHealthCheck, from the org.apache.sling.junit.healthcheck bundle.
The org.apache.sling.hc.samples bundle provides an example OsgiScriptBindingsProvider for the default ScriptableHealthCheck, which provides OSGi-related information to health check script expressions.
HealthCheck services are created via OSGi configurations. Generic health check service properties are interpreted by the health check executor service. Custom health check service properties can be used by the health check implementation itself to configure its behaviour.
The following generic Health Check properties may be used for all checks:
Property | Type | Description |
---|---|---|
hc.name | String | The name of the health check as shown in UI |
hc.tags | String[] | List of tags: Both Felix Console Plugin and Health Check servlet support selecting relevant checks by providing a list of tags |
hc.mbean.name | String | Makes the HC result available via given MBean name. If not provided no MBean is created for that HealthCheck |
hc.async.cronExpression | String | Used to schedule the execution of a HealthCheck at regular intervals, using a cron expression as specified by the Sling Scheduler module. |
hc.resultCacheTtlInMs | Long | Overrides the global default TTL as configured in health check executor for health check responses (since v1.2.6 of core) |
hc.warningsStickForMinutes | Long | This property will make WARN/CRITICAL results stay visible for future executions, even if the current state has returned to status OK. It is useful to keep attention on issues that might still require action after the state went back to OK, e.g. if an event pool has overflowed and some events might have been lost (since v1.2.10 of core) |
All service properties are optional.
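To illustrate how a per-check hc.resultCacheTtlInMs value can override a global default TTL, here is a minimal sketch of a result cache. It is not the executor's actual implementation: `ResultCacheSketch`, its `get` method and the injected clock are hypothetical, and results are reduced to plain strings to keep the example self-contained.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.LongSupplier;
import java.util.function.Supplier;

// Hypothetical TTL cache; results are plain strings for brevity.
class ResultCacheSketch {

    private static final class Cached {
        final String result;
        final long createdAtMs;

        Cached(String result, long createdAtMs) {
            this.result = result;
            this.createdAtMs = createdAtMs;
        }
    }

    private final Map<String, Cached> cache = new HashMap<>();
    private final long defaultTtlMs;
    private final LongSupplier clock; // injected so the behaviour can be tested deterministically

    ResultCacheSketch(long defaultTtlMs, LongSupplier clock) {
        this.defaultTtlMs = defaultTtlMs;
        this.clock = clock;
    }

    // perCheckTtlMs plays the role of hc.resultCacheTtlInMs; null means "use the global default".
    String get(String checkName, Long perCheckTtlMs, Supplier<String> check) {
        long ttlMs = perCheckTtlMs != null ? perCheckTtlMs : defaultTtlMs;
        long now = clock.getAsLong();
        Cached cached = cache.get(checkName);
        if (cached != null && now - cached.createdAtMs < ttlMs) {
            return cached.result; // still fresh, do not execute the check again
        }
        String fresh = check.get();
        cache.put(checkName, new Cached(fresh, now));
        return fresh;
    }
}
```

In this sketch a check without its own TTL is re-executed once the default TTL has elapsed, while a check that passes a longer per-check TTL keeps returning its cached result.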
As an example, here's a ScriptableHealthCheck configuration provided by the org.apache.sling.hc.samples bundle:
Factory PID = org.apache.sling.hc.ScriptableHealthCheck
"hc.name" : "LoadedClassCount and ManagementSpecVersion are in range"
"hc.mbean.name" : "LoadedClassCount and ManagementSpecVersion"
"hc.tags" : [jvm, script]
"expression" : "jmx.attribute('java.lang:type=ClassLoading', 'LoadedClassCount') > 10 && jmx.attribute('java.lang:type=Runtime', 'ManagementSpecVersion') > 1"
"language.extension" : "ecma"
The service properties starting with the hc. prefix in this example should be provided by all HealthCheck services.
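For readers more familiar with Java than with script expressions, the check expressed by the sample configuration can be approximated with the standard `javax.management` API. This is only a sketch of the same comparison, not how ScriptableHealthCheck evaluates it. `JmxExpressionSketch` and its methods are hypothetical names, and parsing `ManagementSpecVersion` as a number is an assumption made here because the attribute is a string in Java.

```java
import java.lang.management.ManagementFactory;
import javax.management.JMException;
import javax.management.MBeanServer;
import javax.management.ObjectName;

// Hypothetical plain-Java counterpart of the sample script expression.
class JmxExpressionSketch {

    // Reads one attribute from the platform MBean server of the current JVM.
    static Object attribute(String objectName, String attributeName) throws JMException {
        MBeanServer server = ManagementFactory.getPlatformMBeanServer();
        return server.getAttribute(new ObjectName(objectName), attributeName);
    }

    // LoadedClassCount > 10 && ManagementSpecVersion > 1
    static boolean evaluate() {
        try {
            int loadedClassCount =
                ((Number) attribute("java.lang:type=ClassLoading", "LoadedClassCount")).intValue();
            // The attribute is a String such as "1.2" or "2.0"; parsing it is an assumption of this sketch.
            double managementSpecVersion = Double.parseDouble(
                String.valueOf(attribute("java.lang:type=Runtime", "ManagementSpecVersion")));
            return loadedClassCount > 10 && managementSpecVersion > 1;
        } catch (JMException e) {
            throw new IllegalStateException("Could not read JMX attribute", e);
        }
    }
}
```

Calling `JmxExpressionSketch.evaluate()` reads both attributes from the JVM it runs in, so it needs no Sling bundles to try out.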
The health check executor can optionally be configured via service PID org.apache.sling.hc.core.impl.executor.HealthCheckExecutorImpl:
Property | Type | Default | Description |
---|---|---|---|
timeoutInMs | Long | 2000ms | Timeout in ms until a check is marked as timed out |
longRunningFutureThresholdForCriticalMs | Long | 300000ms = 5min | Threshold in ms until a check is marked as 'exceedingly' timed out and will be marked CRITICAL instead of WARN only |
resultCacheTtlInMs | Long | 2000ms | Result Cache time to live - results will be cached for the given time |
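The two timeout settings in the table can be pictured with a small sketch built on `java.util.concurrent`. It is an illustration under stated assumptions rather than the executor's actual code: `ExecutorTimeoutSketch` and its methods are hypothetical, statuses are plain strings, and treating a running time equal to the threshold as CRITICAL is a choice made here.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

// Hypothetical sketch of timeout handling; not the Sling executor implementation.
class ExecutorTimeoutSketch {

    // A timed-out check is WARN until it has been running for the critical threshold.
    static String statusWhenTimedOut(long runningForMs, long criticalThresholdMs) {
        return runningForMs >= criticalThresholdMs ? "CRITICAL" : "WARN";
    }

    // Waits at most timeoutInMs for the check; returns its result or a timeout status.
    static String runWithTimeout(Callable<String> check, long timeoutInMs, long criticalThresholdMs) {
        ExecutorService pool = Executors.newSingleThreadExecutor();
        try {
            Future<String> future = pool.submit(check);
            return future.get(timeoutInMs, TimeUnit.MILLISECONDS);
        } catch (TimeoutException e) {
            return statusWhenTimedOut(timeoutInMs, criticalThresholdMs);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } catch (ExecutionException e) {
            throw new IllegalStateException(e.getCause());
        } finally {
            pool.shutdownNow(); // the sketch simply abandons a check that is still running
        }
    }
}
```

A check that returns quickly yields its own result, whereas one that sleeps past `timeoutInMs` is reported as `WARN` in this sketch, and `statusWhenTimedOut` shows where the longer threshold would turn that into `CRITICAL`.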
If the org.apache.sling.hc.webconsole bundle is active, a webconsole plugin at /system/console/healthcheck allows for executing health checks, optionally selected based on their tags (positive and negative selection, see the HealthCheckFilter mention above).
The DEBUG logs of health checks can optionally be displayed, and an option allows for showing only health checks that have a non-OK status.
The screenshot below shows an example.
If the org.apache.sling.hc.jmx bundle is active, a JMX MBean is created for each HealthCheck which has the hc.mbean.name service property set. All health check MBeans are registered in the domain org.apache.sling.healthcheck with a type of HealthCheck.
The MBean gives access to the Result and the log, as shown on the screenshot below.
See the example configurations of the org.apache.sling.hc.samples bundle for more details.
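Given the domain and type mentioned above, the registered health check MBeans can be looked up with a standard JMX query. The sketch below assumes it runs inside the same JVM as Sling (a remote setup would need a JMX connector instead), and `HealthCheckMBeanQuerySketch` is a hypothetical name. Outside a Sling instance the query simply returns an empty set.

```java
import java.lang.management.ManagementFactory;
import java.util.Set;
import javax.management.MBeanServer;
import javax.management.MalformedObjectNameException;
import javax.management.ObjectName;

// Hypothetical helper that lists health check MBeans via a JMX name pattern.
class HealthCheckMBeanQuerySketch {

    // Matches every MBean in the health check domain whose type is HealthCheck.
    static ObjectName pattern() {
        try {
            return new ObjectName("org.apache.sling.healthcheck:type=HealthCheck,*");
        } catch (MalformedObjectNameException e) {
            throw new IllegalStateException(e);
        }
    }

    // Queries the platform MBean server of the current JVM.
    static Set<ObjectName> findHealthCheckMBeans() {
        MBeanServer server = ManagementFactory.getPlatformMBeanServer();
        return server.queryNames(pattern(), null);
    }
}
```

Each returned `ObjectName` can then be inspected with `MBeanServer.getMBeanInfo` to see which attributes the MBean exposes.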
Starting with version 1.2.4 of the org.apache.sling.hc.core bundle, a flexible Health Checks execution servlet is available. It provides similar features to the Web Console plugin described above, with output in HTML, JSON (plain or jsonp) and TXT (concise or verbose) formats (see HTML format rendering page for more documentation).
The Health Checks Servlet is disabled by default. To enable it, create an OSGi configuration like
PID = org.apache.sling.hc.core.impl.servlet.HealthCheckExecutorServlet
servletPath = /system/health
which specifies the servlet's base path. That URL then returns an HTML page, by default with the results of all active health checks and with instructions at the end of the page about URL parameters which can be used to select specific Health Checks and control their execution and output format.
Note that by design the Health Checks Servlet doesn't do any access control by itself, so that it can still detect unhealthy states of the authentication itself. Make sure the configured path is only accessible to relevant infrastructure and operations people. Usually all /system/* paths are only accessible from a local network and not routed to the Internet.
By default the HC servlet sends the CORS header Access-Control-Allow-Origin: * to allow for client-side browser integrations. The behaviour can be configured using the OSGi config property cors.accessControlAllowOrigin (a blank value disables the header).