
* Metrics format: {code}<metric name>{<label name>=<label value>, ...}{code}
Be careful when passing strings as label values (quote from [Pierre Vincent's blog|https://pierrevincent.github.io/2017/12/prometheus-blog-series-part-1-metrics-and-labels/]):
{quote}
A word on label cardinality
Labels are really powerful so it can be tempting to annotate each metric with very specific information, however there are some important limitations to what should be used for labels.
Prometheus considers each unique combination of labels and label value as a different time series. As a result if a label has an unbounded set of possible values, Prometheus will have a very hard time storing all these time series. In order to avoid performance issues, labels should not be used for high cardinality data sets (e.g. Customer unique ids).
{quote}
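To make the cardinality point concrete, here is a small illustrative Python sketch (not Prometheus code, just a model of its storage behavior): every distinct combination of metric name and label values becomes its own time series.

{code}
# Illustrative sketch: Prometheus stores one time series per unique
# combination of metric name and label values.
series = {}

def record(metric, value, **labels):
    # The series key is the metric name plus the sorted label pairs.
    key = (metric, tuple(sorted(labels.items())))
    series.setdefault(key, []).append(value)

# A bounded label ("component") yields a handful of series...
for component in ["bc-soap", "bc-rest", "se-camel"]:
    record("queued_requests", 0.0, component=component)

# ...but an unbounded label (e.g. a customer id) yields one series
# per distinct value, which does not scale.
for customer_id in range(10_000):
    record("queued_requests", 0.0, customer=str(customer_id))

print(len(series))  # 3 + 10000 = 10003 distinct series
{code}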
h2. Configuration samples:
The following samples are produced monitoring a Petals ESB single container topology hosting 3 components (SOAP, REST and Camel).
Raw metrics can be hard to exploit when the exporter creates them automatically. For example, with a wildcard pattern rule:
{code}
rules:
- pattern: ".*"
{code}
Raw metrics sample:
{code}
# metric: java.lang<type=OperatingSystem><>SystemCpuLoad
java_lang_OperatingSystem_SystemCpuLoad 0.10240228944418933
# metric: java.lang<type=OperatingSystem><>ProcessCpuLoad
java_lang_OperatingSystem_ProcessCpuLoad 3.158981547513337E-4
# metrics: org.ow2.petals<type=custom, name=monitoring_petals-(se-camel | bc-soap | bc-rest)><>MessageExchangeProcessorThreadPoolQueuedRequestsMax
org_ow2_petals_custom_MessageExchangeProcessorThreadPoolQueuedRequestsMax{name="monitoring_petals-se-camel",} 0.0
org_ow2_petals_custom_MessageExchangeProcessorThreadPoolQueuedRequestsMax{name="monitoring_petals-bc-soap",} 0.0
org_ow2_petals_custom_MessageExchangeProcessorThreadPoolQueuedRequestsMax{name="monitoring_petals-bc-rest",} 0.0
{code}
In this case, we cannot tell later in Prometheus where the metrics originated or which Petals ESB container is concerned. By adding a few generic rules, we can add labels and control the metric names.
h3. Adding generic rules
In this example, the point of our rules is to:
* gather _java.lang_ metrics, name each metric after the explicit MBean attribute, and label it by type.
* gather component metrics, name each metric after the explicit MBean attribute, and label it in a usable way with the component and the type (monitoring or runtime_configuration).
Generic rules samples:
{code}
rules:
  - pattern: 'java.lang<type=(.+)><>(.+): (.+)'
    name: "$2"
    value: "$3"
    labels:
      type: "$1"
  - pattern: 'org.ow2.petals<type=custom, name=monitoring_(.+)><>(.+): (.+)'
    name: "$2"
    value: "$3"
    labels:
      type: "monitoring"
      component: "$1"
  - pattern: 'org.ow2.petals<type=custom, name=runtime_configuration_(.+)><>(.+): (.+)'
    name: "$2"
    value: "$3"
    labels:
      type: "runtime_config"
      component: "$1"
{code}
Metrics parsed by generic rules:
{code}
ProcessCpuLoad{type="OperatingSystem",} 2.5760609293017057E-4
SystemCpuLoad{type="OperatingSystem",} 0.10177234194298118
MessageExchangeProcessorThreadPoolQueuedRequestsMax{component="petals-bc-soap",type="monitoring",} 0.0
MessageExchangeProcessorThreadPoolQueuedRequestsMax{component="petals-se-camel",type="monitoring",} 0.0
MessageExchangeProcessorThreadPoolQueuedRequestsMax{component="petals-bc-rest",type="monitoring",} 0.0
{code}
h3. Adding specific rules
You can go further by adding rules for specific MBeans. Here we will:
* group *SystemCpuLoad* and *ProcessCpuLoad* into a single metric.
* rename *MessageExchangeProcessorThreadPoolQueuedRequestsMax* into a shorter metric, while keeping the full name as a label and in the help text.
{code}
  - pattern: 'java.lang<type=OperatingSystem><>SystemCpuLoad: (.*)'
    name: CpuLoad
    value: "$1"
    labels:
      type: "OperatingSystem"
      target: "system"
  - pattern: 'java.lang<type=OperatingSystem><>ProcessCpuLoad: (.*)'
    name: CpuLoad
    value: "$1"
    labels:
      type: "OperatingSystem"
      target: "process"
  - pattern: 'org.ow2.petals<type=custom, name=monitoring_(.+)><>MessageExchangeProcessorThreadPoolQueuedRequestsMax: (.+)'
    name: "MEPTP_QueuedRequests_Max"
    value: "$2"
    help: "MessageExchangeProcessorThreadPoolQueuedRequestsMax"
    labels:
      type: "monitoring"
      mbean: "MessageExchangeProcessorThreadPoolQueuedRequestsMax"
      component: "$1"
{code}
Metrics parsed by specific rules:
{code}
CpuLoad{target="system",type="OperatingSystem",} 0.10234667681404555
CpuLoad{target="process",type="OperatingSystem",} 2.655985589352835E-4
MEPTP_QueuedRequests_Max{component="petals-bc-soap",mbean="MessageExchangeProcessorThreadPoolQueuedRequestsMax",type="monitoring",} 0.0
MEPTP_QueuedRequests_Max{component="petals-se-camel",mbean="MessageExchangeProcessorThreadPoolQueuedRequestsMax",type="monitoring",} 0.0
MEPTP_QueuedRequests_Max{component="petals-bc-rest",mbean="MessageExchangeProcessorThreadPoolQueuedRequestsMax",type="monitoring",} 0.0
{code}
You can mix generic and specific patterns, but remember that they are applied in order, so *always put specific rules first\!*
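For instance, a combined configuration keeping the specific _CpuLoad_ rules ahead of the generic _java.lang_ rule could look like this (a sketch reusing the rules above; the generic rule then only catches attributes the specific ones did not match):

{code}
rules:
  # Specific rules first: they take precedence for the CPU load attributes
  - pattern: 'java.lang<type=OperatingSystem><>SystemCpuLoad: (.*)'
    name: CpuLoad
    value: "$1"
    labels:
      type: "OperatingSystem"
      target: "system"
  - pattern: 'java.lang<type=OperatingSystem><>ProcessCpuLoad: (.*)'
    name: CpuLoad
    value: "$1"
    labels:
      type: "OperatingSystem"
      target: "process"
  # Generic rule last: catches every other java.lang attribute
  - pattern: 'java.lang<type=(.+)><>(.+): (.+)'
    name: "$2"
    value: "$3"
    labels:
      type: "$1"
{code}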
h1. Configuring Prometheus
h2. Configuration file
Prometheus can be configured to connect statically or dynamically to metrics sources; these configurations live under the *scrape_configs* section of the YAML configuration file.
Depending on how you manage your machines, Prometheus can be connected dynamically to several service discovery systems, including Azure, Consul, EC2, OpenStack, GCE, Kubernetes, Marathon, AirBnB's Nerve, Zookeeper Serverset and Triton.
You can also rely on DNS-based service discovery, which lets you specify a set of DNS domain names that are periodically queried to discover a list of targets.
Here we are going to demonstrate static configuration ([static_configs|https://prometheus.io/docs/prometheus/latest/configuration/configuration/#%3Cstatic_config%3E]), specifying a set of targets with direct connection. Note that targets can also be factored out into a file, using [file_sd_config|https://prometheus.io/docs/prometheus/latest/configuration/configuration/#%3Cfile_sd_config%3E].
For the following sample, we are connecting to two Petals container instances, both running locally, on ports 8585 and 8686.
Sample static config:
{code}
scrape_configs:
  - job_name: 'petals monitoring'
    static_configs:
      - targets: ['localhost:8585']
        labels:
          container: 'petals-sample-0'
      - targets: ['localhost:8686']
        labels:
          container: 'petals-sample-1'
{code}
We label each target individually to help differentiate them. Prometheus adds these labels and the job name from the configuration, plus an _instance_ label for each source. Keeping on with our previous examples, this produces the following metrics in the Prometheus interface:
{code}
CpuLoad{container="petals-sample-0",instance="localhost:8585",job="petals monitoring",target="process",type="OperatingSystem"} 0.007285089849441476
CpuLoad{container="petals-sample-0",instance="localhost:8585",job="petals monitoring",target="system",type="OperatingSystem"} 0.2049538610976202
CpuLoad{container="petals-sample-1",instance="localhost:8686",job="petals monitoring",target="process",type="OperatingSystem"} 0.022037218413320275
CpuLoad{container="petals-sample-1",instance="localhost:8686",job="petals monitoring",target="system",type="OperatingSystem"} 0.22624877571008814
MEPTP_QueuedRequests_Max{component="petals-bc-rest",container="petals-sample-0",instance="localhost:8585",job="petals monitoring",mbean="MessageExchangeProcessorThreadPoolQueuedRequestsMax",type="monitoring"} 0
MEPTP_QueuedRequests_Max{component="petals-bc-rest",container="petals-sample-1",instance="localhost:8686",job="petals monitoring",mbean="MessageExchangeProcessorThreadPoolQueuedRequestsMax",type="monitoring"} 0
MEPTP_QueuedRequests_Max{component="petals-bc-soap",container="petals-sample-0",instance="localhost:8585",job="petals monitoring",mbean="MessageExchangeProcessorThreadPoolQueuedRequestsMax",type="monitoring"} 0
MEPTP_QueuedRequests_Max{component="petals-bc-soap",container="petals-sample-1",instance="localhost:8686",job="petals monitoring",mbean="MessageExchangeProcessorThreadPoolQueuedRequestsMax",type="monitoring"} 0
MEPTP_QueuedRequests_Max{component="petals-se-camel",container="petals-sample-0",instance="localhost:8585",job="petals monitoring",mbean="MessageExchangeProcessorThreadPoolQueuedRequestsMax",type="monitoring"} 0
MEPTP_QueuedRequests_Max{component="petals-se-camel",container="petals-sample-1",instance="localhost:8686",job="petals monitoring",mbean="MessageExchangeProcessorThreadPoolQueuedRequestsMax",type="monitoring"} 0
{code}
There is also the option to define multiple instances in the same targets list:
{code}
scrape_configs:
  - job_name: 'petals monitoring'
    static_configs:
      - targets: ['localhost:8585','localhost:8686']
        labels:
          container: 'petals-samples'
{code}
{code}
CpuLoad{container="petals-samples",instance="localhost:8585",job="petals monitoring",target="process",type="OperatingSystem"} 0.007285089849441476
CpuLoad{container="petals-samples",instance="localhost:8585",job="petals monitoring",target="system",type="OperatingSystem"} 0.2049538610976202
CpuLoad{container="petals-samples",instance="localhost:8686",job="petals monitoring",target="process",type="OperatingSystem"} 0.022037218413320275
CpuLoad{container="petals-samples",instance="localhost:8686",job="petals monitoring",target="system",type="OperatingSystem"} 0.22624877571008814
{code}
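With these labels in place, PromQL queries can slice and aggregate across containers. A couple of illustrative queries based on the metrics above:

{code}
# Average system CPU load across all scraped containers
avg(CpuLoad{target="system"})

# Maximum queued requests per component, across containers
max by (component) (MEPTP_QueuedRequests_Max)
{code}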
More information in the [Prometheus documentation|https://prometheus.io/docs/prometheus/latest/configuration/configuration/].
h2. Reload configuration
If the scraping configuration is not set dynamically, you can change the configuration and make Prometheus reload the file.
h3. Remote command
There are two ways to ask Prometheus to reload its configuration remotely:
Send a *SIGHUP*: determine the [process id|https://www.digitalocean.com/community/tutorials/how-to-use-ps-kill-and-nice-to-manage-processes-in-linux] of Prometheus (look in _'var/run/prometheus.pid'_, or use tools such as _'pgrep'_ or _'ps aux \| grep prometheus'_). Then use the kill command to send the signal:
{code}kill -HUP 1234{code}
Or, send a *HTTP POST* to the Prometheus web server _'/-/reload'_ handler:
{code}curl -X POST http://localhost:9090/-/reload{code}
Note: as of Prometheus 2.0, to reload over HTTP the _'--web.enable-lifecycle'_ command-line flag must be set.
In any case, Prometheus should acknowledge the reload:
{code}
level=info ts=2018-10-01T14:57:17.292032129Z caller=main.go:624 msg="Loading configuration file" filename=prometheus.yml
level=info ts=2018-10-01T14:57:17.293868363Z caller=main.go:650 msg="Completed loading of configuration file" filename=prometheus.yml
{code}
h3. File configuration
As mentioned in the [documentation on file configuration|https://prometheus.io/docs/prometheus/latest/configuration/configuration/#%3Cfile_sd_config%3E], using this method allows the targets to be reloaded automatically and periodically.
{quote}
Changes to all defined files are detected via disk watches and applied immediately. Files may be provided in YAML or JSON format. Only changes resulting in well-formed target groups are applied.
\[. . .\]
As a fallback, the file contents are also re-read periodically at the specified refresh interval.
{quote}
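A minimal sketch of such a setup (file names are hypothetical): the scrape configuration references a targets file, which can then be edited without touching the main configuration:

{code}
scrape_configs:
  - job_name: 'petals monitoring'
    file_sd_configs:
      - files:
          - 'targets/petals-*.json'
        refresh_interval: 5m
{code}
With a targets file such as _targets/petals-sample.json_:
{code}
[
  {
    "targets": ["localhost:8585"],
    "labels": { "container": "petals-sample-0" }
  }
]
{code}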
h1. Visualizing monitored metrics
h2. Prometheus API
The Prometheus server is reachable through its [HTTP API|https://prometheus.io/docs/prometheus/latest/querying/api]. It allows querying metrics directly and can be useful in specific cases.
For instance, by requesting */api/v1/targets* you can get an overview of configured targets and their health in JSON format.
request: {code}curl -X GET http://localhost:9090/api/v1/targets{code}
response: [^prometheus_get-api-targets.json].
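As a sketch of how such a response can be post-processed, the following Python snippet lists each target's container label, scrape URL and health from a simplified payload shaped like the _/api/v1/targets_ response (the values here are made up for illustration):

{code}
import json

# Simplified, hypothetical payload in the shape returned by /api/v1/targets
payload = json.loads("""
{
  "status": "success",
  "data": {
    "activeTargets": [
      {"scrapeUrl": "http://localhost:8585/metrics", "health": "up",
       "labels": {"container": "petals-sample-0", "job": "petals monitoring"}},
      {"scrapeUrl": "http://localhost:8686/metrics", "health": "down",
       "labels": {"container": "petals-sample-1", "job": "petals monitoring"}}
    ]
  }
}
""")

# One line per target: container label, scrape URL and health
for target in payload["data"]["activeTargets"]:
    print(target["labels"]["container"], target["scrapeUrl"], target["health"])
{code}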
However, there are simpler solutions, such as the web UI already built into the Prometheus server, or open-source software natively compatible with this API (like [Grafana|https://grafana.com]).
h2. Prometheus web UI
This UI is accessible by connecting to the Prometheus server at */graph*; in our example:
{code}http://localhost:9090/graph{code}
This web UI allows you to enter any expression and see its result either in a table or graphed over time. This is primarily useful for ad-hoc queries and debugging.
But you can also view various Prometheus server configuration details (targets, rules, alerts, service discovery, etc.).
h2. Grafana
h3. Installing
[Grafana|http://grafana.com/] installation is documented on the [Grafana website|http://docs.grafana.org/installation/] and its setup for Prometheus is documented on the [Prometheus website|https://prometheus.io/docs/visualization/grafana/]. It is advised to rely on these sources for an up-to-date installation.
* In short, install and run as standalone:
{code}
wget https://s3-us-west-2.amazonaws.com/grafana-releases/release/grafana-5.2.4.linux-amd64.tar.gz
tar -zxvf grafana-5.2.4.linux-amd64.tar.gz
cd grafana-5.2.4
./bin/grafana-server web
{code}
* Or as package:
Add the following line to your */etc/apt/sources.list* file (even if you are on Ubuntu or another Debian version).
{code}
deb https://packagecloud.io/grafana/stable/debian/ stretch main
{code}
Then run:
{code}
curl https://packagecloud.io/gpg.key | sudo apt-key add -
sudo apt-get update
sudo apt-get install grafana
sudo service grafana-server start
{code}
By default, the Grafana web UI is available at *localhost:3000*; the default credentials are *admin/admin*.
h3. Connecting to Prometheus
For exhaustive documentation, go to the [Grafana website|http://docs.grafana.org/features/datasources/prometheus/].
In short, once logged in as admin:
# Open the side menu by clicking the Grafana icon in the top header.
# In the side menu, under the Dashboards link, you should find a link named Data Sources.
# Click the + Add data source button in the top header.
# Select _Prometheus_ from the Type dropdown.
# Give a name to the data source.
# Set the URL of the Prometheus server, in our example the default: localhost:9090.
# Click _Save & Test_.
*Note:* Grafana data sources can also be [configured by files|http://docs.grafana.org/administration/provisioning/#datasources]
h3. Creating a graph
Follow instructions from [Grafana documentation|http://docs.grafana.org/guides/getting_started/] to create a dashboard and add panels.
While editing a graph, in the metrics tab, you can use the same queries tested in Prometheus. For instance:
{code}
CpuLoad{container="petals-sample-0"}
{code}
This will display the _CpuLoad_ metric only for the _petals-sample-0_ container:
!Screenshot from 2018-10-03 16-44-25.png!
You can add different panel types that suit your needs to create your own tailored dashboard:
!Screenshot from 2018-10-05 15-48-54.png|width=1112!