Centreon Troubleshooting Series | Episode 2: Help! Actions on monitored objects aren’t being applied

In the first article of this series, which aims to provide a bounty of hints to solve technical problems faster, you discovered tools and methods to spot and troubleshoot connection issues between pollers and the central server.

In this article, we will be looking at those cases when actions on monitored objects are not being applied in the interface. What’s more annoying than acknowledging an alert, scheduling maintenance or running an immediate check, and then seeing that none of it takes effect?

Operating overview

Let’s take a closer look at what happens when users trigger an action through the interface:

Our user is connected to the Centreon web interface and acknowledges an alert (for example).
The Apache Web server (httpd24-httpd) will then communicate with the centreon-gorgone API on the Central server and send it the command to run (in this example: the acknowledgment).
There are then two possibilities. If the acknowledgment is for a device monitored by the central server, centreon-gorgone writes to a pipe file opened by the centreon-engine monitoring engine. If the acknowledgment is for a device monitored by a poller, the central’s centreon-gorgone communicates with the poller’s centreon-gorgone process, which in turn writes to the pipe file opened by the poller’s centreon-engine monitoring engine.
This pipe file is continuously read by the centreon-engine process. As soon as a new command is written to the file, centreon-engine runs it and sends the information back to the central.

External commands: What are they?

External commands refer to all the monitoring actions that can be run by a Centreon user with the web interface, for example:

Acknowledge (an alert): taking an alert into account and muting notifications to avoid alert fatigue;
Set downtime: disabling alerts for a resource over a given period of time;
Check: running an immediate check on a resource and refreshing the monitoring.

There are other actions such as submitting results (submit result) for passive services, etc. You can learn more about monitoring management here.
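As a minimal sketch of what such an action looks like once it reaches the monitoring engine, the helper below builds an external-command line in the `[timestamp] COMMAND;arguments` form that appears in the log excerpts later in this article. The function name `format_external_command` is hypothetical, and the argument layout is simply copied from the forced service check seen in those logs.

```shell
# Hypothetical helper: build an external-command line in the form
#   [TIMESTAMP] COMMAND;ARG1;ARG2;...
# Usage: format_external_command TIMESTAMP COMMAND [ARG...]
format_external_command() {
    ts=$1
    cmd=$2
    shift 2
    line="[$ts] $cmd"
    for arg in "$@"; do
        line="$line;$arg"
    done
    printf '%s\n' "$line"
}

# Example: a forced check of the "Ping" service on host "HQ-FW-Inet".
format_external_command 1629386252 SCHEDULE_FORCED_SVC_CHECK HQ-FW-Inet Ping 1629386252
```

The example call prints the same line that centreon-gorgone logs for an immediate check, which makes it easier to recognize external commands when reading the log files.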

Solving this issue in Centreon 21.04: With great power comes great responsibility (to quote Spider-Man)

You might get a déjà vu reading this section, but it’s worth the revisit 😉

We’re still on the same platform as in the previous article:

a central server with the main Centreon components as well as the database instances (IP 192.168.56.125)
a poller (IP 192.168.56.126)

In this article, the commands run over SSH on the servers are executed as root. Here’s where you need to remember that with great power comes great responsibility.

Check 1: Check access to the “gorgoned” API

As seen previously, when a monitoring action is triggered by users from the interface, the action is sent by the Web server to centreon-gorgone via its API.

Let’s first see if the centreon-gorgone process API (‘gorgoned’) is listening on its default TCP/8085 port. To do this you can use the netstat or ss utilities:

[root@centreon-central ~]# ss -plant | grep 8085
LISTEN     0      5            *:8085                     *:*                   users:(("gorgone-httpser",pid=7855,fd=36))

If this command doesn’t return any results, check that the gorgoned process is currently running on your central server using the following command:

[root@centreon-central ~]# systemctl status gorgoned

● gorgoned.service - Centreon Gorgone
   Loaded: loaded (/etc/systemd/system/gorgoned.service; enabled; vendor preset: disabled)
   Active: active (running) since jeu. 2021-08-19 15:43:32 CEST; 1min 12s ago
 Main PID: 14468 (perl)
   CGroup: /system.slice/gorgoned.service
           ├─14468 /usr/bin/perl /usr/bin/gorgoned --config=/etc/centreon-gorgone/config.yaml --logfile=/var/log/centreon-gorgone/gorgoned.log --severity=d...
           ├─14476 gorgone-nodes
           ├─14477 gorgone-dbcleaner
           ├─14478 gorgone-autodiscovery
           ├─14479 gorgone-cron
           ├─14480 gorgone-engine
           ├─14511 gorgone-statistics
           ├─14512 gorgone-action
           ├─14513 gorgone-httpserver
           ├─14514 gorgone-legacycmd
           ├─14536 gorgone-proxy
           ├─14537 gorgone-proxy
           ├─14544 gorgone-proxy
           ├─14545 gorgone-proxy
           └─14558 gorgone-proxy
août 19 15:43:32 centreon-central systemd[1]: Started Centreon Gorgone.

This command confirms that the gorgoned process is active/running. In the list of child processes, gorgone-httpserver is the process for the centreon-gorgone API.

If the process isn’t started, you can first try to restart it using the following command:

[root@centreon-central ~]# systemctl restart gorgoned

And if the process still hasn’t started, you’ll have to go and look at the centreon-gorgone log file in /var/log/centreon-gorgone/gorgoned.log.

The most common causes of errors are:

SELinux in ENFORCING mode;
Rights on directories and files;
Missing dependency or library, etc.
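The Check 1 steps above can be sketched as a small script. This is only an illustration under stated assumptions: the function names are hypothetical, `port_is_listening` expects the column layout of `ss -lnt` (State, Recv-Q, Send-Q, Local Address:Port, Peer), and the wrapper has to be run as root on the central server.

```shell
# Succeeds if the `ss -lnt` output read from stdin contains a socket
# listening on the given TCP port.
port_is_listening() {
    awk -v port="$1" '
        BEGIN { rc = 1 }
        $1 == "LISTEN" && $4 ~ (":" port "$") { rc = 0 }
        END { exit rc }
    '
}

# Hypothetical wrapper for the Check 1 steps (not called automatically).
check_gorgone_api() {
    if ss -lnt | port_is_listening 8085; then
        echo "gorgone API is listening on TCP/8085"
    elif systemctl is-active --quiet gorgoned; then
        echo "gorgoned is running but nothing listens on TCP/8085"
    else
        echo "gorgoned is not running; check its log file"
    fi
}

# Usage, as root on the central server:
# check_gorgone_api
```

Keeping the parsing in a separate function that reads from stdin makes it possible to try it on saved `ss` output before running it on a live server.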

Check 2: Check our central’s pipe file

Let’s check if a command that needs to be run by central can actually be run.

Previously we saw that centreon-gorgone writes the monitoring action commands in a pipe file which is continuously read by the centreon-engine monitoring engine.

Let’s check that the file exists using the command:

[root@centreon-central ~]# ll /var/lib/centreon-engine/rw/centengine.cmd
prw-rw----. 1 centreon-engine centreon-engine 0 18 août  18:20 /var/lib/centreon-engine/rw/centengine.cmd

This file must have the rw-rw---- permissions, belong to the centreon-engine user and the centreon-engine group, and be a named pipe, identifiable by the first character p in the permission string.

This file is what’s known as a FIFO (First In/First Out): anything written to it is instantly consumed by centreon-engine.

In some cases, the format of this file or its rights may be incorrect. Restarting the monitoring engine is then required for the file to be correctly recreated:

[root@centreon-central ~]# systemctl restart centengine
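The pipe checks described in this section can be automated with a short function. This is a sketch, not an official tool: `check_command_pipe` is a hypothetical name, and it assumes GNU `stat` (for the `-c` format option), which is what CentOS-like systems ship.

```shell
# Succeeds if PATH is a named pipe with the expected octal mode, owner and group.
# Usage: check_command_pipe PATH MODE OWNER GROUP
check_command_pipe() {
    [ -p "$1" ] || { echo "$1 is missing or is not a named pipe"; return 1; }
    # GNU stat prints e.g. "660 centreon-engine centreon-engine"
    actual=$(stat -c '%a %U %G' "$1")
    [ "$actual" = "$2 $3 $4" ] || { echo "unexpected mode or owner: $actual"; return 1; }
    echo "$1 looks correct"
}

# On the central server (as root), with the values from the listing above:
# check_command_pipe /var/lib/centreon-engine/rw/centengine.cmd 660 centreon-engine centreon-engine
```

The octal mode 660 corresponds to the rw-rw---- permissions shown in the listing.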

This file is created by an external module (exactly like cbmod.so): externalcmd.so. If the file is missing, you can check that the module is properly loaded by centreon-engine when it starts by looking in the /var/log/centreon-engine/centengine.log log file:

[root@centreon-central ~]# grep -RIri external /var/log/centreon-engine/centengine.log 
[1629366622] [29381] Event broker module '/usr/lib64/centreon-engine/externalcmd.so' deinitialized successfully
[1629366622] [24389] Event broker module '/usr/lib64/centreon-engine/externalcmd.so' initialized successfully

You can also check its presence in the centreon-engine configuration, as well as the proper loading of the module:

[root@centreon-central ~]# grep -RIri external /etc/centreon-engine/centengine.cfg 
broker_module=/usr/lib64/centreon-engine/externalcmd.so
check_external_commands=1
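The two configuration lines shown above can be tested together. The function below is a minimal sketch with a hypothetical name; it only looks for the `broker_module` and `check_external_commands` entries that appear in the grep output and says nothing about the rest of the file.

```shell
# Succeeds if the engine configuration file loads externalcmd.so and
# enables external command processing.
# Usage: engine_accepts_external_commands CONFIG_FILE
engine_accepts_external_commands() {
    grep -Eq '^broker_module=.*/externalcmd\.so' "$1" &&
        grep -Eq '^check_external_commands=1[[:space:]]*$' "$1"
}

# On the central server:
# engine_accepts_external_commands /etc/centreon-engine/centengine.cfg && echo OK
```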

If you can’t see the module loaded or present in the centreon-engine configuration, check the poller configuration in the Configuration > Pollers > Engine configuration menu by clicking on the central configuration and going to the Data tab.

This configuration must include a broker module directive that points to:

/usr/lib64/centreon-engine/externalcmd.so

If this configuration is missing, add it, referring to the screenshot above. Save the form and then export the new central configuration by choosing the “Restart” method.

To test that everything’s working properly, you can, for example, run an immediate check from the Centreon interface for a service linked to a host monitored by your central server.

In parallel, check the /var/log/centreon-gorgone/gorgoned.log and /var/log/centreon-engine/centengine.log log files. The action in question should appear as follows:

In /var/log/centreon-gorgone/gorgoned.log:

2021-08-19 17:17:32 - INFO - [engine] Processing external command '[1629386252] SCHEDULE_FORCED_SVC_CHECK;HQ-FW-Inet;Ping;1629386252'

In /var/log/centreon-engine/centengine.log:

[1629386253] [24389] EXTERNAL COMMAND: SCHEDULE_FORCED_SVC_CHECK;HQ-FW-Inet;Ping;1629386252
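The two log files use different time formats: centreon-gorgone writes a local date and time, while centreon-engine writes a Unix timestamp in the first bracketed field. To line the entries up, the timestamp can be converted, as in this small sketch. The function name is hypothetical and the conversion relies on GNU `date` (`-d @EPOCH`).

```shell
# Print the UTC date and time of a centengine.log line, whose first
# bracketed field is a Unix timestamp.
engine_log_time() {
    epoch=$(printf '%s\n' "$1" | sed -n 's/^\[\([0-9][0-9]*\)\].*/\1/p')
    [ -n "$epoch" ] || return 1
    date -u -d "@$epoch" '+%Y-%m-%d %H:%M:%S'
}

engine_log_time '[1629386253] [24389] EXTERNAL COMMAND: SCHEDULE_FORCED_SVC_CHECK;HQ-FW-Inet;Ping;1629386252'
```

For the line above, this prints 2021-08-19 15:17:33 in UTC, one second after the gorgone entry logged at 17:17:32 local time (CEST, UTC+2).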

Check 3: On our poller, check the Gorgone communication

Now that you’ve validated that monitoring actions are properly operating on the central server, let’s check that these same actions are also running on a device monitored by the poller.

When monitoring through a poller, the Apache server doesn’t communicate directly with the poller, first transferring the command to the central’s gorgone process. This process then takes charge of dispatching and transferring the commands to the poller(s) in charge of the host on which you want to run the action.

Let’s first check that the central’s centreon-gorgone process can actually communicate with the poller’s.

To do that, we need to make sure that the ZMQ communication on TCP/5556 is established between the two servers (the poller plays the role of server and must listen on that port; the central plays the role of client and connects to it):

[root@centreon-poller ~]# ss -plantu | grep 5556
tcp    LISTEN     0      100       *:5556                  *:*                   users:(("gorgone-action",pid=1991,fd=36),("gorgone-engine",pid=1990,fd=36),("gorgone-dbclean",pid=1989,fd=36),("perl",pid=1976,fd=36))
tcp    ESTAB      0      0      192.168.56.126:5556               192.168.56.125:45960               users:(("perl",pid=1976,fd=41))
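The same check can be scripted by parsing the `ss` output. The function below is a sketch under assumptions: its name is hypothetical, and it expects either the `ss -tan` layout (State first) or the `ss -plantu` layout shown above (a leading Netid column).

```shell
# Succeeds if the ss output on stdin shows an established connection on
# local port PORT coming from PEER_IP.
# Usage: ss -tan | has_established_peer PORT PEER_IP
has_established_peer() {
    awk -v port="$1" -v peer="$2" '
        BEGIN { rc = 1 }
        {
            off = ($1 == "tcp") ? 1 : 0   # skip the Netid column if present
            state = $(1 + off)
            laddr = $(4 + off)
            raddr = $(5 + off)
            suffix = ":" port
            if (state == "ESTAB" &&
                substr(laddr, length(laddr) - length(suffix) + 1) == suffix &&
                index(raddr, peer ":") == 1)
                rc = 0
        }
        END { exit rc }
    '
}

# On the poller, with the central server IP used in this article:
# ss -tan | has_established_peer 5556 192.168.56.125 && echo "central is connected"
```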

If you see a TIME-WAIT state instead, or no socket open on the poller’s TCP/5556 port, check the /var/log/centreon-gorgone/gorgoned.log file on the poller:

[root@centreon-poller ~]# tail -f /var/log/centreon-gorgone/gorgoned.log
2021-08-19 15:09:45 - ERROR - [core] Client pubkey is not authorized. Thumbprint is 'XzVJ5kbmfxYqktqKLLTF62fbf3_qdHn1fH7HLtPj_a8'

In the previous example, the communication can’t be established because of a client (the central server) authentication problem. Central’s thumbprint is not in the poller’s trusted client list.
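To see which clients have been rejected, the thumbprints can be pulled out of the log. This is a minimal sketch that assumes the error message keeps exactly the wording shown in the excerpt above; the function name is hypothetical.

```shell
# Print the thumbprint of every client rejected in the gorgoned log on stdin,
# one per line, without duplicates.
rejected_thumbprints() {
    sed -n "s/.*Client pubkey is not authorized\. Thumbprint is '\([^']*\)'.*/\1/p" | sort -u
}

# On the poller:
# rejected_thumbprints < /var/log/centreon-gorgone/gorgoned.log
```

Each value printed can then be compared with the list of authorized clients in the poller’s gorgone configuration.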

The easiest thing to do in this case is to retrieve the gorgone configuration of your poller again using the official documentation available on this link.

You can also refer to “Check 3” in the previous article in this series.

(everything is connected in the Centreon Cinematic Universe)

Check 4: Check the pipe file on our poller

Just like the central server, the poller must also load the external module allowing it to read and run external commands, so we can rely on the same check items as for the central server:

Check that the externalcmd.so module is loaded at startup;
Check that the module is present in the configuration;
Otherwise, add it to the centreon-engine configuration, then export the configuration with a “restart” of your poller.

Ensure the entire chain is working properly

On the Centreon interface, let’s now launch an immediate check on a resource monitored by the poller in order to validate the monitoring action is running properly:

As before, if we check the /var/log/centreon-gorgone/gorgoned.log log file on the central, we should be able to see the action in question:

[root@centreon-central ~]# tail -f /var/log/centreon-gorgone/gorgoned.log
2021-08-19 15:43:52 - INFO - [legacycmd] Handling command 'EXTERNALCMD', Target: '3', Parameters: '[1629388009] SCHEDULE_FORCED_SVC_CHECK;HQ-FW-Inet;Ping;1629388009'

This time, the message is slightly different because a poller controls the resource: Target: '3' is the ID of the target poller, the one to which the central gorgone process will forward the request.
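When several pollers are involved, it can help to list which poller each external command was forwarded to. The sketch below assumes the legacycmd message keeps the wording of the excerpt above; the function name is hypothetical.

```shell
# For each external command relayed by legacycmd in the gorgoned log on
# stdin, print the target poller ID followed by the command itself.
legacycmd_targets() {
    sed -n "s/.*Handling command 'EXTERNALCMD', Target: '\([0-9]*\)', Parameters: '\(.*\)'.*/\1 \2/p"
}

# On the central server:
# legacycmd_targets < /var/log/centreon-gorgone/gorgoned.log
```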

On the poller side, you can find the external command in the gorgone logs:

[root@centreon-poller ~]# tail -f /var/log/centreon-gorgone/gorgoned.log
2021-08-19 15:44:01 - INFO - [engine] Processing external command '[1629388009] SCHEDULE_FORCED_SVC_CHECK;HQ-FW-Inet;Ping;1629388009'

Finally, let’s check the centreon-engine log file on the poller:

[root@centreon-poller ~]# tail -f /var/log/centreon-engine/centengine.log
[1629388012] [2032] EXTERNAL COMMAND: SCHEDULE_FORCED_SVC_CHECK;SV-WINDO-PAR;Ping;1629388009

Mission accomplished!

You’re now better equipped to understand and troubleshoot monitoring actions.

If this still isn’t enough, you can increase the verbosity of the centreon-gorgone log by adjusting the /etc/sysconfig/gorgoned file to change the --severity=info option to --severity=debug. This option can be defined on the different centreon-gorgone processes, both on the central server and on the pollers.

You will then need to reload the systemd configuration and restart centreon-gorgone:

[root@centreon-poller ~]# systemctl daemon-reload
[root@centreon-poller ~]# systemctl restart gorgoned
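Switching the severity back and forth can be done with a one-line edit. The function below is only a sketch: its name is hypothetical, it assumes the file contains a single `--severity=<level>` token as described above, and it uses GNU `sed -i` to edit the file in place, so keep a copy of the original file first.

```shell
# Replace the --severity value in a gorgoned sysconfig file.
# Usage: set_gorgone_severity FILE LEVEL   (info or debug, as used in this article)
set_gorgone_severity() {
    sed -i "s/--severity=[a-z]*/--severity=$2/" "$1"
}

# Enable debug, then apply the change:
# set_gorgone_severity /etc/sysconfig/gorgoned debug
# systemctl daemon-reload && systemctl restart gorgoned
#
# Return to the normal level once troubleshooting is done:
# set_gorgone_severity /etc/sysconfig/gorgoned info
# systemctl daemon-reload && systemctl restart gorgoned
```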

Remember to disable debug when you’ve finished troubleshooting ;-))

Finally, feel free to ask your questions on our Slack dedicated to the Centreon Community. We’ll be happy to help!