Theodore Baschak

BOFH. Open Source Guru. Founder/Operator of Hextet Systems, AS395089 and Network Architect for Daemon Defense Systems, AS55101.

Service Status via SaltStack 2014.7 with Nagios

Fri, 21 Nov 2014 23:14:59 -0600 » Nerd Projects, Nagios, Network Monitoring, CLI, Programming, Virtualization, SaltStack, System Administration

To me, one of the most exciting new features in SaltStack 2014.7 is the nagios module. This module supports remote execution of nagios-plugins on your minions. It can also execute lists of checks and their targets that are defined (and targeted) in a Pillar. Jinja templating and Grains can of course be used as well, making for an extremely versatile monitoring and testing solution.

I haven’t implemented anything crazy with it yet, but I am really starting to see how powerful this is. A few ideas off the top of my head:

  • Service configuration best practices checklists
  • Distributed connectivity tests
  • Distributed latency reporting

I would like to build a very simple internal help desk service status page with no Nagios back-end requirement. It could be as simple as a */5 * * * * ... cron entry that redirects the JSON output of salt --out=json -s -G roles:monitoring nagios.run_all_pillar nagios_test to a file in a web directory. A jQuery/HTML5 page would then display that file to help desk techs, with a nice green/yellow/red status for each monitored item based on the return codes of the checks.
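As a rough sketch of the status-mapping step, the Python below walks a parsed JSON document and assigns a color to every entry that carries a retcode, using the standard Nagios plugin exit codes (0 OK, 1 WARNING, 2 CRITICAL). The nesting of the sample data, the minion and group names, and the gray fallback for any other code are assumptions made for illustration, not output captured from a real nagios.run_all_pillar run.

```python
import json

# Standard Nagios plugin exit codes mapped to status-page colors.
STATUS_BY_RETCODE = {0: "green", 1: "yellow", 2: "red"}


def collect_statuses(node, path=()):
    """Yield (label, color) for every nested dict that has a 'retcode' key."""
    if isinstance(node, dict):
        if "retcode" in node:
            # Assumption: any other code (e.g. 3 = UNKNOWN) is shown as gray.
            color = STATUS_BY_RETCODE.get(node["retcode"], "gray")
            yield " / ".join(path), color
        else:
            for key, child in node.items():
                yield from collect_statuses(child, path + (str(key),))


# Hypothetical sample data; the real file would be the one written by cron.
sample = json.loads("""
{
  "minion1": {
    "Load": {"check_load": {"retcode": 1, "stdout": "WARNING - load average: 0.90"}},
    "APT": {"check_apt": {"retcode": 0, "stdout": "APT OK"}}
  }
}
""")

statuses = dict(collect_statuses(sample))
for label, color in statuses.items():
    print(label, color)
```

Because the walk only looks for a retcode key, it does not depend on exactly how many levels of minion or group names sit above each check result.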

I have made a documentation pull request to SaltStack, because the example pillar in the nagios module docs doesn’t fully work as it currently stands.

**Update:** My doc changes have been committed! I’ve got one commit in SaltStack!

Below is a simple, working sample nagios check Pillar, along with some output from a run.


    - nagios


    - check_icmp:
    - check_icmp:
    - check_load: -w 0.8 -c 1
    - check_apt

This check Pillar can then be run with salt nagios.\* nagios.run_pillar nagios_test, and produces output like below:
            APT OK: 0 packages available for upgrade (0 critical updates).
            OK - load average: 0.09, 0.13, 0.13|load1=0.090;0.800;1.000;0; load5=0.130;0.800;1.000;0; load15=0.130;0.800;1.000;0;
            OK - rta 30.415ms, lost 0%|rta=30.415ms;200.000;500.000;0; pl=0%;40;80;; rtmax=30.718ms;;;; rtmin=30.226ms;;;;
            OK - rta 1.328ms, lost 0%|rta=1.328ms;200.000;500.000;0; pl=0%;40;80;; rtmax=1.425ms;;;; rtmin=1.247ms;;;;