We've been using PRTG for network monitoring for some time and are now evaluating expanding it into a full replacement for our legacy monitoring system, which monitors application servers, backend processes, and everything else that isn't a network device.
Unlike PRTG, which works through a set of sensors that are responsible for running the checks, our legacy system listens on a port to receive status updates for monitoring objects (and uses the same mechanism internally for the checks that run on the monitoring server itself). This lets us hook almost anything into the "stoplight" monitoring system, such as a backup job or any custom script: if the script fails, it reports red; if it succeeds, it reports green. A lot of the custom logic in our external application monitoring relies on this. This kind of "stoplight" monitoring can be integrated into PRTG via an API call instead of having a sensor perform the check.
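To make the push model concrete, here is a minimal sketch of the client side: a wrapper runs a check script, maps its exit code to red or green, and builds an update in the "colour+minutes" form described in the next paragraph. The post does not describe the legacy server's actual wire protocol, so `STATUS_URL`, the `object`/`status` query parameters, and all function names are hypothetical placeholders, not the legacy system's or PRTG's real API.

```python
import subprocess
import sys
import urllib.parse
import urllib.request

# Hypothetical endpoint: the legacy listener's real protocol is not described
# in the post, so this URL and its query parameters are placeholders.
STATUS_URL = "http://monitor.example.com:8080/status"


def run_check(cmd: list[str]) -> str:
    """Run a check script; exit code 0 maps to green, anything else to red."""
    result = subprocess.run(cmd, capture_output=True)
    return "green" if result.returncode == 0 else "red"


def build_update(obj_name: str, colour: str, validity_minutes: int) -> dict:
    """Build an update in the post's 'colour+minutes' form, e.g. 'green+60'."""
    return {"object": obj_name, "status": f"{colour}+{validity_minutes}"}


def encode_update(update: dict) -> str:
    """Encode the update as a GET URL (urlencode escapes '+' as %2B)."""
    return STATUS_URL + "?" + urllib.parse.urlencode(update)


def send_update(url: str) -> int:
    """Push the update to the (hypothetical) listener; returns the HTTP status."""
    with urllib.request.urlopen(url, timeout=10) as resp:
        return resp.status


if __name__ == "__main__":
    # Stand-in "job": a Python one-liner that exits successfully.
    colour = run_check([sys.executable, "-c", "pass"])
    url = encode_update(build_update("nightly-backup", colour, 60))
    print(url)  # send_update(url) would push it to a real listener
```

Keeping the check script and the reporting wrapper separate is what lets "almost anything" report in: the job only needs an exit code, and the wrapper owns the message format.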
However, our current monitoring system has an additional status (purple) indicating that a monitoring object has not received an update. When a remote process sends an update to the legacy monitoring server, it also sends a validity period (e.g. green+60 for "be green for 60 minutes"). Once that validity period expires, the legacy system starts alerting that it is not receiving updates. This is immensely helpful for knowing not only when batch processes and other jobs fail, but also when they hang or don't run at all.
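The validity-period behaviour amounts to a freshness check on the server side, roughly as in the following sketch. It assumes updates arrive as strings like "green+60", that only red and green are valid colours, and that an object which has never reported also shows purple; these are illustrative assumptions, and `StoplightObject` is a hypothetical name, not part of the legacy system or PRTG.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Optional


@dataclass
class StoplightObject:
    """One monitored object with a colour that is only valid for a set time."""

    name: str
    colour: Optional[str] = None
    expires_at: Optional[datetime] = None

    def update(self, message: str, now: datetime) -> None:
        # Assumed message format, taken from the post's example: "green+60".
        colour, _, minutes = message.partition("+")
        # Assumption: only red/green are accepted; extend if more colours exist.
        if colour not in ("green", "red") or not minutes.isdigit():
            raise ValueError(f"unrecognised update: {message!r}")
        self.colour = colour
        self.expires_at = now + timedelta(minutes=int(minutes))

    def status(self, now: datetime) -> str:
        # Never updated, or the validity period has lapsed: report purple.
        if self.expires_at is None or now > self.expires_at:
            return "purple"
        return self.colour


if __name__ == "__main__":
    t0 = datetime(2024, 1, 1, 2, 0)
    backup = StoplightObject("nightly-backup")
    backup.update("green+60", t0)
    print(backup.status(t0 + timedelta(minutes=30)))  # green
    print(backup.status(t0 + timedelta(minutes=90)))  # purple
```

The key design point is that "purple" is computed from the absence of a fresh update rather than sent by the job, which is why it catches jobs that hang or never start.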
As this is a critical, show-stopping part of our monitoring strategy, how would we replicate this behavior in PRTG?