Hello Jason,
thank you for your KB-Post.
There's no easy approach for this. PRTG doesn't implement any sort of "cluster-awareness", but some things are possible, depending mainly on the sensor types that you're using.
First of all, does the cluster have a DNS name or IP address that always points to the active node? If so, you should be able to use this address for monitoring, so that the queries always run against the "active" node. The problem is that you may not be notified if the failover node suddenly dies, so this alone doesn't solve the issue completely. You should still monitor the individual nodes in a way that allows you to see that they still 'exist'.
I'm not familiar with the "Pacemaker/Corosync HA cluster", but some clusters implement APIs (or SNMP MIBs) that report the status of all nodes. This usually requires some investigation and trial and error, but if such an API exists for your cluster service, it is the most reliable way of knowing the cluster's status (all nodes), because the cluster already needs to "monitor itself" to be able to work.
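As a rough sketch of that idea (not a tested solution for your cluster): if you can obtain the per-node status from the cluster in some way, a custom script sensor (EXE/Script Advanced) could turn it into one channel per node. The function name build_prtg_result and the sample node names below are made up for illustration, and how you actually fill node_states depends on what your cluster exposes. Only the JSON layout ("prtg", "result", "channel", "value", "text") follows the format that PRTG's custom sensors expect.

```python
import json


def build_prtg_result(node_states):
    """Build PRTG custom-sensor JSON from a {node_name: is_online} mapping.

    node_states is hypothetical input: fill it from whatever status source
    your cluster provides (API, CLI output, SNMP, ...).
    """
    online = sum(1 for is_online in node_states.values() if is_online)

    # First channel: how many nodes the cluster reports as online.
    results = [{"channel": "Online nodes", "value": online}]

    # One additional channel per node: 1 = online, 0 = offline.
    for name in sorted(node_states):
        results.append({
            "channel": f"Node {name}",
            "value": 1 if node_states[name] else 0,
        })

    return {
        "prtg": {
            "result": results,
            "text": f"{online} of {len(node_states)} nodes online",
        }
    }


if __name__ == "__main__":
    # Placeholder data; replace with the real status reported by the cluster.
    sample = {"node-a": True, "node-b": False}
    print(json.dumps(build_prtg_result(sample)))
```

With one channel per node, you could then set limits or a lookup on each channel in PRTG so that a value of 0 raises an alert, which also covers the case where the standby node disappears.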
Lastly, the PRTG implementation will strongly depend on the sensor types that you're using. Many sensors use "internal" error statuses for alerting, and in most cases you won't be able to customize these. For some sensor types (those that use lookups or limits for alerting), you may be able to customize their behavior/operation.
If you're able to share more concrete details about how you're monitoring this (which sensors you're using), I may be able to provide further input/recommendations.
Best Regards,
Luciano Lingnau [Paessler Support]