I would like to know more about the clustering feature of PRTG. Can you give me an introduction and some basic information about clustering?
This article applies as of PRTG 22
One very helpful feature of PRTG is the cluster. A cluster consists of two or more installations of PRTG that work together to form a highly available monitoring system.
The objective is to reach true 100% uptime for the monitoring tool. If you use a cluster, uptime is no longer degraded by an internet outage at the location of a PRTG core server, by failing hardware, or by downtime due to a software upgrade of the operating system or of PRTG itself.
How a cluster works
A cluster consists of one primary master node and one or more failover nodes. Each cluster node is a full installation of PRTG that could perform monitoring and alerting on its own. Cluster nodes are connected to each other using two TCP/IP connections. These connections communicate in both directions and a single cluster node only needs to connect to one other cluster node to integrate into the cluster.
Normal cluster operation
Central configuration, distributed data storage, and central notifications
During normal operation, the primary master node is used to configure devices and sensors. It automatically distributes the configuration to all other cluster nodes in real time. All cluster nodes permanently monitor the network according to this common configuration, and each cluster node stores its results in its own database. This way, the storage of monitoring results is also distributed among the cluster. The downside of this concept is that monitoring traffic and load on the network are multiplied by the number of cluster nodes, but this is not a problem for most usage scenarios.
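To get a feeling for how the load scales with the number of cluster nodes, here is a rough back-of-the-envelope sketch. It assumes that every sensor sends exactly one request per scanning interval, which is a simplification; the function name and the numbers are made up for illustration and are not part of PRTG.

```python
def cluster_request_rate(sensor_count: int, scan_interval_s: float, node_count: int) -> float:
    """Approximate monitoring requests per second generated by the whole cluster.

    Assumption: each sensor causes one request per scanning interval on
    every cluster node, because every node monitors the full configuration.
    """
    if scan_interval_s <= 0:
        raise ValueError("scan_interval_s must be positive")
    if node_count < 1:
        raise ValueError("node_count must be at least 1")
    per_node = sensor_count / scan_interval_s
    return per_node * node_count


# 1,000 sensors with a 60-second scanning interval
single = cluster_request_rate(1000, 60, 1)
double = cluster_request_rate(1000, 60, 2)
print(f"{single:.1f} req/s on one node, {double:.1f} req/s with two cluster nodes")
```

Adding a second cluster node doubles the estimate, a third one triples it, and so on.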
You can review the monitoring results by logging in to the PRTG web interface of any of the cluster nodes in read-only mode. Because the monitoring configuration is centrally managed, you can change it only on the master node.
If one or more cluster nodes discover downtime or threshold breaches, only the primary master node will send out notifications (for example, via email and SMS). So, you will not be flooded with notifications from all cluster nodes in the event of failures. Additionally, there is the Down (Partial) sensor status, which means that the sensor shows an error on some cluster nodes, but not on all.
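The following minimal sketch illustrates the two ideas from this paragraph: a sensor is shown as Down (Partial) when only some cluster nodes report an error, and only the current master node sends notifications. The names `Status`, `cluster_status`, and `may_send_notifications` are hypothetical and do not reflect how PRTG implements this internally. Whether a particular status actually triggers a notification depends on your notification settings and is not modeled here.

```python
from enum import Enum
from typing import Mapping


class Status(Enum):
    UP = "Up"
    DOWN = "Down"
    DOWN_PARTIAL = "Down (Partial)"


def cluster_status(node_results: Mapping[str, bool]) -> Status:
    """Combine per-node results (True = node sees the sensor as up)."""
    if not node_results:
        raise ValueError("at least one cluster node result is required")
    failures = sum(1 for ok in node_results.values() if not ok)
    if failures == 0:
        return Status.UP
    if failures == len(node_results):
        return Status.DOWN
    # Error on some cluster nodes, but not on all
    return Status.DOWN_PARTIAL


def may_send_notifications(node: str, current_master: str) -> bool:
    """Only the current master node sends notifications."""
    return node == current_master


results = {"node-1": True, "node-2": False}
print(cluster_status(results).value)                 # Down (Partial)
print(may_send_notifications("node-2", "node-1"))    # False
```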
Failure cluster operation
- Failure scenario 1
If one or more of the failover nodes are disconnected from the cluster (due to hardware or network failures), the remaining cluster nodes continue to work without disruption.
- Failure scenario 2
If the primary master node is disconnected from the cluster, one of the failover nodes becomes a failover master node. It takes over control of the cluster and also manages notifications until the primary master node reconnects to the cluster and takes back the master role.
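Both failure scenarios can be sketched in a few lines. Note that this article does not say which failover node takes over if several are available. The sketch simply assumes a fixed priority order, with the primary master node first. That order and the function name `current_master` are assumptions for illustration only.

```python
from typing import Optional, Sequence, Set


def current_master(nodes: Sequence[str], connected: Set[str]) -> Optional[str]:
    """Return the node that currently holds the master role.

    nodes[0] is the primary master node; the remaining entries are failover
    nodes in an assumed priority order. Because the primary master node comes
    first, it takes back the master role as soon as it reconnects.
    """
    for node in nodes:
        if node in connected:
            return node
    return None  # no cluster node is reachable


nodes = ["primary", "failover-1", "failover-2"]

# Failure scenario 1: a failover node is disconnected, the master role does not change
print(current_master(nodes, {"primary", "failover-2"}))     # primary

# Failure scenario 2: the primary master node is disconnected
print(current_master(nodes, {"failover-1", "failover-2"}))  # failover-1

# The primary master node reconnects and takes back the master role
print(current_master(nodes, {"primary", "failover-1", "failover-2"}))  # primary
```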
Sample cluster configurations
Several cluster scenarios are possible in PRTG.
- Single failover: This is the most common usage of the cluster. Both PRTG core servers monitor the same network. When there is downtime on cluster node 1, cluster node 2 automatically takes over the master role until cluster node 1 is back online.
- Double failover: This is a very advanced failover cluster. Even if two of the cluster nodes fail, network monitoring will still continue with a single cluster node (as failover master node) until the other cluster nodes are back online.
- The following four-node scenario shows one cluster node in disconnected mode. You can disconnect a cluster node at any time for maintenance tasks or to keep a powered-off server on standby in case another cluster node’s hardware fails.
Usage scenarios for the cluster
The cluster is quite versatile and covers the following usage scenarios.
Failover LAN cluster
PRTG runs on two (or more) servers inside the company LAN (close to each other in a network topology perspective). All cluster nodes monitor the LAN and only the current master node sends out notifications.
- Reach 100% uptime for the monitoring system (for example, to control SLAs, create reliable billing data, and ensure that all failures create alarms if necessary)
- Avoid monitoring downtimes
Failover WAN or multi-location cluster
PRTG runs on two (or more) servers that are distributed throughout a multi-segmented LAN or even geographically distributed around the globe. All cluster nodes monitor the same set of servers or sensors, but only the current master node sends out notifications.
- Create multi-site monitoring results for a set of sensors
- Make monitoring and alerting independent from a single site, data center, or network connection
- The cluster technology is completely built into PRTG; no third-party software is necessary.
- Central configuration and notifications on the master node
- Configuration data and status information are automatically distributed among cluster nodes in real time.
- The storage of monitoring results is distributed to all cluster nodes.
- Each cluster node can take over the full monitoring and alerting functionality in case of a failover.
- Cluster nodes can run on different operating systems and different hardware or virtual machines. They should have similar system performance and resources.
- Node-to-node communication is always secure using SSL-encrypted connections.
- Automatic cluster update: You need to update to a newer PRTG version on one cluster node only. All other cluster nodes are then updated automatically.
- Connect remote probes to all your cluster nodes.
What is special about a cluster in PRTG (compared to similar products)?
- Each node is truly self-sufficient (not even the database is shared).
- Our cluster technology is 100% “home grown” and does not rely on any external cluster technology like Windows Cluster or others.
- PRTG Manual: Failover Cluster Configuration
- I need help with my cluster configuration. Where do I find step-by-step instructions?
- In which web interface do I log in to if the master node fails?
- Can I use remote probes in a cluster with PRTG?
- What happens to historical data when a cluster node goes offline for some time?
- What are the bandwidth requirements for running a cluster?
- Are there alternatives to the cluster when running a large installation?
How does a remote probe find the current master server when the original master fails? Does this also require the same solution, a DNS update of the A record on the DNS server?
Remote Probes can only be set up for a connection to the Master Node server. Unfortunately, a DNS update doesn't change this fact. Please see "Can I use Remote Probes in a cluster with PRTG?" for more information about Remote Probes in a cluster setup.
Hi. Is it necessary that the cluster has the same license as the main node?
If you are using a PRTG 100, 500, 1000, 2500, 5000, or Unlimited license, you can run 1 master node plus 1 failover node in a cluster with the same license. If you would like to use 1 master node and 3 failover nodes, you have to use 2 separate license keys of the same version (example: 2 Unlimited licenses). If you are using a Corporate Country license, you can use one license for either of the following setups:
- 1 master node plus 1 failover node
- 1 master node and 3 failover nodes
@jochen - I know I'm practicing thread necromancy. Sorry.
With your explanation above, I have a PRTG 1000 license. So I can do 1 master and 1 failover node. Does that mean I have to split my 1000 sensors to 500 on master and 500 to the failover? I haven't seen anything (maybe I'm looking in the wrong places?) specifying how the sensor limits of a license are handled with failover.
You are correct! Your PRTG 1000 license can be used for one PRTG master node and one additional failover node. The sensor limit is not tied to the individual nodes; it counts all sensors that are deployed on all your probes in total.
For instance: You can have a PRTG Master with 500x sensors, a PRTG Failover with 200x sensors and two Remote Probes with 150x sensors each.
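As a rough sketch of that arithmetic (the probe names are made up, and this is only an illustration of the counting rule described above, not how PRTG itself enforces the limit):

```python
from typing import Mapping


def total_sensors(sensors_per_probe: Mapping[str, int]) -> int:
    """Sum the sensors deployed on all probes."""
    return sum(sensors_per_probe.values())


def within_license(sensors_per_probe: Mapping[str, int], license_limit: int) -> bool:
    """Check the combined sensor count against the license limit."""
    return total_sensors(sensors_per_probe) <= license_limit


deployment = {
    "master": 500,
    "failover": 200,
    "remote-probe-1": 150,
    "remote-probe-2": 150,
}

print(total_sensors(deployment))          # 1000
print(within_license(deployment, 1000))   # True
```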
Let me know if there is still something unclear.