I would like to know more about the clustering feature of PRTG. Can you give me an introduction and some basic information about clustering?
This article applies to PRTG Network Monitor 19 or later
PRTG Cluster Basics
One very helpful feature in PRTG Network Monitor is called Clustering. A PRTG cluster consists of two or more installations of PRTG Network Monitor that work together to form a highly available monitoring system.
The objective is to reach true 100% uptime for the monitoring tool. With clustering, uptime is no longer degraded by failing connections due to an internet outage at a PRTG server’s location, by failing hardware, or by downtime during a software upgrade of the operating system or of PRTG itself.
How a PRTG Cluster Works
A PRTG cluster consists of one Primary Master Node and one or more Failover Nodes. Each node is simply a full installation of PRTG Network Monitor that could perform monitoring and alerting on its own. Nodes are connected to each other using two TCP/IP connections. These connections communicate in both directions and a single node only needs to connect to one other node to integrate into the cluster.
Normal Cluster Operation
Central Configuration, Distributed Data Storage, and Central Notifications
During normal operation, the Primary Master is used to configure devices and sensors. The master automatically distributes the configuration to all other nodes in real time. All nodes permanently monitor the network according to this common configuration, and each node stores its results in its own database. This way, the storage of monitoring results is also distributed among the cluster. The downside of this concept is that monitoring traffic and load on the network are multiplied by the number of cluster nodes, but this is not a problem for most usage scenarios.
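Because every node scans every sensor, the extra network load grows linearly with the number of nodes. The Python sketch below makes that arithmetic explicit. It is a rough estimate, not a PRTG formula: `scans_per_minute` and its parameters are hypothetical, and it assumes each sensor performs one scan per scanning interval on every node.

```python
def scans_per_minute(sensor_count: int, interval_seconds: int, node_count: int) -> float:
    """Rough estimate of sensor scans per minute across the whole cluster.

    Hypothetical model: every node runs the full configuration and scans
    every sensor once per scanning interval.
    """
    if sensor_count < 0 or interval_seconds <= 0 or node_count < 1:
        raise ValueError("invalid input")
    per_node = sensor_count * 60 / interval_seconds  # scans one installation performs
    return per_node * node_count  # multiplied by the number of cluster nodes


# 500 sensors with a 60-second interval on a two-node cluster
print(scans_per_minute(500, 60, 2))  # 1000.0
```

Adding a third node to the same configuration would raise the estimate by another 500 scans per minute.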
You can log in to the web interface of any cluster node to review the monitoring results; on the failover nodes, access is read-only. Because the monitoring configuration is centrally managed, you can change it only on the master node.
If one or more nodes discover downtimes or threshold breaches, only the primary master will send out notifications to the administrator (for example, via email and SMS). So, you will not be flooded with notifications from all cluster nodes in the event of failures. Additionally, there is a Partial Down sensor status, which means that the sensor shows an error on some nodes, but not on all.
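The Python sketch below models the two ideas in this paragraph: per-node results are combined into one status (including Partial Down), and only the master sends a notification. It is a simplified, hypothetical model: PRTG knows more sensor states than these three, and the function names and the rule "notify on any status other than Up" are assumptions for illustration, not PRTG's actual notification logic.

```python
from enum import Enum
from typing import Dict, List, Tuple


class Status(Enum):
    UP = "Up"
    DOWN = "Down"
    PARTIAL_DOWN = "Partial Down"


def cluster_status(node_ok: Dict[str, bool]) -> Status:
    """Combine per-node results (True = no error) into one sensor status."""
    if not node_ok:
        raise ValueError("no node results")
    failing = sum(1 for ok in node_ok.values() if not ok)
    if failing == 0:
        return Status.UP
    if failing == len(node_ok):
        return Status.DOWN
    return Status.PARTIAL_DOWN  # error on some nodes, but not on all


def notifications(sensor: str, node_ok: Dict[str, bool], master: str) -> List[Tuple[str, str]]:
    """Return (sender, message) pairs; only the master node sends anything."""
    status = cluster_status(node_ok)
    if status is Status.UP:
        return []
    # One message from the master, no matter how many nodes saw the error.
    return [(master, f"{sensor}: {status.value}")]


print(notifications("Ping", {"node1": True, "node2": False, "node3": False}, "node1"))
# [('node1', 'Ping: Partial Down')]
```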
Cluster Operation During Failures
- Failure scenario 1
If one or more of the failover nodes are disconnected from the cluster (due to hardware or network failures), the remaining cluster continues to work without disruption.
- Failure scenario 2
If the Primary Master node is disconnected from the cluster, one of the failover nodes becomes the new master node. It takes over control of the cluster and also manages notifications until the primary master reconnects to the cluster and takes back the master role.
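Both failure scenarios can be expressed as one rule about which node holds the master role. In the Python sketch below, `acting_master` is a hypothetical helper, and the order in which failover nodes take over is an assumption; this article does not say which failover node is chosen.

```python
from typing import Iterable, Optional, Set


def acting_master(primary: str, failover_order: Iterable[str], connected: Set[str]) -> Optional[str]:
    """Return the node that holds the master role, or None if no node is connected.

    Hypothetical model: the primary master holds the role whenever it is
    connected; otherwise the first connected failover node in failover_order
    takes over (the real selection rule is an assumption).
    """
    if primary in connected:
        return primary  # primary master keeps or takes back the role
    for node in failover_order:
        if node in connected:
            return node  # a failover node becomes the new master
    return None


failovers = ["node2", "node3"]
print(acting_master("node1", failovers, {"node1", "node2", "node3"}))  # node1
print(acting_master("node1", failovers, {"node2", "node3"}))           # node2
print(acting_master("node1", failovers, {"node1", "node3"}))           # node1
```

The second call corresponds to failure scenario 2 and the third to failure scenario 1. As soon as `node1` is back in `connected`, the function returns it again, which mirrors the primary master taking back the master role.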
Sample Cluster Configurations
Several cluster scenarios are possible in PRTG.
- Simple Failover. This is the most common usage of PRTG in a cluster. Both servers monitor the same network. When there is downtime on Node 1, Node 2 automatically takes over the master role until Node 1 is back online.
- Double Failover. This is a more advanced failover cluster. Even if two of the nodes fail, network monitoring continues with a single node (in the master role) until the other nodes are back online.
- Four-Node Scenario. In this scenario, one node is in disconnected mode. The administrator can disconnect a node at any time for maintenance tasks or to keep a powered-off server on standby in case another node’s hardware fails.
Usage Scenarios for the PRTG Cluster
The cluster feature in PRTG is quite versatile and covers the following usage scenarios.
Failover LAN Cluster
PRTG runs on two (or more) servers inside the company LAN (close to each other from a network topology perspective). All cluster nodes monitor the LAN and only the current master node sends out notifications.
- Reach 100% uptime for the monitoring system (for example, to control SLAs, create reliable billing data, and ensure that all failures create alarms if necessary)
- Avoid monitoring downtimes
Failover WAN or Multi-Location Cluster
PRTG runs on two (or more) servers that are distributed throughout a multi-segmented LAN or even geographically distributed around the globe on the internet. All cluster nodes monitor the same set of servers or sensors, but only the current master node sends out notifications.
- Create multi-site monitoring results for a set of sensors
- Make monitoring and alerting independent from a single site, datacenter, or network connection
PRTG Cluster Features
- The cluster technology provided by Paessler is completely built into the PRTG software; no third-party software is necessary.
- PRTG clusters feature central configuration and notifications on the cluster master.
- Configuration data and status information are automatically distributed among cluster members in real time.
- The storage of monitoring results is distributed to all cluster nodes.
- Each cluster node can take over the full monitoring and alerting functionality in case of a failover.
- Cluster nodes can run on different operating systems and different hardware or virtual machines. They should have similar system performance and resources.
- Node-to-node communication is always secured with SSL-encrypted connections.
- Automatic cluster update: You need to install updates to a newer PRTG version on one node only; all other nodes of the cluster are updated automatically.
- Connect remote probes to all your cluster nodes.
What Is Special About a PRTG Cluster (Compared to Similar Products)
- Each node is truly self-sufficient (not even the database is shared).
- Our cluster technology is 100% “home grown” and does not rely on any external cluster technology like Windows Cluster or others.
More
- PRTG Manual: Failover Cluster Configuration
- I need help with my PRTG cluster configuration. Where do I find step-by-step instructions?
- In which web interface do I log in if the Master Node fails?
- Can I use Remote Probes in a cluster with PRTG?
- What happens to historical data when a cluster node goes offline for some time?
- What are the bandwidth requirements for running a PRTG Cluster?
- Are there alternatives to the PRTG cluster when running a large installation?
How does a remote probe find the current master server when the original master fails? Is the same solution needed here as well, that is, a DNS update of the A record used on the DNS server?
Remote Probes can only be set up for a connection to the Master Node server. Unfortunately, a DNS update doesn't change this fact. Please see Can I use Remote Probes in a cluster with PRTG? for more information about Remote Probes in a cluster setup.
Hi. Is it necessary for the cluster to have the same license as the master node?
If you use a PRTG 100, 500, 1000, 2500, 5000, or Unlimited license, you can run 1 master node plus 1 failover node in a cluster with the same license. If you want to use 1 master node and 3 failover nodes, you need 2 separate license keys of the same license type (for example, 2 Unlimited licenses). With a Corporate Country license, you can use one license for both of the following setups:
- 1 master node plus 1 failover node
- 1 master node and 3 failover nodes
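The license arithmetic in this answer can be summarized in a short Python sketch. `licenses_needed` is a hypothetical helper: only the 2-node and 4-node results come from this answer, and the "one license per two nodes" rule for other node counts is an assumption, not a statement of Paessler's licensing terms.

```python
import math


def licenses_needed(node_count: int, corporate: bool = False) -> int:
    """Estimate license keys for a cluster with node_count nodes (master included).

    Hypothetical extrapolation of the two setups described above: one
    license covers a master plus one failover node, and four nodes need two.
    """
    if node_count < 2:
        raise ValueError("a cluster has at least a master and one failover node")
    if corporate:
        if node_count not in (2, 4):
            raise ValueError("only the 2-node and 4-node setups are described")
        return 1  # one Corporate Country license covers both described setups
    return math.ceil(node_count / 2)  # one license per two nodes (assumption for other counts)


print(licenses_needed(2), licenses_needed(4), licenses_needed(4, corporate=True))  # 1 2 1
```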