This article applies as of PRTG 22
High-availability cluster for large installations
The cluster feature does not officially support more than 5,000 sensors, and we do not recommend that you set up a cluster with more than 2,500 sensors. Keep in mind that each cluster node that you add multiplies the monitoring traffic and the load. In a cluster of this size, you might encounter performance issues, although this also depends on your individual setup.
Before you create a cluster with such a high number of sensors, contact our presales team so that we can discuss your options together. You can find some alternatives to a cluster below.
Alternatives to a cluster
The following alternatives neither replace a cluster nor provide equivalent features. They are meant to give you some ideas for getting your PRTG installation up and running again quickly after a failure (see also "My PRTG has crashed and I can't restart it anymore. What can I do?").
We distinguish two cases here:
- PRTG is running on a real hardware server.
- PRTG is running on a virtual machine.
PRTG on real server hardware
If your PRTG installation is too large for a properly working cluster setup, you can alternatively implement the following approach to recover PRTG as fast as possible if it fails.
You need two real servers, both with PRTG installed. The first acts as the master node and the second as the standby node. Keep the standby server up to date by regularly updating it to the same PRTG version as the master node.
The master node runs PRTG and monitors your infrastructure. The standby server has PRTG installed, but its PRTG core server service and local probe service must be stopped. Regularly copy or synchronize all PRTG data, such as configuration files, monitoring data, and templates, from the master node to the standby server. You can do this with a custom script that only copies data that has changed since the last synchronization.
Note: Copying the files requires that your master PRTG core server and its local probe services be stopped.
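As an illustration of a script that only copies changed data, here is a minimal Python sketch. It is not an official Paessler tool. The function name `copy_changed_files` and the change criterion (different file size or a newer modification time) are assumptions made for this example, and you would call it only while the PRTG services are stopped.

```python
import shutil
from pathlib import Path


def copy_changed_files(source: Path, target: Path) -> list[Path]:
    """Copy files from source to target if they are new or have changed.

    A file counts as changed when its size differs from the copy in
    target or when its modification time is newer (assumed criterion).
    Returns the relative paths of the files that were copied.
    """
    copied = []
    for src in sorted(source.rglob("*")):
        if not src.is_file():
            continue
        rel = src.relative_to(source)
        dst = target / rel
        if dst.exists():
            s, d = src.stat(), dst.stat()
            # Compare whole seconds to tolerate file systems with
            # coarse timestamp resolution.
            if s.st_size == d.st_size and int(s.st_mtime) <= int(d.st_mtime):
                continue
        dst.parent.mkdir(parents=True, exist_ok=True)
        shutil.copy2(src, dst)  # copy2 also preserves the modification time
        copied.append(rel)
    return copied
```

Comparing size and modification time is cheap but can miss a change that keeps both values identical. If you need more certainty, you could compare checksums instead, at the cost of a longer run time.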
To keep the offline time short, your script can proceed as follows:
- Stop the Windows services of the master PRTG core server and its local probe.
- Copy all relevant data to a location that allows for a short copy time, for example a local disk.
- Start the services of the PRTG core server and its local probe.
- Compress the copied data, transfer it to the standby server, and decompress it in the correct PRTG directory.
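The steps above could be sketched in Python as follows. This is only a sketch under several assumptions: the service names `PRTGCoreService` and `PRTGProbeService` should be verified on your system, the function name `stage_prtg_data` and its `run` parameter are made up for this example, and the transfer to the standby server is left out because it depends on your environment.

```python
import shutil
import subprocess
from pathlib import Path

# Assumed Windows service names. Verify them on your own system.
SERVICES = ["PRTGCoreService", "PRTGProbeService"]


def stage_prtg_data(data_dir, staging_dir, run=subprocess.run):
    """Stop the services, copy data_dir locally, restart, then compress.

    Returns the path of the created .zip archive. The `run` parameter
    exists only so that the service calls can be replaced in a dry run.
    """
    staging = Path(staging_dir)
    for service in SERVICES:
        run(["net", "stop", service], check=True)
    try:
        # Local copy only, so that the services are down as briefly as possible.
        shutil.copytree(data_dir, staging, dirs_exist_ok=True)
    finally:
        # Restart even if the copy failed, so that monitoring resumes.
        for service in SERVICES:
            run(["net", "start", service], check=True)
    # Compression happens while PRTG is already running again.
    return shutil.make_archive(str(staging), "zip", root_dir=str(staging))
```

The archive is created next to the staging directory. From there, you can transfer it to the standby server with whatever tool you prefer and decompress it into the PRTG data directory.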
You can use a freeware version of PRTG to monitor the status of the master node. If the master node fails, you are notified and can then trigger the standby server to start monitoring your infrastructure.
You need to manually reconfigure your remote probes so that they send monitoring data to the new PRTG core server. You also need to migrate your PRTG license from the old server to the new server.
PRTG on a virtual machine
When running PRTG in a virtual environment, you have two options to keep monitoring downtime as low as possible.
Note: Both approaches require actions from the PRTG administrator to recover the PRTG installation once it is down. Moreover, there is a gap in the monitoring data due to the downtime.
1. Use snapshots
The idea is to take VMware or Hyper-V snapshots of the virtual machine where the PRTG core server runs. A snapshot contains the state of the virtual machine, its disk data, and its configuration at a given point in time.
Take snapshots regularly, but keep their number under control because performance may decrease as more snapshots are taken.
If the virtual machine crashes or fails, you can restore it quickly from the latest snapshot.
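Because performance may decrease as snapshots accumulate, you might want to keep only the most recent ones. The following Python sketch shows one possible way to select the snapshots to delete. The function name `snapshots_to_delete`, the default of three snapshots, and the input format (snapshot name mapped to creation time) are assumptions for this example. Listing and deleting the snapshots depends on your hypervisor and is not shown.

```python
from datetime import datetime


def snapshots_to_delete(snapshots: dict[str, datetime], keep: int = 3) -> list[str]:
    """Return the names of all snapshots except the `keep` most recent ones.

    `snapshots` maps a snapshot name to its creation time.
    """
    if keep < 0:
        raise ValueError("keep must not be negative")
    newest_first = sorted(snapshots, key=snapshots.get, reverse=True)
    return newest_first[keep:]
```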
2. Use a VM backup
Hyper-V and VMware let you back up virtual machines. The backup should contain the configuration, the VM snapshots, and the virtual hard disks that the virtual machine uses.
If the virtual machine crashes, you can restore it from a backup copy.