
Are there alternatives to the cluster when running a large installation?


I run a large PRTG installation with thousands of sensors in my network and want to set up a high-availability cluster for fail-safe monitoring. However, according to the PRTG system recommendations, more than 2,500 sensors in a cluster are not recommended and more than 5,000 sensors in a cluster are not officially supported.

How can I set up fail-safe monitoring nevertheless in case my PRTG installation is too large for the cluster feature?

Tags: cluster, large-installation, prtg, recommendations, requirements

Created on Aug 9, 2017 1:04:05 PM by  Gerald Schoch [Paessler Support]

Last change on Jan 4, 2023 2:52:19 PM by  Brandy Greger [Paessler Support]



5 Replies

Accepted Answer


This article applies as of PRTG 22

High-availability cluster for large installations

The cluster feature does not officially support more than 5,000 sensors, and we also do not recommend that you set up a cluster with more than 2,500 sensors. Always keep in mind that the monitoring traffic and the load are multiplied with each cluster node that you add. You might encounter performance issues in this case, although this also depends on your individual setup.

So, before you create a cluster with such a high number of sensors, contact our presales team. We can discuss your options together. You can find some alternatives to a cluster below.

Alternatives to a cluster

The following alternatives neither replace a cluster nor provide equivalent functionality. The aim is to give you some ideas that you can implement to help you quickly get your PRTG installation up and running again if it fails (see also My PRTG has crashed and I can't restart it anymore. What can I do?).

We distinguish two cases here:

  • PRTG is running on a real hardware server.
  • PRTG is running on a virtual machine.

PRTG on real server hardware

If your PRTG installation is too large for a properly working cluster setup, you can alternatively implement the following approach to recover PRTG as fast as possible if it fails.

You need two real servers and both must have PRTG installed. The first acts as a "master node" and the second as a standby node. Keep the standby server up to date by regularly updating it to the same PRTG version as the master node.

The master node runs PRTG and monitors your infrastructure. The standby server has PRTG installed, but its PRTG core server and local probe services must be stopped. Regularly copy or synchronize all PRTG data on the master node, such as configuration files, monitoring data, and templates, to the standby server. You can do this with a custom script that only copies data that has changed since the last synchronization.

Note: Copying the files requires that your master PRTG core server and its local probe services be stopped.

To keep the offline time short, your script can proceed as follows:

  1. Stop the Windows services of the master PRTG core server and its local probe.
  2. Copy all relevant data to a specific location where the copy time is short.
  3. Start the services of the PRTG core server and its local probe.
  4. Compress the copied data, transfer it to the standby server, and decompress it in the correct PRTG directory.
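The steps above can be sketched in a short Python script. Everything specific here is an assumption to adapt to your own installation: the data directory and staging path are hypothetical, and the Windows service names (typically `PRTGCoreService` and `PRTGProbeService`) should be verified on your server before use.

```python
import shutil
import subprocess
import time
from pathlib import Path

# Hypothetical locations and service names -- adjust to your installation.
DATA_DIR = Path(r"C:\ProgramData\Paessler\PRTG Network Monitor")
STAGING = Path(r"D:\prtg-staging")  # fast local disk so the copy window stays short
SERVICES = ["PRTGCoreService", "PRTGProbeService"]  # assumed Windows service names


def set_services(action: str) -> None:
    """Stop or start the PRTG Windows services via `net stop` / `net start`."""
    for svc in SERVICES:
        subprocess.run(["net", action, svc], check=True)


def changed_files(root: Path, since: float) -> list[Path]:
    """Return files under *root* modified after timestamp *since*.

    Lets an incremental sync copy only data that changed since the last run.
    """
    return [p for p in root.rglob("*") if p.is_file() and p.stat().st_mtime > since]


def sync_to_standby() -> Path:
    """Steps 1-4: stop services, copy locally, restart, then compress."""
    set_services("stop")                    # 1. stop core server + local probe
    try:
        if STAGING.exists():
            shutil.rmtree(STAGING)
        shutil.copytree(DATA_DIR, STAGING)  # 2. fast local copy, short offline time
    finally:
        set_services("start")               # 3. restart services as soon as possible
    # 4. compress; transfer the archive to the standby server out of band
    return Path(shutil.make_archive(str(STAGING), "zip", STAGING))
```

The transfer and decompression on the standby server (step 4) are left to your existing tooling; the key design point is that the services are only down for the duration of the local copy, not the network transfer.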

You can use a freeware version of PRTG to monitor the status of the master node server. If the master node fails, you are notified and can trigger the standby server to take over monitoring your infrastructure.

Some manual configuration is necessary to configure your remote probes to send monitoring data to your new PRTG core server. You also need to migrate your PRTG license from the old server to the new server.

PRTG on a virtual machine

When running PRTG in a virtual environment, you have two options to keep monitoring downtime as low as possible.

Note: Both approaches require actions from the PRTG administrator to recover the PRTG installation once it is down. Moreover, there is a gap in the monitoring data due to the downtime.

1. Use snapshots

The idea is to take VMware or Hyper-V snapshots of the virtual machine on which the PRTG core server runs. A snapshot contains the status of the virtual machine, its disk data, and its configuration at a given point in time. Take snapshots regularly but carefully, because performance may decrease as more snapshots accumulate.
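If you want to schedule snapshots rather than take them by hand, a minimal sketch could drive Hyper-V's built-in `Checkpoint-VM` cmdlet from Python (VMware has equivalent tooling). The VM name `PRTG-Core` is a hypothetical placeholder:

```python
import datetime
import subprocess


def checkpoint_command(vm_name: str, label: str) -> list[str]:
    """Build the PowerShell invocation for a Hyper-V checkpoint.

    `Checkpoint-VM` is the built-in Hyper-V cmdlet for taking a snapshot.
    """
    return [
        "powershell", "-NoProfile", "-Command",
        f"Checkpoint-VM -Name '{vm_name}' -SnapshotName '{label}'",
    ]


def take_snapshot(vm_name: str = "PRTG-Core") -> None:  # hypothetical VM name
    """Take a timestamped checkpoint of the PRTG core server VM."""
    label = "prtg-" + datetime.datetime.now().strftime("%Y%m%d-%H%M%S")
    subprocess.run(checkpoint_command(vm_name, label), check=True)
```

Run on a schedule (for example, a Windows scheduled task), this gives you regularly spaced restore points; remember to prune old checkpoints so performance does not degrade.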

If the virtual machine crashes or fails, you can restore it quickly from the latest snapshot.

2. Use a VM backup

Hyper-V and VMware both let you back up virtual machines. The backup should contain the configuration, VM snapshots, and virtual hard disks used by the virtual machine.

If the virtual machine crashes, you can restore it from a backup copy.


Created on Aug 9, 2017 2:31:42 PM by  Gerald Schoch [Paessler Support]

Last change on Jan 4, 2023 2:52:45 PM by  Brandy Greger [Paessler Support]




Hello @Gerald,

we are following this procedure, but our monitoring environment doesn't allow stopping the services on the "master node": it takes 10-15 minutes and we just can't leave 10k+ sensors without monitoring. Do we really need to stop the services? Are there any impacts if we copy the files to the "slave server" while both services are active on the master?

We need to make 2-4 copies a day, so stopping the services 2-4 times a day is out of the question.

Thanks!

Created on Jun 28, 2018 6:18:26 PM

Last change on Jun 28, 2018 7:21:31 PM by  Luciano Lingnau [Paessler]




Hey Mariows,

there is no need to stop the Core Server service on the master node. Instead, you can work with a snapshot of the configuration file. You can create this snapshot under Setup >> System Administration >> Administrative Tools on the master node.

Once you have done this, copy the configuration file from the snapshot .zip, together with the monitoring data, to the failover node (there, the Core Server service must be stopped).

Best regards,
Sven

Created on Jun 29, 2018 12:59:25 PM by  Sven Roggenhofer [Paessler Technical Support]




Hi, this is a great writeup... but how would this process work with remote probes?

Can I do the same file copy process on remote probe (active) to remote probe standby and point the standby probe to the standby core server?

Created on Sep 13, 2021 8:06:24 PM




Hello,

remote probes work differently. The sensor configuration and historic data are stored on the core server. If a remote probe goes down, you can set up a new one, confirm it in PRTG, and then move the device objects (except for the probe device itself) to the new probe.

Created on Sep 14, 2021 3:21:10 PM by  Arne Seifert [Paessler Support]




Disclaimer: The information in the Paessler Knowledge Base comes without warranty of any kind. Use at your own risk. Before applying any instructions please exercise proper system administrator housekeeping. You must make sure that a proper backup of all your data is available.