We installed a failover cluster and slowly moved sensors to it, and it was working fine. Today, for no apparent reason, many of the sensors (mostly the custom VBScript sensors we'd written) went down with the error "External EXE/Script did not return a response (code: PE087)". But the exact same scripts are running fine on the main cluster, and they were also running fine on the secondary cluster until yesterday.
5 Replies
Hello,
Could it be a load issue on the Failover system? Does the Failover system have exactly the same hardware specs as the Master Node?
Can you try pausing maybe half of the script sensors that are in an error state? Do the others then recover?
Best regards.
The failover system has a better configuration (Xeon 2.5 GHz CPU, 4 GB RAM) than the master node (Xeon 2.27 GHz CPU, 4 GB RAM). For some reason, the partial alerts cleared on their own after about 24 hours of being left in that state. But I'd still like to know why this was happening, and whether there are any settings for cluster nodes that would explain it, in case it happens again. If you need any other details to diagnose this problem, please let me know. I will include some of my colleagues in this discussion so that they can contribute to this thread.
I'm afraid there are no settings in the Cluster that would cause sensors to run differently on a Failover node than on the Master node. Could you please forward us your system log files for analysis? You will find the log files within the data directory as defined under the "Logs" tab of the "PRTG Server Administrator" tool, or in the following default paths:
XP/2003: C:\Documents and Settings\All Users\Application Data\Paessler\PRTG Network Monitor\Logs (System)
Vista/2008/Windows 7/2008 R2/Windows 8/2012: C:\ProgramData\Paessler\PRTG Network Monitor\Logs (System)
Please be aware that the "Logs (System)" folder might also reside in a 'V7' or 'V8' folder. You can also upload the logs to our FTP server, using the button "Send Logs to Paessler..." on the "Logs" tab of the "PRTG Server Administrator" tool.
Please send us the logs for both nodes, either via email to [email protected] or via the FTP upload.
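To check quickly which of the default locations above actually exists on a node, something like the following might help. It is only a rough sketch, not an official Paessler tool: the function name `find_system_log_dirs` and the `max_depth` parameter are made up for illustration, and since the exact position of a 'V7' or 'V8' folder is not spelled out above, the sketch simply searches a few levels below the Paessler folder for any folder named "Logs (System)".

```python
import os
from pathlib import Path

# Parent folders of the default paths quoted above.
DEFAULT_BASES = [
    Path(r"C:\Documents and Settings\All Users\Application Data\Paessler"),
    Path(r"C:\ProgramData\Paessler"),
]

LOG_FOLDER_NAME = "Logs (System)"


def find_system_log_dirs(bases=DEFAULT_BASES, max_depth=3):
    """Return every 'Logs (System)' folder at most max_depth levels below a base.

    Hypothetical helper: the depth limit is an assumption that leaves room
    for an intermediate 'V7' or 'V8' folder without scanning the whole disk.
    """
    found = []
    for base in bases:
        base = Path(base)
        if not base.is_dir():
            continue  # this default location does not exist on this system
        base_depth = len(base.parts)
        for root, dirs, _files in os.walk(base):
            depth = len(Path(root).parts) - base_depth
            if depth >= max_depth:
                dirs[:] = []  # stop descending below the depth limit
                continue
            for name in dirs:
                if name == LOG_FOLDER_NAME:
                    found.append(Path(root) / name)
    return sorted(found)


if __name__ == "__main__":
    for log_dir in find_system_log_dirs():
        print(log_dir)
```

Run it once on each node; if it prints nothing, the data directory was probably moved, and the "Logs" tab of the "PRTG Server Administrator" tool will show the actual path.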
Apologies for not getting to this earlier. Things seem to be ok now.
Is there a setting to allow the cluster to start monitoring only if and when the primary goes down, instead of expending double the resources to check the same thing? Something like a cold swap. A delay of up to 10 minutes is OK for the cluster to come online.
I'm afraid that is not possible. In a PRTG Cluster, all nodes are generally active.