Why are there frequent unrecoverable WMI timeouts and corresponding Local Probe restarts?

We encountered a series of alerts from one server related to WMI timeouts. The error message in these cases is: "WMI request timed out unrecoverable. If the problem persists with this sensor, consider pausing or deleting it. (code: PE051)"

The Local Probe disconnected and restarted automatically some time after the WMI errors occurred. After that, all WMI errors for this server disappeared. I would like to understand the reason for this behavior so that I can troubleshoot such issues more effectively.

If a WMI sensor is down, does PRTG keep trying to communicate with the device by opening new threads? What happens to these threads once the device is reachable again? Do they all time out, or do they stay active?

local-probe pe051 probe prtg restart sensor threads timeout windows wmi

Created on Aug 12, 2013 2:44:29 PM by  Gerald Schoch [Paessler Support]



1 Reply

Accepted Answer

This article applies to PRTG Network Monitor 13 or later

WMI Timeouts and Probe Restarts

While Windows Management Instrumentation (WMI) sensors are showing timeouts, PRTG’s Local Probe continues to send WMI requests to the target machine and opens new threads at the same time. Because the number of available threads is limited to 500, PRTG restarts the probe service by design if the number of sensors that are in a timeout state reaches 40. This helps resolve connection issues in many cases. You can check the number of timeouts in PRTG’s system log files.

Regarding Open and Killed Threads

The PRTG probe starts a thread for each sensor scan. In this thread, the WMI sensor code sends WMI API calls to the probe computer’s WMI system. Most often, these API calls return after a short time, but sometimes they do not return at all. They just get lost somewhere in the Windows system. The thread cannot continue processing and has to wait until the function call returns.

After waiting for 45 minutes, PRTG stops the thread and marks it as unusable. This is similar to killing a thread, although no thread is actually killed. The sensor then gets a new thread assigned and tries to process the request again. This usually works, but sometimes the new thread also gets stuck waiting for a response from Windows.

Unfortunately, there is no way to recover the resources of unusable (killed) threads. Because of this, PRTG restarts the whole probe process after 40 killed threads.
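The bookkeeping described above can be illustrated with a small toy model. This is only a sketch, not PRTG's actual implementation: the class name ProbeModel and its methods are hypothetical, and the only values taken from this article are the 45-minute wait and the limit of 40 unusable threads.

```python
from dataclasses import dataclass, field

THREAD_WAIT_LIMIT_MIN = 45  # minutes a thread may wait (value from the article)
RESTART_THRESHOLD = 40      # unusable threads before a restart (value from the article)


@dataclass
class ProbeModel:
    """Hypothetical toy model of the probe's thread bookkeeping."""
    unusable_threads: int = 0
    restarts: int = 0
    # sensor name -> minutes its current thread has been waiting
    waiting: dict = field(default_factory=dict)

    def start_scan(self, sensor: str) -> None:
        """Assign a fresh thread to the sensor's scan."""
        self.waiting[sensor] = 0

    def call_returned(self, sensor: str) -> None:
        """The WMI call came back, so the thread finishes normally."""
        self.waiting.pop(sensor, None)

    def tick(self, minutes: int = 1) -> None:
        """Advance time and give up on threads that have waited too long."""
        for sensor in list(self.waiting):
            self.waiting[sensor] += minutes
            if self.waiting[sensor] >= THREAD_WAIT_LIMIT_MIN:
                # The stuck thread cannot be reclaimed: count it and retry.
                self.unusable_threads += 1
                self.start_scan(sensor)
        if self.unusable_threads >= RESTART_THRESHOLD:
            self.restart()

    def restart(self) -> None:
        """Model a process restart, which frees all lost threads."""
        self.restarts += 1
        self.unusable_threads = 0
        self.waiting.clear()
```

In this model, a single sensor whose calls never return adds one unusable thread every 45 minutes, while many sensors stuck at once reach the restart limit much sooner.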

Causes

Ultimately, no single device causes this issue on its own, neither the PRTG server nor the target Windows server. Usually, the target computer’s WMI system misbehaves, and this results in hung function calls on the probe computer’s WMI system. In short, this is a problem of the overall system.

Workaround

If you find recurring entries for certain target devices in the server log, consider installing a remote probe on the affected machines. This approach avoids DCOM communication between the probe and the target computer and therefore helps to resolve these issues in many cases.
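To spot recurring devices, a short script can tally timeout entries per device. The sketch below rests on assumptions: this article does not describe the log format, so the script only assumes that each relevant line is plain text containing the error code PE051 and the device name. The function name and the device names you pass in are hypothetical.

```python
from collections import Counter
from typing import Iterable

ERROR_CODE = "PE051"  # error code quoted in the question above


def count_timeouts_by_device(lines: Iterable[str], devices: Iterable[str]) -> Counter:
    """Count log lines that mention both the error code and a device name.

    Assumes plain-text lines; the real log layout may differ.
    """
    device_list = list(devices)
    counts: Counter = Counter()
    for line in lines:
        if ERROR_CODE not in line:
            continue  # not a WMI timeout entry
        for device in device_list:
            if device in line:
                counts[device] += 1
    return counts
```

You could pass an open log file as lines and then call most_common() on the result to list the devices with the most timeouts first.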

Created on Aug 12, 2013 2:48:31 PM by  Gerald Schoch [Paessler Support]




Disclaimer: The information in the Paessler Knowledge Base comes without warranty of any kind. Use at your own risk. Before applying any instructions please exercise proper system administrator housekeeping. You must make sure that a proper backup of all your data is available.