We have folder sensors that work well for a while and then fail. They report either "logon failure: no network provider accepted the given network path (1203) (code: PE029)" or "cannot access folder: The filename or extension is too long (206) (code: PE032)". Copying the folders to a new drive usually makes the sensor work again for a time. Even while working, the sensors sometimes fail with one of these two error messages and then come back online.
folder sensor works erratically, multiple error codes
Votes:
0
Best Answer
Votes:
0
Hello,
we use standard Windows API Calls:
- 'WNetAddConnection2' to establish a connection to a share (using the IPC$ Share)
- 'FindFirstFile'/'FindNextFile' to iterate through folders
- 'WIN32_FIND_DATA' to collect the "actual data"
All sensors targeting the same host (with the same credentials) share one connection. The API error messages you are seeing suggest that either PRTG is constantly establishing new connections (because sensors connect with different credentials) or that something "external" is closing the connections (other applications or services that connect to the hosts, or unstable network connections).
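To give a feel for what such a folder walk involves, here is a minimal Python sketch. It is not PRTG's code: the names `FolderStats` and `scan_folder` are made up for illustration. It relies on `os.scandir`, which on Windows is implemented on top of the same FindFirstFile/FindNextFile family of calls, so every file and folder has to be visited once, and the size and timestamps arrive with each directory entry.

```python
import os
from dataclasses import dataclass
from typing import Optional


@dataclass
class FolderStats:
    """Totals gathered by one full walk (hypothetical structure)."""
    file_count: int = 0
    folder_count: int = 0
    total_bytes: int = 0
    newest_mtime: Optional[float] = None
    oldest_mtime: Optional[float] = None


def scan_folder(root: str) -> FolderStats:
    """Walk root iteratively, visiting every entry exactly once."""
    stats = FolderStats()
    pending = [root]  # explicit stack avoids deep recursion on big trees
    while pending:
        current = pending.pop()
        with os.scandir(current) as entries:
            for entry in entries:
                if entry.is_dir(follow_symlinks=False):
                    stats.folder_count += 1
                    pending.append(entry.path)
                elif entry.is_file(follow_symlinks=False):
                    info = entry.stat(follow_symlinks=False)
                    stats.file_count += 1
                    stats.total_bytes += info.st_size
                    mtime = info.st_mtime
                    if stats.newest_mtime is None or mtime > stats.newest_mtime:
                        stats.newest_mtime = mtime
                    if stats.oldest_mtime is None or mtime < stats.oldest_mtime:
                        stats.oldest_mtime = mtime
    return stats
```

The run time of a walk like this grows with the number of entries, so a tree with millions of files needs millions of enumeration steps over the network on every scan. Because the timestamps come along with each entry, skipping the newest/oldest comparison would save very little.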
Does that overhead have any impact on our experience with the reliability of this particular sensor type?
Apart from the connection that has to be established of course, there is no overhead we are aware of.
What can we do to affect the usefulness of this sensor?
Try reducing the load, and check for potential "sources of interference".
Is it possible to log the activity from this sensor to determine why it’s failing?
Unfortunately that's not possible.
Is it possible to modify the folder sensor type so that it records how long it takes to pull results even when it succeeds (Like the custom NetworkFileCount.exe)?
You can only check a Probe State File (it can be created on the System Status page). Using the ID of a sensor, you can look up its timings ("Timing: 156/115/0" would be the last/average timing).
Would disabling the check for file age (newest / oldest) improve the sensor performance?
No.
Our recommendations are to reduce the general load on these sensors and to use the same credentials for one and the same host (do not use different credentials for several sensors connecting to one host). Internal Windows limits may also be interfering (for example, the limit of 10 share connections to a Windows XP machine). Sensors that take very long to return a result can block other sensors, so use long scanning intervals (1 hour or longer), and consider using schedules to spread the sensors out further.
Also important: do not use different names for one and the same host (that might slow things down significantly). Always use the same host name, FQDN, or IP address. It has to be exactly the same.
Best regards
Created on Nov 16, 2010 3:31:24 PM by
Torsten Lindner [Paessler Support]
Last change on Nov 16, 2010 3:31:43 PM by
Torsten Lindner [Paessler Support]
8 Replies
Votes:
0
If this is not a constant error, this could be a load issue.
How many sensors do you have? Are the shared folders you are monitoring in the same LAN?
Votes:
0
I wanted to follow up on that with more details on the issue we are experiencing.
First, the most frustrating part of this problem is the inconsistency. Sometimes sensors work fine for a period of time before failing, and they will work on one server and not on another even though the files and directories are equivalent. Sometimes they work with little effort, and sometimes we can’t do anything to get them to work. Recently, we had a couple of folders working reasonably well for weeks, and they stopped working after some seemingly trivial task like a folder rename or a partition extension using diskpart.
Secondly, let me describe how and why we use these folders. We have “large” folders with file and folder counts in the millions, usually spanning 200 GB to 1,000 GB, that we replicate across multiple servers, usually stored on a SAN or NAS. We simply use the folder sensors to verify that they are staying synchronized regardless of the method we use to sync the directories (robocopy scripts, DFS, SAN replication, etc.). Due to the size of the folders, an automated method through PRTG to verify file count and newest file is EXTREMELY valuable, because even the normal methods in Windows for monitoring file count and size, such as folder properties, take an hour or more to return the totals. When we tested these folder sensors on smaller directories they seemed to work with no problems; it is only on the large folders that the problem described above presents itself. On a similar note, we tried to verify that some of these “large” shares are accessible by using file sensors to look for a single file in the large shares, and they would also fail periodically with similar errors. The share free space sensors also appear to have related problems working consistently on trees with high file counts.
We’ve spent too much time trying workarounds because we desperately want it to work, but each attempt has had very limited success, if any. So far we have tried many different techniques to get the sensors to work at all on some folders or to work consistently on others. We modified the scan interval, simplified all permissions on the files and folders, tried faster storage hardware, checked for file system errors, checked the directories for problems caused by path length or file name length, tried different user accounts, verified results with other tools that could gather similar data such as forfiles and Sysinternals’ DiskUse, copied the data to other drives, tried checking subsets of the directory structure, limited the number of sensors of this type to one per device, etc. Further, we also tried the custom sensor that uses NetworkFileCount.exe, and this less robust sensor gave an error on large directories while working on smaller directories, just as the file sensor did.
We would like to know if you have any suggestions for alternate configurations or sensors to try in this situation. We would also like to know how the file sensor works so that we can possibly troubleshoot further or find a workaround. Does it gather the data from the file system journal or is it running some sort of operation to actually count files / folders? We’ve seen some references to an additional performance overhead from these sensors, but don’t know how to evaluate that statement. Does that overhead have any impact on our experience with the reliability of this particular sensor type? What can we do to affect the usefulness of this sensor? Is it possible to log the activity from this sensor to determine why it’s failing? Is it possible to modify the folder sensor type so that it records how long it takes to pull results even when it succeeds (Like the custom NetworkFileCount.exe)? Would disabling the check for file age (newest / oldest) improve the sensor performance?
We currently have almost 10,000 sensors. We have one core PRTG server but monitor these sensors through four probes, three of which are offsite. Some of these sensors are on the same LAN; others are offsite.
Votes:
0
Thank you for your reply. I have a few follow-up questions and clarifications.
First, can you clarify what you mean by “reducing the load on the sensor”? You state, “Our recommendations would be to reduce the general load on these sensors.” How would you recommend reducing the load on the sensors?
<pre>“You could only check a Probe State File (can be created on the System Status-Page), with the ID of a sensor you can check its timings ("Timing: 156/115/0" would be last/average timing).”</pre>
What unit is this “timing” recorded in (e.g. seconds, milliseconds)? It does not appear to be seconds, because some values are extremely large (they would equal 8 days) even though the sensor produces results every 4 hours. How is the average computed: from all values in the log, from the values in the last 2 hours, or in some other way? Can this value be added as a column to the results?
<pre>“ use the same credentials for the one and the same host (do not use different credentials for several sensors connecting to one host).”</pre> Does the sensor use the credentials supplied under “credentials for windows systems” that are either set per sensor or inherited?
<pre>“Potentially there are internal Windows-Limits interfering (for example the limit to 10 share-connections to a Windows XP machine). “</pre> We use Windows SERVER OS’s and not client OS’s, so it wouldn’t seem to be related to an OS connection limit. Further, it doesn’t appear to be limited to the number of share connections because we’ve set them to the Max.
<pre>“Sensors that take very long to get their result might of course block other sensors, so use long scanning intervals (1h or longer), maybe use schedules to further distribute the sensors.”</pre> The scanning intervals do not seem to have an effect on sensors that never succeed. We set them as high as 24 hours with no noticeable effect. For other sensors we’ve found that increasing them to intervals as great as 6 hours provides more consistent results. This issue should add support for our request to have a column displaying the time required to scan and obtain a result. It’s very laborious to experiment and figure out the best interval through trial and error. The log is helpful, but a column would be significantly more helpful and meaningful.
<pre>“Also important do not use different names for one and the same host (that might slow things down significantly). Always use the same host name, FQDN or IP. It has to be exactly the same.”</pre> I don’t believe we’ve been using different names, as they’ve typically been inherited from the device, and we’ve not used multiple devices with folder sensors set to scan the same physical server except for clusters. Do you have a recommendation for the best method to scan failover clusters? I assume folders or drives configured for failover should be scanned by using the virtual name instead of the cluster name; however, that could lead to multiple names being scanned on the same server. Do you have a recommendation for this situation? Additionally, are your configuration recommendations the same when using Windows clustering with both Windows Server 2003 and 2008? They seem to implement virtual name objects differently.
What conditions cause the error message “cannot access folder: the filename or extension is too long(206) (code:PE032)”? Can you explain what the error means? Is the code ‘(206)’ contained in the error significant?
As mentioned previously, “we had a couple folders working reasonably well for weeks, and they stopped working after some seemingly trivial task like a folder rename or a partition extension using diskpart.” We currently have 3 identical shares set up on different servers, and each has a folder sensor that shows a different problem. One gives the error message “cannot access folder: the filename or extension is too long(206) (code:PE032)”, one gives the error code “login failure:connection in use (code:PE029)”, and the third never returns any data or error message and stays in the “unknown” state.
Votes:
0
Dear Jesse,
I am sorry for the delay in responding. Unfortunately, we do not have much experience with monitoring shares on such an extremely large scale, nor in a Microsoft Cluster environment.
The timing values in the Probe State Files are in milliseconds. It's just a 'debug' output.
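For anyone who wants to read these values out of a saved Probe State File, a small Python sketch could look like the following. It assumes the line format quoted earlier in this thread ("Timing: 156/115/0"). The helper names `parse_timing` and `format_ms` are hypothetical, and the meaning of the third number is not stated in this thread, so it is only passed through.

```python
import re
from typing import NamedTuple


class SensorTiming(NamedTuple):
    last_ms: int      # duration of the most recent scan, in milliseconds
    average_ms: int   # average scan duration, in milliseconds
    third_value: int  # meaning not documented in this thread


_TIMING = re.compile(r"Timing:\s*(\d+)/(\d+)/(\d+)")


def parse_timing(line: str) -> SensorTiming:
    """Extract the three numbers from a 'Timing: a/b/c' line."""
    match = _TIMING.search(line)
    if match is None:
        raise ValueError(f"no timing entry found in: {line!r}")
    return SensorTiming(*(int(group) for group in match.groups()))


def format_ms(ms: int) -> str:
    """Render a millisecond count as hours, minutes, and seconds."""
    seconds, millis = divmod(ms, 1000)
    minutes, seconds = divmod(seconds, 60)
    hours, minutes = divmod(minutes, 60)
    return f"{hours}h {minutes:02d}m {seconds:02d}.{millis:03d}s"
```

Reading the unit as milliseconds changes the picture considerably: a value of 691200 would be eight days if it were seconds, but `format_ms(691200)` gives `0h 11m 31.200s`.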
If a sensor stays gray/black, this is most likely due to a scan taking too long. The error 'Filename or Extension too long' means exactly what it says: a filename or extension is too long (we don't know which exact file/folder, because PRTG only gets this error from the Windows API). 206 is the Windows error code here.
'Connection in use' means that PRTG was not able to establish a connection, which might be the result of something else:
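Since the error does not name the offending item, one way to hunt for it is to walk the tree yourself and list every path whose full length exceeds the classic Windows limit. The Python sketch below is only an illustration: `find_long_paths` and its parameters are made-up names, and the default threshold of 259 characters (MAX_PATH of 260 minus the terminating NUL) is an assumption that may need adjusting. The optional `remote_root` lets you measure lengths as they appear through the share (for example `\\host\share`) while scanning a shorter local path.

```python
import os
from typing import Iterator, Optional, Tuple

MAX_PATH = 260  # classic Windows limit, counting the terminating NUL


def find_long_paths(
    root: str,
    limit: int = MAX_PATH - 1,
    remote_root: Optional[str] = None,
) -> Iterator[Tuple[int, str]]:
    """Yield (effective_length, relative_path) for entries above limit.

    remote_root is the prefix under which the monitoring side sees the
    tree; when omitted, lengths are measured against root itself.
    """
    base = (remote_root if remote_root is not None else root).rstrip("\\/")
    for dirpath, dirnames, filenames in os.walk(root):
        rel_dir = os.path.relpath(dirpath, root)
        for name in dirnames + filenames:
            if rel_dir == os.curdir:
                rel_path = name
            else:
                rel_path = os.path.join(rel_dir, name)
            # prefix + one separator + path relative to the share root
            effective = len(base) + 1 + len(rel_path)
            if effective > limit:
                yield effective, rel_path
```

One caveat: `os.walk` skips directories it cannot list without raising an error by default, so on a system where over-long paths are themselves unreadable the scan may stop short of the deepest entries. Passing an `onerror` callback to `os.walk` would be one way to surface those.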
First PRTG checks if there is already a connection established with the same data (host, credentials). If it doesn't find one OR the actual sensor-check produces an error, then PRTG tries to establish a new connection, which then might fail because other sensors use it already, which is probably the case with these very long running sensors.
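The reuse logic described here can be pictured with a small Python sketch. This is purely illustrative and not PRTG's implementation: `ConnectionPool`, `get`, and `run_check` are invented names, and matching on the exact host string is an assumption based on the advice that the name "has to be exactly the same".

```python
from typing import Callable, Dict, Tuple

# Key: host name exactly as configured, plus the user name.
ConnectionKey = Tuple[str, str]


class ConnectionPool:
    """Shares one connection per (host, user) pair (illustrative only)."""

    def __init__(self, connect: Callable[[str, str], object]) -> None:
        self._connect = connect
        self._connections: Dict[ConnectionKey, object] = {}
        self.connect_calls = 0  # how often a new connection was opened

    def get(self, host: str, user: str) -> object:
        key = (host, user)
        if key not in self._connections:
            self.connect_calls += 1
            self._connections[key] = self._connect(host, user)
        return self._connections[key]

    def run_check(self, host: str, user: str,
                  check: Callable[[object], object]) -> object:
        """Run a check; on an error, reconnect once and try again."""
        try:
            return check(self.get(host, user))
        except OSError:
            # Drop the cached connection and establish a new one.
            self._connections.pop((host, user), None)
            return check(self.get(host, user))
```

In this model, "srv01" and "srv01.example.com" (or the same host with two different accounts) produce different keys, so each combination opens its own connection instead of sharing one. That is one way to see why mixed names or credentials add load.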
I'm very sorry, but we don't have many more tips on how to optimize or what to check in your case. Maybe you could use remote probes (if the files/folders are physically stored on Windows servers).
Votes:
1
You can use Long Path Tool to solve the problem.
Votes:
0
Another tool that can be helpful for finding out what permissions are on the sub-folders is AccessEnum from Sysinternals.
Votes:
0
Long Path Tool is an important tool to have on your machine. It is easy to download, and it resolves this kind of error.