I use a lot of REST Custom sensors in our PRTG instance. These sensors poll JSON data exposed over HTTPS on another machine. The issue I have is that the downstream HTTP server writes the JSON files to its webroot from a cron job, so the write happens at the same point in every minute (down to the millisecond). This is largely because cron offers no scheduling granularity finer than one minute. The job takes about 750 milliseconds to complete.
So, if the JSON file is being written at the SAME time that the PRTG sensor polls, the poll invariably fails with the error message "Cannot parse content from endpoint <ENDPOINT>: no data". This is the expected response when the payload is EMPTY: the file is truncated at the moment it is being rewritten, and a few milliseconds later the latest JSON is written to it and is available to be picked up by the PRTG sensor.
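For illustration, the empty-file window described above could in principle be closed on the publishing side by writing to a temporary file and swapping it into place. This is only a minimal sketch in Python, assuming the cron job can be changed at all; the function name `publish_json_atomically` and the file permissions are hypothetical and not part of my current setup.

```python
import json
import os
import tempfile


def publish_json_atomically(payload, target_path):
    """Write payload as JSON so that a reader never sees an empty or partial file."""
    directory = os.path.dirname(os.path.abspath(target_path))
    # Create the temp file next to the target so the final rename stays on one filesystem.
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as tmp:
            json.dump(payload, tmp)
            tmp.flush()
            os.fsync(tmp.fileno())
        # mkstemp creates the file readable by its owner only; 0o644 is an
        # assumption about what the web server needs in order to serve it.
        os.chmod(tmp_path, 0o644)
        # os.replace swaps the file in a single step: a poller gets either the
        # previous complete file or the new complete file.
        os.replace(tmp_path, target_path)
    except BaseException:
        # Remove the temp file if anything failed before the swap.
        if os.path.exists(tmp_path):
            os.remove(tmp_path)
        raise
```

With something like this the 750 ms of work would happen on the temporary file, and the published file would never be empty when PRTG asks for it.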
The issue is that the REST Custom sensor has been programmed to retry IMMEDIATELY. I would like to be able to change its retry interval from INSTANT (or the next second) to a few seconds later, perhaps 5. This would give the downstream web server the time it needs to finish processing and publishing the data to the webroot.
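To make the request concrete, this is roughly the behaviour I am after, sketched in Python. It is not PRTG code and does not reflect how the sensor is implemented; `fetch_with_delayed_retry`, `fetch`, `retries` and `delay_seconds` are hypothetical names used only to illustrate a retry that waits before trying again.

```python
import time


def fetch_with_delayed_retry(fetch, retries=1, delay_seconds=5.0, sleep=time.sleep):
    """Call fetch(); if the body comes back empty, wait delay_seconds and try again."""
    for attempt in range(retries + 1):
        body = fetch()
        # A non-empty body means the publisher has finished writing the file.
        if body and body.strip():
            return body
        # Pause before the next attempt instead of retrying instantly.
        if attempt < retries:
            sleep(delay_seconds)
    raise ValueError("no data after %d attempts" % (retries + 1))
```

Here `fetch` stands for whatever performs the HTTPS request and returns the response body as a string. A 5 second delay would comfortably outlast the write on the web server.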
The result is that the LOGS are absolutely FULL of clashes like this, where PRTG polls, FAILS, then retries without a delay. Because we cannot change when each sensor polls in PRTG, it comes down to sheer luck. It is Sod's law that PRTG will ask the web server for the file at the exact moment that the web server is writing it to disk :-)
In short, there is a two-second window at the top of each minute in which the HTTP server is writing the latest JSON payloads to the webroot, and it would be great if PRTG didn't knock on the door until 3 seconds after the top of the minute.
With a single REST Custom sensor you'd probably never notice it. When you have a lot, it begins to get very messy indeed. It's great that PRTG spreads out its polling, but terrible that we cannot influence WHEN it does its thing.
To work around the issue I force PRTG to poll twice a minute, which gives a quick recovery from a clash like this. The sensors rarely go into a DOWN state because they naturally recover with a valid JSON payload 30 seconds later. But I am forced to needlessly double the load on my PRTG instance as a result.
So, can I change the RETRY interval somewhere on the REST Custom sensor, from 1 second to a different number, like 3 or 5 seconds?