We have a setup where a probe monitors 20 machines, with 4 sensors on each:
- SNMP CPU test - 10-second timeout (in another case we found this to work better than a 60-second timeout).
- SSH meminfo
- SSH disk free
- SSH custom script - a very lightweight script on the server side
The probe's machine is very powerful: 16 cores, 16 GB memory, Windows Server 2012, dedicated to this probe.
The 20 machines (80 sensors in total) are not responding because they are shut down. This is intentional: the machines are only brought up from time to time.
We get very poor performance from the probe itself:
- SNMP interval delay of more than 80%.
- We keep losing connection to the probe - it becomes "disconnected" for tens of minutes at a time.
- A relatively high thread count at times - more than 300.
- All of the SSH sensors show no data (grey color) the vast majority of the time.
All of this is co-located in our offices. We see more or less the same performance when we run the probe from within the cloud data center that the sensors are meant to monitor.
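For a rough sense of scale, here is a back-of-the-envelope sketch of how much time the probe could spend just waiting on timeouts when every machine is down. It is only an estimate under stated assumptions: the SSH timeout (60 seconds here) is not a value from our setup, the `parallel_workers` parameter is hypothetical, and we do not know how the probe actually schedules its requests.

```python
def worst_case_wait_seconds(sensor_count, timeout_s, parallel_workers=1):
    """Upper bound on time spent waiting if every request runs into its timeout.

    parallel_workers is a hypothetical knob: how many requests are assumed
    to be in flight at once. The probe's real scheduling is unknown to us.
    """
    # Ceiling division: number of sequential "rounds" of timed-out requests.
    rounds = -(-sensor_count // parallel_workers)
    return rounds * timeout_s


MACHINES = 20
SNMP_TIMEOUT_S = 10          # from the setup described above
SSH_TIMEOUT_S = 60           # assumption: not stated anywhere in this post
SSH_SENSORS_PER_MACHINE = 3  # meminfo, disk free, custom script

# Assuming strictly serial requests (worst case).
snmp_wait = worst_case_wait_seconds(MACHINES, SNMP_TIMEOUT_S)
ssh_wait = worst_case_wait_seconds(MACHINES * SSH_SENSORS_PER_MACHINE, SSH_TIMEOUT_S)

print(f"SNMP: up to {snmp_wait} s of waiting per scan of all machines")
print(f"SSH:  up to {ssh_wait} s of waiting per scan of all machines")
```

If requests were handled one at a time, the 20 SNMP sensors alone could wait up to 200 seconds per pass, which would exceed a short scanning interval. Whether this is what actually causes the delays and disconnects is exactly what we are unsure about.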