We currently use PRTG to monitor our enterprise environment and want to improve certain areas, specifically alerting, to reduce notification "noise".
We have made the specific decision to do all alerting from libraries, which are dynamically regenerated from the device tree as devices are added and removed.
Also of note: because of the very large scale of our environment (a cluster plus remote probes), we still depend heavily on SNMP as our primary polling method to keep performance at an acceptable level.
Monitoring of VMware is implemented as follows:
- vCenter Server (>500 VMs)
  - autodiscovered / rebuilt every 24 hours using a specialized guest template (VMGuest_only.odt, included at end)
  - VMs polled every 5 min for performance (using the VMware VM SOAP sensor)
- Specialized / mission-critical VMs (e.g. Mfg, SQL)
  - monitored on a standalone basis (ping, SNMP uptime, SNMP CPU, SNMP memory, SNMP disk free, SNMP traffic)
  - notification and alerting is generated out of custom libraries dedicated to the technical teams responsible for the VM
  - Ping is used as the primary means of up / down alerting (60 sec polling)
  - Alerting is almost solely based on state triggers (which tend to be sensitive to short-term events, e.g. CPU spikes)
  - The alternative would be threshold-based triggers (e.g. sustained high CPU), but this would affect all sensors in a library.
SNMP CPU sensors (the metric we care about most, as we are heavily compute bound) would ideally go into a down state in the event of sustained high CPU. The Linux SNMP library allows use of 1, 5, and 15 min CPU load sensors; something similar would be ideal, as we could use the 15 min load as our trigger for a down state.
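To make the behaviour we are after concrete, here is a minimal Python sketch of a "sustained high CPU" check. It is not a PRTG feature or API. The class name `SustainedCpuCheck`, the 90% limit, and the 15 min window at 60 sec polling are all assumptions for illustration. The idea is simply that a single spike should not flip the state, while a full window of high readings should.

```python
from collections import deque


class SustainedCpuCheck:
    """Hypothetical helper: reports 'down' only when the mean CPU load over
    a full window of samples exceeds a limit (all defaults are assumptions)."""

    def __init__(self, window_seconds=900, interval_seconds=60, limit_percent=90.0):
        self.limit_percent = limit_percent
        # Keep only as many samples as fit in the window (15 by default).
        self.samples = deque(maxlen=window_seconds // interval_seconds)

    def add(self, cpu_percent):
        """Record one polled CPU reading; the oldest reading drops out when full."""
        self.samples.append(cpu_percent)

    def average(self):
        """Mean of the retained readings, or 0.0 if nothing has been recorded."""
        return sum(self.samples) / len(self.samples) if self.samples else 0.0

    def state(self):
        """'down' only once the window is full and its mean is above the limit."""
        window_full = len(self.samples) == self.samples.maxlen
        return "down" if window_full and self.average() > self.limit_percent else "up"


check = SustainedCpuCheck()

# 14 quiet polls followed by one spike: the window mean stays low.
for reading in [20.0] * 14 + [100.0]:
    check.add(reading)
print(check.state())  # up

# 14 more polls at 100%: the whole window is now high.
for reading in [100.0] * 14:
    check.add(reading)
print(check.state())  # down
```

A fixed-size window like this is the simplest form of the idea. The Linux 1/5/15 min load figures are smoothed differently, so this only approximates that behaviour.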
I have looked through the VMware-provided MIBs in the hope of finding something useful there, but I'm not seeing anything. One solution we have looked at is building a library filtered by sensor type to CPU sensors only and setting a threshold trigger on it, but this seems more like a workaround than a proper solution.
Does anyone have any insights, suggestions, or feedback?
<?xml version="1.0" encoding="UTF-8"?>
<devicetemplate id="vmware-guest-only" name="vmWare Guests Only" priority="1" deviceicon="C_OS_VMware.png">
  <check id="ping" meta="ping"/>
  <check id="vmhostsoap" meta="MetaVmwareServerHostSoap" requires="ping"/>
  <!--create id="vmhost" kind="esxserversensorextern" meta="MetaVmwareServerHostExtern" requires="metavmwareserverhostsoap" displayname="Guest VM [name]"-->
  <create id="vm" kind="vcenterserverextern" meta="Metavcenterserverextern" requires="vmhostsoap">
    <createdata>
      <priority>3</priority>
      <interval>300</interval>
    </createdata>
  </create>
</devicetemplate>