
All SNMP HP ProLiant System Health Sensors Go Up and Down

We currently monitor 5 HP Proliant servers with the SNMP health sensor and have a strange problem. Sporadically, all of them will go "down" at the same time and say "No response (check: firewalls, routing, snmp settings of device, IPs, SNMP version, community, passwords etc) (SNMP error # -2003)". About 12 hours later, they all come back up again for no apparent reason. While this is happening, other SNMP sensors continue to work fine.

I find it hard to imagine all 5 servers having an issue at exactly the same time. It seems more likely that PRTG has an issue during this time frame, but the issue doesn't affect other SNMP sensors at all, so I'm not sure what to make of it. Whenever this happens, it seems to start between 12:30 and 1:30 in the morning, and it always lasts about 12 hours.

Any ideas?

hp-proliant-health prtg snmp

Created on Oct 3, 2017 4:19:46 PM



12 Replies

Hello,

Thank you for the KB post. Which SNMP sensors go into this error state? If you have SNMP Uptime sensors on the same devices, do they show this error too, or do they keep working while the SNMP HP Health sensors fail?

Best regards,

Created on Oct 4, 2017 2:09:36 PM by  Torsten Lindner [Paessler Support]



The only sensors that give this issue are the SNMP HP Health Sensors. We have other SNMP based sensors used with other devices, but they all continue to work when this issue occurs.

I've added the SNMP uptime sensor to a couple of the affected servers. The next time this issue occurs, I will check to see if they report issues at the same time.

Created on Oct 10, 2017 2:39:31 PM



Thanks, please keep us posted.

Created on Oct 11, 2017 8:17:47 AM by  Torsten Lindner [Paessler Support]



It's been a long time, but this finally happened again last night. All of our HP servers had all of their SNMP sensors fail at around the same time. This morning they are coming back one server at a time. Other, non-SNMP sensors on the same servers kept working, and all SNMP sensors on non-HP devices (Cisco switches, Nimble storage, etc.) continued to work normally. On our HP servers we have SNMP HP ProLiant System Health sensors as well as SNMP physical disk sensors for each disk, and all of those are down on every server.

Any thoughts?

Created on Apr 13, 2018 2:44:46 PM



Which exact errors do you get? Please send screenshots showing the exact error messages on the sensors' "Overview" tab. If the sensors are not in an error state right now, you can also check their "Log" tab for previous error messages.

Created on Apr 16, 2018 8:26:52 AM by  Torsten Lindner [Paessler Support]



This just happened again. I don't see a way of uploading pictures. Should I just post them somewhere else and link to them?

Created on May 10, 2018 10:12:30 PM



Yes, please share the screenshots that way. Sorry for the inconvenience.

Created on May 11, 2018 10:49:06 AM by  Torsten Lindner [Paessler Support]



Screenshots (external links): "SNMP failures on example server", "SNMP failures (most recent)", "Probe Health during failures".

Created on May 11, 2018 1:33:05 PM



The "no response" error on the SNMP sensors suggests that SNMP requests are not answered at all at that moment (otherwise you would see other errors, such as "no such name").
Are the other sensors that keep working all WMI / Performance Counter based? If so, that would mean SNMP communication specifically isn't possible during those periods.
To check this theory, please run our SNMP Tester on the PRTG host (or the host of the remote probe) and perform a "Read Device Uptime" against the target device. Which results do you get in the Tester? Please send us the result log file from the Tester.

Created on May 11, 2018 1:54:05 PM by  Torsten Lindner [Paessler Support]



The sensors that stay up are WMI sensors. The puzzling part is that when this happens, it affects all of our physical servers (all HP) at the same time. Last night they were all down for 12 hours before coming back up. The entire time, all of the SNMP sensors for our Cisco network devices kept working. So SNMP works on the probe (otherwise the Cisco sensors would be down too), yet all of the HP-related sensors stop on every server at once, and it seems very unlikely that all of the servers would develop the same problem at the same moment.

Does that make sense?

Created on May 11, 2018 1:58:04 PM



Thank you for the clarification. The next step, then, really has to be one or two "Read Device Uptime" tests with the SNMP Tester against one or two of the HP servers. This will show whether it's a PRTG issue (then the Tester should work) or rather a network issue (then the Tester would fail as well).

Created on May 14, 2018 7:49:26 AM by  Torsten Lindner [Paessler Support]



As embarrassing as it is, I wanted to share my solution in case it helps someone else in the future. The issue happened again and I used the SNMP Tester as you suggested. When it failed too, I started digging and found that restarting the SNMP service on one of the affected servers fixed the issue for that server. I was really struggling to make sense of the peculiar traits of this issue, but then it hit me!

A few months ago, I created a custom sensor that tested PXE responses for our Citrix environment. To do this, I had to add a second NIC to the PRTG server so that I could test PXE responses from the VDI VLAN. When I did this, I stupidly left that NIC on DHCP, which gave the PRTG server two default gateways. As a result, the Windows server running PRTG kept switching the source IP address it used for SNMP requests. For most of my network appliances, VM hosts and Linux servers this isn't a problem, because they only check the SNMP community name. Our Windows servers, however, have their SNMP service configured to accept requests only from specific IP addresses, and the second NIC's address was not one of them.
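The root cause was source-address selection: with two default gateways, the probe could send SNMP requests from an address that is not in the Windows SNMP service's list of accepted hosts (the "Accept SNMP packets from these hosts" setting). A quick way to see which local address the OS would pick toward a given target is to call connect() on a UDP socket and read getsockname(); no packet is actually sent. This is a minimal Python sketch, assuming the accepted-hosts list holds plain IP addresses (the Windows setting also accepts host names, which this does not resolve); the function names are hypothetical.

```python
import ipaddress
import socket


def source_ip_for(target: str, port: int = 161) -> str:
    """Return the local IPv4 address the routing table picks to reach target.

    connect() on a UDP socket sends nothing; it only selects the route,
    and therefore the source address, which getsockname() then reports.
    """
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.connect((target, port))
        return sock.getsockname()[0]


def is_permitted(source: str, permitted_managers: list[str]) -> bool:
    """Check source against an accepted-hosts list of plain IP addresses."""
    src = ipaddress.ip_address(source)
    return any(src == ipaddress.ip_address(m) for m in permitted_managers)
```

On the probe host you would call `source_ip_for()` with each HP server's address and pass the result to `is_permitted()` along with that server's accepted-hosts list. Because the answer reflects the routing table at that instant, a setup with competing default routes may need the check repeated over time to catch the switch.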

Long story short, I gave the second NIC a static IP address with no default gateway, and all of the sensors immediately started working again.

Thanks for all the assistance with this. It was definitely a weird one!

Created on May 18, 2018 4:26:08 PM




Disclaimer: The information in the Paessler Knowledge Base comes without warranty of any kind. Use at your own risk. Before applying any instructions please exercise proper system administrator housekeeping. You must make sure that a proper backup of all your data is available.