Advanced SSH Script sensor high failure rate

I have set up an advanced SSH sensor to SSH into a Raspberry Pi and execute a script, speedTest.sh. I can execute this script without issue and get my desired output. When the script runs through PRTG's sensor, however, about 65% of the polls fail with 'No valid result from SSH Shell'; the rest of the time I get a proper response with data.

This is the log for a normal output:

linux

Last login: Tue May 28 15:24:11 2013 from 184.173.111.37-static.reverse.softlayer.com

root@raspberrypi:~# echo PAESSHSTART;/var/prtg/scriptsxml/speedTest.sh ;printf " 
\n";echo PAESSHEND
PAESSHSTART
<prtg>
<result>
<channel>Download Speed</channel>
<value>34672</value>
<unit>custom</unit>
<customunit>kbit/s</customunit>
</result>
<result>
<channel>Upload Speed</channel>
<value>67868</value>
<unit>custom</unit>
<customunit>kbit/s</customunit>
</result>
</prtg>

PAESSHEND
root@raspberrypi:~# 
--------------------------------------------------------------

This is the log for the output when I get a no valid result error:

linux

Last login: Tue May 28 15:34:11 2013 from 184.173.111.37-static.reverse.softlayer.com

echo PAESSHSTART;/var/prtg/scriptsxml/speedTest.sh ;printf "\n";echo PAESSHEND
root@raspberrypi:~# echo PAESSHSTART;/var/prtg/scriptsxml/speedTest.sh ;printf " 
\n";echo PAESSHEND
PAESSHSTART
<prtg>
----------------------------------------------------

This script displays the <prtg> tag right away and then pauses for 15 seconds while waiting for background data to be generated. Once the 15 seconds have passed, the rest of the output follows. I have increased the sensor's connection timeout to 2 minutes and the shell timeout to 1 minute, to no avail. I have tested an SSH connection from the PRTG probe itself to the target device and had no issues executing the script via PuTTY. I have also added a counter to my script to try to determine whether the sensor fails due to a timeout, and this is the result every time it fails. Each number represents one second elapsed while the script is executing:

linux

Last login: Tue May 28 16:49:13 2013 from 184.173.111.37-static.reverse.softlayer.com

echo PAESSHSTART;/var/prtg/scriptsxml/speedTest.sh ;printf "\n";echo PAESSHEND
root@raspberrypi:~# echo PAESSHSTART;/var/prtg/scriptsxml/speedTest.sh ;printf " 
\n";echo PAESSHEND
PAESSHSTART
<prtg>
1
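
The good and bad logs above differ in whether a complete <prtg> block arrives between the PAESSHSTART and PAESSHEND markers. The following is a minimal sketch for checking a saved session log for that condition. It is not part of PRTG or of the original script: the function name check_prtg_log and the idea of saving the log to a file are assumptions made for illustration.

```shell
#!/bin/sh
# Hypothetical helper (not a PRTG tool): classify a saved SSH session log.
# Prints "complete" if a full <prtg>...</prtg> block sits between standalone
# PAESSHSTART and PAESSHEND lines, otherwise prints "truncated".
check_prtg_log() {
    awk '
        { sub(/\r$/, "") }                      # drop a trailing CR, if any
        $0 == "PAESSHSTART" { inside = 1; next }
        $0 == "PAESSHEND"   { if (inside) ended = 1; inside = 0 }
        inside && $0 == "<prtg>"  { opened = 1 }
        inside && $0 == "</prtg>" { closed = 1 }
        END {
            if (ended && opened && closed) print "complete"
            else print "truncated"
        }
    ' "$1"
}
```

Because the markers are matched as whole lines, the echoed command line (which also contains the marker text) does not count as a marker.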

Is there some rare quirk with the probe's SSH system that I am unaware of? It all seems to come back to an instability in the probe itself rather than my script, but I cannot tell for certain.

PRTG version 13.1.3.1643

prtg ssh ssh-script

Created on May 28, 2013 9:03:47 PM

Last change on May 29, 2013 3:17:44 PM by  Daniel Zobel [Product Manager]



6 Replies

Hi,
What exact data are you pulling with your script? Could you post the whole script? Additionally, could you try setting higher Shell and Connection Timeouts as well as a longer Scanning Interval (please try 5 minutes)?
To do so, switch to the "Settings" tab of the sensor, go to the section SSH Specific, and set the Connection Timeout to 3 minutes and the Shell Timeout to 59 seconds (which is the maximum). Do you get better results then?

Created on May 31, 2013 5:22:01 AM by  Konstantin Wolff [Paessler Support]



Hi,

Could you try adding

ClientAliveInterval 10
ClientAliveCountMax 6

to your sshd_config to make sure there is no keepalive issue with the client-server connection?
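
As background, sshd sends a client-alive message after every ClientAliveInterval seconds of silence and disconnects once ClientAliveCountMax of them go unanswered, so the two values above give a window of roughly 10 × 6 = 60 seconds. The sketch below reads both settings from a config file and prints that product. The function name client_alive_window is hypothetical, and the parsing is deliberately simplified (plain "Keyword value" lines only; no Match blocks or includes).

```shell
#!/bin/sh
# Hypothetical helper: print ClientAliveInterval * ClientAliveCountMax (in
# seconds) for an sshd_config-style file, or "unset" if either is missing.
# Keywords are compared case-insensitively and the first value found is used.
client_alive_window() {
    awk '
        tolower($1) == "clientaliveinterval" && interval == "" { interval = $2 }
        tolower($1) == "clientalivecountmax" && count == ""    { count = $2 }
        END {
            if (interval == "" || count == "") { print "unset"; exit 1 }
            print interval * count
        }
    ' "$1"
}
```

Calling client_alive_window /etc/ssh/sshd_config (the usual path, which may differ on your system) with the two lines above in place would print 60.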

Kind regards

Created on Jun 5, 2013 3:32:33 PM by  Dieter Loskarn [Paessler Support]



Adding ClientAliveInterval and ClientAliveCountMax certainly helped. My failure rate has dropped to about 15%, but the issue still appears to occur at the same point in the script as before.

I have added some debugging output to the script to track exactly where it fails. When it fails, it always fails while executing the first sleep command, no matter how short the sleep is. I tested this with a counter loop that prints the current count and then sleeps for one second (starting at 0, before the first sleep). It always fails at 0, which means the session died during the first sleep command.

The very odd part is that this does not happen every time the script runs, and the SSH session never gets killed when I run the script myself via PuTTY.
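
For readers who want to reproduce the test, a counter loop of the kind described above could look roughly like this. It is a reconstruction for illustration only, not the actual code from speedTest.sh; the function name count_seconds and its optional second argument are assumptions.

```shell
#!/bin/sh
# Hypothetical reconstruction of the debugging counter: print the current
# count, then sleep, starting at 0 and stopping after the given limit.
# $1 = last number to print, $2 = seconds to sleep per step (default 1).
count_seconds() {
    limit=$1
    pause=${2:-1}
    i=0
    while [ "$i" -le "$limit" ]; do
        echo "$i"          # printed before the sleep, so 0 appears first
        sleep "$pause"
        i=$((i + 1))
    done
}
```

With count_seconds 15, the loop prints 0 through 15, one number per second. A session that dies during the first sleep leaves only the 0 in the log.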

Output of a good run with results:

linux

Last login: Wed Jun  5 15:45:30 2013 from 184.173.111.37-static.reverse.softlayer.com

root@raspberrypi:~# echo PAESSHSTART;/var/prtg/scriptsxml/speedTest.sh -s 192.16 
8.128.145 -t 10;printf "\n";echo PAESSHEND
PAESSHSTART
start
line 16, check args
line 21, rm iperf.log
line 24, run iperf
line 30, start counter loop
0
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
line 36, kill iperf
line 41, read output
<prtg>
<result>
<channel>Download Speed</channel>
<value>33720</value>
<unit>custom</unit>
<customunit>kbit/s</customunit>
</result>
<result>
<channel>Upload Speed</channel>
<value>67346</value>
<unit>custom</unit>
<customunit>kbit/s</customunit>
</result>
</prtg>

PAESSHEND
root@raspberrypi:~# 

Output with killed connection on sleep:

linux

Last login: Wed Jun  5 15:56:32 2013 from 184.173.111.37-static.reverse.softlayer.com

echo PAESSHSTART;/var/prtg/scriptsxml/speedTest.sh -s 192.168.128.145 -t 10;printf "\n";echo PAESSHEND
root@raspberrypi:~# echo PAESSHSTART;/var/prtg/scriptsxml/speedTest.sh -s 192.16 
8.128.145 -t 10;printf "\n";echo PAESSHEND
PAESSHSTART
start
line 16, check args
line 21, rm iperf.log
line 24, run iperf
line 30, start counter loop
0

Any further ideas?

Created on Jun 5, 2013 7:57:56 PM



Hi,

Unfortunately not. According to the OpenSSH man page, this may be related to a NAT connection. I see from your script that you are connecting from a public IP and passing a private IP. There is at least one NAT device in the connection path that may cause these problems.

Kind Regards

Created on Jun 6, 2013 11:01:54 AM by  Dieter Loskarn [Paessler Support]



I have tested both externally and internally, depending on where I am working that day. I have static NAT and routing set up for these devices so that iperf and SSH are accessible from outside the network. I can now conclude that this is a bug within PRTG, and I will be submitting a ticket to have it investigated by the software engineers. Thank you for your help.

Created on Jun 13, 2013 3:20:14 PM



Hi,

Confirmed and already fixed, at least in our next nightly canary build (see our blog for details on our branches). It will probably be released in the preview channel during next week.

If you want to check whether this release fixes your problem, do not switch your production environment to canary. Instead, install a separate trial installation and switch that to the other channel, as new features may contain several bugs that will only be fixed by the time of release.

Kind regards

Created on Jun 13, 2013 7:40:55 PM by  Dieter Loskarn [Paessler Support]




Disclaimer: The information in the Paessler Knowledge Base comes without warranty of any kind. Use at your own risk. Before applying any instructions please exercise proper system administrator housekeeping. You must make sure that a proper backup of all your data is available.