Skip to content

Reading from stdout with timeout set can raise Timeout before it has occurred #207

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
dharam1 opened this issue Aug 26, 2020 · 33 comments
Closed
Labels

Comments

@dharam1
Copy link

dharam1 commented Aug 26, 2020

Issue 1:
I am using native client and my code is
hosts = ['localhost']
client = ParallelSSHClient(hosts)
output = client.run_command('uname;pwd',return_list = True)
client.join(output,timeout = 5)

###Expected behaviour####
It should execute and return the output of commands

###Actual Behaviour####
getting timeout exception

The same timeout when i am using with run_command its working fine but with join method its failing
Can you explain on what is difference between timeout use with run_command vs timeout use in join method

Issue 2:
What will be the case if i have entered host name wrong
hosts = ['localhost','wrong_host_name']
output = client.run_command('uname;sleep 10;pwd',return_list = True,timeout = 5)

###Expected behaviour####
for localhost : print uname and timeout after that
for wrong_host_name : show exception (As Expected)

###Actual Behaviour###
for localhost : print uname --> sleep for 10 --> print pwd
for wrong_host_name : wait for 10 sec to connect and after that throw exception

Please help me understand on how timeouts is working for

  1. ParallelSSHClient
  2. run_command
  3. join

Issue 3:
If any host name is wrong in the list and I am not using any timeout throughout the script
client.join(output) is stuck indefinitely

Thanks

@pkittenis
Copy link
Member

Hey there,

Does ParallelSSHClient(.., timeout=<timeout>) not work?

In [19]: client = ParallelSSHClient(['not a real host'], timeout=1, num_retries=1)                                                          

In [20]: datetime.now()                                                                                                                     
Out[20]: datetime.datetime(2020, 8, 26, 18, 1, 19, 117922)

In [21]: client.run_command('echo me', return_list=True, stop_on_errors=False)                                                              
Out[21]: 
[
 	host=not a real host
 	exit_code=None
 	cmd=<Greenlet at 0x79d881b7d348: _run>
 	channel=None
 	stdout=None
 	stderr=None
 	stdin=None
 	exception=('Unknown host %s - %s - retry %s/%s', 'not a real host', 'Name or service not known', 1, 1)]

In [22]: datetime.now()                                                                                                                     
Out[22]: datetime.datetime(2020, 8, 26, 18, 1, 22, 597775)

@dharam1
Copy link
Author

dharam1 commented Aug 26, 2020

This Works fine I did not read the updated timeout in 1.1.1 release docs

  1. I needed to know is there any relation between ParallelSSHClient timeout and run_command timeout as you can referred to issue 2 mentioned above output is not as expected

  2. After client.join(output) method not able to get exit codes :
    output = client.run_command('pwd',return_list= True,stop_on_errors=False)
    client.join(output)
    for host_output in output:
    exitcode = host_output.exit_code
    print(exitcode)

in the above code even though exit code returned should be 0 its giving None

@pkittenis
Copy link
Member

For reporting issues a reproducible example is very useful in getting good help.

When calling client.join without reading from stdout, consume_output=True must be set to be able to get exit codes. A command is not considered finished by the SSH protocol unless its output has been consumed.

Eg:

output = client.run_command('pwd',return_list=True)
client.join(output, consume_output=True)
for host_output in output:
    exitcode = host_output.exit_code
    print(exitcode)
0

Or read from stdout without join:

output = client.run_command('pwd',return_list=True) 
for host_output in output:
    for line in host_output.stdout:
        print(line)
    exitcode = host_output.exit_code 
    print(exitcode)
<pwd>
0

join with a timeout will raise Timeout after as many seconds if command has not yet completed. See documentation on join and output timeouts.

See also Reading Partial Output of Commands That Do Not Terminate

The timeout on run_command is actually for timing out on reading from stdout/stderr. This is confusing, would like to remove it and default to client timeout in 2.0.0.

Leaving open to make documentation better for timeouts and consume_output in particular.

@pkittenis pkittenis changed the title timeout for join call not working as expected Enhance documentation on consume_output and client timeouts Aug 26, 2020
@dharam1
Copy link
Author

dharam1 commented Aug 26, 2020

Thanks!! exit code issue is resolved using consume_output = True

Yeah that will be great if only single timeout is kept instead of multiple timeouts

In below code :

output = client.run_command('pwd',return_list= True,stop_on_errors=False)
client.join(output,consume_output=True,timeout = 5)

getting timeout exception even though pwd is executed instantly.
If i remove timeout it works perfect.

@pkittenis pkittenis changed the title Enhance documentation on consume_output and client timeouts client.join with timeout can raise Timeout before it has occurred Aug 27, 2020
@pkittenis pkittenis added the bug label Aug 27, 2020
@pkittenis
Copy link
Member

pkittenis commented Aug 27, 2020

There is an issue with how Timeout is handled - it can raise Timeout early, before timeout value in seconds given, when the parallel client is busy with other commands.

Have also made #208 to better handle client.join without need for consume output flag.

pkittenis pushed a commit that referenced this issue Aug 27, 2020
@dharam1
Copy link
Author

dharam1 commented Aug 27, 2020

Without using join method i tried with using stdout. Same is happening in stdout. Before timeout is reach its throwing timeout exception.

output = client.run_command('pwd',return_list= True,stop_on_errors=False,timeout = 5)
for _output in output:
try:
print(_output.host,_cmd,list(_output.stderr))
print(_output.host,_cmd,list(_output.stdout))
except Timeout:
print("Timeout...")
print(_output.host,_cmd)
client.reset_output_generators(_output)
pass

In above code its throwing timeout exception instead of consuming output from channel.

######Expected Behaviour######
execute pwd command

#####Actual Behaviour#######
giving timeout exception

Thanks

@pkittenis
Copy link
Member

Can't reproduce with run_command timeout:

for _ in range(3): 
    client = ParallelSSHClient(['localhost']) 
    output = client.run_command('uname', timeout=1, stop_on_errors=False, return_list=True) 
    for host_out in output: 
        print(host_out.host, list(host_out.stdout)) 
        print(host_out.host, list(host_out.stderr)) 
localhost ['Linux']
localhost []
localhost ['Linux']
localhost []
localhost ['Linux']
localhost []

@dharam1
Copy link
Author

dharam1 commented Aug 27, 2020

In the above code in run_command if you give sudo=True parameter it throws Timeout exception but if you remove sudo = True it works fine.

That is very strange any idea why is that ??

@pkittenis
Copy link
Member

Does sudo work without a timeout?

pkittenis added a commit that referenced this issue Aug 27, 2020
* Added ssh-python parallel and single client implementations
* Updated readme, changelog and documentation
* Re-do join with timeout to be on all commands rather than EAGAIN on any - #207
* Moved tests into their own packages
@dharam1
Copy link
Author

dharam1 commented Aug 27, 2020

Yes without timeout sudo works

@pkittenis
Copy link
Member

Can't really say without code to reproduce. Will try again with sudo later on. There is likely still an issue with timeouts on run_command/reading from stdout but will need to reproduce it first.

@pkittenis pkittenis changed the title client.join with timeout can raise Timeout before it has occurred Reading from stdout with timeout set can raise Timeout before it has occurred Aug 27, 2020
@dharam1
Copy link
Author

dharam1 commented Aug 27, 2020

for _ in range(3):
client = ParallelSSHClient(['localhost'])
output = client.run_command('uname', timeout=1, stop_on_errors=False, return_list=True)
for host_out in output:
print(host_out.host, list(host_out.stdout))
print(host_out.host, list(host_out.stderr))
Above code works perfectly

for _ in range(3):
client = ParallelSSHClient(['localhost'])
output = client.run_command('uname', timeout=1, stop_on_errors=False, return_list=True, sudo=True)
for host_out in output:
print(host_out.host, list(host_out.stdout))
print(host_out.host, list(host_out.stderr))

Above code gives timeout exception

for _ in range(3):
client = ParallelSSHClient(['localhost'])
output = client.run_command('uname',stop_on_errors=False, return_list=True, sudo=True)
for host_out in output:
print(host_out.host, list(host_out.stdout))
print(host_out.host, list(host_out.stderr))

Above code works as well by removing timeout and writing sudo

@pkittenis
Copy link
Member

1.12.0 contains fix for client.join with timeout being raised incorrectly. Leaving this open for run_command.

@pkittenis
Copy link
Member

The above works on my system:

In [7]: for _ in range(5): 
   ...:     client = ParallelSSHClient(['localhost']) 
   ...:     output = client.run_command('whoami', timeout=1, stop_on_errors=False, return_list=True, sudo=True) 
   ...:     for host_out in output: 
   ...:         print(host_out.host, list(host_out.stdout)) 
   ...:         print(host_out.host, list(host_out.stderr)) 
   ...:                                                                                                                                     
localhost ['root']
localhost []
localhost ['root']
localhost []
localhost ['root']
localhost []
localhost ['root']
localhost []
localhost ['root']
localhost []

@dharam1
Copy link
Author

dharam1 commented Aug 27, 2020

Which version are you using?
I am using 1.11.2
Might be version issue. I will check on my system and get back.

@dharam1
Copy link
Author

dharam1 commented Aug 28, 2020

for _ in range(5):

output = client.run_command('whoami', timeout=5, stop_on_errors=False, return_list=True,sudo = True) 
for host_out in output: 
    try:
        #print(host_out)
        print(host_out.host, list(host_out.stdout)) 
    except Timeout:
        print("Timeout...")

Above code throws timeout exception when trying to consume stdout

for _ in range(5):

output = client.run_command('whoami', timeout=5, stop_on_errors=False, return_list=True,sudo = True) 
for host_out in output: 
    try:
        print(host_out)
        #print(host_out.host, list(host_out.stdout)) 
    except Timeout:
        print("Timeout...")

If i do not consume stdout it does not give timeout exception

I tried running your code mentioned above but it threw exception in my system

I am using latest version i.e. 1.12.0

@dharam1
Copy link
Author

dharam1 commented Aug 28, 2020

Final Observations in run_command while consuming stdout Version 1.12.0:

  1. without sudo and with timeout ===> works fine (Output as expected)
output = client.run_command('pwd',timeout = 5,return_list= True,stop_on_errors=False)
    for _output in output:
        host_out = _output
        try:
            print(host_out.host, list(host_out.stdout)) 
            print(host_out.host, list(host_out.stderr))
        except Timeout:
            print("In host",_output.host,"got timeout")
            #client.reset_output_generators(_output)
            pass
  1. without sudo and without timeout ===> works fine(Output as Exptected)
output = client.run_command('pwd',return_list= True,stop_on_errors=False)
    for _output in output:
        host_out = _output
        try:
            print(host_out.host, list(host_out.stdout)) 
            print(host_out.host, list(host_out.stderr))
        except Timeout:
            print("In host",_output.host,"got timeout")
            #client.reset_output_generators(_output)
            pass
  1. with sudo and without timeout ===> stuck infinitely (Stuck Infinitely)
output = client.run_command('pwd',return_list= True,stop_on_errors=False,sudo = True)
    for _output in output:
        host_out = _output
        try:
            print(host_out.host, list(host_out.stdout)) 
            print(host_out.host, list(host_out.stderr))
        except Timeout:
            print("In host",_output.host,"got timeout")
            #client.reset_output_generators(_output)
            pass
  1. with sudo and with timeout ===> timeout exception before timeout (Direct Throws timeout exception)
output = client.run_command('pwd',timeout = 5,return_list= True,stop_on_errors=False,sudo = True)
    for _output in output:
        host_out = _output
        try:
            print(host_out.host, list(host_out.stdout)) 
            print(host_out.host, list(host_out.stderr))
        except Timeout:
            print("In host",_output.host,"got timeout")
            #client.reset_output_generators(_output)
            pass

@pkittenis
Copy link
Member

For your use case best to use client.join with timeout.

Am still unable to reproduce the above behaviour on my system.

Can you please run the following:

import logging

from pssh import logger
from pssh.clients import ParallelSSHClient

logger.setLevel(logging.DEBUG)
logging.basicConfig()

client = ParallelSSHClient(['localhost']) 
output = client.run_command('whoami', stop_on_errors=False, return_list=True, sudo=True, timeout=1) 
for host_out in output: 
     print(host_out.host, list(host_out.stdout)) 
     print(host_out.host, list(host_out.stderr)) 

And please attach the debug output on this issue.

On my system:

DEBUG:pssh.clients.native.parallel:Make client request for host localhost, host in clients: False
DEBUG:pssh.clients.base.single:Connecting to localhost:22
DEBUG:pssh.clients.native.single:Authentication with SSH Agent succeeded
DEBUG:pssh.clients.native.single:Executing command 'sudo -S $SHELL -c 'whoami''
INFO:pssh.host_logger:[localhost]	root
localhost ['root']
localhost []

@dharam1
Copy link
Author

dharam1 commented Aug 28, 2020

DEBUG:pssh.clients.native.parallel:Make client request for host localhost, host in clients: False
DEBUG:pssh.clients.native.single:Connecting to localhost:22
DEBUG:pssh.clients.native.single:Agent auth failed with ('No identities match for user %s', <user_name>'), continuing with other authentication methods
DEBUG:pssh.clients.native.single:Private key auth failed, trying password
DEBUG:pssh.clients.native.single:Executing command 'sudo -S $SHELL -c 'whoami''
Traceback (most recent call last):
File "parallel-test.py", line 12, in
print(host_out.host, list(host_out.stdout))
File "/Library/Python/3.7/site-packages/pssh/clients/native/single.py", line 405, in read_output_buffer
for line in output_buffer:
File "pssh/native/_ssh2.pyx", line 47, in _read_output
pssh.exceptions.Timeout
DEBUG:pssh.clients.native.parallel:Disconnecting clients
DEBUG:pssh.clients.native.single:Disconnecting client for host localhost
DEBUG:pssh.clients.native.single:Client socket closed for host localhost

@pkittenis
Copy link
Member

pkittenis commented Aug 28, 2020

Yes, that looks like a similar bug to the one previously in join. Still have not found a good way to reproduce, meaning testing a fix is difficult, but have in mind a solution that should work.

Thanks for reporting and for the debug output.

For the use case of seeing if a command would time out using client.join with timeout is a better solution.

@dharam1
Copy link
Author

dharam1 commented Aug 28, 2020

  1. If i use client.join with output_consume = True, So after join i need the content of stdout,stderr how do i get that?

  2. if in client.join() timeout occurs i need to know which host raised timeout and dusring execution of which command

  3. If from a list of hosts timeout occurs on any host i should be able to capture that exception and move further to capture ouput of hosts where execution was successful

If you can share snippet code for above two requirements that will of really help to me.

I need o/p in format:
hosts {
       "ipaddress1"{
               "cmd1":{
                     "stdout": <output>,
                     "stderr": <stderr>,
                     "timeout": <yes/no>
                 },
               "reachable" : "true/false" i.e. capture ssh exception
        }
}

Thanks

@dharam1
Copy link
Author

dharam1 commented Aug 28, 2020

output = client.run_command('pwd', stop_on_errors=False, return_list=True)
try:
    client.join(output, consume_output = True,timeout = 2)
    for host_out in output:
        print(list(host_out.stdout))
except Timeout as e:
    print(e)

In above code stdout is empty list even cmd was executed
I can handle this without using consume_output

I need to know how to handle timeout exception capturing host and command
As i am executing set of commands serially i can capture which command just need to know how to capture host

@pkittenis
Copy link
Member

What you are describing is what run_command with timeout is for. And I am unable to reproduce it failing early.

Capturing partial output is described here.

Eg:

client.run_command(<cmd>, timeout=<timeout>, return_list=True, stop_on_errors=False)
stdout = []
for host_out in output:
    try:
        for line in host_out.stdout:
            stdout.append(line)
    except Timeout:
        pass

Here is simple script to test read timeouts. Reading from /dev/random never finishes.

import logging

from datetime import datetime
from pssh import logger
from pssh.clients import ParallelSSHClient
from pssh.exceptions import Timeout

logger.setLevel(logging.DEBUG)
# logging.basicConfig()

def get_out(host_out): 
    now = datetime.now()
    try:
        list(host_out.stdout)
    except Timeout:
        pass
    else:
        raise Exception("Did not get Timeout")
    finally: 
        dt = datetime.now() - now 
        return dt

cmd = 'cat /dev/random | tr -dc "\t\n [:alnum:]"'
timeout = 1
hosts = ['localhost' for _ in range(2)]

client = ParallelSSHClient(hosts)

for _ in range(5):
    output = client.run_command(
        cmd, timeout=timeout, return_list=True, stop_on_errors=False)
    for host_out in output:
        dt = get_out(host_out)
        dt_seconds = dt.total_seconds()
        print("Timeout in %s seconds" % (dt_seconds,))

Outputs:

Timeout in 1.001226 seconds
Timeout in 1.001244 seconds
Timeout in 1.001211 seconds
Timeout in 1.000868 seconds
Timeout in 1.001191 seconds
Timeout in 1.001229 seconds
Timeout in 1.00106 seconds
Timeout in 1.001205 seconds
Timeout in 1.001125 seconds
Timeout in 1.001155 seconds

with sudo and without timeout ===> stuck infinitely

That means the command never finishes, likely sudo hanging waiting for input. Would also explain the early timeout - there is no stdout but it is timing out waiting for input.

@dharam1
Copy link
Author

dharam1 commented Aug 30, 2020

Thanks!! for the code snippet that really helps in my use case :)

I got confuse in run_command timeout which i thought was that if there is continue output in stdout there will be no timeout, timeout will occur only when for the given timeout period there is nothing getting written on stdout

My Understanding : Timeout given 5s, so if /dev/random continue to write in stdout there will be no timeout. Timeout will occur only when there is nothing on stdout for continue 5s

Actual : Timeout will occur after 5s on stdout even though we are getting stdout

@dharam1
Copy link
Author

dharam1 commented Aug 30, 2020

Now only bug which i am facing is timeout with sudo(Direct Throws timeout exception) and sudo without timeout (stuck infinitely) in run_command

@pkittenis
Copy link
Member

pkittenis commented Aug 30, 2020

Cannot reproduce the sudo behaviour. Likely your system requires password for sudo or is waiting for input, causing timeout to trigger from lack of stdout.

Have separately been able to reproduce the timeout too early behaviour - see #180.

It happens when an output stream has had no output - the library incorrectly raises timeout immediately in this case.

This is likely the issue with sudo on your system - the command is not being run, no stdout to read, timeout raised too early. This also explains sudo being stuck without timeout if it is waiting on input and never terminates. Check stderr stream and confirm sudo works with ssh outside the library. Since cannot reproduce with password less sudo don't see any issue with sudo behaviour in the library.

My Understanding : Timeout given 5s, so if /dev/random continue to write in stdout there will be no timeout. Timeout will occur only when there is nothing on stdout for continue 5s

That is correct. /dev/random can block though (see man page), which causes timeout. Reading from, eg, /dev/urandom which always has output and never blocks causes no timeouts and the command never terminates - reading from stdout is infinite.

There is a bug here where timeout is raised when nothing has ever been written to the stdout stream at all, as above. This causes early timeouts.

@dharam1
Copy link
Author

dharam1 commented Aug 30, 2020

Okay, so help me understand the bug. When i give timeout as 5s and command is not run initially so no stdout hence timeout raised instantly whereas in actual behaviour it should wait for 5s and after that timeout.

Above case will only happen when command is not run initially at all. If command is run initially after that if it blocks for 5s it will throw timeout such as case in /dev/random

This is likely the issue with sudo on your system - the command is not being run, no stdout to read, timeout raised too. early. This also explains sudo being stuck without timeout if it is waiting on input and never terminates. Check stderr. stream and confirm sudo works with ssh outside the library. Since cannot reproduce with password less sudo don't see. any issue with sudo behaviour in the library.

This makes sense as it aligns with the bug, but still let me check once on my end.

Thanks

@pkittenis
Copy link
Member

pkittenis commented Aug 30, 2020

Easiest reproduction:

from datetime import datetime
<..>
cmd = 'sleep 1; echo me >&2; sleep 1; echo me >&2'
output = client.run_command(cmd, return_list=True, timeout=5)
for host_out in output:
    start = datetime.now()
    try:
        for line in host_out.stdout:
            pass
    except Timeout:
        print(datetime.now() - start)

Timeout raised early, datetime value is < 1sec.

Timeout raised early when trying to read from one output stream which has no data while the other output stream has pending data to read.

If no stream has data timeout does not occur. Stream can be one of stdout, stderr, stdin.

@dharam1
Copy link
Author

dharam1 commented Aug 30, 2020

Thanks for helping!! Yes there was issue with sudo on my system its waiting for password input. So sudo works fine

Any ETA on when above bug might get fix ??

@dharam1
Copy link
Author

dharam1 commented Aug 30, 2020

So i tried to reproduce your above code :
Observations : Whenever any command produces output to any stream you should be able to capture that output first inside the loop using that stream where output was written else direct timeout exception.

Suppose stdout stream has no data for specified timeout it will throw timeout exception instead further it should also be able to check in stderr stream and no data there for timeout than throw timeout.

Because if in multiple commands any command try to write to stderr and i am capturing stdout first, so in failed command it will wait in stdout stream for timeout and without checking stderr will raise timeout.

Code for reference :

___cmd = 'pwd;pwdddd'
output = client.run_command(___cmd, return_list=True,timeout = 5)
for host_out in output:
    start = datetime.now()
    try:
        for _stdout in host_out.stdout:
            print(_stdout)
        for _stdout in host_out.stderr:
            print(_stderr)
    except Timeout:
        print(datetime.now() - start)

Actual:
/home/ubuntu
timeout exception after waiting for 5s

Expected:
/home/ubuntu
command not found: pwdddd

I think along with timeout fix above case should also be handled.

Thanks

@pkittenis
Copy link
Member

Because if in multiple commands any command try to write to stderr and i am capturing stdout first, so in failed command it will wait in stdout stream for timeout and without checking stderr will raise timeout.

Yes, and that is covered. The read timeout should be per read buffer, so separate timeouts for stdout and stderr. This is to be able to capture output separately, otherwise it would not be known which output stream caused the timeout.

The client code does not have to read stdout first, it might only check stderr, or both at once with parallel readers. Timeout exceptions should be able to be handled separately per stream, as per user requirements.

In the above example timeout is raised (incorrectly) because there is output waiting on the stderr stream when reading from stdout. Once all stdout output is consumed and there is no more data in stdout, timeout is raised from unread data on stderr. The fix covers this case.

@dharam1
Copy link
Author

dharam1 commented Aug 30, 2020

Great!! I just thought of sharing it.

How will i be able to read both streams parallel?
Can you share snippet?

@pkittenis
Copy link
Member

Regular gevent code, eg:

from gevent import spawn, joinall
from pssh.clients import ParallelSSHClient

def read_stream(stream):
    stream_out = []
    try:
        for line in stream:
            stream_out.append(line)
    except Exception:
        pass
    return stream_out

client = ParallelSSHClient(<..>)
output = client.run_command(<..>, timeout=<timeout>)

for host_out in output:
    stdout_reader = spawn(read_stream, host_out.stdout)
    stderr_reader = spawn(read_stream, host_out.stderr)
    joinall([stdout_reader, stderr_reader])
    stdout = stdout_reader.get()
    stderr = stderr_reader.get()
    <..>

Could spawn readers for all hosts before joining to read output from all hosts at once and so on. There's more code needed to correctly handle timeouts and stream restarts, this is a simple example.

Can add some more complete examples in the documentation for doing this.

A client function to gather all output in parallel might be useful - standard warnings about large outputs and memory use apply.

PRs welcome for either.

1.12.1 contains a fix for the early timeout reading from stream issue - closing. Thanks for raising.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests

2 participants