client.join() with timeout throws an exception whether the timeout was reached or not #104

Closed
gmyuval opened this issue Feb 26, 2018 · 3 comments

gmyuval commented Feb 26, 2018

Steps to reproduce:
Run the following snippet, after assigning values to host_lst, username and password variables

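from pssh.pssh2_client import ParallelSSHClient

# host_lst, username and password must be assigned beforehand
client = ParallelSSHClient(hosts=host_lst, user=username, password=password)
cmd = client.run_command('ls -al')
# Raises Timeout even though the command completes well within 60 seconds
client.join(cmd, timeout=60)

The following exception is thrown: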
.../python2.7/site-packages/pssh/pssh2_client.pyc in join(self, output, consume_output, timeout)
    220                 raise Timeout(
    221                     "Timeout of %s sec(s) reached on host %s with command "
--> 222                     "still running", timeout, host)
    223             elif consume_output:
    224                 for line in output[host].stdout:

Timeout: ('Timeout of %s sec(s) reached on host %s with command still running', 60, '...')

Expected behaviour:
No exception was supposed to be thrown as the command completes successfully in the allotted time

Actual behaviour:
Timeout exception was thrown

Additional info:
From what I see the bug results from the following if statement (line 219 in pssh2_client.py):

if timeout and not output[host].channel.eof():
...

As far as I can see, output[host].channel.eof() is still False when the command has finished running, so the condition evaluates to True and the exception is raised.
The only time I see output[host].channel.eof() return True is after the output has been consumed.
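
To illustrate the condition described above, here is a minimal sketch using a stand-in channel object. FakeChannel and would_raise_timeout are hypothetical names for illustration only and are not part of pssh or ssh2-python; the sketch assumes only what is reported here, namely that eof() stays False until the buffered output has been read.

```python
class FakeChannel:
    """Hypothetical stand-in for an SSH channel (not the real API)."""

    def __init__(self, lines):
        self._buffered = list(lines)

    def eof(self):
        # Assumption taken from the report: EOF only becomes visible
        # once all buffered output has been consumed.
        return not self._buffered

    def read_all(self):
        # Drain the buffer and hand back whatever was pending.
        lines, self._buffered = self._buffered, []
        return lines


def would_raise_timeout(channel, timeout):
    """Evaluate the same condition as the if statement quoted above."""
    return bool(timeout and not channel.eof())


# The command has finished, but its output has not been read yet.
channel = FakeChannel(["total 0", "drwxr-xr-x 2 user user 40 ."])
raises_before_read = would_raise_timeout(channel, timeout=60)

channel.read_all()
raises_after_read = would_raise_timeout(channel, timeout=60)

print(raises_before_read, raises_after_read)
```

With unread output the condition is true regardless of how long the command took, which matches the exception seen in the traceback.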

@parisnps

I can confirm this behavior as well. I have spent quite a bit of time on this :(

@pkittenis pkittenis added the bug label Mar 5, 2018
@pkittenis
Member

Hi there,

Thank you for the interest and report.

Confirmed - this occurs when there is unread output and timeout is used.

As a workaround, read the output buffers before calling join. Consuming the buffers is required when join is used with a timeout, and this will be enforced as an API change in the next release.
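
A rough sketch of that workaround as a reusable helper follows. The name drain_buffers is hypothetical and not part of pssh; it assumes only that each per-host output object exposes iterable stdout and stderr attributes, as used elsewhere in this thread. A SimpleNamespace stands in for real command output so the snippet runs without any SSH hosts.

```python
from types import SimpleNamespace


def drain_buffers(output):
    """Read all stdout and stderr lines for every host.

    `output` is assumed to be a dict of host -> object with iterable
    `stdout` and `stderr` attributes. Returns the lines that were read
    so they are not lost.
    """
    collected = {}
    for host, host_out in output.items():
        collected[host] = {
            "stdout": list(host_out.stdout),
            "stderr": list(host_out.stderr),
        }
    return collected


# Stand-in for the dict returned by client.run_command(...).
fake_output = {
    "host1": SimpleNamespace(stdout=iter(["blah"]), stderr=iter([])),
}
lines = drain_buffers(fake_output)
print(lines)
```

With a real client the intended order would be run_command, then drain_buffers(output), then client.join(output, timeout=60), so that eof has been reached by the time join checks it.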

From further testing, there are cases where a join with a timeout returns immediately while commands are still running. These seem to be unavoidable, as the timeout is passed on to the select call and applies to the socket, while the socket could be running multiple commands. On one call to join the socket can be finished before the timeout, while on the next call another remote command will have started that has not yet finished. Hence the attempt to use eof as an indicator of completion.

However, as that fails when buffers have not been read, and reading buffers can be subject to the same timeout, a join with a timeout will have to force consumption of the buffers.

E.g., consider this code:

client.join(client.run_command('echo blah; sleep 15'), timeout=16)

On the call to join, the first command will already have finished, and join returns immediately as nothing is blocking the socket to which the timeout is applied. The second command may or may not have started (race condition), but there is no indication from the server that a command has started, only that it has been accepted. A subsequent join call will wait ~15 sec for the second command, as the socket is blocked for the duration of that command.

However, after the first join call not all commands have finished, which is misleading.

This code, on the other hand:

output = client.run_command('echo blah; sleep 15', timeout=16)
for host, host_out in output.items():
    for line in host_out.stdout:
        pass
    for line in host_out.stderr:
        pass

ensures all commands have finished, as the buffers are read and eof is reached.

pkittenis pushed a commit that referenced this issue Mar 5, 2018
…ed raising timeout on native client wait_finished.
pkittenis pushed a commit that referenced this issue Mar 7, 2018
Updated documentation, changelog.
Resolves #104.
@pkittenis
Member

1.5.0 fixes this issue. As mentioned above, join with timeout now consumes output.

There is also some documentation on this functionality and possible race conditions.

Sorry for the issues you have been facing with this feature. Please do report any other issues you may face.
