Skip to content

SocketRecvError #200

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
brunotissi opened this issue Sep 5, 2019 · 15 comments
Closed

SocketRecvError #200

brunotissi opened this issue Sep 5, 2019 · 15 comments
Labels

Comments

@brunotissi
Copy link

Sometimes I get a SocketRcvError after the script execution

Traceback (most recent call last):
File "C:/Users/btissi/PycharmProjects/Test/script_4_test.py", line 41, in
for line in host_output.stdout:
File "C:\Users\btissi\PycharmProjects\Test\venv\lib\site-packages\pssh\clients\native\single.py", line 401, in read_output_buffer
for line in output_buffer:
File "pssh/native/_ssh2.pyx", line 45, in _read_output
File "ssh2\channel.pyx", line 116, in ssh2.channel.Channel.read
File "ssh2\channel.pyx", line 144, in ssh2.channel.Channel.read_ex
File "ssh2\utils.pyx", line 179, in ssh2.utils.handle_error_codes
ssh2.exceptions.SocketRecvError

The code is the follow:

hosts = ['10.1.23.23', '10.1.23.24']
client = ParallelSSHClient(hosts, user='xx', password='xx')
command_stp = "show spanning-tree detail | inc ieee|occurr|from|is exec"
output = client.run_command(command_stp)

for host, host_output in output.items():
    for line in host_output.stdout:
        print(line)

When I run the code with only one hosts it works like a charm and doesn't raise any exceptions.

Switches model: WS-C3750X-48PF-S

Is it a common problem?

@vladimir-incountry
Copy link

vladimir-incountry commented Aug 13, 2020

It's pretty much the same for me. I do not get SocketRecvError when I run commands on one host, but I get it when I run it on multiple hosts.
It's weird because the command is the same. The exception occurs when I print STDOUT of the second host.

Example

from pssh.clients.native import ParallelSSHClient

def run_command(host_list, cmd):
    client = ParallelSSHClient(host_list)
    output = client.run_command(cmd)
    # https://parallel-ssh.readthedocs.io/en/latest/output.html
    for host in output.values():
        print("Command STDOUT:")
        for line in host.stdout:
            print(line)
        print("Command STDERR:")
        for line in host.stderr:
            print(line)
        print(f"Command exit code: {host.exit_code}")

If I run sleep 600 on one server, then it finishes without issues.
If I run the same command on multiple servers, I get SocketRecvError once the output of the first host is printed.

Sample output

12:34:37  Command STDOUT:
12:34:37  Command STDERR:
12:34:37  Command exit code: 0
12:34:37  Command STDOUT:
12:34:37  Traceback (most recent call last):
12:34:37    File "./script.py", line 64, in <module>
12:34:37      run_cmd(args.command)
12:34:37    File "./script.py", line 31, in run_cmd
12:34:37      ssh_control.run_command(ips, command)
12:34:37    File "/home/ubuntu/workspace/QA-tests/performance/vkozyrev-test/libs/ssh_control.py", line 12, in run_command
12:34:37      for line in host.stdout:
12:34:37    File "/usr/local/lib/python3.6/dist-packages/pssh/clients/native/single.py", line 401, in read_output_buffer
12:34:37      for line in output_buffer:
12:34:37    File "pssh/native/_ssh2.pyx", line 41, in _read_output
12:34:37    File "ssh2/channel.pyx", line 116, in ssh2.channel.Channel.read
12:34:37    File "ssh2/channel.pyx", line 144, in ssh2.channel.Channel.read_ex
12:34:37    File "ssh2/utils.pyx", line 179, in ssh2.utils.handle_error_codes
12:34:37  ssh2.exceptions.SocketRecvError

SSH config on target hosts

ClientAliveInterval 15
ClientAliveCountMax 3

This means that keepalive packets are sent from the target hosts to the script-running machine to prevent SSH connections from disconnecting. 3 keepalive packets must not be unanswered for a session to disconnect. Those setting work perfectly well for 1 host executing a long-running command.

@pkittenis pkittenis transferred this issue from ParallelSSH/ssh2-python Aug 13, 2020
@pkittenis
Copy link
Member

pkittenis commented Aug 13, 2020

Can you please enable debug logging and post log output for the two cases. Switches typically have non-standard SSH implementations, though if it works on one it should work with the other as long as the SSH server is the same (is it?).

import logging
from pssh import logger

logger.setLevel(logging.DEBUG)
logging.basicConfig()

<code>

@vladimir-incountry
Copy link

Thank you for the prompt reply @pkittenis!
Please find the log attached.

Debug log

09:59:49  DEBUG:pssh.clients.native.parallel:Make client request for host 10.9.11.225, host in clients: False
09:59:49  DEBUG:pssh.clients.native.single:Connecting to 10.9.11.225:22
09:59:49  DEBUG:pssh.clients.native.parallel:Make client request for host 10.9.11.224, host in clients: False
09:59:49  DEBUG:pssh.clients.native.parallel:Make client request for host 10.9.11.122, host in clients: False
09:59:49  DEBUG:pssh.clients.native.parallel:Make client request for host 10.9.11.74, host in clients: False
09:59:49  DEBUG:pssh.clients.native.parallel:Make client request for host 10.9.11.110, host in clients: False
09:59:49  DEBUG:pssh.clients.native.single:Agent auth failed with Unable to connect to agent, continuing with other authentication methods
09:59:49  DEBUG:pssh.clients.native.single:Trying to authenticate with identity file /home/ubuntu/.ssh/id_rsa
09:59:49  DEBUG:pssh.clients.native.single:Authentication succeeded with identity file /home/ubuntu/.ssh/id_rsa
09:59:49  DEBUG:pssh.clients.native.single:Connecting to 10.9.11.224:22
09:59:49  DEBUG:pssh.clients.native.single:Agent auth failed with Unable to connect to agent, continuing with other authentication methods
09:59:49  DEBUG:pssh.clients.native.single:Trying to authenticate with identity file /home/ubuntu/.ssh/id_rsa
09:59:50  DEBUG:pssh.clients.native.single:Authentication succeeded with identity file /home/ubuntu/.ssh/id_rsa
09:59:50  DEBUG:pssh.clients.native.single:Connecting to 10.9.11.122:22
09:59:50  DEBUG:pssh.clients.native.single:Agent auth failed with Unable to connect to agent, continuing with other authentication methods
09:59:50  DEBUG:pssh.clients.native.single:Trying to authenticate with identity file /home/ubuntu/.ssh/id_rsa
09:59:50  DEBUG:pssh.clients.native.single:Authentication succeeded with identity file /home/ubuntu/.ssh/id_rsa
09:59:50  DEBUG:pssh.clients.native.single:Connecting to 10.9.11.74:22
09:59:50  DEBUG:pssh.clients.native.single:Agent auth failed with Unable to connect to agent, continuing with other authentication methods
09:59:50  DEBUG:pssh.clients.native.single:Trying to authenticate with identity file /home/ubuntu/.ssh/id_rsa
09:59:50  DEBUG:pssh.clients.native.single:Authentication succeeded with identity file /home/ubuntu/.ssh/id_rsa
09:59:50  DEBUG:pssh.clients.native.single:Connecting to 10.9.11.110:22
09:59:50  DEBUG:pssh.clients.native.single:Agent auth failed with Unable to connect to agent, continuing with other authentication methods
09:59:50  DEBUG:pssh.clients.native.single:Trying to authenticate with identity file /home/ubuntu/.ssh/id_rsa
09:59:50  DEBUG:pssh.clients.native.single:Authentication succeeded with identity file /home/ubuntu/.ssh/id_rsa
09:59:53  DEBUG:pssh.clients.native.single:Executing command 'sleep 600'
09:59:53  DEBUG:pssh.clients.native.single:Executing command 'sleep 600'
09:59:53  DEBUG:pssh.clients.native.single:Executing command 'sleep 600'
09:59:53  DEBUG:pssh.clients.native.single:Executing command 'sleep 600'
09:59:54  DEBUG:pssh.clients.native.single:Executing command 'sleep 600'
10:10:01  Command STDOUT:
10:10:01  Command STDERR:
10:10:01  Command exit code: 0
10:10:01  Command STDOUT:
10:10:01  Terminating instances ['i-0ef1a797c8ca89daf', 'i-0c16788421ff6c9a7', 'i-025d8a32788ef0997', 'i-0ba84922ccb7ea394', 'i-0886bb5b119d34b8f']
10:10:01  Traceback (most recent call last):
10:10:01    File "./niflheim", line 64, in <module>
10:10:01      run_cmd(args.command)
10:10:01    File "./niflheim", line 31, in run_cmd
10:10:01      ssh_control.run_command(ips, command)
10:10:01    File "/home/ubuntu/workspace/QA-tests/performance/vkozyrev-test/libs/ssh_control.py", line 16, in run_command
10:10:01      for line in host.stdout:
10:10:01    File "/usr/local/lib/python3.6/dist-packages/pssh/clients/native/single.py", line 401, in read_output_buffer
10:10:01      for line in output_buffer:
10:10:01    File "pssh/native/_ssh2.pyx", line 41, in _read_output
10:10:01    File "ssh2/channel.pyx", line 116, in ssh2.channel.Channel.read
10:10:01    File "ssh2/channel.pyx", line 144, in ssh2.channel.Channel.read_ex
10:10:01    File "ssh2/utils.pyx", line 179, in ssh2.utils.handle_error_codes
10:10:01  ssh2.exceptions.SocketRecvError

@pkittenis
Copy link
Member

Thank you for the debug output.

  • Is it always the same host that errors out?
  • If so, can you show output with running only on that host?
  • Can you show successful output with only running on a single host?
  • Can you show output of ssh -V on each target server?

@vladimir-incountry
Copy link

vladimir-incountry commented Aug 17, 2020

Is it always the same host that errors out?
If so, can you show output with running only on that host?

Target hosts are created dynamically from the same AWS Amazon Machine Image, i.e. they are pretty much identical. Only IP addresses change. They get terminated after each run.

Can you show successful output with only running on a single host?

Yes, sure.

The output of running `sleep 600` on only one machine.

16:20:54  DEBUG:pssh.clients.native.parallel:Make client request for host 10.9.11.247, host in clients: False
16:20:54  DEBUG:pssh.clients.native.single:Connecting to 10.9.11.247:22
16:20:54  DEBUG:pssh.clients.native.single:Agent auth failed with Unable to connect to agent, continuing with other authentication methods
16:20:54  DEBUG:pssh.clients.native.single:Trying to authenticate with identity file /home/ubuntu/.ssh/id_rsa
16:20:54  DEBUG:pssh.clients.native.single:Authentication succeeded with identity file /home/ubuntu/.ssh/id_rsa
16:20:55  DEBUG:pssh.clients.native.single:Executing command 'sleep 600'
16:31:02  Command STDOUT:
16:31:02  Command STDERR:
16:31:02  Command exit code: 0

Can you show output of ssh -V on each target server?

10.9.11.20
OpenSSH_7.6p1 Ubuntu-4ubuntu0.3, OpenSSL 1.0.2n  7 Dec 2017
10.9.11.97
OpenSSH_7.6p1 Ubuntu-4ubuntu0.3, OpenSSL 1.0.2n  7 Dec 2017
10.9.11.174
OpenSSH_7.6p1 Ubuntu-4ubuntu0.3, OpenSSL 1.0.2n  7 Dec 2017
10.9.11.171
OpenSSH_7.6p1 Ubuntu-4ubuntu0.3, OpenSSL 1.0.2n  7 Dec 2017
10.9.11.223
OpenSSH_7.6p1 Ubuntu-4ubuntu0.3, OpenSSL 1.0.2n  7 Dec 2017

And finally here is a showcase of successfully running a command on all 5 servers. When they run fast, there are no issues.

The output of running `sleep 10; date` on 5 machines

16:16:09  DEBUG:pssh.clients.native.parallel:Make client request for host 10.9.11.189, host in clients: False
16:16:09  DEBUG:pssh.clients.native.single:Connecting to 10.9.11.189:22
16:16:09  DEBUG:pssh.clients.native.parallel:Make client request for host 10.9.11.92, host in clients: False
16:16:09  DEBUG:pssh.clients.native.parallel:Make client request for host 10.9.11.31, host in clients: False
16:16:09  DEBUG:pssh.clients.native.parallel:Make client request for host 10.9.11.88, host in clients: False
16:16:09  DEBUG:pssh.clients.native.parallel:Make client request for host 10.9.11.90, host in clients: False
16:16:09  DEBUG:pssh.clients.native.single:Agent auth failed with Unable to connect to agent, continuing with other authentication methods
16:16:09  DEBUG:pssh.clients.native.single:Trying to authenticate with identity file /home/ubuntu/.ssh/id_rsa
16:16:09  DEBUG:pssh.clients.native.single:Authentication succeeded with identity file /home/ubuntu/.ssh/id_rsa
16:16:09  DEBUG:pssh.clients.native.single:Connecting to 10.9.11.92:22
16:16:09  DEBUG:pssh.clients.native.single:Agent auth failed with Unable to connect to agent, continuing with other authentication methods
16:16:09  DEBUG:pssh.clients.native.single:Trying to authenticate with identity file /home/ubuntu/.ssh/id_rsa
16:16:09  DEBUG:pssh.clients.native.single:Authentication succeeded with identity file /home/ubuntu/.ssh/id_rsa
16:16:09  DEBUG:pssh.clients.native.single:Connecting to 10.9.11.31:22
16:16:09  DEBUG:pssh.clients.native.single:Agent auth failed with Unable to connect to agent, continuing with other authentication methods
16:16:09  DEBUG:pssh.clients.native.single:Trying to authenticate with identity file /home/ubuntu/.ssh/id_rsa
16:16:09  DEBUG:pssh.clients.native.single:Authentication succeeded with identity file /home/ubuntu/.ssh/id_rsa
16:16:09  DEBUG:pssh.clients.native.single:Connecting to 10.9.11.88:22
16:16:09  DEBUG:pssh.clients.native.single:Agent auth failed with Unable to connect to agent, continuing with other authentication methods
16:16:09  DEBUG:pssh.clients.native.single:Trying to authenticate with identity file /home/ubuntu/.ssh/id_rsa
16:16:09  DEBUG:pssh.clients.native.single:Authentication succeeded with identity file /home/ubuntu/.ssh/id_rsa
16:16:09  DEBUG:pssh.clients.native.single:Connecting to 10.9.11.90:22
16:16:09  DEBUG:pssh.clients.native.single:Agent auth failed with Unable to connect to agent, continuing with other authentication methods
16:16:09  DEBUG:pssh.clients.native.single:Trying to authenticate with identity file /home/ubuntu/.ssh/id_rsa
16:16:09  DEBUG:pssh.clients.native.single:Authentication succeeded with identity file /home/ubuntu/.ssh/id_rsa
16:16:10  DEBUG:pssh.clients.native.single:Executing command 'sleep 10; date'
16:16:10  DEBUG:pssh.clients.native.single:Executing command 'sleep 10; date'
16:16:10  DEBUG:pssh.clients.native.single:Executing command 'sleep 10; date'
16:16:10  DEBUG:pssh.clients.native.single:Executing command 'sleep 10; date'
16:16:10  DEBUG:pssh.clients.native.single:Executing command 'sleep 10; date'
16:16:20  INFO:pssh.host_logger:[10.9.11.189]  Mon Aug 17 13:16:33 UTC 2020
16:16:20  INFO:pssh.host_logger:[10.9.11.92]     Mon Aug 17 13:16:33 UTC 2020
16:16:20  INFO:pssh.host_logger:[10.9.11.31]     Mon Aug 17 13:16:33 UTC 2020
16:16:20  INFO:pssh.host_logger:[10.9.11.88]   Mon Aug 17 13:16:34 UTC 2020
16:16:20  INFO:pssh.host_logger:[10.9.11.90]     Mon Aug 17 13:16:34 UTC 2020
16:16:20  Command STDOUT:
16:16:20  Mon Aug 17 13:16:33 UTC 2020
16:16:20  Command STDERR:
16:16:20  Command exit code: 0
16:16:20  Command STDOUT:
16:16:20  Mon Aug 17 13:16:33 UTC 2020
16:16:20  Command STDERR:
16:16:20  Command exit code: 0
16:16:20  Command STDOUT:
16:16:20  Mon Aug 17 13:16:33 UTC 2020
16:16:20  Command STDERR:
16:16:20  Command exit code: 0
16:16:20  Command STDOUT:
16:16:20  Mon Aug 17 13:16:34 UTC 2020
16:16:20  Command STDERR:
16:16:20  Command exit code: 0
16:16:20  Command STDOUT:
16:16:20  Mon Aug 17 13:16:34 UTC 2020
16:16:20  Command STDERR:
16:16:20  Command exit code: 0

Please let me know if I can provide some other piece of information to make the troubleshooting easier.

@pkittenis
Copy link
Member

Does running ssh <your host> 'sleep 600; date' work? It looks like the server is disconnecting the client after inactivity. Don't see anything that would point to an issue with the library so far.

@lethalwp
Copy link

I think i have the same problem when using Single SSHClient?:

here is my code:

import re
from threading import Thread, Lock
from time import sleep
from datetime import datetime,timedelta,timezone
from pssh.clients import SSHClient
from pssh.clients import SSHClient as SSHClient2

if 1==1:
#block1
client=SSHClient(
"centosserver1",
user="root",
password="xxx",
num_retries=3, retry_delay=5, timeout=5, keepalive_seconds=60
)

o=client.run_command('ls')

o=client.run_command('command 1')
..
status = ''
for line in o[2]:
status += line+"\n"

print(status)
client.disconnect()

if 1==1:
#block2
client=SSHClient2(
"centosserver2",
user="root",
password="xxx",
num_retries=3, retry_delay=5, timeout=5, keepalive_seconds=60
)

o=client.run_command('ls | head -n 1')

o=client.run_command('command 2')

status = ''
for line in o[2]:
status += line+"\n"

print(status)
client.disconnect()


  • when i run block 1 independantly, or block 2 independantly: they both work.
  • I thought maybe it was a problem with the class initialization or something but it isn't. (thus the as SSHClient2)
  • If the commands are "ls" in block1 & block 2, the code works for both
  • when i run the real commands i need, which are pretty fast also, the second block will constantly fail on reading on output o[2]:
    (venv) [lethalwp@little requibuildingblocks]$ ./searchrequi-sshclienttest.py
    (snip output from block 1)

Traceback (most recent call last):
File "./searchrequi-sshclienttest.py", line 46, in
for line in o[2]:
File "/home/lethalwp/Downloads/boulot/requisitoire/requibuildingblocks/venv/lib64/python3.8/site-packages/pssh/clients/native/single.py", line 405, in read_output_buffer
for line in output_buffer:
File "pssh/native/_ssh2.pyx", line 45, in _read_output
File "ssh2/channel.pyx", line 116, in ssh2.channel.Channel.read
File "ssh2/channel.pyx", line 144, in ssh2.channel.Channel.read_ex
File "ssh2/utils.pyx", line 121, in ssh2.utils.handle_error_codes
ssh2.exceptions.SocketDisconnectError

@lethalwp
Copy link

my problem somehow happens because client.disconnect() is interfering with the the SSHClient() of block 2. If i remove disconnect() from block 1; it works as it should.

@pkittenis
Copy link
Member

pkittenis commented Aug 25, 2020

Cannot reproduce.

Simple example script with sleep and reading both stdout/stderr:

from pssh.clients import ParallelSSHClient
import datetime

output = []
host = 'localhost'
hosts = [host, host]
client = ParallelSSHClient(hosts)

# Run 10 five second sleeps
cmds = ['sleep 5; uname' for _ in range(10)]
start = datetime.datetime.now()
for cmd in cmds:
    output.append(client.run_command(cmd, stop_on_errors=False))
end = datetime.datetime.now()
print("Started %s commands on %s host(s) in %s" % (
    len(cmds), len(hosts), end-start,))
start = datetime.datetime.now()
for _output in output:
    client.join(_output)
    for host_out in _output.values():
        for line in host_out.stdout:
            print(line)
        for line in host_out.stderr:
            print(line)
        print(f"Exit code: {host_out.exit_code}")
end = datetime.datetime.now()
print("All commands finished in %s" % (end-start,))

Output:

Started 10 commands on 2 host(s) in 0:00:00.190672
Linux
Exit code: 0
Linux
Exit code: 0
Linux
Exit code: 0
Linux
Exit code: 0
Linux
Exit code: 0
Linux
Exit code: 0
Linux
Exit code: 0
Linux
Exit code: 0
Linux
Exit code: 0
Linux
Exit code: 0
Linux
Exit code: 0
Linux
Exit code: 0
Linux
Exit code: 0
Linux
Exit code: 0
Linux
Exit code: 0
Linux
Exit code: 0
Linux
Exit code: 0
Linux
Exit code: 0
Linux
Exit code: 0
Linux
Exit code: 0
All commands finished in 0:00:05.015416

Single clients:

from pssh.clients import SSHClient
import datetime

output = []
host = 'localhost'
client1 = SSHClient(host)
client2 = SSHClient(host)

out1 = client1.run_command('sleep 5; uname')
out2 = client2.run_command('sleep 5; uname')

for line in out1[2]:
    print(line)
client1.disconnect()
for line in out2[2]:
    print(line)
client2.disconnect()

Output:

Linux
Linux

For reporting issues please provide a clear and reproducible example.

@pkittenis
Copy link
Member

Note that heavy IDE environments that make use of threads, threading happening inside client code, multiprocessing et al will all interfere with gevent.

parallel-ssh itself uses threads to offload heavy CPU tasks like authentication but it does so in a way that works with gevent. Other code that attempts something similar might not.

As above, please provide a reproducible example to best get good help with an issue.

@sjvdm
Copy link

sjvdm commented Aug 25, 2020

Reproduced on a server running:
CentOS8
Python3.8
pssh 1.11.1

Run the following code in a python notebook, or a script. I get a SocketRecvError about 80% of the time.

from pssh.clients import ParallelSSHClient
import sys
import traceback
import pssh

print('pssh.__version__')
print(pssh.__version__)

for i in range(0,10):
    try:
        hosts = ['127.0.0.1']
        client = ParallelSSHClient(hosts)

        output = client.run_command('uname')

        for host, host_output in output.items():
            for line in host_output.stdout:
                print("Host [%s] - %s" % (host, line))
                
    except Exception as e:
        exc_type, exc_value, exc_traceback = sys.exc_info()
        e = repr(traceback.format_exception(exc_type, exc_value,
                                          exc_traceback))
        print(e)

Output:

pssh.__version__
1.11.1
Host [127.0.0.1] - Linux
Host [127.0.0.1] - Linux
['Traceback (most recent call last):\n', '  File "<ipython-input-15-0915ad9ddde1>", line 18, in <module>\n    for line in host_output.stdout:\n', '  File "..../lib/python3.8/site-packages/pssh/clients/native/single.py", line 405, in read_output_buffer\n    for line in output_buffer:\n', '  File "pssh/native/_ssh2.pyx", line 46, in _read_output\n', '  File "ssh2/channel.pyx", line 116, in ssh2.channel.Channel.read\n', '  File "ssh2/channel.pyx", line 144, in ssh2.channel.Channel.read_ex\n', '  File "ssh2/utils.pyx", line 179, in ssh2.utils.handle_error_codes\n', 'ssh2.exceptions.SocketRecvError\n']

...

@pkittenis
Copy link
Member

pkittenis commented Aug 25, 2020

Thanks, can reproduce. Looks like issue with single client disconnections from last version.

@pkittenis
Copy link
Member

As a workaround, call client.join(output) at some point before the client goes out of scope.

This works correctly:

from pssh.clients import ParallelSSHClient
import sys
import traceback
import pssh

print('pssh.__version__')
print(pssh.__version__)

for i in range(0,10):
    try:
        hosts = ['127.0.0.1']
        client = ParallelSSHClient(hosts)

        output = client.run_command('uname')

        for host, host_output in output.items():
            for line in host_output.stdout:
                print("Host [%s] - %s" % (host, line))
        client.join(output)
                
    except Exception as e:
        exc_type, exc_value, exc_traceback = sys.exc_info()
        e = repr(traceback.format_exception(exc_type, exc_value,
                                          exc_traceback))
        print(e)
Host [127.0.0.1] - Linux
Host [127.0.0.1] - Linux
Host [127.0.0.1] - Linux
Host [127.0.0.1] - Linux
Host [127.0.0.1] - Linux
Host [127.0.0.1] - Linux
Host [127.0.0.1] - Linux
Host [127.0.0.1] - Linux
Host [127.0.0.1] - Linux
Host [127.0.0.1] - Linux

pkittenis pushed a commit that referenced this issue Aug 25, 2020
@pkittenis
Copy link
Member

Issue was if channel EOF was not sent prior to disconnection, state can be left in libssh2. On subsequent open sessions, libssh2 attempts to EOF the left over channel. If the prior socket data is not available libssh2 can send corrupt packet to server and server terminates connection as seen in libssh2 trace messages.

DEBUG:pssh.clients.native.single:Disconnecting client for host 127.0.0.1
[libssh2] 1.279641 Socket: Sent 64/64 bytes at 0x7287440055c8          <---------- Client disconnected
                                            ==== New client ====
DEBUG:pssh.clients.native.single:Connecting to 127.0.0.1:22
DEBUG:pssh.clients.native.single:Authentication with SSH Agent succeeded
[libssh2] 1.337248 Conn: Allocated new channel ID#0
[libssh2] 1.337255 Conn: Opening Channel - win 2097152 pack 32768
[libssh2] 1.337281 Socket: Sent 80/80 bytes at 0x728744030dc8
[libssh2] 1.338485 Socket: Recved 688/16384 bytes to 0x72874402cd88+0
[libssh2] 1.338500 Conn: Received global request type [email protected] (wr 0)
[libssh2] 1.379052 Socket: Recved 224/16384 bytes to 0x72874402cd88+0
[libssh2] 1.379068 Conn: Connection Established - ID: 0/0 win: 0/2097152 pack: 32768/32768
DEBUG:pssh.clients.native.single:Executing command 'uname'
[libssh2] 1.379151 Conn: starting request(exec) on channel 0/0, message=uname
[libssh2] 1.379174 Socket: Sent 64/64 bytes at 0x728744030dc8
[libssh2] 1.379392 Socket: Recved 112/16384 bytes to 0x72874402cd88+0
[libssh2] 1.379402 Conn: Window adjust for channel 0/0, adding 2097152 bytes, new window_size=2097152
[libssh2] 1.379409 Conn: Freeing channel 0/0 resources              <------- libssh2 cleaning up previous channel
[libssh2] 1.379412 Conn: Sending EOF on channel 0/0                <------- libssh2 cleaning up previous channel
[libssh2] 1.379438 Socket: Sent 48/48 bytes at 0x7287440055c8
[libssh2] 1.379441 Conn: Closing channel 0/0                            <------- libssh2 cleaning up previous channel
[libssh2] 1.379449 Socket: Error sending 48 bytes: 32
[libssh2] 1.379452 Failure Event: -7 - Unable to send close-channel request, but closing anyway
[libssh2] 1.892815 Conn: channel_read() wants 1024 bytes from channel 0/0 stream #0           <-------- reading from new channel
[libssh2] 1.899815 Conn: channel_read() wants 1024 bytes from channel 0/0 stream #0     <-------- reading from new channel
[libssh2] 1.899837 Socket: Recved 240/16384 bytes to 0x71b34002cd88+0

Latest changes stop this from happening by not closing socket before channel can clean itself up. Sessions are disconnected correctly on both parallel and single client deletion/out of scope.

Calling client.join ensures channel is cleaned up correctly by explicitly sending EOF - not doing it should not cause clients to break either. Thanks for report.

Any other similar issue please report - reproducible example is very useful to get good help.

1.11.2 contains fixes.

@sjvdm
Copy link

sjvdm commented Aug 25, 2020

Confirmed also that in 1.11.2 this is resolved.

Thanks so much for an extremely quick fix!!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests

5 participants