scp_send silently fails to send all data for some targets (see bug #226) #337
Comments
As a side note, I haven't been able to reproduce these errors with the asyncssh 2.9.0 project, which uses asyncio instead of gevent.
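For reference, a minimal sketch of what an equivalent transfer might look like with asyncssh's `scp()` coroutine; the host and file paths here are hypothetical placeholders, not from the report:

```python
import asyncio
import asyncssh

# Hypothetical equivalent transfer via asyncssh; host and paths are placeholders.
async def send(host: str, src: str, dst: str) -> None:
    async with asyncssh.connect(host) as conn:
        await asyncssh.scp(src, (conn, dst))

asyncio.run(send('host1.example.com', '/tmp/payload.bin', '/tmp/payload.bin'))
```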
I have seen a similar issue when uploading a …
Hi there, thanks for the interest and the report. Since there have been multiple reports on this issue, I do believe there is one. Having gone through the code again, I can see where there is potential for a race condition, which sounds like the behaviour here. The difference in the written vs total bytes is exactly 2k as well, which is the internal buffer size of libssh2. That can't be an accident.

I am curious if you see the same behaviour with SFTP. Another question: are you able to reproduce this against a local SSH server, or only a remote one? I still need to be able to reproduce this in a test to confirm a fix works, so that would help. The existing tests do not use such a large file size; I will try again with 100MB.
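A minimal sketch of the kind of reproduction check being discussed, assuming the parallel-ssh 2.x API (hosts and paths are hypothetical):

```python
import os
import gevent
from pssh.clients import ParallelSSHClient

hosts = ['host1', 'host2']
local_path = remote_path = '/tmp/bigfile.bin'  # e.g. a ~100MB test file

client = ParallelSSHClient(hosts)
# scp_send returns one greenlet per host; joinall waits for all of them.
gevent.joinall(client.scp_send(local_path, remote_path), raise_error=True)

# Compare remote file sizes against the local size to detect truncation.
expected = os.path.getsize(local_path)
output = client.run_command('stat -c %s {}'.format(remote_path))
client.join(output)
for host_out in output:
    size = int(list(host_out.stdout)[0])
    if size != expected:
        print('{}: short by {} bytes'.format(host_out.host, expected - size))
```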
I can't use SFTP with my systems; I'm not sure why and haven't dug into it. So I am using scp (previously with paramiko), and now I'm using scp with asyncssh as it does support it.
I've been focused on this inside AWS. It reproduces very consistently for me within the same EC2 region and when going across EC2 regions, so that's more overhead than a local SSH server, of course. I haven't tried it on my own home LAN.
Yes, that is approximately the number of bytes missing in my case. The remote side in my production system uses … We use …
AFAIK our …
https://github.com/Shopify/toxiproxy — maybe you can play around with timings/bandwidth using toxiproxy. Local setups might be "too good" under normal circumstances.
Triggered by the client going out of scope, which caused a race condition if the client is not kept alive.
Please reply on this issue if you're still having problems; I will look into it ASAP.
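In user code, this translates to keeping the client object referenced until the transfer greenlets have been joined; a minimal sketch, assuming the parallel-ssh 2.x API:

```python
import gevent
from pssh.clients import ParallelSSHClient

def send_to_all(hosts, src, dst):
    # Keep `client` alive in this scope until joinall() returns; letting it
    # go out of scope while writes are in flight is the race fixed here.
    client = ParallelSSHClient(hosts)
    greenlets = client.scp_send(src, dst)
    gevent.joinall(greenlets, raise_error=True)
```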
Nice fix - thanks!
Describe the bug
Same thing as the closed bug #226 - scp operations are not transferring the whole file at times. That bug was closed as not reproducible. I realize this is really hard to deal with, as it appears to be a timing issue with gevent greenlets and non-blocking I/O. I can't really debug it, as just using a debugger introduces more time and then the problem never manifests.
To Reproduce
Steps to reproduce the behavior:
Expected behavior
Destination files match source files.
Actual behaviour
At least one target server's destination file is missing a certain number of bytes at the end.
For a file of 102912000 bytes I typically have one or more servers that end up with only 102891520 bytes (20480 bytes short), and that smaller number is very consistent.
Additional information
I've run into this a bunch today as I've begun using this library. It's hard to reproduce every time, but I've got a path where sending 100MB files to two hosts is missing the last few bytes on just one of them about 50% of the time. Still investigating, but this is a real issue...
If I add a `sleep(2)` at the end of the data loop in `_scp_send` I never have the problem, so that's interesting.
This is against openssh-server 7.4p1-21 running on amazonlinux2, using:
parallel-ssh==2.8.0
ssh-python==0.9.0
ssh2-python==0.27.0
The virtualenv is using Python 3.10.1. The code I'm running is just:
Using a file that is 99MB and sending to two servers. This is code integrated with some fabric2 code.
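A minimal sketch of the kind of call being described, assuming the parallel-ssh 2.x `scp_send` API (hosts and file paths are hypothetical):

```python
import gevent
from pssh.clients import ParallelSSHClient

# Hypothetical hosts and paths standing in for the reporter's setup.
hosts = ['server1.example.com', 'server2.example.com']
client = ParallelSSHClient(hosts)

# Copy the ~99MB file to both servers and wait for all copies to finish.
greenlets = client.scp_send('/tmp/payload.bin', '/tmp/payload.bin')
gevent.joinall(greenlets, raise_error=True)
```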
In further testing, I'm able to reproduce the problem with a 10561-byte text file and 4 servers, and quite often one or two of the servers get a 0-byte file. If I add the gevent sleep call at the end of the `_scp_send` data/`eagain_write` loop, like `sleep(2)`, I never see the problem. It seems like a channel can get closed before all the data is flushed out?
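This isn't the library's actual code, but a sketch of the loop shape being described, with the reporter's `sleep(2)` workaround appended; the function name, `eagain_write` parameter, and chunk size are assumptions:

```python
from gevent import sleep

def scp_send_loop_with_sleep(local_file, chan, eagain_write, chunk_size=1024 * 1024):
    # Sketch only: read the local file in chunks and push each chunk through
    # eagain_write, which retries non-blocking writes until all bytes are sent.
    with open(local_file, 'rb') as local_fh:
        data = local_fh.read(chunk_size)
        while data:
            eagain_write(chan.write, data)
            data = local_fh.read(chunk_size)
    # Reporter's workaround: pause before the channel is closed so buffered
    # data has time to flush.
    sleep(2)
```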
This is an awesome library; you've done great work. I can't really use it, though, until I get this problem figured out, as it happens at least 50% of the time.