Skip to content

chore(client-kinesis): switch to H2 handler #3577

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 3 commits into from
Apr 29, 2022

Conversation

AllanZhengYP
Copy link
Contributor

@AllanZhengYP AllanZhengYP commented Apr 29, 2022

Issue

Resolves: #1206

Description

Thanks to #3541, using HTTP/2 handler in service client is no longer blocked by weird socket timeout. This change removes hard-coding HTTP/2 clients but rely on the Smithy model.

This change enable us to use HTTP/2 in Kinesis client without breaking customer. Added integration test for it.

This change doesn't include defaultConfigProvider to the handler constructor because none of the default config is applicable to HTTP/2 handler for now.

Testing

Integration test


By submitting this pull request, I confirm that you can use, modify, copy, and redistribute this contribution, under the terms of your choice.

@AllanZhengYP AllanZhengYP requested a review from a team as a code owner April 29, 2022 01:22
@trivikr trivikr changed the title chore(client-kinesis): switch to H2 handler and integration test chore(client-kinesis): switch to H2 handler Apr 29, 2022
@AllanZhengYP AllanZhengYP force-pushed the generate-h2 branch 2 times, most recently from c157f2a to 1de63ee Compare April 29, 2022 22:50
@curtdept
Copy link

curtdept commented May 5, 2022

@AllanZhengYP After about an hour I'm hitting tons of errors "The session has been destroyed" when using this update with Kinesis. Looks like its tearing down and not reconnecting?

edit: looks like elastic/apm-agent-nodejs#2670 unrelated still occurred after removing apm

image

@up73k
Copy link

up73k commented May 6, 2022

After about an hour I'm hitting tons of errors "The session has been destroyed" when using this update with Kinesis

Reproduces on Node 16 @ Heroku for me too.

@curtdept
Copy link

curtdept commented May 6, 2022

@AllanZhengYP I removed elastic apm as a test and this occured, its some defect in the http2 handling.
Current dump:

2022-05-06T18:21:00.231Z debug:Error Error [ERR_HTTP2_INVALID_SESSION]: The session has been destroyed
--
at new NodeError (node:internal/errors:372:5)
at ClientHttp2Session.request (node:internal/http2/core:1721:13)
at /app/node_modules/@aws-sdk/node-http-handler/dist-cjs/node-http2-handler.js:57:33
at new Promise (<anonymous>)
at NodeHttp2Handler.handle (/app/node_modules/@aws-sdk/node-http-handler/dist-cjs/node-http2-handler.js:37:16)
at /app/node_modules/@aws-sdk/client-kinesis/dist-cjs/commands/PutRecordCommand.js:27:58
at /app/node_modules/@aws-sdk/middleware-serde/dist-cjs/deserializerMiddleware.js:5:32
at /app/node_modules/@aws-sdk/middleware-signing/dist-cjs/middleware.js:11:26
at runMicrotasks (<anonymous>)
at processTicksAndRejections (node:internal/process/task_queues:96:5)   deviceId:LoadTest-95053  cid:47d7ade0-cd69-11ec-b20a-d773747d1635

@v-fpavlik
Copy link

Me and @up73k resolved that problem via using "@aws-sdk/client-kinesis": "3.81.0". If you need hotfix it works.

@AllanZhengYP
Copy link
Contributor Author

Hi @curtdept Thank you for providing the stacktrace. However, the stack trace is different to #3602. I haven't been able to reproduce it. Do you have any other observations? It'll be very helpful.

@curtdept
Copy link

curtdept commented May 17, 2022

@AllanZhengYP not sure on the trace, the 3602 one looks parsed to me. In terms of observations, nothing useful, we have a fairly busy system and it for sure takes a few minutes (as few as 30 I have seen but never seems to be more then an hour) before it starts erroring. Once that http2 session is in an errored state, all comms to kinesis stop and its 100% errors only at that point until container restart.

We are using pretty standard node:lts-slim (aka node 16.15.0) containers in fargate this is running in, and nothing really special code wise.

AllanZhengYP added a commit to AllanZhengYP/aws-sdk-js-v3 that referenced this pull request May 17, 2022
@curtdept
Copy link

curtdept commented May 18, 2022

@AllanZhengYP just looking at the node-http2-handler and the nodejs codebase, then only thing that makes sense is there is a session in the session cache that, under high load, is racing the destroy callback on the get. You may just need to guard clause the state when selecting a session from the cache (!session.destroyed). https://github.com/aws/aws-sdk-js-v3/blob/main/packages/node-http-handler/src/node-http2-handler.ts#L179

There could also be a leak in that cache cleanup somewhere in there too.

@AllanZhengYP
Copy link
Contributor Author

@curtdept I find we didn't handle the server-side session close properly in our handler. See the linked PR above for more context. Can you try the latest version(will be later published today) and confirm it it fix the issue for you?

@curtdept
Copy link

@AllanZhengYP sure ill keep an eye out for it.

@AllanZhengYP
Copy link
Contributor Author

@curtdept The fix is already available on 3.94.0. If it works, can you comment on the #3602 for better observability?

@github-actions
Copy link

github-actions bot commented Jun 2, 2022

This thread has been automatically locked since there has not been any recent activity after it was closed. Please open a new issue for related bugs and link to relevant comments in this thread.

@github-actions github-actions bot locked as resolved and limited conversation to collaborators Jun 2, 2022
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Set default H2 session timeout
5 participants