
Start MainLoop with a Task marked as LongRunning #1321

Merged
merged 1 commit into rabbitmq:6.x from the long-running-task-in-connection branch on Mar 24, 2023

Conversation

@d-jagoda commented Mar 21, 2023

Proposed Changes

Addresses: #1318

.NET Core on Linux uses epoll to monitor IO descriptors for ready states and the PortableThreadPool to dispatch the events returned by epoll_wait. Unlike the Windows implementation, where IO completion ports are bound to an underlying Windows thread pool, the use of the PortableThreadPool for async IO makes it more sensitive to long-running thread pool threads, because the thread pool heuristics cannot accurately gauge the available threads against demand. Quite often the gate thread has to intervene when it detects thread pool starvation in order to force an increase in the thread count, and this process can sometimes take a second or more.

In a containerized world, having even a single long-running thread pool thread leads to reduced throughput and timeouts on time-sensitive requests whenever async IO is used - for example with Kestrel.

In my tests, marking the StartMainLoop task as LongRunning increased the throughput of our services and got rid of readiness probe timeouts when deployed to Kubernetes.
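For context, here is a minimal sketch of the kind of change being proposed; the field name _mainLoopTask and the exact overload shown are illustrative rather than the client's actual code:

```csharp
// Before: MainLoop runs on a ThreadPool worker. On Linux, async socket IO
// completions are also dispatched on the PortableThreadPool, so a loop that
// never returns competes with them until the pool injects more threads.
// _mainLoopTask = Task.Run(MainLoop);

// After: TaskCreationOptions.LongRunning hints the default scheduler to use a
// dedicated (non-ThreadPool) thread, so the read loop no longer occupies a
// worker thread that async IO callbacks need.
_mainLoopTask = Task.Factory.StartNew(
    MainLoop,
    CancellationToken.None,
    TaskCreationOptions.LongRunning,
    TaskScheduler.Default);
```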

Types of Changes

What types of changes does your code introduce to this project?
Put an x in the boxes that apply

  • Bug fix (non-breaking change which fixes issue #NNNN)
  • New feature (non-breaking change which adds functionality)
  • Breaking change (fix or feature that would cause an observable behavior change in existing systems)
  • Documentation improvements (corrections, new content, etc)
  • Cosmetic change (whitespace, formatting, etc)

Checklist

Put an x in the boxes that apply. You can also fill these out after creating
the PR. If you're unsure about any of them, don't hesitate to ask on the
mailing list. We're here to help! This is simply a reminder of what we are
going to look for before merging your code.

  • I have read the CONTRIBUTING.md document
  • I have signed the CA (see https://cla.pivotal.io/sign/rabbitmq)
  • All tests pass locally with my changes
  • I have added tests that prove my fix is effective or that my feature works
  • I have added necessary documentation (if appropriate)
  • Any dependent changes have been merged and published in related repositories

Edit: syntax of checkboxes

@ArminShoeibi (Contributor) commented Mar 21, 2023

@d-jagoda Please check this PR, which is related to the main branch: #1264

The specific commit: 49c61dc

I'm not sure of the reasoning behind @stebet's modification to the main branch.

Here is the PR, #999, that changes Task.Run to Task.Factory.StartNew.

@ArminShoeibi (Contributor)

Ok I found out the reason:
#1318 (comment)

@Zerpet linked an issue Mar 22, 2023 that may be closed by this pull request
@lukebakken self-assigned this Mar 22, 2023
@lukebakken added this to the 6.5.0 milestone Mar 22, 2023
@stebet (Contributor) commented Mar 22, 2023

Poking @stephentoub and @EgorBo here in case they have some good insights and time :)

@stephentoub

Hi. What kind of feedback are you looking for?

@lukebakken (Collaborator)

Hi @stephentoub!

> What kind of feedback are you looking for?

Just whether this change makes sense. It seems reasonable to me, but I don't have a way to reproduce the issues that @d-jagoda reports.

@stebet (Contributor) commented Mar 23, 2023

> Hi. What kind of feedback are you looking for?

Hi @stephentoub.

Here is the original discussion: https://groups.google.com/g/rabbitmq-users/c/9lFwM_xF5cw

The TLDR version:
We have a MainLoop method, started when a network connection to a RabbitMQ server is made, that is pretty much a while(true) loop that reads data from the connection and parses the frames and commands.

It is currently started with Task.Run(MainLoop), but the issue @d-jagoda raised is that on Linux this occupies a ThreadPool thread for a long-running operation, and when running in Kubernetes the limited number of threads causes the healthcheck to take a long time to start, since it has to wait for the ThreadPool to detect that it needs to add more threads to run tasks.

The proposal is to use Task.Factory.StartNew with TaskCreationOptions.LongRunning instead, but the caveat is that MainLoop will become async at some point, which Task.Factory.StartNew doesn't support out of the box: it would return a Task<Task>, which could be unwrapped though.

So I guess the question is: is there a preferred alternative or an easy way to run a long-running async method on a non-ThreadPool thread?

Would it work if we started it using Task.Factory.StartNew with TaskCreationOptions.LongRunning and then unwrapped the resulting Task, or would that still just schedule the wrapped task on the ThreadPool?
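For what it's worth, a sketch of the pattern being asked about; MainLoopAsync is a hypothetical stand-in for an async MainLoop, not an existing method:

```csharp
// Task.Factory.StartNew with an async delegate returns Task<Task>: the outer
// task represents invoking MainLoopAsync on the dedicated LongRunning thread,
// the inner task represents the whole asynchronous operation.
Task<Task> outer = Task.Factory.StartNew(
    MainLoopAsync,
    CancellationToken.None,
    TaskCreationOptions.LongRunning,
    TaskScheduler.Default);

// Unwrap() returns a proxy for the inner task so callers can await the loop's
// completion. Note that only the synchronous part of MainLoopAsync (up to its
// first await) runs on the dedicated thread; continuations after an await are
// scheduled back onto the ThreadPool, which is exactly the question above.
Task mainLoop = outer.Unwrap();
```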

@stephentoub

> We have a MainLoop method, started when a network connection to a RabbitMQ server is made, that is pretty much a while(true) loop that reads data from the connection and parses the frames and commands.

> but the caveat is that MainLoop will become async at some point

Is it doing blocking I/O? If it were implemented entirely asynchronously, would you still have thread starvation issues?

@stebet (Contributor) commented Mar 23, 2023

> Is it doing blocking I/O? If it were implemented entirely asynchronously, would you still have thread starvation issues?

It was doing blocking I/O, but was recently moved over to System.IO.Pipelines and is now async: 49c61dc

@d-jagoda Have you tried your workload with the latest sources from main?
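Roughly, that change gives the read loop this shape (a sketch assuming a PipeReader-based frame handler, not the actual code in 49c61dc):

```csharp
using System.Buffers;
using System.IO.Pipelines;
using System.Threading.Tasks;

// With System.IO.Pipelines the loop awaits ReadAsync, handing the thread back
// while waiting for data instead of blocking on a socket read.
static async Task MainLoopAsync(PipeReader reader)
{
    while (true)
    {
        ReadResult result = await reader.ReadAsync();
        ReadOnlySequence<byte> buffer = result.Buffer;

        // ... parse as many complete frames as the buffer contains ...

        // Sketch only: mark everything examined and nothing consumed; real
        // parsing would advance past each completed frame.
        reader.AdvanceTo(buffer.Start, buffer.End);

        if (result.IsCompleted)
        {
            break;
        }
    }

    await reader.CompleteAsync();
}
```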

@d-jagoda (Author) commented Mar 23, 2023

> Is it doing blocking I/O? If it were implemented entirely asynchronously, would you still have thread starvation issues?

> It was doing blocking I/O, but was recently moved over to System.IO.Pipelines and is now async: 49c61dc

> @d-jagoda Have you tried your workload with the latest sources from main?

I haven't tried it, but it would work in theory, because the thread that starts MainLoop would eventually yield at this async IO call:

InboundFrame frame = await _frameHandler.ReadFrame();

@lukebakken self-requested a review March 23, 2023 22:16
@lukebakken (Collaborator)

I plan on merging this later today or Monday unless someone chimes in. Seems like a good choice for 6.x.

@stephentoub

I'm confused... is it still beneficial even when things are all async? There's a cost to spinning up a dedicated thread.

@stebet (Contributor) commented Mar 24, 2023

> I'm confused... is it still beneficial even when things are all async? There's a cost to spinning up a dedicated thread.

The 6.x version has a non-async version of the MainLoop, so I think it makes sense for that branch. The main branch has the async version, so it shouldn't be needed there.

@lukebakken merged commit 262fe50 into rabbitmq:6.x Mar 24, 2023
@lukebakken (Collaborator)

Thanks to everyone who contributed here. Have a great weekend!

@d-jagoda deleted the long-running-task-in-connection branch March 25, 2023 15:42
Successfully merging this pull request may close these issues:

Investigate the use of TaskCreationOptions.LongRunning