
Use Pinned Object Heap for MemoryPool #21614


Merged (1 commit, May 9, 2020)

Conversation

@benaadams (Member) commented May 8, 2020

Allocate in the Pinned Object Heap (new in .NET 5.0) rather than pushing to the LOH.

This also means the finalizer and GCHandle can be dropped from MemoryPoolSlab.

/cc @halter73 @VSadov @Maoni0
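For context, a minimal sketch of the idea (assuming the .NET 5 `GC.AllocateUninitializedArray` API; the slab member names here are illustrative, not the actual Kestrel source):

```csharp
using System;

internal sealed class MemoryPoolSlab
{
    // Allocated on the Pinned Object Heap: the GC will never move it,
    // so no GCHandle pin (and no finalizer to release that handle) is needed.
    public byte[] PinnedArray { get; }

    public MemoryPoolSlab(int length)
    {
        // pinned: true places the array in the POH instead of the LOH.
        PinnedArray = GC.AllocateUninitializedArray<byte>(length, pinned: true);
    }
}
```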

/// <summary>
/// This handle pins the managed array in memory until the slab is disposed. This prevents it from being
/// relocated and enables any subsections of the array to be used as native memory pointers to P/Invoked API calls.
/// </summary>
Member:

Nice to see that all this code can be removed.

/// </summary>
public bool IsActive => !_isDisposed;
public bool IsActive => PinnedArray != null;

Member:

I think IDisposable is no longer needed and PinnedArray can be a readonly field.

Member Author:

It's needed to switch off the slab, which means its blocks are thrown away rather than returned to the pool (i.e. taken out of circulation).

Member Author:

Well sorta... since the pool doesn't really shrink; but if it did... 😉

{
    fixed (byte* ptr = slab.PinnedArray)
    {
        basePtr = (IntPtr)ptr;
Member:

Perhaps Marshal.UnsafeAddrOfPinnedArrayElement could be simpler here?
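A sketch of what that suggestion would look like (safe here only because a POH-allocated array can never be relocated by the GC; the array size is illustrative):

```csharp
using System;
using System.Runtime.InteropServices;

// Instead of a fixed block, take the address of element 0 directly.
byte[] pinnedArray = GC.AllocateUninitializedArray<byte>(4096, pinned: true);
IntPtr basePtr = Marshal.UnsafeAddrOfPinnedArrayElement(pinnedArray, 0);
```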

Member Author:

TIL :)

Done.

The alignment parameter of the pinned allocation doesn't look to be exposed? Otherwise most of the following code (which 4096-aligns the blocks) could be skipped as well.
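The 4096-alignment mentioned here amounts to over-allocating and rounding the base address up to the next page boundary; a minimal sketch (variable names and sizes are illustrative):

```csharp
using System;
using System.Runtime.InteropServices;

const int PageSize = 4096;

// Over-allocate by one page so an aligned region of the requested
// size is guaranteed to fit somewhere inside the array.
byte[] slab = GC.AllocateUninitializedArray<byte>(64 * 1024 + PageSize, pinned: true);
long basePtr = (long)Marshal.UnsafeAddrOfPinnedArrayElement(slab, 0);

// Round up to the next multiple of 4096.
long alignedPtr = (basePtr + PageSize - 1) & ~((long)PageSize - 1);
int offset = (int)(alignedPtr - basePtr); // first usable index within the array
```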

Member:

That would be amazing

Member:

Hmm. Since pinned objects do not move, alignment would be fairly simple to arrange internally. (compacting would be tricky but that is not an issue)

I could just get objSize + 4096 bytes from the heap and then format that as 2 or 3 objects, with one being the actual result with aligned payload, which I would return, and others immediately becoming garbage, up for reuse.

@VSadov (Member) May 8, 2020:

That would be an API addition though, with all the process attached, as usual.

Member:

Can you drive that addition @VSadov ?

Member:

I will try. Alignment was already a part of the original "cadillac" proposal for AllocateArray. Having an actual use case would make it easier to argue for the usefulness.

The challenge could be to scope this to just pinned allocations. More general support for aligned allocations is possible, but that has a completely different cost.

@benaadams (Member Author):

Squashed to retrigger CI

@davidfowl (Member):

@VSadov do we have counters for this yet?

cc @sywhang

@VSadov (Member) commented May 8, 2020:

Some diagnostics (like the SOS stuff) are not yet aware of the POH.
The ETW part is done though (dotnet/runtime#34549). You will see these in events like allocation ticks, heap stats, etc.


@VSadov (Member) commented May 8, 2020:

POH is logically in Gen2, so most everything should just work. The only part that exposes SOH/LOH/POH distinction seems to be _lohSizeCounter. I guess something similar needs to be added for POH.

You can get that data though via GC.GetGenerationSize(4)
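`GC.GetGenerationSize` is internal to the runtime; a hedged sketch of reading the same data from a public API (the `GenerationInfo` property was added to `GCMemoryInfo` in .NET 6, so this is a later alternative, not what was available at the time of this PR):

```csharp
using System;

GCMemoryInfo info = GC.GetGCMemoryInfo();

// GenerationInfo has five entries: gen0, gen1, gen2, LOH, POH.
GCGenerationInfo poh = info.GenerationInfo[4];
Console.WriteLine($"POH size after last GC: {poh.SizeAfterBytes} bytes");
```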

@sywhang (Contributor) commented May 8, 2020:

I already have a work item assigned to me to add POH counters: dotnet/runtime#35424

@Maoni0 (Member) commented May 8, 2020:

great to see this getting used. do you have a workload you could run and show some before vs after results?

@davidfowl (Member):

> great to see this getting used. do you have a workload you could run and show some before vs after results?

All of them. We'll see the graphs change but it would be good to get a sense for what metrics would potentially get better so we can monitor those.

@benaadams (Member Author):

> POH is logically in Gen2, so most everything should just work.

Would be good to expose it separately so it can be monitored (as per page 6 on https://aka.ms/aspnet/benchmarks).


@VSadov (Member) commented May 8, 2020:

POH, like LOH, is collected as part of Gen2 (and generally in the background), thus there are no separate POH or LOH collection counts.

(Technically the POH has no references to other generations and could be collected less frequently than Gen2, but there aren't a lot of reasons to do that.)

For the size: yes, a POH size chart next to the LOH size chart would be nice to have, once the counter is added.

@Maoni0 (Member) commented May 9, 2020:

Even without separate POH stats you can detect perf changes (if there's any diff, that is); they would show up as % time in GC and/or heap size. Reduction in those is the goal of this feature; the POH data is just there to help with perf investigation.

The idea is: if you used to allocate basically all your pinned objects at the beginning of the process with other long-lived objects, using this should show virtually no difference. If you have the scenarios described at the beginning of the design doc, those are where we expect to see benefits.

@halter73 (Member) commented May 9, 2020:

@aspnet-hello benchmark

@pr-benchmarks (bot) commented May 9, 2020:

Starting 'Default' pipelined plaintext benchmark with session ID '01ed222c5a184bfab34a72897ffb6611'. This could take up to 30 minutes...

@pr-benchmarks (bot) commented May 9, 2020:

Baseline

Starting baseline run on '95a22085303f9230b23294782c0ecd5668343641'...
RequestsPerSecond:           766,703
Max CPU (%):                 99
WorkingSet (MB):             91
Avg. Latency (ms):           3.18
Startup (ms):                466
First Request (ms):          125.93
Latency (ms):                0.39
Total Requests:              11,525,455
Duration: (ms)               15,030
Socket Errors:               0
Bad Responses:               0
Build Time (ms):             8,502
Published Size (KB):         120,701
SDK:                         5.0.100-preview.5.20258.4
Runtime:                     5.0.0-preview.5.20253.7
ASP.NET Core:                5.0.0-preview.5.20255.6


PR

Starting PR run on '722c207ededd0efe7cf49eb5c35e649ae191eb55'...
| Description |     RPS | CPU (%) | Memory (MB) | Avg. Latency (ms) | Startup (ms) | Build Time (ms) | Published Size (KB) | First Request (ms) | Latency (ms) | Errors | Ratio |
| ----------- | ------- | ------- | ----------- | ----------------- | ------------ | --------------- | ------------------- | ------------------ | ------------ | ------ | ----- |
|      Before | 766,703 |      99 |          91 |              3.18 |          466 |            8502 |              120701 |             125.93 |         0.39 |      0 |  1.00 |
|       After | 785,197 |      99 |          86 |              3.12 |          461 |            5502 |              120701 |             130.65 |          0.3 |      0 |  1.02 |


@benaadams (Member Author):

@aspnet-hello benchmark json

@benaadams (Member Author):

Can it run an MVC test or something with higher allocations?

@benaadams (Member Author):

@aspnet-hello benchmark http2

@pr-benchmarks (bot) commented May 9, 2020:

Starting 'http2' pipelined plaintext benchmark with session ID '0dff5b4ef5ed4d1280b72bd92bba6a91'. This could take up to 30 minutes...

@pr-benchmarks (bot) commented May 9, 2020:

Baseline

Starting baseline run on '95a22085303f9230b23294782c0ecd5668343641'...
RequestsPerSecond:           246,887
Max CPU (%):                 90
WorkingSet (MB):             202
Avg. Latency (ms):           1.88
Startup (ms):                468
First Request (ms):          172.18
Latency (ms):                0.38
Total Requests:              3,703,308
Duration: (ms)               15,010
Socket Errors:               0
Bad Responses:               0
Build Time (ms):             7,502
Published Size (KB):         120,694
SDK:                         5.0.100-preview.5.20251.2
Runtime:                     5.0.0-preview.5.20253.7
ASP.NET Core:                5.0.0-preview.5.20255.6


PR

Starting PR run on '722c207ededd0efe7cf49eb5c35e649ae191eb55'...
| Description |     RPS | CPU (%) | Memory (MB) | Avg. Latency (ms) | Startup (ms) | Build Time (ms) | Published Size (KB) | First Request (ms) | Latency (ms) | Errors | Ratio |
| ----------- | ------- | ------- | ----------- | ----------------- | ------------ | --------------- | ------------------- | ------------------ | ------------ | ------ | ----- |
|      Before | 246,887 |      90 |         202 |              1.88 |          468 |            7502 |              120694 |             172.18 |         0.38 |      0 |  1.00 |
|       After | 247,311 |      90 |         204 |              2.65 |          459 |            5502 |              120694 |             182.52 |          0.4 |      0 |  1.00 |


@pr-benchmarks (bot) commented May 9, 2020:

Starting 'json' pipelined plaintext benchmark with session ID '8bc6fb7cd8304755bbf9152ba5892e24'. This could take up to 30 minutes...

@pr-benchmarks (bot) commented May 9, 2020:

Baseline

stdout: 
stderr: System.IO.InvalidDataException: Job file 8bc6fb7cd8304755bbf9152ba5892e24.kestrel-pipelined-plaintext.json doesn't include a top-level 'json' property for the specified scenario.
   at JobConsumer.Program.GetBuildInstructions(FileInfo processingFile) in /app/src/JobConsumer/Program.cs:line 292
   at JobConsumer.Program.BenchmarkPR(FileInfo processingFile, String session) in /app/src/JobConsumer/Program.cs:line 142
   at JobConsumer.Program.<>c__DisplayClass25_0.<<Main>b__0>d.MoveNext() in /app/src/JobConsumer/Program.cs:line 123

PR


@halter73 halter73 merged commit a410ed4 into dotnet:master May 9, 2020
@halter73 (Member) commented May 9, 2020:

Thanks!

@benaadams benaadams deleted the pinned-heap branch May 9, 2020 22:33
@Maoni0 (Member) commented May 9, 2020:

what does "Build Time" mean?

@davidfowl (Member):

The benchmarking infrastructure builds the PR.

@amcasey amcasey added area-networking Includes servers, yarp, json patch, bedrock, websockets, http client factory, and http abstractions and removed area-runtime labels Aug 24, 2023