RxRingBuffer with Inline Release #2189
@benjchristensen interesting! Btw, if you want to understand the different shapes you can go there and "mark the range" so you only see allocations in that time area. You can then do it for the other area as well, and maybe the difference in allocations gives you a hint when and where things are happening. And maybe you know that anyway and I'm just trying to be helpful ;)
I see a few troubling points:
Doing this is problematic and exactly what we need to avoid. If we retain a reference to it and continue working with it after it is released then it can be concurrently used by 2 RxRingBuffer instances because the pool will be able to hand it out to someone else.
If so then it's been like that for a while and it can be fixed independent of this PR.
We need to get the SPSC queue fixed. It has had issues, which is why it is still on SPMC. This has not changed.
I don't understand what you're referring to.
Please do. That code has not been changed in this PR.
See the first point above. This is a matter of trade-offs. If we ensure we never get a NullPointerException, we end up synchronizing everything and killing the performance. We are optimizing here for the fact that almost all the time we will terminate serially, or unsubscribe serially. It is very rare that we will receive a concurrent unsubscribe. In those cases we cannot hold a reference to queue.

If there is a functional problem with it then let's look at that. If it's just a dislike for the pattern, then please propose an alternative that achieves the performance of this.

As an aside, please do not use sarcasm in code reviews. The "really?" comment is unhelpful and condescending. I am very much aware of the ugliness of this approach. If elegance had succeeded in the many attempts we've made thus far, we wouldn't still be discussing this issue.
Back to the trade-offs that are driving this.

Considering all of this, what is your recommended approach that is better than the option proposed in this pull request?
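To make the trade-off concrete, here is a minimal sketch of the two shapes being weighed; the class, field, and method names are illustrative assumptions, not the PR's actual RxRingBuffer code. Holding a local reference to the pooled queue avoids the NullPointerException but lets the pool hand the same instance to another RxRingBuffer; re-reading the field avoids that, but a concurrent release can surface as an NPE unless every emission is synchronized.

```java
import java.util.Queue;
import java.util.concurrent.ConcurrentLinkedQueue;

// Illustrative sketch of the dilemma, not the actual RxRingBuffer implementation.
class PooledBufferSketch {
    private volatile Queue<Object> queue = new ConcurrentLinkedQueue<>(); // stand-in for the pooled SPSC queue

    // Option A: copy the field to a local. No NPE, but if release() runs concurrently
    // the pool may hand `q` to another buffer while we are still offering into it.
    public void onNextLocalCopy(Object o) {
        Queue<Object> q = queue;
        if (q != null) {
            q.offer(o);
        }
    }

    // Option B: use the field directly. A concurrent release() that nulls the field can
    // surface as a NullPointerException here; preventing that entirely would mean
    // synchronizing every emission, which is the cost being avoided.
    public void onNextFieldRead(Object o) {
        queue.offer(o);
    }

    public void release() {
        Queue<Object> q = queue;
        queue = null;
        // returning q to the pool here is what makes Option A unsafe
    }
}
```

The comments above argue for the second shape: accept the rare race from a concurrent unsubscribe rather than synchronize every emission.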
I apologize for the unprofessional comment.

OperatorPublish L360: the if statement has identical sub-blocks; perhaps a break is missing?

The compiler will likely lift the member queue access into a register, so the queue value will be accessible during the run of the method. A possible solution is to use a WriterReaderPhaser to mutually exclude the enqueue and unsubscribe calls:

volatile long ingress;

In onXXX methods: …

In unsubscribe: …

This adds two uncontended volatile writes to each onNext, costing a few dozen cycles. I can't test this right now, but the lazy version might cost only a few cycles. I'd consider both as reasonable trade-offs for correctness.
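Since the code snippets in that suggestion are only sketched in prose here, the following is one possible reading of it, as a rough sketch under stated assumptions: the onXXX path brackets its use of the queue with two volatile counter increments, and unsubscribe waits until no writer is in flight before returning the queue to the pool. Class and method names are illustrative.

```java
import java.util.concurrent.atomic.AtomicLong;

// Sketch only: one interpretation of the ingress-counter idea, not the commenter's exact code.
class IngressGuardSketch {
    private final AtomicLong ingress = new AtomicLong(); // odd value = a writer is inside the critical section
    private volatile Object queue = new Object();         // stand-in for the pooled queue

    // single-producer onNext/onError/onCompleted path
    void onNext(Object value) {
        ingress.incrementAndGet();             // enter: counter becomes odd (first volatile write)
        try {
            Object q = queue;
            if (q != null) {
                // offer value into q ...
            }
        } finally {
            ingress.incrementAndGet();         // exit: counter becomes even (second volatile write)
        }
    }

    // unsubscribe path, possibly on another thread
    void release() {
        Object q = queue;
        queue = null;                          // writers that enter after this see null and skip the queue
        while ((ingress.get() & 1L) != 0L) {   // wait for an in-flight writer to leave
            Thread.yield();
        }
        if (q != null) {
            // return q to the pool ...
        }
    }
}
```

As the follow-up comment further down notes, conveying the release through the queue field in this form turned out to be insufficient; a revised sketch appears after that comment.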
That's an interesting approach. I look forward to seeing that impl and perf. Whichever route we take for RxRingBuffer, we will also need to fix the Publish issues. Should I open a separate PR to start that, or have you already started on a better publish fix?
Those issues aren't really related to RxRingBuffer in publish so it can be an independent PR.
Force-pushed from d8f4704 to f44311f
Can release when ...
- terminal event emitted
- unsubscribe/release occurs and queue is empty
- unsubscribe/release has occurred and poll/peek occurs

Can result in a queue not being put back in the pool if an early unsubscribe occurs from a different thread and causes the poll/peek to not trigger the release. In this case GC will reclaim the pool so it still functions; it just misses the pooling optimization.

There is a commented-out test of using finalize(). It works as a safety net for the edge case, but at the cost of increased GC time.
Force-pushed from f44311f to 658ab3a
I merged in the changes to … We can open a separate PR/issue for improving …
This code is somehow getting non-deterministic failures such as these:
Quick update on the algorithm: it doesn't work in this form. The ingress counter has to convey the flip info, not the queue.

volatile long ingress;
…
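A hedged sketch of what that revision could look like, assuming the flip is encoded in the sign bit of the ingress counter; the encoding and names are assumptions, not the actual proposal.

```java
import java.util.concurrent.atomic.AtomicLong;

// Sketch only: the release flag lives in the counter itself, so a writer learns about the
// flip from the value it reads on entry instead of re-checking the queue field.
class IngressFlipSketch {
    private final AtomicLong ingress = new AtomicLong(); // low bits: enter/exit parity; sign bit: released
    private final Object queue = new Object();           // stand-in for the pooled queue

    void onNext(Object value) {
        long entered = ingress.incrementAndGet();
        try {
            if (entered > 0) {                 // sign bit clear: release had not happened when we entered
                // offer value into queue ...
            }
        } finally {
            ingress.incrementAndGet();
        }
    }

    void release() {
        ingress.addAndGet(Long.MIN_VALUE);     // set the sign bit; an even delta keeps enter/exit parity intact
        while ((ingress.get() & 1L) != 0L) {   // wait for a writer that entered before the flip to leave
            Thread.yield();
        }
        // return queue to the pool ...
    }
}
```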
I see a couple of problems in 1717:
As for the retry, Mockito's InOrder construct is O(n^2), which makes the verification slow. It is possible that the test case, run 200 times, just runs out of time.
While thinking about this approach, it seemed to be exactly the same as the WIP approach I tried, and after implementing and testing it, that is exactly how it behaves. It is effectively the same thing, just instead of 1 counter it has 2 that get touched in the line of emission (in the try/finally). To explore this further I studied the …
Here are the performance numbers, the phaser being the last column:
Since previous tests have shown that pooling does still make a big difference, I'm trying to find a way to do pooling without the concurrency issue. I've tried with work-in-progress counters, ReadWrite locks and other variants and all of them impact performance far too much.
This one makes different trade-offs:
It will release automatically when a ...
- terminal event is emitted
- unsubscribe/release occurs and the queue is empty
- unsubscribe/release has occurred and a poll/peek occurs
This can result in a queue not being put back in the pool if an early unsubscribe occurs from a different thread and causes the poll/peek to not trigger the release. In this case GC will reclaim the pool so it still functions; it just misses the pooling optimization.

There is a commented-out test of using finalize(). It works as a safety net for the edge case, but at the cost of increased GC time.
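A rough sketch of how those release conditions could fit together; it glosses over details such as draining before a terminal release, and the class, field, and pool names are illustrative assumptions rather than the PR's actual RxRingBuffer code.

```java
import java.util.Queue;
import java.util.concurrent.ConcurrentLinkedQueue;

// Illustrative sketch of the inline-release conditions described above.
class InlineReleaseBufferSketch {
    private static final Queue<Queue<Object>> POOL = new ConcurrentLinkedQueue<>();

    private volatile Queue<Object> queue = new ConcurrentLinkedQueue<>();
    private volatile boolean released;

    public void onCompleted() {
        returnToPool();                    // terminal event emitted
    }

    public void release() {                // unsubscribe path
        released = true;
        Queue<Object> q = queue;
        if (q != null && q.isEmpty()) {
            returnToPool();                // unsubscribe/release occurred and the queue is empty
        }
        // otherwise the consumer's next poll/peek triggers the release below
    }

    public Object poll() {
        Queue<Object> q = queue;
        Object o = (q != null) ? q.poll() : null;
        if (released && q != null) {
            returnToPool();                // release has occurred; this poll completes it
        }
        return o;
    }

    private void returnToPool() {
        Queue<Object> q = queue;
        queue = null;
        if (q != null) {
            q.clear();
            POOL.offer(q);
        }
    }
}
```

If an early unsubscribe on another thread means poll/peek never runs again, returnToPool() is simply never reached for that queue, which is the edge case above where GC reclaims it and only the pooling optimization is lost.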
Here are performance numbers of the various tests, this one being "Inline Volatile":
The memory allocation amounts look good, though I don't understand the shape of the red graph at the top. That is making me question this.
With finalize() (as an experiment, aware of all the reasons not to use finalize() from Effective Java, 2nd Edition) this is what it looks like:

The GC times on this one are higher than without finalize():

compared with: