CaffeineCacheManager getCache method cause thread block #30066

Closed
huangxfchn opened this issue Mar 4, 2023 · 7 comments
Labels: in: core (issues in core modules: aop, beans, core, context, expression) · status: backported (backported to maintenance branches) · type: bug (a general bug)


huangxfchn commented Mar 4, 2023

  • spring-context-support 5.3.18
  • JDK version: 17.0.4

Here is my thread stack; 193 threads are blocked:

[image: screenshot of the jstack thread dump]

CaffeineCacheManager.java

public Cache getCache(String name) {
    return this.cacheMap.computeIfAbsent(name, cacheName ->
            this.dynamic ? createCaffeineCache(cacheName) : null);
}
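For context on why this is costly in the non-dynamic case: `ConcurrentHashMap#computeIfAbsent` never stores a mapping when the function returns `null`, so every lookup of an unknown cache name goes through the locking compute path again. A minimal stand-alone sketch (plain JDK, no Spring) illustrating that behavior:

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;

public class NullComputeDemo {
    public static void main(String[] args) {
        ConcurrentHashMap<String, String> map = new ConcurrentHashMap<>();
        AtomicInteger invocations = new AtomicInteger();

        // Simulates the non-dynamic manager: the mapping function returns
        // null for unknown names, so nothing is ever cached for them.
        for (int i = 0; i < 3; i++) {
            map.computeIfAbsent("unknownCache", name -> {
                invocations.incrementAndGet();
                return null; // null result => no entry is stored
            });
        }

        // The function ran on every call, and the map is still empty,
        // meaning every lookup paid the compute (bin-locking) path.
        if (invocations.get() != 3 || !map.isEmpty()) {
            throw new AssertionError("unexpected state");
        }
        System.out.println("invocations=" + invocations.get() + ", size=" + map.size());
    }
}
```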

Here is my code:

@CacheConfig(cacheNames = "branch")
public class BranchServiceImpl {
    @Cacheable(key = "#exp.id")
    public List<Branch> findById(Experiment exp) {
         // ……
    }
    
    @CachePut(key = "#exp.id")
    public List<Branch> updateCacheByExpId(Experiment exp, List<Branch> branches) {
         // ……
    }
}
@spring-projects-issues spring-projects-issues added the status: waiting-for-triage An issue we've not yet triaged or decided on label Mar 4, 2023
bclozel (Member) commented Mar 4, 2023

Could you provide a sample application that reproduces the problem? It should be a small and simple application, ideally created with https://start.spring.io. You can share it here as a zip or a git repository. Thanks!

@bclozel bclozel added the status: waiting-for-feedback We need additional information before we can continue label Mar 4, 2023
huangxfchn (Author) replied:

> Could you provide a sample application that reproduces the problem? It should be a small and simple application, ideally created with https://start.spring.io. You can share it here as a zip or a git repository. Thanks!

Ok, wait a moment.

@spring-projects-issues spring-projects-issues added status: feedback-provided Feedback has been provided and removed status: waiting-for-feedback We need additional information before we can continue labels Mar 5, 2023
@bclozel bclozel added status: waiting-for-feedback We need additional information before we can continue and removed status: feedback-provided Feedback has been provided labels Mar 5, 2023
huangxfchn (Author) commented:

Then, use the jstack command to view the thread status. You can see that there are a lot of blocked threads. Here is one of the thread stacks:


"pool-1-thread-52" #71 prio=5 os_prio=0 cpu=328.13ms elapsed=14.74s tid=0x000001994a54f540 nid=0x785c waiting for monitor entry  [0x0000007e704ff000]
   java.lang.Thread.State: BLOCKED (on object monitor)
        at java.util.concurrent.ConcurrentHashMap.computeIfAbsent(java.base@17.0.4/ConcurrentHashMap.java:1726)
        - waiting to lock <0x0000000683cfdb58> (a java.util.concurrent.ConcurrentHashMap$Node)
        at org.springframework.cache.caffeine.CaffeineCacheManager.getCache(CaffeineCacheManager.java:191)
        at com.example.demo.CacheTest.lambda$test$0(CacheTest.java:30)
        at com.example.demo.CacheTest$$Lambda$663/0x0000000800ef9238.call(Unknown Source)
        at java.util.concurrent.FutureTask.run(java.base@17.0.4/FutureTask.java:264)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(java.base@17.0.4/ThreadPoolExecutor.java:1136)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(java.base@17.0.4/ThreadPoolExecutor.java:635)
        at java.lang.Thread.run(java.base@17.0.4/Thread.java:833)

I found this problem on both JDK 1.8 and JDK 17.

I think the CaffeineCacheManager#getCache() method could be optimized like this:

public Cache getCache(String name) {
    Cache cache = this.cacheMap.get(name);
    if (cache != null) {
        return cache;
    }
    return this.cacheMap.computeIfAbsent(name, cacheName ->
            this.dynamic ? createCaffeineCache(cacheName) : null);
}

huangxfchn (Author) commented:

The java.util.concurrent.ConcurrentHashMap#computeIfAbsent method has lock contention, especially when concurrency is very high.
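This bin-level blocking can be demonstrated with plain JDK classes. In the sketch below (all names are illustrative), two distinct keys deliberately share a hash code so they land in the same bin; while one key is being computed, a plain `get` on the other key returns immediately, but `computeIfAbsent` on it blocks until the first computation finishes. The timing thresholds are rough, not a benchmark:

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.CountDownLatch;

public class BinLockDemo {
    // Two distinct keys that deliberately collide into the same bin.
    record Key(String name) {
        @Override public int hashCode() { return 42; }
    }

    public static void main(String[] args) throws Exception {
        ConcurrentHashMap<Key, String> map = new ConcurrentHashMap<>();
        Key k1 = new Key("a"), k2 = new Key("b");
        CountDownLatch computing = new CountDownLatch(1);

        Thread writer = new Thread(() -> map.computeIfAbsent(k1, k -> {
            computing.countDown();
            sleep(1000); // hold the bin lock while "loading" k1
            return "v1";
        }));
        writer.start();
        computing.await(); // wait until the writer holds the bin lock

        long t0 = System.nanoTime();
        String miss = map.get(k2);           // lock-free read: returns at once
        long getMillis = (System.nanoTime() - t0) / 1_000_000;

        t0 = System.nanoTime();
        map.computeIfAbsent(k2, k -> "v2");  // same bin: waits for the writer
        long cifMillis = (System.nanoTime() - t0) / 1_000_000;

        writer.join();
        System.out.println("get=" + miss + " in ~" + getMillis
                + "ms, computeIfAbsent waited ~" + cifMillis + "ms");
        if (getMillis > 300 || cifMillis < 400) {
            throw new AssertionError("expected get to be fast and computeIfAbsent to block");
        }
    }

    static void sleep(long ms) {
        try { Thread.sleep(ms); } catch (InterruptedException e) { throw new RuntimeException(e); }
    }
}
```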

@spring-projects-issues spring-projects-issues added status: feedback-provided Feedback has been provided and removed status: waiting-for-feedback We need additional information before we can continue labels Mar 5, 2023
huangxfchn (Author) commented:

This problem leads to a lot of blocked threads in our production environment.

@rstoyanchev rstoyanchev added the in: core Issues in core modules (aop, beans, core, context, expression) label Mar 6, 2023
@bclozel bclozel self-assigned this Mar 8, 2023
@bclozel bclozel added type: bug A general bug and removed status: waiting-for-triage An issue we've not yet triaged or decided on status: feedback-provided Feedback has been provided labels Mar 8, 2023
@bclozel bclozel added this to the 6.0.7 milestone Mar 8, 2023
@github-actions github-actions bot added status: backported An issue that has been backported to maintenance branches and removed for: backport-to-5.3.x labels Mar 8, 2023
bclozel (Member) commented Mar 8, 2023

Thanks for the sample, @huangxfchn.

As explained in this PR comment, the Javadoc for this class seems to indicate that read operations are completely lock-free. I've reproduced the behavior you're describing even after pre-populating caches with the set of cache names.

We'll revisit the implementation there to avoid using computeIfAbsent when it's not needed, and to avoid storing null entries in the map altogether. This will be backported to 5.3.x in #30085.

Thanks for your report!

@bclozel bclozel closed this as completed in 6cd6741 Mar 8, 2023
bclozel added a commit that referenced this issue Mar 8, 2023
Prior to this commit, using a dynamic `CaffeineCacheManager` would rely
on `ConcurrentHashMap#computeIfAbsent` for retrieving and creating cache
instances as needed. It turns out that using this method concurrently
can cause lock contention even when all known cache instances are
instantiated.

This commit avoids using this method if the cache instance already
exists and avoids storing `null` entries in the map. This change reduces
lock contention and the overall HashMap size in the non-dynamic case.

See gh-30066
Fixes gh-30085
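The change described in the commit message can be sketched as a fast `get` pre-screen that only falls through to `computeIfAbsent` when the manager is dynamic, so `null` is never handed to the map. This is a simplified stand-alone analog, not the exact Spring source; `Cache` here is a toy stand-in:

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;

public class FixedGetCacheSketch {
    record Cache(String name) {}

    private final Map<String, Cache> cacheMap = new ConcurrentHashMap<>(16);
    private final boolean dynamic = true;
    final AtomicInteger creations = new AtomicInteger();

    // Fast path first: a plain get never takes the bin lock. Only fall
    // through to the locking computeIfAbsent when the cache is genuinely
    // missing and the manager is dynamic.
    public Cache getCache(String name) {
        Cache cache = this.cacheMap.get(name);
        if (cache == null && this.dynamic) {
            cache = this.cacheMap.computeIfAbsent(name, this::createCaffeineCache);
        }
        return cache;
    }

    private Cache createCaffeineCache(String name) {
        creations.incrementAndGet();
        return new Cache(name);
    }

    public static void main(String[] args) {
        FixedGetCacheSketch mgr = new FixedGetCacheSketch();
        Cache first = mgr.getCache("branch");
        Cache again = mgr.getCache("branch"); // served by the lock-free get
        if (first != again || mgr.creations.get() != 1) {
            throw new AssertionError("expected one creation and a stable instance");
        }
        System.out.println("created once, same instance on repeat lookup");
    }
}
```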
ben-manes (Contributor) commented:

FYI, for context on computeIfAbsent(), in 2014:

Performance-wise, computeIfAbsent pessimistically locks instead of optimistically trying to return an existing entry.

There are lots of tradeoffs. With the current implementation, if you are implementing a cache, it may be better to code cache.get to itself do a pre-screen, as in:
V v = map.get(key);
return (v != null) ? v : map.computeIfAbsent(key, function);

However, the exact benefit depends on access patterns.
For example, I reran your benchmark cases (urls below) on a 32way x86, and got throughputs (ops/sec) that are dramatically better with pre-screen for the case of a single key, but worse with your Zipf-distributed keys. As an intermediate test, I measured the impact of adding a single-node prescreen ("1cif") before locking inside CHM.computeIfAbsent, that is similar to what was done in some pre-release versions:

Same key
cif: 1402559
get+cif: 3775886700
1cif: 1760916148

Zipf-distributed keys
cif: 1414945003
get+cif: 882477874
1cif: 618668961

One might think (I did until running similar experiments) that the "1cif" version would be the best compromise. But currently it isn't. This is in part due to interactions with biased locking, that in some cases basically provide "free" prescreens, but in other cases add safepoint/GC pressure in addition to lock contention. This is all up for re-consideration though.

-Doug

Caffeine therefore does this prescreen as shown in the benchmarks. Doug decided on adopting 1cif in recent JDKs. He is currently pondering a per-entry lock vs the current bin lock, which should resolve this entirely.
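The get-then-computeIfAbsent pre-screen quoted above keeps the slow path atomic, so even under heavy contention the loader runs exactly once per key and every caller sees the same instance. A quick stand-alone check of that property (names are illustrative):

```java
import java.util.ArrayList;
import java.util.concurrent.Callable;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.atomic.AtomicInteger;

public class PrescreenDemo {
    public static void main(String[] args) throws Exception {
        ConcurrentHashMap<String, Object> map = new ConcurrentHashMap<>();
        AtomicInteger loads = new AtomicInteger();
        ExecutorService pool = Executors.newFixedThreadPool(16);

        // Doug's pre-screen idiom: cheap lock-free read first, atomic
        // computeIfAbsent only on a miss.
        Callable<Object> lookup = () -> {
            Object v = map.get("key");
            return (v != null) ? v : map.computeIfAbsent("key", k -> {
                loads.incrementAndGet();
                return new Object();
            });
        };

        ArrayList<Future<Object>> futures = new ArrayList<>();
        for (int i = 0; i < 100; i++) {
            futures.add(pool.submit(lookup));
        }
        Object first = futures.get(0).get();
        for (Future<Object> f : futures) {
            if (f.get() != first) throw new AssertionError("different instances observed");
        }
        pool.shutdown();
        if (loads.get() != 1) throw new AssertionError("loader ran " + loads.get() + " times");
        System.out.println("loader ran once; all 100 lookups saw the same instance");
    }
}
```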
