CaffeineCacheManager getCache method cause thread block #30066

Closed
huangxfchn opened this issue Mar 4, 2023 · 7 comments
Labels: in: core (issues in core modules: aop, beans, core, context, expression) · status: backported (backported to maintenance branches) · type: bug (a general bug)


huangxfchn commented Mar 4, 2023

  • spring-context-support 5.3.18
  • JDK version: 17.0.4

Here is my thread stack; 193 threads are blocked:

[image: screenshot of the jstack thread dump]

CaffeineCacheManager.java

public Cache getCache(String name) {
    return this.cacheMap.computeIfAbsent(name, cacheName ->
            this.dynamic ? createCaffeineCache(cacheName) : null);
}
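For context on why this is costly in the non-dynamic case: `ConcurrentHashMap#computeIfAbsent` never stores a mapping when the function returns `null`, so every lookup of an unknown cache name goes through the locking compute path again. A minimal stand-alone sketch (plain JDK, no Spring) illustrating that behavior:

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;

public class NullComputeDemo {
    public static void main(String[] args) {
        ConcurrentHashMap<String, String> map = new ConcurrentHashMap<>();
        AtomicInteger invocations = new AtomicInteger();

        // Simulates the non-dynamic manager: the mapping function returns
        // null for unknown names, so nothing is ever cached for them.
        for (int i = 0; i < 3; i++) {
            map.computeIfAbsent("unknownCache", name -> {
                invocations.incrementAndGet();
                return null; // null result => no entry is stored
            });
        }

        // The function ran on every call, and the map is still empty,
        // meaning every lookup paid the compute (bin-locking) path.
        if (invocations.get() != 3 || !map.isEmpty()) {
            throw new AssertionError("unexpected state");
        }
        System.out.println("invocations=" + invocations.get() + ", size=" + map.size());
    }
}
```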

Here is my code:

@CacheConfig(cacheNames = "branch")
public class BranchServiceImpl {
    @Cacheable(key = "#exp.id")
    public List<Branch> findById(Experiment exp) {
         // ……
    }
    
    @CachePut(key = "#exp.id")
    public List<Branch> updateCacheByExpId(Experiment exp, List<Branch> branches) {
         // ……
    }
}
@spring-projects-issues spring-projects-issues added the status: waiting-for-triage An issue we've not yet triaged or decided on label Mar 4, 2023
bclozel (Member) commented Mar 4, 2023

Could you provide a sample application that reproduces the problem? It should be a small and simple application, ideally created with https://start.spring.io. You can share it here as a zip or a git repository. Thanks!

@bclozel bclozel added the status: waiting-for-feedback We need additional information before we can continue label Mar 4, 2023
huangxfchn (Author) replied:

> Could you provide a sample application that reproduces the problem? It should be a small and simple application, ideally created with https://start.spring.io. You can share it here as a zip or a git repository. Thanks!

Ok, wait a moment.

@spring-projects-issues spring-projects-issues added status: feedback-provided Feedback has been provided and removed status: waiting-for-feedback We need additional information before we can continue labels Mar 5, 2023
@bclozel bclozel added status: waiting-for-feedback We need additional information before we can continue and removed status: feedback-provided Feedback has been provided labels Mar 5, 2023
huangxfchn (Author) commented:

Then, use the jstack command to view the thread status. You can see that there are a lot of blocked threads. Here is one of the thread stacks:


"pool-1-thread-52" #71 prio=5 os_prio=0 cpu=328.13ms elapsed=14.74s tid=0x000001994a54f540 nid=0x785c waiting for monitor entry  [0x0000007e704ff000]
   java.lang.Thread.State: BLOCKED (on object monitor)
        at java.util.concurrent.ConcurrentHashMap.computeIfAbsent(java.base@17.0.4/ConcurrentHashMap.java:1726)
        - waiting to lock <0x0000000683cfdb58> (a java.util.concurrent.ConcurrentHashMap$Node)
        at org.springframework.cache.caffeine.CaffeineCacheManager.getCache(CaffeineCacheManager.java:191)
        at com.example.demo.CacheTest.lambda$test$0(CacheTest.java:30)
        at com.example.demo.CacheTest$$Lambda$663/0x0000000800ef9238.call(Unknown Source)
        at java.util.concurrent.FutureTask.run(java.base@17.0.4/FutureTask.java:264)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(java.base@17.0.4/ThreadPoolExecutor.java:1136)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(java.base@17.0.4/ThreadPoolExecutor.java:635)
        at java.lang.Thread.run(java.base@17.0.4/Thread.java:833)

I found this problem on both JDK 1.8 and JDK 17.

I think the CaffeineCacheManager#getCache() method could be optimized like this:

public Cache getCache(String name) {
    Cache cache = this.cacheMap.get(name);
    if (cache != null) {
        return cache;
    }
    return this.cacheMap.computeIfAbsent(name, cacheName ->
            this.dynamic ? createCaffeineCache(cacheName) : null);
}

huangxfchn (Author) commented:

The java.util.concurrent.ConcurrentHashMap#computeIfAbsent method has lock contention, especially when concurrency is very high.
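This bin-level blocking can be demonstrated with plain JDK classes. In the sketch below (all names are illustrative), two distinct keys deliberately share a hash code so they land in the same bin; while one key is being computed, a plain `get` on the other key returns immediately, but `computeIfAbsent` on it blocks until the first computation finishes. The timing thresholds are rough, not a benchmark:

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.CountDownLatch;

public class BinLockDemo {
    // Two distinct keys that deliberately collide into the same bin.
    record Key(String name) {
        @Override public int hashCode() { return 42; }
    }

    public static void main(String[] args) throws Exception {
        ConcurrentHashMap<Key, String> map = new ConcurrentHashMap<>();
        Key k1 = new Key("a"), k2 = new Key("b");
        CountDownLatch computing = new CountDownLatch(1);

        Thread writer = new Thread(() -> map.computeIfAbsent(k1, k -> {
            computing.countDown();
            sleep(1000); // hold the bin lock while "loading" k1
            return "v1";
        }));
        writer.start();
        computing.await(); // wait until the writer holds the bin lock

        long t0 = System.nanoTime();
        String miss = map.get(k2);           // lock-free read: returns at once
        long getMillis = (System.nanoTime() - t0) / 1_000_000;

        t0 = System.nanoTime();
        map.computeIfAbsent(k2, k -> "v2");  // same bin: waits for the writer
        long cifMillis = (System.nanoTime() - t0) / 1_000_000;

        writer.join();
        System.out.println("get=" + miss + " in ~" + getMillis
                + "ms, computeIfAbsent waited ~" + cifMillis + "ms");
        if (getMillis > 300 || cifMillis < 400) {
            throw new AssertionError("expected get to be fast and computeIfAbsent to block");
        }
    }

    static void sleep(long ms) {
        try { Thread.sleep(ms); } catch (InterruptedException e) { throw new RuntimeException(e); }
    }
}
```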

@spring-projects-issues spring-projects-issues added status: feedback-provided Feedback has been provided and removed status: waiting-for-feedback We need additional information before we can continue labels Mar 5, 2023
huangxfchn (Author) commented:

This problem leads to a lot of blocked threads in our production environment.

@rstoyanchev rstoyanchev added the in: core Issues in core modules (aop, beans, core, context, expression) label Mar 6, 2023
@bclozel bclozel self-assigned this Mar 8, 2023
@bclozel bclozel added type: bug A general bug and removed status: waiting-for-triage An issue we've not yet triaged or decided on status: feedback-provided Feedback has been provided labels Mar 8, 2023
@bclozel bclozel added this to the 6.0.7 milestone Mar 8, 2023
@github-actions github-actions bot added status: backported An issue that has been backported to maintenance branches and removed for: backport-to-5.3.x labels Mar 8, 2023
bclozel (Member) commented Mar 8, 2023

Thanks for the sample, @huangxfchn.

As explained in this PR comment, the Javadoc for this class seems to indicate that read operations are completely lock-free. I've reproduced the behavior you're describing even after pre-populating caches with the set of cache names.

We'll revisit the implementation there to avoid using computeIfAbsent when it's not needed, and to avoid storing null entries in the map altogether. This will be backported to 5.3.x in #30085.

Thanks for your report!

@bclozel bclozel closed this as completed in 6cd6741 Mar 8, 2023
bclozel added a commit that referenced this issue Mar 8, 2023
Prior to this commit, using a dynamic `CaffeineCacheManager` would rely
on `ConcurrentHashMap#computeIfAbsent` for retrieving and creating cache
instances as needed. It turns out that using this method concurrently
can cause lock contention even when all known cache instances are
instantiated.

This commit avoids using this method if the cache instance already
exists and avoids storing `null` entries in the map. This change reduces
lock contention and the overall HashMap size in the non-dynamic case.

See gh-30066
Fixes gh-30085
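The change described in the commit message can be sketched as a fast `get` pre-screen that only falls through to `computeIfAbsent` when the manager is dynamic, so `null` is never handed to the map. This is a simplified stand-alone analog, not the exact Spring source; `Cache` here is a toy stand-in:

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;

public class FixedGetCacheSketch {
    record Cache(String name) {}

    private final Map<String, Cache> cacheMap = new ConcurrentHashMap<>(16);
    private final boolean dynamic = true;
    final AtomicInteger creations = new AtomicInteger();

    // Fast path first: a plain get never takes the bin lock. Only fall
    // through to the locking computeIfAbsent when the cache is genuinely
    // missing and the manager is dynamic.
    public Cache getCache(String name) {
        Cache cache = this.cacheMap.get(name);
        if (cache == null && this.dynamic) {
            cache = this.cacheMap.computeIfAbsent(name, this::createCaffeineCache);
        }
        return cache;
    }

    private Cache createCaffeineCache(String name) {
        creations.incrementAndGet();
        return new Cache(name);
    }

    public static void main(String[] args) {
        FixedGetCacheSketch mgr = new FixedGetCacheSketch();
        Cache first = mgr.getCache("branch");
        Cache again = mgr.getCache("branch"); // served by the lock-free get
        if (first != again || mgr.creations.get() != 1) {
            throw new AssertionError("expected one creation and a stable instance");
        }
        System.out.println("created once, same instance on repeat lookup");
    }
}
```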
ben-manes (Contributor) commented:

FYI, for context on computeIfAbsent(), in 2014:

Performance-wise, computeIfAbsent pessimistically locks instead of optimistically trying to return an existing entry.

There are lots of tradeoffs. With the current implementation, if you are implementing a cache, it may be better to code cache.get to itself do a pre-screen, as in:
V v = map.get(key);
return (v != null) ? v : map.computeIfAbsent(key, function);

However, the exact benefit depends on access patterns.
For example, I reran your benchmark cases (urls below) on a 32way x86, and got throughputs (ops/sec) that are dramatically better with pre-screen for the case of a single key, but worse with your Zipf-distributed keys. As an intermediate test, I measured the impact of adding a single-node prescreen ("1cif") before locking inside CHM.computeIfAbsent, that is similar to what was done in some pre-release versions:

Same key
cif: 1402559
get+cif: 3775886700
1cif: 1760916148

Zipf-distributed keys
cif: 1414945003
get+cif: 882477874
1cif: 618668961

One might think (I did until running similar experiments) that the "1cif" version would be the best compromise. But currently it isn't. This is in part due to interactions with biased locking, that in some cases basically provide "free" prescreens, but in other cases add safepoint/GC pressure in addition to lock contention. This is all up for re-consideration though.

-Doug

Caffeine therefore does this prescreen as shown in the benchmarks. Doug decided on adopting 1cif in recent JDKs. He is currently pondering a per-entry lock vs the current bin lock, which should resolve this entirely.
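The get-then-computeIfAbsent pre-screen quoted above keeps the slow path atomic, so even under heavy contention the loader runs exactly once per key and every caller sees the same instance. A quick stand-alone check of that property (names are illustrative):

```java
import java.util.ArrayList;
import java.util.concurrent.Callable;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.atomic.AtomicInteger;

public class PrescreenDemo {
    public static void main(String[] args) throws Exception {
        ConcurrentHashMap<String, Object> map = new ConcurrentHashMap<>();
        AtomicInteger loads = new AtomicInteger();
        ExecutorService pool = Executors.newFixedThreadPool(16);

        // Doug's pre-screen idiom: cheap lock-free read first, atomic
        // computeIfAbsent only on a miss.
        Callable<Object> lookup = () -> {
            Object v = map.get("key");
            return (v != null) ? v : map.computeIfAbsent("key", k -> {
                loads.incrementAndGet();
                return new Object();
            });
        };

        ArrayList<Future<Object>> futures = new ArrayList<>();
        for (int i = 0; i < 100; i++) {
            futures.add(pool.submit(lookup));
        }
        Object first = futures.get(0).get();
        for (Future<Object> f : futures) {
            if (f.get() != first) throw new AssertionError("different instances observed");
        }
        pool.shutdown();
        if (loads.get() != 1) throw new AssertionError("loader ran " + loads.get() + " times");
        System.out.println("loader ran once; all 100 lookups saw the same instance");
    }
}
```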
