Commit 6a4a285

Conditionally force sequential reading in LuceneSyntheticSourceChangesSnapshot (#128473)
Change `LuceneSyntheticSourceChangesSnapshot` to force sequential stored field reading when `index.codec` is `best_compression`.

In CCR benchmarks I see that we relatively often spend a lot of time decompressing the same stored field block over and over again when the doc ids are not dense. When a seqno range is requested, the corresponding doc id list is likely to contain gaps. However, most doc ids are monotonically increasing, so not reading sequentially harms performance.

The reason we currently don't load sequentially is the logic in `StoredFieldLoader#hasSequentialDocs(...)`, which requires all requested doc ids to be in monotonically increasing order with no gaps. In the case of `LuceneSyntheticSourceChangesSnapshot` with stored field best compression, that is too conservative. In practice, we end up decompressing a stored field block for each doc id we need to synthesize source for during recovery.

I think it makes sense to do sequential reading in this case, given that it is very likely that the requested doc ids will contain many monotonically increasing ranges. Note that the requested doc ids are always sorted in ascending order (this happens in `LuceneSyntheticSourceChangesSnapshot#transformScoreDocsToRecords(...)`).
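To make the cost argument concrete, here is a minimal sketch of a simplified model, not the actual Elasticsearch or Lucene code. The class `BlockReadSketch` and all of its methods are hypothetical names introduced for illustration. It assumes a fixed block size of 2048 docs (the upper bound mentioned in the code comment of this commit), assumes a non-sequential reader decompresses a block on every lookup, and assumes a sequential reader keeps the current block decompressed until a doc from another block is requested.

```java
// Illustrative only: none of these names exist in Elasticsearch or Lucene.
final class BlockReadSketch {
    // Assumed block size, taken from the code comment in this commit (up to 2048 docs per block).
    static final int DOCS_PER_BLOCK = 2048;

    // A strict "dense" check: true only when the sorted doc ids form one gap-free run, e.g. [7, 8, 9].
    static boolean isDense(int[] sortedDocIds) {
        int n = sortedDocIds.length;
        return n > 0 && sortedDocIds[n - 1] - sortedDocIds[0] == n - 1;
    }

    // Simplified non-sequential model: every lookup decompresses the block that holds the doc.
    static int decompressionsWithoutReuse(int[] sortedDocIds) {
        return sortedDocIds.length;
    }

    // Simplified sequential model: a block is decompressed once and reused for the docs that follow in it.
    static int decompressionsWithReuse(int[] sortedDocIds) {
        int count = 0;
        int currentBlock = -1;
        for (int docId : sortedDocIds) {
            int block = docId / DOCS_PER_BLOCK;
            if (block != currentBlock) {
                currentBlock = block;
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) {
        // Ascending doc ids with gaps, as a seqno range request might produce.
        int[] docIds = {0, 1, 2, 5, 6, 9, 2048, 2050};
        System.out.println(isDense(docIds));
        System.out.println(decompressionsWithoutReuse(docIds));
        System.out.println(decompressionsWithReuse(docIds));
    }
}
```

Under these assumptions, the sample list fails the strict density check because of its gaps, yet its eight doc ids fall into only two blocks, so reusing the decompressed block cuts the work from eight decompressions to two.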
1 parent 3bc6a43 commit 6a4a285

2 files changed: +10 -1 lines changed

docs/changelog/128473.yaml

Lines changed: 5 additions & 0 deletions
@@ -0,0 +1,5 @@
+pr: 128473
+summary: Conditionally force sequential reading in `LuceneSyntheticSourceChangesSnapshot`
+area: Logs
+type: enhancement
+issues: []

server/src/main/java/org/elasticsearch/index/engine/LuceneSyntheticSourceChangesSnapshot.java

Lines changed: 5 additions & 1 deletion
@@ -17,6 +17,7 @@
 import org.apache.lucene.util.ArrayUtil;
 import org.elasticsearch.index.IndexSettings;
 import org.elasticsearch.index.IndexVersion;
+import org.elasticsearch.index.codec.CodecService;
 import org.elasticsearch.index.fieldvisitor.LeafStoredFieldLoader;
 import org.elasticsearch.index.fieldvisitor.StoredFieldLoader;
 import org.elasticsearch.index.mapper.MapperService;
@@ -85,7 +86,10 @@ public LuceneSyntheticSourceChangesSnapshot(
         this.maxMemorySizeInBytes = maxMemorySizeInBytes > 0 ? maxMemorySizeInBytes : 1;
         this.sourceLoader = mapperService.mappingLookup().newSourceLoader(null, SourceFieldMetrics.NOOP);
         Set<String> storedFields = sourceLoader.requiredStoredFields();
-        this.storedFieldLoader = StoredFieldLoader.create(false, storedFields);
+        String defaultCodec = EngineConfig.INDEX_CODEC_SETTING.get(mapperService.getIndexSettings().getSettings());
+        // zstd best compression stores upto 2048 docs in a block, so it is likely that in this case docs are co-located in same block:
+        boolean forceSequentialReader = CodecService.BEST_COMPRESSION_CODEC.equals(defaultCodec);
+        this.storedFieldLoader = StoredFieldLoader.create(false, storedFields, forceSequentialReader);
         this.lastSeenSeqNo = fromSeqNo - 1;
     }
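The decision made in the constructor can be sketched in isolation. This is a standalone approximation, not Elasticsearch code: `SequentialReaderDecision` is a hypothetical class, and a plain `Map<String, String>` stands in for the index settings object. The setting key `index.codec` and the value `best_compression` come from the commit message; the fallback value `default` is an assumption about what applies when the setting is absent.

```java
import java.util.Map;

// Illustrative stand-in for the settings lookup in the diff; not Elasticsearch code.
final class SequentialReaderDecision {
    static final String CODEC_SETTING_KEY = "index.codec";
    // Assumed fallback when the setting is not configured.
    static final String DEFAULT_CODEC = "default";
    static final String BEST_COMPRESSION_CODEC = "best_compression";

    // Mirrors the constant-first equals() comparison used in the diff,
    // so the comparison never dereferences the looked-up value.
    static boolean forceSequentialReader(Map<String, String> indexSettings) {
        String codec = indexSettings.getOrDefault(CODEC_SETTING_KEY, DEFAULT_CODEC);
        return BEST_COMPRESSION_CODEC.equals(codec);
    }

    public static void main(String[] args) {
        System.out.println(forceSequentialReader(Map.of("index.codec", "best_compression")));
        System.out.println(forceSequentialReader(Map.of()));
    }
}
```

In this sketch only an index explicitly configured with `best_compression` opts into sequential reading; every other codec value keeps the previous behavior, which matches the conditional nature of the change.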
