-
Notifications
You must be signed in to change notification settings - Fork 25.3k
Flag in _field_caps to return only fields with values in index #103651
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Changes from 51 commits
c8af341
e17fe2c
feccdc7
5ad8b99
ed3c3c6
45d892c
21061bb
e92e9dc
838ebbd
f5bfd31
4fc26d6
f886135
47af25e
683459f
597f5a8
752d31e
a21d6fa
c15dd78
d6e9e65
8d1fa0a
58ab703
c19d573
76e5060
d53c56d
60d3731
322d65d
6e90f58
5a91222
4f4ef65
8f37313
5add85b
c1abe2b
d2332bd
a500abd
62759c6
39821d9
7999611
9c97ee5
734a72a
6da0369
ca6893a
b6a7dee
db70f45
e6c528d
cb2669b
a16e6ba
cc4edd3
75bea68
10ec350
8f848b8
774a21a
fc4c198
b860d7e
7d17ca3
9713967
c22377d
fdfbe3b
18e987b
31706c2
09bc9a6
ed68390
49c6d87
341cd7c
8f8e623
9d1fd93
25c906f
7c768c8
da717bd
578677d
72c972e
001e696
513083e
da846a6
9547296
653a8f0
50d8a8f
0cc060c
8fa1d0d
06b1641
f30daaf
f290e71
e1f86ea
3ab7529
a5c269d
18c4353
5a75987
1bf7d39
f5ae465
54edf53
09e6d92
2f0ac60
8fa1b6e
fa390e4
6f3ed17
1339277
44904c8
29b93cd
8156756
a7b64c4
fe5245d
7391105
482c11f
d9cd4f8
9498dce
57de9f7
8868e4d
f2db1cd
c0f453d
5cef376
daf034f
5c8ea58
4555a0e
be7483f
058c6c8
b1aeb86
4789d78
2e8f7b7
8115dcd
a4e2d8d
b6d667b
4a2d503
be4e91a
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,5 @@ | ||
pr: 103651 | ||
summary: Flag in `_field_caps` to return only fields with values in index | ||
area: Search | ||
type: enhancement | ||
issues: [] |
Large diffs are not rendered by default.
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -12,6 +12,7 @@ | |
import org.elasticsearch.core.Nullable; | ||
import org.elasticsearch.index.IndexService; | ||
import org.elasticsearch.index.engine.Engine; | ||
import org.elasticsearch.index.mapper.ConstantFieldType; | ||
import org.elasticsearch.index.mapper.MappedFieldType; | ||
import org.elasticsearch.index.mapper.RuntimeField; | ||
import org.elasticsearch.index.query.MatchAllQueryBuilder; | ||
|
@@ -39,10 +40,12 @@ | |
*/ | ||
class FieldCapabilitiesFetcher { | ||
private final IndicesService indicesService; | ||
private final boolean includeFieldsWithNoValue; | ||
private final Map<String, Map<String, IndexFieldCapabilities>> indexMappingHashToResponses = new HashMap<>(); | ||
|
||
FieldCapabilitiesFetcher(IndicesService indicesService) { | ||
FieldCapabilitiesFetcher(IndicesService indicesService, boolean includeFieldsWithNoValue) { | ||
this.indicesService = indicesService; | ||
this.includeFieldsWithNoValue = includeFieldsWithNoValue; | ||
} | ||
|
||
FieldCapabilitiesIndexResponse fetch( | ||
|
@@ -100,7 +103,19 @@ private FieldCapabilitiesIndexResponse doFetch( | |
} | ||
|
||
final MappingMetadata mapping = indexService.getMetadata().mapping(); | ||
final String indexMappingHash = mapping != null ? mapping.getSha256() : null; | ||
String indexMappingHash; | ||
if (includeFieldsWithNoValue) { | ||
indexMappingHash = mapping != null ? mapping.getSha256() : null; | ||
} else { | ||
// even if the mapping is the same if we return only fields with values we need | ||
// to make sure that we consider all the shard-mappings pair, that is why we | ||
// calculate a different hash for this particular case. | ||
StringBuilder sb = new StringBuilder(indexService.getShard(shardId.getId()).getShardUuid()); | ||
if (mapping != null) { | ||
sb.append(mapping.getSha256()); | ||
} | ||
indexMappingHash = sb.toString(); | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I'd like to understand how badly this affects the CCS optimizations around de-duplicating mappings. Are we going back to duplicating mappings for the sake of honouring the flag? Is this a reason why we may want to consider incorporating the flag in the mappings perhaps? There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I'll test it out and get back to you, it seems that There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. This do not affect CCS and de-duplicating is still working fine. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. This assessment does not sound accurate to me. The mapping hash is used to deduplicate mappings, and decide to send back one mapping per hash instead one mapping per index. But the whole mapping needs to be sent back? Using the flag, the index mapping hash deduplication is entirely disabled. We do have now field level deduplication across indices though, and the benefit that not returning empty fields will make responses smaller and decrease overhead on the merging side. But again, it is an entirely different trade-off and we need to measure the impact in the worst case scenario, compared to the previous index mapping hash based deduplication. Makes sense? There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. That is correct, my assessment was wrongly assuming that who different index would always have two different mapping. |
||
} | ||
if (indexMappingHash != null) { | ||
final Map<String, IndexFieldCapabilities> existing = indexMappingHashToResponses.get(indexMappingHash); | ||
if (existing != null) { | ||
|
@@ -114,7 +129,9 @@ private FieldCapabilitiesIndexResponse doFetch( | |
fieldNameFilter, | ||
filters, | ||
fieldTypes, | ||
fieldPredicate | ||
fieldPredicate, | ||
indicesService.getShardOrNull(shardId), | ||
includeFieldsWithNoValue | ||
); | ||
if (indexMappingHash != null) { | ||
indexMappingHashToResponses.put(indexMappingHash, responseMap); | ||
|
@@ -127,7 +144,9 @@ static Map<String, IndexFieldCapabilities> retrieveFieldCaps( | |
Predicate<String> fieldNameFilter, | ||
String[] filters, | ||
String[] types, | ||
Predicate<String> indexFieldfilter | ||
Predicate<String> indexFieldfilter, | ||
IndexShard indexShard, | ||
boolean includeFieldsWithNoValue | ||
) { | ||
boolean includeParentObjects = checkIncludeParents(filters); | ||
|
||
|
@@ -139,7 +158,8 @@ static Map<String, IndexFieldCapabilities> retrieveFieldCaps( | |
continue; | ||
} | ||
MappedFieldType ft = context.getFieldType(field); | ||
if (filter.test(ft)) { | ||
boolean includeField = includeFieldsWithNoValue || indexShard.fieldHasValue(ft.name()) || ft instanceof ConstantFieldType; | ||
if (includeField && filter.test(ft)) { | ||
IndexFieldCapabilities fieldCap = new IndexFieldCapabilities( | ||
field, | ||
ft.familyTypeName(), | ||
|
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -163,6 +163,15 @@ private void doExecuteForked(Task task, FieldCapabilitiesRequest request, final | |
resp = new FieldCapabilitiesIndexResponse(resp.getIndexName(), curr.getIndexMappingHash(), curr.get(), true); | ||
} | ||
} | ||
indexResponses.merge(resp.getIndexName(), resp, (a, b) -> { | ||
piergm marked this conversation as resolved.
Show resolved
Hide resolved
|
||
if (request.includeFieldsWithNoValue() || a.get().equals(b.get())) { | ||
return a; | ||
} | ||
Map<String, IndexFieldCapabilities> mergedCaps = new HashMap<>(a.get()); | ||
mergedCaps.putAll(b.get()); | ||
return new FieldCapabilitiesIndexResponse(a.getIndexName(), a.getIndexMappingHash(), mergedCaps, a.canMatch()); | ||
piergm marked this conversation as resolved.
Show resolved
Hide resolved
|
||
|
||
piergm marked this conversation as resolved.
Show resolved
Hide resolved
|
||
}); | ||
indexResponses.putIfAbsent(resp.getIndexName(), resp); | ||
if (fieldCapTask.isCancelled()) { | ||
releaseResourcesOnCancel.run(); | ||
|
@@ -511,7 +520,7 @@ public void messageReceived(FieldCapabilitiesNodeRequest request, TransportChann | |
final Map<String, List<ShardId>> groupedShardIds = request.shardIds() | ||
.stream() | ||
.collect(Collectors.groupingBy(ShardId::getIndexName)); | ||
final FieldCapabilitiesFetcher fetcher = new FieldCapabilitiesFetcher(indicesService); | ||
final FieldCapabilitiesFetcher fetcher = new FieldCapabilitiesFetcher(indicesService, request.includeFieldsWithNoValue()); | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I don't recall anymore: we said that the feature requires going to all shards, like index_filter does? Where do we make that decision? What happens if you provide the new flag without an index_filter? There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. My understanding is that all the _field_caps requests are going to all the shards. The current behaviour is to send from the coordinating node one request per index to all the data nodes which in turn sends a request to all the shards that have such index. Just the first response from a shard for a particular index is then accepted in a data node and one node response for each index is accepted in the coordinating node while all the other pending requests are cancelled or discarded. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. @original-brownbear is what I said correct? Answering your question:
I have tests for requests with and without index_filter and the behaviour is consistent in both cases. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Maybe your tests don't have enough data nodes or shards to see issues, but I am pretty sure that field caps without index filter will pick one shard per index, instead of all shards for every index. Can you double check? There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. After further analysis all code paths can go to all shards and all but my code path (return non-empty-fields only) breaks out of this loop when they find the first canMatch shard. When no index_filter is passed the first responding shard is always canMatch and breaks directly (unless there is an error), while when an index_filter is provided that could not be the case and the loop iterates to other shards until one returns a canMatch. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. cool thanks for the update. Could you check that we have test coverage about the new flag used in combination with index_filter, as well as not? Thanks! There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. In |
||
final Predicate<String> fieldNameFilter = Regex.simpleMatcher(request.fields()); | ||
for (List<ShardId> shardIds : groupedShardIds.values()) { | ||
final Map<ShardId, Exception> failures = new HashMap<>(); | ||
|
@@ -532,7 +541,9 @@ public void messageReceived(FieldCapabilitiesNodeRequest request, TransportChann | |
unmatched.clear(); | ||
failures.clear(); | ||
piergm marked this conversation as resolved.
Show resolved
Hide resolved
|
||
allResponses.add(response); | ||
break; | ||
if (request.includeFieldsWithNoValue()) { | ||
break; | ||
} | ||
} else { | ||
unmatched.add(shardId); | ||
} | ||
|
Uh oh!
There was an error while loading. Please reload this page.