Flag in _field_caps to return only fields with values in index #103651

piergm · 2023-12-21T14:35:31Z

We are adding a query parameter to the field_caps API to filter out fields with no values.
The parameter is called include_empty_fields and defaults to true; if set to false, it
filters out of the field_caps response all fields that have no value in the index.
We keep track of FieldInfos during refresh in order to know which fields have a value in an index.
We also added a system property, es.field_caps_empty_fields_filter, to disable this feature if needed.
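
For readers who want to try the parameter, here is a minimal sketch of how a request URI could be assembled. The class `FieldCapsRequestUri`, its `build` method, the host `http://localhost:9200` and the index name `my-index` are all assumptions made for illustration; only the `_field_caps` endpoint, the `fields` parameter and `include_empty_fields` come from this PR and the existing API.

```java
import java.net.URI;
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

// Hypothetical helper for illustration only; not part of Elasticsearch or its clients.
final class FieldCapsRequestUri {

    // Builds the URI for a GET request against <index>/_field_caps.
    // includeEmptyFields=false asks the cluster to omit fields that have no value.
    static URI build(String baseUrl, String index, String fields, boolean includeEmptyFields) {
        String query = "fields=" + URLEncoder.encode(fields, StandardCharsets.UTF_8)
            + "&include_empty_fields=" + includeEmptyFields;
        return URI.create(baseUrl + "/" + index + "/_field_caps?" + query);
    }

    public static void main(String[] args) {
        // Assumed local node and index name.
        System.out.println(build("http://localhost:9200", "my-index", "*", false));
    }
}
```

Note that the two query parameters are joined with `&`. Leaving `include_empty_fields` out, or setting it to true, keeps the previous behaviour of returning every mapped field.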

elasticsearchmachine · 2023-12-21T14:44:14Z

Hi @piergm, I've created a changelog YAML for you.

kertal · 2024-02-07T18:58:40Z

A quick question: in a CCS scenario where old CCS clusters are targeted and only the requesting machine supports include_empty_fields (set to false in the request), the CCS clusters return all fields, including the empty ones, right? So all CCS clusters need to support include_empty_fields to get the desired result of excluding empty fields. Apart from that, such a request should still work, with no error thrown, even though the CCS clusters haven't heard of include_empty_fields before?

piergm · 2024-02-08T07:28:18Z

@kertal, yes, in CCS, if we target ES versions prior to 8.13, the flag is not considered, so those non-updated clusters return all fields, empty and non-empty. That said, the response itself is backwards compatible (bwc) and can be handled by the updated cluster, so we do not return an error for this mismatch; instead, we also return empty fields from the non-updated clusters.
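
The behaviour described in this reply can be modelled with a small toy example. This is only a sketch under stated assumptions: `CcsFieldCapsSketch`, `Cluster`, `respond` and `merge` are hypothetical names that do not exist in Elasticsearch, and the real merging of field capabilities is considerably more involved.

```java
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Toy model of the CCS behaviour; none of these types exist in Elasticsearch.
final class CcsFieldCapsSketch {

    // A cluster: its fields (name -> has at least one value) and whether it understands the flag.
    static final class Cluster {
        final Map<String, Boolean> fields;
        final boolean supportsFlag;

        Cluster(Map<String, Boolean> fields, boolean supportsFlag) {
            this.fields = fields;
            this.supportsFlag = supportsFlag;
        }

        // Field names this cluster reports. A cluster that predates the flag ignores it.
        Set<String> respond(boolean includeEmptyFields) {
            boolean filtering = supportsFlag && !includeEmptyFields;
            Set<String> out = new TreeSet<>();
            for (Map.Entry<String, Boolean> e : fields.entrySet()) {
                if (!filtering || e.getValue()) {
                    out.add(e.getKey());
                }
            }
            return out;
        }
    }

    // Union of the per-cluster responses; a cluster that ignores the flag causes no error.
    static Set<String> merge(List<Cluster> clusters, boolean includeEmptyFields) {
        Set<String> merged = new TreeSet<>();
        for (Cluster c : clusters) {
            merged.addAll(c.respond(includeEmptyFields));
        }
        return merged;
    }

    public static void main(String[] args) {
        Cluster updated = new Cluster(Map.of("message", true, "unused_a", false), true);
        Cluster old = new Cluster(Map.of("message", true, "unused_b", false), false);
        System.out.println(merge(List.of(updated, old), false)); // prints [message, unused_b]
    }
}
```

In this model the empty field `unused_a` is dropped by the updated cluster, while `unused_b` still appears because the older cluster never applies the filter.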

kertal · 2024-02-08T07:43:38Z

thx for the confirmation @piergm 👍

javanna

I left some cosmetic comments, mostly around testing refinements. LGTM otherwise.

...er-extras/src/test/java/org/elasticsearch/index/mapper/extras/FieldCapsRankFeatureTests.java

test/framework/src/main/java/org/elasticsearch/index/mapper/FieldTypeTestCase.java

test/framework/src/main/java/org/elasticsearch/index/mapper/MapperServiceTestCase.java

server/src/main/java/org/elasticsearch/index/shard/IndexShard.java

javanna · 2024-02-08T10:41:45Z

server/src/main/java/org/elasticsearch/action/fieldcaps/TransportFieldCapabilitiesAction.java

@@ -519,7 +531,7 @@ public void messageReceived(FieldCapabilitiesNodeRequest request, TransportChann
                final Map<String, List<ShardId>> groupedShardIds = request.shardIds()
                    .stream()
                    .collect(Collectors.groupingBy(ShardId::getIndexName));
-                final FieldCapabilitiesFetcher fetcher = new FieldCapabilitiesFetcher(indicesService);
+                final FieldCapabilitiesFetcher fetcher = new FieldCapabilitiesFetcher(indicesService, request.includeFieldsWithNoValue());


Cool, thanks for the update. Could you check that we have test coverage for the new flag both with and without index_filter? Thanks!

javanna · 2024-02-08T10:45:15Z

server/src/main/java/org/elasticsearch/action/fieldcaps/FieldCapabilitiesFetcher.java

+            if (mapping != null) {
+                sb.append(mapping.getSha256());
+            }
+            indexMappingHash = sb.toString();


This assessment does not sound accurate to me. The mapping hash is used to deduplicate mappings, and to decide to send back one mapping per hash instead of one mapping per index. But the whole mapping needs to be sent back?

When the flag is used, the index mapping hash deduplication is entirely disabled. We do now have field-level deduplication across indices, though, and the benefit that not returning empty fields makes responses smaller and decreases overhead on the merging side. But again, it is an entirely different trade-off, and we need to measure the impact in the worst-case scenario compared to the previous deduplication based on the index mapping hash. Makes sense?
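
To make the trade-off concrete, here is a rough sketch of what deduplication by mapping hash means: indices whose mappings hash to the same value share one group, so only one mapping per group has to be sent back. The class and method names are hypothetical and mappings are modelled as plain strings; this is not the logic in `FieldCapabilitiesFetcher`.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Illustrative only: a simplified model of one-mapping-per-hash deduplication.
final class MappingHashDedupSketch {

    // Hex-encoded SHA-256 of a mapping source, standing in for the real mapping hash.
    static String sha256Hex(String source) {
        try {
            byte[] digest = MessageDigest.getInstance("SHA-256")
                .digest(source.getBytes(StandardCharsets.UTF_8));
            StringBuilder sb = new StringBuilder();
            for (byte b : digest) {
                sb.append(String.format("%02x", b));
            }
            return sb.toString();
        } catch (NoSuchAlgorithmException e) {
            // SHA-256 must be supported by every Java platform implementation.
            throw new IllegalStateException(e);
        }
    }

    // Groups index names by mapping hash, so one mapping per group can be returned.
    static Map<String, List<String>> groupByMappingHash(Map<String, String> mappingByIndex) {
        Map<String, List<String>> groups = new LinkedHashMap<>();
        for (Map.Entry<String, String> e : mappingByIndex.entrySet()) {
            groups.computeIfAbsent(sha256Hex(e.getValue()), k -> new ArrayList<>()).add(e.getKey());
        }
        return groups;
    }

    public static void main(String[] args) {
        Map<String, String> mappings = new LinkedHashMap<>();
        mappings.put("logs-1", "{\"message\":\"text\"}");
        mappings.put("logs-2", "{\"message\":\"text\"}");
        mappings.put("metrics-1", "{\"cpu\":\"double\"}");
        // Three indices, but only two distinct mappings to send back.
        System.out.println(groupByMappingHash(mappings).values());
    }
}
```

Once empty fields are filtered out, two indices with identical mappings can still report different field sets, which is why this kind of grouping no longer applies and deduplication moves to the field level.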

original-brownbear · 2024-02-08T15:53:18Z

Performance-wise this is just fine for the non-CCS case. The logic seems optimized enough at this point that serialisation of the response is the most important aspect. I could not easily reproduce a case where the flag had much measurable overhead (maybe a couple of percent when all fields have data, but that's it; not a crazy scientific benchmark run here, but still :)).

E.g.

Server Software:        
Server Hostname:        elasticsearch-4
Server Port:            9200
SSL/TLS Protocol:       TLSv1.2,ECDHE-RSA-AES256-GCM-SHA384,2048,256
TLS Server Name:        elasticsearch-4

Document Path:          /_field_caps?fields=*
Document Length:        2067221 bytes

Concurrency Level:      1
Time taken for tests:   39.806 seconds
Complete requests:      100
Failed requests:        0
Total transferred:      206733400 bytes
HTML transferred:       206722100 bytes
Requests per second:    2.51 [#/sec] (mean)
Time per request:       398.058 [ms] (mean)
Time per request:       398.058 [ms] (mean, across all concurrent requests)
Transfer rate:          5071.83 [Kbytes/sec] received

Connection Times (ms)
              min  mean[+/-sd] median   max
Connect:        7    7   0.1      7       7
Processing:   376  391  19.1    386     516
Waiting:      371  386  19.1    382     511
Total:        383  398  19.1    393     523

Percentage of the requests served within a certain time (ms)
  50%    393
  66%    395
  75%    398
  80%    400
  90%    411
  95%    440
  98%    478
  99%    523
 100%    523 (longest request)

vs


Server Software:        
Server Hostname:        elasticsearch-4
Server Port:            9200
SSL/TLS Protocol:       TLSv1.2,ECDHE-RSA-AES256-GCM-SHA384,2048,256
TLS Server Name:        elasticsearch-4

Document Path:          /_field_caps?fields=*?include_empty_fields=false
Document Length:        1891399 bytes

Concurrency Level:      1
Time taken for tests:   39.314 seconds
Complete requests:      100
Failed requests:        0
Total transferred:      189151200 bytes
HTML transferred:       189139900 bytes
Requests per second:    2.54 [#/sec] (mean)
Time per request:       393.138 [ms] (mean)
Time per request:       393.138 [ms] (mean, across all concurrent requests)
Transfer rate:          4698.55 [Kbytes/sec] received

Connection Times (ms)
              min  mean[+/-sd] median   max
Connect:        7    7   0.1      7       8
Processing:   372  386  16.9    380     484
Waiting:      369  383  16.9    377     481
Total:        379  393  17.0    387     491

Percentage of the requests served within a certain time (ms)
  50%    387
  66%    395
  75%    401
  80%    402
  90%    411
  95%    425
  98%    452
  99%    491
 100%    491 (longest request)
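
As a quick sanity check on the two runs above, the relative differences can be computed directly from the reported numbers (Document Length and mean Time per request). The class name is hypothetical; the figures are copied from the ApacheBench output.

```java
import java.util.Locale;

// Small arithmetic helper for comparing the two benchmark runs; not part of the PR.
final class FieldCapsBenchmarkDelta {

    // Relative reduction of withFlag compared to baseline, in percent.
    static double percentReduction(double baseline, double withFlag) {
        return (baseline - withFlag) / baseline * 100.0;
    }

    public static void main(String[] args) {
        double bytes = percentReduction(2_067_221, 1_891_399); // Document Length in bytes
        double meanMs = percentReduction(398.058, 393.138);    // mean Time per request in ms
        System.out.println(String.format(Locale.ROOT, "response size: %.1f%% smaller", bytes));
        System.out.println(String.format(Locale.ROOT, "mean latency: %.1f%% lower", meanMs));
    }
}
```

With these inputs the response is about 8.5% smaller and the mean latency about 1.2% lower, a difference that is small next to the 17 to 19 ms standard deviation reported for each run.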

original-brownbear

Did not re-read everything again, but performance is fine in my testing and a quick read of the code looks just fine still -> LGTM

kertal · 2024-02-08T16:53:59Z

🎉 👋 🙇 !

ninoslavmiskovic · 2024-02-08T17:05:31Z

Amazing 🤩 great job all 👍

elasticsearchmachine · 2024-02-09T14:34:44Z

@piergm according to this PR's labels, I need to update the changelog YAML, but I can't because the PR is closed. Please either update the changelog yourself on the appropriate branch, or adjust the labels. Specifically:

The PR is labelled release highlight but the changelog has no highlight section

…d of custom query. (#178699) ## Summary Part of #178606. As of elastic/elasticsearch#103651 there is a new field caps option `include_empty_fields`. This PR updates AIOps Log Rate Analysis to make use of this option instead of a custom query and code that identified populated fields. ### Checklist - [x] [Unit or functional tests](https://www.elastic.co/guide/en/kibana/master/development-tests.html) were updated or added to match the most common scenarios - [x] [Flaky Test Runner](https://ci-stats.kibana.dev/trigger_flaky_test_runner/1) was used on any tests changed https://buildkite.com/elastic/kibana-flaky-test-suite-runner/builds/5482 - [x] This was checked for breaking API changes and was [labeled appropriately](https://www.elastic.co/guide/en/kibana/master/contributing.html#kibana-release-notes-process)

piergm added 9 commits December 19, 2023 14:24

add has_value to field_caps response

c8af341

Merge branch 'elastic:main' into field_caps-has_value

e17fe2c

working prototype

feccdc7

move map to shard instead of index, adds queryparam

5ad8b99

rename query param to include_fields_with_no_value

ed3c3c6

code clean

45d892c

include_fields_with_no_value defaults to true

21061bb

defaults in tests

e92e9dc

defaults in tests

838ebbd

elasticsearchmachine added the v8.13.0 label Dec 21, 2023

piergm self-assigned this Dec 21, 2023

piergm added >enhancement :Search/Search Search-related issues that do not fall into other categories labels Dec 21, 2023

Update docs/changelog/103651.yaml

f5bfd31

piergm added 15 commits December 21, 2023 15:47

Merge branch 'main' into field_caps-has_value

4fc26d6

closing readers

f886135

small refactor

47af25e

move listener afterIndexShardStarted and close reader

683459f

closes just searcher

597f5a8

updated tests

752d31e

spotless

a21d6fa

added 1 to number of refresh of FrozenIndex test

c15dd78

updated docs and added tests

d6e9e65

Merge branch 'main' into field_caps-has_value

8d1fa0a

iter

58ab703

iter

c19d573

moved afterShardCreated to afterIndexCreated

76e5060

CI

d53c56d

Merge branch 'elastic:main' into field_caps-has_value

60d3731

piergm added 5 commits February 7, 2024 13:07

Merge branch 'elastic:main' into field_caps-has_value

058c6c8

multi cluster tests

b1aeb86

maybe fixed multi-cluster

4789d78

Merge branch 'main' into field_caps-has_value

2e8f7b7

iter

8115dcd

piergm added 2 commits February 8, 2024 09:00

removed empty spaces

a4e2d8d

Merge branch 'elastic:main' into field_caps-has_value

b6d667b

javanna approved these changes Feb 8, 2024

piergm added 2 commits February 8, 2024 14:03

iter

4a2d503

Merge branch 'main' into field_caps-has_value

be4e91a

original-brownbear approved these changes Feb 8, 2024

piergm merged commit 54cfce4 into elastic:main Feb 8, 2024

This was referenced Feb 9, 2024

Add optional query param in _field_caps to return only fields with value in index elastic/elasticsearch-specification#2413

Merged

Added highlight section to change log for _field_caps enhancement #105341

Merged

piergm added the release highlight label Feb 9, 2024

DaveCTurner mentioned this pull request Feb 9, 2024

[CI] CorruptedFileIT testReplicaCorruption failing #105330

Closed

This was referenced Feb 12, 2024

Test fix: handling engine closed when refreshing FieldInfos #105374

Merged

[Lens] Unskip functional test error elastic/kibana#176885

Merged

Adds field-caps non empty field task in the many-shards-quantitative challenge elastic/rally-tracks#562

Merged

piergm mentioned this pull request Feb 23, 2024

Field-caps field has value lookup use map instead of looping array #105770

Merged

This was referenced Mar 13, 2024

[ML] Use field caps option include_empty_fields to identify populated fields. elastic/kibana#178606

Closed

[ML] AIOps: Use field caps option include_empty_fields=false instead of custom query. elastic/kibana#178699

Merged
