Skip to content

Flag in _field_caps to return only fields with values in index #103651

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 122 commits into from
Feb 8, 2024

Conversation

piergm
Copy link
Member

@piergm piergm commented Dec 21, 2023

We are adding a query parameter to the field_caps api in order to filter out fields with no values.
The parameter is called include_empty_fields and defaults to true, and if set to false it
will filter out from the field_caps response all the fields that has no value in the index.
We keep track of FieldInfos during refresh in order to know which field has value in an index.
We added also a sys.prop es.field_caps_empty_fields_filter in order to disable this feature if needed.

@piergm piergm self-assigned this Dec 21, 2023
@piergm piergm added >enhancement :Search/Search Search-related issues that do not fall into other categories labels Dec 21, 2023
@elasticsearchmachine
Copy link
Collaborator

Hi @piergm, I've created a changelog YAML for you.

@kertal
Copy link
Member

kertal commented Feb 7, 2024

A quick question, in a CCS scenario, when there are old CCS clusters targeted, and just the requesting machine has the include_empty_fields capabilities (using set to false in the request ). Then the CCS clusters return all the fields, including the empty ones, right? So all CCS clusters need to support include_empty_fields to get the desired result of excluding empty ones. But apart from that, such a request should work, so no error is thrown, because the CCS ones haven't heard of include_empty_fields before?

@piergm
Copy link
Member Author

piergm commented Feb 8, 2024

@kertal, yes, in CCS if we target es version prior to 8.13 we will not consider the flag and therefore we will be returning all fields, empty and non-empty, from those non updated clusters. That said the response per se is bwc and can be handled from the updated cluster, therefore we will not return any error from this miss match, instead we will return also empty fields from non updated clusters.

@kertal
Copy link
Member

kertal commented Feb 8, 2024

@kertal, yes, in CCS if we target es version prior to 8.13 we will not consider the flag and therefore we will be returning all fields, empty and non-empty, from those non updated clusters. That said the response per se is bwc and can be handled from the updated cluster, therefore we will not return any error from this miss match, instead we will return also empty fields from non updated clusters.

thx for the confirmation @piergm 👍

Copy link
Member

@javanna javanna left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I left some cosmetic comments, mostly around testing refinements. LGTM otherwise.

@@ -519,7 +531,7 @@ public void messageReceived(FieldCapabilitiesNodeRequest request, TransportChann
final Map<String, List<ShardId>> groupedShardIds = request.shardIds()
.stream()
.collect(Collectors.groupingBy(ShardId::getIndexName));
final FieldCapabilitiesFetcher fetcher = new FieldCapabilitiesFetcher(indicesService);
final FieldCapabilitiesFetcher fetcher = new FieldCapabilitiesFetcher(indicesService, request.includeFieldsWithNoValue());
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

cool thanks for the update. Could you check that we have test coverage about the new flag used in combination with index_filter, as well as not? Thanks!

if (mapping != null) {
sb.append(mapping.getSha256());
}
indexMappingHash = sb.toString();
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This assessment does not sound accurate to me. The mapping hash is used to deduplicate mappings, and decide to send back one mapping per hash instead one mapping per index. But the whole mapping needs to be sent back?

Using the flag, the index mapping hash deduplication is entirely disabled. We do have now field level deduplication across indices though, and the benefit that not returning empty fields will make responses smaller and decrease overhead on the merging side. But again, it is an entirely different trade-off and we need to measure the impact in the worst case scenario, compared to the previous index mapping hash based deduplication. Makes sense?

@original-brownbear
Copy link
Contributor

Performance wise this is just fine for the non-ccs case. The logic is optimal enough at this point it seems to make serialisation fo the response the most important aspect of this. I could not easily reproduce a case where the flag had much measurable overhead (maybe a couple percent when all fields have data but that's it, not a crazy scientific benchmark run here but still :)).

E.g.

Server Software:        
Server Hostname:        elasticsearch-4
Server Port:            9200
SSL/TLS Protocol:       TLSv1.2,ECDHE-RSA-AES256-GCM-SHA384,2048,256
TLS Server Name:        elasticsearch-4

Document Path:          /_field_caps?fields=*
Document Length:        2067221 bytes

Concurrency Level:      1
Time taken for tests:   39.806 seconds
Complete requests:      100
Failed requests:        0
Total transferred:      206733400 bytes
HTML transferred:       206722100 bytes
Requests per second:    2.51 [#/sec] (mean)
Time per request:       398.058 [ms] (mean)
Time per request:       398.058 [ms] (mean, across all concurrent requests)
Transfer rate:          5071.83 [Kbytes/sec] received

Connection Times (ms)
              min  mean[+/-sd] median   max
Connect:        7    7   0.1      7       7
Processing:   376  391  19.1    386     516
Waiting:      371  386  19.1    382     511
Total:        383  398  19.1    393     523

Percentage of the requests served within a certain time (ms)
  50%    393
  66%    395
  75%    398
  80%    400
  90%    411
  95%    440
  98%    478
  99%    523
 100%    523 (longest request)

vs


Server Software:        
Server Hostname:        elasticsearch-4
Server Port:            9200
SSL/TLS Protocol:       TLSv1.2,ECDHE-RSA-AES256-GCM-SHA384,2048,256
TLS Server Name:        elasticsearch-4

Document Path:          /_field_caps?fields=*?include_empty_fields=false
Document Length:        1891399 bytes

Concurrency Level:      1
Time taken for tests:   39.314 seconds
Complete requests:      100
Failed requests:        0
Total transferred:      189151200 bytes
HTML transferred:       189139900 bytes
Requests per second:    2.54 [#/sec] (mean)
Time per request:       393.138 [ms] (mean)
Time per request:       393.138 [ms] (mean, across all concurrent requests)
Transfer rate:          4698.55 [Kbytes/sec] received

Connection Times (ms)
              min  mean[+/-sd] median   max
Connect:        7    7   0.1      7       8
Processing:   372  386  16.9    380     484
Waiting:      369  383  16.9    377     481
Total:        379  393  17.0    387     491

Percentage of the requests served within a certain time (ms)
  50%    387
  66%    395
  75%    401
  80%    402
  90%    411
  95%    425
  98%    452
  99%    491
 100%    491 (longest request)

Copy link
Contributor

@original-brownbear original-brownbear left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Did not re-read everything again, but performance is fine in my testing and a quick read of the code looks just fine still -> LGTM

@piergm piergm merged commit 54cfce4 into elastic:main Feb 8, 2024
@kertal
Copy link
Member

kertal commented Feb 8, 2024

🎉 👋 🙇 !

@ninoslavmiskovic
Copy link

Amazing 🤩 great job all 👍

@elasticsearchmachine
Copy link
Collaborator

@piergm according to this PR's labels, I need to update the changelog YAML, but I can't because the PR is closed. Please either update the changelog yourself on the appropriate branch, or adjust the labels. Specifically:

  • The PR is labelled release highlight but the changelog has no highlight section

walterra added a commit to elastic/kibana that referenced this pull request Mar 14, 2024
…d of custom query. (#178699)

## Summary

Part of #178606.

As of elastic/elasticsearch#103651 there is a
new field caps option `include_empty_fields`. This PR updates AIOps Log
Rate Analysis to make use of this option instead of a custom query and
code that identified populated fields.

### Checklist

- [x] [Unit or functional
tests](https://www.elastic.co/guide/en/kibana/master/development-tests.html)
were updated or added to match the most common scenarios
- [x] [Flaky Test
Runner](https://ci-stats.kibana.dev/trigger_flaky_test_runner/1) was used on any tests changed https://buildkite.com/elastic/kibana-flaky-test-suite-runner/builds/5482
- [x] This was checked for breaking API changes and was [labeled
appropriately](https://www.elastic.co/guide/en/kibana/master/contributing.html#kibana-release-notes-process)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
>enhancement release highlight :Search/Search Search-related issues that do not fall into other categories Team:Search Meta label for search team v8.13.0
Projects
None yet
Development

Successfully merging this pull request may close these issues.

8 participants