Commit b5d8a44

Adding the API design guidelines. (#2476)
* Adding the API design guidelines. * Updating unsaved files
1 parent 0a7126f commit b5d8a44

File tree

5 files changed

+613
-5
lines changed

README.md

Lines changed: 16 additions & 5 deletions
Original file line numberDiff line numberDiff line change
@@ -1,13 +1,24 @@
1-
# Elasticsearch Specification
1+
# Elasticsearch API Specification
22

3-
This repository contains the Elasticsearch request/response definitions in TypeScript,
4-
you can find them inside [`/specification`](./specification).
5-
The [`/compiler`](./compiler) folder contains a TypeScript program that compiles the entire definition
6-
in a JSON representation that can be used for generating language clients.
3+
The **Elasticsearch API Specification** provides the contract for communication between client and server components within the Elasticsearch stack.
4+
With almost 500 API endpoints and around 3000 data types across the entire API surface, this project is a vitally important part of sustaining our engineering efforts at scale.
5+
6+
The repository has the following structure:
7+
8+
| Path | Description |
9+
| -------- | ------- |
10+
| [`api-design-guidelines/`](api-design-guidelines/) | Knowledge base of best practices for API design. |
11+
| [`compiler/`](compiler/) | TypeScript compiler for specification definition to JSON. |
12+
| [`compiler-rs/`](compiler-rs/) | |
13+
| [`docs/`](docs/) | |
14+
| [`output/`](output/) | |
15+
| [`specification/`](specification/) | Elasticsearch request/response definitions in TypeScript. |
16+
| [`typescript-generator/`](typescript-generator/) | |
717

818
This JSON representation is formally defined by [a set of TypeScript definitions (a meta-model)](./compiler/src/model/metamodel.ts)
919
that also explains the various properties and their values.
1020

21+
1122
## Prepare the environment
1223

1324
For generating the JSON representation and running the validation code you need

api-design-guidelines/README.md

Lines changed: 11 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,11 @@
1+
# Elasticsearch API design guidelines
2+
3+
This document aims to provide a set of design guidelines for drawing up HTTP APIs in a way that is consistent and easy to consume. These concerns are becoming ever more important, as the number of API consumers grows, and as new systems are built upon these lower layers. One such project involves the automatic generation of API documentation, the existence of which is crucial for the effective communication of the Elastic product surface to users.
4+
5+
While it is also desirable and beneficial to align existing APIs with these design principles, this is very much recognised as a complex secondary concern, which can only happen gradually over a longer period of time.
6+
7+
All guidelines are just that: guidelines. These should not be read as hard rules, but as a set of best practices distilled from our collective experience. As such, deviation from these guidelines is acceptable, but must be accompanied by solid reasoning for that deviation.
8+
9+
* [Naming](naming.md)
10+
* [Data modelling](data-modelling.md)
11+
* [Requests and responses](requests-responses.md)
Lines changed: 267 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,267 @@
1+
# API guidelines - Data modelling
2+
3+
Request and response bodies are typically encoded in JSON. The full JSON specification is defined in [RFC 8259](https://datatracker.ietf.org/doc/html/rfc8259).
4+
5+
## JSON objects are not ordered
6+
7+
JSON objects must be conceptually treated as **unordered**. Maintaining order of deserialized JSON objects is not supported in general, and isn’t possible in some programming languages.
8+
9+
If preservation of order is important, such ordering information should be represented separately. The example below shows a plain (unordered) object alongside three alternative representations which also provide ordering detail.
10+
11+
```json
12+
// Plain object (unordered)
13+
{
14+
"one": "eins",
15+
"two": "zwei",
16+
"three": "drei"
17+
}
18+
19+
// List of single key objects
20+
[
21+
{"one": "eins"},
22+
{"two": "zwei"},
23+
{"three": "drei"}
24+
]
25+
26+
// List of key-value lists
27+
[
28+
["one", "eins"],
29+
["two", "zwei"],
30+
["three", "drei"]
31+
]
32+
33+
// List of key-value objects
34+
[
35+
{"key": "one", "value": "eins"},
36+
{"key": "two", "value": "zwei"},
37+
{"key": "three", "value": "drei"}
38+
]
39+
```
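
As a minimal sketch of how a client might consume one of the ordered representations above, the snippet below reads the "list of key-value objects" form into an ordered list of pairs. The `payload` string and the `ordered_pairs` helper are hypothetical names used only for illustration.

```python
import json

# Hypothetical response body using the "list of key-value objects" form.
payload = (
    '[{"key": "one", "value": "eins"},'
    ' {"key": "two", "value": "zwei"},'
    ' {"key": "three", "value": "drei"}]'
)


def ordered_pairs(body):
    """Return (key, value) pairs in the order they appear in the JSON array."""
    # JSON arrays are ordered, so the list preserves the sender's ordering
    # regardless of how the parser treats JSON objects.
    return [(item["key"], item["value"]) for item in json.loads(body)]


pairs = ordered_pairs(payload)
```

Because the ordering lives in the array rather than in an object, this works the same way in languages whose map types do not preserve insertion order.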
40+
41+
Ordering of keys in JSON objects should be consistent when serialized into a response body, to avoid confusing human readers.
42+
43+
## JSON objects and query parameters must not have duplicate keys
44+
45+
JSON objects and query string parameters must not use duplicate keys. The behaviour of most JSON libraries and language modules is undefined in this regard, and therefore unexpected consequences are likely to occur if duplicate keys are passed.
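
To illustrate why duplicates are risky, the sketch below uses Python's standard `json` module, which by default silently keeps the last value for a repeated key. The `reject_duplicate_keys` and `strict_loads` helpers are hypothetical; they show one possible way to surface duplicates as errors, not a prescribed implementation.

```python
import json


def reject_duplicate_keys(pairs):
    """object_pairs_hook that raises if a JSON object repeats a key."""
    result = {}
    for key, value in pairs:
        if key in result:
            raise ValueError(f"duplicate key in JSON object: {key!r}")
        result[key] = value
    return result


def strict_loads(body):
    """Parse JSON, refusing objects that contain duplicate keys."""
    return json.loads(body, object_pairs_hook=reject_duplicate_keys)


# The default parser keeps only the last value for a repeated key.
lenient = json.loads('{"a": 1, "a": 2}')
```

Other parsers may keep the first value or fail outright, which is why producers should never emit duplicates in the first place.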
46+
47+
## Avoid null and empty string to mark a value as missing
48+
49+
Nulls are sometimes used to mark a value as "missing" or "not available". Instead of returning `null` as a value in JSON, prefer to omit the field from the response. Similarly, empty strings (`""`) should not be used to denote a missing string value, as they can be confused with an explicit empty-string value. It is fine to accept `null` in a request if it has special semantics, such as unsetting a value. Otherwise, fields that would have null or empty string values should also be omitted from the request.
50+
51+
- DON'T:
52+
```json
53+
{"key1": 1, "key2": null}
54+
```
55+
- DON'T:
56+
```json
57+
{"key1": 1, "key2": ""}
58+
```
59+
- DO:
60+
```json
61+
{"key1": 1}
62+
```
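
A minimal sketch of the "omit instead of null" approach when building a response body is shown below. The `drop_missing` helper is a hypothetical name; it assumes that missing values are represented as `None` in the application before serialization.

```python
import json


def drop_missing(fields):
    """Return a copy of `fields` without entries whose value is None."""
    return {key: value for key, value in fields.items() if value is not None}


# "key2" has no value, so it is left out rather than serialized as null.
body = json.dumps(drop_missing({"key1": 1, "key2": None}))
```

Empty strings are deliberately not filtered here, since an empty string may be a legitimate explicit value; the caller is expected to use `None` for anything that is genuinely missing.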
63+
64+
## Consider the portability of numeric values over JSON
65+
66+
Numeric values can be subject to a couple of limitations that exist in some JSON parsers:
67+
68+
- Values with more than 53 bits of precision are not guaranteed to be interpreted losslessly due to the limitations of the IEEE 64-bit floating point number format, which is used by some parsers. This most notably affects large integer values within the int64 and uint64 ranges.
69+
70+
- Special floating point values (NaN, Infinity and negative zero) are not included in the official JSON spec and, as such, are not reliably understood by all JSON parsers.
71+
72+
In general, a solution to both of these problems is to permit numeric values to be encoded as either JSON strings or JSON numbers for certain cases. However, this solution should only be employed in conjunction with the corresponding API specification, wherein the correct data type and numeric range for a particular field can be denoted. It is the responsibility of the API designer to ensure that ambiguity does not occur between numeric values passed as strings and true string values.
73+
74+
### Integer values
75+
76+
If the API specification defines a field as having a type of int8, int16 or int32 (or equivalent numeric range) then corresponding values should always be encoded as JSON numbers, as no lossiness can occur within these ranges.
77+
78+
If the API specification defines a field as having a type of int64 or uint64 (or equivalent numeric range) then corresponding values may be passed as either JSON numbers or JSON strings. Note that the smallest positive integer that is subject to potential lossiness is (2^53 + 1) and the smallest equivalent negative integer is (-2^53 - 1).
79+
80+
```python
81+
>>> 2**53+1, int(float(2**53+1))
82+
(9007199254740993, 9007199254740992)
83+
84+
>>> -2**53-1, int(float(-2**53-1))
85+
(-9007199254740993, -9007199254740992)
86+
```
87+
88+
In all integer cases, JSON numbers and equivalent JSON strings should only ever be written as digits with an optional sign prefix. A decimal point should never be written for integer fields (even if the fractional component is "`.0`") as JSON parsers that distinguish between numeric types will typically interpret such a value as a floating point number rather than as an integer.
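
The following sketch shows one way a client could apply these rules for an int64 field: emit a JSON number when the value survives a 64-bit float round trip, fall back to a digit string otherwise, and accept both forms on input. The names `encode_int64` and `decode_int64` are hypothetical and not part of any Elasticsearch client.

```python
import re

# Integers up to this magnitude survive a round trip through float64.
MAX_SAFE_INTEGER = 2**53

# Digits with an optional sign prefix; no decimal point is allowed.
_INTEGER = re.compile(r"[+-]?[0-9]+")


def encode_int64(value):
    """Return an int if it is lossless as float64, otherwise a digit string."""
    if -MAX_SAFE_INTEGER <= value <= MAX_SAFE_INTEGER:
        return value
    return str(value)


def decode_int64(raw):
    """Accept a JSON number or a JSON string of digits and return an int."""
    if isinstance(raw, int) and not isinstance(raw, bool):
        return raw
    if isinstance(raw, str) and _INTEGER.fullmatch(raw):
        return int(raw)
    raise ValueError(f"not a valid integer encoding: {raw!r}")
```

Whether a field may use the string form at all is decided by the API specification, so a real client would only apply `encode_int64` to fields declared as int64 or uint64.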
89+
90+
### Floating point values
91+
92+
If the API specification defines a field as having a type of float32 or float64 then corresponding values may be passed as either JSON numbers or JSON strings. All such values should be written with a decimal point and fractional component, even if that fractional component is ".0". If a field is likely to need to pass special values, consider always encoding the values as strings, for simpler client-side processing.
93+
94+
To ensure maximum portability, all special values must be passed as JSON strings and must be encoded precisely according to the table below, including casing.
95+
96+
| JSON output | Description |
97+
|---------------|-------------------|
98+
| `"-0.0"` | Negative zero |
99+
| `"NaN"` | Not a number |
100+
| `"Infinity"` | Positive infinity |
101+
| `"+Infinity"` | Positive infinity |
102+
| `"-Infinity"` | Negative infinity |
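
As a sketch of how the table above might be applied, the helpers below map special floating point values to the listed strings and back. `SPECIAL_VALUES`, `encode_float`, and `decode_float` are hypothetical names; the decoder assumes that any other spelling of a special value (for example lowercase `"nan"`) should be rejected.

```python
import math

# Exact strings from the table, including casing.
SPECIAL_VALUES = {
    "-0.0": -0.0,
    "NaN": math.nan,
    "Infinity": math.inf,
    "+Infinity": math.inf,
    "-Infinity": -math.inf,
}


def encode_float(value):
    """Return a JSON-safe encoding: a string for special values, else the float."""
    if math.isnan(value):
        return "NaN"
    if math.isinf(value):
        return "Infinity" if value > 0 else "-Infinity"
    if value == 0 and math.copysign(1.0, value) < 0:
        return "-0.0"
    return value


def decode_float(raw):
    """Accept a JSON number or string and return a float."""
    if isinstance(raw, str):
        if raw in SPECIAL_VALUES:
            return SPECIAL_VALUES[raw]
        value = float(raw)  # ordinary numbers passed as strings, e.g. "1.5"
        if math.isnan(value) or math.isinf(value):
            raise ValueError(f"non-canonical special value: {raw!r}")
        return value
    return float(raw)
```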
103+
104+
## Don't mix static and dynamic keys in JSON objects
105+
106+
Modelling objects that are a mix between static and dynamic keys is more complex to parse and extend as an API as you need to provide a hash/dictionary-like structure for arbitrary key access in addition to a structure that has properties. To avoid this problem **dynamic values like names and IDs shouldn’t be mixed with static keys in objects**.
107+
108+
An example of mixed static and dynamic keys can be seen in the `indices.field_usage_stats` endpoint response:
109+
110+
```json
111+
{
112+
"_shards": {
113+
"total": 1,
114+
"successful": 1,
115+
"failed": 0
116+
},
117+
"my-index": { ... }
118+
}
119+
```
120+
121+
The key `"my-index"` is user-defined whereas `"_shards"` is static. This API would be easier to model by keeping all dynamic keys in their own object:
122+
123+
```json
124+
{
125+
"_shards": {
126+
"total": 1,
127+
"successful": 1,
128+
"failed": 0
129+
},
130+
"indices": {
131+
"my-index": { ... }
132+
}
133+
}
134+
```
135+
136+
Or better yet, to completely avoid using dynamic keys the user-defined value can be a property value within the object itself:
137+
138+
```json
139+
{
140+
"_shards": {
141+
"total": 1,
142+
"successful": 1,
143+
"failed": 0
144+
},
145+
"indices": [
146+
{"name": "my-index", ...}
147+
]
148+
}
149+
```
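
To make the difference concrete, the sketch below converts a response with mixed static and dynamic keys into the list-based shape shown above. The `STATIC_KEYS` set, the `split_dynamic_keys` helper, and the `"shards"` field in the sample data are assumptions for illustration only.

```python
# Assumption: the only static top-level key in this response is "_shards".
STATIC_KEYS = {"_shards"}


def split_dynamic_keys(response):
    """Move user-defined top-level keys into an "indices" list of named objects."""
    normalized = {k: v for k, v in response.items() if k in STATIC_KEYS}
    normalized["indices"] = [
        {"name": name, **body}
        for name, body in response.items()
        if name not in STATIC_KEYS
    ]
    return normalized


mixed = {
    "_shards": {"total": 1, "successful": 1, "failed": 0},
    "my-index": {"shards": []},  # hypothetical per-index body
}
normalized = split_dynamic_keys(mixed)
```

Note that the consumer has to know the full set of static keys in advance to do this safely, which is exactly the extra burden that the mixed form places on every client.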
150+
151+
## Model object variants in a consumable way
152+
153+
Sometimes an API accepts an object whose keys are determined by the "kind" or "variant" of object that is intended. Examples include aggregations, queries, and pipeline steps. The Elasticsearch API handles this situation in two ways. The first is to use an **internal variant type** property, such as "type" for analyzers:
154+
155+
```json
156+
{
157+
"type": "snowball",
158+
"stopwords": ["if", "and", "but"]
159+
}
160+
```
161+
162+
The second is using **external variants** where the inner object is wrapped with an object with a single key containing the kind of the inner object. This example changes the analyzer from above to use an external variant:
163+
164+
```json
165+
{
166+
"snowball": {
167+
"stopwords": ["if", "and", "but"]
168+
}
169+
}
170+
```
171+
172+
When choosing between these two possibilities, **favor using external variants**, as this removes the requirement to buffer key-value pairs until the internal variant property is found. External variants also improve traversability of the API (e.g. auto-complete), as properties can be anticipated without waiting for the discriminant property.
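
The two styles can be compared with a small sketch. Both helpers below are hypothetical and return a `(kind, body)` pair; with an external variant the kind is known as soon as the single key is read, whereas with an internal variant the whole object must be available before the discriminant can be extracted.

```python
def parse_external_variant(obj):
    """Return (kind, body) for an object wrapped in a single variant key."""
    if len(obj) != 1:
        raise ValueError("expected exactly one key naming the variant")
    # The only key is the kind; its value is the inner object.
    ((kind, body),) = obj.items()
    return kind, body


def parse_internal_variant(obj, discriminant="type"):
    """Return (kind, body) for an object carrying its kind in a property."""
    body = dict(obj)  # copy so the caller's object is not modified
    kind = body.pop(discriminant)
    return kind, body


external = parse_external_variant({"snowball": {"stopwords": ["if", "and", "but"]}})
internal = parse_internal_variant({"type": "snowball", "stopwords": ["if", "and", "but"]})
```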
173+
174+
## Model enumerations in a portable way
175+
176+
Enumerations should be modelled in a way that is most portable across programming languages. The following guidelines apply:
177+
178+
- Always use string values
179+
- Values should be case-sensitive and casing should be consistent across the enumeration.
180+
- Values should only use basic characters
181+
182+
**Note that booleans and enumerations are distinct types**. Historically, some enum values have evolved from booleans, but where the original boolean form remains, a clumsy mixture of accepted values sometimes results. For example:
183+
184+
```ts
185+
enum DynamicMapping {strict, runtime, true, false}
186+
```
187+
188+
The above type describes the values accepted by the "dynamic_mapping" parameter. While this may be useful for backwards compatibility, it arguably introduces a new value that has been conflated with the original. Additionally, changing a value from `boolean -> union[bool|enum]` isn’t backwards compatible for clients. An alternative might be:
189+
190+
```ts
191+
bool DynamicMappingEnabled {true, false}
192+
// only use DynamicMapping if the first value is true
193+
enum DynamicMapping {simple, strict, runtime}
194+
```
195+
196+
Or adding a "disabled" value to the enum:
197+
198+
```ts
199+
enum DynamicMapping {simple, strict, runtime, disabled}
200+
```
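
On the client side, the enum-only alternative could be modelled roughly as follows. This is a sketch under the assumption that the "disabled" variant above is chosen; the `parse_dynamic_mapping` helper is hypothetical.

```python
from enum import Enum


class DynamicMapping(str, Enum):
    """String-valued enumeration with consistent lowercase values."""

    SIMPLE = "simple"
    STRICT = "strict"
    RUNTIME = "runtime"
    DISABLED = "disabled"


def parse_dynamic_mapping(raw):
    """Convert a JSON string into a DynamicMapping, rejecting booleans."""
    if not isinstance(raw, str):
        raise TypeError("enumeration values must be strings, not booleans")
    # Lookup by value is case-sensitive, so "Strict" is rejected.
    return DynamicMapping(raw)
```

Keeping every value a string means the same definition translates directly to languages where an enum cannot contain `true` or `false` as members.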
201+
202+
## Units
203+
204+
Below is the complete set of units that are accepted by Elasticsearch along with their canonicalized suffix or abbreviation. Note that these units are case-sensitive as there can be collisions within a category (e.g. minute versus month).
205+
206+
| Category | Unit | Suffix(es) / Abbreviation |
207+
|-----------|-------------------------|---------------------------|
208+
| Duration | Nanosecond | ns, nanos¹ |
209+
| Duration | Microsecond | us, micros |
210+
| Duration | Millisecond | ms, millis¹ |
211+
| Duration | Second | s, seconds |
212+
| Duration | Minute | m, minutes |
213+
| Duration | Hour | h, hours |
214+
| Duration | Day | d, days |
215+
| Duration | Month | M, months |
216+
| Duration | Year | y, years |
217+
| Distance | Millimeter | mm, millimeters |
218+
| Distance | Centimeter | cm, centimeters |
219+
| Distance | Meter | m, meters |
220+
| Distance | Kilometer | km, kilometers |
221+
| Distance | Inch | in, inches |
222+
| Distance | Foot | ft, feet |
223+
| Distance | Mile | mi, miles |
224+
| Distance | Nautical Mile | nmi, nautical_miles |
225+
| Byte size | Byte | b, bytes |
226+
| Byte size | Kilobyte ("kibibyte") ² | kb |
227+
| Byte size | Megabyte ("mebibyte") ² | mb                        |
228+
| Byte size | Gigabyte ("gibibyte") ² | gb |
229+
| Byte size | Terabyte ("tebibyte") ² | tb |
230+
| Byte size | Petabyte ("pebibyte") ² | pb |
231+
232+
¹ "millis" and "nanos" are used already in the API as a suffix for field names to either represent a datetime value in terms of "units since the epoch" or a duration of time.
233+
For example: `{"start_date_in_millis": 1652296510000}` or `{"duration_in_millis": 10}`
234+
235+
² Note that Elasticsearch uses suffixes like "kb" (Kilobytes) instead of "kib" (Kibibytes) despite scaling bytes values by factors of 1,024 instead of 1,000. Also note that the suffixes for these units in prose use an uppercase "B" to represent bytes (i.e. kB, not kb) as lowercase "b" means bits instead of bytes.
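
The 1,024-based scaling described in footnote ² can be sketched as a small parser. `BYTE_FACTORS` and `parse_byte_size` are hypothetical names, and the sketch assumes whole-number values with the lowercase suffixes from the table; it is not a description of how Elasticsearch itself parses these values.

```python
import re

# Each suffix scales by a power of 1,024, as noted in footnote 2.
BYTE_FACTORS = {
    "b": 1,
    "kb": 1024,
    "mb": 1024**2,
    "gb": 1024**3,
    "tb": 1024**4,
    "pb": 1024**5,
}

# Assumption: a whole number followed directly by a lowercase suffix.
_BYTE_SIZE = re.compile(r"([0-9]+)(b|kb|mb|gb|tb|pb)")


def parse_byte_size(text):
    """Convert a value such as "2mb" into a number of bytes."""
    match = _BYTE_SIZE.fullmatch(text)
    if match is None:
        raise ValueError(f"not a valid byte size: {text!r}")
    number, suffix = match.groups()
    return int(number) * BYTE_FACTORS[suffix]
```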
236+
237+
For the following measures consider using the default unit in order to be consistent with other areas in the API:
238+
239+
| Metric | Recommended unit | Suffix examples |
240+
|-----------|------------------|----------------------|
241+
| Duration | milliseconds | "duration_in_millis" |
242+
| Byte size | bytes | "memory_in_bytes" |
243+
244+
Values that have a unit should make that unit explicit, either by naming it in the property name, by including it in the value, or by supplying it as a separate property in the same object.
245+
246+
- DO: `{"took_in_millis": 3}`
247+
- DO: `{"took": "3ms"}`
248+
- DO: `{"took": {"value": 3, "unit": "ms"}}`
249+
- DON'T: `{"took": 3}`
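
The three accepted shapes above all carry enough information for a consumer to recover the same quantity, while the bare number does not. The sketch below normalizes each shape to milliseconds; `MILLIS_PER_UNIT` and `took_millis` are hypothetical names, and only a subset of the duration suffixes is handled.

```python
import re

# Subset of the duration suffixes, expressed in milliseconds.
# Suffixes are case-sensitive: "m" is minutes, while "M" (months) is not handled.
MILLIS_PER_UNIT = {"ms": 1, "s": 1000, "m": 60_000, "h": 3_600_000, "d": 86_400_000}

_DURATION = re.compile(r"([0-9]+)(ms|s|m|h|d)")


def took_millis(obj):
    """Return the "took" duration in milliseconds from any explicit-unit shape."""
    if "took_in_millis" in obj:
        return obj["took_in_millis"]  # unit named in the property
    took = obj["took"]
    if isinstance(took, str):  # unit named in the value
        match = _DURATION.fullmatch(took)
        if match is None:
            raise ValueError(f"not a valid duration: {took!r}")
        return int(match.group(1)) * MILLIS_PER_UNIT[match.group(2)]
    if isinstance(took, dict):  # unit as a separate property
        return took["value"] * MILLIS_PER_UNIT[took["unit"]]
    # A bare number gives the consumer no way to know the unit.
    raise ValueError("unit is not explicit")
```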
250+
251+
## Formats
252+
253+
Many of our APIs support users supplying values in multiple different formats. For example, a duration can be specified either as an integer value (10000) or as a string value ("10s") and mean the same thing assuming the default unit is "milliseconds" for this value. This is especially true for APIs accepting dates and times.
254+
255+
For the sake of user experience we should accept values in different formats. For values that accept multiple formats the format should be explicitly set by users so Elasticsearch can interpret and validate the value based on the given format. For example:
256+
257+
```json
258+
{"type": "date", "format": "yyyy-MM-dd HH:mm:ss"}
259+
```
260+
261+
Values that are returned in responses (with the exception of user-specified values) must always be in the same format. Below is the set of formats to use to be consistent with other APIs:
262+
263+
| Metric | Format | Example(s) |
264+
|-----------|----------|-----------------------|
265+
| Date | ISO 8601 | "2022-05-10" |
266+
| Datetime | ISO 8601 | "2022-05-10 16:00:00" |
267+
| Time zone | ISO 8601 | "Z", "-05:00" |
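
As a rough illustration of parsing with an explicit format and then emitting a fixed output format, the sketch below uses Python's standard `datetime` module. `USER_FORMAT` is an assumed `strptime` equivalent of the `yyyy-MM-dd HH:mm:ss` pattern shown earlier; the variable names are hypothetical.

```python
from datetime import datetime, timedelta, timezone

# Assumed strptime equivalent of the "yyyy-MM-dd HH:mm:ss" pattern.
USER_FORMAT = "%Y-%m-%d %H:%M:%S"

# The explicit format tells the parser how to interpret the input.
parsed = datetime.strptime("2022-05-10 16:00:00", USER_FORMAT)

# Responses always use one fixed format, whatever the input looked like.
date_out = parsed.date().isoformat()
datetime_out = parsed.isoformat(sep=" ")

# Attaching a UTC offset appends it in the "-05:00" style from the table.
offset_out = parsed.replace(tzinfo=timezone(timedelta(hours=-5))).isoformat()
```

Python's `isoformat()` separates date and time with "T" by default; `sep=" "` is used here only to reproduce the spacing of the example in the table.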
