Commit 7472f76

Merge step-by-step with existing
1 parent 6364bd4 commit 7472f76

4 files changed

+205
-24
lines changed

fern/docs/pages/airdrop/data-extraction.mdx

Lines changed: 20 additions & 5 deletions
Original file line number | Diff line number | Diff line change
@@ -1,4 +1,3 @@
1-
21
In the data extraction phase, the extractor is expected to call the external system's APIs
32
to retrieve all the items that were updated since the start of the last extraction.
43
If there was no previous extraction (the current run is an initial import),
@@ -43,12 +42,11 @@ Each artifact is submitted with an `item_type`, defining a separate domain objec
4342
external system and matching the `record_type` in the provided metadata.
4443
Item types defined when uploading extracted data must validate the declarations in the metadata file.
4544

46-
Extracted data must be normalized.
47-
45+
Extracted data must be normalized:
4846
- Null values: All fields without a value should either be omitted or set to null.
4947
For example, if an external system provides values such as "", -1 for missing values,
5048
those must be set to null.
51-
- Timestamps: Full-precision timestamps should be formatted as RFC3999 (`1972-03-29T22:04:47+01:00`),
49+
- Timestamps: Full-precision timestamps should be formatted as RFC3339 (`1972-03-29T22:04:47+01:00`),
5250
and dates should be just `2020-12-31`.
5351
- References: references must be strings, not numbers or objects.
5452
- Number fields must be valid JSON numbers (not strings).
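The normalization rules listed in this hunk can be illustrated with a small Python sketch. It is not part of the commit or of any DevRev SDK: the helper names and the sentinel values `""` and `-1` are assumptions taken from the example in the text, so adapt them to whatever the external system actually returns.

```python
from datetime import datetime, timedelta, timezone

# Assumption: the external system uses "" and -1 to mean "no value".
MISSING_SENTINELS = ("", -1)


def null_if_missing(value):
    """Return None for sentinel 'missing' values, otherwise the value itself."""
    for sentinel in MISSING_SENTINELS:
        # Compare type as well, so that e.g. -1.0 or False is not swallowed.
        if type(value) is type(sentinel) and value == sentinel:
            return None
    return value


def to_rfc3339(moment: datetime) -> str:
    """Format a timezone-aware datetime as an RFC 3339 timestamp."""
    if moment.tzinfo is None:
        raise ValueError("timestamp needs an explicit UTC offset")
    return moment.isoformat(timespec="seconds")


def to_reference(value) -> str:
    """References must be strings, even if the external ID is numeric."""
    return str(value)


def to_number(value):
    """Number fields must be JSON numbers, not numeric strings."""
    if isinstance(value, (int, float)) and not isinstance(value, bool):
        return value
    text = str(value).strip()
    try:
        return int(text)
    except ValueError:
        return float(text)
```

For example, `to_reference(123)` gives `"123"` and `to_number("42")` gives the integer `42`; dates without a time component would be emitted with `date.isoformat()`.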
@@ -74,9 +72,26 @@ All other fields are contained within the `data` attribute.
7472
}
7573
```
7674

75+
## Validating extracted data
7776

7877
Extracted artifacts can be validated with the `chef-cli` using the following command:
7978

8079
```bash
81-
chef-cli validate-metadata -m external_domain_metadata.json -r issue < extractor_issues_2.json
80+
chef-cli validate-data -m external_domain_metadata.json -r issue < extractor_issues_2.json
81+
```
82+
83+
You can also generate example data to show the format the data has to be normalized to, using:
84+
85+
```bash
86+
echo '{}' | chef-cli fuzz-extracted -r issue -m external_domain_metadata.json > example_issues.json
8287
```
88+
89+
## Deploying and testing the snap-in
90+
91+
Once you have implemented data extraction, you should deploy your snap-in to your test organization and run an import.
92+
93+
To deploy the snap-in, run `make auth` and `make deploy` in the snap-in repository. Then, activate the snap-in by running `devrev snap_in activate`.
94+
95+
After activation, you can create an import in the DevRev UI, which will initially reach the 'waiting for user input' stage. During this phase, you can verify your data extraction implementation is working correctly.
96+
97+
Relevant documentation can be found under [Snap-in development](/snapin-development/locally-testing-snap-ins).
Lines changed: 41 additions & 0 deletions
@@ -0,0 +1,41 @@
1+
## Complete the chef-cli initial domain mapping setup
2+
3+
Next, continue with the steps outlined in [chef-cli setup](initial_domain_mapping_setup.md).
4+
5+
When you are done you should have the chef-cli context set up and have the chef-cli local UI running in your browser.
6+
7+
## Use the local UI to create initial domain mappings
8+
9+
The final artifact of the recipe creation process is the `initial_domain_mapping.json`, which has to be embedded in the extractor.
10+
11+
This mapping, unlike the recipe blueprint of a concrete import, can contain multiple options for each external record type from which the end user can choose (for example, allowing 'task' from an external system to map to either issue or ticket in DevRev). It can also contain mappings that apply to a record type category. When the user runs a new import and the extractor reports, in its metadata, record types that belong to this category but are not directly mapped in the initial domain mappings, the recipe manager applies the per-category default to them.
12+
13+
After the blueprint of the test import is completed, the 'install in this org' button takes you to the initial domain mapping creation screen, where you can 'merge' the blueprint into the existing initial mappings of the org.
14+
15+
By repeating this process (run a new import, create a different configuration, merge to the initial mappings), you can create an initial mapping that contains multiple options for the user to choose from.
16+
17+
Finally, the Export button allows you to retrieve the `initial_domain_mapping.json`.
18+
19+
## Tip: use local metadata in the local UI
20+
21+
You can also provide a local metadata file to the command using the `-m` flag, for example: `chef-cli configure-mappings --env prod -m metadata.json`. This enables:
22+
23+
- raw jq transformations using an external field as input. (This is an experimental feature)
24+
25+
- filling in example input data for trying out the transformation.
26+
27+
In this case, the local file is not validated against the one submitted by the snap-in; you have to ensure that they are the same.
28+
29+
## Test an import with initial mapping using the in-app UI.
30+
31+
Once the initial mappings are prepared and installed, any new import in the org (with the same snap-in slug and import slug) will use them. End users can influence the recipe blueprint that gets created for the sync unit through the mapping screen in the UI, where they can perform record-type filtering, mapping, fine-grained filtering, low-code field and value mapping, and finally custom field filtering.
32+
33+
Their decisions are constrained by the choices provided in the initial domain mappings. Currently, the low-code UI offers limited insight into the mappings and their reasons. In some cases mismatches arise: something that worked in chef-cli doesn't offer the right options to the user, or not all fields that should be resolved are resolved. To assist in debugging such cases, chef-cli provides a command to extract the description of the low-code decisions that are asked in the UI. Please provide this to us when reporting an issue with how the end-user mapping UI behaves.
34+
35+
```bash
36+
chef-cli low-code --env prod > low_code.json
37+
```
38+
39+
## Read metadata tips to improve the metadata
40+
41+
See the [metadata tips](tips.md) section for more information on how to improve the metadata.

fern/docs/pages/airdrop/metadata-extraction.mdx

Lines changed: 140 additions & 19 deletions
@@ -1,4 +1,3 @@
1-
21
During the metadata extraction phase, the Airdrop snap-in must provide an
32
`external_domain_metadata.json` file on each sync run.
43
This file provides a structured way of describing the external system's domain system,
@@ -24,6 +23,33 @@ During the metadata extraction phase, the Airdrop snap-in must provide an
2423
The transformation can be crafted and finalized further using the `chef-cli` to ensure extracted
2524
data is mapped consistently to the DevRev domain model.
2625

26+
## Getting started with infer-metadata
27+
28+
The `chef-cli` provides a helpful command to generate initial domain metadata from example data:
29+
30+
```bash
31+
chef-cli infer-metadata example_data_directory > metadata.json
32+
```
33+
34+
To get good results with this approach:
35+
36+
1. Collect example data from the external system and place them in a directory. Each file should:
37+
- Contain the same type of records, named after their type
38+
- Have a `.json` or `.jsonl` extension, for example `issues.json`
39+
- Contain either a single JSON array of objects, or newline-separated objects
40+
41+
2. Run the `infer-metadata` command targeting this directory
42+
43+
3. Inspect the generated metadata, particularly field types and the suggestions the tool generates
44+
45+
For best results:
46+
- Provide 10-100 examples of each record type (but not more than 1000)
47+
- Ensure logically distinct fields are separate keys at the top level
48+
- Use referentially consistent example data if possible
49+
- Make sure IDs are strings, not numbers
50+
51+
This generated metadata serves as a starting point that will need further refinement.
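Before running `infer-metadata`, it can help to check the example directory against the guidance above. The following Python sketch is an illustration only and is not part of `chef-cli`. It assumes the identifier key is called `id` (a hypothetical name) and treats fewer than 10 or more than 1000 records per file as worth flagging.

```python
import json
from pathlib import Path


def load_records(path: Path) -> list:
    """Read a JSON array of objects, or newline-separated JSON objects."""
    text = path.read_text(encoding="utf-8")
    try:
        data = json.loads(text)
    except json.JSONDecodeError:
        # Not a single JSON document: fall back to one object per line.
        return [json.loads(line) for line in text.splitlines() if line.strip()]
    return data if isinstance(data, list) else [data]


def check_example_directory(directory: str, id_field: str = "id") -> list:
    """Return human-readable problems found in the example data directory."""
    problems = []
    for path in sorted(Path(directory).iterdir()):
        if path.suffix not in (".json", ".jsonl"):
            problems.append(f"{path.name}: extension is not .json or .jsonl")
            continue
        records = load_records(path)
        if len(records) < 10:
            problems.append(f"{path.name}: only {len(records)} examples")
        elif len(records) > 1000:
            problems.append(f"{path.name}: more than 1000 examples")
        for index, record in enumerate(records):
            if not isinstance(record, dict):
                problems.append(f"{path.name}[{index}]: not a JSON object")
            elif id_field in record and not isinstance(record[id_field], str):
                problems.append(f"{path.name}[{index}]: '{id_field}' is not a string")
    return problems
```

Calling `check_example_directory("example_data_directory")` returns an empty list when nothing was flagged.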
52+
2753
## Craft the metadata declaration
2854

2955
Since crafting metadata declaration in the form of an `external_domain_metadata.json` file can be a
@@ -36,7 +62,6 @@ a snap-in run from external system APIs (since they are configurable in the exte
3662
be changed by the end user at any time, such as mandatory fields or custom fields).
3763

3864
<Steps>
39-
4065
### Declare the extracted record types
4166

4267
_Record types_ are the types of records that have a well-defined schema you extract from or load
@@ -266,30 +291,126 @@ be changed by the end user at any time, such as mandatory fields or custom field
266291
existing in that record type, marked 'is_identifier'. For example:
267292
```json
268293
{
269-
"record_types": {
270-
"users": {
271-
"fields": {
272-
"email": {
273-
"type": "text",
274-
"is_identifier":true
294+
"record_types": {
295+
"users": {
296+
"fields": {
297+
"email": {
298+
"type": "text",
299+
"is_identifier":true
300+
}
275301
}
276-
}
277-
},
278-
"comments": {
279-
"fields": {
280-
"user_email": {
281-
"type": "reference",
282-
"reference": {
283-
"refers_to": {
284-
"#record:users": {
285-
"by_field": "email"
302+
},
303+
"comments": {
304+
"fields": {
305+
"user_email": {
306+
"type": "reference",
307+
"reference": {
308+
"refers_to": {
309+
"#record:users": {
310+
"by_field": "email"
311+
}
286312
}
287313
}
288314
}
289315
}
290316
}
291317
}
292318
}
293-
}
294319
```
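As a rough consistency check for references like the one above, a short Python sketch can walk the metadata and confirm that each `by_field` points at a field marked `is_identifier` in the target record type. This helper is hypothetical and not part of `chef-cli`; it relies only on the keys visible in the example (`record_types`, `fields`, `reference`, `refers_to`, `by_field`).

```python
def check_reference_targets(metadata: dict) -> list:
    """Return problems for references whose by_field is not an identifier."""
    problems = []
    record_types = metadata.get("record_types", {})
    for type_name, record_type in record_types.items():
        for field_name, field in record_type.get("fields", {}).items():
            if field.get("type") != "reference":
                continue
            targets = field.get("reference", {}).get("refers_to", {})
            for target, options in targets.items():
                # Keys look like "#record:users"; keep the part after the colon.
                target_type = target.split(":", 1)[-1]
                where = f"{type_name}.{field_name}"
                if target_type not in record_types:
                    problems.append(f"{where}: unknown record type '{target_type}'")
                    continue
                by_field = options.get("by_field")
                if by_field is None:
                    continue  # plain reference, no by_field to check
                target_fields = record_types[target_type].get("fields", {})
                if not target_fields.get(by_field, {}).get("is_identifier"):
                    problems.append(
                        f"{where}: '{target_type}.{by_field}' is not marked is_identifier"
                    )
    return problems
```

Run against the metadata shown above, this should return an empty list, because `users.email` is marked `is_identifier`.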
320+
321+
### Define field attributes
322+
323+
External system fields that shouldn't be mapped in reverse should be marked as `is_read_only`.
324+
Depending on their purpose you can also mark fields as `is_indexed`, `is_identifier`, `is_filterable`,
325+
`is_write_only` etc. By default these will be set to false. You can find the full list of supported
326+
field attributes and their descriptions in the [metadata schema](./external_domain_metadata_schema.json).
327+
328+
### Configure state transitions
329+
330+
If an external record type has some concept of states, between which only certain transitions are
331+
possible (e.g., to move to the 'resolved' status, an issue first has to be 'in_testing'), you
332+
can declare these in the metadata too.
333+
334+
This will allow creation of a matching 'stage diagram' in DevRev, which enables a much simpler
335+
import and a closer preservation of the external data than mapping to DevRev's builtin stages.
336+
337+
This is especially important for two-way sync, as setting the transitions correctly ensures that
338+
the transitions a record undergoes in DevRev can be replicated in the external system.
339+
340+
To declare this in the metadata, make sure the status is represented as an enum field, and then
341+
declare the allowed transitions:
342+
343+
```json
344+
{
345+
"fields": {
346+
"status": {
347+
"name": "Status",
348+
"is_required": true,
349+
"type": "enum",
350+
"enum": {
351+
"values": [
352+
{
353+
"key": "detected",
354+
"name": "Detected"
355+
},
356+
{
357+
"key": "mitigated",
358+
"name": "Mitigated"
359+
},
360+
{
361+
"key": "rca_ready",
362+
"name": "RCA Ready"
363+
},
364+
{
365+
"key": "archived",
366+
"name": "Archived"
367+
}
368+
]
369+
}
370+
}
371+
},
372+
"stage_diagram": {
373+
"controlling_field": "status",
374+
"starting_stage": "detected",
375+
"all_transitions_allowed": false,
376+
"stages": {
377+
"detected": {
378+
"transitions_to": ["mitigated", "archived", "rca_ready"],
379+
"state": "new"
380+
},
381+
"mitigated": {
382+
"transitions_to": ["archived", "detected"],
383+
"state": "work_in_progress"
384+
},
385+
"rca_ready": {
386+
"transitions_to": ["archived"],
387+
"state": "work_in_progress"
388+
},
389+
"archived": {
390+
"transitions_to": [],
391+
"state": "completed"
392+
}
393+
},
394+
"states": {
395+
"new": {
396+
"name": "New"
397+
},
398+
"work_in_progress": {
399+
"name": "Work in Progress"
400+
},
401+
"completed": {
402+
"name": "Completed",
403+
"is_end_state": true
404+
}
405+
}
406+
}
407+
}
408+
```
409+
410+
In the above example:
411+
- The status field is the controlling field of the stage diagram
412+
- If a status field has no explicit transitions but you still want a stage diagram, set `all_transitions_allowed` to `true`
413+
- External systems may categorize statuses (like Jira's status categories), which can be included in the diagram metadata (`states` in the example)
414+
- The `starting_stage` defines the initial stage for new object instances
415+
- In the current metadata format (v0.2.0), the order and human-readable name are taken from the enum values defined on the controlling field
295416
</Steps>
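A stage diagram like the one added in this hunk is easy to get subtly wrong, for instance by listing a transition to a stage that is never declared. The Python sketch below shows one possible set of sanity checks. The rules it enforces are assumptions inferred from the example, not rules confirmed by the metadata schema, and the function name is hypothetical.

```python
def check_stage_diagram(record_type: dict) -> list:
    """Return consistency problems found in a record type's stage_diagram."""
    diagram = record_type["stage_diagram"]
    controlling = diagram["controlling_field"]
    field = record_type.get("fields", {}).get(controlling)
    if field is None or field.get("type") != "enum":
        return [f"controlling field '{controlling}' is not a declared enum field"]

    problems = []
    enum_keys = {value["key"] for value in field["enum"]["values"]}
    stages = diagram.get("stages", {})
    states = diagram.get("states", {})

    if diagram.get("starting_stage") not in stages:
        problems.append("starting_stage is not one of the declared stages")
    for name, stage in stages.items():
        # Assumption: every stage corresponds to an enum value of the field.
        if name not in enum_keys:
            problems.append(f"stage '{name}' is not an enum value of '{controlling}'")
        for target in stage.get("transitions_to", []):
            if target not in stages:
                problems.append(f"stage '{name}' transitions to unknown stage '{target}'")
        # Assumption: each stage's state must be declared under 'states'.
        if stage.get("state") not in states:
            problems.append(f"stage '{name}' uses undeclared state '{stage.get('state')}'")
    return problems
```

Passing the record type from the example in this hunk should produce no problems; renaming one entry in a `transitions_to` list to an undeclared stage should produce exactly one.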

fern/versions/public.yml

Lines changed: 4 additions & 0 deletions
@@ -164,6 +164,10 @@ navigation:
164164
hidden: false
165165
slug: data-extraction
166166
path: ../docs/pages/airdrop/data-extraction.mdx
167+
- page: "Initial domain mapping"
168+
hidden: false
169+
slug: initial-domain-mapping
170+
path: ../docs/pages/airdrop/initial-domain-mapping.mdx
167171
- page: "Attachments extraction"
168172
hidden: false
169173
slug: extract-attachments
