Commit 73697f6

ISS-162937: Update Airdrop docs (#210)
* ISS-162937: Update Airdrop docs
* Add item_url_field explanation
* Fix highlighted line in the example.
* Fixes
* some markup changes

---------

Co-authored-by: Ben Colborn <[email protected]>
1 parent fc51a43 commit 73697f6

15 files changed, with 302 additions and 88 deletions.

fern/docs/pages/airdrop/attachments-extraction.mdx

Lines changed: 22 additions & 2 deletions
Original file line number | Diff line number | Diff line change
@@ -10,8 +10,8 @@ During the attachment extraction phase,
1010
the snap-in extracts attachments from the external system and uploads them as artifacts to DevRev.
1111

1212
The snap-in must respond to Airdrop with a message with an event of type
13-
`EXTRACTION_ATTACHMENTS_PROGRESS` together with an optional progress estimate and relevant artifacts
14-
when it extracts some data and the maximum snap-in run time (12 minutes) has been reached.
13+
`EXTRACTION_ATTACHMENTS_PROGRESS` together with an optional progress estimate
14+
when the maximum snap-in runtime (13 minutes) has been reached.
1515

1616
The snap-in must respond to Airdrop with a message with an event of type `EXTRACTION_ATTACHMENTS_DELAY`
1717
and specify a back-off time when the extraction has been rate-limited by the external system and
@@ -42,6 +42,26 @@ domain object ID from the external system and the actor ID from the external sys
4242
The uploaded artifact is structured like a normal artifact containing extracted data in JSON Lines
4343
(JSONL) format and requires specifying `ssor_attachment` as the item type.
4444

45+
The snap-in must respond to Airdrop with a message that signals either success, a delay, or an error.
46+
47+
```typescript Success
48+
await adapter.emit(ExtractorEventType.ExtractionAttachmentsDone);
49+
```
50+
51+
```typescript Delay
52+
await adapter.emit(ExtractorEventType.ExtractionAttachmentsDelay, {
53+
delay: 30, // back-off time in seconds, as an integer
54+
});
55+
```
56+
57+
```typescript Error
58+
await adapter.emit(ExtractorEventType.ExtractionAttachmentsError, {
59+
error: "Informative error message",
60+
});
61+
```
62+
63+
<Note>The snap-in must always emit a single message.</Note>
64+
4565
## Examples
4666

4767
Here is an example of an SSOR attachment file:

fern/docs/pages/airdrop/data-extraction.mdx

Lines changed: 158 additions & 30 deletions
@@ -1,47 +1,120 @@
11
In the data extraction phase, the extractor is expected to call the external system's APIs
2-
to retrieve all the items that were updated since the start of the last extraction.
3-
If there was no previous extraction (the current run is an initial import),
4-
then all the items should be extracted.
2+
to retrieve all the items that should be synced with DevRev.
53

6-
The extractor must store at what time it started each extraction in its state,
7-
so that it can extract only items created or updated since this date in the next sync run.
4+
If the current run is an initial sync, this means all the items should be extracted.
5+
Otherwise the extractor should retrieve all the items that were changed since the start of the last extraction.
6+
7+
Each snap-in invocation runs in a separate runtime instance with a maximum execution time of 13 minutes.
8+
After 10 minutes, the Airdrop platform sends a message to the snap-in to gracefully exit.
9+
10+
If a large amount of data needs to be extracted, it might not all be extracted within this time frame.
11+
To handle such situations, the snap-in uses a state object.
12+
This state object is shared across all invocations and keeps track of where the previous snap-in invocations ended in the extraction process.
813

914
## Triggering event
1015

11-
Airdrop initiates data extraction by starting the snap-in with a message with event of type
16+
Airdrop initiates data extraction by starting the snap-in with a message with event type
1217
`EXTRACTION_DATA_START` when transitioning to the data extraction phase.
1318

1419
During the data extraction phase, the snap-in extracts data from an external system,
15-
prepares batches of data and uploads them in the form of artifacts to DevRev.
20+
prepares batches of data and uploads them in the form of artifacts (files) to DevRev.
1621

17-
The snap-in must respond to Airdrop with a message with event of type `EXTRACTION_DATA_PROGRESS`,
18-
together with an optional progress estimate and relevant artifacts
19-
when it extracts some data and the maximum Airdrop snap-in runtime (12 minutes) has been reached.
22+
The snap-in must respond to Airdrop with a message with event type `EXTRACTION_DATA_PROGRESS`,
23+
together with an optional progress estimate when the maximum Airdrop snap-in runtime (13 minutes) has been reached.
2024

2125
If the extraction has been rate-limited by the external system and back-off is required, the snap-in
22-
must respond to Airdrop with a message with event of type `EXTRACTION_DATA_DELAY` and specifying
23-
back-off time with `delay` attribute.
26+
must respond to Airdrop with a message with event type `EXTRACTION_DATA_DELAY` and specify
27+
the back-off time with the `delay` attribute (in seconds, as an integer).
2428

25-
In both cases, Airdrop starts the snap-in with a message with event of type `EXTRACTION_DATA_CONTINUE`.
26-
The restarting is immediate (in case of `EXTRACTION_DATA_PROGRESS`) or delayed
27-
(in case of `EXTRACTION_DATA_DELAY`).
29+
In both cases, Airdrop starts the snap-in with a message with event type `EXTRACTION_DATA_CONTINUE`.
30+
In case of `EXTRACTION_DATA_PROGRESS` the restarting is immediate,
31+
whereas in case of `EXTRACTION_DATA_DELAY` the restart is delayed by the given number of seconds.
2832

29-
Once the data extraction is done, the snap-in must respond to Airdrop with a message with event of
30-
type `EXTRACTION_DATA_DONE`.
33+
Once the data extraction is done, the snap-in must respond to Airdrop with a message with event type `EXTRACTION_DATA_DONE`.
3134

3235
If data extraction fails at any point, the snap-in must respond to Airdrop with a
33-
message with event of type `EXTRACTION_DATA_ERROR`.
36+
message with event type `EXTRACTION_DATA_ERROR`.
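
The response rules above can be sketched as a small decision function. This is an illustrative sketch only: the `Outcome` and `ExtractionResponse` types and the `responseFor` helper are hypothetical names introduced here, not part of the SDK. The event type strings and the `delay` and `error.message` fields are taken from the text and examples in this diff.

```typescript
// Hypothetical description of how one snap-in invocation ended (not an SDK type).
type Outcome =
  | { kind: "done" }
  | { kind: "timeout" }
  | { kind: "rateLimited"; retryAfterSeconds: number }
  | { kind: "failed"; message: string };

// Hypothetical shape of the single message sent back to Airdrop.
interface ExtractionResponse {
  eventType: string;
  data?: { delay?: number; error?: { message: string } };
}

// Maps an outcome to exactly one response, mirroring the rules described above.
function responseFor(outcome: Outcome): ExtractionResponse {
  switch (outcome.kind) {
    case "done":
      return { eventType: "EXTRACTION_DATA_DONE" };
    case "timeout":
      // Airdrop restarts the snap-in immediately with EXTRACTION_DATA_CONTINUE.
      return { eventType: "EXTRACTION_DATA_PROGRESS" };
    case "rateLimited":
      // The delay is an integer number of seconds, so round up.
      return {
        eventType: "EXTRACTION_DATA_DELAY",
        data: { delay: Math.ceil(outcome.retryAfterSeconds) },
      };
    case "failed":
      return {
        eventType: "EXTRACTION_DATA_ERROR",
        data: { error: { message: outcome.message } },
      };
  }
}
```

Returning a single value from one function is one way to keep to the rule that each invocation emits exactly one message.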
3437

3538
## Implementation
3639

3740
Data extraction should be implemented in the [data-extraction.ts](https://github.com/devrev/airdrop-template/blob/main/code/src/functions/extraction/workers/data-extraction.ts) file.
3841

39-
During the data extraction phase, the snap-in uploads batches of extracted items (with tooling from `@devrev/adaas-sdk`).
42+
The snap-in must respond to Airdrop with a message that signals either success, a delay, progress, or an error.
43+
44+
```typescript Success
45+
await adapter.emit(ExtractorEventType.ExtractionDataDone);
46+
```
47+
48+
```typescript Delay
49+
await adapter.emit(ExtractorEventType.ExtractionDataDelay, {
50+
delay: 30, // back-off time in seconds, as an integer
51+
});
52+
```
53+
54+
```typescript Progress
55+
await adapter.emit(ExtractorEventType.ExtractionDataProgress);
56+
```
57+
58+
```typescript Error
59+
await adapter.emit(ExtractorEventType.ExtractionDataError, {
60+
error: {
61+
message: "Failed to extract data.",
62+
},
63+
});
64+
```
65+
66+
<Note>The snap-in must always emit a single message.</Note>
67+
68+
### Extracting and storing the data
69+
70+
The SDK library includes a repository system for handling extracted items.
71+
Each item type, such as users, tasks, or issues, has its own repository.
72+
These are defined in the `repos` array as `itemType`.
73+
The `itemType` name should match the `record_type` specified in the provided metadata.
74+
75+
```typescript
76+
const repos = [
77+
{
78+
itemType: 'todos',
79+
},
80+
{
81+
itemType: 'users',
82+
},
83+
{
84+
itemType: 'attachments',
85+
},
86+
];
87+
```
88+
89+
The `initializeRepos` function initializes the repositories and should be the first step when the process begins.
90+
91+
```typescript
92+
processTask<ExtractorState>({
93+
task: async ({ adapter }) => {
94+
adapter.initializeRepos(repos);
95+
// ...
96+
},
97+
onTimeout: async ({ adapter }) => {
98+
// ...
99+
},
100+
});
101+
```
102+
103+
After initialization of repositories using `initializeRepos`,
104+
items should then be retrieved from the external system and stored in the correct repository by calling the `push` function.
105+
106+
```typescript
107+
await adapter.getRepo('users')?.push(items);
108+
```
109+
110+
Behind the scenes, the SDK library stores items pushed to the repository and uploads them in batches to the Airdrop platform.
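
To make the idea of "normalize on push, upload in batches" concrete, here is a minimal in-memory sketch. It is not the SDK's implementation: the `SketchRepo` class, its `batchSize` parameter, and the `uploaded` array (which stands in for uploading artifacts) are all assumptions made for illustration.

```typescript
// Hypothetical stand-in for a repository; NOT the SDK's implementation.
class SketchRepo<T, N> {
  private buffer: N[] = [];
  // Each inner array stands in for one uploaded artifact (batch).
  readonly uploaded: N[][] = [];
  private readonly batchSize: number;
  private readonly normalize: (item: T) => N;

  constructor(batchSize: number, normalize: (item: T) => N) {
    this.batchSize = batchSize;
    this.normalize = normalize;
  }

  // Normalizes each item and flushes whenever a full batch has accumulated.
  push(items: T[]): void {
    for (const item of items) {
      this.buffer.push(this.normalize(item));
      if (this.buffer.length >= this.batchSize) {
        this.flush();
      }
    }
  }

  // Moves any buffered items into a final, possibly smaller, batch.
  flush(): void {
    if (this.buffer.length === 0) return;
    this.uploaded.push(this.buffer);
    this.buffer = [];
  }
}
```

In this sketch a caller would push items as they arrive and call `flush` once at the end so that a partial last batch is not lost.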
40111

41-
Each artifact is submitted with an `item_type`, defining a separate domain object from the
42-
external system and matching the `record_type` in the provided metadata.
112+
### Data normalization
43113

44-
Extracted data must be normalized:
114+
Extracted data must be normalized to fit the domain metadata defined in the `external-domain-metadata.json` file.
115+
More details on this process are provided in the [Metadata extraction](/public/snapin-development/adaas/metadata-extraction) section.
116+
117+
Normalization rules:
45118

46119
- Null values: All fields without a value should either be omitted or set to null.
47120
For example, if an external system provides values such as "", –1 for missing values,
@@ -52,8 +125,29 @@ Extracted data must be normalized:
52125
- Number fields must be valid JSON numbers (not strings).
53126
- Multiselect fields must be provided as an array (not CSV).
54127

55-
Each line of the file contains an `id` and the optional `created_date` and `modified_date` fields
56-
in the beginning of the record.
128+
Extracted items are automatically normalized when pushed to the `repo` if a normalization function is provided under the `normalize` key in the repo object.
129+
130+
```typescript
131+
const repos = [
132+
{
133+
itemType: 'todos',
134+
normalize: normalizeTodo,
135+
},
136+
{
137+
itemType: 'users',
138+
normalize: normalizeUser,
139+
},
140+
{
141+
itemType: 'attachments',
142+
normalize: normalizeAttachment,
143+
},
144+
];
145+
```
146+
147+
For examples of normalization functions, refer to the [data-normalization.ts](https://github.com/devrev/airdrop-template/blob/main/code/src/functions/external-system/data-normalization.ts) file in the starter template.
148+
149+
Each line of the file contains `id`, `created_date`, and `modified_date` fields
150+
at the beginning of the record. These fields are required.
57151
All other fields are contained within the `data` attribute.
58152

59153
```json {2-4}
@@ -67,7 +161,30 @@ All other fields are contained within the `data` attribute.
67161
"owner": "A3A",
68162
"rca": null,
69163
"severity": "fatal",
70-
"summary": "Lorem ipsum"
164+
"summary": "Lorem ipsum",
165+
}
166+
}
167+
```
168+
169+
If the item you are normalizing is a work item (a ticket, task, issue, or similar),
170+
it should also contain the `item_url_field` within the `data` attribute.
171+
This field should be assigned a URL that points to the item in the external system.
172+
This link is visible in the airdropped item in the DevRev app,
173+
helping users to easily locate the item in the external system.
174+
175+
```json {12}
176+
{
177+
"id": "2102e01F",
178+
"created_date": "1972-03-29T22:04:47+01:00",
179+
"modified_date": "1970-01-01T01:00:04+01:00",
180+
"data": {
181+
"actual_close_date": "1970-01-01T02:33:18+01:00",
182+
"creator": "b8",
183+
"owner": "A3A",
184+
"rca": null,
185+
"severity": "fatal",
186+
"summary": "Lorem ipsum",
187+
"item_url_field": "https://external-system.com/issue/123"
71188
}
72189
}
73190
```
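
As an illustration of the normalization rules and the record layout shown in this hunk, a normalization function might look like the sketch below. The `RawIssue` shape, its field names, and the URL pattern are hypothetical and only stand in for whatever the external system returns. The output layout (`id`, `created_date`, `modified_date`, `data`, `item_url_field`) follows the JSON examples above.

```typescript
// Hypothetical raw record from an external system (assumed for illustration).
interface RawIssue {
  id: string;
  created: string; // assumed to already be an RFC 3339 timestamp
  updated: string; // assumed to already be an RFC 3339 timestamp
  summary: string;
  owner: string; // "" when the issue is unassigned
  estimate: string; // a number delivered as a string, "" when missing
  labels: string; // a multiselect delivered as CSV
}

interface NormalizedItem {
  id: string;
  created_date: string;
  modified_date: string;
  data: Record<string, unknown>;
}

function normalizeIssue(raw: RawIssue): NormalizedItem {
  const estimate = Number(raw.estimate);
  const hasEstimate = raw.estimate.trim() !== "" && !Number.isNaN(estimate);
  return {
    id: raw.id,
    created_date: raw.created,
    modified_date: raw.updated,
    data: {
      summary: raw.summary,
      // Null values: placeholder values for "missing" become null.
      owner: raw.owner === "" ? null : raw.owner,
      // Number fields must be JSON numbers, not strings.
      estimate: hasEstimate ? estimate : null,
      // Multiselect fields must be arrays, not CSV.
      labels: raw.labels === "" ? [] : raw.labels.split(",").map((l) => l.trim()),
      // Work items carry a link back to the external system (URL pattern assumed).
      item_url_field: `https://external-system.com/issue/${encodeURIComponent(raw.id)}`,
    },
  };
}
```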
@@ -88,12 +205,15 @@ echo '{}' | chef-cli fuzz-extracted -r issue -m external_domain_metadata.json >
88205

89206
## State handling
90207

91-
Since each snap-in invocation is a separate runtime instance (with a maximum execution time of 12 minutes),
92-
it does not know what has been previously accomplished or how many records have already been extracted.
93-
To enable information passing between invocations and runs, support has been added for saving a limited amount
94-
of data as the snap-in `state`. Snap-in `state` persists between phases in one sync run as well as between multiple sync runs.
208+
To enable information passing between invocations and runs, a limited amount of data can be saved as the snap-in `state`.
209+
Snap-in `state` persists between phases in one sync run as well as between multiple sync runs.
210+
95211
You can access the `state` through SDK's `adapter` object.
96212

213+
```typescript
214+
adapter.state['users'].completed = true;
215+
```
216+
97217
A snap-in must consult its state to obtain information on when the last successful forward sync started.
98218

99219
- The snap-in's `state` is loaded at the start of each invocation and saved at its end.
@@ -105,7 +225,15 @@ Effective use of the state and breaking down the problem into smaller chunks are
105225

106226
The snap-in starter template contains an [example](https://github.com/devrev/airdrop-template/blob/main/code/src/functions/extraction/index.ts) of a simple state. Adding more data to the state can help with pagination and rate limiting by saving the point at which extraction was left off.
107227

108-
To test the state in development, you can decrease the timeout between snap-in invocations.
228+
```typescript
229+
export const initialExtractorState: ExtractorState = {
230+
todos: { completed: false },
231+
users: { completed: false },
232+
attachments: { completed: false },
233+
};
234+
```
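
The remark about saving the point at which extraction was left off can be sketched as follows. The `page` cursor, the `ResumableState` type, and the `extractSome` helper are hypothetical additions for illustration. The starter template's state shown above only tracks `completed`, and fetching from the external system is omitted here.

```typescript
// Hypothetical per-item-type state; the `page` cursor is an assumed extension.
interface ItemTypeState {
  completed: boolean;
  page: number;
}
type ItemType = "todos" | "users" | "attachments";
type ResumableState = Record<ItemType, ItemTypeState>;

const initialResumableState: ResumableState = {
  todos: { completed: false, page: 0 },
  users: { completed: false, page: 0 },
  attachments: { completed: false, page: 0 },
};

// Processes at most `maxPages` pages in one invocation, resuming from the
// saved cursor. Returns true once the item type is fully extracted.
function extractSome(
  state: ResumableState,
  itemType: ItemType,
  totalPages: number,
  maxPages: number
): boolean {
  const s = state[itemType];
  let budget = maxPages;
  while (!s.completed && budget > 0) {
    // A real snap-in would fetch page `s.page` and push its items here.
    s.page += 1;
    budget -= 1;
    if (s.page >= totalPages) {
      s.completed = true;
    }
  }
  return s.completed;
}
```

With five pages and a budget of three pages per invocation, the first call stops at page 3 and the second call finishes, because the cursor survives in the state between invocations.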
235+
236+
To test the state during snap-in development, you can pass in the option to decrease the timeout between snap-in invocations.
109237

110238
```typescript
111239
await spawn<DummyExtractorState>({

fern/docs/pages/airdrop/external-sync-units-extraction.mdx

Lines changed: 10 additions & 1 deletion
@@ -9,7 +9,14 @@ sync units that it can extract from the external system API and send it to Airdr
99

1010
External sync unit extraction is executed only during the initial import.
1111

12-
### Implementation
12+
## Triggering event
13+
14+
Airdrop starts the external sync unit extraction by sending a message with the event type `EXTRACTION_EXTERNAL_SYNC_UNITS_START`.
15+
16+
The snap-in must reply to Airdrop with an `EXTRACTION_EXTERNAL_SYNC_UNITS_DONE` message when finished,
17+
or `EXTRACTION_EXTERNAL_SYNC_UNITS_ERROR` if an error occurs.
18+
19+
## Implementation
1320

1421
This phase should be implemented in the [`external-sync-units-extraction.ts`](https://github.com/devrev/airdrop-template/blob/main/code/src/functions/extraction/workers/external-sync-units-extraction.ts) file.
1522

@@ -52,4 +59,6 @@ await adapter.emit(ExtractorEventType.ExtractionExternalSyncUnitsError, {
5259
});
5360
```
5461

62+
**The snap-in must always emit a single message.**
63+
5564
To test your changes, start a new airdrop in the DevRev App. If external sync units extraction is successful, you should be prompted to choose an external sync unit from the list.

fern/docs/pages/airdrop/extraction-phases.mdx

Lines changed: 15 additions & 9 deletions
@@ -1,24 +1,30 @@
11
Each snap-in must handle all the phases of Airdrop extraction. In a snap-in, you typically define a run
22
function that iterates over events and invokes workers per extraction phase.
33

4-
The SDK library exports `processTask` to structure the work within each phase, and `onTimeout` function
5-
to handle timeouts.
6-
74
The Airdrop snap-in extraction lifecycle consists of four phases:
8-
* External sync units extraction
5+
* External sync units extraction (only for initial sync)
96
* Metadata extraction
107
* Data extraction
118
* Attachments extraction
129

1310
Each phase is defined in a separate file and is responsible for fetching the respective data.
1411

15-
The SDK library provides a repository management system to handle artifacts in batches.
16-
The `initializeRepos` function initializes the repositories, and the `push` function uploads the
17-
artifacts to the repositories. The `postState` function is used to post the state of the extraction task.
12+
<Note>
13+
Snap-in development is an iterative process.
14+
It typically begins with retrieving some data from the external system.
15+
The next step involves crafting an initial version of the external domain metadata and validating it through chef-cli.
16+
This metadata is used to prepare the initial domain mapping and to check for any possible issues.
17+
API calls to the external system are then corrected to fetch the missing data.
18+
Start by working with one item type (we recommend starting with users), and once it maps well to DevRev objects and imports as desired, proceed with other item types.
19+
</Note>
20+
21+
The SDK library exports a `processTask` function, which takes an object parameter with two keys:
22+
* `task`: a function that implements the functionality for the given phase.
23+
* `onTimeout`: a function that handles timeouts, typically by simply emitting a message to the Airdrop platform.
1824

1925
State management is crucial for snap-ins to maintain the state of the extraction task.
20-
The `postState` function is used to post the state of the extraction task.
21-
The state is stored in the adapter and can be retrieved using the `adapter.state` property.
26+
State is saved to the Airdrop backend by calling the `postState` function.
27+
During the extraction the state is stored in the adapter and can be retrieved using the `adapter.state` property.
2228

2329
```typescript
2430
import { AirdropEvent, EventType, spawn } from "@devrev/ts-adaas";
