In the data extraction phase, the extractor is expected to call the external system's APIs
2
-
to retrieve all the items that were updated since the start of the last extraction.
3
-
If there was no previous extraction (the current run is an initial import),
4
-
then all the items should be extracted.
2
+
to retrieve all the items that should be synced with DevRev.
5
3
6
-
The extractor must store at what time it started each extraction in its state,
7
-
so that it can extract only items created or updated since this date in the next sync run.
4
+
If the current run is an initial sync, this means all the items should be extracted.
5
+
Otherwise the extractor should retrieve all the items that were changed since the start of the last extraction.
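
The choice between a full and an incremental extraction can be sketched as follows. This is only an illustration under stated assumptions, not the SDK's API: `ExternalItem`, `selectItemsToExtract`, and the `lastSyncStarted` parameter are hypothetical names.

```typescript
// Hypothetical item shape; real external systems will differ.
interface ExternalItem {
  id: string;
  modifiedDate: string; // ISO 8601 timestamp
}

// Returns the items to extract in this run.
// `lastSyncStarted` is undefined on an initial sync, so every item is extracted.
function selectItemsToExtract(
  items: ExternalItem[],
  lastSyncStarted?: string
): ExternalItem[] {
  if (lastSyncStarted === undefined) {
    return items;
  }
  const since = Date.parse(lastSyncStarted);
  // Keep only items changed at or after the start of the last extraction.
  return items.filter((item) => Date.parse(item.modifiedDate) >= since);
}
```

In practice the filter is usually passed to the external system's API as a query parameter rather than applied in memory; the in-memory version only shows the decision.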
6
+
7
+
Each snap-in invocation runs in a separate runtime instance with a maximum execution time of 13 minutes.
8
+
After 10 minutes, the Airdrop platform sends a message to the snap-in to gracefully exit.
9
+
10
+
If a large amount of data needs to be extracted, it might not all be extracted within this time frame.
11
+
To handle such situations, the snap-in uses a state object.
12
+
This state object is shared across all invocations and keeps track of where the previous snap-in invocations ended in the extraction process.
8
13
9
14
## Triggering event
10
15
11
-
Airdrop initiates data extraction by starting the snap-in with a message with event of type
16
+
Airdrop initiates data extraction by starting the snap-in with a message with event type
12
17
`EXTRACTION_DATA_START` when transitioning to the data extraction phase.
13
18
14
19
During the data extraction phase, the snap-in extracts data from an external system,
15
-
prepares batches of data and uploads them in the form of artifacts to DevRev.
20
+
prepares batches of data and uploads them in the form of artifacts (files) to DevRev.
16
21
17
-
The snap-in must respond to Airdrop with a message with event of type `EXTRACTION_DATA_PROGRESS`,
18
-
together with an optional progress estimate and relevant artifacts
19
-
when it extracts some data and the maximum Airdrop snap-in runtime (12 minutes) has been reached.
22
+
The snap-in must respond to Airdrop with a message with event type `EXTRACTION_DATA_PROGRESS`,
23
+
together with an optional progress estimate when the maximum Airdrop snap-in runtime (13 minutes) has been reached.
20
24
21
25
If the extraction has been rate-limited by the external system and back-off is required, the snap-in
22
-
must respond to Airdrop with a message with event of type `EXTRACTION_DATA_DELAY` and specifying
23
-
back-off time with `delay` attribute.
26
+
must respond to Airdrop with a message with event type `EXTRACTION_DATA_DELAY`, specifying
27
+
the back-off time with the `delay` attribute (in seconds, as an integer).
24
28
25
-
In both cases, Airdrop starts the snap-in with a message with event of type `EXTRACTION_DATA_CONTINUE`.
26
-
The restarting is immediate (in case of `EXTRACTION_DATA_PROGRESS`) or delayed
27
-
(in case of `EXTRACTION_DATA_DELAY`).
29
+
In both cases, Airdrop starts the snap-in with a message with event type `EXTRACTION_DATA_CONTINUE`.
30
+
In case of `EXTRACTION_DATA_PROGRESS` the restarting is immediate,
31
+
whereas in case of `EXTRACTION_DATA_DELAY` the restart is delayed by the given number of seconds.
28
32
29
-
Once the data extraction is done, the snap-in must respond to Airdrop with a message with event of
30
-
type `EXTRACTION_DATA_DONE`.
33
+
Once the data extraction is done, the snap-in must respond to Airdrop with a message with event type `EXTRACTION_DATA_DONE`.
31
34
32
35
If data extraction fails at any point, the snap-in must respond to Airdrop with a
33
-
message with event of type `EXTRACTION_DATA_ERROR`.
36
+
message with event type `EXTRACTION_DATA_ERROR`.
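
The mapping from extraction outcome to response event described above can be sketched as a single function. The event type names come from the text; the `ExtractionOutcome` and `ExtractionMessage` shapes and the `toMessage` helper are hypothetical and do not reflect the SDK's actual message format.

```typescript
// Hypothetical description of how an invocation ended.
type ExtractionOutcome =
  | { kind: 'done' }
  | { kind: 'timeout' }
  | { kind: 'rateLimited'; retryAfterSeconds: number }
  | { kind: 'failed'; message: string };

// Hypothetical, flattened message shape used only for this sketch.
interface ExtractionMessage {
  eventType: string;
  delay?: number;
  errorMessage?: string;
}

// Picks exactly one event type per outcome, so a single message is emitted.
function toMessage(outcome: ExtractionOutcome): ExtractionMessage {
  switch (outcome.kind) {
    case 'done':
      return { eventType: 'EXTRACTION_DATA_DONE' };
    case 'timeout':
      return { eventType: 'EXTRACTION_DATA_PROGRESS' };
    case 'rateLimited':
      // The delay is expressed in whole seconds, rounded up.
      return {
        eventType: 'EXTRACTION_DATA_DELAY',
        delay: Math.ceil(outcome.retryAfterSeconds),
      };
    case 'failed':
      return {
        eventType: 'EXTRACTION_DATA_ERROR',
        errorMessage: outcome.message,
      };
  }
}
```

Routing every exit path through one function like this makes it harder to emit two messages, or none, by accident.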
34
37
35
38
## Implementation
36
39
37
40
Data extraction should be implemented in the [data-extraction.ts](https://github.com/devrev/airdrop-template/blob/main/code/src/functions/extraction/workers/data-extraction.ts) file.
38
41
39
-
During the data extraction phase, the snap-in uploads batches of extracted items (with tooling from `@devrev/adaas-sdk`).
42
+
The snap-in must respond to Airdrop with a message that signals either success, a delay, progress, or an error.
<Note>The snap-in must always emit a single message.</Note>
67
+
68
+
### Extracting and storing the data
69
+
70
+
The SDK library includes a repository system for handling extracted items.
71
+
Each item type, such as users, tasks, or issues, has its own repository.
72
+
These are defined in the `repos` array as `itemType`.
73
+
The `itemType` name should match the `record_type` specified in the provided metadata.
74
+
75
+
```typescript
76
+
const repos = [
77
+
{
78
+
itemType: 'todos',
79
+
},
80
+
{
81
+
itemType: 'users',
82
+
},
83
+
{
84
+
itemType: 'attachments',
85
+
},
86
+
];
87
+
```
88
+
89
+
The `initializeRepos` function initializes the repositories and should be the first step when the process begins.
90
+
91
+
```typescript
92
+
processTask<ExtractorState>({
93
+
task: async ({ adapter }) => {
94
+
adapter.initializeRepos(repos);
95
+
// ...
96
+
},
97
+
onTimeout: async ({ adapter }) => {
98
+
// ...
99
+
},
100
+
});
101
+
```
102
+
103
+
After initialization of repositories using `initializeRepos`,
104
+
items should then be retrieved from the external system and stored in the correct repository by calling the `push` function.
105
+
106
+
```typescript
107
+
await adapter.getRepo('users')?.push(items);
108
+
```
109
+
110
+
Behind the scenes, the SDK library stores items pushed to the repository and uploads them in batches to the Airdrop platform.
40
111
41
-
Each artifact is submitted with an `item_type`, defining a separate domain object from the
42
-
external system and matching the `record_type` in the provided metadata.
112
+
### Data normalization
43
113
44
-
Extracted data must be normalized:
114
+
Extracted data must be normalized to fit the domain metadata defined in the `external-domain-metadata.json` file.
115
+
More details on this process are provided in the [Metadata extraction](/public/snapin-development/adaas/metadata-extraction) section.
116
+
117
+
Normalization rules:
45
118
46
119
- Null values: All fields without a value should either be omitted or set to null.
47
120
For example, if an external system provides values such as "", –1 for missing values,
@@ -52,8 +125,29 @@ Extracted data must be normalized:
52
125
- Number fields must be valid JSON numbers (not strings).
53
126
- Multiselect fields must be provided as an array (not CSV).
54
127
55
-
Each line of the file contains an `id` and the optional `created_date` and `modified_date` fields
56
-
in the beginning of the record.
128
+
Extracted items are automatically normalized when pushed to the `repo` if a normalization function is provided under the `normalize` key in the repo object.
129
+
130
+
```typescript
131
+
const repos = [
132
+
{
133
+
itemType: 'todos',
134
+
normalize: normalizeTodo,
135
+
},
136
+
{
137
+
itemType: 'users',
138
+
normalize: normalizeUser,
139
+
},
140
+
{
141
+
itemType: 'attachments',
142
+
normalize: normalizeAttachment,
143
+
},
144
+
];
145
+
```
146
+
147
+
For examples of normalization functions, refer to the [data-normalization.ts](https://github.com/devrev/airdrop-template/blob/main/code/src/functions/external-system/data-normalization.ts) file in the starter template.
148
+
149
+
Each line of the file contains `id`, `created_date`, and `modified_date` fields
150
+
at the beginning of the record. These fields are required.
57
151
All other fields are contained within the `data` attribute.
58
152
59
153
```json {2-4}
@@ -67,7 +161,30 @@ All other fields are contained within the `data` attribute.
67
161
"owner": "A3A",
68
162
"rca": null,
69
163
"severity": "fatal",
70
-
"summary": "Lorem ipsum"
164
+
"summary": "Lorem ipsum",
165
+
}
166
+
}
167
+
```
168
+
169
+
If the item you are normalizing is a work item (a ticket, task, issue, or similar),
170
+
it should also contain the `item_url_field` within the `data` attribute.
171
+
This field should be assigned a URL that points to the item in the external system.
172
+
This link is visible in the airdropped item in the DevRev app,
173
+
helping users to easily locate the item in the external system.
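
A normalization function that follows the rules above might look like the sketch below. The `RawTodo` shape, its field names, and the choice to stringify `id` are assumptions made for illustration; the starter template's own `normalizeTodo` may differ.

```typescript
// Hypothetical raw shape returned by an external system.
interface RawTodo {
  id: number;
  created: string;
  updated: string;
  title: string;
  estimate: string; // numeric value delivered as a string, "" when missing
  labels: string; // comma-separated list, "" when empty
  url: string;
}

// Required fields at the top level, everything else under `data`.
interface NormalizedItem {
  id: string;
  created_date: string;
  modified_date: string;
  data: Record<string, unknown>;
}

function normalizeTodo(raw: RawTodo): NormalizedItem {
  return {
    id: String(raw.id),
    created_date: raw.created,
    modified_date: raw.updated,
    data: {
      // Empty placeholders become null.
      title: raw.title === '' ? null : raw.title,
      // Number fields must be JSON numbers, not strings.
      estimate: raw.estimate === '' ? null : Number(raw.estimate),
      // Multiselect fields must be arrays, not CSV.
      labels: raw.labels === '' ? [] : raw.labels.split(',').map((label) => label.trim()),
      // Link back to the item in the external system.
      item_url_field: raw.url,
    },
  };
}
```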
Since each snap-in invocation is a separate runtime instance (with a maximum execution time of 12 minutes),
92
-
it does not know what has been previously accomplished or how many records have already been extracted.
93
-
To enable information passing between invocations and runs, support has been added for saving a limited amount
94
-
of data as the snap-in `state`. Snap-in `state` persists between phases in one sync run as well as between multiple sync runs.
208
+
To enable information passing between invocations and runs, a limited amount of data can be saved as the snap-in `state`.
209
+
Snap-in `state` persists between phases in one sync run as well as between multiple sync runs.
210
+
95
211
You can access the `state` through SDK's `adapter` object.
96
212
213
+
```typescript
214
+
adapter.state['users'].completed = true;
215
+
```
216
+
97
217
A snap-in must consult its state to obtain information on when the last successful forward sync started.
98
218
99
219
- The snap-in's `state` is loaded at the start of each invocation and saved at its end.
@@ -105,7 +225,15 @@ Effective use of the state and breaking down the problem into smaller chunks are
105
225
106
226
The snap-in starter template contains an [example](https://github.com/devrev/airdrop-template/blob/main/code/src/functions/extraction/index.ts) of a simple state. Adding more data to the state can help with pagination and rate limiting by saving the point at which extraction was left off.
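
Saving a pagination position in the state, as described above, can be sketched like this. `SketchState`, `FetchPage`, `extractUsers`, and the `shouldStop` callback are hypothetical names for illustration and are not part of the SDK; in a real snap-in the state object would come from `adapter.state`.

```typescript
// Hypothetical state shape; the starter template defines its own.
interface SketchState {
  users: { completed: boolean; nextPage: number };
}

// Hypothetical paged fetcher standing in for an external API client.
type FetchPage = (page: number) => Promise<{ items: unknown[]; hasMore: boolean }>;

// Extracts pages until everything is fetched or `shouldStop` reports that
// the time budget is spent. The state is updated after every page, so the
// next invocation resumes from the first page that was not yet processed.
async function extractUsers(
  state: SketchState,
  fetchPage: FetchPage,
  push: (items: unknown[]) => Promise<void>,
  shouldStop: () => boolean
): Promise<void> {
  while (!state.users.completed && !shouldStop()) {
    const { items, hasMore } = await fetchPage(state.users.nextPage);
    await push(items);
    state.users.nextPage += 1;
    state.users.completed = !hasMore;
  }
}
```

Because progress is recorded per page rather than per run, an invocation that is interrupted loses at most the page it was working on.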
107
227
108
-
To test the state in development, you can decrease the timeout between snap-in invocations.
File: fern/docs/pages/airdrop/external-sync-units-extraction.mdx (10 additions, 1 deletion)
@@ -9,7 +9,14 @@ sync units that it can extract from the external system API and send it to Airdr
9
9
10
10
External sync unit extraction is executed only during the initial import.
11
11
12
-
### Implementation
12
+
## Triggering event
13
+
14
+
Airdrop starts the external sync unit extraction by sending a message with the event type `EXTRACTION_EXTERNAL_SYNC_UNITS_START`.
15
+
16
+
The snap-in must reply to Airdrop with an `EXTRACTION_EXTERNAL_SYNC_UNITS_DONE` message when finished,
17
+
or `EXTRACTION_EXTERNAL_SYNC_UNITS_ERROR` if an error occurs.
18
+
19
+
## Implementation
13
20
14
21
This phase should be implemented in the [`external-sync-units-extraction.ts`](https://github.com/devrev/airdrop-template/blob/main/code/src/functions/extraction/workers/external-sync-units-extraction.ts) file.
**The snap-in must always emit a single message.**
63
+
55
64
To test your changes, start a new airdrop in the DevRev App. If external sync units extraction is successful, you should be prompted to choose an external sync unit from the list.
File: fern/docs/pages/airdrop/extraction-phases.mdx (15 additions, 9 deletions)
@@ -1,24 +1,30 @@
1
1
Each snap-in must handle all the phases of Airdrop extraction. In a snap-in, you typically define a run
2
2
function that iterates over events and invokes workers per extraction phase.
3
3
4
-
The SDK library exports `processTask` to structure the work within each phase, and `onTimeout` function
5
-
to handle timeouts.
6
-
7
4
The Airdrop snap-in extraction lifecycle consists of four phases:
8
-
* External sync units extraction
5
+
* External sync units extraction (only for initial sync)
9
6
* Metadata extraction
10
7
* Data extraction
11
8
* Attachments extraction
12
9
13
10
Each phase is defined in a separate file and is responsible for fetching the respective data.
14
11
15
-
The SDK library provides a repository management system to handle artifacts in batches.
16
-
The `initializeRepos` function initializes the repositories, and the `push` function uploads the
17
-
artifacts to the repositories. The `postState` function is used to post the state of the extraction task.
12
+
<Note>
13
+
Snap-in development is an iterative process.
14
+
It typically begins with retrieving some data from the external system.
15
+
The next step involves crafting an initial version of the external domain metadata and validating it through chef-cli.
16
+
This metadata is used to prepare the initial domain mapping and to check for any possible issues.
17
+
API calls to the external system are then corrected to fetch the missing data.
18
+
Start by working with one item type (we recommend starting with users), and once it maps well to DevRev objects and imports as desired, proceed with other item types.
19
+
</Note>
20
+
21
+
The SDK library exports a `processTask` function, which takes an object parameter with two keys:
22
+
* `task`: a function that implements the functionality for the given phase.
23
+
* `onTimeout`: a function that handles timeouts, typically by simply emitting a message to the Airdrop platform.
18
24
19
25
State management is crucial for snap-ins to maintain the state of the extraction task.
20
-
The `postState` function is used to post the state of the extraction task.
21
-
The state is stored in the adapter and can be retrieved using the `adapter.state` property.
26
+
State is saved to the Airdrop backend by calling the `postState` function.
27
+
During extraction, the state is stored in the adapter and can be retrieved using the `adapter.state` property.