chore: Move ADaaS template repo FAQ to public docs (#166)
This commit moves documentation from the airdrop-template-snapin
repo to the public docs, as we will archive that repository, and we want
all ADaaS documentation to be fully public and stored centrally.
https://app.devrev.ai/devrev/works/ISS-154218
fern/docs/pages/airdrop/attachments-extraction.mdx
25 additions & 18 deletions
Original file line number
Diff line number
Diff line change
@@ -1,37 +1,44 @@
1
-
For the attachment extraction phase of the import process, the extractor has to upload each attachment to DevRev’s S3
2
-
using the `S3Interact` API.
1
+
For the attachment extraction phase of the import process, the extractor has to upload each
2
+
attachment to DevRev's S3 using the `S3Interact` API.
3
3
4
4
## Triggering event
5
5
6
-
Airdrop initiates the attachment extraction by starting the snap-in with a message with an event of type `EXTRACTION_ATTACHMENTS_START`.
7
-
This is done after the data extraction, transformation and loading into DevRev are completed.
6
+
Airdrop initiates the attachment extraction by starting the snap-in with a message with an event of
7
+
type `EXTRACTION_ATTACHMENTS_START`.
8
+
This is done after the data extraction, transformation, and loading into DevRev are completed.
8
9
9
10
During the attachment extraction phase,
10
11
the snap-in extracts attachments from the external system and uploads them as artifacts to DevRev.
11
12
12
-
The snap-in must respond to Airdrop with a message with an event of type `EXTRACTION_ATTACHMENTS_PROGRESS` together with an optional progress estimate and relevant artifacts
13
+
The snap-in must respond to Airdrop with a message with an event of type
14
+
`EXTRACTION_ATTACHMENTS_PROGRESS` together with an optional progress estimate and relevant artifacts
13
15
when it extracts some data and the maximum snap-in run time (12 minutes) has been reached.
14
16
15
-
The snap-in must respond to Airdrop with a message with an event of type `EXTRACTION_ATTACHMENTS_DELAY` and specify a back-off time
16
-
when the extraction has been rate-limited by the external system and back-off is required.
17
+
The snap-in must respond to Airdrop with a message with an event of type `EXTRACTION_ATTACHMENTS_DELAY`
18
+
and specify a back-off time when the extraction has been rate-limited by the external system and
19
+
back-off is required.
17
20
18
-
In both cases, Airdrop starts the snap-in with a message with an event of type `EXTRACTION_ATTACHMENTS_CONTINUE`.
19
-
The restart is immediate in case of `EXTRACTION_ATTACHMENTS_PROGRESS`, or delayed
20
-
in case of `EXTRACTION_ATTACHMENTS_DELAY`.
21
+
In both cases, Airdrop starts the snap-in with a message with an event of type
22
+
`EXTRACTION_ATTACHMENTS_CONTINUE`.
23
+
The restart is immediate in case of `EXTRACTION_ATTACHMENTS_PROGRESS`, or delayed in case of
24
+
`EXTRACTION_ATTACHMENTS_DELAY`.
21
25
22
-
Once the attachment extraction phase is done, the snap-in must respond to Airdrop with a message with an event of type `EXTRACTION_ATTACHMENTS_DONE`.
26
+
Once the attachment extraction phase is done, the snap-in must respond to Airdrop with a message
27
+
with an event of type `EXTRACTION_ATTACHMENTS_DONE`.
23
28
24
-
If attachment extraction fails the snap-in must respond to Airdrop with a message with an event of type `EXTRACTION_ATTACHMENTS_ERROR`.
29
+
If attachment extraction fails, the snap-in must respond to Airdrop with a message with an event of
30
+
type `EXTRACTION_ATTACHMENTS_ERROR`.
25
31
26
-
## Snap-in response
32
+
## Response from the snap-in
27
33
28
-
After uploading an attachment or a batch of attachments, the extractor also has to prepare and upload a file specifying
29
-
the extracted and uploaded attachments.
34
+
After uploading an attachment or a batch of attachments, the extractor also has to prepare and
35
+
upload a file specifying the extracted and uploaded attachments.
30
36
31
-
It should contain the DevRev IDs of the extracted and uploaded attachments, along with the parent domain object ID
32
-
from the external system and the actor ID from the external system.
37
+
It should contain the DevRev IDs of the extracted and uploaded attachments, along with the parent
38
+
domain object ID from the external system and the actor ID from the external system.
33
39
34
-
The uploaded artifact is structured like a normal artifact containing extracted data in JSON Lines (JSONL) format and requires specifying `ssor_attachment` as the item type.
40
+
The uploaded artifact is structured like a normal artifact containing extracted data in JSON Lines
41
+
(JSONL) format and requires specifying `ssor_attachment` as the item type.
In the data extraction phase, the extractor is expected to call the external system’s APIs
2
+
In the data extraction phase, the extractor is expected to call the external system's APIs
3
3
to retrieve all the items that were updated since the start of the last extraction.
4
-
If there was no previous extraction (the current run is an initial import), then all the items should be extracted.
4
+
If there was no previous extraction (the current run is an initial import),
5
+
then all the items should be extracted.
5
6
6
7
The extractor must store at what time it started each extraction in its state,
7
-
so that it can extract only items created and/or updated since this date in the next sync run.
8
+
so that it can extract only items created or updated since this date in the next sync run.
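A minimal sketch of the incremental-sync rule above, assuming a hypothetical state field `lastSyncStarted` and a hypothetical `ExternalItem` shape (neither name comes from the SDK). A real extractor would normally pass the timestamp to the external system's API as a filter; the in-memory filter here only illustrates the logic.

```typescript
// Hypothetical state shape; the field name is an assumption, not the SDK's schema.
interface ExtractorState {
  lastSyncStarted?: string; // RFC 3339 timestamp of the previous extraction start
}

// Hypothetical item shape used only for this illustration.
interface ExternalItem {
  id: string;
  modifiedDate: string; // RFC 3339 timestamp from the external system
}

// Returns the items to extract in this run and the state to persist for the next run.
function planExtraction(
  state: ExtractorState,
  items: ExternalItem[],
  now: Date,
): { toExtract: ExternalItem[]; nextState: ExtractorState } {
  const since =
    state.lastSyncStarted !== undefined ? Date.parse(state.lastSyncStarted) : undefined;
  // No previous extraction means an initial import: extract everything.
  const toExtract =
    since === undefined
      ? items
      : items.filter((item) => Date.parse(item.modifiedDate) >= since);
  // Record when this extraction started, not when it finished.
  return { toExtract, nextState: { lastSyncStarted: now.toISOString() } };
}
```

Storing the start time rather than the end time means items modified while the extraction was running are picked up again in the next sync instead of being missed.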
8
9
9
10
## Triggering event
10
11
11
-
Airdrop initiates data extraction by starting the snap-in with a message with event of type`EXTRACTION_DATA_START`
12
-
when transitioning to the data extraction phase.
12
+
Airdrop initiates data extraction by starting the snap-in with a message with an event of type
13
+
`EXTRACTION_DATA_START` when transitioning to the data extraction phase.
13
14
14
15
During the data extraction phase, the snap-in extracts data from an external system,
15
16
prepares batches of data and uploads them in the form of artifacts to DevRev.
16
17
17
18
The snap-in must respond to Airdrop with a message with an event of type `EXTRACTION_DATA_PROGRESS`,
18
19
together with an optional progress estimate and relevant artifacts
19
-
when it extracts some data and the maximum ADaaS snap-in runtime (12 minutes) has been reached.
20
+
when it extracts some data and the maximum Airdrop snap-in runtime (12 minutes) has been reached.
20
21
21
-
If the extraction has been rate-limited by the external system and back-off is required, the snap-in must respond to
22
-
Airdrop with a message with event of type `EXTRACTION_DATA_DELAY` and specifying back-off time with `delay` attribute.
22
+
If the extraction has been rate-limited by the external system and back-off is required, the snap-in
23
+
must respond to Airdrop with a message with an event of type `EXTRACTION_DATA_DELAY` and specify
24
+
the back-off time with the `delay` attribute.
23
25
24
26
In both cases, Airdrop starts the snap-in with a message with an event of type `EXTRACTION_DATA_CONTINUE`.
25
-
The restarting is immediate (in case of `EXTRACTION_DATA_PROGRESS`) or delayed (in case of `EXTRACTION_DATA_DELAY`).
27
+
The restart is immediate (in case of `EXTRACTION_DATA_PROGRESS`) or delayed
28
+
(in case of `EXTRACTION_DATA_DELAY`).
26
29
27
-
Once the data extraction is done, the snap-in must respond to Airdrop with a message with event of type `EXTRACTION_DATA_DONE`.
30
+
Once the data extraction is done, the snap-in must respond to Airdrop with a message with an event of
31
+
type `EXTRACTION_DATA_DONE`.
28
32
29
-
If data extraction failed in any moment of extraction, the snap-in must respond to Airdrop with a message with event of type `EXTRACTION_DATA_ERROR`.
33
+
If data extraction fails at any point, the snap-in must respond to Airdrop with a
34
+
message with an event of type `EXTRACTION_DATA_ERROR`.
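The response rules above can be sketched as a small mapping from what happened during a run to the event the snap-in sends back. The `Outcome` type, the `ResponseEvent` shape, and the `event_type` and `error` field names are assumptions for illustration only; the event type strings and the `delay` attribute are the ones named in the text.

```typescript
// Hypothetical description of how a single snap-in run ended.
type Outcome =
  | { kind: "timeout" } // some data extracted, 12-minute runtime limit reached
  | { kind: "rateLimited"; delay: number } // external system requires back-off
  | { kind: "done" }
  | { kind: "error"; message: string };

// Hypothetical message shape; only the event type names and `delay` come from the text.
interface ResponseEvent {
  event_type: string;
  delay?: number;
  error?: string;
}

function responseFor(outcome: Outcome): ResponseEvent {
  switch (outcome.kind) {
    case "timeout":
      // Airdrop restarts the snap-in immediately with EXTRACTION_DATA_CONTINUE.
      return { event_type: "EXTRACTION_DATA_PROGRESS" };
    case "rateLimited":
      // Airdrop restarts the snap-in with EXTRACTION_DATA_CONTINUE after the delay.
      return { event_type: "EXTRACTION_DATA_DELAY", delay: outcome.delay };
    case "done":
      return { event_type: "EXTRACTION_DATA_DONE" };
    case "error":
      return { event_type: "EXTRACTION_DATA_ERROR", error: outcome.message };
  }
}
```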
30
35
31
-
## Snap-in response
36
+
## Response from the snap-in
32
37
33
-
During the data extraction phase, the snap-in uploads batches of extracted items (the recommended batch size is 2000 items) formatted in JSONL
34
-
(JSON Lines format), gzipped, and submitted as an artifact to S3Interact (with tooling from `@devrev/adaas-sdk`).
38
+
During the data extraction phase, the snap-in uploads batches of extracted items (the recommended
39
+
batch size is 2000 items) formatted in JSONL (JSON Lines format), gzipped, and submitted as an
40
+
artifact to S3Interact (with tooling from `@devrev/adaas-sdk`).
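As a rough illustration of the batching step, the sketch below splits extracted items into batches of the recommended size and serializes one batch as JSON Lines. The helper names `toBatches` and `toJsonl` are hypothetical; gzipping and the upload itself are left out because they are handled by the SDK tooling mentioned above.

```typescript
// Recommended batch size from the text.
const BATCH_SIZE = 2000;

// Splits items into consecutive batches of at most `size` elements.
function toBatches<T>(items: T[], size: number = BATCH_SIZE): T[][] {
  if (size <= 0) throw new Error("size must be positive");
  const batches: T[][] = [];
  for (let i = 0; i < items.length; i += size) {
    batches.push(items.slice(i, i + size));
  }
  return batches;
}

// Serializes a batch as JSON Lines: one JSON document per line.
function toJsonl(batch: unknown[]): string {
  if (batch.length === 0) return "";
  return batch.map((item) => JSON.stringify(item)).join("\n") + "\n";
}
```

Each batch would then be gzipped and submitted as its own artifact with the matching `item_type`.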
35
41
36
-
Each artifact is submitted with an `item_type`, defining a separate domain object from the external system and matching the `record_type` in the provided metadata.
42
+
Each artifact is submitted with an `item_type`, defining a separate domain object from the
43
+
external system and matching the `record_type` in the provided metadata.
37
44
Item types defined when uploading extracted data must validate against the declarations in the metadata file.
38
45
39
46
Extracted data must be normalized.
40
47
41
-
- Null values: All fields without a value should either be omitted or set to null. For example, if an external system provides values such as "", -1 for missing values, those must be set to null.
42
-
- Timestamps: Full-precision timestamps should be formatted as RFC 3339 (`1972-03-29T22:04:47+01:00`), and dates should be just `2020-12-31`.
48
+
- Null values: All fields without a value should either be omitted or set to null.
49
+
For example, if an external system provides values such as "" or -1 for missing values,
50
+
those must be set to null.
51
+
- Timestamps: Full-precision timestamps should be formatted as RFC 3339 (`1972-03-29T22:04:47+01:00`),
52
+
and dates should be just `2020-12-31`.
43
53
- References: references must be strings, not numbers or objects.
44
-
- Number fields must be valid JSON numbers (not strings)
45
-
- Multiselect fields must be provided as an array (not CSV)
54
+
- Number fields must be valid JSON numbers (not strings).
55
+
- Multiselect fields must be provided as an array (not CSV).
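The normalization rules above can be sketched as a handful of small helpers. All function names here are hypothetical, and the default sentinel list (`""` and `-1`) is only the example from the text; which values mean "missing" depends on the external system and the field.

```typescript
type Json = string | number | boolean | null | Json[] | { [key: string]: Json };

// Null values: replace the external system's "missing" sentinels with null.
function nullIfMissing(value: Json, sentinels: Json[] = ["", -1]): Json {
  return sentinels.includes(value) ? null : value;
}

// Number fields: emit a JSON number, never a numeric string.
function toJsonNumber(value: string | number | null): number | null {
  if (value === null) return null;
  if (typeof value === "string" && value.trim() === "") return null;
  const n = typeof value === "number" ? value : Number(value);
  return Number.isFinite(n) ? n : null;
}

// Multiselect fields: emit an array, splitting comma-separated input if needed.
function toMultiselect(value: string | string[] | null): string[] | null {
  if (value === null) return null;
  const parts = Array.isArray(value) ? value : value.split(",");
  return parts.map((part) => part.trim()).filter((part) => part.length > 0);
}

// References: always emit a string, even if the external ID is numeric.
function toReference(value: string | number): string {
  return String(value);
}
```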
46
56
47
-
Each line of the file contains an `id` and the optional `created_date` and `modified_date` fields in the beginning of the record.
57
+
Each line of the file contains an `id` and the optional `created_date` and `modified_date` fields
58
+
at the beginning of the record.
48
59
All other fields are contained within the `data` attribute.
49
60
50
61
```json
@@ -68,4 +79,4 @@ Extracted artifacts can be validated with the `chef-cli` using the following com