|
1 |
| -In the attachment extraction phase, the snap-in has to upload each attachment to DevRev and associate it with its parent data object. |
| 1 | +During the attachments extraction phase, the snap-in retrieves attachments from the external system and uploads them to DevRev. This phase occurs after data extraction, transformation, and loading are completed. |
2 | 2 |
|
3 | 3 | ## Triggering event
|
4 | 4 |
|
5 |
| -Airdrop initiates the attachment extraction by starting the snap-in with a message with an event of |
6 |
| -type `EXTRACTION_ATTACHMENTS_START`. |
7 |
| -This is done after the data extraction, transformation, and loading into DevRev are completed. |
| 5 | +```mermaid |
| 6 | +sequenceDiagram |
| 7 | + participant Airdrop |
| 8 | + participant Snap-in |
| 9 | + participant ExternalSystem |
| 10 | + |
| 11 | + Airdrop->>Snap-in: EXTRACTION_ATTACHMENTS_START |
| 12 | + |
| 13 | + alt Success path |
| 14 | + Snap-in->>ExternalSystem: Request attachments |
| 15 | + ExternalSystem->>Snap-in: Return attachments |
| 16 | + Snap-in->>Airdrop: EXTRACTION_ATTACHMENTS_DONE |
| 17 | + |
| 18 | + else Runtime limit reached |
| 19 | + Snap-in->>Airdrop: EXTRACTION_ATTACHMENTS_PROGRESS |
| 20 | + Airdrop->>Snap-in: EXTRACTION_ATTACHMENTS_CONTINUE |
| 21 | + Note over Snap-in,Airdrop: Process continues where it left off |
| 22 | + |
| 23 | + else Rate limiting required |
| 24 | + Snap-in->>Airdrop: EXTRACTION_ATTACHMENTS_DELAY (with back-off time) |
| 25 | + Note over Airdrop: Waits for specified time |
| 26 | + Airdrop->>Snap-in: EXTRACTION_ATTACHMENTS_CONTINUE |
| 27 | + Note over Snap-in,Airdrop: Process resumes after delay |
| 28 | + |
| 29 | + else Error occurs |
| 30 | + Snap-in->>Airdrop: EXTRACTION_ATTACHMENTS_ERROR |
| 31 | + Note over Snap-in,Airdrop: Process terminates |
| 32 | + end |
| 33 | +``` |
8 | 34 |
|
9 |
| -During the attachment extraction phase, |
10 |
| -the snap-in extracts attachments from the external system and uploads them as artifacts to DevRev. |
| 35 | +### Event types |
11 | 36 |
|
12 |
| -The snap-in must respond to Airdrop with a message with an event of type |
13 |
| -`EXTRACTION_ATTACHMENTS_PROGRESS` together with an optional progress estimate |
14 |
| -when the maximum snap-in runtime (13 minutes) has been reached. |
| 37 | +| Event | Direction | Description | |
| 38 | +|-------|-----------|-------------| |
| 39 | +| `EXTRACTION_ATTACHMENTS_START` | Airdrop → Snap-in | Initiates the attachments extraction | |
| 40 | +| `EXTRACTION_ATTACHMENTS_PROGRESS` | Snap-in → Airdrop | Indicates that extraction is still in progress but the runtime limit (13 minutes) has been reached |
| 41 | +| `EXTRACTION_ATTACHMENTS_DELAY` | Snap-in → Airdrop | Requests a delay because the external system is rate limiting requests |
| 42 | +| `EXTRACTION_ATTACHMENTS_CONTINUE` | Airdrop → Snap-in | Resumes the extraction process after progress update or delay | |
| 43 | +| `EXTRACTION_ATTACHMENTS_DONE` | Snap-in → Airdrop | Signals successful completion of attachments extraction | |
| 44 | +| `EXTRACTION_ATTACHMENTS_ERROR` | Snap-in → Airdrop | Indicates that an error occurred during extraction | |
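
The event flow in the table can be sketched as a small lookup from each snap-in response to what Airdrop does next. This is an illustration only: the types and the `followUpFor` function are hypothetical and not SDK exports, and only the event names come from the table. The immediate-versus-delayed restart follows the earlier wording of this page.

```typescript
// Illustrative only: these types are NOT part of the SDK.
// Event names are copied from the event table.
type SnapInResponse =
  | 'EXTRACTION_ATTACHMENTS_PROGRESS'
  | 'EXTRACTION_ATTACHMENTS_DELAY'
  | 'EXTRACTION_ATTACHMENTS_DONE'
  | 'EXTRACTION_ATTACHMENTS_ERROR';

type AirdropFollowUp =
  | { kind: 'continue'; when: 'immediately' | 'after-delay' }
  | { kind: 'finished'; outcome: 'success' | 'error' };

// Maps a snap-in response to the follow-up described in the table.
function followUpFor(response: SnapInResponse): AirdropFollowUp {
  switch (response) {
    case 'EXTRACTION_ATTACHMENTS_PROGRESS':
      // Runtime limit reached: Airdrop sends EXTRACTION_ATTACHMENTS_CONTINUE right away.
      return { kind: 'continue', when: 'immediately' };
    case 'EXTRACTION_ATTACHMENTS_DELAY':
      // Rate limited: Airdrop waits for the back-off time, then sends CONTINUE.
      return { kind: 'continue', when: 'after-delay' };
    case 'EXTRACTION_ATTACHMENTS_DONE':
      return { kind: 'finished', outcome: 'success' };
    case 'EXTRACTION_ATTACHMENTS_ERROR':
      return { kind: 'finished', outcome: 'error' };
  }
}
```

Only `PROGRESS` and `DELAY` lead to another invocation; `DONE` and `ERROR` end the phase.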
15 | 45 |
|
16 |
| -The snap-in must respond to Airdrop with a message with an event of type `EXTRACTION_ATTACHMENTS_DELAY` |
17 |
| -and specify a back-off time when the extraction has been rate-limited by the external system and |
18 |
| -back-off is required. |
| 46 | +## Implementation |
19 | 47 |
|
20 |
| -In both cases, Airdrop starts the snap-in with a message with an event of type |
21 |
| -`EXTRACTION_ATTACHMENTS_CONTINUE`. |
22 |
| -The restart is immediate in case of `EXTRACTION_ATTACHMENTS_PROGRESS`, or delayed in case of |
23 |
| -`EXTRACTION_ATTACHMENTS_DELAY`. |
| 48 | +### Default implementation |
24 | 49 |
|
25 |
| -Once the attachment extraction phase is done, the snap-in must respond to Airdrop with a message |
26 |
| -with an event of type `EXTRACTION_ATTACHMENTS_DONE`. |
| 50 | +The SDK provides a default implementation for attachments extraction. If the default behavior (iterating through attachment metadata and uploading from saved URLs) meets your needs, **no additional implementation is required**. |
27 | 51 |
|
28 |
| -If attachment extraction fails the snap-in must respond to Airdrop with a message with an event of |
29 |
| -type `EXTRACTION_ATTACHMENTS_ERROR`. |
| 52 | +### Custom implementation |
30 | 53 |
|
31 |
| -## Implementation |
| 54 | +If you need to customize the attachments extraction, modify the implementation in `attachments-extraction.ts`. |
| 55 | +Use the `streamAttachments` function from the `WorkerAdapter` class, which handles most of the functionality needed for this phase:
32 | 56 |
|
33 |
| -Attachments extraction is already provided by SDK, but if you need to customize it for your use case, |
34 |
| -it should be implemented in the [attachments-extraction.ts](https://github.com/devrev/airdrop-template/blob/main/code/src/functions/extraction/workers/attachments-extraction.ts) file. |
35 |
| - |
36 |
| -After uploading an attachment or a batch of attachments, the extractor also has to prepare and |
37 |
| -upload a file specifying the extracted and uploaded attachments. |
| 57 | +```typescript |
| 58 | +const response = await adapter.streamAttachments({ |
| 59 | + stream: getFileStream, |
| 60 | + batchSize: 10 |
| 61 | +}); |
| 62 | +``` |
38 | 63 |
|
39 |
| -It should contain the DevRev IDs of the extracted and uploaded attachments, along with the parent |
40 |
| -domain object ID from the external system and the actor ID from the external system. |
| 64 | +Parameters: |
| 65 | +- `stream`: (Required) Function that handles downloading attachments from the external system |
| 66 | +- `batchSize`: (Optional) Number of attachments to process simultaneously (default: 1) |
| 67 | + |
| 68 | +Increasing the batch size from the default of 1 can significantly improve performance, but be mindful of lambda memory constraints and external system rate limits when choosing a batch size. A batch size between 10 and 50 typically provides good results.
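
To make the trade-off concrete, here is a minimal sketch of batched concurrent processing in general. The source does not show how the SDK batches internally, so `processInBatches` is a hypothetical helper and not the SDK's implementation. It assumes each batch runs concurrently and the next batch starts only when the current one finishes.

```typescript
// Hypothetical helper, NOT part of the SDK: runs `worker` over `items`
// in consecutive batches, with at most `batchSize` calls in flight at once.
async function processInBatches<T, R>(
  items: T[],
  batchSize: number,
  worker: (item: T) => Promise<R>,
): Promise<R[]> {
  if (!Number.isInteger(batchSize) || batchSize < 1) {
    throw new Error('batchSize must be a positive integer');
  }
  const results: R[] = [];
  for (let i = 0; i < items.length; i += batchSize) {
    const batch = items.slice(i, i + batchSize);
    // All items of one batch are processed concurrently; results keep input order.
    const batchResults = await Promise.all(batch.map((item) => worker(item)));
    results.push(...batchResults);
  }
  return results;
}
```

Under this model, memory use and the request rate against the external system both grow roughly with `batchSize`, which is why a larger value is faster but riskier.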
| 69 | + |
| 70 | +```typescript Example 'stream' function |
| 71 | +async function getFileStream({ |
| 72 | + item, |
| 73 | +}: ExternalSystemAttachmentStreamingParams): Promise<ExternalSystemAttachmentStreamingResponse> { |
| 74 | + const { id, url } = item; |
| 75 | + |
| 76 | + try { |
| 77 | + const fileStreamResponse = await axiosClient.get(url, { |
| 78 | + responseType: 'stream', |
| 79 | + headers: { |
| 80 | + 'Accept-Encoding': 'identity', |
| 81 | + }, |
| 82 | + }); |
| 83 | + |
| 84 | + return { httpStream: fileStreamResponse }; |
| 85 | + } catch (error) { |
| 86 | + if (axios.isAxiosError(error)) { |
| 87 | + console.warn(`Error while fetching attachment ${id} from URL.`, serializeAxiosError(error)); |
| 88 | + console.warn('Failed attachment metadata', item); |
| 89 | + } else { |
| 90 | + console.warn(`Error while fetching attachment ${id} from URL.`, error); |
| 91 | + console.warn('Failed attachment metadata', item); |
| 92 | + } |
| 93 | + |
| 94 | + return { |
| 95 | + error: { |
| 96 | + message: `Failed to fetch attachment ${id} from URL.`, |
| 97 | + }, |
| 98 | + }; |
| 99 | + } |
| 100 | +} |
| 101 | +``` |
41 | 102 |
|
42 |
| -The uploaded artifact is structured like a normal artifact containing extracted data in JSON Lines |
43 |
| -(JSONL) format and requires specifying `ssor_attachment` as the item type. |
| 103 | +## Emitting responses |
44 | 104 |
|
45 |
| -The snap-in must respond to Airdrop with a message, that either signals success, a delay, or an error. |
| 105 | +The snap-in must send exactly one response to Airdrop, signaling success, a delay, or an error:
46 | 106 |
|
47 |
| -```typescript Success |
| 107 | +```typescript Success response |
48 | 108 | await adapter.emit(ExtractorEventType.ExtractionAttachmentsDone);
|
49 | 109 | ```
|
50 | 110 |
|
51 |
| -```typescript Delay |
| 111 | +```typescript Delay response (for rate limiting) |
52 | 112 | await adapter.emit(ExtractorEventType.ExtractionAttachmentsDelay, {
|
53 |
| - delay: "30", |
| 113 | + delay: "30", // Delay in seconds |
54 | 114 | });
|
55 | 115 | ```
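
The delay value often comes from the external system's rate-limit response. The sketch below assumes that system returns a standard HTTP `Retry-After` header, which holds either a number of seconds or an HTTP date. It converts the header into the string-of-seconds shape used in the delay example. `retryAfterToDelay` and the 30-second fallback are hypothetical and not part of the SDK.

```typescript
// Hypothetical helper, NOT part of the SDK: turns a Retry-After header value
// into a delay in seconds, formatted as a string like the "30" in the example.
function retryAfterToDelay(
  retryAfter: string | undefined,
  nowMs: number = Date.now(),
  fallbackSeconds: number = 30, // assumed default when the header is missing or unparsable
): string {
  if (retryAfter === undefined || retryAfter.trim() === '') {
    return String(fallbackSeconds);
  }
  const value = retryAfter.trim();

  // Form 1: delta-seconds, e.g. "120".
  if (/^\d+$/.test(value)) {
    return String(parseInt(value, 10));
  }

  // Form 2: HTTP-date, e.g. "Wed, 21 Oct 2015 07:28:00 GMT".
  const dateMs = Date.parse(value);
  if (Number.isNaN(dateMs)) {
    return String(fallbackSeconds);
  }
  // Round up, and never return a negative delay for dates in the past.
  return String(Math.max(0, Math.ceil((dateMs - nowMs) / 1000)));
}
```

The result could then be passed as the `delay` field, for example `delay: retryAfterToDelay(headerValue)`, where `headerValue` is whatever the HTTP client exposes for that header.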
|
56 | 116 |
|
57 |
| -```typescript Error |
| 117 | +```typescript Error response |
58 | 118 | await adapter.emit(ExtractorEventType.ExtractionAttachmentsError, {
|
59 | 119 | error: "Informative error message",
|
60 | 120 | });
|
61 | 121 | ```
|
62 | 122 |
|
63 |
| -<Note>The snap-in must always emit a single message.</Note> |
64 |
| - |
65 |
| -## Examples |
66 |
| - |
67 |
| -Here is an example of an SSOR attachment file: |
68 |
| - |
69 |
| -```json lines |
70 |
| -{ |
71 |
| - "id": { |
72 |
| - "devrev": "don:core:dvrv-us-1:devo/1:artifact/1", // DON of the artifact, that S3interact returned |
73 |
| - "external": "111" // ID of the artifact in the external service |
74 |
| - }, |
75 |
| - "parent_id": { |
76 |
| - "external": "1111" // ID of the parent object in the external service |
77 |
| - }, |
78 |
| - "actor_id": { |
79 |
| - "external": "11111" // ID of the actor that uploaded/modified the artifact in the external service |
80 |
| - } |
81 |
| -} |
82 |
| -{ |
83 |
| - "id": { |
84 |
| - "devrev": "don:core:dvrv-us-1:devo/1:artifact/2", |
85 |
| - "external": "222" |
86 |
| - }, |
87 |
| - "parent_id": { |
88 |
| - "external": "2222" |
89 |
| - }, |
90 |
| - "actor_id": { |
91 |
| - "external": "22222" |
92 |
| - } |
93 |
| -} |
94 |
| -``` |
| 123 | +<Note>The snap-in must always emit exactly one response event.</Note> |