Commit 516b823

Update Airdrop attachments extraction phase (#229)
1 parent 6644cf5 commit 516b823

1 file changed: +96 -67 lines changed

@@ -1,94 +1,123 @@
1-
In the attachment extraction phase, the snap-in has to upload each attachment to DevRev and associate it with its parent data object.
1+
During the attachments extraction phase, the snap-in retrieves attachments from the external system and uploads them to DevRev. This phase occurs after data extraction, transformation, and loading are completed.
22

33
## Triggering event
44

5-
Airdrop initiates the attachment extraction by starting the snap-in with a message with an event of
6-
type `EXTRACTION_ATTACHMENTS_START`.
7-
This is done after the data extraction, transformation, and loading into DevRev are completed.
5+
```mermaid
6+
sequenceDiagram
7+
participant Airdrop
8+
participant Snap-in
9+
participant ExternalSystem
10+
11+
Airdrop->>Snap-in: EXTRACTION_ATTACHMENTS_START
12+
13+
alt Success path
14+
Snap-in->>ExternalSystem: Request attachments
15+
ExternalSystem->>Snap-in: Return attachments
16+
Snap-in->>Airdrop: EXTRACTION_ATTACHMENTS_DONE
17+
18+
else Runtime limit reached
19+
Snap-in->>Airdrop: EXTRACTION_ATTACHMENTS_PROGRESS
20+
Airdrop->>Snap-in: EXTRACTION_ATTACHMENTS_CONTINUE
21+
Note over Snap-in,Airdrop: Process continues where it left off
22+
23+
else Rate limiting required
24+
Snap-in->>Airdrop: EXTRACTION_ATTACHMENTS_DELAY (with back-off time)
25+
Note over Airdrop: Waits for specified time
26+
Airdrop->>Snap-in: EXTRACTION_ATTACHMENTS_CONTINUE
27+
Note over Snap-in,Airdrop: Process resumes after delay
28+
29+
else Error occurs
30+
Snap-in->>Airdrop: EXTRACTION_ATTACHMENTS_ERROR
31+
Note over Snap-in,Airdrop: Process terminates
32+
end
33+
```
834

9-
During the attachment extraction phase,
10-
the snap-in extracts attachments from the external system and uploads them as artifacts to DevRev.
35+
### Event types
1136

12-
The snap-in must respond to Airdrop with a message with an event of type
13-
`EXTRACTION_ATTACHMENTS_PROGRESS` together with an optional progress estimate
14-
when the maximum snap-in runtime (13 minutes) has been reached.
37+
| Event | Direction | Description |
38+
|-------|-----------|-------------|
39+
| `EXTRACTION_ATTACHMENTS_START` | Airdrop → Snap-in | Initiates the attachments extraction |
40+
| `EXTRACTION_ATTACHMENTS_PROGRESS` | Snap-in → Airdrop | Indicates that extraction is still in progress but the runtime limit (13 minutes) has been reached |
41+
| `EXTRACTION_ATTACHMENTS_DELAY` | Snap-in → Airdrop | Requests a delay due to rate limiting from external system |
42+
| `EXTRACTION_ATTACHMENTS_CONTINUE` | Airdrop → Snap-in | Resumes the extraction process after progress update or delay |
43+
| `EXTRACTION_ATTACHMENTS_DONE` | Snap-in → Airdrop | Signals successful completion of attachments extraction |
44+
| `EXTRACTION_ATTACHMENTS_ERROR` | Snap-in → Airdrop | Indicates that an error occurred during extraction |
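
The event flow in the table can be modeled as a small decision function. This is only an illustrative sketch: `SnapInResponse`, `AirdropAction`, and `nextAirdropAction` are hypothetical names, not part of the SDK. It assumes, as the earlier version of this page states, that Airdrop restarts the snap-in immediately after a progress event and after the back-off time following a delay event.

```typescript
// Hypothetical model of the event flow; none of these names come from the SDK.
type SnapInResponse =
  | 'EXTRACTION_ATTACHMENTS_PROGRESS'
  | 'EXTRACTION_ATTACHMENTS_DELAY'
  | 'EXTRACTION_ATTACHMENTS_DONE'
  | 'EXTRACTION_ATTACHMENTS_ERROR';

type AirdropAction =
  | { kind: 'continue'; waitSeconds: number } // Airdrop sends EXTRACTION_ATTACHMENTS_CONTINUE
  | { kind: 'finished' }
  | { kind: 'failed' };

// Maps the single response emitted by the snap-in to what Airdrop does next.
function nextAirdropAction(response: SnapInResponse, delaySeconds = 0): AirdropAction {
  switch (response) {
    case 'EXTRACTION_ATTACHMENTS_PROGRESS':
      // Runtime limit reached: resume immediately.
      return { kind: 'continue', waitSeconds: 0 };
    case 'EXTRACTION_ATTACHMENTS_DELAY':
      // Rate limited: resume after the requested back-off time.
      return { kind: 'continue', waitSeconds: delaySeconds };
    case 'EXTRACTION_ATTACHMENTS_DONE':
      return { kind: 'finished' };
    case 'EXTRACTION_ATTACHMENTS_ERROR':
      return { kind: 'failed' };
  }
}
```

Only the progress and delay events lead to another `EXTRACTION_ATTACHMENTS_CONTINUE`; the done and error events end the phase.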
1545

16-
The snap-in must respond to Airdrop with a message with an event of type `EXTRACTION_ATTACHMENTS_DELAY`
17-
and specify a back-off time when the extraction has been rate-limited by the external system and
18-
back-off is required.
46+
## Implementation
1947

20-
In both cases, Airdrop starts the snap-in with a message with an event of type
21-
`EXTRACTION_ATTACHMENTS_CONTINUE`.
22-
The restart is immediate in case of `EXTRACTION_ATTACHMENTS_PROGRESS`, or delayed in case of
23-
`EXTRACTION_ATTACHMENTS_DELAY`.
48+
### Default implementation
2449

25-
Once the attachment extraction phase is done, the snap-in must respond to Airdrop with a message
26-
with an event of type `EXTRACTION_ATTACHMENTS_DONE`.
50+
The SDK provides a default implementation for attachments extraction. If the default behavior (iterating through attachment metadata and uploading from saved URLs) meets your needs, **no additional implementation is required**.
2751

28-
If attachment extraction fails the snap-in must respond to Airdrop with a message with an event of
29-
type `EXTRACTION_ATTACHMENTS_ERROR`.
52+
### Custom implementation
3053

31-
## Implementation
54+
If you need to customize the attachments extraction, modify the implementation in `attachments-extraction.ts`.
55+
Use the `streamAttachments` function from the `WorkerAdapter` class, which handles most of the functionality needed for this phase:
3256

33-
Attachments extraction is already provided by SDK, but if you need to customize it for your use case,
34-
it should be implemented in the [attachments-extraction.ts](https://github.com/devrev/airdrop-template/blob/main/code/src/functions/extraction/workers/attachments-extraction.ts) file.
35-
36-
After uploading an attachment or a batch of attachments, the extractor also has to prepare and
37-
upload a file specifying the extracted and uploaded attachments.
57+
```typescript
58+
const response = await adapter.streamAttachments({
59+
stream: getFileStream,
60+
batchSize: 10
61+
});
62+
```
3863

39-
It should contain the DevRev IDs of the extracted and uploaded attachments, along with the parent
40-
domain object ID from the external system and the actor ID from the external system.
64+
Parameters:
65+
- `stream`: (Required) Function that handles downloading attachments from the external system
66+
- `batchSize`: (Optional) Number of attachments to process simultaneously (default: 1)
67+
68+
Increasing the batch size from the default of 1 can significantly improve performance, but keep lambda memory constraints and the external system's rate limits in mind when choosing a value. A batch size between 10 and 50 typically gives good results.
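
To see why batch size trades throughput against memory and rate limits, here is a minimal sketch of batched processing. It is an assumption about the general technique, not the SDK's actual implementation, and `processInBatches` is a hypothetical helper: every item in a batch is handled concurrently, so the batch size is the peak number of simultaneous downloads.

```typescript
// Hypothetical helper illustrating batched processing; the SDK's internals may differ.
async function processInBatches<T, R>(
  items: T[],
  batchSize: number,
  handler: (item: T) => Promise<R>,
): Promise<R[]> {
  // Guard against zero, negative, or fractional sizes.
  const size = Math.max(1, Math.floor(batchSize));
  const results: R[] = [];
  for (let start = 0; start < items.length; start += size) {
    const batch = items.slice(start, start + size);
    // All items of one batch run concurrently; the next batch starts
    // only after every item of this batch has resolved.
    results.push(...(await Promise.all(batch.map(handler))));
  }
  return results;
}
```

With a batch size of 10, at most 10 handlers are in flight at any moment, which bounds both memory use and the request rate against the external system.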
69+
70+
```typescript Example 'stream' function
71+
async function getFileStream({
72+
item,
73+
}: ExternalSystemAttachmentStreamingParams): Promise<ExternalSystemAttachmentStreamingResponse> {
74+
const { id, url } = item;
75+
76+
try {
77+
const fileStreamResponse = await axiosClient.get(url, {
78+
responseType: 'stream',
79+
headers: {
80+
'Accept-Encoding': 'identity',
81+
},
82+
});
83+
84+
return { httpStream: fileStreamResponse };
85+
} catch (error) {
86+
if (axios.isAxiosError(error)) {
87+
console.warn(`Error while fetching attachment ${id} from URL.`, serializeAxiosError(error));
88+
console.warn('Failed attachment metadata', item);
89+
} else {
90+
console.warn(`Error while fetching attachment ${id} from URL.`, error);
91+
console.warn('Failed attachment metadata', item);
92+
}
93+
94+
return {
95+
error: {
96+
message: `Failed to fetch attachment ${id} from URL.`,
97+
},
98+
};
99+
}
100+
}
101+
```
41102

42-
The uploaded artifact is structured like a normal artifact containing extracted data in JSON Lines
43-
(JSONL) format and requires specifying `ssor_attachment` as the item type.
103+
## Emitting responses
44104

45-
The snap-in must respond to Airdrop with a message, that either signals success, a delay, or an error.
105+
The snap-in must send exactly one response to Airdrop, signaling success, a delay, or an error:
46106

47-
```typescript Success
107+
```typescript Success response
48108
await adapter.emit(ExtractorEventType.ExtractionAttachmentsDone);
49109
```
50110

51-
```typescript Delay
111+
```typescript Delay response (for rate limiting)
52112
await adapter.emit(ExtractorEventType.ExtractionAttachmentsDelay, {
53-
delay: "30",
113+
delay: "30", // Delay in seconds
54114
});
55115
```
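
The delay value is a string holding a number of seconds. If the external system reports its rate limit through the standard HTTP `Retry-After` header (an assumption; not every system does), the value could be derived as in the sketch below. `retryAfterToDelay` and its 30-second fallback are hypothetical and not part of the SDK.

```typescript
// Hypothetical helper, not part of the SDK: derive the `delay` string
// from an HTTP Retry-After header value.
function retryAfterToDelay(
  retryAfter: string | undefined,
  now: Date = new Date(),
  fallbackSeconds = 30, // assumed default when the header is missing or unparsable
): string {
  if (retryAfter !== undefined) {
    const trimmed = retryAfter.trim();
    // Form 1: a non-negative integer number of seconds.
    if (/^\d+$/.test(trimmed)) {
      return String(Number(trimmed));
    }
    // Form 2: an HTTP-date; convert it to whole seconds from `now`, never negative.
    const retryAt = Date.parse(trimmed);
    if (!Number.isNaN(retryAt)) {
      return String(Math.max(0, Math.ceil((retryAt - now.getTime()) / 1000)));
    }
  }
  return String(fallbackSeconds);
}
```

The returned string can then be passed as the `delay` field of the delay response.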
56116

57-
```typescript Error
117+
```typescript Error response
58118
await adapter.emit(ExtractorEventType.ExtractionAttachmentsError, {
59119
error: "Informative error message",
60120
});
61121
```
62122

63-
<Note>The snap-in must always emit a single message.</Note>
64-
65-
## Examples
66-
67-
Here is an example of an SSOR attachment file:
68-
69-
```json lines
70-
{
71-
"id": {
72-
"devrev": "don:core:dvrv-us-1:devo/1:artifact/1", // DON of the artifact, that S3interact returned
73-
"external": "111" // ID of the artifact in the external service
74-
},
75-
"parent_id": {
76-
"external": "1111" // ID of the parent object in the external service
77-
},
78-
"actor_id": {
79-
"external": "11111" // ID of the actor that uploaded/modified the artifact in the external service
80-
}
81-
}
82-
{
83-
"id": {
84-
"devrev": "don:core:dvrv-us-1:devo/1:artifact/2",
85-
"external": "222"
86-
},
87-
"parent_id": {
88-
"external": "2222"
89-
},
90-
"actor_id": {
91-
"external": "22222"
92-
}
93-
}
94-
```
123+
<Note>The snap-in must always emit exactly one response event.</Note>
