Commit 05206f2

Support for new ADaaS architecture, attachments streaming, gzipping files (#6)
1 parent da1850b commit 05206f2

55 files changed: 7607 additions & 1942 deletions

.gitignore

Lines changed: 1 addition & 0 deletions
@@ -130,3 +130,4 @@ dist
130130
.pnp.*
131131

132132
.npmrc
133+
.idea

README.md

Lines changed: 182 additions & 116 deletions
@@ -1,12 +1,14 @@
11
# ADaaS Library
22

3-
Typescript ADaaS Library (@devrev/ts-adaas) provides:
3+
## Release Notes
44

5-
- type definitions for ADaaS control protocol,
6-
- an adapter for ADaaS control protocol,
7-
- helpers for uploading artifacts and manage the state for ADaaS snap-in.
5+
#### v1.0.0
86

9-
## Release Notes
7+
- Allow extractions to use full lambda runtime and gracefully handle execution context timeout.
8+
- Simplified normalization and uploading of metadata and data through the repo implementation.
9+
- Default handling of attachment extraction phase in ADaaS SDK library.
10+
- Reduced file size and streamlined processing through gzip compression.
11+
- Bug fixes and improvements in error handling.
1012

1113
#### v0.0.3
1214

@@ -25,155 +27,219 @@ Typescript ADaaS Library (@devrev/ts-adaas) provides:
2527
- Adapter for ADaaS control protocol with helper functions
2628
- Uploader for uploading artifacts
2729

28-
## Usage
30+
# Overview
2931

30-
Create a new ADaaS adapter on each ADaaS snap-in invocation:
32+
The ADaaS (Airdrop-as-a-Service) Library for TypeScript helps developers build Snap-ins that integrate with DevRev’s ADaaS platform. This library simplifies the workflow for handling data extraction, event-driven actions, state management, and artifact handling.
3133

32-
```javascript
33-
const adapter = new Adapter(event: AirdropEvent);
34-
```
34+
## Features
3535

36-
Adapter class provides:
36+
- Type Definitions: Structured types for ADaaS control protocol
37+
- Event Management: Easily emit events for different extraction phases
38+
- State Handling: Update and access state in real-time within tasks
39+
- Artifact Management: Supports batched storage of artifacts (2000 items per batch)
40+
- Error & Timeout Support: Error handling and timeout management for long-running tasks
3741

38-
- helper function to emit response,
39-
- automatic emit event if ADaaS snap-in invocation runs out of time,
40-
- setter for updating ADaaS snap-in state and adding artifacts to the return ADaaS message.
42+
# Installation
4143

42-
### Phases of Airdrop Extraction
44+
```bash
45+
npm install @devrev/ts-adaas
46+
```
4347

44-
Each ADaaS snap-in must handle all the phases of ADaaS extraction.
48+
# Usage
4549

46-
ADaaS library provides type definitions to ensure ADaaS snap-ins are compatible with ADaaS control protocol.
50+
ADaaS Snap-ins are composed of several phases, each with its own requirements for initialization, data extraction, and error handling. The ADaaS library exports `processTask` to structure the work within each phase. The `processTask` function accepts `task` and `onTimeout` handlers, both of which receive the `adapter`, which streamlines state updates, uploading of extracted data, and event emission.
4751

48-
```javascript
49-
async run() {
50-
switch (this.event.payload.event_type) {
51-
case EventType.ExtractionExternalSyncUnitsStart: {
52+
### ADaaS Snap-in Invocation
5253

53-
// extract available External Sync Units (projects, organizations, ...)
54+
Each ADaaS snap-in must handle all the phases of ADaaS extraction. In a Snap-in, you typically define a `run` function that iterates over events and invokes workers per extraction phase.
5455

55-
await this.adapter.emit(ExtractorEventType.ExtractionExternalSyncUnitsDone, {
56-
external_sync_units: externalSyncUnits,
57-
});
58-
break;
59-
}
60-
61-
case EventType.ExtractionMetadataStart: {
56+
```typescript
57+
import { AirdropEvent, EventType, spawn } from '@devrev/ts-adaas';
6258

63-
// provide mappings of domain objects by provioding initial_domain_mapping.json file
64-
// update ADaaS snap-in state
59+
interface DummyExtractorState {
60+
issues: { completed: boolean };
61+
users: { completed: boolean };
62+
attachments: { completed: boolean };
63+
}
6564

66-
await this.adapter.emit(ExtractorEventType.ExtractionMetadataDone);
65+
const initialState: DummyExtractorState = {
66+
issues: { completed: false },
67+
users: { completed: false },
68+
attachments: { completed: false },
69+
};
70+
71+
function getWorkerPerExtractionPhase(event: AirdropEvent) {
72+
let path;
73+
switch (event.payload.event_type) {
74+
case EventType.ExtractionExternalSyncUnitsStart:
75+
path = __dirname + '/workers/external-sync-units-extraction';
6776
break;
68-
}
69-
70-
case EventType.ExtractionDataStart: {
71-
72-
// extract Data
73-
// upload Data
74-
// update ADaaS snap-in state
75-
// approximate progress done
76-
77-
await this.adapter.emit(ExtractorEventType.ExtractionDataContinue, {
78-
progress: 10,
79-
});
80-
77+
case EventType.ExtractionMetadataStart:
78+
path = __dirname + '/workers/metadata-extraction';
8179
break;
82-
}
83-
84-
case EventType.ExtractionDataContinue: {
85-
await this.processExtractionData();
86-
87-
// extract Data
88-
// upload Data
89-
// update ADaaS snap-in state
90-
// approximate progress done
91-
92-
await this.adapter.emit(ExtractorEventType.ExtractionDataDone, {
93-
progress: 100,
94-
});
80+
case EventType.ExtractionDataStart:
81+
case EventType.ExtractionDataContinue:
82+
path = __dirname + '/workers/data-extraction';
9583
break;
96-
}
97-
98-
case EventType.ExtractionDataDelete: {
84+
}
85+
return path;
86+
}
9987

100-
// if an extraction has any side-effects to 3rd party systems cleanup should be done here.
88+
const run = async (events: AirdropEvent[]) => {
89+
for (const event of events) {
90+
const file = getWorkerPerExtractionPhase(event);
91+
await spawn<DummyExtractorState>({
92+
event,
93+
initialState,
94+
workerPath: file,
95+
options: {
96+
isLocalDevelopment: true,
97+
},
98+
});
99+
}
100+
};
101101

102-
await this.adapter.emit(ExtractorEventType.ExtractionDataDeleteDone);
103-
break;
104-
}
102+
export default run;
103+
```
105104

106-
case EventType.ExtractionAttachmentsStart: {
105+
## Extraction Phases
107106

108-
// extract Attachments
109-
// upload Attachments
110-
// update ADaaS snap-in state
107+
The ADaaS snap-in extraction lifecycle consists of three main phases: External Sync Units Extraction, Metadata Extraction, and Data Extraction. Each phase is defined in a separate file and is responsible for fetching the respective data.
111108

112-
await this.adapter.emit(ExtractorEventType.ExtractionAttachmentsContinue);
113-
break;
114-
}
109+
### 1. External Sync Units Extraction
115110

116-
case EventType.ExtractionAttachmentsContinue: {
111+
This phase is defined in `external-sync-units-extraction.ts` and is responsible for fetching the external sync units.
117112

113+
```typescript
114+
import {
115+
ExternalSyncUnit,
116+
ExtractorEventType,
117+
processTask,
118+
} from '@devrev/ts-adaas';
119+
120+
const externalSyncUnits: ExternalSyncUnit[] = [
121+
{
122+
id: 'devrev',
123+
name: 'devrev',
124+
description: 'Demo external sync unit',
125+
item_count: 2,
126+
item_type: 'issues',
127+
},
128+
];
129+
130+
processTask({
131+
task: async ({ adapter }) => {
132+
await adapter.emit(ExtractorEventType.ExtractionExternalSyncUnitsDone, {
133+
external_sync_units: externalSyncUnits,
134+
});
135+
},
136+
onTimeout: async ({ adapter }) => {
137+
await adapter.emit(ExtractorEventType.ExtractionExternalSyncUnitsError, {
138+
error: {
139+
message: 'Failed to extract external sync units. Lambda timeout.',
140+
},
141+
});
142+
},
143+
});
144+
```
118145

119-
// extract Attachments
120-
// upload Attachments
121-
// update ADaaS snap-in state
146+
### 2. Metadata Extraction
122147

123-
await this.adapter.emit(ExtractorEventType.ExtractionAttachmentsDone);
124-
break;
125-
}
148+
This phase is defined in `metadata-extraction.ts` and is responsible for fetching the metadata.
126149

127-
case EventType.ExtractionAttachmentsDelete: {
150+
```typescript
151+
import { ExtractorEventType, processTask } from '@devrev/ts-adaas';
152+
import externalDomainMetadata from '../dummy-extractor/external_domain_metadata.json';
153+
154+
const repos = [{ itemType: 'external_domain_metadata' }];
155+
156+
processTask({
157+
task: async ({ adapter }) => {
158+
adapter.initializeRepos(repos);
159+
await adapter
160+
.getRepo('external_domain_metadata')
161+
?.push([externalDomainMetadata]);
162+
await adapter.emit(ExtractorEventType.ExtractionMetadataDone);
163+
},
164+
onTimeout: async ({ adapter }) => {
165+
await adapter.emit(ExtractorEventType.ExtractionMetadataError, {
166+
error: { message: 'Failed to extract metadata. Lambda timeout.' },
167+
});
168+
},
169+
});
170+
```
128171

129-
// if an extraction has any side-effects to 3rd party systems cleanup should be done here.
172+
### 3. Data Extraction
130173

131-
await this.adapter.emit(ExtractorEventType.ExtractionAttachmentsDeleteDone);
132-
break;
133-
}
174+
This phase is defined in `data-extraction.ts` and is responsible for fetching the data. Attachment metadata is also extracted in this phase.
134175

135-
default: {
136-
console.log('Event not supported' + JSON.stringify(this.event));
176+
```typescript
177+
import { EventType, ExtractorEventType, processTask } from '@devrev/ts-adaas';
178+
import { normalizeAttachment, normalizeIssue, normalizeUser } from '../dummy-extractor/data-normalization';
179+
180+
const issues = [
181+
{ id: 'issue-1', created_date: '1999-12-25T01:00:03+01:00', ... },
182+
{ id: 'issue-2', created_date: '1999-12-27T15:31:34+01:00', ... },
183+
];
184+
185+
const users = [
186+
{ id: 'user-1', created_date: '1999-12-25T01:00:03+01:00', ... },
187+
{ id: 'user-2', created_date: '1999-12-27T15:31:34+01:00', ... },
188+
];
189+
190+
const attachments = [
191+
{ url: 'https://app.dev.devrev-eng.ai/favicon.ico', id: 'attachment-1', ... },
192+
{ url: 'https://app.dev.devrev-eng.ai/favicon.ico', id: 'attachment-2', ... },
193+
];
194+
195+
const repos = [
196+
{ itemType: 'issues', normalize: normalizeIssue },
197+
{ itemType: 'users', normalize: normalizeUser },
198+
{ itemType: 'attachments', normalize: normalizeAttachment },
199+
];
200+
201+
processTask({
202+
task: async ({ adapter }) => {
203+
adapter.initializeRepos(repos);
204+
205+
if (adapter.event.payload.event_type === EventType.ExtractionDataStart) {
206+
await adapter.getRepo('issues')?.push(issues);
207+
await adapter.emit(ExtractorEventType.ExtractionDataProgress, { progress: 50 });
208+
} else {
209+
await adapter.getRepo('users')?.push(users);
210+
await adapter.getRepo('attachments')?.push(attachments);
211+
await adapter.emit(ExtractorEventType.ExtractionDataDone, { progress: 100 });
137212
}
138-
}
139-
}
213+
},
214+
onTimeout: async ({ adapter }) => {
215+
await adapter.postState();
216+
await adapter.emit(ExtractorEventType.ExtractionDataProgress, { progress: 50 });
217+
},
218+
});
140219
```
141220

142-
## Uploading artifacts
221+
## 4. Attachments Streaming
143222

144-
Create a new Uploader class for uploading artifacts:
223+
The ADaaS library handles attachment streaming to improve efficiency and reduce complexity for developers. During the extraction phase, developers only need to provide metadata in a specific format for each attachment, and the library manages the streaming process.
145224

146-
```javascript
147-
const upload = new Uploader(
148-
event.execution_metadata.devrev_endpoint,
149-
event.context.secrets.service_account_token
150-
);
151-
```
225+
The Snap-in should provide attachment metadata following the `NormalizedAttachment` interface:
152226

153-
Files with extracted domain objects must be in JSONL (JSON Lines) format. Data files should contain 2000 - 5000 records each.
154-
155-
```javascript
156-
const entity = 'users';
157-
const { artifact, error } = await this.uploader.upload(
158-
`extractor_${entity}_${i}.jsonl`,
159-
entity,
160-
data
161-
);
162-
if (error) {
163-
return error;
164-
} else {
165-
await this.adapter.update({ artifact });
227+
```typescript
228+
export interface NormalizedAttachment {
229+
url: string;
230+
id: string;
231+
file_name: string;
232+
author_id: string;
233+
parent_id: string;
166234
}
167235
```
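
As a rough illustration of producing this shape, the sketch below maps a raw record onto `NormalizedAttachment`. The `RawAttachment` fields and the `toNormalizedAttachment` name are hypothetical and stand in for whatever the external system returns; only the `NormalizedAttachment` fields come from the interface above.

```typescript
// Copied from the README so that this sketch is self-contained.
interface NormalizedAttachment {
  url: string;
  id: string;
  file_name: string;
  author_id: string;
  parent_id: string;
}

// Hypothetical shape of an attachment as returned by an external system.
interface RawAttachment {
  id: number;
  downloadUrl: string;
  name: string;
  uploadedBy: number;
  issueId: number;
}

// Hypothetical helper: converts external identifiers to strings and
// renames fields to match the NormalizedAttachment interface.
function toNormalizedAttachment(raw: RawAttachment): NormalizedAttachment {
  return {
    url: raw.downloadUrl,
    id: String(raw.id),
    file_name: raw.name,
    author_id: String(raw.uploadedBy),
    parent_id: String(raw.issueId),
  };
}
```

Converting every identifier to a string at this boundary keeps the rest of the extraction code independent of the external system's ID types.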
168236

169-
Each uploaded file must be attached to ADaaS adapter as soon as it is uploaded to ensure it is included in the ADaaS response message in case of a lambda timeout.
237+
## Artifact Uploading and State Management
170238

171-
## Updating ADaaS snap-in state
239+
The ADaaS library provides a repository management system to handle artifacts in batches. The adapter's `initializeRepos` method sets up one repository per item type, and each repository's `push` method uploads the items passed to it. The `postState` method posts the current state of the extraction task.
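
To make the batching idea concrete, here is a minimal sketch of splitting items into fixed-size batches before upload. `toBatches` is a hypothetical helper, not part of `@devrev/ts-adaas`, and the default of 2000 is taken from the Features list; the library's own batching may work differently.

```typescript
// Hypothetical helper, not the library's implementation.
// Splits `items` into consecutive batches of at most `batchSize` elements.
function toBatches<T>(items: readonly T[], batchSize = 2000): T[][] {
  if (!Number.isInteger(batchSize) || batchSize <= 0) {
    throw new RangeError('batchSize must be a positive integer');
  }
  const batches: T[][] = [];
  for (let start = 0; start < items.length; start += batchSize) {
    // slice() clamps the end index, so the last batch may be shorter.
    batches.push(items.slice(start, start + batchSize));
  }
  return batches;
}
```

With this helper, 4500 items would be split into batches of 2000, 2000, and 500.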
172240

173-
ADaaS snap-ins keep their own state between sync runs, between the states of a particular sync run and between invocations within a particular state.
241+
State management lets an ADaaS Snap-in keep track of how far the extraction task has progressed. The state is stored in the adapter and can be read through the `adapter.state` property; calling `postState` posts it.
174242

175-
By managing its own state, the ADaaS snap-in keeps track of the process of extraction (what items have already been extracted and where to continue), the times of the last successful sync run and keeps record of progress of the extraction.
243+
## Timeout Handling
176244

177-
```typescript
178-
async update({ artifacts, extractor_state}: AdapterUpdateParams)
179-
```
245+
The ADaaS library supports timeout handling for long-running tasks. The `onTimeout` handler passed to `processTask` is called when the task exceeds the timeout limit. Use it to post the state of the extraction task and emit an event, as the data extraction example does.
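
The general pattern of a `task` handler paired with an `onTimeout` handler can be sketched as a race between the task and a timer. This is only an illustration: `runWithTimeout` is a hypothetical function, this README does not show how the library detects timeouts internally, and the sketch does not cancel the task once the timer wins.

```typescript
type Handler = () => Promise<void>;
type Outcome = 'completed' | 'timed_out';

// Hypothetical sketch: runs `task`, and calls `onTimeout` if the task has
// not finished within `timeoutMs` milliseconds.
async function runWithTimeout(
  task: Handler,
  onTimeout: Handler,
  timeoutMs: number,
): Promise<Outcome> {
  let timer: ReturnType<typeof setTimeout> | undefined;
  const timeout = new Promise<Outcome>((resolve) => {
    timer = setTimeout(() => resolve('timed_out'), timeoutMs);
  });
  try {
    // Whichever promise settles first decides the outcome.
    const outcome = await Promise.race([
      task().then((): Outcome => 'completed'),
      timeout,
    ]);
    if (outcome === 'timed_out') {
      await onTimeout();
    }
    return outcome;
  } finally {
    // Clear the timer so it does not keep the process alive after the task finishes.
    if (timer !== undefined) clearTimeout(timer);
  }
}
```

In a real Snap-in, the `onTimeout` body would correspond to the handler shown in the data extraction example: post the state, then emit a progress or error event.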
