ISS-162937: Update Airdrop docs #210

Merged
merged 5 commits into from
Apr 23, 2025
Changes from 3 commits
139 changes: 112 additions & 27 deletions fern/docs/pages/airdrop/data-extraction.mdx
@@ -1,47 +1,90 @@
In the data extraction phase, the extractor is expected to call the external system's APIs
to retrieve all the items that were updated since the start of the last extraction.
If there was no previous extraction (the current run is an initial import),
then all the items should be extracted.
to retrieve all the items that should be synced with DevRev.

The extractor must store the start time of each extraction in its state,
so that the next sync run can extract only the items created or updated since that time.
If the current run is an initial sync, this means all the items should be extracted.
Otherwise the extractor should retrieve all the items that were changed since the start of the last extraction.

Each snap-in invocation is a separate runtime instance with a maximum execution time of 12 minutes.
If a large amount of data needs to be extracted, it might not all be extracted within this time frame.
To handle such situations, the snap-in uses a state object.
This state object is shared across all invocations and keeps track of where the previous snap-in invocations ended in the extraction process.
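
The resume behavior described above can be sketched roughly as follows. This is a minimal illustration and not the SDK's API: `ExtractorState`, `FetchPage`, `extractWithBudget`, and the `budgetMs` parameter are hypothetical names. The loop records a page cursor in the state object and stops once the time budget is spent, so that a later invocation can continue from the saved cursor.

```typescript
// Hypothetical state shape; a real snap-in defines its own.
interface ExtractorState {
  nextPage: number | null; // null means there is nothing left to extract
  extractedCount: number;
}

// Hypothetical page fetcher standing in for calls to the external system.
type FetchPage = (page: number) => { items: string[]; hasMore: boolean };

// Runs until the data is exhausted or the time budget is spent.
// Returns true when extraction finished, false when it must be resumed.
function extractWithBudget(
  state: ExtractorState,
  fetchPage: FetchPage,
  budgetMs: number,
  now: () => number = Date.now,
): boolean {
  const deadline = now() + budgetMs;
  while (state.nextPage !== null) {
    if (now() >= deadline) {
      // Out of time: state.nextPage marks where the next invocation resumes.
      return false;
    }
    const { items, hasMore } = fetchPage(state.nextPage);
    state.extractedCount += items.length;
    state.nextPage = hasMore ? state.nextPage + 1 : null;
  }
  return true;
}
```

Because the cursor is advanced only after a page has been processed, an invocation that stops early never skips a page; the next invocation starts at the first page that was not yet handled.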

## Triggering event

Airdrop initiates data extraction by starting the snap-in with a message with event of type
`EXTRACTION_DATA_START` when transitioning to the data extraction phase.

During the data extraction phase, the snap-in extracts data from an external system,
prepares batches of data and uploads them in the form of artifacts to DevRev.
prepares batches of data and uploads them in the form of artifacts (files) to DevRev.

The snap-in must respond to Airdrop with a message with event of type `EXTRACTION_DATA_PROGRESS`,
together with an optional progress estimate and relevant artifacts
when it extracts some data and the maximum Airdrop snap-in runtime (12 minutes) has been reached.
The snap-in must respond to Airdrop with a message with event type `EXTRACTION_DATA_PROGRESS`,
together with an optional progress estimate and a list of relevant artifacts,
when the maximum Airdrop snap-in runtime (12 minutes) has been reached.

If the extraction has been rate-limited by the external system and back-off is required, the snap-in
must respond to Airdrop with a message with event of type `EXTRACTION_DATA_DELAY` and specifying
back-off time with `delay` attribute.
must respond to Airdrop with a message with event type `EXTRACTION_DATA_DELAY`, specifying the
back-off time in the `delay` attribute (in seconds).

In both cases, Airdrop starts the snap-in with a message with event of type `EXTRACTION_DATA_CONTINUE`.
The restarting is immediate (in case of `EXTRACTION_DATA_PROGRESS`) or delayed
(in case of `EXTRACTION_DATA_DELAY`).
In the case of `EXTRACTION_DATA_PROGRESS`, the restart is immediate,
whereas in the case of `EXTRACTION_DATA_DELAY`, the restart is delayed by the given number of seconds.

Once the data extraction is done, the snap-in must respond to Airdrop with a message with event of
type `EXTRACTION_DATA_DONE`.
Once the data extraction is done, the snap-in must respond to Airdrop with a message with event type `EXTRACTION_DATA_DONE`.

If data extraction fails at any point, the snap-in must respond to Airdrop with a
message with event of type `EXTRACTION_DATA_ERROR`.
message with event type `EXTRACTION_DATA_ERROR`.
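
As a rough summary of the responses described in this section, the sketch below maps the outcome of one invocation to the event type the snap-in responds with. Only the event type names and the `delay` attribute come from the text above; the `Outcome` type, the `responseFor` function, and the `event_type` and `error` field names are assumptions made for illustration and may not match the real message format.

```typescript
// Hypothetical description of how one snap-in invocation ended.
type Outcome =
  | { kind: 'timed_out' }
  | { kind: 'rate_limited'; retryAfterSeconds: number }
  | { kind: 'finished' }
  | { kind: 'failed'; message: string };

// Assumed response shape; field names other than `delay` are illustrative.
interface ExtractionResponse {
  event_type:
    | 'EXTRACTION_DATA_PROGRESS'
    | 'EXTRACTION_DATA_DELAY'
    | 'EXTRACTION_DATA_DONE'
    | 'EXTRACTION_DATA_ERROR';
  delay?: number; // back-off time in seconds
  error?: string;
}

function responseFor(outcome: Outcome): ExtractionResponse {
  switch (outcome.kind) {
    case 'timed_out':
      // Runtime limit reached: Airdrop restarts the snap-in immediately.
      return { event_type: 'EXTRACTION_DATA_PROGRESS' };
    case 'rate_limited':
      // External system asked for back-off: Airdrop restarts after `delay` seconds.
      return { event_type: 'EXTRACTION_DATA_DELAY', delay: outcome.retryAfterSeconds };
    case 'finished':
      return { event_type: 'EXTRACTION_DATA_DONE' };
    case 'failed':
      return { event_type: 'EXTRACTION_DATA_ERROR', error: outcome.message };
  }
}
```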

## Implementation

Data extraction should be implemented in the [data-extraction.ts](https://github.com/devrev/airdrop-template/blob/main/code/src/functions/extraction/workers/data-extraction.ts) file.

During the data extraction phase, the snap-in uploads batches of extracted items (with tooling from `@devrev/adaas-sdk`).
### Extracting and storing the data

The SDK library includes a repository system for storing extracted items.
Each item type, such as users, tasks, or issues, has its own repository.
These are defined in the `repos` array as `itemType`.
The `itemType` name should match the `record_type` specified in the provided metadata.

```typescript
const repos = [
{
itemType: 'todos',
},
{
itemType: 'users',
},
{
itemType: 'attachments',
},
];
```

The `initializeRepos` function initializes the repositories and should be the first step when the process begins.

```typescript
processTask<ExtractorState>({
task: async ({ adapter }) => {
adapter.initializeRepos(repos);
// ...
},
onTimeout: async ({ adapter }) => {
// ...
},
});
```

After initialization, items are retrieved from the external system and stored in the repository by calling the `push` function.

```typescript
await adapter.getRepo('users')?.push(items);
```
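
To make the batching idea concrete, here is a small in-memory stand-in for a repository. It is not the SDK's implementation: the `SketchRepo` class, its `batchSize` parameter, and the `uploaded` array (which stands in for uploading an artifact) are all hypothetical. It only illustrates how pushed items might be collected and handed off in batches.

```typescript
// Hypothetical stand-in for a repository; the real SDK handles uploads itself.
class SketchRepo<T> {
  readonly itemType: string;
  // Each inner array represents one batch that would be uploaded as an artifact.
  readonly uploaded: T[][] = [];
  private readonly batchSize: number;
  private buffer: T[] = [];

  constructor(itemType: string, batchSize: number) {
    this.itemType = itemType;
    this.batchSize = batchSize;
  }

  // Buffers items and emits a batch whenever the buffer is full.
  push(items: T[]): void {
    for (const item of items) {
      this.buffer.push(item);
      if (this.buffer.length >= this.batchSize) {
        this.flush();
      }
    }
  }

  // Emits whatever is left in the buffer, if anything.
  flush(): void {
    if (this.buffer.length === 0) return;
    this.uploaded.push(this.buffer);
    this.buffer = [];
  }
}
```

In this sketch, a final `flush()` is needed at the end of extraction so that a partially filled batch is not lost.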

Each artifact is submitted with an `item_type`, defining a separate domain object from the
external system and matching the `record_type` in the provided metadata.
### Data normalization

Extracted data must be normalized:
Extracted data must be normalized to fit the domain metadata defined in the `external-domain-metadata.json` file.
More details on this process are provided in the [Metadata extraction](/public/snapin-development/adaas/metadata-extraction) section.

Normalization rules:

- Null values: All fields without a value should either be omitted or set to null.
For example, if an external system provides values such as "", –1 for missing values,
@@ -52,8 +95,29 @@ Extracted data must be normalized:
- Number fields must be valid JSON numbers (not strings).
- Multiselect fields must be provided as an array (not CSV).

Each line of the file contains an `id` and the optional `created_date` and `modified_date` fields
in the beginning of the record.
Extracted items are automatically normalized when pushed to the `repo` if a normalization function is provided under the `normalize` key in the repo object.

```typescript
const repos = [
{
itemType: 'todos',
normalize: normalizeTodo,
},
{
itemType: 'users',
normalize: normalizeUser,
},
{
itemType: 'attachments',
normalize: normalizeAttachment,
},
];
```

For examples of normalization functions, refer to the [data-normalization.ts](https://github.com/devrev/airdrop-template/blob/main/code/src/functions/external-system/data-normalization.ts) file in the starter template.

Each line of the file contains the `id`, `created_date`, and `modified_date` fields
at the beginning of the record. These fields are required.
All other fields are contained within the `data` attribute.

```json {2-4}
@@ -67,7 +131,30 @@ All other fields are contained within the `data` attribute.
"owner": "A3A",
"rca": null,
"severity": "fatal",
"summary": "Lorem ipsum"
"summary": "Lorem ipsum"
}
}
```

If the item you are normalizing is a work item (e.g., a ticket, task, issue, or similar),
it should also contain the `item_url_field` within the `data` attribute.
This field should be assigned a URL that points to the item in the external system.
This link is visible in the airdropped item in the DevRev app,
helping users to easily locate the item in the external system.

```json {12}
{
"id": "2102e01F",
"created_date": "1972-03-29T22:04:47+01:00",
"modified_date": "1970-01-01T01:00:04+01:00",
"data": {
"actual_close_date": "1970-01-01T02:33:18+01:00",
"creator": "b8",
"owner": "A3A",
"rca": null,
"severity": "fatal",
"summary": "Lorem ipsum",
"item_url_field": "https://external-system.com/issue/123"
}
}
```
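
A normalization function that produces a record of the shape shown above might look like the following sketch. The `ExternalIssue` type, its field names, the `normalizeIssue` function, and the URL pattern are hypothetical; only the `id`, `created_date`, `modified_date`, `data`, and `item_url_field` keys are taken from the examples in this section.

```typescript
// Hypothetical shape of an issue as returned by the external system.
interface ExternalIssue {
  id: string;
  created_at: string; // ISO 8601 timestamp
  updated_at: string; // ISO 8601 timestamp
  summary: string;
  severity?: string | null;
}

// Record shape used in the examples above.
interface NormalizedItem {
  id: string;
  created_date: string;
  modified_date: string;
  data: Record<string, unknown>;
}

function normalizeIssue(issue: ExternalIssue, baseUrl: string): NormalizedItem {
  return {
    id: issue.id,
    created_date: issue.created_at,
    modified_date: issue.updated_at,
    data: {
      summary: issue.summary,
      // Missing values become null rather than an empty string.
      severity: issue.severity ?? null,
      // Link back to the item in the external system (assumed URL pattern).
      item_url_field: `${baseUrl}/issue/${encodeURIComponent(issue.id)}`,
    },
  };
}
```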
@@ -88,10 +175,8 @@ echo '{}' | chef-cli fuzz-extracted -r issue -m external_domain_metadata.json >

## State handling

Since each snap-in invocation is a separate runtime instance (with a maximum execution time of 12 minutes),
it does not know what has been previously accomplished or how many records have already been extracted.
To enable information passing between invocations and runs, support has been added for saving a limited amount
of data as the snap-in `state`. Snap-in `state` persists between phases in one sync run as well as between multiple sync runs.
To enable information passing between invocations and runs, a limited amount of data can be saved as the snap-in `state`.
Snap-in `state` persists between phases in one sync run as well as between multiple sync runs.
You can access the `state` through SDK's `adapter` object.

A snap-in must consult its state to obtain information on when the last successful forward sync started.
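
One way to use such a timestamp is sketched below. The `SyncState` and `Item` types and the `selectChanged` and `runSync` functions are hypothetical, and the state field name is an assumption. In practice the filtering would usually be pushed down to the external system's API (for example through an "updated since" query parameter) rather than done in memory.

```typescript
// Hypothetical state shape; the field name is an assumption.
interface SyncState {
  lastSuccessfulSyncStarted?: string; // ISO 8601 timestamp
}

interface Item {
  id: string;
  modified_date: string; // ISO 8601 timestamp
}

// Initial sync: everything. Incremental sync: items changed since the last start.
function selectChanged(items: Item[], state: SyncState): Item[] {
  if (!state.lastSuccessfulSyncStarted) return items;
  const since = Date.parse(state.lastSuccessfulSyncStarted);
  return items.filter((item) => Date.parse(item.modified_date) >= since);
}

function runSync(items: Item[], state: SyncState, now: Date): Item[] {
  // Capture the start time before extracting, so that changes made
  // during this run are picked up by the next one.
  const startedAt = now.toISOString();
  const changed = selectChanged(items, state);
  // Record the start time only once the run has succeeded.
  state.lastSuccessfulSyncStarted = startedAt;
  return changed;
}
```

The comparison uses `>=` so that an item modified exactly at the recorded start time is extracted again rather than missed.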
23 changes: 14 additions & 9 deletions fern/docs/pages/airdrop/extraction-phases.mdx
@@ -1,24 +1,29 @@
Each snap-in must handle all the phases of Airdrop extraction. In a snap-in, you typically define a run
function that iterates over events and invokes workers per extraction phase.

The SDK library exports `processTask` to structure the work within each phase, and `onTimeout` function
to handle timeouts.

The Airdrop snap-in extraction lifecycle consists of four phases:
* External sync units extraction
* External sync units extraction (only for initial sync)
* Metadata extraction
* Data extraction
* Attachments extraction

Each phase is defined in a separate file and is responsible for fetching the respective data.

The SDK library provides a repository management system to handle artifacts in batches.
The `initializeRepos` function initializes the repositories, and the `push` function uploads the
artifacts to the repositories. The `postState` function is used to post the state of the extraction task.
<Note>
Snap-in development is an iterative process.
It typically begins with retrieving some data from the external system.
The next step involves crafting an initial version of the external domain metadata and validating it through chef-cli.
This metadata is used to prepare the initial domain mapping and to check for any possible issues.
API calls to the external system are then corrected to fetch the missing data.
Start by working with one item type, and once it maps well to DevRev objects and imports as desired, proceed with other item types.
</Note>

The SDK library exports `processTask` to structure the work within each phase, and the `onTimeout` function
to handle timeouts.

State management is crucial for snap-ins to maintain the state of the extraction task.
The `postState` function is used to post the state of the extraction task.
The state is stored in the adapter and can be retrieved using the `adapter.state` property.
State is saved to the Airdrop backend by calling the `postState` function.
During the extraction the state is stored in the adapter and can be retrieved using the `adapter.state` property.

```typescript
import { AirdropEvent, EventType, spawn } from "@devrev/ts-adaas";
38 changes: 24 additions & 14 deletions fern/docs/pages/airdrop/getting-started.mdx
@@ -30,15 +30,18 @@ consider gathering the following information:
- **Error handling**: Learn about error response formats and codes. Knowing this helps in
handling errors and exceptions in your integration.

## Terminology
## Basic concepts

### Sync unit

A _sync unit_ is one self encompassing unit of data that is synced to an external system. Examples:
A _sync unit_ is one self-contained unit of data that is synced to an external system. For example:
- A project in Jira.
- An account in Salesforce.
- An organization in Zendesk.

In Jira, users often have multiple projects. Each project acts as an individual sync unit.
In contrast, Zendesk operates with a single large pool of tickets and agents. Here, the entire Zendesk instance can be synced in a single airdrop.

### Sync run

Airdrop extractions are done in _sync runs_.
@@ -61,13 +64,13 @@ An **extractor** function in the snap-in is responsible for extracting data from
A _reverse sync_ is a sync run from DevRev to an external system.
It uses a **loader** function, to create or update data in the external system.

### Initial import
### Initial sync

An _initial import_ is the first import of data from the external system to DevRev.
The first sync is called the _initial sync_.
It is triggered manually by the end user in DevRev's **Airdrops** UI.

In initial import all data needs to be extracted to create a baseline (while in incremental runs only
updated objects need to be extracted).
During the initial sync, all data from the external sync unit is extracted and loaded into DevRev.
This process typically involves a large import and may take some time.

An _initial import_ consists of the following phases:

@@ -79,18 +82,25 @@ An _initial import_ consists of the following phases:
### 1-way (incremental) sync

A _1-way sync_ (or _incremental sync_) refers to any extraction after the initial sync run has been successfully completed.
An extractor extracts data that was created or updated in the external system after the start
of the latest successful forward sync, including any changes that occurred during the forward sync,
but were not picked up by it.
This can be a forward sync or a reverse sync.

A snap-in must consult its state to get information on when the last successful forward sync started.
Airdrop snap-ins must maintain their own states that persists between phases in a sync run,
as well as between sync runs.
#### 1-way forward sync

An extractor extracts data that was created or updated in the external system after the start
of the latest successful forward sync.
This includes any changes that happened during the previous sync, but were not picked up by it.

A 1-way sync consists of the following phases:
A 1-way forward sync consists of the following phases:

1. Metadata extraction
2. Data extraction
3. Attachments extraction

A 1-way sync extracts only the domain objects updated or created since the previous successful sync run.
#### 1-way reverse sync

The loader checks for any changes in DevRev after the latest successful reverse sync and updates the data in the external system.

A 1-way reverse sync consists of the following phases:

1. Data loading
2. Attachments loading
7 changes: 5 additions & 2 deletions fern/docs/pages/airdrop/initial-domain-mapping.mdx
@@ -1,8 +1,11 @@
Initial domain mapping is a process that establishes relationships between
external data schemas and DevRev's native record types. This mapping is configured
once and then becomes available to all users of your integration,
external data schemas and DevRev's native record types.
This mapping is configured once and then becomes available to all users of your snap-in,
allowing them to import data while maintaining semantic meaning from their source systems.

The initial domain mapping is installed with your snap-in.
The extractor automatically triggers a function to upload these mappings to the Airdrop system.

## Chef-cli initial domain mapping setup

### Prerequisites
5 changes: 5 additions & 0 deletions fern/docs/pages/airdrop/loading-phases.mdx
@@ -49,4 +49,9 @@ export default run;
Loading phases run as separate runtime instances, similar to extraction phases, with a maximum execution time of 12 minutes.
These phases share a `state`, defined in the `LoaderState` interface.
It is important to note that the loader state is separate from the extractor state.

Access to the `state` is available through the SDK's `adapter` object.

## Creating items in DevRev

To create an item in DevRev and sync it with the external system, start by creating an item with a **subtype** that was established during the initial sync. After selecting the subtype, fill out the necessary details for the item.
4 changes: 3 additions & 1 deletion fern/docs/pages/airdrop/local-development.mdx
@@ -9,7 +9,7 @@ For easier development you can run your Airdrop snap-in locally and receive logs

## Run the template

DevRev provides a starter template, which you can run and test out right away.
DevRev offers a starter Airdrop snap-in template that is ready for immediate use and testing.

1. Create a new repository:
- Create a new repository from this template by clicking the "Use this template" button in the upper right corner and then "Create a new repository".
@@ -46,6 +46,8 @@ DevRev provides a starter template, which you can run and test out right away.
devrev snap_in activate
```

## Initial sync

Now that you have a running snap-in, you can start an airdrop.
Go to the DevRev app and click **Airdrops** -> **Start Airdrop** -> **Your snap-in**.
