devrev · radovanjorgic · Apr 23, 2025 · Apr 13, 2025 · Apr 17, 2025 · Apr 17, 2025
@@ -1,47 +1,90 @@
 In the data extraction phase, the extractor is expected to call the external system's APIs
-to retrieve all the items that were updated since the start of the last extraction.
-If there was no previous extraction (the current run is an initial import),
-then all the items should be extracted.
+to retrieve all the items that should be synced with DevRev.
 
-The extractor must store at what time it started each extraction in its state,
-so that it can extract only items created or updated since this date in the next sync run.
+If the current run is an initial sync, all the items should be extracted.
+Otherwise, the extractor should retrieve only the items that were changed since the start of the last extraction.
+
+Each snap-in invocation is a separate runtime instance with a maximum execution time of 12 minutes. 
+If a large amount of data needs to be extracted, it might not all be extracted within this time frame. 
+To handle such situations, the snap-in uses a state object. 
+This state object is shared across all invocations and keeps track of where the previous snap-in invocations ended in the extraction process.
 
 ## Triggering event
 
 Airdrop initiates data extraction by starting the snap-in with a message with event of type
 `EXTRACTION_DATA_START` when transitioning to the data extraction phase.
 
 During the data extraction phase, the snap-in extracts data from an external system,
-prepares batches of data and uploads them in the form of artifacts to DevRev.
+prepares batches of data and uploads them in the form of artifacts (files) to DevRev.
 
-The snap-in must respond to Airdrop with a message with event of type `EXTRACTION_DATA_PROGRESS`,
-together with an optional progress estimate and relevant artifacts
-when it extracts some data and the maximum Airdrop snap-in runtime (12 minutes) has been reached.
+The snap-in must respond to Airdrop with a message with event type `EXTRACTION_DATA_PROGRESS`,
+together with an optional progress estimate and the list of relevant artifacts,
+when the maximum Airdrop snap-in runtime (12 minutes) has been reached.
 
 If the extraction has been rate-limited by the external system and back-off is required, the snap-in
-must respond to Airdrop with a message with event of type `EXTRACTION_DATA_DELAY` and specifying
-back-off time with `delay` attribute.
+must respond to Airdrop with a message with event type `EXTRACTION_DATA_DELAY`, specifying
+the back-off time with the `delay` attribute (in seconds).
 
 In both cases, Airdrop starts the snap-in with a message with event of type `EXTRACTION_DATA_CONTINUE`.
-The restarting is immediate (in case of `EXTRACTION_DATA_PROGRESS`) or delayed
-(in case of `EXTRACTION_DATA_DELAY`).
+In the case of `EXTRACTION_DATA_PROGRESS`, the restart is immediate,
+whereas in the case of `EXTRACTION_DATA_DELAY`, the restart is delayed by the given number of seconds.
 
-Once the data extraction is done, the snap-in must respond to Airdrop with a message with event of
-type `EXTRACTION_DATA_DONE`.
+Once the data extraction is done, the snap-in must respond to Airdrop with a message with event type `EXTRACTION_DATA_DONE`.
 
 If data extraction failed in any moment of extraction, the snap-in must respond to Airdrop with a
-message with event of type `EXTRACTION_DATA_ERROR`.
+message with event type `EXTRACTION_DATA_ERROR`.
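
The choice of response event described above can be sketched as a small pure function. This is an illustration only: the `ExtractionOutcome` and `ExtractionResponse` types, the `toResponse` helper, and the payload field names other than `delay` are hypothetical and are not taken from the SDK.

```typescript
// Hypothetical summary of how one snap-in invocation ended.
type ExtractionOutcome =
  | { kind: 'done' }
  | { kind: 'timed_out'; progress?: number; artifacts: string[] }
  | { kind: 'rate_limited'; retryAfterSeconds: number }
  | { kind: 'failed'; message: string };

// Hypothetical response shape; only the event type names and the
// `delay` attribute come from the text above.
interface ExtractionResponse {
  event_type: string;
  data?: Record<string, unknown>;
}

function toResponse(outcome: ExtractionOutcome): ExtractionResponse {
  switch (outcome.kind) {
    case 'done':
      // Everything was extracted; no further invocation is needed.
      return { event_type: 'EXTRACTION_DATA_DONE' };
    case 'timed_out':
      // Runtime limit reached; Airdrop restarts the snap-in immediately.
      return {
        event_type: 'EXTRACTION_DATA_PROGRESS',
        data: { progress: outcome.progress, artifacts: outcome.artifacts },
      };
    case 'rate_limited':
      // Back-off required; Airdrop restarts the snap-in after `delay` seconds.
      return {
        event_type: 'EXTRACTION_DATA_DELAY',
        data: { delay: outcome.retryAfterSeconds },
      };
    case 'failed':
      return {
        event_type: 'EXTRACTION_DATA_ERROR',
        data: { error: { message: outcome.message } },
      };
  }
}
```

In a real snap-in the response would be sent through the SDK's adapter rather than returned from a helper like this one.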
 
 ## Implementation
 
 Data extraction should be implemented in the [data-extraction.ts](https://github.com/devrev/airdrop-template/blob/main/code/src/functions/extraction/workers/data-extraction.ts) file.
 
-During the data extraction phase, the snap-in uploads batches of extracted items (with tooling from `@devrev/adaas-sdk`).
+### Extracting and storing the data
+
+The SDK library includes a repository system for storing extracted items.
+Each item type, such as users, tasks, or issues, has its own repository. 
+These are defined in the `repos` array as `itemType`. 
+The `itemType` name should match the `record_type` specified in the provided metadata.
+
+```typescript
+const repos = [
+  {
+    itemType: 'todos',
+  },
+  {
+    itemType: 'users',
+  },
+  {
+    itemType: 'attachments',
+  },
+];
+```
+
+The `initializeRepos` function initializes the repositories and should be the first step when the process begins.
+
+```typescript
+processTask<ExtractorState>({
+  task: async ({ adapter }) => {
+    adapter.initializeRepos(repos);
+    // ...
+  },
+  onTimeout: async ({ adapter }) => {
+    // ...
+  },
+});
+```
+
+After initialization, items are retrieved from the external system and stored in the repository by calling the `push` function.
+
+```typescript
+await adapter.getRepo('users')?.push(items);
+```
 
-Each artifact is submitted with an `item_type`, defining a separate domain object from the
-external system and matching the `record_type` in the provided metadata.
+### Data normalization
 
-Extracted data must be normalized:
+Extracted data must be normalized to fit the domain metadata defined in the `external-domain-metadata.json` file. 
+More details on this process are provided in the [Metadata extraction](/public/snapin-development/adaas/metadata-extraction) section.
+
+Normalization rules:
 
 - Null values: All fields without a value should either be omitted or set to null.
   For example, if an external system provides values such as "", –1 for missing values,
@@ -52,8 +95,29 @@ Extracted data must be normalized:
 - Number fields must be valid JSON numbers (not strings).
 - Multiselect fields must be provided as an array (not CSV).
 
-Each line of the file contains an `id` and the optional `created_date` and `modified_date` fields
-in the beginning of the record.
+Extracted items are automatically normalized when pushed to the `repo` if a normalization function is provided under the `normalize` key in the repo object.
+
+```typescript
+const repos = [
+  {
+    itemType: 'todos',
+    normalize: normalizeTodo,
+  },
+  {
+    itemType: 'users',
+    normalize: normalizeUser,
+  },
+  {
+    itemType: 'attachments',
+    normalize: normalizeAttachment,
+  },
+];
+```
+
+For examples of normalization functions, refer to the [data-normalization.ts](https://github.com/devrev/airdrop-template/blob/main/code/src/functions/external-system/data-normalization.ts) file in the starter template.
+
+Each line of the file contains the `id`, `created_date`, and `modified_date` fields
+at the beginning of the record. These fields are required.
 All other fields are contained within the `data` attribute.
 
 ```json {2-4}
@@ -67,7 +131,30 @@ All other fields are contained within the `data` attribute.
     "owner": "A3A",
     "rca": null,
     "severity": "fatal",
-    "summary": "Lorem ipsum"
+    "summary": "Lorem ipsum"
+  }
+}
+```
+
+If the item you are normalizing is a work item (e.g., a ticket, task, issue, or similar),
+it should also contain the `item_url_field` within the `data` attribute. 
+This field should be assigned a URL that points to the item in the external system.
+This link is visible in the airdropped item in the DevRev app, 
+helping users to easily locate the item in the external system.
+
+```json {12}
+{
+  "id": "2102e01F",
+  "created_date": "1972-03-29T22:04:47+01:00",
+  "modified_date": "1970-01-01T01:00:04+01:00",
+  "data": {
+    "actual_close_date": "1970-01-01T02:33:18+01:00",
+    "creator": "b8",
+    "owner": "A3A",
+    "rca": null,
+    "severity": "fatal",
+    "summary": "Lorem ipsum",
+    "item_url_field": "https://external-system.com/issue/123"
   }
 }
 ```
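
A normalization function that follows the rules above might look like the sketch below. The `ExternalIssue` shape, its field names, and the `toNumberOrNull` helper are hypothetical and stand in for whatever the external system returns. Timestamps are assumed to arrive already formatted like the ones in the JSON example above.

```typescript
// Hypothetical raw item as returned by an external system.
interface ExternalIssue {
  id: string;
  created_at: string;
  updated_at: string;
  summary: string;
  owner: string; // "" when unassigned
  estimate: string; // number delivered as a string, "-1" when missing
  labels: string; // comma-separated list
  url: string;
}

// Record shape from the example: id and dates first, the rest in `data`.
interface NormalizedItem {
  id: string;
  created_date: string;
  modified_date: string;
  data: Record<string, unknown>;
}

// Converts a numeric string to a JSON number; "" and the -1 sentinel become null.
function toNumberOrNull(raw: string): number | null {
  if (raw.trim() === '') return null;
  const n = Number(raw);
  return Number.isFinite(n) && n !== -1 ? n : null;
}

function normalizeIssue(issue: ExternalIssue): NormalizedItem {
  return {
    id: issue.id,
    created_date: issue.created_at,
    modified_date: issue.updated_at,
    data: {
      summary: issue.summary,
      // Missing values are set to null rather than "".
      owner: issue.owner === '' ? null : issue.owner,
      estimate: toNumberOrNull(issue.estimate),
      // Multiselect values are provided as an array, not CSV.
      labels:
        issue.labels === ''
          ? []
          : issue.labels.split(',').map((label) => label.trim()),
      // URL pointing to the item in the external system.
      item_url_field: issue.url,
    },
  };
}
```

A function like this could be passed under the `normalize` key of the matching entry in the `repos` array.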
@@ -88,10 +175,8 @@ echo '{}' | chef-cli fuzz-extracted -r issue -m external_domain_metadata.json >
 
 ## State handling
 
-Since each snap-in invocation is a separate runtime instance (with a maximum execution time of 12 minutes),
-it does not know what has been previously accomplished or how many records have already been extracted. 
-To enable information passing between invocations and runs, support has been added for saving a limited amount 
-of data as the snap-in `state`. Snap-in `state` persists between phases in one sync run as well as between multiple sync runs.
+To enable information passing between invocations and runs, a limited amount of data can be saved as the snap-in `state`. 
+Snap-in `state` persists between phases in one sync run as well as between multiple sync runs.
 You can access the `state` through SDK's `adapter` object.
 
 A snap-in must consult its state to obtain information on when the last successful forward sync started.
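
One way to keep track of when the last successful forward sync started is sketched below. The state shape and the helper names (`markSyncStarted`, `markSyncDone`, `extractionCutoff`) are hypothetical and chosen for illustration; the SDK does not prescribe them.

```typescript
// Hypothetical extractor state; field names are illustrative only.
interface ExtractorState {
  // Start time of the sync run currently in progress.
  lastSyncStarted?: string;
  // Start time of the most recent sync run that finished successfully.
  lastSuccessfulSyncStarted?: string;
}

// Record the start time when a new sync run begins.
function markSyncStarted(state: ExtractorState, now: Date): ExtractorState {
  return { ...state, lastSyncStarted: now.toISOString() };
}

// Promote the start time once the run has completed successfully.
function markSyncDone(state: ExtractorState): ExtractorState {
  return {
    ...state,
    lastSuccessfulSyncStarted: state.lastSyncStarted,
    lastSyncStarted: undefined,
  };
}

// Lower bound for "changed since" queries.
// undefined means no successful sync yet, so everything is extracted.
function extractionCutoff(state: ExtractorState): string | undefined {
  return state.lastSuccessfulSyncStarted;
}
```

Using the start of the run as the cutoff, rather than its end, means that changes made in the external system while a sync was running are picked up by the next one.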

@@ -1,24 +1,29 @@
 Each snap-in must handle all the phases of Airdrop extraction. In a snap-in, you typically define a run
 function that iterates over events and invokes workers per extraction phase.
 
-The SDK library exports `processTask` to structure the work within each phase, and `onTimeout` function
-to handle timeouts.
-
 The Airdrop snap-in extraction lifecycle consists of four phases: 
-* External sync units extraction
+* External sync units extraction (only for initial sync)
 * Metadata extraction
 * Data extraction  
 * Attachments extraction
 
 Each phase is defined in a separate file and is responsible for fetching the respective data.
 
-The SDK library provides a repository management system to handle artifacts in batches.
-The `initializeRepos` function initializes the repositories, and the `push` function uploads the
-artifacts to the repositories. The `postState` function is used to post the state of the extraction task.
+<Note>
+  Snap-in development is an iterative process.
+  It typically begins with retrieving some data from the external system. 
+  The next step involves crafting an initial version of the external domain metadata and validating it through chef-cli. 
+  This metadata is used to prepare the initial domain mapping and to check for any possible issues.
+  API calls to the external system are then corrected to fetch the missing data.
+  Start by working with one item type, and once it maps well to DevRev objects and imports as desired, proceed with other item types.
+</Note>
+
+The SDK library exports `processTask` to structure the work within each phase, and the `onTimeout` function
+to handle timeouts.
 
 State management is crucial for snap-ins to maintain the state of the extraction task.
-The `postState` function is used to post the state of the extraction task.
-The state is stored in the adapter and can be retrieved using the `adapter.state` property.
+State is saved to the Airdrop backend by calling the `postState` function.
+During the extraction, the state is stored in the adapter and can be retrieved using the `adapter.state` property.
 
 ```typescript
 import { AirdropEvent, EventType, spawn } from "@devrev/ts-adaas";

@@ -30,15 +30,18 @@ consider gathering the following information:
 - **Error handling**: Learn about error response formats and codes. Knowing this helps in
   handling errors and exceptions in your integration.
 
-## Terminology
+## Basic concepts
 
 ### Sync unit
 
-A _sync unit_ is one self encompassing unit of data that is synced to an external system. Examples:
+A _sync unit_ is one self-contained unit of data that is synced to an external system. For example:
 - A project in Jira.
 - An account in Salesforce.
 - An organization in Zendesk.
 
+In Jira, users often have multiple projects. Each project acts as an individual sync unit. 
+In contrast, Zendesk operates with a single large pool of tickets and agents. Here, the entire Zendesk instance can be synced in a single airdrop.
+
 ### Sync run
 
 Airdrop extractions are done in _sync runs_.
@@ -61,13 +64,13 @@ An **extractor** function in the snap-in is responsible for extracting data from
 A _reverse sync_ is a sync run from DevRev to an external system.
 It uses a **loader** function, to create or update data in the external system.
 
-### Initial import
+### Initial sync
 
-An _initial import_ is the first import of data from the external system to DevRev.
+The first sync is called the _initial sync_.
 It is triggered manually by the end user in DevRev's **Airdrops** UI.
 
-In initial import all data needs to be extracted to create a baseline (while in incremental runs only
-updated objects need to be extracted).
+During the initial sync, all data from the external sync unit is extracted and loaded into DevRev. 
+This process typically involves a large import and may take some time.
 
 An _initial import_ consists of the following phases:
 
@@ -79,18 +82,25 @@ An _initial import_ consists of the following phases:
 ### 1-way (incremental) sync
 
 A _1-way sync_ (or _incremental sync_) refers to any extraction after the initial sync run has been successfully completed.
-An extractor extracts data that was created or updated in the external system after the start
-of the latest successful forward sync, including any changes that occurred during the forward sync,
-but were not picked up by it.
+This can be a forward sync or a reverse sync.
 
-A snap-in must consult its state to get information on when the last successful forward sync started.
-Airdrop snap-ins must maintain their own states that persists between phases in a sync run,
-as well as between sync runs.
+#### 1-way forward sync
+
+An extractor extracts data that was created or updated in the external system after the start
+of the latest successful forward sync.
+This includes any changes that happened during the previous sync, but were not picked up by it.
 
-A 1-way sync consists of the following phases:
+A 1-way forward sync consists of the following phases:
 
 1. Metadata extraction
 2. Data extraction
 3. Attachments extraction
 
-A 1-way sync extracts only the domain objects updated or created since the previous successful sync run.
+#### 1-way reverse sync
+
+The loader checks for any changes in DevRev after the latest successful reverse sync and updates the data in the external system.
+
+A 1-way reverse sync consists of the following phases:
+
+1. Data loading
+2. Attachments loading
@@ -1,8 +1,11 @@
 Initial domain mapping is a process that establishes relationships between 
-external data schemas and DevRev's native record types. This mapping is configured
-once and then becomes available to all users of your integration,
+external data schemas and DevRev's native record types. 
+This mapping is configured once and then becomes available to all users of your snap-in,
 allowing them to import data while maintaining semantic meaning from their source systems.
 
+The initial domain mapping is installed with your snap-in.
+The extractor automatically triggers a function to upload these mappings to the Airdrop system.
+
 ## Chef-cli initial domain mapping setup
 
 ### Prerequisites

@@ -49,4 +49,9 @@ export default run;
 Loading phases run as separate runtime instances, similar to extraction phases, with a maximum execution time of 12 minutes. 
 These phases share a `state`, defined in the `LoaderState` interface. 
 It is important to note that the loader state is separate from the extractor state. 
+
 Access to the `state` is available through the SDK's `adapter` object.
+
+## Creating items in DevRev
+
+To create an item in DevRev and sync it with the external system, start by creating an item with a **subtype** that was established during the initial sync. After selecting the subtype, fill out the necessary details for the item.
@@ -9,7 +9,7 @@ For easier development you can run your Airdrop snap-in locally and receive logs
 
 ## Run the template
 
-DevRev provides a starter template, which you can run and test out right away.
+DevRev offers a starter Airdrop snap-in template that is ready for immediate use and testing. 
 
 1. Create a new repository:
    - Create a new repository from this template by clicking the "Use this template" button in the upper right corner and then "Create a new repository".
@@ -46,6 +46,8 @@ DevRev provides a starter template, which you can run and test out right away.
    devrev snap_in activate
    ```
 
+## Initial sync
+
 Now that you have a running snap-in, you can start an airdrop.
 Go to DevRev app and click **Airdrops** -> **Start Airdrop** -> **Your snap-in**.