ISS-162937: Update Airdrop docs

patricijabrecko · patricijabrecko · commit 6a061e860ae3 · 2025-04-13T12:23:03.000+02:00
diff --git a/fern/docs/pages/airdrop/data-extraction.mdx b/fern/docs/pages/airdrop/data-extraction.mdx
@@ -1,47 +1,90 @@
 In the data extraction phase, the extractor is expected to call the external system's APIs
-to retrieve all the items that were updated since the start of the last extraction.
-If there was no previous extraction (the current run is an initial import),
-then all the items should be extracted.
+to retrieve all the items that should be synced with DevRev.
 
-The extractor must store at what time it started each extraction in its state,
-so that it can extract only items created or updated since this date in the next sync run.
+If the current run is an initial sync, all the items should be extracted.
+Otherwise, the extractor should retrieve all the items that were changed since the start of the last extraction.
+
+Each snap-in invocation is a separate runtime instance with a maximum execution time of 12 minutes. 
+If a large amount of data needs to be extracted, it might not all be extracted within this time frame. 
+To handle such situations, the snap-in uses a state object. 
+This state object is shared across all invocations and keeps track of where the previous snap-in invocations ended in the extraction process.
 
 ## Triggering event
 
 Airdrop initiates data extraction by starting the snap-in with a message with event of type
 `EXTRACTION_DATA_START` when transitioning to the data extraction phase.
 
 During the data extraction phase, the snap-in extracts data from an external system,
-prepares batches of data and uploads them in the form of artifacts to DevRev.
+prepares batches of data and uploads them in the form of artifacts (files) to DevRev.
 
-The snap-in must respond to Airdrop with a message with event of type `EXTRACTION_DATA_PROGRESS`,
-together with an optional progress estimate and relevant artifacts
-when it extracts some data and the maximum Airdrop snap-in runtime (12 minutes) has been reached.
+The snap-in must respond to Airdrop with a message with event type `EXTRACTION_DATA_PROGRESS`,
+together with an optional progress estimate and the list of relevant artifacts
+when the maximum Airdrop snap-in runtime (12 minutes) has been reached.
 
 If the extraction has been rate-limited by the external system and back-off is required, the snap-in
-must respond to Airdrop with a message with event of type `EXTRACTION_DATA_DELAY` and specifying
-back-off time with `delay` attribute.
+must respond to Airdrop with a message with event type `EXTRACTION_DATA_DELAY`, specifying
+the back-off time in the `delay` attribute (in seconds).
 
 In both cases, Airdrop starts the snap-in with a message with event of type `EXTRACTION_DATA_CONTINUE`.
-The restarting is immediate (in case of `EXTRACTION_DATA_PROGRESS`) or delayed
-(in case of `EXTRACTION_DATA_DELAY`).
+After `EXTRACTION_DATA_PROGRESS`, the restart is immediate,
+whereas after `EXTRACTION_DATA_DELAY` the restart is delayed by the given number of seconds.
 
-Once the data extraction is done, the snap-in must respond to Airdrop with a message with event of
-type `EXTRACTION_DATA_DONE`.
+Once the data extraction is done, the snap-in must respond to Airdrop with a message with event type `EXTRACTION_DATA_DONE`.
 
 If data extraction failed in any moment of extraction, the snap-in must respond to Airdrop with a
-message with event of type `EXTRACTION_DATA_ERROR`.
+message with event type `EXTRACTION_DATA_ERROR`.
 
 ## Implementation
 
 Data extraction should be implemented in the [data-extraction.ts](https://github.com/devrev/airdrop-template/blob/main/code/src/functions/extraction/workers/data-extraction.ts) file.
 
-During the data extraction phase, the snap-in uploads batches of extracted items (with tooling from `@devrev/adaas-sdk`).
+### Extracting and storing the data
+
+The SDK library includes a repository system for storing extracted items.
+Each item type, such as users, tasks, or issues, has its own repository. 
+Repositories are declared as entries in the `repos` array, each identified by its `itemType`.
+The `itemType` name should match the `record_type` specified in the provided metadata.
+
+```typescript
+const repos = [
+  {
+    itemType: 'todos',
+  },
+  {
+    itemType: 'users',
+  },
+  {
+    itemType: 'attachments',
+  },
+];
+```
+
+The `initializeRepos` function initializes the repositories and should be the first step when the process begins.
+
+```typescript
+processTask<ExtractorState>({
+  task: async ({ adapter }) => {
+    adapter.initializeRepos(repos);
+    // ...
+  },
+  onTimeout: async ({ adapter }) => {
+    // ...
+  },
+});
+```
+
+After initialization, items are retrieved from the external system and stored in the repository by calling the `push` function.
+
+```typescript
+await adapter.getRepo('users')?.push(items);
+```
+
+### Data normalization
 
-Each artifact is submitted with an `item_type`, defining a separate domain object from the
-external system and matching the `record_type` in the provided metadata.
+Extracted data must be normalized to fit the domain metadata defined in the `external-domain-metadata.json` file. 
+More details on this process are provided in the [Metadata extraction](/public/snapin-development/adaas/metadata-extraction) section.
 
-Extracted data must be normalized:
+Normalization rules:
 
 - Null values: All fields without a value should either be omitted or set to null.
   For example, if an external system provides values such as "", –1 for missing values,
@@ -52,6 +95,27 @@ Extracted data must be normalized:
 - Number fields must be valid JSON numbers (not strings).
 - Multiselect fields must be provided as an array (not CSV).
 
+Extracted items are automatically normalized when pushed to the `repo` if a normalization function is provided under the `normalize` key in the repo object.
+
+```typescript
+const repos = [
+  {
+    itemType: 'todos',
+    normalize: normalizeTodo,
+  },
+  {
+    itemType: 'users',
+    normalize: normalizeUser,
+  },
+  {
+    itemType: 'attachments',
+    normalize: normalizeAttachment,
+  },
+];
+```
+
+For examples of normalization functions, refer to the [data-normalization.ts](https://github.com/devrev/airdrop-template/blob/main/code/src/functions/external-system/data-normalization.ts) file in the starter template.
+
 Each line of the file contains an `id` and the optional `created_date` and `modified_date` fields
 in the beginning of the record.
 All other fields are contained within the `data` attribute.
@@ -88,10 +152,8 @@ echo '{}' | chef-cli fuzz-extracted -r issue -m external_domain_metadata.json >
 
 ## State handling
 
-Since each snap-in invocation is a separate runtime instance (with a maximum execution time of 12 minutes),
-it does not know what has been previously accomplished or how many records have already been extracted. 
-To enable information passing between invocations and runs, support has been added for saving a limited amount 
-of data as the snap-in `state`. Snap-in `state` persists between phases in one sync run as well as between multiple sync runs.
+To enable information passing between invocations and runs, a limited amount of data can be saved as the snap-in `state`. 
+Snap-in `state` persists between phases in one sync run as well as between multiple sync runs.
 You can access the `state` through SDK's `adapter` object.
 
 A snap-in must consult its state to obtain information on when the last successful forward sync started.
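As an illustration of the normalization rules touched in the `data-extraction.mdx` changes above, here is a minimal sketch of a function that could be passed under the `normalize` key. The `RawTodo` shape, its field names, and the local `NormalizedItem` interface are assumptions made for this example; they are not taken from the SDK or from any real external system. The sketch only shows the rules quoted in the diff: placeholder values become null, numbers are emitted as JSON numbers, and a CSV multiselect becomes an array.

```typescript
// Hypothetical raw record as returned by an external system (assumed shape).
interface RawTodo {
  id: number;
  title: string;       // "" when missing
  estimate: string;    // numeric value delivered as a string, "" when missing
  priority: number;    // -1 when missing
  labels: string;      // comma-separated values, "" when missing
  created_at?: string;
  updated_at?: string;
}

// Local stand-in for the normalized record layout described in the docs:
// `id`, optional `created_date` and `modified_date`, everything else under `data`.
interface NormalizedItem {
  id: string;
  created_date?: string;
  modified_date?: string;
  data: Record<string, unknown>;
}

// Converts a numeric string to a JSON number, or null when empty or invalid.
function toNumberOrNull(value: string): number | null {
  const trimmed = value.trim();
  if (trimmed === '') return null;
  const parsed = Number(trimmed);
  return Number.isFinite(parsed) ? parsed : null;
}

// Splits a CSV multiselect value into an array, or null when empty.
function toArrayOrNull(value: string): string[] | null {
  const parts = value
    .split(',')
    .map((part) => part.trim())
    .filter((part) => part !== '');
  return parts.length > 0 ? parts : null;
}

function normalizeTodo(raw: RawTodo): NormalizedItem {
  return {
    id: String(raw.id),
    created_date: raw.created_at,
    modified_date: raw.updated_at,
    data: {
      title: raw.title === '' ? null : raw.title,
      estimate: toNumberOrNull(raw.estimate),
      priority: raw.priority === -1 ? null : raw.priority,
      labels: toArrayOrNull(raw.labels),
    },
  };
}
```

Keeping the placeholder handling in small helpers makes it easier to reuse the same rules across item types such as users or attachments.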
diff --git a/fern/docs/pages/airdrop/extraction-phases.mdx b/fern/docs/pages/airdrop/extraction-phases.mdx
@@ -1,24 +1,29 @@
 Each snap-in must handle all the phases of Airdrop extraction. In a snap-in, you typically define a run
 function that iterates over events and invokes workers per extraction phase.
 
-The SDK library exports `processTask` to structure the work within each phase, and `onTimeout` function
-to handle timeouts.
-
 The Airdrop snap-in extraction lifecycle consists of four phases: 
-* External sync units extraction
+* External sync units extraction (only for initial sync)
 * Metadata extraction
 * Data extraction  
 * Attachments extraction
 
 Each phase is defined in a separate file and is responsible for fetching the respective data.
 
-The SDK library provides a repository management system to handle artifacts in batches.
-The `initializeRepos` function initializes the repositories, and the `push` function uploads the
-artifacts to the repositories. The `postState` function is used to post the state of the extraction task.
+<Note>
+  Snap-in development is an iterative process.
+  It typically begins with retrieving some data from the external system. 
+  The next step involves crafting an initial version of the external domain metadata and validating it through chef-cli. 
+  This metadata is used to prepare the initial domain mapping and to check for any possible issues.
+  API calls to the external system are then corrected to fetch the missing data.
+  Start by working with one item type, and once it maps well to DevRev objects and imports as desired, proceed with other item types.
+</Note>
+
+The SDK library exports `processTask` to structure the work within each phase, and the `onTimeout` function
+to handle timeouts.
 
 State management is crucial for snap-ins to maintain the state of the extraction task.
-The `postState` function is used to post the state of the extraction task.
-The state is stored in the adapter and can be retrieved using the `adapter.state` property.
+State is saved to the Airdrop backend by calling the `postState` function.
+During the extraction, the state is held in the adapter and can be retrieved using the `adapter.state` property.
 
 ```typescript
 import { AirdropEvent, EventType, spawn } from "@devrev/ts-adaas";
diff --git a/fern/docs/pages/airdrop/getting-started.mdx b/fern/docs/pages/airdrop/getting-started.mdx
@@ -30,15 +30,18 @@ consider gathering the following information:
 - **Error handling**: Learn about error response formats and codes. Knowing this helps in
   handling errors and exceptions in your integration.
 
-## Terminology
+## Basic concepts
 
 ### Sync unit
 
-A _sync unit_ is one self encompassing unit of data that is synced to an external system. Examples:
+A _sync unit_ is one self-contained unit of data that is synced to an external system. For example:
 - A project in Jira.
 - An account in SalesForce.
 - An organization Zendesk.
 
+In Jira, users often have multiple projects. Each project acts as an individual sync unit. 
+In contrast, Zendesk operates with a single large pool of tickets and agents. Here, the entire Zendesk instance can be synced in a single airdrop.
+
 ### Sync run
 
 Airdrop extractions are done in _sync runs_.
@@ -61,13 +64,13 @@ An **extractor** function in the snap-in is responsible for extracting data from
 A _reverse sync_ is a sync run from DevRev to an external system.
 It uses a **loader** function, to create or update data in the external system.
 
-### Initial import
+### Initial sync
 
-An _initial import_ is the first import of data from the external system to DevRev.
+The first sync is called the _initial sync_.
 It is triggered manually by the end user in DevRev's **Airdrops** UI.
 
-In initial import all data needs to be extracted to create a baseline (while in incremental runs only
-updated objects need to be extracted).
+During the initial sync, all data from the external sync unit is extracted and loaded into DevRev. 
+This process typically involves a large import and may take some time.
 
 An _initial import_ consists of the following phases:
 
@@ -79,18 +82,25 @@ An _initial import_ consists of the following phases:
 ### 1-way (incremental) sync
 
 A _1-way sync_ (or _incremental sync_) refers to any extraction after the initial sync run has been successfully completed.
-An extractor extracts data that was created or updated in the external system after the start
-of the latest successful forward sync, including any changes that occurred during the forward sync,
-but were not picked up by it.
+This can be a forward sync or a reverse sync.
 
-A snap-in must consult its state to get information on when the last successful forward sync started.
-Airdrop snap-ins must maintain their own states that persists between phases in a sync run,
-as well as between sync runs.
+#### 1-way forward sync
+
+An extractor extracts data that was created or updated in the external system after the start
+of the latest successful forward sync.
+This includes any changes that happened during the previous sync, but were not picked up by it.
 
-A 1-way sync consists of the following phases:
+A 1-way forward sync consists of the following phases:
 
 1. Metadata extraction
 2. Data extraction
 3. Attachments extraction
 
-A 1-way sync extracts only the domain objects updated or created since the previous successful sync run.
+#### 1-way reverse sync
+
+The loader checks for changes made in DevRev since the latest successful reverse sync and updates the data in the external system accordingly.
+
+A 1-way reverse sync consists of the following phases:
+
+1. Data loading
+2. Attachments loading
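The 1-way forward sync rule in the `getting-started.mdx` changes above (extract everything changed since the start of the latest successful forward sync) can be sketched as follows. This is a minimal illustration under stated assumptions: the `ExtractorState` interface, its field names, and the `beginSync` and `completeSync` helpers are hypothetical and are not the SDK's API. Recording the start time rather than the end time is what lets the next run pick up changes made while the previous sync was still running.

```typescript
// Hypothetical extractor state persisted between sync runs (field names are assumptions).
interface ExtractorState {
  // When the sync currently in progress started (ISO 8601).
  lastSyncStarted?: string;
  // When the latest successful forward sync started (ISO 8601).
  lastSuccessfulSyncStarted?: string;
}

interface SyncPlan {
  state: ExtractorState;
  // Undefined for an initial sync, meaning all items should be extracted.
  modifiedSince?: string;
}

// Called at the start of a sync run: remembers the start time and
// derives the lower bound for items to extract.
function beginSync(state: ExtractorState, now: Date): SyncPlan {
  return {
    state: { ...state, lastSyncStarted: now.toISOString() },
    modifiedSince: state.lastSuccessfulSyncStarted,
  };
}

// Called only after the run has finished successfully: promotes the start
// time of this run so the next run extracts changes made since then.
function completeSync(state: ExtractorState): ExtractorState {
  return { ...state, lastSuccessfulSyncStarted: state.lastSyncStarted };
}
```

If a run fails, `completeSync` is never called in this sketch, so the next run reuses the older timestamp and re-extracts the same window.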
diff --git a/fern/docs/pages/airdrop/initial-domain-mapping.mdx b/fern/docs/pages/airdrop/initial-domain-mapping.mdx
@@ -1,8 +1,11 @@
 Initial domain mapping is a process that establishes relationships between 
-external data schemas and DevRev's native record types. This mapping is configured
-once and then becomes available to all users of your integration,
+external data schemas and DevRev's native record types. 
+This mapping is configured once and then becomes available to all users of your snap-in,
 allowing them to import data while maintaining semantic meaning from their source systems.
 
+The initial domain mapping is installed with your snap-in.
+The extractor automatically triggers a function to upload these mappings to the Airdrop system.
+
 ## Chef-cli initial domain mapping setup
 
 ### Prerequisites
diff --git a/fern/docs/pages/airdrop/loading-phases.mdx b/fern/docs/pages/airdrop/loading-phases.mdx
@@ -49,4 +49,9 @@ export default run;
 Loading phases run as separate runtime instances, similar to extraction phases, with a maximum execution time of 12 minutes. 
 These phases share a `state`, defined in the `LoaderState` interface. 
 It is important to note that the loader state is separate from the extractor state. 
+
 Access to the `state` is available through the SDK's `adapter` object.
+
+## Creating items in DevRev
+
+To create an item in DevRev and sync it with the external system, start by creating an item with a **subtype** that was established during the initial sync. After selecting the subtype, fill out the necessary details for the item.
diff --git a/fern/docs/pages/airdrop/local-development.mdx b/fern/docs/pages/airdrop/local-development.mdx
@@ -9,7 +9,7 @@ For easier development you can run your Airdrop snap-in locally and receive logs
 
 ## Run the template
 
-DevRev provides a starter template, which you can run and test out right away.
+DevRev offers a starter Airdrop snap-in template that is ready for immediate use and testing. 
 
 1. Create a new repository:
    - Create a new repository from this template by clicking the "Use this template" button in the upper right corner and then "Create a new repository".
@@ -46,6 +46,8 @@ DevRev provides a starter template, which you can run and test out right away.
    devrev snap_in activate
    ```
 
+## Initial sync
+
 Now that you have a running snap-in, you can start an airdrop.
 Go to DevRev app and click **Airdrops** -> **Start Airdrop** -> **Your snap-in**.
 
diff --git a/fern/docs/pages/airdrop/manifest.mdx b/fern/docs/pages/airdrop/manifest.mdx
@@ -46,18 +46,18 @@ Ensure that `extractor_function` and `loader_function` names correspond with tho
 
 ## Establish a connection to the external system
 
-_Keyrings_ are a collection of authentication information, used by a snap-in to authenticate to the external system in API calls. This can include a key (for example, a PAT token or API key), its type, the organization ID for which a key is valid, and in some cases the organization name.
+_Keyrings_ provide a secure way to store and manage credentials within your DevRev snap-in. 
 
-Keyrings provide a secure way to store and manage credentials within your DevRev snap-in.
-This eliminates the need to expose sensitive information like passwords or access tokens directly
-within your code or configuration files, enhancing overall security.
-They also provide a valid token by abstracting OAuth token renewal from the end user.
+Keyrings are a collection of authentication information, used by a snap-in to authenticate to the external system in API calls. 
+This can include a key (for example, a PAT token or API key), its type and the organization ID for which a key is valid.
 
-They are called **Connections** in the DevRev app.
+Keyrings eliminate the need to expose sensitive information like passwords or access tokens directly within your code or configuration files. They also provide a valid token by abstracting OAuth token renewal from the end user, so less work is needed on the developer's side.
+
+Keyrings are called **Connections** in the DevRev app.
 
 ### Configure a keyring
 
-Keyrings are configured in the `manifest.yaml` by configuring a `keyring_type`, like in the [example](https://github.com/devrev/airdrop-template/blob/main/manifest.yaml).
+Keyrings are configured in the `manifest.yaml` by configuring a `keyring_type`, like in the [example](https://github.com/devrev/airdrop-template/blob/main/manifest.yaml):
 
 ```yaml
 keyring_types:
@@ -67,8 +67,7 @@ keyring_types:
     # The kind field specifies the type of keyring.
     kind: <"secret"/"oauth2">
     # is_subdomain field specifies whether the keyring contains a subdomain.
-    # Enabling this field allows the keyring to get the subdomain from the user during creation.
-    # This is useful when the keyring requires a subdomain as part of the configuration.
+    # Enabling this field allows the keyring to get the subdomain from the user during keyring creation.
     # Default is false.
     is_subdomain: <true/false>
     # Name of the external system you are importing from.
@@ -96,7 +95,7 @@ keyring_types:
         # Optional: query parameters to be included in the verification request.
         query_params:
           <param_name>: <param_value> # optional: query parameters to be included in the verification request.
-      # Fetching Organization Data: This allows you to retrieve additional information about the user's organization.
+      # Optional: fetching organization data if is_subdomain option is false.
       organization_data:
         type: "config"
         # The URL to which the request is sent to fetch organization data.
@@ -106,7 +105,16 @@ keyring_types:
         headers:
           <header_name>: <header_value>
         # The jq filter used to extract the organization data from the response.
+        # It should provide an object with id and name, depending on what the external system returns.
+        # For example "{id: .data[0].id, name: .data[0].name }".
         response_jq: <jq_filter>
 ```
+There are some options to consider:
+
+* `kind`  
+  The `kind` option can be either "secret" or "oauth2". The "secret" option is intended for storing various tokens, such as a PAT token. Use of OAuth2 is encouraged when possible. More information is available for [secret](/public/snapin-development/references/keyrings/secret-configuration) and [oauth2](/oauth-configuration).
+
+* `is_subdomain`  
+  The `is_subdomain` field relates to the API endpoints being called. When the endpoints for fetching data from an external system include a slug representing the organization, for example `https://subdomain.freshdesk.com/api/v2/tickets`, set this key to "true". In this scenario, users creating a new connection are prompted to insert the subdomain.
 
-You can find more information about keyrings and keyring types [here](/snapin-development/references/keyrings/keyring-intro).
+  If no subdomain is present in the endpoint URL, set this key to "false". In this case, provide the `organization_data` part of the configuration. Specify the endpoint in the `url` field to fetch organization data. Users creating a new connection are prompted to select the organization from a list of options, as retrieved from the `organization_data.url` value.
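To make the `response_jq` example from the `manifest.mdx` changes above more concrete, the sketch below expresses what the filter `{id: .data[0].id, name: .data[0].name }` selects, written in TypeScript. It is only an illustration: the `OrgResponse` shape is an assumed response body from an external system, and `extractOrganization` is a hypothetical helper, not something the keyring runtime calls.

```typescript
// Hypothetical response body of an external system's organization endpoint.
interface OrgResponse {
  data: Array<{ id: string; name: string; plan?: string }>;
}

// The object the jq filter is expected to produce: an id and a name.
interface OrganizationData {
  id: string | null;
  name: string | null;
}

// Mirrors the jq filter "{id: .data[0].id, name: .data[0].name }":
// take the first entry and keep only id and name. Like jq, missing
// values resolve to null instead of raising an error.
function extractOrganization(response: OrgResponse): OrganizationData {
  const first = response.data[0];
  return {
    id: first?.id ?? null,
    name: first?.name ?? null,
  };
}
```

When the external system returns a different envelope, only the path inside the filter changes; the resulting object should still carry `id` and `name`.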
diff --git a/fern/docs/pages/airdrop/metadata-extraction.mdx b/fern/docs/pages/airdrop/metadata-extraction.mdx
@@ -243,7 +243,7 @@ be changed by the end user at any time, such as mandatory fields or custom field
     A good practice is to retrieve the set of possible values for all enum fields from the external
     system's APIs in each sync run. You can mark specific enum values as deprecated using the `is_deprecated` property.
 
-    `ID` (primary key) of the record, `created_date`, and `modified_date` must not be declared.
+    **`ID` (primary key) of the record, `created_date`, and `modified_date` must not be declared.**
 
     Example:
 
diff --git a/fern/docs/pages/airdrop/overview.mdx b/fern/docs/pages/airdrop/overview.mdx
diff --git a/fern/docs/pages/airdrop/snap-in-template.mdx b/fern/docs/pages/airdrop/snap-in-template.mdx