chore: Move ADaaS template repo FAQ to public docs #166


Merged
merged 16 commits into from
Mar 17, 2025
66 changes: 0 additions & 66 deletions fern/docs/pages/adaas/overview.mdx

This file was deleted.

@@ -1,37 +1,44 @@
For the attachment extraction phase of the import process, the extractor has to upload each
attachment to DevRev's S3 using the `S3Interact` API.

## Triggering event

Airdrop initiates the attachment extraction by starting the snap-in with a message with an event of
type `EXTRACTION_ATTACHMENTS_START`.
This is done after the data extraction, transformation, and loading into DevRev are completed.

During the attachment extraction phase,
the snap-in extracts attachments from the external system and uploads them as artifacts to DevRev.

The snap-in must respond to Airdrop with a message with an event of type
`EXTRACTION_ATTACHMENTS_PROGRESS` together with an optional progress estimate and relevant artifacts
when it extracts some data and the maximum snap-in run time (12 minutes) has been reached.

The snap-in must respond to Airdrop with a message with an event of type `EXTRACTION_ATTACHMENTS_DELAY`
and specify a back-off time when the extraction has been rate-limited by the external system and
back-off is required.

In both cases, Airdrop starts the snap-in with a message with an event of type
`EXTRACTION_ATTACHMENTS_CONTINUE`.
The restart is immediate in case of `EXTRACTION_ATTACHMENTS_PROGRESS`, or delayed in case of
`EXTRACTION_ATTACHMENTS_DELAY`.

Once the attachment extraction phase is done, the snap-in must respond to Airdrop with a message
with an event of type `EXTRACTION_ATTACHMENTS_DONE`.

If attachment extraction fails, the snap-in must respond to Airdrop with a message with an event of
type `EXTRACTION_ATTACHMENTS_ERROR`.
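
The event flow described above can be sketched as a small dispatcher. The event type names come
from this page, but the outcome shape and function are hypothetical, not the actual
`@devrev/adaas-sdk` API:

```typescript
// Hypothetical sketch of the attachment extraction response logic described above.
// Event type names are from the docs; the dispatcher shape is an assumption.
type SnapInResponse =
  | "EXTRACTION_ATTACHMENTS_PROGRESS" // run timed out; Airdrop restarts immediately
  | "EXTRACTION_ATTACHMENTS_DELAY"    // rate-limited; Airdrop restarts after back-off
  | "EXTRACTION_ATTACHMENTS_DONE"
  | "EXTRACTION_ATTACHMENTS_ERROR";

interface ExtractionOutcome {
  finished: boolean;    // all attachments uploaded
  rateLimited: boolean; // external system asked us to back off
  failed: boolean;      // unrecoverable error
}

// Decide which event to send back to Airdrop based on how the run ended.
function respond(outcome: ExtractionOutcome): SnapInResponse {
  if (outcome.failed) return "EXTRACTION_ATTACHMENTS_ERROR";
  if (outcome.rateLimited) return "EXTRACTION_ATTACHMENTS_DELAY";
  if (!outcome.finished) return "EXTRACTION_ATTACHMENTS_PROGRESS";
  return "EXTRACTION_ATTACHMENTS_DONE";
}
```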

## Response from the snap-in

After uploading an attachment or a batch of attachments, the extractor also has to prepare and
upload a file specifying the extracted and uploaded attachments.

It should contain the DevRev IDs of the extracted and uploaded attachments, along with the parent
domain object ID from the external system and the actor ID from the external system.

The uploaded artifact is structured like a normal artifact containing extracted data in JSON Lines
(JSONL) format and requires specifying `ssor_attachment` as the item type.
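
For illustration, one line of such a file might be assembled as below. The field names
(`id.devrev`, `parent_id.external`, `actor_id.external`) and the sample IDs are assumptions based on
the description above, not a documented schema:

```typescript
// Hypothetical shape of one JSON Lines record describing an uploaded attachment.
// The docs only require the DevRev attachment ID plus the parent domain object ID
// and actor ID from the external system; field names here are illustrative.
interface SsorAttachment {
  id: { devrev: string };          // DevRev ID of the uploaded attachment artifact
  parent_id: { external: string }; // parent domain object ID in the external system
  actor_id: { external: string };  // actor ID in the external system
}

// Serialize one record as a single JSONL line.
function toJsonlLine(record: SsorAttachment): string {
  return JSON.stringify(record);
}

const line = toJsonlLine({
  id: { devrev: "artifact-42" },        // hypothetical DevRev artifact ID
  parent_id: { external: "TICKET-100" }, // hypothetical external ticket ID
  actor_id: { external: "user-7" },      // hypothetical external user ID
});
```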

## Examples

@@ -1,5 +1,6 @@
During the deletion phases, the snap-in may clean up its state or other side effects in the third
party systems. In the most common extraction use cases, there is nothing to do and snap-ins may reply
with the completion message.

## Data deletion

@@ -8,21 +9,21 @@ In the most common extraction use cases, there is nothing to do and snap-ins may
Airdrop initiates the data deletion phase when the import is deleted in the DevRev app.
It is started by sending the worker a message with an event of type `EXTRACTION_DATA_DELETE`.

### Response from the snap-in

The worker must respond to Airdrop with a message with an event of type `EXTRACTION_DATA_DELETE_DONE`
when done or `EXTRACTION_DATA_DELETE_ERROR` in case of an error.

## Attachments deletion


### Triggering event

Airdrop initiates the attachments deletion phase when an import is deleted from the DevRev app,
after the data deletion has completed.
It is started by sending the snap-in a message with an event of type `EXTRACTION_ATTACHMENTS_DELETE`.

### Response from the snap-in

The snap-in must respond to Airdrop with a message with an event of type `EXTRACTION_ATTACHMENTS_DELETE_DONE`
when done, or `EXTRACTION_ATTACHMENTS_DELETE_ERROR` in case of an error.
@@ -1,50 +1,61 @@

In the data extraction phase, the extractor is expected to call the external system's APIs
to retrieve all the items that were updated since the start of the last extraction.
If there was no previous extraction (the current run is an initial import),
then all the items should be extracted.

The extractor must store at what time it started each extraction in its state,
so that it can extract only items created or updated since this date in the next sync run.
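
This incremental bookkeeping can be sketched as follows. The state shape and the item type are
hypothetical, not the SDK's state API; the sketch also assumes all timestamps are RFC 3339 strings
in UTC, so lexicographic comparison matches chronological order:

```typescript
// Hypothetical incremental-sync state, as described above: remember when the
// previous extraction started so the next run only picks up newer changes.
interface ExtractorState {
  lastSyncStartedAt?: string; // RFC 3339 UTC timestamp of the previous run's start
}

interface Item {
  id: string;
  modified_date: string; // RFC 3339 UTC timestamp
}

// On each run: extract everything on the initial import, otherwise only items
// modified since the previous run started, and record this run's start time.
function selectItems(
  allItems: Item[],
  state: ExtractorState,
  runStartedAt: string,
): { items: Item[]; nextState: ExtractorState } {
  const since = state.lastSyncStartedAt;
  const items =
    since === undefined
      ? allItems // initial import: no previous extraction
      : allItems.filter((i) => i.modified_date >= since); // UTC strings compare lexicographically
  return { items, nextState: { lastSyncStartedAt: runStartedAt } };
}
```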

## Triggering event

Airdrop initiates data extraction by starting the snap-in with a message with an event of type
`EXTRACTION_DATA_START` when transitioning to the data extraction phase.

During the data extraction phase, the snap-in extracts data from an external system,
prepares batches of data and uploads them in the form of artifacts to DevRev.

The snap-in must respond to Airdrop with a message with an event of type `EXTRACTION_DATA_PROGRESS`,
together with an optional progress estimate and relevant artifacts,
when it extracts some data and the maximum Airdrop snap-in runtime (12 minutes) has been reached.

If the extraction has been rate-limited by the external system and back-off is required, the snap-in
must respond to Airdrop with a message with an event of type `EXTRACTION_DATA_DELAY`, specifying the
back-off time with the `delay` attribute.

In both cases, Airdrop starts the snap-in with a message with an event of type `EXTRACTION_DATA_CONTINUE`.
The restart is immediate (in case of `EXTRACTION_DATA_PROGRESS`) or delayed
(in case of `EXTRACTION_DATA_DELAY`).

Once the data extraction is done, the snap-in must respond to Airdrop with a message with an event
of type `EXTRACTION_DATA_DONE`.

If data extraction fails at any point, the snap-in must respond to Airdrop with a
message with an event of type `EXTRACTION_DATA_ERROR`.

## Response from the snap-in

During the data extraction phase, the snap-in uploads batches of extracted items (the recommended
batch size is 2000 items) formatted in JSONL (JSON Lines format), gzipped, and submitted as an
artifact to S3Interact (with tooling from `@devrev/adaas-sdk`).
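
Preparing such an artifact can be sketched with Node's built-in `zlib`; this is a minimal
illustration of the JSONL-then-gzip step only, and omits the actual S3Interact upload handled by
the SDK:

```typescript
import { gzipSync, gunzipSync } from "zlib";

// Sketch of preparing one artifact as described above: serialize a batch of
// items as JSON Lines (one JSON object per line) and gzip the result.
// The docs recommend batches of up to 2000 items.
function toGzippedJsonl(batch: object[]): Buffer {
  const jsonl = batch.map((item) => JSON.stringify(item)).join("\n") + "\n";
  return gzipSync(Buffer.from(jsonl, "utf8"));
}
```

`gunzipSync` is imported only so the round trip can be checked; the upload path never needs it.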

Each artifact is submitted with an `item_type`, defining a separate domain object from the
external system and matching the `record_type` in the provided metadata.
Item types defined when uploading extracted data must validate against the declarations in the metadata file.

Extracted data must be normalized.

- Null values: All fields without a value should either be omitted or set to null.
For example, if an external system provides values such as `""` or `-1` for missing values,
those must be set to null.
- Timestamps: Full-precision timestamps should be formatted as RFC 3339 (`1972-03-29T22:04:47+01:00`),
and dates should be just `2020-12-31`.
- References: References must be strings, not numbers or objects.
- Number fields must be valid JSON numbers (not strings).
- Multiselect fields must be provided as an array (not CSV).

Each line of the file contains an `id` and the optional `created_date` and `modified_date` fields
at the beginning of the record.
All other fields are contained within the `data` attribute.
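
Putting the normalization rules and record shape together, a single line might be built as below.
The helper and its sentinel handling are an illustrative sketch under the rules above, not SDK
tooling, and the sample field names are hypothetical:

```typescript
// Illustrative normalization applying the rules above: sentinel values such as
// "" or -1 become null, and each output line starts with id/created_date/
// modified_date, with every other field nested under `data`.
function normalizeField(value: unknown): unknown {
  if (value === undefined || value === "" || value === -1) return null;
  return value;
}

function toRecordLine(
  id: string,
  createdDate: string,
  modifiedDate: string,
  fields: Record<string, unknown>,
): string {
  const data: Record<string, unknown> = {};
  for (const [key, value] of Object.entries(fields)) {
    data[key] = normalizeField(value);
  }
  return JSON.stringify({
    id,
    created_date: createdDate,
    modified_date: modifiedDate,
    data,
  });
}
```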

@@ -68,4 +79,4 @@

Extracted artifacts can be validated with the `chef-cli` using the following command:

```bash
$ chef-cli validate-metadata -m external_domain_metadata.json -r issue < extractor_issues_2.json
```
@@ -1,37 +1,41 @@
In the external sync unit extraction phase, the extractor is expected to obtain a list of external
sync units that it can extract with the provided credentials and send it to Airdrop in its response.

An _external sync unit_ refers to a single unit in the external system that is being airdropped to DevRev.
In some systems, this is a project; in some it is a repository; in support systems it could be
called a brand or an organization.
What a unit of data is called and what it represents depends on the external system's domain model.
It usually combines contacts, users, work-like items, and comments into a unit of domain objects.

Some external systems may offer a single unit in their free plans,
while their enterprise plans may allow clients to operate many separate units.

The external sync unit ID is the identifier of the sync unit (project, repository, or similar)
in the external system.
For GitHub, this would be the repository, for example `cli` in `github.com/devrev/cli`.

## Triggering event

External sync unit extraction is executed only during the initial import.
It extracts external sync units available in the external system, so that the end user can choose
which external sync unit should be airdropped during the creation of an **Import** in the DevRev App.

Airdrop initiates the external sync unit extraction phase by starting the worker with a message
with an event of type `EXTRACTION_EXTERNAL_SYNC_UNITS_START`.

The snap-in must respond to Airdrop with a message with an event of type
`EXTRACTION_EXTERNAL_SYNC_UNITS_DONE`, which contains a list of external sync units as a payload,
or `EXTRACTION_EXTERNAL_SYNC_UNITS_ERROR` in case of an error.

## Response from the snap-in

The snap-in provides the list of external sync units in the event message field
`event_data.external_sync_units`, where each unit contains the following fields:
- `id`: The unique identifier in the external system.
- `name`: The human-readable name in the external system.
- `description`: The short description if the external system provides it.
- `item_count`: The number of items (issues, tickets, comments or others) in the external system.
Item count should be provided if it can be obtained in a lightweight manner, such as by calling an API endpoint.
If there is no such way to get it (for example, if the items would need to be extracted to count them),
then the item count should be `-1` to avoid blocking the import with long-running queries.
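
The payload described by the field list above might look as follows. The field names come from this
page, while the sample unit IDs and the simplified envelope are hypothetical:

```typescript
// Sketch of the external sync unit list described above. Field names
// (id, name, description, item_count) are from the docs; the sample data
// and the bare envelope object are illustrative assumptions.
interface ExternalSyncUnit {
  id: string;           // unique identifier in the external system
  name: string;         // human-readable name
  description?: string; // short description, if the external system provides one
  item_count: number;   // -1 when counting would require a long-running query
}

const externalSyncUnits: ExternalSyncUnit[] = [
  {
    id: "devrev/cli", // hypothetical repository-style sync unit ID
    name: "cli",
    description: "Command-line interface repository",
    item_count: 120,
  },
  {
    id: "devrev/docs", // hypothetical unit where counting items is expensive
    name: "docs",
    item_count: -1,
  },
];

const eventData = { external_sync_units: externalSyncUnits };
```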
