Commit 5aa2cfc

AWS committed
AWS Glue Update: You can now choose to crawl the entire table or just a sample of records in DynamoDB when using AWS Glue crawlers. Additionally, you can also specify a scanning rate for crawling DynamoDB tables.
1 parent 9f439e2 commit 5aa2cfc

2 files changed: +28 -15 lines changed

Lines changed: 5 additions & 0 deletions
@@ -0,0 +1,5 @@
+{
+"type": "feature",
+"category": "AWS Glue",
+"description": "You can now choose to crawl the entire table or just a sample of records in DynamoDB when using AWS Glue crawlers. Additionally, you can also specify a scanning rate for crawling DynamoDB tables."
+}

services/glue/src/main/resources/codegen-resources/service-2.json

Lines changed: 23 additions & 15 deletions
@@ -3185,7 +3185,7 @@
 },
 "Configuration":{
 "shape":"CrawlerConfiguration",
-"documentation":"<p>Crawler configuration information. This versioned JSON string allows users to specify aspects of a crawler's behavior. For more information, see <a href=\"http://docs.aws.amazon.com/glue/latest/dg/crawler-configuration.html\">Configuring a Crawler</a>.</p>"
+"documentation":"<p>Crawler configuration information. This versioned JSON string allows users to specify aspects of a crawler's behavior. For more information, see <a href=\"https://docs.aws.amazon.com/glue/latest/dg/crawler-configuration.html\">Configuring a Crawler</a>.</p>"
 },
 "CrawlerSecurityConfiguration":{
 "shape":"CrawlerSecurityConfiguration",
@@ -3400,7 +3400,7 @@
 },
 "Schedule":{
 "shape":"CronExpression",
-"documentation":"<p>A <code>cron</code> expression used to specify the schedule. For more information, see <a href=\"http://docs.aws.amazon.com/glue/latest/dg/monitor-data-warehouse-schedule.html\">Time-Based Schedules for Jobs and Crawlers</a>. For example, to run something every day at 12:15 UTC, specify <code>cron(15 12 * * ? *)</code>.</p>"
+"documentation":"<p>A <code>cron</code> expression used to specify the schedule (see <a href=\"https://docs.aws.amazon.com/glue/latest/dg/monitor-data-warehouse-schedule.html\">Time-Based Schedules for Jobs and Crawlers</a>. For example, to run something every day at 12:15 UTC, you would specify: <code>cron(15 12 * * ? *)</code>.</p>"
 },
 "Classifiers":{
 "shape":"ClassifierNameList",
@@ -3416,15 +3416,15 @@
 },
 "Configuration":{
 "shape":"CrawlerConfiguration",
-"documentation":"<p>The crawler configuration information. This versioned JSON string allows users to specify aspects of a crawler's behavior. For more information, see <a href=\"http://docs.aws.amazon.com/glue/latest/dg/crawler-configuration.html\">Configuring a Crawler</a>.</p>"
+"documentation":"<p>Crawler configuration information. This versioned JSON string allows users to specify aspects of a crawler's behavior. For more information, see <a href=\"https://docs.aws.amazon.com/glue/latest/dg/crawler-configuration.html\">Configuring a Crawler</a>.</p>"
 },
 "CrawlerSecurityConfiguration":{
 "shape":"CrawlerSecurityConfiguration",
 "documentation":"<p>The name of the <code>SecurityConfiguration</code> structure to be used by this crawler.</p>"
 },
 "Tags":{
 "shape":"TagsMap",
-"documentation":"<p>The tags to use with this crawler request. You can use tags to limit access to the crawler. For more information, see <a href=\"http://docs.aws.amazon.com/glue/latest/dg/monitor-tags.html\">AWS Tags in AWS Glue</a>.</p>"
+"documentation":"<p>The tags to use with this crawler request. You may use tags to limit access to the crawler. For more information about tags in AWS Glue, see <a href=\"https://docs.aws.amazon.com/glue/latest/dg/monitor-tags.html\">AWS Tags in AWS Glue</a> in the developer guide.</p>"
 }
 }
 },
@@ -3774,7 +3774,7 @@
 },
 "JsonPath":{
 "shape":"JsonPath",
-"documentation":"<p>A <code>JsonPath</code> string defining the JSON data for the classifier to classify. AWS Glue supports a subset of <code>JsonPath</code>, as described in <a href=\"https://docs.aws.amazon.com/glue/latest/dg/custom-classifier.html#custom-classifier-json\">Writing JsonPath Custom Classifiers</a>.</p>"
+"documentation":"<p>A <code>JsonPath</code> string defining the JSON data for the classifier to classify. AWS Glue supports a subset of JsonPath, as described in <a href=\"https://docs.aws.amazon.com/glue/latest/dg/custom-classifier.html#custom-classifier-json\">Writing JsonPath Custom Classifiers</a>.</p>"
 }
 },
 "documentation":"<p>Specifies a JSON classifier for <code>CreateClassifier</code> to create.</p>"
@@ -4744,6 +4744,14 @@
 "Path":{
 "shape":"Path",
 "documentation":"<p>The name of the DynamoDB table to crawl.</p>"
+},
+"scanAll":{
+"shape":"NullableBoolean",
+"documentation":"<p>Indicates whether to scan all the records, or to sample rows from the table. Scanning all the records can take a long time when the table is not a high throughput table.</p> <p>A value of <code>true</code> means to scan all records, while a value of <code>false</code> means to sample the records. If no value is specified, the value defaults to <code>true</code>.</p>"
+},
+"scanRate":{
+"shape":"NullableDouble",
+"documentation":"<p>The percentage of the configured read capacity units to use by the AWS Glue crawler. Read capacity units is a term defined by DynamoDB, and is a numeric value that acts as rate limiter for the number of reads that can be performed on that table per second.</p> <p>The valid values are null or a value between 0.1 to 1.5. A null value is used when user does not provide a value, and defaults to 0.5 of the configured Read Capacity Unit (for provisioned tables), or 0.25 of the max configured Read Capacity Unit (for tables using on-demand mode).</p>"
 }
 },
 "documentation":"<p>Specifies an Amazon DynamoDB table to crawl.</p>"
@@ -6387,11 +6395,11 @@
 },
 "GrokPattern":{
 "shape":"GrokPattern",
-"documentation":"<p>The grok pattern applied to a data store by this classifier. For more information, see built-in patterns in <a href=\"http://docs.aws.amazon.com/glue/latest/dg/custom-classifier.html\">Writing Custom Classifiers</a>.</p>"
+"documentation":"<p>The grok pattern applied to a data store by this classifier. For more information, see built-in patterns in <a href=\"https://docs.aws.amazon.com/glue/latest/dg/custom-classifier.html\">Writing Custom Classifiers</a>.</p>"
 },
 "CustomPatterns":{
 "shape":"CustomPatterns",
-"documentation":"<p>Optional custom grok patterns defined by this classifier. For more information, see custom patterns in <a href=\"http://docs.aws.amazon.com/glue/latest/dg/custom-classifier.html\">Writing Custom Classifiers</a>.</p>"
+"documentation":"<p>Optional custom grok patterns defined by this classifier. For more information, see custom patterns in <a href=\"https://docs.aws.amazon.com/glue/latest/dg/custom-classifier.html\">Writing Custom Classifiers</a>.</p>"
 }
 },
 "documentation":"<p>A classifier that uses <code>grok</code> patterns.</p>"
@@ -6507,7 +6515,7 @@
 },
 "Exclusions":{
 "shape":"PathList",
-"documentation":"<p>A list of glob patterns used to exclude from the crawl. For more information, see <a href=\"http://docs.aws.amazon.com/glue/latest/dg/add-crawler.html\">Catalog Tables with a Crawler</a>.</p>"
+"documentation":"<p>A list of glob patterns used to exclude from the crawl. For more information, see <a href=\"https://docs.aws.amazon.com/glue/latest/dg/add-crawler.html\">Catalog Tables with a Crawler</a>.</p>"
 }
 },
 "documentation":"<p>Specifies a JDBC data store to crawl.</p>"
@@ -6909,7 +6917,7 @@
 },
 "JsonPath":{
 "shape":"JsonPath",
-"documentation":"<p>A <code>JsonPath</code> string defining the JSON data for the classifier to classify. AWS Glue supports a subset of <code>JsonPath</code>, as described in <a href=\"https://docs.aws.amazon.com/glue/latest/dg/custom-classifier.html#custom-classifier-json\">Writing JsonPath Custom Classifiers</a>.</p>"
+"documentation":"<p>A <code>JsonPath</code> string defining the JSON data for the classifier to classify. AWS Glue supports a subset of JsonPath, as described in <a href=\"https://docs.aws.amazon.com/glue/latest/dg/custom-classifier.html#custom-classifier-json\">Writing JsonPath Custom Classifiers</a>.</p>"
 }
 },
 "documentation":"<p>A classifier for <code>JSON</code> content.</p>"
@@ -7943,7 +7951,7 @@
 },
 "Exclusions":{
 "shape":"PathList",
-"documentation":"<p>A list of glob patterns used to exclude from the crawl. For more information, see <a href=\"http://docs.aws.amazon.com/glue/latest/dg/add-crawler.html\">Catalog Tables with a Crawler</a>.</p>"
+"documentation":"<p>A list of glob patterns used to exclude from the crawl. For more information, see <a href=\"https://docs.aws.amazon.com/glue/latest/dg/add-crawler.html\">Catalog Tables with a Crawler</a>.</p>"
 }
 },
 "documentation":"<p>Specifies a data store in Amazon Simple Storage Service (Amazon S3).</p>"
@@ -7958,7 +7966,7 @@
 "members":{
 "ScheduleExpression":{
 "shape":"CronExpression",
-"documentation":"<p>A <code>cron</code> expression used to specify the schedule. For more information, see <a href=\"http://docs.aws.amazon.com/glue/latest/dg/monitor-data-warehouse-schedule.html\">Time-Based Schedules for Jobs and Crawlers</a>. For example, to run something every day at 12:15 UTC, specify <code>cron(15 12 * * ? *)</code>.</p>"
+"documentation":"<p>A <code>cron</code> expression used to specify the schedule (see <a href=\"https://docs.aws.amazon.com/glue/latest/dg/monitor-data-warehouse-schedule.html\">Time-Based Schedules for Jobs and Crawlers</a>. For example, to run something every day at 12:15 UTC, you would specify: <code>cron(15 12 * * ? *)</code>.</p>"
 },
 "State":{
 "shape":"ScheduleState",
@@ -9287,7 +9295,7 @@
 },
 "Schedule":{
 "shape":"CronExpression",
-"documentation":"<p>A <code>cron</code> expression used to specify the schedule. For more information, see <a href=\"http://docs.aws.amazon.com/glue/latest/dg/monitor-data-warehouse-schedule.html\">Time-Based Schedules for Jobs and Crawlers</a>. For example, to run something every day at 12:15 UTC, specify <code>cron(15 12 * * ? *)</code>.</p>"
+"documentation":"<p>A <code>cron</code> expression used to specify the schedule (see <a href=\"https://docs.aws.amazon.com/glue/latest/dg/monitor-data-warehouse-schedule.html\">Time-Based Schedules for Jobs and Crawlers</a>. For example, to run something every day at 12:15 UTC, you would specify: <code>cron(15 12 * * ? *)</code>.</p>"
 },
 "Classifiers":{
 "shape":"ClassifierNameList",
@@ -9303,7 +9311,7 @@
 },
 "Configuration":{
 "shape":"CrawlerConfiguration",
-"documentation":"<p>The crawler configuration information. This versioned JSON string allows users to specify aspects of a crawler's behavior. For more information, see <a href=\"http://docs.aws.amazon.com/glue/latest/dg/crawler-configuration.html\">Configuring a Crawler</a>.</p>"
+"documentation":"<p>Crawler configuration information. This versioned JSON string allows users to specify aspects of a crawler's behavior. For more information, see <a href=\"https://docs.aws.amazon.com/glue/latest/dg/crawler-configuration.html\">Configuring a Crawler</a>.</p>"
 },
 "CrawlerSecurityConfiguration":{
 "shape":"CrawlerSecurityConfiguration",
@@ -9326,7 +9334,7 @@
 },
 "Schedule":{
 "shape":"CronExpression",
-"documentation":"<p>The updated <code>cron</code> expression used to specify the schedule. For more information, see <a href=\"http://docs.aws.amazon.com/glue/latest/dg/monitor-data-warehouse-schedule.html\">Time-Based Schedules for Jobs and Crawlers</a>. For example, to run something every day at 12:15 UTC, specify <code>cron(15 12 * * ? *)</code>.</p>"
+"documentation":"<p>The updated <code>cron</code> expression used to specify the schedule (see <a href=\"https://docs.aws.amazon.com/glue/latest/dg/monitor-data-warehouse-schedule.html\">Time-Based Schedules for Jobs and Crawlers</a>. For example, to run something every day at 12:15 UTC, you would specify: <code>cron(15 12 * * ? *)</code>.</p>"
 }
 }
 },
@@ -9498,7 +9506,7 @@
 },
 "JsonPath":{
 "shape":"JsonPath",
-"documentation":"<p>A <code>JsonPath</code> string defining the JSON data for the classifier to classify. AWS Glue supports a subset of <code>JsonPath</code>, as described in <a href=\"https://docs.aws.amazon.com/glue/latest/dg/custom-classifier.html#custom-classifier-json\">Writing JsonPath Custom Classifiers</a>.</p>"
+"documentation":"<p>A <code>JsonPath</code> string defining the JSON data for the classifier to classify. AWS Glue supports a subset of JsonPath, as described in <a href=\"https://docs.aws.amazon.com/glue/latest/dg/custom-classifier.html#custom-classifier-json\">Writing JsonPath Custom Classifiers</a>.</p>"
 }
 },
 "documentation":"<p>Specifies a JSON classifier to be updated.</p>"
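
Note on the new DynamoDB target members: the documentation strings for scanAll and scanRate in this commit describe a default of true for scanAll, a scanRate range of 0.1 to 1.5, and fallback rates of 0.5 (provisioned tables) or 0.25 (on-demand tables) when scanRate is null. The following is a minimal Python sketch of those rules, not SDK code. The helper names build_dynamodb_target and effective_settings are hypothetical, the table name "orders" is a placeholder, and treating the 0.1 and 1.5 bounds as inclusive is an assumption, since the documentation string does not say.

```python
from typing import Optional, Tuple

# Values quoted from the scanAll/scanRate documentation strings in this commit.
SCAN_RATE_MIN = 0.1
SCAN_RATE_MAX = 1.5
DEFAULT_RATE_PROVISIONED = 0.5   # fraction of configured read capacity units
DEFAULT_RATE_ON_DEMAND = 0.25    # fraction of max configured read capacity units


def build_dynamodb_target(path: str,
                          scan_all: Optional[bool] = None,
                          scan_rate: Optional[float] = None) -> dict:
    """Build a dict shaped like the DynamoDBTarget structure (hypothetical helper).

    Members left as None are omitted, mirroring the nullable shapes in the model.
    """
    # Assumption: the documented range 0.1 to 1.5 is inclusive at both ends.
    if scan_rate is not None and not (SCAN_RATE_MIN <= scan_rate <= SCAN_RATE_MAX):
        raise ValueError(
            f"scanRate must be between {SCAN_RATE_MIN} and {SCAN_RATE_MAX}, got {scan_rate}"
        )
    target = {"Path": path}
    if scan_all is not None:
        target["scanAll"] = scan_all
    if scan_rate is not None:
        target["scanRate"] = scan_rate
    return target


def effective_settings(target: dict, on_demand: bool = False) -> Tuple[bool, float]:
    """Return (scanAll, scanRate) after applying the documented defaults."""
    scan_all = target.get("scanAll")
    if scan_all is None:
        scan_all = True  # documented default: scan all records
    scan_rate = target.get("scanRate")
    if scan_rate is None:
        scan_rate = DEFAULT_RATE_ON_DEMAND if on_demand else DEFAULT_RATE_PROVISIONED
    return scan_all, scan_rate


if __name__ == "__main__":
    sampled = build_dynamodb_target("orders", scan_all=False, scan_rate=1.0)
    print(sampled)
    print(effective_settings(build_dynamodb_target("orders"), on_demand=True))
```

Validating the range on the client side is a design choice for the sketch; the service performs its own validation, and its exact error behavior is not described in this commit.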
