Commit 5ee0634

author
awstools
committed
feat(client-textract): This release introduces additional support for 30+ normalized fields such as vendor address and currency. It also includes OCR output in the response and accuracy improvements for the already supported fields in previous version
1 parent 7cabc1e commit 5ee0634

9 files changed: +738 -442 lines changed

clients/client-textract/src/Textract.ts

Lines changed: 80 additions & 51 deletions
@@ -56,7 +56,7 @@ import { TextractClient } from "./TextractClient";
  */
  export class Textract extends TextractClient {
  /**
- * <p>Analyzes an input document for relationships between detected items. </p>
+ * <p>Analyzes an input document for relationships between detected items. </p>
  * <p>The types of information returned are as follows: </p>
  * <ul>
  * <li>
@@ -67,31 +67,39 @@ export class Textract extends TextractClient {
  * the value.</p>
  * </li>
  * <li>
- * <p>Table and table cell data. A TABLE <code>Block</code> object contains information about a detected table. A CELL
- * <code>Block</code> object is returned for each cell in a table.</p>
+ * <p>Table and table cell data. A TABLE <code>Block</code> object contains information
+ * about a detected table. A CELL <code>Block</code> object is returned for each cell in
+ * a table.</p>
  * </li>
  * <li>
- * <p>Lines and words of text. A LINE <code>Block</code> object contains one or more WORD <code>Block</code> objects.
- * All lines and words that are detected in the document are returned (including text that doesn't have a
- * relationship with the value of <code>FeatureTypes</code>). </p>
+ * <p>Lines and words of text. A LINE <code>Block</code> object contains one or more
+ * WORD <code>Block</code> objects. All lines and words that are detected in the
+ * document are returned (including text that doesn't have a relationship with the value
+ * of <code>FeatureTypes</code>). </p>
  * </li>
  * <li>
- * <p>Queries.A QUERIES_RESULT Block object contains the answer to the query, the alias associated and an ID that
- * connect it to the query asked. This Block also contains a location and attached confidence score.</p>
+ * <p>Query. A QUERY Block object contains the query text, alias and link to the
+ * associated Query results block object.</p>
+ * </li>
+ * <li>
+ * <p>Query Result. A QUERY_RESULT Block object contains the answer to the query and an
+ * ID that connects it to the query asked. This Block also contains a confidence
+ * score.</p>
  * </li>
  * </ul>
  *
- * <p>Selection elements such as check boxes and option buttons (radio buttons) can be detected in form data and in tables.
- * A SELECTION_ELEMENT <code>Block</code> object contains information about a selection element,
- * including the selection status.</p>
+ * <p>Selection elements such as check boxes and option buttons (radio buttons) can be
+ * detected in form data and in tables. A SELECTION_ELEMENT <code>Block</code> object contains
+ * information about a selection element, including the selection status.</p>
  *
- * <p>You can choose which type of analysis to perform by specifying the <code>FeatureTypes</code> list.
- * </p>
+ * <p>You can choose which type of analysis to perform by specifying the
+ * <code>FeatureTypes</code> list. </p>
  * <p>The output is returned in a list of <code>Block</code> objects.</p>
  * <p>
  * <code>AnalyzeDocument</code> is a synchronous operation. To analyze documents
- * asynchronously, use <a>StartDocumentAnalysis</a>.</p>
- * <p>For more information, see <a href="https://docs.aws.amazon.com/textract/latest/dg/how-it-works-analyzing.html">Document Text Analysis</a>.</p>
+ * asynchronously, use <a>StartDocumentAnalysis</a>.</p>
+ * <p>For more information, see <a href="https://docs.aws.amazon.com/textract/latest/dg/how-it-works-analyzing.html">Document Text
+ * Analysis</a>.</p>
  */
  public analyzeDocument(
  args: AnalyzeDocumentCommandInput,
@@ -124,18 +132,21 @@ export class Textract extends TextractClient {
 
  /**
  * <p>
- * <code>AnalyzeExpense</code> synchronously analyzes an input document for financially related relationships between text.</p>
- * <p>Information is returned as <code>ExpenseDocuments</code> and seperated as follows.</p>
+ * <code>AnalyzeExpense</code> synchronously analyzes an input document for financially
+ * related relationships between text.</p>
+ * <p>Information is returned as <code>ExpenseDocuments</code> and seperated as
+ * follows:</p>
  * <ul>
  * <li>
  * <p>
  * <code>LineItemGroups</code>- A data set containing <code>LineItems</code> which
- * store information about the lines of text, such as an item purchased and its price on a receipt.</p>
+ * store information about the lines of text, such as an item purchased and its price on
+ * a receipt.</p>
  * </li>
  * <li>
  * <p>
- * <code>SummaryFields</code>- Contains all other information a receipt, such as header information
- * or the vendors name.</p>
+ * <code>SummaryFields</code>- Contains all other information a receipt, such as
+ * header information or the vendors name.</p>
  * </li>
  * </ul>
  */
@@ -169,10 +180,10 @@ export class Textract extends TextractClient {
  }
 
  /**
- * <p>Analyzes identity documents for relevant information. This information is extracted
- * and returned as <code>IdentityDocumentFields</code>, which records both the normalized
- * field and value of the extracted text.Unlike other Amazon Textract operations, <code>AnalyzeID</code>
- * doesn't return any Geometry data.</p>
+ * <p>Analyzes identity documents for relevant information. This information is extracted and
+ * returned as <code>IdentityDocumentFields</code>, which records both the normalized field
+ * and value of the extracted text.Unlike other Amazon Textract operations,
+ * <code>AnalyzeID</code> doesn't return any Geometry data.</p>
  */
  public analyzeID(args: AnalyzeIDCommandInput, options?: __HttpHandlerOptions): Promise<AnalyzeIDCommandOutput>;
  public analyzeID(args: AnalyzeIDCommandInput, cb: (err: any, data?: AnalyzeIDCommandOutput) => void): void;
@@ -199,8 +210,9 @@ export class Textract extends TextractClient {
 
  /**
  * <p>Detects text in the input document. Amazon Textract can detect lines of text and the
- * words that make up a line of text. The input document must be an image in JPEG, PNG, PDF, or TIFF
- * format. <code>DetectDocumentText</code> returns the detected text in an array of <a>Block</a> objects. </p>
+ * words that make up a line of text. The input document must be in one of the following image
+ * formats: JPEG, PNG, PDF, or TIFF. <code>DetectDocumentText</code> returns the detected
+ * text in an array of <a>Block</a> objects. </p>
  * <p>Each document page has as an associated <code>Block</code> of type PAGE. Each PAGE <code>Block</code> object
  * is the parent of LINE <code>Block</code> objects that represent the lines of detected text on a page. A LINE <code>Block</code> object is
  * a parent for each word that makes up the line. Words are represented by <code>Block</code> objects of type WORD.</p>
@@ -240,44 +252,60 @@ export class Textract extends TextractClient {
  }
 
  /**
- * <p>Gets the results for an Amazon Textract asynchronous operation that analyzes text in a document.</p>
- * <p>You start asynchronous text analysis by calling <a>StartDocumentAnalysis</a>, which returns a job identifier
- * (<code>JobId</code>). When the text analysis operation finishes, Amazon Textract publishes a
- * completion status to the Amazon Simple Notification Service (Amazon SNS) topic that's registered in the initial call to
- * <code>StartDocumentAnalysis</code>. To get the results of the text-detection operation,
- * first check that the status value published to the Amazon SNS topic is <code>SUCCEEDED</code>.
- * If so, call <code>GetDocumentAnalysis</code>, and pass the job identifier
- * (<code>JobId</code>) from the initial call to <code>StartDocumentAnalysis</code>.</p>
+ * <p>Gets the results for an Amazon Textract asynchronous operation that analyzes text in a
+ * document.</p>
+ * <p>You start asynchronous text analysis by calling <a>StartDocumentAnalysis</a>,
+ * which returns a job identifier (<code>JobId</code>). When the text analysis operation
+ * finishes, Amazon Textract publishes a completion status to the Amazon Simple Notification Service (Amazon SNS) topic
+ * that's registered in the initial call to <code>StartDocumentAnalysis</code>. To get the
+ * results of the text-detection operation, first check that the status value published to the
+ * Amazon SNS topic is <code>SUCCEEDED</code>. If so, call <code>GetDocumentAnalysis</code>, and
+ * pass the job identifier (<code>JobId</code>) from the initial call to
+ * <code>StartDocumentAnalysis</code>.</p>
  * <p>
- * <code>GetDocumentAnalysis</code> returns an array of <a>Block</a> objects. The following
- * types of information are returned: </p>
+ * <code>GetDocumentAnalysis</code> returns an array of <a>Block</a> objects.
+ * The following types of information are returned: </p>
  * <ul>
  * <li>
  * <p>Form data (key-value pairs). The related information is returned in two <a>Block</a> objects, each of type <code>KEY_VALUE_SET</code>: a KEY
- * <code>Block</code> object and a VALUE <code>Block</code> object. For example,
- * <i>Name: Ana Silva Carolina</i> contains a key and value.
- * <i>Name:</i> is the key. <i>Ana Silva Carolina</i> is
- * the value.</p>
+ * <code>Block</code> object and a VALUE <code>Block</code> object. For example,
+ * <i>Name: Ana Silva Carolina</i> contains a key and value.
+ * <i>Name:</i> is the key. <i>Ana Silva Carolina</i> is
+ * the value.</p>
  * </li>
  * <li>
- * <p>Table and table cell data. A TABLE <code>Block</code> object contains information about a detected table. A CELL
- * <code>Block</code> object is returned for each cell in a table.</p>
+ * <p>Table and table cell data. A TABLE <code>Block</code> object contains information
+ * about a detected table. A CELL <code>Block</code> object is returned for each cell in
+ * a table.</p>
  * </li>
  * <li>
- * <p>Lines and words of text. A LINE <code>Block</code> object contains one or more WORD <code>Block</code> objects.
- * All lines and words that are detected in the document are returned (including text that doesn't have a
- * relationship with the value of the <code>StartDocumentAnalysis</code>
+ * <p>Lines and words of text. A LINE <code>Block</code> object contains one or more
+ * WORD <code>Block</code> objects. All lines and words that are detected in the
+ * document are returned (including text that doesn't have a relationship with the value
+ * of the <code>StartDocumentAnalysis</code>
  * <code>FeatureTypes</code> input parameter). </p>
  * </li>
  * <li>
- * <p>Queries. A QUERIES_RESULT Block object contains the answer to the query, the alias associated and an ID that
- * connect it to the query asked. This Block also contains a location and attached confidence score</p>
+ * <p>Query. A QUERY Block object contains the query text, alias and link to the
+ * associated Query results block object.</p>
+ * </li>
+ * <li>
+ * <p>Query Results. A QUERY_RESULT Block object contains the answer to the query and an
+ * ID that connects it to the query asked. This Block also contains a confidence
+ * score.</p>
  * </li>
  * </ul>
  *
- * <p>Selection elements such as check boxes and option buttons (radio buttons) can be detected in form data and in tables.
- * A SELECTION_ELEMENT <code>Block</code> object contains information about a selection element,
- * including the selection status.</p>
+ * <note>
+ * <p>While processing a document with queries, look out for
+ * <code>INVALID_REQUEST_PARAMETERS</code> output. This indicates that either the per
+ * page query limit has been exceeded or that the operation is trying to query a page in
+ * the document which doesn’t exist. </p>
+ * </note>
+ *
+ * <p>Selection elements such as check boxes and option buttons (radio buttons) can be
+ * detected in form data and in tables. A SELECTION_ELEMENT <code>Block</code> object contains
+ * information about a selection element, including the selection status.</p>
  *
  *
  * <p>Use the <code>MaxResults</code> parameter to limit the number of blocks that are
@@ -287,7 +315,8 @@ export class Textract extends TextractClient {
  * <code>GetDocumentAnalysis</code>, and populate the <code>NextToken</code> request
  * parameter with the token value that's returned from the previous call to
  * <code>GetDocumentAnalysis</code>.</p>
- * <p>For more information, see <a href="https://docs.aws.amazon.com/textract/latest/dg/how-it-works-analyzing.html">Document Text Analysis</a>.</p>
+ * <p>For more information, see <a href="https://docs.aws.amazon.com/textract/latest/dg/how-it-works-analyzing.html">Document Text
+ * Analysis</a>.</p>
  */
  public getDocumentAnalysis(
  args: GetDocumentAnalysisCommandInput,

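The `getDocumentAnalysis` doc comment above describes paging through results with `MaxResults` and `NextToken`. A minimal sketch of that loop follows, assuming simplified local `Block` and `AnalysisPage` shapes and a hypothetical `fetchPage` callback (none of these names come from the commit; the real SDK types carry many more fields):

```typescript
// Simplified local shapes, assumed for illustration only.
interface Block {
  Id?: string;
  BlockType?: string;
  Text?: string;
}

interface AnalysisPage {
  Blocks?: Block[];
  NextToken?: string;
}

// Hypothetical callback: fetches one page of results for the given token.
type FetchPage = (nextToken?: string) => Promise<AnalysisPage>;

// Collects the blocks from every page, following NextToken until it is absent.
async function collectAllBlocks(fetchPage: FetchPage): Promise<Block[]> {
  const blocks: Block[] = [];
  let nextToken: string | undefined;
  do {
    const page: AnalysisPage = await fetchPage(nextToken);
    blocks.push(...(page.Blocks ?? []));
    // An undefined token means the previous response was the last page.
    nextToken = page.NextToken;
  } while (nextToken);
  return blocks;
}
```

With the real client, `fetchPage` could plausibly wrap `client.send(new GetDocumentAnalysisCommand({ JobId, MaxResults, NextToken }))`, called only after the SNS status is `SUCCEEDED`, as the comment above requires. Passing the fetch step in as a callback keeps the loop itself free of SDK dependencies.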
clients/client-textract/src/commands/AnalyzeDocumentCommand.ts

Lines changed: 23 additions & 15 deletions
@@ -29,7 +29,7 @@ export interface AnalyzeDocumentCommandInput extends AnalyzeDocumentRequest {}
  export interface AnalyzeDocumentCommandOutput extends AnalyzeDocumentResponse, __MetadataBearer {}
 
  /**
- * <p>Analyzes an input document for relationships between detected items. </p>
+ * <p>Analyzes an input document for relationships between detected items. </p>
  * <p>The types of information returned are as follows: </p>
  * <ul>
  * <li>
@@ -40,31 +40,39 @@ export interface AnalyzeDocumentCommandOutput extends AnalyzeDocumentResponse, _
  * the value.</p>
  * </li>
  * <li>
- * <p>Table and table cell data. A TABLE <code>Block</code> object contains information about a detected table. A CELL
- * <code>Block</code> object is returned for each cell in a table.</p>
+ * <p>Table and table cell data. A TABLE <code>Block</code> object contains information
+ * about a detected table. A CELL <code>Block</code> object is returned for each cell in
+ * a table.</p>
  * </li>
  * <li>
- * <p>Lines and words of text. A LINE <code>Block</code> object contains one or more WORD <code>Block</code> objects.
- * All lines and words that are detected in the document are returned (including text that doesn't have a
- * relationship with the value of <code>FeatureTypes</code>). </p>
+ * <p>Lines and words of text. A LINE <code>Block</code> object contains one or more
+ * WORD <code>Block</code> objects. All lines and words that are detected in the
+ * document are returned (including text that doesn't have a relationship with the value
+ * of <code>FeatureTypes</code>). </p>
  * </li>
  * <li>
- * <p>Queries.A QUERIES_RESULT Block object contains the answer to the query, the alias associated and an ID that
- * connect it to the query asked. This Block also contains a location and attached confidence score.</p>
+ * <p>Query. A QUERY Block object contains the query text, alias and link to the
+ * associated Query results block object.</p>
+ * </li>
+ * <li>
+ * <p>Query Result. A QUERY_RESULT Block object contains the answer to the query and an
+ * ID that connects it to the query asked. This Block also contains a confidence
+ * score.</p>
  * </li>
  * </ul>
  *
- * <p>Selection elements such as check boxes and option buttons (radio buttons) can be detected in form data and in tables.
- * A SELECTION_ELEMENT <code>Block</code> object contains information about a selection element,
- * including the selection status.</p>
+ * <p>Selection elements such as check boxes and option buttons (radio buttons) can be
+ * detected in form data and in tables. A SELECTION_ELEMENT <code>Block</code> object contains
+ * information about a selection element, including the selection status.</p>
  *
- * <p>You can choose which type of analysis to perform by specifying the <code>FeatureTypes</code> list.
- * </p>
+ * <p>You can choose which type of analysis to perform by specifying the
+ * <code>FeatureTypes</code> list. </p>
  * <p>The output is returned in a list of <code>Block</code> objects.</p>
  * <p>
  * <code>AnalyzeDocument</code> is a synchronous operation. To analyze documents
- * asynchronously, use <a>StartDocumentAnalysis</a>.</p>
- * <p>For more information, see <a href="https://docs.aws.amazon.com/textract/latest/dg/how-it-works-analyzing.html">Document Text Analysis</a>.</p>
+ * asynchronously, use <a>StartDocumentAnalysis</a>.</p>
+ * <p>For more information, see <a href="https://docs.aws.amazon.com/textract/latest/dg/how-it-works-analyzing.html">Document Text
+ * Analysis</a>.</p>
  * @example
  * Use a bare-bones client and the command you need to make an API call.
  * ```javascript
