Update SDK docs for page splitting defaulting to true (#77)

awalker4 · web-flow · commit d20c8c1712b8 · 2024-06-17T19:26:30.000-04:00
diff --git a/api-reference/api-services/sdk.mdx b/api-reference/api-services/sdk.mdx
@@ -220,20 +220,25 @@ deployment of Unstructured API, you can access the API using the Python or TypeS
 
 ## Page Splitting
 
-    In order to speed up processing of long PDF files, set the `splitPdfPage`[*](#parameter-names) parameter to `true`. This will
-    cause the PDF to be split into small batches of pages by the client before sending requests to the API. The client
-    awaits all parallel requests and combines the responses into a single response object. This will
-    work only for PDF files, so don't set it for other types of files.
+    To speed up processing of large PDF files, the `splitPdfPage`[*](#parameter-names) parameter is `true` by default. This
+    causes the client to split the PDF into small batches of pages before sending requests to the API. The client
+    awaits all parallel requests and combines the responses into a single response object. Splitting applies only to PDF
+    files; other file types are not split.
 
     The number of parallel requests is controlled by `splitPdfConcurrencyLevel`[*](#parameter-names). 
     The default is 5 and the max is set to 15 to avoid high resource usage and costs.
 
     If at least one request is successful, the responses are combined into a single response object. An
     error is returned only if all requests failed or there was an error during splitting.
 
-    When using page splitting, note that chunking will not always work as expected since chunking will happen on the
-    API side. When chunking elements the whole document context is processed but when we use splitting we only have a part
-    of the context. If you need to chunk, you can make a second request to the API with the returned elements.
+    <Note>
+    This feature may lead to unexpected results when chunking because the server does not see the entire
+    document context at once. If you'd like to chunk across the whole document and still get the speedup from
+    parallel processing, you can:
+    * Partition the PDF with `splitPdfPage` set to `true`, without any chunking parameters
+    * Store the returned elements in `results.json`
+    * Partition this JSON file with the desired chunking parameters
+    </Note>
 
     <CodeGroup>
     ```python Python
@@ -243,7 +248,7 @@ deployment of Unstructured API, you can access the API using the Python or TypeS
                 content=file.read(),
                 file_name=filename,
             ),
-            split_pdf_page=True,  # Set `split_pdf_page` parameter to `True` to enable splitting the PDF file
+            split_pdf_page=True,  # Set to `False` to disable PDF splitting
             split_pdf_concurrency_level=10,  # Modify split_pdf_concurrency_level to set the number of parallel requests
         )
     )
@@ -256,7 +261,7 @@ deployment of Unstructured API, you can access the API using the Python or TypeS
                 content: data,
                 fileName: filename,
             },
-            // Set `splitPdfPage` parameter to `true` to enable splitting the PDF file
+            // Set to `false` to disable PDF splitting
             splitPdfPage: true,
             // Modify splitPdfConcurrencyLevel to set the number of parallel requests
             splitPdfConcurrencyLevel: 10,