Adds HtmlStripProcessor and UriPartsProcessor #2835

Merged
merged 1 commit into from
Aug 30, 2024
58 changes: 58 additions & 0 deletions specification/ingest/_types/Processors.ts
@@ -130,6 +130,12 @@ export class ProcessorContainer {
* @doc_id gsub-processor
*/
gsub?: GsubProcessor
/**
* Removes HTML tags from the field.
* If the field is an array of strings, HTML tags will be removed from all members of the array.
* @doc_id htmlstrip-processor
*/
html_strip?: HtmlStripProcessor
/**
* Uses a pre-trained data frame analytics model or a model deployed for natural language processing tasks to infer against the data that is being ingested in the pipeline.
* @doc_id inference-processor
@@ -230,6 +236,12 @@ export class ProcessorContainer {
* @doc_id urldecode-processor
*/
urldecode?: UrlDecodeProcessor
/**
* Parses a Uniform Resource Identifier (URI) string and extracts its components as an object.
* This URI object includes properties for the URI’s domain, path, fragment, port, query, scheme, user info, username, and password.
* @doc_id uri-parts-processor
*/
uri_parts?: UriPartsProcessor
/**
* The `user_agent` processor extracts details from the user agent string a browser sends with its web requests.
* This processor adds this information by default under the `user_agent` field.
@@ -722,6 +734,24 @@ export class GsubProcessor extends ProcessorBase {
target_field?: Field
}

export class HtmlStripProcessor extends ProcessorBase {
/**
* The string-valued field to remove HTML tags from.
*/
field: Field
/**
* If `true` and `field` does not exist or is `null`, the processor quietly exits without modifying the document.
* @server_default false
*/
ignore_missing?: boolean
/**
* The field to assign the converted value to.
* By default, the `field` is updated in-place.
* @server_default field
*/
target_field?: Field
}
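
As a rough illustration of how the `html_strip` options above fit together, the sketch below mirrors the three fields in a local interface and resolves the documented server defaults. The interface is a hand-written stand-in, not an import from any client library, and it omits whatever `ProcessorBase` contributes. The helper `withHtmlStripDefaults` and the field name `body_html` are hypothetical, chosen only for the example.

```typescript
// Local stand-in for the spec class added in this PR (ProcessorBase fields omitted).
interface HtmlStripProcessor {
  field: string
  ignore_missing?: boolean
  target_field?: string
}

// Hypothetical helper: fills in the defaults stated in the @server_default tags.
function withHtmlStripDefaults(
  p: HtmlStripProcessor
): Required<HtmlStripProcessor> {
  return {
    field: p.field,
    // @server_default false
    ignore_missing: p.ignore_missing ?? false,
    // @server_default field: the source field is updated in place
    target_field: p.target_field ?? p.field
  }
}

// Hypothetical pipeline body; `body_html` is an illustrative field name.
const htmlStripPipeline = {
  description: 'Remove HTML tags from body_html',
  processors: [{ html_strip: { field: 'body_html' } }]
}

const resolvedHtmlStrip = withHtmlStripDefaults(
  htmlStripPipeline.processors[0].html_strip
)
```

With only `field` set, the resolved configuration writes the stripped text back to `body_html` itself, which matches the "updated in-place" wording in the doc comment.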

export class InferenceProcessor extends ProcessorBase {
/**
* The ID or alias for the trained model, or the ID of the deployment.
@@ -1174,3 +1204,31 @@ export class UrlDecodeProcessor extends ProcessorBase {
*/
target_field?: Field
}

export class UriPartsProcessor extends ProcessorBase {
/**
* Field containing the URI string.
*/
field: Field
/**
* If `true` and `field` does not exist, the processor quietly exits without modifying the document.
* @server_default false
*/
ignore_missing?: boolean
/**
* If `true`, the processor copies the unparsed URI to `<target_field>.original`.
* @server_default true
*/
keep_original?: boolean
/**
* If `true`, the processor removes the `field` after parsing the URI string.
* If parsing fails, the processor does not remove the `field`.
* @server_default false
*/
remove_if_successful?: boolean
/**
* Output field for the URI object.
* @server_default url
*/
target_field?: Field
}
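
A similar sketch for `uri_parts` may help show how its defaults interact. Again, the interface is a local mirror of the fields in this diff rather than a real client type, and `withUriPartsDefaults`, `originalCopyPath`, and the field name `input_url` are hypothetical names introduced for illustration.

```typescript
// Local stand-in for the spec class added in this PR (ProcessorBase fields omitted).
interface UriPartsProcessor {
  field: string
  ignore_missing?: boolean
  keep_original?: boolean
  remove_if_successful?: boolean
  target_field?: string
}

// Hypothetical helper: fills in the defaults stated in the @server_default tags.
function withUriPartsDefaults(
  p: UriPartsProcessor
): Required<UriPartsProcessor> {
  return {
    field: p.field,
    ignore_missing: p.ignore_missing ?? false,
    keep_original: p.keep_original ?? true,
    remove_if_successful: p.remove_if_successful ?? false,
    target_field: p.target_field ?? 'url'
  }
}

// Hypothetical helper: where the unparsed URI is copied, per the keep_original comment.
function originalCopyPath(p: Required<UriPartsProcessor>): string | undefined {
  return p.keep_original ? `${p.target_field}.original` : undefined
}

// Hypothetical pipeline body; `input_url` is an illustrative field name.
const uriPartsPipeline = {
  processors: [{ uri_parts: { field: 'input_url', remove_if_successful: true } }]
}

const resolvedUriParts = withUriPartsDefaults(
  uriPartsPipeline.processors[0].uri_parts
)
const uriOriginalPath = originalCopyPath(resolvedUriParts)
```

Under these assumptions, the parsed object lands in `url`, the unparsed string is kept at `url.original`, and `input_url` is removed only when parsing succeeds.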