ExtractDocumentRawText

Description

Extracts the text from a Document and writes it to the FlowFile content. This does not include any text found in any Processing Elements.
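As a rough illustration of the behavior described above, the following Python sketch walks a nested document and collects its text while skipping processing elements. The schema is an assumption made only for this example: the field names "text", "elements", and "processingElements" and the function extract_raw_text are hypothetical and are not taken from the actual Document format or the processor's implementation.

```python
from typing import Any


def extract_raw_text(node: dict[str, Any]) -> str:
    """Collect text from a node and its children, in document order.

    Hypothetical schema: a node may carry "text", a list of child nodes
    under "elements", and derived content under "processingElements".
    """
    parts: list[str] = []
    text = node.get("text")
    if text:
        parts.append(text)
    for child in node.get("elements", []):
        child_text = extract_raw_text(child)
        if child_text:
            parts.append(child_text)
    # "processingElements" is deliberately never visited, so any text
    # stored there is left out of the result.
    return "\n".join(parts)


# Hypothetical sample document used only to exercise the sketch.
document = {
    "text": "Title",
    "elements": [
        {"text": "First paragraph."},
        {
            "text": "Second paragraph.",
            "processingElements": [{"text": "summary added by a later step"}],
        },
    ],
    "processingElements": [{"text": "document-level annotation"}],
}

raw_text = extract_raw_text(document)
```

In this sketch, raw_text contains the title and both paragraphs, and none of the text stored under the hypothetical "processingElements" key.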

Tags

datavolo, document, rag, retrieval augmented generation, text, unstructured

Properties

In the list below, required properties are shown with an asterisk (*). Other properties are considered optional. The table also indicates any default values and whether a property supports the NiFi Expression Language.

Display Name | API Name | Default Value | Allowable Values | Description

Dynamic Properties

This component does not support dynamic properties.

Relationships

Name | Description
failure | If the text of a FlowFile cannot be extracted for any reason, the input FlowFile will be routed to this relationship.
success | The text of the PDF is routed to the success relationship.
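The routing rule in the table can be sketched in a few lines of Python. This is a simplified model, not the processor's code: the function route, the extractor parameter, and the sample json_text extractor are hypothetical names introduced for illustration. It assumes that any exception raised during extraction counts as a failure and that the failure path carries the unmodified input.

```python
import json
from typing import Callable


def route(content: bytes, extractor: Callable[[bytes], str]) -> tuple[str, bytes]:
    """Return (relationship name, outgoing content) for one input.

    On any extraction error the original input is routed to "failure"
    unchanged; otherwise the extracted text is routed to "success".
    """
    try:
        text = extractor(content)
    except Exception:
        return ("failure", content)
    return ("success", text.encode("utf-8"))


def json_text(content: bytes) -> str:
    # Hypothetical extractor: reads a top-level "text" field from JSON.
    return json.loads(content)["text"]


ok = route(b'{"text": "hello"}', json_text)
bad = route(b"not json", json_text)
```

Here ok takes the success path with the extracted text as its content, while bad takes the failure path and keeps the original bytes, which makes it possible to inspect or retry the input downstream.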

Reads Attributes

This processor does not read attributes.

Writes Attributes

This processor does not write attributes.

State Management

This component does not store state.

Restricted

This component is not restricted.

Input Requirement

This component requires an incoming relationship.

System Resource Considerations

This component does not specify system resource considerations.

See Also

ChunkDocument, ParsePdfDocument