All Processors
The processors listed below are the combination of exclusive Datavolo components and
Apache NiFi™ community contributions that are available on Datavolo Runtimes.
AttributesToCSV
Generates a CSV representation of the input FlowFile Attributes
AttributesToJSON
Generates a JSON representation of the input FlowFile Attributes
CalculateRecordStats
Counts the number of Records in a record set, optionally counting the number of elements per category, where the categories are defined by user-defined properties.
CaptureGoogleDriveChanges
Captures changes to a Shared Google Drive and emits a FlowFile for each change that occurs
CaptureSharepointChanges
Captures changes from a SharePoint Document Library and emits a FlowFile for each change that occurs
ChunkDocument
Given an input Datavolo Document, chunks the data into segments that are more applicable for LLM synthesis or semantic embedding
ChunkText
Chunks text with options for recursively splitting by delimiters and max character length
CompressContent
Compresses or decompresses the contents of FlowFiles using a user-specified compression algorithm and updates the mime.type attribute as appropriate
ConnectWebSocket
Acts as a WebSocket client endpoint to interact with a remote WebSocket server
ConsumeAMQP
Consumes AMQP Messages from an AMQP Broker using the AMQP 0.9.1 protocol
ConsumeAzureEventHub
Receives messages from Microsoft Azure Event Hubs with checkpointing to ensure consistent event processing
ConsumeElasticsearch
A processor that repeatedly runs a paginated query against a field using a Range query to consume new Documents from an Elasticsearch index/query
ConsumeGCPubSub
Consumes messages from the configured Google Cloud PubSub subscription
ConsumeIMAP
Consumes messages from Email Server using IMAP protocol
ConsumeJMS
Consumes a JMS Message of type BytesMessage, TextMessage, ObjectMessage, MapMessage, or StreamMessage, transforming its content into a FlowFile and transferring it to the 'success' relationship
ConsumeKafka
Consumes messages from Apache Kafka using the Kafka Consumer API
ConsumeKinesisStream
Reads data from the specified AWS Kinesis stream and outputs a FlowFile for every processed Record (raw) or a FlowFile for a batch of processed records if a Record Reader and Record Writer are configured
ConsumeMQTT
Subscribes to a topic and receives messages from an MQTT broker
ConsumePOP3
Consumes messages from Email Server using POP3 protocol
ConsumeSlack
Retrieves messages from one or more configured Slack channels
ConsumeTwitter
Streams tweets from Twitter's streaming API v2
ControlRate
Controls the rate at which data is transferred to follow-on processors
ConvertCharacterSet
Converts a FlowFile's content from one character set to another
ConvertOfficeFormat
Converts an Open Office compatible file to PDF or DOCX format
ConvertPdfToImage
Converts a PDF file into a series of images, one for each page.
ConvertRecord
Converts records from one data format to another using configured Record Reader and Record Writer Controller Services
CopyAzureBlobStorage_v12
Copies a blob in Azure Blob Storage from one account/container to another
CopyS3Object
Copies a file from one bucket and key to another in AWS S3
CountText
Counts various metrics on incoming text
CreateAzureOpenAiEmbeddings
Uses Azure OpenAI to create embeddings for text
CreateCohereEmbeddings
Uses Cohere to create embeddings for text
CreateOllamaEmbeddings
Uses Ollama to create embeddings for text
CreateOpenAiEmbeddings
Uses OpenAI to create embeddings for text
CreateSnowflakeEmbeddings
Create vector embeddings using Snowflake Cortex Large Language Model functions
CreateVertexAIEmbeddings
Uses VertexAI to create embeddings for text
CryptographicHashContent
Calculates a cryptographic hash value for the flowfile content using the given algorithm and writes it to an output attribute
DebugFlow
The DebugFlow processor aids testing and debugging the FlowFile framework by allowing various responses to be explicitly triggered in response to the receipt of a FlowFile or a timer event without a FlowFile if using timer or cron based scheduling
DecryptContentAge
Decrypt content using the age-encryption.org/v1 specification
DecryptContentPGP
Decrypt contents of OpenPGP messages
DeduplicateRecord
This processor de-duplicates individual records within a record set
DeleteAzureBlobStorage_v12
Deletes the specified blob from Azure Blob Storage
DeleteAzureDataLakeStorage
Deletes the provided file from Azure Data Lake Storage
DeleteByQueryElasticsearch
Delete from an Elasticsearch index using a query
DeleteDBFSResource
Deletes DBFS files and directories.
DeleteDynamoDB
Deletes a document from DynamoDB based on hash and range key
DeleteFile
Deletes a file from the filesystem.
DeleteGCSObject
Deletes objects from a Google Cloud Bucket
DeleteGridFS
Deletes a file from GridFS using a file name or a query.
DeleteMilvus
Deletes vectors from a Milvus database collection by ID
DeleteMongo
Executes a delete query against a MongoDB collection
DeletePinecone
Deletes vectors from a Pinecone index.
DeleteS3Object
Deletes a file from an Amazon S3 Bucket
DeleteSFTP
Deletes a file residing on an SFTP server.
DeleteSQS
Deletes a message from an Amazon Simple Queuing Service Queue
DeleteUnityCatalogResource
Delete a Unity Catalog file or directory.
DetectDocumentPII
This processor accepts a parsed document, then returns the document with metadata containing the text positions of recognized PII entities
DetectDuplicate
Caches a value, computed from FlowFile attributes, for each incoming FlowFile and determines if the cached value has already been seen
DistributeLoad
Distributes FlowFiles to downstream processors based on a Distribution Strategy
DuplicateFlowFile
Intended for load testing, this processor will create the configured number of copies of each incoming FlowFile
EncodeContent
Encode or decode the contents of a FlowFile using Base64, Base32, or hex encoding schemes
EncryptContentAge
Encrypt content using the age-encryption.org/v1 specification
EncryptContentPGP
Encrypt contents using OpenPGP
EnforceOrder
Enforces expected ordering of FlowFiles that belong to the same data group within a single node
EnrichAttributes
Looks up a value using the configured Lookup Service and adds the results to the FlowFile as one or more attributes
EvaluateJsonPath
Evaluates one or more JsonPath expressions against the content of a FlowFile
EvaluateRagAnswerCorrectness
Evaluates the correctness of generated answers in a Retrieval-Augmented Generation (RAG) context by computing metrics such as F1 score, cosine similarity, and answer correctness
EvaluateRagFaithfulness
Evaluates the faithfulness of generated answers in a Retrieval-Augmented Generation (RAG) system by analyzing responses using an LLM (e.g., OpenAI's GPT)
EvaluateRagRetrieval
Calculates retrieval metrics (Precision@N, Recall@N, FScore@N, MAP@N, MRR) for a RAG system using an LLM as a judge
EvaluateXPath
Evaluates one or more XPaths against the content of a FlowFile
EvaluateXQuery
Evaluates one or more XQueries against the content of a FlowFile
ExecuteGroovyScript
Experimental Extended Groovy script processor
ExecuteProcess
Runs an operating system command specified by the user and writes the output of that command to a FlowFile
ExecuteScript
Experimental - Executes a script given the flow file and a process session
ExecuteSQL
Executes provided SQL select query
ExecuteSQLRecord
Executes provided SQL select query
ExecuteSQLStatement
Executes a SQL DDL or DML Statement against a database
ExecuteStreamCommand
The ExecuteStreamCommand processor provides a flexible way to integrate external commands and scripts into NiFi data flows
ExtractAvroMetadata
Extracts metadata from the header of an Avro datafile.
ExtractDocumentRawText
Extracts the text from a Document and writes it to the FlowFile content
ExtractEmailAttachments
Extract attachments from a mime formatted email file, splitting them into individual flowfiles.
ExtractEmailHeaders
Using the FlowFile content as the source of data, extracts headers from an RFC-compliant email file, adding the relevant attributes to the FlowFile
ExtractGrok
Evaluates one or more Grok Expressions against the content of a FlowFile, adding the results as attributes or replacing the content of the FlowFile with a JSON notation of the matched content
ExtractRecordSchema
Extracts the record schema from the FlowFile using the supplied Record Reader and writes it to the `avro.schema` attribute.
ExtractText
Evaluates one or more Regular Expressions against the content of a FlowFile
FetchAzureBlobStorage_v12
Retrieves the specified blob from Azure Blob Storage and writes its content to the content of the FlowFile
FetchAzureDataLakeStorage
Fetch the specified file from Azure Data Lake Storage
FetchBoxFile
Fetches files from a Box Folder
FetchDistributedMapCache
Computes cache key(s) from FlowFile attributes, for each incoming FlowFile, and fetches the value(s) from the Distributed Map Cache associated with each key
FetchDropbox
Fetches files from Dropbox
FetchFile
Reads the contents of a file from disk and streams it into the contents of an incoming FlowFile
FetchFTP
Fetches the content of a file from a remote FTP server and overwrites the contents of an incoming FlowFile with the content of the remote file.
FetchGCSObject
Fetches a file from a Google Cloud Bucket
FetchGoogleDrive
Fetches files from a Google Drive Folder
FetchGridFS
Retrieves one or more files from a GridFS bucket by file name or by a user-defined query.
FetchS3Object
Retrieves the contents of an S3 Object and writes it to the content of a FlowFile
FetchSFTP
Fetches the content of a file from a remote SFTP server and overwrites the contents of an incoming FlowFile with the content of the remote file.
FetchSharepointFile
Fetches the contents of a file from a SharePoint Drive, optionally downloading a PDF or HTML version of the file when applicable
FetchSmb
Fetches files from an SMB Share
FilterAttribute
Filters the attributes of a FlowFile by retaining specified attributes and removing the rest or by removing specified attributes and retaining the rest.
FlattenJson
Provides the user with the ability to take a nested JSON document and flatten it into a simple key/value pair document
ForkEnrichment
Used in conjunction with the JoinEnrichment processor, this processor is responsible for adding the attributes that are necessary for the JoinEnrichment processor to perform its function
ForkRecord
This processor allows the user to fork a record into multiple records
FormatWordDocument
Formats a Microsoft Word DOCX file
GenerateAnswersFromContext
Generates synthetic answers for each question present in the incoming records using a Large Language Model (LLM)
GenerateAnswersFromGroundTruth
Generates synthetic answers for each question in the incoming records using an LLM
GenerateFlowFile
This processor creates FlowFiles with random data or custom content
GenerateRecord
This processor creates FlowFiles with records having random value for the specified fields
GenerateTableFetch
Generates SQL select queries that fetch pages of rows from a table
GeoEnrichIP
Looks up geolocation information for an IP address and adds the geo information to FlowFile attributes
GeoEnrichIPRecord
Looks up geolocation information for an IP address and adds the geo information to FlowFile attributes
GetAwsPollyJobStatus
Retrieves the current status of an AWS Polly job.
GetAwsTextractJobStatus
Retrieves the current status of an AWS Textract job.
GetAwsTranscribeJobStatus
Retrieves the current status of an AWS Transcribe job.
GetAwsTranslateJobStatus
Retrieves the current status of an AWS Translate job.
GetAzureEventHub
Receives messages from Microsoft Azure Event Hubs without reliable checkpoint tracking
GetAzureQueueStorage_v12
Retrieves the messages from an Azure Queue Storage
GetDBFSFile
Read a DBFS file.
GetDynamoDB
Retrieves a document from DynamoDB based on hash and range key
GetElasticsearch
Elasticsearch get processor that uses the official Elastic REST client libraries to fetch a single document from Elasticsearch by _id
GetFile
Creates FlowFiles from files in a directory
GetFTP
Fetches files from an FTP Server and creates FlowFiles from them
GetGcpVisionAnnotateFilesOperationStatus
Retrieves the current status of a Google Vision operation.
GetGcpVisionAnnotateImagesOperationStatus
Retrieves the current status of a Google Vision operation.
GetHubSpot
Retrieves JSON data from a private HubSpot application
GetHubSpotObject
Get a HubSpot object and its associations by ID or unique value.
GetMongo
Creates FlowFiles from documents in MongoDB loaded by a user-specified query.
GetMongoRecord
A record-based version of GetMongo that uses the Record writers to write the MongoDB result set.
GetS3ObjectMetadata
Check for the existence of a file in S3 without attempting to download it
GetSFTP
Fetches files from an SFTP Server and creates FlowFiles from them
GetShopify
Retrieves objects from a custom Shopify store
GetSmbFile
Reads files from a Samba network location into FlowFiles
GetSplunk
Retrieves data from Splunk Enterprise.
GetSQS
Fetches messages from an Amazon Simple Queuing Service Queue
GetUnityCatalogFile
Read a Unity Catalog file up to 5 GiB.
GetUnityCatalogFileMetadata
Checks for Unity Catalog file metadata.
GetWorkdayReport
A processor which can interact with a configurable Workday Report
GetZendesk
Incrementally fetches data from Zendesk API.
HandleHttpRequest
Starts an HTTP Server and listens for HTTP Requests
HandleHttpResponse
Sends an HTTP Response to the Requestor that generated a FlowFile
IdentifyMimeType
Attempts to identify the MIME Type used for a FlowFile
InvokeHTTP
An HTTP client processor which can interact with a configurable HTTP Endpoint
InvokeScriptedProcessor
Experimental - Invokes a script engine for a Processor defined in the given script
ISPEnrichIP
Looks up ISP information for an IP address and adds the information to FlowFile attributes
JoinEnrichment
Joins together Records from two different FlowFiles where one FlowFile, the 'original' contains arbitrary records and the second FlowFile, the 'enrichment' contains additional data that should be used to enrich the first
JoltTransformJSON
Applies a list of Jolt specifications to the flowfile JSON payload
JoltTransformRecord
Applies a JOLT specification to each record in the FlowFile payload
JSLTTransformJSON
Applies a JSLT transformation to the FlowFile JSON payload
JsonQueryElasticsearch
A processor that allows the user to run a query (with aggregations) written with the Elasticsearch JSON DSL
ListAzureBlobStorage_v12
Lists blobs in an Azure Blob Storage container
ListAzureDataLakeStorage
Lists directory in an Azure Data Lake Storage Gen 2 filesystem
ListBoxFile
Lists files in a Box folder
ListDatabaseTables
Generates a set of flow files, each containing attributes corresponding to metadata about a table from a database connection
ListDBFSDirectory
List file names in a DBFS directory and output a new FlowFile with the filename.
ListDropbox
Retrieves a listing of files from Dropbox (shortcuts are ignored)
ListenFTP
Starts an FTP server that listens on the specified port and transforms incoming files into FlowFiles
ListenHTTP
Starts an HTTP Server and listens on a given base path to transform incoming requests into FlowFiles
ListenOTLP
Collect OpenTelemetry messages over HTTP or gRPC
ListenSlack
Retrieves real-time messages or Slack commands from one or more Slack conversations
ListenSyslog
Listens for Syslog messages being sent to a given port over TCP or UDP
ListenTCP
Listens for incoming TCP connections and reads data from each connection using a line separator as the message demarcator
ListenUDP
Listens for Datagram Packets on a given port
ListenUDPRecord
Listens for Datagram Packets on a given port and reads the content of each datagram using the configured Record Reader
ListenWebSocket
Acts as a WebSocket server endpoint to accept client connections
ListFile
Retrieves a listing of files from the input directory
ListFTP
Performs a listing of the files residing on an FTP server
ListGCSBucket
Retrieves a listing of objects from a GCS bucket
ListGoogleDrive
Performs a listing of concrete files (shortcuts are ignored) in a Google Drive folder
ListS3
Retrieves a listing of objects from an S3 bucket
ListSFTP
Performs a listing of the files residing on an SFTP server
ListSmb
Lists concrete files shared via SMB protocol
ListUnityCatalogDirectory
List file names in a Unity Catalog directory and output a new FlowFile with the filename.
LogAttribute
Emits attributes of the FlowFile at the specified log level
LogMessage
Emits a log message at the specified log level
LookupAttribute
Lookup attributes from a lookup service
LookupRecord
Extracts one or more fields from a Record and looks up a value for those fields in a LookupService
MergeContent
Merges a Group of FlowFiles together based on a user-defined strategy and packages them into a single FlowFile
MergeDocumentElements
Given a FlowFile that contains a full Document and one or more FlowFiles that contain additional data to merge into the Document, this Processor will merge the additional data into the Document
MergeRecord
This Processor merges together multiple record-oriented FlowFiles into a single FlowFile that contains all of the Records of the input FlowFiles
ModifyBytes
Discard byte range at the start and end or all content of a binary file.
ModifyCompression
Changes the compression algorithm used to compress the contents of a FlowFile by decompressing the contents with a user-specified compression algorithm and recompressing them using the specified compression format properties
MonitorActivity
Monitors the flow for activity and sends out an indicator when the flow has not had any data for some specified amount of time and again when the flow's activity is restored
MoveAzureDataLakeStorage
Moves content within an Azure Data Lake Storage Gen 2
Notify
Caches a release signal identifier in the distributed cache, optionally along with the FlowFile's attributes
OpenAiTranscribeAudio
Transcribes audio into English text
PackageFlowFile
This processor will package FlowFile attributes and content into an output FlowFile that can be exported from NiFi and imported back into NiFi, preserving the original attributes and content.
PaginatedJsonQueryElasticsearch
A processor that allows the user to run a paginated query (with aggregations) written with the Elasticsearch JSON DSL
ParseEvtx
Parses the contents of a Windows Event Log file (evtx) and writes the resulting XML to the FlowFile
ParsePdfDocument
Parses a PDF file, extracting the text and additional information into a structured JSON document
ParseSyslog
Attempts to parse the contents of a Syslog message in accordance with the RFC5424 and RFC3164 formats and adds attributes to the FlowFile for each of the parts of the Syslog message
ParseSyslog5424
Attempts to parse the contents of a well-formed Syslog message in accordance with the RFC5424 format and adds attributes to the FlowFile for each of the parts of the Syslog message, including Structured Data. Structured Data will be written to attributes as one attribute per item ID and parameter; see https://tools.ietf.org/html/rfc5424
ParseTableImage
Extracts the text from a table image and writes it to the FlowFile content in CSV format.
PartitionRecord
Splits, or partitions, record-oriented data based on the configured fields in the data
PerformOCR
Uses the Datavolo Tesseract OCR Service to extract text from a PDF or image, optionally providing metadata including the bounding box, page number and confidence level of the OCR.
PromptAnthropicAI
Sends a prompt to Anthropic, writing the response either as a FlowFile attribute or to the contents of the incoming FlowFile
PromptAzureOpenAI
Sends a prompt to Azure's OpenAI service, writing the response either as a FlowFile attribute or to the contents of the incoming FlowFile
PromptLLM
This processor sends a user defined prompt to a Large Language Model (LLM) to respond.
PromptOllama
Sends a prompt to Ollama, writing the response either as a FlowFile attribute or to the contents of the incoming FlowFile
PromptOpenAI
Sends a prompt to OpenAI, writing the response either as a FlowFile attribute or to the contents of the incoming FlowFile
PromptSnowflakeCortex
Sends a prompt to Snowflake Cortex, writing the response either as a FlowFile attribute or to the contents of the incoming FlowFile
PromptVertexAI
Sends a prompt to VertexAI, writing the response either as a FlowFile attribute or to the contents of the incoming FlowFile
PublishAMQP
Creates an AMQP Message from the contents of a FlowFile and sends the message to an AMQP Exchange
PublishGCPubSub
Publishes the content of the incoming flowfile to the configured Google Cloud PubSub topic
PublishJMS
Creates a JMS Message from the contents of a FlowFile and sends it to a JMS Destination (queue or topic) as JMS BytesMessage or TextMessage
PublishKafka
Sends the contents of a FlowFile as either a message or as individual records to Apache Kafka using the Kafka Producer API
PublishMQTT
Publishes a message to an MQTT topic
PublishSlack
Posts a message to the specified Slack channel
PutAzureBlobStorage_v12
Puts content into a blob on Azure Blob Storage
PutAzureCosmosDBRecord
This processor is a record-aware processor for inserting data into Cosmos DB with Core SQL API
PutAzureDataExplorer
Acts as an Azure Data Explorer sink which sends FlowFiles to the provided endpoint
PutAzureDataLakeStorage
Writes the contents of a FlowFile as a file on Azure Data Lake Storage Gen 2
PutAzureEventHub
Send FlowFile contents to Azure Event Hubs
PutAzureQueueStorage_v12
Writes the content of the incoming FlowFiles to the configured Azure Queue Storage.
PutBigQuery
Writes the contents of a FlowFile to a Google BigQuery table
PutBoxFile
Puts content to a Box folder.
PutCloudWatchMetric
Publishes metrics to Amazon CloudWatch
PutDatabaseRecord
The PutDatabaseRecord processor uses a specified RecordReader to input (possibly multiple) records from an incoming flow file
PutDatabricksSQL
Submits a SQL execution using the Databricks REST API, then writes the JSON response to the FlowFile content
PutDBFSFile
Write FlowFile content to DBFS.
PutDistributedMapCache
Gets the content of a FlowFile and puts it to a distributed map cache, using a cache key computed from FlowFile attributes
PutDropbox
Puts content to a Dropbox folder.
PutDynamoDB
Puts a document into DynamoDB based on hash and range key
PutDynamoDBRecord
Inserts items into DynamoDB based on record-oriented data
PutElasticsearchJson
An Elasticsearch put processor that uses the official Elastic REST client libraries
PutElasticsearchRecord
A record-aware Elasticsearch put processor that uses the official Elastic REST client libraries
PutEmail
Sends an e-mail to configured recipients for each incoming FlowFile
PutFile
Writes the contents of a FlowFile to the local file system
PutFTP
Sends FlowFiles to an FTP Server
PutGCSObject
Writes the contents of a FlowFile as an object in Google Cloud Storage.
PutGoogleDrive
Writes the contents of a FlowFile as a file in Google Drive.
PutGridFS
Writes a file to a GridFS bucket.
PutHubSpot
Upsert a HubSpot object.
PutIcebergTable
Store records in Iceberg using configurable Catalog for managing namespaces and tables.
PutKinesisFirehose
Sends the contents to a specified Amazon Kinesis Firehose
PutKinesisStream
Sends the contents to a specified Amazon Kinesis stream
PutLambda
Sends the contents to a specified AWS Lambda function
PutMLflow
Record metadata in MLflow
PutMongo
Writes the contents of a FlowFile to MongoDB
PutMongoBulkOperations
Writes the contents of a FlowFile to MongoDB as bulk-update
PutMongoRecord
This processor is a record-aware processor for inserting/upserting data into MongoDB
PutRecord
The PutRecord processor uses a specified RecordReader to input (possibly multiple) records from an incoming flow file and sends them to a destination specified by a Record Destination Service
PutRedisHashRecord
Puts record field data into Redis using a specified hash value, which is determined by a RecordPath to a field in each record containing the hash value
PutS3Object
Writes the contents of a FlowFile as an S3 Object to an Amazon S3 Bucket.
PutSalesforceObject
Creates new records for the specified Salesforce sObject
PutSFTP
Sends FlowFiles to an SFTP Server
PutSmbFile
Writes the contents of a FlowFile to a Samba network location
PutSnowflakeInternalStageFile
Puts files into a Snowflake internal stage
PutSNS
Sends the content of a FlowFile as a notification to the Amazon Simple Notification Service
PutSplunk
Sends logs to Splunk Enterprise over TCP, TCP + TLS/SSL, or UDP
PutSplunkHTTP
Sends flow file content to the specified Splunk server over HTTP or HTTPS
PutSQL
Executes a SQL UPDATE or INSERT command
PutSQS
Publishes a message to an Amazon Simple Queuing Service Queue
PutSyslog
Sends Syslog messages to a given host and port over TCP or UDP
PutTCP
Sends serialized FlowFiles or Records over TCP to a configurable destination with optional support for TLS
PutUDP
The PutUDP processor receives a FlowFile and packages the FlowFile content into a single UDP datagram packet which is then transmitted to the configured UDP server
PutUnityCatalogFile
Writes FlowFile content up to 5 GiB in size to Unity Catalog.
PutVectaraDocument
Generate and upload a JSON document to Vectara's upload endpoint
PutVectaraFile
Upload a FlowFile content to Vectara's index endpoint
PutVespaDocument
Uses the Vespa Document API to update a record in a specific namespace.
PutWebSocket
Sends messages to a WebSocket remote endpoint using a WebSocket session that is established by either ListenWebSocket or ConnectWebSocket.
PutZendeskTicket
Create Zendesk tickets using the Zendesk API.
QueryAzureDataExplorer
Query Azure Data Explorer and stream JSON results to output FlowFiles
QueryDatabaseTable
Generates a SQL select query, or uses a provided statement, and executes it to fetch all rows whose values in the specified Maximum Value column(s) are larger than the previously-seen maxima
QueryDatabaseTableRecord
Generates a SQL select query, or uses a provided statement, and executes it to fetch all rows whose values in the specified Maximum Value column(s) are larger than the previously-seen maxima
QueryDocument
Evaluates a SQL-like query against the incoming Datavolo Document JSON, producing the results on the outgoing FlowFile
QueryMilvus
Queries a given collection in a Milvus database using vectors
QueryPinecone
Queries Pinecone for vectors that are similar to the input vector, or retrieves a vector by ID.
QueryRecord
Evaluates one or more SQL queries against the contents of a FlowFile
QuerySalesforceObject
Retrieves records from a Salesforce sObject
QuerySplunkIndexingStatus
Queries the Splunk server to acquire the status of indexing acknowledgement.
RemoveRecordField
Modifies the contents of a FlowFile that contains Record-oriented data (i.e., data that can be read via a RecordReader and written by a RecordWriter) by removing one or more fields
RenameRecordField
Renames one or more fields in each Record of a FlowFile
ReplaceText
Updates the content of a FlowFile by searching for some textual value in the FlowFile content (via Regular Expression/regex, or literal value) and replacing the section of the content that matches with some alternate value
ReplaceTextWithMapping
Updates the content of a FlowFile by evaluating a Regular Expression against it and replacing the section of the content that matches the Regular Expression with some alternate value provided in a mapping file.
RetryFlowFile
FlowFiles passed to this Processor have a 'Retry Attribute' value checked against a configured 'Maximum Retries' value
RouteOnAttribute
Routes FlowFiles based on their Attributes using the Attribute Expression Language
RouteOnContent
Applies Regular Expressions to the content of a FlowFile and routes a copy of the FlowFile to each destination whose Regular Expression matches
RouteText
Routes textual data based on a set of user-defined rules
RunDatabricksJob
Triggers a pre-defined Databricks job to run with custom parameters
RunMongoAggregation
A processor that runs an aggregation query whenever a flowfile is received.
SampleRecord
Samples the records of a FlowFile based on a specified sampling strategy (such as Reservoir Sampling)
ScanAttribute
Scans the specified attributes of FlowFiles, checking to see if any of their values are present within the specified dictionary of terms
ScanContent
Scans the content of FlowFiles for terms that are found in a user-supplied dictionary
ScriptedFilterRecord
This processor provides the ability to filter records out from FlowFiles using the user-provided script
ScriptedPartitionRecord
Receives Record-oriented data (i.e., data that can be read by the configured Record Reader) and evaluates the user provided script against each record in the incoming flow file
ScriptedTransformRecord
Provides the ability to evaluate a simple script against each record in an incoming FlowFile
ScriptedValidateRecord
This processor provides the ability to validate records in FlowFiles using the user-provided script
SearchElasticsearch
A processor that allows the user to repeatedly run a paginated query (with aggregations) written with the Elasticsearch JSON DSL
SegmentContent
Segments a FlowFile into multiple smaller segments on byte boundaries
SignContentPGP
Sign content using OpenPGP Private Keys
SplitAvro
Splits a binary encoded Avro datafile into smaller files based on the configured Output Size
SplitContent
Splits incoming FlowFiles by a specified byte sequence
SplitExcel
Splits a multi-sheet Microsoft Excel spreadsheet into multiple Microsoft Excel spreadsheets, where each sheet from the original file is converted to an individual spreadsheet in its own flow file
SplitJson
Splits a JSON File into multiple, separate FlowFiles for an array element specified by a JsonPath expression
SplitRecord
Splits up an input FlowFile that is in a record-oriented data format into multiple smaller FlowFiles
SplitText
Splits a text file into multiple smaller text files on line boundaries limited by maximum number of lines or total size of fragment
SplitXml
Splits an XML File into multiple separate FlowFiles, each comprising a child or descendant of the original root element
StartAwsPollyJob
Triggers an AWS Polly job
StartAwsTextractJob
Triggers an AWS Textract job
StartAwsTranscribeJob
Triggers an AWS Transcribe job
StartAwsTranslateJob
Triggers an AWS Translate job
StartGcpVisionAnnotateFilesOperation
Triggers a Google Vision operation on file input
StartGcpVisionAnnotateImagesOperation
Triggers a Google Vision operation on image input
SummarizeText
This processor uses a Large Language Model (LLM) to summarize the content of a FlowFile
TagS3Object
Adds or updates a tag on an Amazon S3 Object.
TailFile
Tails a file, or a list of files, ingesting data as it is written to the file
TransformXml
Applies the provided XSLT file to the FlowFile XML payload
UnpackContent
Unpacks the content of FlowFiles that have been packaged with one of several different Packaging Formats, emitting one to many FlowFiles for each input FlowFile
UpdateAttribute
Updates the Attributes for a FlowFile by using the Attribute Expression Language and/or deletes the attributes based on a regular expression
UpdateByQueryElasticsearch
Update documents in an Elasticsearch index using a query
UpdateCounter
This processor allows users to set specific counters and key points in their flow
UpdateDatabaseTable
This processor uses a JDBC connection and incoming records to generate any database table changes needed to support the incoming records
UpdateRecord
Updates the contents of a FlowFile that contains Record-oriented data (i.e., data that can be read via a RecordReader and written by a RecordWriter)
UpsertMilvus
Upserts vectors into a Milvus database for a given collection
UpsertPinecone
Publishes vectors, including metadata, and optionally text, to a Pinecone index.
ValidateCsv
Validates the contents of FlowFiles against a user-specified CSV schema
ValidateJson
Validates the contents of FlowFiles against a configurable JSON Schema
ValidateRecord
Validates the Records of an incoming FlowFile against a given schema
ValidateXml
Validates XML contained in a FlowFile
VerifyContentMAC
Calculates a Message Authentication Code using the provided Secret Key and compares it with the provided MAC property
VerifyContentPGP
Verify signatures using OpenPGP Public Keys
Wait
Routes incoming FlowFiles to the 'wait' relationship until a matching release signal is stored in the distributed cache from a corresponding Notify processor