ScanContent

Description

Scans the content of FlowFiles for terms found in a user-supplied dictionary. If a term is matched, the UTF-8 encoded version of the term is added to the FlowFile as the 'matching.term' attribute.

Tags

aho-corasick, byte sequence, content, dictionary, find, scan, search

Properties

In the table below, required properties are marked with an asterisk (*); all other properties are optional. The table also indicates any default values, and whether a property supports the NiFi Expression Language.

Display Name | API Name | Default Value | Allowable Values | Description
Dictionary File * | Dictionary File | | | The filename of the terms dictionary
Dictionary Encoding * | Dictionary Encoding | text | text, binary | Indicates how the dictionary is encoded. If 'text', dictionary terms are new-line delimited and UTF-8 encoded; if 'binary', dictionary terms are denoted by a 4-byte integer indicating the term length followed by the term itself
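
The two dictionary encodings described above can be illustrated with a short Python sketch that builds each layout in memory. The helper names are hypothetical and not part of NiFi. The description does not state the byte order of the 4-byte length, so the sketch assumes a signed big-endian integer (the layout Java's DataOutputStream.writeInt produces); verify this against your NiFi version before relying on it.

```python
import io
import struct


def write_text_dictionary(terms):
    """Encode terms as newline-delimited UTF-8 (the 'text' encoding)."""
    return "\n".join(terms).encode("utf-8")


def write_binary_dictionary(terms):
    """Encode each term as a 4-byte length followed by the term's raw bytes.

    Assumption: the length is a signed big-endian integer; the processor
    description does not specify the byte order.
    """
    out = io.BytesIO()
    for term in terms:
        out.write(struct.pack(">i", len(term)))
        out.write(term)
    return out.getvalue()


def read_binary_dictionary(data):
    """Parse the length-prefixed layout back into a list of byte strings."""
    terms = []
    offset = 0
    while offset < len(data):
        (length,) = struct.unpack_from(">i", data, offset)
        offset += 4
        terms.append(data[offset:offset + length])
        offset += length
    return terms


text_dict = write_text_dictionary(["alpha", "naïve"])
binary_dict = write_binary_dictionary([b"\x00\xff", b"alpha"])
```

The binary layout is the one to reach for when terms are arbitrary byte sequences, since such terms may contain newline bytes or invalid UTF-8 that the text layout cannot represent.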

Dynamic Properties

This component does not support dynamic properties.

Relationships

Name | Description
matched | FlowFiles that match at least one term in the dictionary are routed to this relationship
unmatched | FlowFiles that do not match any term in the dictionary are routed to this relationship

Reads Attributes

This processor does not read attributes.

Writes Attributes

Name | Description
matching.term | The term that caused the Processor to route the FlowFile to the 'matched' relationship; if FlowFile is routed to the 'unmatched' relationship, this attribute is not added
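
The routing and attribute behavior described above can be sketched as a small Python function. This is only an illustration: the function name scan_content is hypothetical, and it uses a naive substring search in place of the Aho-Corasick search suggested by the processor's tags. The documentation does not say which term is reported when several match, so the sketch simply reports the first matching term in dictionary order.

```python
def scan_content(content, terms):
    """Return (relationship, attributes) for one FlowFile's content.

    content: the FlowFile content as bytes.
    terms:   dictionary terms as a list of byte strings.
    """
    for term in terms:
        # Skip empty terms, which would trivially match any content.
        if term and term in content:
            # The attribute holds the term decoded as UTF-8; bytes that
            # are not valid UTF-8 are replaced (an assumption of this sketch).
            matched_term = term.decode("utf-8", errors="replace")
            return "matched", {"matching.term": matched_term}
    # No term found: route to 'unmatched' and add no attribute.
    return "unmatched", {}


hit = scan_content(b"hello world", [b"xyz", b"world"])
miss = scan_content(b"hello", [b"xyz"])
```

Here hit carries the 'matched' relationship together with the 'matching.term' attribute, while miss carries 'unmatched' and an empty attribute map.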

State Management

This component does not store state.

Restricted

This component is not restricted.

Input Requirement

This component requires an incoming relationship.

System Resource Considerations

This component does not specify system resource considerations.

See Also