ScanContent
Description
Scans the content of FlowFiles for terms found in a user-supplied dictionary. If a term is matched, the UTF-8 encoded version of the term is added to the FlowFile as the 'matching.term' attribute.
Tags
aho-corasick, byte sequence, content, dictionary, find, scan, search
Properties
In the table below, required properties are marked with an asterisk (*); all other properties are optional. The table also indicates any default values and whether a property supports the NiFi Expression Language.
Display Name | API Name | Default Value | Allowable Values | Description |
---|---|---|---|---|
Dictionary File * | Dictionary File | | | The filename of the terms dictionary |
Dictionary Encoding * | Dictionary Encoding | text | text, binary | Indicates how the dictionary is encoded. If 'text', dictionary terms are newline-delimited and UTF-8 encoded; if 'binary', each dictionary term is denoted by a 4-byte integer indicating the term length, followed by the term itself |
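The two dictionary encodings described above can be illustrated with a short Python sketch that builds the bytes of a dictionary file in each format. The function names are hypothetical and not part of NiFi. The byte order of the 4-byte length is not stated in this page; the sketch assumes a big-endian signed 32-bit integer, so verify this against your NiFi version before relying on it.

```python
import io
import struct


def encode_text_dictionary(terms):
    """Build a 'text' dictionary: newline-delimited, UTF-8 encoded terms."""
    return "\n".join(terms).encode("utf-8")


def encode_binary_dictionary(terms):
    """Build a 'binary' dictionary: 4-byte length, then the raw term bytes.

    ASSUMPTION: the length is a big-endian signed 32-bit integer (">i").
    The documentation only says "4-byte integer".
    """
    out = io.BytesIO()
    for term in terms:
        out.write(struct.pack(">i", len(term)))  # term length prefix
        out.write(term)                          # the term itself
    return out.getvalue()


def decode_binary_dictionary(data):
    """Parse the layout produced by encode_binary_dictionary back into terms."""
    terms = []
    offset = 0
    while offset < len(data):
        (length,) = struct.unpack_from(">i", data, offset)
        offset += 4
        terms.append(data[offset:offset + length])
        offset += length
    return terms


if __name__ == "__main__":
    blob = encode_binary_dictionary([b"\xde\xad", b"abc"])
    print(blob.hex())
    print(decode_binary_dictionary(blob))
```

The binary form is useful when terms are arbitrary byte sequences that cannot be expressed as lines of UTF-8 text, for example terms that themselves contain a newline byte.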
Dynamic Properties
This component does not support dynamic properties.
Relationships
Name | Description |
---|---|
matched | FlowFiles that match at least one term in the dictionary are routed to this relationship |
unmatched | FlowFiles that do not match any term in the dictionary are routed to this relationship |
Reads Attributes
This processor does not read attributes.
Writes Attributes
Name | Description |
---|---|
matching.term | The term that caused the Processor to route the FlowFile to the 'matched' relationship; if the FlowFile is routed to the 'unmatched' relationship, this attribute is not added |
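The routing and attribute behavior described in the tables above can be sketched roughly as follows. This is an illustration only, not the processor's implementation: it uses a naive substring scan rather than Aho-Corasick, the function name `scan_content` is hypothetical, and two details are assumptions because this page does not specify them: which term is reported when several match (here, the one whose match starts earliest) and how non-UTF-8 term bytes are rendered (here, with replacement characters).

```python
def scan_content(content, terms):
    """Return (relationship, attributes) for one FlowFile's content.

    content: bytes of the FlowFile
    terms:   iterable of bytes, the dictionary terms
    """
    best = None  # (start index, term) of the earliest match seen so far
    for term in terms:
        if not term:
            continue  # skip empty terms, which would match everywhere
        index = content.find(term)
        if index != -1 and (best is None or index < best[0]):
            best = (index, term)

    if best is None:
        # No term found: route to 'unmatched' and add no attribute.
        return "unmatched", {}

    # ASSUMPTION: undecodable bytes are replaced rather than raising.
    matched_term = best[1].decode("utf-8", errors="replace")
    return "matched", {"matching.term": matched_term}


if __name__ == "__main__":
    print(scan_content(b"hello world", [b"xyz", b"world"]))
    print(scan_content(b"hello", [b"xyz"]))
```

A downstream processor can then branch on the 'matching.term' attribute, which is present only on FlowFiles routed to 'matched'.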
State Management
This component does not store state.
Restricted
This component is not restricted.
Input Requirement
This component requires an incoming relationship.
System Resource Considerations
This component does not specify system resource considerations.