CreateCohereEmbeddings
Description
Uses Cohere to create embeddings for text. The input text can be provided as a single FlowFile or as a record-oriented FlowFile.
Tags
chatbot, cohere, embeddings, gen ai, generative ai, llm, nlp, text
Properties
In the table below, required properties are marked with an asterisk (*). All other properties are optional. The table also lists any default values and notes whether a property supports the NiFi Expression Language.
Display Name | API Name | Default Value | Allowable Values | Description |
---|---|---|---|---|
Cohere API Key * | Cohere API Key | | | The API Key for authenticating to Cohere |
Embeddings Model * | Embeddings Model | embed-english-v3.0 | | The model to use for embeddings. Available models are listed at https://docs.cohere.com/reference/embed. Supports Expression Language, using FlowFile attributes and Environment variables. |
Input Type * | Input Type | search_document | | Specifies the type of input passed to the model. Required for embedding models v3 and higher. |
Embedding Type * | Embedding Type | float | | Specifies the types of embeddings you want to get back. |
Truncate Policy * | Truncate Policy | NONE | | One of NONE |
User | User | | | An identifier for the remote user on whose behalf the request is being made. Supports Expression Language, using FlowFile attributes and Environment variables. |
Record Reader | Record Reader | | Controller Service: RecordReaderFactory. Implementations: AvroReader, CEFReader, CSVReader, ExcelReader, GrokReader, JsonPathReader, JsonTreeReader, ReaderLookup, ScriptedReader, Syslog5424Reader, SyslogReader, WindowsEventLogReader, XMLReader, YamlTreeReader | The record reader to use for reading record-oriented data. If the incoming data is to be treated as plaintext, this property should be left unset. |
Record Writer * | Record Writer | | Controller Service: RecordSetWriterFactory. Implementations: AvroRecordSetWriter, CSVRecordSetWriter, FreeFormTextRecordSetWriter, JsonRecordSetWriter, RecordSetWriterLookup, ScriptedRecordSetWriter, XMLRecordSetWriter | The Record Writer to use for writing the output |
Text Record Path * | Text Record Path | /text | | The path to the field in the record that contains the text to be embedded. If the incoming data is to be treated as plaintext, this property should be left unset. Supports Expression Language, using FlowFile attributes and Environment variables. This property is only considered if: |
Embeddings Record Path * | Embeddings Record Path | /embeddings | | The path to the field in the record where the embeddings are to be written. Supports Expression Language, using FlowFile attributes and Environment variables. This property is only considered if: |
Max Batch Size * | Max Batch Size | 100 | | The maximum number of records to include in each batch sent to Cohere |
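
The interplay of "Text Record Path", "Embeddings Record Path", and "Max Batch Size" can be pictured with a minimal Python sketch. It is an illustration only and not the processor's actual implementation: records are modeled as plain dicts, the record paths are simplified to top-level field names (real RecordPath expressions are more expressive), and `fake_embed` is a hypothetical stand-in for the call to Cohere.

```python
from typing import Callable, Iterator, List, Sequence


def batches(items: Sequence[dict], max_batch_size: int) -> Iterator[Sequence[dict]]:
    """Yield consecutive slices holding at most max_batch_size records."""
    if max_batch_size < 1:
        raise ValueError("max_batch_size must be at least 1")
    for start in range(0, len(items), max_batch_size):
        yield items[start:start + max_batch_size]


def embed_records(
    records: Sequence[dict],
    embed: Callable[[List[str]], List[List[float]]],
    text_field: str = "text",               # simplified stand-in for /text
    embeddings_field: str = "embeddings",   # simplified stand-in for /embeddings
    max_batch_size: int = 100,
) -> List[dict]:
    """Return copies of the records with an embeddings field added."""
    enriched = []
    for batch in batches(records, max_batch_size):
        # One request per batch: collect the text of every record in it.
        texts = [record[text_field] for record in batch]
        vectors = embed(texts)
        # Pair each record with its vector and write it to the target field.
        for record, vector in zip(batch, vectors):
            enriched.append({**record, embeddings_field: vector})
    return enriched


def fake_embed(texts: List[str]) -> List[List[float]]:
    """Hypothetical embedder: returns a one-element vector per input text."""
    return [[float(len(text))] for text in texts]
```

Calling `embed_records(records, fake_embed, max_batch_size=2)` on five records would issue three calls to the embedder (two full batches and one partial batch) and return five records, each carrying its original fields plus `embeddings`.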
Dynamic Properties
This component does not support dynamic properties.
Relationships
Name | Description |
---|---|
failure | The original FlowFile will be routed to this relationship if the embeddings could not be created |
success | The embeddings will be routed to this relationship |
Reads Attributes
This processor does not read attributes.
Writes Attributes
Name | Description |
---|---|
mime.type | The MIME type of the output data, based on the chosen Record Writer |
record.count | The number of records written to the output |
State Management
This component does not store state.
Restricted
This component is not restricted.
Input Requirement
Input requirements are not specified for this component.
Example Use Cases
Use Case 1
Create embeddings for text using a Cohere embedding model
Configuration
Set "Cohere API Key" to the API key for your Cohere account.
Set "Embeddings Model" to the name of the Cohere model to use for creating embeddings.
Set "Input Type" based on the desired use case for the vector embeddings.
Set "Record Writer" to a Record Writer that writes out data in the desired format, typically JSON.
If the incoming data is in a structured format such as JSON, set "Record Reader" to a Record Reader that can read the format of the incoming data and set "Text Record Path" to the path to the field that contains the text to be embedded.
Otherwise, if the incoming data is raw text, leave the "Text Record Path" and "Record Reader" properties unset.
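
To make the configuration above more concrete, the following Python sketch shows how the property values could map onto the body of a Cohere Embed request. The field names (`texts`, `model`, `input_type`, `embedding_types`, `truncate`) follow Cohere's public Embed API; the function `build_embed_request` is hypothetical, and whether the processor assembles its request in exactly this way is an assumption. No request is sent.

```python
import json
from typing import Dict, List


def build_embed_request(
    texts: List[str],
    model: str = "embed-english-v3.0",     # "Embeddings Model" default
    input_type: str = "search_document",   # "Input Type" default
    embedding_type: str = "float",         # "Embedding Type" default
    truncate: str = "NONE",                # "Truncate Policy" default
) -> Dict[str, object]:
    """Build a JSON-serializable request body from the property values."""
    if not texts:
        raise ValueError("at least one text is required")
    return {
        "texts": texts,
        "model": model,
        "input_type": input_type,
        "embedding_types": [embedding_type],
        "truncate": truncate,
    }


# Example: the body for a single plaintext input, using the defaults.
body = build_embed_request(["NiFi moves data between systems."])
print(json.dumps(body, indent=2))
```

The API key is deliberately left out of the body; it would normally travel in an authorization header rather than in the JSON payload.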
System Resource Considerations
This component does not specify system resource considerations.