
Processor Life-cycle

Apache NiFi integrates with Python components through a proxy mechanism, which lets processors use Python capabilities while keeping the Python and Java environments clearly separated. Keep this separation in mind, because it comes with certain limitations.

  1. Instantiation: When you add the processor to the flow, NiFi creates a virtual environment for it, retrieves the required dependencies, and instantiates the class. Before the processor can be started, its required configuration must be valid, and every outgoing relationship must be accounted for, either by auto-terminating it or by connecting it to another processor.

  2. Scheduled: Starting the processor from the NiFi UI begins routing FlowFiles to it. Before that happens, the Java proxy invokes your processor's onScheduled method. This method can be used to:

    • Initialize Resources: Set up connections, load configurations, or initialize variables.
    • Pre-fetch Data: Retrieve any data that will be used repeatedly during processing.

  3. Execution: The processor instance remains active and ready to process data for as long as it is running. Through the Java proxy, NiFi calls the processor's transform method for each FlowFile, passing the ProcessContext and the FlowFile as arguments. This method contains the logic for processing the FlowFile and returns a FlowFileTransformResult.

  4. Stopped: Stopping the processor from the NiFi UI halts the flow of FlowFiles to it. NiFi then calls the onStopped method, where the processor should close any open connections and release any resources it holds in memory.

  5. Removed: Removing the processor from the flow causes NiFi to terminate the processor instance and destroy the virtual environment that contained it.
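The life-cycle above can be sketched as a minimal processor that implements the hooks NiFi calls. This is a sketch under assumptions: the class name `LifecycleDemo`, the `events` list, and the `run_lifecycle` driver are hypothetical, and the driver only imitates the order in which the Java proxy calls the hooks; it is not NiFi's scheduler. Step 5 (Removed) has no corresponding method here. The `try`/`except ImportError` fallback defines minimal stand-ins for the `nifiapi` classes so the sketch also runs outside NiFi.

```python
try:
    from nifiapi.flowfiletransform import FlowFileTransform, FlowFileTransformResult
except ImportError:
    # Hypothetical minimal stand-ins so the sketch runs outside NiFi.
    class FlowFileTransform:
        pass

    class FlowFileTransformResult:
        def __init__(self, relationship, contents=None, attributes=None):
            self.relationship = relationship
            self.contents = contents
            self.attributes = attributes


class LifecycleDemo(FlowFileTransform):
    class Java:
        implements = ['org.apache.nifi.python.processor.FlowFileTransform']

    class ProcessorDetails:
        version = '0.0.1-SNAPSHOT'

    def __init__(self, **kwargs):
        # 1. Instantiation: runs once when the class is instantiated.
        self.events = ['instantiated']

    def onScheduled(self, context):
        # 2. Scheduled: runs before any FlowFile is delivered.
        self.events.append('scheduled')

    def transform(self, context, flowfile):
        # 3. Execution: runs once per FlowFile.
        self.events.append('transform')
        return FlowFileTransformResult(relationship='success')

    def onStopped(self, context):
        # 4. Stopped: release anything acquired in onScheduled.
        self.events.append('stopped')


def run_lifecycle(processor, flowfiles, context=None):
    """Hypothetical driver that mimics the proxy's call order; not NiFi's scheduler."""
    processor.onScheduled(context)
    results = [processor.transform(context, ff) for ff in flowfiles]
    processor.onStopped(context)
    return results
```

Calling `run_lifecycle(LifecycleDemo(), ['a', 'b'])` leaves `events` as `['instantiated', 'scheduled', 'transform', 'transform', 'stopped']`, which mirrors steps 1 through 4.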

Note: All processors of the same type share a common virtual environment, and each new instance of that processor is created within it. Consequently, the environment is destroyed only when the last instance is removed.

[Figure: Python Processor Life-cycle]

Life-cycle Methods

onScheduled

The onScheduled method is a life-cycle method that NiFi calls when a processor is scheduled to run. It is the place for initialization tasks such as setting up resources, loading configurations, or validating properties.

Besides self, this method takes one positional argument, context (of type ProcessContext).

from nifiapi.flowfiletransform import FlowFileTransform
from nifiapi.properties import ProcessContext


class Processor(FlowFileTransform):
    (...)

    def onScheduled(self, context: ProcessContext) -> None:
        '''
        Parameters:
            context (ProcessContext)

        Returns:
            None
        '''
        pass
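To illustrate the "Initialize Resources" and "Pre-fetch Data" uses, the sketch below does its one-time setup in onScheduled so that transform only does per-FlowFile work. The class name `RedactEmails`, the regular expression, and the `redaction.count` attribute are hypothetical; `flowfile.getContentsAsBytes()` and `FlowFileTransformResult` follow the NiFi Python API, and the fallback stand-ins let the sketch run outside NiFi.

```python
import re

try:
    from nifiapi.flowfiletransform import FlowFileTransform, FlowFileTransformResult
except ImportError:
    # Hypothetical minimal stand-ins so the sketch runs outside NiFi.
    class FlowFileTransform:
        pass

    class FlowFileTransformResult:
        def __init__(self, relationship, contents=None, attributes=None):
            self.relationship = relationship
            self.contents = contents
            self.attributes = attributes


class RedactEmails(FlowFileTransform):
    class Java:
        implements = ['org.apache.nifi.python.processor.FlowFileTransform']

    class ProcessorDetails:
        version = '0.0.1-SNAPSHOT'

    def __init__(self, **kwargs):
        self.pattern = None

    def onScheduled(self, context):
        # One-time setup: compile the pattern once per start, not once per FlowFile.
        self.pattern = re.compile(r'[\w.+-]+@[\w-]+\.[\w.]+')

    def transform(self, context, flowfile):
        text = flowfile.getContentsAsBytes().decode('utf-8')
        redacted, count = self.pattern.subn('[REDACTED]', text)
        # NiFi attribute values are strings, so the count is converted with str().
        return FlowFileTransformResult(
            relationship='success',
            contents=redacted,
            attributes={'redaction.count': str(count)},
        )
```

Moving the compilation into onScheduled keeps transform cheap; any setup that does not depend on the individual FlowFile is a candidate for this method.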

onStopped

The onStopped method is a life-cycle method that NiFi calls when a processor is stopped. It is the place to release resources or close connections, so that the processor shuts down gracefully and is ready for the next start-up.

Besides self, this method takes one positional argument, context (of type ProcessContext).

from nifiapi.flowfiletransform import FlowFileTransform
from nifiapi.properties import ProcessContext


class Processor(FlowFileTransform):
    (...)

    def onStopped(self, context: ProcessContext) -> None:
        '''
        Parameters:
            context (ProcessContext)

        Returns:
            None
        '''
        pass
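A common pattern is to open a connection in onScheduled and close it in onStopped. The sketch below uses an in-memory SQLite database from Python's standard library as a stand-in for an external resource; the class name `SqliteLookup`, the `lookup` table, the `key` and `lookup.value` attributes, and the sample rows are hypothetical. It assumes a single concurrent task, and `check_same_thread=False` is set because the hooks are not guaranteed to be called from the same Python thread.

```python
import sqlite3

try:
    from nifiapi.flowfiletransform import FlowFileTransform, FlowFileTransformResult
except ImportError:
    # Hypothetical minimal stand-ins so the sketch runs outside NiFi.
    class FlowFileTransform:
        pass

    class FlowFileTransformResult:
        def __init__(self, relationship, contents=None, attributes=None):
            self.relationship = relationship
            self.contents = contents
            self.attributes = attributes


class SqliteLookup(FlowFileTransform):
    class Java:
        implements = ['org.apache.nifi.python.processor.FlowFileTransform']

    class ProcessorDetails:
        version = '0.0.1-SNAPSHOT'

    def __init__(self, **kwargs):
        self.conn = None

    def onScheduled(self, context):
        # Acquire the resource once per start; allow use from other threads.
        self.conn = sqlite3.connect(':memory:', check_same_thread=False)
        self.conn.execute('CREATE TABLE lookup (k TEXT PRIMARY KEY, v TEXT)')
        self.conn.executemany('INSERT INTO lookup VALUES (?, ?)',
                              [('a', 'alpha'), ('b', 'beta')])

    def transform(self, context, flowfile):
        key = flowfile.getAttribute('key')
        row = self.conn.execute('SELECT v FROM lookup WHERE k = ?', (key,)).fetchone()
        if row is None:
            return FlowFileTransformResult(relationship='failure')
        return FlowFileTransformResult(relationship='success',
                                       attributes={'lookup.value': row[0]})

    def onStopped(self, context):
        # Release the resource so the next start begins from a clean state.
        if self.conn is not None:
            self.conn.close()
            self.conn = None
```

Resetting `self.conn` to `None` after closing makes it obvious that the connection is gone, and the next onScheduled call opens a fresh one.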