ConvertPdfToImage

Description

Converts a PDF file into a series of images, one for each page.

Tags

image, jpeg, jpg, ocr, pdf, png, tesseract

Properties

In the table below, required properties are marked with an asterisk (*). All other properties are optional. The table also lists any default values and notes whether a property supports the NiFi Expression Language.

Display Name | API Name | Default Value | Allowable Values | Description
Output Format * | Output Format | PNG | PNG, JPEG | The format to use when writing the image.
Output Image Type * | Output Image Type | Color | Color, Grayscale, Black and White | The color type to use when writing the image.
Max File Size * | Max File Size | 10 MB | | Because the entire contents of the PDF file must be loaded into memory in order to parse it, this property limits the size of the PDF file that can be processed. A PDF file larger than this value is routed to failure.
Dots Per Inch * | Dots Per Inch | 144 | | The Dots Per Inch (DPI) to use when rendering the image. Larger values can result in higher quality images, but also larger file sizes and slower processing. Supports Expression Language, using FlowFile attributes and Environment variables.
Max Page Size in Inches | Max Page Size in Inches | 11.0 | | The maximum size, in inches, of the larger of the page's width or height. If the page is larger than this value, the image is scaled down to fit. Used together with the Dots Per Inch property, this bounds the size of the resulting image, keeping memory usage and processing time in check. The default value of 11.0 accommodates a standard 8.5 x 11 inch page.
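
The interaction between Dots Per Inch and Max Page Size in Inches can be illustrated with a little arithmetic. The sketch below is not the processor's implementation. It assumes that the aspect ratio is preserved when a page is scaled down, that pixel dimensions are the scaled size in inches multiplied by the DPI, and that results are rounded to the nearest pixel. The function name `rendered_size` and its parameter names are hypothetical.

```python
def rendered_size(width_in, height_in, dpi=144, max_page_in=11.0):
    """Estimate the scaled page size (inches) and the image size (pixels).

    Assumptions (not confirmed by the processor documentation):
    aspect ratio is preserved, and pixels = scaled inches * dpi, rounded.
    """
    # The limit applies to the larger of the page's width or height.
    longest_side = max(width_in, height_in)

    # Pages are only ever scaled down, never up.
    scale = min(1.0, max_page_in / longest_side)

    scaled_w = width_in * scale
    scaled_h = height_in * scale
    return (scaled_w, scaled_h), (round(scaled_w * dpi), round(scaled_h * dpi))


# An 8.5 x 11 inch page fits within the default limit, so it is not scaled.
letter_inches, letter_pixels = rendered_size(8.5, 11.0)

# A 17 x 22 inch page exceeds the limit and is scaled by 11 / 22 = 0.5.
large_inches, large_pixels = rendered_size(17.0, 22.0)

print(letter_inches, letter_pixels)
print(large_inches, large_pixels)
```

Under these assumptions, both pages yield the same pixel dimensions at the default settings, which shows how Max Page Size in Inches caps the output size regardless of how large the source page is.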

Dynamic Properties

This component does not support dynamic properties.

Relationships

Name | Description
failure | If a FlowFile cannot be converted into an image for any reason, it will be routed to this relationship.
images | The resulting images, one per page, are routed to this relationship.
original | The original PDF file is routed to this relationship when processing is successful.

Reads Attributes

This processor does not read attributes.

Writes Attributes

Name | Description
image.height.original.inches | The height of the image in inches before scaling to the configured maximum size.
image.height.pixels | The height of the image in pixels.
image.height.scaled.inches | The height of the image in inches after scaling to the configured maximum size.
image.width.original.inches | The width of the image in inches before scaling to the configured maximum size.
image.width.pixels | The width of the image in pixels.
image.width.scaled.inches | The width of the image in inches after scaling to the configured maximum size.
pageNumber | The page number in the PDF Document that the image represents, with the first page having a value of 1.
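
FlowFile attribute values in NiFi are strings, so a plain text sort of `pageNumber` would place "10" before "2". The sketch below shows one way to restore page order and derive per-page file names outside of NiFi. It assumes the attributes of each image FlowFile have been exported as Python dictionaries. The helper names `order_pages` and `page_filename`, and the file naming pattern, are hypothetical.

```python
def order_pages(attribute_maps):
    """Return the attribute maps sorted by numeric page number."""
    # pageNumber is a string attribute, so convert it before comparing.
    return sorted(attribute_maps, key=lambda attrs: int(attrs["pageNumber"]))


def page_filename(attrs, base_name, extension="png"):
    """Build a zero-padded file name so names also sort correctly as text."""
    page = int(attrs["pageNumber"])  # the first page has a value of 1
    return f"{base_name}-page-{page:04d}.{extension}"


# Hypothetical attribute maps for three image FlowFiles, received out of order.
pages = [{"pageNumber": "10"}, {"pageNumber": "2"}, {"pageNumber": "1"}]

for attrs in order_pages(pages):
    print(page_filename(attrs, "report"))
```

Zero-padding the page number keeps the generated names in page order even when a later step sorts them as text.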

State Management

This component does not store state.

Restricted

This component is not restricted.

Input Requirement

This component requires an incoming relationship.

System Resource Considerations

Scope | Description
MEMORY | Parsing a PDF requires random access to the data. As such, the entire PDF must be read into Java's heap.

See Also

PerformOCR