QueryDocument

Description

Evaluates a SQL-like query against the incoming Datavolo Document JSON, producing the results on the outgoing FlowFile. This can be used to select specific elements, or to route the document based on the query results. One or more queries may be specified as user-defined properties.

Tags

datavolo, document, dsl, filter, json, query, relationships, route, sql, unstructured

Properties

In the list below, the names of required properties appear in bold. Any other properties (not in bold) are considered optional. The table also indicates any default values, and whether a property supports the NiFi Expression Language.

Display Name | API Name | Default Value | Allowable Values | Description

Dynamic Properties

Name | Value | Description
The name of the relationship to route the FlowFile to | The SQL-like query to evaluate against the incoming document | The SQL-like query to evaluate against the incoming document

Supports Expression Language: No

Relationships

Name | Description
failure | Any incoming FlowFile that fails to process is routed to this Relationship
original | The incoming FlowFile is routed to this Relationship if any of the specified queries match. This FlowFile will be updated with attributes appropriate for use with MergeDocumentElements.
unmatched | If the incoming document does not match any of the specified queries, it is routed to this Relationship

Reads Attributes

This processor does not read attributes.

Writes Attributes

Name | Description
container.scope | The scope of the container is set to DOCUMENT for the JSON Document, TABLE for tables, and FIGURE for any figures/images identified
document.id | A unique UUID for the document
fragment.count | The total number of fragments
fragment.index | The index of the fragment
mime.type | The MIME type is set to 'application/json' if multiple elements are selected. Otherwise, it is set to the appropriate MIME type, depending on the type of element selected. For example, it may be text/plain or image/png.
section.title | The title of the section containing the selected element, if it is available

State Management

This component does not store state.

Restricted

This component is not restricted.

Input Requirement

Input requirements are not specified for this component.

Syntax

In short, a DQL query consists of a series of statements: a definition statement, a selection statement, and a where statement. The selection statement is required, while the definition and where statements are optional. The syntax can be summarized as follows:

DEFINE
<variable> AS <type> [WITH RESTRICTION <restriction>][,]
<variable> AS <type> [WITH RESTRICTION <restriction>][,]
SELECT
<variable>,
<variable>
WHERE
<condition>

As an example, we can use the following query to extract any table that contains the text "Revenue":

DEFINE
revenue AS table WITH RESTRICTION revenue CONTAINS "Revenue"
SELECT
revenue

Below, we provide a full BNF of the language, which gives a more detailed description of the syntax. We then describe these elements in a less formal way.

Backus-Naur Form (BNF)

The syntax of DQL is described by the following BNF:

<query> ::= <singleQuery> ('union' <singleQuery>)* EOF;
<singleQuery> ::= <definitionStatement>? <selectionStatement> <whereStatement>?;
<definitionStatement> ::= 'define' <definitionList>;
<definitionList> ::= <definition> (',' <definition>)*;
<definition> ::= IDENTIFIER 'as' <scope> ('.' REPRESENTATION)? ATTRIBUTE? <restrictionStatement>?;
<scope> ::= 'image' | 'text' | 'table' | 'caption' | 'header' | 'footer' | 'section';
<selectionStatement> ::= 'select' <selection> (',' <selection>)*;
<selection> ::= 'document' | <entity> <selectionSubElement>? | <transformation>;
<selectionSubElement> ::= '.' ('data' | 'image');
<whereStatement> ::= 'where' <condition>;
<restrictionStatement> ::= 'with restriction' <restriction>;
<restriction> ::= <value> <comparisonOperator> <value>
| <entity> 'on page' <value>
| '(' <restriction> ')'
| <restriction> <logicalOperator> <restriction>
| 'not' <restriction>;
<condition> ::= <value> <comparisonOperator> <value>
| <entity> <relativeOperator> <entity> ('within' <distance>)?
| <entity> 'same page as' <entity>
| <entity> 'on page' <value>
| <distanceStatement>
| <variable> <unaryOperator>
| '(' <condition> ')'
| <condition> <logicalOperator> <condition>
| 'not' <condition>;
<comparisonOperator> ::= '=' | '!=' | 'contains' 'ignore case'?
| 'contains regex' 'ignore case'?
| 'matches regex' 'ignore case'?;
<relativeOperator> ::= 'right of' | 'left of' | 'above' | 'below'
| 'left aligned with' | 'right aligned with'
| 'top aligned with' | 'bottom aligned with'
| 'center x aligned with' | 'center y aligned with';
<logicalOperator> ::= 'and' | 'or';
<unaryOperator> ::= 'exists' | 'not exists';
<variable> ::= <entity> | <metadataReference> | <sectionTitle>;
<metadataReference> ::= (<entity> | <representation>) '.' 'metadata' '[' <metadataKey> ']';
<sectionTitle> ::= <entity> '.' 'title';
<representation> ::= <entity> '.' 'representation';
<metadataKey> ::= STRING;
<entity> ::= IDENTIFIER;
<transformation> ::= ('string_replace' | 'regex_replace') '(' <entity> ',' STRING ',' STRING ')' 'as' IDENTIFIER;
<value> ::= NUMBER | STRING | <variable>;
<distanceStatement> ::= <entity> 'within' <distance> 'of' <entity>;
<distance> ::= NUMBER <units>;
<units> ::= 'inch' | 'inches' | 'pixel' | 'pixels';

Query

The most basic element of a DQL query is the query itself. There are times, however, when we may want to combine multiple queries in order to get the information we need. This can be accomplished by writing multiple queries and using the union keyword to combine them.

Each of the queries will typically contain a DEFINE statement, a SELECT statement, and often a WHERE statement. We will look at each of these in turn.

The keywords in a query are case-insensitive, so SELECT, select, and Select are all equivalent. However, variable names are case-sensitive, so revenue and Revenue are not the same. Additionally, metadata keys are case-sensitive, so start.page and START.PAGE are not the same.

Definition Statement

The Definition Statement, or DEFINE statement, is used to define variables that can be used in the SELECT and WHERE statements. These variables give us a way to refer to the elements of the document that we are interested in. The syntax for the DEFINE statement is as follows:

DEFINE
<variable> AS <type> [WITH RESTRICTION <restriction>][,]
<variable> AS <type> [WITH RESTRICTION <restriction>]

A variable name can be any combination of 1 or more letters (a-z), numbers, or underscores, with the exception that it must start with a letter. Letters may be upper or lower case, but they are case sensitive, so if you define it as myVar, it cannot be referred to as myvar.
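
The naming rule above can be checked with a short regular expression. The following Python sketch is only an illustration derived from the prose: the pattern and the function name `is_valid_variable_name` are assumptions, not part of DQL, and the sketch does not check whether a name collides with a DQL keyword.

```python
import re

# Assumed pattern, derived from the prose rule above: the first character is
# a letter, followed by any number of letters, digits, or underscores.
IDENTIFIER = re.compile(r"[A-Za-z][A-Za-z0-9_]*")


def is_valid_variable_name(name: str) -> bool:
    """Return True if `name` satisfies the naming rule described above."""
    return IDENTIFIER.fullmatch(name) is not None


print(is_valid_variable_name("myVar"))     # True
print(is_valid_variable_name("table_2"))   # True
print(is_valid_variable_name("2ndTable"))  # False: starts with a digit
print(is_valid_variable_name("_hidden"))   # False: starts with an underscore
```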

The type of the variable can be one of the following:

  • image
  • text
  • table
  • caption
  • header
  • footer
  • section
  • document

Multiple variables may be declared. They are optionally separated by commas. Each variable must have a unique name. For example, the following are all valid:

DEFINE
myImage AS image,
myText AS text

DEFINE
myImage AS image
myText AS text

DEFINE myImage AS IMAGE myText AS TEXT

WITH RESTRICTION

The WITH RESTRICTION clause is optional, but for most queries, it should be used. This clause allows you to restrict which elements in the document are candidates to be assigned to the given variable. It is possible to leave off the restriction for all variables entirely and instead use the WHERE clause to filter the results. However, using restrictions is significantly more efficient.

Any condition can be expressed as a restriction as long as the condition does not reference any other variables. If there is a need to reference another variable, the condition should be expressed in the WHERE clause. Often, these are used together. We might, for example, define a variable that represents a text block and use a restriction to narrow down the text blocks to those that contain a certain keyword. We might then use the WHERE clause to further restrict the results to only those where the text block is within a certain distance of an image.
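
This page does not describe how queries are evaluated internally. As a rough intuition only, and assuming that WHERE conditions are checked across combinations of candidate elements, the following Python sketch shows how narrowing the candidates for one variable first reduces the number of pairs that need to be compared. The document contents and all names here are hypothetical.

```python
from itertools import product

# Hypothetical, simplified document: each element has a type and some text.
elements = (
    [{"type": "text", "text": f"paragraph {i}"} for i in range(98)]
    + [{"type": "text", "text": "Figure 1: Revenue by quarter"}]
    + [{"type": "text", "text": "Figure 2: Revenue by region"}]
    + [{"type": "image", "text": ""} for _ in range(5)]
)

texts = [e for e in elements if e["type"] == "text"]
images = [e for e in elements if e["type"] == "image"]

# Without a restriction, every text block is a candidate for the variable,
# so a pairwise WHERE condition would be checked for every (text, image) pair.
pairs_without_restriction = list(product(texts, images))

# With a restriction such as: caption CONTAINS "Revenue",
# the candidates are narrowed before any pairwise check runs.
restricted = [t for t in texts if "Revenue" in t["text"]]
pairs_with_restriction = list(product(restricted, images))

print(len(pairs_without_restriction))  # 500
print(len(pairs_with_restriction))     # 10
```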

REPRESENTATION

Each container within Datavolo's Document model contains at least one of the following components:

  • containers - Child elements that are contained within the container.
  • textElement - The text content of the container.
  • processingElement - A holder for any sort of processing results that have occurred on the container throughout the flow.

The processingElement provides the ability to hold multiple different representations of that element. For example, a table might have an image representation that was extracted from a PDF file. It might also have a CSV representation that was derived by using vision or machine learning models. It might further have a plain text representation that was derived by asking a Large Language Model (LLM) to summarize the table. The REPRESENTATION keyword allows you to specify that a variable should be assigned a particular representation of the element, rather than the element itself.

Often, a Representation is used in conjunction with a restriction that filters based on the metadata of the representation. For example, we might specify that we want to retrieve the image representation of a table. We can do that by using the following query:

DEFINE
revenue AS TABLE.REPRESENTATION WITH RESTRICTION revenue.METADATA['mime.type'] = 'image/png'
SELECT
revenue

Note that in this case, we are not selecting the image of the table itself, but rather the entire representation element from the document, which includes the image data as well as any metadata that was extracted from the image. If we want to select the image data itself, we would use the DATA keyword. For example:

DEFINE
revenue AS TABLE.REPRESENTATION WITH RESTRICTION revenue.METADATA['mime.type'] = 'image/png'
SELECT
revenue.DATA

ATTRIBUTE

The ATTRIBUTE keyword is used to specify that any NiFi Processor that is making use of DQL should consider the returned value to be an attribute of the output FlowFile, rather than writing the value to the content of the FlowFile. This is often useful when multiple values are being returned from a single query and you want to be able to output some results as attributes and others as content. For example, we can use the following query to extract a table from a document, while also extracting the text just to the right of the table as an attribute:

DEFINE
table AS TABLE
description AS TEXT ATTRIBUTE
SELECT table, description
WHERE
description RIGHT OF table
AND
description TOP ALIGNED WITH table

Selection Statement

The Selection Statement, or SELECT statement, is used to specify which elements of the document should be returned by the query. In its simplest form, the syntax for the SELECT statement is as follows:

SELECT <variable>

or

SELECT <variable>, <variable>

We might also select the entire document. For example, we could use the following query to select any document that has a table containing the text "Revenue":

DEFINE
revenue AS table WITH RESTRICTION revenue CONTAINS "Revenue"
SELECT
DOCUMENT

Selecting with .DATA

When a variable is defined as a REPRESENTATION, it is possible to select either the Representation element as a whole or only its data, by using the .DATA keyword. For example, the following query, which does not use .DATA, will return the entire Representation element of any table image in the document:

DEFINE
revenue AS TABLE.REPRESENTATION
SELECT
revenue
WHERE
revenue.METADATA['mime.type'] = 'image/png'

Depending on the use case, this may be desirable. In other cases, it may be more useful to select the data of the representation, that is, the image itself. This can be done by using the .DATA keyword. For example:

DEFINE
revenue AS TABLE.REPRESENTATION
SELECT
revenue.DATA
WHERE
revenue.METADATA['mime.type'] = 'image/png'

Selecting with .IMAGE

Working with images is a common use case for DQL. As such, the language provides a simpler way to select the image representation of an element. When the desired element is an Image or a Table, it can be useful to use .IMAGE when selecting the variable. This will return any Representation of the element that is an image. For example, the following query will return the image representation of any table that is found on the second page of the document:

DEFINE
revenue AS table WITH RESTRICTION revenue ON PAGE 2
SELECT
revenue.IMAGE

In this way, we can avoid the complexity of defining revenue as a TABLE.REPRESENTATION and then selecting revenue.DATA. We also keep the ability to further restrict the results based on the entire table element.

Selecting with Text Transformations

Often, we want to select specific text from a document, especially for use as an attribute. We might, for example, want to extract the forecast from a document using the following query:

DEFINE
forecast AS TEXT ATTRIBUTE WITH RESTRICTION forecast CONTAINS "Today's Forecast: "
SELECT
forecast

Now, if the document were to contain the text "Today's Forecast: Sunny with a high of 75" we would get the entire text block. However, we might only want the text that comes after "Today's Forecast:". We can do this by using a text transformation. For example:

DEFINE
forecast AS TEXT ATTRIBUTE WITH RESTRICTION forecast CONTAINS "Today's Forecast: "
SELECT
string_replace(forecast, "Today's Forecast: ", "") AS forecast

This query will return the text "Sunny with a high of 75" as the value of the forecast variable. When we need more than a simple string replacement, we can use a regular expression with matching groups. For example, the following query will return the text "Sunny, 75" as the value of the forecast variable:

DEFINE
forecast AS TEXT ATTRIBUTE WITH RESTRICTION forecast CONTAINS "Today's Forecast: "
SELECT
regex_replace(forecast, "Today's Forecast: (.*?) with a high of (75)", "$1, $2") AS forecast

We could also make multiple selections based on the same variable, using an alias for each selection. For example:

DEFINE
forecast AS TEXT ATTRIBUTE WITH RESTRICTION forecast CONTAINS "Today's Forecast: "
SELECT
string_replace(forecast, "Today's Forecast: ", "") AS forecast,
regex_replace(forecast, "Today's Forecast: (.*?) with a high of .*", "$1") AS skyCondition,
regex_replace(forecast, "Today's Forecast: .* with a high of (\\d+)", "$1") AS highTemperature
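
To make the expected results concrete, the following Python sketch mimics the transformations used above. It is only an approximation: Python's `re` module stands in for Java regular expressions, the helper functions are hypothetical re-implementations rather than the processor's code, and we assume that regex_replace replaces every match and accepts Java-style `$1` group references.

```python
import re

text = "Today's Forecast: Sunny with a high of 75"


def string_replace(value: str, search: str, replacement: str) -> str:
    """Literal replacement of every occurrence of `search` (assumed semantics)."""
    return value.replace(search, replacement)


def regex_replace(value: str, pattern: str, replacement: str) -> str:
    """Regex replacement that accepts Java-style $1, $2 group references."""
    # Python's re.sub does not understand $1, so convert each reference
    # to Python's explicit group syntax before substituting.
    python_replacement = re.sub(r"\$(\d+)", r"\\g<\1>", replacement)
    return re.sub(pattern, python_replacement, value)


print(string_replace(text, "Today's Forecast: ", ""))
# Sunny with a high of 75
print(regex_replace(text, r"Today's Forecast: (.*?) with a high of (75)", "$1, $2"))
# Sunny, 75
print(regex_replace(text, r"Today's Forecast: (.*?) with a high of .*", "$1"))
# Sunny
print(regex_replace(text, r"Today's Forecast: .* with a high of (\d+)", "$1"))
# 75
```

Note that the Python patterns use raw strings, so `\d` is written with a single backslash, whereas the DQL string literal above writes it as `\\d`.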

Where Statement

The WHERE statement is used to filter the results of the query, by comparing the values of the variables that have been defined in the DEFINE statement. Unlike the RESTRICTION clause, the WHERE statement can be used to compare multiple values.

The WHERE statement is optional if all variables that are defined have a restriction. If any variable lacks a restriction, the WHERE statement is required.

Operators

The DQL language provides a number of operators that can be used to compare values. These operators can be used in the RESTRICTION and WHERE clauses. Here, we will take a look at these operators, how they are used, and what they do.

Comparison Operators

The comparison operators are used to compare two values, mostly using textual representations. The following comparison operators are supported:

  • = The equals operator is used to compare the text of an element to some other text. It returns true if the text of the element is equal to the text provided, case sensitive, and false otherwise.
  • != The not equals operator is used to compare the text of an element to some other text. It returns false if the text of the element is exactly equal to the text provided, case sensitive, and true otherwise.
  • CONTAINS The contains operator is used to determine if the text of an element contains the text provided. It returns true if the text of the element contains the text provided, case sensitive, and false otherwise.
  • CONTAINS IGNORE CASE The contains ignore case operator is used to determine if the text of an element contains the text provided. It returns true if the text of the element contains the text provided, without considering case, and false otherwise.
  • CONTAINS REGEX The contains regex operator is used to determine if the text of an element contains any text that matches the (Java) Regular Expression provided. It returns true if any part of the element's text matches the regular expression provided, case sensitive, and false otherwise.
  • CONTAINS REGEX IGNORE CASE The contains regex ignore case operator is used to determine if the text of an element contains any text that matches the (Java) Regular Expression provided. It returns true if any part of the element's text matches the regular expression provided, without considering case, and false otherwise.
  • MATCHES REGEX The matches regex operator is used to determine if the text of an element matches exactly the (Java) Regular Expression provided. It returns true if the entire text of the element matches the regular expression provided, case sensitive, and false otherwise.
  • MATCHES REGEX IGNORE CASE The matches regex ignore case operator is used to determine if the text of an element matches exactly the (Java) Regular Expression provided. It returns true if the entire text of the element matches the regular expression provided, without considering case, and false otherwise.
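
The difference between CONTAINS, CONTAINS REGEX, and MATCHES REGEX can be illustrated with a small Python sketch. This is not the processor's implementation: the function names are hypothetical, and Python's `re.search` and `re.fullmatch` are used as stand-ins for the partial-match and whole-text-match behavior of Java regular expressions described above.

```python
import re


def contains(text: str, needle: str, ignore_case: bool = False) -> bool:
    """CONTAINS [IGNORE CASE]: plain substring test."""
    if ignore_case:
        return needle.lower() in text.lower()
    return needle in text


def contains_regex(text: str, pattern: str, ignore_case: bool = False) -> bool:
    """CONTAINS REGEX [IGNORE CASE]: any part of the text may match."""
    flags = re.IGNORECASE if ignore_case else 0
    return re.search(pattern, text, flags) is not None


def matches_regex(text: str, pattern: str, ignore_case: bool = False) -> bool:
    """MATCHES REGEX [IGNORE CASE]: the entire text must match."""
    flags = re.IGNORECASE if ignore_case else 0
    return re.fullmatch(pattern, text, flags) is not None


header = "Total Revenue (in millions)"
print(contains(header, "revenue"))                    # False
print(contains(header, "revenue", ignore_case=True))  # True
print(contains_regex(header, r"Rev\w+"))              # True
print(matches_regex(header, r"Rev\w+"))               # False
print(matches_regex("ACME", r"[A-Z]{1,6}"))           # True
print(matches_regex("acme", r"[A-Z]{1,6}", ignore_case=True))  # True
```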

Relative Position Operators

The relative position operators are used to compare the position of two elements. Many of these operators accept a distance. The distance consists of a number and a unit. The unit may be either PIXELS or INCHES (or PIXEL or INCH). When comparing two elements based on their positions, all distances are first translated into a number of pixels. This conversion is performed using the dots.per.inch metadata of the document, if it is present. If the dots.per.inch metadata is not present, the default value of 72 DPI is used, as this is the standard for PDF documents. This aligns such that 72 pixels will equate to one inch if the PDF is printed at 100% scale.
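
The unit conversion described above amounts to a single multiplication. The following Python sketch shows the arithmetic; the function name `to_pixels` and the representation of the document metadata as a dictionary are assumptions made for illustration.

```python
from typing import Optional

DEFAULT_DPI = 72.0  # default stated above, used when dots.per.inch is absent


def to_pixels(amount: float, unit: str, metadata: Optional[dict] = None) -> float:
    """Convert a distance such as `2 INCHES` into pixels."""
    dpi = float((metadata or {}).get("dots.per.inch", DEFAULT_DPI))
    unit = unit.lower()
    if unit in ("pixel", "pixels"):
        return float(amount)
    if unit in ("inch", "inches"):
        return amount * dpi
    raise ValueError(f"Unsupported unit: {unit}")


print(to_pixels(2, "INCHES"))                        # 144.0
print(to_pixels(1, "inch", {"dots.per.inch": 300}))  # 300.0
print(to_pixels(3, "PIXELS"))                        # 3.0
```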

To determine the location of one element relative to another, both elements must have a Bounding Box present in the document. If either element does not have a Bounding Box, the comparison will return false. Additionally, the two elements must be on the same page, as defined by the start.page metadata of the element, or the comparison will return false.

The following relative position operators are supported:

  • RIGHT OF <entity> [WITHIN <distance>] The right of operator is used to determine if one element is to the right of another element, optionally within a certain distance. If the WITHIN clause is omitted, the operator returns true if the first element is to the right of the second element, and false otherwise. If the WITHIN clause is included, the operator returns true if the first element is to the right of the second element and the distance between the first element's left edge and the second element's right edge is less than or equal to the distance provided. The vertical, or y position of the elements is not considered.
  • LEFT OF <entity> [WITHIN <distance>] The left of operator is used to determine if one element is to the left of another element, optionally within a certain distance. If the WITHIN clause is omitted, the operator returns true if the first element is to the left of the second element, and false otherwise. If the WITHIN clause is included, the operator returns true if the first element is to the left of the second element and the distance between the first element's right edge and the second element's left edge is less than or equal to the distance provided. The vertical, or y position of the elements is not considered.
  • ABOVE <entity> [WITHIN <distance>] The above operator is used to determine if one element is above another element, optionally within a certain distance. If the WITHIN clause is omitted, the operator returns true if the first element is above the second element, and false otherwise. If the WITHIN clause is included, the operator returns true if the first element is above the second element and the distance between the first element's bottom edge and the second element's top edge is less than or equal to the distance provided. The horizontal, or x position of the elements is not considered.
  • BELOW <entity> [WITHIN <distance>] The below operator is used to determine if one element is below another element, optionally within a certain distance. If the WITHIN clause is omitted, the operator returns true if the first element is below the second element, and false otherwise. If the WITHIN clause is included, the operator returns true if the first element is below the second element and the distance between the first element's top edge and the second element's bottom edge is less than or equal to the distance provided. The horizontal, or x position of the elements is not considered.
  • LEFT ALIGNED WITH <entity> [WITHIN <distance>] The left aligned with operator is used to determine if one element has the same left-hand border as another element, within some threshold. If the WITHIN clause is omitted, a threshold of 3 PIXELS is used. The vertical, or y position of the elements is not considered.
  • RIGHT ALIGNED WITH <entity> [WITHIN <distance>] The right aligned with operator is used to determine if one element has the same right-hand border as another element, within some threshold. If the WITHIN clause is omitted, a threshold of 3 PIXELS is used. The vertical, or y position of the elements is not considered.
  • TOP ALIGNED WITH <entity> [WITHIN <distance>] The top aligned with operator is used to determine if one element has the same top border as another element, within some threshold. If the WITHIN clause is omitted, a threshold of 3 PIXELS is used. The horizontal, or x position of the elements is not considered.
  • BOTTOM ALIGNED WITH <entity> [WITHIN <distance>] The bottom aligned with operator is used to determine if one element has the same bottom border as another element, within some threshold. If the WITHIN clause is omitted, a threshold of 3 PIXELS is used. The horizontal, or x position of the elements is not considered.
  • CENTER X ALIGNED WITH <entity> [WITHIN <distance>] The center x aligned with operator is used to determine if one element has the same center x position as another element, within some threshold. If the WITHIN clause is omitted, a threshold of 3 PIXELS is used. This operator can be helpful when comparing elements that are center-aligned on a page or have text that is center-aligned. The vertical, or y position of the elements is not considered.
  • CENTER Y ALIGNED WITH <entity> [WITHIN <distance>] The center y aligned with operator is used to determine if one element has the same center y position as another element, within some threshold. If the WITHIN clause is omitted, a threshold of 3 PIXELS is used. This operator can be helpful when comparing elements that are vertically aligned on a page or have text that is vertically aligned. The horizontal, or x position of the elements is not considered.
  • WITHIN <distance> OF <entity> The within operator is used to determine if one element is within a certain distance of another element. The distance is calculated as the shortest distance between any two points on the two elements' bounding boxes. Unlike the other operators discussed here, the within operator considers both the x and y coordinates of the elements.
  • SAME PAGE AS <entity> The same page as operator is used to determine if two elements are on the same page, as defined by the start.page metadata of the elements.
  • ON PAGE <value> The on page operator is used to determine if an element is on a specific page, as defined by the start.page metadata of the element.
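
The geometry behind several of these operators can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions: the `Box` class is hypothetical, coordinates are in pixels with y growing downward, "right of" is taken to mean that the first element's left edge is at or beyond the second element's right edge, and the same-page and missing-Bounding-Box checks described above are omitted.

```python
import math
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Box:
    """Hypothetical bounding box in pixels; y grows downward (an assumption)."""
    left: float
    top: float
    right: float
    bottom: float


def right_of(a: Box, b: Box, within: Optional[float] = None) -> bool:
    """a RIGHT OF b [WITHIN d]: only x positions are considered."""
    if a.left < b.right:
        return False
    return within is None or (a.left - b.right) <= within


def top_aligned_with(a: Box, b: Box, within: float = 3.0) -> bool:
    """a TOP ALIGNED WITH b: top edges agree within the threshold (3 pixels by default)."""
    return abs(a.top - b.top) <= within


def within_of(a: Box, b: Box, distance: float) -> bool:
    """a WITHIN d OF b: shortest distance between the two boxes, using x and y."""
    dx = max(b.left - a.right, a.left - b.right, 0.0)
    dy = max(b.top - a.bottom, a.top - b.bottom, 0.0)
    return math.hypot(dx, dy) <= distance


label = Box(left=72, top=100, right=172, bottom=112)
name = Box(left=180, top=101, right=300, bottom=113)

print(right_of(name, label))             # True
print(right_of(name, label, within=72))  # True: the gap is 8 pixels
print(right_of(name, label, within=5))   # False
print(top_aligned_with(name, label))     # True: the tops differ by 1 pixel
print(within_of(name, label, 10))        # True: the shortest distance is 8 pixels
```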

Logical Operators

The logical operators are used to combine multiple conditions or restrictions, or to negate a condition or restriction. The following logical operators are supported:

  • AND The and operator is used to combine two conditions or restrictions. It returns true if both conditions or restrictions are true, and false otherwise.
  • OR The or operator is used to combine two conditions or restrictions. It returns true if either condition or restriction is true, and false otherwise.
  • NOT The not operator is used to negate a condition or restriction. It returns true if the condition or restriction is false, and false otherwise.

Unary Operators

In contrast to the operators discussed thus far, the unary operators are those that operate on a single value. The following unary operators are supported:

  • EXISTS The exists operator is used to determine if any element in the document matches the variable, as it is defined, with its restrictions.
  • NOT EXISTS The not exists operator is used to determine if no element in the document matches the variable, as it is defined, with its restrictions.

Examples

In this section, we will provide some sample use cases and queries to demonstrate how the DQL language can be used to extract information from documents.

Extracting a Table

In this example, we will extract the image of a table from the document. We provide a few different examples, each with slightly different requirements.

Extract All Tables

Extract the image of all tables in the document:

DEFINE
tbl AS TABLE
SELECT
tbl.IMAGE
WHERE
tbl EXISTS

Remember that we must define a restriction on each variable, or we must use a WHERE clause to filter the results. In this case, we want all tables, so we can simply use the EXISTS operator to match all tables.

Extract Table with Specific Text

Extract the image of any table that contains the text "in millions", regardless of the case of the text:

DEFINE
tbl AS TABLE WITH RESTRICTION tbl CONTAINS IGNORE CASE "in millions"
SELECT
tbl.IMAGE

Extract All Tables on Page 2

Extract the image of all tables that are on the second page of the document:

DEFINE
tbl AS TABLE WITH RESTRICTION tbl ON PAGE 2
SELECT
tbl.IMAGE

Extract the Table Just Below Text

Extract the image of any table that is just below the text "Officers of the Company":

DEFINE
tbl AS TABLE
officers AS TEXT WITH RESTRICTION officers CONTAINS "Officers of the Company"
SELECT
tbl.IMAGE
WHERE
tbl BELOW officers WITHIN 2 INCHES

Extracting Text Relative to Others

Extract the text that is immediately to the right of the text "Signed By:":

DEFINE
signedBy AS TEXT WITH RESTRICTION signedBy CONTAINS "Signed By:"
name AS TEXT
SELECT
name
WHERE
name RIGHT OF signedBy WITHIN 1 INCH
AND
name TOP ALIGNED WITH signedBy

Extracting Text with Regular Expressions

Consider the layout of many companies' 10-K filings. Typically, the first page contains the company's ticker symbol just below the text "Trading Symbol:" or "Trading Symbols:" or "Trading Symbol(s):" or something of that nature. Words may or may not be capitalized. Depending on the layout of the document, the ticker symbol may be on the same line as the text or it may be on the next line. If it is on the next line, the document parsing model might include the ticker symbol as part of the same text block as the "Trading Symbol" text. Other times, it might be in a separate text block. The following query will extract the ticker symbol from the document:

DEFINE
tradingSymbol AS TEXT WITH RESTRICTION tradingSymbol CONTAINS IGNORE CASE "Trading Symbol" AND tradingSymbol ON PAGE 1
SELECT
regex_replace_ignore_case(tradingSymbol, "Trading Symbol.*?([A-Z]{1,6})$", "$1") AS ticker

UNION

DEFINE
tradingSymbol AS TEXT WITH RESTRICTION tradingSymbol CONTAINS IGNORE CASE "Trading Symbol" AND tradingSymbol ON PAGE 1
ticker AS TEXT WITH RESTRICTION ticker MATCHES REGEX "[A-Z]{1,6}"
SELECT
ticker
WHERE
ticker BELOW tradingSymbol WITHIN 1 INCH
AND
ticker CENTER X ALIGNED WITH tradingSymbol

In this case, we define a union of two separate queries. The first query uses a regular expression to extract the ticker symbol from the same text block that contains the "Trading Symbol" text. The regex_replace_ignore_case function is used to extract the ticker symbol from the text block and assign it to the ticker alias. The second query uses a regular expression to find the ticker symbol in a separate text block that is below the "Trading Symbol" text and horizontally center-aligned with it. The UNION keyword is used to combine the results of the two queries into a single result set. Regardless of whether the first or second query finds the desired text, the result will be available in a variable named ticker.

Often, though, we will want to extract that value as an attribute of the FlowFile and have the query's results return the original document. We can update our query to do that as follows:

DEFINE
tradingSymbol AS TEXT ATTRIBUTE WITH RESTRICTION tradingSymbol CONTAINS IGNORE CASE "Trading Symbol" AND tradingSymbol ON PAGE 1
SELECT
regex_replace_ignore_case(tradingSymbol, "Trading Symbol.*?([A-Z]{1,6})$", "$1") AS ticker,
DOCUMENT

UNION

DEFINE
tradingSymbol AS TEXT ATTRIBUTE WITH RESTRICTION tradingSymbol CONTAINS IGNORE CASE "Trading Symbol" AND tradingSymbol ON PAGE 1
ticker AS TEXT ATTRIBUTE WITH RESTRICTION ticker MATCHES REGEX "[A-Z]{1,6}"
SELECT
ticker, DOCUMENT
WHERE
ticker BELOW tradingSymbol WITHIN 1 INCH
AND
ticker CENTER X ALIGNED WITH tradingSymbol

In this case, we define the ticker and tradingSymbol variables as attributes. Note that because the first query marks the tradingSymbol as an attribute, the second query must also mark the tradingSymbol as an attribute, since they are UNIONed together.

System Resource Considerations

This component does not specify system resource considerations.

See Also

ChunkDocument, ParsePdfDocument