QueryDocument
Description
Evaluates a SQL-like query against the incoming Datavolo Document JSON, producing the results on the outgoing FlowFile. This can be used to select specific elements, or to route the document based on the query results. One or more queries may be specified as user-defined properties.
Tags
datavolo, document, dsl, filter, json, query, relationships, route, sql, unstructured
Properties
In the list below, the names of required properties appear in bold. Any other properties (not in bold) are considered optional. The table also indicates any default values, and whether a property supports the NiFi Expression Language.
Display Name | API Name | Default Value | Allowable Values | Description |
---|---|---|---|---|
Dynamic Properties
Name | Value | Description |
---|---|---|
The name of the relationship to route the FlowFile to | The SQL-like query to evaluate against the incoming document | The SQL-like query to evaluate against the incoming document. Supports Expression Language: No |
Relationships
Name | Description |
---|---|
failure | Any incoming FlowFile that fails to process is routed to this Relationship |
original | The incoming FlowFile is routed to this Relationship if any of the specified queries match. This FlowFile will be updated with attributes appropriate for use with MergeDocumentElements. |
unmatched | If the incoming document does not match any of the specified queries, it is routed to this Relationship |
Reads Attributes
This processor does not read attributes.
Writes Attributes
Name | Description |
---|---|
container.scope | The scope of the container is set to DOCUMENT for the JSON Document, TABLE for tables, and FIGURE for any figures/images identified |
document.id | A unique UUID for the document |
fragment.count | The total number of fragments |
fragment.index | The index of the fragment |
mime.type | The MIME type is set to 'application/json' if multiple elements are selected. Otherwise, it is set to the appropriate MIME type, depending on the type of element selected. For example, it may be text/plain or image/png. |
section.title | The title of the section containing the selected element, if it is available |
State Management
This component does not store state.
Restricted
This component is not restricted.
Input Requirement
Input requirements are not specified for this component.
Syntax
In short, a DQL query consists of a series of statements: a definition statement, a selection statement, and a where statement. The selection statement is required, while the definition and where statements are optional. We might describe the syntax as such:
DEFINE
<variable> AS <type> [WITH RESTRICTION <restriction>][,]
<variable> AS <type> [WITH RESTRICTION <restriction>][,]
SELECT
<variable>,
<variable>
WHERE
<condition>
As an example, we can use the following query to extract any table that contains the text "Revenue":
DEFINE
revenue AS table WITH RESTRICTION revenue CONTAINS "Revenue"
SELECT
revenue
Below, we provide a full BNF of the language, which gives a more detailed description of the syntax. We then describe these elements less formally.
Backus-Naur Form (BNF)
The syntax of DQL is described by the following BNF:
<query> ::= <singleQuery> ('union' <singleQuery>)* EOF;
<singleQuery> ::= <definitionStatement>? <selectionStatement> <whereStatement>?;
<definitionStatement> ::= 'define' <definitionList>;
<definitionList> ::= <definition> (',' <definition>)*;
<definition> ::= IDENTIFIER 'as' <scope> ('.' REPRESENTATION)? ATTRIBUTE? <restrictionStatement>?;
<scope> ::= 'image' | 'text' | 'table' | 'caption' | 'header' | 'footer' | 'section';
<selectionStatement> ::= 'select' <selection> (',' <selection>)*;
<selection> ::= 'document' | <entity> <selectionSubElement>? | <transformation>;
<selectionSubElement> ::= '.' ('data' | 'image');
<whereStatement> ::= 'where' <condition>;
<restrictionStatement> ::= 'with restriction' <restriction>;
<restriction> ::= <value> <comparisonOperator> <value>
| <entity> 'on page' <value>
| '(' <restriction> ')'
| <restriction> <logicalOperator> <restriction>
| 'not' <restriction>;
<condition> ::= <value> <comparisonOperator> <value>
| <entity> <relativeOperator> <entity> ('within' <distance>)?
| <entity> 'same page as' <entity>
| <entity> 'on page' <value>
| <distanceStatement>
| <variable> <unaryOperator>
| '(' <condition> ')'
| <condition> <logicalOperator> <condition>
| 'not' <condition>;
<comparisonOperator> ::= '=' | '!=' | 'contains' 'ignore case'?
| 'contains regex' 'ignore case'?
| 'matches regex' 'ignore case'?;
<relativeOperator> ::= 'right of' | 'left of' | 'above' | 'below'
| 'left aligned with' | 'right aligned with'
| 'top aligned with' | 'bottom aligned with'
| 'center x aligned with' | 'center y aligned with';
<logicalOperator> ::= 'and' | 'or';
<unaryOperator> ::= 'exists' | 'not exists';
<variable> ::= <entity> | <metadataReference> | <sectionTitle>;
<metadataReference> ::= (<entity> | <representation>) '.' 'metadata' '[' <metadataKey> ']';
<sectionTitle> ::= <entity> '.' 'title';
<representation> ::= <entity> '.' 'representation';
<metadataKey> ::= STRING;
<entity> ::= IDENTIFIER;
<transformation> ::= ('string_replace' | 'regex_replace') '(' <entity> ',' STRING ',' STRING ')' 'as' IDENTIFIER;
<value> ::= NUMBER | STRING | <variable>;
<distanceStatement> ::= <entity> 'within' <distance> 'of' <entity>;
<distance> ::= NUMBER <units>;
<units> ::= 'inch' | 'inches' | 'pixel' | 'pixels';
Query
The most basic element of a DQL query is the query itself. There are times, however, when we may want to combine multiple queries in order to get the information we need. This can be accomplished by writing multiple queries and using the union keyword to combine them.
Each of the queries will typically contain a DEFINE statement, a SELECT statement, and often a WHERE statement. We will look at each of these in turn.
The keywords in a query are case-insensitive, so SELECT, select, and Select are all equivalent. However, variable names are case-sensitive, so revenue and Revenue are not the same. Additionally, metadata keys are case-sensitive, so start.page and START.PAGE are not the same.
Definition Statement
The Definition Statement, or DEFINE statement, is used to define variables that can be used in the SELECT and WHERE statements. These variables give us a way to refer to the elements of the document that we are interested in. The syntax for the DEFINE statement is as follows:
DEFINE
<variable> AS <type> [WITH RESTRICTION <restriction>][,]
<variable> AS <type> [WITH RESTRICTION <restriction>]
A variable name can be any combination of one or more letters (a-z), numbers, or underscores, with the exception that it must start with a letter. Letters may be upper or lower case, but they are case-sensitive, so if you define a variable as myVar, it cannot be referred to as myvar.
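The naming rule above can be sketched as a small validity check. This is only an illustration in Python: the pattern is inferred from the prose (a letter first, then letters, digits, or underscores), the function names are hypothetical, and the actual DQL lexer may differ in details such as non-ASCII letters.

```python
import re

# Assumed pattern derived from the prose rule: a letter first, followed by
# any number of letters, digits, or underscores. Not taken from the DQL lexer.
VARIABLE_NAME = re.compile(r"[A-Za-z][A-Za-z0-9_]*")


def is_valid_variable_name(name: str) -> bool:
    """Return True if `name` satisfies the naming rule described above."""
    return VARIABLE_NAME.fullmatch(name) is not None


def same_variable(a: str, b: str) -> bool:
    """Variable names are case-sensitive, so two names match only if identical."""
    return a == b
```

Under these assumptions, myVar and tbl_2 are accepted, 2tbl and _tbl are rejected, and myVar and myvar refer to different variables.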
The type of the variable can be one of the following:
image
text
table
caption
header
footer
section
document
Multiple variables may be declared. They are optionally separated by commas. Each variable must have a unique name. For example, the following are all valid:
DEFINE
myImage AS image,
myText AS text
DEFINE
myImage AS image
myText AS text
DEFINE myImage AS IMAGE myText AS TEXT
WITH RESTRICTION
The WITH RESTRICTION clause is optional, but for most queries, it should be used. This clause allows you to restrict which elements in the document are candidates to be assigned to the given variable. It is possible to leave off the restriction for all variables entirely and instead use the WHERE clause to filter the results. However, using restrictions is significantly more efficient.
Any condition can be expressed as a restriction as long as the condition does not reference any other variables. If there is a need to reference another variable, the condition should be expressed in the WHERE clause. Often, these are used together. We might, for example, define a variable that represents a text block and use a restriction to narrow down the text blocks to those that contain a certain keyword. We might then use the WHERE clause to further restrict the results to only those where the text block is within a certain distance of an image.
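One way to picture why restrictions help is to count how many element combinations a multi-variable condition would have to examine. The Python sketch below is an illustration only: it assumes the engine evaluates WHERE conditions over combinations of candidate elements, which this documentation does not state, and the element data and function names are hypothetical.

```python
from itertools import product

# Hypothetical stand-in for document elements: (kind, text) pairs.
elements = [
    ("text", "Revenue by quarter"),
    ("text", "Officers of the Company"),
    ("text", "Footnotes"),
    ("table", "Q1 Q2 Q3 Q4"),
    ("table", "Name Title"),
]


def candidates(kind, restriction=None):
    """Elements of `kind` that pass the optional single-variable restriction."""
    return [
        e for e in elements
        if e[0] == kind and (restriction is None or restriction(e))
    ]


def pairs_checked(text_restriction=None):
    """Count the (text, table) combinations a WHERE condition would examine."""
    texts = candidates("text", text_restriction)
    tables = candidates("table")
    return len(list(product(texts, tables)))


# Without a restriction, every text block is paired with every table.
unrestricted = pairs_checked()
# With a restriction on the text variable, only matching blocks are paired.
restricted = pairs_checked(lambda e: "Officers" in e[1])
```

In this toy document the unrestricted query examines six pairs while the restricted one examines two; the gap grows quickly as documents and variable counts grow.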
REPRESENTATION
Each container within Datavolo's Document model contains at least one of the following components:
containers - Child elements that are contained within the container.
textElement - The text content of the container.
processingElement - A holder for any processing results that have occurred on the container throughout the flow.
The processingElement provides the ability to hold multiple different representations of that element. For example, a table might have an image representation that was extracted from a PDF file. It might also have a CSV representation that was derived by using vision or machine learning models. It might further have a plain text representation that was derived by asking a Large Language Model (LLM) to summarize the table. The REPRESENTATION keyword allows you to specify that a variable should be assigned a particular representation of the element, rather than the element itself.
Often, a Representation is used in conjunction with a restriction that filters based on the metadata of the representation. For example, we might specify that we want to retrieve the image representation of a table. We can do that by using the following query:
DEFINE
revenue AS TABLE.REPRESENTATION WITH RESTRICTION revenue.METADATA['mime.type'] = 'image/png'
SELECT
revenue
Note that in this case, we are not selecting the image of the table itself, but rather the entire representation element from the document, which includes the image data as well as any metadata that was extracted from the image. If we want to select the image data itself, we would use the DATA keyword. For example:
DEFINE
revenue AS TABLE.REPRESENTATION WITH RESTRICTION revenue.METADATA['mime.type'] = 'image/png'
SELECT
revenue.DATA
ATTRIBUTE
The ATTRIBUTE keyword is used to specify that any NiFi Processor that is making use of DQL should consider the returned value to be an attribute of the output FlowFile, rather than writing the value to the content of the FlowFile. This is often useful when multiple values are being returned from a single query and you want to be able to output some results as attributes and others as content. For example, we can use the following query to extract a table from a document, while also extracting the text just to the right of the table as an attribute:
DEFINE
table AS TABLE
description AS TEXT ATTRIBUTE
SELECT table, description
WHERE
description RIGHT OF table
AND
description TOP ALIGNED WITH table
Selection Statement
The Selection Statement, or SELECT statement, is used to specify which elements of the document should be returned by the query. In its simplest form, the syntax for the SELECT statement is as follows:
SELECT <variable>
or
SELECT <variable>, <variable>
We might also select the entire document. For example, we could use the following query to select any document that has a table containing the text "Revenue":
DEFINE
revenue AS table WITH RESTRICTION revenue CONTAINS "Revenue"
SELECT
DOCUMENT
Selecting with .DATA
When a variable is defined as a REPRESENTATION, it is possible to select the data of the representation by using the .DATA keyword. For example, the following query will return the Representation element of any table image in the document:
DEFINE
revenue AS TABLE.REPRESENTATION
SELECT
revenue
WHERE
revenue.METADATA['mime.type'] = 'image/png'
Depending on the use case, this may be desirable. In other cases, it may be more useful to select the data of the representation, that is, the image itself. This can be done by using the .DATA keyword. For example:
DEFINE
revenue AS TABLE.REPRESENTATION
SELECT
revenue.DATA
WHERE
revenue.METADATA['mime.type'] = 'image/png'
Selecting with .IMAGE
Working with images is a common use case for DQL. As such, the language provides a simpler way to select the image representation of an element. When the desired element is an Image or a Table, it can be useful to use .IMAGE when selecting the variable. This will return any Representation of the element that is an image. For example, the following query will return the image representation of any table that is found on the second page of the document:
DEFINE
revenue AS table WITH RESTRICTION revenue ON PAGE 2
SELECT
revenue.IMAGE
In this way, we can avoid the complexity of defining revenue as a TABLE.REPRESENTATION and then selecting revenue.DATA, and we also have the ability to further restrict the results based on the entire table element.
Selecting with Text Transformations
Often, we want to select specific text from a document, especially for use as an attribute. We might, for example, want to extract the forecast from a document using the following query:
DEFINE
forecast AS TEXT ATTRIBUTE WITH RESTRICTION forecast CONTAINS "Today's Forecast: "
SELECT
forecast
Now, if the document were to contain the text "Today's Forecast: Sunny with a high of 75" we would get the entire text block. However, we might only want the text that comes after "Today's Forecast:". We can do this by using a text transformation. For example:
DEFINE
forecast AS TEXT ATTRIBUTE WITH RESTRICTION forecast CONTAINS "Today's Forecast: "
SELECT
string_replace(forecast, "Today's Forecast: ", "") AS forecast
This query will return the text "Sunny with a high of 75" as the value of the forecast variable.
When we need more than a simple string replacement, we can use a regular expression with matching groups. For example, the following query will return the text "Sunny, 75" as the value of the forecast variable:
DEFINE
forecast AS TEXT ATTRIBUTE WITH RESTRICTION forecast CONTAINS "Today's Forecast: "
SELECT
regex_replace(forecast, "Today's Forecast: (.*?) with a high of (75)", "$1, $2") AS forecast
We could also make multiple selections based on the same variable, using an alias for each selection. For example:
DEFINE
forecast AS TEXT ATTRIBUTE WITH RESTRICTION forecast CONTAINS "Today's Forecast: "
SELECT
string_replace(forecast, "Today's Forecast: ", "") AS forecast,
regex_replace(forecast, "Today's Forecast: (.*?) with a high of .*", "$1") AS skyCondition,
regex_replace(forecast, "Today's Forecast: .* with a high of (\\d+)", "$1") AS highTemperature
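To check what these transformations produce before putting them in a query, the replacements can be approximated outside of DQL. The Python sketch below is an approximation under stated assumptions: it assumes regex_replace replaces every match and uses Java-style $n group references (as the examples above suggest), and the helper name java_style_replace is hypothetical. Python's regular expression dialect is close to, but not identical to, Java's.

```python
import re


def java_style_replace(text: str, pattern: str, replacement: str) -> str:
    """Approximate a replace-all that uses Java-style $n group references."""
    # Translate $1, $2, ... into Python's \g<1>, \g<2>, ... syntax.
    py_replacement = re.sub(r"\$(\d+)", r"\\g<\1>", replacement)
    return re.sub(pattern, py_replacement, text)


text = "Today's Forecast: Sunny with a high of 75"

# Counterpart of string_replace(forecast, "Today's Forecast: ", "")
forecast = text.replace("Today's Forecast: ", "")

# Counterparts of the three regex_replace calls shown above.
short_forecast = java_style_replace(
    text, r"Today's Forecast: (.*?) with a high of (75)", "$1, $2")
sky_condition = java_style_replace(
    text, r"Today's Forecast: (.*?) with a high of .*", "$1")
high_temperature = java_style_replace(
    text, r"Today's Forecast: .* with a high of (\d+)", "$1")
```

Note that the DQL string literal writes the digit class as \\d, while the Python raw string needs only \d.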
Where Statement
The WHERE statement is used to filter the results of the query by comparing the values of the variables that have been defined in the DEFINE statement. Unlike the RESTRICTION clause, the WHERE statement can be used to compare multiple values.
The WHERE statement is optional if all variables that are defined have a restriction. If any variable lacks a restriction, the WHERE statement is required.
Operators
The DQL language provides a number of operators that can be used to compare values. These operators can be used in the RESTRICTION and WHERE clauses. Here, we will take a look at these operators, how they are used, and what they do.
Comparison Operators
The comparison operators are used to compare two values, mostly using textual representations. The following comparison operators are supported:
=
The equals operator is used to compare the text of an element to some other text. It returns true if the text of the element is equal to the text provided, case sensitive, and false otherwise.
!=
The not equals operator is used to compare the text of an element to some other text. It returns false if the text of the element is equal to the text provided, case sensitive, and true otherwise.
CONTAINS
The contains operator is used to determine if the text of an element contains the text provided. It returns true if the text of the element contains the text provided, case sensitive, and false otherwise.
CONTAINS IGNORE CASE
The contains ignore case operator is used to determine if the text of an element contains the text provided. It returns true if the text of the element contains the text provided, without considering case, and false otherwise.
CONTAINS REGEX
The contains regex operator is used to determine if the text of an element contains any text that matches the (Java) Regular Expression provided. It returns true if any part of the element's text matches the regular expression provided, case sensitive, and false otherwise.
CONTAINS REGEX IGNORE CASE
The contains regex ignore case operator is used to determine if the text of an element contains any text that matches the (Java) Regular Expression provided. It returns true if any part of the element's text matches the regular expression provided, without considering case, and false otherwise.
MATCHES REGEX
The matches regex operator is used to determine if the text of an element matches exactly the (Java) Regular Expression provided. It returns true if the entire text of the element matches the regular expression provided, case sensitive, and false otherwise.
MATCHES REGEX IGNORE CASE
The matches regex ignore case operator is used to determine if the text of an element matches exactly the (Java) Regular Expression provided. It returns true if the entire text of the element matches the regular expression provided, without considering case, and false otherwise.
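The difference between the contains, contains regex, and matches regex operators can be sketched in Python. This is an approximation of the behavior described above, not the engine's implementation: the function names are hypothetical, and Python's re module stands in for Java regular expressions, which differ in some details.

```python
import re


def contains(text: str, needle: str, ignore_case: bool = False) -> bool:
    """CONTAINS / CONTAINS IGNORE CASE: plain substring test."""
    if ignore_case:
        return needle.lower() in text.lower()
    return needle in text


def contains_regex(text: str, pattern: str, ignore_case: bool = False) -> bool:
    """CONTAINS REGEX: true if any part of the text matches the pattern."""
    flags = re.IGNORECASE if ignore_case else 0
    return re.search(pattern, text, flags) is not None


def matches_regex(text: str, pattern: str, ignore_case: bool = False) -> bool:
    """MATCHES REGEX: true only if the entire text matches the pattern."""
    flags = re.IGNORECASE if ignore_case else 0
    return re.fullmatch(pattern, text, flags) is not None
```

For instance, with the pattern [A-Z]{1,6}, the text "Ticker: ACME" satisfies contains_regex (some part of it matches) but not matches_regex (the whole text does not), whereas the text "ACME" satisfies both.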
Relative Position Operators
The relative position operators are used to compare the positions of two elements. Many of these operators accept an optional distance, which consists of a number and a unit. The unit may be either PIXELS or INCHES (or PIXEL or INCH). When comparing two elements based on their positions, all distances are first translated into a number of pixels. This conversion is performed using the dots.per.inch metadata of the document, if it is present. If the dots.per.inch metadata is not present, the default value of 72 DPI is used, as this is the standard for PDF documents. At this resolution, 72 pixels equate to one inch if the PDF is printed at 100% scale.
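The conversion described above amounts to a single multiplication. The following Python sketch illustrates it; the function name to_pixels and the use of a plain dictionary for the document metadata are assumptions made for the example.

```python
from typing import Mapping, Optional

DEFAULT_DPI = 72.0  # fallback stated above when dots.per.inch is absent


def to_pixels(amount: float, unit: str,
              metadata: Optional[Mapping[str, object]] = None) -> float:
    """Convert a DQL distance such as `2 INCHES` into a number of pixels."""
    unit = unit.lower()
    if unit in ("pixel", "pixels"):
        return float(amount)
    if unit in ("inch", "inches"):
        # Use the document's dots.per.inch metadata when it is present.
        dpi = float((metadata or {}).get("dots.per.inch", DEFAULT_DPI))
        return amount * dpi
    raise ValueError(f"unsupported unit: {unit}")
```

So WITHIN 2 INCHES corresponds to 144 pixels for a document without dots.per.inch metadata, and to 600 pixels for a document scanned at 300 DPI.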
To determine the location of one element relative to another, both elements must have a Bounding Box present in the document. If either element does not have a Bounding Box, the comparison will return false. Additionally, the two elements must be on the same page, as defined by the start.page metadata of the element, or the comparison will return false.
The following relative position operators are supported:
RIGHT OF <entity> [WITHIN <distance>]
The right of operator is used to determine if one element is to the right of another element, optionally within a certain distance. If the WITHIN clause is omitted, the operator returns true if the first element is to the right of the second element, and false otherwise. If the WITHIN clause is included, the operator returns true if the first element is to the right of the second element and the distance between the first element's left edge and the second element's right edge is less than or equal to the distance provided. The vertical, or y, position of the elements is not considered.
LEFT OF <entity> [WITHIN <distance>]
The left of operator is used to determine if one element is to the left of another element, optionally within a certain distance. If the WITHIN clause is omitted, the operator returns true if the first element is to the left of the second element, and false otherwise. If the WITHIN clause is included, the operator returns true if the first element is to the left of the second element and the distance between the first element's right edge and the second element's left edge is less than or equal to the distance provided. The vertical, or y, position of the elements is not considered.
ABOVE <entity> [WITHIN <distance>]
The above operator is used to determine if one element is above another element, optionally within a certain distance. If the WITHIN clause is omitted, the operator returns true if the first element is above the second element, and false otherwise. If the WITHIN clause is included, the operator returns true if the first element is above the second element and the distance between the first element's bottom edge and the second element's top edge is less than or equal to the distance provided. The horizontal, or x, position of the elements is not considered.
BELOW <entity> [WITHIN <distance>]
The below operator is used to determine if one element is below another element, optionally within a certain distance. If the WITHIN clause is omitted, the operator returns true if the first element is below the second element, and false otherwise. If the WITHIN clause is included, the operator returns true if the first element is below the second element and the distance between the first element's top edge and the second element's bottom edge is less than or equal to the distance provided. The horizontal, or x, position of the elements is not considered.
LEFT ALIGNED WITH <entity> [WITHIN <distance>]
The left aligned with operator is used to determine if one element has the same left-hand border as another element, within some threshold. If the WITHIN clause is omitted, a threshold of 3 PIXELS is used. The vertical, or y, position of the elements is not considered.
RIGHT ALIGNED WITH <entity> [WITHIN <distance>]
The right aligned with operator is used to determine if one element has the same right-hand border as another element, within some threshold. If the WITHIN clause is omitted, a threshold of 3 PIXELS is used. The vertical, or y, position of the elements is not considered.
TOP ALIGNED WITH <entity> [WITHIN <distance>]
The top aligned with operator is used to determine if one element has the same top border as another element, within some threshold. If the WITHIN clause is omitted, a threshold of 3 PIXELS is used. The horizontal, or x, position of the elements is not considered.
BOTTOM ALIGNED WITH <entity> [WITHIN <distance>]
The bottom aligned with operator is used to determine if one element has the same bottom border as another element, within some threshold. If the WITHIN clause is omitted, a threshold of 3 PIXELS is used. The horizontal, or x, position of the elements is not considered.
CENTER X ALIGNED WITH <entity> [WITHIN <distance>]
The center x aligned with operator is used to determine if one element has the same center x position as another element, within some threshold. If the WITHIN clause is omitted, a threshold of 3 PIXELS is used. This operator can be helpful when comparing elements that are center-aligned on a page or have text that is center-aligned. The vertical, or y, position of the elements is not considered.
CENTER Y ALIGNED WITH <entity> [WITHIN <distance>]
The center y aligned with operator is used to determine if one element has the same center y position as another element, within some threshold. If the WITHIN clause is omitted, a threshold of 3 PIXELS is used. This operator can be helpful when comparing elements that are vertically aligned on a page or have text that is vertically aligned. The horizontal, or x, position of the elements is not considered.
WITHIN <distance> OF <entity>
The within operator is used to determine if one element is within a certain distance of another element. The distance is calculated as the shortest distance between any two points on the two elements' bounding boxes. Unlike the other operators discussed here, the within operator considers both the x and y coordinates of the elements.
SAME PAGE AS <entity>
The same page as operator is used to determine if two elements are on the same page, as defined by the start.page metadata of the elements.
ON PAGE <value>
The on page operator is used to determine if an element is on a specific page, as defined by the start.page metadata of the element.
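The geometry behind a few of these operators can be sketched with simple bounding-box arithmetic. The Python below is an illustration under stated assumptions, not the engine's code: the Box type and function names are hypothetical, coordinates are assumed to be in pixels with y growing downward, both elements are assumed to be on the same page, and "to the right of" is assumed to mean that the first element's left edge is at or beyond the second element's right edge.

```python
import math
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Box:
    """Hypothetical bounding box in pixels; y is assumed to grow downward."""
    left: float
    top: float
    right: float
    bottom: float


def right_of(a: Box, b: Box, within: Optional[float] = None) -> bool:
    """RIGHT OF: a's left edge lies at or beyond b's right edge (assumed)."""
    gap = a.left - b.right
    if gap < 0:
        return False
    return within is None or gap <= within


def left_aligned_with(a: Box, b: Box, within: float = 3.0) -> bool:
    """LEFT ALIGNED WITH: left edges agree within the threshold (default 3 px)."""
    return abs(a.left - b.left) <= within


def center_x_aligned_with(a: Box, b: Box, within: float = 3.0) -> bool:
    """CENTER X ALIGNED WITH: horizontal centers agree within the threshold."""
    return abs((a.left + a.right) / 2 - (b.left + b.right) / 2) <= within


def within_distance(a: Box, b: Box, distance: float) -> bool:
    """WITHIN ... OF: shortest distance between the two boxes."""
    dx = max(b.left - a.right, a.left - b.right, 0.0)
    dy = max(b.top - a.bottom, a.top - b.bottom, 0.0)
    return math.hypot(dx, dy) <= distance
```

For example, a name block at Box(180, 101, 300, 121) is right of a label at Box(72, 100, 172, 120) with a gap of 8 pixels, so it also satisfies a WITHIN of 1 INCH (72 pixels at the default resolution).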
Logical Operators
The logical operators are used to combine multiple conditions or restrictions, or to negate a condition or restriction. The following logical operators are supported:
AND
The and operator is used to combine two conditions or restrictions. It returns true if both conditions or restrictions are true, and false otherwise.
OR
The or operator is used to combine two conditions or restrictions. It returns true if either condition or restriction is true, and false otherwise.
NOT
The not operator is used to negate a condition or restriction. It returns true if the condition or restriction is false, and false otherwise.
Unary Operators
In contrast to the operators discussed thus far, the unary operators are those that operate on a single value. The following unary operators are supported:
EXISTS
The exists operator is used to determine if any element in the document matches the variable, as it is defined, with its restrictions.
NOT EXISTS
The not exists operator is used to determine if no element in the document matches the variable, as it is defined, with its restrictions.
Examples
In this section, we will provide some sample use cases and queries to demonstrate how the DQL language can be used to extract information from documents.
Extracting a Table
In this example, we will extract the image of a table from the document. We provide a few different examples, each with slightly different requirements.
Extract All Tables
Extract the image of all tables in the document:
DEFINE
tbl AS TABLE
SELECT
tbl.IMAGE
WHERE
tbl EXISTS
Remember that we must define a restriction on each variable, or we must use a WHERE clause to filter the results. In this case, we want all tables, so we can simply use the EXISTS operator to match all tables.
Extract Table with Specific Text
Extract the image of any table that contains the text "in millions", regardless of the case of the text:
DEFINE
tbl AS TABLE WITH RESTRICTION tbl CONTAINS IGNORE CASE "in millions"
SELECT
tbl.IMAGE
Extract All Tables on Page 2
Extract the image of all tables that are on the second page of the document:
DEFINE
tbl AS TABLE WITH RESTRICTION tbl ON PAGE 2
SELECT
tbl.IMAGE
Extract the Table Just Below Text
Extract the image of any table that is just below the text "Officers of the Company":
DEFINE
tbl AS TABLE
officers AS TEXT WITH RESTRICTION officers CONTAINS "Officers of the Company"
SELECT
tbl.IMAGE
WHERE
tbl BELOW officers WITHIN 2 INCHES
Extracting Text Relative to Others
Extract the text that is immediately to the right of the text "Signed By:"
DEFINE
signedBy AS TEXT WITH RESTRICTION signedBy CONTAINS "Signed By:"
name AS TEXT
SELECT
name
WHERE
name RIGHT OF signedBy WITHIN 1 INCH
AND
name TOP ALIGNED WITH signedBy
Extracting Text with Regular Expressions
Consider the layout of many companies' 10-K filings. Typically, the first page contains the company's ticker symbol just below the text "Trading Symbol:" or "Trading Symbols:" or "Trading Symbol(s):" or something of that nature. Words may or may not be capitalized. Depending on the layout of the document, the ticker symbol may be on the same line as the text or it may be on the next line. If it is on the next line, the document parsing model might include the ticker symbol as part of the same text block as the "Trading Symbol" text. Other times, it might be in a separate text block. The following query will extract the ticker symbol from the document:
DEFINE
tradingSymbol AS TEXT WITH RESTRICTION tradingSymbol CONTAINS IGNORE CASE "Trading Symbol" AND tradingSymbol ON PAGE 1
SELECT
regex_replace_ignore_case(tradingSymbol, "Trading Symbol.*?([A-Z]{1,6})$", "$1") AS ticker
UNION
DEFINE
tradingSymbol AS TEXT WITH RESTRICTION tradingSymbol CONTAINS IGNORE CASE "Trading Symbol" AND tradingSymbol ON PAGE 1
ticker AS TEXT WITH RESTRICTION ticker MATCHES REGEX "[A-Z]{1,6}"
SELECT
ticker
WHERE
ticker BELOW tradingSymbol WITHIN 1 INCH
AND
ticker CENTER X ALIGNED WITH tradingSymbol
In this case, we define a union of two separate queries. The first query uses a regular expression to extract the ticker symbol from the same text block that contains the "Trading Symbol" text. The regex_replace_ignore_case function is used to extract the ticker symbol from the text block and assign it to the ticker alias.
The second query uses a regular expression to match the ticker symbol in a separate text block that is below and center-aligned (on the x axis) with the "Trading Symbol" text.
The UNION keyword is used to combine the results of the two queries into a single result set. Regardless of whether the first or second query finds the desired text, the result will be available in a variable named ticker.
Often, though, we will want to extract that value as an attribute of the FlowFile and have the query's results return the original document. We can update our query to do that as follows:
DEFINE
tradingSymbol AS TEXT ATTRIBUTE WITH RESTRICTION tradingSymbol CONTAINS IGNORE CASE "Trading Symbol" AND tradingSymbol ON PAGE 1
SELECT
regex_replace_ignore_case(tradingSymbol, "Trading Symbol.*?([A-Z]{1,6})$", "$1") AS ticker,
DOCUMENT
UNION
DEFINE
tradingSymbol AS TEXT ATTRIBUTE WITH RESTRICTION tradingSymbol CONTAINS IGNORE CASE "Trading Symbol" AND tradingSymbol ON PAGE 1
ticker AS TEXT ATTRIBUTE WITH RESTRICTION ticker MATCHES REGEX "[A-Z]{1,6}"
SELECT
ticker, DOCUMENT
WHERE
ticker BELOW tradingSymbol WITHIN 1 INCH
AND
ticker CENTER X ALIGNED WITH tradingSymbol
In this case, we define the ticker and tradingSymbol variables as attributes. Note that because the first query marks tradingSymbol as an attribute, the second query must also mark tradingSymbol as an attribute, since they are UNIONed together.
System Resource Considerations
This component does not specify system resource considerations.