Release Notes

2024-11-06

Runtime

  1. Prompt LLM provides a unified way to prompt LLMs like OpenAI and Anthropic
  2. Get HubSpot Object makes it easy to get a HubSpot object and its associations with a single processor
  3. Improved support for the growing number of Flow Registry Clients that can provide version control for your pipelines
  4. Improved visibility of flow metrics, especially for large clusters
  5. Attributes for FlowFiles and Provenance Events are easier to find and compare as they change

2024-10-30

Runtime

  1. Evaluating RAG performance across multiple vector stores and LLMs? We've added an entire suite of tools for that!
    1. Evaluate RAG Retrieval calculates retrieval metrics (Precision@N, Recall@N, FScore@N, MAP@N, MRR) for a RAG system using an LLM as a judge.
    2. Generate Answers From Context generates synthetic answers for each question present in the incoming records using a Large Language Model (LLM) from the given context.
    3. Generate Answers From Ground Truth generates answers from ground truth documents for use in further performance testing.
    4. Evaluate RAG Answer Correctness evaluates the factual correctness and similarity of generated answers by comparing against the ground truth answers in a Retrieval-Augmented Generation (RAG) pipeline.
    5. Evaluate RAG Faithfulness evaluates the factual consistency of the generated answer against the given context in a Retrieval-Augmented Generation (RAG) system.
    6. OpenAI LLM Service integrates with OpenAI's Chat Completion API. Supports configurable parameters, including model selection, temperature, top_p, max tokens, and retry behavior.
    7. Anthropic LLM Service integrates with Anthropic's Claude AI models through their Messages API. Supports configurable parameters, including model selection, temperature, top_p, max tokens, and retry behavior.
    8. Summarize Text summarizes the content of a FlowFile or attribute using any of the supported LLM Service Providers.
  2. Consume Kafka improves performance when running stateless and handles partition reassignment by rolling back offsets.
  3. Pinecone and Milvus processors can handle even larger vector sizes
    1. Query Milvus, Upsert Milvus, Delete Milvus
    2. Query Pinecone, Upsert Pinecone, Delete Pinecone
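
The retrieval metrics named in the Evaluate RAG Retrieval entry above can be illustrated with a minimal Python sketch. This is not the processor's implementation: the processor uses an LLM as a judge to decide relevance, while this sketch assumes the set of relevant document IDs is already known. The function names, the sample IDs, and the normalization used for average precision (dividing by min(number of relevant documents, N)) are assumptions made for illustration.

```python
from typing import Sequence, Set


def precision_at_n(retrieved: Sequence[str], relevant: Set[str], n: int) -> float:
    """Share of the top-n slots filled by relevant documents (denominator is n)."""
    if n <= 0:
        raise ValueError("n must be positive")
    hits = sum(1 for doc_id in retrieved[:n] if doc_id in relevant)
    return hits / n


def recall_at_n(retrieved: Sequence[str], relevant: Set[str], n: int) -> float:
    """Share of all relevant documents that appear in the top n."""
    if not relevant:
        return 0.0
    hits = sum(1 for doc_id in retrieved[:n] if doc_id in relevant)
    return hits / len(relevant)


def f_score_at_n(retrieved: Sequence[str], relevant: Set[str], n: int) -> float:
    """Harmonic mean of Precision@N and Recall@N."""
    p = precision_at_n(retrieved, relevant, n)
    r = recall_at_n(retrieved, relevant, n)
    return 0.0 if p + r == 0 else 2 * p * r / (p + r)


def reciprocal_rank(retrieved: Sequence[str], relevant: Set[str]) -> float:
    """1 / rank of the first relevant document, or 0.0 if none is retrieved."""
    for rank, doc_id in enumerate(retrieved, start=1):
        if doc_id in relevant:
            return 1.0 / rank
    return 0.0


def average_precision_at_n(retrieved: Sequence[str], relevant: Set[str], n: int) -> float:
    """Mean of the precision values at each rank where a relevant document appears.

    Assumption: normalized by min(len(relevant), n); other conventions exist.
    """
    hits = 0
    total = 0.0
    for rank, doc_id in enumerate(retrieved[:n], start=1):
        if doc_id in relevant:
            hits += 1
            total += hits / rank
    denominator = min(len(relevant), n)
    return total / denominator if denominator else 0.0


# Hypothetical ranking for one question; "d1" and "d2" are the relevant documents.
retrieved_ids = ["d3", "d1", "d7", "d2"]
relevant_ids = {"d1", "d2"}
```

MAP and MRR are the means of average precision and reciprocal rank across all questions in an evaluation set, so a full evaluation would average these per-question values.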

Platform

  1. Suspend and Activate gives you the power of Datavolo Runtimes when you need them and reduces cost when you don't.
  2. Taking advantage of both self-hosted BYOC and fully-managed SaaS Runtimes? Manage them all through one account and one view now!

2024-10-25

Runtime

  1. Consume Kafka improved error handling and routing for input records that do not match the configured Record Reader
  2. Flow Labels are easier to use and resize when they contain longer lines of text

2024-10-23

Runtime

  1. Consume Kafka and Publish Kafka improved provenance tracking with a specific Kafka Transit URI for records sent and received
  2. Put MLflow added an example use case for Databricks Workspaces
  3. Put HubSpot enhancements include:
    • a new retry relationship to more easily and accurately handle HubSpot server timeouts and rate limiting responses
    • error handling options if HubSpot does not have a matching property for a record
  4. Continued improvements to the UI for light and dark mode, context-aware messages, and versioned flow indicators

Platform

  1. Refreshed Runtime options to support 4x utilization use cases with our new Standard profile.
  2. A new High Performance profile is available for subscribing customers.

2024-10-16

Runtime

  1. Parse PDF Document delivers continued improvements to complex PDF document parsing.
  2. Query Pinecone now allows for explicit weighting of sparse and dense vectors with hybrid search.
  3. Delete Milvus adds ability to delete records by filter.
  4. UI Updates
    • Introduced a Provider icon into the extension dialog which is used when creating a new Processor, Controller Service, Reporting Task, or Parameter Provider.
    • Improved progress indicator when importing a Flow from the Registry.
    • EL and Parameter syntax highlighting now supports quoted Parameter names.
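
For readers unfamiliar with the sparse and dense weighting mentioned in the Query Pinecone entry above, here is a minimal Python sketch of one common approach: a convex combination controlled by a single weight. The notes do not say how Query Pinecone exposes the weighting, so the function name, the `alpha` parameter, and the sample vectors are all assumptions for illustration only.

```python
from typing import Dict, List, Tuple


def weight_hybrid_query(
    dense: List[float],
    sparse: Dict[str, list],
    alpha: float,
) -> Tuple[List[float], Dict[str, list]]:
    """Scale a dense vector by alpha and a sparse vector by (1 - alpha).

    alpha = 1.0 gives a purely dense (semantic) query;
    alpha = 0.0 gives a purely sparse (keyword) query.
    """
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must be between 0 and 1")
    weighted_dense = [value * alpha for value in dense]
    weighted_sparse = {
        "indices": list(sparse["indices"]),
        "values": [value * (1.0 - alpha) for value in sparse["values"]],
    }
    return weighted_dense, weighted_sparse


# Hypothetical query vectors; real embeddings have many more dimensions.
dense_query = [1.0, 2.0]
sparse_query = {"indices": [10, 42], "values": [4.0, 8.0]}
dense_w, sparse_w = weight_hybrid_query(dense_query, sparse_query, alpha=0.75)
```

Scaling both vectors before the query keeps the combined score a weighted sum of the dense and sparse similarities, which is why a single weight is enough to shift emphasis between semantic and keyword matching.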

Platform

  1. Runtime updates that trigger scaling of the cluster now wait for the scale up or down to complete before allowing more updates.
  2. New users can sign up more easily than ever from our newly introduced welcome page.

2024-10-09

Runtime

  1. New: Milvus bundle easily integrates with the high-performance, open source vector database
  2. New: Vertex AI bundle brings Google Cloud's generative AI solution for multimodal data into your data pipelines
    1. Create Vertex AI Embeddings converts text to embeddings using any of Vertex AI's Embedding models
    2. Prompt Vertex AI sends text and/or multimedia prompts to models like Gemini
  3. New: Put Vespa Document hooks into the Vespa Document API to update records in a given namespace
  4. Parse PDF Document now handles huge PDF pages quickly and reliably
  5. Prompt Anthropic AI added support for Anthropic's beta features and custom API versions, along with prompting for text vs. images
  6. Publish Kafka added support for zstd compression, and improved support for round robin partitioning to more evenly balance data across the Kafka cluster
  7. Put Iceberg Table now supports OAuth2 Token Refresh for Polaris Iceberg Catalog
  8. Query Pinecone allows for a configurable number of results via Expression Language
  9. Standard Databricks Workspace Client Service indicates whether a connection is valid when the service is enabled
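
The round robin partitioning mentioned in the Publish Kafka entry above can be pictured with a short Python sketch. It shows only the general idea of cycling through partitions so that records spread evenly. It is not the Kafka client's or Publish Kafka's code, and the function name and sample records are hypothetical.

```python
from collections import Counter
from itertools import cycle
from typing import Iterable, List, Tuple


def assign_round_robin(records: Iterable[str], num_partitions: int) -> List[Tuple[int, str]]:
    """Pair each record with the next partition number, wrapping around at the end."""
    if num_partitions <= 0:
        raise ValueError("num_partitions must be positive")
    partitions = cycle(range(num_partitions))
    return [(next(partitions), record) for record in records]


# Ten hypothetical records spread across three partitions.
assignments = assign_round_robin([f"record-{i}" for i in range(10)], 3)
counts = Counter(partition for partition, _ in assignments)
```

With this scheme the record counts of any two partitions differ by at most one, regardless of record keys or sizes.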

Platform

  1. Organization Admins can now consent to legal Terms and Conditions on behalf of their team
  2. Organizations with many Runtimes can view and manage them more easily

2024-10-03

Runtime

  1. Auto-selection of the best content viewer for your data is improved
  2. In-App documentation improved - getting you the information you need faster
  3. You can now fine-tune stateless process groups with concurrent task count and timeouts

Platform

  1. Elastic scaling is powerful, but by popular demand you can now set the minimum number of nodes greater than one!

2024-09-30

Runtime

  1. Datavolo's proprietary Kafka 3 connectors Consume Kafka and Publish Kafka now deliver lower latency and higher throughput when you need it the most
  2. Advanced Document Parsing runs significantly faster for Parse PDF Document and Parse Table Image

Platform

  1. Improved upgrade experience for Datavolo Cloud Runtimes, especially for large-scale BYOC clusters
  2. Improved cluster autoscaling for high throughput and/or bursty workloads, giving you more capacity when you need it and only scaling down when you don't

2024-09-26

Runtime

  1. Put Iceberg Table introduces support for writing data to Apache Iceberg through Snowflake's Polaris Catalog
  2. Snowflake Connection Service now supports Private Link configurations for enhanced security with enterprise integrations

2024-09-25

Runtime

  1. High throughput use cases run smoother for all sizes of Datavolo Runtimes with improvements to the underlying FlowFile repository
  2. Parse PDF Document received a significant boost in precision and recall for element detection, including redacted documents
  3. Improved rendering of bulletins from components that the authenticated user cannot access
  4. Listen Slack now supports "Joined Channel" events

Platform

  1. Restarting a Runtime now supports an optional Recovery Mode for troubleshooting
  2. Reorganized the Runtime listing to accommodate new and upcoming features
  3. Better error handling and more intuitive messaging when the API is unavailable for any reason
  4. Improved seamless upgrades of the UI when new versions are available
  5. Updates to Terms and Conditions will require all users to accept the latest version.

2024-09-23

Runtime

  1. Datavolo now ships with fully supported Kafka components to consume and publish messages using the latest Kafka 3.8 client.

2024-09-20

Runtime

  1. Bug fix for searching Provenance with indexed FlowFile attributes that contain . characters
    • Advanced Document Parsing use cases can now search by container.id and container.scope across all documents
  2. Bulletin icon colors now match the maximum bulletin severity
  3. Improved display of Python processors to more seamlessly integrate with the flow
  4. Improved styling and colors with UX updates
  5. Improved handling of API requests during cluster scale down

2024-09-18

Runtime

  1. Improved awareness of a Runtime's availability during upgrades
  2. Improved UX in the Provenance Preview panel for navigating data processed across the Runtime cluster
  3. Improved tooltips to open when expected and stay hidden when quickly scrolling by
  4. Bug fix to improve the consistency of connected nodes displayed after the cluster scales down
  5. Bug fix for a smoother log out -> log in experience

Platform

  1. Prefer light mode? Dark mode? Animations on or off? The choice is now yours across all Datavolo UIs!
  2. Security patches and upgrades to third party libraries
  3. Clearer prompts for receiving UI updates when a new version is available

2024-09-13

Runtime

  1. Improved UX for purging Flow Configuration History that is consistent with the Datavolo Runtime experience
  2. Bug fix to make data viewing for PDFs and annotated PDFs simpler from queues and provenance queries

Platform

  1. When a new version of the UI is available, you'll receive a prompt to reload

2024-09-12

Runtime

  1. Put MLflow improves experiment tracking by recording metadata in MLflow, including Databricks Managed MLflow!
  2. Delete SFTP now allows you to efficiently delete remote files without first copying them into your Runtime
  3. Upgraded Content Viewer more seamlessly integrates the experience of managing flows and viewing data in-flight
  4. Added a PDF Viewer for a better in-app experience of inspecting PDFs anywhere in your flow, regardless of whether they've been analyzed by Parse PDF Document
  5. Improved UX for setting properties with long values on processors

Platform

  1. Runtime Upgrades is GA! Look for the "up" icon in your Runtime listing to see if a new version is available
  2. Members of pre-approved Organizations gain easy access to Datavolo Cloud when they sign up with their corporate email address
  3. Security patches and upgrades to all Datavolo base images, maintaining the strong security posture that you can expect and deserve
  4. Bug fix for duplicate columns in the Runtime listing that some customers experienced last week
  5. Improved sign-up experience for those waiting for approval

2024-09-04

Runtime

  1. Bug fix for the Advanced Configuration UI for processors like Update Attribute and Jolt Transform JSON
  2. Background efficiency improvements for Runtimes that require high CPU utilization

Platform

  1. Added a "Created By" field on the Runtimes listing, improving tracking for organizations with multiple users

2024-09-03

Runtime

  1. Upgraded all Runtimes to the latest version, enabling everyone to use the best of Datavolo's advanced document parsing techniques and LLM integrations

2024-08-30

Delivery

  1. Put Snowflake Internal Stage File transfers documents to a Snowflake Stage with Datavolo's enhancements for seamless, native authentication to Snowflake

Transformation

  1. Detect Document PII identifies sensitive, personally identifiable information with options to redact or encrypt those values as needed.

Runtime

  1. Improvements for rolling upgrades of clustered nodes when extensions are removed, letting each node continue processing until it upgrades.

2024-08-28

Transformation

  1. Create Cohere Embeddings added to the rapidly growing set of supported LLMs and embedding models
  2. Format Word Document improves parsing and extraction accuracy for some documents
  3. Convert Office Format can now change XLS files to XLSX, unlocking table parsing for large column count use cases
  4. Bug fix for the "Top K" property name in Prompt Anthropic AI

Runtime

  1. Improved UX when editing sensitive Properties and Parameters.
  2. The highly popular PDF Annotation Viewer offers improved highlighting and searching.
  3. Improved support for Firefox when viewing Node Status History

Platform

  1. Updated the Help page with a link to Datavolo's Support Portal
  2. Updated navigation to link to Datavolo Documentation
  3. Foundational work to support Runtime upgrades in the near future

2024-08-26

Transformation

  1. Three new processors give you more options to choose the right LLMs and embeddings for your applications:
  2. Prompt processors support text-only and text+image options

2024-08-21

Runtime

  1. New processors for interacting with S3
  2. PDF Annotator now shows the annotation container hierarchy alongside the PDF. The tree lets the user easily visualize the hierarchy and how each element relates to the others. Clicking a tree node jumps to the corresponding annotation in the PDF. The currently selected annotation is shown, and its container ID can now be copied.
  3. After dragging a new Input/Output Port or Process Group, the user can now simply provide a name and then press Enter to add the component.
  4. Bug Fix: Editing a sensitive parameter description no longer incorrectly changes the parameter value.
  5. Bug Fix: Removed usage of randomUUID function call which wasn't available in all environments.
  6. Bug Fix: Removed action to center selected component when viewing Stats History if the component is already on screen.
  7. Bug Fix: Fixed the Property Combo Editor to no longer show the option to reference Parameters in the Controller Settings where Parameters are not available.
  8. Bug Fix: Introduced new error handling to properly format error messages when the NiFi instance is unreachable.

Transformation

  1. Document analysis processors like Chunk Document and Chunk Text received an enhancement:
    • Support for writing new FlowFile Attributes related to chunking configuration to assist in experiment tracking
  2. Create Ollama Embeddings and Prompt Ollama provide support for creating embeddings and prompts with Ollama via REST API

Platform

  1. Role changes are now applied to active user sessions, automatically refreshing the user's capabilities.
  2. Bug Fix: Fixed enabled state of the Accept button after acknowledging Terms and Conditions.

2024-08-14

Runtime

  1. Asset Upload to Parameters brings even more customization to your Runtime, complementing Local NAR Extensions.
    • Add and reference JDBC Drivers, SSL Keystores, ML Models, Configuration Files, etc. throughout your flow.
  2. Query Document provides a SQL-like interface to Datavolo Document JSON, letting you inspect and route files based on detected elements.
  3. Bug Fix: Auto-scaling is much smoother, removing the temporary "Unknown Host" page that some users experienced when the cluster grew or shrank.

Transformation

  1. OCR-enabled processors like Perform OCR, Parse Table Image, and Parse PDF Document received a few enhancements:
    • Hebrew language support
    • Korean language support
    • Standard OCR Service as a single configuration point for easier flow management
  2. Performance improvements to semantic chunking with Chunk Document increase throughput and reduce system load for RAG and other AI use cases.

Platform

  1. Runtime Restart gives you even more flexibility when using Local NAR Extensions and uploaded Assets.
  2. You can now modify your own roles, allowing you to transfer ownership of an organization to a colleague.

2024-08-07

Runtime

  1. Prompt Snowflake Cortex lets you tap into the power of Snowflake Cortex AI to build Generative AI applications with fully managed LLM services.
  2. Authenticating to Snowflake via Snowflake Connection Service now supports Keypairs and Session Tokens
  3. Convert Office Format extends Datavolo's Advanced Document Parsing capabilities to Word, PowerPoint, Excel, and other OpenDocument-compatible files.
  4. Parse PDF Document can now parse and detect elements for more complex PDFs with embedded elements.
  5. Execute SQL Statement enables DDL and DML commands against a SQL database accessed through a DBCP, Hikari, or Snowflake Connection Pool.
  6. Datavolo's new "Provenance Preview" feature makes it even easier to understand a processor's impact on data lineage.
    1. Select any processor and you'll see the latest provenance record, complete with custom views like text chunks and PDF annotations.
  7. Choose whether you'll see animations on the Runtime UI or skip over those transitions.
    1. The default option uses your system settings.
  8. Advanced Configuration UI for Jolt Transform JSON provides an updated, intuitive user experience after retiring the legacy NiFi implementation.

Platform

  1. Datavolo's Terms and Conditions will now display once for all new and existing users to clarify system expectations.
  2. Bug Fix: Renaming and deleting newly created Runtimes via the actions menu works again without needing to refresh your browser

2024-08-01

Runtime

  1. Added support for advanced configurations of processors such as Update Attribute and Jolt Transform JSON
  2. Bug Fixes
    1. Searching for components in your data flow once again shows you search results
    2. The "Logout" link is visible on the UI
    3. Fixed conditions for rendering various actions throughout the UI including new Controller Service, new Parameter Context, Move To Parent Group, Upload Custom NAR (when dragging), and deleting previously uploaded Local Extensions / Custom NARs

2024-07-30

Platform

  1. Chunk Text is now easier than ever to understand with its own content viewer!
  2. Enhancements to the PDF Viewer's Annotations
    1. New Search Capabilities, including Container ID and finding next / previous by Container Scope
    2. Bug fixes for annotations when the content has aged off or the schema is different than expected
  3. RBAC expanded to include new Editor and Viewer roles that you can assign to each member of your Organization
  4. Improved visibility on the UI for errors when they occur

2024-07-24

Change Data Capture

  1. Capture Google Drive Changes notifies on access or content changes from Google Drive, enabling your data pipeline to quickly and reliably respond.

Platform

  1. Enhancements to the PDF Viewer to support multi-line annotations in Advanced Document Parsing
  2. RBAC for Users within an Organization, allowing for Administrators and Members and paving the way for more roles to come!
  3. Support for resending invitations to previously invalid email addresses.
  4. Finer control over which flow metrics are exported by the OpenTelemetry Reporting Task, reducing cost and load on your observability pipeline.

2024-07-17

Transformation

  1. Structural, under-the-hood improvements for stability and performance.

Platform

  1. Bring Your Own Code - uploading NARs as a Local Extension provides the ultimate flexibility for your Datavolo Runtime whether you're building with Python or Java
  2. New PDF Viewer makes it easy to see what Datavolo's Advanced Document Parsing discovered as a section header, paragraph, table, chart, etc. with bounding boxes and metadata for all detected elements.
  3. Multi-user Organizations are now GA! Inviting others to collaborate on your data flows is the latest enhancement to Datavolo Cloud.

2024-07-10

Change Data Capture

  1. Bug fix for the Capture Sharepoint Changes processor to fetch Group Permissions in SharePoint

Transformation

  1. Faster PDF and image processing for OCR and table extraction, in addition to a new table.not.found output for Parse Table Image

Platform

  1. Improved email invitations for new users invited to your organization
  2. Welcome emails when your account is approved
  3. UI enhancements, including a bug fix for Parameter Autocomplete
  4. Removed deprecated components:
    1. Ganglia Reporting Task
    2. Site-to-Site Reporting Tasks
    3. Prometheus Reporting Task
    4. Prometheus Record Sink

2024-07-02

Change Data Capture

  1. Capture Sharepoint Changes notifies on access or content changes.
  2. Fetch Sharepoint File retrieves the latest changed files.
  3. Microsoft Graph Authentication Provider provides a reusable authentication mechanism to SharePoint.

Transformation

  1. Parse PDF Document
  2. Merge Document Elements
  3. Parse Table Image
  4. Perform OCR
  5. Enrich Attributes and Database Lookup make it much easier to route or enrich data based on database tables

Delivery

  1. Snowflake integrations
    1. Snowflake Connection Service
    2. Create Snowflake Embeddings
      • Uses Snowflake Cortex functions to create vector embeddings with either 768 or 1024 dimensions from an English source text
  2. Databricks integrations
    1. Manage Unity Catalog via List Unity Catalog Directory, Get Unity Catalog File, Get Unity Catalog File Metadata, Put Unity Catalog File, Delete Unity Catalog Resource
    2. Manage DBFS via List DBFS Directory, Get DBFS File, Put DBFS File, and Delete DBFS Resource
    3. Run Jobs via Run Databricks Job
    4. Execute SQL via Put Databricks SQL
    5. Managed authentication via Standard Databricks Workspace Client Service
  3. Bug fixes for deleting records from Vectara

Platform

  1. Machine Learning services for every Runtime that perform:
    1. OCR
    2. Document Analysis / Element Detection
    3. Table Structure Recognition
  2. Auto-discovery and use of new Runtime versions
  3. Flexibility to enable/disable features and customize configurations for your organization
  4. Upgraded baseline to NiFi 2.0.0-M4, including support for Kafka 3