[Data Flow] [Customer] Guide and Documentation V1
October 15, 2025

Objective

The main objective of this document is to enable the customer to use the Data Flow data service efficiently, providing a detailed overview of its technical aspects. The focus is to ensure that the customer fully understands the solution, enabling its safe and optimized use.

Definitions

Data Flow

In the dynamic data engineering ecosystem, effective data sharing across platforms is a critical pillar for advanced analytics and informed decision-making. Our platform, integrated with leading solutions such as Databricks, Azure Data Factory, and Soda, is at the forefront of simplifying data sharing. At the heart of this initiative is Data Flow, a vital tool that leverages the Delta Sharing protocol to facilitate data sharing.

At the core of Data Flow is a Python library specifically designed to streamline and secure data sharing across multiple platforms. By adopting the Delta Sharing protocol, this library offers an efficient solution for data engineers while adhering to strict governance policies. From facilitating data export and import to maintaining data integrity during sharing, Data Flow becomes an indispensable partner.

In summary, Data Flow is an abstraction layer created by the Data team at Blip to facilitate the configuration and management of data sharing between parties. With it, it is possible to consume Near Real Time data, in batch or streaming mode, in a secure and simple way. Through the sharing of Blip contracts (tenant ID), it is possible to ensure that the data is consistent and reliable. Finally, it is possible to manage exceptions in sharing, with error handling and permission strategies.

We will explore some key features that Data Flow enables:

Secure Data Sharing
Data Flow simplifies the process of sharing data between systems, ensuring that data movement is not only efficient but also aligned with governance standards. Whether moving data in batches or streaming it in real time, Data Flow equips you with the tools you need to share data securely and efficiently.

Simplified Interfaces for Access
Through the protocol used, Data Flow offers a simplified flow for interacting with data systems: an interoperable solution that supports any use case, any platform, and any tool.

Implementation of Sharing Contracts
Maintaining data integrity is essential, and Data Flow employs advanced features to ensure that shared data sets are consistent and reliable. By defining and implementing sharing agreements, the tool ensures that data conforms to predefined formats and standards, minimizing errors and inconsistencies.

Effective Exception Management
Data sharing can be subject to unexpected events and exceptions. Data Flow provides a robust solution for managing these exceptions, enabling the implementation of effective error handling strategies and ensuring that data sharing processes remain continuous and reliable.

Product/Service Vision and Roadmap

As a data service, Data Flow is the best way for clients to consume raw conversational data from Blip, with near real-time latency (up to 120s). The available data comes from the following tables: messages, eventtracks, notifications, and tickets. The customer can consume the data through Databricks or any other data consumption solution that supports the Delta Sharing protocol.

Architecture

Delta Sharing

What is Delta Sharing?
Delta Sharing is an open protocol developed by Databricks that changes the way organizations share and exchange data. It provides a simple, secure, and open method for data providers and consumers to share information in real time, regardless of the computing platforms they use.

Fundamental Concepts

Provider / Data Provider
Entities that make data available for sharing. In our case, Blip.

Share
A share is a logical grouping of data assets made available to recipients with read-only permissions. A share can be shared with one or multiple recipients, and a recipient can access all resources in a share. A share can contain multiple schemas, tables, notebooks, volumes, ML models, or other data assets that the provider wants to share.

Recipient / Data Recipient
A client that has a token to access shared objects.

Schema
A schema is a logical grouping of tables. A schema can contain multiple tables.

Table
A table is either a Delta Lake table or a view over a Delta Lake table.

Sharing Server
A server that implements the protocol.

Delta Sharing protocol operation diagram

Delta Sharing: Sharing Methods

D2D: Sharing between Databricks environments, with access via Catalog Explorer, the Databricks CLI, or SQL.
D2O: Sharing from Databricks to open-source platforms, using credentials and activation links.
O2O: Sharing between open-source platforms, using a reference server and Delta Sharing clients.
O2D: Sharing from open-source platforms to Databricks, using a token-based credential system.

Sharing Between Databricks Environments (Databricks to Databricks - D2D)

The recipient provides a unique identifier tied to its Databricks workspace.
The data provider creates a "share" in its own workspace, which includes tables, views, and notebooks.
A "recipient" object is created to represent the user or group that will access the data.
The provider grants access to the share, which appears in the recipient's workspace.
Users can access the share through various means, such as Catalog Explorer, the Databricks CLI, or SQL commands (a sketch of this flow appears at the end of this section).

Databricks to Open Source Sharing (Databricks to Open - D2O)

The data provider creates "recipient" and "share" objects, just like in the previous method.
A token and an activation link are generated for the recipient.
The provider sends the activation link to the recipient securely.
The recipient uses this link to download a credential file, which is used to establish a secure connection with the provider and access the shared data.
This method allows data to be read on any platform or tool.

Open to Open (O2O) Sharing

Allows data sharing between any open-source platforms or tools, without the need for Databricks.
The data provider can use an open-source reference server to create and manage shares and recipients.
The recipient can use any Delta Sharing client to access the shared data using a credential file.
This method enables data sharing across different clouds and regions with minimal configuration and maintenance.

Open to Databricks (O2D) Sharing

Enables sharing of data and AI models beyond the Databricks ecosystem.
Uses a token-based credential system, allowing data providers to share assets with any user, regardless of access to Databricks.
Examples include sharing data from Oracle to Databricks.
Despite its openness, Delta Sharing ensures robust security and governance.
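To make the D2D flow above more concrete, here is a minimal sketch of the SQL commands involved, run from Databricks notebooks via spark.sql. This is an illustration only: the object names (example_share, example_recipient, example_provider, example_shared_catalog), the table path, and the sharing identifier are hypothetical placeholders and not the actual objects shared by Blip, which are covered in the connection guide below.

# Minimal sketch of the D2D flow, assuming Databricks notebooks with Unity
# Catalog on both sides, where `spark` is the notebook's built-in SparkSession.
# All names and the sharing identifier are hypothetical placeholders.

# Provider side: create a share, add a table, register the recipient by the
# sharing identifier of its workspace, and grant read access to the share.
spark.sql("CREATE SHARE IF NOT EXISTS example_share")
spark.sql("ALTER SHARE example_share ADD TABLE example_catalog.example_schema.messages")
spark.sql(
    "CREATE RECIPIENT IF NOT EXISTS example_recipient "
    "USING ID 'azure:westeurope:12345678-aaaa-bbbb-cccc-1234567890ab'"
)
spark.sql("GRANT SELECT ON SHARE example_share TO RECIPIENT example_recipient")

# Recipient side: mount the share as a catalog and read the shared table like
# any other Unity Catalog table.
spark.sql("CREATE CATALOG IF NOT EXISTS example_shared_catalog USING SHARE example_provider.example_share")
spark.read.table("example_shared_catalog.example_schema.messages").show(5)

In the D2O, O2O, and O2D methods, the recipient side is replaced by the token-based credential file described in the connection guide below.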
Sharing possibilities via Delta Sharing

CONNECTION GUIDE

Usage in Databricks

By default, the customer must locate the “deltashare_core” share (or another name, if it is a custom share) through the catalog, in the “Delta Sharing” tab, within the “Shared with me” section, as shown in the images below. It is necessary to create a catalog from the share. It is then possible to read the data using the various tools available to Databricks users.

Download Activation Link

The recipient who receives the activation link must download the credentials file locally in JSON format. Note that, for security reasons, the credentials file can only be downloaded once, and the activation link expires after the first download. For certain technologies, such as Tableau, in addition to the URL link, you may need to upload this credentials file. For other technologies, you may need a bearer token or other credentials contained in this file.

Using the Credentials File in Python/Notebooks

Once the credentials file has been downloaded, it can be used across multiple notebook platforms, such as Jupyter and Databricks, to access shared data as data frames. To enable this functionality in your notebook, run the following commands to install and import the Delta Sharing client. Alternatively, you can install it from PyPI by searching for and installing the "delta-sharing" package. After installation, you can use the previously downloaded credentials profile file to enumerate and access all shared tables in your notebook environment:

## Install the Delta Sharing Python package
!pip install delta-sharing

## Import Delta Sharing libraries
import delta_sharing

## Client configuration pointing to the credentials file
## This part can be done locally or stored in an external environment
config_path = "C:/Users/dummy_user/shares/config.share"
client = delta_sharing.SharingClient(config_path)

Listing available datasets

Now that the client has been configured, you can query the available datasets within your Data Flow. You can do this simply by calling the list_all_tables() method of the SharingClient object you just created. In the example below, a dataset called incomings_schema.records, located inside incomings_share, is available. Each Table object that appears in this list is a different table/dataset that you have access to.

## Display available datasets
print(client.list_all_tables())

## Result
[Table(name='records', share='incomings_share', schema='incomings_schema')]
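The open-source client also lets you walk the hierarchy level by level, which can be convenient when a Data Flow contains more than one share or schema. The sketch below is a minimal example, assuming the same config.share path used above.

## Walk the hierarchy level by level (share -> schema -> table)
## and print each full table name
import delta_sharing

config_path = "C:/Users/dummy_user/shares/config.share"
client = delta_sharing.SharingClient(config_path)

for share in client.list_shares():
    for schema in client.list_schemas(share):
        for table in client.list_tables(schema):
            print(f"{share.name}.{schema.name}.{table.name}")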
Accessing the data

To access and use data from a dataset via Data Flow, you need to load the data into the Python session, either as a pandas DataFrame or as an Apache Spark DataFrame, depending on your convenience. The complete address of a dataset is made up of three distinct parts. The first is the path to your credentials file (i.e., the config.share file). The second is a hash character (#), which acts as a separator between the first and third parts of the full address. The third is the full name of the dataset, composed of three parts separated by periods: the share name, the schema name, and the dataset name (or the "table name", if you prefer to call it that). Therefore, the full name of a dataset takes the form <share-name>.<schema-name>.<table-name>.

Returning to the previous example, we have access to only one dataset, called records, which is within the schema incomings_schema and the share incomings_share. Therefore, the full name of this dataset is incomings_share.incomings_schema.records. Since we already have the path to our credentials file, the full address for this dataset is:

## Data access path configuration
config_path = "C:/Users/dummy_user/shares/config.share"
table_name = "incomings_share.incomings_schema.records"
table_address = config_path + "#" + table_name
print(table_address)

## Result
C:/Users/dummy_user/shares/config.share#incomings_share.incomings_schema.records

Loading data into pandas

To do this, you can use the load_as_pandas() method from the delta_sharing library. All you need to do is provide the full address of the dataset to the method.

## Import using pandas
import delta_sharing

config_path = "C:/Users/dummy_user/shares/config.share"
table_name = "incomings_share.incomings_schema.records"
table_address = config_path + "#" + table_name
table = delta_sharing.load_as_pandas(table_address)
print(table)

## Result
         date                   datetime  id  value
0  2024-05-15 2024-05-15 15:12:20.680756   1   3200
1  2024-05-15 2024-05-15 15:12:54.680769   2   1550
2  2024-05-15 2024-05-15 15:06:14.680772   3   8700
3  2024-05-15 2024-05-15 15:13:08.680774   4   5800

Loading data into Apache Spark

If you prefer, you can load the dataset into Apache Spark by changing the method to load_as_spark(). The input to this method is the same as for load_as_pandas(), namely the full address of the dataset you are trying to access.

## Import using Apache Spark
import delta_sharing

config_path = "C:/Users/dummy_user/shares/config.share"
table_address = config_path + "#incomings_share.incomings_schema.records"
table = delta_sharing.load_as_spark(table_address)
table.show()

## Result
+----------+--------------------+---+-----+
|      date|            datetime| id|value|
+----------+--------------------+---+-----+
|2024-05-15|2024-05-15 15:12:...|  2| 1550|
|2024-05-15|2024-05-15 15:06:...|  3| 8700|
|2024-05-15|2024-05-15 15:13:...|  4| 5800|
|2024-05-15|2024-05-15 15:12:...|  1| 3200|
+----------+--------------------+---+-----+

Use in Reporting Platforms

Additionally, you have the option of using well-known reporting platforms, such as Power BI, to access your shared tables. In the context of Power BI, connecting to a Delta Sharing source is a very straightforward process: select “Delta Sharing” from the available data source options in Power BI and click the Connect button.

Support links

For more details on how to connect and consume data, you can consult the project documentation on GitHub through the link below:
GitHub - Delta Sharing - Accessing Shared Data

For more details on the protocol used in Data Flow:
Delta Sharing - An open standard for secure data sharing
Introducing Delta Sharing: an Open Protocol for Secure Data Sharing - The Databricks Blog
Data Sharing | Databricks
Databricks Use Cases
Data Sharing - What's New?

For more information, visit the discussion on the subject in our community or the videos on our channel. 😊

Related articles

Conversational Data Messages
How to build bots using SDKs or HTTP API
Tickets - New
How to check if an attendant is available in Builder