Introduction to Gudu SQLFlow
Gudu SQLFlow is a tool that analyzes SQL statements to discover data lineage relationships. It is often used alongside metadata management tools and is a foundational tool for enterprise data governance. In this article, we give a brief introduction to Gudu SQLFlow.
If you are not familiar with SQL, this article may not be for you. However, no prior knowledge of data lineage is required. For our purposes, you can think of a data lineage relationship simply as a data dependency between two or more tables in a database.
Let’s analyze the following SQL to see how to work out the data dependencies between tables and views.
From the INSERT statement above, we can tell that the data of the deptsal table comes from the dept and emp tables. More specifically, the data dependencies (data lineage) at the field level are:
- The data of the deptsal.dept_no field comes from dept.deptno;
- The data of the deptsal.dept_name field comes from dept.name;
- The data of the deptsal.salary field comes from emp.sal and emp.comm.
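The field-level relationships listed above can be modeled as a simple mapping from each target column to its source columns. The sketch below is our own illustration, not SQLFlow's output format: the dictionary layout and the helper name `table_level_lineage` are hypothetical. It shows how the table-level lineage (deptsal depends on dept and emp) follows directly from the field-level lineage.

```python
from collections import defaultdict

# Field-level lineage from the INSERT example: target column -> source columns.
# This dictionary layout is a hypothetical illustration, not SQLFlow's format.
COLUMN_LINEAGE: dict[str, set[str]] = {
    "deptsal.dept_no": {"dept.deptno"},
    "deptsal.dept_name": {"dept.name"},
    "deptsal.salary": {"emp.sal", "emp.comm"},
}


def table_level_lineage(column_lineage: dict[str, set[str]]) -> dict[str, set[str]]:
    """Collapse column-level edges into table-level edges (target table -> source tables)."""
    tables: dict[str, set[str]] = defaultdict(set)
    for target, sources in column_lineage.items():
        target_table = target.split(".", 1)[0]  # "deptsal.salary" -> "deptsal"
        for source in sources:
            tables[target_table].add(source.split(".", 1)[0])
    return dict(tables)


if __name__ == "__main__":
    for target, sources in table_level_lineage(COLUMN_LINEAGE).items():
        print(f"{target} <- {', '.join(sorted(sources))}")
    # prints: deptsal <- dept, emp
```

Keeping the column-level mapping as the primary structure means the coarser table-level view can always be derived from it, but not the other way round.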
With Gudu SQLFlow, you can see these data lineage relationships visualized as a diagram.
How to Use Gudu SQLFlow for the First Time
Through its web interface or REST API, Gudu SQLFlow can analyze a single SQL statement or multiple SQL files, or connect directly to a database, and work out the data lineage relationships for you in real time. It can also analyze other data sources such as Redshift logs, Snowflake query history, and DBT scripts, quickly discovering the data lineage relationships across an enterprise data platform.
In this article, we introduce only the simplest way to use Gudu SQLFlow. In just three steps, you can extract complete, clear data lineage relationships from even complex SQL statements.
Step 1: Enter the SQL statement.
Copy and paste the SQL statement to be analyzed into the SQL Editor in Gudu SQLFlow.
Step 2: Select the corresponding database type.
Select the database type (SQL dialect) that the statement is written for, so that Gudu SQLFlow can parse it accurately.
Step 3: Analyze the data lineage.
Click the visualize button to analyze the entered SQL statement.
After these three steps, the main panel on the right shows a detailed, interactive diagram of the data lineage relationships. You can click a table, view, field, or other object of interest to examine it further.
Further Exploration of Data Lineage Results
Gudu SQLFlow provides many parameters for customizing the data lineage output to your needs.
Here, we introduce just one parameter to demonstrate what Gudu SQLFlow can do. For the other parameters, please refer to the related documentation.
Show Transform Parameter
The show transform parameter displays the data transformation expressions in a SQL statement, that is, which expression turns which source fields into the data of a target field. For example, the data of the sal field is computed by the expression SUM(e.sal + Nvl(e.comm, 0)), and its source fields are sal and comm.
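Why does the expression wrap e.comm in Nvl? In Oracle SQL, NVL(expr1, expr2) returns expr2 when expr1 is NULL. Any arithmetic involving NULL yields NULL, and SUM skips NULL inputs. The Python sketch below mimics those semantics, with None standing in for NULL. The employee rows are made-up sample data, not taken from the article, and the helper names are our own. The result shows that without Nvl, an employee with no commission would silently drop out of the department total.

```python
from typing import Iterable, Optional


def nvl(value: Optional[float], default: float) -> float:
    """Mimic Oracle NVL: return default when value is NULL (None)."""
    return default if value is None else value


def sql_add(a: Optional[float], b: Optional[float]) -> Optional[float]:
    """SQL arithmetic: NULL + anything is NULL."""
    return None if a is None or b is None else a + b


def sql_sum(values: Iterable[Optional[float]]) -> Optional[float]:
    """SQL SUM ignores NULLs and returns NULL if every input is NULL."""
    non_null = [v for v in values if v is not None]
    return sum(non_null) if non_null else None


# Hypothetical sample rows for one department: (sal, comm).
EMP_ROWS = [(1600.0, 300.0), (1250.0, None), (1500.0, 0.0)]

# SUM(e.sal + Nvl(e.comm, 0)): every employee contributes.
with_nvl = sql_sum(sql_add(sal, nvl(comm, 0)) for sal, comm in EMP_ROWS)
# SUM(e.sal + e.comm): the row with a NULL commission becomes NULL and is skipped.
without_nvl = sql_sum(sql_add(sal, comm) for sal, comm in EMP_ROWS)

if __name__ == "__main__":
    print(with_nvl)     # 4650.0
    print(without_nvl)  # 3400.0
```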
With the show transform parameter enabled, we can easily see the expression behind this transformation.
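Conceptually, show transform attaches the transforming expression to each lineage relationship. The sketch below models that idea with a hypothetical `LineageEdge` dataclass and a `render` helper. Its `show_transform` flag only mirrors the parameter's name, and neither the class nor the helper is SQLFlow's actual API or output format. It also assumes, as the field-level lineage listed earlier suggests, that this expression feeds deptsal.salary, and that dept.deptno is copied unchanged.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class LineageEdge:
    """One source-to-target column relationship (hypothetical model)."""
    source: str
    target: str
    expression: Optional[str] = None  # transformation applied, if any


EDGES = [
    LineageEdge("emp.sal", "deptsal.salary", "SUM(e.sal + Nvl(e.comm, 0))"),
    LineageEdge("emp.comm", "deptsal.salary", "SUM(e.sal + Nvl(e.comm, 0))"),
    LineageEdge("dept.deptno", "deptsal.dept_no"),  # assumed direct copy
]


def render(edges: list[LineageEdge], show_transform: bool = False) -> list[str]:
    """Format edges, appending the expression only when show_transform is on."""
    lines = []
    for edge in edges:
        line = f"{edge.source} -> {edge.target}"
        if show_transform and edge.expression:
            line += f"  [{edge.expression}]"
        lines.append(line)
    return lines


if __name__ == "__main__":
    print("\n".join(render(EDGES, show_transform=True)))
```

Keeping the expression on the edge rather than on the target column matters because one target column can be fed by several sources through the same expression, as deptsal.salary is here.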
More Features of Gudu SQLFlow
Entering SQL statements in the SQL Editor is a quick way to analyze their data lineage, get to know what Gudu SQLFlow does, and become familiar with the basic concepts of data lineage. Beyond that, Gudu SQLFlow offers more features to meet the needs of enterprise data governance:
- Analyzes multiple SQL files at once;
- Connects to a database to analyze its data lineage relationships in real time;
- Analyzes other data sources such as Redshift logs, Snowflake query history, and DBT scripts, quickly discovering the data lineage relationships in an enterprise data platform;
- Provides a REST API for quick integration with your data governance platform;
- Provides Java libraries that can be deployed to end customers along with your data governance tools;
- Provides a front-end UI library for quickly adding interactive data lineage visualization to your data governance platform;
- Provides an integrated solution with DataHub, the open-source metadata management software.
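To give a feel for REST integration, here is a minimal Python sketch that packages a SQL statement and its database type into an HTTP POST request using only the standard library. The endpoint URL, the form field names (`sqltext`, `dbvendor`), and the vendor value `dbvoracle` are all placeholders chosen for illustration. Consult the SQLFlow REST API documentation for the real endpoint, parameters, and authentication. The request is built but never sent.

```python
import urllib.parse
import urllib.request

# Placeholder endpoint: NOT the real SQLFlow URL; check the official REST API docs.
HYPOTHETICAL_ENDPOINT = "https://sqlflow.example.com/api/lineage"


def build_lineage_request(sql_text: str, db_vendor: str) -> urllib.request.Request:
    """Build (but do not send) a form-encoded POST request asking for lineage.

    The field names "sqltext" and "dbvendor" are assumptions for illustration.
    """
    body = urllib.parse.urlencode({"sqltext": sql_text, "dbvendor": db_vendor}).encode("utf-8")
    return urllib.request.Request(
        HYPOTHETICAL_ENDPOINT,
        data=body,
        headers={"Content-Type": "application/x-www-form-urlencoded"},
        method="POST",
    )


if __name__ == "__main__":
    req = build_lineage_request("SELECT deptno FROM dept", "dbvoracle")
    print(req.get_method(), req.full_url)
    # A real integration would send it with urllib.request.urlopen(req)
    # and parse the response according to the documented format.
```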
The Main Components of the Gudu SQLFlow Software Interface
The main interface of Gudu SQLFlow consists of the following components:
- SQL Editor: Enter the SQL code to be analyzed in the code editing box, select the database from the dbvendor menu, and click the visualize button or the join button to draw the corresponding lineage diagram.
- Sample SQL: After selecting a database from the dbvendor menu, click sample sql to load sample SQL for that dbvendor into the code editing box.
- Upload: Upload one or more files, or connect to a database. This creates a job in the background, and the result becomes available once the job completes successfully.
- Login: Supports multiple users. Currently, login is only available in the SQLFlow SaaS version (https://sqlflow.gudusoft.com).
- Lineage and schema explorer: Displays the schema structure obtained from parsing the SQL. Right-click a database, schema, or table to visualize the data lineage of that object.
- Main diagram panel: The data lineage diagram in the main panel is interactive, so you can act on specific parts of it to focus on the data of interest. For example, left-click a column to fix its relationships, and click Cancel to undo. Right-click and choose Table Lineage or Column Lineage to display the lineage relationships of a table or column, and click Cancel to undo.
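Choosing Column Lineage on a column essentially asks which upstream columns its data ultimately comes from, possibly several statements away. A breadth-first walk over column-level edges answers that question. In the sketch below, the first three edges come from the INSERT example. The `dept_report` view and its `total_salary` column are hypothetical, added only to show a second hop, and `upstream_columns` is our own helper name, not part of SQLFlow.

```python
from collections import deque

# Target column -> direct source columns.
COLUMN_EDGES: dict[str, set[str]] = {
    "deptsal.dept_no": {"dept.deptno"},
    "deptsal.dept_name": {"dept.name"},
    "deptsal.salary": {"emp.sal", "emp.comm"},
    # Hypothetical downstream view, for illustration only.
    "dept_report.total_salary": {"deptsal.salary"},
}


def upstream_columns(column: str, edges: dict[str, set[str]]) -> set[str]:
    """Return every column that `column` depends on, directly or transitively."""
    seen: set[str] = set()
    queue = deque([column])
    while queue:
        current = queue.popleft()
        for source in edges.get(current, ()):
            if source not in seen:  # guards against cycles and repeated work
                seen.add(source)
                queue.append(source)
    return seen


if __name__ == "__main__":
    print(sorted(upstream_columns("dept_report.total_salary", COLUMN_EDGES)))
    # ['deptsal.salary', 'emp.comm', 'emp.sal']
```

Reversing the direction of the edges (source -> targets) and running the same walk gives the downstream, or impact, view instead.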
For more information, or to try it yourself, visit the official Gudu SQLFlow website: https://sqlflow.gudusoft.com.