How to use Gudu SQLFlow data lineage tool to analyze case-when statement?

Gudu SQLFlow is a platform for analyzing data lineage. You can directly submit SQL statements online for analysis, or you can choose to connect to a specified database to obtain metadata, upload a file directory from the local, or obtain scripts from a specified git repository for analysis. In this article, we’ll introduce how to use Gudu SQLFlow data lineage tool to analyze column dependencies in SQL case-when statements.

Gudu SQLFlow Data Lineage Tool
Gudu SQLFlow Data Lineage Tool

A simple case

Consider the following SQL:

1

SQL returns the column teur, and its value is directly in the calculation result of case-when.

In the design concept of Gudu SQLFlow, case-when is considered to be a special function, but unlike ordinary functions, it does not have arguments. Gudu SQLFlow directly analyzes the when, then, and else clauses in case when. What needs to be considered is the when clause. The result of teur is indirectly dependent on the when clause. If different when clauses are satisfied, different then results will be returned, which will affect the return value of the final teur column. Therefore:

2

The when clause here is very similar to the where statement, except that it affects the then clause instead of the resultset column directly.

As the picture shows:

teur and two when clauses: a.kamut=1 and b.teur IS null, a.kamut=1, resulting in an fdr relationship, so

3(1)

teur and two then clauses and else clauses produce an fdd relationship, so

4(1)

Among them, b.teur and teur directly have the relationship of fdd and fdr at the same time. Since the priority of the fdd relationship is higher than that of the fdr relationship, only the fdd relationship is displayed in the graph.

A slightly more complex case, with nested case-when statements

5(1)

6(1)7(1)

Check out the case-when statement in it.

8(1)

You can see that the ResultSet Column FLDR_COMPLETE is affected by a case-when statement, and the second When clause contains a case-when statement.

When the case-when statement as a whole is used as a when clause, the fdd relationship of this clause is actually a part of the fdr relationship of the outer when clause, namely:

9

Therefore, the relationship between the then clause of the inner case-when and the resultset column FLDR_COMPLETE is an fdr relationship, not an fdd relationship.

As shown in the figure, the fdd columns that ultimately affect FLDR_COMPLETE are only the then statement and the else statement of the outer case-when, and the others are displayed as fdr relationships.

Conclusion

The above is an introduction to the processing flow of using Gudu SQLFlow to analyze the case-when statement. For more functions of Gudu SQLFlow, please log on to Gudu SQLFlow official website.

As one of the best data lineage softwares available on the market today, Gudu SQLFlow can not only analyze SQL script files, obtain data lineage, and perform visual display, but also allow users to provide data lineage in CSV format and perform visual display.

Newsletter Updates

Enter your email address below to subscribe to our newsletter