Column impact analysis
Click to Try Live Demo
This demo illustrates how to use general sql parser to analyze column level dataflow in a complex SQL query with join clause and nested subquery. You can modify this demo to fit your own need, or just contact us for an improved demo to meet your requirement.
This is also an answer to the question in this post:
http://www.sqlparser.com/letTalk/viewtopic.php?f=1&t=16736
”
I wonder whether sqlparser provides the capability to match source and target columns to extract the column-level dataflow (lineage), even if the sources are in a subquery. This would be very useful to provide impact analysis feature to ETL mappings in data warehouses and marts.
“
Let’s take this SQL for example:
SELECT a.deptno "Department", a.num_emp/b.total_count "Employees", a.sal_sum/b.total_sal "Salary" FROM (SELECT deptno, COUNT(*) num_emp, SUM(SAL) sal_sum FROM scott.emp GROUP BY deptno) a, (SELECT COUNT(*) total_count, SUM(sal) total_sal FROM scott.emp) b
Our demo can trace the relationship between aliases in top-level select clause and table/view columns in bottom levels, And this demo will let you know that some of the target columns are influenced by a source table itself, for example by the number of records in previous sample SQL as well.
Search a.deptno --> scott.emp.deptno Search a.num_emp --> num_emp(alias) --> aggregate function COUNT(* ) --> table scott.emp --> emp.deptno(group by) --> scott.emp.deptno Search b.total_count --> total_count(alias) --> aggregate function COUNT(* ) --> table scott.emp Search a.sal_sum --> sal_sum(alias) --> aggregate function SUM(SAL) --> table scott.emp --> scott.emp.SAL --> emp.deptno(group by) --> scott.emp.deptno Search b.total_sal --> total_sal(alias) --> aggregate function SUM(sal) --> table scott.emp --> scott.emp.sal
This demo can trace data lineage in views as well. For each column in a view it traces the name or names of the source columns and the expression used (if any). Take this SQL for example:
CREATE VIEW vIMWhseBinID AS SELECT DISTINCT timWhseBin.WhseBinKey, timWhseBin.WhseKey, timWarehouse.WhseKey AS WhseKey2 FROM timWhseBin JOIN timWarehouse ON timWhseBin.WhseKey = timWarehouse.WhseKey
run this demo like this:
Java version: java columnImpact test.sql /c C# version: columnImpact test.sql /v
RESULT(Output):
view=vIMWhseBinID column=WhseBinKey source=timWhseBin.WhseBinKey expression= view=vIMWhseBinID column=WhseKey source=timWhseBin.WhseKey expression= view=vIMWhseBinID column=WhseKey2 source=timWarehouse.WhseKey expression=
Download this demo: C# version, Java version