Visualize your data under Left-Most format with SQLFlow REST API
Gudu SQLFlow is an analysis software for analyzing SQL statements and discovering data lineage relationships. It is often used with metadata management tools and is a basic tool for enterprise data governance. Check this introduction blog if you don’t know much about Gudu SQLFlow.
SQLFlow REST API
Gudu SQLFlow has various functionalities for data lineage generation and it provides you different choices to visualize your data. SQLFlow REST API is one of the most commonly used features. Your data analysis result will be generated and returned from SQLFlow server by sending standard parameterized RESTFul Api request. Apart from report under Json, XML, CSV format, SQLFlow REST API is also able to return result as image(PNG/JPEG) from your SQL.
The Left-Most format
A common usage scenario is that we would like to have a simple and direct data lineage when retrieving data from database or from multiple SQL statements. One of the features that SQLFlow provides is called the left-most result. The left-most format can simplify the data lineage result.
It generates the selected db elements(such as database/schema/table/column etc…) data lineage without the record set. Given data lineage a->b->c
, the returned result under the left-most format is a->c
.
The API Endpoint
To visualize the left-most data lineage with SQLFlow REST API, we need to invoke
/sqlflow/generation/sqlflow/leftMostSourceTableGraph/image
This endpoint takes sessionId
as input. There are two ways to get sessionId
1. Invoke data lineage API to get sessionId
Sample:
curl --location 'https://<SQLFlow url>/gspLive_backend/sqlflow/generation/sqlflow/graph' \
--header 'accept: application/json;charset=utf-8' \
--form 'sqlfile=@"/home/workspace/test2.sql"' \
--form 'dbvendor="dbvoracle"' \
--form 'userId="gudu|0123456789"'
2. Create SQLFlow Job
The following endpoint is to create a SQLFlow:
/sqlflow/job/submitUserJob
Wait few seconds and If we check our job list, we should have this demoJob2 in our job success list. You can verify that ethiter go to the SQLFlow interface or use another SQLFlow REST API /sqlflow/job/displayUserJobsSummary
.
Great! After checking the SQLFlow web, we have confirmed the SQLFlow Job is succeeded.
Now we have our Job created and we need to select our data now. To do that, invoke:
/sqlflow/job/displayUserJobGraph
Give the Job Id as the input and in the response of the above API, we will have the sessionId:
Retrieve left-most data lineage
Now we are good to invoke the left-most SQLFlow Api with the sessionId
ready. You can use tableNamePattern
to map the desired table. Full parameter table here:
Field name | Type | Description |
dbvendor | string | database vendor, support values: – dbvazuresql – dbvbigquery – dbvcouchbase – dbvdb2 – dbvgreenplum – dbvhana – dbvhive – dbvimpala – dbvinformix – dbvmdx – dbvmysql – dbvnetezza – dbvopenedge – dbvoracle – dbvpostgresql – dbvredshift – dbvsnowflake – dbvmssql – dbvsparksql – dbvsybase – dbvteradata – dbvvertica |
hideColumn | boolean | whether hide column |
ignoreFunction | boolean | whether ignore function |
simpleOutput | boolean | simple output, ignore the intermediate results, defualt is false. |
ignoreRecordSet | boolean | same as simpleOutput, but will keep output of the top level select list, default is false. |
jobId | string | give the job Id if need to use the job settings and display the job graph |
normalizeIdentifier | boolean | whether normalize the Identifier, default is true |
showTransform | boolean | whether show transform |
token | string | The token is only used when connecting to the SQLFlow Cloud server, It’s not in use when connect to the SQLFlow on-premise version. |
treatArgumentsInCountFunctionAsDirectDataflow | boolean | Whether treat the arguments in COUNT function as direct Dataflow |
userId | string | the user id of sqlflow web or client |
columnLevel | string | whether to show table level or column level data, false or true |
isReturnModel | boolean | whether to return model |
sessionId | string | the graph session Id |
share | boolean | whether shareable |
showConstantTable | boolean | whether show constant table |
showLinkOnly | boolean | whether show link only |
showRelationType | string | show relation type, required false, default value is ‘fdd’, multiple values seperated by comma like fdd,frd,fdr,join. Availables are ‘fdd’ value of target column from source column, ‘frd’ the recordset count of target column which is affected by value of source column, ‘fdr’ value of target column which is affected by the recordset count of source column, ‘join’ combines rows from two or more tables, based on a related column between them |
tableColumn | string | target table column |
tableNamePattern | string | target table name, supports regex expression |
Sample:
curl --location 'https://<SQLFLOW URL>/gspLive_backend/sqlflow/generation/sqlflow/leftMostSourceTableGraph/image' \
--header 'accept: image/*' \
--form 'userId="gudu|0123456789"' \
--form 'sessionId="85b5119f83fcfefa93136e79dc05f2f61efe3b75e6a2b733568e0879c24f0c08_1680615728918"' \
--form 'dbvendor="dbvmssql"' \
--form 'jobId="6def00e8796e4c20ba216826c0ab5b73"' \
--form 'tableNamePattern="msdb.@event_data.nodes"'