Integrate SQLFlow into Datahub
Datahub is an open-source metadata platform for the modern data stack. We have integrated SQLFlow into Datahub so that SQLFlow data lineage is available in the Datahub UI. This integration is for Datahub v0.10.4. You will need to replace the datahub-web-react-datahub-web-react-assets.jar in Datahub with the SQLFlow-adapted jar file. The file to replace may change between Datahub versions, so please check this blog regularly for the corresponding file name for the latest Datahub release.
Contact james@gudusoft.com to get the corresponding SQLFlow adapted jar file.
1. Install SQLFlow
The SQLFlow Regular version is required for Datahub. You can either install the Regular version of SQLFlow directly on your server or launch a container from the Docker image.
a. Install Directly
Check our product documentation for direct installation instructions.
b. Using Docker Image
Pull the image:
docker pull gudusqlflow/sqlflow-regular-trial:5.7.5
Launch the image with:
docker run -itd -p 7090:80 --name mysqlflow gudusqlflow/sqlflow-regular-trial:5.7.5
The 7090 in the above command is the host port for the SQLFlow UI (it is mapped to port 80 inside the container). You can change it if port 7090 is already in use on your machine.
The mysqlflow in the above command is the name of the container. For more information on creating containers, see the official Docker documentation.
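If you script the launch, checking that the host port is free first avoids a failed docker run. The Python sketch below is a hypothetical helper, not part of SQLFlow: `port_is_free`, `build_run_command` and `launch` are illustrative names, the image tag is the one from the pull command above, and it assumes Docker is installed and on the PATH.

```python
import socket
import subprocess

IMAGE = "gudusqlflow/sqlflow-regular-trial:5.7.5"  # tag from the pull command above


def port_is_free(port: int, host: str = "0.0.0.0") -> bool:
    """Return True if this process can bind host:port right now (best-effort check)."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
        try:
            s.bind((host, port))
        except OSError:
            return False  # something (e.g. another container) already holds the port
        return True


def build_run_command(host_port: int = 7090, name: str = "mysqlflow") -> list[str]:
    """Assemble the docker run command: host_port on the machine maps to port 80 in the container."""
    return ["docker", "run", "-itd", "-p", f"{host_port}:80", "--name", name, IMAGE]


def launch(host_port: int = 7090, name: str = "mysqlflow") -> None:
    """Start the SQLFlow container, refusing to run if the host port is taken."""
    if not port_is_free(host_port):
        raise RuntimeError(f"port {host_port} is occupied; choose another host port")
    subprocess.run(build_run_command(host_port, name), check=True)
```

For example, `launch(7091)` would start the same container on port 7091 if 7090 is taken. The port check only reflects the moment it runs, so docker run can still fail if another process grabs the port in between.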
2. Update the API URL
The first thing to do is to update the GuduSqlFlowUrl in the config.js of datahub-web-react-datahub-web-react-assets.jar:
Open the datahub-web-react-datahub-web-react-assets.jar (please do not extract the jar file and re-compress it, as this may cause encoding issues).
Find public/sqlflow/config.js and copy the file.
Create a new config.js and update the GuduSqlFlowUrl to your actual SQLFlow API URL.
Replace the old public/sqlflow/config.js in datahub-web-react-datahub-web-react-assets.jar with your new config.js.
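If you prefer to script the config.js swap, Python's standard zipfile module can rewrite the jar without unpacking it to disk. This is a minimal sketch under assumptions: `replace_entry` is a hypothetical helper, and the entry name public/sqlflow/config.js is taken from the step above, so confirm it against the jar's actual listing first. Keep a copy of the original jar until Datahub loads correctly.

```python
import os
import tempfile
import zipfile
from pathlib import Path

JAR = "datahub-web-react-datahub-web-react-assets.jar"
ENTRY = "public/sqlflow/config.js"  # assumed entry name; confirm with zipfile.ZipFile(JAR).namelist()


def replace_entry(jar_path: str, entry: str, new_content: bytes) -> None:
    """Rewrite jar_path so that `entry` holds new_content.

    Every other entry keeps its name, timestamp and compression method, and its
    uncompressed bytes are copied as-is: nothing is extracted to disk and no
    text is re-encoded.
    """
    src = Path(jar_path)
    # Build the new jar in a temp file beside the original, then swap it in.
    fd, tmp_name = tempfile.mkstemp(suffix=".jar", dir=src.parent)
    os.close(fd)
    try:
        found = False
        with zipfile.ZipFile(src) as zin, zipfile.ZipFile(tmp_name, "w") as zout:
            zout.comment = zin.comment
            for info in zin.infolist():
                if info.filename == entry:
                    found = True
                    zout.writestr(info, new_content)
                else:
                    zout.writestr(info, zin.read(info))
        if not found:
            raise KeyError(f"{entry} not found in {jar_path}")
        os.replace(tmp_name, src)  # atomic swap on the same filesystem
    finally:
        if os.path.exists(tmp_name):  # only left over if something failed
            os.unlink(tmp_name)


# Example: replace_entry(JAR, ENTRY, Path("config.js").read_bytes())
```

Another option is the JDK's `jar uf` command, which updates a single entry in an existing jar; run it from a directory that contains the new file at public/sqlflow/config.js.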
Hint: Do not use localhost as the GuduSqlFlowUrl. The URL is requested by each user's browser, so localhost would point to the user's own machine instead of the SQLFlow server.
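A quick sanity check on the value you put into GuduSqlFlowUrl can catch the localhost mistake before the jar is rebuilt. The sketch below is illustrative only: `check_sqlflow_url` and `is_loopback_host` are hypothetical names, and sqlflow.example.com is a placeholder for your own server.

```python
import ipaddress
from urllib.parse import urlparse


def is_loopback_host(host: str) -> bool:
    """True if host refers to the browser's own machine rather than a server."""
    host = host.lower()
    if host == "localhost" or host.endswith(".localhost"):
        return True
    try:
        ip = ipaddress.ip_address(host)
    except ValueError:
        return False  # an ordinary DNS name such as sqlflow.example.com
    # 127.0.0.0/8, ::1 and 0.0.0.0 never reach a remote SQLFlow server
    return ip.is_loopback or ip.is_unspecified


def check_sqlflow_url(url: str) -> str:
    """Return url unchanged, or raise ValueError if it cannot serve as GuduSqlFlowUrl."""
    parsed = urlparse(url)
    if parsed.scheme not in ("http", "https") or not parsed.hostname:
        raise ValueError(f"not an absolute http(s) URL: {url!r}")
    if is_loopback_host(parsed.hostname):
        raise ValueError(f"{parsed.hostname} resolves on the viewer's machine, not the SQLFlow server")
    return url
```

For example, `check_sqlflow_url("http://sqlflow.example.com:7090")` returns the URL unchanged, while `http://localhost:7090` raises ValueError.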
3. Replace the jar file
a. Upload the updated datahub-web-react-datahub-web-react-assets.jar to your server.
b. Stop the datahub-frontend-react container
Find the container ID:
docker ps -a
Stop the container:
docker stop <container ID>
c. Back up the old datahub-web-react-datahub-web-react-assets.jar
docker cp datahub-frontend-react:/datahub-frontend/lib/datahub-web-react-datahub-web-react-assets.jar <Backup Address>
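d. Replace the jar file in the datahub-frontend-react container. The updated file must be named exactly datahub-web-react-datahub-web-react-assets.jar, because docker cp keeps the source file name when copying into a directory: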
docker cp <Your_Updated_datahub-web-react-datahub-web-react-assets.jar> datahub-frontend-react:/datahub-frontend/lib/
Then start the datahub-frontend-react container:
docker start <container ID>
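Steps b to d can also be scripted. Docker commands accept a container name as well as an ID, so this hypothetical Python sketch uses the name datahub-frontend-react from the docker cp commands above instead of looking up the ID. `replacement_commands` and `run_all` are illustrative names; by default `run_all` only prints the commands.

```python
import os
import subprocess

CONTAINER = "datahub-frontend-react"  # container name used by the docker cp commands above
JAR_NAME = "datahub-web-react-datahub-web-react-assets.jar"
LIB_DIR = "/datahub-frontend/lib/"


def replacement_commands(new_jar: str, backup_dir: str,
                         container: str = CONTAINER) -> list[list[str]]:
    """Steps b-d in order: stop, back up the old jar, copy in the new one, start."""
    # docker cp into a directory keeps the source file name, so it must match exactly.
    if os.path.basename(new_jar) != JAR_NAME:
        raise ValueError(f"updated jar must be named {JAR_NAME}")
    return [
        ["docker", "stop", container],
        ["docker", "cp", f"{container}:{LIB_DIR}{JAR_NAME}", backup_dir],
        ["docker", "cp", new_jar, f"{container}:{LIB_DIR}"],
        ["docker", "start", container],
    ]


def run_all(commands: list[list[str]], dry_run: bool = True) -> None:
    """Print each command; execute it only when dry_run is False, stopping at the first failure."""
    for cmd in commands:
        print(" ".join(cmd))
        if not dry_run:
            subprocess.run(cmd, check=True)


# Example: run_all(replacement_commands("/tmp/" + JAR_NAME, "/backup"), dry_run=False)
```

Because `check=True` stops at the first failing command, a failed copy leaves the container stopped; run `docker start datahub-frontend-react` after fixing the problem.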
4. Check your Datahub
Open your Datahub UI again; SQLFlow features are now enabled for table-level and field-level data.
a. You can view the upstream and downstream lineage in the SQLFlow tab of the table-level view.
b. Click the Schema tab: a Gudu SQLFlow column has been added to the field information list. Click the Lineage link for a field to view that field's data lineage.
Check this document for more details on how to create a Datahub task with the SQLFlow plugin.