4 Technologies for Metadata Management
Data has become a core driver of enterprise competitiveness, and managing and using it effectively is now a hard requirement. More and more enterprises use metadata management tools to manage data generated in cloud computing, Internet of Things, and data lake environments, so that enterprise data is easier to understand, find, and manage, and its value can be realized. From a technical point of view, the technologies for metadata management fall into four areas: metadata collection, metadata management, metadata application, and metadata interface.
Technologies for Metadata Management – 1. Metadata Collection
In data governance projects, common metadata include data source metadata, data processing metadata, data warehouse or data subject database metadata, data application layer metadata, and data interface service metadata.
The metadata collection service provides adapters for each of the metadata types above. It integrates and processes the collected metadata and stores it in the central metadata warehouse for unified management. Adapters are critical to this process: metadata collection must work with a wide range of databases, ETL tools, data warehouses, and reporting products, as well as structured and semi-structured data sources.
- Relational Database: A metadata adapter collects metadata such as table structures, views, and stored procedures from relational databases such as Oracle, DB2, SQL Server, MySQL, Teradata, and Sybase. Relational databases generally expose their own metadata through a built-in catalog, such as Oracle’s data dictionary views, from which metadata can be read quickly.
- NoSQL Database: Metadata collection tools should support NoSQL databases such as MongoDB, CouchDB, Redis, Neo4j, and HBase. Most NoSQL adapters rely on each database's own facilities for managing and querying schemas.
- DW (data warehouse): For mainstream data warehouses, you can develop custom adapters that collect metadata through the warehouse's own catalog queries. For example, the MPP database Greenplum stores its core metadata in the pg_database, pg_namespace, pg_class, pg_attribute, and pg_proc system catalog tables, so its metadata can be collected with SQL scripts. Hive stores table structure information in an external database (the metastore) and provides statements such as SHOW TABLES and DESCRIBE to query it. Professional metadata collection tools can also be used to collect metadata from a data warehouse system.
- Metadata in the Cloud: As the public cloud matures, cloud-based enterprise metadata management has become a practical extension of core IT infrastructure through secure cloud connectivity, especially among SMBs. It improves access to information across contexts, and pushes real-time metadata management, machine learning models, and metadata APIs into streaming data pipelines to better manage enterprise data assets.
- Other Metadata Adapters: Adapters are also needed for modeling tools (PowerDesigner, ERwin, ER/Studio, EA, and others), ETL tools (PowerCenter, DataStage, Kettle, and others), BI tools (two-dimensional report metadata from front-end tools such as Cognos and Power BI), and Excel (metadata of Excel-format files). No mainstream metadata product on the market achieves “universal adaptation”, so some customized development is usually required in practice.
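The adapter idea described above can be sketched in a few lines. This is only an illustration under stated assumptions: SQLite stands in for the source database because it ships with Python, and the names `ColumnMeta`, `collect_sqlite_metadata`, and `central_repository` are hypothetical, not part of any metadata product. The `GREENPLUM_COLUMN_QUERY` string shows what an analogous catalog query might look like against the pg_class, pg_namespace, and pg_attribute tables mentioned above; it is not executed here.

```python
import sqlite3
from dataclasses import dataclass


@dataclass(frozen=True)
class ColumnMeta:
    """One collected column record (hypothetical structure)."""
    table: str
    column: str
    data_type: str
    nullable: bool


def collect_sqlite_metadata(conn: sqlite3.Connection) -> list:
    """Hypothetical adapter: read table and column metadata from SQLite's catalog."""
    tables = [
        row[0]
        for row in conn.execute(
            "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
        )
    ]
    collected = []
    for table in tables:
        # Quote the table name as an identifier, doubling embedded quotes.
        quoted = '"' + table.replace('"', '""') + '"'
        # PRAGMA table_info yields (cid, name, type, notnull, dflt_value, pk).
        for _cid, name, col_type, notnull, _default, _pk in conn.execute(
            f"PRAGMA table_info({quoted})"
        ):
            collected.append(ColumnMeta(table, name, col_type, not notnull))
    return collected


# Assumed analogue for a PostgreSQL-style catalog such as Greenplum's.
# Shown for comparison only; this sketch never runs it.
GREENPLUM_COLUMN_QUERY = """
SELECT n.nspname, c.relname, a.attname, format_type(a.atttypid, a.atttypmod)
FROM pg_class c
JOIN pg_namespace n ON n.oid = c.relnamespace
JOIN pg_attribute a ON a.attrelid = c.oid
WHERE c.relkind = 'r' AND a.attnum > 0 AND NOT a.attisdropped
"""

# Usage: an in-memory database plays the role of the data source.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customer (id INTEGER NOT NULL, name TEXT)")
central_repository = collect_sqlite_metadata(conn)
for meta in central_repository:
    print(meta)
```

Each source type would get its own adapter that returns records in the same shape, which is what lets the central metadata warehouse treat them uniformly.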
Technologies for Metadata Management – 2. Metadata Management
From a technical point of view, metadata management generally includes functions such as metamodel management, metadata auditing, metadata maintenance, metadata version management, and metadata change management.
- Metamodel Management: Metamodel management builds a metadata warehouse on the metadata platform that conforms to the CWM (Common Warehouse Metamodel) specification, provides unified and centralized management of metamodels, and offers functions such as querying, adding, modifying, and deleting metamodels, managing metadata relationships, and setting permissions. Collecting and managing models, including logical and physical models, lets users see the classification, statistics, usage, and change history of existing metamodels, and manage the life cycle of each metamodel. It also supports model management for application development.
- Metadata Audit: A metadata audit reviews metadata that has been collected into the metadata warehouse but not yet officially published in the data resource directory. The audit process validates the metadata and fixes problems such as missing semantic descriptions, missing fields, wrong types, and missing or unrecognized character encodings.
- Metadata Maintenance: Metadata maintenance is one of the most basic metadata management functions. Both technical and business personnel use this function to view the basic information of metadata.
- Metadata Versioning: When metadata is relatively complete and stable, or at the end of a milestone, it can be finalized and released as a baseline version, so that discrepant or erroneous metadata can later be traced, inspected, and restored.
- Metadata Change Management: Users can subscribe to the metadata they care about. When subscribed metadata changes, the system automatically notifies the user, who can then follow the prompts in the system to query the specific content of the change and the related impact analysis. The metadata management platform also provides a metadata monitoring function: once a change to monitored metadata is detected, it notifies the user as soon as possible.
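Versioning and change management fit together naturally: a change is the difference between two baselines, and a notification is that difference routed to subscribers. The following is a minimal sketch under stated assumptions. The snapshot shape (`{table: {column: type}}`) and the names `diff_baselines` and `ChangeNotifier` are hypothetical, and a real platform would send messages rather than return a list.

```python
from dataclasses import dataclass, field


def diff_baselines(old: dict, new: dict) -> list:
    """Compare two snapshots of {table: {column: type}} and list the changes."""
    changes = []
    for table in sorted(old.keys() | new.keys()):
        if table not in new:
            changes.append(("table_removed", table, None))
        elif table not in old:
            changes.append(("table_added", table, None))
        else:
            old_cols, new_cols = old[table], new[table]
            for col in sorted(old_cols.keys() | new_cols.keys()):
                # A missing column reads as None, so adds and drops also differ.
                if old_cols.get(col) != new_cols.get(col):
                    changes.append(("column_changed", table, col))
    return changes


@dataclass
class ChangeNotifier:
    """Hypothetical subscription registry: table name -> set of subscribers."""
    subscriptions: dict = field(default_factory=dict)

    def subscribe(self, user: str, table: str) -> None:
        self.subscriptions.setdefault(table, set()).add(user)

    def notify(self, changes: list) -> list:
        """Return (user, change) pairs for every subscribed table that changed."""
        return [
            (user, change)
            for change in changes
            for user in sorted(self.subscriptions.get(change[1], ()))
        ]


# Usage: two baselines of the same table, one subscriber.
baseline_v1 = {"orders": {"id": "int", "amount": "decimal(10,2)"}}
baseline_v2 = {
    "orders": {"id": "bigint", "amount": "decimal(10,2)", "currency": "char(3)"}
}

notifier = ChangeNotifier()
notifier.subscribe("alice", "orders")

changes = diff_baselines(baseline_v1, baseline_v2)
notifications = notifier.notify(changes)
for user, change in notifications:
    print(user, change)
```

Keeping released baselines immutable is what makes this comparison meaningful: the diff always runs against a known, published state.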
Technologies for Metadata Management – 3. Metadata Application
- Data Asset Map: A data asset map takes a comprehensive inventory of enterprise data resources, classifies them by data domain, and automatically generates a panoramic map of enterprise data assets from the metadata dictionary. The map tells you what data is available, where to find it, and what to do with it. It displays metadata and data processing flows visually as a topology map, and controls the level of detail through different levels of graphics to support graphical query and auxiliary analysis in different business scenarios.
- Metadata Lineage Analysis: Metadata lineage analysis tells you where the data came from and how it was processed. When a data problem is found, the lineage relationship lets you trace it back to its source and quickly locate where the problem data originated and how it was processed, reducing the time and difficulty of troubleshooting.
- Metadata Impact Analysis: Metadata impact analysis tells you where the data went and how it was processed. When a data problem is found, the association relationships let you trace it downstream and quickly find which applications or databases use the data, minimizing the impact of the problem. This function is often used to analyze how metadata changes in data sources affect downstream ETL, ODS, DW, and other applications.
- Metadata Hot and Cold Analysis: Metadata hot and cold analysis tells you which data is used frequently in the enterprise and which data is dead. Its value lies in visualizing data activity, so that business and management personnel can see clearly how actively data is used, keep it under better control, and dispose of or reactivate dead data, which in turn supports self-service data analysis.
- Metadata Relevance Analysis: Metadata relevance analysis tells you how data is related to other data and how those relationships are established. It examines how specific data is used from the perspective of the other entities associated with an entity and the processes involved, forming a network of entities and the processes they participate in, such as the relationships between tables and ETL programs, between tables and analysis applications, and between a table and other tables. This helps gauge the importance of the entity.
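Lineage analysis and impact analysis are the same graph traversal run in opposite directions: lineage walks upstream toward the sources, and impact walks downstream toward the consumers. Here is a small sketch of that idea. The edge list and the table names in it are invented for illustration, and `build_graph` and `reachable` are hypothetical helper names.

```python
from collections import defaultdict, deque

# Hypothetical lineage edges: (source, target) means "target is derived from source".
EDGES = [
    ("crm.customers", "ods.customers"),
    ("erp.orders", "ods.orders"),
    ("ods.customers", "dw.sales_fact"),
    ("ods.orders", "dw.sales_fact"),
    ("dw.sales_fact", "report.monthly_revenue"),
]


def build_graph(edges):
    """Index the edges in both directions for upstream and downstream walks."""
    downstream, upstream = defaultdict(set), defaultdict(set)
    for source, target in edges:
        downstream[source].add(target)
        upstream[target].add(source)
    return downstream, upstream


def reachable(graph, start):
    """Breadth-first search: every node reachable from start, excluding start."""
    seen, queue = set(), deque([start])
    while queue:
        node = queue.popleft()
        for neighbor in graph.get(node, ()):
            if neighbor not in seen:
                seen.add(neighbor)
                queue.append(neighbor)
    return seen


downstream, upstream = build_graph(EDGES)

# Lineage: where did dw.sales_fact come from?
lineage = reachable(upstream, "dw.sales_fact")
# Impact: what is affected if ods.orders changes?
impact = reachable(downstream, "ods.orders")

print(sorted(lineage))
print(sorted(impact))
```

The `seen` set keeps the walk finite even if the collected lineage happens to contain a cycle.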
Technologies for Metadata Management – 4. Metadata Interface
Establish a unified interface specification for metadata query and access, so that the core metadata of the enterprise can be extracted completely and accurately into the metadata warehouse for centralized management and unified sharing.
The metadata interface specification mainly covers the interface's encoding method, response format, protocol, security, connection method, technical implementation, calling method, message format, and so on.
- Interface encoding method: The encoding must be declared in the interface's header information. Commonly used encodings are UTF-8, GBK, GB2312, and ISO-8859-1.
- Interface response format: XML or JSON, the message formats commonly used by metadata interfaces.
- Interface protocol: REST or SOAP.
- Connection method: HTTP POST.
- Interface security: Token authentication.
- Interface address: http://url/service?[query]
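To make the specification concrete, here is a sketch of how a client might assemble a request that follows it: POST, JSON body, UTF-8 declared in the header, and a token for authentication. Several details are assumptions rather than part of the specification above: the `Bearer` scheme in the `Authorization` header, the shape of the query payload, and the function name `build_metadata_request`. The address is the placeholder from the list above, so the request is only built, never sent.

```python
import json
import urllib.request


def build_metadata_request(base_url: str, token: str, query: dict) -> urllib.request.Request:
    """Build (but do not send) a POST request for a metadata query."""
    body = json.dumps(query).encode("utf-8")
    return urllib.request.Request(
        base_url,
        data=body,
        method="POST",
        headers={
            # The encoding is declared in the header, as the specification requires.
            "Content-Type": "application/json; charset=utf-8",
            "Accept": "application/json",
            # Assumption: the token travels as a Bearer credential.
            "Authorization": f"Bearer {token}",
        },
    )


# Usage with placeholder values; "example-token" and the payload are made up.
request = build_metadata_request(
    "http://url/service",
    "example-token",
    {"table": "dw.sales_fact"},
)
print(request.get_method(), request.full_url)
```

Against a real endpoint, the request would be passed to `urllib.request.urlopen`, and the response decoded using the charset the server declares.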
Conclusion
Thank you for reading our article. We hope it helps you better understand the technologies for metadata management. If you want to learn more about metadata management, we recommend visiting Gudu SQLFlow for more information.
As one of the best data lineage tools available on the market today, Gudu SQLFlow can not only analyze SQL script files, obtain data lineage, and perform visual display, but also allow users to provide data lineage in CSV format and perform visual display.