Data Integration Overview (QuickStart)
What is data integration?
Data integration is the process of combining data from one or more disparate sources to provide users with a unified view. The premise of data integration is to make data more freely available and easier for systems and users to consume and process. Done well, data integration can reduce IT costs, free up resources, improve data quality, and foster innovation, all without sweeping changes to existing applications or data structures. And although IT organizations have always had to integrate data, the payoff for doing so has potentially never been as great as it is now.
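As a minimal sketch of what a "unified view" can mean in practice, the Python below combines two hypothetical sources into one record per customer. The sources (a CRM list and a billing list) and their field names (`customer_id`, `cust`, `balance_cents`) are invented for illustration, and the sketch assumes both sources share an exact customer key. Real integrations usually need far more involved matching rules.

```python
# Two hypothetical sources describing the same customers in different shapes.
crm_records = [
    {"customer_id": 1, "name": "Acme Corp", "region": "EU"},
    {"customer_id": 2, "name": "Globex", "region": "US"},
]
billing_records = [
    {"cust": 1, "balance_cents": 125000},
    {"cust": 2, "balance_cents": 0},
]


def unified_customer_view(crm, billing):
    """Combine CRM and billing rows into one record per customer (illustrative only)."""
    # Index billing rows by their key so each CRM row can be matched directly.
    balances = {row["cust"]: row["balance_cents"] for row in billing}
    view = []
    for row in crm:
        view.append({
            "customer_id": row["customer_id"],
            "name": row["name"],
            "region": row["region"],
            # Customers missing from billing get None rather than a guessed value.
            "balance_cents": balances.get(row["customer_id"]),
        })
    return view


for record in unified_customer_view(crm_records, billing_records):
    print(record)
```

Consumers of the view see one consistent record shape, even though neither source stores data in that shape.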
Why is data integration needed?
The data needed to drive our applications is often distributed across applications, databases, and other data sources, and it may be hosted on-premises, in the cloud, on IoT devices, or provided by third-party applications. Most enterprises no longer keep their data in a single database; instead, they maintain traditional master and transactional data, as well as newer types of structured and unstructured data, across multiple sources.
How important is data integration?
One of the biggest challenges medium and large organizations face is accessing, understanding, and making sense of the data that describes the environment in which they operate. Every day, organizations gather more data, in a variety of formats, from a growing number of data sources.
Each of these organizations needs a way for employees, partners, and customers to capture the value held in this data. Organizations therefore have an urgent need to integrate relevant data, wherever it resides, for the express purpose of supporting their business processes.
How does data integration work?
The traditional approach to data integration is physical data integration, which involves physically moving data from its source systems to a staging area. Within this staging environment, the data is cleansed, mapped, and transformed before it is physically loaded into a target system (such as a data warehouse or a data mart).
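The staging steps described above can be sketched as a small pipeline. This is a minimal illustration under stated assumptions, not a production ETL design: the source rows, the `FIELD_MAP` mapping, the cleansing rule (drop rows without an amount), and the `fact_orders` table are all hypothetical, and an in-memory SQLite database from Python's standard `sqlite3` module stands in for the data warehouse.

```python
import sqlite3

# Hypothetical source rows, as they might arrive from an operational system.
SOURCE_ROWS = [
    {"OrderNo": "1001", "Cust": " acme corp ", "Amt": "19.99"},
    {"OrderNo": "1002", "Cust": "Globex", "Amt": ""},  # missing amount
    {"OrderNo": "1003", "Cust": "initech", "Amt": "5.00"},
]

# Hypothetical mapping from source field names to target column names.
FIELD_MAP = {"OrderNo": "order_id", "Cust": "customer", "Amt": "amount"}


def extract(rows):
    """Physically copy source rows into a separate staging area."""
    return [dict(row) for row in rows]


def cleanse(staged):
    """Trim whitespace and drop rows with no amount (an assumed rule)."""
    cleaned = []
    for row in staged:
        row = {k: v.strip() for k, v in row.items()}  # new dict; staging is untouched
        if row["Amt"]:
            cleaned.append(row)
    return cleaned


def map_fields(staged):
    """Rename source fields to the target schema's column names."""
    return [{FIELD_MAP[k]: v for k, v in row.items()} for row in staged]


def transform(staged):
    """Convert types and normalize values for the target."""
    return [
        {
            "order_id": int(row["order_id"]),
            "customer": row["customer"].title(),
            "amount_cents": round(float(row["amount"]) * 100),
        }
        for row in staged
    ]


def load(rows, conn):
    """Physically write the transformed rows into the target table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS fact_orders "
        "(order_id INTEGER PRIMARY KEY, customer TEXT, amount_cents INTEGER)"
    )
    conn.executemany(
        "INSERT INTO fact_orders VALUES (:order_id, :customer, :amount_cents)",
        rows,
    )
    conn.commit()


warehouse = sqlite3.connect(":memory:")  # stand-in for a data warehouse
staging = extract(SOURCE_ROWS)
load(transform(map_fields(cleanse(staging))), warehouse)
print(warehouse.execute("SELECT * FROM fact_orders ORDER BY order_id").fetchall())
```

Each step reads from and writes to a physical copy of the data, which is the defining trait of this approach: the target ends up holding its own materialized version of the source data.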
The other option is data virtualization, which uses a virtualization layer to connect to the physical data stores. Unlike physical data integration, data virtualization creates virtualized views of the underlying physical environment without physically moving the data.
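By contrast, a virtualization layer can be sketched as an object that stores only how to reach each source and queries them at read time. In this minimal illustration, the `VirtualOrderView` class, the `orders` table, and the dictionary standing in for a third-party customer service are all hypothetical; real virtualization products add query planning, pushdown, caching, and security on top of this basic idea.

```python
import sqlite3
from typing import Callable, Dict, Iterator, Optional

# Physical store 1: a relational database holding a hypothetical orders table.
orders_db = sqlite3.connect(":memory:")
orders_db.execute("CREATE TABLE orders (order_id INTEGER, customer_id INTEGER, amount_cents INTEGER)")
orders_db.executemany("INSERT INTO orders VALUES (?, ?, ?)", [(1, 10, 1999), (2, 20, 500)])

# Physical store 2: a key-value lookup standing in for a third-party service.
customer_service: Dict[int, str] = {10: "Acme Corp", 20: "Globex"}


class VirtualOrderView:
    """Hypothetical virtual view: holds references to the sources, never their data."""

    def __init__(self, db: sqlite3.Connection, lookup: Callable[[int], Optional[str]]):
        self._db = db
        self._lookup = lookup

    def rows(self) -> Iterator[dict]:
        # Each call queries the sources at read time, so results reflect their
        # current state and nothing is copied into a staging area or target.
        for order_id, customer_id, amount in self._db.execute(
            "SELECT order_id, customer_id, amount_cents FROM orders ORDER BY order_id"
        ):
            yield {
                "order_id": order_id,
                "customer": self._lookup(customer_id),
                "amount_cents": amount,
            }


view = VirtualOrderView(orders_db, customer_service.get)
print(list(view.rows()))

# A change in the underlying source is visible immediately through the view.
orders_db.execute("INSERT INTO orders VALUES (3, 10, 250)")
print(len(list(view.rows())))
```

The trade-off is the mirror image of the physical approach: data is always current and never duplicated, but every read pays the cost of reaching the live sources.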