IT Standard Terminology (Glossary)

Definitions of standard terms used in the IT industry.

Application Integration

The sharing of processes and/or data among different application systems within an organization using real-time communication. It is typically implemented to increase application efficiency and improve scalability between systems.

Apache Kafka

A distributed publish-subscribe messaging system that receives data from disparate source systems and makes the data available to target systems in real time. It facilitates asynchronous data exchange between processes, applications, and servers.
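
For illustration, a minimal Java producer sketch, assuming a broker at localhost:9092 and a hypothetical topic named "orders":

```java
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;
import java.util.Properties;

public class ProducerExample {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092");  // assumed broker address
        props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");
        props.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer");

        try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
            // Publish one record to the hypothetical "orders" topic; consumers
            // subscribed to the topic receive it asynchronously.
            producer.send(new ProducerRecord<>("orders", "order-1", "{\"total\": 42.0}"));
        }
    }
}
```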

Change Data Capture (CDC)

Capturing the changes made to a production data source, typically performed by reading the audit or log files.
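
For illustration, a minimal sketch of the kind of change event a log-based CDC process might emit; the shape and field names are hypothetical, not taken from any specific CDC tool:

```java
import java.util.Map;

// Hypothetical shape of one captured change, read from the source's log.
enum Operation { INSERT, UPDATE, DELETE }

record ChangeEvent(
        String table,                // source table the change applies to
        Operation op,                // the kind of change
        Map<String, Object> before,  // row image before the change (null for INSERT)
        Map<String, Object> after,   // row image after the change (null for DELETE)
        long logPosition) {          // offset in the source's transaction log
}
```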

Connector

Software used to create a data connection between systems; the term is often used as a synonym for middleware.

Cloud Data Management

A method to manage data across cloud platforms, either with or instead of on-premises storage, with the goal of curbing rising cloud storage costs. It is emerging as an alternative to traditional data management: instead of buying on-premises storage resources and managing them, resources are bought on demand in the cloud.

Cloud Migration

Movement of data, processes, and applications from on-premises storage or legacy infrastructure to cloud-based infrastructure for storage, application processing, data archiving, and ongoing data lifecycle management.

Data Blending

A data management technique that provides a fast, easy, and flexible method to extract value from multiple data sources and to find patterns without the deployment of a traditional data warehouse architecture.

Data Cleansing

The transformation of data from its native or raw state to a pre-defined or standardized format or structure using customized software.
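
For illustration, a minimal Java sketch of one cleansing rule (trimming whitespace, normalizing case, and standardizing empty values to null); the rule itself is illustrative:

```java
import java.util.Locale;

public class Cleanser {
    // Bring a raw email value to a standardized form.
    static String cleanEmail(String raw) {
        if (raw == null) return null;
        String v = raw.trim().toLowerCase(Locale.ROOT);
        return v.isEmpty() ? null : v;
    }

    public static void main(String[] args) {
        System.out.println(cleanEmail("  JDoe@Example.COM "));  // prints "jdoe@example.com"
    }
}
```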

Data Governance

The application of rules and standards for the management of the availability, usability, integrity, and security of the data stored within an enterprise.

Data Integration

The combination of business and technical processes that are used to merge data from multiple disparate sources into standard architectures for the purpose of gaining meaningful insights.

Data Lake

Refers to raw, largely unstructured data sitting across different storage environments and clouds. The data lake supports data of all types (e.g., text, video, blogs).

Data Mapping

Data mapping is a linking process that creates data element mappings between two different data models for the purposes of integration. It is generally used as an initial step for a wide array of data integration tasks, including data transformation between a data source and a destination.
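
For illustration, a minimal Java sketch of a field-level mapping applied to one record; the source and destination field names are hypothetical:

```java
import java.util.HashMap;
import java.util.Map;

public class FieldMapper {
    // Hypothetical mapping: source field name -> destination field name.
    static final Map<String, String> MAPPING = Map.of(
            "cust_nm", "customerName",
            "cust_ph", "phoneNumber");

    // Copy each mapped field from the source record into the destination record.
    static Map<String, Object> map(Map<String, Object> source) {
        Map<String, Object> target = new HashMap<>();
        MAPPING.forEach((src, dst) -> {
            if (source.containsKey(src)) target.put(dst, source.get(src));
        });
        return target;
    }
}
```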

Data Mart

A data repository that contains data arranged in specific patterns (star schema, snowflake schema, etc.) to support informational applications.

Data Quality

Refers to the overall level of “quality,” or perceived value, of the data. If a particular datastore holds highly relevant data for a project, users regard that data as quality data when there is a high degree of trust in the data values.

Data Virtualization

A data integration approach that allows applications to retrieve and manipulate data without requiring technical details about the data, such as how it is formatted or where it is physically located. Virtualization is seen as an alternative to the traditional ETL process.

Data Warehouse

A central repository of integrated data from disparate sources, storing both current (real-time) and historical data that can be used to create trend reports. Data warehouses are generally used for analytics in an information system.

Extract, Transform, Load (ETL)

The ETL process refers to the three main tasks performed in a data integration/migration process: extracting data from its sources, transforming it into the required format or structure, and loading it into the destination. Each of the phases (Extract, Transform, or Load) can itself include multiple steps.
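
For illustration, a minimal Java ETL sketch, assuming a hypothetical input.csv with name,amount rows and using console output as a stand-in for the load step:

```java
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;

public class SimpleEtl {
    public static void main(String[] args) throws Exception {
        // Extract: read raw rows from the (assumed) source file.
        List<String> rows = Files.readAllLines(Path.of("input.csv"));

        rows.stream()
            .skip(1)                      // skip the header row
            .map(r -> r.split(","))
            // Transform: standardize the name and parse the amount.
            .map(f -> f[0].trim().toUpperCase() + "," + Double.parseDouble(f[1].trim()))
            // Load: print instead of writing to a real warehouse.
            .forEach(System.out::println);
    }
}
```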

Hybrid Cloud

An application environment model that combines an on-premises data center (also called a private cloud) with a public cloud, allowing data and applications to be shared between them.

Master Data Management

An industry-standard term that incorporates processes, policies, standards, tools, and governance to define and manage all of an enterprise’s most critical information in order to provide a single point of reference.

Metadata

Metadata means “data about data,” that is, data that describes other data. It makes finding and working with data easier by allowing the user to sort or locate specific documents.
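
For illustration, a minimal Java sketch that reads file-system metadata (data about a file, not its contents) for a hypothetical report.pdf:

```java
import java.nio.file.Files;
import java.nio.file.Path;

public class MetadataExample {
    public static void main(String[] args) throws Exception {
        Path file = Path.of("report.pdf");  // assumed to exist
        // Each value describes the file rather than being part of its contents.
        System.out.println("Size (bytes):  " + Files.size(file));
        System.out.println("Last modified: " + Files.getLastModifiedTime(file));
        System.out.println("Content type:  " + Files.probeContentType(file));
    }
}
```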

Namespace

An XML namespace is a collection of names that can be used as element or attribute names in an XML document. The role of the namespace is to qualify element names uniquely on the Web to avoid conflicts between elements with the same name.
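
For illustration, a minimal XML sketch in which two "title" elements are disambiguated by namespaces; the namespace URIs are illustrative, not real schemas:

```xml
<!-- Both elements are named "title", but each namespace prefix binds the
     name to a different URI, so the two never conflict. -->
<catalog xmlns:book="http://example.com/book"
         xmlns:movie="http://example.com/movie">
  <book:title>The Pragmatic Programmer</book:title>
  <movie:title>The Matrix</movie:title>
</catalog>
```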

REST

REST (Representational State Transfer) is a software architectural style for distributed hypermedia systems, used in the development of Web services. Distributed applications send and receive data via REST.
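
For illustration, a minimal Java sketch of a REST call using the JDK's built-in HTTP client; the endpoint URL is hypothetical:

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

public class RestExample {
    public static void main(String[] args) throws Exception {
        HttpClient client = HttpClient.newHttpClient();
        // GET a representation of a resource from a hypothetical endpoint.
        HttpRequest request = HttpRequest.newBuilder(
                URI.create("https://api.example.com/customers/42"))
            .GET()
            .build();
        HttpResponse<String> response =
                client.send(request, HttpResponse.BodyHandlers.ofString());
        System.out.println(response.statusCode());  // e.g. 200
        System.out.println(response.body());        // resource representation, often JSON
    }
}
```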

S3

The S3 protocol is used in a URL that specifies the location of an Amazon S3 (Simple Storage Service) bucket and a prefix to use for reading or writing files in the bucket.
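
For illustration, a hypothetical S3 URL; here example-bucket is the bucket and sales/2024/ is the prefix under which files are read or written:

```
s3://example-bucket/sales/2024/
```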

Scalability

The ability to increase and scale up key system services: inbound and outbound volumes of data, the accessibility of processes and services, and the number of users accessing data critical to the enterprise.

Software as a Service (SaaS)

A software delivery model in which software is licensed on a subscription basis, centrally hosted, and typically accessed by end users through a client (e.g., a web browser or mobile device).

Spring Framework

An open-source application framework that provides infrastructure support, enabling faster development of Java applications by externalizing configuration and using dependency injection.
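
For illustration, a minimal Java sketch of constructor-based dependency injection, assuming these classes are picked up by Spring's component scanning; the class names are hypothetical:

```java
import org.springframework.stereotype.Service;

@Service
class GreetingService {
    String greet(String name) { return "Hello, " + name; }
}

@Service
class Greeter {
    private final GreetingService service;

    // Spring supplies the GreetingService instance at construction time;
    // this class never creates its own dependency.
    Greeter(GreetingService service) {
        this.service = service;
    }

    void run() { System.out.println(service.greet("world")); }
}
```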

Spring Boot

Spring Boot is an extension of the Spring Framework with features that make applications easier to build and run within the developer ecosystem. The extension includes pre-configured web starter kits and an embedded server that takes on the responsibilities of a standalone application server.
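
For illustration, a minimal Spring Boot application, assuming the spring-boot-starter-web dependency is on the classpath; this single class starts an embedded web server with one endpoint:

```java
import org.springframework.boot.SpringApplication;
import org.springframework.boot.autoconfigure.SpringBootApplication;
import org.springframework.web.bind.annotation.GetMapping;
import org.springframework.web.bind.annotation.RestController;

@SpringBootApplication
@RestController
public class DemoApplication {
    public static void main(String[] args) {
        // Boots the embedded server; no standalone application server needed.
        SpringApplication.run(DemoApplication.class, args);
    }

    @GetMapping("/hello")
    public String hello() {
        return "Hello from Spring Boot";
    }
}
```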

Unstructured Data

Unstructured data refers to data that doesn’t fit nicely into the traditional database architecture and has no identifiable internal order or structure. It is considered to be the opposite of structured data, which is data stored in a database.