On this website there may be terms that you are not familiar with.
As with all specialised professions there is a vocabulary that is specific to that industry. Without understanding that vocabulary you may not be able to contribute and interact with it to the level you require.
This is made more difficult in emerging technology sectors, which both reuse commonly available terminology and create new terms.
We hope that this Glossary of Terms will help you on your way to understanding all that we do, and the industry we are leaders and exponents of.
|Advanced Analytics||"Advanced Analytics" describes products and techniques that produce insights that traditional approaches to business intelligence, such as queries and reporting, are unlikely to discover.
The traditional analytical tools that comprise basic business intelligence generally examine historical data; tools for advanced analytics focus on forecasting future events and behaviours.
Examples of techniques under the advanced analytics umbrella include predictive analytics, data mining, big data analytics, machine learning, pattern matching, forecasting, semantic analysis, sentiment analysis, network and cluster analysis, multivariate statistics, graph analysis, simulation, complex event processing, neural networks, and location intelligence.
|Analytics||"Analytics" describes the practice of putting pieces of information together into new and different combinations -- data models, multidimensional models, predictive models, algorithms, etc. -- such that they approximate a richer, more revealing, more actionable world.
Analytics can take the form of a basic business fact -- sales of this product in this store in this region for this period is one example. An historical analysis could ask how sales of this product at this time compare with sales at this time last year or five years ago.
More sophisticated types of analytics (Advanced Analytics) are far from mundane, however. They use statistical techniques to generate an output -- a prediction, a prescription, a simple correlation -- that is a function of one or more input variables.
|Analytics Maturity Model||An "Analytics Maturity Model" is a tool to assess the stage of evolution of an organization in its ability to integrate, manage, and leverage all relevant internal and external data sources into key decision points.
It is typically divided into levels or stages. The idea is that you cannot move to a higher stage until you have comprehensively met the requirements of all stages below. The goal is to achieve the highest step. It provides a framework for companies to understand where they are, where they've been, and where they still need to go in their analytics deployments.
Note that analytics maturity is not simply about having some technology in place; it involves technologies, data management, analytics, governance and organizational components. It can take years to create and instil an analytics culture in an organization.
|Automated Data Preparation||Preparing data for analysis is one of the most important steps in any project and, traditionally, one of the most time-consuming.
"Automated Data Preparation" is where source data is prepared automatically into a desired output format, applying data quality checks to clean up poor quality data, and transforming and manipulating the data into an output format suitable for onward analysis or operational use.
Automated data preparation tools differ from Data Warehouse Automation tools in that the latter are typically focused on producing Kimball style star schema outputs. Automated data preparation tools can produce any desired output format.
|Big Data||In 2001 Doug Laney of Gartner articulated the now-mainstream definition of "Big Data" as the three Vs:
Volume. Organizations collect data from a variety of sources, including business transactions, social media, and sensor or machine-to-machine data. In the past, storing it would've been a problem – but new technologies (such as Hadoop) have eased the burden.
Velocity. Data streams in at an unprecedented speed and must be dealt with in a timely manner. RFID tags, sensors and smart metering are driving the need to deal with torrents of data in near-real time.
Variety. Data comes in all types of formats – from structured, numeric data in traditional databases to unstructured text documents, email, video, audio, stock ticker data and financial transactions.
More broadly, the phenomenon of big data actually reflects an unprecedented change in how we as human beings imagine and make sense of our world. In the old information economy, the multidimensional OLAP cube was seen as sufficient to capture and model the world of human events. Increasingly, however, we've come to know this world as probabilistic, or at least as nondeterministic; as a context for interdependent events, interactions, and transactions that can't be adequately captured and modelled in the too-limited dimensional context of an OLAP cube. "Big data" is a catch-all term for the ongoing shift to this new information economy. It's an economy that uses statistical, numerical, and analytical technologies and methods to capture and model a richer, more contextualized—an n-dimensional—world.
It's an economy that requires more data, of different types and varieties, at greater rates, than ever before. It's an economy that constantly requires data—that expects to consume data as it streams and pulses. It's an economy, then, for which the old tools—such as the batch-driven data integration and data warehouse model, with its static production reports and its OLAP-based dashboards and analytics—are insufficient. Not outmoded but insufficient. The new information economy needs more: data, tools, technologies, and methods. That's what the umbrella term "big data" is all about.
|Business Intelligence (BI)||The term Business intelligence (BI) describes technology-driven processes for collecting and analyzing business relevant information. The purpose of the analysis is to provide actionable insights to help executives, managers and other corporate end users make informed business decisions based on hard facts.
BI encompasses a wide variety of tools, applications and methodologies that enable organizations to collect data from internal systems and external sources, prepare it for analysis, and then develop and run queries against that data. The query outputs are used to create data visualizations to make the analytical results both easily available and understandable by corporate decision-makers at all levels.
BI applications often source their data from corporate data warehouses or data marts.
|Business Rule|| A "Business Rule" expresses specific constraints on the creation, updating, and removal of persistent data in an information system. For example, the record of a purchase order may not be entered if the customer's credit rating is not adequate. Business rules define how knowledge in one form may be transformed into other knowledge, possibly in a different form.
Business Rules may be defined by convention, industry standard, or government legislation (which we term Standard Business Rules), or by an individual organisation for its own use (which we term a Custom Business Rule).
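The purchase order example above can be sketched in a few lines of Python. This is an illustration only: the `Customer` class, the `enter_purchase_order` function and the `MIN_CREDIT_RATING` threshold are hypothetical names and values chosen for the sketch, not part of any particular system.

```python
from dataclasses import dataclass

# Hypothetical threshold; a real rule would come from the organisation's policy.
MIN_CREDIT_RATING = 600


@dataclass
class Customer:
    name: str
    credit_rating: int


class BusinessRuleViolation(Exception):
    """Raised when a record would break a business rule."""


def enter_purchase_order(orders: list, customer: Customer, amount: float) -> None:
    # The rule constrains the creation of persistent data: no order is
    # recorded unless the customer's credit rating is adequate.
    if customer.credit_rating < MIN_CREDIT_RATING:
        raise BusinessRuleViolation(f"credit rating too low for {customer.name}")
    orders.append({"customer": customer.name, "amount": amount})


orders = []
enter_purchase_order(orders, Customer("Acme Ltd", 720), 1500.0)
try:
    enter_purchase_order(orders, Customer("Risky Co", 410), 900.0)
except BusinessRuleViolation as exc:
    rejected = str(exc)
```

Only the first order is stored; the second is refused before anything is written.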
|Columnar database||A "Columnar Database" is a database management system that stores data in columns instead of rows.
In a columnar database, all the column 1 values are physically together, followed by all the column 2 values, etc. The data is stored in record order, so the 100th entry for column 1 and the 100th entry for column 2 belong to the same input record. This allows individual data elements, such as customer name for instance, to be accessed in columns as a group, rather than individually row-by-row.
By storing data in columns rather than rows, the database can more precisely access the data it needs to answer a query rather than scanning and discarding unwanted data in rows. Query performance is often increased as a result, particularly in very large data sets.
Another of the main benefits of a columnar database is that data can be highly compressed. The compression permits columnar operations, like MIN, MAX, SUM, COUNT and AVG, to be performed very rapidly. Also, because a column-based DBMS is self-indexing, it uses less disk space than a relational database management system (RDBMS) containing the same data.
Both columnar and row databases use traditional database languages like SQL to load data and perform queries. Both row and columnar databases can become the backbone in a system to serve data for common ETL and data visualization tools.
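As a rough illustration of the difference in layout, the sketch below holds the same three records row-wise and column-wise using plain Python lists. The sample records and field names are invented for the example, and no real database engine is involved.

```python
# The same three records held row-wise and column-wise.
rows = [
    {"customer": "Ann", "region": "North", "sales": 120},
    {"customer": "Bob", "region": "South", "sales": 80},
    {"customer": "Cy", "region": "North", "sales": 200},
]

# Column layout: one list per column, all in record order, so index i in
# every list belongs to the same input record.
columns = {key: [row[key] for row in rows] for key in rows[0]}

# Row layout: every whole record is visited to total one field.
total_from_rows = sum(row["sales"] for row in rows)

# Column layout: only the "sales" column is read.
total_from_columns = sum(columns["sales"])

# The third record can still be reassembled from position 2 of each column.
third_record = {key: values[2] for key, values in columns.items()}
```

Both totals agree; the column layout simply avoids touching the `customer` and `region` values when only `sales` is needed.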
|Columnstore Index||A "Columnstore Index" is a type of data structure that is used to store, manage and retrieve data that is stored in a columnar or columnar-style database.
A columnstore index stores data in a column-wise (columnar) format, unlike the traditional B-tree structures used for clustered and nonclustered rowstore indexes, which store data row-wise (in rows).
This structure can offer significant performance gains for queries that summarize large quantities of data, the sort typically used for business intelligence (BI) and data warehousing.
Beginning with SQL Server 2012, you can now define columnstore indexes on SQL Server database tables.
|Dark Data|| "Dark Data" is a term coined by Gartner to describe all of the data an organization collects, processes and stores during regular business activities but doesn't use for analysis, and is therefore not used to derive insight or for decision making.
With the IoT, the ability of an organization to collect data gathered by sensors may exceed the rate at which it can analyze it. In some cases the organization may not even be aware that the data is being collected.
|Data Analytics||"Data Analytics" is the science of examining raw data with the purpose of drawing conclusions about that information. Data analytics is used to allow companies and organizations to make better business decisions and in the sciences to verify or disprove existing models or theories.
Data analytics is distinguished from data mining by the scope, purpose and focus of the analysis. Data miners sort through huge data sets using sophisticated software to identify undiscovered patterns and establish hidden relationships. Data analytics focuses on inference, the process of deriving a conclusion based solely on what is already known by the researcher.
|Data Architecture||A "Data Architecture" is a set of models, policies, rules and standards that govern which data is collected, and how it is stored, arranged, integrated, and put to use in organizations and in their data systems.
Data architecture is a broad term that refers to all of the processes and methodologies that address data at rest, data in motion, and data sets, and how these relate to data dependent processes and applications. It lays out the criteria on processing operations including the whole flow of the system.
|Data Automation||Automation is defined as the use of control systems and information technologies to reduce the need for human work in the production of goods and services.
"Data Automation" is the use of electronic, electromechanical, or mechanical equipment and associated techniques to automatically record, communicate, and process data and to present the resultant information. GPS and bar code systems are good examples of 'source data' data automation tools, but the term covers downstream activities as well; see for example Data Warehouse Automation.
|Data Governance||"Data Governance" is the creation of plans, policies, rules, and accountability for overseeing data, at rest or in motion, within an enterprise.
|Data Gravity||"Data Gravity" is a term first coined in a blog by Dave McCrory. McCrory said to consider data as if it were an object with mass.
Newton's law of gravitation states that the attraction between two objects is proportional to the product of their masses. As the mass of an object increases, so does the strength of its gravitational pull.
As Data accumulates (builds mass) there is a greater likelihood that additional services, applications, and even customers will be attracted to this data.
Big Data mostly resides in the cloud; data gravity is now a driver for putting applications into the cloud as well; moving large volumes of data can be expensive compared to hosting an application in the cloud close to the data.
|Data Lake||The term "Data Lake" was originally coined by James Dixon, Pentaho Chief Technology Officer.
A data lake is a storage repository that holds a vast amount of raw data in its native format until it is needed. While a hierarchical data warehouse stores data in files or folders, a data lake uses a flat architecture to store data. Each data element in a lake is assigned a unique identifier and tagged with a set of extended metadata tags. When a business question arises, the data lake can be queried for relevant data, and that smaller set of data can then be analyzed to help answer the question.
Before we can load data into a data warehouse, we first need to give it some shape and structure—i.e., we need to model it. That is called schema-on-write.
With a data lake, you just load in the raw data, as-is, and then when you're ready to use the data, that's when you give it shape and structure. That is called schema-on-read.
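The contrast between the two approaches can be sketched as follows. This is a toy model, assuming JSON text as the raw format; the function names and the two-column warehouse schema are hypothetical.

```python
import json

# Schema-on-write: the shape is fixed before loading, so a record that
# does not fit is rejected at load time.
WAREHOUSE_COLUMNS = ("id", "amount")


def load_into_warehouse(table: list, record: dict) -> None:
    if set(record) != set(WAREHOUSE_COLUMNS):
        raise ValueError(f"record does not match schema: {record}")
    table.append(tuple(record[c] for c in WAREHOUSE_COLUMNS))


# Schema-on-read: raw text is stored as-is and given shape only when queried.
lake = []


def load_into_lake(raw: str) -> None:
    lake.append(raw)


def read_amounts() -> list:
    # The reader decides which fields matter and how to interpret them.
    return [float(json.loads(raw).get("amount", 0)) for raw in lake]


load_into_lake('{"id": 1, "amount": "9.50", "note": "free text"}')
load_into_lake('{"id": 2, "amount": 3}')
amounts = read_amounts()
```

The lake accepts both records even though their shapes differ; the interpretation (here, treating `amount` as a number) is applied only at read time.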
|Data Lineage||"Data Lineage", or data provenance, in its most general form describes where data came from, how it was derived, and how it was updated over time.
Within a data warehouse, Data Lineage answers the question "where did this data element come from?". Data Lineage is used to support:
• in-depth data analysis/traceability
• investigation of anomalous data values
• impact analysis
Impact Analysis determines which tables, columns and processes are affected by changes. In addition, lineage information allows the user to trace the impact of faulty source data or buggy programs on derived data sets.
At the corporate level, for organisations that are concerned about critical and sensitive data, the topic of data lineage has become increasingly important. Today's environment of corporate regulation and stringent data governance is intended to reduce the many risks associated with managing data, such as security, privacy and intentional and accidental exposure of sensitive data. To minimise risks, and achieve compliance, whether external or internal, companies must be able to demonstrate where data comes from, where it flows to, and how it's transformed as it travels through the enterprise. This is what data lineage is all about; documenting where data is, and how it flows so it can be managed and secured appropriately as it moves across the corporate network.
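Impact analysis over lineage information amounts to walking a dependency graph. The sketch below assumes lineage is available as a simple mapping from each data item to the items derived directly from it; the table and report names are invented for illustration.

```python
from collections import deque

# Hypothetical lineage map: each item lists the items derived directly from it.
derived_from = {
    "crm.customers": ["staging.customers"],
    "staging.customers": ["dw.dim_customer"],
    "dw.dim_customer": ["report.sales_by_customer", "report.churn"],
    "erp.orders": ["dw.fact_sales"],
    "dw.fact_sales": ["report.sales_by_customer"],
}


def impacted_by(source: str) -> set:
    """Return everything downstream of `source` using a breadth-first walk."""
    seen = set()
    queue = deque([source])
    while queue:
        for child in derived_from.get(queue.popleft(), []):
            if child not in seen:
                seen.add(child)
                queue.append(child)
    return seen


affected = impacted_by("crm.customers")
```

Here a fault in `crm.customers` reaches the staging table, the customer dimension and both reports, but not `dw.fact_sales`.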
|Data Management||"Data Management" is concerned with the end-to-end lifecycle of data, from creation to retirement, and the controlled progression of data to and from each stage within its lifecycle.
Data Management is the process by which data is acquired, validated, stored, protected, and processed, and by which its accessibility, reliability, and timeliness are ensured to satisfy the needs of the data users.
Data management minimizes the risks and costs of regulatory non-compliance, legal complications, and security breaches. It also provides access to accurate data when and where it is needed, without ambiguity or conflict, thereby avoiding miscommunication.
|Data Management Maturity model||A "Data Management Maturity Model" is a framework for the evaluation and assessment of an organisation's data management capabilities. The model allows an organisation to evaluate the current state of its data management capabilities, discover gaps to remediate, and strengths to leverage.
The model is used as a comparison benchmark when evaluating the organisation's capability to implement data management strategies and the level at which that organisation could be at risk from said strategies.
The more mature an organisation is against this benchmark, the lower its exposure to the risks associated with poor data management practices.
|Data Mart||A "Data Mart" is a simple form of a data warehouse that is focused on a single subject (or functional area). A data warehouse is a central repository for all an organization's data. The goal of a data mart, however, is to meet the particular demands of a specific group of users within the organization, such as Sales, or Finance. Generally, an organization's data marts are subsets of the organization's data warehouse.|
|Data Mining||"Data mining" is sorting through data to identify patterns and establish relationships.
The phrase data mining is commonly misused to describe software that presents data in new ways. True data mining software doesn't just change the presentation, but actually discovers previously unknown relationships among the data.
Data mining is popular in the science and mathematical fields but is also utilized increasingly by marketeers trying to distill useful consumer data from Web sites.
|Data Preparation||"Data Preparation" covers a range of processing activities that transform data sources into a format, quality and structure suitable for further analytical or operational processing. The process of preparing data generally entails correcting any errors (typically from human and/or machine input), filling in nulls and incomplete data, and merging data from several sources or data formats.|
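A minimal sketch of these activities, assuming two small in-memory sources: it tidies obviously malformed names, fills nulls with defaults and merges the sources on a shared key. The source names, fields and default values are hypothetical, and filling a missing age with 0 is purely illustrative.

```python
# Two hypothetical sources describing the same customers.
crm = [
    {"id": 1, "name": " ann ", "age": None},
    {"id": 2, "name": "BOB", "age": 41},
]
billing = [
    {"id": 1, "balance": 10.0},
    {"id": 2, "balance": None},
]


def prepare(crm, billing, default_age=0, default_balance=0.0):
    # Index the second source by key so the two can be merged.
    balances = {row["id"]: row["balance"] for row in billing}
    prepared = []
    for row in crm:
        balance = balances.get(row["id"])
        prepared.append({
            "id": row["id"],
            # Correct input errors: stray whitespace and inconsistent case.
            "name": row["name"].strip().title(),
            # Fill nulls with an agreed default.
            "age": row["age"] if row["age"] is not None else default_age,
            "balance": balance if balance is not None else default_balance,
        })
    return prepared


clean = prepare(crm, billing)
```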
|Data Quality||"Data Quality" is the degree to which a set of characteristics of data fulfills requirements. Requirements are defined as a need or expectation that is stated, generally implied or obligatory. Examples of characteristics are: completeness, validity, accuracy, consistency across sources, availability and timeliness.
Data quality is a relative and never-ending judgment; one that needs to be defined by the business (or business unit) that's consuming the data.
|Data Refinery||A "Data Refinery" is a process for unifying, cleansing and verifying data content, leaving it in a state ready for subsequent processing or consumption (analytics, reporting, etc). A data refinery process (data refining) can underpin, streamline, and simplify the creation of data warehouse content over and above what can then be achieved using data warehouse automation tools. It supports greater efficiency in the use of Self Service BI tools.|
|Data Space Frame®||"Data Space Frame" is a complete set of documented design principles, methodologies and standards for creating enterprise grade data management solutions. These design standards enable any organization in any sector to create data management products that satisfy the 'build once, deploy many' mantra.
Products created using Data Space Frame are locally configurable and infinitely expandable whilst still maintaining the single code base at the core. This separation means the product can be upgraded in the field to a new release without losing any of the local modifications or configurations in place in each deployed implementation.
Data Space Frame is a registered trademark of Insource Limited.
|Data Storytelling||Once a business has started collecting and combining all kinds of data, the next elusive step is to extract value from it. The data may hold tremendous amounts of potential value, but not an ounce of value can be created unless insights are uncovered and translated into actions or business outcomes.
"Data Storytelling" is the process of translating data analyses into layman's terms in order to influence a business decision or action. With the rise of digital business and data-driven decision making, data storytelling has become a much-talked-about skill.
The idea is to connect the dots between sophisticated data analyses and the decision makers, who may not have the ability to interpret the data. To date, there is no set of best practices on how to tell compelling data stories, but experts often describe data storytelling in traditional storytelling terms, which include a "hook" or a device to draw the listener or reader in, themes, the use of emotion and a conclusion or a set of conclusions.
|Data Warehouse||A "Data Warehouse", also known as an Enterprise Data Warehouse, is a system used for reporting and data analysis, and is considered a core component of a business intelligence environment. A Data Warehouse is a central repository of integrated data from one or more disparate sources. Data warehouses store current and historical data and are used for creating analytical reports for knowledge workers throughout an enterprise. Examples of reports could range from annual and quarterly comparisons and trends to detailed daily sales analysis.
The term data warehouse was coined by William H. Inmon, who is known as the Father of Data Warehousing. Inmon described a data warehouse as being a subject-oriented, integrated, time-variant and nonvolatile collection of data that supports management's decision-making process.
|Data Warehouse Automation||"Data Warehouse Automation" is a way to gain efficiencies and improve effectiveness in data warehousing processes.
Every industry since Henry Ford has used automation to increase productivity, reduce manual effort, improve quality and consistency, reduce cost and increase speed of delivery.
Data Warehouse Automation is much more than simply automating the development process. It encompasses all of the core processes of data warehousing including design, development, testing, deployment, operations, impact analysis and change management.
|Delta Load||A "Delta Load" is most usually associated with loading data into data warehouses, but applies whenever data is being imported from a source system into a target system.
A delta load processes a source dataset (aka a delta file, or delta feed) that contains all of the changed items in the source since the previous delta file was produced. These records will all need updating in the target database (or data warehouse).
The delta file is most often produced by the source application itself (e.g. perhaps as an audit file). Usually the delta file is small and very efficient to access, cutting down on the load time compared to processing all the source data every load.
A delta load is a kind of incremental load, but differs in that the delta file already contains all the changed records; the extraction process does not itself need to identify changed data in the source by looking for records modified since the previous incremental load occurred.
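Applying a delta file can be sketched as below. The layout of the delta records, including the `op` field with `insert`, `update` and `delete` values, is an assumption made for the example; real source applications produce their own formats.

```python
# Target table keyed by primary key.
target = {
    1: {"name": "Ann", "city": "Leeds"},
    2: {"name": "Bob", "city": "York"},
}

# Delta file as produced by the source application: only the records that
# changed since the previous delta. The "op" field is a hypothetical layout.
delta_file = [
    {"op": "update", "id": 2, "name": "Bob", "city": "Hull"},
    {"op": "insert", "id": 3, "name": "Cy", "city": "Bath"},
    {"op": "delete", "id": 1},
]


def apply_delta(target: dict, delta: list) -> None:
    # No change detection is needed here; every delta record is applied.
    for change in delta:
        key = change["id"]
        if change["op"] == "delete":
            target.pop(key, None)
        else:
            target[key] = {k: v for k, v in change.items() if k not in ("op", "id")}


apply_delta(target, delta_file)
```

The loader never scans the full source; it only replays the changes it was handed.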
|DevOps||"DevOps" is a practice and environment where building, testing, and releasing software can happen rapidly, frequently, and reliably.
DevOps involves operations and development engineers participating together in the entire service lifecycle, from design through the development process to production support, for resilient systems at scale.
Because DevOps is a collaboration between development, operations and testing, there is no single DevOps 'tool'; rather there is a set of tools which is reflective of the software development and delivery process:
• Code – Code development and review, continuous integration tools
• Build – Version control tools, code merging, build status
• Test – Testing tools and results that determine performance
• Package – Artifact repository, application pre-deployment staging
• Release – Change management, release approvals, release automation
• Configure – Infrastructure configuration and management, Infrastructure as Code tools
• Monitor – Applications performance monitoring, end user experience
|Digital Maturity Model||Digital Maturity is the ability of an organization to respond to the immediate needs of a digitally savvy customer base, while ensuring its long-term survival using technology to operate more effectively and efficiently.
A "Digital Maturity Model" is a framework that is used as a benchmark for comparison when looking at an organisation's processes for identifying, articulating and executing on digital opportunities that will increase that organisation's competitive advantage.
Organisations that are further along within the maturity model are more likely to repeatedly achieve successful completion of their projects.
Specifically for the NHS, NHS England has produced a Digital Maturity Model and an associated assessment tool that measures the extent to which healthcare services in England are supported by the effective use of digital technology.
For 2015/16, the following five objectives for the Digital Maturity Self-assessment process have been identified:
• To identify key strengths and gaps in providers' ability to operate paper-free at the point of care
• To support internal planning, prioritisation and investment decisions within providers towards operating paper-free
• To support planning and prioritising of investment decisions within commissioner-led footprints to move local health and care economies towards operating paper-free
• To provide a means of baselining / benchmarking nationally the current ability of providers to operate paper-free
• To identify the capacity and capability gaps in local economies to transform services and operate paper-free
See here for more details:
|ELT||"ELT" stands for Extract, Load, Transform, a variation on the better known ETL set of processes.
•Extract is the process of reading data from a source into a staging database
•Load is the process of writing the data into the target database after basic integrity checks have been completed. This provides a validated and cleaned offline copy of the source data in the target database or data warehouse.
•Transform is the process of re-shaping the loaded data into its desired output format. Transformation occurs by using rules or lookup tables or by combining the data with other data.
When using ETL the transformations are processed by the ETL tools, while in ELT the transformations are processed by the target database.
ELT has the benefit of minimizing the processing on the source system, since no transforming is being done, which can be extremely important if the source is a production system where you could be impacting end-user performance.
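The three ELT steps can be sketched with Python's built-in `sqlite3` module, using an in-memory SQLite database as a stand-in for the target database. The table names and sample rows are hypothetical; the point is that the raw data lands in the target first and the reshaping is then done by the target's own SQL engine.

```python
import sqlite3

# Extract: rows read from a (hypothetical) source system, left untransformed.
source_rows = [
    ("2024-01-05", "north", "120.50"),
    ("2024-01-05", "south", "80.00"),
    ("2024-01-06", "north", "30.25"),
]

target = sqlite3.connect(":memory:")  # stands in for the target database

# Load: write the raw data into the target as-is.
target.execute("CREATE TABLE raw_sales (sale_date TEXT, region TEXT, amount TEXT)")
target.executemany("INSERT INTO raw_sales VALUES (?, ?, ?)", source_rows)

# Transform: the reshaping is done by the target database engine, in SQL.
target.execute("""
    CREATE TABLE sales_by_region AS
    SELECT UPPER(region) AS region, SUM(CAST(amount AS REAL)) AS total
    FROM raw_sales
    GROUP BY region
""")

totals = dict(target.execute("SELECT region, total FROM sales_by_region"))
raw_count = target.execute("SELECT COUNT(*) FROM raw_sales").fetchone()[0]
```

After the transform, both the raw rows and the reshaped table are available in the target.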
|Embedded Analytics||"Embedded analytics" is the use of reporting and analytic capabilities in transactional business applications. These capabilities can reside outside the application, reusing the analytic infrastructure built by many enterprises, but must be easily accessible from inside the application, without forcing users to switch between systems.
The goal is to help users work smarter by bringing together relevant data and analytics that help solve business problems, and to work more efficiently because these capabilities are available inside the applications that are used every day.
|Embedded BI||"Embedded BI" is another name for Embedded Analytics. Refer to the Embedded Analytics definition.|
|Enterprise Class Software||"Enterprise Class Software" is another name for Enterprise Grade Software. Refer to the Enterprise Grade Software definition.|
|Enterprise Data Warehouse||An "Enterprise Data Warehouse" is another name for a Data Warehouse.|
|Enterprise Grade Software||"Enterprise Grade Software", also known as Enterprise Class Software, is a term that refers to applications that are designed to be robust and scalable across a large organization, and that are reliable and powerful enough to serve companies of any size. There are no firm standards for what makes an application or platform enterprise grade, but enterprise grade applications are generally:
|ETL||"ETL" is short for Extract, Transform, Load, three processes used in database and data warehouse solutions.
With ETL only the final transformed data is available in the target database.
ETL can be contrasted with ELT (Extract, Load, Transform), which transfers raw data from a source onto the target server, which then prepares (transforms and loads) the information for downstream uses.
With ELT all the loaded source data is available in the target database.
|Graph Database||A "Graph Database", also called a graph-oriented database, is a type of database that uses graph theory to store, map and query relationships.
A graph is composed of two elements: a node and a relationship. Each node represents an entity (a person, place, thing, category or other piece of data), and each relationship represents how two nodes are associated. For example, the two nodes "cake" and "dessert" would have the relationship "is a type of" pointing from "cake" to "dessert."
Relationships take first priority in graph databases. This contrasts with conventional relational databases, where links between data are stored in the data itself (i.e. foreign keys), and queries search for this data and use the JOIN concept to collect the related data.
Graph databases, by design, allow simple and rapid retrieval of complex hierarchical structures that are difficult to model in relational systems.
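The "cake" and "dessert" example can be modelled as a list of relationship triples and traversed directly. This is a plain-Python sketch, not the API of any graph database product; the extra nodes ("food", "trifle") are invented to make the traversal visible, and the walk assumes one parent per node and no cycles.

```python
# Relationships stored as (from_node, relationship, to_node) triples.
edges = [
    ("cake", "is a type of", "dessert"),
    ("dessert", "is a type of", "food"),
    ("trifle", "is a type of", "dessert"),
]


def ancestors(node: str, relationship: str = "is a type of") -> list:
    """Follow one relationship type upwards until no further edge exists."""
    path = []
    current = node
    while True:
        parents = [dst for src, rel, dst in edges
                   if src == current and rel == relationship]
        if not parents:
            return path
        current = parents[0]  # this sketch assumes a single parent per node
        path.append(current)


cake_is = ancestors("cake")
```

The hierarchy is recovered by following relationships from node to node, rather than by joining tables on key values.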
|Hadoop||"Hadoop" is a free, Java-based programming framework that supports the processing of large data sets in a distributed computing environment. It is part of the Apache project sponsored by the Apache Software Foundation.
Hadoop can provide fast and reliable analysis of both structured data and unstructured data. Hadoop makes it possible to run applications on systems with thousands of nodes involving thousands of terabytes. Given its capabilities to handle large data sets, it's often associated with the phrase Big Data.
The core of Hadoop consists of a storage part, known as Hadoop Distributed File System (HDFS), and a processing part called MapReduce. Hadoop splits files into large blocks and distributes them across nodes in a cluster. To process data, Hadoop transfers packaged code for nodes to process in parallel based on the data that needs to be processed.
Doug Cutting, Hadoop's creator, named the framework after his child's stuffed toy elephant.
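The MapReduce processing model can be illustrated with a word count written in plain Python. This sketch does not use Hadoop or its APIs and runs in a single process; each string in `blocks` merely stands in for a file block that Hadoop would hold on a separate node, and the function names are chosen for the example.

```python
from collections import defaultdict

# Each string stands in for one block of a file held on a different node.
blocks = ["big data big", "data lake", "big lake"]


def map_phase(block: str) -> list:
    # Runs independently on each block: emit (word, 1) pairs.
    return [(word, 1) for word in block.split()]


def shuffle(pairs: list) -> dict:
    # Group values by key so each key can be reduced on its own.
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(grouped: dict) -> dict:
    # Combine the values for each key into a single result.
    return {key: sum(values) for key, values in grouped.items()}


mapped = [pair for block in blocks for pair in map_phase(block)]
word_counts = reduce_phase(shuffle(mapped))
```

Because `map_phase` needs only its own block, the map work could be sent to wherever each block is stored and run in parallel.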
|Hybrid Data Warehouse||A "Hybrid Data Warehouse" virtually and dynamically combines data that is stored in the data warehouse with other data that are stored in other external, independent systems.
A hybrid data warehouse has an architecture in which data lakes and warehouses coexist. It is another term for a Logical Data Warehouse.
|In Memory Database||An "In Memory" database uses a system's main memory for data storage rather than the disk-based storage typically utilized by traditional databases. Accessing data in memory eliminates seek time when querying the data, which provides faster and more predictable performance than disk. Source data is loaded into system memory in a compressed, non-relational format.
Three developments in recent years have made in-memory databases increasingly feasible: 64-bit computing, multi-core servers and lower RAM prices.
An in memory database is also known as a Main Memory Database System or a Memory Resident Database.
|Incremental Data Load||"Incremental Data Load" is most usually associated with loading data into data warehouses, but applies whenever data is being imported from a source system into a target system. Incremental data loads allow you to import only the data which has changed since the previous import. You can even, optionally, remove items from the warehouse which have been deleted in the source.
In order to support Incremental Loads, your source table must have:
1. A Primary Key.
2. A "Last Updated Date" field
The technique is employed to load data faster while using fewer system resources.
The main disadvantage is around maintainability. With a full load, if there's an error you can re-run the entire load without having to do much else in the way of cleanup / preparation. With an incremental load, the files generally need to be loaded in order. So if you have a problem with one batch, others queue up behind it till you correct it.
NB: An incremental load is not the same as a delta load. With a delta load, some other process (at the source end) has already identified all the changed records and produced a source dataset that containsall of the changed items which need updating in the target system.
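The two requirements listed above (a primary key and a "Last Updated Date" field) can be sketched in a few lines of Python. This is an illustration under stated assumptions, not a production loader: the source is a list of dictionaries, the target is a dictionary keyed by primary key, and the names `incremental_load`, `remove_deleted`, `id` and `last_updated` are hypothetical.

```python
from datetime import datetime


def incremental_load(source_rows, target, last_import):
    """Copy only rows changed since last_import into target.

    target is a dict keyed by primary key. Returns the new watermark,
    i.e. the latest "last updated" timestamp seen so far.
    """
    watermark = last_import
    for row in source_rows:
        if row["last_updated"] > last_import:
            target[row["id"]] = dict(row)  # insert, or overwrite by primary key
            watermark = max(watermark, row["last_updated"])
    return watermark


def remove_deleted(source_rows, target):
    """Optional step: drop target rows whose key no longer exists in the source."""
    source_ids = {row["id"] for row in source_rows}
    for key in list(target):
        if key not in source_ids:
            del target[key]


# Hypothetical sample data.
source = [
    {"id": 1, "name": "Ward A", "last_updated": datetime(2024, 1, 1)},
    {"id": 2, "name": "Ward B", "last_updated": datetime(2024, 1, 2)},
]
target = {}

# First run: everything is newer than datetime.min, so both rows load.
watermark = incremental_load(source, target, datetime.min)

# Later, row 1 changes and row 2 is deleted at the source.
source[0] = {"id": 1, "name": "Ward A (renamed)", "last_updated": datetime(2024, 1, 5)}
del source[1]

# Second run: only the changed row is copied, then deletions are applied.
watermark = incremental_load(source, target, watermark)
remove_deleted(source, target)
print(target)
```

Detecting deletions this way assumes the full list of source keys is still available to compare against, which is a heavier requirement than the timestamp check alone.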
|Industrial Internet of Things||The "Industrial Internet of Things (IIoT)" is the use of Internet of Things (IoT) technologies in manufacturing. Energy, healthcare, automotive, and now other industries are beginning to grapple with the IIoT, where devices such as sensors, robots, mixing tanks, and insulin pumps are becoming increasingly connected.
Although the Industrial Internet of Things has been heralded primarily as a way to improve operational efficiency, it can also be used as a tool for finding growth by exposing new, unexpected market opportunities. IIoT in manufacturing will probably generate so much business value that it will eventually lead to the fourth industrial revolution, the so-called Industry 4.0.
|Intelligent validation||"Intelligent Validation" relates to the task of validating Patient Pathways (the Referral to Treatment journey) in an acute hospital. Intelligent Validation is an effective, 'right-first-time' method of pathway validation that takes place as an integral part of all the daily "business as usual" pathway management activity. Intelligent validation embeds best practice in all pathway processes, leading to fewer anomalies in data recording that need to be managed and fixed retrospectively.
Intelligent validation requires advanced technology-based systems to support the trust staff involved in managing pathways, but it provides the most effective, efficient, and sustainable way of managing patient pathways.
Once Intelligent Validation is established across a trust, there is no longer a need for dedicated PTL validation teams to carry out regular reviews of the PTL looking for 'lost' pathways, or erroneous pathways that could and should be closed. Additionally, as the root causes of these errors are fixed 'at source' during normal day-to-day processing, no backlogs build up that require special validation projects or teams to address.
|Internet of Everything (IoE)||Coined by Cisco, the "Internet of Everything" (IoE) is the intelligent connection of people, process, data and things. Besides machine-to-machine communications, the concept includes machine-to-people and technology-assisted people-to-people interactions.
By comparison, the "Internet of Things" refers simply to the networked connection of physical objects (it does not include the "people" and "process" components of the Internet of Everything).
|Internet of Things (IoT)||The "Internet of Things" (IoT) is the network of physical entities (devices, vehicles, buildings and other items) embedded with electronics, software, sensors, actuators, and network connectivity that enable these entities to collect and exchange data without requiring human-to-human or human-to-computer interaction. British entrepreneur Kevin Ashton coined the term in 1999.|
|Key-Value Databases||A Key-Value database (also known as a key-value store, or a key-value store database) is a type of NoSQL database. A key-value database stores an associative array (aka a dictionary or a hash). A dictionary holds a collection of objects or records. Each record holds one or more fields containing data (collectively the "value"). A record is accessed using a key (the "Key") that uniquely identifies it within the dictionary.
Key-value databases work in a very different fashion from relational database management systems (RDBMSs). With an RDBMS you pre-define the data structure in the database as a series of tables containing fields with defined data types. Key-value systems treat the data as a single opaque collection (the value) which may have different fields for every record. The consumer has to know how to interpret the value. This offers considerable flexibility and more closely follows modern concepts used in object-oriented programming.
Redis and Oracle NoSQL Database are examples of key-value databases.
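The "opaque value" idea can be sketched with a plain Python dictionary standing in for the database. This is an assumption-laden illustration, not the API of Redis or any other product: the `put` and `get` helpers, the `user:1` style keys and the sample records are all hypothetical.

```python
import json

store = {}  # stand-in for a key-value database: key -> opaque bytes


def put(key, record):
    # The store neither knows nor cares what is inside the value.
    store[key] = json.dumps(record).encode("utf-8")


def get(key):
    # The consumer is responsible for interpreting the value.
    return json.loads(store[key].decode("utf-8"))


# Two records under the same scheme of keys, with different fields.
put("user:1", {"name": "Ada", "email": "ada@example.com"})
put("user:2", {"name": "Grace", "roles": ["admin"], "active": True})

print(get("user:2")["roles"])
```

No schema is declared anywhere, which is the flexibility described above; the cost is that nothing stops a malformed record from being stored.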
|Logical Data Warehouse||A "Logical Data Warehouse" is a data management architecture for analytics which combines the strengths of traditional repository warehouses with alternative data management and access strategies.
A logical data warehouse has an architectural layer that sits atop the usual data warehouse store of persisted data. The logical layer provides several mechanisms for viewing data in the warehouse store and elsewhere across an enterprise without relocating and transforming data ahead of view time. A logical data warehouse complements the traditional core warehouse (and its primary function of a priori data aggregation, transformation, and persistence) with functions that fetch and transform data, in real time (or near to it), thereby instantiating non-persisted data structures, as needed. A logical data warehouse is another name for a Hybrid Data Warehouse.
|Master Data Management||"Master Data Management" (MDM) is a comprehensive method of enabling an enterprise to link all of its critical data (master data) to one file, called a master file, which provides a common point of reference.
Master data are the products, accounts and parties for which business transactions are completed. The root cause problem stems from business unit and product line segmentation, in which the same party (e.g. customer) will be serviced by different product lines, with redundant data being entered about the customer and account in order to process the transaction. The redundancy of party and account data is compounded in the front to back office life cycle, where the authoritative single source for the party, account and product data is needed but is often once again redundantly entered or augmented.
Master data management has the objective of providing processes for collecting, aggregating, matching, consolidating, quality-assuring, persisting and distributing such data throughout an organization to ensure consistency and control in the ongoing maintenance and application use of this information.
|Normalisation||"Normalisation", also known as data normalization, is the process of reorganizing data in a database so that it meets two basic requirements: (1) There is no redundancy of data (all data is stored in only one place), and (2) data dependencies are logical (all related data items are stored together).
So, for instance, rather than storing an employee's office address within each employee record, you would create a relationship (put in a foreign key) to a record in a separate Address table. All the employee records for the staff at the same office would point to the same single row in the Address table. If that office address changes, there is only one record to update.
The objective is to isolate data so that additions, deletions, and modifications of a field can be made in just one table and then propagated through the rest of the database via the defined relationships. This reduces data redundancy and improves data integrity.
There are three main normal forms, of which Third Normal Form (3NF) is the most commonly used. In 3NF, no duplicate information is permitted. So, for example, if two tables both require a birthdate field, the birthdate information would be separated into a separate table, and the two other tables would then access the birthdate information via an index field in the birthdate table. Any change to a birthdate would automatically be reflected in all tables that link to the birthdate table.
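The employee and office address example above can be sketched with Python's built-in `sqlite3` module. The table names, column names and sample rows are assumptions made for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE address (
        address_id INTEGER PRIMARY KEY,
        street     TEXT NOT NULL
    );
    CREATE TABLE employee (
        employee_id INTEGER PRIMARY KEY,
        name        TEXT NOT NULL,
        -- foreign key: each employee points at one shared address row
        address_id  INTEGER NOT NULL REFERENCES address(address_id)
    );
    INSERT INTO address VALUES (1, '1 Old Street');
    INSERT INTO employee VALUES (1, 'Ann', 1), (2, 'Bob', 1);
    """
)

# The office moves: exactly one row needs updating.
conn.execute("UPDATE address SET street = '2 New Road' WHERE address_id = 1")

# Every employee at that office sees the change through the relationship.
rows = conn.execute(
    """
    SELECT e.name, a.street
    FROM employee e
    JOIN address a ON a.address_id = e.address_id
    ORDER BY e.name
    """
).fetchall()
print(rows)
```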
|NoSQL Database||A "NoSQL" (meaning "non SQL", or "not only SQL") database provides a mechanism for storage and retrieval of data which is modelled in means other than just the tabular relations used in relational databases. They may or may not support SQL-like query languages.
NoSQL is especially useful when an enterprise needs to access and analyze massive amounts of unstructured data or data that's stored remotely on multiple servers in the cloud (commonly known as Big Data).
Examples of NoSQL databases include Apache HBase (which runs on top of Hadoop) and Apache Cassandra.
|Object-Oriented Database||"Object oriented databases", also called Object databases, and Object Database Management Systems (ODBMS), are databases that store user definable objects rather than data such as integers, strings or real numbers.
Object-oriented databases emerged in the mid-1980s in response to the feeling that relational databases were inadequate for certain classes of applications, such as CAD, which deal with many complex, nested objects.
Object-oriented databases combine database capabilities with object-oriented programming language capabilities. Object oriented languages include Smalltalk, C++, Java, and others.
Object databases are best used when there is complex data and/or complex data relationships, including many-to-many object relationships.
Examples of object-oriented database engines include db4o, GemStone/S (a Smalltalk-based system) and InterSystems Caché.
|OLAP||"OLAP" is an acronym for Online Analytical Processing. OLAP enables a user to easily and selectively extract and view data from different points of view.
To facilitate this kind of analysis, OLAP utilises a multidimensional schema architecture (termed a cube, though they typically have more than three dimensions). Whereas a relational database can be thought of as two-dimensional (rows and columns), a multidimensional architecture or database considers each data attribute (such as product, geographic sales region, and time period) as a separate "dimension." OLAP software can locate the intersection of dimensions (all products sold in the Eastern region above a certain price during a certain time period) and display them. Attributes such as time periods can be broken down into sub-attributes (e.g. year, quarter, month, day).
The cube metadata is typically created from a star schema or snowflake schema or fact constellation of tables in a relational database. Measures are derived from the records in the fact table and dimensions are derived from the dimension tables.
MOLAP (multi-dimensional online analytical processing) is the classic form of OLAP and is sometimes referred to as just OLAP. MOLAP stores data in an optimized multi-dimensional array storage, rather than in a relational database.
Some MOLAP tools require the pre-computation and storage of derived data, such as consolidations – the operation known as processing. Such MOLAP tools generally utilize a pre-calculated data set referred to as a data cube. The data cube contains all the possible answers to a given range of questions. As a result, they have a very fast response to queries. On the other hand, updating can take a long time depending on the degree of pre-computation.
Relational OLAP (ROLAP) tools work directly from Star or Snowflake schemas in relational databases, and do not require pre-computation.
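The pre-computation that MOLAP tools perform can be sketched in plain Python. This is a toy illustration under stated assumptions: the fact rows are invented, `build_cube` is a hypothetical helper, and `"*"` is used as a marker meaning "all values of this dimension". Real OLAP engines use far more compact storage.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical fact rows: (product, region, quarter, sales)
facts = [
    ("Widget", "East", "Q1", 100),
    ("Widget", "West", "Q1", 150),
    ("Gadget", "East", "Q1", 200),
    ("Widget", "East", "Q2", 120),
]


def build_cube(facts):
    """Pre-compute total sales for every combination of dimension values."""
    cube = defaultdict(int)
    for *coords, sales in facts:
        positions = range(len(coords))
        # Each fact contributes to every roll-up: keep some dimensions
        # fixed and replace the rest with "*".
        for r in range(len(coords) + 1):
            for kept in combinations(positions, r):
                key = tuple(coords[i] if i in kept else "*" for i in positions)
                cube[key] += sales
    return cube


cube = build_cube(facts)

# Answering a question is now a single lookup rather than a scan.
print(cube[("Widget", "East", "*")])  # Widget sales in East, all quarters
print(cube[("*", "East", "Q1")])      # all products in East during Q1
```

The trade-off described above is visible here: lookups are immediate, but every new fact must be folded into all eight roll-ups, and the work grows quickly as dimensions are added.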
|Power BI||"Power BI" aka "PowerBI" is Microsoft's cloud-based business intelligence technology that is part of the Office 365 suite, the cloud-based suite of productivity applications. Like other BI tools, Power BI aims to democratize the data analysis process by analyzing and visualizing data in a self-service way.
Power BI is a hybrid of on-premises and cloud components. Authoring is done in Excel 2013 on Windows desktops. Publishing is through Office 365's SharePoint service—requiring end-users to have only a Web browser or a tablet device for native mobile apps.
If you don't want to publish via the cloud, each consumer of the report will need the PowerBI desktop platform deployed to their PC, and the report definitions will need to be held in a SQL Server 2016 database.
|Predictive Analytics||"Predictive analytics" uses data mining techniques to predict trends and behavior patterns.
The central element of predictive analytics is the predictor, a variable that can be measured for an individual or other entity to predict future behavior. For example, an insurance company is likely to take into account potential driving safety predictors such as age, gender, and driving record when issuing car insurance policies.
|Relational Database Management System (RDBMS)||A "Relational Database Management System (RDBMS)" is a type of database management system (DBMS) that stores data in the form of related tables. The term "relational database" was invented by E. F. Codd at IBM in 1970. The underlying relational theory that Codd developed is based on the mathematics of set theory, which delivers rigor and accuracy to data access and manipulation.
Relational databases are powerful because they require few assumptions about how data is related or how it will be extracted from the database. As a result, the same database can be viewed in many different ways. Data is extracted from an RDBMS using SQL.
|Self Service Analytics||"Self Service Analytics" is an approach to advanced analytics that allows business users to manipulate data, perform queries, and generate reports on their own, with little or no IT support.
Self-service analytics is characterized by simple-to-use BI tools with basic analytic capabilities and an underlying data model that has been simplified or scaled down for ease of understanding and straightforward data access.
This term is used interchangeably with Self Service Business Intelligence.
|Self Service Business Intelligence||"Self-Service Business Intelligence" is where end users design and deploy their own reports and analyses within an approved and supported architecture and tools portfolio.
It supports business people (who don't need to be technically savvy) accessing the data they need for their decision-making, without having to go to technology experts each time they have a new question to be answered.
Self-service business intelligence tools allow people to gather information from multiple sources, analyze it, and share it with others, without having to know the technical protocols required to access the data.
|Self Service Data Preparation||"Self Service Data Preparation" tools are aimed primarily at Business Analysts (rather than IT specialists).
The tools allow users to combine, transform and cleanse relevant data prior to analysis: to "prepare" it. Typically the tools work across all types of data (structured, semi-structured and unstructured) and across all data sources (both internal to an organisation and external).
A common application would be for exploration of a "data lake" or for use in big data environments more generally.
|Single Version of the Truth||"Single Version of the Truth" (SVoT) is the concept of leveraging a Data Warehouse to provide the sole source of all the data an organization uses for its reporting. It is not easy to achieve, and the term Single Version of the Facts might be more appropriate. Truth can be an interpretation of a fact (e.g. is a glass half empty or half full?), but if everyone starts from the same set of facts it will help. For an organization to truly reach an SVoT, the organization must collectively:
1. Agree the definition of a data item (fact).
2. Agree the derivation logic for calculated data items.
3. Believe in the quality of the data.
NB: Insource has trademarked the name SVT Engine ®, the processing logic at the heart of its industry leading Acute Health Data Enterprise product for acute hospitals. AHDE uses the published NHS data definitions, supports lineage to show how all calculated data items are derived, and incorporates many data preparation tools to highlight data quality issues.
|Snowflake Schema||A "Snowflake Schema" architecture is a more complex variation of a star schema. Business data is still divided into either data that can be measured ('measures', held in fact tables) or the data providing the "who, what, where, when, why, and how" descriptive context (dimensions). Snowflake schemas differ from Star Schemas in that the tables which describe the dimensions are normalized.
The schema is diagrammed with each fact table surrounded by its associated dimension tables (as in a star schema), but those dimension tables are further related to other dimension tables, branching out into a snowflake pattern.
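The normalization of the dimension tables (snowflaking) reduces redundancy in the dimension data and can improve the performance of certain queries, at the cost of the extra joins needed to reassemble a dimension.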
As with a star schema, a snowflake schema is instantiated in a relational database, but can be used to create an OLAP cube in a multi-dimensional database such as Microsoft SQL Server Analysis Services.
|SQL||"SQL" stands for Structured Query Language. SQL is a standardized programming language used for managing relational database management systems (RDBMSs) and performing all the operations on the data in them.
First developed in the early 1970s at IBM by Raymond Boyce and Donald Chamberlin, SQL was commercially released by Relational Software Inc. (now known as Oracle Corporation) in 1979.
An official SQL standard was adopted by the American National Standards Institute (ANSI) in 1986 and then by the International Organization for Standardization, known as ISO, in 1987. The standard has been revised several times since; SQL:2011 has itself been followed by later editions such as SQL:2016.
Major RDBMS vendors also have proprietary versions of SQL that incorporate and build on ANSI SQL, e.g., PL/SQL (Oracle) and Transact-SQL (T-SQL) (Microsoft).
|Star Schema||A "Star Schema" is the simplest form of architecture for supporting dimensional modelling (aka 'cubes', 'slicing and dicing'). With a star schema, business process data is separated into facts, which hold the measurable, quantitative data about a business, and dimensions which are descriptive attributes related to fact data.
A fact is an event that is counted or measured, such as a sale. A dimension contains reference information about the fact, such as date, product, customer, or sales region.
The schema supports analysis of the fact data using any of the dimensions (e.g. sales by region, sales by product, sales by customer). The dimensions can be hierarchical to support queries such as sales all time, or by year, or by quarter, by month etc.
Usually the fact tables in a star schema are in third normal form (3NF) whereas dimensional tables are de-normalized.
The star schema gets its name from the physical model's resemblance to a star shape with a fact table at its centre and the dimension tables surrounding it representing the star's points.
A star schema is created within a relational database. It can be used to create an OLAP cube within a multi-dimensional database such as Microsoft SQL Server Analysis Services. An OLAP cube contains dimensional attributes and facts, but it is accessed via languages with more analytic capabilities than SQL, such as MDX (Multidimensional Expressions).
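A small star schema can be sketched with Python's built-in `sqlite3` module: one fact table at the centre and two dimension tables around it. All table names, column names and rows are assumptions made for illustration, as is the `sales_by` helper.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    -- Dimension tables: descriptive context.
    CREATE TABLE dim_product (product_id INTEGER PRIMARY KEY, product_name TEXT);
    CREATE TABLE dim_region  (region_id  INTEGER PRIMARY KEY, region_name  TEXT);

    -- Fact table: one row per measured event (a sale), keyed to the dimensions.
    CREATE TABLE fact_sales (
        product_id INTEGER REFERENCES dim_product(product_id),
        region_id  INTEGER REFERENCES dim_region(region_id),
        sale_date  TEXT,
        amount     REAL
    );

    INSERT INTO dim_product VALUES (1, 'Widget'), (2, 'Gadget');
    INSERT INTO dim_region  VALUES (1, 'East'), (2, 'West');
    INSERT INTO fact_sales VALUES
        (1, 1, '2024-01-15', 100.0),
        (1, 2, '2024-02-01', 150.0),
        (2, 1, '2024-02-20', 200.0);
    """
)


def sales_by(dim_table, key, label):
    """Total sales grouped by one dimension.

    The identifiers are interpolated into the SQL, so they must come from
    trusted code (as in the fixed calls below), never from user input.
    """
    sql = f"""
        SELECT d.{label}, SUM(f.amount)
        FROM fact_sales f
        JOIN {dim_table} d ON d.{key} = f.{key}
        GROUP BY d.{label}
        ORDER BY d.{label}
    """
    return conn.execute(sql).fetchall()


by_region = sales_by("dim_region", "region_id", "region_name")
by_product = sales_by("dim_product", "product_id", "product_name")
print(by_region)
print(by_product)
```

The same fact rows are analysed along either dimension with a single join each, which is the property that makes the star layout convenient for reporting.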
|Stream Processing||"Stream Processing" is the processing of data 'on the fly' as it streams into or through a server. In contrast to a traditional database model, where data is first prepared (stored in tables, indexed, etc.) and then processed by queries, stream processing works on the inbound data while it is in flight (e.g. using SQL-type queries that operate over time and buffer windows). This allows high-velocity, high-volume data to be processed with minimal latency.|
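A buffer window over in-flight data can be sketched with a Python generator. This is a minimal illustration, assuming a count-based window and an in-memory list standing in for a live feed; `windowed_average` is a hypothetical name, and real stream processors add time-based windows, partitioning and fault tolerance.

```python
from collections import deque


def windowed_average(stream, window_size):
    """Yield the average of the most recent window_size values as each value arrives."""
    window = deque(maxlen=window_size)  # old values fall out automatically
    for value in stream:
        window.append(value)
        yield sum(window) / len(window)


# Hypothetical feed; in practice this would be a socket, queue or log.
readings = iter([10, 20, 30, 40])
averages = list(windowed_average(readings, window_size=2))
print(averages)
```

Each result is produced as soon as its input arrives, and only the current window is ever held in memory; nothing is stored or indexed first.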
|Structured Data||"Structured data" describes data that can reside in fixed fields within a record or file. This includes data contained in relational databases and spreadsheets.
Data is structured if you know enough about it to be able to create a data model in advance – a model of the types of business data that will be recorded and how they will be stored, processed and accessed. This includes defining what fields of data will be stored and how that data will be stored: data type (numeric, currency, alphabetic, name, date, address) and any restrictions on the data input (number of characters; restricted to certain terms such as Mr., Ms. or Dr.; M or F).
Structured data has the advantage of being easily entered, stored, queried and analysed.
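A data model defined in advance, with data types and input restrictions, can be sketched as a Python dataclass. The `Person` record, its fields, the 30-character limit and the allowed titles are assumptions chosen to mirror the kinds of restriction mentioned above.

```python
from dataclasses import dataclass
from datetime import date

ALLOWED_TITLES = {"Mr.", "Ms.", "Dr."}  # restricted set of terms
MAX_SURNAME_LENGTH = 30                 # restriction on number of characters


@dataclass
class Person:
    title: str
    surname: str
    date_of_birth: date

    def __post_init__(self):
        # Reject input that does not fit the pre-defined model.
        if self.title not in ALLOWED_TITLES:
            raise ValueError(f"title must be one of {sorted(ALLOWED_TITLES)}")
        if len(self.surname) > MAX_SURNAME_LENGTH:
            raise ValueError("surname is too long")
        if not isinstance(self.date_of_birth, date):
            raise ValueError("date_of_birth must be a date")


person = Person("Dr.", "Smith", date(1980, 1, 1))
print(person)
```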
|Technical Debt||"Technical Debt" is a term coined by Ward Cunningham to describe the cumulative consequences of corners being cut throughout a software project's design and development.
The debt can be thought of as work that needs to be done before a particular job can be considered complete or proper. If the debt is not repaid, then it will keep on accumulating interest, making it harder to implement changes later on.
Technical debt is usually the result of concentrating mainly on the functionality of the code and not as much on its documentation and ongoing support, as well as omitting administration and maintenance features within an application. Technical debt has a financial impact just as real debt does.
|Unstructured Data||"Unstructured Data" (or unstructured information) refers to information that either does not have a pre-defined data model or is not organized in a pre-defined manner.
Unstructured data can be textual or non-textual. Textual unstructured data is generated in media like email messages, blog entries, PowerPoint presentations, Word documents, collaboration software and instant messages. Non-textual unstructured data is generated in media like JPEG images, MP3 audio files and Flash video files.
Unstructured information may contain data such as dates, numbers, and facts as well. For example, emails have the sender, recipient, date, time and other fixed fields added to the unstructured data of the email message content and any attachments. This is sometimes called semi-structured data.
The irregularities and ambiguities in unstructured data make it difficult to understand using traditional programs as compared to data stored in fielded form in databases or annotated (semantically tagged) in documents such as XML.
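The email example above can be sketched with Python's standard `email` package, which separates the fixed header fields from the free-text body. The message itself is invented for illustration.

```python
from email import message_from_string

# Hypothetical message: fixed header fields, then a blank line, then free text.
raw = (
    "From: alice@example.com\n"
    "To: bob@example.com\n"
    "Subject: Quarterly figures\n"
    "\n"
    "Hi Bob, sales looked strong this quarter. More soon.\n"
)

msg = message_from_string(raw)

# The structured part: named fields that can be queried directly.
fields = {name: msg[name] for name in ("From", "To", "Subject")}

# The unstructured part: text whose meaning a program cannot read off directly.
body = msg.get_payload()

print(fields)
print(body)
```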