| λ |
The Greek letter "lambda" used to represent the mean of a Poisson distribution. |
| μ |
The Greek letter "mu" used to represent the mean of a population. |
| Accessibility |
The characteristic of being able to access data when it is required. |
| Accuracy to reality |
A characteristic of information quality measuring the degree to which a data value (or set of data values) correctly represents the attributes of the real-world object or event. |
| Accuracy to surrogate source |
A measure of the degree to which data agrees with an original, acknowledged authoritative source of data about a real world object or event, such as a form, document, or unaltered electronic data received from outside the organization. See also Accuracy. |
| Aggregation |
The process of associating objects of different types together in a meaningful whole. Also called composition. |
| Algorithm |
A set of statements or a formula to calculate a result or solve a problem in a defined set of steps. |
| Alias |
A secondary and non-standard synonym or alternate name of an enterprise standard business term, entity type or attribute name, used only for cross reference of an official name to legacy or software package data name. |
| ANSI |
Acronym for American National Standards Institute, the U.S. body that sets standards. |
| Application |
A collection of computer hardware, computer programs, databases, procedures, and knowledge workers that work together to perform a related group of services or business processes. |
| Application architecture |
A graphic representation of a system showing the process, data, hardware, software, and communications components of the system across a business value chain. |
| Archival database |
A copy of a database saved in its exact state for historical purposes, recovery, or restoration. |
| Artificial Intelligence (AI) |
The capability of a system to perform functions normally associated with human intelligence, such as reasoning, learning, and self-improvement. |
| Association |
See Relationship. |
| Associative entity type |
An entity type that describes the relationship of a pair of entity types that have a many-to-many relationship or cardinality. For example, COURSE COMPLETION DATE has meaning only in the context of the relationship between the STUDENT and COURSE OFFERING entity types. |
| Asynchronous replication |
Replication in which a primary data copy is considered complete once the update transaction completes, and secondary replicated data copies are queued to be updated as soon as possible or on a predefined schedule. |
| Atomic value |
An individual data value representing the lowest level of meaningful fact. |
| Attribute |
An inherent property, characteristic, or fact that describes an entity or object. A fact that has the same format, interpretation, and domain for all occurrences of an entity type. An attribute is a conceptual representation of a type of fact that is implemented as a field in a record or data element in a database file. |
| Attributive entity type |
An entity type that cannot exist on its own and contains attributes describing another entity. An attributive entity type resolves a one-to-many relationship between an entity type and a descriptive attribute that may contain multiple values. Also called characteristic or dependent entity type. |
| Audit trail |
Data that can be used to trace activity such as database transactions. |
| Authentication |
The process of verifying the identity of a person requesting a resource, such as data or a transaction, as a precondition for deciding whether that person has authority or permission to access that resource. |
| Availability |
A percentage measure of the reliability of a system indicating the percentage of time the system or data is accessible or usable, compared to the amount of time the system or data should be accessible or usable. |
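The ratio described above can be sketched in a few lines of Python. This is a minimal illustration only; the function name and the hour-based units are assumptions, not part of the glossary.

```python
def availability_percent(accessible_hours: float, scheduled_hours: float) -> float:
    """Percentage of the scheduled time during which the system was accessible."""
    if scheduled_hours <= 0:
        raise ValueError("scheduled_hours must be positive")
    return 100.0 * accessible_hours / scheduled_hours


# Hypothetical month: 720 scheduled hours with 3.6 hours of outage.
print(round(availability_percent(720 - 3.6, 720), 2))
```

The denominator is the time the system should be accessible, not calendar time, so planned maintenance windows would be excluded from both figures.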
| Backup |
To restore a database to its state at a previous point in time. Backup is achieved: (1) from an archived or a snapshot copy of the database at a specified time; or (2) from an archived copy of a database and applying the logged update activity of changes since that archived copy was made. |
| Benchmarking |
The process of analyzing and comparing an organization's processes to those of other organizations to identify best practices. |
| Best practice |
A process, standard or component that is generally recognized to produce superior results when compared with similar processes, standards or components. |
| Bias |
A vested interest, or strongly held paradigm or condition that may skew the results of sampling, measuring, or reporting the findings of a quality assessment. For example, if information producers audit their own data quality, they will have a bias to overstate its quality. If data is sampled in such a way that it does not reflect the entire population sampled, the sample result will be biased. |
| Biased sampling |
Sampling procedures that result in a sample that is not truly representative of the population sampled. |
| Bounds |
See Confidence interval. |
| Boyce/Codd Normal Form (BCNF) |
(1) A relation R is in Boyce/Codd normal form (BCNF) if and only if every determinant is a candidate key. (2) A table is in BCNF if every attribute that is a unique identifier of attributes describing an entity is a candidate key of that entity. |
| Business application model |
A graphic illustration of the conceptual application systems, both manual and automated, including their dependencies, required to perform the processes of an organization. |
| Business information resource data |
The set of information resource data that must be known to information producers and knowledge workers in order to understand the meaning of information, the business rules that govern its quality, and the stakeholders who create or require it. |
| Business information steward |
A business subject-matter expert designated and accountable for overseeing some parts of data definition for a collection of data for the enterprise, such as data definition integrity, legal restriction compliance standards, data quality standards, and authorization security. |
| Business process |
A synonym for value chain. The term is used to differentiate a value chain of activities from a functional process or functional set of activities. |
| Business process model |
A graphic and descriptive representation of business processes or value chains that cut across functions and organizations. The model may be expressed in different levels of detail, including decomposition into successive lower levels of activities. |
| Business process reengineering |
The process of analyzing, redefining, and redesigning business activities to eliminate or minimize activities that add cost and to maximize activities that add value. |
| Business resource category |
A business classification of data about a resource the enterprise must manage across business functions and organizations, used as a basis for high-level information modeling. The internal resource categories are human resource, financial, materials and products, facilities and tangible assets, and information. External resources include business partners, such as suppliers and distributors; customers; and external environment, such as regulation and economic factors. Also called subject area. |
| Business rule |
A statement expressing a policy or condition that governs business actions and establishes data integrity guidelines. |
| Business rule conformance |
See Validity. |
| Business term |
A word, phrase, or expression that has a particular meaning to the enterprise. |
| Business value chain |
See Value chain. |
| Candidate key |
A key that can serve to uniquely identify occurrences of an entity type. A candidate key must have two properties: (1) Each occurrence or record must have a different value of the key, so that a key value identifies only one occurrence; and (2) No attribute in the key can be eliminated without nullifying the first property. |
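The two properties can be checked against sample data roughly as follows. This is a sketch under stated assumptions: the function names and the enrollment rows are hypothetical, and passing the check on a sample only suggests, rather than proves, that the attributes form a candidate key for the entity type.

```python
def is_unique(rows, attrs):
    """Property 1: no two rows share the same combination of values for attrs."""
    seen = set()
    for row in rows:
        key = tuple(row[a] for a in attrs)
        if key in seen:
            return False
        seen.add(key)
    return True


def is_candidate_key(rows, attrs):
    """True if attrs is unique and no attribute can be dropped (property 2)."""
    if not is_unique(rows, attrs):
        return False
    # Minimality: removing any single attribute must break uniqueness.
    return all(
        not is_unique(rows, [a for a in attrs if a != dropped])
        for dropped in attrs
    )


# Hypothetical enrollment records.
rows = [
    {"student_id": 1, "course": "DB101", "term": "F24"},
    {"student_id": 1, "course": "ST200", "term": "F24"},
    {"student_id": 2, "course": "DB101", "term": "F24"},
]
print(is_candidate_key(rows, ["student_id", "course"]))          # True
print(is_candidate_key(rows, ["student_id", "course", "term"]))  # False
```

The three-attribute set fails because "term" can be removed without losing uniqueness, which violates the second property.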
| Cardinality |
The number of occurrences that may exist between occurrences of two related entity types. The cardinalities between a pair of related entity types are: one to one, one to many, or many to many. See Relationship. |
| CASE |
Acronym for Computer-Aided Systems Engineering. The application of automated technologies to business and information modeling and software engineering. |
| CASS |
Acronym for Coding Accuracy Support System, a system for verifying the integrity of United States addresses against a USPS-maintained database containing every mailing address in the United States. The system is concerned with just the addresses, not the people or organizations residing at these addresses. |
| Catalog |
The component of a Database Management System (DBMS) where physical characteristics about the database are stored, such as its physical design schema, table or file names, primary keys, foreign key relationships, and other data required for the DBMS to manage the data. |
| Cause-and-effect diagram |
A chart in the shape of a "fishbone" used to analyze the relationship between error cause and error effect. The diagram, invented by Ishikawa, shows a specific effect and possible causes of error. The causes are drawn in four categories, each a bone on the fish. The categories are: (1) Human (Ishikawa called this manpower), (2) Methods, (3) Machines, and (4) Materials. |
| Central tendency |
The phenomenon that data measured from a process generally aggregates around a value somewhere between the high and low values. |
| Champion |
In Six Sigma, the executive or manager who "owns" a process to be improved and acts as an advocate for the improvement project, overseeing and managing critical elements, reporting project success to up-line management, and removing barriers to enable project improvement success. |
| Checklist |
A technique for quality improvement to identify steps to perform or items to check before work is complete. |
| Class word |
See Domain type. |
| Cleansing |
See Data cleansing. |
| Cluster |
(1) A way of storing records or rows from one or more tables together physically, based on a common key or partial key value (ER). (2) Groups of objects that have similar characteristics or behaviors that are significantly different from other objects that are discovered through data analysis or mining (Stat). |
| Cluster sampling |
Sampling a population by taking samples from a smaller number of subgroups (such as geographic areas) of the population. The subsamples from each cluster are combined to make up the final sample. For example, in sampling sales data for a chain of stores, one may choose to take a subsample of a representative subset of stores (each a cluster) into a cluster sample rather than randomly select sales data from every store. |
| Code |
(1) To represent data in a form that can be accepted by an application program. (2) A shorthand representation or abbreviation of a specific value of an attribute. |
| Commit |
A DML command that signals a successful end of a transaction and confirms that a record(s) inserted, updated, or deleted in the database is complete. |
| Common cause |
A source of unacceptable variation or defect caused by the process or system itself. See also Special cause. |
| Completeness |
A characteristic of information quality measuring the degree to which all required data is known. (1) Fact completeness is a measure of data definition quality expressed as the percentage of the attributes that need to be known about an entity type that are defined in the model and implemented in a database. For example, "80 percent of the attributes required to be known about customers have fields in a database to store the attribute values." (2) Value completeness is a measure of data content quality expressed as the percentage of the columns or fields of a table or file that should have values in them and in fact do. For example, "95 percent of the columns for the customer table have a value in them." Also referred to as Coverage. (3) Occurrence completeness is a measure of the percentage of the records an information collection should have, in order to represent all occurrences of the real-world objects it should know about, that it actually has. For example, does a Department of Corrections have a record for each Offender it is responsible to know about? (IQ). |
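Value completeness, sense (2) above, lends itself to a simple calculation. The sketch below is illustrative only: the function name, the treatment of None and empty strings as "no value", and the customer records are all assumptions.

```python
def value_completeness(records, required_fields):
    """Percent of required field slots that hold a non-empty value."""
    total = len(records) * len(required_fields)
    if total == 0:
        raise ValueError("need at least one record and one required field")
    filled = sum(
        1
        for record in records
        for field in required_fields
        # Assumption: None and "" both count as "no value".
        if record.get(field) not in (None, "")
    )
    return 100.0 * filled / total


# Hypothetical customer records.
customers = [
    {"name": "Ann", "phone": "555-0100"},
    {"name": "Bob", "phone": ""},
    {"name": "", "phone": None},
    {"name": "Dee", "phone": "555-0199"},
]
print(value_completeness(customers, ["name", "phone"]))  # 62.5
```

Five of the eight required slots hold a value, giving 62.5 percent. Whether a blank is a Missing value or a legitimate Empty value cannot be decided by code like this and still has to be judged against the business rules.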
| Conceptual data model |
See Data model. |
| Concurrency |
A characteristic of information quality measuring the timing of equivalence of data stored in redundant or distributed database files. The measure of data concurrency may describe the minimum, maximum, and average information float time from when data is available in one data source to when it becomes available in another data source. Or it may consist of the relative percent of data from a data source that is propagated to the target within a specified time frame. |
| Concurrency assessment |
An audit of the timing of equivalence of data stored in redundant or distributed database files. See Equivalence. |
| Concurrency control |
A DBMS mechanism of locking records used to manage multiple transactions' access to shared data. |
| Conditional relationship |
An association that is optional depending on the nature of the related entities or on the rules of the business environment. |
| Confidence interval, or confidence interval of the mean |
The upper and lower limits or values, or bounds on either side of a sample mean for which a confidence level is valid. |
| Confidence level |
The degree of certainty, expressed as a percentage, of being sure that the value for the mean of a population is within a specific range of values around the mean of a sample. For example, a 95 percent confidence level indicates that one is 95 percent sure that the estimate of the mean is within a desired precision or range of values called a confidence interval. Stated another way, a 95 percent confidence level means that out of 100 samples from the same population, the mean of the population is expected to be contained within the confidence interval in 95 of the 100 samples. |
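As a rough illustration of how a confidence level translates into a confidence interval of the mean, the sketch below uses the normal approximation from Python's standard library. The function name and sample values are hypothetical, and the normal approximation is itself an assumption that suits larger samples; small samples would normally call for a t-distribution instead.

```python
from math import sqrt
from statistics import NormalDist, mean, stdev


def confidence_interval(sample, confidence=0.95):
    """Return (lower, upper) bounds around the sample mean.

    Assumes the normal approximation is acceptable for this sample.
    """
    n = len(sample)
    if n < 2:
        raise ValueError("need at least two observations")
    # Two-sided critical value, about 1.96 for a 95 percent confidence level.
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    margin = z * stdev(sample) / sqrt(n)
    center = mean(sample)
    return center - margin, center + margin


# Hypothetical measurements.
sample = [10, 12, 11, 13, 9, 11, 12, 10]
lower, upper = confidence_interval(sample)
print(f"95% confidence interval: {lower:.2f} to {upper:.2f}")
```

Raising the confidence level widens the interval, and increasing the sample size narrows it.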
| Confidence limits |
See Confidence interval. |
| Configuration management |
The process of identifying and defining configurable items in an environment by controlling their release and any subsequent changes throughout the development life cycle; recording and reporting the status of those items and change requests; and verifying the completeness and correctness of configurable items. |
| Consensus |
The agreement of a group with a judgment, decision, or data definition in which the stakeholders have participated and can say, "I can live with it." |
| Consistency |
A measure of information quality expressed as the degree to which a set of data is equivalent in redundant or distributed databases. |
| Constraint |
A business rule that places a restriction on business actions and therefore restricts the resulting data. For example, "only wholesale customers may place wholesale orders." |
| Contamination |
See Information quality contamination. |
| Control |
The mechanisms used to manage processes to maintain acceptable performance. |
| Control chart |
A graphical device for reporting process performance over time for monitoring process quality performance. |
| Control group |
A selected set of people, objects, or processes to be observed to record behavior or performance characteristics. Used to compare behavior and performance to another group in which changes or improvements have been made. |
| Conversion |
The process of preparing, reengineering, cleansing and transforming data, and loading it into a new target data architecture. |
| Corporate data |
See Enterprise data. |
| Correlation |
A predictive relationship that exists between two factors, such that when one of the factors changes, you can predict the nature of change in the other factor. For example, if information quality goes up, the costs of information scrap and rework go down. |
| Cost of acquisition |
(1) The cost of acquiring a new customer, including identifying, marketing and presales activities to get the first sale. (2) The costs of acquiring products, such as software packages, and services. This should be weighed against the cost of ownership. |
| Cost of information quality assessment |
The costs associated with measurement and quality conformance assurance as a component of the cost of quality information. |
| Cost of nonquality information |
The total costs associated with failure or nonquality information and information services, including, but not limited to, reruns, rework, downstream data verification, data correction, data transformation to nonstandard definition or format, and workarounds. |
| Cost of ownership |
The total costs of ownership of products, such as software packages, and services, including planning, acquiring, process redesign, implementation, and support required for the successful use of the product or service. |
| Cost of quality information |
The total costs associated with providing quality information and information services. The costs consist of the costs of failure or nonquality information, plus the costs of assessment and conformance, plus the costs of information process improvement and data defect prevention. |
| Cost of retention |
The cost of managing customer relationships that result in subsequent sales to existing customers. |
| Coverage |
See Completeness. |
| Critical information |
Information that if missing or wrong can cause enterprise-threatening loss of money, life, or liability, such as failure to properly calculate pension withholding, not setting the airplane flaps correctly for take-off, or prescribing the wrong drug. |
| Cross-functional |
The characteristic of data or process that is of interest to more than one business or functional area. |
| Currency |
A characteristic of information quality measuring the degree to which data represents reality from the required point in time. For example, one information view may require data currency to be the most up-to-date point, such as stock prices for stock trades, while another may require data to be the last stock price of the day, for stock price running average. |
| Customer |
The persons or organizations whose needs the enterprise must meet, and whose satisfaction with its products and services, including information, determines enterprise success or failure. |
| Customer life cycle |
The states of existence and relative time periods of a typical customer from being a prospect to becoming an active customer, to becoming nonactive and a "former" customer. |
| Customer lifetime revenue |
The net present value of the average customer revenue over the life of the relationship with the enterprise. |
| Customer lifetime value (LTV) |
The net present value of the average profit of a typical customer over the life of the relationship with the enterprise. |
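The net present value calculation behind both customer lifetime revenue and customer lifetime value can be sketched as follows. The function name, the yearly granularity, and the figures are assumptions for illustration; see Discount rate for the rate being applied.

```python
def net_present_value(cash_flows, discount_rate):
    """Discount a list of yearly amounts; the first falls one year from now."""
    return sum(
        amount / (1 + discount_rate) ** year
        for year, amount in enumerate(cash_flows, start=1)
    )


# Hypothetical customer: average profit of 200 per year over a five-year
# relationship, discounted at 8 percent.
ltv = net_present_value([200.0] * 5, 0.08)
print(f"{ltv:.2f}")
```

Passing revenue figures instead of profit figures gives customer lifetime revenue by the same arithmetic.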
| Customer segment |
A meaningful aggregation of customers for the purpose of marketing or determining customer lifetime value. |
| Customer-supplier relationship |
See Information customer-supplier relationship. |
| CUSUM |
Abbreviation for Cumulative Summation, a more sensitive method for detecting out-of-control measurements than a simple control chart. The CUSUM indicates when a process has been off aim for too long a period of time. |
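One common formulation is the tabular CUSUM, which accumulates deviations from the process aim in each direction and signals when either sum passes a threshold. The sketch below assumes that formulation; the parameter names (target, slack, threshold) and the sample data are hypothetical, and the glossary does not prescribe a particular variant.

```python
def cusum_signals(values, target, slack, threshold):
    """Return indexes where the upper or lower cumulative sum exceeds threshold."""
    upper = lower = 0.0
    signals = []
    for i, x in enumerate(values):
        # Accumulate only deviations larger than the slack; never go below zero.
        upper = max(0.0, upper + (x - target - slack))
        lower = max(0.0, lower + (target - slack - x))
        if upper > threshold or lower > threshold:
            signals.append(i)
    return signals


# Hypothetical process aimed at 10 that drifts to 11 from the fourth reading on.
readings = [10, 10, 10, 11, 11, 11, 11, 11]
print(cusum_signals(readings, target=10, slack=0.5, threshold=2))  # [7]
```

No single reading here is far from the aim, yet the accumulated sum signals at the eighth reading because the process has stayed off aim for several readings in a row.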
| Cycle time |
The time required for a process (or subprocess) to execute from start to completion. |
| d |
A symbol representing the set of deviations of a set of items from the mean of the set of items, expressed as d = x - x̄ for each value of x. |
| Data |
(1) Symbols, numbers, or other representation of facts. (2) The raw material from which information is produced when it is put in a context that gives it meaning. See also Information. |
| Data administration |
See Data management. |
| Data administrator |
One who manages or provides data administration functions. |
| Data analyst |
One who identifies data requirements, defines data, and synthesizes it into data models. |
| Data architect |
One who is responsible for the development of data models. |
| Data audit |
See Information quality assessment. |
| Data cleansing |
An information scrap-and-rework process to correct data errors in a collection of data in order to bring the level of quality to an acceptable level to meet the information customers’ needs. |
| Data cleanup |
See Data cleansing. |
| Data consistency assessment |
The process of measuring data equivalence and information float or timeliness in an interface-based information value chain. |
| Data content quality |
The subset of information quality referring to the quality of data values. |
| Data defect prevention |
The process of information process improvement to eliminate or minimize the possibility of data errors from getting into an information product or database. |
| Data definition |
The specification of the meaning, valid values or ranges (domain), and business integrity rules for an entity type or attribute. Data definition includes name, definition, and relationships, as well as domain value definition and business rules that govern business actions that are reflected in data. These components represent the "information product specification" components of Information Resource Data or meta data. |
| Data Definition Language (DDL) |
The language used to describe database schemas or designs. |
| Data definition quality |
A component of information quality measuring the degree to which data definition accurately, completely, and understandably defines what the information producers and knowledge workers should know in order to perform their job processes effectively. Data definition quality is a measure of the quality of the information product specification. |
| Data dictionary |
A repository of information (meta data) defining and describing the data resource. A repository containing meta data. An active data dictionary, such as a catalog, is one that is capable of interacting with and controlling the environment about which it stores information or meta data. An integrated data dictionary is one that is capable of controlling the data and process environments. A passive data dictionary is one that is capable of storing meta data or data about the data resource, but is not capable of interacting with or controlling the computerized environment external to the data dictionary. See also Repository. |
| Data dissemination |
The distribution of a copy or extract of information in any form, from electronic to paper, from a database or data source to other parties. This is NOT to be confused with data or information sharing. (Q) |
| Data element |
The smallest unit of named data that has meaning to a knowledge worker. A data element is the implementation of an attribute. Synonymous with data item and field. |
| Data flow diagram |
A graphic representation of the "flow" of data through business functions or processes. It illustrates the processes, data stores, external entities, data flows, and their relationships. |
| Data independence |
The property of being able to change the overall logical or physical structure of the data without changing the application program's view of the data. |
| Data intermediary |
See Data scribe. |
| Data intermediation |
The design of and performance of processes in which the actual creator or originator of knowledge does not capture that knowledge electronically, but gives it in paper or other form to be entered into a database by someone else. |
| Data management |
The management and control of data as an enterprise asset. It includes strategic information planning, establishing data-related standards, policies, and procedures, and data modeling and information architecture. Also called data administration. |
| Data Manipulation Language (DML) |
The language used to access data in one or more databases. |
| Data mart |
A subset of enterprise data along with software to extract data from a data warehouse or operational data store, summarize and store it, and to analyze and present information to support trend analysis and tactical decisions and processes. The scope can be that of a complete data subject such as Customer or Product Sales, or of a particular business area or line of business, such as Retail Sales. A data mart architecture, whether subject or business area, must be an enterprise-consistent architecture. |
| Data mining |
The process of analyzing large volumes of data using pattern recognition or knowledge discovery techniques to identify meaningful trends and relationships represented in data in large databases. |
| Data model |
A logical map or representation of real-world objects and events that represents the inherent properties of the data independently of software, hardware, or machine performance considerations. The model shows data attributes grouped into third normal form entities, and the relationships among those entities. |
| Data presentation quality |
A component of information quality measuring the degree to which information-bearing mechanisms, such as screens, reports, and other communication media, are easy to understand, efficient to use, and minimize the possibility of mistakes in its use. |
| Data quality |
See Information quality. |
| Data quality assessment |
See Information quality assessment. |
| Data reengineering |
The process of analyzing, standardizing, and transforming data from unarchitected or nonstandardized files or databases into an enterprise-standardized information architecture. |
| Data replication |
The controlled process of propagating equivalent data values from a source database to one or more duplicate copies in other databases. |
| Data resource management |
See Information resource management. |
| Data scribe |
A role in which individuals transcribe data in one form, such as a paper document, to another form, such as into a computer database; for example, a data entry clerk entering data from a paper order form into a database. |
| Data store |
Any place in a system where data is stored. This includes manual files, machine-readable files, data tables, and databases. A data store on a logical data flow diagram is related to one or more entities in the data model. |
| Data transformation |
The process of defining and applying algorithms to change data from one form or domain value set to another form or domain value set in a target data architecture to improve its value and usability for the information stakeholders. |
| Data type |
An attribute of a data element or field that specifies the DBMS type of physical values, such as numeric, alphanumeric, packed decimal, floating point, or datetime. |
| Data value |
A specific representation of a fact for an attribute at a point in time. |
| Data visualization |
Graphical presentation of patterns and trends represented by data relationships. |
| Data warehouse |
A collection of software and data organized to collect, cleanse, transform, and store data from a variety of sources, and analyze and present information to support decision-making, tactical and strategic business processes. |
| Data warehouse audits and controls |
A collection of checks and balances to assure the extract, cleansing, transformation, summarization, and load processes are in control and operate properly. The controls must assure the right data is extracted from the right sources, transformed, cleansed, summarized correctly, and loaded to the right target files. |
| Data-driven development |
See Value-centric development. |
| Database administration |
The function of managing the physical aspects of the data resource, including physical database design to implement the conceptual data model; and database integrity, performance, and security. |
| Database integrity |
The characteristic of data in a database in which the data conforms to the physical integrity constraints, such as referential integrity and primary key uniqueness, and is able to be secured and recovered in the event of an application, software, or hardware failure. Database integrity does not imply data accuracy or other information quality characteristics not able to be provided by the DBMS functions. |
| Database marketing |
The use of collected and managed information about one's customers and prospects to provide better service and establish long-term relationships with them. Database marketing involves analyzing and designing pertinent customer information needs, collecting, maintaining, and analyzing that data to support mass customization of marketing campaigns to decrease costs, improve response, and to build customer loyalty, reduce attrition, and increase customer satisfaction. |
| Database server |
The distributed implementation of a set of database management functions in which one dedicated collection of database management functions, accessing one or more databases on that node, serves multiple knowledge workers or clients that provide a human-machine interface for the requesting or creation of data. |
| DDL |
Acronym for Data Definition Language. |
| Decision Support System (DSS) |
Applications that use data in a free-form fashion to support managerial decisions by applying ad hoc query, summarization, trend analysis, exception identification, and "what-if" questions. |
| Defect |
An item that does not conform to its quality standard or customer expectation. |
| Defect Prevention Software |
Software that enables the identification and elimination of information quality problems at the electronic source of data capture, such as nonconformance to business rules, the identification of potential duplicate records, or the non-uniqueness of primary identifiers such as tax-ID numbers. (IQ) |
| Defect rate |
See Error rate. |
| Definition conformance |
The characteristic of data, such that the data values represent a fact consistent with the agreed-upon definition of the attribute. For example, a value of "6/7/1997" actually represents the "Order Date: the date an order is placed by the customer," and not the system date created when the order is entered into the system. |
| Delphi approach |
An approach used to achieve consensus, that involves individual judgments made independently, group discussion of the rationales for disparate judgments, and a consensus judgment being agreed upon by the participants. |
| Demography |
The study of human populations, especially with reference to size, density, distribution and other vital statistics. |
| Derived data |
Data that is created or calculated from other data within the database or system. |
| Deviation (d) |
The difference in value between an item in a set of items and the mean (x̄) of the set, as expressed in the formula d = x - x̄, where d = deviation, x = the value of an item in the set, and x̄ is the mean or average of all items in the set. |
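The formula can be applied to a whole set of items at once. A minimal sketch, with a hypothetical function name and data:

```python
from statistics import mean


def deviations(values):
    """Return d = x - mean for each x in values."""
    center = mean(values)
    return [x - center for x in values]


print(deviations([2, 4, 6, 8]))  # [-3, -1, 1, 3]
```

The deviations of a set always sum to zero, which is why measures of spread work with squared or absolute deviations rather than the raw values.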
| Devil's advocate |
A technique used in decision making in which someone plays the role of challenging the predominant position in order to expose potential flaws, influence critical thinking and prevent biased and potentially harmful decisions. |
| DFD |
Acronym for Data Flow Diagram. |
| DIF |
Acronym for Data Interchange Format. |
| Dimension |
A category for summarizing or viewing data (e.g., time period, product, product line, geographic area, and organization). |
| Directory |
A table, block, index, or folder containing addresses and locations or relationships of data or files and used as a way of organizing files. |
| Discount rate |
The market rate of interest representing the cost to borrow money. This rate may be applied to future income to calculate its net present value. |
| DMAIC |
Acronym for Define-Measure-Analyze-Improve-Control, the Six Sigma method for process improvement. |
| DML |
Acronym for Data Manipulation Language. |
| Domain |
(1) Set or range of valid values for a given attribute or field, or the specification of business rules for determining the valid values. (2) The area or field of reference of an application or problem set. |
| Domain value redundancy |
A dysfunctional characteristic of an attribute or field in which the same fact of information is represented by more than one value. For example, unit of measure code having domain values of "doz," "dz," and "12" may all represent the fact that the unit of measure is "one dozen." |
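One way to surface domain value redundancy is to group stored values by the fact they represent. The sketch below assumes a cross-reference from stored value to fact already exists; the mapping, including the "ea" entry, is hypothetical.

```python
from collections import defaultdict

# Hypothetical cross-reference: stored domain value -> fact it represents.
VALUE_TO_FACT = {"doz": "one dozen", "dz": "one dozen", "12": "one dozen", "ea": "each"}

def redundant_domain_values(value_to_fact):
    """Return every fact that is represented by more than one domain value."""
    by_fact = defaultdict(list)
    for value, fact in value_to_fact.items():
        by_fact[fact].append(value)
    return {fact: sorted(vals) for fact, vals in by_fact.items() if len(vals) > 1}

print(redundant_domain_values(VALUE_TO_FACT))
```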
| Domain chaos |
A dysfunctional characteristic of an attribute or field in which multiple types of facts are represented by the same attribute or field. For example, unit of measure code for one product has a domain value of "doz" to represent a unit of measure of "one dozen," while for another product unit of measure code has a value of "150" to represent the reorder point quantity. |
| Domain type |
A general classification that characterizes the kind of values that may be values of a specific attribute, such as a number, date, currency amount, or percent. The domain type name may be used as a component of an attribute name. Also called a class word. |
| Drill down |
The process of accessing more detailed data from summary data to identify exceptions and trends. May be multitier. |
| Drill through |
The process of accessing the original source data from a replicated or transformed copy to verify equivalence to the record-of-origin data. |
| DSS |
Acronym for Decision Support Systems. |
| E-commerce |
Short for electronic commerce, the conducting of business transactions over the Internet (I-Net). |
| EDI |
Acronym for Electronic Data Interchange. |
| Edit and validation |
The process of assuring data being created conforms to the governing business rules and is correct. Database integrity controls and software routines can edit and validate conformance to business rules. Information producers must validate correctness of data. |
| EIS |
Acronym for Executive Information System. |
| Empty value |
A data element for which no value has been captured and for which the real-world object represented has no corresponding value. For example, there is no date value for the data element "Last date of service" for an active Employee. Contrast with Missing value. (Stat, Q) |
| End-consumer |
The persons or organizations whose needs a product or service provider must meet, and whose satisfaction with its products and services, including information, determines enterprise success or failure. A customer may be a direct, immediate Customer or the End-consumer of the product or service. |
| Enterprise data |
The data of an organization or corporation that is owned by the enterprise and managed by a business area. Characteristics of corporate data are that it is essential to run the business and/or it is shared by more than one organizational unit within the enterprise. |
| Entity integrity |
The assurance that a primary key value will identify no more than one occurrence of an entity type, and that no attribute of the primary key may contain a null value. Based on this premise, the real-world entities are uniquely distinguishable from all other entities. |
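The two conditions of entity integrity (no null in any part of the primary key, and no key value identifying more than one occurrence) can be checked with a sketch like the one below. Records are assumed to be Python dictionaries, and the function and column names are hypothetical.

```python
from collections import Counter

def entity_integrity_violations(rows, key_columns):
    """Return (rows with a null key attribute, key values occurring more than once)."""
    null_keys = [r for r in rows if any(r.get(c) is None for c in key_columns)]
    counts = Counter(
        tuple(r[c] for c in key_columns)
        for r in rows
        if all(r.get(c) is not None for c in key_columns)
    )
    duplicates = sorted(key for key, n in counts.items() if n > 1)
    return null_keys, duplicates

# Hypothetical customer records keyed on "id".
customers = [{"id": 1}, {"id": 1}, {"id": None}, {"id": 2}]
print(entity_integrity_violations(customers, ["id"]))
```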
| Entity life cycle |
The phases, or distinct states, through which an occurrence of an object moves over a definable period of time. The subtypes of an entity that are mutually exclusive over a given time frame. Also referred to as entity life history and state transition diagram. |
| Entity Relationship Diagram (ERD) |
See Entity relationship model. |
| Entity relationship model |
A graphical representation illustrating the entity types and the relationships of those entity types of interest to the enterprise. |
| Entity subtype |
A specialized subset of occurrences of a more general entity type, having one or more different attributes or relationships not inherent in the other occurrences of the generalized entity type. For example, an hourly employee will have different attributes from a salaried employee, such as hourly pay rate and monthly salary. |
| Entity supertype |
A generalized entity in which some occurrences belong to a distinct, more specialized subtype. |
| Entity type |
A classification of the types of real-world objects (such as person, place, thing, concept, or events of interest to the enterprise) that have common characteristics. Sometimes the term entity is used as a short name. |
| Entity/process matrix |
A matrix that shows the relationships of the processes, identified in the business process model, with the entity types identified in the information model. The model illustrates which processes create, update, or reference the entity types. |
| Equivalence |
A characteristic of information quality that measures the degree to which data stored in multiple places is conceptually equal. Equivalence indicates the data has equal values or is in essence the same. For example, a value of "F" for Gender Code for J. J. Jones in database A and a value of "1" for Sex Code for J. J. Jones in database B mean the same thing: J. J. Jones is female. The measure of equivalence is the percent of fields in records within one data collection that are semantically equivalent to their corresponding fields within another data collection or database. Also called semantic equivalence. |
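The equivalence measure can be sketched as a percentage of corresponding field pairs that mean the same thing. The cross-reference below extends the Gender Code example; the "M" to "2" mapping and the function name are assumptions, not part of the definition.

```python
# Hypothetical cross-reference from database A codes to database B codes.
GENDER_A_TO_B = {"F": "1", "M": "2"}

def equivalence_percent(pairs, mapping):
    """Percent of (value_in_A, value_in_B) pairs that are semantically equivalent."""
    if not pairs:
        return 0.0
    matches = sum(1 for a, b in pairs if mapping.get(a) == b)
    return 100.0 * matches / len(pairs)

# Four corresponding fields; the third pair disagrees.
pairs = [("F", "1"), ("M", "2"), ("F", "2"), ("M", "2")]
print(equivalence_percent(pairs, GENDER_A_TO_B))
```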
| ERD |
Acronym for Entity Relationship Diagram. |
| Error cause removal |
Elimination of cause(s) of error in a way that prevents recurrence of the error. |
| Error event |
A single occurrence of an error in a process. See also Error rate. |
| Error rate |
A measure of the frequency with which errors occur in a process. Also called failure rate (in manufactured products), or defect rate. |
| Event |
An occurrence of something that happens that is of interest to the enterprise. |
| Executive Information System (EIS) |
A graphical application that supports executive processes, decisions, and information requirements. Presents highly summarized data with drill-down capability, and access to key external data. |
| Expert system |
(1) A specific class of knowledge base system in which the knowledge, or rules, are based on the skills and experience of a specific expert or group of experts in a given field. (2) A branch of artificial intelligence. An expert system attempts to represent and use knowledge in the same way a human expert does. Expert systems simulate the human trait of thinking. |
| Export |
The function of extracting information from a repository or database and packaging it to an export/import file. |
| Extensibility |
The ability to dynamically augment a database (or data dictionary) schema with knowledge worker-defined data types. This includes addition of new data types and class definitions for representation and manipulation of unconventional data such as text data, audio data, image data, and data associated with artificial intelligence applications. |
| Fact |
(1) Something that is known or needs to be known. (2) In data warehousing, a specific numerical sum that represents a key business performance measure. |
| Fact completeness |
See Completeness. |
| Fact table |
The primary table in dimensional modeling that contains key business measurements. The facts are viewed by various Dimensions. See also Enterprise fact. |
| Failure cost |
See Costs of nonquality information. |
| Failure mode |
(1) The precipitating defect or mechanism that causes a failure. (2) The result or consequence of a failure or the manifestation of a failure. (3) The way in which a failure occurs and its impact on the normal process. |
| Failure mode analysis (FMA) |
A procedure to determine the precipitating cause or symptoms that occur just before or after a process failure. The procedure analyses failure mode data from current and previous process designs with a goal to define improvements to prevent recurrence of failure. See also Information process improvement. |
| Failure rate |
A measure of the frequency that defective items are produced by a process; hence, the frequency with which the process fails. See also Error rate. |
| False negative |
(1) In quality measurement, the condition of measuring a value for accuracy (or validity) and finding it to be not accurate (or not valid) when it is accurate (or valid). (2) In record matching, the condition of failing to identify that two records represent the same real world object. |
| False positive |
(1) In quality measurement, the condition of measuring a value for accuracy (or validity) and finding it to be accurate (or valid) when it is not. (2) In record matching, the condition of incorrectly identifying that two records represent the same real world object, when they actually represent two unique real world objects. |
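For the quality-measurement sense of false negative and false positive, the two conditions can be told apart by comparing the measured result with the true state. The following is a minimal sketch; the function name and return labels are hypothetical.

```python
def classify_measurement(measured_valid, actually_valid):
    """Compare what a quality test reported with what is true of the value."""
    if measured_valid and not actually_valid:
        return "false positive"   # found accurate (or valid) when it is not
    if not measured_valid and actually_valid:
        return "false negative"   # found not accurate (or not valid) when it is
    return "correct"

print(classify_measurement(measured_valid=True, actually_valid=False))
```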
| Feedback loop |
A formal mechanism for communicating information about process performance and information quality to the process owner and information producers. |
| Field |
A data element or data item in a data structure or record. |
| Fifth Normal Form (5NF) |
(1) A relation R is in fifth normal form (5NF) (also called Projection Join Normal Form (PJ/NF)) if and only if every join dependency in R is a consequence of the candidate keys of R. (2) A table is in 5NF if all elements within a concatenated key are independent of each other and cannot be derived from the remainder of the key. |
| File integrity |
The degree to which documents in a file retain their original form and utility (i.e., no misfiled or torn documents). |
| Filter |
See Information quality measure. |
| First Normal Form (1NF) |
(1) A relation R is in first normal form (1NF) if and only if all underlying domains contain atomic values only. (2) A table is in 1NF if it can be represented as a two-dimensional table, and for every attribute there exists one single meaningful and atomic value, never a repeating group of values. |
| Fishbone diagram |
See Cause-and-effect diagram. |
| Flexibility |
A characteristic of information quality measuring the degree to which the information architecture or database is able to support organizational or process reengineering changes with minimal modification of the existing objects and relationships, only adding new objects and relationships. |
| FMA |
See Failure mode analysis. |
| Focus group |
A facilitated group of customers that evaluates a product or service against those of competitors, in order to clearly define customer preferences and quality expectations. |
| Foolproofing |
Building edit and validation routines in application programs or procedures to reduce inadvertent human error. |
| Foreign key |
A data element in one entity (or relation) that is the primary key of another entity that serves to implement a relationship between the entities. |
| Fourth Normal Form (4NF) |
(1) A relation R is in fourth normal form (4NF) if and only if, whenever there exists an MVD in R, say A ->-> B, then all attributes of R are also functionally dependent upon A. In other words, the only dependencies (FDs or MVDs) in R are of the form K -> X (i.e., a functional dependency from a candidate key K to some other attribute X). Equivalently, R is in 4NF if it is in BCNF and all MVDs in R are in fact FDs. (2) A table is in 4NF if no row of the table contains two or more independent multivalued facts about an entity. |
| Frequency distribution |
The relative number of occurrences of values of an attribute, including a graphic representation of that "distribution" of values. |
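A frequency distribution can be computed by counting each distinct value of an attribute and dividing by the total. The sketch below is one simple way to do this; the function name and the sample unit-of-measure values are illustrative assumptions.

```python
from collections import Counter

def frequency_distribution(values):
    """Map each distinct value to (count, share of all occurrences)."""
    counts = Counter(values)
    total = len(values)
    return {value: (n, n / total) for value, n in counts.most_common()}

# Hypothetical unit-of-measure codes found in a data collection.
codes = ["doz", "dz", "doz", "12"]
for value, (n, share) in frequency_distribution(codes).items():
    print(f"{value:>4} {'#' * n} {share:.0%}")   # crude text histogram
```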
| Functional dependence |
The degree to which an attribute is an inherent characteristic of an entity type. If an attribute is an inherent characteristic of an entity type, that attribute is fully functionally dependent on any candidate key of that entity type. See Normal form. |
| Generalization |
The process of aggregating similar types of objects together in a less-specialized type based upon common attributes and behaviors. The identification of a common supertype of two or more specialized (sub)types. See also Specialization. |
| Heuristics |
A method or rule of thumb for obtaining a solution through inference or trial-and-error using approximate methods while evaluating progress toward a goal. |
| Hidden complaint |
An unhappy customer who has a complaint about a product or service, but who does NOT tell the provider organization. |
| Highly summarized |
Data that is summarized to more than two hierarchies of summarization from the base detail data. Highly summarized data may have lightly summarized data as its source. |
| Holding the gain |
Putting in place controls in a process that has been improved to maintain the quality level achieved by the improvement. |
| Homonym |
A word or phrase that is spelled or sounds the same as another, but has a different meaning. |
| Hoshin planning (Hoshin Kanri) |
A management technique, also known as Policy Management or Policy Deployment, developed in Japan by combining Management by Objectives and the Plan-Do-Check-Act (PDCA) improvement cycle. Hoshin planning provides a planning, implementation and review process to align business strategy and daily operations through total employee participation to achieve business objectives and breakthrough improvements. |
| House of quality |
A mapping of customer quality expectations in product or service to the quality measures of the product or service to summarize all expectations and the work to meet them. |
| Human error |
An action performed by a person that is wholly expected to have a positive or satisfactory outcome, but that does not (Ben Marguglio). Human error is NOT a root cause of defects; rather, it is predictable, manageable, and preventable. |
| Human factors |
Static constraints related to human ergonomic and cognitive limitations. |
| Hypermedia |
The convergence of hypertext and multimedia. |
| Hypertext |
The ability to organize text data in logical chunks or documents that can be accessed randomly via links as well as sequentially. |
| Hypothetical reasoning |
Hypothetical reasoning is a problem-solving approach that explores several different alternative solutions in parallel to determine which approach or series of steps best solves a particular problem. It is useful in business planning or optimization problems, where solutions vary according to cost or where numerous solutions may be feasible. |
| Identifier |
One or more attributes that uniquely locate an occurrence of an entity type. Conceptually synonymous with primary key. |
| In control |
The state of a process characterized by the absence of special causes of variation. Processes in control produce consistent results within acceptable limits of variation. |
| Inadvertent error |
Error introduced unconsciously; for example, when a data intermediary unwittingly transposes values or skips a line in data entry. See also Intentional error. |
| Incremental load |
The propagation of changed data to a target database or data warehouse in which only the data that has been changed since the last load is loaded or updated in the target. |
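An incremental load can be sketched as a filter on a change timestamp followed by an insert-or-update in the target. The sketch assumes every source row carries an `id` and a `changed_at` field and that the target is an in-memory dictionary keyed by `id`; all of these names are hypothetical.

```python
from datetime import datetime

def incremental_load(source_rows, target, last_load_time):
    """Load or update in target only the rows changed since last_load_time."""
    loaded = 0
    for row in source_rows:
        if row["changed_at"] > last_load_time:
            target[row["id"]] = dict(row)   # insert new row or overwrite old copy
            loaded += 1
    return loaded

source = [
    {"id": 1, "name": "Ann", "changed_at": datetime(2024, 1, 1)},
    {"id": 2, "name": "Bob", "changed_at": datetime(2024, 3, 1)},
]
warehouse = {1: dict(source[0])}            # row 1 was loaded previously
print(incremental_load(source, warehouse, last_load_time=datetime(2024, 2, 1)))
```

Only the row changed after the previous load is propagated; unchanged rows are not touched.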
| Informate |
A term coined by Shoshana Zuboff in In the Age of the Smart Machine to describe the benefit of information technology when used to capture knowledge about business events so that the knowledge can "informate" other knowledge workers to more intelligently perform their jobs. |
| Information |
(1) Data in context, i.e., the meaning given to data or the interpretation of data based on its context. (2) The finished product as a result of processing, presentation and interpretation of data. |
| Information architecture |
A "blueprint" of an enterprise expressed in terms of a business process model, showing what the enterprise does; an enterprise information model, showing what information resources are required; and a business information model, showing the relationships of the processes and information. |
| Information architecture quality |
A component of information quality measuring the degree to which data models and database design are stable, flexible, and reusable, and implement principles of data structure integrity. |
| Information assessment |
See Information quality assessment. |
| Information chaos |
A state of the dysfunctional learning organization in which there are unmanaged, inconsistent, and redundant databases that contain data about a single type of thing or fact. The information chaos quotient is the number of unmanaged, inconsistent, and redundant databases containing data about a single type of thing or fact. |
| Information chaos quotient |
The count of the number of unmanaged, inconsistent, and redundant databases containing data about a single type of thing or fact. |
| Information customer-supplier relationship |
The information stakeholder partnerships between the information producers who create information and the knowledge workers who depend on it. |
| Information directory |
A repository or dictionary of the information stored in a data warehouse, including technical and business meta data, that supports all warehouse customers. The technical meta data describes the transformation rules and replication schedules for source data. The business meta data supports the definition and domain specification of the data. |
| Information float |
The length of the delay between the time a fact becomes known in an organization and the time an interested knowledge worker is able to know that fact. Information float has two components: Manual float is the length of the delay between the time a fact becomes known and the time it is first captured electronically in a potentially sharable database. Electronic float is the length of time from when a fact is captured in electronic form in a potentially sharable database to the time it is "moved" to a database that makes it accessible to an interested knowledge worker. |
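The two components of information float can be computed from three timestamps. The sketch below assumes those timestamps are available for a given fact; the function name, parameter names, and sample dates are hypothetical.

```python
from datetime import datetime

def information_float(known_at, captured_at, accessible_at):
    """Return (manual float, electronic float, total information float)."""
    manual = captured_at - known_at           # fact known -> first electronic capture
    electronic = accessible_at - captured_at  # captured -> accessible to the knowledge worker
    return manual, electronic, manual + electronic

manual, electronic, total = information_float(
    known_at=datetime(2024, 5, 1, 9, 0),
    captured_at=datetime(2024, 5, 2, 9, 0),
    accessible_at=datetime(2024, 5, 5, 9, 0),
)
print(manual.days, electronic.days, total.days)
```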
| Information group |
A relatively small and cohesive collection of information, consisting of 20-50 attributes and entity types, grouped around a single subject or subset of a major subject. An information group will generally have one or more subject matter experts and several business roles that use the information. |
| Information life cycle |
See Information value/cost chain. |
| Information Management (IM) |
The function of managing information as an enterprise resource, including planning, organizing and staffing, leading and directing, and controlling information. Information management includes managing data as the enterprise knowledge infrastructure and information technology as the enterprise technical infrastructure, and managing applications across business value chains. |
| Information model |
A high-level graphical representation of the information resource requirements of an organization showing the information classes and their relationships. |
| Information myopia |
A disease that occurs when knowledge workers can see only part of the information they need, caused by not defining data relationships correctly or not having access to data that is logically related because it exists in multiple nonintegrated databases. |
| Information policy |
A statement of important principles and guidelines required to effectively manage and exploit the enterprise information resources. |
| Information presentation quality |
The characteristic in which information is presented, whether in a report or document, on a screen, in forms, orally or visually, in a manner to communicate clearly to the recipient knowledge worker to facilitate understanding and enabling taking the right action or making the right decision. |
| Information preventive maintenance |
Establishing processes to control the creation and maintenance of volatile and critical data to keep it maintained at the highest level feasible, possibly including validating volatile data on an appropriate schedule and assessment of that data before critical processes use it. |
| Information process improvement |
The process of improving processes to eliminate data errors and defects. This is one component of data defect prevention. Information process improvement is proactive information quality. |
| Information producer |
The role of individuals in which they originate, capture, create, or update data or knowledge as a part of their job function or as part of the process they perform. Information producers create the actual information content and are accountable for its accuracy and completeness to meet all information stakeholders' needs. See also Data intermediary. |
| Information product improvement |
The process of data cleansing, reengineering, and transformation required to improve existing defective data up to an acceptable level of quality. This is one component of information scrap and rework. See also Data cleansing, Data reengineering, and Data transformation. Information product improvement is reactive information quality. |
| Information product specifications |
The set of information resource data (meta data) characteristics that define everything required for a process and its creating/updating applications to produce quality information. Information product specification characteristics include: data name, definition, domain or data value set (code values or ranges) and the business rules that identify policies and constraints on the potential values. These specifications must be understandable to the information producers who create and maintain the data and the knowledge workers who apply the data in their work. |
| Information quality |
Consistently meeting all knowledge worker and end-customer expectations in all the characteristics of the information products and services they deem important. The degree to which information consistently meets the requirements and expectations of all knowledge workers who require it to perform their processes. |
| Information quality assessment |
The random sampling of a data collection and measuring it against various quality characteristics, such as accuracy, completeness, validity, nonduplication or timeliness to determine its level of quality or reliability. Also called data quality assessment or data audit. |
| Information quality characteristic |
An aspect of information that an information customer deems important in order to be considered "quality information." Characteristics include completeness, accuracy, timeliness, understandability, objectivity and presentation clarity, among others. |
| Information quality contamination |
The creation of inaccurate derived data by combining accurate data with inaccurate data. |
| Information quality decay |
The characteristic of data such that formerly accurate data will become not accurate over time because the characteristic about the real world object will change without a corresponding update to the data applied. For example, John Doe's marital status value of "single" in a database is subject to information quality decay and will become inaccurate the moment he becomes married. |
| Information quality decay rate |
The rate, usually expressed as a percent per year, at which the accuracy of a data collection will deteriorate over time if no data updates are applied (e.g., (1) person age decay rate is 100% within one year, decaying at a rate of approximately 1.9% per week; (2) if 17% of a population moves annually, the annual decay rate of address is 17%). |
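The arithmetic behind the examples can be sketched as follows. The weekly figure is simply the annual rate divided by 52. The `estimated_accuracy` function additionally assumes that decay is linear over time, which is a simplifying assumption made here and not part of the definition.

```python
def weekly_decay_rate(annual_rate_percent):
    """Spread an annual decay rate evenly over 52 weeks."""
    return annual_rate_percent / 52

def estimated_accuracy(initial_accuracy_percent, annual_rate_percent, years):
    """Estimate remaining accuracy if no updates are applied (assumes linear decay)."""
    return max(0.0, initial_accuracy_percent - annual_rate_percent * years)

print(round(weekly_decay_rate(100), 1))   # person age: 100% per year
print(estimated_accuracy(100, 17, 2))     # address: 17% per year, after two years
```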
| Information quality management |
The function that leads the organization to improve information quality by implementing processes to measure, assess costs of, improve and control information quality, and by providing guidelines, policies, and education for information quality improvement. |
| Information quality measure(s) |
A specific quality measure or test (set of measures or tests) to assess information quality. For example, Product Id will be tested for uniqueness, Customer records will be tested for duplicate occurrences, Customer address will be tested to assure it is the correct address, Product Unit of Measure will be tested to be a valid Unit of Measure domain code, and Order Total Price Amount will be tested to assure it has been calculated correctly. Quality measures will be assessed using business rule tests in automated quality analysis software, coded routines in internally developed quality assessment programs, or in physical quality assessment procedures. Some call information quality measures filters or metrics. |
| Information Resource Management (IRM) |
(1) The application of generally accepted management principles to data as a strategic business asset. (2) The function of managing data as an enterprise resource. This generally includes operational data management or data administration, strategic information management, repository management, and database administration. See also Information management. (3) The organization unit responsible for providing principles and processes for managing the information assets of the enterprise. |
| Information scrap and rework |
The activities and costs required to cleanse or correct nonquality information, to recover from process failure caused by nonquality information, or to rework or work around problems caused by missing or nonquality information. Analogous to manufacturing scrap and rework. |
| Information stakeholder |
Any individual who has an interest in and dependence on a set of data or information. Stakeholders may include information producers, knowledge workers, external customers, and regulatory bodies, as well as various information systems roles such as database designers, application developers, and maintenance personnel. |
| Information steward |
A role in which an individual has accountability for the quality of some part of the information resource. See Information stewardship. |
| Information stewardship |
Accountability for the quality of some part of the information resource for the well-being of the larger organization. Every individual within an organization holds one or more information stewardship roles, based on the nature of their job and its relationship to information, such as creating information, applying it, defining it, modeling it, developing a computer screen to display it or moving it from one database or file to another. See Strategic information steward, Managerial information steward, and Operational information steward. |
| Information stewardship agreement |
A formal agreement among business managers specifying the quality standard and target date for information produced in one business area and used in one or more other business areas. |
| Information value |
The measure of importance of information expressed in tangible metrics. Information has realized and potential value. Realized value is the actual value derived from information applied by knowledge workers in the accomplishment of the business processes. Potential value is the future value of information that could be realized if applied to business processes in which the information is not currently used. |
| Information value/cost chain |
The end-to-end set of processes and data stores, electronic and otherwise, involved in creating, updating, interfacing, and propagating data of a specific type from its origination to its ultimate data store, including independent data entry processes, if any. |
| Information view |
A knowledge worker's perceived relationship of the data elements needed to perform a process, showing the structure and data elements required. A process activity has one and only one information view. |
| Information view model |
A local data model derived from an enterprise model to reflect the specific information required for one business area or function, one organization unit, one application or system, or one business process. |
| Intentional error |
Error introduced consciously. For example, an information producer required to enter an unknown fact like birth date, enters his or her own or some "coded" birth date used to mean "unknown." See also Inadvertent error. |
| Interface program |
An application that extracts data from one database, transforms it, and loads it into a non-controlled redundant database. Interface programs represent one cost of information scrap and rework in that the information in the first database is not "able" to be used from that source and must be "reworked" for another process or knowledge worker to use. |
| Interfaceation |
The technique of supposedly "integrating" application systems by developing "interface programs" or middleware to extract data in one format from a data source and transform it to another format for a data target rather than by standardizing the data definition and format. |
| Internal view |
The physical database design or schema in the ANSI 3-schema architecture. |
| IRM |
Acronym for Information Resource Management. |
| ISO |
Short name of the International Organization for Standardization (often expanded, inaccurately, as International Standards Organization). An international body headquartered in Geneva, in operation since 1947, that sets international standards across engineering disciplines, including information technology. Its members are national standards bodies; for example, BSI (British Standards Institution). ISO approves standards, including OSI communications protocols and ISO 9000 standards. |
| ISO 9000 |
International standards for quality management specifying guidelines and procedures for documenting and managing business processes and providing a system for third-party certification to verify those procedures are followed in actual practice. |
| Knowledge |
Information in context; understanding of the significance of information. |
| Knowledge base |
(1) That part of a knowledge base system in which the rules and definitions used to build the application are stored. The knowledge base may also include a fact or object storage facility. (2) A database where the codification of knowledge is kept; usually a set of rules specified in an if . . . then format. |
| Knowledge base system |
A software system whose application-specific information is programmed in the form of rules and stored in a specific facility, known as the knowledge base. The system uses Artificial Intelligence (AI) procedures to mimic human problem-solving techniques, applying the rules stored in the knowledge base and facts supplied to the system to solve a particular business problem. |
| Knowledge error |
Information quality error introduced as a result of lack of training or expertise. |
| Knowledge worker |
The role of individuals in which they use information in any form as part of their job function or in the course of performing a process, whether operational or strategic. Also referred to as an information consumer or customer. Accountable for work results created as a result of the use of information and for adhering to any policies governing the security, privacy, and confidentiality of the information used. |
| Legacy data |
Data that comes from files and/or databases developed without using an enterprise data architecture approach. |
| Legacy systems |
Systems that were developed without using an enterprise data architecture approach. |
| Lifetime value (LTV) |
See Customer lifetime value. |
| Lightly summarized |
Data that is summarized only one or two levels of hierarchy of summary from the base detailed data. |
| Load |
To sequentially add a set of records into a database or data warehouse. See also Incremental load. |
| Lock |
A means of serializing events or preventing access to data while an application or information producer may be updating that data. |
| Log |
A collection of records that describe the events that occur during DBMS execution and their sequence. The information thus recorded is used for recovery in the event of a failure during DBMS execution. |
| Lower control limit |
The lowest acceptable value or characteristic in a set of items deemed to be of acceptable quality. Together with the upper control limit, it specifies the boundaries of acceptable variability in an item to meet quality specifications. |
| LTV |
Acronym for Customer Lifetime Value. |
| Managerial information steward |
The role of accountability a business manager or process owner has for the quality of data produced by his or her processes. |
| Managerial information stewardship |
The fact that a business manager or process owner who has accountability for one or more business processes also has accountability for the integrity of the data produced by those processes. |
| MDDB |
Acronym for Multidimensional Database. |
| Mean |
The average of a set of values, usually calculated to one place of decimals more than the original data. |
| Measure |
A metric or characteristic of information quality, such as percent of accuracy or average information float, to be assessed. |
| Measurement curve bundle |
The collection of measurement points of a real-world attribute that represents the variation of values of that attribute in the real world. |
| Measurement system |
A collection of processes, procedures, software, and databases used to assess and report information quality. |
| Median |
The middle value in an ordered set of values. If the set contains an even number of values, the median is calculated by adding the middle two values and dividing by 2. |
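The median rule above can be sketched in a few lines of Python. This is a minimal illustration; the function name `median` and the sample values are assumptions, not part of the glossary.

```python
def median(values):
    """Return the middle value of the ordered values (hypothetical helper)."""
    ordered = sorted(values)
    n = len(ordered)
    if n == 0:
        raise ValueError("the median of an empty set is undefined")
    mid = n // 2
    if n % 2 == 1:
        # Odd count: a single middle value exists.
        return ordered[mid]
    # Even count: add the two middle values and divide by 2.
    return (ordered[mid - 1] + ordered[mid]) / 2
```

For example, `median([5, 1, 3])` picks the single middle value, while `median([4, 1, 3, 2])` averages the two middle values.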
| Meta data |
See Information Resource data. A term used to mean data that describes or specifies other data. The term has not made its way into either Webster's Unabridged Dictionary or the Oxford Dictionary. The closest term is meta language, defined as "a language used to describe other languages." The term Information Resource data is preferred over the term meta data as a business term. |
| Methodology |
A formalized collection of tools, procedures, and techniques to solve a specific problem or perform a given function. |
| Metric |
(1) See Measure. (2) A fact type in data warehousing, generally numeric (such as sales, budget, and inventory) that is analyzed in different ways or dimensions in decision support analysis. |
| Misinterpretation |
Human error resulting from poor information presentation quality. |
| Missing value |
A data element for which no value has been captured, but for which the real-world object represented has a value. For example, there is no date value for the data element "last date of service" for a retired Employee whose last day of official employment was June 15, 2002. Contrast with Empty value. |
| Modal interval |
The range interval used to group continuous data values in order to determine a mode. |
| Mode |
The most frequently occurring value in a set of values. |
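As a rough sketch of how a mode, and a modal interval for continuous data, might be computed: the helper names `mode` and `modal_interval` and the fixed-width grouping are illustrative assumptions.

```python
from collections import Counter

def mode(values):
    """Return the most frequently occurring value (first seen wins a tie)."""
    return Counter(values).most_common(1)[0][0]

def modal_interval(values, width):
    """Group continuous values into intervals of `width` and return the
    (start, end) bounds of the most populated interval."""
    # Map each value to the lower bound of the interval that contains it.
    bins = Counter((v // width) * width for v in values)
    start = bins.most_common(1)[0][0]
    return (start, start + width)
```

The interval width is a design choice: a narrower width gives a more precise mode but needs more data to produce a clear peak.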
| Monte Carlo |
A problem-solving technique that uses statistical methods and random sampling to solve mathematical or physical problems. |
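A classic illustration of the technique is estimating π by random sampling. The example below is not from the glossary; the function name `estimate_pi`, the trial count, and the fixed seed are all assumptions chosen for a reproducible sketch.

```python
import random

def estimate_pi(trials, seed=0):
    """Estimate pi by sampling random points in the unit square."""
    rng = random.Random(seed)  # seeded so that repeated runs agree
    inside = 0
    for _ in range(trials):
        x, y = rng.random(), rng.random()
        # Count the points that fall inside the quarter circle of radius 1.
        if x * x + y * y <= 1.0:
            inside += 1
    # The quarter circle covers pi/4 of the unit square.
    return 4.0 * inside / trials
```

The estimate tightens slowly as `trials` grows, which is typical of Monte Carlo methods.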
| Multidimensional Database (MDDB) |
A database designed around arrays of data that support many dimensions or views of data (such as product sales by time period, geographic location, and organization) to support decision analysis. |
| n |
Algebraic symbol representing the number of items in a set. |
| Negative side effect |
See Side effect. |
| Net Present Value (NPV) |
The value of a sum of future money expressed in terms of its worth in today's currency. NPV is calculated by discounting the amount by the discount rate compounded by the number of years between the present and the future date the money is anticipated. |
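The discounting described above can be sketched as follows, assuming a single future sum, an annual discount rate expressed as a decimal, and annual compounding. The function name `present_value` is hypothetical.

```python
def present_value(future_amount, discount_rate, years):
    """Discount a single future sum back to today's currency.

    discount_rate is a decimal (0.10 means 10 percent per year) and is
    compounded once per year for `years` years.
    """
    return future_amount / (1.0 + discount_rate) ** years
```

For instance, 1,000 received two years from now at a 10 percent discount rate is worth 1000 / 1.21, or roughly 826.45, today.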
| NIST |
Acronym for National Institute of Standards and Technology. The U.S. government agency that maintains Federal Information Processing Standards (FIPS). NIST is responsible for administering the Baldrige Quality Award program. |
| Noise |
A term used in data mining to refer to data with missing values (where a value does exist in the real world object or event), empty values (where no value exists for the real world object or event), inaccurate values, measurement bias, or data that may be inconsequential or misleading in data analysis or data mining. |
| Nonduplication |
A characteristic of information quality measuring the degree to which there are no redundant occurrences of data, in other words, a real world object or event is represented by only one record in a database. (Q) |
| Nonquality data |
Data that is incorrect, incomplete, or does not conform to the data definition specification or meet knowledge workers' expectations. |
| Nonrepudiation |
The ability to provide proof of transmission and receipt of electronic communication. |
| Normal form |
A level of normalization that characterizes a group of attributes or data elements. |
| Normalization |
The process of associating attributes with the entity types for which they are inherent characteristics. The decomposition of data structures according to a set of dependency rules, designed to give simpler, more stable structures in which certain forms of redundancy are eliminated. A step-by-step process to remove anomalies in data integrity caused by add, delete, and update actions. Also called non-loss decomposition. |
| NPV |
Acronym for Net Present Value. |
| Null |
The absence of a data value in a data field or data element. The value may exist for the characteristic of the real world object or event and is missing or unknown, or there may be no value (called "empty") because the characteristic does not exist in the real world object or event. |
| Objectivity |
A characteristic of information quality which measures how well information is presented to the information consumer free from bias that can cause the information consumer to take the wrong action or make the wrong decision (Q). |
| Occurrence |
A specific instance of an entity type. For example, "customer" is an entity type. "John Doe" is an occurrence of the customer entity type. |
| Occurrence of record |
A specific record selected from a group of duplicate records as the authoritative record, and into which data from the other records may be consolidated. Related records from the other duplicate records are re-related to this occurrence of record. |
| OCR |
Acronym for Optical Character Recognition. |
| ODS |
Acronym for Operational Data Store. |
| OLAP |
Acronym for On-Line Analytical Processing. Software technology that transforms data into multidimensional views and that supports multidimensional data interaction, exploration, and analysis. |
| Operational data |
Data at a detailed level used to support daily activities of an enterprise. |
| Operational Data Store (ODS) |
A collection of operational or base data that is extracted from operational databases and standardized, cleansed, consolidated, transformed, and loaded into an enterprise data architecture. An ODS is used to support data mining of operational data, or as the store for base data that is summarized for a data warehouse. The ODS may also be used to audit the data warehouse to assure the summarized and derived data is calculated properly. The ODS may further become the enterprise shared operational database, allowing operational systems that are being reengineered to use the ODS as their operational databases. |
| Operational information steward |
An information producer accountable for the data created or updated as a result of the processes he or she performs. |
| Optical Character Recognition (OCR) |
The technique by which printed, digitized, or photographed characters can be recognized and converted into ASCII or a similar format. |
| Optimum |
As applied to a quality goal, that which meets the needs of both customer and supplier at the same time, minimizing their combined costs. |
| Outlier |
A sampled item that has a value or characteristics far separated from those of the other items in the sample, indicating a possible anomaly, different population or a bias or error in the sampling technique. |
| Overloaded data element |
A data element that contains more than one type of fact, usually the result of the need to know more types of facts growing faster than the ability to make additions to the data structures. This causes process failure when downstream processes find unexpected data values. |
| Paradigm |
An example or pattern that represents an acquired way of thinking about something that shapes thought and action in ways that are both conscious and unconscious. Paradigms are essential because they provide a culturally shared model for how to think and act, but they can present major obstacles to adopting newer, better approaches. |
| Pareto diagram |
A specialized column chart in which the bars represent defect types and are ordered by frequency, percentage, or impact with the cumulative percentage plotted. This is used to identify the areas needing improvement, from greatest to least. |
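The data behind a Pareto diagram (defect types ordered by frequency, with a running cumulative percentage) could be prepared along these lines. The function name `pareto_table` and the defect labels are illustrative assumptions; plotting is left out.

```python
from collections import Counter

def pareto_table(defects):
    """Return (defect_type, count, cumulative_percent) rows, most frequent first."""
    counts = Counter(defects)
    total = sum(counts.values())
    rows = []
    running = 0
    for defect_type, count in counts.most_common():
        running += count
        rows.append((defect_type, count, 100.0 * running / total))
    return rows

# Hypothetical defect log from an information quality assessment.
log = ["missing", "invalid", "missing", "duplicate", "missing", "invalid"]
for row in pareto_table(log):
    print(row)
```

Reading the rows top to bottom gives the improvement priorities from greatest to least.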
| Pareto principle |
The phenomenon that a few factors are responsible for the majority of the result. |
| Parsing |
The electronic analysis of data to break it into meaningful patterns or attributes for the purpose of data correction, record matching, de-duplication, and consolidation. |
| Partnership |
The relationship of business personnel and information systems personnel in the planning, requirements analysis, design, and development of applications and databases. |
| PDCA |
Acronym for Plan-Do-Check-Act. |
| Perceived needs |
The requirements that motivate customer action based upon their perceptions. For example, a perceived need of a car purchaser is that a convertible will enhance his or her attractiveness. See also Real needs and Stated needs. |
| Personal data |
Data that is of interest to only one organization component of an enterprise (e.g., the task schedule for a department project). Contrasted with Enterprise data. |
| Physical database design |
Mapping of the conceptual or logical database design data groupings into the physical database areas, files, records, elements, fields, and keys while adhering to the physical constraints of the hardware, DBMS software, and communications network to provide physical data integrity while meeting the performance and security constraints of the services to be performed against the database. |
| Plan-Do-Check/Study-Act (PDC/SA) cycle |
A closed-loop process for planning to solve a problem, implementing suggested improvements, analyzing the results, and standardizing the improvements. Also called a Shewhart cycle after its developer, W. A. Shewhart. |
| Poisson distribution |
A distribution of items that does not have a normal curve; rather, the tail on one side of the curve is longer and less populated than the curve on the other side, as is true for a distribution of data records by frequency of error in each record. |
| Poka Yoke |
Japanese for "mistake proofing," a system using control methods (to halt operations) and warning methods (to call attention to defects) that assures immediate feedback and corrective action in a way that assures no defects are allowed to get through without correction. |
| Policy deployment |
See Hoshin planning. |
| Population |
An entire group of items that we wish to measure, or the data that comes from that entire group of items. |
| Post condition |
A data integrity mechanism in object orientation that specifies an assertion, condition, business rule or guaranteed result that will be true upon completion of an operation or method; else, the operation or method fails. |
| Potential information value |
See Information value. |
| Precision |
A characteristic of information quality measuring the degree to which data is known to the right level of granularity. For example, a percentage value with two decimal places (00.00%) discriminates to the closest 1/100th of a percent. |
| Precondition |
A data integrity mechanism in object orientation that specifies an assertion, condition or business rule that must be true before invoking an operation or method; else, the operation or method cannot be performed. |
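A precondition and a post condition can be sketched on a single operation. The account-withdrawal scenario and the function name `withdraw` are assumptions made for illustration only.

```python
def withdraw(balance, amount):
    """Return the new balance after withdrawing `amount` (hypothetical operation)."""
    # Precondition: must hold before the operation may be performed.
    if amount <= 0 or amount > balance:
        raise ValueError("precondition failed: amount must be positive and covered by the balance")

    new_balance = balance - amount

    # Post condition: guaranteed result on completion; otherwise the operation fails.
    if new_balance < 0 or new_balance + amount != balance:
        raise RuntimeError("post condition failed: balance is inconsistent")
    return new_balance
```

Raising an error rather than silently continuing mirrors the definitions: the operation either cannot be performed or fails.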
| Presentation format |
The specification of how an attribute value or collection of data is to be displayed. |
| Primary key |
The attribute(s) that are used to uniquely identify a specific occurrence of an entity, relation, or file. A primary key that consists of more than one attribute is called a composite (or concatenated) primary key. |
| Prime word |
A component of an attribute name that identifies the entity type the attribute describes. |
| Procedural error |
Error introduced as a result of failure to follow the defined process. |
| Process |
A defined set of activities to achieve a goal or end result. An activity that computes, manipulates, transforms, and/or presents data. A process has identifiable begin and end points. See Business process. |
| Process control |
The systematic evaluation of performance of a process, taking corrective action if performance is not acceptable according to defined standards. |
| Process management |
The process of ensuring that a process is defined, controlled to consistently produce products that meet defined quality standards, improved as required to meet or exceed all customer expectations, and optimized to eliminate waste and non-value-adding activities. |
| Process management cycle |
A set of repeatable tasks for understanding customer needs, defining a process, establishing control, and improving the process. |
| Process management team |
A team, including a process owner and staff, to carry out process ownership obligations. |
| Process owner |
The person responsible for the process definition and/or process execution. The process owner is the managerial information steward for the data created or updated by the process, and is accountable for process performance integrity and the quality of information produced. |
| Product |
The output or result of a process. |
| Product satisfaction |
The measure of customer happiness with a product. |
| Psychographics |
Measures of a population based on social, personality and lifestyle behaviors. |
| QFD |
Acronym for Quality Function Deployment. |
| QLF |
See Taguchi quality loss function. |
| Quality |
Consistently meeting or exceeding customers' expectations. |
| Quality assessment |
An independent measurement of a product's or service's quality. |
| Quality characteristic |
(1) An identifiable aspect or feature of a product, process or service that a customer deems important in order to be considered a "quality" product or service. (2) A distinct attribute or property of a product, process or service that can be measured for conformance to a specific requirement. See Information quality characteristic. |
| Quality circle |
An ad hoc group formed to correct problems in or to improve a shared process. The goal is an improved work environment, productivity, and quality. |
| Quality Function Deployment (QFD) |
The involvement of customers in the design of products and services for the purpose of better understanding customer requirements, and the subsequent design of products and services that better meet their needs on initial product delivery. |
| Quality goal |
See Quality standard. |
| Quality improvement |
A measurable and noticeable improvement in the level of quality of a process and its resulting product. |
| Quality loss function (QLF) |
See Taguchi quality loss function. |
| Quality Management Software |
Software that supports the general functions of quality management, including tracking and documenting the resolution of quality issues and/or generating quality management reports, such as Pareto Diagrams and Statistical Quality Control charts. (IQ) |
| Quality measure |
A metric or characteristic of information quality, such as percent of accuracy or average information float, to be assessed. |
| Quality standard |
A mandated or required quality goal, reliability level, or quality model to be met and maintained. |
| r |
Algebraic symbol representing the coefficient of correlation. |
| RAD |
Acronym for Rapid Application Development. The set of tools, techniques, and methods that results in at least one-order-of-magnitude acceleration in the time to develop an application with no loss in quality compared to using conventional techniques. |
| RADD |
Acronym for Rapid Data Development. An intensive group process to rapidly develop and define sharable subject area data models involving a facilitator, knowledge workers, and data resource management personnel, using compression planning techniques. |
| Random number generator |
A software routine that selects a number from a range of values in such a way that any number within the range has an equal likelihood of being selected. This may be used to identify which records from a database to select for assessment. |
| Random sampling |
The sampling of a population in which every item in the population has an equal probability of being selected. This is also called statistical sampling. See also Cluster sampling, Systematic sampling, and Stratified sampling. |
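A minimal sketch of random sampling, using Python's standard random number generator to choose which records to assess. The function name `select_records` and the record identifiers are hypothetical.

```python
import random

def select_records(record_ids, sample_size, seed=None):
    """Pick `sample_size` distinct records so that each is equally likely to be chosen."""
    rng = random.Random(seed)  # pass a seed to make an assessment repeatable
    return rng.sample(list(record_ids), sample_size)

# Example: choose 10 of 1,000 hypothetical record identifiers.
chosen = select_records(range(1, 1001), 10, seed=42)
```

Sampling without replacement, as `random.sample` does, avoids assessing the same record twice.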
| Real needs |
The fundamental requirements that motivate customer decisions. For example, a real need of a car customer is the kind of transportation it provides. See also Stated needs and Perceived needs. |
| Realized information value |
See Information value. |
| Reasonability tests |
Edit and validation rules applied to assure that a data value is within an expected range of values or is a realistic value. |
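One possible reasonability test is sketched below for a birth date. The 120-year ceiling, the function name `is_reasonable_birth_date`, and the choice of attribute are assumptions for illustration, not rules taken from the glossary.

```python
from datetime import date

def is_reasonable_birth_date(birth_date, as_of, max_age_years=120):
    """Return True if the date is not in the future and implies a plausible age."""
    if birth_date > as_of:
        return False  # a birth date after the assessment date is unrealistic
    # Age in whole years; subtract one if the birthday has not yet occurred this year.
    age = as_of.year - birth_date.year - (
        (as_of.month, as_of.day) < (birth_date.month, birth_date.day)
    )
    return age <= max_age_years
```

A value that passes such a test is only plausible; the test does not establish that the value is accurate.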
| Record |
A collection of related fields representing an occurrence of an entity type. |
| Record linkage |
The process of matching data records within a database or across multiple databases to identify data that represents one real world object or event. Used to identify potential duplicates for "de-duping" (eliminating duplicate occurrences) or consolidation of attributes about a single real world object. |
| Record of origin |
The first electronic file in which an occurrence of an entity type is created. |
| Record of reference |
The single, authoritative database file for a collection of fields for occurrences of an entity type. This file represents the most reliable source of operational data for these attributes or fields. In a fragmented data environment, a single occurrence may have different collections of fields whose record of reference is in different files. |
| Recovery |
Restoring a database to some previous condition or state after a system, device, or program failure. See also Commit. |
| Recovery log |
A collection of records that describe the events that occur during DBMS execution and their sequence. The information thus recorded is used for recovery in the event of a failure during DBMS execution. |
| Recursive relationship |
A relationship or association that exists between entity occurrences of the same type. For example, an organization can be related to another organization as a Department manages a Unit. |
| Reengineering |
A method for radical transformation of business processes to achieve breakthrough improvements in performance. |
| Reference data |
A term used to classify data that is, or should be, standardized, common to and shared by multiple application systems, such as Customer, Supplier, Product, Country, or Postal Code. Reference data tends to be data about permanent entity types and domain value sets to be stored in tables or files, as opposed to business event entity types. |
| Referential integrity |
Integrity constraints that govern the relationship of an occurrence of one entity type or file to one or more occurrences of another entity type or file, such as the relationship of a customer to the orders that customer may place. Referential integrity defines constraints for creating, updating, or deleting occurrences of either or both files. |
| Relationship |
The manner in which two entity or object types are associated with each other. Relationships may be one to one, one to many, or many to many, as determined by the meaning of the participating entities and by business rules. Synonymous with association. Relationships can express cardinality (the number of occurrences of one entity related to an occurrence of the second entity) and/or optionality (whether an occurrence of one entity is a requirement given an occurrence of the second entity). |
| Replication |
See Data replication. |
| Repository |
A database for storing information about objects of interest to the enterprise, especially those required in all phases of database and application development. A repository can contain all objects related to the building of systems including code, objects, pictures, definitions, etc. Acts as a basis for documentation and code generation specifications that will be used further in the systems development life cycle. Also referred to as design dictionary, encyclopedia, object-oriented dictionary, and knowledge base. |
| Requirements |
Customer expectations of a product or service. May be formal or informal, or they may be stated, required or perceived needs. |
| Return on Investment (ROI) |
A statement of the relative profitability generated as a result of a given investment. |
| Reverse engineering |
The process of taking a complete system or database and decomposing it to its source definitions, for the purpose of redesign. |
| ROI |
An acronym for Return on Investment. |
| Role type |
A classification of the different roles occurrences of an entity type may play, such as an organization may play the role of a customer, supplier, and/or competitor. |
| Rollback |
The process of restoring data in a database to the state at its last commit point. |
| Root cause |
The underlying cause of a problem or factor resulting in a problem, as opposed to its precipitating or immediate cause. |
| Rule |
A knowledge representation formalism containing knowledge about how to address a particular business problem. Simple rules are often stated in the form "If [condition] then [action]," where [condition] is a condition (a test or comparison) and [action] is an action (a conclusion or invocation of another rule). An example of a rule would be "If the temperature of any closed valve is greater than or equal to 100 degrees Fahrenheit, then open the valve." |
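The valve example can be expressed as a condition paired with an action. This is a sketch only: the `Rule` class, the dictionary layout of a valve, and the field names are assumptions, not a description of any particular rule engine.

```python
from dataclasses import dataclass
from typing import Callable

@dataclass
class Rule:
    condition: Callable[[dict], bool]  # a test or comparison on the facts
    action: Callable[[dict], None]     # what to do when the condition holds

    def fire(self, facts):
        """Apply the action if the condition is true; report whether it fired."""
        if self.condition(facts):
            self.action(facts)
            return True
        return False

# "If the temperature of any closed valve is >= 100 degrees Fahrenheit, then open the valve."
open_hot_valve = Rule(
    condition=lambda v: v["state"] == "closed" and v["temperature_f"] >= 100,
    action=lambda v: v.update(state="open"),
)

valve = {"state": "closed", "temperature_f": 120}
open_hot_valve.fire(valve)  # valve["state"] is now "open"
```

Keeping the condition and action as separate data lets rules be stored and evaluated independently of the program that applies them.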
| Sample |
An item or subset of items, or data about an item or a subset of items that comes from a sampling frame or a population. A sample is used for the purpose of acquiring knowledge about the entire population. |
| Sampling |
The technique of extracting a small number of items or data about those items from a larger population of items in order to analyze and draw conclusions about the whole population. See Random sampling, Cluster sampling, Stratified sampling, and Systematic sampling. |
| Sampling frame |
A subset of items, or data about a subset of items of a population from which a sample is to be taken. |
| SC21 |
Acronym for ISO/IEC JTC1 Sub-Committee for OSI data management and distributed transaction processing. |
| Schema |
The complete description of a physical database design in terms of its tables or files, columns or fields, primary keys, relationships or structure, and integrity constraints. |
| Scrap and rework |
The activities and costs required to correct or dispose of defective manufactured products. See Information scrap and rework. |
| SDLC |
Acronym for Systems Development Life Cycle. |
| Seamless integration |
True seamless integration is integration of applications through commonly defined and shared information, with managed replication of any redundant data. False "seamless" integration is the use of interface programs to transform data from one application's databases to another application's databases. See "Interfaceation." |
| Second Normal Form (2NF) |
(1) A relation R is in second normal form (2NF) if and only if it is in 1NF and every nonkey attribute is fully functionally dependent on the primary key. (2) A table is in 2NF if each nonidentified attribute provides a fact that describes the entity identified by the entire primary key and not part of it. See Functional dependence. |
| Security |
The prevention of unauthorized access to a database and/or its contents for updating, retrieving, or deleting the database; or the prevention of unauthorized access to applications that have authorized access to databases. |
| Semantic equivalence |
See Equivalence. |
| Sensor |
An instrument that can measure, capture information about or receive input directly from external objects or events. |
| Shewhart cycle |
See Plan-Do-Check/Study-Act (PDC/SA) cycle. |
| Side effect |
The state that occurs when a change to a process causes unanticipated conditions or results beyond the planned result, such as when an improvement to a process creates a new problem. |
| Sigma (Σ) |
Uppercase Greek letter that stands for the summation of a group of numbers. |
| sigma (σ or s) |
Lowercase Greek letter that stands for standard deviation. The symbol "σ" refers to the standard deviation of an entire population of items. The symbol "s" refers to the standard deviation of a sample of items. |
| Six Sigma |
Six standard deviations used to describe a level of quality in which six standard deviations of the population fall within the upper and lower control limits of quality and in which the defect rate approaches zero, allowing no more than 3.4 defects per million parts. |
| SME |
Acronym for Subject Matter Expert. |
| Source information producer |
The point of origination or creation of data or knowledge within the organization. |
| SPC |
Acronym for Statistical Process Control. |
| Special cause |
A source of unacceptable variation or defect that comes from outside the process or system. |
| Specialization |
The process of aggregating subsets of objects of a type, based upon differing attributes and behaviors. The resulting subtype inherits characteristics from the more generalized type. |
| Spread |
Describes how much variation there is in a set of items. |
| SQC |
Acronym for Statistical Quality Control. |
| Stability |
A characteristic of information quality measuring the degree to which information architecture or a database is able to have new applications developed to use it with minimal modification of the existing objects and relationships, only adding new objects and relationships. |
| Standard deviation (σ or s) |
A widely used measure of variability that expresses the measure of spread in a set of items. For normally distributed data, the standard deviation is a value such that approximately 68 percent of the items in a set fall within a range of the mean plus or minus the standard deviation. For data from a large sample of a population of items, the standard deviation σ (standard deviation of a population) or s (standard deviation of a sample) is expressed as:
$s = \sqrt{\dfrac{\sum d^2}{n - 1}}$, where s (σ) = the standard deviation of a sample (population), d = the deviation of any item from the mean or average, n = the number of items in the sample, and ∑ = "the sum of". |
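The sample formula above (the sum of squared deviations divided by n - 1, then the square root) can be sketched directly. The function name `sample_std_dev` and the data values are illustrative assumptions.

```python
import math

def sample_std_dev(values):
    """Standard deviation of a sample: sqrt(sum(d^2) / (n - 1))."""
    n = len(values)
    if n < 2:
        raise ValueError("at least two items are needed for a sample standard deviation")
    mean = sum(values) / n
    # d is the deviation of each item from the mean.
    sum_of_squares = sum((v - mean) ** 2 for v in values)
    return math.sqrt(sum_of_squares / (n - 1))

s = sample_std_dev([2, 4, 4, 4, 5, 5, 7, 9])  # mean is 5, sum of squares is 32
```

Python's standard library offers `statistics.stdev` for the same sample calculation; the explicit version is shown only to mirror the symbols in the formula.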
| Standard deviation calculation |
A measure of dispersion of a frequency distribution that is the square root of the arithmetic mean of the squares of the deviation of each of the class frequencies from the arithmetic mean of the frequency distribution. Also a similar quantity found by dividing by one less than the number of squares in the sum of squares instead of taking the arithmetic mean. |
| State |
A stage in a life cycle of an object class in which an entity occurrence or object may exist at a point in time. Transition to a state is triggered by an event. The state of an object is represented by the values of its attributes at a point in time and determines future behavior of the object. |
| State transition diagram |
A representation of the various states of an entity or object along with the triggering events. See also Entity life cycle. |
| Stated needs |
Requirements as seen from the customers' viewpoint, and as stated in their language. These needs may or may not be the real requirements. See also Real needs and Perceived needs. |
| Statistical control chart |
See Control chart. |
| Statistical Process Control (SPC) |
See Statistical quality control. |
| Statistical Quality Control (SQC) |
Processes and methods for measuring process performance, identifying unacceptable variance, and applying corrective actions to maintain acceptable process control. Also called statistical process control. |
| Stored procedure |
A precompiled routine of code stored as part of a database and callable by name. |
| Strategic information steward |
The role a senior manager holds as being accountable for a major information resource or subject area. The strategic information steward authorizes business information stewards and resolves business rule issues. |
| Stratified sampling |
Sampling a population that has two or more distinct groupings, or strata, in which random samples are taken from each stratum to assure the strata are proportionately represented in the final sample. |
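A sketch of proportional stratified sampling follows. The function name `stratified_sample`, the `stratum_of` callback, and the rule of taking at least one item per stratum are assumptions made for illustration.

```python
import random
from collections import defaultdict

def stratified_sample(items, stratum_of, fraction, seed=None):
    """Take the same fraction at random from every stratum."""
    rng = random.Random(seed)
    strata = defaultdict(list)
    for item in items:
        strata[stratum_of(item)].append(item)  # group items by stratum
    sample = []
    for members in strata.values():
        # Proportional allocation; keep at least one item per stratum (assumed rule).
        k = max(1, round(len(members) * fraction))
        sample.extend(rng.sample(members, k))
    return sample

# Hypothetical customer records tagged by region: 80 in "east", 20 in "west".
records = [("east", i) for i in range(80)] + [("west", i) for i in range(20)]
picked = stratified_sample(records, lambda r: r[0], 0.10, seed=0)
```

With a 10 percent fraction, the sketch draws 8 "east" and 2 "west" records, so the strata keep their 80/20 proportion in the sample.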
| Subject area |
See Business resource category. |
| Subject database |
A physical database built around a subject area. |
| Subject Matter Expert (SME) |
A business person who has significant experience and knowledge of a given business subject or function. |
| Suboptimization |
The phenomenon in which the accomplishment of departmental goals reduces the ability to accomplish the enterprise goals. |
| Subtype |
See Entity subtype. |
| Supertype |
See Entity supertype. |
| Synchronization |
The process of making data equivalent in two or more redundant databases. |
| Synchronous replication |
Replication in which all copies of data must be updated before the update transaction is considered complete. This requires two-phase commit. |
| Synonym |
A word, phrase, or data value that has the same or nearly the same meaning as another word, phrase or data value. |
| System log |
Audit trail of events occurring within a system (e.g., transactions requested, started, ended, accessed, inspected, and updated). |
| System of record |
See Record of reference. The term system of record is meaningless when defining the authoritative record in an integrated, shared data environment where data may be updated by many different application systems within a single database. |
| Systematic sampling |
Sampling of a population using a technique such as selecting every eleventh item, to ensure an even spread of representation in the sample. |
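Selecting every eleventh item reduces to a fixed-interval slice. In this sketch the function name `systematic_sample` and the `start` offset parameter are assumptions.

```python
def systematic_sample(items, interval, start=0):
    """Select every `interval`-th item, beginning at index `start`."""
    if interval < 1:
        raise ValueError("interval must be at least 1")
    return list(items)[start::interval]

# Every eleventh item of a hypothetical list numbered 1 through 33.
picked = systematic_sample(range(1, 34), 11, start=10)
```

In practice the starting offset is often chosen at random within the first interval, so that the selection does not always begin at the same item.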
| Systems approach |
The philosophy of developing applications as vertical functional projects independent of how they fit within the larger business value chain. This approach carves out an application development project into a standalone project and does not attempt to define data to be shared across the business value chain or to meet all information stakeholder needs. |
| Systems Development Life Cycle (SDLC) |
The methodology of processes for developing new application systems. The phases change from methodology to methodology, but generically break down into the phases of requirements definition, analysis, design, testing, implementation, and maintenance. If data definition quality is lacking, this process requires improvement. |
| Systems thinking |
The fifth discipline of the learning organization, this sees problems in the context of the whole. Applications developed with systems thinking see the application scope within the context of its value chain and the enterprise as a whole, defining data as a sharable and reusable resource. |
| Taguchi Loss Function |
The principle (for which Dr. Genichi Taguchi won the Japanese Deming Prize in 1960) that deviations from the ideal cause different degrees of loss in quality and economic loss. Small deviations in some critical characteristics can cause significantly more economic loss than even large deviations in other characteristics. Some data quality problems are likewise critical and cause significantly more economic loss than others, and become the higher priority for process improvement and cleanup. |
| Teamwork |
The cooperation of many within different processes or business areas to increase the quality or output of the whole. |
| Technical information resource data |
The set of information resource data that must be known to information systems and information resource management personnel in order to develop applications and databases. |
| Third normal form |
See Normal form. |
| Third Normal Form (3NF) |
(1) A relation R is in third normal form (3NF) if and only if it is in 2NF and every nonkey attribute is nontransitively dependent upon the primary key. (2) A table is in 3NF if each nonkey column provides a fact that is dependent only on the entire key of the table. |
| Timeliness |
A characteristic of information quality measuring the degree to which data is available when knowledge workers or processes require it. |
| Total Quality Management (TQM) |
Techniques, methods, and management principles that provide for continuous improvement to the processes of an enterprise. |
| TPCP |
Acronym for Two Phase Commit Protocol. |
| TQM |
Acronym for Total Quality Management. |
| Transaction consistency |
The highest isolation level that allows an application to read only committed data and guarantees that the transaction has a consistent view of the database, as though no other transactions were active. All read locks are kept until the transaction ends. Also known as serializable. |
| Transformation |
See Data transformation. |
| Trigger |
A software device that monitors the values of one or more data elements to detect critical events. A trigger consists of three components: a procedure to check data whenever it changes, a set or range of criterion values or code to determine data integrity or whether a response is called for, and one or more procedures that produce the appropriate response. |
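The three components of a trigger can be sketched in plain Python. This is only an illustration of the definition above, not the trigger syntax of any particular DBMS; the `Trigger` and `MonitoredRecord` classes and the low-stock rule are hypothetical.

```python
class Trigger:
    def __init__(self, criterion, response):
        self.criterion = criterion  # component 2: decides whether a response is called for
        self.response = response    # component 3: produces the response

    def on_change(self, name, old, new):
        # Component 1: the check that runs whenever the monitored data changes.
        if self.criterion(old, new):
            self.response(name, old, new)
            return True
        return False


class MonitoredRecord:
    """A record that notifies its trigger on every change to a value."""

    def __init__(self, trigger, **values):
        self._trigger = trigger
        self._values = dict(values)

    def set(self, name, value):
        old = self._values.get(name)
        self._values[name] = value
        self._trigger.on_change(name, old, value)


alerts = []
low_stock = Trigger(
    criterion=lambda old, new: new < 10,
    response=lambda name, old, new: alerts.append(f"{name} fell from {old} to {new}"),
)

item = MonitoredRecord(low_stock, quantity=25)
item.set("quantity", 12)  # criterion not met, no response
item.set("quantity", 7)   # criterion met, response recorded
```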
| Trusted database |
Data that has been secured and protected from unauthorized access. |
| Two-phase commit |
In multithreaded processing systems it is necessary to prevent more than one transaction from updating the same record at the same time. Where each transaction may need to update more than one record or file, the two-phase commit protocol is often used. Each transaction first checks that all the necessary records are available and contain the required data, simultaneously locking each one. Once it is confirmed that all records are ready and locked, the updates are applied and the locks freed. If any record is not available, the whole transaction is aborted and all other records are unlocked and left in their original state. |
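The check-and-lock phase followed by the apply phase described above might look like the following minimal sketch. It follows this glossary's description only; in distributed databases the term usually refers to a coordinator that collects prepare votes from participants, which is not modeled here. The `Record` class, `two_phase_update`, and the account example are hypothetical names.

```python
import threading


class Record:
    def __init__(self, value):
        self.value = value
        self.lock = threading.Lock()


def two_phase_update(updates):
    """updates: list of (record, is_ready, change) tuples.

    Phase 1 locks and checks every record; phase 2 applies every change.
    Returns False, leaving all values untouched, if any record is
    unavailable or fails its check.
    """
    locked = []
    try:
        # Phase 1: lock each record without waiting and confirm it is ready.
        for record, is_ready, _ in updates:
            if not record.lock.acquire(blocking=False):
                return False
            locked.append(record)
            if not is_ready(record.value):
                return False
        # Phase 2: all records are locked and ready, so apply the updates.
        for record, _, change in updates:
            record.value = change(record.value)
        return True
    finally:
        # Free the locks whether the transaction was applied or aborted.
        for record in locked:
            record.lock.release()


checking = Record(100)
savings = Record(50)


def transfer(amount):
    return two_phase_update([
        (checking, lambda v: v >= amount, lambda v: v - amount),
        (savings, lambda v: True, lambda v: v + amount),
    ])


first = transfer(30)    # both records ready: updates are applied
second = transfer(500)  # insufficient balance: whole transaction aborted
```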
| Two-stage sampling |
Sampling a population in two steps. The first step extracts sample items from a lot of common groupings of items such as sales orders by order taker. The second stage takes a second sample from the items in the primary or first stage samples. |
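A small sketch of the idea, reading the first stage as choosing some of the groupings and the second stage as sampling items within each chosen grouping. That reading, the `two_stage_sample` function, and the order numbers are assumptions made for illustration.

```python
import random


def two_stage_sample(groups, n_groups, n_items, seed=None):
    """Pick n_groups groupings, then up to n_items items from each one."""
    rng = random.Random(seed)
    # Stage 1: sample the primary units (for example, order takers).
    chosen = rng.sample(sorted(groups), n_groups)
    # Stage 2: sample items within each chosen primary unit.
    return {
        g: rng.sample(groups[g], min(n_items, len(groups[g])))
        for g in chosen
    }


# Hypothetical sales order numbers grouped by order taker.
orders_by_taker = {
    "ana": list(range(100, 110)),
    "ben": list(range(200, 210)),
    "cy": list(range(300, 310)),
    "dee": list(range(400, 410)),
}

sample = two_stage_sample(orders_by_taker, n_groups=2, n_items=3, seed=1)
```

Fixing `seed` makes the draw repeatable, which helps when a sample must be audited later.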
| Uncommitted read |
The lowest isolation level, which allows an application to read both committed and uncommitted data. It should be used only when an exact answer is not needed, or when one is highly assured that the data is not being updated by someone else. Also known as read uncommitted, read through, or dirty read. |
| Undo |
A state of a unit of recovery that indicates that the unit of recovery's changes to recoverable database resources must be backed out. |
| Unit of recovery |
A sequence of operations within a unit of work between points of consistency. |
| Unit of work |
A self-contained set of instructions performing a logical outcome in which all changes are performed successfully or none of them is performed. |
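The all-or-nothing behavior can be illustrated with Python's built-in `sqlite3` module, where using the connection as a context manager commits on success and rolls back when an exception is raised. The `account` table and the `transfer` function are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE account ("
    "name TEXT PRIMARY KEY, "
    "balance INTEGER CHECK (balance >= 0))"
)
conn.executemany("INSERT INTO account VALUES (?, ?)", [("a", 100), ("b", 50)])
conn.commit()


def transfer(conn, src, dst, amount):
    """Move amount from src to dst as a single unit of work."""
    try:
        with conn:  # commit if both updates succeed, roll back otherwise
            conn.execute(
                "UPDATE account SET balance = balance + ? WHERE name = ?",
                (amount, dst),
            )
            conn.execute(
                "UPDATE account SET balance = balance - ? WHERE name = ?",
                (amount, src),
            )
        return True
    except sqlite3.IntegrityError:
        # The CHECK constraint failed, so the credit to dst was rolled back too.
        return False


ok = transfer(conn, "a", "b", 30)
failed = transfer(conn, "a", "b", 500)
balances = dict(conn.execute("SELECT name, balance FROM account"))
```

In the failed transfer the credit is executed first and succeeds on its own, yet it does not survive, because the debit in the same unit of work fails.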
| Universe |
See Population. |
| Update |
Changing the values of one or more selected occurrences, groups, or data elements stored in a database. May include the notion of adding or deleting data occurrences. |
| Upper control limit |
The highest acceptable value or characteristic in a set of items deemed to be of acceptable quality. Together with the lower control limit, it specifies the boundaries of acceptable variability in an item to meet quality specifications. |
| User |
An unfortunate term used to refer to the role of people in relation to information technology, computer systems, or data. The term implies dependence on something, or one who has no choice, or one who is not actively involved in the use of something. The term is inappropriate to describe the role of information producers and knowledge workers who perform the work of the enterprise, employing information technology, applications, and data in the process. The role of business personnel in relation to information technology, applications, and data is one of information producer and knowledge worker. The relationship of business personnel to information systems personnel is not as users, but as partners. If Industrial-Age personnel were [machine] "operators" or "workers," then Information-Age personnel are "knowledge workers." |
| Utility |
The usefulness of information to its intended consumers, including the public. (OMB 515) |
| Validity |
A characteristic of information quality measuring the degree to which the data conforms to defined business rules. Validity is not synonymous with accuracy, which means the values are the correct values. A value may be a valid value, but still be incorrect. For example, a customer date of first service can be a valid date (within the correct range) and yet not be an accurate date. |
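The difference between a valid value and an accurate value can be shown with the date-of-first-service example. The function names, the allowed date range, and the dates are made up for illustration; in practice the real-world value has to come from an authoritative source or from verification with the customer.

```python
from datetime import date


def is_valid_first_service_date(value, earliest, latest):
    """Validity: the value conforms to the business rule (a date within range)."""
    return isinstance(value, date) and earliest <= value <= latest


def is_accurate(value, real_world_value):
    """Accuracy: the value matches what actually happened."""
    return value == real_world_value


recorded = date(2019, 3, 14)  # value stored in the database
actual = date(2019, 4, 14)    # hypothetical true date of first service

valid = is_valid_first_service_date(recorded, date(2000, 1, 1), date(2024, 12, 31))
accurate = is_accurate(recorded, actual)
```

Here `valid` is `True` while `accurate` is `False`: the stored date passes the business rule but is still the wrong date.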
| Value |
(1) Relative worth, utility, or importance. (2) An abstraction with a single attribute or characteristic that can be compared with other values, and may be represented by an encoding of the value. |
| Value chain |
An end-to-end set of activities that begins with a request from a customer and ends with specific benefits for a customer, either internal or external. Also called a business process or value stream. See Information value chain and Business value chain. |
| Value completeness |
See Completeness. |
| Value stream |
See Value chain. |
| Value-centric development |
A method of application development that focuses on data as an enterprise resource and automates activities as a part of an integrated business value chain. Value-centric development incorporates "systems thinking," which sees an application as a component within its larger value chain, as opposed to a "systems approach," which isolates the application as a part of a functional or departmental view of activity and data. |
| Variance (vσ) |
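The mean of the squared deviations of a set of values from their mean, computed with n − 1 in the denominator and expressed as: v = Σd² ∕ (n − 1), where d is each value's deviation from the mean and n is the number of values. |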
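A short worked example of the formula, using a made-up set of eight values. The n − 1 divisor is the one used by Python's `statistics.variance`, so the two results should agree.

```python
import statistics

values = [2, 4, 4, 4, 5, 5, 7, 9]

mean = sum(values) / len(values)                     # 40 / 8 = 5.0
squared_deviations = [(x - mean) ** 2 for x in values]  # 9, 1, 1, 1, 0, 0, 4, 16
v = sum(squared_deviations) / (len(values) - 1)      # 32 / 7

# Cross-check against the standard library's sample variance.
library_v = statistics.variance(values)
```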
| View |
A presentation of data from one or more tables. A view can include all or some of the columns contained in the table or tables on which it is defined. See also Information view. |
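As an illustration, the sketch below defines a view over two tables with Python's built-in `sqlite3` module. The table names, column names, and rows are hypothetical; the view exposes only some of the columns of the underlying tables.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE customer (id INTEGER PRIMARY KEY, name TEXT, city TEXT);
    CREATE TABLE sales_order (id INTEGER PRIMARY KEY, customer_id INTEGER, total REAL);

    INSERT INTO customer VALUES (1, 'Acme', 'Oslo'), (2, 'Bolt', 'Lima');
    INSERT INTO sales_order VALUES (10, 1, 250.0), (11, 1, 99.5), (12, 2, 40.0);

    -- The view presents selected columns from both tables as one result.
    CREATE VIEW order_summary AS
        SELECT c.name AS customer_name, o.id AS order_id, o.total
        FROM sales_order AS o
        JOIN customer AS c ON c.id = o.customer_id;
    """
)

# A view is queried like a table, but it stores no data of its own.
rows = conn.execute(
    "SELECT customer_name, order_id, total FROM order_summary ORDER BY order_id"
).fetchall()
```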
| Visual management |
The quality management technique of providing instruction and information about a task in a clear and visible way so that personnel can maximize their productivity. |
| Voice of the Customer |
Documentation of customers' wants and needs for a product or service, including customer verbatims (the actual words used) and those words restated as specific implications for the product or service. |
| Voice of the Engineer |
Documentation of the specification required to meet a quality requirement for a product or service as made by the engineer of a product or designer of a service. |
| Voice of the Process |
Statistical data produced by a process that indicates the stability or capability of the process and provides feedback to process performers as a tool for continual improvement. |
| Wisdom |
Knowledge in context. Knowledge applied in the course of actions. |
| World Class |
The level of process performance that is as good as, or better than, the best competitors in the performance of a process type or in the quality of a product type. |
| x |
The algebraic symbol representing a set of values. |
| X̄ (x bar) |
The algebraic symbol representing the mean, or average, of a set of values. |
| Xσn (X Sigma n) |
Formula to find the standard deviation(s) of the X values. Sometimes written as σn. |
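A brief numeric sketch, assuming σn denotes the standard deviation computed with n (not n − 1) in the denominator, as the label is used on many scientific calculators. Under that assumption it corresponds to `statistics.pstdev` in Python; the values are made up.

```python
import math
import statistics

values = [2, 4, 4, 4, 5, 5, 7, 9]

mean = statistics.mean(values)
# Square root of the mean squared deviation, dividing by n.
sigma_n = math.sqrt(sum((x - mean) ** 2 for x in values) / len(values))

# Cross-check against the standard library's population standard deviation.
library_sigma_n = statistics.pstdev(values)
```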
| Zero defects |
A state of quality characterized by defect-free products or Six-Sigma level quality. |