Data Warehouse Design, Dissertation Example

Literature review

Introduction

To begin a systematic analysis of data warehousing technology and to arrive at a best-practices model, it is first necessary to start with an established definition of data warehousing and to derive an operational definition for this body of work. Informatica (2019) defines data warehousing as a technology used to compile structured data from one or more sources with the aim of creating business intelligence that serves as the basis for strategic business decisions. A data warehouse can be distinguished from a standard database in that it depicts data over time and has the inherent capability to offer unique insights into a business.

Glowtouch Technologies (2016) maintains that data warehousing offers many advantages. These include increased organization-specific business intelligence, faster data retrieval, timely access to data, improved data quality as a result of the inherent redundancies, and a higher return on investment by virtue of the business intelligence that data warehousing makes available. For data warehousing to be effective, a comprehensive data warehousing strategy is needed. This strategy begins with identifying the data and its importance, and continues with identifying how the data will be stored, determining how it will be packaged to facilitate optimal retrieval and processing, determining how it will be processed, and finally establishing the rules that govern its use (SAS, n.d.).

Business Intelligence (BI) systems are present in a wide variety of industries. They serve as the driving force behind data-driven solutions in environmental modeling systems (Eskelinen, Räsänen, Santti, Happonen, & Kajanus, 2017), virtual reality gaming systems (Suen, Wu, & Lin, 2018), product distribution management systems (Kesner & Russell, 2009), and military operations (Roberts & Koumpis, 2004). BI systems enable decision-makers to make vital business decisions promptly by drawing on vast amounts of data that may be located in different places and stored in different formats. BI systems use Key Performance Indicators (KPIs) in their decision-making interfaces. The KPIs measure the level of success an organization has achieved in meeting its goals and are used to drive performance improvements. For BI systems to function effectively and efficiently, data repositories must be managed in the background. These repositories store data in data warehouses (DW). The underlying technologies for storing information and capturing KPIs are often unpredictable, which leads to failures in relating the KPIs to business goals. As a result, many BI systems fail. With this in mind, there is a need to design a framework that aligns the KPIs expressed in the business goals with DW technology, so that end users receive information that accurately indicates the level of adherence to business goals and can be used to make data-driven decisions.

Data warehouse

A data warehouse is also described as a time-variant, subject-oriented, non-volatile, and integrated collection of data in support of management's decision-making process. A data warehouse contains the historical business data of a company. This historical data supports the analysis behind business decisions at the strategic level of a particular organizational unit. In addition, by integrating operational databases, it provides an environment in which data can be used strategically. The technologies involved include multidimensional databases (MDDB) and relational database management systems, client/server architecture, metadata modeling and repositories, and graphical user interfaces, among others.

The emergence of cross-disciplinary domains such as financial management, e-commerce, and health informatics has contributed to the enormous amount of data that requires analysis. As data warehousing has evolved, it has come to supply data sets with many dimensions that can be used to solve a variety of problems. A data warehouse model that suits the decision-making needs for a given data set is therefore necessary. The major proponents of data warehousing are Ralph Kimball (Kimball, 1996) and William Inmon (Inmon, 1999). However, they hold different opinions on data warehouse design. Inmon builds a dependent data mart structure from an enterprise-wide data warehouse, whereas Kimball views the enterprise data warehouse as a bus structure of data marts. The differences between the Kimball and Inmon data warehouse structures are outlined below.

In either case, a data warehouse is treated as read-only, since end users are not permitted to change data elements or values. The design strategy presented by Inmon nevertheless differs from the one presented by Kimball. Inmon's model distributes data into separate dependent data marts that act as an interface between the end users and the data warehouse. In contrast, Kimball views the data warehouse as a union of data marts: the data warehouse is the collection of data marts joined to produce a central repository. The differences between the design styles adopted by Kimball and Inmon are illustrated next.

Despite their differences, Kimball and Inmon agree on what a successful data warehouse implementation requires. In both views, success depends on an effective collection of operational data and on the validation of the data marts. Database staging and ETL processes are necessary components of both researchers' data warehouse designs. Both also hold that a dependent data warehouse design is vital for meeting the requirements of enterprise end users in terms of data relevance, accuracy, and timeliness.

The architecture of data warehouse

Data warehouse architecture is a broad research area that can be viewed from many perspectives. Devlin and Cote (1996) and Sahama and Croll (2007) describe important techniques for analyzing and viewing data warehouse architecture. According to Devlin, the success of a data warehouse system depends on the database staging process, which integrates data derived from different online transaction processing (OLTP) systems. The ETL process plays a fundamental role in keeping database staging operational. Moreover, a survey of the factors that influence the selection of a data warehouse architecture (Devlin, 1996) recognizes five common data warehouse architectures, which are described in the following subsections.

Independent Data Marts

An independent data mart can be regarded as a small-scale or localized data warehouse. This type of data mart is mainly used by the departments or divisions of a company to provide their own operational databases. It is simple, yet it may take different forms derived from various design structures and from database designs that are not necessarily consistent with one another. This complicates analysis across data marts. Because every organizational unit tends to build its own database, which then operates as an independent data mart, this architecture is best used as an ad hoc data warehouse or as a prototype in preparation for building an actual data warehouse.

Data Mart Bus Architecture

Roberts (2004) describes a data warehouse architecture and design formed by combining data marts, known as the bus architecture or virtual data warehouse. This type permits the data marts to be located on a single server or on different servers. The data warehouse can therefore function in a virtual mode in which all of the data marts are processed as a single data warehouse.

Hub-and-spoke architecture

The hub-and-spoke architecture was developed by Kesner (2009). The hub is the central server responsible for information exchange and transformation, and the spokes handle data for the operations of all local data stores. The main focus of hub-and-spoke is on building a maintainable and scalable infrastructure for the data warehouse.

Centralized Data Warehouse Architecture

The centralized data warehouse architecture is built on the hub-and-spoke architecture but without the dependent data mart component. Its role is to copy and store heterogeneous operational and external data in a single, consistent data warehouse. This architecture has a single data model that is intended to be complete and consistent across every data source. According to Kesner (2009), a centralized data warehouse should contain a staging database or operational data store that serves as an intermediate stage for integrating operational data before it is transformed into the data warehouse.

Federated Architecture

Eskelinen (2017) indicated that a federated data warehouse is an integration of many different data marts, staging databases or operational data stores, and a combination of analytical and reporting systems. The federated concept concentrates on an integrated framework with the aim of making the data warehouse more reliable. Kesner (2009), on the other hand, argued that a federated data warehouse is a practical approach, since it concentrates on dependability and delivers outstanding value.

It is important in this study to identify the type of data warehouse architecture that is robust and scalable for building and deploying a comprehensive enterprise system. The selected architecture must incorporate the characteristics of successful data warehouse models (Devlin, 1996). As shown by Devlin and Cote (1996) and Sahama and Croll (2007), two data warehouse architectures are particularly popular. The first is hub-and-spoke, described by Devlin as a data warehouse with dependent data marts. The second is the data mart bus architecture, described by Sahama as a set of dimensional data marts. The model proposed here uses the hub-and-spoke data warehouse architecture, which can be used together with MDDB modeling.

Multidimensional model

Data warehouses, which act as a component of business intelligence systems, supply the data required to measure key performance indicators in an organization. The warehouses therefore need to be designed in a manner that meets these corporate requirements. Many scholars regard the multidimensional model as the most suitable concept for data warehouses. The multidimensional model is therefore analyzed here to determine its viability and applicability in the context of data warehouse modeling. Its design follows several steps: requirement specification, conceptual design, logical design, and physical design. Conceptual design is seldom applied to the multidimensional model because no conceptual model is widely accepted. As a result, the logical model is widely used for modeling purposes, although it leads to schemas that an ordinary user may find difficult to understand. The logical model represents the multidimensional model as relational tables arranged in particular structures such as star schemas and snowflake schemas (Devlin and Cote, 1996). These schemas show the relationships between the fact tables and the various dimension tables. Star schemas use denormalized dimension tables, whereas snowflake schemas use normalized tables for their dimensions and hierarchies. On top of this relational representation, an OLAP server provides a data cube, which in turn presents a multidimensional view of the data warehouse.
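The star schema described above can be illustrated with a minimal sketch. The table names (fact_sales, dim_product, dim_date), the columns, and the sample rows are all hypothetical and chosen only for illustration; the example uses an in-memory SQLite database from the Python standard library rather than an OLAP server, so the grouped query is only a rough stand-in for one slice of a data cube.

```python
import sqlite3

# Hypothetical star schema: one fact table referencing two dimension tables.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE dim_product (
    product_id   INTEGER PRIMARY KEY,
    product_name TEXT,
    category     TEXT      -- stored inline (denormalized), as in a star schema
);
CREATE TABLE dim_date (
    date_id INTEGER PRIMARY KEY,
    year    INTEGER,
    quarter INTEGER
);
CREATE TABLE fact_sales (
    product_id INTEGER REFERENCES dim_product(product_id),
    date_id    INTEGER REFERENCES dim_date(date_id),
    amount     REAL
);
""")

# Illustrative sample data (assumed, not taken from any real system).
conn.executemany("INSERT INTO dim_product VALUES (?, ?, ?)",
                 [(1, "Jacket", "Outerwear"), (2, "Gloves", "Accessories")])
conn.executemany("INSERT INTO dim_date VALUES (?, ?, ?)",
                 [(1, 2018, 1), (2, 2018, 2)])
conn.executemany("INSERT INTO fact_sales VALUES (?, ?, ?)",
                 [(1, 1, 100.0), (1, 2, 150.0), (2, 1, 20.0), (2, 2, 30.0)])


def sales_by_category_and_quarter(connection):
    """Aggregate the fact table along two dimensions (category x quarter)."""
    return connection.execute("""
        SELECT p.category, d.quarter, SUM(f.amount)
        FROM fact_sales AS f
        JOIN dim_product AS p ON p.product_id = f.product_id
        JOIN dim_date    AS d ON d.date_id    = f.date_id
        GROUP BY p.category, d.quarter
        ORDER BY p.category, d.quarter
    """).fetchall()


cube = sales_by_category_and_quarter(conn)
for category, quarter, total in cube:
    print(category, quarter, total)
```

In a snowflake variant of the same sketch, the category column would move out of dim_product into its own normalized table, at the cost of one more join per query.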

Additionally, the physical design of the multidimensional model is concerned with matters of implementation. Given the typical size of a data warehouse, physical design is essential to ensuring adequate response times for the complex and ad hoc queries that must be supported. Several techniques are used to enhance the performance of the system, including data partitioning, materialized views, and indexing. Notably, bitmap indexes are commonly used in data warehousing, whereas B-trees are used in operational databases (Sahama and Croll, 2007). One difference between operational databases and data warehouses is that data warehouses usually sit at the end of the data flow, and this distinction provides relevant guidance when constructing them. In a data warehouse, data is usually collected from various sources; once received, it is transformed to fit the model of the warehouse and then stored. This process is referred to as ETL (extraction, transformation, and loading), and it feeds the data warehouse model built with the multidimensional design.
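To show why bitmap indexes suit the ad hoc, multi-condition queries typical of a data warehouse, the following is a minimal sketch rather than how any particular database engine implements them. The column names (region, channel) and values are hypothetical. Each distinct value in a column gets one bitmap, here a Python integer whose bit i is set when row i holds that value, so a query that combines conditions reduces to a bitwise AND.

```python
from collections import defaultdict


def build_bitmap_index(values):
    """Map each distinct value to an integer whose bit i is set if row i holds it."""
    index = defaultdict(int)
    for row_number, value in enumerate(values):
        index[value] |= 1 << row_number
    return dict(index)


def rows_from_bitmap(bitmap):
    """Decode a bitmap back into an ascending list of row numbers."""
    rows = []
    row_number = 0
    while bitmap:
        if bitmap & 1:
            rows.append(row_number)
        bitmap >>= 1
        row_number += 1
    return rows


# Two hypothetical low-cardinality dimension columns of a five-row fact table.
region = ["north", "south", "north", "east", "south"]
channel = ["web", "web", "store", "web", "store"]

region_index = build_bitmap_index(region)
channel_index = build_bitmap_index(channel)

# Rows where region = 'north' AND channel = 'web': a single bitwise AND.
matching = rows_from_bitmap(region_index["north"] & channel_index["web"])
print(matching)
```

Bitmaps stay compact when a column has few distinct values, which is common for dimension attributes; for high-cardinality, frequently updated columns a B-tree is usually the better fit.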

ETL is an essential component of a data warehouse that helps ensure that data obtained from several OLTP systems is clean. Because the warehouse collects data from various systems, sorting the data and storing it in a central location is difficult, and the ETL process is essential in assisting with this (Kesner & Russell, 2009). The extraction step involves extensive scrubbing and cleansing to validate all of the data, the transformation step applies multiple transformations so that the data meets the warehouse's standards and requirements, and the final step, loading, stores the transformed data. The transformation step also protects the integrity of the data, which makes the data stored in warehouses that use this model reliable for users. ETL also helps with the export and import of complex operational data between object linking and embedding architectures, where the data is transformed so that only validated data is stored in the warehouse. Data stored in the form of a star schema consists of fact tables and dimensions. The ETL process therefore makes the multidimensional model a practical design for data warehouses, since it supports the reliable measurement of key performance indicators.
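The three ETL steps can be sketched as a small pipeline. This is an illustrative outline under stated assumptions, not a production design: the source names (orders_db, web_shop), the field names (customer, amount), and the validation rules are hypothetical, and for simplicity the cleansing is placed in the transform function even though the text above associates scrubbing with extraction.

```python
def extract(sources):
    """Pull raw rows from each source system, tagging each with its origin."""
    for source_name, rows in sources.items():
        for row in rows:
            yield {**row, "source": source_name}


def transform(rows):
    """Standardize rows to the warehouse format; skip rows that fail validation."""
    for row in rows:
        customer = (row.get("customer") or "").strip().title()
        try:
            amount = round(float(row.get("amount")), 2)
        except (TypeError, ValueError):
            continue  # amount is missing or not numeric
        if not customer or amount < 0:
            continue  # reject anonymous or negative-valued rows
        yield {"customer": customer, "amount": amount, "source": row["source"]}


def load(rows, warehouse):
    """Append validated rows to the warehouse table and return its new size."""
    warehouse.extend(rows)
    return len(warehouse)


# Hypothetical OLTP sources with inconsistent formatting and bad records.
sources = {
    "orders_db": [
        {"customer": "  alice smith ", "amount": "19.999"},
        {"customer": "bob", "amount": "n/a"},
    ],
    "web_shop": [
        {"customer": "", "amount": "5"},
        {"customer": "carol", "amount": 7},
    ],
}

warehouse = []
loaded = load(transform(extract(sources)), warehouse)
print(loaded, warehouse)
```

Writing each step as a generator keeps the stages independent, so a staging database could be inserted between extract and transform without changing the other two functions.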

However, according to Devlin (1996), although ETL is a significant component in ensuring the integrity of data, scalability and complexity must also be considered, since they play a critical role in deciding which types of data are stored in the warehouse under a given architecture. The best way to obtain a simple and scalable solution is to adopt the hub-and-spoke architecture for the ETL process. This architecture helps the ETL process operate flexibly and efficiently. A centralized design for the data warehouse also helps maintain complete access control over the ETL process. In the hub-and-spoke architecture, the hub is the data warehouse, populated once data from the operational databases has been processed through a staging database, while the spokes represent the data marts used to distribute the data. The architecture uses a one-to-many interface to transport data from the data warehouse to the distribution marts. The one-to-many interface is preferred because it costs less, keeps dimensions consistent, and is simple to implement, whereas a many-to-many interface is costly and complicated and is therefore hard to sustain in the long run.
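A small sketch can make the one-to-many interface concrete. The Hub class, the mart names, and the row fields below are hypothetical and serve only to illustrate the idea. The interface_counts helper rests on a simple counting assumption: with a hub, each source and each mart needs one interface to the hub, whereas a many-to-many design needs one interface per source and mart pair.

```python
class Hub:
    """Central warehouse (the hub) that feeds dependent data marts (the spokes)."""

    def __init__(self):
        self.rows = []
        self.marts = {}

    def register_mart(self, name, predicate):
        """Attach a spoke; the predicate selects which rows the mart receives."""
        self.marts[name] = {"predicate": predicate, "rows": []}

    def publish(self, new_rows):
        """Store rows in the hub, then push them out over the one-to-many interface."""
        new_rows = list(new_rows)
        self.rows.extend(new_rows)
        for mart in self.marts.values():
            mart["rows"].extend(r for r in new_rows if mart["predicate"](r))


def interface_counts(num_sources, num_marts):
    """Compare the number of interfaces to build and maintain in each design."""
    return {
        "hub_and_spoke": num_sources + num_marts,
        "many_to_many": num_sources * num_marts,
    }


hub = Hub()
hub.register_mart("sales", lambda row: row["dept"] == "sales")
hub.register_mart("finance", lambda row: row["dept"] == "finance")
hub.publish([
    {"id": 1, "dept": "sales"},
    {"id": 2, "dept": "finance"},
    {"id": 3, "dept": "sales"},
])

print([row["id"] for row in hub.marts["sales"]["rows"]])
print(interface_counts(num_sources=4, num_marts=5))
```

Because every mart is filled from the same hub rows, the dimensions stay consistent across marts, and the interface count grows additively rather than multiplicatively as sources and marts are added.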

Moreover, both success factors and failure factors affect how well a data warehouse supports the key performance indicators of an organization. The success of a data warehouse depends on implementing a correct model that supports the core functions of the warehouse. Success is determined by the factors put in place to support the chosen model, while failure emanates from weaknesses and threats facing the warehouse for which no preventive measures are taken. The success factors include the support of top management and the acquisition and implementation of quality data sources that are well defined and thorough enough to meet business needs and fully satisfy the demands on the data warehouse (Kesner, 2009). Adequate knowledge of climatic factors is also important, since it enables further climatic impacts to be prevented. The failure factors, on the other hand, include the negative influence of political, economic, and climatic factors. Inadequate funding for implementing the required data warehouse model is another failure factor. It is therefore essential that a data warehouse implements the model best suited to achieving the organizational objectives. The multidimensional model is viewed as the best data warehouse model, since it meets most of the warehouse requirements.

References

Bevanda, V. (2018). Decision Engineering: Settling A Lean Decision Modeling Approach. Varazdin: Varazdin Development and Entrepreneurship Agency (VADA).

Devlin, B., & Cote, L. D. (1996). Data warehouse: from architecture to implementation. Addison-Wesley Longman Publishing Co., Inc.

Eskelinen, T., Räsänen, T., Santti, U., Happonen, A., & Kajanus, M. (2017). Designing a Business Model for Environmental Monitoring Services using Fast MCDS Innovation Support Tools. Technology Innovation Management Review, 7(11), 36-46.

Kesner, R. M., & Russell, B. (2009). Enabling business processes through information management and IT systems: The FastFit and winter gear distributors case studies. Journal of Information Systems Education, 20(4), 401-405.

Roberts, B., & Koumpis, A. (2004). A framework for situation room analysis and exploration of its application potential in the information technologies market. Management Decision, 42(7), 882-891.

Sahama, T. R., & Croll, P. R. (2007, January). A data warehouse architecture for clinical data warehousing. In Proceedings of the fifth Australasian Symposium on ACSW frontiers-Volume 68 (pp. 227-232). Australian Computer Society, Inc.

Suen, J., Wu, T., & Lin, K. (2018). A System Framework Design for Virtual Reality Game Using Gameplay Big Data Technology. International Journal of Organizational Innovation (Online), 11(2), 230-239.