INTRODUCTION TO DATA WAREHOUSING
A data warehouse is a repository of an organization’s electronically stored data. Data warehouses are designed to facilitate reporting and analysis.
A data warehouse is a powerful database model that significantly enhances the user’s ability to quickly analyze large, multidimensional data sets.
It cleanses and organizes data to allow users to make business decisions based on facts.
Hence, the data in a data warehouse must have strong analytical characteristics. To be suitable for analysis, data must be subject-oriented, integrated, time-referenced, and non-volatile.
SUBJECT-ORIENTED DATA
A subject-oriented data warehouse has a defined scope and stores only data within that scope. For example, if your company's sales team builds a data warehouse, that warehouse is by definition required to contain data related to sales.
Data warehouses group data by subject rather than by activity. Transactional systems, in contrast, are organized around activities such as payroll processing, shipping products, and loan processing.
Data organized around activities cannot easily answer questions such as “How many salaried employees have a tax deduction of ‘X’ across all branches of the company?” Answering this request would require heavy searching and aggregation of the employee and account records of every branch.
Imagine the query response time for a company with branches all over the country and a workforce of 20,000!
In a data warehouse environment, the information used for analysis is organized around subjects such as employees, accounts, sales, and products. This subject-specific design reduces query response time because far fewer records need to be searched to answer the user's question.
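To make the contrast concrete, here is a minimal Python sketch built on hypothetical sample data: employee records from each branch system are gathered into one employee subject area at load time and then pre-grouped by deduction amount, so the tax-deduction question becomes a single lookup. The `Employee` fields, the branch names, and the pre-grouping step are illustrative assumptions, not a prescribed warehouse design.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Employee:
    # Hypothetical employee-subject record; field names are illustrative only.
    employee_id: int
    branch: str
    salaried: bool
    tax_deduction: int


def load_employee_subject(branch_systems: dict[str, list[Employee]]) -> list[Employee]:
    """Gather employee records from every branch system into one subject area."""
    return [emp for records in branch_systems.values() for emp in records]


def build_deduction_index(employees: list[Employee]) -> Counter:
    """Pre-group salaried employees by deduction amount, once, at load time."""
    return Counter(emp.tax_deduction for emp in employees if emp.salaried)


# Hypothetical sample data for two branches.
branch_systems = {
    "Mumbai": [Employee(1, "Mumbai", True, 5000), Employee(2, "Mumbai", False, 5000)],
    "Pune": [Employee(3, "Pune", True, 5000), Employee(4, "Pune", True, 2000)],
}

employees = load_employee_subject(branch_systems)
index = build_deduction_index(employees)
# The analytical question is now a single lookup instead of a scan of every branch.
print(index[5000])  # → 2
```

The trade-off is that the grouping work moves from query time to load time, which suits a warehouse where data is loaded in batches and queried many times.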
INTEGRATED DATA
Integrated data means that information from many sources is de-duplicated and merged into one consistent location.
When shortlisting your top 20 customers, you must know that “HAL” and “Hindustan Aeronautics Limited” are one and the same. Your database should hold just one customer number for every form of the name, whether it appears as HAL or as Hindustan Aeronautics Limited.
As a result, the data stored in a data warehouse is consistent: facts and figures are related to each other, can be combined, and present a single version of the truth.
Much of the transformation and loading work that goes into a data warehouse centers on integrating and standardizing data.
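The HAL example can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions: a hand-maintained alias table (hypothetical) plus basic normalization of case, full stops, and whitespace. Production integration tools typically use richer matching rules, so treat this as a sketch of the idea rather than a complete de-duplication method.

```python
# Hypothetical alias table, maintained as reference data by the integration team.
ALIASES = {
    "HAL": "HINDUSTAN AERONAUTICS LIMITED",
    "HINDUSTAN AERONAUTICS LTD": "HINDUSTAN AERONAUTICS LIMITED",
}


def canonical_name(raw: str) -> str:
    """Upper-case, drop full stops, collapse whitespace, then resolve known aliases."""
    cleaned = " ".join(raw.upper().replace(".", "").split())
    return ALIASES.get(cleaned, cleaned)


def integrate(rows):
    """Merge (source_system, customer_name, amount) rows so each customer gets one number."""
    customer_numbers: dict[str, int] = {}
    totals: dict[int, float] = {}
    for _source_system, name, amount in rows:
        key = canonical_name(name)
        # setdefault assigns the next number only the first time a customer is seen.
        number = customer_numbers.setdefault(key, len(customer_numbers) + 1)
        totals[number] = totals.get(number, 0.0) + amount
    return customer_numbers, totals


# Hypothetical rows arriving from three different source systems.
rows = [
    ("billing", "HAL", 120.0),
    ("crm", "Hindustan Aeronautics Limited", 80.0),
    ("sales", "Hindustan  aeronautics Ltd.", 50.0),
    ("crm", "Example Traders", 30.0),
]
numbers, totals = integrate(rows)
print(numbers)  # → {'HINDUSTAN AERONAUTICS LIMITED': 1, 'EXAMPLE TRADERS': 2}
print(totals)   # → {1: 250.0, 2: 30.0}
```

All three spellings of HAL collapse to customer number 1, so a "top 20 customers" ranking sees one combined total instead of three partial ones.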
TIME-REFERENCED DATA
The most important and most scrutinized characteristic of analytical data is that it preserves prior states: time-referenced data carries a time value with every fact it records. For example, a user may ask, “What were the total sales of product ‘A’ on New Year's Day for each of the past three years across region ‘Y’?” To answer this question, you need the sales figures for that product on New Year's Day from all the branches in that region, for each of those years.
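A minimal Python sketch of that query, assuming hypothetical time-stamped sales facts stored as tuples and a fixed three-year window (2021 to 2023) so the result is reproducible. The question can only be answered because each fact keeps its own date; a system that merely overwrote a running total would have lost the earlier figures.

```python
from datetime import date

# Hypothetical time-stamped sales facts: (sale_date, product, region, branch, amount).
# Every row keeps its own date, so earlier states are never lost.
SALES = [
    (date(2021, 1, 1), "A", "Y", "Y-North", 400.0),
    (date(2021, 1, 1), "A", "Y", "Y-South", 100.0),
    (date(2022, 1, 1), "A", "Y", "Y-North", 300.0),
    (date(2022, 1, 2), "A", "Y", "Y-North", 999.0),  # 2 January: excluded
    (date(2023, 1, 1), "A", "Y", "Y-South", 250.0),
    (date(2023, 1, 1), "A", "Z", "Z-East", 700.0),   # other region: excluded
    (date(2023, 1, 1), "B", "Y", "Y-North", 50.0),   # other product: excluded
]


def new_years_day_sales(facts, product, region, years):
    """Sum sales of `product` across all branches of `region` on 1 January of each year."""
    totals = {year: 0.0 for year in years}
    for sale_date, prod, reg, _branch, amount in facts:
        is_new_years_day = (sale_date.month, sale_date.day) == (1, 1)
        if prod == product and reg == region and is_new_years_day and sale_date.year in totals:
            totals[sale_date.year] += amount
    return totals


result = new_years_day_sales(SALES, "A", "Y", years=[2021, 2022, 2023])
print(result)  # → {2021: 500.0, 2022: 300.0, 2023: 250.0}
```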
Because new data is continually loaded rather than overwriting what is already there, the data in a warehouse is not static, and the warehouse grows in size over time.
When analyzed, time-referenced data can also reveal hidden trends between associated data elements that are not obvious to the naked eye. This exploratory activity is termed “data mining”.
NON-VOLATILE DATA
Because the information in a data warehouse is queried so heavily against time, it is extremely important to preserve the data pertaining to every business event of the company. Non-volatility, a defining characteristic of a data warehouse, lets users dig deep into history and arrive at specific business decisions based on facts.
This means that data, once stored in the data warehouse, is not updated or deleted; it remains there as part of the historical record.
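Non-volatility can be illustrated with a hypothetical append-only table in Python. The `AppendOnlyTable` class and `CustomerSnapshot` record are illustrative names, not part of any product: the table exposes only `append` and read methods, rows are frozen dataclasses, and a change in a customer's details is recorded as a new dated row instead of an update, so the earlier state can still be queried.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass(frozen=True)  # frozen: a stored row cannot be changed in place
class CustomerSnapshot:
    customer_id: int
    city: str
    loaded_on: date


class AppendOnlyTable:
    """Hypothetical warehouse table: rows can be added and read, never updated or deleted."""

    def __init__(self) -> None:
        self._rows: list[CustomerSnapshot] = []

    def append(self, row: CustomerSnapshot) -> None:
        self._rows.append(row)

    def history(self, customer_id: int) -> list[CustomerSnapshot]:
        return [r for r in self._rows if r.customer_id == customer_id]

    def as_of(self, customer_id: int, when: date) -> Optional[CustomerSnapshot]:
        """Return the latest snapshot loaded on or before `when`, or None."""
        candidates = [r for r in self.history(customer_id) if r.loaded_on <= when]
        return max(candidates, key=lambda r: r.loaded_on, default=None)


table = AppendOnlyTable()
table.append(CustomerSnapshot(7, "Bengaluru", date(2022, 4, 1)))
# The customer moves: a new row is appended and the old row is kept.
table.append(CustomerSnapshot(7, "Hyderabad", date(2023, 9, 15)))

print(table.as_of(7, date(2023, 1, 1)).city)  # → Bengaluru
print(table.as_of(7, date(2024, 1, 1)).city)  # → Hyderabad
print(len(table.history(7)))                  # → 2
```

Leaving out update and delete methods is the design choice here: history is preserved by construction rather than by convention.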
NECESSITY: THE DATA ACCESS CRISIS
If there is a single key to survival in the 1990s and beyond, it is the ability to analyze, plan, and react to changing business conditions much more rapidly. To do this, managers, analysts, and knowledge workers in our enterprises need more and better information.
Information technology (IT) has revolutionized the way organizations operate throughout the world today. The sad truth, however, is that in many organizations, despite powerful computers on every desk and communication networks that span the globe, large numbers of executives and decision-makers cannot get their hands on critical information that already exists within the organization.
Every day, organizations large and small create billions of bytes of data about all aspects of their business: millions of individual facts about their customers, products, operations, and people. For the most part, however, this data is locked up in a maze of computer systems and is exceedingly difficult to get at. This phenomenon has been described as “data in jail”.
Industry experts have estimated that only a small fraction of the data captured, processed, and stored in the enterprise is actually available to executives and decision-makers. While technologies for manipulating and presenting data have literally exploded, only recently have those developing IT strategies for large enterprises concluded that large segments of the enterprise are “data poor”.
Posted by: Vissicomp Technology Pvt. Ltd.
Website: http://www.vissicomp.com