DataWare Housing

ONLINE ANALYTICAL PROCESSING SERVER (OLAP)

Posted on

ONLINE ANALYTICAL PROCESSING SERVER (OLAP)

  •    Online Analytical Processing Server (OLAP) is based on multidimensional data model.
  •     It allows the managers, analysts to get insight the information through fast, consistent, interactive access to information.

TYPES OF OLAP SERVERS

    Four types of OLAP servers that are

  • Relational OLAP(ROLAP)

The Relational OLAP servers are placed between relational back-end server and client front-end tools. To store and manage warehouse data the Relational OLAP use relational or extended-relational DBMS.

  • Multidimensional OLAP (MOLAP)

Multidimensional OLAP (MOLAP) uses the array-based multidimensional storage engines for multidimensional views of data. With multidimensional data stores, the storage utilization may be low if the data set is sparse. Therefore many MOLAP Server uses the two level of data storage representation to handle dense and sparse data sets.

 

  • Hybrid OLAP (HOLAP)

The hybrid OLAP technique combination of ROLAP and MOLAP both. It has both the higher scalability of ROLAP and faster computation of MOLAP. HOLAP server allows storing the large data volumes of detail data. The aggregations are stored separated in MOLAP store.

 

  • Specialized SQL Servers

specialized SQL servers provides advanced query language and query processing support for SQL queries over star and snowflake schemas in a read-only environment.

Postped by Vissicomp Technology Pvt Ltd

http://www.vissicomp.com

Process Flow in Data Warehouse

Posted on

Process Flow in Data Warehouse

There are four major processes that build a data warehouse. Here is the list of four processes:

  • Extract and load data.
  • Cleaning and transforming the data.
  • Backup and Archive the data.
  • Managing queries & directing them to the appropriate data sources.

Extract and Load Process

  • The Data Extraction takes data from the source systems.
  • Data load takes extracted data and loads it into data warehouse.

Clean and Transform Process

Once data is extracted and loaded into temporary data store it is the time to perform Cleaning and Transforming. Here is the list of steps involved in Cleaning and Transforming:

  • Clean and Transform the loaded data into a structure.
  • Partition the data.
  • Aggregation

Backup and Archive the data

  • To recover the data in event of data loss, software failure or hardware failure it is necessary to backed up on regular basis.
  • Archiving involves removing the old data from the system in a format that allow it to be quickly restored whenever required.

Posted  by Vissicomp Technology Pvt Ltd

http://www.vissicomp.com

 

Datamart

Posted on

Why to create Datamart

  • To partition data in order to impose access control strategies.
  • To speed up the queries by reducing the volume of data to be scanned.
  • To segment data into different hardware platforms.
  • To structure data in a form suitable for a user access tool.
  • Identify the Functional Splits
  • Identify User Access Tool Requirements
  • Identify Access Control Issues

Steps to determine that data mart appears to fit the bill

Identify the Functional Splits

In this step we determine that whether the natural functional split is there in the organization. We look for departmental splits, and we determine whether the way in which department use information tends to be in isolation from the rest of the organization. Let’s have an example…

Suppose in a retail organization where the each merchant is accountable for maximizing the sales of a group of products. For this the information that is valuable is:

  • sales transaction on daily basis
  • sales forecast on weekly basis
  • stock position on daily basis
  • stock movements on daily basis

As the merchant is not interested in the products they are not dealing with, so the data marting is subset of the data dealing which the product group of interest. Following diagram shows data marting for different users.

Datamart
Datamart

Issues in determining the functional split:

  • The structure of the department may change.
  • The products might switch from one department to other.
  • The merchant could query the sales trend of other products to analyze what is happening to the sales.

These are issues that need to be taken into account while determining the functional split.

Identify User Access Tool Requirements

For the user access tools that require the internal data structures we need data mart to support such tools. The data in such structures are outside the control of data warehouse but need to be populated and updated on regular basis.

There are some tools that populated directly from the source system but some cannot. Therefore additional requirements outside the scope of the tool are needed to be identified for future.

Populated from the data warehouse rather each tool must have its own data mart.

Identify Access Control Issues

There need to be privacy rules to ensure the data is accessed by the authorized users only. For example in data warehouse for retail banking institution ensure that all the accounts belong to the same legal entity. Privacy laws can force you to totally prevent access to information that is not owned by the specific bank.

Data mart allows us to build complete wall by physically separating data segments within the data warehouse. To avoid possible privacy problems the detailed data can be removed from the data warehouse. We can create data mart for each legal entity and load it via data warehouse, with detailed account data.

Posted by Vissicomp Technology Pvt Ltd

http://www.vissicomp.com

FRAMEWORK OF THE DATAWAREHOUSE

Posted on Updated on

FRAMEWORK OF THE DATAWAREHOUSE

  • One of the reasons that data warehousing has taken such a long time to develop is that it is actually a very comprehensive technology.
  •  In fact, it can be best represented as an enterprise – wide framework for managing informational data within the organization. All the components involved in a warehousing strategy are related, it is essential to have data warehouse architecture.

 

Data warehouse Architecture

  • Data warehouse architecture (DWA) is a way of representing the overall structure of data, communication and presentation that exists for end – user computing within the enterprise.
  • Below is the architecture which is made up of a number of interconnected parts are :
  • Source system
  • Source data transport layer
  • Data quality control and data profiling layer
  • Metadata management layer
  • Data integration layer
  • Data processing layer
  • End user reporting layer
Data warehouse Architecture
Data warehouse Architecture

Posted by Vissicomp Technology Pvt Ltd

http://www.vissicomp.com

Data Warehousing- introduction

Posted on Updated on

What is Data Warehousing- introduction?

A data warehouse is a repository of an organization’s electronically store data and are designed to facilitate reporting and analysis of historical data.

This definition of the data warehouse focuses on data storage which means to retrieve and analyze data, to extract, transform and load data, and to manage the data dictionary are also considered essential components of a data warehousing system.

An expanded definition for data warehousing includes business intelligence tools, tools to extract, transform, and load data into the repository, and tools to manage and retrieve metadata.

Where is Data Warehousing required- ?

Data warehousing arises in an organization’s need for reliable, consolidated, unique and integrated reporting and analysis of its data, at different levels of aggregation.

The practical reality of most organizations is that their data infrastructure is made up by a collection of heterogeneous systems.

For example, an organization might have one system that handles customer-relationship, a system that handles employees, systems that handle sales data or production data, yet another system for finance and budgeting data etc.

In practice, these systems are often poorly or not at all integrated and simple questions like: “How much time did sales person A spend on customer C, how much did we sell to Customer C, was customer C happy with the provided service, Did Customer C pay his bills” can be very hard to answer, even though the information is available “somewhere” in the different data systems.

Another problem is that ERP systems are designed to support relevant operations. For example, a finance system might keep track of every single stamp bought; When it was ordered, when it was delivered, when it was paid and the system might offer accounting principles (like double bookkeeping) that further complicates the data model. Such information is great for the person in charge of buying “stamps” or the accountant trying to sort out an irregularity, but the CEO is definitely not interested in such detailed information, the CEO wants to know stuff like “What’s the cost?”, “What’s the revenue?”, “did our latest initiative reduce costs?”.

Posted by Vissicomp Technology Pvt Ltd

http://www.vissicomp.com

EVOLUTION TO THE DW2.0 ENVIRONMENT

Posted on Updated on

EVOLUTION TO THE DW2.0 ENVIRONMENT

  • From the standpoint evolution of architecture it is interesting to look backward and to examine the forces that shaped the evolution. In fact there have been many forces that have shaped the evolution of information architecture to its highest point – DW 2.0
  • Some of the forces of evolution have been:

1)        The demand for more and different uses of technology: when one compares the very first system to those of DW 2.0 one can see that there has been a remarkable upgrade of systems and their ability to communicate information to the end user. It seems almost inconceivable that not so long ago output from computer system was in the form of holes punched in cards. And end user output was buried as a scrap of information in a hexadecimal dump. The truth is that the computer was not very useful to the end user as long as output was in the very crude from in which it originally appeared.

2)        Online processing: as long as access to data was restricted to very short amounts of time, there was only so much the business person could do with the computer. But the instant that online processing became a possibility, the business opened up to the possibilities of the interactive use of information intertwined in the day – to – day life of the business. With online processing, reservation systems, bank teller processing, ATM processing, online inventory management, and a whole host of other important uses of the computer became a reality.

3)        The hunger for integrated, corporate date: As long as there were many applications, the thirst of the office community was slaked. But after it was recognized that something important was missing, what was missing was corporate information. Corporate information could not be obtained by adding together many tiny little applications. Instead data had to be recast into the integrated corporate understanding of information. But once corporate data became reality whole new vistas of processing opened up.

4)        The need to include unstructured textual data in the mix: for many year decisions were made exclusively on the basis of structured transaction data. While structured transaction data is certainly important, there are other vistas of information in the corporate environment. There is a health of information tied up in textual, unstructured format. Unfortunately unlocking the textual information is not easy. Fortunately, textual ETL (extract/transform/load) emerged and gave organizations the key to unlocking text as a basis for making decisions.

5)        Capacity: if the world of technology had topped marking innovations, a sophisticated world such as DW 2.0 simply would not have been possible. But the capacity of technology, the speed with which technology works, and the ability to interrelate different forms of technology work all capacity is an infrequently encountered constraint imagine a world in which storage was held entirely on magnetic tape(as the world was not so long ago.) Most of the types of processing that are taken for granted today simply would not have been possible.

6)        Economics: In addition to the growth of capacity, the economics of technology have been very favorable to the consumer. If the consumer had to pay the prices for technology that were used a decade ago, the data warehouses of DW 2.0 would simply be out of orbit from a financial perspective. Thanks to Moore’s law the unit cost of technology has been shrinking for many years now. The result is affordability at the consumer level.

Posted by Vissicomp Technology Pvt Ltd

For more http://www.vissicomp.com

Data Warehouse

Posted on Updated on

Data Warehouse

  • A decision support database that is maintained separately from the organization’s operational database.
  • Support information processing by providing a solid platform of consolidated, historical data for analysis.
  • Data warehousing defined as –
    • Subject-oriented:
      • Organized around major subjects, such as customer, product, sales.
      • Focusing on the modeling and analysis of data for decision making, not on daily operations or transaction processing.
      • Provide a simple and concise view around particular subject by excluding data that are not useful in the decision support process.
    • Integrated
      • Constructed by integrating multiple, heterogeneous data sources as relational databases, flat files, on-line transaction records.
      • Providing data cleaning and data integration techniques.
    • Time variant
      • The time horizon for the data warehouse is significantly longer than that of operational systems.
      • Every key structure in the data warehouse contains an element of time (explicitly or implicitly).
    • Non-volatile
      • A physically separate store of data transformed from the operational environment.
      • Does not require transaction processing, recovery, and concurrency control mechanisms.
      • Requires only two operations in data accessing: initial loading of data and access of data (no data updates).

Posted by Vissicomp Technology Pvt Ltd

For more http://www.vissicomp.com

COMPONENTS OF DATA WAREHOUSE ENVIRONMENT

Posted on Updated on

VARIOUS COMPONENTS OF DATA WAREHOUSE ENVIRONMENT

  • There are various components of a data warehouse environment. In the beginning those components were not widely recognized. But soon the basic components of the data warehouse environment became known.
  • Figure shows the progression from an early stand – alone data warehouse to full – fledged data warehouse architecture.
  • Soon the data warehouse evolved into a full-blown architecture sometimes called the corporate information factory
  • The full – fledged architecture show in above figure includes some notable components, which are discussed below,

ETL – extract/ transform/ load

  • ETL technology allows data to pull from the legacy system environment and transformed into corporate data. The ETL component performs many functions, such as

Logical conversion of data;

Domain verification;

Conversion from one DBMS to another;

Creation of default values when needed;

Summarization of data; addition of time values to the data key;

Restructuring of the data key;

Merging of records;

Deletion of extraneous or redundant data.

FUNCTIONAL DEPENDENCY OF THE DATA

Posted on Updated on

FUNCTIONAL DEPENDENCY OF THE DATA

  Functional dependency of data means that the attribute within a given entity are fully dependent on the entire primary key of the entity-no more, no less.

The implication is that each attribute has a one-to-one relationship with its primary key. For each instance of a primary key, there is one set of applicable values; for each attribute, there is only one primary key that properly gives it its identity.

For example, in the customer entity, names, DOB all belong to the primary key of CustomerID.

If the columns with each table contain attributes with proper functional dependency, then the simplest way to make a change to any column is to rely on the functional dependency.

Each change will be applied to one attribute in one place.

If customer changes name, then the customer row and change the name.

If the new customer address is used as the mailing address, go to address role and change it there-one place. Functional dependency is not irrelevant in analytical models.

When you define a level, you define a relationship between the level key attributes and the other attributes in the level (default attribute and related attributes). The relationship indicates that the level key attributes can be used together to determine the other attributes in the level. DB2 Data Warehouse Edition documents the relationship between the level’s attributes by defining functional dependencies between the attributes. The relationships, defined by functional dependencies, can be used to perform intelligent optimization of your data.

If a functional dependency exists between a level key attribute and the level’s related attributes, the Optimization Advisor wizard can include the level key attribute without the related attributes in the summary table. Queries that are interested in the level’s related attributes can still be routed to the summary table because the DB2 optimizer joins the summary table with the dimension table when the query is issued to create the final result set.

Posted by Vissicomp Technology Pvt Ltd

http://www.vissicomp.com

SCOPE OF THE DATA WAREHOUSE

Posted on Updated on

DATA WAREHOUSE OPTIONS

   There are many ways to develop data warehouse as there are organization and  number of key factors that need to be considered:

  • SCOPE OF THE DATA WAREHOUSE
  • DATA REDUNDANCY
  • TYPE OF END – USER

        i.            SCOPE OF THE DATA WAREHOUSE

The scope of a data warehouse may be as broad as all the informational data for the entire enterprise from the beginning of time, or it may be as narrow as a personal data warehouse for a single manager for a single year. There is nothing that makes one of these more of a data warehouse than another.

     ii.            DATA REDUNDANCY

There are essentially three levels of data redundancy that enterprises should think about when considering their data warehouse options:

a)    “virtual” or “point – to point” data warehouses

b)    Central data warehouses

c)     Distributed data warehouses

A)    “VIRTUAL” OR “POINT TO POINT” DATA WAREHOUSES

A virtual or point to point data warehousing strategy means that end users are allowed to get at operational databases directly, using whatever tools are enabled to the “ data access network” this approach provides the ultimate in flexibility as well as the minimum amount of redundant data must be loaded and maintained.

 

B) CENTRAL DATA WAREHOUSES

A central data warehouses may contain records for any specific period of time and usually, contains information from multiple operational systems.

Central data warehouse are real. The data stored here is accessible from one place and must be loaded and maintained on a regular basis. Normally, data warehouses are built around advanced RDBMs or some form of multidimensional informational database server.

C)    DISTRIBUTED DATA WAREHOUSES

Distributed Data warehouses are those in which certain components are distributed across a number of different physical databases.

 

   iii.            TYPE OF END – USER

There are different ways to organize a data warehouse, it is also important to note that there is an increasingly wide range of end – users too. They can broadly be categorized into three;

Executives and managers

Power users (business and financial analysts, engineers)

Support users (clerical, administrative)

Each of these has its own set of requirements for data, access, flexibility and ease of use.

Posted by Vissicomp Technology Pvt Ltd

For more http://www.vissicomp.com