Data warehouse is very important in managing data in huge organizations. In computing, a data warehouse (DW or DWH) is a database used for reporting and analysis. The data stored in the warehouse are uploaded from the operational systems (such as marketing, sales etc., shown in the figure to the right). The data may pass through an operational data store for additional operations before they are used in the DW for reporting.
The DWS (Data Warehouse Striping) technique allows the distribution of large data warehouses through a cluster of computers. The data partitioning approach partition the facts tables through all nodes and replicates the dimension tables. The replication of the dimension tables creates a limitation to the applicability of the DWS technique to data warehouses with big dimensions. This paper proposes a strategy to handle large dimensions in a distributed DWS system and evaluates the proposed strategy experimentally. With the proposed strategy the performance speed up and scale up obtained in the DWS technique are not affected by the presence of big dimensions. Furthermore, it extends the scope of the technique to queries that browse big dimensions that can also benefit of the performance increase of the DWS technique.
In data warehousing, a fact table consists of the measurements, metrics or facts of a business process. It is often located at the center of a star schema or a snowflake schema, surrounded by dimension tables.
Fact tables provide the (usually) additive values that act as independent variables by which dimensional attributes are analyzed. Fact tables are often defined by their grain. The grain of a fact table represents the most atomic level by which the facts may be defined.
Contrary to fact tables, the dimension tables contain descriptive attributes (or fields) which are typically textual fields or discrete numbers that behave like text. These attributes are designed to serve two critical purposes: query constraining/filtering and query result set labeling.