From the software engineering point of view, the design and construction of a data warehouse may consist of the following steps: planning, requirements study, problem analysis, warehouse design, data integration and testing, and finally deployment of the data warehouse. Large software systems can be developed using one of two methodologies: the waterfall method or the spiral method.
The waterfall method performs a structured and systematic analysis at each step before proceeding to the next, like a waterfall falling from one step to the next. The spiral method involves the rapid generation of increasingly functional systems, with short intervals between successive releases. This is considered a good choice for data warehouse development, especially for data marts, because the turnaround time is short, modifications can be done quickly, and new designs and technologies can be adapted in a timely manner.
In general, the warehouse design process consists of the following steps:
- Choose a business process to model, for example, orders, invoices, shipments, inventory, account administration, sales, or the general ledger. If the business process is organizational and involves multiple complex object collections, a data warehouse model should be followed. However, if the process is departmental and focuses on the analysis of one kind of business process, the data mart model should be chosen.
- Choose the grain of the business process. The grain is the fundamental, atomic level of data to be represented in the fact table for this process, for example, individual transactions, individual daily snapshots, and so on.
- Choose the dimensions that will apply to each fact table record. Typical dimensions are time, item, customer, supplier, warehouse, transaction type, and status.
- Choose the measures that will populate each fact table record. Typical measures are numeric additive quantities like dollars_sold and units_sold.
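The four design steps can be illustrated with a minimal in-memory sketch in Python. It assumes a hypothetical sales business process whose grain is one row per individual transaction, with time, item, and customer as dimensions and dollars_sold and units_sold as measures. The names `SalesFact`, `model_for`, and `roll_up` are illustrative only and do not come from any particular warehouse product. The example mainly shows why additive measures matter: summing transaction-grain rows gives a correct total at any coarser level of a dimension.

```python
from collections import defaultdict
from dataclasses import dataclass
from datetime import date


def model_for(organization_wide: bool) -> str:
    """Step 1 rule of thumb (hypothetical helper): an organization-wide
    process suggests a data warehouse; a single departmental process
    suggests a data mart."""
    return "data warehouse" if organization_wide else "data mart"


@dataclass(frozen=True)
class SalesFact:
    """One row of a hypothetical sales fact table.

    Step 2 (grain): one row per individual sales transaction.
    """
    # Step 3 (dimensions): keys that would point into dimension tables.
    time_key: date
    item_key: str
    customer_key: str
    # Step 4 (measures): numeric, additive quantities.
    dollars_sold: float
    units_sold: int


def roll_up(facts, dimension: str) -> dict:
    """Sum the additive measures, grouped by one dimension key.

    Returns {dimension value: (total dollars_sold, total units_sold)}.
    """
    totals = defaultdict(lambda: [0.0, 0])
    for fact in facts:
        key = getattr(fact, dimension)
        totals[key][0] += fact.dollars_sold
        totals[key][1] += fact.units_sold
    return {key: (dollars, units) for key, (dollars, units) in totals.items()}


if __name__ == "__main__":
    # Hypothetical transaction-grain rows for illustration.
    facts = [
        SalesFact(date(2024, 1, 1), "TV", "c1", 500.0, 1),
        SalesFact(date(2024, 1, 1), "phone", "c2", 250.0, 2),
        SalesFact(date(2024, 1, 2), "TV", "c2", 1000.0, 2),
    ]
    print(model_for(organization_wide=False))  # data mart
    print(roll_up(facts, "item_key"))  # {'TV': (1500.0, 3), 'phone': (250.0, 2)}
    print(roll_up(facts, "time_key"))
```

Choosing the finest grain in step 2 keeps this flexible: any coarser summary (per item, per day) can be derived by rolling up, whereas a table stored only as daily totals could not answer per-transaction questions.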
Because data warehouse construction is a difficult and long-term task, its implementation scope should be clearly defined. The goals of an initial data warehouse implementation should be specific, achievable, and measurable: