Planning Data Warehouse - Azure

Why DataWarehousing?

Companies of huge data and they want to perform analysis on that data to see trends which help in management decision making.

What is DataWarehousing?

Creating a separate data store in a particular standard format without changing the actual data from different data sources of organisation which will then be used to perform analysis.


Considerations for data warehousing:

what are the source systems, types of data,volume of data, velocity of data

Functional and Non-Functional requirements (like Load and Query Concurrency , Load and Query SLA , Availability and Disaster Recovery )

what are retention requirements

what granularity is required for the consumption of data

Access control - Security of the data has to be kept in mind because the role base access of data previously should also be valid after creating data warehousing.

OLTP (online transaction processing) Vs OLAP (online analytical processing):



OLTP (Database)OLAP (Data Warehouse)
Current DataHistoric Data, Non- Volatile
Used to run the BusinessUsed to analyse the Business
Highly detailedConsolidated and Summarized
Typilcally in GB scaleTB and above
Queries run against small scale dataQueries run against large scale data


Data Warehouse Architecture:




Different formats of data will loaded into DataWarehouse by ETL(Extract, Transform, Load) process and then OLAP servers are used to run the Analysis on Data in Data Warehouse




  • Different formats of data will loaded into Azure Blob Storage by Azure DataFactory which will Orchestrate(Meaning: plan or coordinate the elements of (a situation) to produce a desired effect)  all the steps required using the control flow. 

  • This control flow will drive the sets of activities that will complete overall data flow. Data flow is data's pass through life  through its creation, ingestion, transformation and its ultimate storage analysis that's enabled through above set of activites. 

  • Then it will push to Azure Blob Storage and this data can be used by different components like Data Bricks which can prepare and train the data for Analysis, PolyBase can be used to push data directly to Azure SQL Data warehouse where they can analysed present that to Power BI for Visualizing the Analysis. 


Designing Data Warehouse:

If we have the data and queries in the same system then there will be no latency and bandwidth issues but we will not be able to scale out.

But with todays network speed we no more have that problem, so we can componentize the whole process and use different and feasible solutions for each component as the system evolves.


Azure DataFactory:





Post a Comment

0 Comments