Microsoft Fabric Overview
Data is the fuel for AI, and AI is the engine of business innovation. To succeed in the AI era, organizations need a robust, reliable data pipeline that can deliver insights at scale. However, many organizations struggle with complex, fragmented analytics systems that hold their data potential back.
Microsoft Fabric simplifies and streamlines the data pipeline, letting organizations build AI-powered experiences faster and with less effort. In this blog post, we will introduce Microsoft Fabric from the data architect's point of view and show how it can help you create more value for your organization.
Create Value Quicker
Fabric is something I have anticipated for a couple of years. It is a SaaS solution that lets you focus on the business problem and the data model rather than the technical details of connecting different Azure services. As a data architect, you may be used to building data analytics solutions by combining services such as Azure Data Factory, Azure Synapse, and Azure Data Lake Storage, and getting into the weeds to make sure everything is properly networked and integrated. With Fabric, you can skip the hassle of stitching services together and work on the data model directly: you can create a Fabric workspace in seconds and start building your data solution right away.
OneLake is Central to Fabric
OneLake is central to everything in Fabric. It is a single, unified, logical data lake that comes automatically with every Microsoft Fabric tenant and is designed to be the single place for all your analytics data. OneLake is built on top of Azure Data Lake Storage Gen2, so any tool or service compatible with ADLS Gen2 can access your data in OneLake. It also supports multiple analytical engines, such as Spark and SQL, and languages like Python, and lets you create different types of data items, such as Lakehouses, Warehouses, and shortcuts.
You can think of OneLake as a OneDrive for data. Your Fabric tenant has a single OneLake and it’s organized by workspaces. Microsoft takes care of the scaling and the syncing across regions for you.
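Because OneLake exposes an ADLS Gen2-compatible endpoint, any file or table in it can be addressed with a standard ABFSS URI. As a rough sketch (the workspace and item names below are made up for illustration), here is how those paths are composed:

```python
def onelake_abfss_path(workspace: str, item: str, item_type: str, relative: str) -> str:
    """Compose an ABFSS URI for data in OneLake via its ADLS Gen2-style endpoint."""
    return (
        f"abfss://{workspace}@onelake.dfs.fabric.microsoft.com/"
        f"{item}.{item_type}/{relative}"
    )

# Hypothetical names: a table in a Lakehouse called "Sales" in workspace "Finance".
path = onelake_abfss_path("Finance", "Sales", "Lakehouse", "Tables/Orders")
print(path)
# abfss://Finance@onelake.dfs.fabric.microsoft.com/Sales.Lakehouse/Tables/Orders
```

Any ADLS Gen2-aware tool that accepts such a URI can then read from OneLake, with access governed by your Fabric workspace permissions.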
Microsoft has standardized on the open Delta Lake format, which is essentially Parquet data files plus a transaction log that enables ACID transactions, time travel, and schema enforcement. Every compute engine in Microsoft Fabric knows how to talk to OneLake and the Delta Lake file format.
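To make "Parquet plus metadata" concrete, here is a small Python sketch of what a Delta table looks like on disk: Parquet data files alongside a `_delta_log` folder of JSON commit records. The layout below is a simplified mock built by hand, not a real table produced by a Delta writer:

```python
import json
import tempfile
from pathlib import Path

def is_delta_table(folder: Path) -> bool:
    """A folder holds a Delta table if it contains a _delta_log directory."""
    return (folder / "_delta_log").is_dir()

# Mock the on-disk layout of a Delta table: Parquet data files
# plus a _delta_log folder of JSON commit records.
root = Path(tempfile.mkdtemp()) / "Orders"
(root / "_delta_log").mkdir(parents=True)
(root / "part-00000.snappy.parquet").write_bytes(b"")  # placeholder data file
commit = {"commitInfo": {"operation": "WRITE"}}  # heavily simplified commit record
(root / "_delta_log" / "00000000000000000000.json").write_text(json.dumps(commit))

print(is_delta_table(root))         # True: the folder has a transaction log
print(is_delta_table(root.parent))  # False: just a plain folder
```

The transaction log is what turns a pile of Parquet files into a table: each commit records which files were added or removed, which is how engines get consistent reads and time travel.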
The Power of Shortcuts
Fabric lets you store only one copy of each file or table, thanks to shortcuts. Shortcuts are symbolic links in OneLake that refer to other storage locations, either inside or outside OneLake. Shortcuts help you unify your data across domains, clouds and accounts by creating a single virtualized data lake for your whole enterprise. You can also use shortcuts to avoid duplicate copies of data and reduce process latency caused by data copying and staging.
You can create shortcuts in both Lakehouses and KQL Databases, and those shortcuts can point to other OneLake locations or to ADLS Gen2 or Amazon S3 storage accounts. You can place shortcuts under the Tables or Files folders of a Lakehouse; shortcuts under Tables are automatically recognized as tables if the data is in the Delta Parquet format.
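Shortcuts can also be created programmatically. The sketch below builds a request body in the shape used by the OneLake shortcuts REST API; the IDs and storage locations are placeholders, and you should check the current API reference before relying on this exact shape:

```python
import json

# Placeholder identifiers; real values come from your Fabric tenant.
workspace_id = "11111111-1111-1111-1111-111111111111"
lakehouse_id = "22222222-2222-2222-2222-222222222222"

url = (
    "https://api.fabric.microsoft.com/v1/"
    f"workspaces/{workspace_id}/items/{lakehouse_id}/shortcuts"
)

# Request body for a shortcut under the Lakehouse's Tables folder,
# pointing at a path in an external ADLS Gen2 account.
payload = {
    "path": "Tables",
    "name": "Orders",
    "target": {
        "adlsGen2": {
            "location": "https://myaccount.dfs.core.windows.net",
            "subpath": "/sales/orders",
            "connectionId": "33333333-3333-3333-3333-333333333333",
        }
    },
}
print(json.dumps(payload, indent=2))
# POST this body to `url` with a Microsoft Entra bearer token to create the shortcut.
```

Because the shortcut only references the external location, no data is copied: queries against the shortcut read the source files in place.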
Power BI and Direct Lake
Direct Lake is a new feature of Power BI and Microsoft Fabric that lets you analyze very large data volumes quickly and efficiently. It works by reading Parquet-formatted files directly from OneLake.
With common architectures today, after you load your data warehouse or data lake, you still have to refresh and load that data into Power BI using an import model.
For a large model this can take a long time, adding latency before your users can begin analyzing data. With Microsoft Fabric, you can skip the import model entirely and point Power BI directly at your Delta Lake tables in OneLake, eliminating dataset refresh time. Power BI simply reads the columnar Parquet files on demand at query time and still achieves import-model performance.
A Better Way to Work
Direct Lake requires a different way of working for your IT teams. Instead of applying many transformations as you load data into your Power BI data model with Power Query, SQL queries, or views, you must do those transformations upstream so they are materialized as tables in OneLake.
This means your BI teams and your data integration teams should work more closely together. In a small team, you can communicate and make sure that the data integration team adds the transformations that you need for BI. In a larger team, Fabric will give you a lot of flexibility in how you can separate duties.
For example, you can have one workspace for your enterprise data warehouse, and then you can use shortcuts to access those tables in a different workspace for your BI team. The BI team can make more transformations and create more tables.
Fabric lets you organize teams and work in a flexible way.
Performance is Handled for You
Microsoft Fabric follows a model very similar to Power BI Premium Gen2: you purchase a capacity, which is essentially an allocation of compute over a 24-hour period, and each of the different compute engines in Fabric draws CPU from this shared logical capacity.
Like Power BI Premium, when a query is run against a Fabric capacity, the goal is for this huge pool of compute to complete the query as quickly as possible.
But a single heavy query could consume your entire allocation, and more, within a particular time slice. Fabric handles this with its bursting and smoothing features. Bursting lets Fabric apply extra compute to finish your work as quickly as possible; smoothing then spreads that CPU load across subsequent time slices so that a single query doesn't exhaust your quota all at once. Long-running background jobs, such as loading your warehouse, are smoothed over the following 24 hours. This means you can schedule jobs whenever you want and not worry about whether your capacity is busy with other work.
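As a toy illustration of smoothing (the 30-second slice length and the capacity-unit figures here are assumptions made for the example, not a statement of how Fabric meters internally):

```python
def smooth(total_cu_seconds: float, num_slices: int) -> list[float]:
    """Toy model of smoothing: spread one job's compute evenly over future slices."""
    return [total_cu_seconds / num_slices] * num_slices

# A background job (say, a warehouse load) bursts through 2880 CU-seconds at once.
# Smoothed over 24 hours of 30-second slices (2880 slices), each future slice
# absorbs only 1 CU-second of that burst.
per_slice = smooth(2880.0, 2880)
print(per_slice[0], len(per_slice))  # 1.0 2880
```

The point of the model: the job itself still runs at full speed, but its accounting impact on the capacity is flattened, so concurrent workloads are far less likely to be throttled by one expensive job.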
Flexibility to Work How You Want
Fabric gives you a single seamless experience for all data workloads, whether you are doing data integration, data engineering, data warehousing, data science, or BI.
Fabric is suitable for both your professional developers and your less technical business analysts. Just like with Power BI, you can use both pro-code and low-code tools. This gives you the flexibility to choose what works best for your organization.
If your organization prefers a more centralized and IT-managed approach with some room for analysts to do their own analysis, that’s possible. If your organization prefers a more self-service approach and provides powerful tools for analysts to do their own data marts, data integration, and BI, that’s possible too.
And if you are thinking of moving to a more decentralized approach like a data mesh architecture, then Fabric also supports that. You can have each department manage their own data assets and then share them with other departments as needed through shortcuts in OneLake.
I hope you enjoyed this blog post and learned more about Microsoft Fabric. Fabric can play a critical role in enabling data architects to create value faster for their organization. Here are three key takeaways:
Efficiency and Integration: Microsoft Fabric streamlines the process of data management by integrating various Azure services and providing features like OneLake, enabling data architects to work more efficiently.
Performance Optimization: Features such as Direct Lake, bursting, and smoothing in Microsoft Fabric help in managing large data volumes swiftly and efficiently, optimizing both cost and performance.
Flexibility in Approach: With support for both professional developers and business analysts, Fabric allows different organizational data strategies, whether centralized, self-service, or decentralized, and can be adapted to specific business needs.
Thank you for reading!