๐ง Data Modeling: Concepts and Types#
Data modeling is the process of designing the structure, relationships, and logic of data to support analysis, reporting, and decision-making.
๐ Key Aspects of Data Modeling#
Aspect |
Description |
---|---|
Entities |
Core objects or concepts (e.g., Customers, Orders) |
Attributes |
Properties or fields of entities (e.g., Order Date, Customer Name) |
Relationships |
Logical connections between entities (e.g., One-to-Many, Many-to-Many) |
Keys |
Unique identifiers (Primary Key, Foreign Key) |
Cardinality |
Defines the nature of relationships (e.g., 1:1, 1:N, N:N) |
Normalization |
Organizing data to reduce redundancy and improve integrity |
Denormalization |
Flattening data for performance or reporting simplicity |
๐งฉ Types of Data Models#
๐ Data Model Types: Summary of Strengths, Weaknesses, and Applicability#
Model Type |
Description |
Strengths |
Weaknesses |
Applicability in BI |
---|---|---|---|---|
Conceptual |
High-level overview of entities |
Easy to understand; good for stakeholder alignment |
Lacks technical detail; not implementation-ready |
Early planning; stakeholder communication |
Logical |
Detailed structure without implementation |
Platform-independent; precise schema design |
Needs translation to physical model |
Schema validation; database design |
Physical |
Actual database schema |
Directly implementable; optimized for performance |
Platform-specific; less abstract |
SQL databases; Power BI data models |
Dimensional |
Fact and dimension tables for analytics |
Optimized for queries; intuitive for analysts |
May sacrifice normalization; redundancy possible |
Star/Snowflake schemas; dashboards; OLAP cubes |
Hierarchical |
Tree-structured parent-child relationships |
Fast access for structured data |
Rigid structure; poor flexibility |
Legacy systems; XML data |
Network |
Graph-like complex relationships |
Handles many-to-many relationships well |
Difficult to maintain; less intuitive |
Advanced relational databases; niche analytics |
Object-Oriented |
Combines data and behavior |
Reusable; encapsulates logic and structure |
Complex for relational mapping |
Application-driven modeling; hybrid systems |
๐ BI Context: Dimensional Modeling#
Fact Table: Contains measurable metrics (e.g., Sales, Revenue)
Dimension Tables: Describe context (e.g., Time, Product, Region)
Star Schema: Central fact table connected to dimensions
Snowflake Schema: Dimensions normalized into sub-dimensions
๐ Fact Tables, Dimension Tables, and Schema Design in BI#
Understanding how data is structured in analytical models is essential for building efficient and insightful dashboards.
๐งฎ Fact Table#
A Fact Table contains measurable, quantitative dataโtypically business metrics.
๐น Example: SalesFact
#
SaleID |
DateID |
ProductID |
CustomerID |
RegionID |
Revenue |
Quantity |
---|---|---|---|---|---|---|
101 |
20230801 |
P001 |
C001 |
R01 |
1200.00 |
3 |
Keys: Foreign keys to dimension tables
Measures: Revenue, Quantity, Profit, etc.
๐งพ Dimension Tables#
Dimension Tables provide descriptive context for facts. They are used for filtering, grouping, and slicing data.
๐น Example: ProductDim
#
ProductID |
ProductName |
Category |
Brand |
---|---|---|---|
P001 |
Laptop |
Electronics |
Dell |
๐น Example: CustomerDim
#
CustomerID |
Name |
Segment |
JoinDate |
---|---|---|---|
C001 |
Alice |
Retail |
2022-01-15 |
๐น Example: DateDim
#
DateID |
Date |
Month |
Quarter |
Year |
---|---|---|---|---|
20230801 |
2023-08-01 |
Aug |
Q3 |
2023 |
โญ Star Schema#
Central Fact Table
Connected directly to denormalized Dimension Tables
Simple, fast for querying
๐น Structure#
SalesFact โโโ ProductDim โโโ CustomerDim โโโ DateDim โโโ RegionDim
โ๏ธ Snowflake Schema#
Dimensions are normalized into sub-dimensions
Reduces redundancy, improves integrity
More complex joins, slower queries
๐น Structure#
SalesFact โโโ ProductDim โ โโโ CategoryDim โโโ CustomerDim โ โโโ SegmentDim โโโ DateDim โโโ RegionDim โโโ CountryDim
โญ Star Schema vs โ๏ธ Snowflake Schema#
Feature |
โญ Star Schema |
โ๏ธ Snowflake Schema |
---|---|---|
Structure |
Central fact table with directly linked dimensions |
Dimensions are normalized into sub-dimensions |
Complexity |
Simple and flat |
More complex with multiple joins |
Query Performance |
Faster due to fewer joins |
Slower due to normalized joins |
Storage Efficiency |
Less efficient (redundancy in dimensions) |
More efficient (reduces data duplication) |
Ease of Use |
Easier for analysts and BI tools |
Requires deeper understanding of relationships |
Normalization Level |
Denormalized |
Normalized |
Maintenance |
Easier to maintain and modify |
More difficult to maintain due to relational depth |
Use Case |
Dashboards, ad-hoc analysis |
Large-scale enterprise systems with strict data integrity |
Example |
ProductDim includes category and brand |
ProductDim links to CategoryDim and BrandDim |
๐ง Reflection Questions#
What are the trade-offs between normalization and denormalization?
Why is dimensional modeling preferred in BI tools like Power BI?
How do relationships and cardinality affect data integrity and performance?
๐งฉ Interactive Schema Explorer Overview#
This module simulates a Power BI-style data model using pandas and ipywidgets:
Fact Table (
sales_fact
): Contains transactional data including sales, revenue, and quantity.Dimension Tables: Include
product_dim
,customer_dim
,date_dim
, andregion_dim
for descriptive attributes.Snowflake Extension:
category_dim
adds hierarchy toproduct_dim
via category โ department.Schema Toggle: Users can switch between Star Schema and Snowflake Schema views.
Join Simulation: Demonstrates how snowflake schemas normalize data via joins.
Quiz Prompts: Encourage reflection on schema design, performance, and join logic.
Ideal for teaching schema modeling, join operations, and Power BI data relationships.
# ๐ฆ Imports
import pandas as pd
import ipywidgets as widgets
from IPython.display import display, Markdown, clear_output
# ๐งฎ Fact Table
sales_fact = pd.DataFrame({
'SaleID': [101, 102, 103],
'DateID': [20230801, 20230802, 20230803],
'ProductID': ['P001', 'P002', 'P003'],
'CustomerID': ['C001', 'C002', 'C003'],
'RegionID': ['R01', 'R02', 'R01'],
'Revenue': [1200.0, 800.0, 1500.0],
'Quantity': [3, 2, 5]
})
# ๐งพ Dimension Tables
product_dim = pd.DataFrame({
'ProductID': ['P001', 'P002', 'P003'],
'ProductName': ['Laptop', 'Tablet', 'Monitor'],
'Category': ['Electronics', 'Electronics', 'Accessories'],
'Brand': ['Dell', 'Samsung', 'HP']
})
category_dim = pd.DataFrame({
'Category': ['Electronics', 'Accessories'],
'Department': ['Tech', 'Peripherals']
})
customer_dim = pd.DataFrame({
'CustomerID': ['C001', 'C002', 'C003'],
'Name': ['Alice', 'Bob', 'Charlie'],
'Segment': ['Retail', 'Corporate', 'Retail'],
'JoinDate': ['2022-01-15', '2021-11-03', '2023-02-20']
})
date_dim = pd.DataFrame({
'DateID': [20230801, 20230802, 20230803],
'Date': ['2023-08-01', '2023-08-02', '2023-08-03'],
'Month': ['Aug', 'Aug', 'Aug'],
'Quarter': ['Q3', 'Q3', 'Q3'],
'Year': [2023, 2023, 2023]
})
region_dim = pd.DataFrame({
'RegionID': ['R01', 'R02'],
'RegionName': ['North', 'South'],
'Country': ['USA', 'USA']
})
# ๐๏ธ Schema Toggle
schema_toggle = widgets.ToggleButtons(
options=['Star Schema', 'Snowflake Schema'],
description='Schema View:',
button_style='info'
)
# ๐ Callback
def show_schema(change):
clear_output(wait=True)
display(Markdown("### ๐งช Interactive Schema Explorer"))
display(schema_toggle)
display(Markdown("### ๐ Fact Table"))
display(sales_fact)
display(Markdown("### ๐งพ Dimension Tables"))
display(product_dim)
display(customer_dim)
display(date_dim)
display(region_dim)
if schema_toggle.value == 'Snowflake Schema':
display(Markdown("### โ๏ธ Sub-Dimension: Category โ Department"))
display(category_dim)
# Join simulation
enriched_product = pd.merge(product_dim, category_dim, on='Category', how='left')
display(Markdown("### ๐ Joined Product Table (Snowflake View)"))
display(enriched_product)
display(Markdown(f"### โ
Current Schema: **{schema_toggle.value}**"))
# ๐ง Quiz Prompts
display(Markdown("""
### ๐ง Quiz Prompts
1. What is the difference between a fact and a dimension table?
2. How does the snowflake schema improve data integrity?
3. Why might a star schema be preferred for dashboard performance?
4. Which fields would you use to join the tables in Power BI?
"""))
schema_toggle.observe(show_schema, names='value')
# โถ๏ธ Display
display(Markdown("### ๐งช Interactive Schema Explorer"))
display(schema_toggle)
show_schema(None)
๐งช Interactive Schema Explorer
๐ Fact Table
SaleID | DateID | ProductID | CustomerID | RegionID | Revenue | Quantity | |
---|---|---|---|---|---|---|---|
0 | 101 | 20230801 | P001 | C001 | R01 | 1200.0 | 3 |
1 | 102 | 20230802 | P002 | C002 | R02 | 800.0 | 2 |
2 | 103 | 20230803 | P003 | C003 | R01 | 1500.0 | 5 |
๐งพ Dimension Tables
ProductID | ProductName | Category | Brand | |
---|---|---|---|---|
0 | P001 | Laptop | Electronics | Dell |
1 | P002 | Tablet | Electronics | Samsung |
2 | P003 | Monitor | Accessories | HP |
CustomerID | Name | Segment | JoinDate | |
---|---|---|---|---|
0 | C001 | Alice | Retail | 2022-01-15 |
1 | C002 | Bob | Corporate | 2021-11-03 |
2 | C003 | Charlie | Retail | 2023-02-20 |
DateID | Date | Month | Quarter | Year | |
---|---|---|---|---|---|
0 | 20230801 | 2023-08-01 | Aug | Q3 | 2023 |
1 | 20230802 | 2023-08-02 | Aug | Q3 | 2023 |
2 | 20230803 | 2023-08-03 | Aug | Q3 | 2023 |
RegionID | RegionName | Country | |
---|---|---|---|
0 | R01 | North | USA |
1 | R02 | South | USA |
โ Current Schema: Star Schema
๐ง Quiz Prompts
What is the difference between a fact and a dimension table?
How does the snowflake schema improve data integrity?
Why might a star schema be preferred for dashboard performance?
Which fields would you use to join the tables in Power BI?
๐ Power BI Visual Types & Applications#
Visual Type |
Description |
Best For |
Example Use Case |
---|---|---|---|
Bar Chart |
Horizontal bars to compare categories |
Comparing discrete values across categories |
Sales by region |
Column Chart |
Vertical bars for category comparison |
Time-based or categorical comparisons |
Monthly revenue |
Line Chart |
Points connected by lines |
Trend analysis over time |
Stock price movement |
Pie Chart |
Circular chart showing proportions |
Showing part-to-whole relationships |
Market share distribution |
Donut Chart |
Pie chart with center cutout |
Similar to pie chart, more aesthetic |
Customer segment breakdown |
Area Chart |
Line chart with filled area below line |
Cumulative trends over time |
Website traffic growth |
Scatter Plot |
Points plotted on X-Y axis |
Correlation and distribution analysis |
Revenue vs. customer age |
Bubble Chart |
Scatter plot with variable-sized bubbles |
Multivariate comparison |
Sales by product and region |
Table |
Tabular data with rows and columns |
Detailed data view |
Transaction log |
Matrix |
Pivot-style table with hierarchical grouping |
Cross-tab analysis |
Sales by product and region |
Card |
Single value display |
KPIs and summary metrics |
Total revenue |
Gauge |
Circular dial showing progress |
Performance against target |
% of sales goal achieved |
Map (Filled) |
Geographic heatmap |
Regional data visualization |
Sales by country |
Tree Map |
Nested rectangles sized by value |
Hierarchical part-to-whole relationships |
Product category contribution |
Waterfall Chart |
Shows cumulative effect of sequential values |
Profit/loss breakdown |
Operating income analysis |
Histogram |
Frequency distribution of numeric data |
Distribution analysis |
Customer age distribution |
Slicer |
Interactive filter control |
User-driven filtering |
Filter by year or region |
๐ง Tips for Choosing Visuals#
Use line charts for trends, bar/column charts for comparisons.
Prefer tables/matrices for detailed data, cards/gauges for KPIs.
Use maps for geographic data, scatter/bubble for relationships.
Avoid pie charts with too many categoriesโuse tree maps instead.
Based on Eazybi#
๐ Chart Type Selector: A Decision Guide for Effective Data Visualization#
Use this guide to choose the most appropriate chart type based on your reporting goal and data structure.
๐ฏ Step 1: What is your presentation goal?#
Goal Type |
Description |
Typical Use Case |
---|---|---|
Comparison |
Compare values across categories or time |
Sales by region, monthly revenue |
Composition |
Show parts of a whole (static or over time) |
Market share, budget breakdown |
Distribution |
Show frequency or spread of values |
Age distribution, score histogram |
Relationship |
Show correlation between variables |
Revenue vs. customer age |
๐ข Step 2: How many variables are you showing?#
One variable โ Simple bar, column, or line chart
Two variables โ Line, scatter, or grouped bar chart
Three or more โ Bubble chart, matrix, or stacked visuals
๐ Step 3: Are you showing data over time?#
โ Yes โ Use line chart, area chart, or stacked column chart
โ No โ Use bar chart, pie chart, or tree map
๐ฅ Step 4: Are you comparing groups or categories?#
โ Yes โ Use bar chart, column chart, stacked chart
โ No โ Use scatter plot, bubble chart, or histogram
๐ง Chart Type Selection Rules#
Goal |
Few Data Points |
Many Data Points |
---|---|---|
Comparison |
Bar, Column |
Line Chart |
Composition |
Pie, Stacked Bar |
Stacked Column, Area |
Distribution |
Column, Bar |
Histogram, Line, Scatter |
Relationship |
Scatter, Bubble |
Scatter, Bubble |
๐งช Example Prompts#
โI want to compare monthly revenue across regionsโ โ Column Chart
โI want to show how product categories contribute to total salesโ โ Tree Map
โI want to explore correlation between age and spendingโ โ Scatter Plot
โI want to show budget breakdown over timeโ โ Stacked Area Chart
import ipywidgets as widgets
from IPython.display import display, Markdown
# Step 1: Presentation goal
goal = widgets.Dropdown(
options=['Comparison', 'Composition', 'Distribution', 'Relationship'],
description='Goal:',
style={'description_width': 'initial'}
)
# Step 2: Number of variables
variables = widgets.Dropdown(
options=['One', 'Two', 'Three or more'],
description='Variables:',
style={'description_width': 'initial'}
)
# Step 3: Time-based data
time_based = widgets.ToggleButtons(
options=['Yes', 'No'],
description='Over time?',
style={'description_width': 'initial'}
)
# Step 4: Group/category comparison
groupwise = widgets.ToggleButtons(
options=['Yes', 'No'],
description='Groupwise?',
style={'description_width': 'initial'}
)
# Step 5: Data volume
data_points = widgets.Dropdown(
options=['Few', 'Many'],
description='Data points:',
style={'description_width': 'initial'}
)
# Output logic
def recommend_chart(goal, variables, time_based, groupwise, data_points):
chart = ''
explanation = ''
if goal == 'Comparison':
if time_based == 'Yes':
chart = 'Line Chart' if data_points == 'Many' else 'Column Chart'
else:
chart = 'Bar Chart' if groupwise == 'Yes' else 'Column Chart'
explanation = 'Used to compare values across categories or time. Line charts show trends; bar/column charts show discrete comparisons.'
elif goal == 'Composition':
if time_based == 'Yes':
chart = 'Stacked Area Chart' if variables != 'One' else 'Area Chart'
else:
chart = 'Pie Chart' if variables == 'One' else 'Stacked Bar Chart'
explanation = 'Used to show parts of a whole. Pie charts for simple static compositions; stacked charts for evolving compositions.'
elif goal == 'Distribution':
chart = 'Histogram' if variables == 'One' else 'Scatter Plot'
explanation = 'Used to show frequency or spread. Histograms for single-variable bins; scatter plots for value spread and clustering.'
elif goal == 'Relationship':
chart = 'Scatter Plot' if variables == 'Two' else 'Bubble Chart'
explanation = 'Used to show correlation. Scatter plots for two variables; bubble charts add a third dimension via size.'
return chart, explanation
# Button and output
button = widgets.Button(description="Recommend Chart Type")
output = widgets.Output()
def on_button_click(b):
output.clear_output()
chart, explanation = recommend_chart(goal.value, variables.value, time_based.value, groupwise.value, data_points.value)
with output:
display(Markdown(f"### โ
Recommended Chart: **{chart}**"))
display(Markdown(f"**Why this chart?** {explanation}"))
button.on_click(on_button_click)
# Display UI
display(goal, variables, time_based, groupwise, data_points, button, output)
๐ DAX Function Reference Table#
Use this table to explore DAX functions by category, with descriptions to guide usage.
๐ข Aggregation & Math#
Function |
Description |
---|---|
|
Adds values in a column |
|
Calculates mean |
|
Returns maximum value |
|
Returns minimum value |
|
Counts non-blank values |
|
Counts all non-empty values |
|
Counts rows in a table |
|
Performs division with error handling |
|
Rounds to specified decimals |
|
Ranks values in context |
๐ Date & Time#
Function |
Description |
---|---|
|
Current date |
|
Current date and time |
|
Constructs date from year, month, day |
|
Difference between two dates |
|
Extracts year |
|
Extracts month |
|
Extracts day |
|
Week number |
|
End of month date |
|
Shifts date by interval |
โฑ๏ธ Time Intelligence#
Function |
Description |
---|---|
|
Year-to-date total |
|
Month-to-date total |
|
Quarter-to-date total |
|
Same period last year |
|
Previous month |
|
Next month |
|
YTD date range |
|
Offset period |
|
First date in column |
|
Last date in column |
๐ Logical & Conditional#
Function |
Description |
---|---|
|
Basic condition |
|
Multi-condition logic |
|
Logical AND |
|
Logical OR |
|
Logical NOT |
๐งน Filter & Context#
Function |
Description |
---|---|
|
Changes filter context |
|
Returns filtered table |
|
Removes filters |
|
Removes all filters except specified |
|
Clears filters |
๐ Relationships & Tables#
Function |
Description |
---|---|
|
Gets value from related table |
|
Gets related table |
|
Activates inactive relationship |
|
Unique values |
|
Selects columns |
|
Adds calculated column |
|
Groups and aggregates |
|
Cartesian product |
๐ค Text & Formatting#
Function |
Description |
---|---|
|
Joins two strings |
|
Joins values in column |
|
Formats value as text |