๐Ÿง  Data Modeling: Concepts and Types#

Data modeling is the process of designing the structure, relationships, and logic of data to support analysis, reporting, and decision-making.


๐Ÿ“ Key Aspects of Data Modeling#

Aspect

Description

Entities

Core objects or concepts (e.g., Customers, Orders)

Attributes

Properties or fields of entities (e.g., Order Date, Customer Name)

Relationships

Logical connections between entities (e.g., One-to-Many, Many-to-Many)

Keys

Unique identifiers (Primary Key, Foreign Key)

Cardinality

Defines the nature of relationships (e.g., 1:1, 1:N, N:N)

Normalization

Organizing data to reduce redundancy and improve integrity

Denormalization

Flattening data for performance or reporting simplicity


๐Ÿงฉ Types of Data Models#

๐Ÿ“Š Data Model Types: Summary of Strengths, Weaknesses, and Applicability#

Model Type

Description

Strengths

Weaknesses

Applicability in BI

Conceptual

High-level overview of entities

Easy to understand; good for stakeholder alignment

Lacks technical detail; not implementation-ready

Early planning; stakeholder communication

Logical

Detailed structure without implementation

Platform-independent; precise schema design

Needs translation to physical model

Schema validation; database design

Physical

Actual database schema

Directly implementable; optimized for performance

Platform-specific; less abstract

SQL databases; Power BI data models

Dimensional

Fact and dimension tables for analytics

Optimized for queries; intuitive for analysts

May sacrifice normalization; redundancy possible

Star/Snowflake schemas; dashboards; OLAP cubes

Hierarchical

Tree-structured parent-child relationships

Fast access for structured data

Rigid structure; poor flexibility

Legacy systems; XML data

Network

Graph-like complex relationships

Handles many-to-many relationships well

Difficult to maintain; less intuitive

Advanced relational databases; niche analytics

Object-Oriented

Combines data and behavior

Reusable; encapsulates logic and structure

Complex for relational mapping

Application-driven modeling; hybrid systems


๐Ÿ“Š BI Context: Dimensional Modeling#

  • Fact Table: Contains measurable metrics (e.g., Sales, Revenue)

  • Dimension Tables: Describe context (e.g., Time, Product, Region)

  • Star Schema: Central fact table connected to dimensions

  • Snowflake Schema: Dimensions normalized into sub-dimensions

๐Ÿ“Š Fact Tables, Dimension Tables, and Schema Design in BI#

Understanding how data is structured in analytical models is essential for building efficient and insightful dashboards.


๐Ÿงฎ Fact Table#

A Fact Table contains measurable, quantitative dataโ€”typically business metrics.

๐Ÿ”น Example: SalesFact#

SaleID

DateID

ProductID

CustomerID

RegionID

Revenue

Quantity

101

20230801

P001

C001

R01

1200.00

3

  • Keys: Foreign keys to dimension tables

  • Measures: Revenue, Quantity, Profit, etc.


๐Ÿงพ Dimension Tables#

Dimension Tables provide descriptive context for facts. They are used for filtering, grouping, and slicing data.

๐Ÿ”น Example: ProductDim#

ProductID

ProductName

Category

Brand

P001

Laptop

Electronics

Dell

๐Ÿ”น Example: CustomerDim#

CustomerID

Name

Segment

JoinDate

C001

Alice

Retail

2022-01-15

๐Ÿ”น Example: DateDim#

DateID

Date

Month

Quarter

Year

20230801

2023-08-01

Aug

Q3

2023


โญ Star Schema#

  • Central Fact Table

  • Connected directly to denormalized Dimension Tables

  • Simple, fast for querying

๐Ÿ”น Structure#

SalesFact โ”œโ”€โ”€ ProductDim โ”œโ”€โ”€ CustomerDim โ”œโ”€โ”€ DateDim โ””โ”€โ”€ RegionDim


โ„๏ธ Snowflake Schema#

  • Dimensions are normalized into sub-dimensions

  • Reduces redundancy, improves integrity

  • More complex joins, slower queries

๐Ÿ”น Structure#

SalesFact โ”œโ”€โ”€ ProductDim โ”‚ โ””โ”€โ”€ CategoryDim โ”œโ”€โ”€ CustomerDim โ”‚ โ””โ”€โ”€ SegmentDim โ”œโ”€โ”€ DateDim โ””โ”€โ”€ RegionDim โ””โ”€โ”€ CountryDim

โญ Star Schema vs โ„๏ธ Snowflake Schema#

Feature

โญ Star Schema

โ„๏ธ Snowflake Schema

Structure

Central fact table with directly linked dimensions

Dimensions are normalized into sub-dimensions

Complexity

Simple and flat

More complex with multiple joins

Query Performance

Faster due to fewer joins

Slower due to normalized joins

Storage Efficiency

Less efficient (redundancy in dimensions)

More efficient (reduces data duplication)

Ease of Use

Easier for analysts and BI tools

Requires deeper understanding of relationships

Normalization Level

Denormalized

Normalized

Maintenance

Easier to maintain and modify

More difficult to maintain due to relational depth

Use Case

Dashboards, ad-hoc analysis

Large-scale enterprise systems with strict data integrity

Example

ProductDim includes category and brand

ProductDim links to CategoryDim and BrandDim

๐Ÿง  Reflection Questions#

  1. What are the trade-offs between normalization and denormalization?

  2. Why is dimensional modeling preferred in BI tools like Power BI?

  3. How do relationships and cardinality affect data integrity and performance?

๐Ÿงฉ Interactive Schema Explorer Overview#

This module simulates a Power BI-style data model using pandas and ipywidgets:

  • Fact Table (sales_fact): Contains transactional data including sales, revenue, and quantity.

  • Dimension Tables: Include product_dim, customer_dim, date_dim, and region_dim for descriptive attributes.

  • Snowflake Extension: category_dim adds hierarchy to product_dim via category โ†’ department.

  • Schema Toggle: Users can switch between Star Schema and Snowflake Schema views.

  • Join Simulation: Demonstrates how snowflake schemas normalize data via joins.

  • Quiz Prompts: Encourage reflection on schema design, performance, and join logic.

Ideal for teaching schema modeling, join operations, and Power BI data relationships.

# ๐Ÿ“ฆ Imports
import pandas as pd
import ipywidgets as widgets
from IPython.display import display, Markdown, clear_output

# ๐Ÿงฎ Fact Table
sales_fact = pd.DataFrame({
    'SaleID': [101, 102, 103],
    'DateID': [20230801, 20230802, 20230803],
    'ProductID': ['P001', 'P002', 'P003'],
    'CustomerID': ['C001', 'C002', 'C003'],
    'RegionID': ['R01', 'R02', 'R01'],
    'Revenue': [1200.0, 800.0, 1500.0],
    'Quantity': [3, 2, 5]
})

# ๐Ÿงพ Dimension Tables
product_dim = pd.DataFrame({
    'ProductID': ['P001', 'P002', 'P003'],
    'ProductName': ['Laptop', 'Tablet', 'Monitor'],
    'Category': ['Electronics', 'Electronics', 'Accessories'],
    'Brand': ['Dell', 'Samsung', 'HP']
})

category_dim = pd.DataFrame({
    'Category': ['Electronics', 'Accessories'],
    'Department': ['Tech', 'Peripherals']
})

customer_dim = pd.DataFrame({
    'CustomerID': ['C001', 'C002', 'C003'],
    'Name': ['Alice', 'Bob', 'Charlie'],
    'Segment': ['Retail', 'Corporate', 'Retail'],
    'JoinDate': ['2022-01-15', '2021-11-03', '2023-02-20']
})

date_dim = pd.DataFrame({
    'DateID': [20230801, 20230802, 20230803],
    'Date': ['2023-08-01', '2023-08-02', '2023-08-03'],
    'Month': ['Aug', 'Aug', 'Aug'],
    'Quarter': ['Q3', 'Q3', 'Q3'],
    'Year': [2023, 2023, 2023]
})

region_dim = pd.DataFrame({
    'RegionID': ['R01', 'R02'],
    'RegionName': ['North', 'South'],
    'Country': ['USA', 'USA']
})

# ๐ŸŽ›๏ธ Schema Toggle
schema_toggle = widgets.ToggleButtons(
    options=['Star Schema', 'Snowflake Schema'],
    description='Schema View:',
    button_style='info'
)

# ๐Ÿ”„ Callback
def show_schema(change):
    clear_output(wait=True)
    display(Markdown("### ๐Ÿงช Interactive Schema Explorer"))
    display(schema_toggle)

    display(Markdown("### ๐Ÿ“Š Fact Table"))
    display(sales_fact)

    display(Markdown("### ๐Ÿงพ Dimension Tables"))
    display(product_dim)
    display(customer_dim)
    display(date_dim)
    display(region_dim)

    if schema_toggle.value == 'Snowflake Schema':
        display(Markdown("### โ„๏ธ Sub-Dimension: Category โ†’ Department"))
        display(category_dim)

        # Join simulation
        enriched_product = pd.merge(product_dim, category_dim, on='Category', how='left')
        display(Markdown("### ๐Ÿ”— Joined Product Table (Snowflake View)"))
        display(enriched_product)

    display(Markdown(f"### โœ… Current Schema: **{schema_toggle.value}**"))

    # ๐Ÿง  Quiz Prompts
    display(Markdown("""
### ๐Ÿง  Quiz Prompts
1. What is the difference between a fact and a dimension table?
2. How does the snowflake schema improve data integrity?
3. Why might a star schema be preferred for dashboard performance?
4. Which fields would you use to join the tables in Power BI?
"""))

schema_toggle.observe(show_schema, names='value')

# โ–ถ๏ธ Display
display(Markdown("### ๐Ÿงช Interactive Schema Explorer"))
display(schema_toggle)
show_schema(None)

๐Ÿงช Interactive Schema Explorer

๐Ÿ“Š Fact Table

SaleID DateID ProductID CustomerID RegionID Revenue Quantity
0 101 20230801 P001 C001 R01 1200.0 3
1 102 20230802 P002 C002 R02 800.0 2
2 103 20230803 P003 C003 R01 1500.0 5

๐Ÿงพ Dimension Tables

ProductID ProductName Category Brand
0 P001 Laptop Electronics Dell
1 P002 Tablet Electronics Samsung
2 P003 Monitor Accessories HP
CustomerID Name Segment JoinDate
0 C001 Alice Retail 2022-01-15
1 C002 Bob Corporate 2021-11-03
2 C003 Charlie Retail 2023-02-20
DateID Date Month Quarter Year
0 20230801 2023-08-01 Aug Q3 2023
1 20230802 2023-08-02 Aug Q3 2023
2 20230803 2023-08-03 Aug Q3 2023
RegionID RegionName Country
0 R01 North USA
1 R02 South USA

โœ… Current Schema: Star Schema

๐Ÿง  Quiz Prompts

  1. What is the difference between a fact and a dimension table?

  2. How does the snowflake schema improve data integrity?

  3. Why might a star schema be preferred for dashboard performance?

  4. Which fields would you use to join the tables in Power BI?

๐Ÿ“Š Power BI Visual Types & Applications#

Visual Type

Description

Best For

Example Use Case

Bar Chart

Horizontal bars to compare categories

Comparing discrete values across categories

Sales by region

Column Chart

Vertical bars for category comparison

Time-based or categorical comparisons

Monthly revenue

Line Chart

Points connected by lines

Trend analysis over time

Stock price movement

Pie Chart

Circular chart showing proportions

Showing part-to-whole relationships

Market share distribution

Donut Chart

Pie chart with center cutout

Similar to pie chart, more aesthetic

Customer segment breakdown

Area Chart

Line chart with filled area below line

Cumulative trends over time

Website traffic growth

Scatter Plot

Points plotted on X-Y axis

Correlation and distribution analysis

Revenue vs. customer age

Bubble Chart

Scatter plot with variable-sized bubbles

Multivariate comparison

Sales by product and region

Table

Tabular data with rows and columns

Detailed data view

Transaction log

Matrix

Pivot-style table with hierarchical grouping

Cross-tab analysis

Sales by product and region

Card

Single value display

KPIs and summary metrics

Total revenue

Gauge

Circular dial showing progress

Performance against target

% of sales goal achieved

Map (Filled)

Geographic heatmap

Regional data visualization

Sales by country

Tree Map

Nested rectangles sized by value

Hierarchical part-to-whole relationships

Product category contribution

Waterfall Chart

Shows cumulative effect of sequential values

Profit/loss breakdown

Operating income analysis

Histogram

Frequency distribution of numeric data

Distribution analysis

Customer age distribution

Slicer

Interactive filter control

User-driven filtering

Filter by year or region


๐Ÿง  Tips for Choosing Visuals#

  • Use line charts for trends, bar/column charts for comparisons.

  • Prefer tables/matrices for detailed data, cards/gauges for KPIs.

  • Use maps for geographic data, scatter/bubble for relationships.

  • Avoid pie charts with too many categoriesโ€”use tree maps instead.

Based on Eazybi#

๐Ÿ“Š Chart Type Selector: A Decision Guide for Effective Data Visualization#

Use this guide to choose the most appropriate chart type based on your reporting goal and data structure.


๐ŸŽฏ Step 1: What is your presentation goal?#

Goal Type

Description

Typical Use Case

Comparison

Compare values across categories or time

Sales by region, monthly revenue

Composition

Show parts of a whole (static or over time)

Market share, budget breakdown

Distribution

Show frequency or spread of values

Age distribution, score histogram

Relationship

Show correlation between variables

Revenue vs. customer age


๐Ÿ”ข Step 2: How many variables are you showing?#

  • One variable โ†’ Simple bar, column, or line chart

  • Two variables โ†’ Line, scatter, or grouped bar chart

  • Three or more โ†’ Bubble chart, matrix, or stacked visuals


๐Ÿ” Step 3: Are you showing data over time?#

  • โœ… Yes โ†’ Use line chart, area chart, or stacked column chart

  • โŒ No โ†’ Use bar chart, pie chart, or tree map


๐Ÿ‘ฅ Step 4: Are you comparing groups or categories?#

  • โœ… Yes โ†’ Use bar chart, column chart, stacked chart

  • โŒ No โ†’ Use scatter plot, bubble chart, or histogram


๐Ÿง  Chart Type Selection Rules#

Goal

Few Data Points

Many Data Points

Comparison

Bar, Column

Line Chart

Composition

Pie, Stacked Bar

Stacked Column, Area

Distribution

Column, Bar

Histogram, Line, Scatter

Relationship

Scatter, Bubble

Scatter, Bubble


๐Ÿงช Example Prompts#

  • โ€œI want to compare monthly revenue across regionsโ€ โ†’ Column Chart

  • โ€œI want to show how product categories contribute to total salesโ€ โ†’ Tree Map

  • โ€œI want to explore correlation between age and spendingโ€ โ†’ Scatter Plot

  • โ€œI want to show budget breakdown over timeโ€ โ†’ Stacked Area Chart

import ipywidgets as widgets
from IPython.display import display, Markdown

# Step 1: Presentation goal
goal = widgets.Dropdown(
    options=['Comparison', 'Composition', 'Distribution', 'Relationship'],
    description='Goal:',
    style={'description_width': 'initial'}
)

# Step 2: Number of variables
variables = widgets.Dropdown(
    options=['One', 'Two', 'Three or more'],
    description='Variables:',
    style={'description_width': 'initial'}
)

# Step 3: Time-based data
time_based = widgets.ToggleButtons(
    options=['Yes', 'No'],
    description='Over time?',
    style={'description_width': 'initial'}
)

# Step 4: Group/category comparison
groupwise = widgets.ToggleButtons(
    options=['Yes', 'No'],
    description='Groupwise?',
    style={'description_width': 'initial'}
)

# Step 5: Data volume
data_points = widgets.Dropdown(
    options=['Few', 'Many'],
    description='Data points:',
    style={'description_width': 'initial'}
)

# Output logic
def recommend_chart(goal, variables, time_based, groupwise, data_points):
    chart = ''
    explanation = ''

    if goal == 'Comparison':
        if time_based == 'Yes':
            chart = 'Line Chart' if data_points == 'Many' else 'Column Chart'
        else:
            chart = 'Bar Chart' if groupwise == 'Yes' else 'Column Chart'
        explanation = 'Used to compare values across categories or time. Line charts show trends; bar/column charts show discrete comparisons.'

    elif goal == 'Composition':
        if time_based == 'Yes':
            chart = 'Stacked Area Chart' if variables != 'One' else 'Area Chart'
        else:
            chart = 'Pie Chart' if variables == 'One' else 'Stacked Bar Chart'
        explanation = 'Used to show parts of a whole. Pie charts for simple static compositions; stacked charts for evolving compositions.'

    elif goal == 'Distribution':
        chart = 'Histogram' if variables == 'One' else 'Scatter Plot'
        explanation = 'Used to show frequency or spread. Histograms for single-variable bins; scatter plots for value spread and clustering.'

    elif goal == 'Relationship':
        chart = 'Scatter Plot' if variables == 'Two' else 'Bubble Chart'
        explanation = 'Used to show correlation. Scatter plots for two variables; bubble charts add a third dimension via size.'

    return chart, explanation

# Button and output
button = widgets.Button(description="Recommend Chart Type")
output = widgets.Output()

def on_button_click(b):
    output.clear_output()
    chart, explanation = recommend_chart(goal.value, variables.value, time_based.value, groupwise.value, data_points.value)
    with output:
        display(Markdown(f"### โœ… Recommended Chart: **{chart}**"))
        display(Markdown(f"**Why this chart?** {explanation}"))

button.on_click(on_button_click)

# Display UI
display(goal, variables, time_based, groupwise, data_points, button, output)

๐Ÿ“˜ DAX Function Reference Table#

Use this table to explore DAX functions by category, with descriptions to guide usage.


๐Ÿ”ข Aggregation & Math#

Function

Description

SUM()

Adds values in a column

AVERAGE()

Calculates mean

MAX()

Returns maximum value

MIN()

Returns minimum value

COUNT()

Counts non-blank values

COUNTA()

Counts all non-empty values

COUNTROWS()

Counts rows in a table

DIVIDE()

Performs division with error handling

ROUND()

Rounds to specified decimals

RANKX()

Ranks values in context


๐Ÿ“… Date & Time#

Function

Description

TODAY()

Current date

NOW()

Current date and time

DATE()

Constructs date from year, month, day

DATEDIFF()

Difference between two dates

YEAR()

Extracts year

MONTH()

Extracts month

DAY()

Extracts day

WEEKNUM()

Week number

EOMONTH()

End of month date

DATEADD()

Shifts date by interval


โฑ๏ธ Time Intelligence#

Function

Description

TOTALYTD()

Year-to-date total

TOTALMTD()

Month-to-date total

TOTALQTD()

Quarter-to-date total

SAMEPERIODLASTYEAR()

Same period last year

PREVIOUSMONTH()

Previous month

NEXTMONTH()

Next month

DATESYTD()

YTD date range

PARALLELPERIOD()

Offset period

FIRSTDATE()

First date in column

LASTDATE()

Last date in column


๐Ÿ” Logical & Conditional#

Function

Description

IF()

Basic condition

SWITCH()

Multi-condition logic

AND()

Logical AND

OR()

Logical OR

NOT()

Logical NOT


๐Ÿงน Filter & Context#

Function

Description

CALCULATE()

Changes filter context

FILTER()

Returns filtered table

ALL()

Removes filters

ALLEXCEPT()

Removes all filters except specified

REMOVEFILTERS()

Clears filters


๐Ÿ”— Relationships & Tables#

Function

Description

RELATED()

Gets value from related table

RELATEDTABLE()

Gets related table

USERELATIONSHIP()

Activates inactive relationship

VALUES()

Unique values

SELECTCOLUMNS()

Selects columns

ADDCOLUMNS()

Adds calculated column

SUMMARIZE()

Groups and aggregates

CROSSJOIN()

Cartesian product


๐Ÿ”ค Text & Formatting#

Function

Description

CONCATENATE()

Joins two strings

CONCATENATEX()

Joins values in column

FORMAT()

Formats value as text