Data Analysis with Pandas¶

Learn to use pandas for data analysis in Excel with this hands-on tutorial.

Tutorial Goals¶

By the end of this tutorial, you'll be able to:

Load data from Excel into pandas DataFrames
Clean and transform data
Perform statistical analysis
Write results back to Excel

Setup¶

Install Pandas¶

Open Package Manager
Search for "pandas"
Install latest version

Pre-installed Package

This tutorial uses pandas for demonstration purposes. Since pandas is a core XPyCode package, it's already installed by default—no installation needed!

Create a New Module¶

Right-click your workbook in Project Explorer
Select New Module
Name it data_analysis

Load Data from Excel¶

import pandas as pd
import xpycode

def load_data_from_excel() -> pd.DataFrame:
    """Load data from active worksheet into a DataFrame.

    Returns:
        DataFrame with the loaded data
    """
    # Get active worksheet (Office.js method)
    ws = xpycode.workbook.worksheets.getActiveWorksheet()

    # Read data range (assuming headers in row 1)
    # values returns 2D array
    data = ws.getRange("A1:D100").values

    # Convert to DataFrame (first row is headers)
    df = pd.DataFrame(data[1:], columns=data[0])

    print(f"Loaded {len(df)} rows")
    print(df.head())
    return df

Clean Data¶

def clean_data(df: pd.DataFrame) -> pd.DataFrame:
    """Clean the data by removing missing values and duplicates.

    Args:
        df: Input DataFrame

    Returns:
        Cleaned DataFrame
    """
    # Remove missing values
    df = df.dropna()

    # Convert data types
    df['Age'] = df['Age'].astype(int)
    df['Score'] = df['Score'].astype(float)

    # Remove duplicates
    df = df.drop_duplicates()

    print(f"After cleaning: {len(df)} rows")
    return df

Analyze Data¶

def analyze_data(df: pd.DataFrame) -> dict:
    """Analyze data with descriptive statistics.

    Args:
        df: DataFrame to analyze

    Returns:
        Dictionary with analysis results
    """
    # Descriptive statistics
    stats = df.describe()
    print(stats)

    # Group by and aggregate
    summary = df.groupby('Category').agg({
        'Score': ['mean', 'min', 'max', 'std'],
        'Age': 'mean'
    })
    print(summary)

    # Calculate correlations
    corr = df[['Age', 'Score']].corr()
    print(corr)

    return {'stats': stats, 'summary': summary, 'corr': corr}

Write Results to Excel¶

def write_results_to_excel(summary: pd.DataFrame):
    """Write analysis results to Excel.

    Args:
        summary: Summary DataFrame to write
    """
    # Get worksheet by name (Office.js method)
    ws_summary = xpycode.workbook.worksheets.getItem("Summary")

    # Convert summary to 2D array for writing
    summary_data = summary.reset_index()

    # Write to Excel (2D array for ranges)
    ws_summary.getRange("A1").getResizedRange(len(summary_data.index),len(summary_data.columns)-1).values = summary_data

    print("Results written to Summary sheet")

Tips¶

Office.js Range.values Requirements

The Office.js Range object's values attribute requires 2-D arrays with the exact size (or accepts a scalar if it represents only one cell)
The getResizedRange() method takes "The number of rows and columns by which to expand..." — don't confuse it with COM Range object's Resize method that takes the size of the future range

Next Steps¶

API Integration Tutorial - Fetch external data
Automation Tutorial - Automate Excel tasks
Custom Functions - Create pandas-powered functions