๐ผ Pandas 101 โ Beginner to Intermediate Tutorial
README
๐ผ Pandas 101 โ Beginner to Intermediate Tutorial
This guide will walk you through the essentials of using Pandas, the most popular data analysis library in Python โ and take it a step further with more powerful filtering and CRUD-style operations.
๐ฆ What Is Pandas?
Pandas is an open-source Python library designed for data manipulation and analysis. It makes working with structured data fast, flexible, and intuitive โ especially if you're dealing with CSV files, Excel sheets, SQL tables, JSON, or APIs.
The two core data structures in Pandas are:
Series: A 1D labeled array (like a column)DataFrame: A 2D labeled data structure (like a full spreadsheet or SQL table)
๐ What Is a DataFrame?
A DataFrame is a two-dimensional, tabular data structure with labeled rows and columns. Think of it as a Python-native spreadsheet โ but way more powerful and programmable.
โ Key Features:
- Label-based indexing
- Column-wise and row-wise operations
- Support for mixed data types
- Fast, vectorized operations (built on NumPy)
๐ Example DataFrame
import pandas as pd
data = {
'Name': ['Alice', 'Bob', 'Charlie'],
'Age': [25, 30, 35],
'Country': ['USA', 'Canada', 'UK']
}
df = pd.DataFrame(data)
print(df)
Output:
Name Age Country
0 Alice 25 USA
1 Bob 30 Canada
2 Charlie 35 UK
๐ฅ Loading a DataFrame
df = pd.read_csv('data.csv') # Load from CSV
df = pd.read_excel('data.xlsx') # Load from Excel
df = pd.DataFrame(data_dict) # Load from dictionary
๐ Exploring a DataFrame
df.head() # First 5 rows
df.tail() # Last 5 rows
df.info() # Column types and non-null values
df.describe() # Summary stats
df.columns # Column names
df.index # Row indices
๐ฏ Accessing & Filtering Data
๐ Basic Access
df['Age'] # Single column
df[['Name', 'Age']] # Multiple columns
df.iloc[0] # First row by position
df.loc[0] # First row by label
๐งช Conditional Filtering (Boolean masks)
df[df['Age'] > 30] # Age over 30
df[df['Country'] == 'USA'] # Country match
df[(df['Age'] > 25) & (df['Country'] == 'UK')] # Multiple conditions
๐ฏ Advanced Filtering
df[df['Name'].str.startswith('A')] # Names starting with A
df[df['Country'].isin(['USA', 'UK'])] # Match multiple values
df[~df['Country'].isin(['Canada'])] # NOT condition
๐ CRUD Operations in Pandas
โ Create (Insert Rows)
new_row = pd.DataFrame([{'Name': 'Diana', 'Age': 28, 'Country': 'Germany'}])
df = pd.concat([df, new_row], ignore_index=True)
๐ต๏ธ Read (Access Data)
df.loc[df['Name'] == 'Alice']
df.iloc[2]
โ๏ธ Update (Modify Values)
df.loc[df['Name'] == 'Bob', 'Age'] = 31 # Update Bob's age
df['Country'] = df['Country'].str.upper() # Convert all country names to uppercase
โ Delete (Drop Data)
df = df.drop(2, axis=0) # Drop row with index 2
df = df[df['Name'] != 'Alice'] # Drop rows where name == Alice
df = df.drop('Age', axis=1) # Drop 'Age' column
๐งน Cleaning Data
df.isnull().sum() # Count missing values
df.dropna(inplace=True) # Drop rows with missing values
df.fillna({'Age': 0}, inplace=True) # Fill missing Age values with 0
df.rename(columns={'Age': 'Years'}, inplace=True) # Rename column
df['Years'] = df['Years'].astype(float) # Convert column to float
๐ Analyzing Data
df['Country'].value_counts() # Count occurrences
df.groupby('Country')['Age'].mean() # Average age per country
df.sort_values(by='Age', ascending=False) # Sort by Age descending
๐พ Saving Data
df.to_csv('cleaned_data.csv', index=False)
df.to_excel('output.xlsx', index=False)