DataFrames are one of the core data structures in Pandas and are essential for working with structured data. Think of a DataFrame as a table similar to a spreadsheet or SQL table, but designed for more flexible and powerful data operations. Each column in a DataFrame can be of a different data type, making it a versatile tool for data manipulation.
To start working with DataFrames, you can create them using various methods. One of the simplest ways is by using a Python dictionary. Each key in the dictionary represents a column name, and the corresponding value is a list of column values.
import pandas as pd
data = {
    'Name': ['Alice', 'Bob', 'Charlie'],
    'Age': [25, 30, 35],
    'Occupation': ['Engineer', 'Doctor', 'Artist']
}
df = pd.DataFrame(data)
print(df)
This will output:
      Name  Age  Occupation
0    Alice   25    Engineer
1      Bob   30      Doctor
2  Charlie   35      Artist
You can also create a DataFrame from a list of dictionaries, where each dictionary represents a row of data.
data = [
    {'Name': 'Alice', 'Age': 25, 'Occupation': 'Engineer'},
    {'Name': 'Bob', 'Age': 30, 'Occupation': 'Doctor'},
    {'Name': 'Charlie', 'Age': 35, 'Occupation': 'Artist'}
]
df = pd.DataFrame(data)
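Rows in a list of dictionaries do not all need the same keys. The sketch below uses made-up rows (the second one has no Age) to show how Pandas fills the gap with a missing value; the variable names are illustrative only.

```python
import pandas as pd

# Hypothetical rows: the second dictionary has no 'Age' key
rows = [
    {'Name': 'Alice', 'Age': 25},
    {'Name': 'Bob'},
]

df_rows = pd.DataFrame(rows)

# The missing entry is stored as NaN
print(df_rows)
print(df_rows['Age'].isna())
```

Because NaN is a floating-point value, an integer column that contains a gap is typically stored as floats.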
In practical scenarios, data is often stored in files. Pandas provides functions to easily load data from CSV, Excel, SQL databases, and more. Here's how to load a CSV file:
df = pd.read_csv('data.csv')
This command reads the content of data.csv into a DataFrame called df. Pandas automatically infers the data type of each column, making it straightforward to start analyzing the data immediately.
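To see the type inference at work without a file on disk, the sketch below passes read_csv an in-memory buffer instead of a path. The CSV text and its column names are invented for illustration and merely stand in for the contents of data.csv.

```python
import io

import pandas as pd

# Hypothetical CSV content standing in for data.csv
csv_text = """Name,Age,Salary
Alice,25,70000.5
Bob,30,80000.0
Charlie,35,75000.25
"""

# read_csv accepts a file path or any file-like object
df_csv = pd.read_csv(io.StringIO(csv_text))

# Inspect the type inferred for each column
print(df_csv.dtypes)
```

Here Age is read as an integer column and Salary as a floating-point column. The type reported for text columns such as Name depends on the Pandas version.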
Once you have a DataFrame, you can inspect its structure and content. Use the head() method to view the first rows (five by default):
print(df.head())
The info() method provides a concise summary of the DataFrame:
df.info()
This will display the number of entries, column names, data types, and memory usage, helping you understand the dataset at a glance.
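As a small sketch of related inspection tools, the example below rebuilds the three-person table from earlier, passes a row count to head(), and reads the shape and dtypes attributes, which expose some of the same facts that info() prints.

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['Alice', 'Bob', 'Charlie'],
    'Age': [25, 30, 35],
    'Occupation': ['Engineer', 'Doctor', 'Artist'],
})

# head() takes an optional row count; the default is 5
first_two = df.head(2)
print(first_two)

# shape is a (rows, columns) tuple; dtypes lists each column's type
print(df.shape)
print(df.dtypes)
```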
Selecting specific data from a DataFrame is a common task. You can select a single column by passing its name as a string, which returns a Series:
ages = df['Age']
To select multiple columns, pass a list of column names:
subset = df[['Name', 'Occupation']]
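The two forms return different types, which matters when later code expects one or the other. A minimal sketch, reusing the example columns from above:

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['Alice', 'Bob', 'Charlie'],
    'Age': [25, 30, 35],
})

ages = df['Age']          # a string key returns a Series
ages_frame = df[['Age']]  # a one-item list returns a DataFrame

print(type(ages).__name__)
print(type(ages_frame).__name__)
```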
Rows can be selected using the loc[] and iloc[] indexers. Use loc[] for label-based indexing and iloc[] for positional indexing:
# Select rows by label
row = df.loc[0]
# Select rows by position
row = df.iloc[0]
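With the default integer index, label 0 and position 0 happen to be the same row, so the difference is easy to miss. The sketch below assigns hypothetical string labels to the index to make the distinction visible.

```python
import pandas as pd

df = pd.DataFrame(
    {'Age': [25, 30, 35], 'Occupation': ['Engineer', 'Doctor', 'Artist']},
    index=['alice', 'bob', 'charlie'],  # hypothetical string labels
)

by_label = df.loc['bob']    # row whose index label is 'bob'
by_position = df.iloc[1]    # second row, whatever its label

# loc slices include the end label; iloc slices exclude the end position
label_slice = df.loc['alice':'bob']   # two rows
position_slice = df.iloc[0:1]         # one row

print(by_label)
print(len(label_slice), len(position_slice))
```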
You can easily add, modify, or delete columns in a DataFrame. To add a new column, simply assign the data to a new column name:
df['Salary'] = [70000, 80000, 75000]
To modify an existing column, assign new values to it:
df['Age'] = df['Age'] + 1
Deleting a column is just as straightforward:
df.drop('Salary', axis=1, inplace=True)
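If you would rather keep the original DataFrame unchanged, drop() can be called without inplace=True, in which case it returns a new DataFrame. The following is a minimal sketch; the AgeInMonths column is a made-up example of a column derived from an existing one.

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['Alice', 'Bob', 'Charlie'],
    'Age': [25, 30, 35],
})

# Derive a new column from an existing one (hypothetical column name)
df['AgeInMonths'] = df['Age'] * 12

# Without inplace=True, drop() leaves df untouched and returns a copy
trimmed = df.drop(columns=['AgeInMonths'])

print(list(df.columns))       # still includes 'AgeInMonths'
print(list(trimmed.columns))  # 'AgeInMonths' removed
```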
DataFrames provide several methods for quickly computing summary statistics:
print(df.describe())
The describe() method returns summary statistics for numerical columns: the count, mean, standard deviation, minimum, maximum, and quartiles (the 50% row is the median). For more specific operations, methods like mean(), sum(), and count() are available:
average_age = df['Age'].mean()
total_entries = df['Age'].count()
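Putting these together, the sketch below runs the statistics on the example ages and reads individual values out of the describe() result. It rebuilds a small DataFrame so that it can run on its own.

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['Alice', 'Bob', 'Charlie'],
    'Age': [25, 30, 35],
})

# By default, describe() summarizes only the numeric columns
summary = df.describe()
print(summary)

average_age = df['Age'].mean()    # 30.0
median_age = df['Age'].median()   # 30.0
total_age = df['Age'].sum()       # 90

# Individual statistics can be looked up by row label
median_from_summary = summary.loc['50%', 'Age']
```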
DataFrames are powerful yet intuitive structures that simplify the process of data analysis. By mastering DataFrames, you gain a versatile tool for handling data in a variety of forms, helping you tackle more complex data science tasks and analyses with confidence.
© 2025 ApX Machine Learning