One of the most frequent tasks when working with a DataFrame is isolating one or more columns. Perhaps you only need the names and ages from a larger dataset, or you want to perform calculations on a specific numerical column. Pandas provides straightforward ways to achieve this.
Let's start with a sample DataFrame to illustrate the concepts. Imagine we have data about individuals, including their name, age, city, and salary:
import pandas as pd
data = {'Name': ['Alice', 'Bob', 'Charlie', 'David'],
        'Age': [25, 30, 35, 40],
        'City': ['New York', 'Los Angeles', 'Chicago', 'Houston'],
        'Salary': [70000, 80000, 90000, 100000]}
df = pd.DataFrame(data)
print("Original DataFrame:")
print(df)
Output:
Original DataFrame:
      Name  Age         City  Salary
0    Alice   25     New York   70000
1      Bob   30  Los Angeles   80000
2  Charlie   35      Chicago   90000
3    David   40      Houston  100000
The most direct way to select a single column is to put the column's name, as a string, inside square brackets []:
# Select the 'Name' column
names = df['Name']
print("Selected 'Name' column:")
print(names)
print("\nType of the selected object:", type(names))
Output:
Selected 'Name' column:
0 Alice
1 Bob
2 Charlie
3 David
Name: Name, dtype: object
Type of the selected object: <class 'pandas.core.series.Series'>
Notice that selecting a single column this way returns a Pandas Series object, not a DataFrame. A Series is like a one-dimensional labeled array, holding the data for that column along with its index.
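Because the result is a Series, you can inspect its index and run calculations on it directly. The short sketch below rebuilds a trimmed copy of the sample data so it runs on its own; the variable name salaries is only illustrative.

```python
import pandas as pd

# Trimmed copy of the sample data so the snippet is self-contained
df = pd.DataFrame({
    'Name': ['Alice', 'Bob', 'Charlie', 'David'],
    'Age': [25, 30, 35, 40],
    'Salary': [70000, 80000, 90000, 100000],
})

salaries = df['Salary']        # a Series, not a DataFrame

print(salaries.name)           # Salary (the Series keeps the column name)
print(salaries.shape)          # (4,) : one dimension only
print(list(salaries.index))    # [0, 1, 2, 3] : same row labels as df
print(salaries.mean())         # 85000.0
```

The Series carries the same row labels as the DataFrame it came from, so each value can still be traced back to its original row.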
To select multiple columns, you again use square brackets []. This time, however, you pass a list of the column names you want inside the brackets:
# Select the 'Name' and 'City' columns
subset = df[['Name', 'City']]
print("Selected 'Name' and 'City' columns:")
print(subset)
print("\nType of the selected object:", type(subset))
Output:
Selected 'Name' and 'City' columns:
      Name         City
0    Alice     New York
1      Bob  Los Angeles
2  Charlie      Chicago
3    David      Houston
Type of the selected object: <class 'pandas.core.frame.DataFrame'>
Important observation: when you select multiple columns by passing a list inside the brackets, as in df[['Col1', 'Col2']], the result is a new DataFrame containing only the specified columns, in the order you listed them.
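The list form returns a DataFrame even when the list holds only one name, which is a common source of confusion. The sketch below uses a two-row version of the sample data to contrast a string key with a one-element list and to show that the column order follows the list; the variable names are illustrative only.

```python
import pandas as pd

# Two-row version of the sample data, enough to show the return types
df = pd.DataFrame({
    'Name': ['Alice', 'Bob'],
    'Age': [25, 30],
    'City': ['New York', 'Los Angeles'],
})

as_series = df['Name']             # string key -> Series
as_frame = df[['Name']]            # one-element list -> one-column DataFrame
reordered = df[['City', 'Name']]   # columns come back in the order listed

print(type(as_series).__name__)    # Series
print(type(as_frame).__name__)     # DataFrame
print(list(reordered.columns))     # ['City', 'Name']
```

Use the one-element list when later code expects a DataFrame, for example when it reads the columns attribute.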
Pandas also allows accessing a single column using dot notation, similar to accessing attributes of an object, provided the column name is a valid Python identifier (e.g., no spaces, doesn't start with a number, doesn't conflict with existing DataFrame methods):
# Select the 'Age' column using dot notation
ages_dot = df.Age
print("Selected 'Age' column using dot notation:")
print(ages_dot)
print("\nType of the selected object:", type(ages_dot))
Output:
Selected 'Age' column using dot notation:
0 25
1 30
2 35
3 40
Name: Age, dtype: int64
Type of the selected object: <class 'pandas.core.series.Series'>
Like bracket notation for a single column, dot notation also returns a Series.
While dot notation can be convenient for quick access, bracket notation (df['Column'] or df[['Col1', 'Col2']]) is generally recommended for selecting columns, especially for beginners, for a few significant reasons:
- Bracket notation works with any column name, including names that contain spaces or start with a digit (e.g., df['First Name'], df['Sales 2023']). Dot notation fails for such names.
- Dot notation breaks when a column name matches an existing DataFrame method or attribute (e.g., for a column named count, df.count would refer to the method, not your column). Bracket notation (df['count']) avoids this ambiguity.
- Bracket notation resembles the .loc and .iloc indexers we will see shortly, providing a more consistent syntax pattern.
- Creating a new column requires bracket notation (e.g., df['New Column'] = values).
For these reasons, while you might encounter dot notation in examples, sticking to bracket notation for column selection promotes clearer, more robust, and less error-prone code.
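The pitfalls of dot notation are easier to see in a small experiment. The sketch below uses a hypothetical two-column DataFrame (the column names 'First Name', 'count', and 'bonus' are chosen only for illustration) and suppresses the warning pandas emits when a list is assigned to a new attribute.

```python
import warnings
import pandas as pd

# Hypothetical data: one name with a space, one that clashes with a method
df = pd.DataFrame({
    'First Name': ['Alice', 'Bob'],
    'count': [3, 5],
})

# 1. Names with spaces: only brackets work (df.First Name is a syntax error).
first_names = df['First Name']

# 2. Name clash: df.count is the DataFrame.count method, not the column.
print(callable(df.count))       # True
print(list(df['count']))        # [3, 5]

# 3. New columns: attribute assignment sets a plain Python attribute
#    on the object and does not add a column.
with warnings.catch_warnings():
    warnings.simplefilter('ignore')   # pandas warns about this pattern
    df.bonus = [100, 200]
print('bonus' in df.columns)    # False

df['bonus'] = [100, 200]        # bracket assignment creates the column
print('bonus' in df.columns)    # True
```

In each of the three cases, the bracket form behaves as intended where the dot form either fails or does something else.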
Selecting the right columns is often the first step in isolating the data you need for further analysis or processing. As we move forward, we'll see how to combine column selection with row selection for more powerful data retrieval.
© 2025 ApX Machine Learning