Before we introduce more specialized Python constructs used heavily in data science, let's quickly revisit the core Python concepts that form the bedrock for everything that follows. This review ensures we share a common understanding of the fundamental building blocks you'll use daily when working with data in Python. Familiarity with these basics is assumed, but a quick refresh can solidify your foundation.
At its heart, programming involves manipulating data stored in variables. Python is dynamically typed, meaning you don't need to declare a variable's type explicitly. However, understanding the primary data types is important for effective data handling.
- Integers (int): Whole numbers, like 10, -5, 0.
- Floating-point numbers (float): Numbers with a decimal point, like 3.14, -0.001, 2.7e5 (scientific notation). Be mindful of potential precision issues inherent in floating-point arithmetic.
- Strings (str): Sequences of characters, enclosed in single ('...') or double ("...") quotes. Used for textual data. Operations include concatenation (+) and slicing.
- Booleans (bool): Represent truth values, either True or False. Essential for conditional logic.

# Variable assignment
count = 100
temperature = 23.5
city_name = "San Francisco"
is_valid = True
# Checking types (useful for debugging)
# print(type(count))
# print(type(temperature))
# print(type(city_name))
# print(type(is_valid))
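Two of the behaviors listed above, floating-point precision and string slicing, are easiest to see by experiment. The following is a minimal sketch with hypothetical variable names; math.isclose from the standard library is one common way to compare floats with a tolerance, not the only one.

```python
import math

# 0.1 and 0.2 have no exact binary representation, so their sum carries a tiny error
float_sum = 0.1 + 0.2

exactly_equal = (float_sum == 0.3)                   # exact comparison fails
approximately_equal = math.isclose(float_sum, 0.3)   # tolerance-based comparison succeeds

# Concatenation builds a new string; slicing extracts part of one
city_name = "San" + " " + "Francisco"
first_word = city_name[:3]    # characters at positions 0, 1, 2
last_word = city_name[4:]     # everything from position 4 onward

print(exactly_equal, approximately_equal)
print(first_word, last_word)
```

When comparing computed floats in data work, a tolerance-based check is usually the safer default.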
While dynamic typing offers flexibility, explicitly indicating expected types using type hints (e.g., count: int = 100) is becoming increasingly common, especially in larger projects, as it improves code readability and allows for static analysis.
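As a rough illustration of type hints, the sketch below annotates a variable and a small function. The function scale is hypothetical and exists only for this example, and the list[float] syntax assumes Python 3.9 or later. Note that hints are not enforced at runtime; a static checker such as mypy is what would flag the final assignment.

```python
def scale(values: list[float], factor: float = 1.0) -> list[float]:
    """Return a new list with every value multiplied by factor."""
    result = []
    for v in values:
        result.append(v * factor)
    return result


count: int = 100
scaled = scale([1.0, 2.5], factor=2.0)

# Hints are documentation for readers and tools: this reassignment
# runs without error even though it contradicts the annotation.
count = "one hundred"

print(scaled, count)
```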
Python provides several built-in data structures for organizing collections of data. Choosing the right structure is often tied to performance and the specific task.
- Lists (list): Ordered, mutable (changeable) sequences of items. Defined with square brackets []. Lists are versatile and commonly used for storing sequences of data points or observations.
features = ['sepal_length', 'sepal_width', 'petal_length', 'petal_width']
measurements = [5.1, 3.5, 1.4, 0.2]
measurements.append(0.3) # Lists are mutable
# print(features[0]) # Access by index
# print(measurements)
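Mutability has a consequence worth seeing once: assigning a list to a second name does not copy it. The sketch below reuses the measurements list from the example above; alias and snapshot are hypothetical names added for illustration.

```python
measurements = [5.1, 3.5, 1.4, 0.2]

alias = measurements              # a second name for the same list object
snapshot = measurements.copy()    # an independent (shallow) copy

measurements.append(0.3)          # mutate through the original name

last_item = measurements[-1]      # negative indices count from the end
middle = measurements[1:3]        # slice: items at positions 1 and 2

print(alias)       # reflects the append
print(snapshot)    # unaffected by the append
```

Taking a copy before modifying data is a simple way to avoid changing a list that other code still relies on.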
- Tuples (tuple): Ordered, immutable (unchangeable) sequences of items. Defined with parentheses (). Because they are immutable, tuples are often used for data that shouldn't change, like coordinates or fixed configuration settings. They can also be used as keys in dictionaries, provided every element is itself hashable.
point = (10, 20)
# point[0] = 15 # This would raise a TypeError
# print(point)
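To illustrate tuples as dictionary keys, here is a small sketch that stores values in a sparse grid keyed by (row, column) pairs. The grid and its contents are hypothetical example data.

```python
# (row, column) tuples act as keys; only occupied cells are stored
grid = {(0, 0): 1.5, (2, 3): 4.0}

point = (2, 3)
value_at_point = grid[point]

# Tuples unpack cleanly into separate names
row, column = point

# A list cannot be a key because it is mutable and therefore unhashable
try:
    grid[[0, 1]] = 2.0
except TypeError as exc:
    error_message = str(exc)

print(value_at_point, row, column)
print(error_message)
```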
- Dictionaries (dict): Collections of key-value pairs. As of Python 3.7 they preserve insertion order; earlier versions made no ordering guarantee. Defined with curly braces {}. Keys must be unique and immutable, or more precisely hashable (strings, numbers, or tuples are common keys). Dictionaries are extremely useful for mapping information, like feature names to values or storing configuration parameters.
sample = {'sepal_length': 5.1, 'sepal_width': 3.5, 'species': 'setosa'}
# print(sample['sepal_length']) # Access by key
sample['petal_length'] = 1.4 # Add new key-value pairs
# print(sample)
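A few more dictionary operations come up constantly in data work. The sketch below reuses the sample dictionary from the example above; the default value 0.0 passed to get is an arbitrary choice for illustration.

```python
sample = {'sepal_length': 5.1, 'sepal_width': 3.5, 'species': 'setosa'}
sample['petal_length'] = 1.4

# .get returns a default instead of raising KeyError when the key is missing
petal_width = sample.get('petal_width', 0.0)

# Since Python 3.7, iteration follows insertion order
keys_in_order = list(sample)

# .items() yields (key, value) pairs
for name, value in sample.items():
    print(f"{name}: {value}")
```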
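- Sets (set): Unordered collections of unique, hashable items. Defined with curly braces containing items, or with the set() function. Empty braces {} create a dictionary, so an empty set must be written set(). Sets are highly optimized for membership testing (the in operator) and removing duplicates from sequences.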
unique_species = {'setosa', 'versicolor', 'virginica', 'setosa'}
# print(unique_species) # Output: {'setosa', 'versicolor', 'virginica'} (element order may vary)
# print('setosa' in unique_species) # Fast membership testing
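The sketch below shows the two uses just mentioned, deduplication and membership testing, plus a set difference. The observed list is hypothetical data. Because sets are unordered, the last step uses dict.fromkeys as one way to drop duplicates while keeping first-seen order.

```python
observed = ['setosa', 'versicolor', 'setosa', 'virginica', 'versicolor']

unique_species = set(observed)          # duplicates collapse
known = {'setosa', 'versicolor'}
unseen = unique_species - known         # items in unique_species but not in known

has_setosa = 'setosa' in unique_species

# Sets do not preserve order; dict keys do (Python 3.7+)
ordered_unique = list(dict.fromkeys(observed))

empty_set = set()                       # {} would create an empty dict instead

print(unseen, has_setosa, ordered_unique)
```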
Control flow statements direct the order in which code is executed.
- Conditional statements (if, elif, else): Execute blocks of code based on whether conditions evaluate to True or False.
value = 75
if value > 90:
    grade = 'A'
elif value > 70:
    grade = 'B'
else:
    grade = 'C'
# print(f"Grade: {grade}")
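Branches are tested from top to bottom and only the first matching one runs, so the order of conditions matters. The sketch below wraps the same thresholds in a hypothetical helper, to_grade, to make the boundary cases easy to check.

```python
def to_grade(value):
    """Map a numeric score to a letter using the thresholds from the example above."""
    if value > 90:
        return 'A'
    elif value > 70:   # reached only when value <= 90
        return 'B'
    else:
        return 'C'


# 95 also satisfies value > 70, but the first matching branch wins
print(to_grade(95))
# The comparisons are strict, so boundary values fall to the lower grade
print(to_grade(90), to_grade(70))
```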
- Loops (for, while): Repeat blocks of code.
for loops iterate over sequences and other iterables (like lists, tuples, strings, dictionaries, or generator outputs).
# Iterate over list elements
total = 0
numbers = [1, 2, 3, 4, 5]
for num in numbers:
    total += num
# print(f"Sum: {total}")
# Iterate over dictionary keys
# for key in sample:
#     print(f"{key}: {sample[key]}")
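The same for syntax works across the other iterable types mentioned above. This is a minimal sketch using small hypothetical inputs; enumerate and dict.items are built-ins that often replace manual index or key lookups.

```python
features = ['sepal_length', 'sepal_width']
sample = {'sepal_length': 5.1, 'sepal_width': 3.5}

# enumerate pairs each item with its position
indexed = []
for i, name in enumerate(features):
    indexed.append((i, name))

# .items() yields key and value together
pairs = []
for key, value in sample.items():
    pairs.append(f"{key}={value}")

# Iterating over a string yields one character at a time
vowels = 0
for ch in "setosa":
    if ch in "aeiou":
        vowels += 1

# A generator expression produces values lazily, one per iteration
squares_total = 0
for square in (n * n for n in range(4)):
    squares_total += square

print(indexed, pairs, vowels, squares_total)
```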
while loops continue as long as a condition remains True. Be careful to ensure the condition eventually becomes False to avoid infinite loops.
count = 0
while count < 3:
    # print(f"Count is {count}")
    count += 1
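One way to respect the infinite-loop warning above is to advance the loop variable before any statement that might skip the rest of the body. The sketch below processes hypothetical sensor readings, treating None as a missing value and a negative number as an end-of-data marker (both conventions are assumptions for this example). It relies on continue, which jumps to the next iteration, and break, which leaves the loop.

```python
readings = [4.2, None, 5.1, -1.0, 6.3]   # hypothetical data
valid = []
index = 0

while index < len(readings):
    reading = readings[index]
    index += 1               # advance first so that continue cannot stall the loop

    if reading is None:
        continue             # skip missing values, move to the next iteration
    if reading < 0:
        break                # end-of-data marker: leave the loop entirely

    valid.append(reading)

print(valid, index)
```

Had the increment been placed at the end of the body, the continue branch would skip it and the loop would never terminate.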
- Loop control (break, continue): break exits the current loop entirely, while continue skips the rest of the current iteration and proceeds to the next one.

Functions are reusable blocks of code that perform a specific task. They are fundamental to writing organized, modular, and maintainable programs.
- Defining functions (def): Use the def keyword to define a function, followed by the function name, parentheses () for parameters, and a colon :. The indented block below constitutes the function body.
- Return values (return): Functions can optionally return a value using the return statement. If omitted, the function returns None.

def calculate_mean(data_list):
    """Calculates the arithmetic mean of a list of numbers."""
    if not data_list:  # Handle empty list case
        return 0.0
    return sum(data_list) / len(data_list)
# Calling the function
scores = [88, 92, 75, 98, 85]
average_score = calculate_mean(scores)
# print(f"Average score: {average_score}")
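The rule that a function without a return value yields None is easy to overlook. The sketch below uses a hypothetical helper, log_mean, that prints a result instead of returning it; both the implicit end of the function and a bare return produce None.

```python
def log_mean(data_list):
    """Print the mean of a list; nothing is returned explicitly."""
    if not data_list:
        print("no data")
        return                 # a bare return yields None
    print(sum(data_list) / len(data_list))
    # falling off the end of the function also yields None


result = log_mean([88, 92, 75])
empty_result = log_mean([])

print(result, empty_result)
```

If a caller needs the computed value, as with calculate_mean above, the function has to return it rather than only print it.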
Functions help break down complex problems into smaller, manageable pieces. They also promote code reuse, reducing redundancy and making updates easier.
This brief review covers the absolute essentials. As we move forward, we'll build upon these concepts, introducing more efficient ways to work with sequences (comprehensions, generators), techniques for writing more flexible functions (advanced arguments, decorators), and methods for structuring larger applications (OOP, context managers). A solid grasp of these fundamentals will make mastering the subsequent topics significantly smoother.
© 2025 ApX Machine Learning