Besides changing the size of images (scaling), we often need to adjust their orientation or position within the frame. Two fundamental geometric transformations that achieve this are rotation and translation. These operations manipulate the spatial relationship of pixels without necessarily changing their intensity values (though interpolation might slightly alter them). They are essential for tasks like aligning images, correcting camera tilt, or preparing data for recognition algorithms that might not be rotation-invariant.
Like scaling, rotation and translation are types of affine transformations. This means they preserve points, straight lines, and planes. Parallel lines remain parallel after an affine transformation. We often use a single function in libraries like OpenCV to perform these operations, typically by defining a transformation matrix.
Rotation involves turning an image around a fixed point, called the center of rotation, by a certain angle.
When you rotate an image, especially by angles other than multiples of 90 degrees, the original rectangular grid of pixels gets mapped to new positions. This raises two considerations:
New Pixel Positions: Where does the intensity value of an original pixel (x,y) end up after rotation? This is calculated using trigonometry, often encapsulated within a rotation matrix. For a counter-clockwise rotation by angle θ around the origin (0,0), the new coordinates (x′,y′) are related to the old coordinates (x,y) by:
$$\begin{pmatrix} x' \\ y' \end{pmatrix} = \begin{pmatrix} \cos\theta & -\sin\theta \\ \sin\theta & \cos\theta \end{pmatrix} \begin{pmatrix} x \\ y \end{pmatrix}$$
When rotating around a different center, translation operations are combined with this rotation: shift the center to the origin, rotate, then shift back (see the sketch after this list).
Empty Areas and Interpolation: Rotating a rectangular image within its frame often leaves empty areas where no original pixel maps directly. Also, the calculated new coordinates (x′,y′) might fall between pixel locations on the output grid. We need a way to determine the pixel value at integer grid locations in the output image. This process is called interpolation. Common methods include:
Nearest Neighbor (cv2.INTER_NEAREST): copies the value of the closest original pixel; fast, but can produce jagged edges.
Bilinear (cv2.INTER_LINEAR): a distance-weighted average of the four nearest pixels; OpenCV's default and a good balance of speed and quality.
Bicubic (cv2.INTER_CUBIC): fits over a 4x4 pixel neighborhood for smoother results at a higher computational cost.
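To make the translate-rotate-translate idea concrete, here is a minimal NumPy sketch of the coordinate mapping itself; the function name and example values are our own, not part of any library API:
# Sketch: rotate a single point around an arbitrary center
import numpy as np
def rotate_point(x, y, cx, cy, theta_deg):
    # Counter-clockwise rotation of (x, y) by theta_deg around (cx, cy)
    theta = np.radians(theta_deg)
    R = np.array([[np.cos(theta), -np.sin(theta)],
                  [np.sin(theta),  np.cos(theta)]])
    # Shift the center to the origin, rotate, then shift back
    shifted = np.array([x - cx, y - cy])
    rotated = R @ shifted
    return rotated[0] + cx, rotated[1] + cy
# Rotating (10, 0) by 90 degrees around the origin lands at (0, 10),
# up to floating-point error
print(rotate_point(10, 0, 0, 0, 90))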
Libraries like OpenCV handle these calculations. You typically provide the image, the rotation center, the angle, and an optional scale factor. The library computes the necessary transformation matrix and applies it, using an interpolation method you can often specify.
# Example using OpenCV (Conceptual - details depend on library version)
import cv2
import numpy as np
# Load an image from disk (the filename here is only a placeholder)
image = cv2.imread('input.jpg')
rows, cols = image.shape[:2]
# Center of rotation: center of the image
center = (cols / 2, rows / 2)
# Angle: 45 degrees counter-clockwise
angle = 45
# Scale: 1.0 (no scaling)
scale = 1.0
# Calculate the rotation matrix
M_rotate = cv2.getRotationMatrix2D(center, angle, scale)
# Apply the rotation using warpAffine
# The third argument is the output image size (width, height); the optional
# 'flags' argument selects the interpolation method
rotated_image = cv2.warpAffine(image, M_rotate, (cols, rows), flags=cv2.INTER_LINEAR)
# 'rotated_image' now holds the rotated version
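Because the output frame above keeps the original (cols, rows) size, the corners of a 45-degree rotation are clipped. A common remedy, sketched below with our own variable names and reusing 'image' from the example, is to enlarge the output canvas to the rotated bounding box and shift the matrix's translation terms to match:
# Sketch: rotate without clipping by expanding the output canvas
(h, w) = image.shape[:2]
center = (w / 2, h / 2)
M = cv2.getRotationMatrix2D(center, 45, 1.0)
# The rotated image fits inside a bounding box of this size
cos_a, sin_a = abs(M[0, 0]), abs(M[0, 1])
new_w = int(h * sin_a + w * cos_a)
new_h = int(h * cos_a + w * sin_a)
# Move the translation part of M so the result is centered in the new canvas
M[0, 2] += (new_w / 2) - center[0]
M[1, 2] += (new_h / 2) - center[1]
rotated_full = cv2.warpAffine(image, M, (new_w, new_h))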
Translation is simpler: it involves shifting an image horizontally, vertically, or both. Imagine sliding the image across the screen without rotating or resizing it.
The transformation is straightforward: every pixel originally at (x,y) moves to a new position (x′,y′) where:
$$x' = x + t_x, \qquad y' = y + t_y$$
This can be represented by a translation matrix. For use with functions like OpenCV's warpAffine (which expects a 2x3 matrix for affine transformations), the translation matrix looks like this:
$$M = \begin{pmatrix} 1 & 0 & t_x \\ 0 & 1 & t_y \end{pmatrix}$$
Here, tx and ty directly represent the horizontal and vertical shifts.
Like rotation, translation can move parts of the image out of the visible frame, and it can create empty areas that need to be filled, usually with black or another specified background color (see the fill-color sketch after the example below).
# Example using OpenCV (Conceptual)
import cv2
import numpy as np
# Reuse the 'image' loaded in the rotation example
rows, cols = image.shape[:2]
# Shift amount: 50 pixels right, 20 pixels down
tx = 50
ty = 20
# Define the translation matrix
M_translate = np.float32([[1, 0, tx],
[0, 1, ty]])
# Apply the translation using warpAffine
# The third argument is the output image size (width, height)
translated_image = cv2.warpAffine(image, M_translate, (cols, rows))
# 'translated_image' now holds the shifted version
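By default, warpAffine fills the strip uncovered by the shift with black. The borderMode and borderValue parameters control this fill; the variation below continues from the example above, and the white fill color is just an illustration:
# Sketch: fill the uncovered area with white instead of the default black
# (reuses 'image', 'M_translate', 'cols', and 'rows' from above)
translated_white = cv2.warpAffine(
    image, M_translate, (cols, rows),
    borderMode=cv2.BORDER_CONSTANT,
    borderValue=(255, 255, 255))  # BGR white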
The warpAffine Function
As seen in the examples, functions like OpenCV's cv2.warpAffine are versatile. They take a 2x3 transformation matrix M and apply the corresponding affine transformation (rotation, translation, scaling, shearing, or a combination of these) to the input image.
The general form of the 2x3 matrix M for a rotation about a center, as produced by cv2.getRotationMatrix2D, is:
$$M = \begin{pmatrix} \alpha & \beta & (1-\alpha)\,c_x - \beta\, c_y \\ -\beta & \alpha & \beta\, c_x + (1-\alpha)\, c_y \end{pmatrix}$$
where the transformation combines scaling by the scale factor, rotation by θ (with α=scale⋅cosθ and β=scale⋅sinθ), and translation relative to the center (cx, cy). However, you usually don't construct this matrix manually. Instead, you use helper functions like cv2.getRotationMatrix2D for rotation (which incorporates center and scale) or define the simple translation matrix directly, as shown above. The warpAffine function then applies the transformation defined by the matrix M to map pixels from the source image to the destination image, using interpolation where necessary. The short check below compares this formula with the helper's output.
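If you want to verify the formula against the library, the following sketch (with an arbitrary example center) builds the matrix by hand and compares it to cv2.getRotationMatrix2D:
# Sketch: check the rotation-matrix formula against the helper function
import cv2
import numpy as np
angle, scale = 45, 1.0
cx, cy = 100.0, 50.0  # arbitrary example center
theta = np.radians(angle)
alpha = scale * np.cos(theta)
beta = scale * np.sin(theta)
M_manual = np.array([[alpha, beta, (1 - alpha) * cx - beta * cy],
                     [-beta, alpha, beta * cx + (1 - alpha) * cy]])
M_helper = cv2.getRotationMatrix2D((cx, cy), angle, scale)
print(np.allclose(M_manual, M_helper))  # prints True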
Understanding rotation and translation is fundamental for manipulating image geometry. These operations are building blocks for image alignment, data augmentation in machine learning (creating slightly modified training images), and correcting perspective distortions. In the hands-on section later in this chapter, you'll get to apply these transformations yourself.