Okay, we know that computers work with numbers, not with visual scenes the way humans do. So, how does a computer "see" an image? The answer lies in how digital images are structured and stored. Building on the idea that images are made of pixels, let's look at how these pixels are represented numerically.
Think of a digital image as a large grid, much like a spreadsheet or a chessboard. Each square in this grid corresponds to a single pixel. The computer stores a numerical value (or sometimes multiple values) in each square, representing the color and intensity of that specific pixel location.
Let's begin with the simplest type: a grayscale image, which contains only shades of gray, ranging from pure black to pure white. In a grayscale image, each pixel is represented by a single number. This number indicates the intensity or brightness of the pixel.
A widely used convention is 8-bit grayscale. In this format, each pixel's intensity is stored as an integer between 0 and 255. A value of 0 typically represents pure black, while 255 represents pure white. Values in between correspond to the various shades of gray. The entire grayscale image can thus be thought of as a 2D matrix (a grid) where each element contains one of these intensity values.
A small segment of a grayscale image represented as a grid of intensity values (e.g., 0=black, 255=white).
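To make this concrete, here is a minimal sketch using NumPy (one of the libraries discussed later in this section). The array name `gray` and the specific intensity values are made up for illustration:

```python
import numpy as np

# A tiny 4x4 grayscale image: each element is one pixel's intensity
# (0 = pure black, 255 = pure white in the 8-bit convention).
gray = np.array([
    [  0,  50, 100, 150],
    [ 50, 100, 150, 200],
    [100, 150, 200, 255],
    [150, 200, 255, 255],
], dtype=np.uint8)

print(gray.shape)   # (4, 4): 4 rows and 4 columns of pixels
print(gray[0, 0])   # 0   -> the top-left pixel is pure black
print(gray[2, 3])   # 255 -> this pixel is pure white
```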
Color images require more information per pixel. The most common way to represent color is the RGB model, which stands for Red, Green, and Blue. The idea is that any color can be closely approximated by mixing different amounts of these three primary colors of light.
For an RGB image, each pixel location stores three numerical values:

- One value for the intensity of Red
- One value for the intensity of Green
- One value for the intensity of Blue
These three values are often referred to as color channels. Similar to grayscale, each channel typically uses an 8-bit representation, meaning the intensity for each of Red, Green, and Blue ranges from 0 (no intensity of that color) to 255 (maximum intensity of that color).
A single pixel's color is determined by the combination of these three values. For instance:

- (255, 0, 0) represents pure red (full red, no green, no blue).
- (0, 255, 0) represents pure green.
- (0, 0, 255) represents pure blue.
- (0, 0, 0) represents black (no intensity in any channel).
- (255, 255, 255) represents white (all channels at maximum intensity).
- (255, 255, 0) mixes full red and full green, producing yellow.
A single color pixel location stores three separate intensity values for Red, Green, and Blue channels.
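As a brief illustrative sketch, the example colors above can be written directly as small NumPy arrays; the variable names and sample values here are for demonstration only:

```python
import numpy as np

# Each pixel is an (R, G, B) triple of 8-bit intensities.
red    = np.array([255,   0,   0], dtype=np.uint8)  # full red, no green or blue
green  = np.array([  0, 255,   0], dtype=np.uint8)  # pure green
blue   = np.array([  0,   0, 255], dtype=np.uint8)  # pure blue
black  = np.array([  0,   0,   0], dtype=np.uint8)  # no intensity in any channel
white  = np.array([255, 255, 255], dtype=np.uint8)  # all channels at maximum
yellow = np.array([255, 255,   0], dtype=np.uint8)  # red + green light mix

# Individual channels are accessed by index: 0 = R, 1 = G, 2 = B.
print(yellow[0], yellow[1], yellow[2])  # 255 255 0
```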
In programming, particularly with libraries like OpenCV, NumPy, or PIL (Pillow) in Python, these representations translate directly into array-like data structures:

- A grayscale image is stored as a 2D array, where each element `image[row][column]` holds the pixel's intensity value.
- A color image is stored as a 3D array, where `image[row][column]` would itself be a small array or tuple containing the three values (R, G, B) for that pixel. The third dimension (size 3) represents the color channels.

For example, a relatively small color image of 640×480 pixels would be represented by a data structure containing 640×480×3 = 921,600 individual numbers! A sketch of this indexing follows below.
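Here is a small sketch of this indexing with NumPy, assuming an RGB channel order; the image below is a blank placeholder rather than a loaded photo. Note that channel order is a library convention: Pillow and Matplotlib work in RGB, while OpenCV's `imread` returns channels in BGR order by default.

```python
import numpy as np

# A blank 640x480 color image. Note the array shape is
# (height, width, channels): 480 rows by 640 columns by 3 channels.
image = np.zeros((480, 640, 3), dtype=np.uint8)

print(image.shape)  # (480, 640, 3)
print(image.size)   # 921600 -> the 640 * 480 * 3 individual numbers from above

# Reading one pixel location returns its three channel values.
print(image[100, 200])  # [0 0 0] -> currently black

# Writing a pixel: set row 100, column 200 to pure red (assuming RGB order).
image[100, 200] = (255, 0, 0)
print(image[100, 200])  # [255 0 0]
```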
We mentioned the 0-255 range, which comes from using 8 bits (2⁸ = 256 possible values) per channel. This is known as the bit depth. While 8-bit is very common for standard images (JPEG, PNG), some applications (like medical imaging or scientific photography) use higher bit depths (e.g., 10-bit, 12-bit, or 16-bit) to capture finer intensity variations. A higher bit depth means more possible values per pixel or channel, resulting in smoother gradients and more detail, but also larger file sizes.
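A short sketch can show how bit depth translates into value ranges and storage. One caveat: NumPy has no native 10-bit or 12-bit integer type, so such data is usually stored in a 16-bit container; the array sizes below are illustrative.

```python
import numpy as np

# Number of representable intensity levels for common bit depths.
for bits in (8, 10, 12, 16):
    levels = 2 ** bits
    print(f"{bits}-bit: {levels} levels (0 to {levels - 1})")

# The dtype controls per-pixel storage: uint8 uses 1 byte per channel,
# uint16 uses 2, doubling the memory for the same 640x480 RGB image.
img8  = np.zeros((480, 640, 3), dtype=np.uint8)
img16 = np.zeros((480, 640, 3), dtype=np.uint16)
print(img8.nbytes, img16.nbytes)  # 921600 vs 1843200 bytes
```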
Understanding that images are fundamentally grids of numbers is essential. It's this numerical representation that allows us to apply mathematical operations and algorithms to process, analyze, and interpret visual information using computers. In the following sections, we'll explore different ways to represent colors (color spaces) and common file formats used to store these numerical grids efficiently.