Now that we understand that images are made up of pixels arranged in a grid, we need a consistent way to refer to the location of any specific pixel. This is where image coordinate systems come in. Think of it as giving every pixel in the grid its own address.

Most image processing libraries and computer vision applications use a coordinate system that differs slightly from the Cartesian coordinates you learned in math class. Here is the standard convention:

- Origin $(0, 0)$: The starting point is located at the top-left corner of the image.
- X-axis: This axis runs horizontally from left to right. The x-coordinate represents the column number.
- Y-axis: This axis runs vertically from top to bottom. The y-coordinate represents the row number.

So, a pixel's location is typically specified by a pair of values $(x, y)$, where $x$ is the horizontal position (column) and $y$ is the vertical position (row), both starting from 0. The main difference from the math-class convention is that $y$ increases downwards rather than upwards.

    digraph G {
      rankdir=LR;
      node [shape=plaintext];
      subgraph cluster_img {
        label = "Image Grid (Height x Width)";
        style=dashed;
        bgcolor="#e9ecef";
        origin [label="(0, 0)\nTop-Left", pos="0,3!", color="#f03e3e"];
        point_x [label="(x, 0)", pos="1.5,3!"];
        point_y [label="(0, y)", pos="0,1.5!"];
        point_xy [label="(x, y)", pos="1.5,1.5!"];
        max_x [label="(Width-1, 0)\nTop-Right", pos="3,3!"];
        max_y [label="(0, Height-1)\nBottom-Left", pos="0,0!"];
        max_xy [label="(Width-1, Height-1)\nBottom-Right", pos="3,0!"];
        // Invisible nodes for axis lines
        x_axis_start [pos="0,3!", shape=point, style=invis];
        x_axis_end [label="X-axis (Columns)", pos="3.5,3!", shape=point];
        y_axis_start [pos="0,3!", shape=point, style=invis];
        y_axis_end [label="Y-axis (Rows)", pos="0,-0.5!", shape=point];
        x_axis_start -> x_axis_end [arrowhead=vee, label=" → Width", fontcolor="#1c7ed6"];
        y_axis_start -> y_axis_end [arrowhead=vee, label=" → Height", fontcolor="#1c7ed6"];
        // Grid lines (optional, subtle)
        edge [style=dotted, color="#adb5bd"];
        point_x -> point_xy;
        point_y -> point_xy;
      }
    }

A visual representation of the image coordinate system. The origin $(0, 0)$ is at the top-left. The x-coordinate increases to the right, and the y-coordinate increases downwards.

Pixel Indexing

When you access a pixel, you use its coordinates. If an image has a width $W$ and a height $H$, the pixels are indexed as follows:

- The x-coordinates range from $0$ to $W-1$.
- The y-coordinates range from $0$ to $H-1$.

For example, in an image that is 800 pixels wide and 600 pixels high ($W=800, H=600$):

- The top-left pixel is at $(x, y) = (0, 0)$.
- The top-right pixel is at $(x, y) = (799, 0)$.
- The bottom-left pixel is at $(x, y) = (0, 599)$.
- The bottom-right pixel is at $(x, y) = (799, 599)$.
- A pixel somewhere in the middle might be at $(x, y) = (400, 300)$.

Important Note on Programming Conventions

While the $(x, y)$ notation (column, row) is common, many programming libraries, particularly those that represent images as NumPy arrays in Python (such as OpenCV), access image data through array indexing, which follows a (row, column), that is (y, x), convention.

For instance, to access the pixel value at the coordinate $(x, y)$, you might use code like `pixel_value = image[y, x]`.

- `image[0, 0]` accesses the top-left pixel.
- `image[599, 799]` accesses the bottom-right pixel in our 800x600 example.

This difference between coordinates $(x, y)$ and array indexing `[y, x]` is a frequent source of confusion for beginners. The two orders can even coexist within one library: in OpenCV's Python API the image array is indexed as `image[y, x]`, while drawing functions such as `cv2.circle` take points as $(x, y)$. Always check the documentation of the specific function or library you are using to be sure which order it expects. For this course, when discussing coordinates we will often use $(x, y)$, but when showing code examples involving array access, we will use the `[y, x]` (row, column) format consistent with libraries like OpenCV and NumPy.

Understanding this coordinate system is fundamental because almost all image processing operations, from simple cropping to complex object detection, require you to know how to specify and access pixel locations accurately.
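To make the $(x, y)$ versus `[y, x]` distinction concrete, here is a minimal sketch using a plain NumPy array as a stand-in for a grayscale image. The 800x600 size comes from the example above; the helper `get_pixel` and the marker values are hypothetical names and numbers chosen for illustration, not part of any library.

```python
import numpy as np

# Illustrative size, matching the 800x600 example in the text.
WIDTH, HEIGHT = 800, 600

# A grayscale image as a 2D array with shape (rows, columns),
# i.e. (height, width): the height comes first.
image = np.zeros((HEIGHT, WIDTH), dtype=np.uint8)


def get_pixel(img, x, y):
    """Hypothetical helper: take a coordinate as (x, y), index the array as [y, x]."""
    height, width = img.shape[:2]
    # Explicit bounds check, because NumPy would silently wrap negative indices.
    if not (0 <= x < width and 0 <= y < height):
        raise IndexError(f"(x={x}, y={y}) is outside a {width}x{height} image")
    return img[y, x]


# Mark the four corners and one middle pixel with distinct values.
image[0, 0] = 10                     # top-left,     (x, y) = (0, 0)
image[0, WIDTH - 1] = 20             # top-right,    (x, y) = (799, 0)
image[HEIGHT - 1, 0] = 30            # bottom-left,  (x, y) = (0, 599)
image[HEIGHT - 1, WIDTH - 1] = 40    # bottom-right, (x, y) = (799, 599)
image[300, 400] = 50                 # middle,       (x, y) = (400, 300)

print(image.shape)                   # (600, 800): rows first, then columns
print(get_pixel(image, 799, 0))      # 20
print(get_pixel(image, 400, 300))    # 50
```

Note that accidentally swapping the order, as in `image[799, 0]`, raises an `IndexError` here only because the array has just 600 rows. On a square image the same mistake would quietly read the wrong pixel, which is why wrapping the index order in a small helper like this can be worthwhile.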