Okay, let's dive into how you make your containerized inference API reachable from the outside world, specifically from your host machine or other machines on your network. When you run an application like a Flask or FastAPI web server inside a Docker container, it listens on a specific port within the container's isolated network environment. By default, this port is not accessible from your host machine. To bridge this gap, you need to map a port on your host machine to the port inside the container.
Think of a Docker container as having its own private network interface and IP address. When your Python web server (like Uvicorn for FastAPI or the development server for Flask) starts, it binds to a port within this container network. For example, FastAPI with Uvicorn often defaults to port 8000, while Flask defaults to 5000.
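You can see this private address for yourself with docker inspect. As a quick sketch (it assumes a container named my_inference_api is already running; the examples later in this section start one):

# Print the container's private IP on its Docker network
docker inspect -f '{{range .NetworkSettings.Networks}}{{.IPAddress}}{{end}}' my_inference_api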
If your server inside the container binds only to 127.0.0.1 (localhost) within the container, it will only accept connections originating from inside that same container. To allow connections from the Docker host network (and thus, through port mapping, from the outside), you must configure your server to listen on 0.0.0.0. This tells the server to accept connections on all available network interfaces within the container.
Here's how you typically run Uvicorn (for FastAPI) to listen on all interfaces within the container on port 8000:
uvicorn main:app --host 0.0.0.0 --port 8000
Or for Flask's development server:
flask run --host=0.0.0.0 --port=5000
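To see why the bind address matters, the sketch below deliberately overrides the container's command to bind 127.0.0.1 instead. It assumes an image named my_inference_image with uvicorn installed (as used later in this section). Even though the port is published, requests from the host fail, because nothing inside the container listens on an externally reachable interface:

# Deliberately bind to 127.0.0.1 inside the container
docker run -d -p 8080:8000 --name bind_test my_inference_image:latest \
  uvicorn main:app --host 127.0.0.1 --port 8000
# This request fails (connection reset / empty reply) despite the -p mapping
curl http://localhost:8080/
# Remove the demonstration container
docker rm -f bind_test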
The EXPOSE Instruction
The Dockerfile provides the EXPOSE instruction. Its primary purpose is to document which port(s) the application inside the container is intended to listen on. It acts as metadata for image consumers (including yourself) and can be used by other tools.
For instance, if your FastAPI application listens on port 8000, you would add this line to your Dockerfile:
# ... other instructions (FROM, COPY, RUN pip install ...)
# Document that the application uses port 8000
EXPOSE 8000
# Command to run the application, listening on 0.0.0.0
CMD ["uvicorn", "main:app", "--host", "0.0.0.0", "--port", "8000"]
It is important to understand that EXPOSE does not actually publish the port or make it accessible from the host machine. It's purely informational. The actual mapping is established when you run the container.
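For completeness, here is how the image used in the following examples might be built, plus one place where the EXPOSE metadata is actually consumed: docker run -P (capital P) publishes every EXPOSEd port to a random available host port:

# Build the image from the Dockerfile above; my_inference_image is the
# tag assumed in the examples that follow
docker build -t my_inference_image:latest .
# -P publishes all EXPOSEd ports to random available host ports
docker run -d -P --name expose_demo my_inference_image:latest
# Remove the demonstration container when done
docker rm -f expose_demo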
Publishing Ports with docker run -p
To make the container's port reachable from your host machine, you use the -p or --publish flag when executing the docker run command. This flag maps a port on the host machine to a port inside the container.
The format is: -p <host_port>:<container_port>
- <host_port>: The port number on your Docker host machine (your computer). You will use this port to access the service.
- <container_port>: The port number inside the container that the application is listening on (the one specified in EXPOSE and used by your application server).

Examples:
Map host port 8080 to container port 8000:
docker run -d -p 8080:8000 --name my_inference_api my_inference_image:latest
In this case, you would access your API via http://localhost:8080 on your host machine. Docker forwards the traffic from host port 8080 to container port 8000.
Map host port 8000 to container port 8000:
docker run -d -p 8000:8000 --name my_inference_api my_inference_image:latest
Here, the host port and container port are the same. Access would be via http://localhost:8000. This only works if port 8000 is not already in use on your host machine.
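If the host port is taken, docker run fails up front, and the fix is simply to pick a different host port. A sketch (the exact daemon message varies by Docker version):

# Fails if something on the host already listens on 8000, with an error like:
#   Bind for 0.0.0.0:8000 failed: port is already allocated
# Retry with a free host port instead:
docker run -d -p 8001:8000 --name my_inference_api my_inference_image:latest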
Map container port 8000 to a random host port:
docker run -d -p 8000 --name my_inference_api my_inference_image:latest
If you only specify the container port, Docker automatically assigns a random available port on the host machine. You can find out which port was assigned using the docker ps command:
docker ps
Look for the PORTS column in the output, which might show something like 0.0.0.0:32768->8000/tcp. This means host port 32768 is mapped to container port 8000.
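Alternatively, docker port prints the mappings for a single container directly:

# Show which host port was mapped to container port 8000
docker port my_inference_api 8000
# Example output: 0.0.0.0:32768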
Diagram: traffic directed to port 8080 on the host machine is forwarded by Docker to port 8000 inside the running container, where the inference API application is listening.
Once your container is running with the ports correctly mapped, you can test the inference API using tools like curl, or by simply navigating to the URL in your web browser (if your API supports GET requests at the root). Using the first example above (-p 8080:8000), you would send a request to the host port:
# Example: Sending a POST request with JSON data to a /predict endpoint
curl -X POST http://localhost:8080/predict \
  -H "Content-Type: application/json" \
  -d '{"features": [1.0, 2.5, 0.8]}'
If the mapping is successful and your API is running correctly inside the container, you should receive the prediction response.
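Even before exercising the model endpoint, a plain GET against the mapped port is a useful sanity check of the mapping itself: an HTTP response of any kind (even a 404 from the framework) proves traffic reached the container, whereas a connection error points at the port mapping or the bind address:

# Any HTTP response (even 404) confirms the port mapping works;
# "connection refused" or "reset" suggests a mapping or bind-address problem
curl -i http://localhost:8080/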
Remember to choose a host port that isn't already being used by another application on your system. Using EXPOSE in your Dockerfile documents the intended container port, and using docker run -p <host_port>:<container_port> makes your containerized inference service accessible for requests.