Alright, you've learned about the tools available, such as Ollama and LM Studio, which handle the heavy lifting of running Large Language Models on your computer. Now it's time for the most rewarding part: actually running a model and watching it generate text from your input.
This section provides practical steps to get you started. We'll use a relatively small model for this first exercise to ensure it runs smoothly on a wide range of hardware. As covered in the previous sections, you should already have installed either Ollama or LM Studio. Choose the tool you installed and follow the corresponding steps below.
If you opted for Ollama, you'll interact with it using your terminal or command prompt.
Download a Model (If You Haven't Already): Let's download a small, capable model. We'll use phi3:mini, a model developed by Microsoft known for its good performance relative to its size. Open your terminal and run:
ollama pull phi3:mini
You'll see download progress indicators. This might take a few minutes depending on your internet speed. Once downloaded, the model is stored locally.
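To confirm the download completed, you can ask Ollama to list the models it has stored locally:

ollama list

The output shows each model's name, its size on disk, and when it was last modified.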
Run the Model Interactively: Now, start an interactive session with the model using this command:
ollama run phi3:mini
Ollama will load the model (this might take a moment, especially the first time) and then present you with a prompt, often looking like >>> Send a message (/? for help).
Interact with the Model: Type your prompt directly after the >>> and press Enter. Let's try asking it to do something creative:
>>> Write a short story about a robot who discovers gardening.
The model will process your request and generate a response, streaming the text output directly to your terminal.
Continue the Conversation: You can continue interacting. The model remembers the previous parts of the conversation within its context window (as discussed in Chapter 5). Try asking a follow-up question.
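For example, building on the story the model just wrote, you might type:

>>> Now summarize that story in a single sentence.

Because the earlier exchange is still within the context window, the model knows which story you mean without you restating it.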
Exit the Session: When you're finished, you can exit the Ollama interactive session. Type /bye and press Enter, or on most systems, press Ctrl+D.
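As a quick aside, Ollama also accepts a prompt directly on the command line, generating a single response and then returning you to your shell (the prompt text here is just an illustration):

ollama run phi3:mini "Explain what a context window is in one sentence."

This non-interactive form is handy for quick questions, or for calling the model from shell scripts.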
Basic workflow for running a model interactively using Ollama via the command line.
If you prefer a graphical interface, LM Studio makes running models straightforward.
Launch LM Studio: Open the LM Studio application you installed previously.
Download a Model (If Needed): Use LM Studio's built-in model search and look for phi3 mini instruct. You'll see various versions. Look for a GGUF format model, preferably one with Q4_K_M or Q4_0 in the name, as these offer a good balance of size and quality: the "Q4" indicates weights quantized to roughly 4 bits each, which brings this roughly 3.8-billion-parameter model down to a little over 2 GB on disk. For example, you might find Phi-3-mini-4k-instruct-q4_0.gguf. Download it and wait for it to complete.

Load the Model for Chat: Switch to the chat view and select the Phi-3 model you just downloaded from the model picker. LM Studio will load it into memory, which may take a moment.

Interact with the Model: Type a prompt into the message box and press Enter. For example:

Explain what a Large Language Model is in one sentence.
Continue Chatting: You can type more prompts and carry on a conversation just like you would with the command-line version. LM Studio manages the interaction history for you.
Basic workflow for running a model interactively using the LM Studio graphical interface.
A Quick Note on Performance: Loading the model (transferring it from your storage drive into your computer's RAM or VRAM) is often the slowest part, especially the first time you run a specific model after starting the application. Once loaded, generating text should be relatively faster, though the speed still depends heavily on your hardware (CPU, GPU, RAM) and the size of the model.
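If you're curious about the numbers on your own machine, Ollama provides a --verbose flag that prints timing statistics after each response, including load duration and generation speed in tokens per second:

ollama run phi3:mini --verbose

Comparing the load duration with the evaluation rate makes the pattern described above easy to see for yourself.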
Congratulations! You have successfully downloaded and run your first Large Language Model locally. You prompted it, and it generated text based on your input, all running entirely on your own machine. In the next chapter, we will look more closely at how to effectively communicate with these models through prompting.