Running LLMs Locally with Ollama

Ollama is a tool that allows you to run large language models (LLMs) locally on your own machine, whether that's a laptop, a desktop, or a homebuilt server. The open-source models Ollama runs are generally smaller than frontier models like GPT-4o or Claude 3.5 Sonnet, but for many use cases they perform surprisingly well, thanks to the sheer volume of data they have been trained on. The real benefit of these local models, however, has nothing to do with benchmarks: you can be certain the model weights are not changing behind the scenes, and your interactions with them are kept completely private.

About Ollama

Ollama provides a straightforward way to download, set up, and run state-of-the-art open-source large language models directly on your computer. It bundles model weights, configurations, and data into a single package managed by a command-line interface, making local AI accessible and manageable for developers and curious users alike.

Key Features

  • Local Execution: Run powerful LLMs entirely on your own hardware, ensuring data never leaves your machine.
  • Model Auditability: Have confidence that the model you are using remains consistent, as the weights are stored locally and aren't subject to unannounced cloud updates.
  • Complete Privacy: Interactions with the models stay on your device, perfect for sensitive information or confidential tasks.
  • Simple Command-Line Interface: Manage models and run interactions easily through your terminal.
  • Built-in API Server: Ollama serves a local API, allowing integration with other applications or custom workflows.
  • Wide Model Support: Access a library of hundreds of models from the Ollama library, including popular families like Llama and Mistral as well as many models from outside the Llama ecosystem.

Pros and Cons

✅ Run powerful AI models entirely offline.
✅ Guarantees 100% privacy for your prompts and generated content.
✅ Allows auditing of model weights for consistency.
✅ Access to a diverse and growing library of open-source models.
✅ Free to download and use.
❌ Primarily operated via the command line, which may require some technical comfort.
❌ Running larger, more capable models demands significant RAM resources.
❌ While powerful, local models might lag behind the very largest proprietary models for certain complex tasks.

Download and Availability

To get started with Ollama, head over to ollama.com, click the download button, and grab the appropriate version for your OS (macOS, Windows, and Linux are supported). Ollama is completely free to download and use.

How to Use Ollama

Installation is designed to be simple. On macOS, the download button gives you a ZIP file. Unzip it, drag the resulting Ollama application into your Applications folder, then launch it. Accept the permission dialogs and you'll be presented with a short setup wizard: click Next, click Install (you may need to enter your account password), and click Finish. That's it. You should then be able to type ollama -v on the command line and see the Ollama client version printed out. Also note that on macOS you can start and stop Ollama via the llama icon in the menu bar.

Since Ollama is a command-line tool, you'll need to open a shell to use it. To use a particular model, enter the ollama run command followed by the model's ID.
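As a quick sketch, a first session after installing might look like this (the model ID here is just an example; any ID from the Ollama library works):

    # confirm the client is installed and print its version
    ollama -v

    # download the model if needed and start an interactive chat
    ollama run llama3.2

    # type prompts at the >>> prompt; /bye exits the session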

Understanding Models

You'll interact with models using their IDs. A great resource is the table on the Ollama GitHub repo showing popular models. To keep things light, consider running one of the smallest models, such as llama3.2:1b-instruct-fp16. The 3.2 tells us the model family, 1b tells us the number of parameters (1 billion), instruct tells us it was fine-tuned to follow instructions, and fp16 means the weights are stored as 16-bit floating-point numbers rather than a more heavily compressed quantized format. You can think of parameters as a rough indication of simulated brain cells; more parameters often means smarter, assuming sufficient training data. However, depending on how it was trained, a smaller model can outperform a larger one in certain cases. As a rough hardware guideline, you'll need about 8 GB of RAM for 7-billion-parameter models, 16 GB for 13 billion, and 32 GB for 33 billion. A 1B model should work on just about any computer Ollama supports.
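A minimal session with that small model might look like the following (the pull step is optional, since ollama run downloads a model automatically if it isn't already present):

    # download the 1B instruct model stored at fp16 precision
    ollama pull llama3.2:1b-instruct-fp16

    # list locally installed models along with their size on disk
    ollama list

    # chat with the model interactively
    ollama run llama3.2:1b-instruct-fp16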

Advanced Usage

Ollama supports interactive sessions with models directly in your terminal. In the background, it also serves a local API. To watch the Ollama server logs, open a new terminal and run tail -f followed by the path to the log file. You can also exit the interactive session and query the Ollama API directly with curl. There are other ways to use Ollama, so feel free to experiment, but hopefully this gives you a good sense of how it works. If you're interested in learning more about how AI training works, topics like neural networks and backpropagation are a good starting point.
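As a rough sketch, on macOS the server log typically lives under ~/.ollama/logs, and the API listens on localhost port 11434 by default; adjust the log path and model name for your own setup:

    # follow the Ollama server log (typical macOS location)
    tail -f ~/.ollama/logs/server.log

    # ask the local API for a single, non-streaming completion
    curl http://localhost:11434/api/generate -d '{
      "model": "llama3.2:1b-instruct-fp16",
      "prompt": "Why is the sky blue?",
      "stream": false
    }'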
