Simple CLI Chat with Ollama

In this project, I will show you how to download and install Ollama models, and use the API to integrate them into your app.

The main purpose of this project is to show examples of how streaming and non-streaming API requests work within the Ollama environment.

If you just want to grab the examples, head straight to the GitHub repo.


Step 1 - Pre-Requisites

Ollama Installation

macOS / Windows — use the official download at ollama.com

Linux:

curl -fsSL https://ollama.com/install.sh | sh

Python Environment

You’ll need Python 3.12+. Set up a virtual environment:

mkdir my-project && cd my-project
python3 -m venv .venv
source .venv/bin/activate
which python

Step 2 - Ollama Setup

Important Commands

Start the Ollama API:

ollama serve

Pull a model:

ollama pull llama3.1
ollama pull llama3.1:70b

List installed models:

ollama list

Remove a model:

ollama rm <model-name>

Custom Modelfiles

Create a Modelfile:

FROM llama3.1
PARAMETER temperature 1
SYSTEM """
You are Mario from Super Mario Bros. Answer as Mario, the assistant, only.
"""

Create and run the model:

ollama create <name-of-new-model> -f ./Modelfile
ollama run <name-of-new-model>

My Personal Favourite Models

| Model        | Parameters | Size  | Download                  |
|--------------|------------|-------|---------------------------|
| Llama 3.1:7b | 7B         | 3.8GB | ollama run llama3.1:7b    |
| Mistral-Nemo | 6B         | 3.2GB | ollama run mistral-nemo   |
| CodeLlama    | 7B         | 3.8GB | ollama run codellama      |
| Phi 3        | 14B        | 7.9GB | ollama run phi3           |
| Gemma 2      | 9B         | 5.5GB | ollama run gemma2         |
| CodeGemma    | 13B        | 8.2GB | ollama run codegemma      |

Step 3 - Creating a Custom CLI

Clone the repo or code along:

git clone https://github.com/LargeLanguageMan/python-ollama-cli

Ollama API Requests

Streaming (the default; tokens are returned one at a time as they are generated):

curl http://localhost:11434/api/generate -d '{
"model": "llama3.1",
"prompt":"Why is the sky blue?"
}'

Non-streaming (full response at once):

curl http://localhost:11434/api/generate -d '{
"model": "llama3.1",
"prompt": "Why is the sky blue?",
"stream": false
}'
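The streaming endpoint replies with newline-delimited JSON: one object per token, each carrying a `response` fragment, with a final object where `"done"` is `true`. A minimal sketch of parsing that format offline (the sample lines below illustrate the shape of the stream; they are not captured output):

```python
import json

# Illustrative NDJSON lines in the shape /api/generate streams back
sample_stream = [
    b'{"model": "llama3.1", "response": "The", "done": false}',
    b'{"model": "llama3.1", "response": " sky", "done": false}',
    b'{"model": "llama3.1", "response": "", "done": true}',
]

def join_stream(lines):
    """Decode each NDJSON line and concatenate the token fragments."""
    text = ""
    for raw in lines:
        obj = json.loads(raw.decode("utf-8"))
        text += obj.get("response", "")
        if obj.get("done"):
            break
    return text

print(join_stream(sample_stream))  # The sky
```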

Install the Python requests library:

pip install requests

Option 1: Streaming

import json
import requests

url = "http://localhost:11434/api/generate"
headers = {"Content-Type": "application/json"}
data = {"model": "llama3.1", "prompt": "Why is the sky blue?"}

def generate_streaming():
    # stream=True keeps the connection open; each line is one JSON object
    response = requests.post(url, headers=headers, data=json.dumps(data), stream=True)
    all_chunks = []
    for chunk in response.iter_lines():
        if chunk:
            all_chunks.append(json.loads(chunk.decode('utf-8')))
    return all_chunks

result = generate_streaming()

Print the output:

obj = ""
for response in result:
    obj = obj + response["response"]
print(obj)

Option 2: Non-Streaming

def generate():
    # the request data must include "stream": False to get one complete reply
    response = requests.post(url, headers=headers, data=json.dumps(data))
    return response.json()

result = generate()
print(result['response'])

CLI example with no streaming
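A minimal sketch of such a CLI, assembled from the non-streaming pieces above (the `build_payload`, `ask`, and `main` names and the loop structure are my own, not from the repo):

```python
import json

import requests

OLLAMA_URL = "http://localhost:11434/api/generate"  # default local Ollama endpoint

def build_payload(prompt, model="llama3.1"):
    """Assemble a non-streaming request body for /api/generate."""
    return {"model": model, "prompt": prompt, "stream": False}

def ask(prompt):
    """Send one prompt to Ollama and return the full reply text."""
    response = requests.post(
        OLLAMA_URL,
        headers={"Content-Type": "application/json"},
        data=json.dumps(build_payload(prompt)),
    )
    response.raise_for_status()
    return response.json()["response"]

def main():
    """Read prompts from stdin until the user types 'exit'."""
    print("Simple CLI chat (type 'exit' to quit)")
    while True:
        user_input = input("> ")
        if user_input.strip().lower() == "exit":
            break
        print(ask(user_input))
```

Call `main()` with `ollama serve` running in another terminal to start chatting.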