This project provides a Python interface for local inference using Ollama with three different models:
- DeepSeek Coder 1.3B (replacing DeepSeek-R1-1.5B)
- Llama 3.2 1B
- Qwen2 Math 1.5B
To run this project you will need:

- Ollama (install from https://ollama.ai/)
- Python 3.8 or higher
- pip (Python package manager)
To install:

- Clone this repository
- Install the required Python packages:

  ```bash
  pip install -r requirements.txt
  ```
Ollama stores the model weights locally in the following directory:

- macOS/Linux: `~/.ollama/models/`
- Windows: `C:\Users\<username>\.ollama\models\`
When you run `client.pull_model()`, the model weights are downloaded from Ollama's servers and stored in this directory. Together, the three models take up approximately 3GB.
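To pre-download all three models in one step, you can loop over the model aliases listed later in this README; a minimal sketch, assuming the `OllamaInference` API described in the usage section below:

```python
from local_inference import OllamaInference

client = OllamaInference()

# Pull each model once; the weights land in the Ollama models
# directory shown above (~3GB in total).
for alias in ["deepseek-r1-1.5b", "llama3.2-1b", "qwen2.5-1.5b"]:
    client.pull_model(alias)
```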
To run the examples:

- Start the Ollama service (if you are unsure whether it is already running, see the health-check sketch after this list):

  ```bash
  ollama serve
  ```

- Run the basic inference example:

  ```bash
  python local_inference.py
  ```

- Run the inference tests:

  ```bash
  python test_inference.py
  ```

- Run the RAG inference example:

  ```bash
  python rag_inference.py
  ```

- Run the RAG tests:

  ```bash
  python rag_test.py
  ```
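To check whether the Ollama service is up, you can probe its local HTTP endpoint; a minimal sketch using only the Python standard library, assuming Ollama's default port 11434:

```python
import urllib.request

# Ollama's HTTP API listens on localhost:11434 by default;
# /api/tags returns the locally installed models as JSON.
try:
    with urllib.request.urlopen("http://localhost:11434/api/tags", timeout=5) as resp:
        print("Ollama is running (HTTP status:", resp.status, ")")
except OSError:
    print("Ollama does not appear to be running; start it with `ollama serve`.")
```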
You can use the `OllamaInference` class in your own code:

```python
from local_inference import OllamaInference

client = OllamaInference()

# Pull a model (this alias maps to deepseek-coder:1.3b)
client.pull_model("deepseek-r1-1.5b")

# Generate text
response = client.generate(
    model_name="deepseek-r1-1.5b",
    prompt="Your prompt here",
    max_tokens=100,
    temperature=0.7,
)

print(response["response"])
```
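To confirm that a pull succeeded, you can list what is installed locally with `list_models()` (documented below); the exact return format depends on the class implementation, so this sketch simply prints it:

```python
# List the models Ollama currently has on disk.
print(client.list_models())
```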
- `deepseek-r1-1.5b`: Maps to DeepSeek Coder 1.3B
  - A capable coding model trained on two trillion code and natural language tokens
  - Optimized for code generation and understanding
  - Size: 776MB
- `llama3.2-1b`: Maps to Llama 3.2 1B
  - Meta's latest 1B parameter model optimized for edge devices
  - Balanced performance and resource usage
  - Size: 1.3GB
- `qwen2.5-1.5b`: Maps to Qwen2 Math 1.5B (see the example sketch after this list)
  - A specialized math language model built upon Qwen2
  - Optimized for mathematical reasoning and problem-solving
  - Size: 934MB
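As an example of targeting a specific model by alias, the math-tuned model can be queried like any other; a short sketch reusing the `client` from the usage example above:

```python
# Route a math question to the Qwen2 Math model via its alias.
answer = client.generate(
    model_name="qwen2.5-1.5b",
    prompt="What is the derivative of x^3 + 2x?",
    temperature=0.2,  # a low temperature keeps math answers more deterministic
)
print(answer["response"])
```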
The `OllamaInference` class exposes the following methods:

- `generate(model_name, prompt, max_tokens=None, temperature=0.7, top_p=0.9, stop=None, stream=False)`: Generate text using the specified model
- `list_models()`: List all available models
- `pull_model(model_name)`: Pull a model from Ollama
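These methods wrap Ollama's local REST API, so you can also call it directly; a minimal sketch of a raw generate request, assuming the default port 11434 and the `deepseek-coder:1.3b` tag that the deepseek alias maps to:

```python
import json
import urllib.request

# POST directly to Ollama's /api/generate endpoint, bypassing the wrapper.
payload = json.dumps({
    "model": "deepseek-coder:1.3b",
    "prompt": "Write a function that reverses a string.",
    "stream": False,  # ask for a single JSON object instead of a stream
}).encode("utf-8")

req = urllib.request.Request(
    "http://localhost:11434/api/generate",
    data=payload,
    headers={"Content-Type": "application/json"},
)
with urllib.request.urlopen(req) as resp:
    body = json.load(resp)
    print(body["response"])  # the generated text, same key as the wrapper returns
```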
- The models are stored locally using Ollama
- Make sure you have sufficient disk space for the models (approximately 3GB total)
- The first time you use a model, it will be downloaded automatically
- The Ollama service must be running for inference to work
- An "address already in use" error when starting Ollama means the service is already running (see the port-check sketch after this list)
- Test results are stored in the `results/` directory
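To check whether the port is occupied before starting the service, a quick sketch using the standard library (11434 is Ollama's default port):

```python
import socket

# connect_ex returns 0 if something is already listening on the port,
# which is what triggers the "address already in use" error.
with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
    in_use = s.connect_ex(("127.0.0.1", 11434)) == 0
print("Port 11434 in use:", in_use)
```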