Ollama Local Model Setup Guide
Ollama is a local large language model runner that lets you run open-source AI models on your own computer, completely free and with your data kept private.
What is Ollama?
Ollama is an open-source project that simplifies running large language models locally:
- Simple setup: A single command downloads and runs a model
- Completely free: No API fees; all you need is your own computer
- Privacy protection: Data processed entirely locally, never uploaded to the cloud
- Rich model library: Supports Llama, Mistral, Qwen, and many other open-source models
Installing Ollama
macOS
Option 1: Homebrew (Recommended)
brew install ollama
Option 2: Official Installer
- Visit the Ollama download page
- Download the macOS installer
- Open the .dmg file and drag Ollama to your Applications folder
Windows
- Visit the Ollama download page
- Download the Windows installer (OllamaSetup.exe)
- Run the installer and follow the prompts
Linux
One-line install script:
curl -fsSL https://ollama.com/install.sh | sh
Or manual installation:
See the Ollama GitHub for installation instructions.
Starting the Ollama Service
After installation, you need to start the Ollama service:
macOS/Linux:
ollama serve
Windows: After installation, Ollama typically runs automatically in the background. You can see its icon in the system tray.
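To confirm the CLI is installed and on your PATH, you can print its version (the exact output varies by release):
# Verify the Ollama CLI is available
ollama --version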
Downloading Models
Use the ollama pull command to download models:
Recommended Models
# Llama 3.2 - Meta's latest open-source model
ollama pull llama3.2
# Llama 3.2 3B - Smaller version, faster
ollama pull llama3.2:3b
# Qwen 2.5 - Alibaba's open-source model, strong Chinese capability
ollama pull qwen2.5
# Mistral - European open-source model, excellent performance
ollama pull mistral
# DeepSeek Coder - Specialized for code generation
ollama pull deepseek-coder
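Once a model has finished downloading, you can try it directly in the terminal before configuring Chatbox. The ollama run command opens an interactive chat session and also accepts a one-off prompt; for example:
# Chat interactively with a downloaded model (type /bye to exit)
ollama run llama3.2
# Or send a single prompt and print the reply
ollama run llama3.2 "Explain what Ollama is in one sentence."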
Model Selection Guide
| Model | Parameters | Memory Required | Features |
|---|---|---|---|
| llama3.2:3b | 3B | 4GB+ | Fast, good for beginners |
| llama3.2 | 8B | 8GB+ | Balanced choice |
| qwen2.5:7b | 7B | 8GB+ | Excellent Chinese capability |
| mistral | 7B | 8GB+ | Strong reasoning |
| llama3.1:70b | 70B | 64GB+ | Best performance |
View Downloaded Models
ollama list
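If you want more detail on a specific model before choosing it (parameter size, context length, and so on), newer Ollama versions can print a model overview:
# Show details for a downloaded model
ollama show llama3.2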
Configure in Chatbox
Step 1: Ensure Ollama is Running
Run this command in terminal to check:
curl http://localhost:11434
If it returns "Ollama is running", the service is working.
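You can also send a test prompt straight to Ollama's HTTP API, which is the same service Chatbox connects to. A minimal example using the /api/generate endpoint, assuming llama3.2 is already downloaded:
# Send one prompt to the local API and return the full response at once
curl http://localhost:11434/api/generate -d '{
  "model": "llama3.2",
  "prompt": "Why is the sky blue?",
  "stream": false
}'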
Step 2: Open Chatbox Settings
- Open the Chatbox app
- Click the "Settings" entry in the bottom left
- Select "AI Provider" or "Model Settings"
Step 3: Add Ollama
- Click "Add Provider"
- Select "Ollama"
- Set API Host: http://localhost:11434
- Save settings
Step 4: Select a Model
- Choose a downloaded model from the list
- Or manually enter the model name (e.g., llama3.2)
- Start chatting
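If you are unsure of the exact model name to enter, the names printed by ollama list are what Chatbox expects. You can also query them from the API:
# List locally available models; the "name" fields are valid model names
curl http://localhost:11434/api/tags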
Remote Connection Setup (Optional)
If you want to access Ollama on your computer from other devices (like phone, tablet):
Security Warning: Setting OLLAMA_HOST=0.0.0.0 exposes Ollama on all network interfaces. Ollama has no built-in authentication. Only use this on trusted local networks, and do not expose port 11434 to the public internet. Consider using a VPN or SSH tunnel for remote access.
Step 1: Configure Ollama Listen Address
macOS/Linux:
# Set environment variable
export OLLAMA_HOST=0.0.0.0
# Restart Ollama
ollama serve
Make it permanent:
Add export OLLAMA_HOST=0.0.0.0 to ~/.bashrc or ~/.zshrc
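For example, to append it to your shell profile (this assumes zsh; use ~/.bashrc for bash):
# Make OLLAMA_HOST persistent for new shell sessions
echo 'export OLLAMA_HOST=0.0.0.0' >> ~/.zshrc
source ~/.zshrc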
Windows:
- Add OLLAMA_HOST to the system environment variables with the value 0.0.0.0
- Restart the Ollama service
Step 2: Open Firewall Port
macOS: Usually no extra configuration needed
Windows:
- Open "Windows Firewall"
- Select "Advanced Settings" → "Inbound Rules"
- Create a new rule to open TCP port 11434
Linux:
# Ubuntu/Debian
sudo ufw allow 11434
# CentOS/RHEL
sudo firewall-cmd --add-port=11434/tcp --permanent
sudo firewall-cmd --reload
Step 3: Connect from Chatbox
In Chatbox on other devices:
- Add Ollama Provider
- Set API Host to: http://your-computer-IP:11434
- For example: http://192.168.1.100:11434
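Before configuring Chatbox, you can verify the connection from the other device (replace 192.168.1.100 with your computer's actual IP address; this is just the example above):
# From another device on the same network
curl http://192.168.1.100:11434
# Expected reply: Ollama is running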
Common Issues
Model Download is Slow
Solutions:
- Check network connection
- Try downloading during off-peak hours
- Use proxy or mirror
Out of Memory When Running Models
Solutions:
- Use smaller model versions (e.g., the :3b tag)
- Close other memory-intensive programs
- Add more RAM to your computer
Chatbox Cannot Connect to Ollama
Troubleshooting:
- Confirm Ollama service is running
- Check if API Host address is correct
- Try accessing http://localhost:11434 in a browser
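On macOS/Linux you can also check whether anything is listening on Ollama's port (a quick diagnostic; the output format varies by system):
# Check whether a process is listening on port 11434
lsof -i :11434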
Slow Model Response
Optimization tips:
- Use smaller models (e.g., 3B parameters)
- If you have a GPU, ensure Ollama is using GPU acceleration
- Reduce conversation context length
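To check whether a loaded model is actually using the GPU, newer Ollama versions include an ollama ps command; its output shows each loaded model and whether it is running on the GPU or the CPU:
# Show loaded models and whether they run on GPU or CPU
ollama ps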
How to Delete Models
ollama rm model-name
# For example
ollama rm llama3.2
Usage Tips
- Choose models based on hardware:
  - 8GB RAM: Use 3B-7B parameter models
  - 16GB RAM: Can try 13B parameter models
  - 32GB+ RAM: Can run larger models
- Leverage GPU acceleration:
  - NVIDIA GPU: Ollama uses it automatically
  - Apple Silicon Mac: Uses Metal acceleration
- Update models regularly: ollama pull model-name
- Explore the model library: Visit the Ollama model library to discover more models