If you’re looking to squeeze some serious edge AI performance out of the Jetson Orin Nano using Ollama and the compact yet capable Llama 3.2:1b model, you’re in the right place. This walkthrough will guide you step-by-step—from flashing your device to firing up Ollama in your terminal.
Getting Started: Flashing and Setting Up Your Jetson Orin Nano
Get your Jetson Orin Nano here: https://amzn.to/45HFZXj
First, get your Jetson board prepared:
- Flash with SDK Manager: Use NVIDIA’s SDK Manager to flash the latest JetPack 6 (or 5.x if necessary) onto your Orin Nano. This ensures proper L4T support and CUDA compatibility (Altium, Ajeet Singh Raina).
- Update the system:
sudo apt update && sudo apt upgrade -y
- Maximize performance:
sudo nvpmodel -m 0
sudo jetson_clocks
These commands set the device to max power mode and lock in peak performance (NVIDIA Developer Forums).
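To verify the settings took effect, you can query the active power mode and watch live utilization with the tools that ship with JetPack:
# Confirm the active power mode (mode 0 / MAXN)
sudo nvpmodel -q
# Watch CPU/GPU load, temperatures, and power draw while you test
sudo tegrastats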
Step 1: Installing Ollama
You have two solid methods:
Option A: Native Install (Simplest)
Open a terminal and run the official install script; it installs the ARM64 build of Ollama with JetPack support.
$ curl -fsSL https://ollama.com/install.sh | sh
The script registers Ollama as a systemd service and starts it automatically (NVIDIA Developer Forums, NVIDIA Jetson AI Lab).
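You can confirm the service is running and check the installed version right away:
# Check that the systemd service is active
systemctl status ollama
# Print the installed Ollama version
ollama --version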
Option B: Using Docker via Jetson Containers
- Clone and install jetson-containers:
git clone https://github.com/dusty-nv/jetson-containers.git
cd jetson-containers
sudo bash install.sh
- Run Ollama container:
jetson-containers run $(autotag ollama)
The jetson-containers tooling picks an image matched to your JetPack/L4T release, which avoids CUDA compatibility headaches (Jeremy Morgan).
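Whichever route you chose, you can confirm the Ollama server is listening on its default port (11434) from a second terminal using Ollama’s REST API:
# Should return a small JSON object with the server version
curl http://localhost:11434/api/version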
Step 2: Downloading Llama 3.2:1b
Inside your terminal (whether native Ollama or container), pull the model:
$ ollama pull llama3.2:1b
This pulls the 1.24B-parameter model (about 1.3 GB, Q8_0 quantized) (NVIDIA Developer Forums, Ollama).
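Once the download finishes, list your local models to confirm it landed and see how much disk space it takes:
# Show locally available models and their sizes
ollama list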
Step 3: Running Llama 3.2:1b
Once downloaded:
$ ollama run llama3.2:1b
You’ll enter an interactive prompt to chat, ask questions, or test responses locally on your Orin Nano.
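If you would rather script it than chat interactively, you can pass a prompt directly on the command line or call Ollama’s local REST API (default port 11434); the prompt below is just an example:
# One-shot prompt from the shell
ollama run llama3.2:1b "Explain what an edge AI device is in one sentence."
# The same request via the REST API, non-streaming
curl http://localhost:11434/api/generate -d '{
  "model": "llama3.2:1b",
  "prompt": "Explain what an edge AI device is in one sentence.",
  "stream": false
}'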
Optional Step: Web Interface using Open WebUI
Prefer a browser-based interface? You can layer in Open WebUI.
If you used Docker:
docker run -d -p 3000:8080 --gpus=all \
-v ollama:/root/.ollama \
-v open-webui:/app/backend/data \
--name open-webui \
--restart always \
ghcr.io/open-webui/open-webui:ollama
This binds GPU support, persists models, and serves the GUI at http://<JETSON_IP>:3000 (or http://localhost:3000 if you are browsing on the Jetson itself) (Altium, Collabnix, Ajeet Singh Raina).
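If you went with the native install instead (Option A), a lighter option is to run the standard Open WebUI image and point it at the Ollama server already running on the host. Treat this as a sketch based on the pattern Open WebUI documents; adjust the URL if your Ollama listens elsewhere:
docker run -d -p 3000:8080 \
--add-host=host.docker.internal:host-gateway \
-e OLLAMA_BASE_URL=http://host.docker.internal:11434 \
-v open-webui:/app/backend/data \
--name open-webui \
--restart always \
ghcr.io/open-webui/open-webui:main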
Notes & Troubleshooting
- Ollama Version: Make sure your install or container includes Ollama 0.4.2 or newer; earlier versions may not run Llama 3.2 models correctly (NVIDIA Developer Forums). A quick version check is included in the snippet after this list.
- Overcurrent Alerts: Some users have hit thermal or power throttling warnings, mostly with larger or vision-heavy models rather than the 1B text-only model, but it is worth keeping an eye on temperatures during sustained loads (NVIDIA Developer Forums).
- GPU Detection: If Ollama doesn’t detect the GPU (especially with the native install), double-check your CUDA library paths or switch to the jetson-containers route with GPU access. A known GitHub issue reported “no GPU detected” errors that were typically resolved by running inside a properly configured container (GitHub). The snippet below shows how to see what Ollama found at startup.
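Two quick checks, assuming the native systemd install (for the container route, inspect the Ollama container’s logs with docker logs instead):
# Confirm the version is 0.4.2 or newer
ollama --version
# Scan the server log for CUDA/GPU detection messages from startup
journalctl -u ollama --no-pager | grep -iE "cuda|gpu" | tail -n 20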
Quick Reference: Command Summary
# System setup
sudo apt update && sudo apt upgrade -y
sudo nvpmodel -m 0
sudo jetson_clocks
# Native install
curl -fsSL https://ollama.com/install.sh | sh
# Or: Docker setup via jetson-containers
git clone https://github.com/dusty-nv/jetson-containers.git
cd jetson-containers
sudo bash install.sh
jetson-containers run $(autotag ollama)
# Pull and run Llama 3.2:1b
ollama pull llama3.2:1b
ollama run llama3.2:1b
# Optional GUI
docker run -d -p 3000:8080 --gpus=all \
-v ollama:/root/.ollama \
-v open-webui:/app/backend/data \
--name open-webui --restart always \
ghcr.io/open-webui/open-webui:ollama
You’re now primed to run Meta’s Llama 3.2:1b straight from the terminal—or through a slick web UI—on your Jetson Orin Nano. It’s remarkable how a compact edge device can host a responsive language model entirely offline. Dive in, experiment, and see what playful queries or creative projects you unlock next.