LLM on Jetson Orin Nano

How to Run Ollama LLM on NVIDIA Jetson Orin Nano in 5 Minutes

If you’re looking to squeeze some serious edge AI performance out of the Jetson Orin Nano using Ollama and the compact yet capable Llama 3.2:1b model, you’re in the right place. This walkthrough will guide you step-by-step—from flashing your device to firing up Ollama in your terminal.


Getting Started: Flashing and Setting Up Your Jetson Orin Nano

Get your Jetson Orin Nano here: https://amzn.to/45HFZXj

First, get your Jetson board prepared:

  • Flash Jetson OS using SDK Manager: Use NVIDIA’s SDK Manager to flash the latest JetPack 6 (or 5.x if necessary) onto your Orin Nano. It ensures proper L4T support and CUDA compatibility (Altium, Ajeet Singh Raina).
  • Update the system: sudo apt update && sudo apt upgrade -y
  • Maximize performance: sudo nvpmodel -m 0, then sudo jetson_clocks

The first command switches the board into its maximum power mode, and jetson_clocks then locks the CPU and GPU clocks at their peak frequencies (NVIDIA Developer Forums).
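If you want to confirm the settings took effect, a quick sanity check with the stock JetPack tools looks like this (output labels vary slightly between JetPack releases):

# Query the active power mode (mode 0 / MAXN on the Orin Nano)
$ sudo nvpmodel -q

# Show the installed L4T release, which maps to your JetPack version
$ cat /etc/nv_tegra_release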


Step 1: Installing Ollama

You have two solid methods:

Option A: Native Install (Simplest)
This installs the ARM64 build of Ollama with JetPack support. Open a terminal and run:

$ curl -fsSL https://ollama.com/install.sh | sh

The installer sets up a systemd service, so Ollama starts automatically and is enabled at boot (NVIDIA Developer Forums, NVIDIA Jetson AI Lab).
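To confirm the install worked, you can check that the service is running and which version you got (the service name ollama comes from the install script):

# Verify the systemd service is active
$ systemctl status ollama --no-pager

# Print the installed Ollama version (0.4.2 or newer is needed for Llama 3.2)
$ ollama --version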

Option B: Using Docker via Jetson Containers

  • Clone and install jetson-containers:

$ git clone https://github.com/dusty-nv/jetson-containers.git
$ cd jetson-containers
$ sudo bash install.sh

  • Run the Ollama container:

$ jetson-containers run $(autotag ollama)

This ensures the container’s CUDA libraries match your JetPack/L4T release (Jeremy Morgan).
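The container keeps running in that terminal; if you want to reach the Ollama CLI inside it from another shell, one way is plain Docker (the container name is auto-generated, so look it up first):

# Find the running Ollama container's name
$ docker ps

# Run the Ollama CLI inside it (replace <container_name> with the name from docker ps)
$ docker exec -it <container_name> ollama run llama3.2:1b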


Step 2: Downloading Llama 3.2:1b

Inside your terminal (whether native Ollama or container), pull the model:

$ ollama pull llama3.2:1b

This pulls the 1.24B-parameter model (about 1.3 GB, Q8_0 quantized) (NVIDIA Developer Forums, Ollama).
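To double-check the download, you can list the models Ollama has stored locally and inspect the one you just pulled:

# List locally available models and their sizes
$ ollama list

# Show details (parameters, quantization) for the pulled model
$ ollama show llama3.2:1b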


Step 3: Running Llama 3.2:1b

Once downloaded:

$ ollama run llama3.2:1b

You’ll enter an interactive prompt to chat, ask questions, or test responses locally on your Orin Nano.
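Type /bye to leave the interactive session. The Ollama service also listens on localhost:11434, so you can script against its REST API instead of the prompt; a minimal example (the prompt text is just an illustration):

$ curl http://localhost:11434/api/generate -d '{
    "model": "llama3.2:1b",
    "prompt": "Explain edge AI in one sentence.",
    "stream": false
  }'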


Optional Step: Web Interface using Open WebUI

Prefer a browser-based interface? You can layer in Open WebUI.

If you used Docker:

docker run -d -p 3000:8080 --gpus=all \
  -v ollama:/root/.ollama \
  -v open-webui:/app/backend/data \
  --name open-webui \
  --restart always \
  ghcr.io/open-webui/open-webui:ollama

This enables GPU access, persists models and chat data in Docker volumes, and serves the GUI at http://<JETSON_IP>:3000 (or http://localhost:3000 on the device itself) (Altium, Collabnix, Ajeet Singh Raina).
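If you went with the native install instead, a common variant (sketched from the Open WebUI docs; adjust the URL if your networking differs) is to run the lighter :main image and point it at the Ollama service already on the host:

docker run -d -p 3000:8080 \
  --add-host=host.docker.internal:host-gateway \
  -e OLLAMA_BASE_URL=http://host.docker.internal:11434 \
  -v open-webui:/app/backend/data \
  --name open-webui --restart always \
  ghcr.io/open-webui/open-webui:main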


Notes & Troubleshooting

  • Ollama Version: Make sure your install or container includes Ollama 0.4.2 or newer; earlier versions may not run Llama 3.2 models correctly (NVIDIA Developer Forums).
  • Overcurrent Alerts: Some users have reported overcurrent or thermal throttling warnings, mostly with larger or vision-heavy models rather than the 1B text-only model, so keep an eye on temperatures under sustained load (NVIDIA Developer Forums).
  • GPU Detection: If Ollama doesn’t detect the GPU (especially with the native install), double-check the CUDA library paths or switch to the container route. A known GitHub issue reported “no GPU detected” errors that were typically resolved by running inside a proper container with GPU access (GitHub); a quick sanity check is shown below.
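As a quick sanity check on the native install, you can watch the service log for CUDA initialization messages and confirm GPU load while a prompt is running (the service name assumes the systemd unit created by the install script):

# Follow the Ollama service log and look for CUDA / GPU initialization lines
$ journalctl -u ollama -f

# In a second terminal, watch GPU (GR3D) utilization while the model answers a prompt
$ sudo tegrastats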

Quick Reference: Command Summary

# System setup
sudo apt update && sudo apt upgrade -y
sudo nvpmodel -m 0
sudo jetson_clocks

# Native install
curl -fsSL https://ollama.com/install.sh | sh

# Or Docker setup via Container
git clone https://github.com/dusty-nv/jetson-containers.git
cd jetson-containers
sudo bash install.sh
jetson-containers run $(autotag ollama)

# Pull and run Llama 3.2:1b
ollama pull llama3.2:1b
ollama run llama3.2:1b

# Optional GUI
docker run -d -p 3000:8080 --gpus=all \
  -v ollama:/root/.ollama \
  -v open-webui:/app/backend/data \
  --name open-webui --restart always \
  ghcr.io/open-webui/open-webui:ollama

You’re now primed to run Meta’s Llama 3.2:1b straight from the terminal—or through a slick web UI—on your Jetson Orin Nano. It’s remarkable how a compact edge device can host a responsive language model entirely offline. Dive in, experiment, and see what playful queries or creative projects you unlock next.
