I Put a Local AI on a Raspberry Pi and It Kept Getting Warm
Brian Stever
2025 · Raspberry Pi 5, Python, Pygame, Ollama, faster-whisper
Abstract. Pi Oracle is an on-device voice assistant built for Raspberry Pi 5. What started as one app turned into a small family of local assistants sharing the same hardware constraints. oracle.py is the theatrical Matrix version, pocket_doctor.py is the more clinically useful nursing reference, and voice_assistant.py pushes the same idea with a heavier model. The core question stayed the same: how much voice-assistant experience can you fit onto a tiny Linux box before the box starts radiating concern?
1. Motivation
I built the first version of this because I wanted a voice assistant that didn't send my questions to a server farm in Virginia. That's the principled explanation. The more honest one is that I'd been playing with Ollama on my laptop and started wondering what would happen if I tried to run it on the smallest computer I owned. Curiosity and a Raspberry Pi 5 are a dangerous combination, especially when the Pi is just sitting on your desk looking available.
Most voice assistants feel useful in the way hotels feel luxurious: the experience works, but only because a great deal of invisible infrastructure stands behind the curtain. I wanted to see how much of that could be reproduced on a single device with no cloud calls. The Pi 5 was the right target because it's powerful enough to tempt you and constrained enough to punish you. That tension made the project attractive.
2. How It Grew
What started as one persona kept splitting into more. The Matrix-flavored oracle was the first. It uses llama3.2:1b with a tiny Whisper model and an aggressively unserious personality prompt. But once the hardware setup worked, I started wondering what else it could be. The nursing-focused Pocket Doctor variant came next, switching to smollm2:1.7b and a clinical voice I built while working on OslerAI. A third version uses llama3.2:3b for fuller answers. Same box, same touchscreen, different social ambitions.
That evolution matters because it turned a single novelty interface into a small design study. What changes when the exact same constrained hardware is asked to feel mystical in one mode and clinically useful in another? Quite a lot, and almost none of it has to do with adding more buttons.
Table 1. Variants.
| Variant | Models | Personality |
|---|---|---|
| Oracle | llama3.2:1b + tiny Whisper | Matrix-style, Gen Z-coded assistant |
| Pocket Doctor | smollm2:1.7b + base Whisper | Clinical nursing reference |
| Voice Assistant | llama3.2:3b | Heavier clinical build |
3. Interaction Model
The Oracle interface itself is intentionally minimal. A tap begins listening. Speech is transcribed locally, routed through a local language model, and rendered back as text on a full-screen pseudo-terminal. Long press exits. Double tap clears history. That is more or less the whole language of the device.
Minimalism was partly aesthetic and partly tactical. I did not have room for a heavyweight UI layer, and I also did not want the interface to explain itself too much. The device needed to feel legible without becoming chatty. A blinking cursor, a recording state, and a visible stream of words were enough.
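That three-gesture vocabulary (tap, double tap, long press) is mostly timing logic, independent of any UI toolkit. A minimal sketch of how such a classifier might look; the function name and the threshold values are my assumptions, not the actual constants in oracle.py:

```python
# Hypothetical thresholds; the real values in oracle.py may differ.
LONG_PRESS_S = 1.0    # hold at least this long to exit
DOUBLE_TAP_S = 0.35   # second tap within this window clears history

def classify_gesture(press_t, release_t, prev_tap_t=None):
    """Map raw touch timestamps onto the device's three-gesture vocabulary."""
    if release_t - press_t >= LONG_PRESS_S:
        return 'exit'            # long press
    if prev_tap_t is not None and press_t - prev_tap_t <= DOUBLE_TAP_S:
        return 'clear_history'   # double tap
    return 'listen'              # single tap begins listening
```

In a Pygame main loop this would be fed from MOUSEBUTTONDOWN/MOUSEBUTTONUP events; keeping the classification pure makes it trivial to test off-device.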
4. Implementation Constraints
The main constraint was memory. An 8GB Raspberry Pi sounds generous until the operating system, audio pipeline, screen handling, transcription model, and language model all want their share at once. Model choice therefore became a systems-design decision rather than a pure quality decision.
The Matrix Oracle opts for faster, lighter inference with llama3.2:1b and tiny Whisper; Pocket Doctor spends a bit more budget on transcription quality and uses smollm2:1.7b. This is the kind of product decision that sounds philosophical until you are standing over a small hot computer at 11 PM wondering whether the better model is worth the extra three seconds of latency.
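The budget math behind that decision is rough but clarifying. A back-of-envelope estimate, assuming 4-bit quantization (Ollama's usual default) and an overhead factor that is purely my guess; real usage depends on quantization scheme and context length:

```python
def est_model_ram_gb(params_billion, bits_per_weight=4, overhead=1.2):
    """Rough resident-memory estimate for a quantized model.

    overhead loosely covers KV cache and runtime buffers; 1.2 is a guess.
    """
    return params_billion * bits_per_weight / 8 * overhead

# llama3.2:1b vs llama3.2:3b on an 8 GB Pi that also has to fit the OS,
# Whisper, and the UI:
small = est_model_ram_gb(1.0)  # roughly 0.6 GB
large = est_model_ram_gb(3.0)  # roughly 1.8 GB
```

Even with generous error bars, the gap between the 1b and 3b models is the difference between comfortable headroom and a box that swaps under load.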
Recording stops on sustained silence rather than a second tap. A sounddevice-style callback watches the input level:

```python
import time
import numpy as np

def callback(indata, frames, time_info, status):
    """Audio callback: buffer input and stop after sustained silence."""
    global has_speech, silence_start
    state.recording_data.append(indata.copy())
    level = np.max(np.abs(indata))
    if level > SILENCE_THRESHOLD:
        has_speech = True
        silence_start = None
    else:
        if silence_start is None:
            silence_start = time.time()  # mark where the silence began
        if has_speech and time.time() - silence_start > SILENCE_DURATION:
            state.stop_recording = True
```

Once a transcript exists, the response is streamed token by token from Ollama's local HTTP API so text appears on screen as it is generated:

```python
import json
import requests

resp = requests.post(
    'http://localhost:11434/api/generate',
    json={'model': 'llama3.2:1b', 'prompt': full_prompt, 'stream': True},
    stream=True,
)
for line in resp.iter_lines():
    if line:
        token = json.loads(line)['response']
        oracle_msg.text += token  # render each token as it arrives
```

5. Operational Problems
The first problem was feedback. A conventional app can use loaders, progress bars, and microcopy to tell the user what is happening. A terminal-style object has fewer options. I ended up relying on a small visual vocabulary: red or active states while listening, a processing indicator during inference, and a blinking cursor when idle. It is a tiny UX system, but it does real work.
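That visual vocabulary fits in a handful of states. A minimal sketch of the mapping; the state names, glyphs, and colors here are illustrative, not the actual values in oracle.py:

```python
from enum import Enum, auto

class UIState(Enum):
    IDLE = auto()       # blinking cursor
    LISTENING = auto()  # red/active indicator
    THINKING = auto()   # processing indicator during inference

def status_cue(state, blink_on):
    """Return (glyph, rgb) for the screen's status indicator."""
    if state is UIState.LISTENING:
        return ('REC', (255, 0, 0))
    if state is UIState.THINKING:
        return ('...', (0, 255, 0))
    # idle: the cursor blinks on a timer driven by the main loop
    return ('_' if blink_on else ' ', (0, 255, 0))
```

Centralizing the state-to-cue mapping in one function keeps the render loop dumb: it only needs to ask what to draw, never why.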
The second problem was heat. Speech recognition plus local inference is enough to make the Pi work very hard for a very small box. Active cooling became part of the architecture. There is something charming about a mystical oracle that also needs a fan, but it does make the illusion slightly more mechanical.
The third problem was development ergonomics. A fullscreen Pygame UI on a Pi touchscreen is not friendly when it crashes. Remote access via VNC became essential because it let me keep terminal visibility while still working on the device interface. In other words, the magic mirror was debugged like a normal Linux box, which is probably healthy.
6. Reflection
Pi Oracle convinced me that local AI products become more interesting, not less, when the hardware is constrained. Scarcity forces taste. You have to decide what the device is actually for, what feedback matters, and which model is good enough rather than theoretically best. Those constraints come free with a $60 computer.
It also confirmed that I enjoy building objects that feel a little theatrical as long as the theatrics serve the interaction. This one is part assistant, part prop, part clinical reference, and part thermal management exercise. I showed it to my mom and she asked it a medical question. It gave a reasonable answer. Then the fan got loud and she asked if it was okay. I am fond of all parts of this story.