I Put a Brain in a Potato-Powered Computer
Local LLM. Local speech recognition. Local existential dread when it tells you the meaning of life is "touch grass, bestie."
The Story
Post-Dinner Project Spiral
I had a Raspberry Pi 5 sitting on my desk doing nothing. It's been through a few half-finished projects... a home server that was overkill, a retro gaming setup I never touched. After dinner tonight, I decided to see how far I could push it.
The question: could I build a fully offline voice assistant? Not "offline but phones home for the hard parts," but actually offline. Speech recognition, language model, everything running on this tiny $80 computer with no internet connection. I genuinely didn't know if it would work.
The Pi has 8GB of RAM. That sounds like a lot, but the OS takes a chunk, the speech-to-text model needs memory, and then whatever's left has to run an actual language model. It's tight. Most tutorials I found assumed you had a beefy GPU or were calling cloud APIs.
But I had a touchscreen attached, which got me thinking about the interface. If I'm building something for a little screen with no keyboard, it should feel different than a terminal app. Falling green code, CRT scanlines, typewriter text... the Matrix aesthetic was mostly a joke, but it stuck.
The Experience
Tap. Speak. Receive.
The first time it worked, actually worked, was surreal. I tapped the screen, asked a question, watched the little red recording dot, then saw text start appearing character by character. No loading spinner. No network request. Just the Pi doing math and generating a response from nothing but local compute.
I spent the next hour asking it increasingly weird questions. (You can change the system prompt to make it talk like a Gen Z teenager, which is exactly as funny as you'd expect. "Bestie that's giving main character energy fr fr.")
The Interesting Part
Knowing When You've Stopped Talking
Here's a problem I didn't think about until I was neck-deep in code: how does the device know when you're done speaking?
You could make the user tap again to stop recording, but that's clunky. Push-to-talk works for walkie-talkies, not for something that's supposed to feel magical. The answer is silence detection. You listen for audio, track when sound stops, and after about 1.5 seconds of quiet, you assume they're done.
The trick is using a callback-based audio stream. Instead of recording for a fixed duration, you process audio in chunks as it comes in. Each chunk, you check the peak amplitude. If it's above a threshold, someone's talking. If it drops below and stays there, they've stopped.
def callback(indata, frames, time_info, status):
    nonlocal has_speech, silence_start
    state.recording_data.append(indata.copy())
    # Check audio level
    level = np.max(np.abs(indata))
    if level > SILENCE_THRESHOLD:
        has_speech = True
        silence_start = None
    elif has_speech:
        # Started hearing silence after speech
        if silence_start is None:
            silence_start = time.time()
        elif time.time() - silence_start > SILENCE_DURATION:
            # 1.5 seconds of silence = done talking
            state.stop_recording = True

The SILENCE_THRESHOLD took some tuning. Too low and background noise triggers it. Too high and quiet speakers get cut off. 500 worked well for my USB mic in a normal room.
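For context, here's a minimal, self-contained sketch of how that callback pattern can be wrapped around a sounddevice input stream. The function name, the 16 kHz mono int16 settings, and the polling loop are my assumptions; the threshold and silence duration come from the tuning above.

import time
import numpy as np
import sounddevice as sd

SILENCE_THRESHOLD = 500   # peak amplitude below this counts as quiet (tuned above)
SILENCE_DURATION = 1.5    # seconds of quiet that ends the recording

def record_until_silence(samplerate=16000):
    chunks = []
    has_speech = False
    silence_start = None
    done = False

    def callback(indata, frames, time_info, status):
        nonlocal has_speech, silence_start, done
        chunks.append(indata.copy())
        level = np.max(np.abs(indata))
        if level > SILENCE_THRESHOLD:
            has_speech = True
            silence_start = None
        elif has_speech:
            if silence_start is None:
                silence_start = time.time()
            elif time.time() - silence_start > SILENCE_DURATION:
                done = True

    # The stream runs in a background thread; the callback does the real work
    with sd.InputStream(samplerate=samplerate, channels=1,
                        dtype='int16', callback=callback):
        while not done:
            time.sleep(0.05)

    return np.concatenate(chunks)  # one mono int16 array for speech-to-text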
The Numbers
Dinner to 1am
Raw Input
Touch Events on Linux
Pygame can handle touch events, but I wanted more control. Specifically, I wanted to detect how long someone was touching the screen. A tap should start recording. A double-tap should clear the conversation. A long press (2 seconds) should exit the app.
The evdev library gives you raw input events straight from the kernel. Touch down, touch up, with timestamps. You can calculate duration, detect patterns, whatever you need.
def touch_handler(dev, running):
    for ev in dev.read_loop():
        if not running[0]: break
        # Raw touch events: type=1, code=330
        if ev.type == 1 and ev.code == 330:
            if ev.value == 1:  # Touch down
                state.is_touching = True
                state.touch_start_time = time.time()
            else:  # Touch up
                dur = time.time() - state.touch_start_time
                state.is_touching = False
                if dur >= LONG_PRESS_TIME:
                    running[0] = False  # Exit app
                elif time.time() - state.last_tap_time < DOUBLE_TAP_TIME:
                    state.messages = []  # Double-tap: clear
                elif state.status == "ready":
                    # Single tap: start listening
                    threading.Thread(target=voice_flow).start()
                state.last_tap_time = time.time()

The magic numbers: type=1 means it's a key/button event (EV_KEY in evdev terms), and code=330 is the touch contact code (BTN_TOUCH). value=1 means press, value=0 means release.
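For completeness, here's a hedged sketch of the setup I'd expect around that handler: scan the evdev devices for one that reports BTN_TOUCH, then run the handler in a background thread. The device scan, the DOUBLE_TAP_TIME value, and the thread wiring are my assumptions; the 2-second long press matches the prose above.

import threading
from evdev import InputDevice, list_devices, ecodes

LONG_PRESS_TIME = 2.0   # seconds, per the long-press behaviour described above
DOUBLE_TAP_TIME = 0.4   # seconds between taps; my guess, tune to taste

def find_touchscreen():
    # Pick the first input device that reports touch contacts (BTN_TOUCH)
    for path in list_devices():
        dev = InputDevice(path)
        keys = dev.capabilities().get(ecodes.EV_KEY, [])
        if ecodes.BTN_TOUCH in keys:
            return dev
    raise RuntimeError("no touch device found")

dev = find_touchscreen()
running = [True]  # shared mutable flag so the handler can be stopped
threading.Thread(target=touch_handler, args=(dev, running), daemon=True).start()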
Streaming
The Typewriter Effect
Language models generate text token by token. If you wait for the full response before showing anything, the user stares at a blank screen for several seconds. That's death for the "mystical oracle" vibe.
Ollama supports streaming, so you get each token as it's generated. I append them to the message object in real-time, then the UI loop handles the actual display with a separate typewriter effect (18ms between characters feels right).
# Stream tokens as they generate
resp = requests.post('http://localhost:11434/api/generate',
    json={
        'model': 'smollm2:1.7b',
        'prompt': full_prompt,
        'stream': True,
        'options': {
            'num_ctx': 512,       # Context window
            'num_predict': 300,   # Max tokens
            'temperature': 0.3    # Lower = more focused
        }
    }, stream=True)

oracle_msg = Message('oracle', '')
state.messages.append(oracle_msg)

for line in resp.iter_lines():
    if line:
        data = json.loads(line)
        if 'response' in data:
            oracle_msg.text += data['response']
        if data.get('done'):
            break

# In the main loop - typewriter effect
for msg in state.messages:
    if msg.role == 'oracle' and msg.displayed_chars < len(msg.text):
        if pygame.time.get_ticks() - state.last_type_time > 18:
            msg.displayed_chars += 1
            state.last_type_time = pygame.time.get_ticks()

The two-stage approach (streaming into the message, then displaying with a delay) means the text appears smoothly even if tokens arrive in bursts. It also lets me hide the cursor while text is generating. Small touch, but it makes the interface feel more polished.
Model Selection
Finding a Brain That Fits
Ollama makes model swapping trivial. ollama pull modelname and you're good. I tried probably five or six before landing on one that worked.
Llama 3.2 3B was my first attempt. Smart, coherent, but agonizingly slow. Like, you could count to three between each word. Not great for conversation flow.
SmolLM2 1.7B was the winner. I'd never heard of it before this project. It's smaller, faster, and still surprisingly coherent. The responses aren't as nuanced as a bigger model, but for a fun voice assistant? More than good enough.
- ✗ Llama 3.2:3b: Smart but glacially slow. Multi-second pauses between tokens. The mystical vibe died waiting.
- ~ Llama 3.2:1b: Fast and a bit unhinged. Good for the Gen Z personality mode where chaos is the point.
- ✓ SmolLM2:1.7b: The sweet spot. Quick responses, coherent enough to be useful, light enough to leave headroom.
Challenges
What Went Wrong
Hardware projects are humbling. Software errors give you stack traces. Hardware errors give you silence and confusion.
The Pivot
What If It Was Actually Useful?
The novelty of asking a fake oracle about the meaning of life wears off eventually. But I kept staring at the thing, thinking: I have a portable, offline, voice-in-text-out device. What would I actually want to use this for?
One of my projects is OslerAI, an AI platform for medical education. And one thing that comes up constantly in healthcare is reference lookups... drug doses, lab ranges, clinical guidelines. Healthcare workers pull out their phone, unlock it, find the app, type the query... it's a lot of friction for a quick answer.
What if you could just ask? Tap, speak, answer. "Normal potassium range?" → "3.5 to 5.0 millimoles per liter." No unlocking, no typing, no cloud latency.
I cloned the project, swapped the system prompt, and called it Pocket Doctor. Same hardware, completely different personality: "You're a clinical reference tool. Canadian context. SI units. Be concise. No disclaimers."
Pocket Doctor Mode
- Canadian clinical context (SI units: mmol/L, μmol/L)
- Concise 1-2 sentence answers, no "As an AI..." hedging
- Drug doses, lab values, vital sign ranges
- Teal-on-dark UI, easier on tired eyes
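Concretely, the personality swap is just a different system prompt stitched onto the transcribed question before it hits Ollama. Here's a rough sketch of what that might look like; the template, variable names, and oracle wording are mine, with the Pocket Doctor text paraphrased from above.

ORACLE_PROMPT = "You are a mystical oracle. Answer in short, vivid sentences."
POCKET_DOCTOR_PROMPT = ("You're a clinical reference tool. Canadian context. "
                        "SI units. Be concise. No disclaimers.")

def build_prompt(system_prompt, question):
    # One string for Ollama's /api/generate; swapping the personality
    # is just swapping the first block of text
    return f"{system_prompt}\n\nQuestion: {question}\nAnswer:"

full_prompt = build_prompt(POCKET_DOCTOR_PROMPT, "Normal potassium range?")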
The Stack
What's Running
Hardware
- Raspberry Pi 5: 8GB RAM model
- 7" Touchscreen: official RPi display
- USB Microphone: generic, works fine
- Active Cooler: essential for sustained inference
Software
- Pygame: UI, touch events, rendering
- sounddevice: audio capture with callbacks
- Faster-Whisper: local speech-to-text (see the sketch below)
- Ollama: local LLM runtime
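Faster-Whisper is the one piece that never appears in the snippets above, so here's a hedged sketch of roughly how it slots in between the recorded audio and the Ollama request. The model size, the CPU/int8 settings, and the helper's shape are my assumptions.

import numpy as np
from faster_whisper import WhisperModel

# Small English-only model, quantized for a Pi-class CPU (assumed settings)
stt = WhisperModel("tiny.en", device="cpu", compute_type="int8")

def transcribe(audio_int16, samplerate=16000):
    # faster-whisper expects float32 mono audio at 16 kHz
    audio = audio_int16.astype(np.float32).flatten() / 32768.0
    segments, _info = stt.transcribe(audio, language="en")
    return " ".join(seg.text.strip() for seg in segments)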
What's Next
The To-Do List
Right now it's a naked Pi with a screen and cables everywhere. Not exactly portable. The dream is to make it actually pocketable.
- Enclosure: 3D print something that looks like a product instead of a science project.
- Battery: A UPS HAT or chunky LiPo to cut the power cable.
- Better Models: Small models are improving fast. Ready to swap in whatever drops next.
The dream: A pocketable, battery-powered, fully offline AI assistant. Star Trek tricorder energy... except it dispenses clinical reference info and occasionally speaks in Gen Z slang.
Takeaway
The Point
I started this project on a whim after dinner. By 1am I had a working voice assistant that runs entirely offline on an $80 computer.
That's the part that surprised me. A few years ago, this would have required a server rack and a team of engineers. Now it's a one-night project with off-the-shelf hardware and open-source models. The barrier to building local AI keeps dropping.
Is this production-ready? No. Would I trust Pocket Doctor in an actual clinical setting? Absolutely not; always verify with real sources. But as a proof of concept for what's possible with local compute? I think there's something here.
Seacrest out.
Brian Stever