I Put a Brain in a Potato-Powered Computer
Local LLM. Local speech recognition. Local existential dread when it tells you the meaning of life is "touch grass, bestie."
The Story
Post-Dinner Project Spiral
I had a Raspberry Pi 5 sitting on my desk doing nothing. It's been through a few half-finished projects... a home server that was overkill, a retro gaming setup I never touched. After dinner tonight, I decided to see how far I could push it.
The question: could I build a fully offline voice assistant? Not "offline but phones home for the hard parts," but actually offline. Speech recognition, language model, everything running on this tiny $80 computer with no internet connection. I genuinely didn't know if it would work.
The Pi has 8GB of RAM. That sounds like a lot, but the OS takes a chunk, the speech-to-text model needs memory, and then whatever's left has to run an actual language model. It's tight. Most tutorials I found assumed you had a beefy GPU or were calling cloud APIs.
But I had a touchscreen attached, which got me thinking about the interface. If I'm building something for a little screen with no keyboard, it should feel different than a terminal app. Falling green code, CRT scanlines, typewriter text... the Matrix aesthetic was mostly a joke, but it stuck.
The Experience
Tap. Speak. Receive.
The first time it worked, actually worked, was surreal. I tapped the screen, asked a question, watched the little red recording dot, then saw text start appearing character by character. No loading spinner. No network request. Just the Pi doing math and generating a response from nothing but local compute.
I spent the next hour asking it increasingly weird questions. (You can change the system prompt to make it talk like a Gen Z teenager, which is exactly as funny as you'd expect. "Bestie that's giving main character energy fr fr.")
The Interesting Part
Knowing When You've Stopped Talking
Here's a problem I didn't think about until I was neck-deep in code: how does the device know when you're done speaking?
You could make the user tap again to stop recording, but that's clunky. Push-to-talk works for walkie-talkies, not for something that's supposed to feel magical. The answer is silence detection. You listen for audio, track when sound stops, and after about 1.5 seconds of quiet, you assume they're done.
The trick is using a callback-based audio stream. Instead of recording for a fixed duration, you process audio in chunks as it comes in. Each chunk, you check the peak amplitude. If it's above a threshold, someone's talking. If it drops below and stays there, they've stopped.
def callback(indata, frames, time_info, status):
    nonlocal has_speech, silence_start
    state.recording_data.append(indata.copy())
    # Check audio level
    level = np.max(np.abs(indata))
    if level > SILENCE_THRESHOLD:
        has_speech = True
        silence_start = None
    elif has_speech:
        # Started hearing silence after speech
        if silence_start is None:
            silence_start = time.time()
        elif time.time() - silence_start > SILENCE_DURATION:
            # 1.5 seconds of silence = done talking
            state.stop_recording = True

The SILENCE_THRESHOLD took some tuning. Too low and background noise triggers it. Too high and quiet speakers get cut off. 500 worked well for my USB mic in a normal room.
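For context, here's a minimal, self-contained sketch of how that callback pattern can be wrapped around a sounddevice input stream. The function name, the 16 kHz mono int16 settings, and the polling loop are my assumptions; the threshold and silence duration come from the tuning above.

import time
import numpy as np
import sounddevice as sd

SILENCE_THRESHOLD = 500   # peak amplitude below this counts as quiet (tuned above)
SILENCE_DURATION = 1.5    # seconds of quiet that ends the recording

def record_until_silence(samplerate=16000):
    chunks = []
    has_speech = False
    silence_start = None
    done = False

    def callback(indata, frames, time_info, status):
        nonlocal has_speech, silence_start, done
        chunks.append(indata.copy())
        level = np.max(np.abs(indata))
        if level > SILENCE_THRESHOLD:
            has_speech = True
            silence_start = None
        elif has_speech:
            if silence_start is None:
                silence_start = time.time()
            elif time.time() - silence_start > SILENCE_DURATION:
                done = True

    # The stream runs in a background thread; the callback does the real work
    with sd.InputStream(samplerate=samplerate, channels=1,
                        dtype='int16', callback=callback):
        while not done:
            time.sleep(0.05)

    return np.concatenate(chunks)  # one mono int16 array for speech-to-text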
The Numbers
Dinner to 1am
Raw Input
Touch Events on Linux
Pygame can handle touch events, but I wanted more control. Specifically, I wanted to detect how long someone was touching the screen. A tap should start recording. A double-tap should clear the conversation. A long press (2 seconds) should exit the app.
The evdev library gives you raw input events straight from the kernel. Touch down, touch up, with timestamps. You can calculate duration, detect patterns, whatever you need.
def touch_handler(dev, running):
    for ev in dev.read_loop():
        if not running[0]: break
        # Raw touch events: type=1, code=330
        if ev.type == 1 and ev.code == 330:
            if ev.value == 1:  # Touch down
                state.is_touching = True
                state.touch_start_time = time.time()
            else:  # Touch up
                dur = time.time() - state.touch_start_time
                state.is_touching = False
                if dur >= LONG_PRESS_TIME:
                    running[0] = False  # Exit app
                elif time.time() - state.last_tap_time < DOUBLE_TAP_TIME:
                    state.messages = []  # Double-tap: clear
                elif state.status == "ready":
                    # Single tap: start listening
                    threading.Thread(target=voice_flow).start()
                state.last_tap_time = time.time()

The magic numbers: type=1 means it's a key/button event (EV_KEY in evdev terms), and code=330 is the touch contact code (BTN_TOUCH). value=1 means press, value=0 means release.
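For completeness, here's a hedged sketch of the setup I'd expect around that handler: scan the evdev devices for one that reports BTN_TOUCH, then run the handler in a background thread. The device scan, the DOUBLE_TAP_TIME value, and the thread wiring are my assumptions; the 2-second long press matches the prose above.

import threading
from evdev import InputDevice, list_devices, ecodes

LONG_PRESS_TIME = 2.0   # seconds, per the long-press behaviour described above
DOUBLE_TAP_TIME = 0.4   # seconds between taps; my guess, tune to taste

def find_touchscreen():
    # Pick the first input device that reports touch contacts (BTN_TOUCH)
    for path in list_devices():
        dev = InputDevice(path)
        keys = dev.capabilities().get(ecodes.EV_KEY, [])
        if ecodes.BTN_TOUCH in keys:
            return dev
    raise RuntimeError("no touch device found")

dev = find_touchscreen()
running = [True]  # shared mutable flag so the handler can be stopped
threading.Thread(target=touch_handler, args=(dev, running), daemon=True).start()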
Streaming
The Typewriter Effect
Language models generate text token by token. If you wait for the full response before showing anything, the user stares at a blank screen for several seconds. That's death for the "mystical oracle" vibe.
Ollama supports streaming, so you get each token as it's generated. I append them to the message object in real-time, then the UI loop handles the actual display with a separate typewriter effect (18ms between characters feels right).
# Stream tokens as they generate
resp = requests.post('http://localhost:11434/api/generate',
    json={
        'model': 'smollm2:1.7b',
        'prompt': full_prompt,
        'stream': True,
        'options': {
            'num_ctx': 512,       # Context window
            'num_predict': 300,   # Max tokens
            'temperature': 0.3    # Lower = more focused
        }
    }, stream=True)

oracle_msg = Message('oracle', '')
state.messages.append(oracle_msg)

for line in resp.iter_lines():
    if line:
        data = json.loads(line)
        if 'response' in data:
            oracle_msg.text += data['response']
        if data.get('done'):
            break

# In the main loop - typewriter effect
for msg in state.messages:
    if msg.role == 'oracle' and msg.displayed_chars < len(msg.text):
        if pygame.time.get_ticks() - state.last_type_time > 18:
            msg.displayed_chars += 1
            state.last_type_time = pygame.time.get_ticks()

The two-stage approach (streaming into the message, then displaying with a delay) means the text appears smoothly even if tokens arrive in bursts. It also lets me hide the cursor while text is generating. Small touch, but it makes the interface feel more polished.
Model Selection
Finding a Brain That Fits
Ollama makes model swapping trivial. ollama pull modelname and you're good. I tried probably five or six before landing on one that worked.
Llama 3.2 3B was my first attempt. Smart, coherent, but agonizingly slow. Like, you could count to three between each word. Not great for conversation flow.
SmolLM2 1.7B was the winner. I'd never heard of it before this project. It's smaller, faster, and still surprisingly coherent. The responses aren't as nuanced as a bigger model, but for a fun voice assistant? More than good enough.
- ✗ Llama 3.2:3b: Smart but glacially slow. Multi-second pauses between tokens. The mystical vibe died waiting.
- ~ Llama 3.2:1b: Fast and a bit unhinged. Good for the Gen Z personality mode where chaos is the point.
- ✓ SmolLM2:1.7b: The sweet spot. Quick responses, coherent enough to be useful, light enough to leave headroom.
Challenges
What Went Wrong
Hardware projects are humbling. Software errors give you stack traces. Hardware errors give you silence and confusion.
The Pivot
What If It Was Actually Useful?
The novelty of asking a fake oracle about the meaning of life wears off eventually. But I kept staring at the thing, thinking: I have a portable, offline, voice-in-text-out device. What would I actually want to use this for?
One of my projects is OslerAI, an AI platform for medical education. And one thing that comes up constantly in healthcare is reference lookups... drug doses, lab ranges, clinical guidelines. Healthcare workers pull out their phone, unlock it, find the app, type the query... it's a lot of friction for a quick answer.
What if you could just ask? Tap, speak, answer. "Normal potassium range?" → "3.5 to 5.0 millimoles per liter." No unlocking, no typing, no cloud latency.
I cloned the project, swapped the system prompt, and called it Pocket Doctor. Same hardware, completely different personality: "You're a clinical reference tool. Canadian context. SI units. Be concise. No disclaimers."
Pocket Doctor Mode
- Canadian clinical context (SI units: mmol/L, μmol/L)
- Concise 1-2 sentence answers, no "As an AI..." hedging
- Drug doses, lab values, vital sign ranges
- Teal-on-dark UI, easier on tired eyes
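Concretely, the personality swap is just a different system prompt stitched onto the transcribed question before it hits Ollama. Here's a rough sketch of what that might look like; the template, variable names, and oracle wording are mine, with the Pocket Doctor text paraphrased from above.

ORACLE_PROMPT = "You are a mystical oracle. Answer in short, vivid sentences."
POCKET_DOCTOR_PROMPT = ("You're a clinical reference tool. Canadian context. "
                        "SI units. Be concise. No disclaimers.")

def build_prompt(system_prompt, question):
    # One string for Ollama's /api/generate; swapping the personality
    # is just swapping the first block of text
    return f"{system_prompt}\n\nQuestion: {question}\nAnswer:"

full_prompt = build_prompt(POCKET_DOCTOR_PROMPT, "Normal potassium range?")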
The Stack
What's Running
Hardware
- Raspberry Pi 5: 8GB RAM model
- 7" Touchscreen: official RPi display
- USB Microphone: generic, works fine
- Active Cooler: essential for sustained inference
Software
- Pygame: UI, touch events, rendering
- sounddevice: audio capture with callbacks
- Faster-Whisper: local speech-to-text (see the sketch below)
- Ollama: local LLM runtime
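Faster-Whisper is the one piece that never appears in the snippets above, so here's a hedged sketch of roughly how it slots in between the recorded audio and the Ollama request. The model size, the CPU/int8 settings, and the helper's shape are my assumptions.

import numpy as np
from faster_whisper import WhisperModel

# Small English-only model, quantized for a Pi-class CPU (assumed settings)
stt = WhisperModel("tiny.en", device="cpu", compute_type="int8")

def transcribe(audio_int16, samplerate=16000):
    # faster-whisper expects float32 mono audio at 16 kHz
    audio = audio_int16.astype(np.float32).flatten() / 32768.0
    segments, _info = stt.transcribe(audio, language="en")
    return " ".join(seg.text.strip() for seg in segments)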
What's Next
The To-Do List
Right now it's a naked Pi with a screen and cables everywhere. Not exactly portable. The dream is to make it actually pocketable.
- Enclosure: 3D print something that looks like a product instead of a science project.
- Battery: A UPS HAT or chunky LiPo to cut the power cable.
- Better Models: Small models are improving fast. Ready to swap in whatever drops next.
The dream: A pocketable, battery-powered, fully offline AI assistant. Star Trek tricorder energy... except it dispenses clinical reference info and occasionally speaks in Gen Z slang.
Takeaway
The Point
I started this project on a whim after dinner. By 1am I had a working voice assistant that runs entirely offline on an $80 computer.
That's the part that surprised me. A few years ago, this would have required a server rack and a team of engineers. Now it's a one-night project with off-the-shelf hardware and open-source models. The barrier to building local AI keeps dropping.
Is this production-ready? No. Would I trust Pocket Doctor in an actual clinical setting? Absolutely not; always verify with real sources. But as a proof of concept for what's possible with local compute? I think there's something here.
Seacrest out.
Brian Stever