I Built a Voice-to-Blog Pipeline and Then Stopped Blogging

Brian Stever

2024 · Next.js, OpenAI, PostgreSQL, TipTap

Abstract. This project explored whether a personal publishing workflow could be reduced to a short voice memo and a button press. The resulting system combined Whisper transcription, GPT-4 drafting, DALL-E image generation, and a PostgreSQL-backed Next.js frontend. Technically, it worked. Behaviorally, it did not solve the more difficult problem, which is that I still needed to want to blog.

1. Problem Statement

I like having a personal site. I like the idea of writing regularly. I do not, however, enjoy the specific moment where you open a blank document, place your fingers on a keyboard, and are asked to produce coherent prose from nothing. Talking is easier. Talking I can do for free.

The basic hypothesis was embarrassingly simple: if typing was the bottleneck, maybe I could remove typing from the system. Record a voice memo, transcribe it, let GPT-4 turn the ramble into something publishable, generate a header image, and ship the whole thing to the blog. In theory, the cost of publishing drops from “write a post” to “talk for two minutes.”

There are obvious reasons this is a bad idea. Your spoken thoughts are usually worse than you remember, and once you give an LLM permission to “help,” it will occasionally decide that you are an authority on topics you have never once studied. That said, both problems seemed manageable compared to the existential weight of opening a blank Google Doc.

2. Method

The pipeline accepted an uploaded audio file, sent it to Whisper for transcription, passed the resulting text to GPT-4 with a prompt asking for a coherent first draft, then used the generated title to seed a DALL-E header image. The final output was saved to a PostgreSQL-backed content store and surfaced through a Next.js frontend with a TipTap editor for cleanup before publishing.
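The drafting step has to turn free-form model output into a separate title and body before the title can seed the image prompt. The sketch below assumes the prompt asks GPT-4 to reply with a small JSON object. The request follows the shape of OpenAI's Chat Completions API, which takes a `model` and a list of role-tagged `messages`. The prompt wording and the `buildDraftRequest` / `parseDraft` helpers are hypothetical, and the essay does not say how the prototype actually extracted the title.

```javascript
// Hypothetical prompt: the real prompt is not published, so this wording
// is an illustrative assumption.
const DRAFT_INSTRUCTIONS =
  "Turn the transcript into a coherent blog post draft. " +
  "Do not add facts, credentials, or events that are not in the transcript. " +
  'Reply with JSON only, shaped like {"title": "...", "body": "..."}.';

// Build a Chat Completions request body: a model name plus role-tagged messages.
function buildDraftRequest(transcript) {
  return {
    model: "gpt-4",
    messages: [
      { role: "system", content: DRAFT_INSTRUCTIONS },
      { role: "user", content: transcript },
    ],
  };
}

// Parse the model's reply into { title, body }. Replies are sometimes wrapped
// in a Markdown code fence, so strip one if present before parsing.
function parseDraft(replyText) {
  const cleaned = replyText
    .trim()
    .replace(/^`{3}(?:json)?\s*/, "")
    .replace(/\s*`{3}$/, "");
  const parsed = JSON.parse(cleaned); // throws on malformed JSON
  if (
    parsed === null ||
    typeof parsed !== "object" ||
    typeof parsed.title !== "string" ||
    typeof parsed.body !== "string"
  ) {
    throw new Error("Draft reply is missing a string title or body");
  }
  return { title: parsed.title.trim(), body: parsed.body.trim() };
}

// Usage with a canned reply (no network call is made here).
const draft = parseDraft('{"title": "Eye Flogging", "body": "A post about AI blogging."}');
console.log(draft.title);
```

If the reply is validated before anything is saved, a malformed response fails loudly instead of quietly becoming a post with an empty title.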

In practice, the interface was intentionally narrow. I wanted the system to feel less like a CMS and more like a small machine: record, wait, review, publish. The more configuration I added, the more I was rebuilding a writing tool instead of removing one.
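The record, wait, review, publish loop can be modeled as a very small state machine. Modeling it this way also makes it difficult to publish a post that never went through review. The states, events, and `advance` helper below are hypothetical, since the essay does not describe how the prototype tracked a post's status.

```javascript
// Hypothetical lifecycle for "record, wait, review, publish".
// Each state maps allowed events to the state they lead to.
const TRANSITIONS = {
  recorded: { submit: "processing" },                     // memo uploaded
  processing: { drafted: "review", failed: "recorded" },  // waiting on Whisper, GPT-4, DALL-E
  review: { publish: "published", discard: "discarded" }, // human edits in TipTap
  published: {},                                          // terminal
  discarded: {},                                          // terminal
};

const has = (obj, key) => Object.prototype.hasOwnProperty.call(obj, key);

// Return the next state, or throw if the event is not allowed from `state`.
function advance(state, event) {
  if (!has(TRANSITIONS, state) || !has(TRANSITIONS[state], event)) {
    throw new Error(`Cannot apply "${event}" to a post in state "${state}"`);
  }
  return TRANSITIONS[state][event];
}

// Usage: the happy path from memo to published post.
let status = "recorded";
for (const event of ["submit", "drafted", "publish"]) {
  status = advance(status, event);
}
console.log(status);
```

The `has` check uses own properties only, so inherited names such as `toString` cannot be mistaken for valid events.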

ai-blogger pipeline
01. Record a rough voice memo
02. Transcribe with Whisper
03. Expand into a draft with GPT-4
04. Generate a header image with DALL-E
05. Persist title, body, and image URL
06. Publish to the site
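// Simplified: whisper, gpt4, dalle, and db stand in for real service clients.
const createPost = async (audioBlob) => {
  const transcript = await whisper.transcribe(audioBlob); // speech to text
  const draft = await gpt4.complete({ transcript });      // text to { title, body }
  const image = await dalle.generate(draft.title);        // title to header image
  const post = { ...draft, imageUrl: image.url };
  await db.posts.insert(post);                            // persist title, body, image URL
  return post;
};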
Figure 1. Simplified representation of the voice-to-blog pipeline, from transcription through drafting, image generation, and persistence.
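Figure 1 relies on globals (`whisper`, `gpt4`, `dalle`, `db`) that stand in for real clients. If the steps are passed in as functions instead, the same flow can run against stubs. This is a sketch under that assumption. `transcribe`, `draft`, `illustrate`, and `save` are hypothetical adapters, not functions from the OpenAI SDK or from the project. Storing the transcript on the saved post is also an addition that the original pipeline does not mention.

```javascript
// Same flow as Figure 1, with each stage injected as an async function.
async function createPost(audio, { transcribe, draft, illustrate, save }) {
  const transcript = await transcribe(audio);         // Whisper: audio to text
  const { title, body } = await draft(transcript);    // GPT-4: text to draft
  const imageUrl = await illustrate(title);           // DALL-E: title to image URL
  const post = { title, body, imageUrl, transcript }; // keep the source for later checks
  await save(post);                                   // persist the post
  return post;
}

// In-memory stubs so the pipeline runs without network access or a database.
const saved = [];
const stubs = {
  transcribe: async () => "i think ai blogging is mostly about taste",
  draft: async (t) => ({ title: "On AI Blogging", body: `Draft based on: ${t}` }),
  illustrate: async (title) =>
    `https://example.com/images/${encodeURIComponent(title)}.png`,
  save: async (post) => {
    saved.push(post);
  },
};

// Usage: run the pipeline end to end on an empty "recording".
createPost(new Uint8Array(0), stubs).then((post) => {
  console.log(post.title, post.imageUrl);
});
```

Because each stage is just an async function, a real adapter (for example, one wrapping an OpenAI call) can replace a stub without any change to `createPost`.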

3. Observed Behavior

The system worked well enough to prove the concept. A rough memo could become a structured draft in under a minute. The generated posts were often readable, occasionally decent, and sometimes suspiciously eager to improve on what I had actually said. OpenAI would call that “creative assistance.” I would call it “inventing facts about my life.”

The real surprise was that removing the typing step did not automatically create a blogging habit. It turns out the barrier to publishing is not always keystrokes. Sometimes the barrier is taste. Sometimes it is energy. Sometimes it is not wanting to read a machine's interpretation of your own half-formed thought and decide whether it still sounds like you.

Table 1. Operational summary of the prototype.

Metric                           Value
External AI services             3
Time from memo to draft          < 60 s
Rich-text editor                 TipTap
Days before I forgot to use it   90

4. Failure Modes

The first failure mode was transcription quality. Background noise, mumbling, and low-energy voice notes made Whisper produce transcripts that were technically English but spiritually inaccurate. The phrase “AI blogging” briefly became “eye flogging,” which felt less like a typo and more like a judgment.
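One way to catch "eye flogging" before it reaches a draft is to request Whisper's verbose output and flag the segments the model itself was unsure about. This sketch assumes the transcription was requested with `response_format: "verbose_json"`, whose segments include `avg_logprob`, `no_speech_prob`, and `compression_ratio`. The thresholds loosely follow the defaults in the open-source Whisper package and were not tuned for this project. The `flagWeakSegments` helper and the sample segments are made up for illustration.

```javascript
// Thresholds loosely follow open-source Whisper defaults; a starting point only.
const LOGPROB_FLOOR = -1.0;      // below this, the decoder was unsure of its tokens
const NO_SPEECH_CEILING = 0.6;   // above this, the segment may be noise or silence
const COMPRESSION_CEILING = 2.4; // above this, the text is suspiciously repetitive

// Return segments worth re-listening to, each with the reasons it was flagged.
// `segments` is assumed to be the `segments` array of a verbose_json transcription.
function flagWeakSegments(segments) {
  const flagged = [];
  for (const seg of segments) {
    const reasons = [];
    if (seg.avg_logprob < LOGPROB_FLOOR) reasons.push("low confidence");
    if (seg.no_speech_prob > NO_SPEECH_CEILING) reasons.push("possible non-speech");
    if (seg.compression_ratio > COMPRESSION_CEILING) reasons.push("repetitive text");
    if (reasons.length > 0) {
      flagged.push({ start: seg.start, end: seg.end, text: seg.text.trim(), reasons });
    }
  }
  return flagged;
}

// Usage with made-up segments (not real Whisper output).
const example = [
  { start: 0, end: 4.2, text: " I built an AI blogging tool.",
    avg_logprob: -0.3, no_speech_prob: 0.01, compression_ratio: 1.2 },
  { start: 4.2, end: 7.9, text: " eye flogging",
    avg_logprob: -1.4, no_speech_prob: 0.2, compression_ratio: 0.9 },
];
console.log(flagWeakSegments(example)); // flags only the second segment, for low confidence
```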

The second was model drift. GPT-4 is extremely willing to smooth over gaps in your thinking by inventing connective tissue you never provided. This is excellent if you want a confident blog post and less excellent if you care whether the confident blog post remains tethered to reality.
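A cheap guard against invented connective tissue is to compare each draft sentence with the transcript and flag any sentence that mostly introduces new words. The sketch below is a hypothetical lexical-overlap heuristic, not something the prototype shipped. It will miss fabrications that reuse the speaker's own vocabulary and will flag legitimate rewording, so it can only point an editor at sentences worth checking.

```javascript
// Lowercase words of four or more letters; shorter words are mostly filler.
function contentWords(text) {
  return text.toLowerCase().match(/[a-z']{4,}/g) ?? [];
}

// Flag draft sentences whose content words mostly do not appear in the transcript.
// `minOverlap` is a hypothetical threshold: the fraction of a sentence's content
// words that must also occur somewhere in the transcript.
function ungroundedSentences(transcript, draft, minOverlap = 0.5) {
  const heard = new Set(contentWords(transcript));
  const sentences = draft.split(/(?<=[.!?])\s+/).filter((s) => s.trim() !== "");
  return sentences.filter((sentence) => {
    const words = contentWords(sentence);
    if (words.length === 0) return false; // nothing to check
    const grounded = words.filter((w) => heard.has(w)).length;
    return grounded / words.length < minOverlap;
  });
}

// Usage: the second sentence claims expertise the speaker never mentioned.
const transcript = "i built a tool that turns voice memos into blog drafts";
const draft =
  "I built a tool that turns voice memos into blog drafts. " +
  "As a veteran machine learning researcher, I have studied transcription for decades.";
console.log(ungroundedSentences(transcript, draft));
```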

The third was maintenance. The project used a free-tier database. I then failed to publish anything for long enough that the database paused from inactivity. This is an almost perfect case study in solving the wrong problem very well.

5. Reflection

I still think the core idea is sound. For the right kind of writer, someone who thinks clearly out loud and is comfortable editing generated drafts, this is a genuinely useful workflow. It reduces the overhead of getting from thought to draft, and that matters.

What it did not do was remove the editorial responsibility from the process. The machine can draft, summarize, and decorate, but it cannot decide whether a post is worth publishing. Unfortunately, that remains a human problem. More unfortunately, the human was me.