The Accessibility Gap in Text-to-Speech — and Why It Matters for the Attention Economy

Based on "The State of Modern AI Text To Speech Systems for Screen Reader Users" by Samuel Proulx (January 5, 2026)


The explosion of AI-powered text-to-speech has transformed how sighted users interact with technology — from personal assistants to GPS navigation to automated phone systems. But there's an irony hiding in plain sight: the people who depend on TTS the most — visually impaired users — have been largely left behind.

A recent deep-dive by accessibility expert Samuel Proulx laid bare the state of TTS for screen reader users, and the findings should concern anyone in digital publishing. The technology that millions rely on daily is built on foundations from the 1990s and early 2000s, with the most popular voice among Western English visually impaired users, Eloquence, last updated in 2003. It's a 32-bit binary that can't even run natively on modern systems without emulation.

This isn't just a niche accessibility problem. It's a warning sign for the entire attention economy.

Two Worlds of Text-to-Speech


Here's the core tension: sighted users and visually impaired users want fundamentally different things from TTS. Sighted users want voices that sound warm, natural, and human. Visually impaired users need voices that are fast, precise, and controllable — often listening at 800–900 words per minute, roughly four times normal speaking speed.

Modern AI TTS systems like Kokoro, Supertonic, and Kitten TTS have been engineered for the first group. They sound great. But when tested for screen reader use, they stumble on the basics: skipping words, misreading numbers, failing to respond to punctuation cues, and requiring entire sentences before they can begin generating audio.

For someone navigating the web by ear, a skipped word or misread number isn't a minor inconvenience — it's a broken experience.

What This Means for Publishers


If you're a news publisher, this story contains a lesson that goes beyond accessibility compliance. It reveals a deeper truth about the attention economy: audio is no longer optional, and one-size-fits-all TTS doesn't serve anyone well.

Your readers are increasingly time-starved. They commute, they multitask, they consume content in fragments across devices. Offering an audio layer on your articles isn't a nice-to-have anymore — it's how you capture the attention that would otherwise go to podcasts, social media, or competing outlets.

But here's the catch: the same AI TTS that sounds impressive in a demo can quietly erode trust when it gets the details wrong. A misread figure in a financial story. A garbled proper noun in a political report. A stumble on a technical term in a science piece. For publishers, accuracy isn't negotiable.


The BotTalk Approach: Audio That Serves Everyone


This is exactly the problem BotTalk was built to solve — not by picking one TTS engine and hoping for the best, but by taking a fundamentally different approach.

BotTalk is model-agnostic. Instead of locking publishers into a single TTS provider, BotTalk lets you choose and swap between the best engines available. As AI voices improve (and they are improving rapidly), BotTalk ensures you're never stuck with yesterday's technology — a lesson the screen reader community knows all too well after two decades with Eloquence.

BotTalk is accuracy-obsessed — and this is where being model-agnostic really pays off. No matter which TTS engine a publisher chooses, BotTalk adds its own proprietary AI layer on top. The number problem Proulx describes — AI voices misreading numbers, skipping words, mangling names — is one that every TTS model struggles with. BotTalk solves it before the audio is ever generated, with two purpose-built components.

First, a normalization engine that understands context: is "3:10" a time of day, a Bible verse, or a game score? The answer changes the pronunciation entirely, and BotTalk resolves the ambiguity so the underlying model doesn't have to guess. Second, a curated dictionary with over 10,000 entries covering proper names, abbreviations, technical terms, and regional expressions that no generic TTS model gets right out of the box.

This AI layer is the reason leading publishers like SPIEGEL — where accuracy is not a nice-to-have but a matter of editorial credibility — choose BotTalk.
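To make the "3:10" problem concrete, here is a deliberately simplified sketch of context-based disambiguation. BotTalk's actual normalization engine is proprietary; the cue lists, window size, and function below are illustrative assumptions, not its implementation. The technique is the point: classify an ambiguous token from the words around it before the text ever reaches a TTS model.

```python
# Illustrative sketch only -- BotTalk's normalization engine is proprietary.
# All cue lists and rules below are assumptions for demonstration.

TIME_CUES  = {"at", "around", "pm", "am", "o'clock", "train"}
VERSE_CUES = {"matthew", "mark", "luke", "john", "psalm", "verse"}
SCORE_CUES = {"won", "lost", "beat", "score", "final", "game"}

def classify_colon_token(text: str, token: str = "3:10") -> str:
    """Return 'time', 'verse', 'score', or 'unknown' for the first match."""
    tokens = [t.strip(".,!?;").lower() for t in text.split()]
    for i, t in enumerate(tokens):
        if t == token:
            window = set(tokens[max(0, i - 3):i + 4])  # +/- 3 words of context
            if window & VERSE_CUES:
                return "verse"   # "Matthew 3:10" -> read as "three ten"
            if window & SCORE_CUES:
                return "score"   # "won the final 3:10" -> "three to ten"
            if window & TIME_CUES:
                return "time"    # "the 3:10 train" -> "three ten"
            return "unknown"     # fall back to the model's default reading
    return "unknown"
```

Once the class is known, the pipeline can rewrite the token into its spoken form ("three ten", "three to ten") so the underlying voice model never has to guess.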

BotTalk is privacy-first. In an era where every third-party integration is a potential data liability, BotTalk minimizes text storage and keeps data flows transparent. For publishers operating under GDPR and similar frameworks — especially in the DACH region — this isn't a feature, it's a prerequisite.

And BotTalk is economically smart. Through intelligent caching and fragment reuse, publishers don't pay to re-synthesize content that's already been voiced. This is how audio scales from a handful of flagship articles to an entire content catalog without the costs spiraling.

Attention Is the Publisher's Scarcest Resource

The screen reader community's struggle illuminates a broader principle: when the tools don't match the use case, engagement breaks down. Visually impaired users forced to use voices optimized for sighted listeners lose speed and efficiency. Readers offered a clunky or inaccurate audio experience on a news site simply leave.

The publishers winning the attention economy understand that consumption format is as important as content quality. An article that can be read, listened to, or experienced across contexts — on the train, during a workout, while cooking — is an article that earns more time and deeper engagement.

BotTalk gives publishers that flexibility by turning their existing article catalog into an audio experience — no podcast production team required, no editorial workflow changes, no single-vendor risk. Just a "Listen" button that works, backed by the best available TTS technology and real analytics on how your audience actually engages with audio.

Looking Forward

Proulx's analysis ends on an uncertain note for the screen reader community: the old technology is crumbling, and the new technology doesn't yet meet their needs. It's a gap that will take serious investment and cross-disciplinary expertise to close.

For news publishers, though, the path is clearer. The technology to deliver high-quality, reliable, scalable audio already exists. The question isn't whether to add an audio layer to your content — it's how quickly you can do it without compromising on accuracy, privacy, or control.

The attention economy waits for no one. Every article published without an audio option is attention left on the table.


BotTalk is a TTS platform for digital publishers that turns articles into audio directly within the publisher's own product — website or app — with model-agnostic flexibility, privacy-first architecture, and cost efficiency through smart caching. Learn more at bottalk.io.