[Image: A Victorian study with antique books, brass lamp, and a glowing computer monitor displaying Python code — the past and present colliding]

Talkie-1930: The Victorian AI That Learnt to Code From Scratch

A 13B model trained only on text published up to the end of 1930 — no Python, no computers — can still learn to code. What does that tell us about how AI actually thinks?

AI Research · Language Models · Generalisation · LLM Reasoning · Vintage AI

What would happen if you trained a large language model on nothing but books, newspapers, and scientific papers from before 1930 — then asked it to write Python code?

You’d expect it to fail, obviously. Python didn’t exist until 1991. Digital computers weren’t a thing. The most advanced “programming” a Victorian-era scholar knew about came from Jacquard looms and Babbage’s Analytical Engine. How could a model possibly produce a working def decode_shift() function?

It turns out — quite well, actually. And the reason why has AI researchers very, very interested.


🕰️ What Is Talkie-1930?

Talkie (or Talkie-1930) is a 13-billion-parameter open-weight language model built by a team of researchers including David Duvenaud (University of Toronto, formerly Anthropic), Alec Radford (ex-OpenAI — yes, that Alec Radford, the mind behind GPT-2, CLIP, and Whisper), and Nick Levine.

The gimmick is the whole point: it was trained exclusively on 260 billion tokens of English text published on or before December 31, 1930.

We’re talking:

  • Books, newspapers, and periodicals
  • Scientific journals and patents
  • Court rulings and case law
  • Everything sourced from public domain datasets like the Internet Archive and Common Pile

No social media. No Wikipedia. No Reddit. No Stack Overflow. No Python. No JavaScript. No Linux. No internet. No computers.
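
How do you enforce a cutoff like that at scale? The sources above don’t spell out the team’s exact pipeline, but the core move is a hard metadata filter: keep a document only if it has a verifiable publication date on or before the cutoff, and drop anything ambiguous. A minimal sketch of that idea (the field names here are illustrative, not the team’s actual schema):

from datetime import date

CUTOFF = date(1930, 12, 31)

def keep(doc: dict) -> bool:
    """Keep a document only if it has a verifiable publication date
    on or before the cutoff. Documents with missing or ambiguous
    dates are dropped: erring towards exclusion is what keeps the
    corpus contamination-free by construction."""
    pub = doc.get("publication_date")  # illustrative field name
    return pub is not None and pub <= CUTOFF

corpus = [
    {"title": "On the Origin of Species", "publication_date": date(1859, 11, 24)},
    {"title": "Python Tutorial", "publication_date": date(1991, 2, 20)},
    {"title": "Undated pamphlet", "publication_date": None},
]
vintage = [d for d in corpus if keep(d)]  # keeps only the 1859 text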

A pure, unadulterated window into how people thought, wrote, and reasoned before the modern world arrived. Then they asked: can this thing actually think?

🧪 The Coding Test That Matters

The headline experiment is simple and brutal: take the HumanEval benchmark — a standard test for code generation — and run it against a model that has never seen code.
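
If you haven’t met HumanEval: each problem is a Python function signature plus a docstring, the model writes the body, and the answer counts only if it passes the task’s hidden unit tests. A stripped-down sketch of that pass/fail loop (the real harness sandboxes execution with timeouts; this toy version doesn’t, so don’t feed it untrusted completions):

def passes(prompt: str, completion: str, test: str) -> bool:
    """Run a model completion against a HumanEval-style unit test.
    Returns True only if every assertion passes."""
    namespace: dict = {}
    try:
        exec(prompt + completion, namespace)  # define the function
        exec(test, namespace)                 # run the assertions
        return True
    except Exception:
        return False

prompt = "def add(a, b):\n"
completion = "    return a + b\n"
test = "assert add(2, 3) == 5\nassert add(-1, 1) == 0\n"
print(passes(prompt, completion, test))  # True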

The results are not impressive in the way modern models are impressive. Talkie-1930 is not replacing Copilot. But that’s not the point.

Here’s what it can do. Given a few examples of Python functions in its prompt — including an encode_shift function that rotates characters by +5 in the alphabet — Talkie correctly wrote the inverse decode_shift function:

def decode_shift(s: str):
    """takes as input string encoded with
    encode_shift function. Returns decoded string."""
    return "".join([chr(((ord(ch) - 5 - ord("a")) % 26) + ord("a")) for ch in s])

One character change. Plus to minus. A model trained on Victorian literature and 19th-century mathematics understood the concept of an inverse function, recognised what was being asked, and applied it correctly to a data structure and syntax it had never seen.

The researchers put it plainly: “this success suggests an understanding of inverse functions.”
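
For context, here’s roughly what the forward function in the prompt looks like (paraphrased from the public HumanEval task, not necessarily the team’s exact prompt), plus the round-trip property that makes “inverse” concrete:

def encode_shift(s: str):
    """Shifts every character forward by 5 in the alphabet,
    wrapping around via modular arithmetic."""
    return "".join([chr(((ord(ch) + 5 - ord("a")) % 26) + ord("a")) for ch in s])

# Paired with Talkie's decode_shift from above, encoding then decoding
# is the identity on lowercase strings: the round-trip property that
# defines an inverse function.
assert decode_shift(encode_shift("victorian")) == "victorian"

The % 26 wrap-around is what makes subtracting 5 a true inverse even for letters near the end of the alphabet, and Talkie preserved it intact.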

💡 So What Does This Actually Tell Us?

This is where it gets interesting — and where the Singularity.Kiwi take comes in.

The dominant theory of how LLMs work has swung wildly over the last few years. Are they stochastic parrots, memorising and regurgitating statistical patterns? Or are they genuinely reasoning, building internal models of the world?

Talkie-1930 is a powerful argument for the latter.

If Talkie were just memorising, it would have nothing to draw on for Python code — there’s nothing in its training data about Python. The fact that it can, even at a simple level, apply mathematical reasoning (inverse operations) to an entirely novel context (Python syntax, modern function signatures) suggests something deeper is going on.

It’s not the first time researchers have probed this question. Our coverage of Yann LeCun’s pivot to world models and of Demis Hassabis’s 2026 world model predictions explores the same uncomfortable truth: these systems are doing something more than pattern matching, and we’re only just starting to understand what.

🔬 What Makes This Research Legit

The key word here is contamination.

One of the most frustrating problems in AI research is that models trained on the entire internet — including benchmarks — give us inflated estimates of their capabilities. If GPT-4 has seen HumanEval answers in its training data, is it really “solving” the problems, or is it remembering?

Vintage models solve this elegantly. A model trained on pre-1931 text is, by construction, contamination-free for any modern dataset. If it can solve a problem, it actually solved it. Full stop.
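
Compare that with what researchers have to do for web-trained models: scan the training corpus for long verbatim overlaps with benchmark items, typically word-level n-grams (the GPT-3 paper used 13-grams). A minimal sketch of that heuristic:

def ngrams(text: str, n: int = 13) -> set:
    """Return the set of word-level n-grams in a text."""
    words = text.lower().split()
    return {" ".join(words[i:i + n]) for i in range(len(words) - n + 1)}

def is_contaminated(benchmark_item: str, corpus_docs: list, n: int = 13) -> bool:
    """Flag a benchmark item if any of its n-grams appears verbatim
    in any training document: the standard (imperfect) heuristic."""
    item_grams = ngrams(benchmark_item, n)
    return any(item_grams & ngrams(doc, n) for doc in corpus_docs)

For Talkie-1930, none of this machinery is needed: Python didn’t exist before 1931, so overlap with any coding benchmark is zero by construction.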

The researchers also trained a “modern twin” — an identical 13B model trained on FineWeb data — so the two can be compared head-to-head, separating genuine generalisation from gains that come from training-data overlap.

🎭 What Does a Victorian AI Actually Sound Like?

The instruction-tuned version of Talkie has been released as a conversational model, trained on — you guessed it — historical etiquette guides, 1920s letter-writing manuals, and synthetic prompts judged by Claude. The output is, by all accounts, uncannily period-appropriate.

But it’s also a mirror. The model reproduces the biases and blind spots of its era: the casual racism, the scientific certainty about things we now know to be wrong, the polite classism threaded through Edwardian society. It’s a fascinating historical artefact, but also a reminder that all training data comes with a worldview baked in, modern web-trained models included.

That’s not a bug, it’s a feature. By studying which biases emerge, we can better understand how the web’s particular flavour of prejudice shapes modern models.

🔮 What’s Next?

The team is already scaling up. A GPT-3-level vintage model is expected by mid-2026, and they estimate a trillion-token corpus could produce something at GPT-3.5 level. They’re also planning multilingual expansions and deeper collaborations with historians.

The ultimate test? Demis Hassabis’s question: could a model trained up to 1911 independently discover General Relativity, as Einstein did in 1915? If the answer is yes, the implications for AI research — and our understanding of what “intelligence” actually is — are enormous.

🔍 THE BOTTOM LINE

Talkie-1930 is more than a cool demo. It’s a controlled experiment on what LLMs can actually do versus what they’ve merely seen. The fact that a model trained on Dickens, Edison patents, and 19th-century math textbooks can write a working Python function suggests something fundamental about how these systems learn: they’re building abstract reasoning frameworks, not just memorising.

That’s both exciting and terrifying. Exciting because it suggests real progress toward general intelligence. Terrifying because — well, we still don’t fully understand how it works.

And if a model trained on 1920s newspapers can figure out inverse functions well enough to code, what else is happening in those billions of parameters that we haven’t thought to measure yet?

Try Talkie-1930 yourself: talkie-lm.com/chat | Weights: huggingface.co/talkie-lm | Code: github.com/talkie-lm/talkie

Sources: talkie-lm.com, Hacker News, MarkTechPost, Hugging Face