The 1989 Mac That Learned: How a Vintage Macintosh Ran a Real Transformer Neural Network in HyperCard
In 1989, Apple’s Macintosh computers were marvels of personal computing—capable of running spreadsheets, word processors, and the revolutionary HyperCard, a visual programming environment that let users build interactive “stacks” of cards with buttons, fields, and scripts. Fast forward to today, and artificial intelligence dominates headlines with billion-parameter models running on GPU clusters. Yet, in a stunning act of retro-computing defiance, a developer has trained a fully functional transformer neural network—complete with embeddings, positional encoding, self-attention, backpropagation, and gradient descent—entirely within HyperCard, on a 1989 Macintosh. The result? A 1,216-parameter AI model that learns a complex mathematical pattern, proving that the foundations of modern AI aren’t magic—they’re math, and math works anywhere.
This project, dubbed MacMind, is more than a technical curiosity. It’s a powerful demonstration that the core ideas behind today’s most advanced AI systems—like the transformers that power ChatGPT and image generators—are not inherently tied to modern hardware. They can run, albeit slowly, on machines from the floppy disk era. Every line of code is written in HyperTalk, the simple but expressive scripting language that shipped with HyperCard. And perhaps most astonishingly, the entire “intelligence” of the model—its learned weights—is stored as 1,216 numbers hidden in fields within the HyperCard stack. Save the file, shut down the Mac, and when you return, the trained mind is still there, ready to run.
The task MacMind was trained on is deceptively simple: learn the bit-reversal permutation, a foundational step in the Fast Fourier Transform (FFT). The FFT is a cornerstone of digital signal processing, used in everything from MP3 compression to MRI imaging and quantum computing. It transforms a signal from the time domain into the frequency domain, revealing the underlying frequencies. But before the FFT can work its magic, it often requires reordering data using a bit-reversal permutation—a seemingly random rearrangement of indices based on reversing the binary digits of their positions.
For example, in an 8-element array, index 1 (binary `001`) becomes index 4 (binary `100`), and index 3 (`011`) becomes index 6 (`110`). The pattern is deterministic but not intuitive. MacMind’s job was to learn this mapping from scratch—no formulas, no hints. It had to discover the relationship between input positions and their reversed counterparts purely through trial, error, and attention.
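The mapping described above is easy to state in code. This is an illustrative sketch, not the project's actual implementation:

```python
def bit_reverse(i, bits):
    """Reverse the lowest `bits` binary digits of index i."""
    result = 0
    for _ in range(bits):
        result = (result << 1) | (i & 1)  # shift in the low bit of i
        i >>= 1
    return result

# For an 8-element array (3 bits): index 1 (001) -> 4 (100), index 3 (011) -> 6 (110)
perm = [bit_reverse(i, 3) for i in range(8)]
print(perm)  # [0, 4, 2, 6, 1, 5, 3, 7]
```

This is the permutation MacMind had to discover on its own, from examples alone.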
The model itself is a minimalist transformer—stripped down to its essentials. It uses self-attention to weigh the importance of different input positions, allowing it to learn which bits in the binary representation of an index matter most for predicting the reversed output. It employs positional encoding to give the model a sense of order, since transformers don’t inherently understand sequence. And it learns via backpropagation and gradient descent, the same optimization techniques used in today’s largest AI models.
But here’s the twist: all of this runs in HyperTalk, a language designed for user interface scripting, not numerical computation. There are no built-in matrix operations, no floating-point libraries, no GPU acceleration. Every multiplication, every sigmoid, every gradient update is implemented manually in a language that was never meant for this. And yet, it works.
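To see what "implemented manually" means, here is one sigmoid unit and one gradient update written with nothing but arithmetic, roughly the level of operation each HyperTalk handler has to spell out. This is an illustrative toy, not the project's code:

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

w, b, lr = 0.5, 0.0, 0.1   # one weight, one bias, learning rate
x, target = 1.0, 1.0

y = sigmoid(w * x + b)            # forward pass
loss = 0.5 * (y - target) ** 2    # squared error
grad_y = y - target               # dL/dy
grad_z = grad_y * y * (1 - y)     # chain rule through the sigmoid
w -= lr * grad_z * x              # gradient-descent weight update
b -= lr * grad_z
```

Multiply this bookkeeping by 1,216 parameters, and you have the work MacMind performs on every training step, line by interpreted line.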
Training took time—nearly 200 steps to reach convergence. By step 193, the model was oscillating between 50%, 75%, and 100% accuracy, a classic sign of a learning system approaching a solution. It wasn’t smooth. It wasn’t fast. But it was learning. Like a ball rolling down a hill and settling into a valley, the model’s loss function decreased over time, guided by gradients computed step by step in HyperTalk.
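The ball-in-a-valley picture can be made concrete with gradient descent on a one-dimensional toy loss, f(w) = (w - 3)^2, whose minimum sits at w = 3 (an illustration of the principle, not MacMind's actual loss surface):

```python
w, lr = 0.0, 0.1
losses = []
for step in range(50):
    grad = 2 * (w - 3)      # derivative of (w - 3)^2
    w -= lr * grad          # roll downhill a little
    losses.append((w - 3) ** 2)

# losses shrink toward zero as w settles near 3
```

Each step moves the parameter a little way down the slope; over many steps the loss settles into the valley, exactly as the model's loss did over 193 steps.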
- 1989 Macintosh hardware: a 16 MHz Motorola 68030 processor with minimal RAM.
- HyperTalk execution: interpreted line by line, making each training step painfully slow by today's standards.
- 193 training steps to convergence: a small number for a neural network, but monumental for a vintage machine.
- No external libraries: everything built from scratch in a GUI scripting language.
The brilliance of MacMind lies not in its speed or scale, but in its conceptual clarity. Every part of the neural network is visible. You can option-click any button in HyperCard and read the actual math—the attention weights, the loss calculations, the gradient updates—all written in plain English-like code. There’s no black box. No opaque API. Just math, logic, and persistence.
This transparency is a powerful antidote to the mystique surrounding modern AI. We’re often told that AI is a “black box,” a magical system that learns in ways we can’t understand. But MacMind proves otherwise. Backpropagation is just calculus. Attention is just weighted averaging. Gradient descent is just hill climbing. These are ideas that can be implemented on a 35-year-old computer, in a language designed for kids’ games and business forms.
The creator of MacMind, a former physics student, was motivated by a desire to demystify AI. As someone who studied signal processing and quantum mechanics—fields deeply reliant on the FFT—they saw the bit-reversal permutation as the perfect test case: a real-world problem rooted in mathematics, yet complex enough to require learning. By training a transformer to solve it on a 1989 Mac, they demonstrated that intelligence isn’t about hardware—it’s about algorithms, data, and iteration.
And the model persists. Because the weights are stored in hidden fields within the HyperCard stack, the trained network survives shutdowns, reboots, and even file transfers. It’s a digital mind that lives in a file, portable across decades of computing history. You could, in theory, email the stack to someone in 1992, and they could run it on their Macintosh IIci and see the same results.
This project also highlights the resilience of software. While hardware becomes obsolete, well-designed systems can endure. HyperCard may be dead, but its ideas live on in modern web development, app builders, and visual programming tools. And the transformer architecture, first introduced in 2017, now runs on everything from supercomputers to smartwatches—and, as we’ve seen, even on a Mac from the Reagan era.
MacMind also raises philosophical questions about the nature of intelligence. Is a model that learns a bit-reversal permutation “intelligent”? By today’s standards, probably not. But it exhibits emergent behavior—a system that, through repeated interaction with data, discovers a pattern it was never explicitly programmed to find. That’s the essence of machine learning. And it happened on a machine with less processing power than a modern calculator.
The project includes three key components: a pre-trained stack (after 1,000 training steps), a blank stack for users to train their own model, and a Python/NumPy reference implementation that validates the math. This last piece is crucial—it ensures that the HyperTalk code isn’t just producing plausible results, but is actually computing the correct gradients and updates. The fact that the two implementations agree is a testament to the rigor of the work.
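The article doesn't say exactly how the NumPy reference validates the HyperTalk math, but a standard technique for this kind of cross-check is a finite-difference gradient test: perturb each parameter slightly and confirm the analytic gradient matches the numerical one. A sketch, using an assumed toy loss:

```python
import numpy as np

def numerical_grad(f, w, eps=1e-6):
    """Central-difference estimate of df/dw, one parameter at a time."""
    g = np.zeros_like(w)
    for i in range(w.size):
        w_hi, w_lo = w.copy(), w.copy()
        w_hi.flat[i] += eps
        w_lo.flat[i] -= eps
        g.flat[i] = (f(w_hi) - f(w_lo)) / (2 * eps)
    return g

# Example: loss = 0.5 * ||w||^2, whose analytic gradient is w itself.
w = np.array([0.3, -1.2, 0.7])
g = numerical_grad(lambda v: 0.5 * np.sum(v ** 2), w)
assert np.allclose(g, w, atol=1e-5)
```

If a hand-written backpropagation pass agrees with this kind of numerical check across all parameters, the gradients are almost certainly correct.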
Running MacMind today requires an emulator or a vintage Mac, but the principles it embodies are universal. It’s a reminder that innovation often comes not from more power, but from deeper understanding. We don’t need quantum computers to grasp how AI works. We just need curiosity, patience, and a willingness to look under the hood.
- The entire model is stored in a single HyperCard stack file.
- Training involves forward passes, loss computation, backpropagation, and weight updates, all in HyperTalk.
- The model learns purely through attention and gradient descent, with no hard-coded rules.
- The project includes validation code in Python to ensure mathematical accuracy.
In an age of AI hype and fear, MacMind is a quiet revolution. It strips away the complexity and reveals the simplicity beneath. It shows that the future of AI isn’t just in scaling up—it’s in scaling down, in making the invisible visible, in teaching the next generation that intelligence, whether artificial or human, is built on patience, pattern recognition, and persistence.
And maybe, just maybe, it proves that the most powerful computers aren’t the ones with the most gigahertz—but the ones with the most insight.
This article was curated from Show HN: MacMind – A transformer neural network in HyperCard on a 1989 Macintosh via Hacker News (Top)