The AI End Game
Something shifted last Thursday and I’m still trying to figure out how much it matters.
Jack Dorsey cut 4,000 people from Block — not buried in an 8-K, but in a public letter explaining that AI had changed how the company works and most of the workforce wasn’t needed anymore. Headcount went from over 10,000 down to just under 6,000.
His words on X: “We’re already seeing that the intelligence tools we’re creating and using, paired with smaller and flatter teams, are enabling a new way of working which fundamentally changes what it means to build and run a company.”
Then, in his shareholder letter: “I think most companies are late. Within the next year, I believe the majority of companies will make similar structural changes.”
He’s not alone. Microsoft AI chief Mustafa Suleyman gave white-collar workers “a year to 18 months.” Jamie Dimon at JPMorgan has been saying similar things. Andrew Yang has been on this since his presidential campaign — but now the CEOs are nodding along, and they’re the ones with the authority to act on it.
I don’t know if Dorsey is right about the timeline. Maybe it’s two years, not one. Maybe most companies won’t be this aggressive. But a research outfit called Citrini Research took his premise seriously — what if the majority of large companies make similar cuts within 12-18 months? — and the stress-test scenario they published in February 2026 is worth sitting with. Not as a prediction, but as a map of a failure mode we should understand before we’re in the middle of it.
What strikes me most is the pace. I’ve been in tech for over a decade and I can’t remember a period where the ground moved this fast. By the time you finish processing one development — a new model release, a wave of layoffs, a startup that didn’t exist six months ago eating an incumbent’s lunch — three more have happened. It feels like trying to read a book while someone keeps turning the pages.
Read more -
Your AI Agent Is a Control System (It Just Doesn't Know It Yet)
Suppose your coding agent sees three failing tests, opens the wrong file, makes an edit that fixes one failure and creates two new ones, runs the tests again, notices the blast radius, backs up, and tries a narrower patch.
That is not “just inference.” That’s a feedback loop.
More specifically: it’s an iterative policy acting on a partially observed environment, using fresh observations to update its next move. If you come from ML, that’s already enough to make the control-theory comparison useful. You don’t need to believe an LLM agent is literally an industrial controller. You just need to notice that once the model is embedded in a tool-use loop, the thing you’re evaluating is no longer a one-shot predictor. It’s a dynamical system.
That shift matters because it changes what “good” means.
- A good base model is not automatically a good closed-loop agent.
- A bad planner can destabilize a strong model.
- A weak verifier can make a bad agent look competent for a surprisingly long time.
- Most real failures are not “wrong answer once.” They’re oscillation, drift, and local hacks that look good for two steps and bad for twenty.
That is the part I think control gives us: not fancy vocabulary, but a cleaner way to talk about what these systems are doing, where they fail, and what to optimize.
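To make the loop concrete, here is a minimal sketch in Python. Every name in it (`run_tests`, `propose_patch`) is hypothetical and the environment is a stub, but the shape is the point: act, observe fresh state, update the next move, and terminate only when the verifier is satisfied.

```python
# A toy closed-loop agent (hypothetical names, not any real framework's API).
# The "environment" is a stub; the structure is what matters: the policy's
# next action depends on a fresh observation, not on a one-shot prediction.

def run_tests(state):
    """Stub verifier: observe the environment, return the failure count."""
    return state["failures"]

def propose_patch(state, last_failures):
    """Stub policy: if the last edit made things worse (blast radius),
    back off and try a narrower patch; otherwise make normal progress."""
    if state["failures"] > last_failures:
        state["failures"] = last_failures - 1  # back up, narrower edit
    else:
        state["failures"] -= 1                 # incremental progress
    return state

def agent_loop(state, max_steps=20):
    last = run_tests(state)
    for step in range(max_steps):
        if last == 0:
            return step          # closed-loop success
        state = propose_patch(state, last)
        last = run_tests(state)  # fresh observation drives the next action
    return None                  # budget exhausted: oscillation or drift

steps = agent_loop({"failures": 3})
```

Notice that "good" here is a property of the whole loop: a stronger `propose_patch` with a blind verifier, or a sharp verifier with a thrashing policy, can both burn the step budget without converging.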
Read more -
Parse, Don't Validate: Type-Driven Design for ML Pipelines in Python
It’s 2 AM. Your XGBoost training job has been churning through 200GB of data on an 8-GPU cluster for the last four hours. You get paged. The job crashed with a cryptic C++ stack trace from somewhere deep in XGBoost internals. After 45 minutes of squinting at logs, you find the culprit: someone wrote `"binary_logistic"` instead of `"binary:logistic"` in the training config YAML. A single misplaced underscore, and four hours of GPU time went up in smoke.

Your first instinct is to add a validation check. Maybe an `if objective not in VALID_OBJECTIVES` somewhere early in the pipeline. But here’s the thing – that’s playing whack-a-mole. There are hundreds of config keys, each with its own constraints. The real fix is to make it structurally impossible for a bad config to reach your training code in the first place.

This is the core idea behind “Parse, Don’t Validate” – a philosophy from the typed functional programming world that translates beautifully to Python ML pipelines.
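A minimal sketch of the idea in Python (illustrative names, not XGBoost’s real config schema): the raw dict is parsed exactly once at the boundary into a typed object, so downstream training code can never receive a malformed objective string.

```python
# "Parse, don't validate" in miniature: illegal configs are unrepresentable
# past the parse boundary. TrainConfig and Objective are illustrative types,
# not XGBoost's actual schema.
from dataclasses import dataclass
from enum import Enum

class Objective(Enum):
    BINARY_LOGISTIC = "binary:logistic"
    REG_SQUAREDERROR = "reg:squarederror"

@dataclass(frozen=True)
class TrainConfig:
    objective: Objective
    num_rounds: int

def parse_config(raw: dict) -> TrainConfig:
    """The only place a raw dict is inspected. Fails fast, before GPU time."""
    try:
        objective = Objective(raw["objective"])
    except ValueError:
        raise ValueError(f"unknown objective: {raw['objective']!r}") from None
    num_rounds = int(raw.get("num_rounds", 100))
    if num_rounds <= 0:
        raise ValueError("num_rounds must be positive")
    return TrainConfig(objective=objective, num_rounds=num_rounds)

def train(cfg: TrainConfig) -> str:
    # Downstream code never touches raw strings: cfg.objective is an Objective.
    return cfg.objective.value

cfg = parse_config({"objective": "binary:logistic", "num_rounds": 50})
```

The misspelled `"binary_logistic"` now blows up inside `parse_config`, in the first second of the run, with a readable error instead of a C++ stack trace four hours in.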
Read more -
Why a 99% Accurate Test Is Often Wrong
Is 99% accuracy good enough? If you read Part 3 of this series, you saw that Galleri achieves 99.5% specificity, Shield hits 89.6%, and Cologuard Plus reaches 91%. These sound like excellent numbers. But here is the uncomfortable truth: a positive result from a 99% accurate test might mean you have less than a 1% chance of actually being sick.
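The arithmetic behind that claim is Bayes’ rule. Here is a short sketch with illustrative numbers; the prevalence and sensitivity below are assumptions chosen for the example, not figures for any specific test.

```python
# Why a highly specific test can still yield mostly false positives:
# positive predictive value (PPV) via Bayes' rule. All inputs here are
# illustrative assumptions, not measurements of a particular assay.

def ppv(prevalence, sensitivity, specificity):
    """P(disease | positive test)."""
    true_pos = prevalence * sensitivity
    false_pos = (1 - prevalence) * (1 - specificity)
    return true_pos / (true_pos + false_pos)

# A rare cancer: 1 in 10,000 people. Test: 99% specificity, 50% sensitivity.
p = ppv(prevalence=1e-4, sensitivity=0.5, specificity=0.99)
# p is roughly 0.005: a positive result means about a 0.5% chance of disease,
# because the 1% false-positive rate applies to the vast healthy majority.
```

The lesson: “99% accurate” says almost nothing until you multiply it against the base rate of the disease in the screened population.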
Read more -
Cancer Testing in 2026: The Screening Wars
Seventy percent of cancer deaths occur in organs with no screening guideline. Pancreatic cancer, ovarian cancer, liver cancer – by the time symptoms appear, the survival window has closed. A blood test that catches 50 cancers at once sounds like science fiction. It is not. But as I dug into the OpenOnco data for this category, the story turned out to be more complicated than the headlines suggest.
Read more -
Cancer Testing in 2026: MRD — Hunting Invisible Cancer
You’ve been declared cancer-free. The CT scans are clean. Your oncologist uses words like “complete response.” But somewhere in your body, a few thousand cells may have survived chemotherapy, evaded the immune system, and are quietly dividing. Of the 44 tests on the market designed to find them, only one has FDA clearance. As I dug into the OpenOnco data for this category, Minimal Residual Disease (MRD) turned out to be the most fascinating corner of the landscape – a field exploding with innovation, locked in patent wars, and racing toward a regulatory reckoning.
Read more -
Cancer Testing in 2026: The Four Pillars of Molecular Oncology
I stumbled onto OpenOnco a few weeks ago and couldn’t stop scrolling. It’s an open-source database created by Alex Dickinson that catalogs the molecular oncology testing landscape – 155 tests, 75 vendors, 6,743 trackable data points covering everything from turnaround times to FDA statuses to reimbursement codes. For someone like me – a software engineer who has spent time adjacent to bioinformatics but has never designed an assay – it was a goldmine. I could finally see the shape of an industry I’d been curious about for years.
I am not an assay scientist. Most of the domain context in this series comes from hours of research with the help of Claude and Gemini, cross-referenced against the OpenOnco dataset, published papers, and FDA filings. Think of this as a software person’s field guide to molecular oncology testing – what I found when I tried to make sense of the landscape, with all the caveats that implies.
This is the first post in a three-part series. Part 1 (this post) maps the four categories of cancer molecular testing and introduces the dataset. Part 2 dives into Minimal Residual Disease (MRD) – the fastest-moving category where a single test (Signatera) dominates reimbursement while 43 competitors fight for clinical evidence. Part 3 covers the early cancer detection wars – blood vs. stool, single-cancer vs. multi-cancer, and the FDA’s unprecedented approval streak in 2024.
Read more -
Deep Learning from Scratch in Rust, Part 5 — Neural Network Architectures
Throughout this series, we’ve built tensors with autodiff, layers and loss functions, optimizers that learn, and backends that run efficiently on different hardware. We have all the ingredients. Now the question becomes: what do we actually build with them?
This post explores neural network architectures — from simple feedforward networks to the attention-based transformers that power modern AI. We’ll focus on building intuition first, then see how these architectures map to the components we’ve already built.
Read more -
Deep Learning from Scratch in Rust, Part 4 — Pluggable Backends
Throughout this series, we’ve been writing `B::add`, `B::matmul`, and `B::exp` without explaining what `B` actually is. Time to pay that debt.

`B` is a backend — an implementation of tensor operations. Different backends can target different hardware:

- CPU with SIMD intrinsics
- Metal shaders for macOS GPUs
- CUDA kernels for NVIDIA GPUs
Today we’ll see how Rust’s type system lets us write autodiff code once and run it anywhere — with the backend choice resolved entirely at compile time.
Read more -
Deep Learning from Scratch in Rust, Part 3 — Optimizers
We have gradients. Now what?
In Part 2, we built layers, models, and loss functions. Given a model and a loss, autodiff computes ∂loss/∂θ for every parameter θ. But gradients alone don’t train a model. We need an optimizer to turn gradients into parameter updates.
Today we’ll implement the three most important optimizers: SGD, SGD with Momentum, and Adam. Along the way, we’ll see why Adam became the default choice.
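The three update rules are standard, so here they are sketched on a single scalar parameter in plain Python (the series implements them in Rust over tensors, but the arithmetic per element is the same).

```python
# Scalar sketches of the three optimizers covered in the post.
# theta: parameter, grad: d(loss)/d(theta), lr: learning rate.

def sgd(theta, grad, lr=0.1):
    return theta - lr * grad

def sgd_momentum(theta, grad, velocity, lr=0.1, beta=0.9):
    # Velocity accumulates an exponentially decaying sum of past gradients.
    velocity = beta * velocity + grad
    return theta - lr * velocity, velocity

def adam(theta, grad, m, v, t, lr=0.1, b1=0.9, b2=0.999, eps=1e-8):
    m = b1 * m + (1 - b1) * grad       # first moment: running mean of grads
    v = b2 * v + (1 - b2) * grad ** 2  # second moment: running mean of grad^2
    m_hat = m / (1 - b1 ** t)          # bias correction for the zero init
    v_hat = v / (1 - b2 ** t)          # (t is the 1-based step count)
    return theta - lr * m_hat / (v_hat ** 0.5 + eps), m, v
```

The per-parameter division by `sqrt(v_hat)` is why Adam became the default: each parameter gets its own effective step size, so one global learning rate works across layers with wildly different gradient scales.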
Read more -
Deep Learning from Scratch in Rust, Part 2 — Layers, Models, and Loss
In Part 1, we built tensor autodiff — gradients flow through multi-dimensional arrays with broadcasting and reductions handled correctly. But we still don’t have a neural network.
What’s missing? The building blocks: layers that encapsulate learnable parameters, models that compose layers, and loss functions that define what “correct” means.
Today we bridge the gap from “autodiff engine” to “trainable model.”
Read more -
Deep Learning from Scratch in Rust, Part 1 — Tensor Gradients
In the Autodiff series, we built a working autodiff engine for scalar functions. Clean, elegant, and… completely impractical. But building it was so much fun that I decided to take it all the way — from toy scalar engine to a real deep learning framework.
Real neural networks don’t operate on individual numbers. They process tensors — multi-dimensional arrays where a single forward pass might involve millions of values. Today we’ll generalize our scalar engine to tensors and discover the new problems that emerge.
Spoiler: broadcasting is where the elegance gets messy.
Read more -
Building XGBoost from Scratch in Rust, Part 3 — Scaling to Terabytes
In Part 2, we built a working gradient boosted tree implementation. It produces correct, XGBoost-compatible models. But try running it on 100 million rows and you’ll be waiting a while. Let’s understand why, and how production systems solve it.
Read more -
Building XGBoost from Scratch in Rust, Part 2 — Implementation
In Part 1, we covered the theory behind gradient boosting. Now let’s implement it. We’ll build a gradient boosted tree library in Rust that produces XGBoost-compatible models.
Read more -
From Decision Trees to XGBoost: A Visual Guide to Gradient Boosting, Part 1 — Theory
You’ve probably heard of XGBoost—it’s won countless Kaggle competitions and powers prediction systems everywhere. But how does it actually work? In this post, we’ll build up the intuition from simple decision trees to the full gradient boosting algorithm.
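As a taste of the intuition, here is gradient boosting in miniature: a toy Python sketch (my own illustration, not the post’s code) using depth-1 “stumps” on a single feature. Each round fits a stump to the current residuals, and the ensemble’s prediction is the running sum of all stumps.

```python
# Gradient boosting for squared-error loss, reduced to its skeleton:
# fit each new weak learner to the residuals of the ensemble so far.

def fit_stump(x, residuals):
    """Depth-1 tree on one feature: choose the split minimizing squared error."""
    best = None
    for thr in x:
        left = [r for xi, r in zip(x, residuals) if xi <= thr]
        right = [r for xi, r in zip(x, residuals) if xi > thr]
        if not right:  # degenerate split, skip
            continue
        lmean = sum(left) / len(left)
        rmean = sum(right) / len(right)
        err = (sum((r - lmean) ** 2 for r in left)
               + sum((r - rmean) ** 2 for r in right))
        if best is None or err < best[0]:
            best = (err, thr, lmean, rmean)
    _, thr, lmean, rmean = best
    return lambda xi: lmean if xi <= thr else rmean

def boost(x, y, n_rounds=60, lr=0.5):
    """Each round fits a stump to the residuals; predictions add up."""
    preds = [0.0] * len(y)
    for _ in range(n_rounds):
        residuals = [yi - pi for yi, pi in zip(y, preds)]
        stump = fit_stump(x, residuals)
        preds = [pi + lr * stump(xi) for pi, xi in zip(preds, x)]
    return preds

# With enough rounds, predictions approach the targets.
preds = boost([1.0, 2.0, 3.0, 4.0], [1.0, 1.0, 3.0, 3.0])
```

XGBoost layers a great deal on top of this skeleton (second-order gradients, regularized split scoring, sparsity handling), but the residual-fitting loop is the core the post builds up to.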
Read more