-
minibwa, Unpacked, Part 1: The Algorithm
I recently came across a bioRxiv paper from Heng Li, the godfather of modern bioinformatics, about a new mapping method called minibwa. I had a working idea of how read mapping worked, but I had never really chased the details all the way down. With Claude and Codex as study partners, digging into a new method at the level my scientific curiosity wanted stopped feeling like a chore and started feeling like an actual learning trip. So I got to work. This three-part series is the set of questions I had about how minibwa compares with bwa-mem and minimap2, and what makes it stand out.
A 150-base read shows up from the sequencer. Somewhere in the 3.1-billion-base human reference, there is probably a place it came from. The whole algorithm is a way to avoid comparing that read against all 3.1 billion bases directly.
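One common way to avoid the full comparison is to index short substrings (k-mers) of the reference once, then look up the read's k-mers to vote for candidate start positions. The sketch below is a toy illustration of that general seeding idea only; it is not minibwa's actual algorithm, and `build_kmer_index` and `candidate_positions` are hypothetical names made up for this example.

```python
from collections import Counter, defaultdict


def build_kmer_index(reference: str, k: int) -> dict[str, list[int]]:
    """Map every k-mer in the reference to the positions where it starts."""
    index = defaultdict(list)
    for i in range(len(reference) - k + 1):
        index[reference[i:i + k]].append(i)
    return index


def candidate_positions(read: str, index: dict[str, list[int]], k: int) -> Counter:
    """Each k-mer hit votes for the reference position where the read would start."""
    votes = Counter()
    for offset in range(len(read) - k + 1):
        for pos in index.get(read[offset:offset + k], []):
            votes[pos - offset] += 1
    return votes


# Toy data: a 31-base "reference" and a 12-base read copied from position 10.
reference = "ACGTTGCATGCCGATAGGCTTAACGGATCCA"
read = reference[10:22]

index = build_kmer_index(reference, k=5)
best_start, support = candidate_positions(read, index, k=5).most_common(1)[0]
print(best_start, support)
```

Only the positions that collect votes need a careful base-by-base alignment, which is where the savings over a full scan come from.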
-
Thesis-Driven Investing and the Agent Loop
Open your brokerage app. Look at your watchlist. AAPL, NVDA, TSLA, AMZN, GOOG — a flat list of tickers sorted by… whatever the default is. Now try to answer: why is NVDA on this list? Is it a core holding or a speculative bet? What would make you sell it? What specific piece of evidence would confirm or disconfirm whatever thesis got it onto the list in the first place?
If you’re like me, you don’t remember. The ticker is there because you added it six months ago when you read something about data center spending. The reasoning is gone. The list is a graveyard of forgotten convictions.
The usual diagnosis is that you need better tools — a smarter screener, a faster news feed, a dashboard with more charts. But the problem isn’t tooling. It’s that there’s no structure for your reasoning. No place where your beliefs are written down, no criteria for when you’d change your mind, no way to tell whether last month’s conviction still holds. The information exists. The reasoning framework doesn’t.
That’s what this post is about. Not the data pipeline or the agent architecture — I covered those in the first post. This one is about encoding investment reasoning as data structures: theses, hypotheses, scoring rubrics. The kind of structure that makes your thinking consistent whether an AI agent is evaluating it or you’re doing it by hand on a Sunday night.
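To make "reasoning as data structures" concrete, here is a minimal sketch of what a thesis record could look like. The field names, the -1 to +1 scoring scale, and the weighted-average `conviction` method are all assumptions for illustration; the excerpt does not show the actual schema used in the post.

```python
from dataclasses import dataclass, field


@dataclass
class Hypothesis:
    statement: str
    confirming_evidence: str      # what would strengthen the belief
    disconfirming_evidence: str   # what would weaken it
    weight: float                 # relative importance within the thesis
    score: float = 0.0            # assumed scale: -1.0 (disconfirmed) to +1.0 (confirmed)


@dataclass
class Thesis:
    ticker: str
    role: str                     # e.g. "core" or "speculative"
    summary: str
    exit_criteria: list[str]
    hypotheses: list[Hypothesis] = field(default_factory=list)

    def conviction(self) -> float:
        """Weighted average of hypothesis scores; 0.0 if nothing is weighted."""
        total = sum(h.weight for h in self.hypotheses)
        if total == 0:
            return 0.0
        return sum(h.weight * h.score for h in self.hypotheses) / total


# Hypothetical example entry, not real investment data.
nvda = Thesis(
    ticker="NVDA",
    role="core",
    summary="Data center spending keeps growing.",
    exit_criteria=["Two consecutive quarters of declining data center revenue"],
    hypotheses=[
        Hypothesis("Hyperscaler capex rises", "Capex guidance raised", "Capex guidance cut", weight=2.0, score=1.0),
        Hypothesis("Margins hold", "Gross margin stable", "Gross margin falls", weight=1.0, score=-0.5),
    ],
)
print(nvda.conviction())
```

Written down this way, the questions from the watchlist exercise (why is it here, what role does it play, what would make you sell) each have a field to live in.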
-
The AI End Game, Revisited
Three weeks ago I wrote about the AI end game — the feedback loop between AI-driven layoffs, shrinking consumer demand, and what Citrini Research calls “ghost GDP.” I sketched out four phases for how this might play out, with Phase I (the Efficiency Divergence) as the present moment and the rest as informed speculation.
Three weeks later, Phase I is accelerating faster than I expected.
-
I'm Building an Agentic Portfolio Advisor
It’s tax season again, and I just realized I forgot to do tax loss harvesting. Again. I had unrealized losses in December that would have offset gains from earlier in the year, and I just… didn’t act on them. Not because I didn’t know how — I’ve done it before. I forgot because the information was sitting in E-Trade, and I didn’t look at E-Trade at the right time, and by the time I filed, the window had closed.
This is a $2,000 mistake that a calendar reminder wouldn’t have fixed. The problem isn’t remembering that tax loss harvesting exists. The problem is that in mid-December, I would have needed to pull up my positions, check which lots had unrealized losses, verify the holding periods, check for wash sale rules against recent purchases, and decide which ones were worth selling. That’s 45 minutes of context assembly for a decision that takes 5 minutes once you have the data in front of you.
It’s the same story with my portfolio reviews. I try to do them monthly — that’s all I can realistically sustain. I open E-Trade, pull up positions, check what moved. I open the earnings calendar to see what reported and what’s coming up. I scan macro indicators — Fed rate expectations, VIX, sector rotation. I read through any SEC filings for companies I hold. I check insider transaction data. I compare my sector allocation against my targets. By the time I’ve gathered all the context, I’ve spent two hours and my attention is shot. The actual thinking — “should I add to NVDA given the earnings beat?” — gets 20 minutes of tired reasoning. Then I jot some notes in a doc that I’ll half-remember next month.
This isn’t an analysis problem. I know how to evaluate a position. It’s a context assembly problem. The data exists across six different sources, none of them talk to each other, and by the time I’ve manually stitched it together, I’ve burned my cognitive budget on logistics. And the things that fall through the cracks — like December’s tax loss harvesting — aren’t edge cases. They’re the predictable result of a process that depends on me remembering to look at the right data at the right time.
I’m building a system to fix this. Not a robo-advisor. Not an auto-trader. A structured data pipeline that gathers everything, an AI agent that does the evaluation, and me making the final call. This post explains why that middle ground is interesting and what it looks like when you design software for an agent instead of a human.
-
The AI End Game
Something shifted last Thursday and I’m still trying to figure out how much it matters.
Jack Dorsey cut 4,000 people from Block — not buried in an 8-K, but in a public letter explaining that AI had changed how the company works and most of the workforce wasn’t needed anymore. Over 10,000 down to just under 6,000.
His words on X: “We’re already seeing that the intelligence tools we’re creating and using, paired with smaller and flatter teams, are enabling a new way of working which fundamentally changes what it means to build and run a company.”
Then, in his shareholder letter: “I think most companies are late. Within the next year, I believe the majority of companies will make similar structural changes.”
He’s not alone. Microsoft AI chief Mustafa Suleyman gave white-collar workers “a year to 18 months.” Jamie Dimon at JPMorgan has been saying similar things. Andrew Yang has been on this since his presidential campaign — but now the CEOs are nodding along, and they’re the ones with the authority to act on it.
I don’t know if Dorsey is right about the timeline. Maybe it’s two years, not one. Maybe most companies won’t be this aggressive. But a research outfit called Citrini Research took his premise seriously — what if the majority of large companies make similar cuts within 12-18 months? — and the stress-test scenario they published in February 2026 is worth sitting with. Not as a prediction, but as a map of a failure mode we should understand before we’re in the middle of it.
What strikes me most is the pace. I’ve been in tech for over a decade and I can’t remember a period where the ground moved this fast. By the time you finish processing one development — a new model release, a wave of layoffs, a startup that didn’t exist six months ago eating an incumbent’s lunch — three more have happened. It feels like trying to read a book while someone keeps turning the pages.
-
Your AI Agent Is a Control System (It Just Doesn't Know It Yet)
Suppose your coding agent sees three failing tests, opens the wrong file, makes an edit that fixes one failure and creates two new ones, runs the tests again, notices the blast radius, backs up, and tries a narrower patch.
That is not “just inference.” That’s a feedback loop.
More specifically: it’s an iterative policy acting on a partially observed environment, using fresh observations to update its next move. If you come from ML, that’s already enough to make the control-theory comparison useful. You don’t need to believe an LLM agent is literally an industrial controller. You just need to notice that once the model is embedded in a tool-use loop, the thing you’re evaluating is no longer a one-shot predictor. It’s a dynamical system.
That shift matters because it changes what “good” means.
- A good base model is not automatically a good closed-loop agent.
- A bad planner can destabilize a strong model.
- A weak verifier can make a bad agent look competent for a surprisingly long time.
- Most real failures are not “wrong answer once.” They’re oscillation, drift, and local hacks that look good for two steps and bad for twenty.
That is the part I think control gives us: not fancy vocabulary, but a cleaner way to talk about what these systems are doing, where they fail, and what to optimize.
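The loop from the opening example can be sketched in a few lines. This is a toy under stated assumptions: the "environment" is just a count of failing tests, and the hypothetical `toy_policy` returns an action name plus the net change in failures it would cause, standing in for actually editing files and rerunning the suite.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class StepRecord:
    step: int
    action: str
    failing_before: int
    failing_after: int
    kept: bool


def run_closed_loop(
    failing: int,
    propose: Callable[[int, list[StepRecord]], tuple[str, int]],
    max_steps: int = 10,
) -> list[StepRecord]:
    """Act, observe, and keep an edit only if the error signal shrank."""
    history: list[StepRecord] = []
    for step in range(max_steps):
        if failing == 0:
            break
        action, delta = propose(failing, history)
        observed = max(0, failing + delta)   # stand-in for rerunning the tests
        kept = observed < failing            # verifier: did the failure count drop?
        history.append(StepRecord(step, action, failing, observed, kept))
        if kept:
            failing = observed               # accept the edit
        # otherwise the edit is reverted and the state stays unchanged
    return history


def toy_policy(failing: int, history: list[StepRecord]) -> tuple[str, int]:
    """First try a broad edit (fixes one test, breaks two); afterwards patch narrowly."""
    if not history:
        return ("broad_edit", +1)
    return ("narrow_patch", -1)


history = run_closed_loop(failing=3, propose=toy_policy)
for record in history:
    print(record)
```

The model's quality lives inside `propose`; whether the overall system converges also depends on the verifier and the revert rule, which is why a strong model can still make a poor closed-loop agent.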
-
Pipeline Patterns: pandas vs Polars, Side by Side
In Part 1, we extracted five ideas from PRQL: top-to-bottom pipelines, group without aggregate, windows as columns, null as a value, and composable transforms. Five ideas. Two stacks. One dataset. Let’s see which Python data library makes pipeline thinking natural – and where the abstractions leak.
The dataset is Kaggle’s Spaceship Titanic: 8,693 passengers on an interstellar liner, some of whom got teleported to an alternate dimension during a collision. Your job is to predict who gets transported. The raw CSV has nulls scattered across most columns, a Cabin field packed as "B/0/P" that needs parsing into three separate features, and five spending columns (RoomService, FoodCourt, ShoppingMall, Spa, VRDeck) that are null whenever CryoSleep is true. It’s messy in exactly the right ways to stress-test pipeline patterns.
-
What PRQL Got Right: 5 Ideas Worth Stealing
Here’s a query I wrote last week against the Kaggle Spaceship Titanic dataset. The ask was simple: average spending per deck for non-cryosleep passengers over 25, ranked highest first.
SELECT
    SUBSTR(Cabin, 1, 1) AS Deck,
    AVG(RoomService + FoodCourt + ShoppingMall + Spa + VRDeck) AS avg_spend,
    COUNT(*) AS n_passengers
FROM passengers
WHERE CryoSleep = FALSE
  AND Age > 25
  AND Cabin IS NOT NULL
  AND RoomService IS NOT NULL
  AND FoodCourt IS NOT NULL
  AND ShoppingMall IS NOT NULL
  AND Spa IS NOT NULL
  AND VRDeck IS NOT NULL
GROUP BY SUBSTR(Cabin, 1, 1)
HAVING COUNT(*) > 10
ORDER BY avg_spend DESC;
Try reading that top to bottom.
SELECT declares the output columns – but you don’t know what table they come from until you hit FROM six lines down. The WHERE clauses filter rows, but you need to mentally jump back up to SELECT to understand what avg_spend means. GROUP BY references an expression that was defined in SELECT, and HAVING filters groups using an aggregate that looks nothing like the WHERE above it. You’re reading a story where the conclusion comes first, the setup is in the middle, and the plot is told in reverse order.
This isn’t a SQL skill issue. The language fundamentally separates the order you write things from the order they execute. Every SQL user eventually builds the mental compiler to translate between the two, and we stop noticing the tax. But it’s there – every code review where someone asks “wait, what does this query do?” is evidence of it.
A language called PRQL (pronounced “prequel”) figured out the right abstractions for this. You don’t need to adopt PRQL. I don’t use it in production. But the five ideas it crystallized are worth stealing, because they show up naturally in pandas method chaining and even more naturally in Polars. This series is about those ideas and how to apply them with tools you already have.
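For comparison, here is one way the same question could be written as a pandas method chain that reads in execution order: drop nulls, filter, derive columns, group, filter groups, sort. It is a sketch rather than the series' own code; `avg_spend_per_deck`, `SPEND_COLS`, and `min_passengers` are names invented for this example, and the column names are taken from the SQL query.

```python
import pandas as pd

SPEND_COLS = ["RoomService", "FoodCourt", "ShoppingMall", "Spa", "VRDeck"]


def avg_spend_per_deck(passengers: pd.DataFrame, min_passengers: int = 10) -> pd.DataFrame:
    """Average total spend per deck for non-cryosleep passengers over 25."""
    return (
        passengers
        # WHERE ... IS NOT NULL
        .dropna(subset=["Cabin", *SPEND_COLS])
        # WHERE CryoSleep = FALSE AND Age > 25 (missing values compare as False)
        .loc[lambda df: df["CryoSleep"].eq(False) & (df["Age"] > 25)]
        # SELECT expressions, computed before they are used
        .assign(
            Deck=lambda df: df["Cabin"].str[0],
            total_spend=lambda df: df[SPEND_COLS].sum(axis=1),
        )
        # GROUP BY + aggregates
        .groupby("Deck", as_index=False)
        .agg(avg_spend=("total_spend", "mean"), n_passengers=("total_spend", "size"))
        # HAVING COUNT(*) > min_passengers
        .loc[lambda df: df["n_passengers"] > min_passengers]
        # ORDER BY avg_spend DESC
        .sort_values("avg_spend", ascending=False)
        .reset_index(drop=True)
    )
```

Each step consumes the output of the one above it, so the HAVING filter is just another row filter applied after aggregation.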
-
Parse, Don't Validate: Type-Driven Design for ML Pipelines in Python
It’s 2 AM. Your XGBoost training job has been churning through 200GB of data on an 8-GPU cluster for the last four hours. You get paged. The job crashed with a cryptic C++ stack trace from somewhere deep in XGBoost internals. After 45 minutes of squinting at logs, you find the culprit: someone wrote "binary_logistic" instead of "binary:logistic" in the training config YAML. A single misplaced underscore, and four hours of GPU time went up in smoke.
Your first instinct is to add a validation check. Maybe an if objective not in VALID_OBJECTIVES somewhere early in the pipeline. But here’s the thing – that’s playing whack-a-mole. There are hundreds of config keys, each with their own constraints. The real fix is to make it structurally impossible for a bad config to reach your training code in the first place.
This is the core idea behind “Parse, Don’t Validate” – a philosophy from the typed functional programming world that translates beautifully to Python ML pipelines.
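A minimal sketch of the idea using only the standard library: the raw YAML dict is parsed once at the boundary into a typed `TrainingConfig`, and training code only ever accepts that type. The class and function names and the `max_depth` field are assumptions for illustration, and the `Objective` enum lists just three of XGBoost's objective strings.

```python
from dataclasses import dataclass
from enum import Enum


class Objective(Enum):
    BINARY_LOGISTIC = "binary:logistic"
    REG_SQUAREDERROR = "reg:squarederror"
    MULTI_SOFTMAX = "multi:softmax"


@dataclass(frozen=True)
class TrainingConfig:
    objective: Objective
    max_depth: int


def parse_training_config(raw: dict) -> TrainingConfig:
    """Turn an untrusted dict into a TrainingConfig, or fail immediately."""
    try:
        objective = Objective(raw["objective"])
    except ValueError:
        valid = ", ".join(o.value for o in Objective)
        raise ValueError(
            f"unknown objective {raw['objective']!r}; expected one of: {valid}"
        ) from None
    max_depth = int(raw["max_depth"])
    if max_depth < 1:
        raise ValueError(f"max_depth must be >= 1, got {max_depth}")
    return TrainingConfig(objective=objective, max_depth=max_depth)


config = parse_training_config({"objective": "binary:logistic", "max_depth": 6})
print(config)
```

With this in place, the "binary_logistic" typo fails in milliseconds at config load time, and any function that takes a `TrainingConfig` no longer needs its own checks.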
-
Why a 99% Accurate Test Is Often Wrong
Is 99% accuracy good enough? If you read Part 3 of this series, you saw that Galleri achieves 99.5% specificity, Shield hits 89.6%, and Cologuard Plus reaches 91%. These sound like excellent numbers. But here is the uncomfortable truth: a positive result from a 99% accurate test might mean you have less than a 1% chance of actually being sick.
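The arithmetic behind that claim is Bayes' rule. The sketch below assumes a test with 99% sensitivity and 99% specificity and a disease prevalence of 1 in 10,000; the prevalence is an illustrative assumption, not a figure from the series.

```python
def positive_predictive_value(sensitivity: float, specificity: float, prevalence: float) -> float:
    """Probability of actually having the disease given a positive result."""
    true_positives = sensitivity * prevalence
    false_positives = (1 - specificity) * (1 - prevalence)
    return true_positives / (true_positives + false_positives)


# Assumed inputs: 99% sensitivity, 99% specificity, prevalence of 1 in 10,000.
ppv = positive_predictive_value(sensitivity=0.99, specificity=0.99, prevalence=0.0001)
print(f"{ppv:.2%}")
```

Under these assumptions the result comes out just under 1%: healthy people vastly outnumber sick ones, so even a 1% false-positive rate produces far more false alarms than true detections.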
-
Cancer Testing in 2026: The Screening Wars
Seventy percent of cancer deaths occur in organs with no screening guideline. Pancreatic cancer, ovarian cancer, liver cancer – by the time symptoms appear, the survival window has closed. A blood test that catches 50 cancers at once sounds like science fiction. It is not. But as I dug into the OpenOnco data for this category, the story turned out to be more complicated than the headlines suggest.
-
Cancer Testing in 2026: MRD — Hunting Invisible Cancer
You’ve been declared cancer-free. The CT scans are clean. Your oncologist uses words like “complete response.” But somewhere in your body, a few thousand cells may have survived chemotherapy, evaded the immune system, and are quietly dividing. Of the 44 tests on the market designed to find them, only one has FDA clearance. As I dug into the OpenOnco data for this category, Minimal Residual Disease (MRD) turned out to be the most fascinating corner of the landscape – a field exploding with innovation, locked in patent wars, and racing toward a regulatory reckoning.
-
Cancer Testing in 2026: The Four Pillars of Molecular Oncology
I stumbled onto OpenOnco a few weeks ago and couldn’t stop scrolling. It’s an open-source database created by Alex Dickinson that catalogs the molecular oncology testing landscape – 155 tests, 75 vendors, 6,743 trackable data points covering everything from turnaround times to FDA statuses to reimbursement codes. For someone like me – a software engineer who has spent time adjacent to bioinformatics but has never designed an assay – it was a goldmine. I could finally see the shape of an industry I’d been curious about for years.
I am not an assay scientist. Most of the domain context in this series comes from hours of research with the help of Claude and Gemini, cross-referenced against the OpenOnco dataset, published papers, and FDA filings. Think of this as a software person’s field guide to molecular oncology testing – what I found when I tried to make sense of the landscape, with all the caveats that implies.
This is the first post in a three-part series. Part 1 (this post) maps the four categories of cancer molecular testing and introduces the dataset. Part 2 dives into Minimal Residual Disease (MRD) – the fastest-moving category where a single test (Signatera) dominates reimbursement while 43 competitors fight for clinical evidence. Part 3 covers the early cancer detection wars – blood vs. stool, single-cancer vs. multi-cancer, and the FDA’s unprecedented approval streak in 2024.
-
Deep Learning from Scratch in Rust, Part 5 — Neural Network Architectures
Throughout this series, we’ve built tensors with autodiff, layers and loss functions, optimizers that learn, and backends that run efficiently on different hardware. We have all the ingredients. Now the question becomes: what do we actually build with them?
This post explores neural network architectures — from simple feedforward networks to the attention-based transformers that power modern AI. We’ll focus on building intuition first, then see how these architectures map to the components we’ve already built.
-
Deep Learning from Scratch in Rust, Part 4 — Pluggable Backends
Throughout this series, we’ve been writing B::add, B::matmul, B::exp without explaining what B actually is. Time to pay that debt.
B is a backend — an implementation of tensor operations. Different backends can target different hardware:
- CPU with SIMD intrinsics
- Metal shaders for macOS GPUs
- CUDA kernels for NVIDIA GPUs
Today we’ll see how Rust’s type system lets us write autodiff code once and run it anywhere — with the backend choice resolved entirely at compile time.