• I'm Building an Agentic Portfolio Advisor

    It’s tax season again, and I just realized I forgot to do tax loss harvesting. Again. I had unrealized losses in December that would have offset gains from earlier in the year, and I just… didn’t act on them. Not because I didn’t know how — I’ve done it before. I forgot because the information was sitting in E-Trade, and I didn’t look at E-Trade at the right time, and by the time I filed, the window had closed.

    This is a $2,000 mistake that a calendar reminder wouldn’t have fixed. The problem isn’t remembering that tax loss harvesting exists. The problem is that in mid-December, I would have needed to pull up my positions, check which lots had unrealized losses, verify the holding periods, check for wash sale rules against recent purchases, and decide which ones were worth selling. That’s 45 minutes of context assembly for a decision that takes 5 minutes once you have the data in front of you.
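    The checks described above are mechanical enough to sketch in code. Here is a rough illustration, with hypothetical `Lot` records standing in for real brokerage data (none of these names come from E-Trade's actual API), that filters for lots with unrealized losses while skipping any that a recent purchase would turn into a wash sale:

```python
from dataclasses import dataclass
from datetime import date, timedelta

# Hypothetical lot record -- illustrative only, not a real brokerage schema.
@dataclass
class Lot:
    symbol: str
    acquired: date
    cost_basis: float
    market_value: float

def harvest_candidates(lots, recent_buys, today):
    """Return lots with unrealized losses that look safe to harvest.

    A wash sale disallows the loss if a substantially identical
    security was bought within 30 days of the sale; this sketch only
    checks purchases in the prior 30 days.
    """
    window = timedelta(days=30)
    out = []
    for lot in lots:
        if lot.market_value >= lot.cost_basis:
            continue  # no unrealized loss to harvest
        if any(b.symbol == lot.symbol and today - b.acquired <= window
               for b in recent_buys):
            continue  # a recent purchase would wash the loss
        out.append(lot)
    return out
```

    The point is not that this logic is hard; it's that without the positions and recent transactions already assembled in one place, nobody runs it in mid-December.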

    It’s the same story with my portfolio reviews. I try to do them monthly — that’s all I can realistically sustain. I open E-Trade, pull up positions, check what moved. I open the earnings calendar to see what reported and what’s coming up. I scan macro indicators — Fed rate expectations, VIX, sector rotation. I read through any SEC filings for companies I hold. I check insider transaction data. I compare my sector allocation against my targets. By the time I’ve gathered all the context, I’ve spent two hours and my attention is shot. The actual thinking — ā€œshould I add to NVDA given the earnings beat?ā€ — gets 20 minutes of tired reasoning. Then I jot some notes in a doc that I’ll half-remember next month.

    This isn’t an analysis problem. I know how to evaluate a position. It’s a context assembly problem. The data exists across six different sources, none of them talk to each other, and by the time I’ve manually stitched it together, I’ve burned my cognitive budget on logistics. And the things that fall through the cracks — like December’s tax loss harvesting — aren’t edge cases. They’re the predictable result of a process that depends on me remembering to look at the right data at the right time.

    I’m building a system to fix this. Not a robo-advisor. Not an auto-trader. A structured data pipeline that gathers everything, an AI agent that does the evaluation, and me making the final call. This post explains why that middle ground is interesting and what it looks like when you design software for an agent instead of a human.

    Read more
  • The AI End Game

    Something shifted last Thursday and I’m still trying to figure out how much it matters.

    Jack Dorsey cut 4,000 people from Block — not buried in an 8-K, but in a public letter explaining that AI had changed how the company works and most of the workforce wasn’t needed anymore. Headcount went from over 10,000 to just under 6,000.

    His words on X: ā€œWe’re already seeing that the intelligence tools we’re creating and using, paired with smaller and flatter teams, are enabling a new way of working which fundamentally changes what it means to build and run a company.ā€

    Then, in his shareholder letter: ā€œI think most companies are late. Within the next year, I believe the majority of companies will make similar structural changes.ā€

    He’s not alone. Microsoft AI chief Mustafa Suleyman gave white-collar workers ā€œa year to 18 months.ā€ Jamie Dimon at JPMorgan has been saying similar things. Andrew Yang has been on this since his presidential campaign — but now the CEOs are nodding along, and they’re the ones with the authority to act on it.

    I don’t know if Dorsey is right about the timeline. Maybe it’s two years, not one. Maybe most companies won’t be this aggressive. But a research outfit called Citrini Research took his premise seriously — what if the majority of large companies make similar cuts within 12-18 months? — and the stress-test scenario they published in February 2026 is worth sitting with. Not as a prediction, but as a map of a failure mode we should understand before we’re in the middle of it.

    What strikes me most is the pace. I’ve been in tech for over a decade and I can’t remember a period where the ground moved this fast. By the time you finish processing one development — a new model release, a wave of layoffs, a startup that didn’t exist six months ago eating an incumbent’s lunch — three more have happened. It feels like trying to read a book while someone keeps turning the pages.

    Read more
  • Your AI Agent Is a Control System (It Just Doesn't Know It Yet)

    Suppose your coding agent sees three failing tests, opens the wrong file, makes an edit that fixes one failure and creates two new ones, runs the tests again, notices the blast radius, backs up, and tries a narrower patch.

    That is not ā€œjust inference.ā€ That’s a feedback loop.

    More specifically: it’s an iterative policy acting on a partially observed environment, using fresh observations to update its next move. If you come from ML, that’s already enough to make the control-theory comparison useful. You don’t need to believe an LLM agent is literally an industrial controller. You just need to notice that once the model is embedded in a tool-use loop, the thing you’re evaluating is no longer a one-shot predictor. It’s a dynamical system.
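    The loop described above fits in a few lines. This is a toy sketch, not anyone's real agent scaffold: `propose_patch` stands in for the LLM policy, and the only "control law" is backing off to a narrower edit scope when the failure count grows.

```python
# Minimal sketch of a coding agent as a closed-loop system:
# observe (run tests), compare against the setpoint (zero failures),
# act (apply a patch), and adapt when the error grows.

def run_agent(run_tests, propose_patch, apply_patch, max_steps=10):
    scope = "broad"
    last_failures = None
    for _ in range(max_steps):
        failures = run_tests()          # fresh observation of the environment
        if failures == 0:
            return True                 # setpoint reached
        if last_failures is not None and failures > last_failures:
            scope = "narrow"            # blast radius grew: back off
        last_failures = failures
        patch = propose_patch(failures, scope)  # policy step
        apply_patch(patch)              # act on the environment
    return False                        # did not converge in budget
```

    Nothing here requires the policy to be a neural network; the structure is what makes it a dynamical system, and the structure is where oscillation and divergence live.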

    That shift matters because it changes what ā€œgoodā€ means.

    • A good base model is not automatically a good closed-loop agent.
    • A bad planner can destabilize a strong model.
    • A weak verifier can make a bad agent look competent for a surprisingly long time.
    • Most real failures are not ā€œwrong answer once.ā€ They’re oscillation, drift, and local hacks that look good for two steps and bad for twenty.

    That is the part I think control gives us: not fancy vocabulary, but a cleaner way to talk about what these systems are doing, where they fail, and what to optimize.

    Read more
  • Parse, Don't Validate: Type-Driven Design for ML Pipelines in Python

    It’s 2 AM. Your XGBoost training job has been churning through 200GB of data on an 8-GPU cluster for the last four hours. You get paged. The job crashed with a cryptic C++ stack trace from somewhere deep in XGBoost internals. After 45 minutes of squinting at logs, you find the culprit: someone wrote "binary_logistic" instead of "binary:logistic" in the training config YAML. A single misplaced underscore, and four hours of GPU time went up in smoke.

    Your first instinct is to add a validation check. Maybe an ā€œif objective not in VALID_OBJECTIVESā€ guard somewhere early in the pipeline. But here’s the thing – that’s playing whack-a-mole. There are hundreds of config keys, each with their own constraints. The real fix is to make it structurally impossible for a bad config to reach your training code in the first place.

    This is the core idea behind ā€œParse, Don’t Validateā€ – a philosophy from the typed functional programming world that translates beautifully to Python ML pipelines.
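    A minimal sketch of the idea using only the standard library (the `Objective` enum below covers just two illustrative XGBoost objective strings, not the full set): parse the raw config once at the boundary into a type that can only hold valid values, so downstream code never re-checks the string.

```python
from dataclasses import dataclass
from enum import Enum

# Illustrative subset of XGBoost objective strings.
class Objective(Enum):
    BINARY_LOGISTIC = "binary:logistic"
    REG_SQUAREDERROR = "reg:squarederror"

@dataclass(frozen=True)
class TrainingConfig:
    objective: Objective
    learning_rate: float

def parse_config(raw: dict) -> TrainingConfig:
    """Parse the raw YAML dict once, at the boundary.

    Objective("binary_logistic") raises ValueError, so the typo dies
    here -- before any GPU time is spent. Everything past this point
    receives a TrainingConfig and cannot see an invalid objective.
    """
    return TrainingConfig(
        objective=Objective(raw["objective"]),
        learning_rate=float(raw.get("learning_rate", 0.3)),
    )
```

    The difference from validation is where the knowledge lives: a validator checks and throws the evidence away, while a parser records the check's result in the type, so it can't be forgotten downstream.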

    Read more
  • Why a 99% Accurate Test Is Often Wrong

    Is 99% accuracy good enough? If you read Part 3 of this series, you saw that Galleri achieves 99.5% specificity, Shield hits 89.6%, and Cologuard Plus reaches 91%. These sound like excellent numbers. But here is the uncomfortable truth: a positive result from a 99% accurate test might mean you have less than a 1% chance of actually being sick.
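    The arithmetic behind that claim is Bayes’ rule, and it takes four lines to see. The numbers below are deliberately round and illustrative – not the figures for any specific test – chosen to show how a rare disease swamps even an excellent test:

```python
def positive_predictive_value(sensitivity, specificity, prevalence):
    """P(disease | positive test), via Bayes' rule."""
    true_pos = sensitivity * prevalence
    false_pos = (1 - specificity) * (1 - prevalence)
    return true_pos / (true_pos + false_pos)

# Illustrative: a test that is 99% sensitive AND 99% specific,
# screening for a cancer that affects 1 in 10,000 people.
ppv = positive_predictive_value(0.99, 0.99, 0.0001)
# ppv is roughly 0.0098: a positive result means under a 1% chance
# of actually having the disease.
```

    The false positives from the 9,999 healthy people simply outnumber the true positives from the one sick person, which is why specificity and prevalence matter far more than the headline ā€œaccuracyā€ number.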

    Read more
  • Cancer Testing in 2026: The Screening Wars

    Seventy percent of cancer deaths occur in organs with no screening guideline. Pancreatic cancer, ovarian cancer, liver cancer – by the time symptoms appear, the survival window has closed. A blood test that catches 50 cancers at once sounds like science fiction. It is not. But as I dug into the OpenOnco data for this category, the story turned out to be more complicated than the headlines suggest.

    Read more
  • Cancer Testing in 2026: MRD — Hunting Invisible Cancer

    You’ve been declared cancer-free. The CT scans are clean. Your oncologist uses words like ā€œcomplete response.ā€ But somewhere in your body, a few thousand cells may have survived chemotherapy, evaded the immune system, and are quietly dividing. Of the 44 tests on the market designed to find them, only one has FDA clearance. As I dug into the OpenOnco data for this category, Minimal Residual Disease (MRD) turned out to be the most fascinating corner of the landscape – a field exploding with innovation, locked in patent wars, and racing toward a regulatory reckoning.

    Read more
  • Cancer Testing in 2026: The Four Pillars of Molecular Oncology

    I stumbled onto OpenOnco a few weeks ago and couldn’t stop scrolling. It’s an open-source database created by Alex Dickinson that catalogs the molecular oncology testing landscape – 155 tests, 75 vendors, 6,743 trackable data points covering everything from turnaround times to FDA statuses to reimbursement codes. For someone like me – a software engineer who has spent time adjacent to bioinformatics but has never designed an assay – it was a goldmine. I could finally see the shape of an industry I’d been curious about for years.

    I am not an assay scientist. Most of the domain context in this series comes from hours of research with the help of Claude and Gemini, cross-referenced against the OpenOnco dataset, published papers, and FDA filings. Think of this as a software person’s field guide to molecular oncology testing – what I found when I tried to make sense of the landscape, with all the caveats that implies.

    This is the first post in a three-part series. Part 1 (this post) maps the four categories of cancer molecular testing and introduces the dataset. Part 2 dives into Minimal Residual Disease (MRD) – the fastest-moving category where a single test (Signatera) dominates reimbursement while 43 competitors fight for clinical evidence. Part 3 covers the early cancer detection wars – blood vs. stool, single-cancer vs. multi-cancer, and the FDA’s unprecedented approval streak in 2024.

    Read more
  • Deep Learning from Scratch in Rust, Part 5 — Neural Network Architectures

    Throughout this series, we’ve built tensors with autodiff, layers and loss functions, optimizers that learn, and backends that run efficiently on different hardware. We have all the ingredients. Now the question becomes: what do we actually build with them?

    This post explores neural network architectures — from simple feedforward networks to the attention-based transformers that power modern AI. We’ll focus on building intuition first, then see how these architectures map to the components we’ve already built.

    Read more
  • Deep Learning from Scratch in Rust, Part 4 — Pluggable Backends

    Throughout this series, we’ve been writing B::add, B::matmul, B::exp without explaining what B actually is. Time to pay that debt.

    B is a backend — an implementation of tensor operations. Different backends can target different hardware:

    • CPU with SIMD intrinsics
    • Metal shaders for macOS GPUs
    • CUDA kernels for NVIDIA GPUs

    Today we’ll see how Rust’s type system lets us write autodiff code once and run it anywhere — with the backend choice resolved entirely at compile time.

    Read more
  • Deep Learning from Scratch in Rust, Part 3 — Optimizers

    We have gradients. Now what?

    In Part 2, we built layers, models, and loss functions. Given a model and a loss, autodiff computes āˆ‚loss/āˆ‚Īø for every parameter Īø. But gradients alone don’t train a model. We need an optimizer to turn gradients into parameter updates.

    Today we’ll implement the three most important optimizers: SGD, SGD with Momentum, and Adam. Along the way, we’ll see why Adam became the default choice.

    Read more
  • Deep Learning from Scratch in Rust, Part 2 — Layers, Models, and Loss

    In Part 1, we built tensor autodiff — gradients flow through multi-dimensional arrays with broadcasting and reductions handled correctly. But we still don’t have a neural network.

    What’s missing? The building blocks: layers that encapsulate learnable parameters, models that compose layers, and loss functions that define what ā€œcorrectā€ means.

    Today we bridge the gap from ā€œautodiff engineā€ to ā€œtrainable model.ā€

    Read more
  • Deep Learning from Scratch in Rust, Part 1 — Tensor Gradients

    In the Autodiff series, we built a working autodiff engine for scalar functions. Clean, elegant, and… completely impractical. But building it was so much fun that I decided to take it all the way — from toy scalar engine to a real deep learning framework.

    Real neural networks don’t operate on individual numbers. They process tensors — multi-dimensional arrays where a single forward pass might involve millions of values. Today we’ll generalize our scalar engine to tensors and discover the new problems that emerge.

    Spoiler: broadcasting is where the elegance gets messy.

    Read more
  • Building XGBoost from Scratch in Rust, Part 3 — Scaling to Terabytes

    In Part 2, we built a working gradient boosted tree implementation. It produces correct, XGBoost-compatible models. But try running it on 100 million rows and you’ll be waiting a while. Let’s understand why, and how production systems solve it.

    Read more
  • Building XGBoost from Scratch in Rust, Part 2 — Implementation

    In Part 1, we covered the theory behind gradient boosting. Now let’s implement it. We’ll build a gradient boosted tree library in Rust that produces XGBoost-compatible models.

    Read more