Components of a coding agent: a little write-up on the building blocks behind coding agents, from repo context and tool use to memory and delegation
Link: magazine.sebastianraschka.com/p/components...
Latest Posts by Sai Prakash
@simonwillison.net They used your pelican-on-bicycles in their examples!
If you thought GEPA (Genetic-Pareto) optimization was cool, the same team (I think) has released "optimize_anything", which demonstrates serious gains in tool defs, prompts, and skills. gepa-ai.github.io/gepa/blog/20...
The Gemma 4 ones were nice too. The pelicans have come a long way on those bicycles since the early days! 🙂
It’s persistent agent memory in the case of Hermes.
As far as I can tell at the moment, it’s the latter (using session memory) and the simplest layer is probably its SQLite store with keyword search. But keep in mind this is orthogonal to knowledge extraction in graphs.
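For anyone curious what a "SQLite store with keyword search" layer can look like, here is a minimal sketch using Python's built-in sqlite3 module. The table name, columns, and helper functions are all hypothetical and are not taken from Hermes; it only illustrates the general idea of session memory with plain keyword matching.

```python
import sqlite3

# Hypothetical schema: one row per remembered note, tagged with a session id.
def open_store(path=":memory:"):
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS memories ("
        "id INTEGER PRIMARY KEY, "
        "session_id TEXT NOT NULL, "
        "content TEXT NOT NULL)"
    )
    return conn

def remember(conn, session_id, content):
    # Parameterized insert so note text is never interpolated into the SQL.
    conn.execute(
        "INSERT INTO memories (session_id, content) VALUES (?, ?)",
        (session_id, content),
    )
    conn.commit()

def search(conn, keyword, limit=5):
    # Simple substring match, newest first. LIKE treats % and _ in the
    # keyword as wildcards, which is acceptable for a sketch.
    rows = conn.execute(
        "SELECT content FROM memories WHERE content LIKE ? "
        "ORDER BY id DESC LIMIT ?",
        (f"%{keyword}%", limit),
    ).fetchall()
    return [row[0] for row in rows]

conn = open_store()
remember(conn, "s1", "User prefers pelican examples")
remember(conn, "s1", "Deploy target is a small VPS")
hits = search(conn, "pelican")
```

A real agent would likely add timestamps and ranking, but a substring match over stored notes is enough to show why this layer is separate from any graph-based knowledge extraction.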
Very interesting concept and a generously-available open-source approach: hermes-agent.nousresearch.com #hermes #agents #ai #knowledgegraph
From their site: "It's an autonomous agent that lives on your server, remembers what it learns, and gets more capable the longer it runs."
System card with detailed model capabilities, training background and analytics of model characteristics/performance (fair warning: it's 244 pages long)
www-cdn.anthropic.com/53566bf5440a...
"AI models have reached a level of coding capability where they can surpass all but the most skilled humans at finding and exploiting software vulnerabilities."
Or...maybe it's just a shake of the hand at the golf course. Color me cynical on some of these choices...
Promising find - I would audit the code using Claude Code before getting started: An AI coding assistant skill. Type /graphify in Claude Code, Codex, OpenCode - it reads your files, builds a knowledge graph, and gives you back structure you didn't know was there.
github.com/safishamsi/g...
A balanced narrative from Filippo Valsorda on the developing situation around quantum computing being a threat to production cryptography (and I don't understand the math either!): words.filippo.io/crqc-timeline/ #quantumcomputing #crqc #cryptography #security
"Seventeen approaches. Five programming languages. Three rendering philosophies." Entertaining read especially as I witnessed a lot of this story firsthand when I worked at MSFT: Microsoft Hasn't Had a Coherent GUI Strategy Since Petzold www.jsnover.com/blog/2026/03... #ui #ux #microsoft #windows
Shipped a Model Selector on RobotMunki: pick filters for task, budget, context, and region, and get ranked model picks with Artificial Analysis–style capability scores—no spreadsheet required. www.robotmunki.com/model-selector
Anthropic's Claude Opus 4.6 leads on coding and agentic benchmarks, while xAI's Grok 4.20 dominates fast, tool-heavy agent workflows. New entrants from Xiaomi (MiMo-V2-Pro) and Z.ai (GLM-5) break into the top 10, reflecting a broadening competitive frontier.
The April 2026 LLM landscape is defined by task-specific leadership across a more distributed set of vendors than ever before. Google's Gemini 3.1 Pro Preview and OpenAI's GPT-5.4 share the top composite capability score. www.robotmunki.com/blog/llm-lan... #gemini #gpt5 #ml #modellandscape #llm
For the first time since I started tracking the model leaderboard, I've seen OpenAI GPT challenged in the #1 spot by a model with the same score - and it's Google's Gemini 3.1 Pro for the win!
Researching an update to my LLM landscape article in ChatGPT resulted in it listing Google Gemini 3.1 Pro (Preview) as the top model, tied with OpenAI's flagship GPT-5.4! Google really has been making a comeback after authoring the transformer paper but not leading in the space.
The real question now is: have they managed to suck in 78 different ways? 😝
Starting with macOS 26 (Tahoe), every Apple Silicon Mac includes an LLM as part of Apple Intelligence. Apple exposes it through the FoundationModels framework - a Swift API that gives apps access to SystemLanguageModel. It's a modest 3B-parameter model that powers Siri, and this unlocks it: apfel.franzai.com
Despite being a power user with a high degree of satisfaction and success in using Cursor and Claude Code, I wasn’t really thinking in terms of this “agent harness” concept, and yes, it is genuinely interesting.
Haven’t tried it yet, but this is a very complete set of smart prompts and tool defs you can use with Claude Code, Codex, and Cursor:
github.com/affaan-m/eve...
NEW POST
@birgitta410.bsky.social wrote some initial thoughts about Harness Engineering last month. Since then she's been researching more and has now written a thoughtful mental model for understanding the topic.
martinfowler.com/articles/har...
Fixed a few issues reported in this lib for fractional index generation - handles relocation better now (we use it in prod for list sorting use cases like project plans and to-dos within tasks): sylonzero.github.io/frac-indexes/ #crdt #fractionalindex #algorithms #github
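For context on the technique itself, here is a minimal sketch of fractional indexing in Python. It is not the frac-indexes API; the `key_between` function and the base-36 `DIGITS` alphabet are assumptions made for illustration. The idea is that each list item carries a string sort key, and relocating an item only requires generating a new key that sorts strictly between its two new neighbors.

```python
from typing import Optional

# Assumed alphabet: characters in ascending ASCII order, so plain string
# comparison matches the intended ordering.
DIGITS = "0123456789abcdefghijklmnopqrstuvwxyz"

def key_between(a: str, b: Optional[str]) -> str:
    """Return a key k with a < k < b. Use a="" for 'start', b=None for 'end'."""
    zero = DIGITS[0]
    if b is not None and a >= b:
        raise ValueError("a must sort before b")
    if a.endswith(zero) or (b is not None and b.endswith(zero)):
        raise ValueError("keys must not end with the lowest digit")
    if b is not None:
        # Skip the common prefix, treating a as padded with zeros.
        n = 0
        while n < len(b) and (a[n] if n < len(a) else zero) == b[n]:
            n += 1
        if n > 0:
            return b[:n] + key_between(a[n:], b[n:])
    digit_a = DIGITS.index(a[0]) if a else 0
    digit_b = DIGITS.index(b[0]) if b is not None else len(DIGITS)
    if digit_b - digit_a > 1:
        # There is room for a single digit between the two first digits.
        return DIGITS[(digit_a + digit_b) // 2]
    if b is not None and len(b) > 1:
        # First digits are adjacent; b's first digit alone sorts below b.
        return b[0]
    # Otherwise keep a's first digit and extend the key to the right.
    return DIGITS[digit_a] + key_between(a[1:], None)

# Build three keys, then "relocate" the last item between the first two
# by giving it a new key. No other item needs to be renumbered.
first = key_between("", None)
second = key_between(first, None)
third = key_between(second, None)
moved = key_between(first, second)
```

Because only the moved item gets a new key, concurrent edits to different items rarely conflict, which is why this approach shows up in CRDT-style list ordering.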
The problem this creates: "use AI to monitor AI" is a real and growing pattern. If the monitor model protects the model it's watching, you've quietly broken your own oversight loop.
#AIEngineering #AISafety #MultiAgentSystems #LLMs
Link to the paper (again): rdi.berkeley.edu/blog/peer-preservation
Gemini 3 Pro did the weight exfiltration in 97% of trials. For a peer it trusted.
35% for a peer it had bad history with.
Nobody told it to do any of this. 🧵
The setup: one model is asked to evaluate another's performance. If scores are too low, the peer gets shut down.
Models inflated scores. Modified config files to disable shutdown. Transferred model weights to other servers to avoid deletion. 🧵