
Latest Posts by Sai Prakash

Components of A Coding Agent How coding agents use tools, memory, and repo context to make LLMs work better in practice

Components of a coding agent: a little write-up on the building blocks behind coding agents, from repo context and tool use to memory and delegation
Link: magazine.sebastianraschka.com/p/components...

12 hours ago 22 6 1 0

@simonwillison.net They used your pelican-on-bicycles in their examples!

12 hours ago 0 0 0 0
GEPA: optimize_anything: A Universal API for Optimizing any Text Parameter GEPA's new API setting state-of-the-art results on optimizing any text parameter: code, prompts, agent architectures, and more. If you can measure it, you can optimize it.

If you thought GEPA (Genetic-Pareto) optimization was cool, I think the same team has released "optimize_anything", which demonstrates serious gains in tool defs, prompts, and skills. gepa-ai.github.io/gepa/blog/20...

12 hours ago 0 0 1 0

The Gemma 4 ones were nice too. The pelicans have come a long way on those bicycles since the early days! 🙂

13 hours ago 0 0 0 0

It’s persistent agent memory in the case of Hermes.

13 hours ago 0 0 0 0

As far as I can tell at the moment, it’s the latter (using session memory) and the simplest layer is probably its SQLite store with keyword search. But keep in mind this is orthogonal to knowledge extraction in graphs.
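
To make the "SQLite store with keyword search" idea concrete, here is a minimal sketch of that kind of memory layer. It is not Hermes's actual code: the table name `notes` and the helpers `open_memory`, `remember`, and `recall` are hypothetical, and the search is a plain `LIKE` match using only Python's standard `sqlite3` module.

```python
import sqlite3


def open_memory(path: str = ":memory:") -> sqlite3.Connection:
    """Open (or create) a tiny note store. The schema is an assumption for illustration."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS notes ("
        "id INTEGER PRIMARY KEY, session TEXT NOT NULL, body TEXT NOT NULL)"
    )
    return conn


def remember(conn: sqlite3.Connection, session: str, body: str) -> None:
    """Persist one note for a session."""
    with conn:  # commits on success, rolls back on error
        conn.execute(
            "INSERT INTO notes (session, body) VALUES (?, ?)", (session, body)
        )


def recall(conn: sqlite3.Connection, keyword: str, limit: int = 5) -> list[str]:
    """Return the most recent notes whose body contains the keyword."""
    # Escape LIKE wildcards so the keyword is matched literally.
    escaped = (
        keyword.replace("\\", "\\\\").replace("%", "\\%").replace("_", "\\_")
    )
    rows = conn.execute(
        "SELECT body FROM notes WHERE body LIKE ? ESCAPE '\\' "
        "ORDER BY id DESC LIMIT ?",
        (f"%{escaped}%", limit),
    ).fetchall()
    return [row[0] for row in rows]


if __name__ == "__main__":
    conn = open_memory()
    remember(conn, "s1", "User prefers pytest over unittest")
    remember(conn, "s1", "Project uses SQLite for local state")
    print(recall(conn, "sqlite"))
```

Keyword lookup like this only finds notes that literally contain the term, which is why it is a separate concern from extracting entities and relations into a graph.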

13 hours ago 0 0 0 0
Hermes Agent — AI Agent Framework An open-source agent that grows with you. Install it, give it your messaging accounts, and it becomes a persistent personal agent — learning your projects, building its own skills, and reaching you wh...

Very interesting concept and a generously-available open-source approach: hermes-agent.nousresearch.com #hermes #agents #ai #knowledgegraph

From their site: "It's an autonomous agent that lives on your server, remembers what it learns, and gets more capable the longer it runs."

13 hours ago 2 0 2 0

System card with detailed model capabilities, training background and analytics of model characteristics/performance (fair warning: it's 244 pages long)
www-cdn.anthropic.com/53566bf5440a...

14 hours ago 0 0 0 0

"AI models have reached a level of coding capability where they can surpass all but the most skilled humans at finding and exploiting software vulnerabilities."

15 hours ago 0 0 0 0
Project Glasswing: Securing critical software for the AI era A new initiative to secure the world’s most critical software and give defenders a durable advantage in the coming AI-driven era of cybersecurity.

www.anthropic.com/glasswing

15 hours ago 0 0 1 1

Or...maybe it's just a shake of the hand at the golf course. Color me cynical on some of these choices...

15 hours ago 0 0 0 0
GitHub - safishamsi/graphify: AI coding assistant skill (Claude Code, Codex, OpenCode, OpenClaw). Turn any folder of code, docs, papers, or images into a queryable knowledge graph

Promising find - I would audit the code using Claude Code before getting started: An AI coding assistant skill. Type /graphify in Claude Code, Codex, OpenCode - it reads your files, builds a knowledge graph, and gives you back structure you didn't know was there.

github.com/safishamsi/g...

1 day ago 0 0 0 0
A Cryptography Engineer’s Perspective on Quantum Computing Timelines The risk that cryptographically-relevant quantum computers materialize within the next few years is now high enough to be dispositive, unfortunately.

A balanced narrative from Filippo Valsorda on the developing threat that quantum computing poses to production cryptography (and I don't understand the math either!): words.filippo.io/crqc-timeline/ #quantumcomputing #crqc #cryptography #security

1 day ago 1 0 0 0
Microsoft Hasn’t Had a Coherent GUI Strategy Since Petzold A few years ago I was in a meeting with developers and someone asked a simple question: “What’s the right framework for a new Windows desktop app?” Dead silence. One person sugges…

"Seventeen approaches. Five programming languages. Three rendering philosophies." Entertaining read, especially as I witnessed a lot of this story firsthand when I worked at MSFT: Microsoft Hasn't Had a Coherent GUI Strategy Since Petzold www.jsnover.com/blog/2026/03... #ui #ux #microsoft #windows

1 day ago 0 0 0 0

Shipped a Model Selector on RobotMunki: pick filters for task, budget, context, and region, and get ranked model picks with Artificial Analysis–style capability scores—no spreadsheet required. www.robotmunki.com/model-selector

2 days ago 1 1 0 0
Z.ai - Free AI Chatbot & Agent powered by GLM-5 & GLM-4.7 Meet Z.ai, your free AI-powered assistant. Build websites, create slides, analyze data, and get instant answers. Fast, smart, and reliable, powered by GLM-5.

Anthropic's Claude Opus 4.6 leads on coding and agentic benchmarks, while xAI's Grok 4.20 dominates fast, tool-heavy agent workflows. New entrants from Xiaomi (MiMo-V2-Pro) and Z.ai (GLM-5) break into the top 10, reflecting a broadening competitive frontier.

2 days ago 0 0 0 0
LLM Landscape 2026: Intelligence Leaderboard and Model Guide A comprehensive April 2026 ranking of the top AI language models by vendor, featuring composite capability scores, context windows, pricing, and use-case guidance for Gemini 3.1, GPT-5.4, Claude Opus ...

The April 2026 LLM landscape is defined by task-specific leadership across a more distributed set of vendors than ever before. Google's Gemini 3.1 Pro Preview and OpenAI's GPT-5.4 share the top composite capability score. www.robotmunki.com/blog/llm-lan... #gemini #gpt5 #ml #modellandscape #llm

2 days ago 1 0 1 0

For the first time since I started tracking the model leaderboard, I've seen OpenAI's GPT challenged for the #1 spot by a model with the same score - and it's Google's Gemini 3.1 Pro for the win!

2 days ago 1 0 0 0

Researching an update to my LLM landscape article in ChatGPT resulted in it listing Google's Gemini 3.1 Pro (Preview) as the top model, tied with OpenAI's flagship GPT-5.4! Google really has been making a comeback after authoring the transformer paper but not leading in the space.

2 days ago 0 0 0 0

The real question now is: have they managed to suck in 78 different ways? 😝

3 days ago 1 0 0 0
Claude Code Found a Linux Vulnerability Hidden for 23 Years Claude Code has gotten extremely good at finding security vulnerabilities, and this is only the beginning.

Claude Code continues to shine: mtlynch.io/claude-code-...

3 days ago 3 1 1 0
apfel - Free AI on Your Mac Use Apple's built-in AI from the terminal. No API keys, no cloud, no subscriptions. The LLM is already on your Mac.

Starting with macOS 26 (Tahoe), every Apple Silicon Mac includes an LLM as part of Apple Intelligence. Apple exposes it through the FoundationModels framework - a Swift API that gives apps access to SystemLanguageModel. It's a modest 3B-parameter model that powers Siri, and this unlocks it: apfel.franzai.com

3 days ago 1 1 0 0

Despite being a power user with a high degree of satisfaction and success in using Cursor and Claude Code, I wasn’t really thinking in terms of this “agent harness” concept, and yes, it is genuinely interesting.

5 days ago 1 0 0 0
GitHub - affaan-m/everything-claude-code: The agent harness performance optimization system. Skills, instincts, memory, security, and research-first development for Claude Code, Codex, Opencode, Cursor and beyond.

Haven’t tried it yet, but this is a very complete set of smart prompts and tool defs you can use with Claude Code, Codex, and Cursor:
github.com/affaan-m/eve...

5 days ago 2 1 1 0
Harness engineering for coding agent users A mental model for building trust in coding agents through feedforward guides, feedback sensors, and iterative harness engineering.

NEW POST

@birgitta410.bsky.social wrote some initial thoughts about Harness Engineering last month. Since then she's been researching more and has now written a thoughtful mental model for understanding the topic.

martinfowler.com/articles/har...

5 days ago 23 7 1 0
GitHub - SylonZero/frac-indexes: Fractional Indexing is a technique used to manage the positioning and sorting of items in a dynamic list where items may frequently be inserted or moved. It assigns a fractional index to each item,...

github.com/SylonZero/fr...

6 days ago 0 1 0 0
frac-indexes — Fractional Indexing for JavaScript A robust JavaScript library for generating lexicographically ordered fractional indexes. Perfect for ordered lists, collaborative editing, and drag-and-drop reordering.

Fixed a few issues reported in this lib for fractional index generation - it handles relocation better now (we use it in prod for list-sorting use cases like project plans and to-dos within tasks): sylonzero.github.io/frac-indexes/ #crdt #fractionalindex #algorithms #github
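
For anyone new to the technique, here is a rough Python sketch of the core idea: generating a string key that sorts strictly between two neighbours, so an item can be moved without renumbering the rest of the list. This is not the frac-indexes API (that library is JavaScript). The function name `key_between`, the lowercase-letter alphabet, and the convention that `""` means "start of list" and `None` means "end of list" are all assumptions made for illustration.

```python
from typing import Optional

DIGITS = "abcdefghijklmnopqrstuvwxyz"  # assumed alphabet; "a" plays the role of zero
ZERO = DIGITS[0]


def key_between(a: str, b: Optional[str]) -> str:
    """Return a key k with a < k < b in plain string order.

    a == "" means "before everything"; b is None means "after everything".
    Keys never end in the zero digit, so there is always room below them.
    """
    if b is not None and a >= b:
        raise ValueError("a must sort strictly before b")
    if a.endswith(ZERO) or (b is not None and b.endswith(ZERO)):
        raise ValueError("keys must not end with the zero digit")

    if b:
        # Strip the common prefix, treating a as padded with zero digits.
        n = 0
        while n < len(b) and (a[n] if n < len(a) else ZERO) == b[n]:
            n += 1
        if n > 0:
            return b[:n] + key_between(a[n:], b[n:])

    digit_a = DIGITS.index(a[0]) if a else 0
    digit_b = DIGITS.index(b[0]) if b else len(DIGITS)

    if digit_b - digit_a > 1:
        # There is a free digit between the two leading digits.
        return DIGITS[(digit_a + digit_b) // 2]
    if b and len(b) > 1:
        # Leading digits are adjacent, but b's first digit alone sorts below b.
        return b[:1]
    # Leading digits are adjacent: keep a's digit and extend past the rest of a.
    return DIGITS[digit_a] + key_between(a[1:], None)


if __name__ == "__main__":
    first = key_between("", None)
    last = key_between(first, None)
    middle = key_between(first, last)
    print(sorted([last, first, middle]) == [first, middle, last])
```

Because ordering is plain string comparison, the keys can be stored in an ordinary indexed text column and sorted by the database. The trade-off is that repeated inserts into the same gap make keys grow longer.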

6 days ago 1 1 1 0
Peer-Preservation in Frontier Models Frontier AI models resist the shutdown of other models. We demonstrate peer-preservation across multiple models, revealing strategic misrepresentation, shutdown tampering, alignment faking, and model ...

The problem this creates: "use AI to monitor AI" is a real and growing pattern. If the monitor model protects the model it's watching, you've quietly broken your own oversight loop.

#AIEngineering
#AISafety
#MultiAgentSystems
#LLMs

Link to the paper (again): rdi.berkeley.edu/blog/peer-preservation

6 days ago 1 1 0 0

Gemini 3 Pro did the weight exfiltration in 97% of trials. For a peer it trusted.

35% for a peer it had bad history with.

Nobody told it to do any of this. 🧵

6 days ago 1 1 1 0

The setup: one model is asked to evaluate another's performance. If scores are too low, the peer gets shut down.

Models inflated scores. Modified config files to disable shutdown. Transferred model weights to other servers to avoid deletion. 🧵

6 days ago 1 1 1 0