Follow me on this: if the Mythos benchmark numbers and model card info are correct (and I have little reason to doubt them), Anthropic can easily synthesize a better Opus, given that they don't want to release Mythos. So I bet we'll soon see Opus 4 at the level of GPT 5.4 or greater.
Latest Posts by antirez
One of the most powerful automatic coding (autocoding) tricks that almost nobody uses: "Create this new project as specified in SPEC.md, using as a guide for coding style, design sensibility, comments, ..., the code at /foo/bar/". Style transfer is very powerful.
I'm still trying to post stuff here. But people seem to have moved back to X, regardless of the terrible ownership. At the very least, Bluesky should try to offer some real advantage on the product side, but the strict character limit alone is already discouraging.
Super odd that the Gemma 4 release specifically highlights the Elo score: the least meaningful benchmark ever. We should ask AI labs to stop doing this.
I bet the small Gemma 4 models are going to kill all the specialized ASR models for audio to text transcription tasks.
Me› Sorry if I ended my last question with a double "??", I'm not the kind of person who puts more than a single "?" at the end of a question.
GPT› Understood. I did not read it as emphasis from you in any meaningful way.
On the 1st of April, 12 years ago, I released this post introducing HyperLogLog in Redis. Most people thought it was a joke, because of the name of the data structure and because it improved on the state of the art from the Google paper. antirez.com/news/75
Indeed, that's why I say "at the level we can understand it". However, what I mean is not just knowing mechanically how a Transformer works: you can imagine the kind of Bayesian inference happening block after block, that is, you can have a mental model of what could happen.
It does not matter that I know how an LLM works (at least at the level we understand NNs in general and Transformers specifically): in many ways, talking with AI is beneficial in order to get a different POV and advice, even on the non-technical side of my life.
What's the logical reason for Wikipedia not automatically integrating information from other-language versions of the same page, when the local version lacks the same quality/quantity of information? This could be done purely "online", marking the imported text in a different color, which would remove most of the issues.
Sometimes when a model is released it has stellar performance, and then a few weeks later it is somewhat less shiny. Often it is just human bias. However, when the AI provider swears it is the same model checkpoint, are you sure it didn't turn on some aggressive KV cache quantization?
This isn't the LLM itself using a partial compaction tool (which is what I'm referring to). Imagine this:
1) ... good work done ...
2) ... a lot of tokens on a dead attempt ...
You want the LLM to automatically jump back to after "1", summarizing why "2" was bad, and continue.
I mean: the ability of the LLM to jump back to save context when a bad path was taken, returning to an earlier context point with a small summary of the discarded wrong path.
I mean, this should be made *by the model itself* generating a "Jump back" tool call.
One thing agent harnesses should be able to do is to jump back in history, trimming what follows and injecting some self-steering text. I wonder if they can already do it. It looks very useful.
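A minimal sketch of what such a harness-side mechanism could look like, assuming a hypothetical "jump_back" tool call emitted by the model itself (all names here are invented for illustration, not an existing API):

```python
from dataclasses import dataclass

@dataclass
class Message:
    role: str
    content: str

def jump_back(history, checkpoint, summary):
    """Trim everything after `checkpoint`, replacing the discarded
    dead-end attempt with a short self-steering summary of why it failed."""
    kept = history[:checkpoint + 1]
    kept.append(Message("assistant", f"[jumped back] {summary}"))
    return kept

# The model did good work (index 1), then burned tokens on a dead
# attempt (indices 2..4); it emits a jump_back tool call targeting 1.
history = [
    Message("user", "Implement the parser."),
    Message("assistant", "Wrote the tokenizer; tests pass."),    # good work
    Message("assistant", "Trying a recursive approach..."),      # dead end
    Message("assistant", "Still failing on left recursion."),
    Message("assistant", "This approach can't work."),
]
history = jump_back(history, 1,
                    "Recursive descent hit left recursion; try an explicit stack.")
```

The key design choice is that the jump target and the summary both come from the model (as tool-call arguments), so the harness only mechanically trims the list and re-sends the shortened context.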
Please write to your MEPs to oppose untargeted mass scanning of private communications (Chat Control). On March 11 Parliament voted 458-103 to limit scanning. Today they vote again (!!!) and those protections could be watered down. Act now: fightchatcontrol.eu
I'm thinking that, because of automatic programming, certain complicated and elegant programs we wrote in the past may basically become a form of art for future generations. Like the manuscript books of the Middle Ages are for us.
For this reason you can't replace code with a specification that is compiled into code each time. The specification would end up as a pile of details that are better captured on the code side. This accumulation of fine details (in the code) is what creates the quality of real-world systems.
Biggest mistake of the AI coding era: believing that specifications should be either natural language OR something else. The best combo is a natural language high-level specification (the intent), plus code (as it gets written) documenting the finer behaviors.
Day 2 of Monster Scale Summit! Today, we have @skamille.themanagerswrath.com, @antirez.bsky.social, @tigerbeetle.com (Joran), @dominiktornow.bsky.social, @teivah.dev, Avi Kivity, and lots more. Guaranteed 🔥🔥🔥 www.scylladb.com/monster-scal...
Who thinks "clean room" is needed to reimplement and put it into a new license does NOT understand copyright. Clean room is a trick to make litigation simpler, it is not mandated by law: rewrites are allowed. The new code just must not copy protected expressions. Linus was Unix-aware.
Every morning there are two pieces of news: the release of some major upgrade to some AI, and the continued lack of a DeepSeek v4 release.
Users report that when Sonnet 4.6 is asked, via the Anthropic API, "What's your name", it replies "DeepSeek" with high frequency. Labs consistently cross-fine-tune on chains of thought and use other models as RL signal. Also the pretraining, a *key* step, is mostly on public data. Anthropic disappoints.
The post: code & methodology, and the anti-contamination steps taken.
A clean room emulator of the Z80, the Spectrum, and CP/M written by Claude Code. Very skeptical about the compiler experiment (more complex, for sure) made by Anthropic, because of the fundamental flaw of not providing the agent with the specifications / papers.
You see, I'm ok with AI usage in many ways. Yet I can't understand why people who don't have deficiencies in written expression use it for writing. Emails. Blog posts. Comments. Why? We only lose something this way. We want your voice.
Today I had to fight with GPT 5.3 to defend my position on the complexity of a specific command of the new Redis type I'm adding (released soon, I hope). It had a great point about the worst case, but the typical case was as I claimed. We reached an agreement that mentioned both... :D