Google's #TurboQuant is a compression mechanism for Key-Value Cache — the 'memory' of an #LLM.
It is derived from #PolarQuant which converts KV vectors from Cartesian to polar co-ordinates.
Early tests (admittedly by Google) show 6x reduction in memory usage.
This seems quite important for #AI.