Predictable Inference output

Recently, Thinking Machines published a blog post on Defeating Nondeterminism in LLM Inference. First, let me say from a boots-on-the-ground perspective: I understand this need. Inference engines are not 'standardized' like SQL. They take different inputs and have optional features. The major model families (Qwen, Llama, etc.) have different tokenizers. On different hardware, an optimization might produce a different result. The inference engine may have a cache or be serving simultaneous requests, which can also shape the result of a prompt. I have seen a few reactions to the Thinking Machines article, and many of them reference "temperature". I would not attribute temperature to be the largest factor in non-determinism. Temperature behind the scenes causes more "iteration", and if that "iteration" is random, a higher temperature makes a wider range of results possible. I wanted to discuss some other causes of non-determinism. Model Strength...
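As a rough illustration of the temperature point above (my own sketch, not code from the Thinking Machines post, with made-up logits), temperature simply rescales the logits before the softmax, so a higher temperature flattens the distribution and lets random sampling reach more tokens:

```python
import numpy as np

def sample_token(logits, temperature=1.0, rng=None):
    """Sample a token index after temperature-scaling the logits."""
    rng = rng or np.random.default_rng()
    scaled = np.asarray(logits, dtype=np.float64) / temperature
    scaled -= scaled.max()                      # numerical stability
    probs = np.exp(scaled) / np.exp(scaled).sum()
    return rng.choice(len(probs), p=probs), probs

logits = [4.0, 2.5, 1.0, 0.1]                   # hypothetical next-token logits

for t in (0.2, 1.0, 2.0):
    _, probs = sample_token(logits, temperature=t)
    print(f"temperature={t}: {np.round(probs, 3)}")
# Low temperature concentrates probability on the top token (near-greedy);
# high temperature spreads probability out, so repeated runs diverge more often.
```

Even so, this only widens the range of possible outputs; as argued above, it is not the largest source of non-determinism.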