Large language models (LLMs) can produce inconsistent or unstable outputs even when given the same input. This is because generative AI model outputs aren’t purely deterministic: model architecture, prompt formatting, and system-level variability (e.g., tokenization quirks, non-deterministic floating-point arithmetic on GPUs, or load-balancing across different hardware) can all affect the result.

Surprisingly, this variability can occur even when the “temperature” setting, which controls sampling randomness, is set to zero. That said, if you have not yet tuned your temperature setting, it is a good first lever to try.
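To see why temperature zero reduces, but does not eliminate, variability, it helps to look at how decoding works. The sketch below is a simplified Python model of a sampler, not any provider’s actual decoder: at temperature zero it takes the highest-scoring token (greedy decoding, which is what most inference APIs do), while at higher temperatures it samples from a softmax distribution. The final two lines also show how floating-point summation order can silently change which of two “mathematically tied” scores wins, one reason greedy decoding is not perfectly stable in practice.

```python
import math
import random

def sample_token(logits, temperature, rng):
    """Pick a token index from raw logits.

    temperature == 0 is treated as greedy decoding (argmax);
    otherwise we sample from a temperature-scaled softmax.
    """
    if temperature == 0:
        return max(range(len(logits)), key=lambda i: logits[i])
    scaled = [l / temperature for l in logits]
    m = max(scaled)  # subtract the max for numerical stability
    weights = [math.exp(s - m) for s in scaled]
    return rng.choices(range(len(logits)), weights=weights, k=1)[0]

logits = [2.0, 1.9, 0.5]

# Greedy decoding: every run picks the same token.
greedy = {sample_token(logits, 0, random.Random(i)) for i in range(100)}
print(greedy)  # {0}

# temperature > 0: different runs can pick different tokens.
sampled = {sample_token(logits, 1.0, random.Random(i)) for i in range(100)}
print(sampled)

# Even at temperature 0, floating-point noise can break "ties":
# these two scores are mathematically equal, but not in float arithmetic.
print(0.1 + 0.2 == 0.3)  # False
```

In a real serving stack, batching and hardware differences can perturb logits in exactly this floating-point way, which is how two “identical” temperature-zero requests can still diverge.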

To improve stability, several strategies have proven effective, including pinning a specific model version, fixing sampling parameters (temperature, top_p, and a random seed where the API supports one), and aggregating multiple samples rather than trusting a single response.

Additional Resources:

This response has been generated by an LLM based on notes from PJMF technical consultations. All responses go through human review by our PJMF Products & Services team and are anonymized to protect our consultation participants.