Do LLMs Have Bad Days?

The Curious Case of AI Performance Fluctuations

We've all experienced those days when nothing seems to go right. You wake up groggy despite a full night's sleep, you spill coffee on your favorite shirt, and your brain feels like it's operating at half capacity. It's just a bad day – a universally human experience.

But what about the foundation-model services that increasingly power our digital lives? Do AI models like ChatGPT, Claude, or Gemini ever have the equivalent of a "bad day"? Do they go through ebbs and flows in output quality that mirror our own productivity, or are we projecting our own cycles onto what are really random fluctuations?

We’d like your help. We’ve created a survey to gather your perspectives. It takes about four minutes to complete, and your answers will help us design a test of performance fluctuations in major foundation models, which we will make publicly available. The survey is anonymous, but if you share your email address, we’ll send you the detailed survey results.

Take the Survey: Foundation Models and Quality: Do Foundation Models Have Bad Days?

AI Language Models and Business Processes: The Building Blocks