A recent Stanford and UC Berkeley study tracked how major LLMs like ChatGPT evolve over time. They found these large language models can shift their skills significantly in short periods.
While ChatGPT improved at some tasks, it declined at others without much warning. This unpredictable drift means users can’t assume it will keep working the same.
Since AI systems have interconnected skills, boosting one area can inadvertently degrade another. There’s no guarantee of stable performance.
The researchers recommend continuous monitoring of any LLM you rely on. Regularly test for changes that could impact your use case.
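One minimal way to put this advice into practice is to keep a fixed evaluation set and periodically re-score the model against a recorded baseline. The sketch below is illustrative only: `query_model`, `check_drift`, and the exact-match scoring are hypothetical choices, not the study's methodology, and a stub stands in for a real API call.

```python
# Minimal drift-monitoring sketch (hypothetical names throughout).
# Assumes `query_model` is any callable that maps a prompt to an answer,
# e.g. a wrapper around an LLM API.

def accuracy(query_model, eval_set):
    """Fraction of eval prompts answered exactly as expected."""
    correct = sum(1 for prompt, expected in eval_set
                  if query_model(prompt).strip() == expected)
    return correct / len(eval_set)

def check_drift(query_model, eval_set, baseline_accuracy, tolerance=0.05):
    """Flag drift when accuracy moves more than `tolerance` from baseline."""
    current = accuracy(query_model, eval_set)
    drifted = abs(current - baseline_accuracy) > tolerance
    return current, drifted

# Example with a stubbed model standing in for a real API call:
eval_set = [("2+2=", "4"), ("Capital of France?", "Paris")]
stub_model = lambda p: {"2+2=": "4", "Capital of France?": "Paris"}[p]
current, drifted = check_drift(stub_model, eval_set, baseline_accuracy=1.0)
```

Running such a check on a schedule, and alerting when `drifted` is true, turns the paper's recommendation into a routine regression test for your own use case.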
They plan ongoing tracking of LLMs like GPT-3.5 and GPT-4. The team also released their evaluation data publicly to spur more research into LLM drift.
The takeaway is to be vigilant. If you use ChatGPT or a similar LLM, proactively assess its output for shifts. LLM abilities are fluid – consistent testing is key.