An introduction to evaluating LLMs
As more and more LLMs have been released over the last 6 months, comparing model quality has become a favorite pastime. We each have personal experiences with different models, and many folks use different models for different tasks: Bard for analysis and synthesis, Claude for code generation, GPT for general knowledge, and so on. Many of us have also probably looked at rankings like the