An introduction to evaluating LLMs

Jan 25

As more and more LLMs have been released over the last 6 months, comparing model quality has become a favorite pastime. We each have personal experiences with different models, and many folks use different models for different tasks: Bard for analysis and synthesis, Claude for code generation, GPT for general knowledge, and so on. Many of us have also probably looked at rankings like the

Generating Conversation
