A. MT-Bench – multi-turn conversation quality.
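MT-Bench scores models on multi-turn dialogue. It consists of 80 two-turn questions spread across eight categories (writing, roleplay, extraction, reasoning, math, coding, STEM, and humanities), and a strong LLM judge rates each response on a 1 to 10 scale. IDO-1 is evaluated against standard baselines to show how natural, helpful, and consistent its responses remain from one turn to the next.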
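
As a rough illustration of how MT-Bench-style results are usually summarized, the sketch below averages per-turn judge ratings into a per-model score. It assumes each response has already been rated on a 1 to 10 scale by a judge model. The record layout, the function name `mt_bench_summary`, the model names, and all scores are hypothetical placeholders, not IDO-1 results and not the API of any evaluation tool.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical judge ratings: (model, question_id, turn, score on a 1-10 scale).
# All values are made up for illustration; they are not measured results.
ratings = [
    ("model_a", 1, 1, 8.0),
    ("model_a", 1, 2, 6.0),
    ("model_a", 2, 1, 9.0),
    ("model_a", 2, 2, 7.0),
    ("model_b", 1, 1, 7.0),
    ("model_b", 1, 2, 7.0),
    ("model_b", 2, 1, 6.0),
    ("model_b", 2, 2, 4.0),
]


def mt_bench_summary(ratings):
    """Return {model: {"turn_1": avg, "turn_2": avg, "overall": avg}}."""
    # Group scores by model, then by turn number.
    by_turn = defaultdict(lambda: defaultdict(list))
    for model, _question_id, turn, score in ratings:
        by_turn[model][turn].append(score)

    summary = {}
    for model, turns in by_turn.items():
        # Per-turn averages show whether quality drops on the follow-up turn.
        row = {f"turn_{t}": mean(scores) for t, scores in sorted(turns.items())}
        # The overall score is the mean over every rated response.
        row["overall"] = mean(s for scores in turns.values() for s in scores)
        summary[model] = row
    return summary


summary = mt_bench_summary(ratings)
for model, row in summary.items():
    print(model, row)
```

Reporting the first-turn and second-turn averages separately, rather than only the overall mean, makes it easier to see whether a model loses consistency once the conversation continues.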
