Get the best result from your prompt

Nobody knows which AI model will give the best result for your prompt until you compare them

Current LLM performance rankings, such as those found on lmsys.org or livebench.ai, estimate the probability of one model outperforming another one in specific categories. Notably, the top-ranked LLM only has a slightly better than 50% chance of outperforming the second-ranked one.
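To see why a small rating gap amounts to a near coin flip, here is a minimal sketch. It assumes the leaderboard uses an Elo-style scale (base 10, 400-point scaling), which is not true of every ranking site. The ratings and the function name `win_probability` are hypothetical, chosen only for illustration.

```python
def win_probability(rating_a: float, rating_b: float) -> float:
    """Expected probability that model A beats model B under the Elo formula."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400.0))


# Hypothetical ratings for the top two models (illustrative, not real leaderboard values).
top, second = 1300.0, 1290.0
print(f"{win_probability(top, second):.3f}")  # prints 0.514
```

Under these assumptions, a 10-point lead gives the top model only about a 51% chance of winning a head-to-head comparison.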

Even with sophisticated algorithms that consider category-specific rankings (e.g. reasoning, coding, writing, languages, …), the difference in performance is not significant.

Therefore, the best approach is to reprompt and compare the models:

  • Reprompting with a second LLM lets you verify an AI assertion.
  • Having multiple AI proposals helps you select the best outcome.
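The two uses of reprompting above can be sketched as follows. This is only an illustration: a "model" is represented as a plain function from prompt to answer, the stub models are hypothetical stand-ins rather than any real provider API, and the agreement check is deliberately crude.

```python
from typing import Callable, Dict

# A "model" here is any function that maps a prompt to an answer (hypothetical stand-in).
Model = Callable[[str], str]


def reprompt(prompt: str, models: Dict[str, Model]) -> Dict[str, str]:
    """Send the same prompt to every model and collect the answers by model name."""
    return {name: ask(prompt) for name, ask in models.items()}


def all_agree(answers: Dict[str, str]) -> bool:
    """Crude verification: True when all answers match after trimming and lowercasing."""
    return len({a.strip().lower() for a in answers.values()}) == 1


# Stub models for illustration only.
models = {
    "model_a": lambda p: "Paris",
    "model_b": lambda p: "paris ",
}
answers = reprompt("What is the capital of France?", models)
print(answers, all_agree(answers))
```

When the answers disagree, or when the task is creative, the collected answers are meant to be read side by side so the user can pick the best one.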

Empirical data from Mammouth

Reprompting Data from Mammouth AI

| Number of text models solicited per prompt | % of total prompts |
|---|---|
| >= 4 | 7% |
| >= 3 | 12% |
| >= 2 | 34% |
| = 1 | 66% |

| Number of image models solicited per prompt | % of total prompts |
|---|---|
| >= 3 | 19% |
| >= 2 | 41% |
| = 1 | 59% |

For 66% of daily text queries, users solicit one model

  • 66% of users' text queries do not need reprompting (based on Mammouth.ai data).

For 34% of daily text queries, users solicit more than one model

  • Consequently, 34% of all text queries benefit from reprompting. These are the high-value queries, which tend to be more creative and more complex.
    • 22% of total prompts are reprompted with exactly one additional model (the 34% that use two or more models minus the 12% that use three or more).
    • 12% of total prompts are reprompted with two or more additional models (three or more models in total).
    • 7% of total prompts are reprompted with three or more additional models (four or more models in total).
  • For images, 41% of queries are reprompted with at least one additional model.
  • 19% of image queries are reprompted with two or more additional models (three or more models in total).
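The table reports cumulative shares ("k or more models"). A short sketch of the arithmetic that turns them into "exactly k models" shares is shown here. The percentages are those from the Mammouth table; the 100% entry for "at least one model" is an assumption that holds by definition, and the function name `exact_shares` is made up for this example.

```python
from typing import Dict


def exact_shares(at_least: Dict[int, int]) -> Dict[int, int]:
    """Convert 'k or more models' shares into 'exactly k models' shares.

    The highest key remains an open-ended bucket (k or more).
    """
    ks = sorted(at_least)
    shares = {}
    for k, nxt in zip(ks, ks[1:]):
        # Prompts using exactly k models = (k or more) minus (k+1 or more).
        shares[k] = at_least[k] - at_least[nxt]
    shares[ks[-1]] = at_least[ks[-1]]
    return shares


# Percent of total prompts, taken from the table; ">= 1" is 100% by definition.
text_at_least = {1: 100, 2: 34, 3: 12, 4: 7}
image_at_least = {1: 100, 2: 41, 3: 19}

print(exact_shares(text_at_least))   # {1: 66, 2: 22, 3: 5, 4: 7}
print(exact_shares(image_at_least))  # {1: 59, 2: 22, 3: 19}
```

The single-model shares recovered this way (66% for text, 59% for images) match the "= 1" rows of the table.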

As AI models improve, the definition of the best result becomes more subjective and less objective

There are two reasons to favor one model's result over another's:

  • The objective reason: the user favors the model that follows the rules of the prompt and provides the correct answer.
  • The subjective reason: when both models give an objectively correct answer, the user may still prefer one of them for subjective reasons.

→ As AI performance improves, the differentiation progressively shifts from objective to subjective, which makes reprompting even more relevant. Hence this LLM popularity index.