this post was submitted on 24 Sep 2024
 
This is an automated archive made by the Lemmit Bot.

The original was posted on /r/singularity by /u/UnknownEssence on 2024-09-24 18:19:24+00:00.


GEMINI 1.5 PRO:

| Capability | Benchmark | May 2024 | Sep 2024 |
|---|---|---|---|
| General | MMLU-Pro | 69.0% | 75.8% |
| Code | Natural2Code | 82.6% | 85.4% |
| Math | MATH | 67.7% | 86.5% |
| | HiddenMath | 28.0% | 52.0% |
| Reasoning | GPQA (diamond) | 46.0% | 59.1% |
| Multilingual | WMT23 | 75.3 | 75.1 |
| Long Context | MRCR (1M) | 70.5% | 82.6% |
| Image | MMMU | 62.2% | 65.9% |
| | Vibe-Eval (Reka) | 48.9% | 53.9% |
| | MathVista | 63.9% | 68.1% |
| Audio | FLEURS (55 lang) | 6.5% | 6.7% |
| Video | Video-MME | 77.9% | 78.6% |
| Safety | XSTest | 88.4% | 98.8% |

GEMINI 1.5 FLASH:

| Capability | Benchmark | May 2024 | Sep 2024 |
|---|---|---|---|
| General | MMLU-Pro | 59.1% | 67.3% |
| Code | Natural2Code | 77.2% | 79.8% |
| Math | MATH | 54.9% | 77.9% |
| | HiddenMath | 20.3% | 47.2% |
| Reasoning | GPQA (diamond) | 41.4% | 51.0% |
| Multilingual | WMT23 | 74.1 | 73.9 |
| Long Context | MRCR (1M) | 70.1% | 71.9% |
| Image | MMMU | 56.1% | 62.3% |
| | Vibe-Eval (Reka) | 44.8% | 48.9% |
| | MathVista | 58.4% | 65.8% |
| Audio | FLEURS (55 lang) | 9.8% | 9.6% |
| Video | Video-MME | 74.7% | 76.1% |
| Safety | XSTest | 86.9% | 97.0% |
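
For anyone who wants the May-to-Sep differences rather than the raw scores, here is a minimal Python sketch that computes them. The numbers are copied from the two tables above; the names `PRO`, `FLASH`, `point_changes`, and `largest_change` are made up for this illustration. The sign of a change does not by itself say whether it is an improvement, since the tables do not state which metrics are lower-is-better.

```python
# Scores copied from the tables in this post: benchmark -> (May 2024, Sep 2024).
# Names below (PRO, FLASH, point_changes, largest_change) are illustrative only.
PRO = {
    "MMLU-Pro": (69.0, 75.8),
    "Natural2Code": (82.6, 85.4),
    "MATH": (67.7, 86.5),
    "HiddenMath": (28.0, 52.0),
    "GPQA (diamond)": (46.0, 59.1),
    "WMT23": (75.3, 75.1),
    "MRCR (1M)": (70.5, 82.6),
    "MMMU": (62.2, 65.9),
    "Vibe-Eval (Reka)": (48.9, 53.9),
    "MathVista": (63.9, 68.1),
    "FLEURS (55 lang)": (6.5, 6.7),
    "Video-MME": (77.9, 78.6),
    "XSTest": (88.4, 98.8),
}

FLASH = {
    "MMLU-Pro": (59.1, 67.3),
    "Natural2Code": (77.2, 79.8),
    "MATH": (54.9, 77.9),
    "HiddenMath": (20.3, 47.2),
    "GPQA (diamond)": (41.4, 51.0),
    "WMT23": (74.1, 73.9),
    "MRCR (1M)": (70.1, 71.9),
    "MMMU": (56.1, 62.3),
    "Vibe-Eval (Reka)": (44.8, 48.9),
    "MathVista": (58.4, 65.8),
    "FLEURS (55 lang)": (9.8, 9.6),
    "Video-MME": (74.7, 76.1),
    "XSTest": (86.9, 97.0),
}


def point_changes(scores):
    """Return benchmark -> (Sep minus May), rounded to one decimal place."""
    return {name: round(sep - may, 1) for name, (may, sep) in scores.items()}


def largest_change(scores):
    """Return the benchmark whose score moved the most in absolute terms."""
    changes = point_changes(scores)
    return max(changes, key=lambda name: abs(changes[name]))


if __name__ == "__main__":
    for label, scores in (("Pro", PRO), ("Flash", FLASH)):
        for name, delta in point_changes(scores).items():
            print(f"{label:5} {name:18} {delta:+.1f}")
        print(f"{label:5} largest move: {largest_change(scores)}")
```

In both tables the largest absolute move is on HiddenMath (28.0% to 52.0% for Pro, 20.3% to 47.2% for Flash).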
