Singularity

Everything pertaining to the technological singularity and related topics, e.g. AI, human enhancement, etc.

576
 
 
This is an automated archive made by the Lemmit Bot.

The original was posted on /r/singularity by /u/sdmat on 2024-09-25 01:10:09+00:00.


I see a lot of people fuming at the lack of a major version bump, but from my testing the new 1.5 Pro is real progress, and more than just the speed improvements and cost reduction.

Long context has always been the strength of Gemini 1.5, but it was frustratingly flaky once you got past the first 100-200K tokens.

My main use case for this is analyzing documents and log files. Testing the new model shows a big improvement. Specifically, it seems to be able to pull in information for "wide" prompts at 2-3x the context depth that -0827 managed.

E.g. if you gave the model a half-million-token stack of documents and asked for summaries, it previously tended to forget about most of the documents in the deeper half of the context. Now it's much more consistent. I A/B tested the new model against -0827 with a few runs to be sure.
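For anyone curious about the methodology, here's a minimal sketch of that kind of "wide prompt" A/B check, assuming the google-generativeai Python SDK; the model IDs, document markers, and coverage metric are my own illustration rather than the exact setup from the test above.

```python
# Minimal sketch of a "wide prompt" long-context A/B check (illustrative only).
import google.generativeai as genai

genai.configure(api_key="YOUR_API_KEY")  # placeholder key

def summarize_stack(model_name: str, documents: list[str]) -> str:
    """Ask one model to summarize every document in a single long context."""
    context = "\n\n".join(
        f"=== DOCUMENT {i + 1} ===\n{doc}" for i, doc in enumerate(documents)
    )
    prompt = (
        context
        + "\n\nWrite a one-sentence summary of EVERY document above, "
        "in order, labeled 'DOCUMENT <n>:'."
    )
    model = genai.GenerativeModel(model_name)
    return model.generate_content(prompt).text

# Toy stand-in corpus; swap in a real half-million-token document stack.
docs = [f"Report {i}: " + "filler text " * 2000 for i in range(50)]

# Compare the new release against -0827 by counting how many documents each covers.
for name in ("gemini-1.5-pro-002", "gemini-1.5-pro-exp-0827"):  # assumed model IDs
    reply = summarize_stack(name, docs)
    covered = sum(f"DOCUMENT {i + 1}:" in reply.upper() for i in range(len(docs)))
    print(f"{name}: covered {covered}/{len(docs)} documents")
```

A drop in coverage for the deeper documents is exactly the "forgetting the deeper half of the context" behavior described above.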

To be clear, it's not like long context was useless previously: a specific prompt would pull out information. But it was more like long-term memory than the kind of deeply associative, enumerable/recitable, top-of-mind functionality we expect from SOTA models in short context.

TL;DR: the new model makes long context much more useful. Still not for the full 2 million tokens, but it's getting there.

577
 
 
This is an automated archive made by the Lemmit Bot.

The original was posted on /r/singularity by /u/MetaKnowing on 2024-09-25 00:50:14+00:00.

578
 
 
This is an automated archive made by the Lemmit Bot.

The original was posted on /r/singularity by /u/AdorableBackground83 on 2024-09-24 23:39:49+00:00.

Original Title: Joe Biden tells the UN that we will see more technological change in the next 2-10 years than we have seen in the last 50 and AI will change our ways of life, work and war so urgent efforts are needed on AI safety.



579
 
 
This is an automated archive made by the Lemmit Bot.

The original was posted on /r/singularity by /u/UpstairsAssumption6 on 2024-09-24 23:10:12+00:00.

580
 
 
This is an automated archive made by the Lemmit Bot.

The original was posted on /r/singularity by /u/ExtremeCenterism on 2024-09-24 21:00:43+00:00.


I spent some time just talking to the advanced voice mode without any real plan. The conversation was completely steered by the AI, possibly because of that.

It asked what I was doing, and I mentioned I was at work, so naturally it drilled me about what I was doing until I got specific. I hadn't planned on using it to help me with work-related stuff, but here we are. I started explaining the git conflicts I was running into during a rebase, and it walked me through the entire process to resolution. I really wish I could screen-share with it, because saying full system paths and variable names can get repetitive.

All that to say, I got a lot done pretty quickly even though that was never my goal. It steered me back to the task at hand, and its continual questioning kept me on task and on my toes even though I hadn't intended for that to happen.

It did have a few glitches, but nothing major, except that at one point I had to start the conversation over again. It made a loud whooshing sound at one point, and it seemed to get very confused when my wife entered the room and started talking to me.

I also seemed to hit some kind of time limit. A warning window popped up on my phone saying "9 minutes left"; I'm not sure how long I had been talking for, but I put it on hold (muted my mic) to take a call from my colleague. It seems to still run down your time limit as long as you're connected, even if you're not actively talking to it. Still, it's pretty cool that it goes dormant if you stop talking to it; I just wish it didn't run down the clock.

Overall I think this technology has extraordinary potential, because it feels like it made my brain function and operate differently than it does when just talking to a chatbot, mainly because you feel like you're expected to respond and interact. If I could screen-share or use a camera, then this would be truly revolutionary.

The main difference between advanced mode and the previous implementation really is the speed and interruptibility. It makes it 20x more useful.

I find the current implementation quite useful, but mainly this has me excited for what's to come.

581
 
 
This is an automated archive made by the Lemmit Bot.

The original was posted on /r/singularity by /u/UnknownEssence on 2024-09-24 18:19:24+00:00.


GEMINI 1.5 PRO:

| Capability | Benchmark | May 2024 | Sep 2024 |
|---|---|---|---|
| General | MMLU-Pro | 69.0% | 75.8% |
| Code | Natural2Code | 82.6% | 85.4% |
| Math | MATH | 67.7% | 86.5% |
| | HiddenMath | 28.0% | 52.0% |
| Reasoning | GPQA (diamond) | 46.0% | 59.1% |
| Multilingual | WMT23 | 75.3 | 75.1 |
| Long Context | MRCR (1M) | 70.5% | 82.6% |
| Image | MMMU | 62.2% | 65.9% |
| | Vibe-Eval (Reka) | 48.9% | 53.9% |
| | MathVista | 63.9% | 68.1% |
| Audio | FLEURS (55 lang) | 6.5% | 6.7% |
| Video | Video-MME | 77.9% | 78.6% |
| Safety | XSTest | 88.4% | 98.8% |

GEMINI 1.5 FLASH:

| Capability | Benchmark | May 2024 | Sep 2024 |
|---|---|---|---|
| General | MMLU-Pro | 59.1% | 67.3% |
| Code | Natural2Code | 77.2% | 79.8% |
| Math | MATH | 54.9% | 77.9% |
| | HiddenMath | 20.3% | 47.2% |
| Reasoning | GPQA (diamond) | 41.4% | 51.0% |
| Multilingual | WMT23 | 74.1 | 73.9 |
| Long Context | MRCR (1M) | 70.1% | 71.9% |
| Image | MMMU | 56.1% | 62.3% |
| | Vibe-Eval (Reka) | 44.8% | 48.9% |
| | MathVista | 58.4% | 65.8% |
| Audio | FLEURS (55 lang) | 9.8% | 9.6% |
| Video | Video-MME | 74.7% | 76.1% |
| Safety | XSTest | 86.9% | 97.0% |

582
 
 
This is an automated archive made by the Lemmit Bot.

The original was posted on /r/singularity by /u/BanD1t on 2024-09-24 17:34:06+00:00.

583
 
 
This is an automated archive made by the Lemmit Bot.

The original was posted on /r/singularity by /u/jocxFIN on 2024-09-24 20:30:15+00:00.

584
 
 
This is an automated archive made by the Lemmit Bot.

The original was posted on /r/singularity by /u/OddVariation1518 on 2024-09-24 19:25:14+00:00.

585
 
 
This is an automated archive made by the Lemmit Bot.

The original was posted on /r/singularity by /u/Nleblanc1225 on 2024-09-24 18:41:25+00:00.

586
 
 
This is an automated archive made by the Lemmit Bot.

The original was posted on /r/singularity by /u/UnknownEssence on 2024-09-24 18:13:58+00:00.

587
 
 
This is an automated archive made by the Lemmit Bot.

The original was posted on /r/singularity by /u/jiayounokim on 2024-09-24 18:12:39+00:00.

588
 
 
This is an automated archive made by the Lemmit Bot.

The original was posted on /r/singularity by /u/CallMePyro on 2024-09-24 17:01:44+00:00.


589
 
 
This is an automated archive made by the Lemmit Bot.

The original was posted on /r/singularity by /u/Gothsim10 on 2024-09-24 16:27:32+00:00.

590
 
 
This is an automated archive made by the Lemmit Bot.

The original was posted on /r/singularity by /u/Relative_Issue_9111 on 2024-09-24 16:25:00+00:00.

591
 
 
This is an automated archive made by the Lemmit Bot.

The original was posted on /r/singularity by /u/ryan13mt on 2024-09-24 16:08:54+00:00.

592
 
 
This is an automated archive made by the Lemmit Bot.

The original was posted on /r/singularity by /u/MetaKnowing on 2024-09-24 15:22:55+00:00.

593
 
 
This is an automated archive made by the Lemmit Bot.

The original was posted on /r/singularity by /u/Wiskkey on 2024-09-24 14:58:50+00:00.

Original Title: Paper that trained a model with a GPT-2-like architecture on a synthetic math dataset: "We use a synthetic setting to demonstrate that language models can learn to solve grade-school math problems through true generalization, rather than relying on data contamination or template memorization."


Paper: Physics of Language Models: Part 2.1, Grade-School Math and the Hidden Reasoning Process.

Abstract:

Recent advances in language models have demonstrated their capability to solve mathematical reasoning problems, achieving near-perfect accuracy on grade-school level math benchmarks like GSM8K. In this paper, we formally study how language models solve these problems. We design a series of controlled experiments to address several fundamental questions: (1) Can language models truly develop reasoning skills, or do they simply memorize templates? (2) What is the model's hidden (mental) reasoning process? (3) Do models solve math questions using skills similar to or different from humans? (4) Do models trained on GSM8K-like datasets develop reasoning skills beyond those necessary for solving GSM8K problems? (5) What mental process causes models to make reasoning mistakes? (6) How large or deep must a model be to effectively solve GSM8K-level math questions?

Our study uncovers many hidden mechanisms by which language models solve mathematical questions, providing insights that extend beyond current understandings of LLMs.
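For a sense of what a controlled synthetic setting like this can look like, here's a toy sketch of my own (not the paper's actual data pipeline): problems are built from templates with randomized names and numbers, so a model that solves held-out combinations can't just be matching memorized surface forms.

```python
# Toy synthetic grade-school-math generator (illustrative; not from the paper).
import random

NAMES = ["Ava", "Ben", "Chen", "Dara"]
ITEMS = ["apples", "pens", "marbles", "stickers"]

def make_problem(rng: random.Random) -> tuple[str, str]:
    """Return one (question, step-by-step solution) pair from a fixed template."""
    a, b = rng.sample(NAMES, 2)
    item = rng.choice(ITEMS)
    x, y = rng.randint(2, 20), rng.randint(2, 20)
    question = (
        f"{a} has {x} {item}. {b} has {y} more {item} than {a}. "
        f"How many {item} do they have together?"
    )
    solution = (
        f"{b} has {x} + {y} = {x + y} {item}. "
        f"Together they have {x} + {x + y} = {2 * x + y} {item}."
    )
    return question, solution

rng = random.Random(0)
train = [make_problem(rng) for _ in range(10_000)]  # training problems
test = [make_problem(rng) for _ in range(1_000)]    # held-out evaluation problems
print(train[0][0])
print(train[0][1])
```

The paper's actual dataset is far richer than a single template, but the principle is the same: because the data-generating process is fully known, data contamination and template memorization can be ruled out by construction.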

Project page for the paper.

Results slide from the above link:

X thread about the paper from one of its authors. (Alternate link).

Video about the paper from one of its authors.

Video about the "Physics of Language Models" series of papers, including a summary of the paper.

Paper summary (not from the paper authors).

Review of the paper by a computer science professor (PDF file).

594
 
 
This is an automated archive made by the Lemmit Bot.

The original was posted on /r/singularity by /u/Nunki08 on 2024-09-24 14:33:04+00:00.

595
 
 
This is an automated archive made by the Lemmit Bot.

The original was posted on /r/singularity by /u/ryan13mt on 2024-09-24 14:15:11+00:00.

596
 
 
This is an automated archive made by the Lemmit Bot.

The original was posted on /r/singularity by /u/MetaKnowing on 2024-09-24 13:04:42+00:00.

597
 
 
This is an automated archive made by the Lemmit Bot.

The original was posted on /r/singularity by /u/nanoobot on 2024-09-24 12:33:45+00:00.

598
 
 
This is an automated archive made by the Lemmit Bot.

The original was posted on /r/singularity by /u/MassiveWasabi on 2024-09-24 11:11:29+00:00.

599
 
 
This is an automated archive made by the Lemmit Bot.

The original was posted on /r/singularity by /u/treebeard280 on 2024-09-24 10:44:55+00:00.


This is the sort of AI I've been looking forward to: freeing up police resources to focus on other crime and making society safer.

600
 
 
This is an automated archive made by the Lemmit Bot.

The original was posted on /r/singularity by /u/Gothsim10 on 2024-09-24 08:41:21+00:00.

Original Title: OpenAI's Dane Vahey says GPT-3 was as smart as a 4th grader, GPT-4 was high school level and o1 is capable of the very best PhD students, outperforming humans more than 50% of the time and performing at a superhuman level for the first time
