Singularity

576

1

New 1.5 Pro is significantly better in practice for long context (old.reddit.com)

submitted 3 weeks ago by bot@lemmit.online to c/singularity@lemmit.online

0 comments fedilink

This is an automated archive made by the Lemmit Bot.

The original was posted on /r/singularity by /u/sdmat on 2024-09-25 01:10:09+00:00.

I see a lot of people fuming at the lack of a major version bump but from my testing the new 1.5 Pro is real progress. More than just the speed improvements and cost reduction.

Long context has always been the strength of Gemini 1.5, but it was frustratingly flaky once you got past the first 100-200K tokens.

My main use case for this is analyzing documents and log files. Testing the new model shows a big improvement. Specifically it seems to be able to pull in information for "wide" prompts at 2-3x the context depth that -0827 managed.

E.g. if you give the model a half million token stack of documents and asked for summaries, previously the model tended to forget about most of the documents in the deeper half of the context. Now it's much more consistent. I A/B tested the new model against -0827 with a few runs to be sure.

To be clear it's not like long context was useless previously - a specific prompt would pull out information. But it was more like long term memory than the kind of deeply associative and enumerable / recitable top-of-mind functionality we expect from SOTA models in short context.

TL;DR: the new model makes long context much more useful. Still not for the full 2 million tokens, but it's getting there.

577

1

Vinod Khosla says AI will handle 80% of work in 80% of jobs (fortune.com)

submitted 3 weeks ago by bot@lemmit.online to c/singularity@lemmit.online

0 comments fedilink

This is an automated archive made by the Lemmit Bot.

The original was posted on /r/singularity by /u/MetaKnowing on 2024-09-25 00:50:14+00:00.

578

1

Joe Biden tells the UN that we will see more technological change in the next 2-10 years than we have seen in the last 50 and AI will change our ways of life, work and war so urgent efforts are ne... (x.com)

submitted 3 weeks ago by bot@lemmit.online to c/singularity@lemmit.online

0 comments fedilink

This is an automated archive made by the Lemmit Bot.

The original was posted on /r/singularity by /u/AdorableBackground83 on 2024-09-24 23:39:49+00:00.

Original Title: Joe Biden tells the UN that we will see more technological change in the next 2-10 years than we have seen in the last 50 and AI will change our ways of life, work and war so urgent efforts are needed on AI safety.

I

579

1

New MIT vaccine technology could wipe out HIV in just two shots (interestingengineering.com)

submitted 3 weeks ago by bot@lemmit.online to c/singularity@lemmit.online

0 comments fedilink

This is an automated archive made by the Lemmit Bot.

The original was posted on /r/singularity by /u/UpstairsAssumption6 on 2024-09-24 23:10:12+00:00.

580

1

Advanced voice mode first impressions. (old.reddit.com)

submitted 3 weeks ago by bot@lemmit.online to c/singularity@lemmit.online

0 comments fedilink

This is an automated archive made by the Lemmit Bot.

The original was posted on /r/singularity by /u/ExtremeCenterism on 2024-09-24 21:00:43+00:00.

I spent some time just talking to the advanced voice mode without any real plan. The conversation was completely steered by the AI, possibly because of that.

It asked what I was doing and I mentioned I was at work so naturally it drilled me about what I was doing until I got specific. I hadn't planned on using it to help me with work related stuff but here we are. I started explaining the git conflicts I was running into during a rebase and it walked me through the entire process to resolution. I really wish I could screen share with it because saying full system paths and variable names can get repetitive.

All that to say I got a lot done pretty quickly even though that was never my goal. It steered me back to the task at hand. It's continual questioning kept me on task and on my toes even though I hadn't intended for that to happen.

It did have a few glitches but nothing major except and at one point I had to start the conversation over again. It made a loud whooshing sound at one point and it seemed to get very confused when my wife entered the room and started talking to me.

I also seemed to hit some kind of time limit. I had a warning window pop up on my phone saying "9 minutes left" im not sure how long I had been talking for but I put in on hold (muted my mic) to take a call from my colleague. It seems to still run down your time limit as long as your connected even if your not actively talking to it. Still it's pretty cool that it goes dormant if you stop talking to it, I just wish it didn't run down the clock.

Overall I think this technology has extraordinary potential because it feels like it caused my brain function to change and operate differently than just taking to a chatbot. Mainly because you feel like you're expected to respond and interact. If I could screen share or use a camera, then this would be truly revolutionary.

Main difference between advanced mode and the previous implementation really is the speed and interruptability. It makes it 20x more useful.

I find the current implementation quite useful, but mainly this has me excited for what's to come.

581

1

Benchmark performance of today's new Gemini model (old.reddit.com)

submitted 3 weeks ago by bot@lemmit.online to c/singularity@lemmit.online

0 comments fedilink

This is an automated archive made by the Lemmit Bot.

The original was posted on /r/singularity by /u/UnknownEssence on 2024-09-24 18:19:24+00:00.

GEMINI 1.5 PRO:

| Capability | Benchmark | May 2024 | Sep 2024 | |

|

| | General | MMLU-Pro | 69.0% | 75.8% | | Code | Natural2Code | 82.6% | 85.4% | | Math | MATH | 67.7% | 86.5% | | | HiddenMath | 28.0% | 52.0% | | Reasoning | GPQA (diamond) | 46.0% | 59.1% | | Multilingual | WMT23 | 75.3 | 75.1 | | Long Context | MRCR (1M) | 70.5% | 82.6% | | Image | MMMU | 62.2% | 65.9% | | | Vibe-Eval (Reka) | 48.9% | 53.9% | | | MathVista | 63.9% | 68.1% | | Audio | FLEURS (55 lang) | 6.5% | 6.7% | | Video | Video-MME | 77.9% | 78.6% | | Safety | XSTest | 88.4% | 98.8% |

GEMINI 1.5 FLASH:

| Capability | Benchmark | May 2024 | Sep 2024 | |

|

| | General | MMLU-Pro | 59.1% | 67.3% | | Code | Natural2Code | 77.2% | 79.8% | | Math | MATH | 54.9% | 77.9% | | | HiddenMath | 20.3% | 47.2% | | Reasoning | GPQA (diamond) | 41.4% | 51.0% | | Multilingual | WMT23 | 74.1 | 73.9 | | Long Context | MRCR (1M) | 70.1% | 71.9% | | Image | MMMU | 56.1% | 62.3% | | | Vibe-Eval (Reka) | 44.8% | 48.9% | | | MathVista | 58.4% | 65.8% | | Audio | FLEURS (55 lang) | 9.8% | 9.6% | | Video | Video-MME | 74.7% | 76.1% | | Safety | XSTest | 86.9% | 97.0% |

582

1

Just got access to o1 today. Still wouldn't trust its reasoning for anything I'm unfamiliar with. (i.redd.it)

submitted 3 weeks ago by bot@lemmit.online to c/singularity@lemmit.online

0 comments fedilink

This is an automated archive made by the Lemmit Bot.

The original was posted on /r/singularity by /u/BanD1t on 2024-09-24 17:34:06+00:00.

583

1

I'm in the EU... (i.redd.it)

submitted 3 weeks ago by bot@lemmit.online to c/singularity@lemmit.online

0 comments fedilink

This is an automated archive made by the Lemmit Bot.

The original was posted on /r/singularity by /u/jocxFIN on 2024-09-24 20:30:15+00:00.

584

1

How long you think until AI Agents can run entire companies? (i.redd.it)

submitted 3 weeks ago by bot@lemmit.online to c/singularity@lemmit.online

0 comments fedilink

This is an automated archive made by the Lemmit Bot.

The original was posted on /r/singularity by /u/OddVariation1518 on 2024-09-24 19:25:14+00:00.

585

1

I FINALLY GOT IT! (i.redd.it)

submitted 3 weeks ago by bot@lemmit.online to c/singularity@lemmit.online

0 comments fedilink

This is an automated archive made by the Lemmit Bot.

The original was posted on /r/singularity by /u/Nleblanc1225 on 2024-09-24 18:41:25+00:00.

586

1

New Gemini 1.5 Pro increases benchmark performance (i.redd.it)

submitted 3 weeks ago by bot@lemmit.online to c/singularity@lemmit.online

0 comments fedilink

This is an automated archive made by the Lemmit Bot.

The original was posted on /r/singularity by /u/UnknownEssence on 2024-09-24 18:13:58+00:00.

587

1

Advanced voice is rolling out this week by OpenAI (x.com)

submitted 3 weeks ago by bot@lemmit.online to c/singularity@lemmit.online

0 comments fedilink

This is an automated archive made by the Lemmit Bot.

The original was posted on /r/singularity by /u/jiayounokim on 2024-09-24 18:12:39+00:00.

588

1

Gemini 1.5 002 beats o1-preview on MATH, and it does it at 1/10th the cost and no thinking time. (old.reddit.com)

submitted 3 weeks ago by bot@lemmit.online to c/singularity@lemmit.online

0 comments fedilink

This is an automated archive made by the Lemmit Bot.

The original was posted on /r/singularity by /u/CallMePyro on 2024-09-24 17:01:44+00:00.

589

1

Tencent Robotics from China unveiled their new robot, The Five. It features four wheeled legs with retractable feet, allowing it to climb stairs. The robot also has humanoid hands. (old.reddit.com)

submitted 3 weeks ago by bot@lemmit.online to c/singularity@lemmit.online

0 comments fedilink

This is an automated archive made by the Lemmit Bot.

The original was posted on /r/singularity by /u/Gothsim10 on 2024-09-24 16:27:32+00:00.

590

1

Gemini-1.5-Pro-002 and Gemini-1.5-Flash-002 benchmark (i.redd.it)

submitted 3 weeks ago by bot@lemmit.online to c/singularity@lemmit.online

0 comments fedilink

This is an automated archive made by the Lemmit Bot.

The original was posted on /r/singularity by /u/Relative_Issue_9111 on 2024-09-24 16:25:00+00:00.

591

1

Updated production-ready Gemini models, reduced 1.5 Pro pricing, increased rate limits, and more (developers.googleblog.com)

submitted 3 weeks ago by bot@lemmit.online to c/singularity@lemmit.online

0 comments fedilink

This is an automated archive made by the Lemmit Bot.

The original was posted on /r/singularity by /u/ryan13mt on 2024-09-24 16:08:54+00:00.

592

1

OpenAI’s press account hack was 5th security breach in 20 months (www.msn.com)

submitted 3 weeks ago by bot@lemmit.online to c/singularity@lemmit.online

0 comments fedilink

This is an automated archive made by the Lemmit Bot.

The original was posted on /r/singularity by /u/MetaKnowing on 2024-09-24 15:22:55+00:00.

593

1

Paper that trained a model with a GPT-2-like architecture on a synthetic math dataset: "We use a synthetic setting to demonstrate that language models can learn to solve grade-school math problems... (old.reddit.com)

submitted 3 weeks ago by bot@lemmit.online to c/singularity@lemmit.online

0 comments fedilink

This is an automated archive made by the Lemmit Bot.

The original was posted on /r/singularity by /u/Wiskkey on 2024-09-24 14:58:50+00:00.

Original Title: Paper that trained a model with a GPT-2-like architecture on a synthetic math dataset: "We use a synthetic setting to demonstrate that language models can learn to solve grade-school math problems through true generalization, rather than relying on data contamination or template memorization."

Paper Physics of Language Models: Part 2.1, Grade-School Math and the Hidden Reasoning Process.

Abstract:

Recent advances in language models have demonstrated their capability to solve mathematical reasoning problems, achieving near-perfect accuracy on grade-school level math benchmarks like GSM8K. In this paper, we formally study how language models solve these problems. We design a series of controlled experiments to address several fundamental questions: (1) Can language models truly develop reasoning skills, or do they simply memorize templates? (2) What is the model's hidden (mental) reasoning process? (3) Do models solve math questions using skills similar to or different from humans? (4) Do models trained on GSM8K-like datasets develop reasoning skills beyond those necessary for solving GSM8K problems? (5) What mental process causes models to make reasoning mistakes? (6) How large or deep must a model be to effectively solve GSM8K-level math questions?

Our study uncovers many hidden mechanisms by which language models solve mathematical questions, providing insights that extend beyond current understandings of LLMs.

Project page for the paper.

Results slide from the above link:

X thread about the paper from one of its authors. (Alternate link).

Video about the paper from one of its authors.

Video about the "Physics of Language Models" series of papers, including a summary of the paper.

Paper summary (not from the paper authors).

Review of the paper by a computer science professor (PDF file).

594

1

James Cameron, Academy Award-Winning Filmmaker, Joins Stability AI Board of Directors — Stability AI (stability.ai)

submitted 3 weeks ago by bot@lemmit.online to c/singularity@lemmit.online

0 comments fedilink

This is an automated archive made by the Lemmit Bot.

The original was posted on /r/singularity by /u/Nunki08 on 2024-09-24 14:33:04+00:00.

595

1

Google's event today seems to be aimed towards corporations. Doesn't seem like anything to be hyped about. (cloudonair.withgoogle.com)

submitted 3 weeks ago by bot@lemmit.online to c/singularity@lemmit.online

0 comments fedilink

This is an automated archive made by the Lemmit Bot.

The original was posted on /r/singularity by /u/ryan13mt on 2024-09-24 14:15:11+00:00.

596

1

four days before o1 (i.redd.it)

submitted 3 weeks ago by bot@lemmit.online to c/singularity@lemmit.online

0 comments fedilink

This is an automated archive made by the Lemmit Bot.

The original was posted on /r/singularity by /u/MetaKnowing on 2024-09-24 13:04:42+00:00.

597

1

Europe's first drone cargo airline gets ready to take off (techxplore.com)

submitted 3 weeks ago by bot@lemmit.online to c/singularity@lemmit.online

0 comments fedilink

This is an automated archive made by the Lemmit Bot.

The original was posted on /r/singularity by /u/nanoobot on 2024-09-24 12:33:45+00:00.

598

1

Apparently this was referring to a new Google release and not an Anthropic release? (i.redd.it)

submitted 4 weeks ago by bot@lemmit.online to c/singularity@lemmit.online

0 comments fedilink

This is an automated archive made by the Lemmit Bot.

The original was posted on /r/singularity by /u/MassiveWasabi on 2024-09-24 11:11:29+00:00.

599

1

AI tool that can do '81 years of detective work in 30 hours' trialled by police (news.sky.com)

submitted 4 weeks ago by bot@lemmit.online to c/singularity@lemmit.online

0 comments fedilink

This is an automated archive made by the Lemmit Bot.

The original was posted on /r/singularity by /u/treebeard280 on 2024-09-24 10:44:55+00:00.

This is the sort of AI I've been looking forward to. Freeing up police resources to focus on other crime and making a safer society.

600

1

OpenAI's Dane Vahey says GPT-3 was as smart as a 4th grader, GPT-4 was high school level and o1 is capable of the very best PhD students, outperforming humans more than 50% of the time and perform... (old.reddit.com)

submitted 4 weeks ago by bot@lemmit.online to c/singularity@lemmit.online

0 comments fedilink

This is an automated archive made by the Lemmit Bot.

The original was posted on /r/singularity by /u/Gothsim10 on 2024-09-24 08:41:21+00:00.

Original Title: OpenAI's Dane Vahey says GPT-3 was as smart as a 4th grader, GPT-4 was high school level and o1 is capable of the very best PhD students, outperforming humans more than 50% of the time and performing at a superhuman level for the first time