Singularity

New Claude with o1 type of reasoning capabilities? More iterations of o1? will google finally drop something language wise? What hints have we gotten? Dario seem to be on a twitter spree right now.

Would really appreciate your thoughts on this final quarter, need me some good Sunday reading of the tea leaves.

129

1

Vision-based hand gesture customization from Apple (old.reddit.com)

submitted 6 days ago by bot@lemmit.online to c/singularity@lemmit.online

0 comments fedilink

This is an automated archive made by the Lemmit Bot.

The original was posted on /r/singularity by /u/longiner on 2024-10-13 08:08:26+00:00.

130

1

If you think LLMs or in general present AIs cannot do something, then make an eval. No one cares about your bad "philosophical" takes (old.reddit.com)

submitted 6 days ago by bot@lemmit.online to c/singularity@lemmit.online

0 comments fedilink

This is an automated archive made by the Lemmit Bot.

The original was posted on /r/singularity by /u/obvithrowaway34434 on 2024-10-13 05:44:13+00:00.

Seeing this annoying thing on social media with everyone saying LLMs cannot do this or that. Most common being vague sh*t like "reasoning", "agents", "consciousness" and so on. This is absolutely pointless. It sort of reminds one about quantum mechanics in the 1930s when everyone and their dog tried to come up with some philosophical interpretations. It's only when physicists decided to "shut up and calculate" that we made progress and developed the most accurate and predictive physical theory humans ever created. I am not saying philosophical interpretations and/or a deeper understanding are not important, but they crucially rely on data and metrics and properly controlled tests. These give us measurable outcomes and how to improve on something and they also inspire better theories and interpretation. Without these you get pseudo-scientific theories like ether, soul and phlogiston etc. LLMs and in general deep neural networks are the most complex and impenetrable objects ever created. If we are to understand how they work and what they can do, there is no alternative to extensive and rigorous testing. Everything else is distraction.

131

1

There are some reports of SearchGPT/Web search features being rolled out to some US ChatGPT Plus users on iOS (old.reddit.com)

submitted 6 days ago by bot@lemmit.online to c/singularity@lemmit.online

0 comments fedilink

This is an automated archive made by the Lemmit Bot.

The original was posted on /r/singularity by /u/obvithrowaway34434 on 2024-10-13 04:24:56+00:00.

Looks quite useful for research, especially when combined with the other tools.

Link to original tweet:

132

1

"Tesla Optimus Bot interacting with a crowd" --- These people are clearly not aware they are talking to a human pilot. And Musk allowed them to think that. This is a fraud by inaction. (youtu.be)

submitted 6 days ago by bot@lemmit.online to c/singularity@lemmit.online

0 comments fedilink

This is an automated archive made by the Lemmit Bot.

The original was posted on /r/singularity by /u/Anen-o-me on 2024-10-13 03:04:14+00:00.

133

1

I'm confused about this recent Apple research paper, because I ran all of the test examples in the paper on open o1 preview and it was able to answer correctly. Is this an actual apple paper or is... (i.redd.it)

submitted 6 days ago by bot@lemmit.online to c/singularity@lemmit.online

0 comments fedilink

This is an automated archive made by the Lemmit Bot.

The original was posted on /r/singularity by /u/Woootdafuuu on 2024-10-13 02:59:29+00:00.

Original Title: I'm confused about this recent Apple research paper, because I ran all of the test examples in the paper on open o1 preview and it was able to answer correctly. Is this an actual apple paper or is someone trolling?

134

1

Gödel Agent: A Self-Referential Agent Framework for Recursive Self-Improvement (arxiv.org)

submitted 6 days ago by bot@lemmit.online to c/singularity@lemmit.online

0 comments fedilink

This is an automated archive made by the Lemmit Bot.

The original was posted on /r/singularity by /u/Variouss on 2024-10-13 02:35:44+00:00.

135

1

Given your beliefs on the expected ASI/AGI/Singularity timeline what practical changes are you making in your lives? (old.reddit.com)

submitted 6 days ago by bot@lemmit.online to c/singularity@lemmit.online

0 comments fedilink

This is an automated archive made by the Lemmit Bot.

The original was posted on /r/singularity by /u/duluoz1 on 2024-10-13 00:41:16+00:00.

What effects, if any, has this timeline changed your plans? Have your career goals shifted, have you pivoted to a different field, has your investment strategy changed, do you plan to live in another part of the world etc.

I’m super interested in any practical changes that you’ve all made to your lives and keen to hear about them

136

1

Is this a new Gemini feature? It just did some kind of "o1 preview" (?) in this answer for me. Is google testing a new reasoning model as well or just a fancy way of displaying it? It answered eac... (www.reddit.com)

submitted 1 week ago by bot@lemmit.online to c/singularity@lemmit.online

0 comments fedilink

This is an automated archive made by the Lemmit Bot.

The original was posted on /r/singularity by /u/NaoCustaTentar on 2024-10-13 00:23:49+00:00.

Original Title: Is this a new Gemini feature? It just did some kind of "o1 preview" (?) in this answer for me. Is google testing a new reasoning model as well or just a fancy way of displaying it? It answered each paragraph separately while telling his "thought" process"

137

1

Mustafa Suleyman says as more compute is added to AI models they become easier to control, steer and align (old.reddit.com)

submitted 1 week ago by bot@lemmit.online to c/singularity@lemmit.online

0 comments fedilink

This is an automated archive made by the Lemmit Bot.

The original was posted on /r/singularity by /u/Gothsim10 on 2024-10-12 22:25:11+00:00.

138

1

Microsoft's Mustafa Suleyman says his team is crafting AI companions who will see and remember everything we do and which will constitute an intimate relationship with AI (old.reddit.com)

submitted 1 week ago by bot@lemmit.online to c/singularity@lemmit.online

0 comments fedilink

This is an automated archive made by the Lemmit Bot.

The original was posted on /r/singularity by /u/Gothsim10 on 2024-10-12 21:35:10+00:00.

139

1

Will it really take 16 years from AGI, to reach singularity? (old.reddit.com)

submitted 1 week ago by bot@lemmit.online to c/singularity@lemmit.online

0 comments fedilink

This is an automated archive made by the Lemmit Bot.

The original was posted on /r/singularity by /u/Junior_Edge9203 on 2024-10-12 19:01:10+00:00.

We all know the predicted dates that Ray gave us, 2029 for agi, then 2045 for the singularity itself. But I can't help but wonder about these dates here, would it really take 16 whole years in between for us to reach the singularity from AGI? It always seemed so awfully long in my opinion, especially if we had AGI, that should be self improving in my opinion... what do you all think?

140

1

LeanAgent: Lifelong Learning for Formal Theorem Proving (arxiv.org)

submitted 1 week ago by bot@lemmit.online to c/singularity@lemmit.online

0 comments fedilink

This is an automated archive made by the Lemmit Bot.

The original was posted on /r/singularity by /u/Happysedits on 2024-10-12 20:03:05+00:00.

141

1

o1-preview (via Web) performs much better on "trick" math reasoning problems than other language models. Paper: Exploring the Compositional Deficiency of Large Language Models in Mathematical Reas... (arxiv.org)

submitted 1 week ago by bot@lemmit.online to c/singularity@lemmit.online

0 comments fedilink

This is an automated archive made by the Lemmit Bot.

The original was posted on /r/singularity by /u/Wiskkey on 2024-10-12 16:11:34+00:00.

Original Title: o1-preview (via Web) performs much better on "trick" math reasoning problems than other language models. Paper: Exploring the Compositional Deficiency of Large Language Models in Mathematical Reasoning.

142

1

OpenAI's o1 Model Excels in Reasoning But Struggles with Rare and Complex Tasks [About paper "When a language model is optimized for reasoning, does it still show embers of autoregression? An anal... (old.reddit.com)

submitted 1 week ago by bot@lemmit.online to c/singularity@lemmit.online

0 comments fedilink

This is an automated archive made by the Lemmit Bot.

The original was posted on /r/singularity by /u/Wiskkey on 2024-10-12 14:43:07+00:00.

Original Title: OpenAI's o1 Model Excels in Reasoning But Struggles with Rare and Complex Tasks [About paper "When a language model is optimized for reasoning, does it still show embers of autoregression? An analysis of OpenAI o1"]

OpenAI's o1 Model Excels in Reasoning But Struggles with Rare and Complex Tasks.

In an article recently submitted to the arXiv preprint* server, researchers investigated whether OpenAI's o1, a language model optimized for reasoning, overcame limitations seen in previous large language models (LLMs). The study showed that while o1 performed significantly better, especially on rare tasks, it still exhibited sensitivity to probability, a trait from its autoregressive origins. This suggests that while optimizing for reasoning enhances performance, it might not entirely eliminate the probabilistic biases that remain embedded in the model.

When a language model is optimized for reasoning, does it still show embers of autoregression? An analysis of OpenAI o1.

In "Embers of Autoregression" (McCoy et al., 2023), we showed that several large language models (LLMs) have some important limitations that are attributable to their origins in next-word prediction. Here we investigate whether these issues persist with o1, a new system from OpenAI that differs from previous LLMs in that it is optimized for reasoning. We find that o1 substantially outperforms previous LLMs in many cases, with particularly large improvements on rare variants of common tasks (e.g., forming acronyms from the second letter of each word in a list, rather than the first letter). Despite these quantitative improvements, however, o1 still displays the same qualitative trends that we observed in previous systems. Specifically, o1 -- like previous LLMs -- is sensitive to the probability of examples and tasks, performing better and requiring fewer "thinking tokens" in high-probability settings than in low-probability ones. These results show that optimizing a language model for reasoning can mitigate but might not fully overcome the language model's probability sensitivity.

Embers of autoregression show how large language models are shaped by the problem they are trained to solve.

Significance

ChatGPT and other large language models (LLMs) have attained unprecedented performance in AI. These systems are likely to influence a diverse range of fields, such as education, intellectual property law, and cognitive science, but they remain poorly understood. Here, we draw upon ideas in cognitive science to show that one productive way to understand these systems is by analyzing the goal that they were trained to accomplish. This perspective reveals some surprising limitations of LLMs, including difficulty on seemingly simple tasks such as counting words or reversing a list. Our empirical results have practical implications for when language models can safely be used, and the approach that we introduce provides a broadly useful perspective for reasoning about AI.

Abstract

The widespread adoption of large language models (LLMs) makes it important to recognize their strengths and limitations. We argue that to develop a holistic understanding of these systems, we must consider the problem that they were trained to solve: next-word prediction over Internet text. By recognizing the pressures that this task exerts, we can make predictions about the strategies that LLMs will adopt, allowing us to reason about when they will succeed or fail. Using this approach—which we call the teleological approach—we identify three factors that we hypothesize will influence LLM accuracy: the probability of the task to be performed, the probability of the target output, and the probability of the provided input. To test our predictions, we evaluate five LLMs (GPT-3.5, GPT-4, Claude 3, Llama 3, and Gemini 1.0) on 11 tasks, and we find robust evidence that LLMs are influenced by probability in the hypothesized ways. Many of the experiments reveal surprising failure modes. For instance, GPT-4’s accuracy at decoding a simple cipher is 51% when the output is a high-probability sentence but only 13% when it is low-probability, even though this task is a deterministic one for which probability should not matter. These results show that AI practitioners should be careful about using LLMs in low-probability situations. More broadly, we conclude that we should not evaluate LLMs as if they are humans but should instead treat them as a distinct type of system—one that has been shaped by its own particular set of pressures.

X thread about the 2 papers from one of the authors. Alternate link #1. Alternate link #2.

143

1

SpaceX tomorrow will be attempting the first ever return to launch site and catch of the Super Heavy booster. (x.com)

submitted 1 week ago by bot@lemmit.online to c/singularity@lemmit.online

0 comments fedilink

This is an automated archive made by the Lemmit Bot.

The original was posted on /r/singularity by /u/ryan13mt on 2024-10-12 17:08:58+00:00.

144

1

Cardiologists working with AI said it was equal or better than human cardiologists in most areas (x.com)

submitted 1 week ago by bot@lemmit.online to c/singularity@lemmit.online

0 comments fedilink

This is an automated archive made by the Lemmit Bot.

The original was posted on /r/singularity by /u/MetaKnowing on 2024-10-12 15:49:00+00:00.

145

1

Dario Amodei says AGI could arrive in 2 years, will be smarter than Nobel Prize winners, will run millions of instances of itself at 10-100x human speed, and can be summarized as a "country of gen... (i.redd.it)

submitted 1 week ago by bot@lemmit.online to c/singularity@lemmit.online

0 comments fedilink

This is an automated archive made by the Lemmit Bot.

The original was posted on /r/singularity by /u/MetaKnowing on 2024-10-12 15:20:28+00:00.

Original Title: Dario Amodei says AGI could arrive in 2 years, will be smarter than Nobel Prize winners, will run millions of instances of itself at 10-100x human speed, and can be summarized as a "country of geniuses in a data center"

146

1

The world of work has completely changed and most people don't realise yet. (i.redd.it)

submitted 1 week ago by bot@lemmit.online to c/singularity@lemmit.online

0 comments fedilink

This is an automated archive made by the Lemmit Bot.

The original was posted on /r/singularity by /u/MetaKnowing on 2024-10-12 15:30:26+00:00.

147

1

Apple AI researchers question OpenAI's claims about o1's reasoning capabilities [about paper "GSM-Symbolic: Understanding the Limitations of Mathematical Reasoning in Large Language Models"] (old.reddit.com)

submitted 1 week ago by bot@lemmit.online to c/singularity@lemmit.online

0 comments fedilink

This is an automated archive made by the Lemmit Bot.

The original was posted on /r/singularity by /u/Wiskkey on 2024-10-12 13:05:59+00:00.

Apple AI researchers question OpenAI's claims about o1's reasoning capabilities.

A new study by Apple researchers, including renowned AI scientist Samy Bengio, calls into question the logical capabilities of today's large language models - even OpenAI's new "reasoning model" o1.

The team, led by Mehrdad Farajtabar, created a new evaluation tool called GSM-Symbolic. This tool builds on the GSM8K mathematical reasoning dataset and adds symbolic templates to test AI models more thoroughly.

The researchers tested open-source models such as Llama, Phi, Gemma, and Mistral, as well as proprietary models, including the latest offerings from OpenAI. The results, published on arXiv, suggest that even leading models such as OpenAI's GPT-4o and o1 don't use real logic, but merely mimic patterns.

GSM-Symbolic: Understanding the Limitations of Mathematical Reasoning in Large Language Models.

Recent advancements in Large Language Models (LLMs) have sparked interest in their formal reasoning capabilities, particularly in mathematics. The GSM8K benchmark is widely used to assess the mathematical reasoning of models on grade-school-level questions. While the performance of LLMs on GSM8K has significantly improved in recent years, it remains unclear whether their mathematical reasoning capabilities have genuinely advanced, raising questions about the reliability of the reported metrics. To address these concerns, we conduct a large-scale study on several SOTA open and closed models. To overcome the limitations of existing evaluations, we introduce GSM-Symbolic, an improved benchmark created from symbolic templates that allow for the generation of a diverse set of questions. GSM-Symbolic enables more controllable evaluations, providing key insights and more reliable metrics for measuring the reasoning capabilities of models. Our findings reveal that LLMs exhibit noticeable variance when responding to different instantiations of the same question. Specifically, the performance of all models declines when only the numerical values in the question are altered in the GSM-Symbolic benchmark. Furthermore, we investigate the fragility of mathematical reasoning in these models and show that their performance significantly deteriorates as the number of clauses in a question increases. We hypothesize that this decline is because current LLMs cannot perform genuine logical reasoning; they replicate reasoning steps from their training data. Adding a single clause that seems relevant to the question causes significant performance drops (up to 65%) across all state-of-the-art models, even though the clause doesn't contribute to the reasoning chain needed for the final answer. Overall, our work offers a more nuanced understanding of LLMs' capabilities and limitations in mathematical reasoning.

X thread about the paper from one of its authors. Alternate link #1. Alternate link #2.

148

1

What has led the development in the miniaturization of computer transistors to take place at this exact pace? (old.reddit.com)

submitted 1 week ago by bot@lemmit.online to c/singularity@lemmit.online

0 comments fedilink

This is an automated archive made by the Lemmit Bot.

The original was posted on /r/singularity by /u/Roubbes on 2024-10-12 11:36:36+00:00.

Sometimes I wonder if the pace at which new computer manufacturing nodes have been developing has been and is a bottleneck.

What are the requirements and advances required to move from one node to the next?

Why did Moore's law predict such a specific pace?

149

1

In 2018, Ilya Sutskever discussed how AGI could potentially be trained through self-play and how multi-agent systems, or the 'Society of Agents' as he calls it, fit into that concept. With OpenAI ... (old.reddit.com)

submitted 1 week ago by bot@lemmit.online to c/singularity@lemmit.online

0 comments fedilink

This is an automated archive made by the Lemmit Bot.

The original was posted on /r/singularity by /u/Gothsim10 on 2024-10-12 12:40:17+00:00.

Original Title: In 2018, Ilya Sutskever discussed how AGI could potentially be trained through self-play and how multi-agent systems, or the 'Society of Agents' as he calls it, fit into that concept. With OpenAI and DeepMind recently forming multi-agent research teams, this idea seems especially relevant now.

150

1

LA Noire VR - Reimagined by AI (youtu.be)

submitted 1 week ago by bot@lemmit.online to c/singularity@lemmit.online

0 comments fedilink

This is an automated archive made by the Lemmit Bot.

The original was posted on /r/singularity by /u/Cr4zko on 2024-10-11 23:10:49+00:00.