Singularity

Everything pertaining to the technological singularity and related topics, e.g. AI, human enhancement, etc.

New paper from Meta discloses TPO (Thought Preference Optimization) technique with impressive results (old.reddit.com)

submitted 3 days ago by bot@lemmit.online to c/singularity@lemmit.online

This is an automated archive made by the Lemmit Bot.

The original was posted on /r/singularity by /u/katerinaptrv12 on 2024-10-16 15:24:18+00:00.

A recently published paper from Meta explains their new technique, TPO (Thought Preference Optimization), in detail (similar to what was used in the o1 models) and reports experiments with very interesting results. Llama 3.1 8B post-trained with this technique performed on par with GPT-4o and GPT-4 Turbo on the AlpacaEval and ArenaHard benchmarks.

[2410.10630] Thinking LLMs: General Instruction Following with Thought Generation (arxiv.org)
