overview for Architeuthis

AI-Generated Code is Causing Outages and Security Issues in Businesses in c/techtakes@awful.systems

[–] Architeuthis@awful.systems 7 points 2 months ago

You’d think AI companies would have wised up by this point and gone through all their pre-recorded demos with a fine-tooth comb so that ~~marks~~ users at least make it past the homepage, but I guess not.

The target group for their pitch probably isn't people who have a solid grasp of coding, I'd bet quite the opposite.

"The Subprime AI Crisis" - Ed Zitron on the bubble's impending collapse in c/techtakes@awful.systems

[–] Architeuthis@awful.systems 21 points 2 months ago* (last edited 2 months ago) (2 children)

On each step, one part of the model applies reinforcement learning, with the other one (the model outputting stuff) “rewarded” or “punished” based on the perceived correctness of their progress (the steps in its “reasoning”), and altering its strategies when punished. This is different to how other Large Language Models work in the sense that the model is generating outputs then looking back at them, then ignoring or approving “good” steps to get to an answer, rather than just generating one and saying “here ya go.”

Every time I've read how chain-of-thought works in o1 it's been completely different, and I'm still not sure I understand what's supposed to be going on. Apparently you get a strike notice if you try too hard to find out how the chain-of-thinking process goes, so one might be tempted to assume it's something that's readily replicable by the competition (and they need to prevent that as long as they can) instead of any sort of notably important breakthrough.

From the detailed o1 system card pdf linked in the article:

According to these evaluations, o1-preview hallucinates less frequently than GPT-4o, and o1-mini hallucinates less frequently than GPT-4o-mini. However, we have received anecdotal feedback that o1-preview and o1-mini tend to hallucinate more than GPT-4o and GPT-4o-mini. More work is needed to understand hallucinations holistically, particularly in domains not covered by our evaluations (e.g., chemistry). Additionally, red teamers have noted that o1-preview is more convincing in certain domains than GPT-4o given that it generates more detailed answers. This potentially increases the risk of people trusting and relying more on hallucinated generation.

Ballsy to just admit your hallucination benchmarks might be worthless.

The newsletter also mentions that the price for output tokens has quadrupled compared to the previous newest model, but the awesome part is, remember all that behind-the-scenes self-prompting that's going on while it arrives at an answer? Even though you're not allowed to see them, according to Ed Zitron you sure as hell are paying for them (i.e. they spend output tokens), which is hilarious if true.
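
To put rough numbers on that billing point, here's a minimal sketch. It assumes hidden reasoning tokens are billed at the same rate as visible output tokens, as the newsletter claims. The prices and token counts are made-up placeholders, not OpenAI's actual rates.

```python
def output_cost(visible_tokens, hidden_tokens, price_per_million):
    """Dollar cost when hidden reasoning tokens are billed as output tokens."""
    billed_tokens = visible_tokens + hidden_tokens
    return billed_tokens * price_per_million / 1_000_000


# Hypothetical numbers, chosen only for illustration.
OLD_PRICE = 15.0           # dollars per million output tokens (assumed)
NEW_PRICE = 4 * OLD_PRICE  # "quadrupled", per the newsletter

# Same 500-token visible answer; the new model also burns 4500 unseen tokens (assumed).
old = output_cost(visible_tokens=500, hidden_tokens=0, price_per_million=OLD_PRICE)
new = output_cost(visible_tokens=500, hidden_tokens=4500, price_per_million=NEW_PRICE)

print(f"old: ${old:.4f}  new: ${new:.4f}  ratio: {new / old:.0f}x")
```

With those placeholder figures the same visible answer costs about 40 times as much: 4x from the price bump and 10x from tokens you never get to see.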

AI-Generated Code is Causing Outages and Security Issues in Businesses in c/techtakes@awful.systems

[–] Architeuthis@awful.systems 19 points 2 months ago

“When asked about buggy AI [code], a common refrain is ‘it is not my code,’ meaning they feel less accountable because they didn’t write it.”

Strong "they cut all my deadlines in half and gave me an OpenAI API key, so fuck it" energy.

He stressed that this is not from want of care on the developer’s part but rather a lack of interest in “copy-editing code” on top of quality control processes being unprepared for the speed of AI adoption.

You don't say.

Stubsack: weekly thread for sneers not worth an entire post, week ending Sunday 15 September 2024 in c/techtakes@awful.systems

[–] Architeuthis@awful.systems 16 points 2 months ago* (last edited 2 months ago) (2 children)

OpenAI manages to do an entire introduction of a new model without using the word "hallucination" even once.

Apparently it implements chain-of-thought, which either means they changed the RLHF dataset to force it to explain its 'reasoning' when answering or to do self-questioning loops, or that it reprompts itself multiple times behind the scenes according to some heuristic until it synthesizes a best result; it's not really clear.

Can't wait to waste five pools of drinkable water to be told to use C# features that don't exist, but at least it got like 25.2452323760909304593095% better at solving math olympiads as long as you allow it a few tens of tries for each question.

Google’s GameNGen AI Doom video game generator: dissecting a rigged demo in c/techtakes@awful.systems

[–] Architeuthis@awful.systems 14 points 2 months ago* (last edited 2 months ago)

This is conceptually different: it just generates a few seconds of Doom-like video that you can slightly influence by sending inputs, and pretends that In The Future™ entire games could be generated from scratch and playable on Sufficiently Advanced™ autocomplete machines.

Google’s GameNGen AI Doom video game generator: dissecting a rigged demo in c/techtakes@awful.systems

[–] Architeuthis@awful.systems 18 points 2 months ago* (last edited 2 months ago)

Stephanie Sterling of the Jimquisition outlines the thinking involved here. Well, she swears at everyone involved for twenty minutes. So, Steph.

She seems to think the AI generates .WAD files.

I guess they fell victim to one of the classic blunders: never assume that it can't be that stupid, and someone must be explaining it wrong.

AI worse than humans in every way at summarising information, government trial finds in c/technology@lemmy.world

[–] Architeuthis@awful.systems 0 points 2 months ago

Did LLama3.1 solve the hallucination problem?

I bet we would have heard if it had, since it's the albatross hanging around the neck of this entire technology.

AI worse than humans in every way at summarising information, government trial finds in c/technology@lemmy.world

[–] Architeuthis@awful.systems 4 points 2 months ago

AI worse than humans in every way at summarising information, government trial finds in c/technology@lemmy.world

[–] Architeuthis@awful.systems 1 points 2 months ago (4 children)

but it can make a human way more efficient, and make 1 human able to do the work of 3-5 humans.

Not if you have to proof-read everything to spot the entirely convincing-looking but completely inaccurate parts, is the problem the article cites.

Stubsack: weekly thread for sneers not worth an entire post, week ending Sunday 9 September 2024 in c/techtakes@awful.systems

[–] Architeuthis@awful.systems 12 points 2 months ago (2 children)

I’m truly surprised they didn’t cart Yud out for this shit

Self-proclaimed sexual sadist Yud is probably a sex scandal time bomb and really not ready for prime time. Plus it's not like he has anything of substance to add on top of Saltman's alarmist bullshit, so it would just be reminding people how weird in a bad way people in this subculture tend to be.

Bostrom's advice for the ethical treatment of LLMs: remind them to be happy in c/sneerclub@awful.systems

[–] Architeuthis@awful.systems 8 points 2 months ago

I liked how Scalzi brushed it away: basically, your consciousness gets copied to a new body, which kills the old one, and an artifact of the transfer process is that for a few moments you experience yourself as a mind with two bodies. That gives you at least the impression of continuity of self, which is enough for most people to get on with living in a new body and let philosophers do the worrying.

Bostrom's advice for the ethical treatment of LLMs: remind them to be happy in c/sneerclub@awful.systems

[–] Architeuthis@awful.systems 10 points 2 months ago* (last edited 2 months ago) (2 children)

I feel like a subset of sci-fi and philosophical meandering really is just increasingly convoluted paths of trying to avoid or come to terms with death as a possibly necessary component of life.

Given rationalism's intellectual heritage, this is absolutely transhumanist cope for people who were counting on digital upload as a last resort to immortality in their lifetimes.