this post was submitted on 16 Apr 2024
69 points (100.0% liked)

TechTakes


Big brain tech dude got yet another clueless take over at HackerNews etc? Here's the place to vent. Orange site, VC foolishness, all welcome.

This is not debate club. Unless it’s amusing debate.

For actually-good tech, you want our NotAwfulTech community


Turns out software engineering cannot be easily solved with a ~~small shell script~~ large language model.

The author of the article appears to be a genuine ML engineer, although some of his takes aged like fine milk. He seems to be shilling Google a bit too much for my taste. However, the sneer content is good nonetheless.

First off, the "Devin solves a task on Upwork" demo is 1. cherry-picked, 2. not even correctly solved.

Second, and this is the absolutely fantastic golden nugget here, to show off its "bug solving capability" it creates its own nonsensical bugs and then reverses them. It's the ideal corporate worker, able to appear busy by creating useless work for itself out of thin air.

It also takes over 6 hours to perform this task, which would be reasonable for an experienced software engineer, but an experienced software engineer's workflow doesn't include burning a small nuclear explosion's worth of energy while coding and then not actually solving the task. We don't drink that much coffee.

The next demo is a bait-and-switch again. In this case I think the author of the article doesn't sneer nearly as much as the demo deserves -- the task the AI solves is writing test cases for finding the Least Common Multiple modulo a number. Come on, that task is fucking trivial, all those tests are one-liners! It's famously much easier to verify modulo arithmetic than it is to actually compute it. And it takes the AI an hour to do it!

It is a bit refreshing though that it didn't turn out DEVIN is just Dinesh, Eesha, Vikram, Ishani, and Niranjan working for $2/h from a slum in India.

[–] aio@awful.systems 3 points 7 months ago (1 children)

> the task the AI solves is writing test cases for finding the Least Common Multiple modulo a number.

Looking at the image of the prompt, it looks more like a CRT computation to me.

> It’s famously much easier to verify modulo arithmetic than it is to actually compute it.

It's not particularly difficult to compute CRT, though it is definitely trivial to verify the result afterwards. I'm not sure I'd agree that that's a general fact about modular arithmetic computations though.
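To make the compute-versus-verify gap concrete, here is a minimal Python sketch, assuming pairwise coprime moduli. The function names (`crt_pair`, `crt`, `verify_crt`) are made up for illustration and have nothing to do with whatever the demo actually ran. Computing takes a few lines plus a modular inverse; verifying really is a one-liner.

```python
from math import gcd


def crt_pair(r1, m1, r2, m2):
    """Combine x = r1 (mod m1) and x = r2 (mod m2), assuming gcd(m1, m2) == 1."""
    if gcd(m1, m2) != 1:
        raise ValueError("this sketch only handles coprime moduli")
    # pow(base, -1, mod) returns the modular inverse (Python 3.8+).
    inv = pow(m1, -1, m2)
    # Pick k so that r1 + m1*k is congruent to r2 modulo m2.
    k = ((r2 - r1) * inv) % m2
    return (r1 + m1 * k) % (m1 * m2)


def crt(residues, moduli):
    """Fold the congruences together one pair at a time."""
    x, m = 0, 1
    for r, n in zip(residues, moduli):
        x = crt_pair(x, m, r % n, n)
        m *= n
    return x


def verify_crt(x, residues, moduli):
    """Checking a candidate answer is just one remainder per congruence."""
    return all(x % n == r % n for r, n in zip(residues, moduli))


if __name__ == "__main__":
    x = crt([2, 3, 2], [3, 5, 7])
    print(x, verify_crt(x, [2, 3, 2], [3, 5, 7]))
```

The textbook system x = 2 (mod 3), x = 3 (mod 5), x = 2 (mod 7) is used as the example input; a test for it only needs `verify_crt`.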

[–] V0ldek@awful.systems 3 points 7 months ago

It's provably easier to verify whether a multiplicative inverse of a modulo m is correct than it is to actually find it. And non-provably, but rather obviously, it takes much less code and effort.
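As a rough illustration of the code-size point (not a proof of the complexity claim), here is a small Python sketch. The helper names `egcd`, `mod_inverse` and `is_inverse` are hypothetical. Finding the inverse goes through the extended Euclidean algorithm, while checking a candidate is a single multiplication and a remainder.

```python
def egcd(a, b):
    """Return (g, s, t) such that g == gcd(a, b) and a*s + b*t == g."""
    old_r, r = a, b
    old_s, s = 1, 0
    old_t, t = 0, 1
    while r != 0:
        q = old_r // r
        old_r, r = r, old_r - q * r
        old_s, s = s, old_s - q * s
        old_t, t = t, old_t - q * t
    return old_r, old_s, old_t


def mod_inverse(a, m):
    """Find x with (a * x) % m == 1, if it exists."""
    g, s, _ = egcd(a % m, m)
    if g != 1:
        raise ValueError("a has no inverse modulo m")
    return s % m


def is_inverse(a, x, m):
    """Verification: one multiplication and one remainder."""
    return (a * x) % m == 1 % m


if __name__ == "__main__":
    x = mod_inverse(3, 7)
    print(x, is_inverse(3, x, 7))
```

So a test for an inverse routine only needs `is_inverse`; it never has to rerun the Euclidean loop.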