this post was submitted on 29 Sep 2023
102 points (100.0% liked)
Technology
A 13b-parameter model works out to about 9GB. You need a bit more than that, since more than just the model has to sit in memory, but with 24GB I'd expect at least half of it to go unused. Memory doesn't use much power either, by the way. LPDDR4 draws something like 0.3 watts while it's actively being read or written.
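As a rough sanity check on that size figure, here's a minimal sketch of the arithmetic: weights-only size is just parameter count times bits per weight. The bit widths in the loop are assumptions for illustration (the comment doesn't say which quantization was used), and the function names are made up for this example. It ignores the extra memory needed for context and runtime overhead.

```python
def model_size_gb(n_params: float, bits_per_weight: float) -> float:
    """Approximate size of the weights alone, in GB (10^9 bytes)."""
    return n_params * bits_per_weight / 8 / 1e9


def implied_bits_per_weight(n_params: float, size_gb: float) -> float:
    """Work backwards: how many bits per weight a given file size implies."""
    return size_gb * 1e9 * 8 / n_params


if __name__ == "__main__":
    n_params = 13e9  # "13b" from the comment

    # Assumed bit widths, purely illustrative.
    for bits in (16, 8, 5, 4):
        print(f"13b at {bits}-bit: {model_size_gb(n_params, bits):.1f} GB")

    # What the ~9GB figure would imply about bits per weight.
    bits = implied_bits_per_weight(n_params, 9)
    print(f"9 GB for 13b implies about {bits:.1f} bits per weight")
```

So the 9GB figure only makes sense for a quantized model at roughly 5 to 6 bits per weight; the same model at 16-bit would not fit in 24GB at all.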
The actual computations use more, obviously, but graphics cards are not designed for this task, and while they're fast, most of them are also horribly inefficient.
I run 13b-parameter models on my ultraportable laptop, which has a small battery, no active cooling (it's fanless) and no discrete GPU. It has 16GB of RAM (ordinary RAM, not GPU memory), and I'm running a full operating system, web browsers, etc. at the same time. Models like llama2, stable diffusion, etc. get perfectly usable performance without using much battery at all (at a guess, single-digit watts while performing the calculations).
There is efficient hardware now and there will be even more efficient hardware in the future. My laptop definitely isn't designed to run these models and on top of that the models aren't designed to run on a laptop either. There's plenty more optimisation work to be done in the years to come.
OK, it's been a while since I tried running a language model, so I might have been thinking of the 30b models that were showing up at the time. The point remains, though, that the thing they were running would be well beyond generally available hardware and completely impractical for realtime use. Why would you do all that when FLAC and PNG are good enough? It's far cheaper, and uses less power, to just accommodate the slightly less compressed files.