this post was submitted on 12 Jan 2025
658 points (98.0% liked)

Technology


cross-posted from: https://lemmy.ca/post/37011397

!opensource@programming.dev

The popular open-source VLC video player was demonstrated on the floor of CES 2025 with automatic AI subtitling and translation, generated locally and offline in real time. Parent organization VideoLAN shared a video on Tuesday in which president Jean-Baptiste Kempf shows off the new feature, which uses open-source AI models to generate subtitles for videos in several languages. 

[–] MITM0@lemmy.world 39 points 2 days ago (1 children)

As long as the models are open source, I have no complaints.

And the data stays local.

[–] billwashere@lemmy.world 48 points 2 days ago (1 children)

This might be one of the few times I’ve seen AI being useful and not just slapped on something for marketing purposes.

[–] PalmTreeIsBestTree@lemmy.world 15 points 2 days ago (1 children)

But the toppings contain potassium benzoate.

[–] Doorbook@lemmy.world 23 points 2 days ago (1 children)

The nice thing is, now at least this can be used with live TV from other countries and languages.

Say you want to watch Japanese TV or Korean channels without bothering with downloading, searching for and syncing subtitles.

[–] sugar_in_your_tea@sh.itjust.works 13 points 2 days ago (1 children)

I prefer watching Mexican football announcers, and it would be nice to know what they're saying. Though that might actually detract from the experience.

[–] InFerNo@lemmy.ml 6 points 1 day ago (2 children)

GOOOOOOAAAAAAAAALLLLLLLLLL

Just fill up the whole screen with this.

[–] renzev@lemmy.world 147 points 2 days ago (8 children)

This sounds like a great thing for deaf people and just in general, but I don't think AI will ever replace anime fansub makers who have no problem throwing a wall of text on screen for a split second just to explain an obscure untranslatable pun.

[–] rustyricotta@lemmy.ml 58 points 2 days ago

Bless those subbers. I love those walls of text.

[–] FMT99@lemmy.world 31 points 2 days ago

Translator's note: keikaku means plan

[–] FordBeeblebrox@lemmy.world 22 points 2 days ago

They are like the * in any Terry Pratchett (GNU) novel, sometimes a funny joke can have a little more spice added to make it even funnier

[–] asbestos@lemmy.world 280 points 2 days ago (3 children)

Finally, some good fucking AI

[–] shyguyblue@lemmy.world 168 points 2 days ago (1 children)

I was just thinking, this is exactly what AI should be used for. Pattern recognition, full stop.

[–] spankmonkey@lemmy.world 67 points 2 days ago (3 children)

Yup, and if it isn't perfect that is ok as long as it is close enough.

Like getting name spellings wrong or mixing homophones is fine because it isn't trying to be factually accurate.

[–] tja@sh.itjust.works 34 points 2 days ago (18 children)

The problem is that now people will say they don't need to create accurate subtitles because VLC is doing the job for them.

Accessibility might suffer from that, because all subtitles are now just "good enough"

[–] spankmonkey@lemmy.world 25 points 2 days ago

Regular old live broadcast closed captioning is pretty much 'good enough' and that is the standard I'm comparing to.

Actual subtitles created ahead of time should be perfect because they have the time to double check.

[–] Railcar8095@lemm.ee 32 points 2 days ago

Or they can get OK ones with this tool, and fix the errors. Might save a lot of time

[–] m8052@lemmy.world 185 points 2 days ago (7 children)

What’s important is that this is running on your machine locally, offline, without any cloud services. It runs directly inside the executable

YES, thank you JB

[–] Thistlewick@lemmynsfw.com 19 points 2 days ago (1 children)

Amazing. I can finally find out exactly what that nurse is yelling about while she gets railed by the local basketball team.

[–] clot27@lemm.ee 19 points 2 days ago (1 children)

Will it be possible to export these AI subs?

[–] Scrollone@feddit.it 8 points 2 days ago

Imagine the possibilities!

[–] mp3@lemmy.ca 71 points 2 days ago* (last edited 2 days ago) (6 children)

Now I want some AR glasses that display subtitles above someone's head when they talk à la Cyberpunk that also auto-translates. Of course, it has to be done entirely locally.

[–] phoenixz@lemmy.ca 49 points 2 days ago* (last edited 2 days ago) (9 children)

As VLC is open source, can we expect this technology to also be available for, say, Jellyfin, so that I can once and for all have subtitles done right?

Edit: I think it's great that VLC has this, but this sounds like something many other apps could benefit from.

[–] QuadratureSurfer@lemmy.world 22 points 2 days ago (4 children)

It's already available for anyone to use. https://github.com/openai/whisper

They're using OpenAI's Whisper model for this: https://code.videolan.org/videolan/vlc/-/merge_requests/5155
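For anyone wanting to reuse the output in another app: a minimal sketch of turning Whisper-style segments into an SRT file. It assumes each segment is a dict with `start`, `end` (in seconds) and `text`, which is the shape the `openai-whisper` Python package returns under `result["segments"]`. The helper names `format_timestamp` and `segments_to_srt` are made up for illustration, the sample data is hand-written, and nothing here says how VLC itself handles this.

```python
def format_timestamp(seconds: float) -> str:
    """Format a time in seconds as an SRT timestamp (HH:MM:SS,mmm)."""
    total_ms = int(round(seconds * 1000))
    hours, rest = divmod(total_ms, 3_600_000)
    minutes, rest = divmod(rest, 60_000)
    secs, ms = divmod(rest, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def segments_to_srt(segments) -> str:
    """Build SRT text from dicts that carry "start", "end" and "text"."""
    blocks = []
    for index, seg in enumerate(segments, start=1):
        start = format_timestamp(seg["start"])
        end = format_timestamp(seg["end"])
        # Each cue: running number, time range, text, then a blank line.
        blocks.append(f"{index}\n{start} --> {end}\n{seg['text'].strip()}\n")
    return "\n".join(blocks)


# Hand-written sample standing in for real model output (assumption).
sample = [
    {"start": 0.0, "end": 2.5, "text": " Hello and welcome."},
    {"start": 2.5, "end": 3661.042, "text": " Goodbye."},
]
srt_text = segments_to_srt(sample)
print(srt_text)
```

With the `openai-whisper` package installed, the segments could come from something like `whisper.load_model("base").transcribe("talk.mp3")["segments"]`; the file name is a placeholder.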

[–] lukewarm_ozone@lemmy.today 3 points 1 day ago

Note that OpenAI's original Whisper models are pretty slow; in my experience the distil-whisper project (via a tool like whisperx) is more than 10x faster.

[–] Nalivai@lemmy.world 12 points 2 days ago (6 children)

The technology is nowhere near being good though. On synthetic tests, on the data it was trained and tweaked on, maybe, I don't know.
I co-run an event where we invite speakers from all over the world, and we tried every way to generate subtitles; all of them perform at the level of YouTube's auto-generated ones. It's better than nothing, but you can't really rely on it.

[–] lukewarm_ozone@lemmy.today 2 points 1 day ago* (last edited 1 day ago) (1 children)

Really? This is the opposite of my experience with (distil-)whisper - I use it to generate subtitles for stuff like podcasts and was stunned at first by how high-quality the results are. I typically use distil-whisper/distil-large-v3, locally. Was it among the models you tried?

[–] Nalivai@lemmy.world 1 points 17 hours ago

Unfortunately I don't know the specific names of the models; I'll comment again if I remember to ask the people who spun up the models.
The difference might be live vs. recorded material, I don't know.

[–] TriflingToad@sh.itjust.works 4 points 1 day ago* (last edited 1 day ago) (1 children)

Is your goal to rely on it, or to have it as a backup?
For my purpose of having a backup, nearly anything will be better than nothing.

[–] Nalivai@lemmy.world 1 points 11 hours ago

When you do live streaming there is no time for backup; it either works or it doesn't. Better than nothing, that's for sure, but also maybe marginally better than whatever we had 10 years ago.

[–] TheRealKuni@lemmy.world 27 points 2 days ago (3 children)

And yet they turned down having thumbnails for seeking because it would be too resource intensive. 😐

[–] DreamlandLividity@lemmy.world 15 points 2 days ago (2 children)

I mean, it would. For example Jellyfin implements it, but it does so by extracting the pictures ahead of time and saving them. It takes days to do this for my library.

[–] cley_faye@lemmy.world 10 points 2 days ago (1 children)

Video decoding is resource intensive. We're used to it, we have hardware acceleration for some of it, but spewing something around 52 million pixels every second from a highly compressed data source is not cheap. I'm not sure how both compare, but small LLM models are not that costly to run if you don't factor their creation in.
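For reference, the "around 52 million pixels every second" figure can be checked with quick arithmetic. The comment doesn't say which resolution and frame rate it has in mind; 1080p at 25 fps is an assumption here that happens to land close to that number, and the other two rows are only for comparison.

```python
def pixels_per_second(width: int, height: int, fps: int) -> int:
    """Raw decoded pixel throughput: frame area times frames per second."""
    return width * height * fps


# Assumed formats; the original comment does not name any of them.
rate_1080p25 = pixels_per_second(1920, 1080, 25)
rate_1080p30 = pixels_per_second(1920, 1080, 30)
rate_2160p60 = pixels_per_second(3840, 2160, 60)

for label, rate in [("1080p25", rate_1080p25),
                    ("1080p30", rate_1080p30),
                    ("2160p60", rate_2160p60)]:
    print(f"{label}: {rate / 1e6:.1f} million pixels/s")
```

This only counts output pixels; it says nothing about how expensive decoding or speech recognition actually is per pixel or per second of audio.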
