This is an automated archive made by the Lemmit Bot.

The original was posted on /r/stablediffusion by /u/sanobawitch on 2024-10-18 06:45:59+00:00.


Janus is based on DeepSeek-LLM-1.3b-base, which was trained on a corpus of approximately 500B text tokens. For multimodal understanding, it uses SigLIP-L as the vision encoder, which supports 384×384 image input. For image generation, Janus uses the VQ tokenizer from LlamaGen.
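Rough idea of the decoupled design (a minimal PyTorch sketch, not the actual Janus code; the dimensions, codebook size and class names are my own assumptions): the understanding path projects continuous SigLIP features into the LLM's embedding space, while the generation path embeds discrete VQ image tokens that the LLM predicts autoregressively.

```python
# Conceptual sketch of Janus-style decoupled visual encoding.
# All sizes and names below are illustrative assumptions, not the released code.
import torch
import torch.nn as nn

LLM_DIM = 2048            # hidden size of the 1.3B language model (assumed)
VQ_CODEBOOK_SIZE = 16384  # size of the image-token vocabulary (assumed)

class UnderstandingPath(nn.Module):
    """SigLIP-L-style patch features -> linear adaptor -> LLM embedding space."""
    def __init__(self, vision_dim=1024):
        super().__init__()
        self.adaptor = nn.Linear(vision_dim, LLM_DIM)

    def forward(self, vision_features):           # (B, N_patches, vision_dim)
        return self.adaptor(vision_features)      # (B, N_patches, LLM_DIM)

class GenerationPath(nn.Module):
    """Discrete image tokens from a VQ tokenizer -> LLM embedding space."""
    def __init__(self):
        super().__init__()
        self.token_embed = nn.Embedding(VQ_CODEBOOK_SIZE, LLM_DIM)

    def forward(self, image_token_ids):           # (B, N_tokens) int64
        return self.token_embed(image_token_ids)  # (B, N_tokens, LLM_DIM)

# Toy forward pass: both paths land in the same embedding space,
# so one shared autoregressive LLM can consume either.
und = UnderstandingPath()(torch.randn(1, 576, 1024))
gen = GenerationPath()(torch.randint(0, VQ_CODEBOOK_SIZE, (1, 576)))
print(und.shape, gen.shape)  # both (1, 576, 2048)
```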

They have released the model weights under the DeepSeek Model License, and the code under the MIT license. Commercial usage is permitted.

Three-stage training procedure

Link to the generated images. More examples with longer prompts.

The image resolutions are 1024×1024, 512×512, and 384×384.

Inference code is here; they transform the decoded image patches into the final RGB image in... numpy. Yay!
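For reference, this is roughly what that final numpy step looks like (a minimal sketch with assumed shapes and value range, not the repo's actual code):

```python
import numpy as np

def decoded_to_rgb(decoded: np.ndarray) -> np.ndarray:
    """Turn a decoded image array in [-1, 1], shape (C, H, W),
    into a uint8 RGB array of shape (H, W, C). Illustrative only."""
    img = np.clip((decoded + 1.0) / 2.0, 0.0, 1.0)  # map [-1, 1] -> [0, 1]
    img = (img * 255.0).round().astype(np.uint8)    # quantize to 8-bit
    return np.transpose(img, (1, 2, 0))             # CHW -> HWC for saving

# Example with a dummy 384x384 decoder output
rgb = decoded_to_rgb(np.random.uniform(-1, 1, size=(3, 384, 384)))
print(rgb.shape, rgb.dtype)  # (384, 384, 3) uint8
```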

So far, we have Omni, Mei, Sana and Janus.
