This is an automated archive made by the Lemmit Bot.
The original was posted on /r/stablediffusion by /u/jmellin on 2024-09-01 12:18:26+00:00.
I love the new CogVideoX-5b model and think it's great that we finally have a strong competitor in the open-source space, rivaling Kling, Runway, and others. However, I believe the community's demand for an image-to-video (img2vid) feature is evident.
A fine-tuned image-to-video version of the current text-to-video model exists but has not been released
After doing some research on GitHub, I found that the authors have stated they have no plans to open-source their current Image-to-Video model, which I find disappointing. I hope they reconsider in the future.
I believe that the first person or team to fine-tune the current model to handle image-to-video (which I know is no small task) and open-source it will gain a lot while also becoming a community legend. Alternatively, if someone develops a software solution, similar to inpainting I guess, that allows setting the first latent image, they would also be eligible for that recognition.
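To make the "set the first latent image" idea a bit more concrete, here is a toy sketch of the inpainting-style trick: after every denoising step, frame 0 is overwritten with the known image latent, re-noised to the current noise level. This is only an illustration under heavy assumptions, not CogVideoX code. `toy_denoiser`, `add_noise`, and `sample_with_fixed_first_frame` are hypothetical names, the "model" is a stand-in function, latents are plain lists of floats, and the linear noise blend is a simplification of a real diffusion schedule.

```python
import random


def add_noise(clean, noise_level, rng):
    """Blend a clean latent with Gaussian noise (noise_level in [0, 1]).

    Simplified linear blend, NOT a real diffusion noise schedule.
    """
    return [(1.0 - noise_level) * x + noise_level * rng.gauss(0.0, 1.0)
            for x in clean]


def toy_denoiser(frames, noise_level):
    """Hypothetical stand-in for the video model: it only shrinks values."""
    return [[0.9 * x for x in frame] for frame in frames]


def sample_with_fixed_first_frame(image_latent, num_frames, num_steps, seed=0):
    """Run a toy denoising loop while pinning frame 0 to a known image latent."""
    rng = random.Random(seed)
    dim = len(image_latent)

    # Start every frame from pure Gaussian noise.
    frames = [[rng.gauss(0.0, 1.0) for _ in range(dim)]
              for _ in range(num_frames)]

    for step in range(num_steps):
        # Noise level falls linearly and reaches 0.0 on the final step.
        noise_level = 1.0 - (step + 1) / num_steps

        # One "denoising" step over all frames.
        frames = toy_denoiser(frames, noise_level)

        # Inpainting-style replacement: overwrite frame 0 with the known
        # image latent, noised to match the current noise level.
        frames[0] = add_noise(image_latent, noise_level, rng)

    return frames


if __name__ == "__main__":
    image_latent = [0.5, -1.0, 2.0, 0.0]
    video = sample_with_fixed_first_frame(image_latent, num_frames=4, num_steps=10)
    print(video[0] == image_latent)
```

Because the noise level is zero on the last step, the first frame ends up as exactly the supplied latent, while the other frames are whatever the model produces. In a real model the remaining frames would only stay consistent with the pinned frame if the network attends across frames, which is why a proper fine-tune would likely still work better than this trick.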
Keeping my fingers crossed for any of the above.
Links:
The authors' response to the image-to-video request on their GitHub
kijai mentions it in a reply in his ComfyUI-wrapper node