this post was submitted on 22 Aug 2024
This is an automated archive made by the Lemmit Bot.

The original was posted on /r/stablediffusion by /u/MarcS- on 2024-08-21 22:04:09+00:00.


Hi everyone,

I've been running comparisons of several new models using standardized prompts. Usually I focus on models I can run on my local machine (since I favor open software), but I decided to use some free generations on Ideogram to test their latest 2.0 model, which they claim is better than Flux and Dall-E. I couldn't run my whole library of prompts before running out of free credits, but I hope the five prompts I tested will be of interest to you before you decide whether it's worth paying for a subscription to their online generation service.

Prompt #1: the positional prompt, which you can compare to Flux and AuraFlow here:

"a blue cylinder in the center of the image, with a red sphere at the left, a green square at the right, a purple smiling sun on the top of the image and a severed foot at the bottom"

The idea here is to test whether Ideogram 2.0 is SOTA at adhering to a prompt with several items clearly positioned relative to each other.

Here are the four results I got, not cherry-picked:

It's very good. Arguably the smiling sun isn't at the top every time, but it's at least in the top third of the image each time, so I say it passes this test. AuraFlow (version 0.2) passed as well, but it's the SOTA model for prompt adherence. Aesthetics are bad in both cases, but I won't weigh aesthetics here, as the result is pretty surrealist anyway. If we were to nitpick, I could say that the foot only looks severed, and not attached to the cylinder, 3 times out of 4.

Prompt #2: A complex description.

Here I compared several models with the Shinto monk prompt.

"In the inner court of a grand Greek temple, majestic columns rise towards the sky, framing the scene with ancient elegance. At the center, a Shinto monk, dressed in traditional white and orange robes with intricate patterns, is levitating in the lotus position, floating serenely above a blazing fire. The flames dance and flicker, casting a warm, ethereal glow on the monk's peaceful expression. His hands are gently resting on his knees, with beads of a prayer necklace hanging loosely from his fingers. At the opposite end of the court, an anthropomorphical lion, regal and powerful, is bowing deeply. The lion, with a mane of golden fur and wearing an ornate, ceremonial chest plate, exudes a sense of reverence and respect. Its tail is curled gracefully around its body, and its eyes are closed in solemn devotion. Surrounding the court, ancient statues and carvings of Greek deities look down, their expressions solemn and timeless. The sky above is a serene blue, with the light of the setting sun casting long shadows and a warm, golden hue across the scene, highlighting the unique fusion of cultures and the mystical ambiance of the moment."

This prompt has 20 different items to rate, so I gave each image a mark out of 20 and averaged the first 4 generations.

Misses "hands on knees", he doesn't hold the prayer beads in his hands, the lion isn't anthropomorphic and isn't particularly bowing, the mane isn't really fiery, the tail isn't curled around the body, admittedly the eyes are half-closed so I'll count that as right, no statues of Greek gods, no serene blue sky. 12 out of 20.

No lotus position, no prayer beads, not attached to hands, lion not anthropomorphic, mane doesn't seem golden either, tail not around body. That's a 14 (but the monk's position is a big drawback).

Horrible monk... Misses the same as before, plus the orange and white robe and the intricate patterns. Demerit for the artifact monk... 11/20.

Misses the court of the temple (he's in front of a temple), the location of the prayer bead necklace, the anthropomorphic lion (fur admittedly golden here), the tail curled around the body, and the statues of Greek gods. 15/20.

The average is 13/20. AuraFlow did 15/20. The prompt adherence is good, but not stellar. Still, out of a few generations, some can get quite close to the intended image.
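For anyone who wants to reproduce the arithmetic, here is a minimal sketch of the scoring. It assumes one point per item, so a mark is simply 20 minus the number of misses, minus any demerit. The helper name `mark_from_misses` and the miss counts used in the checks (8, 6 and 5 for the first, second and fourth images) are my own reading of the lists above, not something the scoring was formally defined with.

```python
TOTAL_ITEMS = 20  # the prompt has 20 items to rate


def mark_from_misses(misses: int, demerits: int = 0, total: int = TOTAL_ITEMS) -> int:
    """Hypothetical helper: one point per item, minus extra demerits."""
    return total - misses - demerits


# Marks exactly as stated above for the four generations.
stated_marks = [12, 14, 11, 15]

# Plain arithmetic mean of the four marks.
average = sum(stated_marks) / len(stated_marks)
print(f"average: {average}/20")

# Sanity check: the miss counts read off the lists give the same marks.
assert mark_from_misses(8) == stated_marks[0]
assert mark_from_misses(6) == stated_marks[1]
assert mark_from_misses(5) == stated_marks[3]
```

The third image is left out of the sanity check because its mark includes a subjective demerit on top of the counted misses.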

Prompt #3: the pirate lady

"A woman wearing 18th-century attire is positioned on all fours, facing the viewer, on a wooden table in a lively pirate tavern. She is dressed in a traditional colonial-style dress, with a corset bodice, lace-trimmed neckline, and flowing skirts. The fabric of her dress is rich and textured, featuring a deep burgundy color with intricate embroidery and gold accents. Her hair is styled in loose curls, cascading around her face, and she wears a tricorn hat adorned with feathers and ribbons. The tavern itself is bustling with activity. The background is filled with wooden beams, barrels, and rustic furniture, typical of a pirate tavern. The atmosphere is dimly lit by flickering lanterns and candles, casting warm, golden light throughout the room. Various pirates and patrons can be seen in the background, engaged in animated conversations, drinking from tankards, and playing cards. The woman's expression is confident and mischievous, her eyes meeting the viewer's gaze directly. Her posture, though unusual for the setting, conveys a sense of boldness and command. The table beneath her is cluttered with tankards, maps, and scattered coins, adding to the chaotic and adventurous ambiance of the pirate tavern."

Another scene that is described very clearly to reflect the image I have in mind. I won't count items, as the goal was to see whether we could get a woman on all fours in a non-sexual context.

Ideogram fails: #3 is the best, but she's at most leaning on the table, not on all fours on the table. Also, the table isn't cluttered with tankards, maps and coins. The model focussed on the 1girl, not on the composition of the scene as a whole. Flux did better, despite also missing the part about the lady being on all fours.

Prompt #4: the submarine ruins

Compare here:

"Beneath the tranquil surface of a crystal-clear ocean, an ancient temple lies half-submerged, its majestic architecture eroded but still grand. The temple is a marvel, with columns covered in intricate carvings of sea creatures and mythical beings. Soft, blue light filters down from above, illuminating the scene with a serene glow. Merfolk, with their shimmering scales and flowing hair, glide gracefully around the temple, guarding its secrets. Giant kelp sway gently in the current, and schools of colorful fish dart through the water, adding vibrant splashes of color. An adventuring party, equipped with magical diving suits that emit a soft glow, explores the temple. They are fascinated by the glowing runes and ancient artifacts they find, evidence of a long-lost civilization. One member, a wizard, reaches out to touch a glowing orb, while another, a rogue, carefully inspects a mural depicting a great battle under the sea."

Actually, Ideogram did pretty well on this one, especially on the intricate carvings of sea creatures on the columns, which are the most elaborate of any model I tried. On the other hand, it drops the ball mid-prompt, with a party of adventurers that is barely present, not interacting as it should, and lacking magical diving suits. It is however the prettiest set of images generated, so it has some quality.

And finally, a short prompt to let the magical prompt shine: "a breathtaking views of the Garden Dome, orbiting Uranus, with people taking a coffee break".

Not Uranus, no garden-y thing. The garden dome could be on an asteroid, so I won't count it against Ideogram.

Not very garden-y as well. Als...


Content cut off. Read original on https://old.reddit.com/r/StableDiffusion/comments/1ey2ffa/ideogram_20_prompt_adherence_and_aesthetics_test/
