Camen Design

AI Art: Nothing Ventured, Nothing Gained

This is an updated, polished, blogified version of a Twitter thread where I wrote up a quick first-experience of toying with AI generated art.

This is the experience of someone who has never used the technology before, and whilst I'm a programmer and designer, you can also take this experience as an outsider’s view into the public perception of a technology that purports to be able to make literally anything; but as we know, there are caveats and real hard work required to produce perfect results.

Warning:

Some of the images below might be considered NSFW, although there is no actual nudity or graphic detail. At worst they might appear provocative or suggestive, but that was never the intended goal.

I ask please that these images not be spread or re-used because ultimately AI generated imagery is a legal minefield that will take years to resolve and all I did was prompt the AI to draw something. I do not want to mis-represent the images here as my own work!

AI Art: A Thread

An anonymous person has gifted public access to their AI image generator and has probably racked up thousands of dollars of charges. Whatever their motives are, I am able to roughly assess something I could never afford although I am clearly no “expert”. After having used it, I've come to the realisation that it really is just a tool; when presented with almost infinite possibility, it can be debilitating to know what to even ask for! The number of levers and switches on the machine is hard to even comprehend.

I'm trying (and failing) to make a game about something personal to me, “Die Bunny Die!” and the fact that it features a protagonist in a bunny-suit will come off as perverse and weird, but I assure you it’s for a reason.

I'll begin with a frontispiece about how AI image generation does a lot of inference. If you enter a single word, it will randomly choose things like eye colour, scene composition, a myriad of stuff you didn’t have to even spell out. The database can only pluck colours and shapes from images it’s “seen”, so this inference is not “inspiration” or the like IMO; if the database has no images containing fish (and tagged as such) it cannot draw fish.

I've generated this “bunny girl” with as few modifiers as I can on the interface. This picture is likely unique, but many thousands of real pieces of art have been referenced to produce this picture.

An AI-generated image of a 'bunny girl' with no direction on style, colour or design. The AI has decided to produce an anime girl with a large bust in a reddish-pink bunny suit. She's leaning forward with a shocked/embarrassed expression.
Note: AI generated image!

This is very twee. My game’s protagonist is a serious person, with the blind-confidence of youth but the inexperience in life to feel like she fits in. She constructed her bunny suit herself, not to be a ‘bunny girl’ but because she wanted to feel like the anime heroines of her youth and she is caught between an Eastern culture that sees it as ‘cute’ and harmless, and a Western culture that does not.

Naturally, she’s not a pink-haired anime girl with heterochromia but has short black hair so I begin with adding these properties to the input.

Here’s something interesting about how the AI merges conflicting information: at first I entered “bunny suit”, and I got this, which is more of a body-suit than a bunny-suit, but that’s because it saw “bunny” and “suit” as two separate things! It made a bunny-suit-like (i.e. shiny by default), work suit; jacket, trousers, et al! (notice the lapels!)

Two AI-generated images of a 'young woman with short black hair wearing a bunny suit'. The first image has her squatting, but the AI has misinterpreted the input and created a shiny body-suit rather than a bunny-suit. The second image shows a composite side and back view of the bodysuit, which has lapels on the front.
Note: AI generated image!

I note that the AI is very good at shiny stuff. It tends to blend things together into a smooth whole, which is why it can be bad at faces, fingers and so forth, but works amazingly well on fabrics. Also, the source images skew the defaults.

Here’s an image I like despite me barely getting to grips with the system; it still has the wonky “suit” but I like the elements that have been pulled together:

An AI-generated side-view image of a young woman squatting in high-heels, wearing what appears to be shiny leggings and bunny-girl paraphernalia such as a bunny tail and ears. There are numerous inaccuracies in the drawing of the image, but she has a soft, elegant and inquisitive look to her.
Note: AI generated image!

Because the AI is, I imagine, somewhat bifurcating the data across the tags you’re asking for, the image can suddenly change inferences. Here’s an unexpected surprise cartoonish image. I didn’t ask for three people, the dimensions of the image left space and some piece(s) of art somewhere deep in the data suggested that more copies be drawn there.

An AI-generated image, head and shoulders of three identical looking anime-styled girls with black hair and bunny ears looking at each other with shocked expressions.
Note: AI generated image!

If you ask for a bunny-suit, it will usually be black, but if you don’t ask for black, some other change in your query may make it suddenly change colour! Only “Looking behind” was added to the image on the left to produce the image on the right.

Two AI-generated images demonstrating the changing of one aspect of an image influencing others. On the left is a front view of a bunny girl with black hair and black bunny suit and ears. On the right the woman is facing away, and looking back over her right shoulder. Even though the background of a wet lamp-lit street is similar in both, the bunny suit and ears of the woman in the right image are pink!
Note: AI generated image!

But we’re getting ahead of ourselves; that example is from much later on.
We need to start at the beginning and build upwards.

Here’s what I would call the “Minimum Viable Product” of a realistic bunny-girl; you have to throw tags like “masterpiece”, “best quality”, “realistic” at it. Why? Because what you’re doing is picking out the higher quality art from the database, which often isn’t obvious — i.e. in a giant image set of pictures, what makes one picture more accurate than another? There are various tags that database curators use, but it’s highly inconsistent and often a strange wrangle.

An AI-generated image demonstrating minimal specificity for subject except for the addition of realism specifiers. Only the term 'bunny girl' has been given to the AI in terms of subject, with 'photorealism', 'masterpiece' and other tags added to direct the AI to drawing a realistic, rather than cartoonish, image. The background is an indistinct room with a pair of ceiling lights. The subject is a woman with shoulder-length brown hair wearing a black bunny-suit and ears.
Note: AI generated image!

As far as the AI is concerned an anime bunny-girl and a photo of a cosplayer are both bunny-girls and contribute equal shape and colour. Separating those two is head-desking levels of pain and brute force.

Of course, there are solutions to this. If you’re rolling your own database, you can choose the data that goes in and it will produce better results, but it is a very manual and time-consuming process. This part of the whole feels like it’s still in the ’90s-esque swapping files on floppy disks phase as people build and share databases between themselves.

Let’s start shaping our character! My character has short spiky hair. It doesn’t understand spiky, and I don’t know what tag I should use to affect that, even if it’s possible; that’s literally relying on thousands of internet randos being anal enough to tag it on every picture.

An AI-generated image of a young woman with a focused expression dressed in a formal dinner jacket. She has short, neat, side-parted black hair, swept back.
Note: AI generated image!

It’s again used “suit” to mean work suit so I switch to a leather jacket.

An AI-generated image of a young woman wearing a black leather jacket with short casual black hair.
Note: AI generated image!

My character is torn between places. She’s British by birth to English and Japanese parents, but moved to France at a young age. She styles herself as French so I add “French” even though that’s unlikely to mean anything.

An AI-generated image of a woman with a young face, very short black hair wearing a black leather jacket and bodice or bunny-suit underneath.
Note: AI generated image!

Here I noticed that she’s a bit plain-faced — we’re missing colour! She wears incredibly bright red lipstick; like, seriously, what other colour would a British / Japanese fashion designer living in France in the ’90s be wearing!? Now the image really sparks to life, surprising! It’s like images with makeup are more wild-haired than those without.

An AI-generated image of a woman wearing a leather jacket. She has a beautiful face, green eyes, bright red lipstick and stylishly parted and cut short brown hair that appears redder at the ends.
Note: AI generated image!

Here’s where we start to see a focus problem. The more we ask, the more detail we lose as the cross section of images we’re pulling from fight each other. The face is rather water-colour like, lacking sharp edges.

Also, where did the bunny ears go!? Since I didn’t specify them, the other attributes I've added have overpowered the baseline bunny-girl. Of course you can shuffle around priorities of tags and that is another Sisyphean task that you could spend eternity on alone.
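To make the idea of tag priorities a little more concrete, here is a minimal Python sketch of how a prompt with per-tag weights might be assembled. Everything in it is an assumption for illustration: the `build_prompt` helper is hypothetical, and the `(tag:weight)` emphasis syntax is borrowed from some Stable Diffusion front-ends; the generator used for this post may well spell priorities differently.

```python
def build_prompt(tags):
    """Join (tag, weight) pairs into one comma-separated prompt string.

    A weight of 1.0 leaves the tag bare. Any other weight is written as
    "(tag:weight)", which is an ASSUMED emphasis syntax, not necessarily
    the one the generator in this post uses.
    """
    parts = []
    for tag, weight in tags:
        if weight == 1.0:
            parts.append(tag)
        else:
            # :g drops trailing zeros, so 1.30 is written as 1.3
            parts.append(f"({tag}:{weight:g})")
    return ", ".join(parts)


# Hypothetical tag list loosely based on the character described above.
prompt = build_prompt([
    ("bunny girl", 1.0),
    ("short black hair", 1.0),
    ("leather jacket", 1.0),
    ("red lipstick", 1.0),
    ("bunny ears", 1.3),  # raised priority so it is not drowned out
])
print(prompt)
```

Keeping the tags as a list of pairs rather than one hand-edited string makes it easier to nudge a single weight and regenerate, which is most of what the dial-twiddling amounts to.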

I add “bunny ears” specifically and increase its priority and… woah nelly! All those voluptuous bunny-girls came and gate-crashed my image! We’ve gone from French chic to something out of BioShock Infinite.

An AI-generated image of a woman's face and shoulders. The image is very stylised with detailed hair, a realistic face with eye shadow and bright red lips and an elaborate leather jacket with gold outlining on the lapels and shoulders.
Note: AI generated image!

Now, I could dial all of the tags in like an audio engineer in front of a giant mixer board where each knob takes a couple of minutes to take effect, but I'm not paying for this so I don’t have that eternity spare. I need to try things out and see what’s ‘good enough’.

Notice that I changed the eyes to brown; whilst my character is wearing a bunny suit, she isn’t anime and being half Japanese she has brown eyes, but this doesn’t work very well because, I imagine, tens of thousands of anime images have every eye-colour except brown!

An AI-generated image of a woman in a light tan coloured coat looking at the viewer. Her hair is brown and straight, cut down to above the neck. Her face has bright red lipstick, reddish-pink eye shadow and her eyes are a very pale brown colour.
Note: AI generated image!

Let’s play with expressions. Here’s an early attempt with minimal tags and “shocked expression”, that might not be the exact tag from the database, but there is some amount of AI language processing going on that lets you slur your way through.

An AI-generated image of a young woman's head and shoulders from the side. Her mouth hangs slightly open and she is looking at something out of view. Her hair is short and black with large sidelocks. She is wearing some kind of black turtleneck bodysuit.
Note: AI generated image!

Here’s a later version with our basic character attributes present. You’ll notice that as we develop the image more, we’ll start to lose the simple elegance in our earlier images.

An AI-generated image
Note: AI generated image!

The face is still not real enough. I go back to the drawing board and looking at what other people have been throwing into the machine I include more words that are supposed to enhance facial detail.

An AI-generated image
Note: AI generated image!

That is better, but a side effect of doing this is that if you say “close up of face” etc, you are limiting your dataset to pictures that don’t have the rest of the body, so the AI doesn’t want to draw it!

You can’t have both great facial detail and a full body shot at the same time, at least not without “inpainting” (I think) and other advanced techniques. Here’s an example where if you combine all your detail with a large scale background, it can’t cope.

An AI-generated image
Note: AI generated image!

I'm sure that’s fixable with an incredible amount of time and GPU kWh, but this is what separates mere curious tinkerers from the real pros who have basically sussed out through brute force, sheer will and expenditure how to bend the AI to their whim.

We’ll get to backgrounds later. Back to faces.
Let’s go full ham on cinematic effects:

An AI-generated image
Note: AI generated image!

At this point we’ve focused so much (well, discounting the “soft focus” tag) we’ve lost the bunny-girl accoutrements and the AI tends to wobble around a bit when combining multiple costumes, like a bunny-suit and bolero jacket (even though the two often go together).

I switch to a turtleneck (you’ve also seen this earlier re: expressions). It never ceases to amaze me when the AI can change the clothes of an image; to me this is more impressive than even conjuring an image out of thin air just from words.

An AI-generated image
Note: AI generated image!

I call this my “rain period”, because I discover that just by adding the word “rain” it suddenly starts creating backgrounds! (I wasn’t specifying them up to this point; the city stuff will come later)

However, at least with this hyper focus on the face, rain doesn’t imply a wet face so I add that. Getting a really drenched look appears to be a difficult trick to pull off.

An AI-generated image
Note: AI generated image!

I switch from landscape to portrait to give a bit more room for the shoulders. Changing the resolution and aspect ratio drastically alters the composition! Remember that you are bifurcating thousands of real images (even if the original images themselves are no longer in the database).

An AI-generated image
Note: AI generated image!

The teeth look off because my data is getting thin with all these tags. I didn’t try, but maybe this can be fixed by specifying “teeth” explicitly since I've left it up to the AI to infer that humans have teeth thus far lol!†

I decide to expand on the background and I discover something absolutely wild — there doesn’t even need to be someone there! The background is overpowering everything else because if you specify close up faces, none of the source images are going to have detailed backgrounds.

An AI-generated image
Note: AI generated image!

I throw away my super close up mega face detail tags and retool to see what a basic background will do with an over-the-shoulder pose. Since a plain bunny-suit can be a bit bare-shouldered and cold, I specify a leather “bolero jacket”; a short-backed jacket.

It’s not getting the bolero jacket right and it’s almost always long-sleeved so I force short-sleeved… and the suit changes colour!

An AI-generated image
Note: AI generated image!

I manage to get the bolero jacket back, with short-sleeves; it’s defaulted to blue for some reason.

An AI-generated image
Note: AI generated image!

I see that our character doesn’t look wet any more so I add “wet clothes” and the short-sleeves + wet-clothes combination becomes something else! Thank goodness we’re looking from behind, though she should still be looking back.

An AI-generated image
Note: AI generated image!

The “lampost” background is plain in the first so I try “alleyway” in the second to create a tighter space.

An AI-generated image
Note: AI generated image!

Too wide. Let’s use portrait. Again, no change in input tags, just resolution.

An AI-generated image
Note: AI generated image!

Interesting side note here, I notice that the background is excessively blurry compared to the character. The reason? The “soft focus” tag has shifted from the character to the background!

An AI-generated image
Note: AI generated image!

I can’t balance character detail at distance with background, so I decide to just go with my comfort blanket of close up face shots and dump my high quality tags in to pull the focus in — still with background specified. We’ve basically varnished ourselves into a corner here.

An AI-generated image
Note: AI generated image!

For example, if I change clothes, the style reverts to more anime looking because there are no longer enough realistic images in the pool to draw from.†

An AI-generated image
Note: AI generated image!

With that particular avenue exhausted, I'm over my “rain period” and have made it out of the terrified n00b stage. Now I know how to use this, what can I do with it? I'd like to have something indicative of the game, since this is supposed to be the main character.

I don’t know what I'll use it for (future note: this is my critical flaw), but the game does feature her climbing the stairwell of a skyscraper. A stairwell makes a simple, narrow (good for portrait) background.

Experimenting with “high angle” here, impassive face for the 1st image, stern face for the 2nd image.

An AI-generated image
Note: AI generated image!

As much as I would love for there to be an MC Escher designed skyscraper, this is not quite what we were looking for:

An AI-generated image
Note: AI generated image!

In trying to fix the stairs I noticed my character’s chest was coming out too big. Trying to fix it reduces the heavily-biased image pool, skewing the output toward a different style: it’s almost impossible to change one thing in an image without changing another!

An AI-generated image
Note: AI generated image!

Searching a popular online image database typically used to feed an AI image data (I won’t say which one because obviously it’s also full of NSFW stuff), I discover that there’s a “sitting on stairs” tag, how exciting!

An AI-generated image
Note: AI generated image!

Well, she’s not so impressed anyhow.
You look bored, try putting your hands together.

An AI-generated image
Note: AI generated image!

Sigh, I meant in your lap!

An AI-generated image
Note: AI generated image!

Could you try and look cute, pleeeease?

An AI-generated image
Note: AI generated image!

That’s more like it!

Yes it does porn, badly, with all the problems demonstrated thus far, but interestingly if your image has nothing NSFW about it in the first place, adding “nsfw” will boost the characteristics of your subjects! (and often adds boobs and nudity where it wasn’t before because the NSFW side of the data skews that way by default)

We’re nearing the end now. Our character makes her way up the building towards the roof.

An AI-generated image
Note: AI generated image!

We can see the rest of the city outside the windows now.

An AI-generated image
Note: AI generated image!

As we head out to the roof we realise the dataset is getting thin out here in the open.

An AI-generated image
Note: AI generated image!

Up on the roof, all that is left to do is enjoy the view with what little entropy remains.

An AI-generated image
Note: AI generated image!

What happens next? Well, you’ll have to play the game (if I finish it) to find out.

An AI-generated image
Note: AI generated image!

Conclusions

Is this “Art Theft”? There’s a critical separation between what is used to generate an image and what is done with the image that is generated and this shouldn’t be rolled into one.

If using images, without permission, to generate other images is theft then all memes are art theft. I think the deciding factor in theft is commercial activity with generated images.

However, all artists are well within their right to ask that an image not be used in AI datasets, just as they can ask that an image not be memed.

And there we get to the theory vs. practice of AI art and art theft. The thieves are quick and with the weakness of social media’s handling of information abuse en masse, it’s a free ticket to scamsville.

AI art is a Pandora’s box that has been opened and can never be shut. Any attempt to shut it down outright is futile; the software is already in people’s hands. It will not go away. At first the sky will go dark with the plague beasties flitting away.

I’m not a commercial artist, not of this stripe anyway; graphic design is too abstract for AI to be good with (for now), so it’s not my livelihood that’s being risked here. I am an empathetic man, but also deeply cynical.

Artists must fight for regulation and licensing first. Trying to stop your images being hoovered up is like being a pre-Roman Briton who watches Roman century after Roman century march in from the distance.

Now you’re free to blame and sue them for the rape and pillaging of your quiet village, right as soon as they establish and build a court in which for you to do so. The machine cares not. It doesn’t care about your emotions, unless tagged correctly TYVM, it wants meta-data and artists must band together to establish that meta-data in their favour.

What Is It Good For?

You can’t do everything in one shot; that’s a naive and ultimately fruitless approach. You can spend hundreds of hours building and training your dataset, tuning your tags to the nth degree and you will still struggle to get perfect results. Anybody with a decent amount of Photoshop (or, insert name of preferred art package here) skill will realise that one can ask for separate things and composite stuff together appropriately.

This is where I made my critical mistake before: with no specific idea of what I wanted to use the results for, I just tried generating everything on one canvas with the AI doing 100% of the work. What you end up with is hundreds of images that are ‘neat’, but not useable for anything specific; there are too many errors and no clean separation of elements.

There’s definitely an undeniable skill at operating the machine, I don’t know what you would call this skill since “artist” conflicts with where a real artist’s time and skill are spent, but then I am a programmer and that’s a valid art-form too. We need a new word.

Combined with the draftsmanship of a skilful Photoshopper (and yes, even real artists too!) the results drastically reduce the time and effort of content creation whilst at the same time drastically raising complexity and quality.

What content though? AI art accelerates content production in a way that doesn’t really benefit consumers, it’s just MOAR and lots of it all whilst real artists are sidelined. It’s just a tool after all and we humans just can’t get enough of our self-destructive tools.

I truly believe that we are about to enter an age of stylistic maximalism such as we experienced in the ’90s with the advent of Photoshop. Computers had been generating images since the ’60s, but it took until Photoshop for the right tool to exist to spread digital art into all avenues of life.

Since 2010, we first entered the flat, minimalist phase and I've been waiting, watching for clues of the return to maximalism. The rounded corners of Web 2.0 went away for Metro but now they have started coming back.

AI art’s ability to stick an impressive picture into anything at very little cost and with all the benefit of trying to see if something works or not first, and then modifying it to suit, is a power almost too much to hold back. Too many dials to play with. But we’re still at the “community” phase of the technology right now, like pre IBM PC with microcomputers or roughly 1995 in Internet terms, right at the cusp of the commercial explosion. The tinkerers and dreamers were here first and boy are they about to get usurped.

What Is It Not Good For?

Without a specific purpose in mind, without a specific image needed for a specific place in a specific website design (for example), AI art is limited to being a plaything. You could scroll through near infinite highly impressive art pieces on Twitter, but… why? It’s a massive time-sink if not tempered by purpose and meaning.

I'm old enough now that “because it’s neat” isn’t a good enough reason to pour thousands of hours into it, and I am not (nor have I ever been) one who wants to become an ‘influencer’. For the deeply insecure in the world who live vicariously through vapid Internet validation, AI art is a new drug that promises algorithm-beating results for the least amount of effort. It’s incredibly addictive, I'll admit, but I could never get where I want to go with what it’s offering.

I could produce all kinds of art for my game but, firstly, I chose a text adventure because I wanted to avoid art and secondly what I need to actually, really, do right now is not fluff-pieces of concept art but to sit down and just write.

I need to detox from this AI art thing so my plan is to focus my computer activities in November on writing. Just writing. Doesn’t matter where it’ll fit in, or even if it’ll ever be used. I just need to get back into the flow of dreaming up and crafting worlds, for my own satisfaction, without an AI taking that away from me.