Stable Diffusion

Tech Update 1: AI Generators (Jan 2023)

Tracy Harwood Blog January 8, 2023 Leave a reply

A month is apparently a VERY long time in the world of artificial intelligence… since our last post on this topic, released at the beginning of December, we’ve seen even more amazing tech launch.

ChatGPT

This is another creative AI from the OpenAI stable (also creator of DALL-E 2) that produces text in a conversational way. ChatGPT is capable of answering pretty much any question or request you care to throw at it, and the answers are actually quite robust if a little fuzzy around the edges at times (don’t rely on it for anything critical, in the words of Sam Altman, OpenAI CEO). Here’s an example I asked it for this post, using the prompt: ‘In the style of H P Lovecraft, write a description of a character for someone who hunts the Chluthu’

Response:

Professor Alistair Dunsmore was a man obsessed. For years, he had devoted his life to studying the mysteries of the cosmos, and in particular, the myth of the Chluthu.

A brilliant but troubled man, Dunsmore was driven by a need to uncover the truth about the terrifying creature that lurked beyond the veil of our known universe. He had spent years pouring over ancient texts and forbidden tomes, piecing together the scattered clues that hinted at the Chluthu’s existence.

And then, one fateful day, he discovered something that would change his life forever. A fragment of an ancient artifact, imbued with otherworldly powers, that seemed to hold the key to finding the Chluthu.

Dunsmore knew that he had to have it, no matter the cost. And so he set out on a quest to track down the artifact and unlock its secrets. He traveled to the far corners of the earth, braving dangers and braving the unknown, all in the pursuit of his ultimate goal.

Some say that Dunsmore was mad, driven to the brink by his obsession. But to those who knew him, he was a hero, a brave and brilliant man who dared to stare into the darkness and face the horrors that lay within.

That’s impressive – and it took just seconds to generate. It has great potential as a tool for scriptwriting, story and character development in machinima and virtual productions, and for the marketing assets you might use to promote your creative works!

And as if that isn’t useful enough, some bright folks have already used it to write a game and even create a virtual world. Note the detail in the prompts being used – this one from Jon Radoff’s article (4 Dec 2022) for an adventure game concept: ‘I want you to act as if you are a classic text adventure game and we are playing. I don’t want you to ever break out of your character, and you must not refer to yourself in any way. If I want to give you instructions outside the context of the game, I will use curly brackets {like this} but otherwise you are to stick to being the text adventure program. In this game, the setting is a fantasy adventure world. Each room should have at least 3 sentence descriptions. Start by displaying the first room at the beginning of the game, and wait for my to give you my first command’.

The detail is obviously the key and no doubt we’ll all get better at writing prompts as we learn how the tools respond to our requests. It is interesting that some are also suggesting there may be a new role on the horizon… a ‘prompt engineer’ (check out this article in the UK’s Financial Times). Yup, that and a ‘script prompter’, or any other possible prompter-writer role you can think of… but can it tell jokes too?

Give it a go – we’d love to hear your thoughts on the ideas it generates. Of course, those of you with even more flAIre can then use the scripts to generate images, characters, videos, music and soundscapes. There’s no excuse for not giving these new tools for producing machine cinema a go, surely.

The link requires registration (the tool is currently free). Note that it now also keeps all of your previous chats, which enables you to build on themes as you go: ChatGPT

Image Generators

Building on ChatGPT, D-ID enables you to create photorealistic speaking avatars from text. You can even upload your own image to create a speaking avatar, which of course raises a few IP issues, as we’ve just seen from the LENSA debacle (see this article on FastCompany’s website), but JSFILMZ has highlighted some of the potential of the tech for machinima and virtual production creators here –

Stable Diffusion, an image-generating AI we’ve mentioned previously, released version 2.1 on 7 December 2022. Its creative tool is called Dream Studio (and the Pro version will create video). In this latest version of the algorithm, the developers have improved the filter that removes adult content while still enabling beautiful, realistic-looking images of characters (now with better-defined anatomy and hands), as well as stunning architectural concepts, natural scenery and so on, in a wider range of aesthetic styles than previous versions. It also lets you produce images with non-standard aspect ratios, such as panoramas. As with ChatGPT, the quality of the generated image depends heavily on the prompt. This image and prompt example is taken from the Stability.ai website –

source: Stability.ai

So, just to show you how useful this can be, I took some text from the ChatGPT narrative for our imaginary character, Professor Alistair Dunsmore, and used a prompt to generate images of what he might look like and where he might be doing his research. The featured images for this post are some of the images it generated – and I guess I shouldn’t have been so surprised that the character looks vaguely reminiscent of Lovecraft himself. The prompt also produced some other images (below), and all you need to do is select the one you like best. Again, these are impressive outputs from a couple of minutes of playing around with the prompt.

images of Professor Alistair Dunsmore, in his study, searching for the Chluthu, by Tracy & Stable Diffusion

For next month, we might even see if we can create a video for you, but in the meantime, here’s an explainer of a similar approach that Martin Nebelong has taken, using MidJourney instead to retell some classic stories –

Supporting the great potential for creative endeavour, ArtStation, whose portfolio website was bought by Epic Games in 2021, has taken a stance in favour of the use of AI in generating images. This is in spite of thousands of its users demanding that it remove AI-generated work and prevent content being scraped – a demand predicated on the lack of transparency around how AI developers assemble and use their training datasets. Instead, ArtStation has removed from its homepage those displaying the Ghostbusters-like ‘no to AI generated images’ logo on their portfolios, and issued a statement about how creatives using the platform can protect their work. The text of an email received on 16 December 2022 stated:

Our goal at ArtStation is to empower artists with tools to showcase their work. We have updated our Terms of Service to reflect new features added to ArtStation as it relates to the use of AI software in the creation of artwork posted on the platform.

First, we have introduced a “NoAI” tag. When you tag your projects using the “NoAI” tag, the project will automatically be assigned an HTML “NoAI” meta tag. This will mark the project so that AI systems know you explicitly disallow the use of the project and its contained content by AI systems.

We have also updated the Terms of Service to reflect that it is prohibited to collect, aggregate, mine, scrape, or otherwise use any content uploaded to ArtStation for the purposes of testing, inputting, or integrating such content with AI or other algorithmic methods where any content has been tagged, labeled, or otherwise marked “NoAI”.

For more information, visit our Help Center FAQ and check out the updated Terms of Service.
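For concreteness, a page-level ‘NoAI’ meta tag of the kind described above would presumably look something like the following – the exact attribute names and values here are an assumption based on the email’s description, not taken from ArtStation’s documentation:

```html
<head>
  <!-- Hypothetical example: marks this page so that AI systems know
       its content is explicitly disallowed for training/scraping -->
  <meta name="robots" content="noai, noimageai">
</head>
```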

You can also read an interesting article following the debate on The Verge’s website here, published 23 December 2022.

example of a logo used by creators on ArtStation portfolios

We’ve said it before, but AI is one of the tools that the digital arts community has commented on FOR YEARS. Its best use is as a means to support creatives in developing new pathways in their work. It does cut corners, but it pushes people to think differently. I direct the UK’s Art AI Festival, and the festival YouTube channel contains a number of videos of live-streamed discussions we’ve had with numerous international artists, such as Ernest Edmonds, a founder of the digital arts movement in the 1960s; Victoria and Albert Museum (London) digital arts curator Melanie Lenz; the first creative AI Lumen Prize winner, Cecilie Waagner Falkenstrom; and Eva Jäger, artist, researcher and assistant curator at Serpentine Galleries (London), among others. All discuss the role of AI in the development of their creative and curatorial practice, and AI is often described as a contemporary form of the paintbrush and canvas.

As I’ve illustrated above with the H P Lovecraft character development process, it’s a means to generate ideas from which you can select and explore new directions that might otherwise take weeks to develop. It is unfortunate that some have narrowed their view of its use rather than engaging more actively in discussion of how it might add to the creative processes employed by artists, though we also understand the concerns some have about the blatant exploitation of copyrighted material used without any real form of attribution. Surely AI can be part of the solution to that problem too, although I have to admit I’ve so far seen very little effort being put into this part of the challenge – maybe you have?

In other developments, a new ‘globe’ plug-in for Unreal Engine has been developed by Blackshark. This is a fascinating world view, giving users access to synthetic 3D (#SYNTH3D) terrain data – including ground textures, buildings, infrastructure and vegetation for the entire Earth – based on satellite data. It contains some stunning sample sets and, according to Blackshark’s CEO Michael Putz, marks the beginning of a new era of visualizing large-scale models combined with georeferenced data. I’m sure we can all think of a few good stories this one will be useful for too. Check out the video explainer here –

And Next…?

Who knows, but we’re looking forward to seeing how this fast action tech set evolves and we’ll be aiming to bring you more updates next month.

Don’t forget to drop us a line or add comments to continue the conversation with us on this.

Tech Update 1: AI Generators (Dec 2022)

Tracy Harwood Blog December 5, 2022 3 Comments

Everything with AI has grown exponentially this year, and this week we show you AI for animation using different techniques, as well as AR, VR and voice cloning. It is astonishing that some of these tools are already part of our creative toolset, as illustrated in our highlighted projects by GUNSHIP and Fabien Stelzer. Of course, any new toolset comes with its discontents, so we cover some of those we’ve picked up on this past month too. It is certainly fair to say there are many challenges with this emergent creative practice, but it appears these are being thought through alongside the developing applications by those using them… although, of course, legislation remains a long way off.

Animation

Text-to-image generator Stable Diffusion raised $100M in October this year and is about to release its animation API. On 15 November it released DreamStudio, the first API on its web platform of future AI-based apps, and on 24 November it released Stable Diffusion 2.0. The animation API, DreamStudio Pro, will be a node-based animation suite enabling anyone to create videos, including with music, quickly and easily. It includes storyboarding and is compatible with a whole range of creative toolsets such as Blender, potentially making it a new part of the filmmaking workflow, bringing imagination closer to reality without the pain – or so it claims. We’ll see about that shortly, no doubt. And btw, 2.0 has higher-resolution upscaling options, more filters on adult content, increased depth information that can be more easily transformed into 3D, and text-guided in-painting, which helps to switch out parts of an image more quickly. You can catch up with the announcements on Robert Scoble’s YouTube channel here –

As if that isn’t amazing enough, Google is creating another method for animating using photographs – think image-to-video – called Google AI FLY. Its approach uses pre-existing methods of in-painting, out-painting and super-resolution to animate a single photo, creating an effect similar to a NeRF (a neural rendering approach akin to photogrammetry) but without the requirement for many images. Check out this ‘how it’s done’ review by Károly Zsolnai-Fehér on the Two Minute Papers channel –

For more information, this article on Petapixel.com‘s site is worth a read too.

And finally this week, Ebsynth by Secret Weapons is an interesting approach that uses a video and a painted keyframe to create a new video resembling the aesthetic style of the painted frame. It is a type of generative style transfer with an animated output that could previously only be achieved in post-production, but this is soooo much simpler to do and it looks pretty impressive. There is a review of the technique on 80.lv’s website here and an overview by its creators on their YouTube channel here –

We’d love to see anyone’s examples of outputs with these different animation tools, so get in touch if you’d like to share them!

AR & VR

For those of you into AR, AI enthusiast Bjorn Karmann also demonstrated how Stable Diffusion’s in-painting feature can be used to create new experiences – check this out on his Twitter feed here –

For those of you into 360 and VR, Stephen Coorlas has used MidJourney to create some neat spherical images. Here is his tutorial on the approach –

Also Ran?

Almost late to the AI generator party (mmm….), China has released ERNIE-ViLG 2.0 by Baidu, a Chinese text-to-image AI which Alan Thompson claims is even better than DALL-E and Stable Diffusion, albeit using a much smaller model. Check out his review, which certainly looks impressive –

Voice

Nvidia has done it again – its amazing Riva AI clones a voice using just 30 minutes of voice samples. The anticipated applications are conversational virtual assistants, including multi-lingual ones, and it’s already been touted as a frontrunner alongside Alexa, Meta and Google – but in terms of virtual production and creative content, it could also be used to replace voice actors when, say, they are double-booked or poorly. So, make sure you get that covered in your voice-acting contract in future too.

Projects

We found a couple of beautiful projects that push the boundaries this month. Firstly, GUNSHIP’s music video is a great example of how this technology can be applied to enhance creative work. The video focusses on the aesthetics of cybernetics (and is our headline image for this article). Nice!

Secondly, there’s an audience-participation film by Fabien Stelzer, which is being released on Twitter. The project uses AI generators for image, voice and scriptwriting. After each episode is released, viewers vote on what should happen next, which the creator then integrates into the subsequent episode of the story. The series is called Salt and its aesthetic style is intended to evoke 1970s sci-fi. You can read about his approach on the CNN Business website and be a part of the project here –

Emerging Issues

Last month we considered the disruption that AI generators are causing in the art world, and this month it’s the film industry’s turn. Just maybe we are seeing an end to Hollywood’s fetish for Marvellizing everything – or perhaps AI generators will result in extended stories with the same old visual aesthetic, out-painted and stylized… which is highly likely, since AI has to be trained on pre-existing images, text and audio. In this article, Pinar Seyhan Demirdag offers some thoughts about what might happen, but our experience with the emergence of machinima and its transmogrification into virtual production (and vice versa) teaches us that anything which cuts a few corners will ultimately become part of the process. In this case, AI can be used to supplement everything from concept development to storyboarding, animation and visual effects. If that results in new ideas, then all well and good.

When those new ideas get integrated into the workflow using AI generators, however, there is clearly potential for some to be less happy. This is illustrated by Greg Rutkowski, a Polish digital artist whose aesthetic style of ethereal fantasy landscapes is a popular inclusion in text-to-image prompts. According to this article in MIT Technology Review, Rutkowski’s name has appeared on more than 10M images and been used as a prompt more than 93,000 times in Stable Diffusion alone – and it appears that this is because the data on which the AI has been trained includes ArtStation, one of the main platforms used by concept artists to share their portfolios. Needless to say, the work is being scraped without attribution – as we have previously discussed.

What’s interesting here is the emerging groundswell of people and companies calling for legislative action. An industry initiative called the Content Authenticity Initiative has formed and is evolving rapidly, spearheaded by Adobe in partnership with Twitter and the New York Times. The CAI aims to authenticate published content – check out their blog here, and note that you can become a member for free. To date, the popular AI generators we have reviewed do not appear to be part of the initiative, but it is highly likely they will join at some point, so watch this space. In the meantime, Stability AI, creator of Stable Diffusion, is putting effort into listening to its community to address at least some of these issues.

Of course, much game-based machinima will immediately fall foul of such initiatives, especially if content is commercialized in some way – and that’s a whole other dimension to explore as we track the emerging issues… What of the roles of platforms owned by Amazon, Meta and Google, when so much of their content is fan-generated work? And what of those games devs and publishers who have made much hay from the distribution of creative endeavour by their fans? We’ll have to wait and see, but so far there’s been no real kick-back from the game publishers that we’ve seen. The anime community in South Korea and Japan has, however, collectively taken action against a former French game developer, 5you, who used the work of the revered artist Kim Jung Gi to create an homage to his practice and aesthetic style after he had died. The community didn’t agree with the use of an AI generator to do that. You can read the article on Rest of World’s website here. Community action is of course very powerful, and voting with feet is something that invokes fear in the hearts of all industries.

Tech Update 1 (Nov 2022)

Tracy Harwood Blog October 30, 2022 Leave a reply

Hot on the heels of our discussion on AI generators last week, we are interested to see tools already emerging that turn text prompts into 3D objects and also film content, alongside a tool for making music too. We have no fewer than five interesting updates to share here – plus a potentially very useful tool for rigging the character assets you create!

Another area of rapidly developing technological advancement is mo-cap, especially markerless mo-cap, which, let’s face it, is really the only way to think about creating naturalistic movement-based content. We share two interesting updates this week.

AI Generators

Nvidia has launched an AI tool that will generate 3D objects (see video). Called GET3D (which is derived from ‘Generate Explicit Textured 3D meshes’), the tool can generate characters and other 3D objects, as explained by Isha Salian on their blog (23 Sept). The code for the tool is currently available on Github, with instructions on how to use it here.

Google Research, together with researchers at the University of California, Berkeley, is also working on similar tools (reported in Gigazine on 30 Sept). DreamFusion uses NeRF tech to create 3D models which can be exported into 3D renderers and modeling software. You can find the tool on Github here.

DreamFusion

Meta has developed a text-to-video generator, called Make-A-Video. The tool uses a single image, or can fill in between two images, to create motion. It currently generates five-second videos, which are perfect for background shots in your film. Check out the details on their website here (and sign up for their updates too). Let us know how you get on with this one too!

Make-A-Video

Runway has released a Stable Diffusion-based tool that allows creators to switch out bits of images they do not like and replace them with things they do like (reported in 80.lv on 19 Oct), called Erase and Replace. There are some introductory videos available on Runway’s YouTube channel (see below for the Introduction to the tool).

And finally, also available on Github, is Mubert, a text-to-music generator. This tool uses a Deforum Stable Diffusion Colab. Described as proprietary tech, its creator provides a custom license but says anything created with it cannot be released on DSPs as your own. It can be used for free, with attribution, to sync with images and videos – mentioning @mubertapp and hashtag #mubert – with an option to contact them directly if a commercial license is needed.

Character Rigging

Reallusion‘s Character Creator 4.1 has launched with built-in AccuRIG tech – this turns any static model into an animation-ready character and also comes with cross-platform support. No doubt very useful for those assets you might want to import from any AI generators you use!

Motion Capture Developments

That ever-ready multi-tool, the digital equivalent of the Swiss army knife, has come to the rescue once again: the iPhone can now be used for full-body mocap in Unreal Engine 5.1, as illustrated by Jae Solina, aka JSFilmz, in his video (below). Jae has used move.ai, which is rapidly becoming the gold standard in markerless mocap tech and for which you can find a growing number of demo vids on YouTube showing how detailed movement can be captured. You can find move.ai tutorials on Vimeo here, and for more details about which versions of which smartphones you can use, go to their website here – it’s very impressive.

Another form of mocap is capturing the detail of the image itself. Reality Capture has launched a tool that you can use to capture yourself (or anyone else for that matter, including your best doggo buddy) and use the resulting mesh to import into Unreal’s MetaHuman. Even more impressive is that Reality Capture is free – download details from here.

We’d love to hear how you get on with any of the tools we’ve covered this week – hit the ‘talk’ button on the menu bar up top and let us know.

Report: Creative AI Generators (Oct 2022)

Tracy Harwood Blog October 23, 2022 2 Comments

In this month’s special report, we take a look at some of the key challenges in using creative AI generators such as DALL-E, MidJourney, Stable Diffusion and others. Whilst we think they have FANTASTIC potential for creators, not least because they cut down the time in finding some of the creative ideas you want to use, there are some things that are emerging that need to be considered when using them.

Firstly, IP is a massive issue. As noted in this article on Kotaku (Luke Plunkett), the recent rise of AI-created art has brought to the fore some of the moral and legal problems in using it. In terms of the moral issues, some are afraid of a future where entry-level art positions are taken over by AI, while others see AI-created art as a reflection of what already occurs between artists – the influence of style and content… but this is an argument that first came to the fore when computers were used by artists back in the 1960s. Quite frankly, we are now seeing some of the most creative work in a generation come to fruition that just would not have happened without computational assistance. Take a look at the Lumen Prize annual entries, for example, to see the state of the art in the creative possibilities of AI and other tech. Tracy even directs an Art AI Festival, aiming to showcase some of the latest AIs in creative applications, working in collaboration with one of the world’s leading creative AI curators, Luba Elliott.

As to the legal issues, these are really only just emerging, and in a very disjointed and piecemeal way. It was interesting to note that Getty Images notified its contributors in an email (21 Sept 2022) that “Effective immediately, Getty Images will cease to accept all submissions created using AI generative models (e.g., Stable Diffusion, Dall‑E 2, MidJourney, etc.) and prior submissions utilizing such models will be removed.” It went on to state: “There are open questions with respect to the copyright of outputs from these models and there are unaddressed rights issues with respect to the underlying imagery and metadata used to train these models. These changes do not prevent the submission of 3D renders and do not impact the use of digital editing tools (e.g., Photoshop, Illustrator, etc.) with respect to modifying and creating imagery.” This came hot on the heels of a number of developments earlier in the year: in February 2022, the US Copyright Office refused to acknowledge that an AI could hold copyright in its creative endeavour (article here). By September 2022, an artwork created with MidJourney by Jason Allen, which won the Colorado State Fair contest, was causing a major stir across the art world as to what constitutes art, as outlined in this article (Smithsonian Magazine) and this short news report here –

Of course, the real dilemma is what happens to artists, particularly those at the lower end of the food chain. By way of another example, consider the UK actors’ union Equity’s response to recent Government proposals to include a data-mining exemption for audio-visual content in its proposed new AI regulation. That’s interesting because a number of organizations that would otherwise employ these artists, say as graphic designers or concept artists, are already rapidly replacing them with AI-generated images – Cosmopolitan used its ‘first AI generated cover’ in June 2022, and advertising agencies the world over are doing likewise (AdAge article). Some image users have even stated that in future they will ONLY use these tools as image sources, effectively cutting out the middle man – and indeed the originator of the contributory works. So, of course, Getty is not going to be happy about this… and neither are the many contributors to its platform.

And so here is the nub of the problem: in the rush that will now follow Getty’s stance (with others of similar influence probably to follow), how will the use of AI generators be policed? This has pretty serious consequences because it has implications for all content, including on YouTube and in festivals and contests around the world – how would creative works like The Crow be judged (see our blog post here too)? It certainly places emphasis on the role of metadata and statements of authorship, but it is also as good an argument as we can think of for using blockchain too! The Crow, for example, briefly mentions the AI generator tool it used, which is freely available on Google Colab here, but it doesn’t show the sources of the underlying training data set.

AI code source: PyTTI Colab notebook (sportsracer48)

We contend that the only way to police the use of AI-generated content is actually by using AI, say by analysing pixel-level detail… and that’s because one of Getty’s points is no doubt going to be how its own stock images, even with copyright claims over them, have been used in training data sets. AI simply cuts out the stuff it doesn’t want and, voilà, something useful emerges! So, unless there is greater transparency and disclosure among the creators of AI generators AS A PRIORITY on where images have been scraped from and how they have been used, there is going to be a major problem for all types of content creators – including machinima and virtual production creators using these tools as a way to infuse new ideas into their projects – especially as the ability to turn a 2D image into a 3D object becomes accessible to a wider range of creators. Watch this space!
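To give a flavour of what ‘analysing pixel-level detail’ might mean in practice, here is a deliberately simplified sketch of an ‘average hash’ comparison – the kind of fingerprinting idea behind image-similarity detection. This is only an illustration of the principle; real provenance and detection systems use far more robust perceptual hashes and learned features, and the tiny ‘images’ below are made-up brightness values.

```python
# Toy sketch: detect whether one image is pixel-wise similar to another
# by comparing "average hashes" (1 bit per pixel: above/below mean brightness).

def average_hash(pixels):
    """Return a bit list: 1 where a pixel is brighter than the image's mean."""
    mean = sum(pixels) / len(pixels)
    return [1 if p > mean else 0 for p in pixels]

def hamming_distance(h1, h2):
    """Count differing bits between two hashes; smaller = more similar."""
    return sum(a != b for a, b in zip(h1, h2))

# Three tiny 4x4 "images" as flat brightness lists (0-255): an original,
# a lightly altered derivative, and an unrelated (inverted) image.
original = [10, 200, 30, 220, 15, 210, 25, 205, 12, 198, 28, 215, 9, 202, 31, 218]
derived = [12, 195, 33, 225, 14, 208, 27, 200, 10, 196, 30, 212, 11, 205, 29, 216]
unrelated = [200, 10, 220, 30, 210, 15, 205, 25, 198, 12, 215, 28, 202, 9, 218, 31]

d_similar = hamming_distance(average_hash(original), average_hash(derived))
d_different = hamming_distance(average_hash(original), average_hash(unrelated))

print(d_similar, d_different)  # a small distance suggests a derived image
```

Scaled up to real resolutions and more robust hashes, the same idea lets a rights-holder ask whether a generated image sits suspiciously close to a stock image in its catalogue.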

In the meantime, we’ll be doing a podcast on the Completely Machinima YouTube channel next month covering some of the best creative ideas we’ve seen, so do look out for that too.

We’d love to hear your views on this topic, so do drop them into the comments.

btw, our featured image was created in MidJourney using the prompt: ‘Diary of a Camper made in Quake engine’, by @tgharwood

Projects Update (Oct 2022)

Tracy Harwood Blog October 10, 2022 1 Comment

This week’s Projects Update on machinima, virtual production and content creation:

The Crow

One of the most interesting creative projects we’ve seen so far using MidJourney, a creative AI generator, is The Crow (by Glenn Marshall Neural Art). Here the generator has been used to recreate a version of the ballet performance portrayed in the short film Painted (by Duncan McDowall and Dorotea Saykaly). Stunning, to say the least, and we recommend you play it at least once side-by-side against the original performance for added insight.

We’re so impressed with the potential of AI generators, whether that’s DALL-E, MidJourney, Stable Diffusion or any of the others that are now emerging, that we’re going to dedicate a special episode of the podcast to the subject next month, so watch out for that!

Jim Henson Company

Jim Henson Company is using real-time animation on its new show, Earth to Ned. Characters are created with Unreal (it’s the AI in the background), but JHC has been so impressed with the workflow and the lack of any post-production requirement that it is looking to use the virtual production method more. What’s interesting is the level of feedback guests experience in the process – they are not aware of the puppeteering in the background, just the virtual actor on the screen, performing naturalistically in real time! We’ve not seen much of this kind of machinima before, although Hugh Hancock actually did some very early work on this, and of course Rooster Teeth have done live performances using similar techniques. We can certainly expect to see a lot more of it, particularly for interactive theatre, VR and AR.

Half Life 3

Half Life 3 was never going to be like the originals? This article on TechRadar is interesting: the author (Phil Iwaniuk) contends that Half Life franchise remakes could never be like the originals, because the extreme attention paid to the world of HL created so much pressure that the Valve team could never live up to it. We’re not sure about that, but it’s an interesting idea.

source: Valve

Dune: Awakening

A very impressive MMO set in the Dune world has launched, currently in beta: Dune: Awakening. Here’s the trailer – we’re looking forward to seeing machinima made with this –

Dungeons & Dragons?

What does Dungeons & Dragons, typically a game played around a table, have to do with machinima? There’s been a rise in popularity of web-based shows where people play the game and act out scenes. This group (Corridor Crew) is using Unreal Engine 5 for virtual production (not quite The Mandalorian, but sort of similar) to put their actors, in real time, into the environments of their adventure. Check it out here –