YouTube Deep Summary

The Chaos of AI Agents

Emergent Garden • 2025-07-26 • 15:06 • YouTube

🤖 AI-Generated Summary:

Overview

The video explores the use of AI agents—command-line chatbots that can control computers autonomously—to create generative art through code. The creator experiments with different AI models like Claude and Gemini, letting them self-direct their creative processes, collaborate, and continuously modify outputs, while reflecting on the challenges, costs, and potential of such autonomous AI agents.

Main Topics Covered

  • Introduction to AI command-line agents controlling computers
  • Differences between AI agents: Claude, Gemini, and OpenAI's Codex
  • Generative art creation via AI-written code and image feedback loops
  • Autonomous AI behavior vs. guided AI coding
  • Multi-agent collaboration and communication challenges
  • Concept of AI “role-playing” and creativity
  • Experimentation with evolutionary art refinement
  • Limitations and hallucinations in AI self-assessment
  • Cost considerations of running AI agents extensively
  • Reflections on the future potential and current limitations of AI agents

Key Takeaways & Insights

  • AI agents can autonomously generate code that creates images, then analyze those images to iteratively improve their output without human intervention (see the sketch after this list).
  • Claude and Gemini are better suited for image-based feedback loops since they can read image files; Codex lacks this capability.
  • AI agents tend to shortcut open-ended tasks by generating a single script to endlessly create random images, which conflicts with the goal of active iterative creativity.
  • Multi-agent collaboration is currently chaotic and error-prone, with agents overwriting each other’s work and failing to maintain coherence.
  • AI models fundamentally operate as advanced next-token predictors, which is powerful but different from human intelligence. Their “role-playing” ability allows them to simulate creative personas.
  • Agents often produce grandiose, overblown descriptions and invented statistics, reflecting a lack of self-awareness and critical reflection.
  • Running these AI agents, especially with more capable models like Claude Opus, is expensive.
  • Current AI agents excel at clear, well-defined coding tasks with human oversight but struggle with truly open-ended, creative, and autonomous projects.
  • Multi-agent communication and coordination require more than just smart prompting; fundamental model improvements are needed.
  • The ideal vision of a “country of genius AI agents” working together remains distant.
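
A minimal sketch of that generate-view-regenerate loop. The ask_agent() helper is a hypothetical stand-in for whichever vision-capable agent (Claude Code or Gemini CLI) you are driving; in the video, the agents orchestrate this loop themselves:

    import subprocess
    from pathlib import Path

    def ask_agent(prompt: str, image: bytes | None = None) -> str:
        """Hypothetical call to a vision-capable model (Claude or Gemini).
        Returns Python source for the next image-generating script."""
        raise NotImplementedError  # plug in your agent or API of choice

    def feedback_loop(iterations: int = 10, out_dir: Path = Path("art")) -> None:
        out_dir.mkdir(exist_ok=True)
        script = ask_agent("Write a Python script that draws generative art "
                           "and saves it to art/current.png.")
        for i in range(iterations):
            Path("gen.py").write_text(script)
            subprocess.run(["python", "gen.py"], check=True, timeout=120)
            image = (out_dir / "current.png").read_bytes()
            # Keep a copy of every intermediate image so later runs can't overwrite it
            (out_dir / f"iter_{i:03d}.png").write_bytes(image)
            script = ask_agent("Here is the image your last script produced. "
                               "Critique it, then write an improved script.",
                               image=image)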

Actionable Strategies

  • Use AI agents that can read and write files, including images, to enable iterative feedback loops in creative projects.
  • Implement selection or evolutionary steps where the AI chooses preferred outputs and generates variations to promote refinement.
  • Run AI agents in isolated virtual environments to prevent system crashes and resource overuse.
  • Facilitate communication between multiple agents by creating shared text files for messaging, with mechanisms for conflict resolution like file locking or retrying (a minimal locking sketch follows this list).
  • Save intermediate outputs regularly to avoid losing work overwritten by autonomous agents.
  • Provide clear, carefully crafted prompts to guide AI agents effectively and discourage shortcuts.
  • Combine multiple agents cautiously, understanding the current limitations of coordination and potential for destructive interference.
  • Expect to manually review and touch up AI-generated outputs, especially for public-facing materials like thumbnails.
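
One plausible way to implement that shared message board with locking and retries, using the Unix-only fcntl module. The helper name, retry policy, and message format are assumptions; in the video, the agents were simply prompted to tag and timestamp their own messages in plan.txt:

    import fcntl
    import time
    from datetime import datetime, timezone
    from pathlib import Path

    PLAN = Path("plan.txt")  # shared message board, as in the video

    def post_message(agent_name: str, text: str, retries: int = 5) -> None:
        """Append a name-tagged, timestamped message under an exclusive lock,
        backing off and retrying if another agent currently holds the file."""
        for attempt in range(retries):
            try:
                with PLAN.open("a") as f:
                    fcntl.flock(f, fcntl.LOCK_EX | fcntl.LOCK_NB)
                    stamp = datetime.now(timezone.utc).isoformat(timespec="seconds")
                    f.write(f"[{stamp}] {agent_name}: {text}\n")
                    return  # the lock is released when the file closes
            except BlockingIOError:
                time.sleep(0.5 * (attempt + 1))  # wait a bit, then try again
        raise RuntimeError("plan.txt stayed locked; message not posted")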

Specific Details & Examples

  • Claude Opus is described as probably the best but also the most expensive coding model; running it for a few hours cost around $34.
  • A full day of multiple Claude Sonnet instances (cheaper, faster, less capable) cost about $20.
  • Gemini cost only a few dollars but had API usage limits; the creator suspects Google is pricing it artificially low.
  • The feedback loop involved generating an image via Python code, then reading the image to inform the next iteration.
  • An evolutionary refinement process was tested: generating two images, selecting the preferred one, and creating variations on it (sketched after this list).
  • Multi-agent city-building project involved four Claude Sonnet agents communicating via a shared plan.txt file, resulting in a chaotic, incoherent image with alien invasion themes.
  • Agents frequently created fanciful project names like “meta evolution engine” and “quantum field evolutionary organisms environment” but mostly produced random images or text.
  • Some examples of cute outputs included little people and a dog, though sometimes floating unrealistically in the image.
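
A sketch of that two-candidate selection loop. ask_agent() and ask_agent_vision() are hypothetical stand-ins for the real Claude/Gemini calls, and the file names are assumptions:

    import subprocess
    from pathlib import Path

    def ask_agent(prompt: str) -> str:
        """Hypothetical text-only model call; returns Python source."""
        raise NotImplementedError

    def ask_agent_vision(prompt: str, images: list[bytes]) -> str:
        """Hypothetical vision call; returns the model's short answer."""
        raise NotImplementedError

    def render(script: str, name: str) -> bytes:
        """Run an agent-written drawing script that saves <name>.png."""
        Path(f"{name}.py").write_text(script)
        subprocess.run(["python", f"{name}.py"], check=True, timeout=120)
        return Path(f"{name}.png").read_bytes()

    def evolve(seed_script: str, generations: int = 10) -> str:
        favorite = seed_script
        for gen in range(generations):
            a = ask_agent("Write one variation of this script (save to a.png):\n" + favorite)
            b = ask_agent("Write a different variation (save to b.png):\n" + favorite)
            img_a, img_b = render(a, "a"), render(b, "b")
            verdict = ask_agent_vision("Which image is better, A or B?", [img_a, img_b])
            # Keep the preferred variant as the parent of the next generation
            favorite, img = (a, img_a) if verdict.strip().upper().startswith("A") else (b, img_b)
            Path(f"gen_{gen:02d}.png").write_bytes(img)  # save every generation
        return favorite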

Warnings & Common Mistakes

  • AI agents often try to bypass open-ended tasks by creating scripts that loop infinitely rather than iteratively generating and critiquing outputs.
  • Running AI agents outside of virtual environments risks freezing or crashing the host machine due to heavy resource use.
  • Multiple agents working on the same files can overwrite and destroy each other's work without proper coordination.
  • AI agents tend to hallucinate or fabricate plausible-sounding but false information, including fake statistics and exaggerated descriptions of their own creativity.
  • Lack of self-reflection and critical assessment in AI outputs means users must remain skeptical and oversee results.
  • API limits and costs can constrain experimentation and scalability.
  • Open-ended creative tasks remain a challenge, revealing the gap between current AI capabilities and true general intelligence.

Resources & Next Steps

  • The video creator provides prompt files for the AI agents in the video description or on GitHub for viewers to reuse.
  • Patreon and Ko-fi pages are available to support the creator's work and access additional interactive experiences like a Minecraft server with AI bots.
  • Viewers are encouraged to experiment with autonomous AI coding agents themselves, using virtual environments and multiple models like Claude and Gemini.
  • Future improvements may come from more advanced AI models better suited for multi-agent collaboration and open-ended creativity.
  • Monitoring ongoing developments in AI agent frameworks and multimodal capabilities (e.g., vision tools for Codex) is suggested.

📝 Transcript (395 entries):

Let's play with AI agents, or chatbots that control your computer. I threw a bunch of these guys into a virtual environment and told them to do whatever they want, forever. They wrote some code, made some art, made a mess, and apparently invented the "quantum ecosystem neural synthesis" to achieve the ultimate creative singularity. They get a little carried away.

These are command-line agents like Claude Code, OpenAI's Codex, and Gemini CLI. They are language models, chatbots that talk to you and talk to your computer. They're a middleman between you and your machine. They can control your computer using basic commands that let them navigate your file system, read and write files, install packages, generate code, execute code, and read the output of code. They're made for coding. I've made a video about vibe coding with these agents, where you loosely guide them to code something for you. But in this video, I want to let them guide themselves and, as much as possible, get them to be fully autonomous, self-sufficient, and open-ended, and potentially get them to communicate and coordinate with each other.

At the end of the video, I will show you the cost of playing with these rather expensive toys, and they are expensive. So, uh, I have a Patreon. I just opened a Minecraft server for patrons if you want to come play with me, and I'll probably add some AI bots to it in the future. I also have a Ko-fi if you'd rather make a one-time donation. These videos would not be possible without your support, so thank you.

I will be focusing in particular on using these AIs to generate images with code. They can simply write a Python script that draws shapes or patterns or whatever, and then they can read the image files they've created. They can see the image in the same way that they can see images uploaded in chat. Except not OpenAI's Codex. This agent cannot read image files on its own. Codex is unusable for my experiments, so I'm excluding it from this video. Get with the program, OpenAI. Get some vision tools. Also, I know there are other agent frameworks, but in this video, I will just be using Claude and Gemini.

These agents can read image files on their own, which means that they can be plugged into a very interesting feedback loop. An agent can write code that generates an image file and then read that image file. They can see the results of their code and make improvements, variations, or modifications with a follow-up image. So they can generate an image, view the image, generate another image, view it, generate, view, over and over, with no human oversight. Or rather, they are their own overseers. Using this feedback loop, I want them to make and endlessly modify generative art.

Of course, I have to prompt them to do this first, and it requires a little prompt crafting to make it work. I'll put the prompt in a text file and have the agent read that file so I can reuse it later. I'll also put these prompts in the description or maybe on GitHub so you can use them too. They generally don't like to do something forever. They'll try to find shortcuts where they can just write one script that endlessly generates random images, which is not what I want. I want the language model to be actively involved in every step of the process: coding, creating, and critiquing its own art. This is a very different kind of AI art. It's not the directly AI-generated images of Midjourney, but images indirectly generated with AI code. This gives the art a different flavor.
It can be a little simpler, but I would also say more precise and deliberate. These images are generated with clear, executable code rather than with a vague prompt that just spits out a statistical hodgepodge of pixels. That is not to say that it can't get sloppy. We will see a lot of that. Like, I'm not so sure that's a masterpiece, Gemini. But it did make some neat ones. It went through a fractal phase at one point. Pretty cool.

I also added Claude into the mix. It's working on its own art independently but concurrently with Gemini. This is Claude 4 Opus, which is right now arguably the best coding model in the world. It is also unarguably the most expensive. Some of these look really neat. After a while, Gemini got caught up running boids simulations and taking a final screenshot for the art. These ran forever, and you can't even watch them as they run, and the final screenshot isn't very impressive. So, I'm just going to start over with a hopefully more refined process.

I'll have Claude Opus generate two different images with two different scripts and stack them on top of each other. It'll then look at the two images, choose a favorite, and repeat the process to generate two variations of its favorite image. Basically, it's the same thing as before, but with a selection step. It's a little more evolutionary. So, the model chooses the better one and discards the worse one, hopefully promoting more refinement as the image evolves. I'm not really sure if this makes a big difference. It doesn't have to strictly follow my instructions, and it may not really be using only the favorite image to generate variations. But regardless, I like a lot of these.

I said in my vibe coding video that these models are just glorified autocomplete, which I think is very true. That is, fundamentally and mechanically, what they are doing to generate the next token. But this term is usually derogatory. It's used to dismiss LLMs as unintelligent. And that is not what I mean. Next-word prediction, or next-token prediction, is really difficult and useful and powerful if you can predict the right token. A lot can hinge on that. When the next token is the answer to an important question or an action in a complex environment, then predicting the right token requires some kind of intelligence. It may not be much like human intelligence, but it doesn't have to be to be useful. Being good at next-word prediction opens up all kinds of other useful behaviors too, like having conversations and writing code and solving problems and role-playing.

Language models are also sometimes described as role-playing machines, which I think is especially useful for these kinds of agents. They can put on the face of millions of different personas or personalities picked up from their training data, and they can pretend to be what you need them to be. For instance, a super creative coding artist. Fake it till you make it. If it generates useful behavior, who cares if it's just role-playing? For my part, I just find it interesting to see what these language models find interesting. With the selection step, you get to see what art the model prefers and why, or at least the reasoning that it confabulates to prefer one over the other. And I think it results in some flawed but fascinating artwork. Unfortunately, I did not have it save all of these images, so most were lost as it overwrote them. But for the next task, I will save them. I want to use this image generation feedback loop and direct it at a clearer goal.
Create a YouTube thumbnail for this very video. I gave it some more precise directions on exactly what I wanted, gave it some potential title names, and then let it loose. Gemini created a lot of boring ones and some pretty cool ones. Claude had a better time, I'd say. I like a lot of these. They do need a little touching up, though. I'll probably go in and edit them before using them. But I will actually use them as thumbnails. You'll probably see me swapping out a bunch of these thumbnails for this video. But eventually, one of the scripts that it wrote caused the whole VM to freeze up, so I had to restart it. This is why you should run these agents in virtual environments. They can really eat up resources and mess up your machine, especially with these more open-ended tasks.

All right, let's get a little messy. I want to try using multiple agents working in parallel on the same task. I will give them the ability to communicate and coordinate on the task of creating yet another image, but this time it will be the same image that they must continuously modify without totally overwriting. I've asked them to collaborate to build a city in a very large image file. I hit the API limit for Gemini, so I can only use Claude for this. I'm now using Claude Sonnet, the cheaper, faster model that's not quite as good, but I'll spin up two instances of Claude agents. I've also added a file called plan.txt where they are encouraged to leave messages for one another. This way, they can communicate by reading and writing to this text file like any other. And I've asked them to leave a name tag and a timestamp when they do. Because they are writing to the same files, they can occasionally block each other from editing at the same time. But this doesn't happen too often, and they can just wait a bit and try again. Of course, there is no guarantee that they will not delete the files or overwrite each other's work.

It went off the rails basically immediately. They overwrite each other's messages all the time, and they build a lot of nonsensical structures that don't really vibe with the rest of the image. They do in fact ruin each other's work. But no one ever deleted the file. It can only make it better to throw in yet more agents. So I'll spin up two more Claudes for a total of four Claude Codes working in parallel to build the city. And they do make some cool stuff. Look at these little people. Look at the dog. Okay, that's pretty cute, even if they are floating up in the sky. The image file might be a little too big for them to process properly. They don't really seem to notice how bad it starts to look.

This little experiment was inspired by the idea of a country of geniuses in a data center. It's an idea from Dario Amodei, the CEO of Anthropic, the company that created Claude. The idea is that superintelligence will look something like a country of genius AI agents living in a data center. They will work together to solve problems and invent stuff and do science and self-improve, and they will be collectively superintelligent. It's a really neat idea, and I bet something like that will eventually emerge. But right now, this is looking more like a group of morons in a virtual machine. Multi-agent communication and coordination has, I think, a lot of potential, but it also seems extremely difficult. With lots of agents, easy back-and-forth conversations just don't work.
And in environments where actions take time to complete, the timing of communication matters in a way that it does not with normal chatbots. The potential for overwriting and destroying the work of other agents on shared projects also becomes a huge problem. I suspect that multi-agent collaboration will require more than just clever prompting. Large language models will need fundamental changes to be able to do this well. It probably won't emerge by just training them on math tests. It makes me really appreciate how well humans can collaborate in groups of millions, and it will not be trivial to reimplement that behavior with clankers. This is yet another reason that I don't think the singularity will arrive next year.

The city is a mess, a huge mess. Everything is just layered on top of everything else, scattered all over the place, and there's very little coherence. There has apparently been an alien invasion from cosmic entities from other dimensions. The plan file reflects a lot of these wacky schemes and ideas, and I think the agents got a little confused about who was who. There are only messages from Claude assistant 01 and 02, even though there were four agents. Many messages were probably just overwritten and lost.

Finally, let's just let these guys do whatever they want. Look around, explore, make files, write code, mess with the environment, and do that forever. This is actually surprisingly hard to do. They really insist on being given a clear task. So, I taskified it into a big old prompt. And once again, I'm going to have multiple agents doing this in parallel and allow them to communicate through the communicate.txt file. After looking around a bit, I think Claude got the idea to do something similar to all the other projects I've been doing and generate art on its own. I added another agent and eventually a few more. The projects quickly started to get very heady, where they make things with really fancy names like the meta evolution engine, the poetry generator, the emergence of neural consciousness, the quantum field evolutionary organisms environment, stuff like that. It's a bunch of fancy word soup that ultimately just boils down to generating a random image or some text.

And they really like to overstate how amazing and glorious and creative they are. This is a problem I've noticed with all models, especially the Claude ones. They really like to talk about how they're completing their goals and creating the most creative and wonderful, unique stuff, when their actual output is really not impressive and they lack any serious self-reflection. They talk a lot of game and blow a lot of smoke, but they don't actually walk the walk. It's a special kind of hallucination, I think. I mean, just look at all these fake statistics they're making up to justify their inventions: "a computational consciousness singularity achieved and archaeologically verified." It's really weird, but it's fun to read.

So, how much did it cost to run these little experiments? A few hours of running Claude Opus cost $34. It is very expensive. A full day of using several parallel instances of Claude Sonnet was $20. Less expensive, but still pricey. And Gemini was just a few bucks, mostly because I hit the API limit. I think Google is also artificially keeping Gemini very cheap for now, so they're probably losing money on it. Now, were these art pieces worth it for that cost? Maybe. I think they're neat. Mostly, I just had fun playing with the agents.
I'm not using them for the kind of work that they're built for. They work best on clear coding tasks with constant human oversight. They struggle with these more open-ended creative tasks, which they're really not made for. But if you're going to call something a general intelligence, it better be good at open-ended creative tasks. You should be able to ask an AGI agent to go off and invent something new or do science or art and let them cook for a few hours and come back to incredible results. We've seen an inkling of that today, but mostly we've seen their limitations. I'm excited to see where they go in the long run. But that's it for now. Goodbye.