[00:00] (0.08s)
Guys, guess what? The Mindcraft project
[00:02] (2.16s)
has become so popular that it's
[00:03] (3.68s)
attracted the attention of some very
[00:05] (5.28s)
professional people. And we have worked
[00:07] (7.04s)
together to make an official movie. It's
[00:09] (9.68s)
called A Minecraft Movie, starring Jack
[00:11] (11.68s)
Black. I met Jack Black. He said chicken
[00:13] (13.92s)
jockey. That's all he would say. He
[00:16] (16.08s)
wouldn't say anything else. Oh, and also
[00:18] (18.00s)
there's an official paper for Mindcraft
[00:19] (19.92s)
now. Like a real scientific research
[00:22] (22.08s)
paper. It's called Collaborating Action
[00:24] (24.32s)
by Action: A Multi-agent LLM Framework
[00:26] (26.96s)
for Embodied Reasoning. Catchy, I know.
[00:29] (29.36s)
It's scientific. Okay. I am a co-author
[00:31] (31.76s)
on this, as well as my friend Kolby, the
[00:33] (33.60s)
other original developer of Mindcraft,
[00:35] (35.52s)
but most of the work was done by some
[00:37] (37.20s)
fine folks from UCSD, especially Izzy
[00:39] (39.60s)
and Ayush. We are working on getting Jack
[00:41] (41.60s)
Black a credit as well. It's currently
[00:43] (43.68s)
still under peer review, but regardless,
[00:45] (45.60s)
it officially introduces Mindcraft as a
[00:47] (47.92s)
research project that you can cite and
[00:49] (49.68s)
build off of. So, a huge thanks to the
[00:51] (51.68s)
people involved. It was really fun
[00:53] (53.28s)
watching it all come together, and I
[00:54] (54.88s)
wanted to do a quick showcase with
[00:56] (56.64s)
speech bubbles. This is just a simple,
[00:58] (58.64s)
slightly janky speech bubble mod. It
[01:00] (60.56s)
only shows their most recent message,
[01:02] (62.08s)
which is usually only the last part of
[01:03] (63.76s)
what they said, but still looks a lot
[01:05] (65.36s)
better. We've added these things called
[01:07] (67.60s)
tasks that automatically start up a bot
[01:09] (69.76s)
with a predefined inventory and a goal
[01:11] (71.84s)
item. Like here, they're asked to craft
[01:13] (73.44s)
a Netherite block, and it will
[01:15] (75.12s)
constantly check to see if the goal is
[01:16] (76.72s)
completed. So, the moment they acquire
[01:18] (78.40s)
the item, they will shut down or
[01:20] (80.08s)
otherwise eventually time out. The focus
[01:22] (82.64s)
of the paper is on their collaborative
[01:24] (84.40s)
abilities. So many of the tasks are with
[01:26] (86.24s)
two bots that have the ingredients for a
[01:28] (88.32s)
goal item split up among their
[01:29] (89.92s)
inventories and they must communicate to
[01:32] (92.08s)
combine their inventories to craft that
[01:38] (98.04s)
item. We also have these cooking tasks
[01:40] (100.80s)
which are really cool. Ayush made these.
[01:42] (102.80s)
It automatically constructs a little
[01:44] (104.56s)
cooking environment with all the crops
[01:46] (106.32s)
and animals and ingredients that they
[01:48] (108.08s)
need and gives them some goal food items
[01:50] (110.00s)
to cook. If they're clever, they can
[01:52] (112.00s)
split up the work and collect the
[01:53] (113.44s)
ingredients
[01:54] (114.66s)
[Music]
[01:57] (117.72s)
independently. When it resets, it
[01:59] (119.84s)
slaughters all of the animals just like
[02:03] (123.13s)
[Music]
[02:05] (125.08s)
cooking. And there are also
[02:07] (127.04s)
collaborative construction tasks where
[02:09] (129.04s)
they work together to build a structure.
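The task types described so far share one pattern: start with a predefined inventory, name a goal item, keep checking for it, and stop on success or timeout. Here is a minimal Python sketch of that loop. It is an illustration only: the `Task` fields, `run_task`, and the `get_inventory` callback are hypothetical names, not Mindcraft's actual task schema or API.

```python
import time
from dataclasses import dataclass
from typing import Callable, Dict


@dataclass
class Task:
    # Hypothetical fields for illustration; not the real Mindcraft task format.
    goal_item: str
    initial_inventory: Dict[str, int]
    timeout_s: float


def run_task(
    task: Task,
    get_inventory: Callable[[], Dict[str, int]],
    poll_interval_s: float = 1.0,
    clock: Callable[[], float] = time.monotonic,
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Poll the inventory until the goal item appears or the task times out."""
    deadline = clock() + task.timeout_s
    while clock() < deadline:
        # Success the moment the bot holds at least one goal item.
        if get_inventory().get(task.goal_item, 0) > 0:
            return "success"
        sleep(poll_interval_s)
    return "timeout"
```

The `clock` and `sleep` parameters are only there so the loop can be exercised without real waiting; a real runner would query the bot's inventory through whatever interface the project exposes.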
[02:11] (131.20s)
This is very fun to watch even though
[02:12] (132.80s)
they kind of suck. These structures are
[02:15] (135.04s)
predefined. They're not free-form
[02:16] (136.80s)
creative builds. They still can't really
[02:18] (138.56s)
collaborate on those. Izzy calls these
[02:20] (140.64s)
blueprints. They're essentially given a
[02:22] (142.40s)
list of blocks and locations and asked
[02:24] (144.32s)
to place the right blocks in the right
[02:26] (146.08s)
locations, which they can do using their
[02:27] (147.84s)
normal !newAction building. Using
[02:30] (150.08s)
predefined structures means that we can
[02:31] (151.84s)
actually measure how well they've
[02:33] (153.20s)
constructed it. These are really hard
[02:35] (155.60s)
tasks, and some of these models are
[02:37] (157.12s)
pretty dumb, so they usually don't work
[02:38] (158.64s)
together super well, especially with
[02:40] (160.16s)
more than two bots, but they can do
[02:41] (161.84s)
decently. With a very large suite of
[02:44] (164.16s)
tasks, we measured the performance of
[02:45] (165.92s)
different models working together with
[02:47] (167.76s)
clones of themselves and we can compare
[02:49] (169.76s)
their intelligence. This was done before
[02:52] (172.00s)
Gemini 2.5 or Claude 3.7. So, surprise
[02:55] (175.36s)
surprise, Claude 3.5 comes out on top.
[02:58] (178.32s)
We also show that performance drops off
[03:00] (180.32s)
considerably the more agents you add.
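Going back to the blueprints: because a blueprint is a list of blocks and locations, a build can be scored by comparing it to what was actually placed. The sketch below assumes a simple metric, the fraction of blueprint positions holding the right block. That metric, the `blueprint_score` name, and the dictionary layout are assumptions for illustration; the paper may score construction differently.

```python
from typing import Dict, Tuple

Position = Tuple[int, int, int]


def blueprint_score(
    blueprint: Dict[Position, str], world: Dict[Position, str]
) -> float:
    """Fraction of blueprint positions where the placed block matches."""
    if not blueprint:
        return 1.0  # Nothing to build counts as fully complete (assumed).
    correct = sum(
        1 for pos, block in blueprint.items() if world.get(pos) == block
    )
    return correct / len(blueprint)


# Hypothetical four-block blueprint and a partly wrong build.
blueprint = {
    (0, 0, 0): "stone",
    (1, 0, 0): "stone",
    (0, 1, 0): "oak_planks",
    (1, 1, 0): "glass",
}
world = {
    (0, 0, 0): "stone",       # correct
    (1, 0, 0): "dirt",        # wrong block
    (0, 1, 0): "oak_planks",  # correct
}                             # (1, 1, 0) was never placed

print(blueprint_score(blueprint, world))  # 0.5
```

A score like this ignores extra blocks placed outside the blueprint, so a stricter version might also penalize those.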
[03:02] (182.80s)
Also, side note, GPT-4o has been updated
[03:05] (185.68s)
a lot in the past few months, and it
[03:07] (187.68s)
kind of sucks now in Minecraft at least.
[03:10] (190.08s)
Anyway, it's unrelated, just a heads up.
[03:12] (192.64s)
It takes a little elbow grease to run
[03:14] (194.48s)
the comprehensive task suite. You've got
[03:16] (196.48s)
to install Python and download some very
[03:18] (198.48s)
large JSON files and run on Unix, but
[03:20] (200.80s)
the instructions are in the repository.
[03:23] (203.04s)
So, yeah, quick video, but I wanted to
[03:24] (204.96s)
do a little bit more than a short for
[03:26] (206.32s)
this. I do have a lot of shorts showing
[03:28] (208.00s)
some of these tasks in action. You can
[03:29] (209.76s)
check those out, too, if you're curious.
[03:33] (213.71s)
[Music]
[03:42] (222.25s)
[Music]
[03:43] (223.64s)
and I'm Leave.