[00:00] (0.16s)
Let's say you're building a web app in
[00:01] (1.84s)
Cursor. You've got everything set up and
[00:03] (3.84s)
you're deep into development. Now,
[00:05] (5.44s)
before anything goes live, every part of
[00:07] (7.44s)
the app needs to be tested properly.
[00:09] (9.36s)
Take something simple like the login
[00:10] (10.96s)
page. It might seem basic, but it's one
[00:12] (12.96s)
of the most critical parts. You need to
[00:14] (14.80s)
make sure it handles everything: invalid
[00:16] (16.72s)
inputs, edge cases, even potential
[00:18] (18.88s)
attacks. Someone could enter a command
[00:20] (20.88s)
that tries to delete your entire
[00:22] (22.48s)
database. And if you're not prepared for
[00:24] (24.32s)
that, things can go very wrong. That's
[00:26] (26.32s)
why testing every possible use case
[00:28] (28.40s)
matters, even for something as
[00:30] (30.08s)
straightforward as logging in.
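To make that concrete, here is a minimal sketch of the kind of input a login form has to survive, assuming a Node backend with the node-postgres driver (the table and column names are hypothetical):

```ts
import { Client } from "pg"; // node-postgres, one example driver

// A classic injection payload typed into a login form's username field:
const username = "'; DROP TABLE users; --";

// Vulnerable: string concatenation lets the input rewrite the query.
const unsafeSql = `SELECT * FROM users WHERE name = '${username}'`;
console.log("never run this:", unsafeSql);

// Safer: a parameterized query treats the input as data, not SQL.
async function findUser(client: Client, name: string) {
  return client.query("SELECT * FROM users WHERE name = $1", [name]);
}
```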
[00:31] (31.68s)
Now, imagine if Cursor could do all that
[00:33] (33.44s)
testing for you. Not just the login
[00:35] (35.12s)
page, but your entire app from the front
[00:37] (37.20s)
end to the back end, making sure every
[00:39] (39.28s)
component works exactly as it should.
[00:41] (41.52s)
And that's where this agent comes in.
[00:43] (43.04s)
The one I'm talking about is called
[00:44] (44.92s)
operative.sh. And what it does is let
[00:47] (47.04s)
your AI agent debug itself. Cursor can
[00:49] (49.68s)
access this AI agent through MCP. And
[00:52] (52.48s)
wherever the code is written, it can
[00:54] (54.08s)
test it out for you and carry out the
[00:55] (55.84s)
steps you'd usually handle manually. So
[00:57] (57.84s)
you don't have to go through the trouble
[00:59] (59.20s)
of testing everything on your own. Let's
[01:00] (60.96s)
say you've built a web app. You don't
[01:02] (62.40s)
need to break it down into separate
[01:04] (64.00s)
components. You can just ask it to test
[01:06] (66.08s)
the whole app. Cursor already knows how
[01:08] (68.08s)
it wrote the app. So you can give it
[01:09] (69.60s)
instructions in plain English. Just tell
[01:11] (71.44s)
it what the app does and what needs to
[01:13] (73.36s)
be tested and it takes care of the rest.
[01:15] (75.52s)
Let me take you through the installation
[01:17] (77.36s)
first. On their GitHub, they've provided
[01:19] (79.68s)
a way to install it manually by setting
[01:22] (82.24s)
up each component one by one. Or if you
[01:25] (85.12s)
prefer a quicker method, you can just
[01:27] (87.04s)
run the installer, which is available
[01:28] (88.96s)
right here. This is their site. And here
[01:30] (90.96s)
is where the installer can be found.
[01:32] (92.80s)
Before we install it, we need to get the
[01:34] (94.64s)
API key for this tool. And yes, it's
[01:36] (96.96s)
free. So, first go ahead and log in.
[01:39] (99.20s)
Once you're logged in, head over to the
[01:41] (101.20s)
dashboard. Inside the dashboard, you'll
[01:43] (103.36s)
also find some guides. And if you want,
[01:45] (105.44s)
you can check those out as well. On the
[01:47] (107.36s)
sidebar, there's a section for API keys.
[01:50] (110.08s)
You get 100 browser chat completion
[01:52] (112.40s)
requests per month. And once that limit
[01:54] (114.56s)
is reached, you will need to upgrade
[01:56] (116.24s)
your plan. But for now, let's just
[01:58] (118.32s)
create our key. Go ahead and name it.
[02:00] (120.40s)
Create the key. And as you can see, it's
[02:02] (122.24s)
already copied and ready to use. Now
[02:04] (124.24s)
that you've copied the API key, the next
[02:06] (126.48s)
step is to copy the installer command.
[02:08] (128.64s)
This command will fetch the installation
[02:10] (130.48s)
script, run it to install everything
[02:12] (132.40s)
automatically, and then delete the
[02:14] (134.08s)
script once it's done. So, let's open
[02:15] (135.84s)
the terminal, paste the command we just
[02:17] (137.92s)
copied, and run it.
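For reference, the one-liner follows the usual fetch-run-delete pattern; the exact URL comes from their site, so treat this as the shape rather than the literal command:

```bash
# Illustrative shape only; copy the real command from operative.sh.
curl -LSf https://operative.sh/<installer-path> -o install.sh \
  && bash install.sh \
  && rm install.sh
```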
[02:19] (139.84s)
As you can see, we're getting an interactive
[02:21] (141.12s)
installation process. The first question
[02:23] (143.12s)
it asks is about the installation type.
[02:25] (145.52s)
In other words, where do we want to
[02:27] (147.12s)
install the MCP? Since this is an MCP
[02:29] (149.76s)
installation, it will modify the MCP
[02:32] (152.28s)
configuration file (mcp.json, in Cursor's case) of whichever tool we choose.
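For context, the entry the installer writes into that config will look broadly like this (the command and args are filled in by the installer; the env variable name is an assumption based on the tool's docs):

```json
{
  "mcpServers": {
    "web-eval-agent": {
      "command": "<written by the installer>",
      "args": ["<written by the installer>"],
      "env": { "OPERATIVE_API_KEY": "<your-api-key>" }
    }
  }
}
```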
[02:34] (154.64s)
Let's go ahead and select Cursor
[02:36] (156.64s)
here. What it's doing now is setting up
[02:38] (158.56s)
the directory, checking for any required
[02:40] (160.72s)
dependencies and downloading the
[02:42] (162.56s)
additional components needed to get the
[02:44] (164.40s)
tool running properly. The installation
[02:46] (166.40s)
has finished and it automatically
[02:48] (168.08s)
integrated itself with Cursor. Now we
[02:50] (170.24s)
have the web eval agent along with its
[02:52] (172.32s)
tools, which include the web eval agent
[02:54] (174.56s)
itself and the setup browser state tool.
[02:56] (176.88s)
We'll take a closer look at what these
[02:58] (178.48s)
do in just a moment. During the
[03:00] (180.16s)
installation, it also asks for your API
[03:02] (182.32s)
key, so you'll need to paste that in. As
[03:04] (184.40s)
part of the setup, it launches a browser
[03:06] (186.40s)
instance, which means Playwright was
[03:08] (188.16s)
installed as well. One important thing
[03:10] (190.00s)
they mention at the end of the
[03:11] (191.36s)
installation is that you need to restart
[03:13] (193.36s)
whichever app you're using for
[03:14] (194.96s)
everything to work correctly. If you
[03:16] (196.80s)
skip this step, it might not appear or
[03:18] (198.88s)
function as expected. If you open Cursor
[03:21] (201.20s)
and it's not working, try hitting the
[03:23] (203.12s)
refresh button. That usually solves it.
[03:25] (205.20s)
If it still doesn't show up, just close
[03:27] (207.20s)
and reopen Cursor, and that should take
[03:29] (209.44s)
care of the issue. So, this is a website
[03:31] (211.60s)
I quickly put together just to test this
[03:33] (213.76s)
tool. We're going to run some tests on
[03:35] (215.60s)
it. And the main area we'll be focusing
[03:37] (217.60s)
on is the login and sign-in process. Let
[03:40] (220.08s)
me go ahead and log in. All right. Now
[03:41] (221.92s)
that we're signed in, this is the
[03:43] (223.60s)
dashboard. It has a really clean look
[03:45] (225.76s)
because I built it using Aceternity UI,
[03:48] (228.32s)
which is a great UI library. There isn't
[03:50] (230.56s)
much else going on here aside from that,
[03:52] (232.48s)
but our main goal is to test the login
[03:54] (234.80s)
functionality. So, for now, let's go
[03:56] (236.64s)
ahead and sign out and I'll show you
[03:58] (238.24s)
what you can do next. First, let me give
[04:00] (240.24s)
you a little background on the tool.
[04:01] (241.92s)
Right now, it includes two main
[04:03] (243.76s)
components: the web eval agent and the
[04:06] (246.32s)
setup browser state. If we take a step
[04:08] (248.40s)
back, you'll see that each one serves a
[04:10] (250.48s)
different purpose. The web eval agent
[04:12] (252.32s)
acts as an automatic emulator that uses
[04:14] (254.56s)
the browser to carry out any task you
[04:16] (256.64s)
describe in natural language. And it
[04:18] (258.48s)
does this using Playwright. On the other
[04:20] (260.40s)
hand, the setup browser state lets you
[04:22] (262.64s)
sign into your browser once if the site
[04:24] (264.96s)
you're testing requires authentication.
[04:27] (267.28s)
So you won't have to handle that
[04:28] (268.72s)
manually each time. These are the two
[04:30] (270.80s)
core tools that come with the setup. Now
[04:32] (272.88s)
let us talk about what arguments these
[04:34] (274.72s)
tools actually require. The main tool,
[04:36] (276.72s)
which is the web eval agent, first
[04:38] (278.64s)
requires a URL. This is the address
[04:40] (280.72s)
where your app is running. If it is
[04:42] (282.40s)
hosted elsewhere, you can enter the
[04:44] (284.16s)
corresponding URL where your app is
[04:46] (286.00s)
live. The next argument is the task.
[04:48] (288.08s)
This is a natural language description
[04:49] (289.84s)
of what you want the agent to do. You do
[04:51] (291.92s)
not need to include any technical
[04:53] (293.60s)
details. Just describe what a normal
[04:55] (295.68s)
user would do while interacting with
[04:57] (297.52s)
your app. There is also an argument
[04:59] (299.20s)
called headless browser. By default, it
[05:01] (301.52s)
is set to false, which means you will
[05:03] (303.44s)
see the browser window while the agent
[05:05] (305.52s)
performs the task. If you want to run it
[05:07] (307.52s)
silently in the background without
[05:09] (309.20s)
opening a visible window, you can set
[05:11] (311.20s)
this to true by instructing Cursor. Now,
[05:13] (313.68s)
let us look at the setup browser state
[05:15] (315.68s)
tool. This one does not really require
[05:17] (317.68s)
anything. The URL is optional. Its main
[05:20] (320.24s)
function is to let you sign in once and
[05:22] (322.40s)
it will save that browser session so you
[05:24] (324.48s)
do not need to log in again the next
[05:26] (326.32s)
time you run your tests.
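In practice, a one-time instruction to Cursor along these lines is all it takes (paraphrased; the tool name matches what appears in the tool list):

```
Run the setup browser state tool for http://localhost:3000 so I can log
in once, then reuse that saved session for the rest of the tests.
```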
[05:28] (328.56s)
So I ran a login test just to make sure everything
[05:30] (330.64s)
was working properly. And here is how it
[05:32] (332.88s)
went. First I asked it in simple
[05:34] (334.88s)
language to test the login. If Cursor
[05:37] (337.20s)
understands the task, it automatically
[05:39] (339.28s)
translates that instruction into
[05:40] (340.96s)
step-by-step actions. All you need to do
[05:43] (343.20s)
is describe it naturally and it fills in
[05:45] (345.44s)
the argument with the required details.
[05:47] (347.52s)
In my case, the app was running locally
[05:49] (349.52s)
on port 3000. So I did not have to
[05:52] (352.00s)
provide any technical arguments. I just
[05:54] (354.08s)
told it where the app was running and
[05:55] (355.84s)
Cursor handled everything else.
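For reference, the prompt was nothing more elaborate than something like this (paraphrased):

```
My app is running at http://localhost:3000. Use the web eval agent to
test the login flow: sign up with a new account, log in, and log out.
```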
[05:57] (357.84s)
This is the MCP tool call that was made. You can
[06:00] (360.48s)
see that we invoked the web eval agent
[06:02] (362.72s)
tool. Both the URL and the task were
[06:04] (364.96s)
filled in automatically. The task itself
[06:07] (367.20s)
was broken down into step-by-step
[06:09] (369.12s)
instructions and I did not have to write
[06:11] (371.04s)
any detailed logic for that to happen.
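Putting that together, the arguments of the call look broadly like this (argument names follow the tool description covered earlier; the exact task text Cursor writes will differ):

```json
{
  "url": "http://localhost:3000",
  "task": "Sign up with a new test account, log in, verify the dashboard loads, then log out.",
  "headless_browser": false
}
```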
[06:13] (373.04s)
You will also notice that headless
[06:14] (374.80s)
browser was set to false, which meant I
[06:17] (377.20s)
could actually see the browser
[06:18] (378.64s)
performing the actions live instead of
[06:20] (380.88s)
running quietly in the background. It
[06:22] (382.88s)
opened a dashboard that acted like a
[06:24] (384.96s)
control center. On the left side, it
[06:27] (387.12s)
showed a live preview of what was
[06:28] (388.80s)
happening. Even if you run it in
[06:30] (390.32s)
headless mode, you can still go to the
[06:32] (392.16s)
dashboard and watch the process in real
[06:34] (394.00s)
time. There was a status tab that showed
[06:36] (396.08s)
the current state of the agent, although
[06:38] (398.00s)
it did not give many details about the
[06:40] (400.08s)
actual testing steps. In the console
[06:42] (402.16s)
tab, however, all the logs were visible.
[06:44] (404.56s)
It also captured and displayed every
[06:46] (406.64s)
network request and response. This gave
[06:48] (408.88s)
us full visibility into the test
[06:51] (411.04s)
results. Errors, logs, screenshots, and
[06:54] (414.80s)
everything else were sent back to
[06:56] (416.40s)
Cursor. The results from the login test
[06:58] (418.80s)
showed that everything was working
[07:00] (420.40s)
correctly. It successfully went through
[07:02] (422.32s)
the entire flow by creating an ID,
[07:05] (425.04s)
signing up, logging in, and then logging
[07:07] (427.04s)
out. However, it did not test any edge
[07:09] (429.20s)
cases. Right now I do not think there is
[07:11] (431.60s)
any protection built in for those kinds
[07:13] (433.68s)
of situations, like when a user enters
[07:15] (435.84s)
something invalid. So the next step is
[07:17] (437.84s)
to ask Cursor to generate some edge
[07:20] (440.48s)
cases for the login test. Run them
[07:22] (442.40s)
through this agent and make sure those
[07:24] (444.32s)
cases are handled correctly.
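Concretely, that follow-up prompt can stay just as informal (paraphrased):

```
Write test cases for the login functionality, including edge cases
(empty fields, invalid email formats, wrong passwords, injection
attempts), into a file called login-test-cases. Then run each case
against http://localhost:3000 with the web eval agent and mark it as
passed or failed in the file.
```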
[07:26] (446.48s)
This is the workflow we are trying to set up using
[07:28] (448.48s)
this MCP. If you're enjoying the video,
[07:31] (451.12s)
I'd really appreciate it if you could
[07:32] (452.80s)
subscribe to the channel. We're aiming
[07:34] (454.48s)
to hit 25,000 subscribers by the end of
[07:37] (457.04s)
this month and your support really
[07:38] (458.64s)
helps. We share videos like this three
[07:40] (460.72s)
times a week, so there's always
[07:42] (462.16s)
something new and useful for you to
[07:43] (463.84s)
check out. Okay, so here is how I
[07:46] (466.00s)
extensively tested the login
[07:47] (467.76s)
functionality. The tests are still
[07:49] (469.60s)
running, and as you can see, it is still
[07:51] (471.76s)
generating results. If I go into the
[07:53] (473.84s)
dashboard and open the control center,
[07:55] (475.92s)
you can see the tests in progress right
[07:58] (478.16s)
here. There is a live preview on one
[08:00] (480.16s)
side, the agent status on the other,
[08:02] (482.24s)
along with the console logs and network
[08:04] (484.24s)
requests. Everything is clearly
[08:06] (486.16s)
displayed and the tests are running in
[08:08] (488.32s)
the background. Here is essentially what
[08:10] (490.00s)
I did. I asked it to write test cases
[08:12] (492.16s)
for the login functionality, including
[08:14] (494.40s)
edge cases. I created a file called
[08:16] (496.80s)
login test cases and it generated all
[08:19] (499.20s)
the test cases inside that file.
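As a rough sketch, each entry in that file followed a shape like this (the wording is illustrative; the agent fills in the actual result after each run):

```
Test case 3: Login with an empty password field
  Steps: enter a valid email, leave the password blank, submit
  Expected result: validation error shown, no request sent
  Actual result: (filled in by the agent) - PASS / FAIL
```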
[08:21] (501.60s)
You can see there is a section for the actual
[08:23] (503.44s)
result and the test cases are being
[08:25] (505.44s)
parsed one at a time. The first test
[08:27] (507.44s)
case was parsed, then the second, and so
[08:29] (509.76s)
on. Right now I think it is on test case
[08:32] (512.32s)
5 because that section has not updated
[08:34] (514.64s)
yet. What happens is that the agent
[08:36] (516.64s)
performs a test case then comes back and
[08:39] (519.12s)
edits the result directly into the file.
[08:41] (521.52s)
At this point, Cursor has generated
[08:43] (523.52s)
around 28 test cases, all of them very
[08:46] (526.16s)
granular. Every small detail is checked
[08:48] (528.32s)
to make sure nothing is overlooked. This
[08:50] (530.24s)
is how real software development works.
[08:52] (532.56s)
Every edge case is accounted for because
[08:54] (534.64s)
as the app grows, the likelihood of bugs
[08:57] (537.04s)
increases a lot. This is how you can
[08:58] (538.96s)
make sure everything works correctly
[09:00] (540.80s)
while you are still building. You are
[09:02] (542.40s)
not missing anything and every possible
[09:04] (544.40s)
scenario is covered. You just write the
[09:06] (546.32s)
test cases and Playwright handles the actual testing.
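For comparison, here is roughly what one of those granular cases would look like written by hand with Playwright; the selectors and route are hypothetical, and the point is that the agent generates and runs the equivalent steps for you:

```ts
import { test, expect } from "@playwright/test";

// Edge case: logging in with a wrong password should fail cleanly.
test("login rejects an invalid password", async ({ page }) => {
  await page.goto("http://localhost:3000/login"); // hypothetical route
  await page.getByLabel("Email").fill("user@example.com");
  await page.getByLabel("Password").fill("wrong-password");
  await page.getByRole("button", { name: "Sign in" }).click();

  // Expect an error message instead of a redirect to the dashboard.
  await expect(page.getByText(/invalid credentials/i)).toBeVisible();
  await expect(page).not.toHaveURL(/dashboard/);
});
```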
[09:08] (548.40s)
Right now the browser is
[09:10] (550.48s)
not visible because I ran the tests in
[09:12] (552.56s)
headless mode. After writing the file, I
[09:15] (555.04s)
told the agent to go ahead and test each
[09:17] (557.04s)
use case and then return to mark each
[09:19] (559.20s)
one as passed or failed. I set the app
[09:21] (561.36s)
location to localhost on port 3000,
[09:24] (564.08s)
provided the URL and enabled headless
[09:26] (566.32s)
mode. You can see that the tests have
[09:28] (568.24s)
started running and right now let me
[09:30] (570.00s)
check. Yes, it is currently on test case
[09:32] (572.64s)
10. It has been about 7 minutes since
[09:34] (574.72s)
the test started and it has already
[09:36] (576.72s)
worked through 10 use cases. Now I do
[09:38] (578.96s)
want to mention something important.
[09:40] (580.48s)
Using AI for testing is a slow process.
[09:43] (583.04s)
It does take time but the advantage is
[09:45] (585.04s)
that I do not have to write any scripts.
[09:47] (587.20s)
This is a simple example but it can be
[09:49] (589.20s)
applied in any situation. The AI looks
[09:51] (591.52s)
at the tags, writes the code, and runs
[09:53] (593.76s)
the tests automatically. All you have to
[09:56] (596.08s)
do is provide the use cases and it tests
[09:58] (598.80s)
them, reports back and updates the
[10:00] (600.96s)
results. This may be a small
[10:02] (602.56s)
implementation, but it could easily grow
[10:04] (604.64s)
into a complete system that tracks which
[10:06] (606.80s)
test cases were parsed, which ones were
[10:09] (609.04s)
missed, and automatically handles the
[10:10] (610.88s)
rest. And finally, I wanted to mention
[10:12] (612.96s)
Claude 4. It was released just 2 or 3
[10:15] (615.44s)
days ago at the time of this recording.
[10:17] (617.60s)
And I have to say the model is really
[10:19] (619.76s)
impressive. So far, I have genuinely
[10:22] (622.08s)
enjoyed using it. It does not have the
[10:24] (624.16s)
same frustrating issues that earlier
[10:26] (626.16s)
models did, and overall it has been a
[10:28] (628.48s)
great experience working with Claude 4
[10:30] (630.16s)
Sonnet. And here are the final
[10:32] (632.40s)
results. A total of nine tests were not
[10:35] (635.12s)
executed. This was either because they
[10:37] (637.28s)
required specific tools that were not
[10:39] (639.36s)
available, needed some manual
[10:41] (641.12s)
configuration inside the browser, or
[10:43] (643.20s)
were skipped due to other limitations in
[10:45] (645.52s)
the environment. Regardless of the
[10:47] (647.20s)
reason, those nine tests did not run.
[10:49] (649.12s)
Out of the remaining tests, 60% passed
[10:51] (651.84s)
successfully (about 11 of the 19 that ran). And this is exactly the
[10:53] (653.76s)
kind of process you want in place during
[10:55] (655.76s)
development. If any of the tests had
[10:57] (657.76s)
failed or if the outcome had not matched
[10:59] (659.76s)
the expected results, that information
[11:02] (662.00s)
would have been sent back to Cursor.
[11:03] (663.92s)
From there, we could have identified the
[11:05] (665.76s)
issue and fixed it right away. This
[11:07] (667.68s)
entire workflow can now be reused.
[11:09] (669.84s)
Whether you are building a new feature,
[11:11] (671.76s)
working on a specific component, or
[11:13] (673.84s)
testing your full application, this same
[11:15] (675.92s)
process applies and helps ensure
[11:17] (677.84s)
everything is functioning as expected.
[11:19] (679.76s)
That brings us to the end of this video.
[11:21] (681.52s)
If you'd like to support the channel and
[11:23] (683.20s)
help us keep making tutorials like this,
[11:25] (685.28s)
you can do so by using the super thanks
[11:27] (687.20s)
button below. As always, thank you for
[11:29] (689.20s)
watching, and I'll see you in the next one.