[00:00] (0.16s)
Let's say you're building a web app in
[00:01] (1.84s)
Cursor. You've got everything set up and
[00:03] (3.84s)
you're deep into development. Now,
[00:05] (5.44s)
before anything goes live, every part of
[00:07] (7.44s)
the app needs to be tested properly.
[00:09] (9.36s)
Take something simple like the login
[00:10] (10.96s)
page. It might seem basic, but it's one
[00:12] (12.96s)
of the most critical parts. You need to
[00:14] (14.80s)
make sure it handles everything: invalid
[00:16] (16.72s)
inputs, edge cases, even potential
[00:18] (18.88s)
attacks. Someone could enter a command
[00:20] (20.88s)
that tries to delete your entire
[00:22] (22.48s)
database. And if you're not prepared for
[00:24] (24.32s)
that, things can go very wrong. That's
[00:26] (26.32s)
why testing every possible use case
[00:28] (28.40s)
matters, even for something as
[00:30] (30.08s)
straightforward as logging in.
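To make that concrete, here is a minimal sketch of the kind of input a login form has to survive, assuming a Node backend with the node-postgres driver (the table and column names are hypothetical):

```ts
import { Client } from "pg"; // node-postgres, one example driver

// A classic injection payload typed into a login form's username field:
const username = "'; DROP TABLE users; --";

// Vulnerable: string concatenation lets the input rewrite the query.
const unsafeSql = `SELECT * FROM users WHERE name = '${username}'`;
console.log("never run this:", unsafeSql);

// Safer: a parameterized query treats the input as data, not SQL.
async function findUser(client: Client, name: string) {
  return client.query("SELECT * FROM users WHERE name = $1", [name]);
}
```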
[00:31] (31.68s)
Now, imagine if Cursor could do all that
[00:33] (33.44s)
testing for you. Not just the login
[00:35] (35.12s)
page, but your entire app from the front
[00:37] (37.20s)
end to the back end, making sure every
[00:39] (39.28s)
component works exactly as it should.
[00:41] (41.52s)
And that's where this agent comes in.
[00:43] (43.04s)
The one I'm talking about is called
[00:44] (44.92s)
operative.sh. And what it does is let
[00:47] (47.04s)
your AI agent debug itself. Cursor can
[00:49] (49.68s)
access this AI agent through MCP. And
[00:52] (52.48s)
wherever the code is written, it can
[00:54] (54.08s)
test it out for you and carry out the
[00:55] (55.84s)
steps you'd usually handle manually. So
[00:57] (57.84s)
you don't have to go through the trouble
[00:59] (59.20s)
of testing everything on your own. Let's
[01:00] (60.96s)
say you've built a web app. You don't
[01:02] (62.40s)
need to break it down into separate
[01:04] (64.00s)
components. You can just ask it to test
[01:06] (66.08s)
the whole app. Cursor already knows how
[01:08] (68.08s)
it wrote the app. So you can give it
[01:09] (69.60s)
instructions in plain English. Just tell
[01:11] (71.44s)
it what the app does and what needs to
[01:13] (73.36s)
be tested and it takes care of the rest.
[01:15] (75.52s)
Let me take you through the installation
[01:17] (77.36s)
first. On their GitHub, they've provided
[01:19] (79.68s)
a way to install it manually by setting
[01:22] (82.24s)
up each component one by one. Or if you
[01:25] (85.12s)
prefer a quicker method, you can just
[01:27] (87.04s)
run the installer, which is available
[01:28] (88.96s)
right here. This is their site. And here
[01:30] (90.96s)
is where the installer can be found.
[01:32] (92.80s)
Before we install it, we need to get the
[01:34] (94.64s)
API key for this tool. And yes, it's
[01:36] (96.96s)
free. So, first go ahead and log in.
[01:39] (99.20s)
Once you're logged in, head over to the
[01:41] (101.20s)
dashboard. Inside the dashboard, you'll
[01:43] (103.36s)
also find some guides. And if you want,
[01:45] (105.44s)
you can check those out as well. On the
[01:47] (107.36s)
sidebar, there's a section for API keys.
[01:50] (110.08s)
You get 100 browser chat completion
[01:52] (112.40s)
requests per month. And once that limit
[01:54] (114.56s)
is reached, you will need to upgrade
[01:56] (116.24s)
your plan. But for now, let's just
[01:58] (118.32s)
create our key. Go ahead and name it.
[02:00] (120.40s)
Create the key. And as you can see, it's
[02:02] (122.24s)
already copied and ready to use. Now
[02:04] (124.24s)
that you've copied the API key, the next
[02:06] (126.48s)
step is to copy the installer command.
[02:08] (128.64s)
This command will fetch the installation
[02:10] (130.48s)
script, run it to install everything
[02:12] (132.40s)
automatically, and then delete the
[02:14] (134.08s)
script once it's done. So, let's open
[02:15] (135.84s)
the terminal, paste the command we just
[02:17] (137.92s)
copied, and run it.
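For reference, the one-liner follows the usual fetch-run-delete pattern; the exact URL comes from their site, so treat this as the shape rather than the literal command:

```bash
# Illustrative shape only; copy the real command from operative.sh.
curl -LSf https://operative.sh/<installer-path> -o install.sh \
  && bash install.sh \
  && rm install.sh
```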
[02:19] (139.84s)
As you can see, we're getting an interactive
[02:21] (141.12s)
installation process. The first question
[02:23] (143.12s)
it asks is about the installation type.
[02:25] (145.52s)
In other words, where do we want to
[02:27] (147.12s)
install the MCP? Since this is an MCP
[02:29] (149.76s)
installation, it will modify the MCP
[02:32] (152.28s)
configuration file (mcp.json, in Cursor's case) of whichever tool we choose.
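For context, the entry the installer writes into that config will look broadly like this (the command and args are filled in by the installer; the env variable name is an assumption based on the tool's docs):

```json
{
  "mcpServers": {
    "web-eval-agent": {
      "command": "<written by the installer>",
      "args": ["<written by the installer>"],
      "env": { "OPERATIVE_API_KEY": "<your-api-key>" }
    }
  }
}
```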
[02:34] (154.64s)
Let's go ahead and select Cursor
[02:36] (156.64s)
here. What it's doing now is setting up
[02:38] (158.56s)
the directory, checking for any required
[02:40] (160.72s)
dependencies and downloading the
[02:42] (162.56s)
additional components needed to get the
[02:44] (164.40s)
tool running properly. The installation
[02:46] (166.40s)
has finished and it automatically
[02:48] (168.08s)
integrated itself with Cursor. Now we
[02:50] (170.24s)
have the web eval agent along with its
[02:52] (172.32s)
tools, which include the web eval agent
[02:54] (174.56s)
itself and the setup browser state tool.
[02:56] (176.88s)
We'll take a closer look at what these
[02:58] (178.48s)
do in just a moment. During the
[03:00] (180.16s)
installation, it also asks for your API
[03:02] (182.32s)
key, so you'll need to paste that in. As
[03:04] (184.40s)
part of the setup, it launches a browser
[03:06] (186.40s)
instance, which means Playwright was
[03:08] (188.16s)
installed as well. One important thing
[03:10] (190.00s)
they mention at the end of the
[03:11] (191.36s)
installation is that you need to restart
[03:13] (193.36s)
whichever app you're using for
[03:14] (194.96s)
everything to work correctly. If you
[03:16] (196.80s)
skip this step, it might not appear or
[03:18] (198.88s)
function as expected. If you open Cursor
[03:21] (201.20s)
and it's not working, try hitting the
[03:23] (203.12s)
refresh button. That usually solves it.
[03:25] (205.20s)
If it still doesn't show up, just close
[03:27] (207.20s)
and reopen Cursor, and that should take
[03:29] (209.44s)
care of the issue. So, this is a website
[03:31] (211.60s)
I quickly put together just to test this
[03:33] (213.76s)
tool. We're going to run some tests on
[03:35] (215.60s)
it. And the main area we'll be focusing
[03:37] (217.60s)
on is the login and sign-in process. Let
[03:40] (220.08s)
me go ahead and log in. All right. Now
[03:41] (221.92s)
that we're signed in, this is the
[03:43] (223.60s)
dashboard. It has a really clean look
[03:45] (225.76s)
because I built it using Aceternity UI,
[03:48] (228.32s)
which is a great UI library. There isn't
[03:50] (230.56s)
much else going on here aside from that,
[03:52] (232.48s)
but our main goal is to test the login
[03:54] (234.80s)
functionality. So, for now, let's go
[03:56] (236.64s)
ahead and sign out and I'll show you
[03:58] (238.24s)
what you can do next. First, let me give
[04:00] (240.24s)
you a little background on the tool.
[04:01] (241.92s)
Right now, it includes two main
[04:03] (243.76s)
components: the web eval agent and the
[04:06] (246.32s)
setup browser state. If we take a step
[04:08] (248.40s)
back, you'll see that each one serves a
[04:10] (250.48s)
different purpose. The web eval agent
[04:12] (252.32s)
acts as an automatic emulator that uses
[04:14] (254.56s)
the browser to carry out any task you
[04:16] (256.64s)
describe in natural language. And it
[04:18] (258.48s)
does this using Playwright. On the other
[04:20] (260.40s)
hand, the setup browser state lets you
[04:22] (262.64s)
sign into your browser once if the site
[04:24] (264.96s)
you're testing requires authentication.
[04:27] (267.28s)
So you won't have to handle that
[04:28] (268.72s)
manually each time. These are the two
[04:30] (270.80s)
core tools that come with the setup. Now
[04:32] (272.88s)
let us talk about what arguments these
[04:34] (274.72s)
tools actually require. The main tool,
[04:36] (276.72s)
which is the web eval agent, first
[04:38] (278.64s)
requires a URL. This is the address
[04:40] (280.72s)
where your app is running. If it is
[04:42] (282.40s)
hosted elsewhere, you can enter the
[04:44] (284.16s)
corresponding URL where your app is
[04:46] (286.00s)
live. The next argument is the task.
[04:48] (288.08s)
This is a natural language description
[04:49] (289.84s)
of what you want the agent to do. You do
[04:51] (291.92s)
not need to include any technical
[04:53] (293.60s)
details. Just describe what a normal
[04:55] (295.68s)
user would do while interacting with
[04:57] (297.52s)
your app. There is also an argument
[04:59] (299.20s)
called headless browser. By default, it
[05:01] (301.52s)
is set to false, which means you will
[05:03] (303.44s)
see the browser window while the agent
[05:05] (305.52s)
performs the task. If you want to run it
[05:07] (307.52s)
silently in the background without
[05:09] (309.20s)
opening a visible window, you can set
[05:11] (311.20s)
this to true by instructing Cursor. Now,
[05:13] (313.68s)
let us look at the setup browser state
[05:15] (315.68s)
tool. This one does not really require
[05:17] (317.68s)
anything. The URL is optional. Its main
[05:20] (320.24s)
function is to let you sign in once and
[05:22] (322.40s)
it will save that browser session so you
[05:24] (324.48s)
do not need to log in again the next
[05:26] (326.32s)
time you run your tests.
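In practice, a one-time instruction to Cursor along these lines is all it takes (paraphrased; the tool name matches what appears in the tool list):

```
Run the setup browser state tool for http://localhost:3000 so I can log
in once, then reuse that saved session for the rest of the tests.
```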
[05:28] (328.56s)
So I ran a login test just to make sure everything
[05:30] (330.64s)
was working properly. And here is how it
[05:32] (332.88s)
went. First I asked it in simple
[05:34] (334.88s)
language to test the login. If Cursor
[05:37] (337.20s)
understands the task, it automatically
[05:39] (339.28s)
translates that instruction into
[05:40] (340.96s)
step-by-step actions. All you need to do
[05:43] (343.20s)
is describe it naturally and it fills in
[05:45] (345.44s)
the argument with the required details.
[05:47] (347.52s)
In my case, the app was running locally
[05:49] (349.52s)
on port 3000. So I did not have to
[05:52] (352.00s)
provide any technical arguments. I just
[05:54] (354.08s)
told it where the app was running and
[05:55] (355.84s)
Cursor handled everything else.
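For reference, the prompt was nothing more elaborate than something like this (paraphrased):

```
My app is running at http://localhost:3000. Use the web eval agent to
test the login flow: sign up with a new account, log in, and log out.
```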
[05:57] (357.84s)
This is the MCP tool call that was made. You can
[06:00] (360.48s)
see that we invoked the web eval agent
[06:02] (362.72s)
tool. Both the URL and the task were
[06:04] (364.96s)
filled in automatically. The task itself
[06:07] (367.20s)
was broken down into step-by-step
[06:09] (369.12s)
instructions and I did not have to write
[06:11] (371.04s)
any detailed logic for that to happen.
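Putting that together, the arguments of the call look broadly like this (argument names follow the tool description covered earlier; the exact task text Cursor writes will differ):

```json
{
  "url": "http://localhost:3000",
  "task": "Sign up with a new test account, log in, verify the dashboard loads, then log out.",
  "headless_browser": false
}
```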
[06:13] (373.04s)
You will also notice that headless
[06:14] (374.80s)
browser was set to false, which meant I
[06:17] (377.20s)
could actually see the browser
[06:18] (378.64s)
performing the actions live instead of
[06:20] (380.88s)
running quietly in the background. It
[06:22] (382.88s)
opened a dashboard that acted like a
[06:24] (384.96s)
control center. On the left side, it
[06:27] (387.12s)
showed a live preview of what was
[06:28] (388.80s)
happening. Even if you run it in
[06:30] (390.32s)
headless mode, you can still go to the
[06:32] (392.16s)
dashboard and watch the process in real
[06:34] (394.00s)
time. There was a status tab that showed
[06:36] (396.08s)
the current state of the agent, although
[06:38] (398.00s)
it did not give many details about the
[06:40] (400.08s)
actual testing steps. In the console
[06:42] (402.16s)
tab, however, all the logs were visible.
[06:44] (404.56s)
It also captured and displayed every
[06:46] (406.64s)
network request and response. This gave
[06:48] (408.88s)
us full visibility into the test
[06:51] (411.04s)
results. Errors, logs, screenshots, and
[06:54] (414.80s)
everything else were sent back to
[06:56] (416.40s)
Cursor. The results from the login test
[06:58] (418.80s)
showed that everything was working
[07:00] (420.40s)
correctly. It successfully went through
[07:02] (422.32s)
the entire flow by creating an ID,
[07:05] (425.04s)
signing up, logging in, and then logging
[07:07] (427.04s)
out. However, it did not test any edge
[07:09] (429.20s)
cases. Right now I do not think there is
[07:11] (431.60s)
any protection built in for those kinds
[07:13] (433.68s)
of situations, like when a user enters
[07:15] (435.84s)
something invalid. So the next step is
[07:17] (437.84s)
to ask Cursor to generate some edge
[07:20] (440.48s)
cases for the login test. Run them
[07:22] (442.40s)
through this agent and make sure those
[07:24] (444.32s)
cases are handled correctly.
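Concretely, that follow-up prompt can stay just as informal (paraphrased):

```
Write test cases for the login functionality, including edge cases
(empty fields, invalid email formats, wrong passwords, injection
attempts), into a file called login-test-cases. Then run each case
against http://localhost:3000 with the web eval agent and mark it as
passed or failed in the file.
```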
[07:26] (446.48s)
This is the workflow we are trying to set up using
[07:28] (448.48s)
this MCP. If you're enjoying the video,
[07:31] (451.12s)
I'd really appreciate it if you could
[07:32] (452.80s)
subscribe to the channel. We're aiming
[07:34] (454.48s)
to hit 25,000 subscribers by the end of
[07:37] (457.04s)
this month and your support really
[07:38] (458.64s)
helps. We share videos like this three
[07:40] (460.72s)
times a week, so there's always
[07:42] (462.16s)
something new and useful for you to
[07:43] (463.84s)
check out. Okay, so here is how I
[07:46] (466.00s)
extensively tested the login
[07:47] (467.76s)
functionality. The tests are still
[07:49] (469.60s)
running, and as you can see, it is still
[07:51] (471.76s)
generating results. If I go into the
[07:53] (473.84s)
dashboard and open the control center,
[07:55] (475.92s)
you can see the tests in progress right
[07:58] (478.16s)
here. There is a live preview on one
[08:00] (480.16s)
side, the agent status on the other,
[08:02] (482.24s)
along with the console logs and network
[08:04] (484.24s)
requests. Everything is clearly
[08:06] (486.16s)
displayed and the tests are running in
[08:08] (488.32s)
the background. Here is essentially what
[08:10] (490.00s)
I did. I asked it to write test cases
[08:12] (492.16s)
for the login functionality, including
[08:14] (494.40s)
edge cases. I created a file called
[08:16] (496.80s)
login test cases and it generated all
[08:19] (499.20s)
the test cases inside that file.
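As a rough sketch, each entry in that file followed a shape like this (the wording is illustrative; the agent fills in the actual result after each run):

```
Test case 3: Login with an empty password field
  Steps: enter a valid email, leave the password blank, submit
  Expected result: validation error shown, no request sent
  Actual result: (filled in by the agent) - PASS / FAIL
```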
[08:21] (501.60s)
You can see there is a section for the actual
[08:23] (503.44s)
result and the test cases are being
[08:25] (505.44s)
parsed one at a time. The first test
[08:27] (507.44s)
case was parsed, then the second, and so
[08:29] (509.76s)
on. Right now I think it is on test case
[08:32] (512.32s)
5 because that section has not updated
[08:34] (514.64s)
yet. What happens is that the agent
[08:36] (516.64s)
performs a test case then comes back and
[08:39] (519.12s)
edits the result directly into the file.
[08:41] (521.52s)
At this point, Cursor has generated
[08:43] (523.52s)
around 28 test cases, all of them very
[08:46] (526.16s)
granular. Every small detail is checked
[08:48] (528.32s)
to make sure nothing is overlooked. This
[08:50] (530.24s)
is how real software development works.
[08:52] (532.56s)
Every edge case is accounted for because
[08:54] (534.64s)
as the app grows, the likelihood of bugs
[08:57] (537.04s)
increases a lot. This is how you can
[08:58] (538.96s)
make sure everything works correctly
[09:00] (540.80s)
while you are still building. You are
[09:02] (542.40s)
not missing anything and every possible
[09:04] (544.40s)
scenario is covered. You just write the
[09:06] (546.32s)
test cases and Playwright handles the actual testing.
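For comparison, here is roughly what one of those granular cases would look like written by hand with Playwright; the selectors and route are hypothetical, and the point is that the agent generates and runs the equivalent steps for you:

```ts
import { test, expect } from "@playwright/test";

// Edge case: logging in with a wrong password should fail cleanly.
test("login rejects an invalid password", async ({ page }) => {
  await page.goto("http://localhost:3000/login"); // hypothetical route
  await page.getByLabel("Email").fill("user@example.com");
  await page.getByLabel("Password").fill("wrong-password");
  await page.getByRole("button", { name: "Sign in" }).click();

  // Expect an error message instead of a redirect to the dashboard.
  await expect(page.getByText(/invalid credentials/i)).toBeVisible();
  await expect(page).not.toHaveURL(/dashboard/);
});
```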
[09:08] (548.40s)
Right now the browser is
[09:10] (550.48s)
not visible because I ran the tests in
[09:12] (552.56s)
headless mode. After writing the file, I
[09:15] (555.04s)
told the agent to go ahead and test each
[09:17] (557.04s)
use case and then return to mark each
[09:19] (559.20s)
one as passed or failed. I set the app
[09:21] (561.36s)
location to localhost on port 3000,
[09:24] (564.08s)
provided the URL and enabled headless
[09:26] (566.32s)
mode. You can see that the tests have
[09:28] (568.24s)
started running and right now let me
[09:30] (570.00s)
check. Yes, it is currently on test case
[09:32] (572.64s)
10. It has been about 7 minutes since
[09:34] (574.72s)
the test started and it has already
[09:36] (576.72s)
worked through 10 use cases. Now I do
[09:38] (578.96s)
want to mention something important.
[09:40] (580.48s)
Using AI for testing is a slow process.
[09:43] (583.04s)
It does take time but the advantage is
[09:45] (585.04s)
that I do not have to write any scripts.
[09:47] (587.20s)
This is a simple example but it can be
[09:49] (589.20s)
applied in any situation. The AI looks
[09:51] (591.52s)
at the tags, writes the code, and runs
[09:53] (593.76s)
the tests automatically. All you have to
[09:56] (596.08s)
do is provide the use cases and it tests
[09:58] (598.80s)
them, reports back and updates the
[10:00] (600.96s)
results. This may be a small
[10:02] (602.56s)
implementation, but it could easily grow
[10:04] (604.64s)
into a complete system that tracks which
[10:06] (606.80s)
test cases were parsed, which ones were
[10:09] (609.04s)
missed, and automatically handles the
[10:10] (610.88s)
rest. And finally, I wanted to mention
[10:12] (612.96s)
Claude 4. It was released just 2 or 3
[10:15] (615.44s)
days ago at the time of this recording.
[10:17] (617.60s)
And I have to say the model is really
[10:19] (619.76s)
impressive. So far, I have genuinely
[10:22] (622.08s)
enjoyed using it. It does not have the
[10:24] (624.16s)
same frustrating issues that earlier
[10:26] (626.16s)
models did, and overall it has been a
[10:28] (628.48s)
great experience working with Claude 4
[10:30] (630.16s)
Sonnet. And here are the final
[10:32] (632.40s)
results. A total of nine tests were not
[10:35] (635.12s)
executed. This was either because they
[10:37] (637.28s)
required specific tools that were not
[10:39] (639.36s)
available, needed some manual
[10:41] (641.12s)
configuration inside the browser, or
[10:43] (643.20s)
were skipped due to other limitations in
[10:45] (645.52s)
the environment. Regardless of the
[10:47] (647.20s)
reason, those nine tests did not run.
[10:49] (649.12s)
Out of the remaining tests, 60% passed
[10:51] (651.84s)
successfully (about 11 of the 19 that ran). And this is exactly the
[10:53] (653.76s)
kind of process you want in place during
[10:55] (655.76s)
development. If any of the tests had
[10:57] (657.76s)
failed or if the outcome had not matched
[10:59] (659.76s)
the expected results, that information
[11:02] (662.00s)
would have been sent back to Cursor.
[11:03] (663.92s)
From there, we could have identified the
[11:05] (665.76s)
issue and fixed it right away. This
[11:07] (667.68s)
entire workflow can now be reused.
[11:09] (669.84s)
Whether you are building a new feature,
[11:11] (671.76s)
working on a specific component, or
[11:13] (673.84s)
testing your full application, this same
[11:15] (675.92s)
process applies and helps ensure
[11:17] (677.84s)
everything is functioning as expected.
[11:19] (679.76s)
That brings us to the end of this video.
[11:21] (681.52s)
If you'd like to support the channel and
[11:23] (683.20s)
help us keep making tutorials like this,
[11:25] (685.28s)
you can do so by using the super thanks
[11:27] (687.20s)
button below. As always, thank you for
[11:29] (689.20s)
watching, and I'll see you in the next one.