YouTube Deep Summary


NEW METHOD: Cursor AI Now Does 99% of the Work

AI LABS • 2025-05-27 • 11:34 minutes • YouTube

🤖 AI-Generated Summary:

Automating Web App Testing with Cursor and Operative.sh: A Game-Changer for Developers

When building a web application, thorough testing is crucial before launching to ensure every feature works flawlessly and securely. Even something as seemingly simple as a login page deserves comprehensive testing, from handling invalid inputs to defending against potential attacks like SQL injection. But manually writing and running all these tests can be tedious and time-consuming.

What if you could automate this entire testing process, not just for the login page but for your whole app, using natural language instructions? Enter the powerful combination of Cursor, operative.sh, and AI-driven testing agents that debug and validate your app autonomously.

In this post, I'll walk you through how to set up and use this innovative testing workflow, how it works under the hood, and why it could revolutionize your development process.


Why Automated Testing Matters, Even for Simple Features

Take the login page as an example. Users can enter all kinds of unexpected inputs, from typos to malicious code. Without proper testing, your app could crash or, worse, become vulnerable to attacks that compromise your entire database.

Testing every possible use case manually is tedious, error-prone, and slows down development. Automating tests with smart agents that understand your app and can execute user-like interactions can save you precious time and prevent costly bugs.
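
To make this concrete, here is a minimal, hand-written Playwright sketch of the kind of login checks such an agent generates and runs for you. The route, field labels, and error message are placeholders for illustration, not details taken from the video's app.

```typescript
// Hand-written examples of the login checks an AI testing agent automates.
// The route, labels, and messages below are placeholders, not the real app's.
import { test, expect } from '@playwright/test';

const hostileInputs = [
  "' OR '1'='1' --",           // classic SQL-injection probe
  '<script>alert(1)</script>', // reflected-XSS probe
  'a'.repeat(10_000),          // oversized input
];

test('rejects invalid credentials without crashing', async ({ page }) => {
  await page.goto('http://localhost:3000/login');
  await page.getByLabel('Email').fill('not-a-user@example.com');
  await page.getByLabel('Password').fill('wrong-password');
  await page.getByRole('button', { name: 'Sign in' }).click();
  await expect(page.getByText(/invalid credentials/i)).toBeVisible();
});

for (const input of hostileInputs) {
  test(`treats hostile input as plain data: ${input.slice(0, 16)}`, async ({ page }) => {
    await page.goto('http://localhost:3000/login');
    await page.getByLabel('Email').fill(input);
    await page.getByLabel('Password').fill(input);
    await page.getByRole('button', { name: 'Sign in' }).click();
    // The app should stay on the login page with an error, never return a
    // 500 page or create a session from malicious input.
    await expect(page).toHaveURL(/\/login/);
  });
}
```

Writing, reviewing, and maintaining dozens of files like this by hand is exactly the work the rest of this post hands off to an agent.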


Meet Operative.sh: Your AI Agent That Debugs Itself

Operative.sh is an AI-powered agent accessible via Cursor's MCP (Model Context Protocol) support. It can:

  • Understand natural language instructions about what your app does.
  • Automatically generate detailed test cases, including edge cases.
  • Execute tests in a real browser environment using Playwright.
  • Report back detailed logs, errors, screenshots, and network activity.
  • Update test result files autonomously.

All you need to do is tell it what to test in plain English, and it handles the how.
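
In practice, the instruction you give Cursor can be as plain as the prompt below. The wording is just an example; the local URL matches the dev server used in the video.

```
The app is running at http://localhost:3000. Use the web eval agent to test the
login functionality: sign up with a new user ID, log in, log out, and report any
console errors or failed network requests you notice along the way.
```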


Setting Up Operative.sh with Cursor: Step-by-Step

  1. Get the API Key
    Visit operative.sh's website and sign up (it's free). On the dashboard, navigate to the API keys section and generate your key. You get 100 free browser chat completion requests per month.

  2. Install the Agent
    Use the provided installer script to install operative.sh and its components automatically. The installer sets up dependencies, integrates with Cursor's MCP config, and installs Playwright to enable browser automation.

  3. Configure the Agent in Cursor
    During installation, paste your API key when prompted. After installation, restart Cursor or refresh the app to activate the new agent.
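
Under the hood, the installer's job in steps 2 and 3 boils down to registering the agent in Cursor's MCP configuration and storing your key. The snippet below is only a sketch of the general shape of such an entry; the actual command, arguments, and even the environment-variable name are whatever the installer writes, so treat every value here as a placeholder.

```json
{
  "mcpServers": {
    "web-eval-agent": {
      "command": "<command written by the installer>",
      "args": ["<arguments written by the installer>"],
      "env": {
        "OPERATIVE_API_KEY": "<the API key you generated on the dashboard>"
      }
    }
  }
}
```

If the agent does not show up after installation, this configuration (plus a restart or refresh of Cursor) is the first thing to check.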


Understanding the Core Components

  • Web Eval Agent
    This is the main tool that runs your tests by emulating user interactions in a browser. It accepts:
  • A URL where your app is hosted.
  • A natural language task description of what to test.
  • A β€œheadless” flag to run tests visibly or silently in the background.

  • Setup Browser State
    This tool lets you sign in to your app once so the Web Eval Agent can reuse the authenticated browser session across multiple tests, avoiding repetitive manual logins.
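
Conceptually, Setup Browser State automates the same pattern you could write by hand with Playwright's storage-state feature: authenticate once, persist the session, and reuse it in later runs. The sketch below shows only that underlying idea; the URLs, selectors, and file path are placeholders, and this is not the tool's actual implementation.

```typescript
// Log in once, persist cookies/local storage, and reuse the saved state so
// later runs start already authenticated. All values below are placeholders.
import { chromium } from 'playwright';

async function saveAuthState(): Promise<void> {
  const browser = await chromium.launch({ headless: false });
  const page = await browser.newPage();
  await page.goto('http://localhost:3000/login');
  await page.getByLabel('Email').fill('me@example.com');
  await page.getByLabel('Password').fill('my-password');
  await page.getByRole('button', { name: 'Sign in' }).click();
  await page.waitForURL('**/dashboard');
  // Persist the authenticated session to disk.
  await page.context().storageState({ path: 'auth-state.json' });
  await browser.close();
}

async function reuseAuthState(): Promise<void> {
  const browser = await chromium.launch({ headless: true });
  // Contexts created from the saved state skip the login step entirely.
  const context = await browser.newContext({ storageState: 'auth-state.json' });
  const page = await context.newPage();
  await page.goto('http://localhost:3000/dashboard');
  await browser.close();
}

saveAuthState().then(reuseAuthState).catch(console.error);
```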


Running Your First Test: Login Functionality

After setting up, I tested a simple web app's login flow:

  • I told the agent in plain English: β€œTest the login functionality.”
  • Cursor translated this into step-by-step browser actions like creating a user ID, signing up, logging in, and logging out.
  • The agent ran the test live (headless mode off), showing the browser actions and logging every network request and response.
  • The test passed successfully, confirming the basic flow works.

Generating and Running Edge Case Tests Automatically

Next, I asked Cursor to generate comprehensive edge cases for the login:

  • The agent created about 28 detailed test cases covering invalid inputs and unusual user behaviors.
  • Each test case was executed automatically in headless mode.
  • The agent updated a test case file with pass/fail results in real time.
  • Logs and screenshots provided full visibility into what happened.

This approach ensures that no corner cases are missed, which is essential as your app grows more complex.
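
For a sense of what the updated test case file looks like, here is one illustrative entry. The exact layout is whatever Cursor generates, so treat the format, numbering, and result below as a made-up example rather than the video's actual file.

```
## Test case 7: SQL-injection string in the login form
Steps:    Enter "' OR '1'='1' --" as the user ID, any password, and submit.
Expected: Input rejected with a validation error; no crash, no session created.
Actual:   Error message shown; user remained on the login page.
Status:   PASS
```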


Benefits and Considerations

  • No Manual Script Writing: The AI writes test scripts and runs them based on your natural language descriptions.
  • Full Visibility: Access live previews, logs, network traffic, and error reports.
  • Reusability: Easily rerun tests anytime you add features or fix bugs.
  • Slower but Smarter: AI-driven testing takes longer than simple scripted tests but saves developer time and improves coverage.

The Future of AI-Powered Testing

This setup is just the beginning. Imagine a fully automated system that tracks which tests passed, which need manual review, and integrates seamlessly with your CI/CD pipeline.

Also, with newer models such as Claude 4 (released only days before this video was recorded) driving these agents, testing becomes more reliable and less frustrating than it was with older tools.


Final Thoughts

Automated testing with Cursor and operative.sh empowers developers to focus on building features rather than writing exhaustive test scripts. By leveraging AI agents that understand natural language and emulate real user interactions, you can catch bugs early, cover edge cases, and ship high-quality apps faster.

If you're a web developer looking to streamline your testing process, give this approach a try. It's a glimpse into the future of software development: smarter, faster, and more efficient.


Enjoyed this overview? Subscribe to stay updated with more tutorials and deep dives into cutting-edge development tools and workflows!


Happy coding and testing!


πŸ“ Transcript (339 entries):

Let's say you're building a web app in cursor. You've got everything set up and you're deep into development. Now, before anything goes live, every part of the app needs to be tested properly. Take something simple like the login page. It might seem basic, but it's one of the most critical parts. You need to make sure it handles everything. Invalid inputs, edge cases, even potential attacks. Someone could enter a command that tries to delete your entire database. And if you're not prepared for that, things can go very wrong. That's why testing every possible use case matters, even for something as straightforward as logging in. Now, imagine if Cursor could do all that testing for you. Not just the login page, but your entire app from the front end to the back end, making sure every component works exactly as it should. And that's where this agent comes in. The one I'm talking about is called operative.sh. And what it does is let your AI agent debug itself. Cursor can access this AI agent through MCP. And wherever the code is written, it can test it out for you and carry out the steps you'd usually handle manually. So you don't have to go through the trouble of testing everything on your own. Let's say you've built a web app. You don't need to break it down into separate components. You can just ask it to test the whole app. Cursor already knows how it wrote the app. So you can give it instructions in plain English. Just tell it what the app does and what needs to be tested and it takes care of the rest. Let me take you through the installation first. on their GitHub. They've provided a way to install it manually by setting up each component one by one. Or if you prefer a quicker method, you can just run the installer, which is available right here. This is their site. And here is where the installer can be found. Before we install it, we need to get the API key for this tool. And yes, it's free. So, first go ahead and log in. Once you're logged in, head over to the dashboard. Inside the dashboard, you'll also find some guides. And if you want, you can check those out as well. On the sidebar, there's a section for API keys. You get 100 browser chat completion requests per month. And once that limit is reached, you will need to upgrade your plan. But for now, let's just create our key. Go ahead and name it. Create the key. And as you can see, it's already copied and ready to use. Now that you've copied the API key, the next step is to copy the installer command. This command will fetch the installation script, run it to install everything automatically, and then delete the script once it's done. So, let's open the terminal, paste the command we just copied, and run it. As you can see, we're getting an interactive installation process. The first question it asks is about the installation type. In other words, where do we want to install the MCP? Since this is an MCP installation, it will modify the MCP config.json file of whichever tool we choose. Let's go ahead and select cursor here. What it's doing now is setting up the directory, checking for any required dependencies and downloading the additional components needed to get the tool running properly. The installation has finished and it automatically integrated itself with cursor. Now we have the web eval agent along with its tools which include the web eval agent itself and the setup browser state tool. We'll take a closer look at what these do in just a moment. 
During the installation, it also asks for your API key, so you'll need to paste that in. As part of the setup, it launches a browser instance, which means Playwright was installed as well. One important thing they mention at the end of the installation is that you need to restart whichever app you're using for everything to work correctly. If you skip this step, it might not appear or function as expected. If you open cursor and it's not working, try hitting the refresh button. That usually solves it. If it still doesn't show up, just close and reopen cursor and that should take care of the issue. So, this is a website I quickly put together just to test this tool. We're going to run some tests on it. And the main area we'll be focusing on is the login and signin process. Let me go ahead and log in. All right. Now that we're signed in, this is the dashboard. It has a really clean look because I built it using Aceternity UI, which is a great UI library. There isn't much else going on here aside from that, but our main goal is to test the login functionality. So, for now, let's go ahead and sign out and I'll show you what you can do next. First, let me give you a little background on the tool. Right now, it includes two main components. The web eval agent and the setup browser state. If we take a step back, you'll see that each one serves a different purpose. The web eval agent acts as an automatic emulator that uses the browser to carry out any task you describe in natural language. And it does this using Playwright. On the other hand, the setup browser state lets you sign into your browser once if the site you're testing requires authentication. So you won't have to handle that manually each time. These are the two core tools that come with the setup. Now let us talk about what arguments these tools actually require. The main tool which is the web eval agent first requires a URL. This is the address where your app is running. If it is hosted elsewhere, you can enter the corresponding URL where your app is live. The next argument is the task. This is a natural language description of what you want the agent to do. You do not need to include any technical details. Just describe what a normal user would do while interacting with your app. There is also an argument called headless browser. By default, it is set to false, which means you will see the browser window while the agent performs the task. If you want to run it silently in the background without opening a visible window, you can set this to true by instructing cursor. Now, let us look at the setup browser state tool. This one does not really require anything. The URL is optional. Its main function is to let you sign in once and it will save that browser session so you do not need to log in again the next time you run your tests. So I ran a login test just to make sure everything was working properly. And here is how it went. First I asked it in simple language to test the login. If cursor understands the task, it automatically translates that instruction into step-by-step actions. All you need to do is describe it naturally and it fills in the argument with the required details. In my case, the app was running locally on port 3000. So I did not have to provide any technical arguments. I just told it where the app was running and cursor handled everything else. This is the MCP tool call that was made. You can see that we invoked the web eval agent tool. Both the URL and the task were filled in automatically.
The task itself was broken down into step-by-step instructions and I did not have to write any detailed logic for that to happen. You will also notice that headless browser was set to false, which meant I could actually see the browser performing the actions live instead of running quietly in the background. It opened a dashboard that acted like a control center. On the left side, it showed a live preview of what was happening. Even if you run it in headless mode, you can still go to the dashboard and watch the process in real time. There was a status tab that showed the current state of the agent, although it did not give many details about the actual testing steps. In the console tab, however, all the logs were visible. It also captured and displayed every network request and response. This gave us full visibility into the test results. Errors, logs, screenshots, and everything else were sent back to cursor. The results from the login test showed that everything was working correctly. It successfully went through the entire flow by creating an ID, signing up, logging in, and then logging out. However, it did not test any edge cases. Right now I do not think there is any protection built in for those kinds of situations like when a user enters something invalid. So the next step is to ask cursor to generate some edge cases for the login test. Run them through this agent and make sure those cases are handled correctly. This is the workflow we are trying to set up using this MCP. If you're enjoying the video, I'd really appreciate it if you could subscribe to the channel. We're aiming to hit 25,000 subscribers by the end of this month and your support really helps. We share videos like this three times a week, so there's always something new and useful for you to check out. Okay, so here is how I extensively tested the login functionality. The tests are still running, and as you can see, it is still generating results. If I go into the dashboard and open the control center, you can see the tests in progress right here. There is a live preview on one side, the agent status on the other, along with the console logs and network requests. Everything is clearly displayed and the tests are running in the background. Here is essentially what I did. I asked it to write test cases for the login functionality including edge cases. I created a file called login test cases and it generated all the test cases inside that file. You can see there is a section for the actual result and the test cases are being parsed one at a time. The first test case was parsed then the second and so on. Right now I think it is on test case 5 because that section has not updated yet. What happens is that the agent performs a test case then comes back and edits the result directly into the file. At this point cursor has generated around 28 test cases all of them very granular. Every small detail is checked to make sure nothing is overlooked. This is how real software development works. Every edge case is accounted for because as the app grows the likelihood of bugs increases a lot. This is how you can make sure everything works correctly while you are still building. You are not missing anything and every possible scenario is covered. You just write the test cases and Playwright handles the actual testing. Right now the browser is not visible because I ran the tests in headless mode. After writing the file, I told the agent to go ahead and test each use case and then returned to mark each one as passed or failed.
I set the app location to localhost on port 3000, provided the URL and enabled headless mode. You can see that the tests have started running and right now let me check. Yes, it is currently on test case 10. It has been about 7 minutes since the test started and it has already worked through 10 use cases. Now I do want to mention something important. Using AI for testing is a slow process. It does take time but the advantage is that I do not have to write any scripts. This is a simple example but it can be applied in any situation. The AI looks at the tags, writes the code and runs the tests automatically. All you have to do is provide the use cases and it tests them, reports back and updates the results. This may be a small implementation, but it could easily grow into a complete system that tracks which test cases were passed, which ones were missed and automatically handles the rest. And finally, I wanted to mention Claude 4. It was released just 2 or 3 days ago at the time of this recording. And I have to say the model is really impressive. So far, I have genuinely enjoyed using it. It does not have the same frustrating issues that earlier models did and overall it has been a great experience working with Claude 4 Sonnet. And here are the final results. A total of nine tests were not executed. This was either because they required specific tools that were not available, needed some manual configuration inside the browser, or were skipped due to other limitations in the environment. Regardless of the reason, those nine tests did not run. Out of the remaining tests, 60% passed successfully. And this is exactly the kind of process you want in place during development. If any of the tests had failed or if the outcome had not matched the expected results, that information would have been sent back to cursor. From there, we could have identified the issue and fixed it right away. This entire workflow can now be reused. Whether you are building a new feature, working on a specific component, or testing your full application, this same process applies and helps ensure everything is functioning as expected. That brings us to the end of this video. If you'd like to support the channel and help us keep making tutorials like this, you can do so by using the super thanks button below. As always, thank you for watching and I'll see you in the next one.