[00:00] (0.12s)
this is going to sound dramatic but if
[00:01] (1.76s)
you're in tech, if you're a programmer
[00:03] (3.28s)
building apps or just interested I
[00:05] (5.80s)
really think if you don't understand
[00:07] (7.84s)
this then you're going to get completely
[00:10] (10.16s)
left behind in the coming years. Y
[00:12] (12.28s)
Combinator is saying this is going to be
[00:14] (14.24s)
10 times bigger than SaaS. It actually led
[00:16] (16.88s)
to me switching to Windows for the first
[00:19] (19.16s)
time ever and buying a new laptop when
[00:21] (21.24s)
you think of AI apps you probably think
[00:22] (22.72s)
of, okay, ChatGPT, OpenAI, but once or twice
[00:26] (26.40s)
you might have heard of AI agents and
[00:28] (28.44s)
this article by Anthropic lays it out in
[00:30] (30.88s)
a really solid way you can think of
[00:33] (33.04s)
Agents as composable building blocks
[00:35] (35.92s)
similar to patterns in programming, and with
[00:38] (38.44s)
these building blocks you can either
[00:39] (39.64s)
build workflows which augment your code
[00:42] (42.96s)
and replace functions and can do a set
[00:45] (45.00s)
of actions or you have agents which are
[00:46] (46.92s)
a level higher they orchestrate your
[00:50] (50.04s)
workflows your functions so basically
[00:51] (51.96s)
they choose what actions to take and
[00:54] (54.40s)
then based on the result of that action
[00:55] (55.92s)
they choose what to do next.
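To make that distinction concrete, here is a minimal agent-loop sketch in Python; call_llm and the two workflow functions are hypothetical placeholders, not code from Anthropic's article or from any specific library.

```python
# Minimal agent-loop sketch: the model picks the next workflow based on past results.
# call_llm() and both workflow functions are hypothetical stand-ins.

def call_llm(prompt: str) -> str:
    raise NotImplementedError  # plug in OpenAI, Ollama, etc.

def search_google(query: str) -> str:
    return f"search results for {query!r}"  # placeholder

def crawl_site(url: str) -> str:
    return f"page text from {url}"  # placeholder

WORKFLOWS = {"search_google": search_google, "crawl_site": crawl_site}

def run_agent(goal: str, max_steps: int = 10) -> list[str]:
    history: list[str] = []
    for _ in range(max_steps):
        # The agent chooses which action (workflow) to take next, given past results.
        choice = call_llm(
            f"Goal: {goal}\nResults so far: {history}\n"
            f"Reply with one of {list(WORKFLOWS)} plus an argument, or DONE."
        )
        if choice.strip().upper().startswith("DONE"):
            break
        name, _, arg = choice.partition(" ")
        history.append(WORKFLOWS[name](arg))  # the result feeds the next decision
    return history
```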
[00:57] (57.80s)
Now these building blocks I spoke about, they're
[00:59] (59.36s)
very similar to concepts you already
[01:01] (61.40s)
know if you know anything about
[01:02] (62.64s)
programming you have prompt chaining
[01:04] (64.72s)
which is the same as doing multiple
[01:06] (66.44s)
function calls with optional error
[01:08] (68.36s)
handling. Evaluator-optimizer is just a
[01:11] (71.24s)
loop. Routing is like parallel, concurrent,
[01:14] (74.60s)
async programming which helps improve
[01:16] (76.76s)
your performance by running multiple
[01:18] (78.24s)
things at once and orchestration and
[01:20] (80.36s)
synthesis is basically what data engineers
[01:22] (82.28s)
do: you take large data sets and you
[01:24] (84.00s)
transform them into a more useful
[01:25] (85.72s)
structured format.
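As a rough illustration of prompt chaining being just sequential calls, here's a minimal sketch; call_llm is again a hypothetical stand-in for whatever model you're calling.

```python
# Prompt-chaining sketch: each LLM call's output feeds the next call,
# with an optional check ("gate") in between. call_llm() is a hypothetical helper.

def call_llm(prompt: str) -> str:
    raise NotImplementedError  # plug in any model here

def chained_summary(topic: str) -> str:
    outline = call_llm(f"Write a 3-point outline about {topic}.")
    if len(outline.splitlines()) < 3:  # optional error handling between steps
        raise ValueError("Outline too short; retry or fall back")
    draft = call_llm(f"Expand this outline into a short summary:\n{outline}")
    return call_llm(f"Tighten the wording of this summary:\n{draft}")
```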
[01:27] (87.76s)
As we can see, there's a big catch with these
[01:29] (89.92s)
types of workflows and systems: they often trade
[01:31] (91.84s)
latency and cost for better task
[01:33] (93.96s)
performance so agents are expensive in
[01:36] (96.12s)
time and money; they take a while to run
[01:37] (97.76s)
because you're doing all these back-to-back
[01:39] (99.28s)
tasks and you're doing tons of LLM calls,
[01:41] (101.92s)
maybe recursively in a loop and feeding
[01:44] (104.12s)
in large context because your agent
[01:45] (105.96s)
needs to understand the past actions
[01:48] (108.20s)
that it's already taken but and this is
[01:50] (110.08s)
super interesting, there is a way
[01:52] (112.08s)
completely around this and it's why I
[01:53] (113.68s)
bought the new laptop it allows you to
[01:55] (115.84s)
run LLMs on your computer for free and a
[01:58] (118.52s)
lot faster. Everyone is saying it: it's
[02:00] (120.40s)
not AI that's going to take your job
[02:02] (122.04s)
it's someone that knows AI better than
[02:03] (123.76s)
you do if that's true at all what we're
[02:05] (125.44s)
doing here learning is absolutely key
[02:08] (128.20s)
for securing your future career in other
[02:10] (130.16s)
words you want to thrive you got to
[02:11] (131.40s)
learn this stuff as much as possible now
[02:13] (133.44s)
the most serious courses that I've come
[02:15] (135.20s)
across when I was searching around
[02:17] (137.08s)
trying to learn are Simplilearn's AI and
[02:19] (139.68s)
machine learning courses give me just a
[02:21] (141.68s)
minute, because if you want to go deep I think
[02:23] (143.12s)
they're worth checking out and just a
[02:24] (144.84s)
heads up this video is sponsored by
[02:26] (146.88s)
Simplilearn. One of the best ones I came
[02:28] (148.84s)
across was the Microsoft-backed AI
[02:30] (150.84s)
engineer course because it covers
[02:32] (152.68s)
everything from generative AI to deep
[02:34] (154.72s)
learning prompt engineering and more
[02:36] (156.96s)
there's over 25 projects and a Capstone
[02:39] (159.96s)
so you'll walk away with a lot of
[02:41] (161.32s)
hands-on experience then there's
[02:42] (162.92s)
electives which really let you
[02:44] (164.52s)
specialize: advanced generative AI, NLP, and
[02:48] (168.56s)
even preparing for the Microsoft Azure
[02:50] (170.60s)
certification exam and in the end you
[02:52] (172.44s)
even get a certificate from Microsoft
[02:54] (174.28s)
and if you're curious about reviews 4.5
[02:56] (176.36s)
on SwitchUp, 4.4 on Career Karma; you can
[02:59] (179.24s)
check those out they've also got
[03:00] (180.52s)
financing options so I would encourage
[03:02] (182.20s)
you if this sounds interesting at all at
[03:03] (183.76s)
least check out Simplilearn's website,
[03:05] (185.76s)
evaluate some of the different courses
[03:07] (187.44s)
and this is a really structured way just
[03:09] (189.20s)
to get fully immersed in AI so if you're
[03:11] (191.64s)
interested, check out the pinned comment or
[03:13] (193.64s)
link in the description. Thanks again to
[03:15] (195.20s)
Simplilearn for the sponsorship. Back to the
[03:17] (197.52s)
video all you need to do is go to
[03:19] (199.76s)
ollama.com and you can download a bunch of
[03:22] (202.52s)
different ones for free so running
[03:24] (204.44s)
through this really quick you just go to
[03:26] (206.12s)
the models Tab and you can see a full
[03:28] (208.72s)
list here; it goes on and on. Now the
[03:31] (211.24s)
most important part when you're running
[03:32] (212.80s)
it locally your model size has to be
[03:35] (215.72s)
less than your GPU VRAM. So this
[03:39] (219.32s)
particular card, an RTX 4070, has 8 GB, so
[03:44] (224.32s)
I have to check how big is the model not
[03:47] (227.16s)
in terms of uh parameters like this one
[03:50] (230.12s)
has 70 billion but in terms of the
[03:52] (232.40s)
actual let's say file or uh trained
[03:55] (235.36s)
model size so I can go into for example
[03:58] (238.08s)
Llama 3.3; it's going to be too big, I
[04:00] (240.52s)
already know with the 70 and the 405 um
[04:04] (244.60s)
billion parameters so what I do is I
[04:06] (246.80s)
just go into tags and I can see like
[04:09] (249.12s)
yeah, they're 40, 49, 53 GB and so on. I
[04:12] (252.88s)
already have a few installed and again
[04:15] (255.24s)
you can install a new one just by
[04:17] (257.96s)
running this command and it will
[04:19] (259.72s)
immediately start running it but I can
[04:22] (262.04s)
uh show you which ones I have installed
[04:24] (264.08s)
so ollama
[04:25] (265.52s)
ls, and you'll see that I have Qwen; I
[04:29] (269.04s)
have Llama 3 (I
[04:30] (270.24s)
actually have a compressed version right
[04:31] (271.72s)
now, just because I was testing it) and
[04:34] (274.36s)
I have LLaVA, which is for image analysis.
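If you'd rather do this size check from a script than from the CLI, Ollama also exposes a local REST API (port 11434 by default); this is a rough sketch assuming the requests package is installed and an 8 GB card.

```python
# Sketch: list locally installed Ollama models (same data as `ollama ls`)
# and compare their on-disk size to your VRAM. 8 GB is an assumption here;
# note the loaded size in memory ends up somewhat larger than the file size.
import requests

VRAM_GB = 8  # adjust for your GPU

models = requests.get("http://localhost:11434/api/tags").json().get("models", [])
for m in models:
    size_gb = m["size"] / 1e9
    verdict = "should fit" if size_gb < VRAM_GB else "too big"
    print(f"{m['name']}: {size_gb:.1f} GB -> {verdict}")
```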
[04:38] (278.52s)
So these Qwen models, you'll notice
[04:40] (280.44s)
they're actually even the compressed
[04:42] (282.60s)
ones are under 8 GB, but actually when I
[04:45] (285.32s)
run it it is not fully using my GPU so
[04:49] (289.16s)
if I run the Qwen instruct Q2 model, let me
[04:53] (293.76s)
just show you what I
[04:55] (295.96s)
mean so once you run that you get a
[04:58] (298.08s)
command prompt and you can kind of test
[04:59] (299.44s)
the speed by typing in a prompt like hello,
[05:01] (301.84s)
and even for that really simple one
[05:03] (303.20s)
you'll see there was a bit of a delay.
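A rough way to measure that delay from a script, assuming Ollama's default local API; the model name below is only an example, so use whichever one you've actually pulled.

```python
# Sketch: time a simple "hello" prompt against a locally running Ollama model.
import time
import requests

payload = {"model": "llama3.2", "prompt": "hello", "stream": False}  # example model name
start = time.time()
resp = requests.post("http://localhost:11434/api/generate", json=payload, timeout=300)
print(f"{time.time() - start:.1f}s: {resp.json()['response'][:80]}")
```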
[05:05] (305.04s)
Now if I go to a new window and I run ollama
[05:07] (307.12s)
ps, I can see the reason is that
[05:10] (310.20s)
actually when this model is running the
[05:12] (312.24s)
size expands to 7.5 GB because it needs
[05:15] (315.08s)
a little bit of extra space and then on
[05:18] (318.04s)
top of that my GPU needs some extra
[05:20] (320.20s)
space to run normal processes so while
[05:22] (322.44s)
it's able to fit 91% into the GPU that's
[05:24] (324.92s)
still going to be a pretty big
[05:26] (326.48s)
performance hit because it has to
[05:27] (327.84s)
offload things to the CPU, which is just
[05:30] (330.04s)
like exponentially slower and it's what
[05:31] (331.96s)
you'd have to do if you don't have a GPU
[05:33] (333.72s)
locally so let's just kill this one and
[05:36] (336.64s)
we'll instead run the Llama
[05:40] (340.00s)
3.2 so I just copy that model
[05:43] (343.76s)
name just running it again because it
[05:45] (345.92s)
didn't fully stop the previous one and
[05:48] (348.24s)
now when we type hello boom instant
[05:50] (350.12s)
response that's what we want now if we
[05:51] (351.92s)
check the utilization, we can see
[05:54] (354.80s)
it's 100% GPU. It expanded to 5.4 GB, but
[05:58] (358.16s)
that's okay, as long as you have this
[06:00] (360.36s)
when you run ollama ps. Companies
[06:02] (362.72s)
like this Origami Agents; it's in Y
[06:05] (365.00s)
Combinator, and in the first month I think
[06:07] (367.00s)
they're doing $100K recurring revenue
[06:09] (369.56s)
already and let's just take a look at
[06:11] (371.16s)
this company and try to build a simple
[06:13] (373.24s)
version so they do business lead
[06:15] (375.32s)
generation with custom prompting and
[06:17] (377.36s)
their whole pitch is something I've
[06:18] (378.44s)
described in the past few videos you
[06:19] (379.84s)
have virtually unlimited unstructured
[06:21] (381.84s)
data on the web that maybe doesn't exist
[06:24] (384.40s)
in a database you have access to and if
[06:26] (386.44s)
you can extract and structure this in a
[06:28] (388.24s)
useful way this is a huge opportunity
[06:30] (390.76s)
even if you're building a very Niche
[06:32] (392.24s)
agent for a very specific industry so if
[06:34] (394.84s)
we scroll down we can see some of the
[06:36] (396.16s)
queries people are able to run find
[06:37] (397.64s)
WooCommerce store owners who sell
[06:39] (399.76s)
products covered by US health
[06:42] (402.12s)
insurance here's another one you can
[06:44] (404.24s)
visit the site if you want to see them
[06:45] (405.72s)
all but let's take a look at this one
[06:47] (407.88s)
specifically. So for this one, if we
[06:50] (410.24s)
think in terms of agentic patterns,
[06:52] (412.20s)
how would we actually
[06:53] (413.92s)
achieve this with
[06:55] (415.80s)
various workflows? To make this a bit
[06:57] (417.28s)
more concrete, we have the orchestrator
[06:58] (418.72s)
at the top which first chooses what
[07:00] (420.76s)
steps we want to run so first find
[07:03] (423.08s)
products second find stores and then
[07:05] (425.04s)
third, see if they're WooCommerce or not.
[07:07] (427.88s)
and within each step we'd have sub
[07:09] (429.16s)
workflows, so we might just have a
[07:10] (430.92s)
search Google workflow and a scrape-site
[07:14] (434.00s)
workflow to extract the information we need.
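Each of those sub-workflows can just be an ordinary function the orchestrator calls; here's a bare-bones sketch, with the search backend left out and all names hypothetical.

```python
# Sketch: sub-workflows as plain functions. The search implementation is omitted
# on purpose; crawl_site just fetches raw HTML for an LLM to extract fields from.
import requests

def search_google(query: str) -> list[str]:
    """Return result URLs for a query (plug in whatever search API or scraper you use)."""
    raise NotImplementedError

def crawl_site(url: str) -> str:
    """Fetch a page so a later LLM step can pull out the structured info we need."""
    return requests.get(url, timeout=10).text
```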
[07:16] (436.28s)
so I've actually coded a simple version
[07:17] (437.92s)
of an app that's very similar to what a
[07:20] (440.48s)
B2B agent like Origami would do, but I've
[07:23] (443.96s)
given you the choice here between
[07:25] (445.28s)
running it locally for free or using the
[07:27] (447.48s)
OpenAI API. But again, I'll emphasize
[07:30] (450.04s)
that agents specifically can do a
[07:32] (452.48s)
lot of LLM calls and consume a lot of tokens.
[07:34] (454.52s)
so you can see my component files map to
[07:36] (456.68s)
the architecture that we talked about
[07:38] (458.20s)
first I have the orchestrator which has
[07:40] (460.20s)
a lot of different methods and again if
[07:41] (461.68s)
you really want to get into it just
[07:43] (463.00s)
check out the code but at the highest
[07:45] (465.24s)
level first we're generating the tasks
[07:48] (468.04s)
upfront first do this then do this then
[07:50] (470.32s)
do this, and use sub-workflows to
[07:52] (472.24s)
accomplish them then another important
[07:54] (474.00s)
method we have a prompt to select the
[07:55] (475.96s)
next workflow from the workflow
[07:57] (477.96s)
definitions in these folders which are
[07:59] (479.84s)
basically the same as just programming
[08:01] (481.92s)
functions for our workflows.
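The project's actual files aren't reproduced here, but the flow just described, plan the tasks up front and then prompt the model to pick a workflow for each one, might look roughly like this sketch; every name in it is hypothetical.

```python
# Rough sketch of the orchestrator flow described above: plan tasks up front,
# then for each task prompt the model to pick one of the workflow definitions.
# call_llm() and the workflow stubs are hypothetical, not the actual project code.

def call_llm(prompt: str) -> str:
    raise NotImplementedError

def search_google(query: str) -> str: ...
def crawl_site(url: str) -> str: ...

WORKFLOWS = {"search_google": search_google, "crawl_site": crawl_site}

def orchestrate(user_request: str) -> list:
    # 1. Generate the tasks up front: "first do this, then do this, then do this".
    tasks = call_llm(f"Break this request into ordered tasks, one per line:\n{user_request}")
    results = []
    for task in tasks.splitlines():
        # 2. A prompt selects the next workflow from the available definitions.
        choice = call_llm(
            f"Task: {task}\nResults so far: {results}\n"
            f"Pick one workflow from {list(WORKFLOWS)} and give its input."
        )
        name, _, arg = choice.partition(" ")
        results.append(WORKFLOWS[name](arg))  # run the chosen sub-workflow
    return results
```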
[08:03] (483.68s)
I could have gone super generic and just written a
[08:05] (485.44s)
crawler with a custom prompt but I
[08:07] (487.56s)
wanted to make it a little bit more
[08:08] (488.80s)
specific, where I have one
[08:10] (490.92s)
just for search Google where I can
[08:12] (492.80s)
ensure that the search results are
[08:14] (494.44s)
getting crawled and extracted correctly
[08:16] (496.60s)
so I've also added a little bit of
[08:18] (498.24s)
custom scraping code here if we're doing
[08:20] (500.24s)
B2B having something that finds LinkedIn
[08:22] (502.36s)
profiles is very useful and I've
[08:25] (505.08s)
specified here that we want to just pull
[08:26] (506.84s)
the SERPs, or Google search results, for,
[08:29] (509.08s)
let's say, a name plus company name. Let's say we
[08:31] (511.40s)
do a prompt like find me all the coding
[08:33] (513.64s)
boot camp owners in the USA and send me
[08:36] (516.12s)
their name and their LinkedIn profile so
[08:38] (518.44s)
I can message them that would be an
[08:39] (519.88s)
example use case there and then of
[08:41] (521.60s)
course we have our generic crawl site so
[08:43] (523.68s)
we have these search results for Google
[08:45] (525.92s)
then our orchestrator will decide which
[08:48] (528.52s)
ones are worth visiting that might have
[08:50] (530.52s)
the information we want so I've run this
[08:52] (532.32s)
a few times already and I'll show you
[08:53] (533.72s)
the output I put in find 10 Facebook
[08:55] (535.64s)
software engineer names and their
[08:56] (536.84s)
LinkedIn profiles and we can see just
[08:58] (538.80s)
from this prompt the model completely
[09:00] (540.32s)
ran and extracted exactly what we need
[09:02] (542.52s)
we have name profile URL and position so
[09:05] (545.08s)
software engineer software engineer yeah
[09:07] (547.80s)
pretty much all software Engineers with
[09:09] (549.92s)
direct links let me just show you
[09:11] (551.52s)
another example of this let's say you
[09:12] (552.84s)
want to find Shopify apps to Market a
[09:15] (555.52s)
specific product to them all I have to
[09:18] (558.20s)
do is say find me 10 Shopify
[09:21] (561.44s)
apps, then find their founders/owners
[09:27] (567.68s)
and their
[09:30] (570.60s)
names and LinkedIn
[09:33] (573.72s)
URLs now I don't want this video to get
[09:35] (575.84s)
too long but let's just see what it does
[09:37] (577.80s)
first. So here our orchestrator has
[09:39] (579.68s)
broken it up into two key tasks first
[09:42] (582.20s)
find the Shopify apps and names from a
[09:43] (583.84s)
Google search then get founder names and
[09:45] (585.76s)
Linkedin URLs so it's going to complete
[09:47] (587.88s)
the first one before the second one then
[09:49] (589.60s)
it selected the search Google workflow
[09:51] (591.16s)
and we can see the search query here
[09:52] (592.84s)
that'll run for a little bit and then
[09:54] (594.08s)
we'll see that our step was complete
[09:55] (595.80s)
with a summary. So it found the first
[09:57] (597.36s)
Shopify app, which is named Klaviyo, and it
[09:59] (599.72s)
is now doing another search
[10:01] (601.60s)
query for Klaviyo founders' LinkedIn, and
[10:04] (604.12s)
so far it found two results now it's
[10:06] (606.20s)
doing the same for the other apps like Privy, and it's
[10:09] (609.24s)
just going down through the list of URLs
[10:11] (611.40s)
that we found so this is going to
[10:12] (612.84s)
continue to run let me just show you the
[10:14] (614.36s)
end output with that very simple prompt
[10:16] (616.28s)
that we put in in the beginning so here
[10:18] (618.00s)
we can see a result summary and it also
[10:19] (619.72s)
saved us a JSON file. Let's open that
[10:22] (622.40s)
file and first we can see it got all the
[10:25] (625.04s)
app names and URLs. We can also
[10:27] (627.68s)
look at the summary down here
[10:28] (628.84s)
Unfortunately, it didn't 100% understand
[10:30] (630.80s)
this because for certain apps it found
[10:32] (632.72s)
more than one person so of course you
[10:34] (634.12s)
could further refine the prompt that
[10:35] (635.52s)
you're inputting to get a different
[10:37] (637.48s)
result but let's just take a look at one
[10:39] (639.64s)
of these URLs to see if it's correct and
[10:41] (641.92s)
yeah for this one at least we got the
[10:43] (643.28s)
co-founder at Sprocket. Let's just check
[10:45] (645.32s)
another one, Tomer Tagrin for
[10:48] (648.32s)
Yotpo, and yeah, we got the CEO there.
[10:51] (651.52s)
so of course it's still not perfect I
[10:52] (652.84s)
coded this agent in about 1 day but you
[10:54] (654.92s)
can probably start to see the power of
[10:57] (657.04s)
composing things together doing custom
[10:58] (658.68s)
workflows and having a really solid
[11:00] (660.64s)
orchestrator that being said if you like
[11:02] (662.24s)
this video please leave a comment so I
[11:03] (663.92s)
can make more agent videos or AI videos
[11:07] (667.28s)
and with that being said I'll see you
[11:08] (668.44s)
guys in the next one