[00:00] Even if you're a developer, you might still be thinking about AI wrong. The reason is that it's not just ChatGPT, it's not just agents, and it's not just your Cursor code editor helping you write code better. It's an entire platform with multiple levels of the stack that you can develop on, like other platforms such as iOS or the web. Having a solid development environment to take advantage of this coming opportunity is super important, and that's the reason I picked up an RTX laptop. Personally, I got the ASUS ROG Zephyrus with an RTX 4070, which isn't just good for gaming: it has a lot of advantages when you're running local LLMs and trying to stay ahead of the curve on this stuff and be well positioned for the coming months and years.
[00:45] So we're going to talk about the AI stack, why you would even want to run LLMs locally, and also some fun quality-of-life stuff you can do with this laptop, because I've personally been a Mac user for 10 years and this is finally a valid reason for me to upgrade and even switch to Windows.
[01:03] I'll say now, this video is sponsored by NVIDIA, who recently made some jaw-dropping announcements at CES: new GPUs coming out, and super interesting small computers that can run insane models. I'm really happy they got in touch, because I'm now working full-time on various AI apps and I really think it's still so early. So let's dive into this.
[01:22] So running models locally seems like a pain; why would you want to do it? Well, it's actually super easy: you just go to ollama.com, download the pre-trained open-source model of your choice, and run it locally on your computer. But trying to run this on a Mac, you're going to encounter one issue: you need to be able to fully load the model, at its full size, into memory. So let's say it's 8 GB; you need at least 8 GB of VRAM in your GPU to run that model, at least at a speed that's actually viable, useful, and comparable with APIs.
[01:53] Beyond just VRAM, there's a reason NVIDIA is leading the industry and is the standard for all the large AI companies: their architectures are also highly optimized. You don't have to take it from me; you can go to Ollama right now and download one of the models for free. It's going to be slow unless you can load it fully into your GPU. The way you check that is to start running your model and then run the command ollama ps. You'll see a utilization breakdown of CPU versus GPU. If even 10 to 20% is being loaded onto your CPU, then because you don't have the parallel capabilities of the GPU, your model is going to run dramatically slower, like 3 to 5 or even 10 times, depending on how big it is.
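If you'd rather check this programmatically, the local Ollama server exposes the same information over HTTP. Here's a minimal sketch, assuming a default install listening on localhost:11434; it reads the size and size_vram fields that the /api/ps endpoint reports for each loaded model:

```python
import requests

# Ask the local Ollama server which models are loaded and where.
resp = requests.get("http://localhost:11434/api/ps", timeout=5)

for m in resp.json().get("models", []):
    size, vram = m["size"], m["size_vram"]
    gpu_pct = 100 * vram / size if size else 0
    print(f"{m['name']}: {gpu_pct:.0f}% of weights in VRAM "
          f"({vram / 1e9:.1f} of {size / 1e9:.1f} GB)")
```

Anything short of 100% here means part of the model spilled over to system RAM and the CPU is doing some of the work.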
[02:35] Now, of course, you could also go to an API provider instead. You can see this test I ran on a pretty fast network, with the response time coming out around 2x higher, so two times. Yeah, it makes a huge difference if you're developing, or if you're running long, complex AI flows.
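If you want to reproduce a rough version of that comparison, timing a local call is only a few lines. A minimal sketch, assuming Ollama's /api/generate endpoint and a model you've already pulled (the model name and prompt are just placeholders); time the equivalent call to whatever hosted API you use the same way:

```python
import time
import requests

payload = {"model": "llama3.2",
           "prompt": "Explain VRAM in one sentence.",
           "stream": False}

t0 = time.perf_counter()
r = requests.post("http://localhost:11434/api/generate", json=payload, timeout=120)
dt = time.perf_counter() - t0

tokens = r.json().get("eval_count", 0)  # generated tokens, as reported by Ollama
print(f"{dt:.2f}s total, {tokens} tokens, {tokens / dt:.1f} tok/s")
```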
[02:56] But the more important consideration, or bottleneck, when comparing this to an API is the cost aspect. OpenAI's pricing is based on tokens, and when you're running an agent or a complex task, you're usually feeding the entire history into each context window. When you do this, each call is going to eat up a pretty substantial number of tokens, especially if you're using the newer models like o1. And say you're building an agent: each agent might make 10, 50, or 200 LLM calls to complete a task.
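To see why resending the whole history adds up, here's a back-of-the-envelope sketch. All the numbers are made-up assumptions for illustration, not OpenAI's actual rates:

```python
# Cost model for an agent that resends its full history on every call.
system_tokens = 500    # fixed system prompt (assumed)
step_tokens = 300      # new tokens appended per step (assumed)
calls = 100            # LLM calls to finish the task (assumed)
price_per_1k = 0.01    # hypothetical $ per 1K input tokens

# Call i carries the prompt plus everything accumulated so far,
# so total input tokens grow quadratically with the number of calls.
total_input = sum(system_tokens + i * step_tokens for i in range(calls))

print(f"input tokens: {total_input:,}")                           # 1,535,000
print(f"approx cost:  ${total_input / 1000 * price_per_1k:.2f}")  # $15.35
```

Run a task like that a few hundred times during development and the bill is real money; against a local model, the marginal cost is zero.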
[03:26] So let's come back to AI being a new platform. In my mind, it breaks down into three levels. The highest level is just LLM calls, and orchestrating them into complex workflows and tasks, like agents, which Y Combinator has said could actually be 10 times bigger than the whole SaaS industry. Whether or not you believe that, agents are going to be part of the future.
[03:48] And when it comes to building agents, being able to do this locally, even if just for development purposes, is hugely advantageous. Coding your own orchestrator and agent flow on your local computer is a great project, even if you're just trying to get hired in the coming years. I personally coded an agent for this video to find anyone's LinkedIn profile from a broad query: my agent will do web scraping, it will analyze LinkedIn profiles, and it will crawl the web for me, completely for free. So I think this is a really good AI project to start out with; the sketch below shows the shape of the core loop.
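This isn't the agent from the video, just a minimal sketch of the orchestrator pattern against a local model via Ollama's /api/chat endpoint. The model name, the SEARCH: protocol, and the search_web stub are all assumptions you'd swap for your own scraper or search API:

```python
import requests

OLLAMA = "http://localhost:11434/api/chat"

def ask(messages):
    # One non-streaming chat call to the local Ollama server.
    r = requests.post(OLLAMA, json={"model": "llama3.2",
                                    "messages": messages,
                                    "stream": False}, timeout=120)
    return r.json()["message"]["content"]

def search_web(query):
    # Hypothetical tool: plug in your own scraper or search API here.
    return f"(stub results for: {query})"

messages = [
    {"role": "system", "content":
     "You are a research agent. To look something up, reply with exactly "
     "one line: SEARCH: <query>. Otherwise, give your final answer."},
    {"role": "user", "content": "Find this person's LinkedIn profile: ..."},
]

for _ in range(10):  # hard cap on LLM calls
    reply = ask(messages)
    messages.append({"role": "assistant", "content": reply})
    if reply.strip().startswith("SEARCH:"):
        query = reply.split("SEARCH:", 1)[1].strip()
        messages.append({"role": "user", "content": search_web(query)})
    else:
        print(reply)
        break
```

The whole trick of an orchestrator is in that loop: the model decides the next action, your code executes it, and the result goes back into the conversation.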
[04:24] But that's still the highest level of the stack. Go one level down and things get even more interesting. With NVIDIA's AI Workbench, optimized specifically for their chips, you have an entire suite of tools to play with. The most interesting part, for me at least, is the ability to fine-tune models. In other words, you can take a smallish model like Llama 3.2 and it can become almost as performant as a larger model, because you've made it specialized. Fine-tuning is critical, because when you consider the AI space, so many people are just using generic larger models that aren't customized. You probably know what fine-tuning is, but when you can do it locally, it really makes those smaller models quite formidable, and then you combine that with the other benefits.
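For a sense of what a local fine-tune involves, here's a minimal LoRA sketch using the Hugging Face transformers and peft libraries, one common way to do this on a single consumer GPU (AI Workbench packages flows like this for you). The model ID, data file, text field, and hyperparameters are placeholder assumptions:

```python
from datasets import load_dataset
from peft import LoraConfig, get_peft_model
from transformers import (AutoModelForCausalLM, AutoTokenizer,
                          DataCollatorForLanguageModeling,
                          Trainer, TrainingArguments)

base = "meta-llama/Llama-3.2-1B"  # placeholder model ID
tok = AutoTokenizer.from_pretrained(base)
tok.pad_token = tok.eos_token
model = AutoModelForCausalLM.from_pretrained(base, device_map="cuda")

# LoRA trains small low-rank adapter matrices instead of all the weights,
# which is what makes this feasible on a laptop GPU.
model = get_peft_model(model, LoraConfig(
    r=16, lora_alpha=32, target_modules=["q_proj", "v_proj"]))

data = load_dataset("json", data_files="my_task.jsonl")["train"]
data = data.map(lambda ex: tok(ex["text"], truncation=True, max_length=512))

Trainer(model=model, train_dataset=data,
        data_collator=DataCollatorForLanguageModeling(tok, mlm=False),
        args=TrainingArguments("out", per_device_train_batch_size=1,
                               gradient_accumulation_steps=8,
                               num_train_epochs=1, fp16=True)).train()
```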
[05:07] On the lowest level, you have actual CUDA programming: you can run low-level code on your GPU hardware, and the mind-blowing thing is that the parallelization capabilities are insane. When you think about new AI graphics or procedurally generated games, this is where CUDA programming is going to be really interesting going into the future. A really good, simple example, if you're struggling to understand why it's useful, is something like FFmpeg. Most experienced programmers know it's for video processing: modifications, basically manipulating video files. There are a lot of operations required to modify the individual frames, but using CUDA, using the GPU, you can parallelize this huge set of tasks and your performance can go way up in terms of speed; there's a small kernel sketch below. So this is low-level programming, and if you can wrap your head around it, there are going to be huge opportunities here for that reason; if you can position yourself now to learn CUDA well, it's going to be insane.
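To make the frame example concrete, here's a minimal per-pixel GPU kernel written with Numba's CUDA support, one convenient way to write CUDA kernels from Python. The brightness operation and frame dimensions are just illustrative:

```python
import numpy as np
from numba import cuda

@cuda.jit
def brighten(frame, out, gain):
    # Each GPU thread handles exactly one pixel, so a 1080p frame's
    # ~2 million pixels are processed in parallel instead of in a CPU loop.
    y, x = cuda.grid(2)
    if y < frame.shape[0] and x < frame.shape[1]:
        out[y, x] = min(frame[y, x] * gain, 255.0)

frame = (np.random.rand(1080, 1920) * 255).astype(np.float32)  # stand-in frame
out = np.empty_like(frame)

threads = (16, 16)
blocks = ((frame.shape[0] + 15) // 16, (frame.shape[1] + 15) // 16)
brighten[blocks, threads](frame, out, 1.2)  # Numba copies the arrays to the GPU
```

That same pattern, thousands of threads each owning one small piece of the data, is what makes both video processing and neural-network inference so fast on a GPU.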
[06:07] So how about the fun stuff, the things everyone can take advantage of, whether you're a developer or not, things I've enjoyed personally? The first one is games. Obviously games run really well on a 4070, but with NVIDIA frame generation you also have your GPU filling in gaps with AI in real time, so you get a better frame rate than what your native hardware is actually capable of; as the frames are rendered, everything seems smoother, and this is done dynamically. NVIDIA also has upscaling, even for YouTube: if you're watching a YouTube video at 480p or 1080p, right in your browser, the GPU will actually be able to increase the quality of that video beyond even its compressed size. It improves the image the same way upscaling models do, but this is done in real time, and it's really mind-blowing.
[07:01] So I don't know if it's just me, but I think the sooner you can dive into this stuff, and actually just embrace that we have this new platform, these new tools, these new types of software, these laptops, the better. For actually having a developer environment, this is kind of what you need, and that's the reason I switched. Let me know what you think of this video, and shout out to NVIDIA for sponsoring. I hope to see you guys in the next one; we'll do an agent workflow on the laptop very soon. So I'll see you in the next one.