Key Points
- AI coding agents like Claude Code are not black magic: one can be built in about 200 lines of code
- You only need basic programming skills and access to a large language model (LLM) to build a coding agent
- Large language models are stateless text-in, text-out systems that generate responses one token at a time
- OpenAI SDKs provide a thin abstraction layer above LLMs for easier programming
- Agentic AI enriches basic LLMs by giving them the ability to sense and impact their environment
- Coding agents can read files, write code, run shell scripts, and execute tests
- Conversation memory is maintained by storing all messages and responses in the agent code, not the LLM
- Tool calling works by sending hidden system prompts to the model that describe the available tools and the expected JSON response format
- When an LLM wants to use a tool, it responds with JSON requesting the tool call instead of regular text
- The agent parses tool call requests, executes the actual tools, and sends results back to the LLM
- Tools can include file operations (read/write), code execution, and shell commands
- PowerShell access can give an agent "god mode" capabilities to perform any system operation
- Tool specifications follow formats such as OpenAI's function-calling schema; MCP (Model Context Protocol) is an emerging standard for exposing tools to agents
- The core components are: HTTP client, game loop (main loop), conversation history storage, and tool calling support
- Everything else in commercial coding agents is essentially "bells and whistles" on top of these fundamentals
- Local models like Qwen 2.5 Coder can be used instead of expensive frontier models for learning purposes
- The agent maintains conversation state by appending user messages, assistant responses, and tool results to a message list
- Each tool call becomes another message type in the conversation history that gets sent with every new prompt
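The loop described in the points above can be sketched in a few dozen lines. This is a minimal illustration, not the implementation from the video: the LLM call is stubbed out with a hypothetical `fake_llm` function (a real agent would make an HTTP request to a model API here), and the single `read_file` tool reads from an in-memory fake filesystem so the sketch is self-contained and runnable.

```python
import json

def read_file(path):
    # One example tool; a fake filesystem stands in for real file access.
    fake_fs = {"main.py": "print('hello')"}
    return fake_fs.get(path, "<file not found>")

TOOLS = {"read_file": read_file}

def fake_llm(messages):
    # Stand-in for the stateless LLM: it receives the FULL message history
    # on every call. Here it first "decides" to call a tool, and once a
    # tool result is present in the history it answers in plain text.
    if not any(m["role"] == "tool" for m in messages):
        return {"role": "assistant", "content": None,
                "tool_call": {"name": "read_file",
                              "arguments": json.dumps({"path": "main.py"})}}
    return {"role": "assistant", "content": "main.py just prints 'hello'."}

def run_agent(user_prompt):
    # Conversation state lives in the agent, not the model.
    messages = [{"role": "user", "content": user_prompt}]
    while True:                              # the "game loop"
        reply = fake_llm(messages)           # send the whole history each time
        messages.append(reply)
        call = reply.get("tool_call")
        if call is None:                     # plain text means we are done
            return reply["content"], messages
        args = json.loads(call["arguments"])
        result = TOOLS[call["name"]](**args)          # agent runs the tool
        messages.append({"role": "tool", "content": result})  # feed result back

answer, history = run_agent("What does main.py do?")
```

Note how each turn, including the tool result, becomes just another entry in `messages`: that list is the entire conversation memory, resent to the (stateless) model with every call.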
Full Transcript