Harvard Agentic Science

Something about AI clearly works. You have asked it questions and gotten useful answers, but you have also seen it fail in ways that are hard to ignore. Maybe it hallucinated a reference that does not exist. Maybe it made a sign error halfway through a derivation and confidently kept going. Maybe you spent twenty minutes getting it to understand your notation, closed the tab, and had to start from scratch the next day.

These are not random bugs. They come from the same underlying issue: browser-based AI has no memory, no access to your files (other than what you manually upload), and a limited window for how much information it can hold at once. Every conversation starts from zero.

The good news is that all of this is fixable. If you went through the first part of this guide, you already know what these models can do in a browser; this part is about removing the things that hold them back.

Every AI model shares the same limitation

The problem is context. When you talk to an AI in a browser, the model can only see what is in front of it right now: your current message, whatever you have attached, and the conversation so far. It does not have access to anything else. It cannot open a file on your computer. It cannot look back at a conversation you had last week. It does not know what project you are working on unless you tell it, again, every single time.

This means that the model's usefulness is capped by how much you can fit into one conversation. For short questions, this is fine. But try giving it something bigger. Upload a 300-page textbook and ask it to find where a specific derivation appears. It will confidently give you an answer, but there is a good chance the page number is wrong, the section title is made up, or it is referencing a chapter that does not exist. It is not lying. It simply cannot hold the entire book in its context at once, so it fills in the gaps with plausible-sounding guesses.

The same thing happens with any task that requires the model to keep track of a lot of information at once. The model's working memory for the current conversation is called the context window, and once that window gets crowded, the model starts losing track of your constraints and missing details. That is why a multi-step research task spanning several documents, or a long derivation with many intermediate results, is not feasible in a browser chat.

You might have noticed that ChatGPT sometimes seems to remember things about you across conversations. It does, but in a very limited way. Behind the scenes, ChatGPT maintains a short text file of things it has decided to remember about you. It cannot actually go back and re-read your old conversations. It just has this small note to itself, which it occasionally updates. That is why, if you ask "what do you remember about me?", it can give you a list, but that list is a brief summary, not real memory.

The fix

When AI learned to use tools

The big shift happened when these models were given access to tools. In a browser, the model can only work with what you paste into the chat. But when you install it locally, it can use tools the same way you would. It can open a PDF reader, a file browser, a terminal. Instead of cramming an entire 300-page textbook into its context window and hoping for the best, it can open the file with a tool, spawn multiple copies of itself to read different sections in parallel, and report back with the exact information you asked for.

This changes what AI is. It is no longer a chat window you type questions into. It is a program that can interact with other applications (like Mathematica) and access your files directly. You install it inside an isolated project folder, and from there it can read your notes, your papers, your code, and your data on its own. It can open files when it needs them, re-read things it has already seen, and ground its answers in your actual research materials instead of guessing from memory.

Luckily, software engineers have already figured this out for us. We install a program called Claude Code (from Anthropic) or Codex (from OpenAI), point it at a folder, and that's it! The next section walks you through the setup step by step.

What this makes possible

Hallucinated references are a good example of a problem that becomes solvable. Instead of trusting that the model got every citation right, you can spawn a completely separate AI whose only job is to go through the reference list and verify each one: that the paper exists, that the authors match, that the content is relevant. You can set this up to run in parallel with the main model, and it will catch mistakes before you even see them.

The pattern here is delegation. Rather than asking one AI to do everything in a single conversation, you break the work into pieces and hand each piece to a fresh model that has exactly the context it needs.
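To make the delegation pattern concrete, here is a minimal, runnable sketch in Python. The helper `ask_fresh_model` is hypothetical, a stand-in for however you would actually invoke a fresh agent (a Claude Code subagent, an API call, etc.); it is stubbed here so only the structure is illustrated.

```python
# A minimal sketch of the delegation pattern: each source document is
# handed to its own fresh model, so no single context window has to
# hold everything at once.

from concurrent.futures import ThreadPoolExecutor

def ask_fresh_model(task: str, context: str) -> str:
    """Stub for illustration. In practice this would start a new model
    instance whose context holds only `context` and `task`."""
    return f"[answer to {task!r} using {len(context)} chars of context]"

def delegate(task: str, sources: dict[str, str]) -> dict[str, str]:
    """Run the same task against each source in parallel, one fresh
    model per source, and collect the per-source answers."""
    with ThreadPoolExecutor() as pool:
        futures = {
            name: pool.submit(ask_fresh_model, task, text)
            for name, text in sources.items()
        }
        return {name: f.result() for name, f in futures.items()}

# Example: three documents, each read by a separate fresh model.
sources = {
    "chapter3.pdf": "…text of chapter 3…",
    "notes.md": "…your notes…",
    "paper.tex": "…the draft…",
}
results = delegate("find the derivation of Eq. 12", sources)
```

The key design choice is that each worker gets exactly the context it needs and nothing else; a final model can then combine the per-source answers without ever seeing the full documents.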

So what is the point?

This is not about learning to chat with AI more effectively. The goal is to reach a point where you can hand an LLM a well-defined task, trust what comes back, and easily verify every piece of it. That means an AI that knows your project, remembers your preferences, and has already checked its own work before you see the output. The rest of this guide shows you how to build that.