Many people say that reasoning = coding, and that O3 is the model that writes code best.
My view is that reasoning is the ability to take a simple, clean question and produce a genius answer.
Let's put it this way: if O3 were thrown into the 20th century, it would undoubtedly be the most outstanding theoretical computer scientist in the world, easily mastering 3-SAT, max flow, min cut, red-black trees, LU decomposition, KMP, and various provably secure encryption algorithms, constructing the entire TCS edifice in one go.
Solving TCS problems essentially means solving abstract mathematical, computational, and topological problems, which draws on much the same ability as "solving math problems."
(That said, solving CS problems is not the same as solving math problems; CS is not mathematics, and CS has no direct relationship with pure math.)
However, the ability to write code in real day-to-day work is completely different from researching theoretical computer science problems; they are two entirely different abilities, working modes, and ways of thinking.
Real-world coding ability is not just about building systems; it also demands strong tolerance for pressure, a good memory, and continuous hands-on configuration, testing, debugging, and profiling.
You not only have to jump around a codebase as you read it and interact with machines, but also interact with colleagues, wade through large amounts of documentation, work across different configuration environments, and deal with all kinds of dependency documents, then untangle these tangled relationships one by one, remember them, and gradually explore each module and its functionality.
This is completely different from designing a simple, clean, genius, outstanding TCS algorithm.
Moreover, you should never think that an architect is solving high-level, abstract, clean, perfect mathematical problems.
A truly qualified architect is precisely the one who gets their hands the dirtiest, touches the most technical detail, and does the most debugging and profiling, then continuously summarizes and reflects on those repetitive, tedious tasks, using those dirty hands to keep feeling their way toward the right architectural and design choices.
Those who say "only a true architect needs O3-level intelligence" are completely out of their depth.
Currently, every project building coding agents runs into the same hard deadlock: the context window is too small. A few files can be fed in, but the entire codebase cannot.
Many people are focused on giving agents memory, but memory by itself does not solve any of the problems specific to coding.
The current level of coding agents on the market is roughly as follows:
If you show them a well-defined, clean, simple, high-difficulty problem, they can provide you with a very elegant solution through step-by-step reasoning;
However, if you hand them a giant project with 200,000 lines of code, they simply cannot get started; a rough calculation below shows why.
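To make the mismatch concrete, here is a back-of-the-envelope estimate; the tokens-per-line figure and the context size are illustrative assumptions, not measurements:

```python
# Rough estimate: why a 200k-line repo cannot fit into one prompt.
LINES = 200_000
TOKENS_PER_LINE = 10       # assumption: code averages roughly 10 tokens per line
CONTEXT_WINDOW = 128_000   # assumption: a typical frontier-model context size

repo_tokens = LINES * TOKENS_PER_LINE      # ~2,000,000 tokens
coverage = CONTEXT_WINDOW / repo_tokens    # ~0.064

print(f"repo ≈ {repo_tokens:,} tokens; one context window covers ≈ {coverage:.0%}")
```

Under these assumptions a single prompt sees roughly 6% of the repository, which is exactly the gap the two workarounds below try to paper over.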
The authors of some coding agents therefore use various RAG methods to feed the model a pile of fragments, hoping it can conjure the answer few-shot from those fragments, which inevitably produces errors (e.g., Cursor, Windsurf).
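In sketch form, that fragment-feeding pipeline looks like the following; the hashed bag-of-words embedding is a deliberate stand-in for a learned code-embedding model, and nothing here reflects any specific product's internals:

```python
import numpy as np

def embed(text: str, dim: int = 256) -> np.ndarray:
    # Placeholder embedding: a hashed bag-of-words vector.
    # Real pipelines would use a learned code-embedding model here.
    v = np.zeros(dim)
    for tok in text.split():
        v[hash(tok) % dim] += 1.0
    n = np.linalg.norm(v)
    return v / n if n else v

def top_k_chunks(repo_chunks: list[str], query: str, k: int = 8) -> list[str]:
    # Retrieve the k fragments most similar to the query; these fragments,
    # not the whole repository, are all the model ever sees.
    q = embed(query)
    return sorted(repo_chunks, key=lambda c: -float(embed(c) @ q))[:k]
```

The failure mode is visible right in `top_k_chunks`: whatever retrieval misses, the model never sees, so it has to guess at the missing pieces.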
On the other hand, some coding-agent authors try to guide GPT-4o step by step through design-driven or test-driven development, spending heavily to ensure each step supplies GPT-4o with enough information, then waiting for it to take its next action: adding a file, modifying a file, or executing, compiling, and testing in the terminal (e.g., Devin).
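That second approach is, in caricature, a tool-use loop; the action schema and function names below are my illustration, not Devin's actual implementation:

```python
import subprocess

def agent_loop(model_call, task: str, max_steps: int = 50) -> None:
    # Drive the model one action at a time, feeding every observation
    # (file writes, compiler output, test output) back into its context.
    history = [{"role": "user", "content": task}]
    for _ in range(max_steps):
        # Assumed contract: model_call returns a dict such as
        # {"type": "write_file" | "run" | "done", ...}
        action = model_call(history)
        if action["type"] == "done":
            break
        if action["type"] == "write_file":
            with open(action["path"], "w") as f:
                f.write(action["content"])
            observation = f"wrote {action['path']}"
        else:  # "run": execute, compile, or test in the terminal
            proc = subprocess.run(action["cmd"], capture_output=True, text=True)
            observation = proc.stdout + proc.stderr
        history.append({"role": "tool", "content": observation})
```

The resource cost lives in that `history` list: every observation has to be carried forward, so the context grows with each step and the loop eventually collides with the same window limit.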
An even more troublesome issue is that in reality, the vast majority of engineers also have to deal with AWS, databases, assorted private keys and permissions, and all kinds of containers; in other words, they interact with different environments and different people.
This work outside of coding either has to go through a human proxy that prompts a person to intervene at the right moment (which is very complex and requires real-time monitoring), or you hand the agent every password, account, and permission and let it decide when to act (which is very dangerous).
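A common middle ground is an approval gate around any credentialed action: the agent proposes, a human disposes. A minimal sketch, with a deliberately crude keyword check standing in for a real risk policy:

```python
import subprocess

# Assumption: a naive keyword heuristic; a real system needs actual policy.
SENSITIVE = ("aws", "psql", "docker", "ssh", "secret", "token")

def execute_with_gate(cmd: list[str], approve) -> str:
    # Run a command, but pause for human sign-off whenever it looks like
    # it touches credentials, databases, or external environments.
    if any(word in " ".join(cmd).lower() for word in SENSITIVE):
        if not approve(cmd):  # human-in-the-loop: block until a person decides
            return "blocked by reviewer"
    proc = subprocess.run(cmd, capture_output=True, text=True)
    return proc.stdout + proc.stderr

# Usage: `approve` could ping a reviewer; here it simply asks on stdin.
result = execute_with_gate(
    ["aws", "s3", "ls"],
    approve=lambda cmd: input(f"allow {' '.join(cmd)}? [y/N] ").lower() == "y",
)
```

The dilemma above maps directly onto this function: keep the gate and a human must watch in real time; delete it and the agent holds every credential.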
In summary, one point I repeatedly emphasize:
LLMs and current agent technology can replace many TCS (theoretical computer science) PhDs,
but they cannot replace programmers whose work is even slightly more complex, including PhDs who design complex systems (MLSys included).
So this is also why I have believed in Moonshot from the very beginning: in areas such as coding and legal work, the context window is the binding limit.
If you believe in scaling laws, you should not only believe in multi-agent and parallel task scheduling, but also believe that the context window issue will gradually be resolved.
If the context window is not solved, or if you insist that fine-tuning matters more than the context window, then many problems will remain completely stuck, and this will become the real bottleneck of this wave of AI, LLMs, and vertical AI agents.
This article is synchronized and updated to xLog by Mix Space
The original link is https://blog.kanes.top/posts/ArtificialIntelligence/reasoningcoding