Deep learning has changed a lot in 2 years. I’m writing this so that I can fully realize just how starkly different the agentic large language model paradigm is when compared with the old deep learning paradigm. I’d like to officially recognize several new subfields of machine learning research. So much old thinking needs to be updated, and we need to make these updates quickly.

I’m tempted to call this the Agency Paradigm - it does feel clear, in comparison with the past paradigm, that this one will give rise to general learning systems that automate 90+% of the knowledge work economy. It is possible, though not inevitable, that this will happen very quickly - within roughly the next 2 years. It may take as long as 6-8 years to reach a reliability level that allows for full automation, rather than mere augmentation, of every knowledge work task.

The Agency Paradigm is replacing an old paradigm, which I represent with the Machine Intelligence Research Frontier.

Similar concepts: Foundation Models.

The new paradigm:

Chains
- Planning
- Agents
- Self-Correction: Automatic Evaluation, Self-Awareness.
- Tool Use
- Composition: APIs & API Chaining
Pre-Training [Base Models]
- Transformers & Attention Mechanisms
- Diffusion Models
- Multi-Modality
Post-pre-training
- RLHF [Reinforcement Learning from Human Feedback]
- Instruct Fine-Tuning
Information Access
- Context Length + Memory Implementations
- Search: Data & Dataset Access & Choice, RAG [Retrieval Augmented Generation]
Meta Software
- Neural Program Synthesis
- Prompting: Prompt Generation & Automated Prompting, Meta-Prompt

Chaining - taking sequences of actions in response to a single request from the user of an ML system - is newly possible because the reliability of a single action is now high enough that errors compounding multiplicatively across steps do not necessarily do undue damage to the final output. High task reliability means that planning - both short and long term - becomes a task with merit. Creating plans that are achievable, reliable, self-correctable, and that fulfill the goal for which the plan is made is now a central problem in building AGI. Without such plans, the intelligent system will not have a clear view of the sequence of actions that are necessary for it to fulfill its mission.
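To make the multiplicative argument concrete, here is a minimal sketch of the arithmetic. It assumes that every step succeeds independently with the same probability, which is a simplification; the function names are hypothetical and chosen only for illustration.

```python
def chain_success_probability(step_reliability: float, num_steps: int) -> float:
    """End-to-end success probability of a chain of actions.

    Assumption: each step succeeds independently with the same probability,
    so the per-step reliabilities simply multiply.
    """
    if not 0.0 <= step_reliability <= 1.0:
        raise ValueError("step_reliability must be between 0 and 1")
    if num_steps < 0:
        raise ValueError("num_steps must be non-negative")
    return step_reliability ** num_steps


def max_steps_for_target(step_reliability: float, target: float) -> int:
    """Longest chain whose end-to-end success probability stays >= target."""
    if not 0.0 < step_reliability < 1.0:
        raise ValueError("step_reliability must be strictly between 0 and 1")
    if not 0.0 < target <= 1.0:
        raise ValueError("target must be in (0, 1]")
    steps = 0
    probability = 1.0
    # Keep adding steps while the chain would still meet the target.
    while probability * step_reliability >= target:
        probability *= step_reliability
        steps += 1
    return steps


if __name__ == "__main__":
    for reliability in (0.90, 0.99):
        p = chain_success_probability(reliability, 10)
        print(f"per-step {reliability:.2f}, 10 steps -> end-to-end {p:.3f}")
```

Under these assumptions, ten steps at 90% per-step reliability succeed end to end only about 35% of the time, while ten steps at 99% succeed about 90% of the time. That gap is why small gains in single-action reliability make long chains viable.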

Agentic behavior exhibiting creativity, decisive action, identity, and decision making is now possible thanks to foundation model improvements. Models are finally making the jump from tools to living beings, in the sense of their ability to replicate, have a survival drive, and maintain their goals across multiple experiences.

Self-awareness and environmental awareness allow for a transformation in the way that models act and behave. Self-correction, or a model’s evaluation of whether or not its performance fulfills the goal or requirements given, means that models can ‘think longer’, ‘work longer’, and compile actions or insights that fulfill much more complex tasks than they were previously given.
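One way to picture self-correction is as a generate, evaluate, retry loop. The sketch below is an illustration under stated assumptions, not a reference implementation: `generate` and `evaluate` are hypothetical callables standing in for a model call and an automatic evaluator, and the toy versions exist only so the loop can run.

```python
from typing import Callable, Tuple

# Hypothetical signatures: generate(task, feedback) -> attempt,
# evaluate(task, attempt) -> (passed, feedback_for_next_round).
Generate = Callable[[str, str], str]
Evaluate = Callable[[str, str], Tuple[bool, str]]


def self_correct(task: str, generate: Generate, evaluate: Evaluate,
                 max_rounds: int = 3) -> Tuple[str, int]:
    """Retry until the evaluator accepts the attempt or rounds run out.

    Returns the last attempt and the number of rounds used.
    """
    if max_rounds < 1:
        raise ValueError("max_rounds must be at least 1")
    feedback = ""
    attempt = ""
    for round_number in range(1, max_rounds + 1):
        attempt = generate(task, feedback)
        passed, feedback = evaluate(task, attempt)
        if passed:
            return attempt, round_number
    return attempt, max_rounds


# Toy stand-ins (assumptions): a real system would call a model here.
def toy_generate(task: str, feedback: str) -> str:
    # Produce a longer draft once the evaluator has complained.
    return "done" if not feedback else "done after revising the draft"


def toy_evaluate(task: str, attempt: str) -> Tuple[bool, str]:
    # Accept any attempt with at least three words.
    if len(attempt.split()) >= 3:
        return True, ""
    return False, "too short"


if __name__ == "__main__":
    print(self_correct("write a status update", toy_generate, toy_evaluate))
```

The `max_rounds` cap is a design choice: letting a model ‘work longer’ is useful only if the loop is guaranteed to stop when the evaluator never passes an attempt.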

Learning to use tools is an essential part of completing important tasks in every human knowledge worker’s life. Communication is paramount in coordinating large-scale activity. Email, Slack, text messaging, and other communication tools will clearly be essential to LLM agents looking to gather information from or send commands to other people. Getting the information required to make decisions, take actions that influence the environment, or answer questions often requires using search tools like web search systems or searching over private databases. Putting multiple machine learning techniques together - for example, speech to text for taking a command from a human user combined with text to speech for responding - allows complex & integrated agents to achieve complex tasks with multiple types of output (messaging, code writing & execution, payments).
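As a rough illustration of tool use, an agent can emit a structured action that a small dispatcher routes to the matching tool. Everything in this sketch is a hypothetical stand-in: the `search` and `send_message` tools, the in-memory `DOCS` and `OUTBOX`, and the action format are assumptions made for the example, not a real messaging or search API.

```python
from typing import Any, Callable, Dict, List

# Toy stand-ins for a private database and a messaging channel (assumptions).
DOCS: Dict[str, str] = {"office hours": "Office hours are on Tuesday."}
OUTBOX: List[str] = []


def search(query: str) -> str:
    """Look up a query in the toy document store."""
    return DOCS.get(query.lower(), "no results")


def send_message(recipient: str, body: str) -> str:
    """Record a message instead of sending it anywhere."""
    OUTBOX.append(f"{recipient}: {body}")
    return "sent"


# Registry mapping tool names to callables.
TOOLS: Dict[str, Callable[..., str]] = {
    "search": search,
    "send_message": send_message,
}


def run_action(action: Dict[str, Any]) -> str:
    """Dispatch an action of the form {"tool": name, "args": {...}}."""
    name = action["tool"]
    if name not in TOOLS:
        # Report the problem as text so the agent can read it and recover.
        return f"error: unknown tool '{name}'"
    return TOOLS[name](**action.get("args", {}))


if __name__ == "__main__":
    answer = run_action({"tool": "search", "args": {"query": "Office Hours"}})
    print(run_action({"tool": "send_message",
                      "args": {"recipient": "alex", "body": answer}}))
    print(OUTBOX)
```

Returning errors as text rather than raising is one possible design: it lets the agent see the failure in its next step and pick a different tool.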

The composition of these techniques - of arbitrary ML APIs with tools with self-modifications with planning - will continually unlock unknown capabilities.

Agency Literature Review

Chains
- Planning
  - ReAct
- Prompting: Prompt Generation & Automated Prompting, Meta-Prompt
  - PromptChainer
Agents
- ReAct
Self-Correction: Automatic Evaluation, Self-Awareness, Tool Use
- Toolformer
- Essential Tools List
Composition: APIs & API Chaining
Pre-Training [Base Models]
- Transformers & Attention Mechanisms
  - BERT
- Diffusion Models
- Multi-Modality
Post-pre-training
- RLHF: Deep RL from Human Preferences
- Instruct Fine-Tuning: Training Language Models to Follow Instructions with Human Feedback
Information Access
- Context Length
  - FlashAttention
  - Sparse Transformer
- Memory Implementations
- Search: Data & Dataset Access & Choice, RAG [Retrieval Augmented Generation]
  - Retrieval-Augmented Generation
Meta Software
- Neural Program Synthesis