Some Notes on LLM Agents
LLM Powered Autonomous Agents
Planning
- Task Decomposition
- by LLM with simple prompting, e.g. "What are the subgoals for achieving XYZ?"
- by using task-specific instructions, e.g. "Write a story outline."
- with human inputs
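A minimal sketch of decomposition by simple prompting. The `llm` argument is a hypothetical callable mapping a prompt string to a completion string (stubbed below); the prompt follows the "What are the subgoals for achieving XYZ?" pattern:

```python
def decompose_task(llm, task):
    """Ask the LLM for subgoals and parse its numbered answer.
    `llm` is any prompt -> completion callable (hypothetical)."""
    prompt = f"What are the subgoals for achieving {task}? List one per line."
    completion = llm(prompt)
    subgoals = []
    for line in completion.splitlines():
        # Strip list markers like "1." or "-" from each non-empty line.
        line = line.strip().lstrip("-*0123456789.").strip()
        if line:
            subgoals.append(line)
    return subgoals

# Usage with a stubbed model standing in for a real LLM call:
fake_llm = lambda prompt: "1. research the topic\n2. draft an outline\n3. write the text"
print(decompose_task(fake_llm, "write an essay"))
# ['research the topic', 'draft an outline', 'write the text']
```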
- Related Works
- Chain of Thought
- Tree of Thoughts
- extends CoT by exploring multiple reasoning possibilities at each step
- first decomposes the problem into multiple thought steps
- generates multiple thoughts per step, creating a tree structure
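The search over the thought tree can be sketched as a breadth-first search with a beam, assuming hypothetical `propose` (generate candidate next thoughts for a partial chain, normally an LLM call) and `score` (judge how promising a partial chain is) functions:

```python
def tree_of_thoughts_bfs(root, propose, score, steps=3, beam=2):
    """BFS over partial reasoning chains, represented as lists of thoughts.
    propose(state) -> list of candidate next thoughts (hypothetical LLM call)
    score(state)   -> float, how promising the partial chain looks"""
    frontier = [root]
    for _ in range(steps):
        candidates = []
        for state in frontier:
            for thought in propose(state):
                candidates.append(state + [thought])  # extend the chain
        # Keep only the `beam` most promising partial chains.
        frontier = sorted(candidates, key=score, reverse=True)[:beam]
    return max(frontier, key=score)

# Toy usage: thoughts are numbers, a chain's score is its sum.
best = tree_of_thoughts_bfs([], propose=lambda s: [1, 2], score=sum, steps=2, beam=2)
print(best)  # [2, 2]
```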
- LLM+P
- an external classical planner to do long-horizon planning
- translates the problem into “Problem PDDL”
- requests a classical planner to generate a PDDL plan based on an existing “Domain PDDL”
- translates the PDDL plan back into natural language
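The three-step LLM+P pipeline can be sketched as plain function composition; all three components here are hypothetical callables standing in for the LLM and an external classical planner:

```python
def llm_plus_p(llm, classical_planner, domain_pddl, problem_text):
    """LLM+P pipeline sketch; `llm` and `classical_planner` are stubs."""
    # 1. LLM translates the natural-language problem into Problem PDDL.
    problem_pddl = llm(f"Translate to PDDL:\n{problem_text}")
    # 2. An external classical planner searches for a plan over Domain PDDL.
    plan_pddl = classical_planner(domain_pddl, problem_pddl)
    # 3. LLM translates the PDDL plan back into natural language.
    return llm(f"Translate to natural language:\n{plan_pddl}")

# Usage with trivial stubs that just tag their inputs:
llm = lambda prompt: prompt.split("\n", 1)[1] + "!"
planner = lambda dom, prob: f"plan({dom},{prob})"
print(llm_plus_p(llm, planner, "dom", "stack blocks"))  # plan(dom,stack blocks!)!
```

The key design point is that the LLM only does translation at both ends; the long-horizon search itself is delegated to the planner.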
Self-Reflection
- Related Works
- ReAct
- prompting LLM to generate reasoning traces in natural language
Thought: ... Action: ... Observation: ... ... (Repeated many times)
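The trace format above can be driven by a simple loop. This is a sketch, not ReAct's actual implementation: `llm` is a hypothetical prompt -> completion callable and `tools` a name -> function registry, with the model expected to emit `Action: tool[input]` lines or a final answer:

```python
def react_loop(llm, tools, question, max_steps=5):
    """ReAct-style Thought/Action/Observation loop (sketch)."""
    trace = f"Question: {question}\n"
    for _ in range(max_steps):
        step = llm(trace)  # emits "Thought: ...\nAction: tool[input]" or "Final Answer: ..."
        trace += step + "\n"
        if step.startswith("Final Answer:"):
            return step.removeprefix("Final Answer:").strip()
        # Parse "Action: tool[input]", run the tool, feed the observation back.
        action = next(l for l in step.splitlines() if l.startswith("Action:"))
        name, arg = action.removeprefix("Action:").strip().rstrip("]").split("[", 1)
        trace += f"Observation: {tools[name](arg)}\n"
    return "no answer"

# Usage with a scripted stand-in for the model:
calls = iter(["Thought: I should compute this.\nAction: calc[2+3]", "Final Answer: 5"])
print(react_loop(lambda trace: next(calls), {"calc": lambda a: str(eval(a))}, "What is 2+3?"))  # 5
```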
- Reflexion
- a standard RL setup
- reward model provides a simple binary reward
- the action space follows the setup in ReAct
- After each action, the agent computes a heuristic
- The heuristic function determines when the trajectory is inefficient or contains hallucinations and should be stopped
- inefficient planning: trajectories that take too long without success
- hallucination: encountering a sequence of consecutive identical actions that lead to the same observation in the environment
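The hallucination check above is easy to make concrete: flag a trajectory whose most recent steps repeat the same (action, observation) pair. The threshold value is an assumption for illustration:

```python
def is_stuck(trajectory, repeat_threshold=3):
    """Flag a looping trajectory, per Reflexion's hallucination heuristic.
    `trajectory` is a list of (action, observation) tuples; the threshold
    of 3 repeats is an illustrative choice, not the paper's."""
    if len(trajectory) < repeat_threshold:
        return False
    tail = trajectory[-repeat_threshold:]
    return all(step == tail[0] for step in tail)

# A loop of identical actions producing the same observation is flagged:
print(is_stuck([("open door", "locked")] * 3))                            # True
print(is_stuck([("open door", "locked"), ("find key", "key found")]))     # False
```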
- Chain of Hindsight
- explicitly presents the model with a sequence of past outputs, each annotated with feedback
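Assembling such a feedback-annotated sequence into a prompt might look like the sketch below; the exact template is illustrative, not the paper's:

```python
def hindsight_prompt(task, attempts):
    """Build a Chain-of-Hindsight-style prompt: past outputs in order,
    each annotated with its feedback, ending with a request to improve.
    `attempts` is a list of (output, feedback) pairs. Format is illustrative."""
    lines = [f"Task: {task}"]
    for output, feedback in attempts:
        lines.append(f"Output: {output}")
        lines.append(f"Feedback: {feedback}")
    lines.append("Improved output:")
    return "\n".join(lines)

print(hindsight_prompt("summarize the article",
                       [("a vague summary", "too vague"),
                        ("a detailed summary", "better, but too long")]))
```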
- Algorithm Distillation
- an algorithm is encapsulated in a long history-conditioned policy
- learn the process of RL instead of training a task-specific policy itself
- hypothesizes that any algorithm that generates a set of learning histories can be distilled into a neural network by performing behavioral cloning over actions.
- an algorithm is encapsulated in a long history-conditioned policy
- ReAct
- Related Works
Memory
- Types of Memory
- Sensory Memory
- lasts for up to a few seconds
- iconic memory (visual), echoic memory (auditory), and haptic memory (touch)
- learning embedding representations for raw inputs
- Short-Term Memory/Working Memory
- holds about 7 items and lasts for 20-30 seconds
- in-context learning
- Long-Term Memory
- Explicit / declarative memory
- Implicit / procedural memory
- the external vector store that the agent can attend to at query time
- Maximum Inner Product Search (MIPS)
- A standard practice: save the embedding representation of information into a vector store database
- To optimize retrieval speed: use an approximate nearest neighbors (ANN) algorithm to return approximately the top-k nearest neighbors
- MIPS algorithms and performance comparisons at ann-benchmarks.com
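For a small in-memory store, exact MIPS is just a linear scan over inner products; real systems replace this scan with an ANN index (see ann-benchmarks.com) to trade a little recall for large speedups. The toy store below is illustrative:

```python
def mips_top_k(query, vectors, k=2):
    """Exact maximum inner product search over a dict of name -> vector.
    Linear scan; an ANN index would replace this in production."""
    dot = lambda a, b: sum(x * y for x, y in zip(a, b))
    ranked = sorted(vectors.items(), key=lambda kv: dot(query, kv[1]), reverse=True)
    return [name for name, _ in ranked[:k]]

# Toy embedding store (hypothetical 2-d embeddings):
store = {"cat": [1.0, 0.0], "dog": [0.9, 0.1], "car": [0.0, 1.0]}
print(mips_top_k([1.0, 0.0], store))  # ['cat', 'dog']
```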
Tool Use
- Related Works
- MRKL
- a collection of "expert" modules; the general-purpose LLM works as a router, routing inquiries to the most suitable expert module
- knowing when and how to use the tools is crucial
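The routing idea can be sketched as below. Note the hedge: a real MRKL system uses the LLM itself as the router; the keyword predicates here are a stand-in so the example runs without a model:

```python
def mrkl_route(query, experts, fallback):
    """MRKL-style routing sketch: dispatch to the first expert module whose
    predicate matches the query, else fall back to the general-purpose LLM.
    `experts` maps a name to a (matches, module) pair of callables."""
    for matches, module in experts.values():
        if matches(query):
            return module(query)
    return fallback(query)

# One expert module (a calculator) plus a stubbed LLM fallback:
experts = {
    "calculator": (lambda q: any(c in q for c in "+-*/"), lambda q: str(eval(q))),
}
print(mrkl_route("2+3*4", experts, fallback=lambda q: "LLM answer"))          # 14
print(mrkl_route("tell me a joke", experts, fallback=lambda q: "LLM answer")) # LLM answer
```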
- TALM & Toolformer
- fine-tune a LM to learn to use external tool APIs
- HuggingGPT
- uses an LLM as a task planner to select models available on the HuggingFace platform according to their model descriptions, and summarizes the response based on the execution results
Challenges
- Finite context length
- Although vector stores and retrieval can provide access to a larger knowledge pool, their representation power is not as powerful as full attention.
- Challenges in long-term planning and task decomposition
- LLMs struggle to adjust plans when faced with unexpected errors
- Reliability of natural language interface
- LLMs may make formatting errors and occasionally exhibit rebellious behavior (e.g. refuse to follow an instruction).
References
Weng, Lilian. (Jun 2023). "LLM-powered Autonomous Agents". Lil'Log. https://lilianweng.github.io/posts/2023-06-23-agent/