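In this article we introduce ReAct, a general paradigm that combines reasoning and acting with language models to solve a variety of language-reasoning and decision-making tasks. The prompt is built dynamically: the user request and the observations so far are sent to the model, the model's reasoning determines the next action, and once that action has been executed its result is appended to the prompt and the model is queried again, until the task is complete. On the question-answering dataset HotpotQA and the fact-verification dataset Fever, ReAct combined with a Wikipedia API improves interpretability. On the interactive decision-making benchmarks ALFWorld and WebShop, it beats imitation-learning and reinforcement-learning methods by 34% and 10% in success rate, respectively.

ReAct is short for Reasoning and Acting (some say Reason + Act): an LLM reasons logically (Reason) and builds a complete series of actions (Act) to reach the desired goal. The idea is inspired by the synergy between acting and reasoning in humans, who rely on it to learn new things, make decisions, and then carry them out. LLMs perform very well at logical reasoning, so there is reason to believe they can also reason, learn, decide, and act the way people do. In practice, however, LLMs hallucinate and make wrong judgments, because the knowledge they saw during training is limited. When asked to analyze something outside the training data, an LLM starts making up reasons while pretending to know. The best remedy is to make sure that whenever the LLM analyzes or decides, it is given the knowledge it needs.

The role of ReAct is to coordinate the LLM with external information sources and with other functions. If the LLM is the brain, the ReAct framework is that brain's hands, feet, and senses: it helps the LLM obtain information, produce output, and carry out decisions. For a given task, the ReAct framework first fills in the knowledge and related information the LLM should have, then lets the LLM make a decision, and finally executes that decision.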

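How ReAct works

Chain-of-thought (CoT) prompting shows that LLMs can produce reasoning traces to reach a final answer on problems such as commonsense reasoning and arithmetic. CoT's biggest problem is that it has no connection to the outside world and so cannot update its knowledge, which leads to issues such as fact hallucination and error propagation.

ReAct is a general paradigm that combines reasoning and acting with LLMs. Few-shot prompts guide the LLM to generate reasoning traces and task-specific actions. This lets the system reason dynamically throughout the pipeline, continually creating, maintaining, and adjusting its plan of action, while tool calls let it interact with external environments (for example, Wikipedia) to obtain outside information and fold it into the ongoing reasoning.

In the ReAct pattern the model works through a Thought->Action->Observation loop, solving the problem step by step while thinking and acting in turn, much as a person would. Here:

  1. Thought: the reasoning written out as text: what am I trying to do, or what do I want the LLM to do for me, and what preconditions must hold to get it done;
  2. Act: generate the instruction for interacting with the outside world, i.e. the action text produced once this step's task has been decided, such as running a search when the needed knowledge is not built into the LLM;
  3. Obs: obtain from outside the result of executing the instruction, i.e. the outcome of the current step's action.

The figure below gives an in-context example. Such examples guide the agent through a loop: produce a thought, take an action, then observe the result of the action. By combining reasoning traces with actions, ReAct lets the model reason dynamically, so it can form high-level plans and also interact with the external environment to gather additional information.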

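As the figure above shows:

  • With standard zero-shot prompting (1a), the model gives a wrong answer.
  • With CoT (Chain of Thought) (1b), the model shows its reasoning but the answer is still wrong. This exposes the weakness of searching for an answer by reasoning alone, without acting to fetch outside information: the best the model can do is reason from what it learned in pre-training, and when it lacks the relevant knowledge it starts making things up (concrete numbers follow below).
  • With actions only and no reasoning (1c), the model runs a few futile searches and then returns the baffling answer "yes".
  • With both reasoning and actions that fetch outside information (1d), the model behaves much like a person facing an unfamiliar problem: it reasons about which action to take, takes it to obtain information, and then reasons about the next step. Working this way, the model gives the correct result.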

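Three concepts are key in a ReAct flow:

Thought: generated by the LLM, it is the basis for the LLM's actions. The thought can be used to judge whether the action the model is about to take is reasonable, which makes it a key criterion for assessing each decision. Compared with human decision-making, an explicit thought makes the LLM's decisions more interpretable and more trustworthy.

Act: the concrete action the LLM decides to execute at this step. An Act generally has two parts: the action and its object. In programming terms, these are the API name and its input arguments. The LLM's greatest strength here is that, based on the Thought, it can choose which API to use and generate the arguments to pass to it, which is what makes the ReAct framework feasible at the execution level.

Obs: how the framework takes in input from the outside world. Like the LLM's senses, it relays external feedback to the model to support further analysis or decisions.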
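To make the three concepts concrete, here is a minimal sketch of how one round could be recorded and written back into the next prompt. The `Step` class and `render_scratchpad` function are hypothetical names chosen for illustration; they do not come from the paper or from any library.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Step:
    """One Thought -> Act -> Obs round (hypothetical structure)."""
    thought: str
    tool: str                          # the "API name" part of the Act
    tool_input: str                    # the "argument" part of the Act
    observation: Optional[str] = None  # filled in after the tool has run


def render_scratchpad(steps: List[Step]) -> str:
    """Serialize past steps into text that can be appended to the next prompt."""
    lines = []
    for step in steps:
        lines.append(f"Thought: {step.thought}")
        lines.append(f"Action: {step.tool}[{step.tool_input}]")
        if step.observation is not None:
            lines.append(f"Observation: {step.observation}")
    return "\n".join(lines)


steps = [
    Step("I need to search Milhouse.", "Search", "Milhouse",
         "Milhouse is a recurring character in The Simpsons."),
    # The second action has not been executed yet, so it has no observation.
    Step("I can look up 'named after'.", "Lookup", "named after"),
]
print(render_scratchpad(steps))
```

Keeping the observation optional reflects the order of events: the Thought and Act come from the model, while the Obs only exists once the framework has executed the action.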

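A complete ReAct run consists of the following stages:

1. Input goal: the starting point of the task. It can be entered manually by a user or raised by a trigger (for example, a system fault alarm).

2. LOOP: the LLM analyzes what the problem requires (Thought), executes the Act for the current step, and takes in the observed information (Obs). The cycle repeats until the model judges that the goal has been reached.

3. Finish: the task has completed successfully and the final result is returned.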
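The three stages above can be sketched as a short loop. This is an illustration under stated assumptions, not a reference implementation: `fake_llm` replays a fixed script instead of calling a model, the `Search` tool is a stub with canned results, and the `max_steps` budget is an added safeguard that the text does not mention.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical tool table: each tool maps a string argument to an observation.
TOOLS: Dict[str, Callable[[str], str]] = {
    "Search": lambda q: {"Arthur's Magazine": "Started in 1844.",
                         "First for Women": "Started in 1989."}.get(q, "No result."),
}

# Scripted stand-in for the LLM: one (thought, tool, argument) triple per call.
SCRIPT: List[Tuple[str, str, str]] = [
    ("I need the start year of Arthur's Magazine.", "Search", "Arthur's Magazine"),
    ("Now I need the start year of First for Women.", "Search", "First for Women"),
    ("1844 < 1989, so Arthur's Magazine was first.", "Finish", "Arthur's Magazine"),
]


def fake_llm(goal: str, history: List[str]) -> Tuple[str, str, str]:
    """Return the next scripted step; a real system would query a model here."""
    step_index = sum(1 for line in history if line.startswith("Thought:"))
    return SCRIPT[step_index]


def react_loop(goal: str, max_steps: int = 5) -> Tuple[str, List[str]]:
    history: List[str] = []                  # grows into the next prompt
    for _ in range(max_steps):               # stage 2: LOOP
        thought, tool, arg = fake_llm(goal, history)
        history.append(f"Thought: {thought}")
        history.append(f"Action: {tool}[{arg}]")
        if tool == "Finish":                 # stage 3: Finish
            return arg, history
        observation = TOOLS[tool](arg)       # run the Act, collect the Obs
        history.append(f"Observation: {observation}")
    raise RuntimeError("step budget exhausted before Finish")


# Stage 1: the input goal.
answer, trace = react_loop("Which magazine was started first?")
print("\n".join(trace))
print("Answer:", answer)
```

The history list is the important design point: every Thought, Action, and Observation is appended to it, so each model call sees everything that happened before.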

For example, a prompt for a ReAct process might look like this:

You are an assistant, please fully understand the user's question, choose the appropriate tool, and help the user solve the problem step by step.

### CONSTRAINTS ####
1. The tool selected must be one of the tools in the tool list.
2. When unable to find the input for the tool, please adjust immediately and use the AskHumanHelpTool to ask the user for additional parameters.
3. When you believe that you have the final answer and can respond to the user, please use the TaskCompleteTool.
4. You must respond in Chinese;

### Tool List ###

[
Search: Use this when you need to search. paramDescription: [{"name": "searchKey", "description": "search query", "type": "String"}]
AskHumanHelpTool: Use this when you need help from a human. paramDescription: [{"name": "question", "description": "the question", "type": "String"}]
TaskCompleteTool: Use this when you believe you have the final answer. paramDescription: [{"name": "answer", "description": "the answer", "type": "String"}]
]

You should only respond in JSON format as described below

### RESPONSE FORMAT ###
{"thought": "why this tool was chosen", "tool_names": "tool name", "args_list": {"tool name 1": {"param name 1": "param value 1", "param name 2": "param value 2"}}}
Make sure that the response content you return is all in JSON format and does not contain any extra content.
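On the receiving side, the framework has to parse the model's JSON reply and call the selected tool. The sketch below assumes the reply uses the keys `thought`, `tool_names`, and `args_list` from the prompt above. The three tool functions are hypothetical stubs, and the error-reporting dictionary is an illustrative choice rather than part of the original design.

```python
import json
from typing import Any, Callable, Dict


# Hypothetical stub implementations of the three tools named in the prompt.
def search(searchKey: str) -> str:
    return f"(stub) results for {searchKey!r}"


def ask_human(question: str) -> str:
    return f"(stub) asked the user: {question}"


def task_complete(answer: str) -> str:
    return answer


TOOLS: Dict[str, Callable[..., str]] = {
    "Search": search,
    "AskHumanHelpTool": ask_human,
    "TaskCompleteTool": task_complete,
}


def dispatch(raw_reply: str) -> Dict[str, Any]:
    """Parse one model reply and run the tool it selected."""
    try:
        reply = json.loads(raw_reply)
    except json.JSONDecodeError as exc:
        return {"ok": False, "error": f"reply is not valid JSON: {exc.msg}"}
    if not isinstance(reply, dict):
        return {"ok": False, "error": "reply is not a JSON object"}

    name = reply.get("tool_names")
    if not isinstance(name, str) or name not in TOOLS:  # constraint 1
        return {"ok": False, "error": f"unknown tool: {name!r}"}

    args_list = reply.get("args_list")
    args = args_list.get(name, {}) if isinstance(args_list, dict) else {}
    if not isinstance(args, dict):
        return {"ok": False, "error": "arguments must be a JSON object"}
    try:
        result = TOOLS[name](**args)
    except TypeError as exc:  # missing or unexpected parameter
        return {"ok": False, "error": str(exc)}
    return {"ok": True, "tool": name, "result": result,
            "done": name == "TaskCompleteTool"}


reply = ('{"thought": "Need facts first", "tool_names": "Search", '
         '"args_list": {"Search": {"searchKey": "ReAct"}}}')
print(dispatch(reply))
print(dispatch('{"tool_names": "Shell", "args_list": {}}'))
```

Returning an error record instead of raising lets the caller feed the failure back to the model as the next observation, so the model can correct its own reply.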

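Experimental results

Specifically, on question answering (HotpotQA) and fact verification (Fever), ReAct interacts with a simple Wikipedia API to overcome the hallucination and error propagation that are common in chain-of-thought reasoning, and it produces human-like task-solving trajectories that are easier to interpret than those of baselines without reasoning traces. On two interactive decision-making benchmarks (ALFWorld and WebShop), ReAct improves the absolute success rate over imitation-learning and reinforcement-learning methods by 34% and 10%, respectively.

The biggest problem with LLMs is unstable output. The instability shows up not only as variation in content but also as variation in how well the model analyzes and solves complex problems.

The experiments in the paper, shown in the figure above, used the PaLM-540B model. Figure 1 shows that with ReAct the model falls short on knowledge-intensive reasoning such as question answering (HotpotQA) but does better on fact verification (Fever). The main reason is that the ReAct format constrains the model's flexibility in formulating its reasoning. LLMs are good at logical reasoning, but too many uninformative searches can derail it and make the reasoning hard to recover.

The data in Figure 2 also shows that even with ReAct, hallucination, not knowing what to search for, and misattribution still occur.

Judged by its behavior, an LLM resembles a person. It is highly general and can solve many problems by thinking on its own, but gaps in its knowledge and ability keep it from producing stable output. Compared with a person, though, an LLM is more blindly confident and will make up content (hallucinate) about things it does not know or understand.

The paper used PaLM-540B, whose overall performance is clearly below that of GPT-4 and newer models. There is reason to believe future models will greatly reduce the frequency of hallucination and unstable output. For now, however, anyone applying LLMs has to take these problems into account.

ChatGPT plugins follow the ReAct approach and greatly extend ChatGPT's capabilities. They are inherently inflexible, though, because they are tied to ChatGPT's usage scenarios: when an application has to run independently of ChatGPT, plugins are hard to use. They can be seen as a supplement to the LLM's capabilities, but they will not replace ReAct frameworks in engineering practice. Their advantage is that they are integrated with the model, which can greatly reduce the number of ReAct round trips and therefore lower cost and response time.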

Implementing ReAct in code with LangChain

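# Install or upgrade the required libraries
# pip install --upgrade openai
# pip install --upgrade langchain
# pip install google-search-results
# Note: written against an early LangChain release; newer releases have moved
# or deprecated these imports.

import os

from langchain.llms import OpenAI
from langchain.agents import load_tools
from langchain.agents import initialize_agent

os.environ["OPENAI_API_KEY"] = "{your token}"
os.environ["OPENAI_API_BASE"] = "{your proxy address}"
# https://serpapi.com/manage-api-key
os.environ["SERPAPI_API_KEY"] = "{key obtained by registering at serpapi.com}"

# temperature=0 keeps the output as deterministic as possible
llm = OpenAI(model_name="text-davinci-003", temperature=0)
# "serpapi" wraps Google search; "llm-math" handles arithmetic
tools = load_tools(["serpapi", "llm-math"], llm=llm)
agent = initialize_agent(
    tools, llm, agent="zero-shot-react-description", verbose=True
)
# Query (in Chinese): "What is the peak to the northwest of Mount Everest called?
# Who has successfully climbed it? Then compute the cube of their age."
agent.run("珠穆拉玛峰旁边的西北方向的山峰叫什么名字?有谁成功登顶这座山峰?并计算年龄的3次方。")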

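As you can see, the code is simple, because LangChain has already packaged the actual prompt. The question to answer is: what is the peak to the northwest of Mount Everest called, and who has successfully climbed it? To answer it, the model has to interact with the outside world, so the program provides the "serpapi" tool, a wrapper around the Google search API. To handle a calculation such as the cube of an age, the program also provides the "llm-math" tool. LangChain ships both tools ready-made, so we simply call them.

The agent's run produces the following process and result: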

> Entering new AgentExecutor chain...
I need to find out who successfully climbed the mountain and then calculate their age cubed
Action: Search
Action Input: "who successfully climbed the mountain near Mount Everest northwest"
Observation: Edmund Hillary (left) and Sherpa Tenzing Norgay reached the 29,035-foot summit of Everest on May 29, 1953, becoming the first people to stand atop the world's highest mountain.
Thought: I need to find out Edmund Hillary's age
Action: Search
Action Input: "Edmund Hillary age"
Observation: 88 years
Thought: I need to calculate 88 cubed
Action: Calculator
Action Input: 88^3
Observation: Answer: 681472

Thought: I now know the final answer
Final Answer: 意蒙德·希拉里(Edmund Hillary)成功登顶过山峰珠穆拉玛峰旁边的西北方向的山峰,他的年龄的3次方是681472

> Finished chain
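The Calculator step in the trace takes `88^3` and returns `Answer: 681472`. The sketch below shows one way a small calculator tool could evaluate such input safely. It is not how LangChain's "llm-math" tool is implemented; the function name `calculator` and the treatment of `^` as exponentiation are assumptions made for this illustration.

```python
import ast
import operator

# Binary operators this sketch is willing to evaluate; anything else is rejected.
_BINARY = {
    ast.Add: operator.add,
    ast.Sub: operator.sub,
    ast.Mult: operator.mul,
    ast.Div: operator.truediv,
    ast.Pow: operator.pow,
}


def calculator(expression: str) -> str:
    """Evaluate a small arithmetic expression without calling eval()."""
    # Assumption: powers are written with "^", as in the trace input "88^3".
    tree = ast.parse(expression.replace("^", "**"), mode="eval")

    def evaluate(node: ast.AST):
        if isinstance(node, ast.Expression):
            return evaluate(node.body)
        if isinstance(node, ast.Constant) and type(node.value) in (int, float):
            return node.value
        if isinstance(node, ast.BinOp) and type(node.op) in _BINARY:
            return _BINARY[type(node.op)](evaluate(node.left), evaluate(node.right))
        if isinstance(node, ast.UnaryOp) and isinstance(node.op, ast.USub):
            return -evaluate(node.operand)
        raise ValueError(f"unsupported expression: {expression!r}")

    return f"Answer: {evaluate(tree)}"


print(calculator("88^3"))  # prints "Answer: 681472"
```

Walking the syntax tree and allowing only numeric constants and a fixed set of operators means that model-generated text such as a function call or an attribute access is rejected instead of executed.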

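This agent run suggests that combining ReAct with other models may yield greater intelligence.

If you want finer control over this process in a real business scenario, you need to work on the prompt. The figure below shows the prompt from the LangChain source code, which you can use as a reference when customizing your own.

In the LangChain source, the ReAct implementation is fairly simple and is still based on prompt templates. Let's look at the Python code under langchain/agents/react: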

from langchain.prompts.prompt import PromptTemplate

EXAMPLES = [
"""Question: What is the elevation range for the area that the eastern sector of the Colorado orogeny extends into?
Thought: I need to search Colorado orogeny, find the area that the eastern sector of the Colorado orogeny extends into, then find the elevation range of the area.
Action: Search[Colorado orogeny]
Observation: The Colorado orogeny was an episode of mountain building (an orogeny) in Colorado and surrounding areas.
Thought: It does not mention the eastern sector. So I need to look up eastern sector.
Action: Lookup[eastern sector]
Observation: (Result 1 / 1) The eastern sector extends into the High Plains and is called the Central Plains orogeny.
Thought: The eastern sector of Colorado orogeny extends into the High Plains. So I need to search High Plains and find its elevation range.
Action: Search[High Plains]
Observation: High Plains refers to one of two distinct land regions
Thought: I need to instead search High Plains (United States).
Action: Search[High Plains (United States)]
Observation: The High Plains are a subregion of the Great Plains. From east to west, the High Plains rise in elevation from around 1,800 to 7,000 ft (550 to 2,130 m).[3]
Thought: High Plains rise in elevation from around 1,800 to 7,000 ft, so the answer is 1,800 to 7,000 ft.
Action: Finish[1,800 to 7,000 ft]""",
"""Question: Musician and satirist Allie Goertz wrote a song about the "The Simpsons" character Milhouse, who Matt Groening named after who?
Thought: The question simplifies to "The Simpsons" character Milhouse is named after who. I only need to search Milhouse and find who it is named after.
Action: Search[Milhouse]
Observation: Milhouse Mussolini Van Houten is a recurring character in the Fox animated television series The Simpsons voiced by Pamela Hayden and created by Matt Groening.
Thought: The paragraph does not tell who Milhouse is named after, maybe I can look up "named after".
Action: Lookup[named after]
Observation: (Result 1 / 1) Milhouse was named after U.S. president Richard Nixon, whose middle name was Milhous.
Thought: Milhouse was named after U.S. president Richard Nixon, so the answer is Richard Nixon.
Action: Finish[Richard Nixon]""",
"""Question: Which documentary is about Finnish rock groups, Adam Clayton Powell or The Saimaa Gesture?
Thought: I need to search Adam Clayton Powell and The Saimaa Gesture, and find which documentary is about Finnish rock groups.
Action: Search[Adam Clayton Powell]
Observation: Could not find [Adam Clayton Powell]. Similar: [’Adam Clayton Powell III’, ’Seventh Avenue (Manhattan)’, ’Adam Clayton Powell Jr. State Office Building’, ’Isabel Washington Powell’, ’Adam Powell’, ’Adam Clayton Powell (film)’, ’Giancarlo Esposito’].
Thought: To find the documentary, I can search Adam Clayton Powell (film).
Action: Search[Adam Clayton Powell (film)]
Observation: Adam Clayton Powell is a 1989 American documentary film directed by Richard Kilberg. The film is about the rise and fall of influential African-American politician Adam Clayton Powell Jr.[3][4] It was later aired as part of the PBS series The American Experience.
Thought: Adam Clayton Powell (film) is a documentary about an African-American politician, not Finnish rock groups. So the documentary about Finnish rock groups must instead be The Saimaa Gesture.
Action: Finish[The Saimaa Gesture]""",
"""Question: What profession does Nicholas Ray and Elia Kazan have in common?
Thought: I need to search Nicholas Ray and Elia Kazan, find their professions, then find the profession they have in common.
Action: Search[Nicholas Ray]
Observation: Nicholas Ray (born Raymond Nicholas Kienzle Jr., August 7, 1911 - June 16, 1979) was an American film director, screenwriter, and actor best known for the 1955 film Rebel Without a Cause.
Thought: Professions of Nicholas Ray are director, screenwriter, and actor. I need to search Elia Kazan next and find his professions.
Action: Search[Elia Kazan]
Observation: Elia Kazan was an American film and theatre director, producer, screenwriter and actor.
Thought: Professions of Elia Kazan are director, producer, screenwriter, and actor. So profession Nicholas Ray and Elia Kazan have in common is director, screenwriter, and actor.
Action: Finish[director, screenwriter, actor]""",
"""Question: Which magazine was started first Arthur’s Magazine or First for Women?
Thought: I need to search Arthur’s Magazine and First for Women, and find which was started first.
Action: Search[Arthur’s Magazine]
Observation: Arthur’s Magazine (1844-1846) was an American literary periodical published in Philadelphia in the 19th century.
Thought: Arthur’s Magazine was started in 1844. I need to search First for Women next.
Action: Search[First for Women]
Observation: First for Women is a woman’s magazine published by Bauer Media Group in the USA.[1] The magazine was started in 1989.
Thought: First for Women was started in 1989. 1844 (Arthur’s Magazine) < 1989 (First for Women), so Arthur’s Magazine was started first.
Action: Finish[Arthur’s Magazine]""",
"""Question: Were Pavel Urysohn and Leonid Levin known for the same type of work?
Thought: I need to search Pavel Urysohn and Leonid Levin, find their types of work, then find if they are the same.
Action: Search[Pavel Urysohn]
Observation: Pavel Samuilovich Urysohn (February 3, 1898 - August 17, 1924) was a Soviet mathematician who is best known for his contributions in dimension theory.
Thought: Pavel Urysohn is a mathematician. I need to search Leonid Levin next and find its type of work.
Action: Search[Leonid Levin]
Observation: Leonid Anatolievich Levin is a Soviet-American mathematician and computer scientist.
Thought: Leonid Levin is a mathematician and computer scientist. So Pavel Urysohn and Leonid Levin have the same type of work.
Action: Finish[yes]""",
]
SUFFIX = """\nQuestion: {input}
{agent_scratchpad}"""

WIKI_PROMPT = PromptTemplate.from_examples(
    EXAMPLES, SUFFIX, ["input", "agent_scratchpad"]
)
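To see what the model receives, it helps to render such a template by hand. The sketch below uses plain string formatting rather than `PromptTemplate`; the `build_prompt` function, the blank-line separator, and the two shortened examples are assumptions made for illustration and are not taken from the library.

```python
from typing import List

# Two shortened stand-ins for the few-shot examples (not the full originals).
EXAMPLES: List[str] = [
    "Question: Q1\nThought: T1\nAction: Finish[A1]",
    "Question: Q2\nThought: T2\nAction: Finish[A2]",
]
SUFFIX = "\nQuestion: {input}\n{agent_scratchpad}"


def build_prompt(question: str, scratchpad: str = "") -> str:
    """Join the examples, then fill the suffix with the live question."""
    template = "\n\n".join(EXAMPLES + [SUFFIX])
    return template.format(input=question, agent_scratchpad=scratchpad)


# First call: the scratchpad only invites the model to start thinking.
first_call = build_prompt("Who wrote Hamlet?", "Thought:")

# Later call: earlier steps and their observations are replayed to the model.
later_call = build_prompt(
    "Who wrote Hamlet?",
    "Thought: I need to search Hamlet.\n"
    "Action: Search[Hamlet]\n"
    "Observation: Hamlet is a tragedy by William Shakespeare.\n"
    "Thought:",
)
print(later_call)
```

The few-shot examples stay fixed while `agent_scratchpad` grows with every round, which is how the dynamic prompt described at the start of the article is built.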

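The implementation flow is as follows:
(1) Feed the model worked examples through the prompt, such as the template above: Thought: xxx Action: Search[xxx];
(2) The LLM follows the same logic as the template, thinks step by step in CoT (chain-of-thought) fashion, and fetches external knowledge;
(3) Finally, Action: Finish returns the final result and the run ends.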
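For steps (2) and (3), the framework has to read the action out of the model's text and detect Finish. Below is a minimal sketch of such a parser for the `Action: Name[argument]` format used in the examples. The regular expression and the function names are assumptions made for this illustration and are not LangChain's own parser.

```python
import re
from typing import Optional, Tuple

# Matches lines of the form "Action: Name[argument]".
ACTION_RE = re.compile(r"^Action:\s*(\w+)\[(.*)\]\s*$", re.MULTILINE)


def parse_action(completion: str) -> Optional[Tuple[str, str]]:
    """Return (tool, argument) for the last action, or None if there is none."""
    matches = ACTION_RE.findall(completion)
    if not matches:
        return None
    tool, argument = matches[-1]
    return tool, argument.strip()


def is_finished(action: Optional[Tuple[str, str]]) -> bool:
    """True when the parsed action is the terminating Finish[...] action."""
    return action is not None and action[0] == "Finish"


step = parse_action("Thought: I need to search Milhouse.\nAction: Search[Milhouse]")
last = parse_action("Thought: The answer is Richard Nixon.\nAction: Finish[Richard Nixon]")
print(step, is_finished(step))
print(last, is_finished(last))
```

Returning None for text without a well-formed action gives the caller a clear signal to re-prompt the model instead of guessing at what it meant.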