Retro*: Optimizing LLMs for Reasoning-Intensive Document Retrieval
For more details, please refer to our paper and GitHub repository.
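The example below scores the relevance between a query and a candidate document with the Retro* reranker through the sglang offline engine. The model analyzes the query and the document, justifies its judgment, and emits an integer relevance score (0-100) between <score> tags, which the script then extracts with a regular expression.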
import re
import sglang as sgl
PROMPT_TEMPLATE = """\
Here is the **relevance definition** in a retrieval task: {relevance_definition}
Now given a **query** ({query_type}) and a **document** ({doc_type}) in this retrieval task, your mission is to perform the following steps.
1. Query Analysis: Think to reason and describe what information would be most helpful in answering the query.
2. Document Analysis: Discuss how the information provided by the document fulfills or fails to fulfill the requirements implied by the query.
3. Relevance Annotation: Based on the relevance definition and the insights from the previous two steps, clearly justify your final relevance annotation result and annotate an integer score from a scale of 0 to 100. Please use the following guide:
- **80-100 (Highly Relevant):** The document directly and comprehensively addresses the query's intent. It is a core and authoritative answer.
- **60-80 (Relevant):** The document substantially addresses the query's intent, providing most of the key information, but might miss some minor details.
- **40-60 (Moderately Relevant):** The document is on-topic and addresses a part of the query's intent, but it is not a comprehensive answer.
- **20-40 (Slightly Relevant):** The document mentions keywords from the query, but its main topic is different. It offers very limited value.
- **0-20 (Irrelevant):** The document does not address the query's intent at all and is off-topic.
After providing your detailed analysis and justification for all the steps above, conclude your entire response with the final relevance score. The score must be placed strictly between the <score> tags. There should be no other text or explanation inside the tags:
<score>
[From a scale of 0 to 100, annotate the degree of relevance between the query and the document.]
</score>
Query ({query_type}):
[Begin of Query]
{query}
[End of Query]
Document ({doc_type}):
[Begin of Document]
{doc}
[End of Document]
"""
def main():
    # Example query (a math problem) and candidate document (a math-related passage).
    query = "In a party, how many guests do you need to have to ensure that either four people all know each other or four people are all complete strangers to one another?"
    doc = "\\section{Infinite Ramsey's Theorem}\nTags: Ramsey Theory, Named Theorems\n\n\\begin{theorem}\nLet $k, n \\in \\N$.\nFor any set $S$, let $S^{\\paren n}$ denote the set $\\set {\\set {s_1, \\ldots, s_n}: \\text{each } s_i \\in S}$ of cardinality $n$ subsets of $S$.\nLet $X$ be an infinite set.\nThen:\n:for every partition $P$ of $X^{\\paren n}$ into $k$ many components\n:there is an infinite subset $Y \\subseteq X$\nsuch that:\n:each member of $Y^{\\paren n}$ is in the same component of $P$.\n\\end{theorem}\n\n\\begin{proof}\nWe will prove the theorem for fixed $k$ by induction on $n$.\n\\end{proof}\n\n"
    query_type = "math problem"
    doc_type = "math-related passage"
    relevance_definition = "Given a query (math problem) and a document (math-related passage), the document is relevant to the query if the theorem described in the document can help solve the problem in the query."

    # Fill the prompt template with the task-specific relevance definition and the query-document pair.
    prompts = [
        PROMPT_TEMPLATE.format(
            relevance_definition=relevance_definition,
            query_type=query_type,
            doc_type=doc_type,
            query=query,
            doc=doc
        )
    ]

    # Launch the sglang offline engine; adjust tp_size / dp_size to match your hardware.
    llm = sgl.Engine(
        model_path="ljw13/retro-star-qwen2.5-32b-instruct-0923",
        tp_size=8,
        dp_size=1,
    )
    tokenizer = llm.tokenizer_manager.tokenizer

    # Wrap each prompt in the chat format expected by the model.
    messages = [[{"role": "user", "content": prompt}] for prompt in prompts]
    input_texts = tokenizer.apply_chat_template(
        messages,
        tokenize=False,
        add_generation_prompt=True,
        enable_thinking=False
    )

    sampling_params = {
        "n": 1,
        "temperature": 0.6,
        "max_new_tokens": 1024,
        "skip_special_tokens": False,
        "spaces_between_special_tokens": False,
    }
    outputs = llm.generate(
        input_texts,
        sampling_params=sampling_params,
    )
    llm.shutdown()

    # Parse the integer relevance score from the <score> tags; fall back to 0 if no score is found.
    scores = []
    for output in outputs:
        print(output["text"])
        print("==" * 30)
        try:
            score = int(re.search(r"<score>\s*(\d+)\s*</score>", output["text"]).group(1))
        except AttributeError:
            score = 0
        scores.append(score)
    print("Scores:", scores)


if __name__ == "__main__":
    main()
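In a retrieval setting, the same scoring loop can be run over several candidate documents for one query and the candidates ordered by their scores. Below is a minimal, illustrative sketch (the rank_by_score helper and candidate_docs name are not part of the original example); it assumes the documents were scored in the same order as they appear in scores:

def rank_by_score(docs, scores):
    # Pair each candidate document with its relevance score and sort from
    # most to least relevant.
    return sorted(zip(docs, scores), key=lambda pair: pair[1], reverse=True)

# Example usage:
# for doc, score in rank_by_score(candidate_docs, scores):
#     print(score, doc[:80])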
If you find this model useful, please consider giving it a like and a citation:
@article{lan2025retro,
  title={Retro*: Optimizing LLMs for Reasoning-Intensive Document Retrieval},
  author={Lan, Junwei and Chen, Jianlyu and Liu, Zheng and Li, Chaofan and Bao, Siqi and Lian, Defu},
  journal={arXiv preprint arXiv:2509.24869},
  year={2025}
}