Retro*: Optimizing LLMs for Reasoning-Intensive Document Retrieval

Introduction

This repository provides retro-star-qwen2.5-32b-instruct-0923, a 32B relevance-annotation model built on Qwen2.5-32B with the Retro* training recipe. Given a task-specific relevance definition, a query, and a candidate document, the model reasons through the pair step by step and emits an integer relevance score from 0 to 100 inside <score> tags, which makes it suitable for reranking in reasoning-intensive retrieval tasks. For more details, please refer to our paper and GitHub repository.

Usage

Using SGLang
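
The script below launches an offline sglang engine, formats one (query, document) pair with the relevance-annotation prompt, samples the model's step-by-step analysis, and parses the final integer score from the <score> tags.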

import re
import sglang as sgl


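# Relevance-annotation prompt. The <score> tag format below is what the score-parsing
# step at the end of the script relies on, so the template is best kept verbatim.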
PROMPT_TEMPLATE = """\
Here is the **relevance definition** in a retrieval task: {relevance_definition}

Now given a **query** ({query_type}) and a **document** ({doc_type}) in this retrieval task, your mission is to perform the following steps.

1. Query Analysis: Think to reason and describe what information would be most helpful in answering the query.
2. Document Analysis: Discuss how the information provided by the document fulfills or fails to fulfill the requirements implied by the query.
3. Relevance Annotation: Based on the relevance definition and the insights from the previous two steps, clearly justify your final relevance annotation result and annotate an integer score from a scale of 0 to 100. Please use the following guide:
    - **80-100 (Highly Relevant):** The document directly and comprehensively addresses the query's intent. It is a core and authoritative answer.
    - **60-80 (Relevant):** The document substantially addresses the query's intent, providing most of the key information, but might miss some minor details.
    - **40-60 (Moderately Relevant):** The document is on-topic and addresses a part of the query's intent, but it is not a comprehensive answer.
    - **20-40 (Slightly Relevant):** The document mentions keywords from the query, but its main topic is different. It offers very limited value.
    - **0-20 (Irrelevant):** The document does not address the query's intent at all and is off-topic.

After providing your detailed analysis and justification for all the steps above, conclude your entire response with the final relevance score. The score must be placed strictly between the <score> tags. There should be no other text or explanation inside the tags:
<score>
[From a scale of 0 to 100, annotate the degree of relevance between the query and the document.]
</score>

Query ({query_type}):
[Begin of Query]
{query}
[End of Query]

Document ({doc_type}):
[Begin of Document]
{doc}
[End of Document]
"""


def main():
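    # Example inputs: a math problem as the query and a theorem passage as the candidate document.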
    query = "In a party, how many guests do you need to have to ensure that either four people all know each other or four people are all complete strangers to one another?"
    doc = "\\section{Infinite Ramsey's Theorem}\nTags: Ramsey Theory, Named Theorems\n\n\\begin{theorem}\nLet $k, n \\in \\N$.\nFor any set $S$, let $S^{\\paren n}$ denote the set $\\set {\\set {s_1, \\ldots, s_n}: \\text{each } s_i \\in S}$ of cardinality $n$ subsets of $S$.\nLet $X$ be an infinite set.\nThen:\n:for every partition $P$ of $X^{\\paren n}$ into $k$ many components\n:there is an infinite subset $Y \\subseteq X$\nsuch that:\n:each member of $Y^{\\paren n}$ is in the same component of $P$.\n\\end{theorem}\n\n\\begin{proof}\nWe will prove the theorem for fixed $k$ by induction on $n$.\n\\end{proof}\n\n"
    query_type = "math problem"
    doc_type = "math-related passage"
    relevance_definition = "Given a query (math problem) and a document (math-related passage), the document is relevant to the query if the theorem described in the document can help solve the problem in the query."

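    # Render the annotation prompt for each (query, document) pair.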
    prompts = [
        PROMPT_TEMPLATE.format(
            relevance_definition=relevance_definition,
            query_type=query_type,
            doc_type=doc_type,
            query=query,
            doc=doc
        )
    ]
    
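    # Launch an offline sglang engine; tp_size=8 shards the 32B model across eight GPUs,
    # so lower it to match the number of GPUs you actually have.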
    llm = sgl.Engine(
        model_path="ljw13/retro-star-qwen2.5-32b-instruct-0923",
        tp_size=8,
        dp_size=1,
    )

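    # Grab the underlying HF tokenizer and apply the chat template manually;
    # enable_thinking=False is ignored by templates without a thinking mode (e.g. Qwen2.5).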
    tokenizer = llm.tokenizer_manager.tokenizer
    messages = [[{"role": "user", "content": prompt}] for prompt in prompts]
    input_texts = tokenizer.apply_chat_template(
        messages,
        tokenize=False,
        add_generation_prompt=True,
        enable_thinking=False
    )

    sampling_params = {
        "n": 1,
        "temperature": 0.6,
        "max_new_tokens": 1024,
        "skip_special_tokens": False,
        "spaces_between_special_tokens": False,
    }

    outputs = llm.generate(
        input_texts,
        sampling_params=sampling_params,
    )

    llm.shutdown()

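    # Pull the integer score out of the <score>...</score> tags; fall back to 0 when parsing fails.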
    scores = []
    for output in outputs:
        print(output["text"])
        print("==" * 30)
        try:
            score = int(re.search(r"<score>\s*(\d+)\s*</score>", output["text"]).group(1))
        except AttributeError:
            score = 0
        scores.append(score)

    print("Scores:", scores)

if __name__ == "__main__":
    main()
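
Using Transformers

The same prompt also works with plain Hugging Face transformers if you prefer not to run a multi-GPU sglang deployment. The sketch below is not from the official repository: it assumes a standard AutoModelForCausalLM checkpoint, reuses PROMPT_TEMPLATE and the example fields (query, doc, query_type, doc_type, relevance_definition) from the sglang script above, and relies on device_map="auto" to shard the 32B model across whatever GPUs are available.

import re
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL_ID = "ljw13/retro-star-qwen2.5-32b-instruct-0923"

tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
model = AutoModelForCausalLM.from_pretrained(
    MODEL_ID,
    torch_dtype=torch.bfloat16,  # the checkpoint is stored in BF16
    device_map="auto",           # shard the model across available GPUs
)

# One prompt string rendered from PROMPT_TEMPLATE, exactly as in the sglang example.
prompt = PROMPT_TEMPLATE.format(
    relevance_definition=relevance_definition,
    query_type=query_type,
    doc_type=doc_type,
    query=query,
    doc=doc,
)

messages = [{"role": "user", "content": prompt}]
input_ids = tokenizer.apply_chat_template(
    messages,
    add_generation_prompt=True,
    return_tensors="pt",
).to(model.device)

output_ids = model.generate(input_ids, max_new_tokens=1024, do_sample=True, temperature=0.6)
completion = tokenizer.decode(output_ids[0][input_ids.shape[-1]:], skip_special_tokens=True)

# Same score extraction as in the sglang script.
match = re.search(r"<score>\s*(\d+)\s*</score>", completion)
score = int(match.group(1)) if match else 0
print(completion)
print("Score:", score)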

Citation

If you find this model useful, please consider giving it a like and citing our work:

@article{lan2025retro,
  title={Retro*: Optimizing LLMs for Reasoning-Intensive Document Retrieval},
  author={Lan, Junwei and Chen, Jianlyu and Liu, Zheng and Li, Chaofan and Bao, Siqi and Lian, Defu},
  journal={arXiv preprint arXiv:2509.24869},
  year={2025}
}