arxiv:2508.04612

A Reproducible, Scalable Pipeline for Synthesizing Autoregressive Model Literature

Published on Aug 6, 2025

Abstract

An automated pipeline for retrieving, analyzing, and reproducing research papers on autoregressive generative models demonstrates high accuracy in relevance classification and hyper-parameter extraction while supporting scalable and reproducible experimental workflows.

AI-generated summary

The accelerating pace of research on autoregressive generative models has produced thousands of papers, making manual literature surveys and reproduction studies increasingly impractical. We present a fully open-source, reproducible pipeline that automatically retrieves candidate documents from public repositories, filters them for relevance, extracts metadata, hyper-parameters, and reported results, clusters topics, produces retrieval-augmented summaries, and generates containerised scripts for re-running selected experiments. Quantitative evaluation on 50 manually annotated papers shows F1 scores above 0.85 for relevance classification, hyper-parameter extraction, and citation identification. Experiments on corpora of up to 1000 papers demonstrate near-linear scalability with eight CPU workers. Three case studies (AWD-LSTM on WikiText-2, Transformer-XL on WikiText-103, and an autoregressive music model on the Lakh MIDI dataset) confirm that the extracted settings support faithful reproduction, achieving test perplexities within 1--3% of the original reports.
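To make the hyper-parameter extraction and F1 evaluation steps concrete, here is a minimal sketch in Python. The regex patterns, function names, and the sample sentence are illustrative assumptions, not the paper's actual implementation; the paper does not specify its extraction method, so this only shows the general shape of such a component.

```python
import re

# Hypothetical patterns for common hyper-parameters; the real pipeline's
# extraction rules are not described in this summary.
HYPERPARAM_PATTERNS = {
    "learning_rate": re.compile(r"learning rate (?:of )?(\d+(?:\.\d+)?(?:e-?\d+)?)"),
    "batch_size": re.compile(r"batch size (?:of )?(\d+)"),
    "dropout": re.compile(r"dropout (?:rate )?(?:of )?(\d+(?:\.\d+)?)"),
}

def extract_hyperparams(text: str) -> dict:
    """Return {name: value} for every pattern that matches the text."""
    found = {}
    for name, pattern in HYPERPARAM_PATTERNS.items():
        match = pattern.search(text)
        if match:
            found[name] = match.group(1)
    return found

def f1_score(predicted: set, gold: set) -> float:
    """Micro F1 over (name, value) pairs against a hand annotation."""
    if not predicted or not gold:
        return 0.0
    tp = len(predicted & gold)
    if tp == 0:
        return 0.0
    precision = tp / len(predicted)
    recall = tp / len(gold)
    return 2 * precision * recall / (precision + recall)

# Illustrative sample, loosely echoing AWD-LSTM-style settings.
sample = "We train with a learning rate of 30 and batch size of 80, dropout of 0.4."
predicted = set(extract_hyperparams(sample).items())
gold = {("learning_rate", "30"), ("batch_size", "80"), ("dropout", "0.4")}
score = f1_score(predicted, gold)
```

In the paper's evaluation, such per-paper scores are aggregated over the 50 annotated papers to yield the reported F1 above 0.85.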

