Evaluating GPT-4's Vision Capabilities on Brazilian University Admission Exams • Paper • 2311.14169 • Published Nov 23, 2023
BLUEX: A benchmark based on Brazilian Leading Universities Entrance eXams • Paper • 2307.05410 • Published Jul 11, 2023
SurveySum: A Dataset for Summarizing Multiple Scientific Articles into a Survey Section • Paper • 2408.16444 • Published Aug 29, 2024
TiEBe: A Benchmark for Assessing the Current Knowledge of Large Language Models • Paper • 2501.07482 • Published Jan 13
BRoverbs -- Measuring how much LLMs understand Portuguese proverbs • Paper • 2509.08960 • Published Sep 10
Building High-Quality Datasets for Portuguese LLMs: From Common Crawl Snapshots to Industrial-Grade Corpora • Paper • 2509.08824 • Published Sep 10
BLUEX Revisited: Enhancing Benchmark Coverage with Automatic Captioning • Paper • 2508.21294 • Published Aug 29
Ticket-Bench: A Kickoff for Multilingual and Regionalized Agent Evaluation • Paper • 2509.14477 • Published Sep 17
CCNeXt: An Effective Self-Supervised Stereo Depth Estimation Approach • Paper • 2509.22627 • Published Sep 26
Sabiá-2: A New Generation of Portuguese Large Language Models • Paper • 2403.09887 • Published Mar 14, 2024
InRanker • Collection • Distilled rankers optimized to out-of-domain distributions (zero-shot). • 9 items • Updated Mar 15