This collection contains held-out splits for testing Flow-Judge-v0.1.
Flow AI (verified company)
AI & ML interests: LLM system evaluation, automatic LM improvements
Organization Card
Flow AI is the system for evaluating and improving your LLM application.
Models (7)
flowaicom/Flow-Judge-v0.1-W8A16
1B • Updated • 1
flowaicom/Flow-Judge-v0.1-W4A16
0.7B • Updated • 3 • 1
flowaicom/Flow-Judge-v0.1-FP8
4B • Updated • 4 • 1
flowaicom/Flow-Judge-v0.1-AWQ
Text Generation • 4B • Updated • 5.29k • 6
flowaicom/Flow-Judge-v0.1
Text Generation • 4B • Updated • 927 • 70
flowaicom/Flow-Judge-v0.1-Llamafile
Updated • 10 • 1
flowaicom/Flow-Judge-v0.1-GGUF
Text Generation • 4B • Updated • 73 • 10
Datasets (9)
flowaicom/legalbench_contracts_qa_subset
Viewer • Updated • 100 • 34
flowaicom/Flow-Judge-v0.1-3-likert-heldout
Viewer • Updated • 300 • 8
flowaicom/Flow-Judge-v0.1-5-likert-heldout
Viewer • Updated • 274 • 13
flowaicom/Flow-Judge-v0.1-binary-heldout
Viewer • Updated • 316 • 30
flowaicom/RAGTruth_test
Viewer • Updated • 2.7k • 23 • 1
flowaicom/covid_qa
Viewer • Updated • 1k • 6
flowaicom/PubMedQA
Viewer • Updated • 1k • 14 • 1
flowaicom/HaluEval
Viewer • Updated • 10k • 176 • 1
flowaicom/Feedback-Bench
Viewer • Updated • 1k • 37 • 1