Document datasets containing PDF files, usable with pixparse libraries and tools.
AI & ML interests
Document and User Interface Parsing, Understanding, Q&A.
Organization Card
Multi-modal document, image, and text datasets and models for document understanding, OCR, and VQA tasks.
GitHub repos:
- Data Loading: chug (https://github.com/huggingface/chug)
- Modelling: pixparse (coming soon)
Models (0)
None public yet.
Datasets (6)
- pixparse/pdfa-eng-wds: 7.1k rows, 4.7k downloads, 158 likes
- pixparse/idl-wds: 3.41M rows, 4.31k downloads, 193 likes
- pixparse/docvqa-wds: 154 downloads, 4 likes
- pixparse/docvqa-single-page-questions: 50k rows, 573 downloads, 10 likes
- pixparse/cc12m-wds: 11M rows, 10.2k downloads, 36 likes
- pixparse/cc3m-wds: 2.93M rows, 11.9k downloads, 45 likes
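The `-wds` suffix marks datasets stored as WebDataset-style tar shards, where consecutive files sharing a basename prefix (e.g. `doc0001.pdf` and `doc0001.json`) form one sample. chug is the intended loader; the stdlib-only sketch below is not chug's API, it only illustrates that grouping convention, assuming the standard WebDataset rule that the sample key is everything up to the first dot of the file's basename. The functions `split_key`, `iter_samples`, and `make_demo_shard`, and the member names in the demo shard, are hypothetical.

```python
import io
import json
import tarfile
from typing import BinaryIO, Dict, Iterator, Tuple


def split_key(name: str) -> Tuple[str, str]:
    """Split a tar member name into (sample key, extension).

    Assumes the WebDataset convention: the key is the directory plus the
    basename up to its first dot; the extension is everything after it.
    """
    directory, _, base = name.rpartition("/")
    stem, _, ext = base.partition(".")
    key = f"{directory}/{stem}" if directory else stem
    return key, ext


def iter_samples(fileobj: BinaryIO) -> Iterator[Tuple[str, Dict[str, bytes]]]:
    """Stream a tar shard and yield (key, {extension: bytes}) per sample.

    Consecutive regular-file members with the same key are merged into one sample.
    """
    current: Dict[str, bytes] = {}
    current_key = None
    # "r|*" opens the archive as a forward-only stream, as a remote shard would be read.
    with tarfile.open(fileobj=fileobj, mode="r|*") as tar:
        for member in tar:
            if not member.isfile():
                continue  # skip directories and other non-file entries
            key, ext = split_key(member.name)
            if key != current_key and current:
                yield current_key, current  # previous sample is complete
                current = {}
            current_key = key
            current[ext] = tar.extractfile(member).read()
    if current:
        yield current_key, current  # flush the final sample


def make_demo_shard() -> io.BytesIO:
    """Build a tiny in-memory shard with hypothetical PDF + JSON samples."""
    buf = io.BytesIO()
    members = [
        ("doc0001.pdf", b"%PDF-1.4 stub"),
        ("doc0001.json", json.dumps({"pages": 1}).encode()),
        ("doc0002.pdf", b"%PDF-1.4 stub"),
        ("doc0002.json", json.dumps({"pages": 3}).encode()),
    ]
    with tarfile.open(fileobj=buf, mode="w") as tar:
        for name, payload in members:
            info = tarfile.TarInfo(name=name)
            info.size = len(payload)
            tar.addfile(info, io.BytesIO(payload))
    buf.seek(0)
    return buf


if __name__ == "__main__":
    for key, sample in iter_samples(make_demo_shard()):
        meta = json.loads(sample["json"])
        print(key, sorted(sample), meta["pages"])
```

Reading the shard as a forward-only stream keeps memory bounded to one sample at a time, which is the main reason this format suits multi-million-row datasets like the ones listed above.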