
Pedro Ortiz Suarez

pjox
commoncrawl
https://portizs.eu/
  • pjox13
  • pjox
  • pjox
  • pjox.bsky.social

AI & ML interests

Language modeling, parsing, sequence tagging, NER, historical languages.

Recent Activity

published a dataset 3 days ago
commoncrawl/CommonLID
updated a dataset 3 days ago
commoncrawl/CommonLID
authored a paper 15 days ago
SciLaD: A Large-Scale, Transparent, Reproducible Dataset for Natural Scientific Language Processing

Organizations

  • ALMAnaCH (Inria)
  • BigScience Workshop
  • OSCAR
  • BigScience Catalogue Data
  • Scilons Project
  • BigScience Data
  • Web Data Commons
  • Speech and Language Technology, DFKI
  • Just some testing..
  • Common Crawl Foundation
  • Occiglot

authored a paper 15 days ago

SciLaD: A Large-Scale, Transparent, Reproducible Dataset for Natural Scientific Language Processing

Paper • 2512.11192 • Published Dec 12, 2025
authored a paper over 1 year ago

mOSCAR: A Large-scale Multilingual and Multimodal Document-level Corpus

Paper • 2406.08707 • Published Jun 13, 2024 • 17
authored a paper about 2 years ago

CamemBERT: a Tasty French Language Model

Paper • 1911.03894 • Published Nov 10, 2019 • 4
authored a paper almost 3 years ago

BLOOM: A 176B-Parameter Open-Access Multilingual Language Model

Paper • 2211.05100 • Published Nov 9, 2022 • 37