InfoBay AI Ltd.

company

AI & ML interests

Accelerate the frontier of AI development with enterprise-grade, deeply curated datasets engineered to enhance pre-training, alignment, and real-world performance.

Recent Activity

RohitManglik updated a collection about 4 hours ago

Egocentric videos

RohitManglik updated a collection about 5 hours ago

Codebase Datasets

InfoBayAI's collections: 9

Codebase Datasets

Sample coding datasets for benchmarking and training domain-specific AI models.

InfoBayAI/legacy_codebase

Viewer • Updated 3 days ago • 15.3k • 40
InfoBayAI/Code_Intelligence_Dataset

Viewer • Updated 3 days ago • 60 • 17
InfoBayAI/Product_codebase

Viewer • Updated 3 days ago • 337 • 40

Dual Channel Global Customer-Agent Interaction Datasets

Sample datasets of dual-channel call center audio, with the agent and the customer recorded on separate channels, for ASR, diarization, and conversational AI training.

InfoBayAI/call_center_audio_dual_channel_en_in

Viewer • Updated 19 days ago • 11 • 29
InfoBayAI/call_center_audio_dual_channel_en_us

Viewer • Updated 19 days ago • 9 • 25 • 1
InfoBayAI/call_center_audio_dual_channel_en_uk

Viewer • Updated 19 days ago • 3 • 25
InfoBayAI/call_center_audio_dual_channel_hi

Viewer • Updated 19 days ago • 8 • 25
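
Because each speaker sits on their own channel, speaker separation reduces to de-interleaving the audio before running ASR. The following is a minimal sketch under stated assumptions, not the publisher's pipeline: it assumes 16-bit PCM stereo WAV input, and the mapping of channel 0 to the agent and channel 1 to the customer is hypothetical, since the listing does not state the file format or channel order. The helper names `split_channels` and `make_stereo_wav` are illustrative only.

```python
import array
import io
import sys
import wave


def split_channels(wav_bytes):
    """Split 16-bit stereo PCM WAV data into two mono sample arrays."""
    with wave.open(io.BytesIO(wav_bytes), "rb") as wav:
        if wav.getnchannels() != 2 or wav.getsampwidth() != 2:
            raise ValueError("expected 16-bit stereo PCM")
        frames = wav.readframes(wav.getnframes())
    samples = array.array("h")
    samples.frombytes(frames)
    if sys.byteorder == "big":
        samples.byteswap()  # WAV samples are little-endian
    # Stereo PCM is interleaved: ch0, ch1, ch0, ch1, ...
    # Assumption: channel 0 = agent, channel 1 = customer.
    return samples[0::2], samples[1::2]


def make_stereo_wav(ch0, ch1, rate=8000):
    """Build a small in-memory stereo WAV, used here only for demonstration."""
    interleaved = array.array("h")
    for a, b in zip(ch0, ch1):
        interleaved.extend((a, b))
    if sys.byteorder == "big":
        interleaved.byteswap()
    buf = io.BytesIO()
    with wave.open(buf, "wb") as wav:
        wav.setnchannels(2)
        wav.setsampwidth(2)
        wav.setframerate(rate)
        wav.writeframes(interleaved.tobytes())
    return buf.getvalue()


if __name__ == "__main__":
    demo = make_stereo_wav([100, 200, 300], [-100, -200, -300])
    agent, customer = split_channels(demo)
    print(list(agent), list(customer))
```

Each returned array can then be passed to an ASR system as a single-speaker stream, which avoids a separate diarization step when the channel assignment is known.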

UGC, Egocentric and STEM Video Datasets

InfoBayAI/stem-videos

Viewer • Updated 19 days ago • 5 • 23 • 1
InfoBayAI/User_Generated_Content

Updated 19 days ago • 20 • 1

STEM & Non-STEM Q&A Datasets for LLM Training

Sample datasets from a 6.5M+ enterprise-grade Q&A corpus across STEM and Non-STEM domains, built for LLM training, instruction tuning, and evaluation.

InfoBayAI/Hindi_STEM_Question_Answering_MCQA_Dataset

Viewer • Updated 19 days ago • 200 • 27
InfoBayAI/English_STEM_Question_Answering_MCQA_Dataset

Viewer • Updated 19 days ago • 200 • 39
InfoBayAI/English-Non-STEM-Question-Answering-MCQA-Dataset

Viewer • Updated 19 days ago • 5 • 24
InfoBayAI/Arabic-STEM-Question-Answering-MCQA-Dataset

Viewer • Updated 19 days ago • 49 • 27

Egocentric videos

InfoBayAI/egocentric_video

Viewer • Updated about 4 hours ago • 10

Healthcare AI Datasets for Clinical & LLM Training

Sample dataset from an enterprise-grade medical corpus built for clinical AI, diagnosis support, and healthcare LLM training.

InfoBayAI/mri_clinical_reports_without_findings_medical_nlp

Viewer • Updated 7 days ago • 588 • 24
InfoBayAI/ct_scan_clinical_reports_without_findings_medical_nlp

Viewer • Updated 19 days ago • 2.6k • 22
InfoBayAI/ct_scan_clinical_reports_with_findings_medical_nlp

Viewer • Updated 19 days ago • 6.3k • 23
InfoBayAI/xray_clinical_reports_without_findings_medical_nlp

Preview • Updated 19 days ago • 19

Podcast Speech & Conversational Audio Datasets

Sample from a podcast audio dataset designed for automatic speech recognition (ASR) and conversational AI training, using diverse, real-world spoken content.

InfoBayAI/English-Podcast-ASR-Dataset

Viewer • Updated 13 days ago • 5 • 28
InfoBayAI/Hindi-Podcast-ASR-Dataset

Viewer • Updated 19 days ago • 10 • 20 • 1
InfoBayAI/Arabic-Podcast-ASR-Dataset

Viewer • Updated 19 days ago • 10 • 29
InfoBayAI/Punjabi-Podcast-ASR-Dataset

Viewer • Updated 19 days ago • 10 • 19

Academic Textbook Corpora for LLM Training

Sample of a 2.2B+ word textbook corpus across 32K+ books, 5K+ subjects, and 14 languages for LLM training and multilingual knowledge modeling.

InfoBayAI/Hindi-STEM-Educational-Text-Corpus

Viewer • Updated 19 days ago • 1.14k • 22
InfoBayAI/English-STEM-Educational-Text-Corpus

Viewer • Updated 19 days ago • 1.27k • 27
InfoBayAI/English-Non-STEM-Educational-Text-Corpus

Viewer • Updated 19 days ago • 1.62k • 20
InfoBayAI/Arabic-STEM-Educational-Text-Corpus

Viewer • Updated 19 days ago • 1.35k • 28

Computer Vision & Multimodal Datasets

Sample dataset from a multilingual image corpus covering medical, STEM, Non-STEM, automobile, and complex domains for computer vision and multimodal AI.

InfoBayAI/stem_educational_images

Viewer • Updated 19 days ago • 35 • 23
InfoBayAI/non_stem_images_en

Viewer • Updated 19 days ago • 25 • 17
InfoBayAI/medical_images_en

Viewer • Updated 19 days ago • 25 • 19
InfoBayAI/automobile_images

Viewer • Updated 19 days ago • 25 • 17

Team members 1
