|
# PubMedQA |
|
### Paper |
|
Title: `PubMedQA: A Dataset for Biomedical Research Question Answering` |
|
Abstract: https://arxiv.org/abs/1909.06146 |
|
PubMedQA is a novel biomedical question answering (QA) dataset collected from
PubMed abstracts. The task of PubMedQA is to answer research questions with
yes/no/maybe (e.g.: Do preoperative statins reduce atrial fibrillation after
coronary artery bypass grafting?) using the corresponding abstracts. PubMedQA
has 1k expert-annotated, 61.2k unlabeled and 211.3k artificially generated QA
instances. Each PubMedQA instance is composed of (1) a question which is either
an existing research article title or derived from one, (2) a context which is
the corresponding abstract without its conclusion, (3) a long answer, which is
the conclusion of the abstract and, presumably, answers the research question,
and (4) a yes/no/maybe answer which summarizes the conclusion.
|
Homepage: https://pubmedqa.github.io/ |
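
For a concrete look at that four-part structure, the sketch below loads the
expert-annotated instances with the Hugging Face `datasets` library. The
`pqa_labeled` configuration and the field names follow the Hugging Face dataset
card; treat them as assumptions if your dataset version differs.

```python
# Minimal sketch: inspect one PubMedQA instance via Hugging Face `datasets`.
# Assumes the `pqa_labeled` (expert-annotated) configuration; field names
# follow the Hugging Face dataset card and may vary across versions.
from datasets import load_dataset

data = load_dataset("pubmed_qa", "pqa_labeled", split="train")
example = data[0]

print(example["question"])        # research question (often an article title)
print(example["context"])         # abstract sections, minus the conclusion
print(example["long_answer"])     # the abstract's conclusion
print(example["final_decision"])  # "yes", "no", or "maybe"
```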
|
### Citation |
|
```
@inproceedings{jin2019pubmedqa,
    title={PubMedQA: A Dataset for Biomedical Research Question Answering},
    author={Jin, Qiao and Dhingra, Bhuwan and Liu, Zhengping and Cohen, William and Lu, Xinghua},
    booktitle={Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP)},
    pages={2567--2577},
    year={2019}
}
```
|
### Groups and Tasks |
|
#### Groups |
|
* Not part of a group yet |
|
#### Tasks |
|
* `pubmed_qa` |
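
A minimal sketch of running the task through the harness's Python API is shown
below. It assumes the `lm_eval.simple_evaluate` entry point of recent harness
versions, and the model name is only a placeholder.

```python
# Minimal sketch: evaluate a model on `pubmed_qa` via the harness's Python API.
# Assumes the `lm_eval.simple_evaluate` entry point of recent harness versions;
# the model below is only a placeholder.
import lm_eval

results = lm_eval.simple_evaluate(
    model="hf",
    model_args="pretrained=EleutherAI/pythia-1.4b",  # placeholder checkpoint
    tasks=["pubmed_qa"],
)
print(results["results"]["pubmed_qa"])  # per-task metrics, e.g. accuracy
```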
|
### Checklist |
|
For adding novel benchmarks/datasets to the library: |
* [ ] Is the task an existing benchmark in the literature?
  * [ ] Have you referenced the original paper that introduced the task?
  * [ ] If yes, does the original paper provide a reference implementation? If so, have you checked against the reference implementation and documented how to run such a test?
|
If other tasks on this dataset are already supported:
* [ ] Is the "Main" variant of this task clearly denoted?
* [ ] Have you provided a short sentence in a README on what each new variant adds / evaluates?
* [ ] Have you noted which, if any, published evaluation setups are matched by this variant?