"""
SCROLLS: Standardized CompaRison Over Long Language Sequences
https://arxiv.org/abs/2201.03533

SCROLLS is a suite of datasets that require synthesizing information over long texts.
The benchmark includes seven natural language tasks across multiple domains,
including summarization, question answering, and natural language inference.

Homepage: https://www.scrolls-benchmark.com/

Since SCROLLS tasks are generally longer than the maximum sequence length of many models,
it is possible to create "subset" tasks that contain only those samples whose tokenized length
is less than some pre-defined limit. For example, to create a subset of "Qasper" that would
be suitable for a model using the GPTNeoX tokenizer and a 4K maximum sequence length:

```
class QasperGPTNeoX4K(Qasper):
    PRUNE_TOKENIZERS = ["EleutherAI/pythia-410m-deduped"]
    PRUNE_MAX_TOKENS = 4096
    PRUNE_NUM_PROC = _num_cpu_cores()  # optional, to speed up pruning of large datasets like NarrativeQA
```

`PRUNE_TOKENIZERS` can contain more than one tokenizer; this will include only samples that are
less than `PRUNE_MAX_TOKENS` for ALL of the tokenizers. This can be useful for comparing models
that use different tokenizers but the same maximum sequence length.
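
For example, a subset that keeps only samples short enough for two tokenizers at once might
look like the sketch below; the class name and the second tokenizer ("gpt2") are illustrative
choices, not part of the benchmark:

```
class QasperMultiTokenizer4K(Qasper):
    # Keep only samples whose tokenized length is under 4096 for BOTH tokenizers.
    PRUNE_TOKENIZERS = ["EleutherAI/pythia-410m-deduped", "gpt2"]
    PRUNE_MAX_TOKENS = 4096
```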

Once the subset task class has been defined in this file, it can be used by adding the class
to `lm_eval/tasks/__init__.py`.
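
For example, assuming the registry in `lm_eval/tasks/__init__.py` is the usual `TASK_REGISTRY`
dict, registration might look like the following; the task name "scrolls_qasper_gptneox4k" is
a hypothetical choice:

```
from . import scrolls

TASK_REGISTRY = {
    # ... existing task entries ...
    "scrolls_qasper_gptneox4k": scrolls.QasperGPTNeoX4K,  # hypothetical task name
}
```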

NOTE: GovReport may need `max_gen_toks` set larger for causal models.
"""
|