Models: 1,047

Active filters: reinforcement-learning, transformers

baek26/billsum_4768_bart-dialogsum

Reinforcement Learning • 0.1B • Updated Apr 17, 2024 • 2

baek26/dialogsum_9789_bart-dialogsum

Reinforcement Learning • 0.1B • Updated Apr 17, 2024 • 2

baek26/billsum_6121_bart-billsum

Reinforcement Learning • 0.1B • Updated Apr 17, 2024 • 2

baek26/bart-dialogsum-oracle

Reinforcement Learning • 0.1B • Updated Apr 17, 2024 • 2

baek26/billsum_1703_bart-billsum

Reinforcement Learning • 0.1B • Updated Apr 17, 2024 • 12

baek26/bart-billsum-oracle

Reinforcement Learning • 0.1B • Updated Apr 17, 2024 • 12

baek26/cnn_dailymail_6849_bart-dialogsum

Reinforcement Learning • 0.1B • Updated Apr 18, 2024 • 2

baek26/cnn_dailymail_886_bart-dialogsum

Reinforcement Learning • 0.1B • Updated Apr 18, 2024 • 2

baek26/cnn_dailymail_7952_bart-dialogsum

Reinforcement Learning • 0.1B • Updated Apr 18, 2024 • 2

baek26/cnn_dailymail_4520_bart-cnndm

Reinforcement Learning • 0.1B • Updated Apr 19, 2024 • 2

baek26/cnn_dailymail_3418_bart-cnndm

Reinforcement Learning • 0.1B • Updated Apr 19, 2024 • 2

pkbiswas/Phi-1_5-Detoxified-PPO-LoRa

Reinforcement Learning • Updated Apr 20, 2024 • 1

ruffy369/iris-breakout

Reinforcement Learning • Updated Aug 3, 2024 • 3

PranavBP525/phi-2-storygen-rlGPTf

Reinforcement Learning • Updated Apr 21, 2024 • 4

baek26/all_5483_all_8657_bart-base_rl

Reinforcement Learning • 0.1B • Updated Apr 21, 2024 • 2

baek26/all_9991_all_8657_bart-base_rl

Reinforcement Learning • 0.1B • Updated Apr 21, 2024 • 2

baek26/all_9006_all_8657_bart-base_rl

Reinforcement Learning • 0.1B • Updated Apr 21, 2024 • 2

baek26/all_6417_bart-base_rl

Reinforcement Learning • 0.1B • Updated Apr 22, 2024 • 12

lzacchini/ppo-LunarLander-v2

Reinforcement Learning • Updated May 10, 2024 • 2

PranavBP525/phi-2-storygen-rlhf

Reinforcement Learning • Updated Apr 24, 2024 • 2

baek26/all_5286_all_6417_bart-base_rl

Reinforcement Learning • 0.1B • Updated Apr 29, 2024 • 11

baek26/all_8113_all_6417_bart-base_rl

Reinforcement Learning • 0.1B • Updated Apr 29, 2024 • 2

baek26/all_4814_all_6417_bart-base_rl

Reinforcement Learning • 0.1B • Updated Apr 29, 2024 • 2

AlkQ/ppo-LunarLander-v2.1

Reinforcement Learning • Updated May 20, 2024 • 2

pkbiswas/Phi-3-Detoxified-PPO-LoRa

Reinforcement Learning • Updated May 18, 2024 • 2

stvnl/ppo_model_en

Reinforcement Learning • Updated May 2, 2024 • 2

hanyinwang/layer-project-diagnostic-mistral

Reinforcement Learning • Updated May 3, 2024 • 3

baek26/all_6618_all_6417_bart-base_rl

Reinforcement Learning • 0.1B • Updated May 7, 2024 • 12

baek26/all_8243_all_6417_bart-base_rl

Reinforcement Learning • 0.1B • Updated May 7, 2024 • 2

baek26/all_6959_all_6417_bart-base_rl

Reinforcement Learning • 0.1B • Updated May 7, 2024 • 13