{ "cells": [ { "cell_type": "code", "execution_count": 24, "id": "27b0322e-d6a8-4202-9f78-8d2754ebdd97", "metadata": {}, "outputs": [], "source": [ "#!pip list | grep hugging" ] }, { "cell_type": "code", "execution_count": 40, "id": "da82a90f-7098-4d0c-9fe8-3e0cfc39671d", "metadata": {}, "outputs": [], "source": [ "#!pip install transformers datasets" ] }, { "cell_type": "code", "execution_count": 12, "id": "829575c2-c292-4455-8cc6-48764e64c4b0", "metadata": {}, "outputs": [], "source": [ "#!pip install torch" ] }, { "cell_type": "code", "execution_count": 1, "id": "ba0ced0b-35cd-40fd-934f-1013d4a1364d", "metadata": {}, "outputs": [ { "name": "stderr", "output_type": "stream", "text": [ "/Users/milindchawre/.pyenv/versions/3.12.2/envs/hugging-face/lib/python3.12/site-packages/tqdm/auto.py:21: TqdmWarning: IProgress not found. Please update jupyter and ipywidgets. See https://ipywidgets.readthedocs.io/en/stable/user_install.html\n", " from .autonotebook import tqdm as notebook_tqdm\n" ] } ], "source": [ "import transformers\n", "import datasets" ] }, { "cell_type": "code", "execution_count": 2, "id": "d196c435-fa5a-4c3b-bec4-0181aa00e8bb", "metadata": {}, "outputs": [ { "data": { "text/plain": [ "'4.44.0'" ] }, "execution_count": 2, "metadata": {}, "output_type": "execute_result" } ], "source": [ "transformers.__version__" ] }, { "cell_type": "code", "execution_count": 3, "id": "55ddfdaa-3a22-4eab-ad36-24355cbb7fee", "metadata": {}, "outputs": [ { "data": { "text/plain": [ "'2.21.0'" ] }, "execution_count": 3, "metadata": {}, "output_type": "execute_result" } ], "source": [ "datasets.__version__" ] }, { "cell_type": "code", "execution_count": 4, "id": "13b8669b-5cc5-40b8-bd22-a7ff44fa43f3", "metadata": {}, "outputs": [], "source": [ "from datasets import load_dataset" ] }, { "cell_type": "code", "execution_count": 5, "id": "a722a796-84b7-4c45-a104-3d863d52cbb5", "metadata": {}, "outputs": [], "source": [ "reviews = load_dataset('rotten_tomatoes')" ] }, { 
"cell_type": "code", "execution_count": 6, "id": "607c352e-70c8-4697-b9a1-a4c68e55d502", "metadata": {}, "outputs": [ { "data": { "text/plain": [ "datasets.dataset_dict.DatasetDict" ] }, "execution_count": 6, "metadata": {}, "output_type": "execute_result" } ], "source": [ "type(reviews)" ] }, { "cell_type": "code", "execution_count": 7, "id": "e0637ff3-90b9-41cf-bdd0-ba3dbe185225", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "DatasetDict({\n", " train: Dataset({\n", " features: ['text', 'label'],\n", " num_rows: 8530\n", " })\n", " validation: Dataset({\n", " features: ['text', 'label'],\n", " num_rows: 1066\n", " })\n", " test: Dataset({\n", " features: ['text', 'label'],\n", " num_rows: 1066\n", " })\n", "})\n" ] } ], "source": [ "print(reviews)" ] }, { "cell_type": "code", "execution_count": 8, "id": "79f60106-7628-4605-920f-6bb8375e6cb5", "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "| | text | label |\n", "
|---|---|---|\n", "
| 0 | the rock is destined to be the 21st century's ... | 1 |\n", "
| 1 | the gorgeously elaborate continuation of \" the... | 1 |\n", "
| 2 | effective but too-tepid biopic | 1 |\n", "
| 3 | if you sometimes like to go to the movies to h... | 1 |\n", "
| 4 | emerges as something rare , an issue movie tha... | 1 |\n", "
| ... | ... | ... |\n", "
| 8525 | any enjoyment will be hinge from a personal th... | 0 |\n", "
| 8526 | if legendary shlockmeister ed wood had ever ma... | 0 |\n", "
| 8527 | hardly a nuanced portrait of a young woman's b... | 0 |\n", "
| 8528 | interminably bleak , to say nothing of boring . | 0 |\n", "
| 8529 | things really get weird , though not particula... | 0 |\n", "
8530 rows × 2 columns
\n", "\n", "| | text | label | model_prediction |\n", "
|---|---|---|---|\n", "
| 0 | lovingly photographed in the manner of a golde... | 1 | POSITIVE |\n", "
| 1 | consistently clever and suspenseful . | 1 | POSITIVE |\n", "
| 2 | it's like a \" big chill \" reunion of the baade... | 1 | NEGATIVE |\n", "
| 3 | the story gives ample opportunity for large-sc... | 1 | POSITIVE |\n", "
| 4 | red dragon \" never cuts corners . | 1 | POSITIVE |\n", "
| ... | ... | ... | ... |\n", "
| 1061 | a terrible movie that some people will neverth... | 0 | NEGATIVE |\n", "
| 1062 | there are many definitions of 'time waster' bu... | 0 | NEGATIVE |\n", "
| 1063 | as it stands , crocodile hunter has the hurrie... | 0 | NEGATIVE |\n", "
| 1064 | the thing looks like a made-for-home-video qui... | 0 | NEGATIVE |\n", "
| 1065 | enigma is well-made , but it's just too dry an... | 0 | NEGATIVE |\n", "
1066 rows × 3 columns
\n", "