# Pipelines for NLP Tasks

In [2]:
import transformers
from transformers import pipeline



In [3]:
print(transformers.__version__)

4.44.0


## Loading Tasks

The `task` argument of `pipeline()` defines which pipeline will be returned. Currently accepted tasks are:

- `"audio-classification"`: will return an [`AudioClassificationPipeline`].
- `"automatic-speech-recognition"`: will return an [`AutomaticSpeechRecognitionPipeline`].
- `"conversational"`: will return a [`ConversationalPipeline`].
- `"depth-estimation"`: will return a [`DepthEstimationPipeline`].
- `"document-question-answering"`: will return a [`DocumentQuestionAnsweringPipeline`].
- `"feature-extraction"`: will return a [`FeatureExtractionPipeline`].
- `"fill-mask"`: will return a [`FillMaskPipeline`].
- `"image-classification"`: will return an [`ImageClassificationPipeline`].
- `"image-segmentation"`: will return an [`ImageSegmentationPipeline`].
- `"image-to-text"`: will return an [`ImageToTextPipeline`].
- `"object-detection"`: will return an [`ObjectDetectionPipeline`].
- `"question-answering"`: will return a [`QuestionAnsweringPipeline`].
- `"summarization"`: will return a [`SummarizationPipeline`].
- `"table-question-answering"`: will return a [`TableQuestionAnsweringPipeline`].
- `"text2text-generation"`: will return a [`Text2TextGenerationPipeline`].
- `"text-classification"` (alias `"sentiment-analysis"` available): will return a [`TextClassificationPipeline`].
- `"text-generation"`: will return a [`TextGenerationPipeline`].
- `"token-classification"` (alias `"ner"` available): will return a [`TokenClassificationPipeline`].
- `"translation"`: will return a [`TranslationPipeline`].
- `"translation_xx_to_yy"`: will return a [`TranslationPipeline`].
- `"video-classification"`: will return a [`VideoClassificationPipeline`].
- `"visual-question-answering"`: will return a [`VisualQuestionAnsweringPipeline`].
- `"zero-shot-classification"`: will return a [`ZeroShotClassificationPipeline`].
- `"zero-shot-image-classification"`: will return a [`ZeroShotImageClassificationPipeline`].
- `"zero-shot-object-detection"`: will return a [`ZeroShotObjectDetectionPipeline`].

## Classification 

### Default Models

In [4]:
pipe = pipeline(task="text-classification", device=0)  # device=0: run on the first GPU
pipe("This restaurant is ok")

No model was supplied, defaulted to distilbert/distilbert-base-uncased-finetuned-sst-2-english and revision af0f99b (https://huggingface.co/distilbert/distilbert-base-uncased-finetuned-sst-2-english).
Using a pipeline without specifying a model name and revision in production is not recommended.


[{'label': 'POSITIVE', 'score': 0.9998236298561096}]

### Specific Models

You may want a model tuned for a particular domain or text type, for example financial news. FinBERT (https://huggingface.co/ProsusAI/finbert) is a BERT model further trained on financial text for sentiment analysis.

More details are in the FinBERT paper: https://arxiv.org/pdf/1908.10063

In [5]:
pipe = pipeline(model="ProsusAI/finbert")

huggingface/tokenizers: The current process just got forked, after parallelism has already been used. Disabling parallelism to avoid deadlocks...
	- Avoid using `tokenizers` before the fork if possible
	- Explicitly set the environment variable TOKENIZERS_PARALLELISM=(true | false)
Hardware accelerator e.g. GPU is available in the environment, but no `device` argument is passed to the `Pipeline` object. Model will be on CPU.


In [6]:
pipe("Shares of food delivery companies surged despite the catastrophic impact of coronavirus on global markets.")

[{'label': 'positive', 'score': 0.9350943565368652}]

In [7]:
tweets = ['Gonna buy AAPL, its about to surge up!',
          'Gotta sell AAPL, its gonna plummet!']

In [8]:
pipe(tweets)

[{'label': 'positive', 'score': 0.523411750793457},
 {'label': 'neutral', 'score': 0.5528597831726074}]
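Both tweets get a top score only slightly above 0.5, and the clearly bearish second tweet is labeled `neutral`, so these predictions are not very confident. Passing `top_k=None` when calling the pipeline returns the score for every label instead of only the top one. The sketch below post-processes the output shown above; `flag_low_confidence` and its 0.6 cutoff are hypothetical choices for illustration, not part of `transformers` or a FinBERT recommendation.

```python
# Hypothetical helper: pair each input with its prediction and mark predictions
# whose top score falls below a chosen threshold (0.6 is an arbitrary assumption).
def flag_low_confidence(texts, predictions, threshold=0.6):
    flagged = []
    for text, pred in zip(texts, predictions):
        flagged.append({
            "text": text,
            "label": pred["label"],
            "score": pred["score"],
            "uncertain": pred["score"] < threshold,
        })
    return flagged


# Inputs and outputs copied from the cells above.
tweets = ['Gonna buy AAPL, its about to surge up!',
          'Gotta sell AAPL, its gonna plummet!']
predictions = [{'label': 'positive', 'score': 0.523411750793457},
               {'label': 'neutral', 'score': 0.5528597831726074}]

for row in flag_low_confidence(tweets, predictions):
    print(f"{row['label']:>8}  {row['score']:.2f}  uncertain={row['uncertain']}  {row['text']}")
```

With this cutoff both tweets are flagged as uncertain, which is a cue to review them manually or try a different model.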

## Named Entity Recognition

Let's explore another NLP task: NER (Named Entity Recognition), which labels spans of text as entities such as people, organizations, and locations.

**Note: this is a much larger model! Running it downloads about 1.5 GB into a cache folder on your computer.**

In [9]:
pipe = pipeline(task="text-classification")

No model was supplied, defaulted to distilbert/distilbert-base-uncased-finetuned-sst-2-english and revision af0f99b (https://huggingface.co/distilbert/distilbert-base-uncased-finetuned-sst-2-english).
Using a pipeline without specifying a model name and revision in production is not recommended.
Hardware accelerator e.g. GPU is available in the environment, but no `device` argument is passed to the `Pipeline` object. Model will be on CPU.


In [10]:
ner_tag_pipe = pipeline('ner')

No model was supplied, defaulted to dbmdz/bert-large-cased-finetuned-conll03-english and revision f2482bf (https://huggingface.co/dbmdz/bert-large-cased-finetuned-conll03-english).
Using a pipeline without specifying a model name and revision in production is not recommended.
Some weights of the model checkpoint at dbmdz/bert-large-cased-finetuned-conll03-english were not used when initializing BertForTokenClassification: ['bert.pooler.dense.bias', 'bert.pooler.dense.weight']
- This IS expected if you are initializing BertForTokenClassification from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing BertForTokenClassification from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).
Hardware accelerator e.g. GPU is

In [22]:
result = ner_tag_pipe("After working at Tesla I started to study Nikola Tesla a lot more, especially at university in the USA.")

In [23]:
# Alternative example sentence (uncomment to try it instead):
#sentence = """After working at TomTom I started to study AI a lot more, especially at home in Mumbai. Topics like RAG, Hugging Face, and data science interest me more. Eating snacks and packaged food at my laptop is my working setup."""
#result = ner_tag_pipe(sentence)

In [24]:
result

[{'entity': 'I-ORG',
  'score': 0.9137765,
  'index': 4,
  'word': 'Te',
  'start': 17,
  'end': 19},
 {'entity': 'I-ORG',
  'score': 0.3789888,
  'index': 5,
  'word': '##sla',
  'start': 19,
  'end': 22},
 {'entity': 'I-PER',
  'score': 0.99693346,
  'index': 10,
  'word': 'Nikola',
  'start': 42,
  'end': 48},
 {'entity': 'I-PER',
  'score': 0.9901416,
  'index': 11,
  'word': 'Te',
  'start': 49,
  'end': 51},
 {'entity': 'I-PER',
  'score': 0.8931826,
  'index': 12,
  'word': '##sla',
  'start': 51,
  'end': 54},
 {'entity': 'I-LOC',
  'score': 0.9997478,
  'index': 22,
  'word': 'USA',
  'start': 99,
  'end': 102}]
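The raw output is per token, so "Tesla" is split into the word pieces `'Te'` and `'##sla'`, and "Nikola Tesla" comes back as three separate `I-PER` tokens. The pipeline can merge these itself if you pass `aggregation_strategy="simple"` when creating it (for example `pipeline('ner', aggregation_strategy="simple")`). To show the idea, here is a simplified sketch of that kind of grouping applied to the output above; `group_entities` is a hypothetical helper and only approximates the library's logic.

```python
# Simplified sketch: group adjacent tokens with the same entity type into one span.
# This approximates, but is not, transformers' aggregation_strategy="simple".
def group_entities(text, tokens):
    groups = []
    for tok in tokens:
        prefix, _, ent_type = tok["entity"].partition("-")  # "I-PER" -> ("I", "PER")
        prev = groups[-1] if groups else None
        continues = (
            prev is not None
            and prefix != "B"                             # a "B-" tag always starts a new entity
            and prev["entity_group"] == ent_type          # same entity type as the previous span
            and tok["index"] == prev["last_index"] + 1    # token directly follows the previous one
        )
        if continues:
            prev["end"] = tok["end"]
            prev["last_index"] = tok["index"]
            prev["scores"].append(tok["score"])
        else:
            groups.append({
                "entity_group": ent_type,
                "start": tok["start"],
                "end": tok["end"],
                "last_index": tok["index"],
                "scores": [tok["score"]],
            })
    # Recover the surface text from the character offsets and average token scores.
    return [
        {
            "entity_group": g["entity_group"],
            "word": text[g["start"]:g["end"]],
            "score": sum(g["scores"]) / len(g["scores"]),
            "start": g["start"],
            "end": g["end"],
        }
        for g in groups
    ]


# Sentence and token output copied from the cells above.
sentence = "After working at Tesla I started to study Nikola Tesla a lot more, especially at university in the USA."
tokens = [
    {'entity': 'I-ORG', 'score': 0.9137765, 'index': 4, 'word': 'Te', 'start': 17, 'end': 19},
    {'entity': 'I-ORG', 'score': 0.3789888, 'index': 5, 'word': '##sla', 'start': 19, 'end': 22},
    {'entity': 'I-PER', 'score': 0.99693346, 'index': 10, 'word': 'Nikola', 'start': 42, 'end': 48},
    {'entity': 'I-PER', 'score': 0.9901416, 'index': 11, 'word': 'Te', 'start': 49, 'end': 51},
    {'entity': 'I-PER', 'score': 0.8931826, 'index': 12, 'word': '##sla', 'start': 51, 'end': 54},
    {'entity': 'I-LOC', 'score': 0.9997478, 'index': 22, 'word': 'USA', 'start': 99, 'end': 102},
]

for ent in group_entities(sentence, tokens):
    print(ent["entity_group"], repr(ent["word"]), round(ent["score"], 3))
```

The result separates the company (`ORG`, "Tesla") from the person (`PER`, "Nikola Tesla"). Note that the low-scoring `'##sla'` piece pulls the averaged `ORG` score down noticeably.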

## Question Answering

In [25]:
qa_bot = pipeline('question-answering')

No model was supplied, defaulted to distilbert/distilbert-base-cased-distilled-squad and revision 626af31 (https://huggingface.co/distilbert/distilbert-base-cased-distilled-squad).
Using a pipeline without specifying a model name and revision in production is not recommended.
Hardware accelerator e.g. GPU is available in the environment, but no `device` argument is passed to the `Pipeline` object. Model will be on CPU.


In [26]:
text = """
D-Day, marked on June 6, 1944, stands as one of the most significant military operations in history, 
initiating the Allied invasion of Nazi-occupied Europe during World War II. Known as Operation Overlord, 
this massive amphibious assault involved nearly 160,000 Allied troops landing on the beaches of Normandy, 
France, across five sectors: Utah, Omaha, Gold, Juno, and Sword. Supported by over 5,000 ships and 13,000 
aircraft, the operation was preceded by extensive aerial and naval bombardment and an airborne assault. 
The invasion set the stage for the liberation of Western Europe from Nazi control, despite the heavy 
casualties and formidable German defenses. This day not only demonstrated the logistical prowess 
and courage of the Allied forces but also marked a turning point in the war, leading to the eventual 
defeat of Nazi Germany.
"""

In [38]:
question = "What were the five beach sectors on D-Day?"

result = qa_bot(question=question, context=text)

In [36]:
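# Try a question the context does not answer: the pipeline still returns
# its best-scoring span, so inspect `score` before trusting the answer.
#question = "Who is Sherlock Holmes?"
#result = qa_bot(question=question, context=text)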

In [39]:
result

{'score': 0.9430821537971497,
 'start': 345,
 'end': 379,
 'answer': 'Utah, Omaha, Gold, Juno, and Sword'}
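The `start` and `end` values are character offsets into the exact `context` string, so the leading newline of the triple-quoted `text` and the trailing spaces at the end of each line all count. The sketch below checks this against the output above and drops answers below a score threshold; `checked_answer` and its 0.5 cutoff are hypothetical, not part of the `transformers` API.

```python
# Hypothetical post-processing helper for question-answering output: verify that
# the character offsets point at the returned answer, and return None when the
# score is below a threshold (0.5 is an arbitrary assumption, not a library default).
def checked_answer(context, qa_result, min_score=0.5):
    span = context[qa_result["start"]:qa_result["end"]]
    if span != qa_result["answer"]:
        raise ValueError(f"offsets do not match: {span!r} vs {qa_result['answer']!r}")
    if qa_result["score"] < min_score:
        return None
    return span


# First four lines of the D-Day context above, written with explicit "\n" and the
# trailing spaces of the original; later lines do not affect these offsets.
context = (
    "\n"
    "D-Day, marked on June 6, 1944, stands as one of the most significant military operations in history, \n"
    "initiating the Allied invasion of Nazi-occupied Europe during World War II. Known as Operation Overlord, \n"
    "this massive amphibious assault involved nearly 160,000 Allied troops landing on the beaches of Normandy, \n"
    "France, across five sectors: Utah, Omaha, Gold, Juno, and Sword. Supported by over 5,000 ships and 13,000 \n"
)

# Output copied from the cell above.
result = {'score': 0.9430821537971497, 'start': 345, 'end': 379,
          'answer': 'Utah, Omaha, Gold, Juno, and Sword'}

print(checked_answer(context, result))
```

If the context were stripped of its leading newline or trailing spaces, the same offsets would point at the wrong characters and the helper would raise.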

## Translations

The translation pipeline translates text from one language to another.

It can be loaded from `pipeline()` with a task identifier of the form `"translation_xx_to_yy"`, where `xx` and `yy` are the source and target language codes (for example, `"translation_en_to_fr"`, used below).

The models this pipeline can use are models that have been fine-tuned on a translation task. See the up-to-date list of available models on www.huggingface.co/models.

Note: In practice, you would typically pass a specific translation model rather than rely on the default: https://huggingface.co/models?pipeline_tag=translation
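The language pair is encoded directly in the task string. As an illustration of the naming pattern, the sketch below splits a `"translation_xx_to_yy"` identifier into its source and target codes; `parse_translation_task` is a hypothetical helper written for this notebook, not the parser `transformers` uses internally.

```python
import re

# Hypothetical helper: extract source/target language codes from a task
# identifier of the form "translation_xx_to_yy".
_TASK_PATTERN = re.compile(r"translation_([A-Za-z]+)_to_([A-Za-z]+)")


def parse_translation_task(task):
    match = _TASK_PATTERN.fullmatch(task)
    if match is None:
        raise ValueError(f"not a translation_xx_to_yy task: {task!r}")
    return match.group(1), match.group(2)


print(parse_translation_task("translation_en_to_fr"))  # ('en', 'fr')
```

A bare `"translation"` task carries no language pair, so this helper rejects it with a `ValueError`.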

In [6]:
from transformers import pipeline
translate = pipeline('translation_en_to_fr')

No model was supplied, defaulted to t5-base and revision 686f1db (https://huggingface.co/t5-base).
Using a pipeline without specifying a model name and revision in production is not recommended.


For now, this behavior is kept to avoid breaking backwards compatibility when padding/encoding with `truncation is True`.
- Be aware that you SHOULD NOT rely on t5-base automatically truncating your input to 512 when padding/encoding.
- If you want to encode/pad to sequences longer than 512 you can either instantiate this tokenizer with `model_max_length` or pass `max_length` when encoding/padding.


In [7]:
result = translate("Hello, my name is Jose. What is your name?")



In [8]:
result

[{'translation_text': 'Bonjour, mon nom est Jose, quel est votre nom ?'}]

In [9]:
result = translate("Hello, my name is Jose.")

In [10]:
result

[{'translation_text': 'Bonjour, mon nom est Jose.'}]

In [11]:
result = translate("Hello, I am called Jose.")

In [12]:
result

[{'translation_text': "Bonjour, je m'appelle Jose."}]