import streamlit as st


def show_text_summarizer(nlp_engine):
    """Display the text summarization UI component.

    Expects `nlp_engine` to expose a summarize_text() method that returns
    Hugging Face pipeline-style output: [{"summary_text": "..."}].
    """
    st.title("Text Summarization 📄➡️📎")
    st.markdown("""
Generate concise summaries of longer texts using the BART-Large-CNN model.
This model is fine-tuned on CNN/Daily Mail, a dataset of news articles paired with summaries.
""")
    # Text input
    text_input = st.text_area(
        "Enter text to summarize",
        """The Hugging Face ecosystem provides a wide array of tools and models for natural language processing.
It includes transformers for state-of-the-art models, datasets for accessing and sharing data,
and a model hub for discovering and using pre-trained models. Developers can leverage these
resources to build powerful NLP applications with relative ease. The platform also supports
various tasks such as text classification, summarization, translation, and question answering.
The quick brown fox jumps over the lazy dog. This sentence is repeated multiple times to ensure
the text is long enough for summarization to be meaningful. The quick brown fox jumps over the lazy dog.
The quick brown fox jumps over the lazy dog. The quick brown fox jumps over the lazy dog.""",
        height=250,
    )
    # Generation parameters
    col1, col2 = st.columns(2)
    with col1:
        min_length = st.slider(
            "Minimum Length (words)",
            min_value=10,
            max_value=100,
            value=30,
            step=5,
            help="Minimum length of the summary in words",
        )
    with col2:
        max_length = st.slider(
            "Maximum Length (words)",
            min_value=50,
            max_value=500,
            value=150,
            step=10,
            help="Maximum length of the summary in words",
        )
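
    # Note (assumption about the backend): if summarize_text() forwards these
    # values to a Hugging Face summarization pipeline, min_length/max_length
    # are interpreted as token counts rather than words, so the word-based
    # sliders above only approximate the final summary length.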
    # Process button
    if st.button("Generate Summary"):
        if min_length >= max_length:
            st.error("Minimum length must be smaller than maximum length.")
        elif len(text_input.split()) < min_length:
            st.error(f"Input text is too short. It should have at least {min_length} words.")
        else:
            with st.spinner("Generating summary..."):
                # Get summary
                summary_result = nlp_engine.summarize_text(
                    text_input,
                    max_length=max_length,
                    min_length=min_length,
                )

                # Display results
                st.markdown("### Summary")
                st.info(summary_result[0]['summary_text'])

                # Display statistics
                input_word_count = len(text_input.split())
                summary_word_count = len(summary_result[0]['summary_text'].split())
                reduction = round((1 - summary_word_count / input_word_count) * 100, 1)
                st.markdown(f"""
**Statistics:**
- Original text: {input_word_count} words
- Summary: {summary_word_count} words
- Reduction: {reduction}%
""")
    # Example section
    with st.expander("Example texts to try"):
        st.markdown("""
### Example 1: Scientific Article
```
Recent advances in artificial intelligence have led to significant breakthroughs in natural language processing.
Transformer models like BERT, GPT, and T5 have demonstrated remarkable capabilities in understanding and generating human language.
These models leverage self-attention mechanisms to process sequences of text in parallel, capturing long-range dependencies more effectively than previous architectures like RNNs or LSTMs.
Pre-training on vast corpora of text allows these models to learn general language representations that can be fine-tuned for specific downstream tasks with relatively small amounts of labeled data.
Applications of these technologies include machine translation, text summarization, question answering, and sentiment analysis.
Despite their impressive performance, challenges remain in areas such as computational efficiency, interpretability, and ethical considerations regarding bias and fairness.
Researchers continue to explore methods for reducing model size while maintaining performance, as well as techniques for making models more transparent and accountable.
```

### Example 2: News Article
```
The city council voted yesterday to approve the controversial downtown development project, following a heated debate that lasted nearly five hours.
The $500 million project will include a 40-story residential tower, 100,000 square feet of retail space, and a public park.
Supporters argue that the development will create jobs and revitalize the downtown area, which has struggled economically in recent years.
They point to estimates suggesting the project will generate 1,500 construction jobs and 800 permanent positions once completed.
However, opponents raised concerns about increased traffic, potential environmental impacts, and the displacement of existing small businesses in the area.
Community activist groups held protests outside city hall, with signs reading "People Over Profit" and "Save Our Neighborhood."
The final vote was 7-4 in favor of the project, with councilmembers from the downtown districts voting against it.
Mayor Johnson, who has championed the development since its proposal two years ago, called the decision "a crucial step forward for our city's future."
Construction is expected to begin next spring and last approximately three years.
The developer has agreed to include 15% affordable housing units and contribute $5 million to a community benefits fund as part of the approval conditions.
```
""")
    # Information about the model
    with st.expander("About this model"):
        st.markdown("""
**Model**: `facebook/bart-large-cnn`

BART (Bidirectional and Auto-Regressive Transformers) is a transformer encoder-decoder model fine-tuned on CNN/Daily Mail, a large dataset of news articles paired with summaries.

- **Size**: ~400M parameters
- **Training**: Pre-trained with a denoising objective on a large text corpus, then fine-tuned on the CNN/Daily Mail dataset
- **Performance**: State-of-the-art results on summarization benchmarks at the time of its release

This model is particularly effective at generating concise, coherent summaries that capture the main points of news articles and other informative texts.
""")