# NLP-Playground/src/components/semantic_search.py
import streamlit as st
import torch

from utils.ui_helpers import plot_similarity_heatmap


def show_semantic_search(nlp_engine):
    """Display the semantic search UI component."""
    # st.markdown("🔍🧠")
    st.title("Semantic Search 🔍🧠")
    st.markdown("""
    Search for semantically similar texts using sentence embeddings.
    Each text is converted to an embedding vector, and texts are scored by cosine similarity.
    """)

    # Create tabs for the two modes
    tab1, tab2 = st.tabs(["Semantic Search", "Text Similarity"])

    # TAB 1: Semantic Search
    with tab1:
        st.markdown("### Search in a Corpus of Texts")

        # Corpus input
        corpus_text = st.text_area(
            "Enter corpus texts (one text per line)",
            "The weather is sunny today.\nI enjoy walking in the park on a beautiful day.\nAI is transforming many industries.",
            height=150,
            help="Enter multiple sentences or paragraphs, one per line"
        )

        # Split into non-empty, whitespace-trimmed lines
        corpus = [text.strip() for text in corpus_text.split('\n') if text.strip()]

        # Query input
        query = st.text_input(
            "Enter your search query",
            "What is the forecast for today?"
        )

        # Process button
        if st.button("Search", key="search_button"):
            # Treat a whitespace-only query as empty
            if not corpus or not query.strip():
                st.error("Please provide both corpus texts and a search query.")
            else:
                with st.spinner("Computing similarities..."):
                    # Get embeddings (assumed here to be torch tensors)
                    query_embedding = nlp_engine.get_embeddings(query)
                    corpus_embeddings = nlp_engine.get_embeddings(corpus)

                    # Ensure the query embedding is 2D so it broadcasts against the corpus rows
                    if query_embedding.ndim == 1:
                        query_embedding = query_embedding.unsqueeze(0)

                    # Cosine similarity of the query against every corpus embedding
                    similarities = torch.nn.functional.cosine_similarity(
                        query_embedding,
                        corpus_embeddings,
                        dim=1
                    ).tolist()

                # Display results
                st.markdown("### Search Results")

                # Plot similarities
                fig = plot_similarity_heatmap(query, corpus, similarities)
                st.plotly_chart(fig, use_container_width=True)

                # Show results in a table, highest similarity first
                ranked = sorted(zip(corpus, similarities), key=lambda x: x[1], reverse=True)
                results = [
                    {"Rank": rank, "Text": text, "Similarity": f"{score:.4f}"}
                    for rank, (text, score) in enumerate(ranked, start=1)
                ]
                st.table(results)

    # TAB 2: Text Similarity
    with tab2:
        st.markdown("### Compare Two Texts")

        # Text inputs
        col1, col2 = st.columns(2)
        with col1:
            text1 = st.text_area(
                "Text 1",
                "The weather forecast predicts rain tomorrow.",
                height=150
            )
        with col2:
            text2 = st.text_area(
                "Text 2",
                "According to meteorologists, precipitation is expected the following day.",
                height=150
            )

        # Process button
        if st.button("Compare", key="compare_button"):
            # Treat whitespace-only input as empty
            if not text1.strip() or not text2.strip():
                st.error("Please provide both texts to compare.")
            else:
                with st.spinner("Computing similarity..."):
                    # Get embeddings (assumed here to be torch tensors)
                    embedding1 = nlp_engine.get_embeddings(text1)
                    embedding2 = nlp_engine.get_embeddings(text2)

                    # Ensure embeddings are 2D for comparison
                    if embedding1.ndim == 1:
                        embedding1 = embedding1.unsqueeze(0)
                    if embedding2.ndim == 1:
                        embedding2 = embedding2.unsqueeze(0)

                    # Compute similarity
                    similarity = torch.nn.functional.cosine_similarity(
                        embedding1,
                        embedding2,
                        dim=1
                    ).item()

                # Display result
                st.markdown("### Similarity Result")

                # Create a visual representation of similarity.
                # Cosine similarity lies in [-1, 1], but st.progress only accepts
                # floats in [0.0, 1.0], so clamp before drawing the bar.
                similarity_percentage = round(similarity * 100, 2)
                st.progress(min(max(similarity, 0.0), 1.0))

                # Show the similarity score
                if similarity_percentage > 80:
                    st.success(f"Similarity Score: {similarity_percentage}% (Very Similar)")
                elif similarity_percentage > 60:
                    st.info(f"Similarity Score: {similarity_percentage}% (Moderately Similar)")
                elif similarity_percentage > 40:
                    st.warning(f"Similarity Score: {similarity_percentage}% (Somewhat Similar)")
                else:
                    st.error(f"Similarity Score: {similarity_percentage}% (Not Very Similar)")

    # Example section
    with st.expander("Example texts to try"):
        st.markdown("""
        ### Example Corpus for Semantic Search

        ```
        Artificial intelligence is revolutionizing healthcare through early disease detection.
        Machine learning algorithms can predict patient outcomes based on historical data.
        The automotive industry is investing heavily in self-driving car technology.
        Climate change is causing more frequent and severe weather events globally.
        Regular exercise and a balanced diet are essential for maintaining good health.
        The global economy faces significant challenges due to supply chain disruptions.
        Renewable energy sources are becoming increasingly cost-competitive with fossil fuels.
        ```

        **Example Queries:**
        - How is AI helping doctors?
        - What's happening with autonomous vehicles?
        - How can I stay healthy?
        - What are the economic trends currently?

        ### Example Text Pairs for Similarity Comparison

        **Similar Pairs:**

        Text 1: `The film received positive reviews from critics and performed well at the box office.`

        Text 2: `The movie was praised by reviewers and was commercially successful.`

        Text 1: `The company announced a significant increase in quarterly earnings.`

        Text 2: `The firm reported a substantial growth in profits for the last quarter.`

        **Dissimilar Pairs:**

        Text 1: `The recipe calls for two tablespoons of olive oil and fresh herbs.`

        Text 2: `The basketball game went into overtime after a tied score at the final buzzer.`

        Text 1: `Quantum computing leverages principles of quantum mechanics to process information.`

        Text 2: `The annual flower festival attracts thousands of visitors to the botanical gardens.`
        """)

    # Information about the model
    with st.expander("About this model"):
        st.markdown("""
        **Model**: `sentence-transformers/all-MiniLM-L6-v2`

        This is a sentence transformer model that generates fixed-size embeddings for text.

        - **Size**: about 22M parameters, roughly 80 MB on disk (smaller than BERT Base)
        - **Embedding Dimension**: 384
        - **Performance**: average score of 58.80 across the sentence-embedding and semantic-search benchmarks reported in the Sentence-Transformers documentation

        The model maps sentences or paragraphs to a dense vector space where semantically similar texts are close to each other, enabling semantic search and similarity comparisons.
        """)