import streamlit as st
import torch
from utils.ui_helpers import plot_similarity_heatmap


def show_semantic_search(nlp_engine):
    """Display the semantic search UI component."""
    # st.markdown("🔍🧠")
    st.title("Semantic Search 🔍🧠")
    st.markdown("""
    Search for semantically similar texts using sentence embeddings.
    This tool converts text to vector representations and computes similarity.
    """)

    # Create tabs for different modes
    tab1, tab2 = st.tabs(["Semantic Search", "Text Similarity"])

    # TAB 1: Semantic Search
    with tab1:
        st.markdown("### Search in a Corpus of Texts")

        # Corpus input
        corpus_text = st.text_area(
            "Enter corpus texts (one text per line)",
            "The weather is sunny today.\nI enjoy walking in the park on a beautiful day.\nAI is transforming many industries.",
            height=150,
            help="Enter multiple sentences or paragraphs, one per line"
        )

        # Convert to a list, dropping blank lines
        corpus = [text.strip() for text in corpus_text.split('\n') if text.strip()]

        # Query input
        query = st.text_input(
            "Enter your search query",
            "What is the forecast for today?"
        )

        # Process button
        if st.button("Search", key="search_button"):
            # Reject an empty corpus and an empty or whitespace-only query
            if not corpus or not query.strip():
                st.error("Please provide both corpus texts and a search query.")
            else:
                with st.spinner("Computing similarities..."):
                    # Get embeddings
                    query_embedding = nlp_engine.get_embeddings(query)
                    corpus_embeddings = nlp_engine.get_embeddings(corpus)

                    # Ensure query embedding is 2D so it broadcasts against the corpus
                    if query_embedding.ndim == 1:
                        query_embedding = query_embedding.unsqueeze(0)

                    # Compute one similarity score per corpus text
                    similarities = torch.nn.functional.cosine_similarity(
                        query_embedding,
                        corpus_embeddings,
                        dim=1
                    ).tolist()

                # Display results
                st.markdown("### Search Results")

                # Plot similarities
                fig = plot_similarity_heatmap(query, corpus, similarities)
                st.plotly_chart(fig, use_container_width=True)

                # Show results in a table, sorted from most to least similar
                ranked = sorted(zip(corpus, similarities), key=lambda x: x[1], reverse=True)
                results = []
                for rank, (text, score) in enumerate(ranked, start=1):
                    results.append({
                        "Rank": rank,
                        "Text": text,
                        "Similarity": f"{score:.4f}"
                    })
                st.table(results)

    # TAB 2: Text Similarity
    with tab2:
        st.markdown("### Compare Two Texts")

        # Text inputs
        col1, col2 = st.columns(2)
        with col1:
            text1 = st.text_area(
                "Text 1",
                "The weather forecast predicts rain tomorrow.",
                height=150
            )
        with col2:
            text2 = st.text_area(
                "Text 2",
                "According to meteorologists, precipitation is expected the following day.",
                height=150
            )

        # Process button
        if st.button("Compare", key="compare_button"):
            # Reject empty or whitespace-only inputs
            if not text1.strip() or not text2.strip():
                st.error("Please provide both texts to compare.")
            else:
                with st.spinner("Computing similarity..."):
                    # Get embeddings
                    embedding1 = nlp_engine.get_embeddings(text1)
                    embedding2 = nlp_engine.get_embeddings(text2)

                    # Ensure embeddings are 2D for comparison
                    if embedding1.ndim == 1:
                        embedding1 = embedding1.unsqueeze(0)
                    if embedding2.ndim == 1:
                        embedding2 = embedding2.unsqueeze(0)

                    # Compute similarity
                    similarity = torch.nn.functional.cosine_similarity(
                        embedding1,
                        embedding2,
                        dim=1
                    ).item()

                # Display result
                st.markdown("### Similarity Result")

                # Create a visual representation of similarity.
                # Cosine similarity lies in [-1, 1], but st.progress only accepts
                # floats in [0.0, 1.0], so clamp the value for the progress bar.
                similarity_percentage = round(similarity * 100, 2)
                st.progress(min(max(similarity, 0.0), 1.0))

                # Show the similarity score
                if similarity_percentage > 80:
                    st.success(f"Similarity Score: {similarity_percentage}% (Very Similar)")
                elif similarity_percentage > 60:
                    st.info(f"Similarity Score: {similarity_percentage}% (Moderately Similar)")
                elif similarity_percentage > 40:
                    st.warning(f"Similarity Score: {similarity_percentage}% (Somewhat Similar)")
                else:
                    st.error(f"Similarity Score: {similarity_percentage}% (Not Very Similar)")

    # Example section
    with st.expander("Example texts to try"):
        st.markdown("""
        ### Example Corpus for Semantic Search

        ```
        Artificial intelligence is revolutionizing healthcare through early disease detection.
        Machine learning algorithms can predict patient outcomes based on historical data.
        The automotive industry is investing heavily in self-driving car technology.
        Climate change is causing more frequent and severe weather events globally.
        Regular exercise and a balanced diet are essential for maintaining good health.
        The global economy faces significant challenges due to supply chain disruptions.
        Renewable energy sources are becoming increasingly cost-competitive with fossil fuels.
        ```

        **Example Queries:**
        - How is AI helping doctors?
        - What's happening with autonomous vehicles?
        - How can I stay healthy?
        - What are the current economic trends?

        ### Example Text Pairs for Similarity Comparison

        **Similar Pairs:**

        Text 1: `The film received positive reviews from critics and performed well at the box office.`

        Text 2: `The movie was praised by reviewers and was commercially successful.`

        Text 1: `The company announced a significant increase in quarterly earnings.`

        Text 2: `The firm reported a substantial growth in profits for the last quarter.`

        **Dissimilar Pairs:**

        Text 1: `The recipe calls for two tablespoons of olive oil and fresh herbs.`

        Text 2: `The basketball game went into overtime after a tied score at the final buzzer.`

        Text 1: `Quantum computing leverages principles of quantum mechanics to process information.`

        Text 2: `The annual flower festival attracts thousands of visitors to the botanical gardens.`
        """)

    # Information about the model
    with st.expander("About this model"):
        st.markdown("""
        **Model**: `sentence-transformers/all-MiniLM-L6-v2`

        This is a sentence transformer model that generates fixed-size embeddings for text.

        - **Size**: about 22M parameters, roughly 80 MB on disk (smaller than BERT Base)
        - **Embedding Dimension**: 384
        - **Performance**: 58.80 average score across the sentence-embedding and semantic-search benchmarks reported in the Sentence-Transformers documentation

        The model maps sentences or paragraphs to a dense vector space where semantically similar texts are close to each other, enabling semantic search and similarity comparisons.
        """)
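

# --- Illustrative sketch (an addition, not part of the original app) ---
# The search tab above scores every corpus text against the query with cosine
# similarity and then sorts by score. The helpers below restate that logic in
# plain Python so it can be checked without loading a model. The names
# `cosine_similarity_plain` and `rank_by_similarity` are hypothetical, and the
# zero-vector handling is an assumption of this sketch; the app itself relies
# on torch.nn.functional.cosine_similarity. Short toy vectors can stand in for
# the 384-dimensional embeddings when trying it out.
import math


def cosine_similarity_plain(a, b):
    """Cosine of the angle between two equal-length vectors (lists of floats)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        # Assumption: report a zero vector as having no similarity
        return 0.0
    return dot / (norm_a * norm_b)


def rank_by_similarity(query_vec, corpus_vecs, corpus_texts):
    """Return (text, score) pairs sorted from most to least similar."""
    scored = [
        (text, cosine_similarity_plain(query_vec, vec))
        for text, vec in zip(corpus_texts, corpus_vecs)
    ]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Scores range from -1 (opposite direction) to 1 (same direction), so a score
# may be negative and needs clamping before it is shown on a 0-to-1 scale.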