import streamlit as st
import torch
from utils.ui_helpers import plot_similarity_heatmap
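

# Illustrative sketch, not wired into the UI below: the ranking step of the
# search tab, isolated as a pure function so it can be unit-tested without
# Streamlit. `build_ranked_rows` is a hypothetical name introduced here for
# illustration; it is an assumption, not part of the original module.
def build_ranked_rows(corpus, similarities):
    """Return table rows sorted by descending similarity (hypothetical helper)."""
    if len(corpus) != len(similarities):
        raise ValueError("corpus and similarities must have the same length")
    # Pair each text with its score and put the most similar text first.
    ranked = sorted(zip(corpus, similarities), key=lambda pair: pair[1], reverse=True)
    return [
        {"Rank": rank, "Text": text, "Similarity": f"{score:.4f}"}
        for rank, (text, score) in enumerate(ranked, start=1)
    ]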

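
# Illustrative sketch, not wired into the UI below: one way to turn a raw
# cosine similarity into the values the comparison tab displays. Cosine
# similarity lies in [-1, 1] (and can drift slightly past 1.0 through
# floating-point error), so the progress value is clamped to [0.0, 1.0].
# `describe_similarity` is a hypothetical name; the thresholds mirror the
# ones used in the comparison tab.
def describe_similarity(similarity):
    """Return (progress_value, percentage, label) for a cosine similarity."""
    similarity = float(similarity)
    progress_value = max(0.0, min(1.0, similarity))
    percentage = round(similarity * 100, 2)
    if percentage > 80:
        label = "Very Similar"
    elif percentage > 60:
        label = "Moderately Similar"
    elif percentage > 40:
        label = "Somewhat Similar"
    else:
        label = "Not Very Similar"
    return progress_value, percentage, label

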
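def show_semantic_search(nlp_engine):
    """Display the semantic search UI component.

    `nlp_engine` is expected to expose `get_embeddings`, which accepts a string
    or a list of strings and returns a `torch.Tensor` (1D or 2D).
    """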
    st.title("Semantic Search 🔍🧠")
    st.markdown("""
    Search for semantically similar texts using sentence embeddings.
    This tool converts text to vector representations and computes similarity.
    """)
    
    # Create tabs for different modes
    tab1, tab2 = st.tabs(["Semantic Search", "Text Similarity"])
    
    # TAB 1: Semantic Search
    with tab1:
        st.markdown("### Search in a Corpus of Texts")
        
        # Corpus input
        corpus_text = st.text_area(
            "Enter corpus texts (one text per line)",
            "The weather is sunny today.\nI enjoy walking in the park on a beautiful day.\nAI is transforming many industries.",
            height=150,
            help="Enter multiple sentences or paragraphs, one per line"
        )
        
        # Convert to list
        corpus = [text.strip() for text in corpus_text.split('\n') if text.strip()]
        
        # Query input
        query = st.text_input(
            "Enter your search query",
            "What is the forecast for today?"
        )
        
        # Process button
        if st.button("Search", key="search_button"):
            # Treat a whitespace-only query as empty
            if not corpus or not query.strip():
                st.error("Please provide both corpus texts and a search query.")
            else:
                with st.spinner("Computing similarities..."):
                    # Get embeddings
                    query_embedding = nlp_engine.get_embeddings(query)
                    corpus_embeddings = nlp_engine.get_embeddings(corpus)
                    
                    # Ensure query embedding is 2D for comparison
                    if query_embedding.ndim == 1:
                        query_embedding = query_embedding.unsqueeze(0)
                    
                    # Compute similarities
                    similarities = torch.nn.functional.cosine_similarity(
                        query_embedding, 
                        corpus_embeddings, 
                        dim=1
                    ).tolist()
                    
                    # Display results
                    st.markdown("### Search Results")
                    
                    # Plot similarities
                    fig = plot_similarity_heatmap(query, corpus, similarities)
                    st.plotly_chart(fig, use_container_width=True)
                    
                    # Show sorted results in a table, most similar first
                    ranked = sorted(zip(corpus, similarities), key=lambda pair: pair[1], reverse=True)
                    results = [
                        {"Rank": rank, "Text": text, "Similarity": f"{score:.4f}"}
                        for rank, (text, score) in enumerate(ranked, start=1)
                    ]
                    
                    st.table(results)
    
    # TAB 2: Text Similarity
    with tab2:
        st.markdown("### Compare Two Texts")
        
        # Text inputs
        col1, col2 = st.columns(2)
        
        with col1:
            text1 = st.text_area(
                "Text 1",
                "The weather forecast predicts rain tomorrow.",
                height=150
            )
        
        with col2:
            text2 = st.text_area(
                "Text 2",
                "According to meteorologists, precipitation is expected the following day.",
                height=150
            )
        
        # Process button
        if st.button("Compare", key="compare_button"):
            # Treat whitespace-only inputs as empty
            if not text1.strip() or not text2.strip():
                st.error("Please provide both texts to compare.")
            else:
                with st.spinner("Computing similarity..."):
                    # Get embeddings
                    embedding1 = nlp_engine.get_embeddings(text1)
                    embedding2 = nlp_engine.get_embeddings(text2)
                    
                    # Ensure embeddings are 2D for comparison
                    if embedding1.ndim == 1:
                        embedding1 = embedding1.unsqueeze(0)
                    if embedding2.ndim == 1:
                        embedding2 = embedding2.unsqueeze(0)
                    
                    # Compute similarity
                    similarity = torch.nn.functional.cosine_similarity(
                        embedding1, 
                        embedding2, 
                        dim=1
                    ).item()
                    
                    # Display result
                    st.markdown("### Similarity Result")
                    
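                    # Create a visual representation of similarity.
                    # Cosine similarity lies in [-1, 1], but st.progress only
                    # accepts floats in [0.0, 1.0], so clamp before display.
                    similarity_percentage = round(similarity * 100, 2)
                    st.progress(max(0.0, min(1.0, similarity)))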
                    
                    # Show the similarity score
                    if similarity_percentage > 80:
                        st.success(f"Similarity Score: {similarity_percentage}% (Very Similar)")
                    elif similarity_percentage > 60:
                        st.info(f"Similarity Score: {similarity_percentage}% (Moderately Similar)")
                    elif similarity_percentage > 40:
                        st.warning(f"Similarity Score: {similarity_percentage}% (Somewhat Similar)")
                    else:
                        st.error(f"Similarity Score: {similarity_percentage}% (Not Very Similar)")
    
    # Example section
    with st.expander("Example texts to try"):
        st.markdown("""
        ### Example Corpus for Semantic Search
        
        ```
        Artificial intelligence is revolutionizing healthcare through early disease detection.
        Machine learning algorithms can predict patient outcomes based on historical data.
        The automotive industry is investing heavily in self-driving car technology.
        Climate change is causing more frequent and severe weather events globally.
        Regular exercise and a balanced diet are essential for maintaining good health.
        The global economy faces significant challenges due to supply chain disruptions.
        Renewable energy sources are becoming increasingly cost-competitive with fossil fuels.
        ```
        
        **Example Queries:**
        - How is AI helping doctors?
        - What's happening with autonomous vehicles?
        - How can I stay healthy?
        - What are the economic trends currently?
        
        ### Example Text Pairs for Similarity Comparison
        
        **Similar Pairs:**
        
        Text 1: `The film received positive reviews from critics and performed well at the box office.`
        Text 2: `The movie was praised by reviewers and was commercially successful.`
        
        Text 1: `The company announced a significant increase in quarterly earnings.`
        Text 2: `The firm reported a substantial growth in profits for the last quarter.`
        
        **Dissimilar Pairs:**
        
        Text 1: `The recipe calls for two tablespoons of olive oil and fresh herbs.`
        Text 2: `The basketball game went into overtime after a tied score at the final buzzer.`
        
        Text 1: `Quantum computing leverages principles of quantum mechanics to process information.`
        Text 2: `The annual flower festival attracts thousands of visitors to the botanical gardens.`
        """)
    
    # Information about the model
    with st.expander("About this model"):
        st.markdown("""
        **Model**: `sentence-transformers/all-MiniLM-L6-v2`
        
        This is a sentence transformer model that generates fixed-size embeddings for text.
        
        - **Size**: about 22M parameters, roughly 80 MB on disk (smaller than BERT Base)
        - **Embedding Dimension**: 384
        - **Performance**: 58.80 average score across the sentence-embedding and semantic-search benchmarks reported in the Sentence-Transformers documentation
        
        The model maps sentences or paragraphs to a dense vector space where semantically similar texts are close to each other, enabling semantic search and similarity comparisons.
        """)