ashutoshchoudhari's picture
Rename project title in README.md for clarity and consistency
51972c5
---
title: Search Engine LLM App
emoji: 🌍
colorFrom: pink
colorTo: indigo
sdk: streamlit
sdk_version: 1.43.2
app_file: app.py
pinned: false
license: mit
short_description: This app allows you to chat with an LLM that can search web.
---
Check out the configuration reference at https://huggingface.co/docs/hub/spaces-config-reference
# Search Engine LLM App
## Overview
This application is a powerful research assistant built with Langchain that can search across multiple knowledge sources including Wikipedia, arXiv, and the web via DuckDuckGo. It leverages Groq's LLM capabilities to provide intelligent, context-aware responses to user queries.
## Live Demo
Try the application live at: [Hugging Face Spaces](https://huggingface.co/spaces/ashutoshchoudhari/Search-Engine-LLM-app)
## Features
- **Multi-source search**: Access information from Wikipedia, arXiv scientific papers, and web results
- **Conversational memory**: Retains context from previous interactions
- **Streaming responses**: See the AI's response generated in real-time
- **User-friendly interface**: Clean Streamlit UI for easy interaction
## Technical Components
- **LLM**: Groq's Llama3-8b-8192 model (with fallback support for Ollama models)
- **Embeddings**: Hugging Face's all-MiniLM-L6-v2
- **Search Tools**:
- Wikipedia API
- arXiv API
- DuckDuckGo Search
- **Framework**: Langchain for agent orchestration
- **Frontend**: Streamlit
## Project Structure
- **app.py**: Main application file containing the Streamlit UI and Langchain integration
- **requirements.txt**: Dependencies required to run the application
- **README.md**: Project metadata and description for Hugging Face Spaces
- **tools_agents.ipynb**: Jupyter notebook demonstrating how to use Langchain tools and agents
- **.github/workflows/main.yaml**: GitHub Actions workflow for deploying to Hugging Face Spaces
- **.gitattributes**: Git LFS configuration for handling large files
- **.gitignore**: Standard Python gitignore file
- **LICENSE**: MIT License file
- **app_documentation.md**: This comprehensive documentation file
## Implementation Details
### LLM Integration
The application uses Groq's API to access the Llama3-8b-8192 model with streaming capability:
```python
llm = ChatGroq(
groq_api_key = st.session_state.api_key,
model_name = "Llama3-8b-8192",
streaming = True
)
```
Alternative local models can also be configured with Ollama:
```python
#llm = ChatOllama(base_url=OLLAMA_WSL_IP, model="llama3.1", streaming=True)
```
### Search Tools Configuration
The app configures three primary search tools:
1. **Wikipedia Search**:
```python
api_wrapper_wiki = WikipediaAPIWrapper(top_k_results = 3, doc_content_chars_max=10000)
wiki = WikipediaQueryRun(api_wrapper=api_wrapper_wiki)
wiki_tool= Tool(
name = "Wikipedia",
func = wiki.run,
description = "This tool uses the Wikipedia API to search for a topic."
)
```
2. **arXiv Search**:
```python
api_wrapper_arxiv = ArxivAPIWrapper(top_k_results = 5, doc_content_chars_max=10000)
arxiv = ArxivQueryRun(api_wrapper=api_wrapper_arxiv)
arxiv_tool = Tool(
name = "arxiv",
func = arxiv.run,
description = "Searches arXiv for papers matching the query.",
)
```
3. **DuckDuckGo Web Search**:
```python
api_wrapper_ddg = DuckDuckGoSearchAPIWrapper(region="us-en", time="y", max_results=10)
ddg = DuckDuckGoSearchResults(
api_wrapper=api_wrapper_ddg,
output_format="string",
handle_tool_error=True,
handle_validation_error=True)
ddg_tool = Tool(
name = "DuckDuckGo_Search",
func = ddg.run,
description = "Searches for search queries using the DuckDuckGo Search engine."
)
```
### Agent Configuration
The system uses the CHAT_CONVERSATIONAL_REACT_DESCRIPTION agent type with a conversational memory buffer:
```python
memory = ConversationBufferWindowMemory(k=5, memory_key="chat_history", return_messages=True)
search_agent = initialize_agent(
tools = tools,
llm = llm,
agent = AgentType.CHAT_CONVERSATIONAL_REACT_DESCRIPTION,
max_iterations = 10,
memory = memory,
handle_parsing_errors = True)
```
## Setup Requirements
1. Groq API key
2. Hugging Face token (for embeddings)
3. Python environment with required dependencies
## Installation Instructions
Install the required packages using:
```bash
pip install -r requirements.txt
```
Required packages include:
- arxiv
- wikipedia
- langchain, langchain-community, langchain-huggingface, langchain-groq
- openai
- duckduckgo-search
- ollama, langchain-ollama (for local model support)
## Environment Variables
Create a `.env` file with the following variables:
```
GROQ_API_KEY=your_groq_api_key_here
HF_TOKEN=your_huggingface_token_here
```
## Usage
1. Start the application using Streamlit:
```bash
streamlit run app.py
```
2. Enter your Groq API key in the sidebar when prompted
3. Type your research question in the chat input box
4. The agent will search across available sources and provide a comprehensive response
5. Your conversation history will be maintained throughout the session
## Example Queries
- "What are the latest developments in quantum computing?"
- "Explain the concept of transformer models in NLP"
- "What were the key findings from the recent climate change report?"
- "Tell me about the history and applications of reinforcement learning"
## Deployment
This project is configured to deploy to Hugging Face Spaces using GitHub Actions. The workflow in `.github/workflows/main.yaml` automatically syncs the repository to Hugging Face when changes are pushed to the main branch.
### Live Application
The app is currently deployed and accessible at: [Hugging Face Spaces](https://huggingface.co/spaces/ashutoshchoudhari/Search-Engine-LLM-app)
### Local Development
For local development, you can use:
```bash
streamlit run app.py
```
## License
This project is licensed under the MIT License - see the [LICENSE](LICENSE) file for details.
## Acknowledgments
- Langchain for providing the agent and tool framework
- Groq for the LLM API access
- Hugging Face for embeddings and hosting capabilities
## Contributing
Contributions, issues, and feature requests are welcome. Feel free to check issues page if you want to contribute.