---
title: Search Engine LLM App
emoji: π
colorFrom: pink
colorTo: indigo
sdk: streamlit
sdk_version: 1.43.2
app_file: app.py
pinned: false
license: mit
short_description: Chat with an LLM that can search the web.
---

Check out the configuration reference at https://huggingface.co/docs/hub/spaces-config-reference

# Search Engine LLM App

## Overview

This application is a research assistant built with Langchain that can search multiple knowledge sources, including Wikipedia, arXiv, and the web via DuckDuckGo. It leverages Groq's LLM API to provide intelligent, context-aware responses to user queries.

## Live Demo

Try the application live at: [Hugging Face Spaces](https://huggingface.co/spaces/ashutoshchoudhari/Search-Engine-LLM-app)

## Features

- **Multi-source search**: Access information from Wikipedia, arXiv scientific papers, and web results
- **Conversational memory**: Retains context from previous interactions
- **Streaming responses**: See the AI's response generated in real time
- **User-friendly interface**: Clean Streamlit UI for easy interaction

## Technical Components

- **LLM**: Groq's llama3-8b-8192 model (with fallback support for Ollama models)
- **Embeddings**: Hugging Face's all-MiniLM-L6-v2 (see the sketch after this list)
- **Search Tools**:
  - Wikipedia API
  - arXiv API
  - DuckDuckGo Search
- **Framework**: Langchain for agent orchestration
- **Frontend**: Streamlit
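
The embeddings model listed above does not appear in the code excerpts below, so here is a minimal sketch of how it might be initialized with langchain-huggingface; only the model name comes from this document, the rest is an assumption:

```python
from langchain_huggingface import HuggingFaceEmbeddings

# Downloads all-MiniLM-L6-v2 from the Hugging Face Hub on first use;
# a valid HF_TOKEN in the environment may be needed for the download
embeddings = HuggingFaceEmbeddings(model_name="sentence-transformers/all-MiniLM-L6-v2")
```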

## Project Structure

- **app.py**: Main application file containing the Streamlit UI and Langchain integration
- **requirements.txt**: Dependencies required to run the application
- **README.md**: Project metadata and description for Hugging Face Spaces
- **tools_agents.ipynb**: Jupyter notebook demonstrating how to use Langchain tools and agents
- **.github/workflows/main.yaml**: GitHub Actions workflow for deploying to Hugging Face Spaces
- **.gitattributes**: Git LFS configuration for handling large files
- **.gitignore**: Standard Python gitignore file
- **LICENSE**: MIT License file
- **app_documentation.md**: This comprehensive documentation file

## Implementation Details

### LLM Integration

The application uses Groq's API to access the llama3-8b-8192 model with streaming enabled:

```python
from langchain_groq import ChatGroq

llm = ChatGroq(
    groq_api_key=st.session_state.api_key,
    model_name="llama3-8b-8192",
    streaming=True,
)
```
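
The `st.session_state.api_key` value above is expected to come from the sidebar prompt described in the Usage section. A hedged sketch of how it might be collected (the widget label and exact layout are assumptions, not taken from app.py):

```python
import streamlit as st

# Password-style sidebar field; the entered key lands in st.session_state.api_key,
# which the ChatGroq constructor above reads
st.session_state.api_key = st.sidebar.text_input("Groq API Key", type="password")
```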

Alternative local models can also be configured with Ollama:

```python
# from langchain_ollama import ChatOllama
# llm = ChatOllama(base_url=OLLAMA_WSL_IP, model="llama3.1", streaming=True)
```

### Search Tools Configuration

The app configures three primary search tools:

1. **Wikipedia Search**:

```python
from langchain.agents import Tool
from langchain_community.tools import WikipediaQueryRun
from langchain_community.utilities import WikipediaAPIWrapper

api_wrapper_wiki = WikipediaAPIWrapper(top_k_results=3, doc_content_chars_max=10000)
wiki = WikipediaQueryRun(api_wrapper=api_wrapper_wiki)
wiki_tool = Tool(
    name="Wikipedia",
    func=wiki.run,
    description="This tool uses the Wikipedia API to search for a topic.",
)
```

2. **arXiv Search**:

```python
from langchain_community.tools import ArxivQueryRun
from langchain_community.utilities import ArxivAPIWrapper

api_wrapper_arxiv = ArxivAPIWrapper(top_k_results=5, doc_content_chars_max=10000)
arxiv = ArxivQueryRun(api_wrapper=api_wrapper_arxiv)
arxiv_tool = Tool(
    name="arxiv",
    func=arxiv.run,
    description="Searches arXiv for papers matching the query.",
)
```

3. **DuckDuckGo Web Search**:

```python
from langchain_community.tools import DuckDuckGoSearchResults
from langchain_community.utilities import DuckDuckGoSearchAPIWrapper

api_wrapper_ddg = DuckDuckGoSearchAPIWrapper(region="us-en", time="y", max_results=10)
ddg = DuckDuckGoSearchResults(
    api_wrapper=api_wrapper_ddg,
    output_format="string",
    handle_tool_error=True,
    handle_validation_error=True,
)
ddg_tool = Tool(
    name="DuckDuckGo_Search",
    func=ddg.run,
    description="Searches the web for a query using the DuckDuckGo search engine.",
)
```
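
With the three tools defined, they are gathered into the single list that the agent configuration below consumes; a minimal sketch (the ordering is an assumption):

```python
# The agent picks a tool at each step based on these names and descriptions
tools = [wiki_tool, arxiv_tool, ddg_tool]
```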

### Agent Configuration

The system uses the CHAT_CONVERSATIONAL_REACT_DESCRIPTION agent type with a conversational memory buffer:

```python
from langchain.agents import AgentType, initialize_agent
from langchain.memory import ConversationBufferWindowMemory

# Keep the last 5 exchanges of the conversation in context
memory = ConversationBufferWindowMemory(k=5, memory_key="chat_history", return_messages=True)

search_agent = initialize_agent(
    tools=tools,
    llm=llm,
    agent=AgentType.CHAT_CONVERSATIONAL_REACT_DESCRIPTION,
    max_iterations=10,
    memory=memory,
    handle_parsing_errors=True,
)
```
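
How the agent is invoked from the chat loop is not shown in this document. A hedged sketch using Langchain's Streamlit callback handler to render tool calls and the streamed answer in the UI (`user_query` is a placeholder; the actual call in app.py may differ):

```python
import streamlit as st
from langchain_community.callbacks.streamlit import StreamlitCallbackHandler

with st.chat_message("assistant"):
    # The callback handler draws the agent's intermediate steps live in the container
    st_cb = StreamlitCallbackHandler(st.container(), expand_new_thoughts=False)
    response = search_agent.run(user_query, callbacks=[st_cb])
    st.write(response)
```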

## Setup Requirements

1. Groq API key
2. Hugging Face token (for embeddings)
3. Python environment with required dependencies

## Installation Instructions

Install the required packages using:

```bash
pip install -r requirements.txt
```

Required packages include:

- arxiv
- wikipedia
- langchain, langchain-community, langchain-huggingface, langchain-groq
- openai
- duckduckgo-search
- ollama, langchain-ollama (for local model support)

## Environment Variables

Create a `.env` file with the following variables:

```
GROQ_API_KEY=your_groq_api_key_here
HF_TOKEN=your_huggingface_token_here
```
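
A sketch of how these variables might be loaded at startup, assuming the python-dotenv package (not listed in the requirements above) is available:

```python
import os
from dotenv import load_dotenv

load_dotenv()  # reads .env from the working directory into os.environ
groq_api_key = os.getenv("GROQ_API_KEY")
hf_token = os.getenv("HF_TOKEN")
```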

## Usage

1. Start the application using Streamlit:

   ```bash
   streamlit run app.py
   ```

2. Enter your Groq API key in the sidebar when prompted
3. Type your research question in the chat input box
4. The agent will search across available sources and provide a comprehensive response
5. Your conversation history will be maintained throughout the session

## Example Queries

- "What are the latest developments in quantum computing?"
- "Explain the concept of transformer models in NLP"
- "What were the key findings from the recent climate change report?"
- "Tell me about the history and applications of reinforcement learning"

## Deployment

This project is configured to deploy to Hugging Face Spaces using GitHub Actions. The workflow in `.github/workflows/main.yaml` automatically syncs the repository to Hugging Face when changes are pushed to the main branch.

### Live Application

The app is currently deployed and accessible at: [Hugging Face Spaces](https://huggingface.co/spaces/ashutoshchoudhari/Search-Engine-LLM-app)

### Local Development

For local development, you can use:

```bash
streamlit run app.py
```

## License

This project is licensed under the MIT License - see the [LICENSE](LICENSE) file for details.

## Acknowledgments

- Langchain for providing the agent and tool framework
- Groq for the LLM API access
- Hugging Face for embeddings and hosting capabilities

## Contributing

Contributions, issues, and feature requests are welcome. Feel free to check the issues page if you want to contribute.