---
title: Search Engine LLM App
emoji: 🌍
colorFrom: pink
colorTo: indigo
sdk: streamlit
sdk_version: 1.43.2
app_file: app.py
pinned: false
license: mit
short_description: This app allows you to chat with an LLM that can search the web.
---

Check out the configuration reference at https://huggingface.co/docs/hub/spaces-config-reference

Search Engine LLM App

Overview

This application is a powerful research assistant built with Langchain that can search across multiple knowledge sources including Wikipedia, arXiv, and the web via DuckDuckGo. It leverages Groq's LLM capabilities to provide intelligent, context-aware responses to user queries.

Live Demo

Try the application live on Hugging Face Spaces.

Features

  • Multi-source search: Access information from Wikipedia, arXiv scientific papers, and web results
  • Conversational memory: Retains context from previous interactions
  • Streaming responses: See the AI's response generated in real-time
  • User-friendly interface: Clean Streamlit UI for easy interaction

Technical Components

  • LLM: Groq's Llama3-8b-8192 model (with fallback support for Ollama models)
  • Embeddings: Hugging Face's all-MiniLM-L6-v2 (see the sketch after this list)
  • Search Tools:
    • Wikipedia API
    • arXiv API
    • DuckDuckGo Search
  • Framework: Langchain for agent orchestration
  • Frontend: Streamlit
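
The embeddings listed above would typically be initialized through langchain-huggingface. A minimal sketch, assuming the standard sentence-transformers namespace for the model id (the exact initialization in app.py may differ):

from langchain_huggingface import HuggingFaceEmbeddings

# Assumes HF_TOKEN is available in the environment; the namespaced
# model id "sentence-transformers/all-MiniLM-L6-v2" is an assumption.
embeddings = HuggingFaceEmbeddings(model_name="sentence-transformers/all-MiniLM-L6-v2")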

Project Structure

  • app.py: Main application file containing the Streamlit UI and Langchain integration
  • requirements.txt: Dependencies required to run the application
  • README.md: Project metadata and description for Hugging Face Spaces
  • tools_agents.ipynb: Jupyter notebook demonstrating how to use Langchain tools and agents
  • .github/workflows/main.yaml: GitHub Actions workflow for deploying to Hugging Face Spaces
  • .gitattributes: Git LFS configuration for handling large files
  • .gitignore: Standard Python gitignore file
  • LICENSE: MIT License file
  • app_documentation.md: This comprehensive documentation file

Implementation Details

LLM Integration

The application uses Groq's API to access the Llama3-8b-8192 model with streaming capability:

from langchain_groq import ChatGroq

llm = ChatGroq(
    groq_api_key=st.session_state.api_key,  # key entered by the user in the sidebar
    model_name="Llama3-8b-8192",
    streaming=True,
)
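
The snippet above reads the key from st.session_state.api_key; presumably it is captured in the sidebar along these lines (the widget label is illustrative, not taken from app.py):

import streamlit as st

# Hypothetical sidebar input; the actual label and wiring in app.py may differ.
st.session_state.api_key = st.sidebar.text_input("Groq API Key", type="password")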

Alternatively, a local model can be configured through Ollama (OLLAMA_WSL_IP is expected to point at the local Ollama server):

# from langchain_ollama import ChatOllama
# llm = ChatOllama(base_url=OLLAMA_WSL_IP, model="llama3.1", streaming=True)

Search Tools Configuration

The app configures three primary search tools, with the shared imports shown once up front:

from langchain.agents import Tool
from langchain_community.tools import WikipediaQueryRun, ArxivQueryRun, DuckDuckGoSearchResults
from langchain_community.utilities import WikipediaAPIWrapper, ArxivAPIWrapper, DuckDuckGoSearchAPIWrapper

  1. Wikipedia Search:
api_wrapper_wiki = WikipediaAPIWrapper(top_k_results=3, doc_content_chars_max=10000)
wiki = WikipediaQueryRun(api_wrapper=api_wrapper_wiki)
wiki_tool = Tool(
    name="Wikipedia",
    func=wiki.run,
    description="This tool uses the Wikipedia API to search for a topic.",
)
  2. arXiv Search:
api_wrapper_arxiv = ArxivAPIWrapper(top_k_results=5, doc_content_chars_max=10000)
arxiv = ArxivQueryRun(api_wrapper=api_wrapper_arxiv)
arxiv_tool = Tool(
    name="arxiv",
    func=arxiv.run,
    description="Searches arXiv for papers matching the query.",
)
  3. DuckDuckGo Web Search:
api_wrapper_ddg = DuckDuckGoSearchAPIWrapper(region="us-en", time="y", max_results=10)
ddg = DuckDuckGoSearchResults(
    api_wrapper=api_wrapper_ddg,
    output_format="string",
    handle_tool_error=True,
    handle_validation_error=True,
)
ddg_tool = Tool(
    name="DuckDuckGo_Search",
    func=ddg.run,
    description="Searches the web for a query using the DuckDuckGo search engine.",
)
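
The three tools are then presumably gathered into the tools list that the agent configuration below receives:

tools = [wiki_tool, arxiv_tool, ddg_tool]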

Agent Configuration

The system uses the CHAT_CONVERSATIONAL_REACT_DESCRIPTION agent type with a conversational memory buffer that retains the last five exchanges (k=5):

from langchain.agents import AgentType, initialize_agent
from langchain.memory import ConversationBufferWindowMemory

memory = ConversationBufferWindowMemory(k=5, memory_key="chat_history", return_messages=True)

search_agent = initialize_agent(
    tools=tools,
    llm=llm,
    agent=AgentType.CHAT_CONVERSATIONAL_REACT_DESCRIPTION,
    max_iterations=10,
    memory=memory,
    handle_parsing_errors=True,
)
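
For the streaming chat experience described under Features, app.py presumably wires the agent into Streamlit's chat primitives roughly like this (widget labels and structure are illustrative, not taken verbatim from app.py):

import streamlit as st
from langchain_community.callbacks.streamlit import StreamlitCallbackHandler

if prompt := st.chat_input("Ask a research question"):
    st.chat_message("user").write(prompt)
    with st.chat_message("assistant"):
        # Renders the agent's intermediate steps live in the current container.
        st_cb = StreamlitCallbackHandler(st.container())
        response = search_agent.run(prompt, callbacks=[st_cb])
        st.write(response)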

Setup Requirements

  1. Groq API key
  2. Hugging Face token (for embeddings)
  3. Python environment with required dependencies

Installation Instructions

Install the required packages using:

pip install -r requirements.txt

Required packages include:

  • arxiv
  • wikipedia
  • langchain, langchain-community, langchain-huggingface, langchain-groq
  • openai
  • duckduckgo-search
  • ollama, langchain-ollama (for local model support)
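
A plausible requirements.txt consistent with this list might look as follows; the streamlit and python-dotenv entries are assumptions (implied by the app and the .env setup below), and exact version pins are not specified in the source:

streamlit
langchain
langchain-community
langchain-huggingface
langchain-groq
arxiv
wikipedia
duckduckgo-search
openai
ollama
langchain-ollama
python-dotenv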

Environment Variables

Create a .env file with the following variables:

GROQ_API_KEY=your_groq_api_key_here
HF_TOKEN=your_huggingface_token_here
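
These variables would typically be loaded at startup with python-dotenv (assumed here; the source only lists the variable names):

import os
from dotenv import load_dotenv

load_dotenv()  # reads GROQ_API_KEY and HF_TOKEN from .env into the environment
groq_api_key = os.getenv("GROQ_API_KEY")
hf_token = os.getenv("HF_TOKEN")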

Usage

  1. Start the application using Streamlit:
    streamlit run app.py
    
  2. Enter your Groq API key in the sidebar when prompted
  3. Type your research question in the chat input box
  4. The agent will search across available sources and provide a comprehensive response
  5. Your conversation history will be maintained throughout the session

Example Queries

  • "What are the latest developments in quantum computing?"
  • "Explain the concept of transformer models in NLP"
  • "What were the key findings from the recent climate change report?"
  • "Tell me about the history and applications of reinforcement learning"

Deployment

This project is configured to deploy to Hugging Face Spaces using GitHub Actions. The workflow in .github/workflows/main.yaml automatically syncs the repository to Hugging Face when changes are pushed to the main branch.

Live Application

The app is currently deployed and accessible on Hugging Face Spaces.

Local Development

For local development, you can use:

streamlit run app.py

License

This project is licensed under the MIT License - see the LICENSE file for details.

Acknowledgments

  • Langchain for providing the agent and tool framework
  • Groq for the LLM API access
  • Hugging Face for embeddings and hosting capabilities

Contributing

Contributions, issues, and feature requests are welcome. Feel free to check the issues page if you want to contribute.