---
title: Search Engine LLM App
emoji: π
colorFrom: pink
colorTo: indigo
sdk: streamlit
sdk_version: 1.43.2
app_file: app.py
pinned: false
license: mit
short_description: Chat with an LLM that can search the web.
---

Check out the configuration reference at https://huggingface.co/docs/hub/spaces-config-reference

# Search Engine LLM App

## Overview

This application is a research assistant built with Langchain that can search multiple knowledge sources, including Wikipedia, arXiv, and the web via DuckDuckGo. It leverages Groq's LLM API to provide intelligent, context-aware responses to user queries.

## Live Demo

Try the application live at: [Hugging Face Spaces](https://huggingface.co/spaces/ashutoshchoudhari/Search-Engine-LLM-app)

## Features

- **Multi-source search**: Access information from Wikipedia, arXiv scientific papers, and web results
- **Conversational memory**: Retains context from previous interactions
- **Streaming responses**: See the AI's response generated in real time
- **User-friendly interface**: Clean Streamlit UI for easy interaction

## Technical Components

- **LLM**: Groq's llama3-8b-8192 model (with fallback support for Ollama models)
- **Embeddings**: Hugging Face's all-MiniLM-L6-v2 (see the sketch after this list)
- **Search Tools**:
  - Wikipedia API
  - arXiv API
  - DuckDuckGo Search
- **Framework**: Langchain for agent orchestration
- **Frontend**: Streamlit
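
The embeddings model listed above does not appear in the code excerpts below, so here is a minimal sketch of how it might be initialized with langchain-huggingface; only the model name comes from this document, the rest is an assumption:

```python
from langchain_huggingface import HuggingFaceEmbeddings

# Downloads all-MiniLM-L6-v2 from the Hugging Face Hub on first use;
# a valid HF_TOKEN in the environment may be needed for the download
embeddings = HuggingFaceEmbeddings(model_name="sentence-transformers/all-MiniLM-L6-v2")
```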

## Project Structure

- **app.py**: Main application file containing the Streamlit UI and Langchain integration
- **requirements.txt**: Dependencies required to run the application
- **README.md**: Project metadata and description for Hugging Face Spaces
- **tools_agents.ipynb**: Jupyter notebook demonstrating how to use Langchain tools and agents
- **.github/workflows/main.yaml**: GitHub Actions workflow for deploying to Hugging Face Spaces
- **.gitattributes**: Git LFS configuration for handling large files
- **.gitignore**: Standard Python gitignore file
- **LICENSE**: MIT License file
- **app_documentation.md**: This comprehensive documentation file

## Implementation Details

### LLM Integration

The application uses Groq's API to access the llama3-8b-8192 model with streaming enabled:

```python
from langchain_groq import ChatGroq

llm = ChatGroq(
    groq_api_key=st.session_state.api_key,
    model_name="llama3-8b-8192",
    streaming=True,
)
```
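
The `st.session_state.api_key` value above is expected to come from the sidebar prompt described in the Usage section. A hedged sketch of how it might be collected (the widget label and exact layout are assumptions, not taken from app.py):

```python
import streamlit as st

# Password-style sidebar field; the entered key lands in st.session_state.api_key,
# which the ChatGroq constructor above reads
st.session_state.api_key = st.sidebar.text_input("Groq API Key", type="password")
```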

Alternative local models can also be configured with Ollama:

```python
# from langchain_ollama import ChatOllama
# llm = ChatOllama(base_url=OLLAMA_WSL_IP, model="llama3.1", streaming=True)
```

### Search Tools Configuration

The app configures three primary search tools:

1. **Wikipedia Search**:

```python
from langchain.agents import Tool
from langchain_community.tools import WikipediaQueryRun
from langchain_community.utilities import WikipediaAPIWrapper

api_wrapper_wiki = WikipediaAPIWrapper(top_k_results=3, doc_content_chars_max=10000)
wiki = WikipediaQueryRun(api_wrapper=api_wrapper_wiki)
wiki_tool = Tool(
    name="Wikipedia",
    func=wiki.run,
    description="This tool uses the Wikipedia API to search for a topic.",
)
```

2. **arXiv Search**:

```python
from langchain_community.tools import ArxivQueryRun
from langchain_community.utilities import ArxivAPIWrapper

api_wrapper_arxiv = ArxivAPIWrapper(top_k_results=5, doc_content_chars_max=10000)
arxiv = ArxivQueryRun(api_wrapper=api_wrapper_arxiv)
arxiv_tool = Tool(
    name="arxiv",
    func=arxiv.run,
    description="Searches arXiv for papers matching the query.",
)
```

3. **DuckDuckGo Web Search**:

```python
from langchain_community.tools import DuckDuckGoSearchResults
from langchain_community.utilities import DuckDuckGoSearchAPIWrapper

api_wrapper_ddg = DuckDuckGoSearchAPIWrapper(region="us-en", time="y", max_results=10)
ddg = DuckDuckGoSearchResults(
    api_wrapper=api_wrapper_ddg,
    output_format="string",
    handle_tool_error=True,
    handle_validation_error=True,
)
ddg_tool = Tool(
    name="DuckDuckGo_Search",
    func=ddg.run,
    description="Searches the web for a query using the DuckDuckGo search engine.",
)
```
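
With the three tools defined, they are gathered into the single list that the agent configuration below consumes; a minimal sketch (the ordering is an assumption):

```python
# The agent picks a tool at each step based on these names and descriptions
tools = [wiki_tool, arxiv_tool, ddg_tool]
```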

### Agent Configuration

The system uses the CHAT_CONVERSATIONAL_REACT_DESCRIPTION agent type with a conversational memory buffer:

```python
from langchain.agents import AgentType, initialize_agent
from langchain.memory import ConversationBufferWindowMemory

# Keep the last 5 exchanges of the conversation in context
memory = ConversationBufferWindowMemory(k=5, memory_key="chat_history", return_messages=True)

search_agent = initialize_agent(
    tools=tools,
    llm=llm,
    agent=AgentType.CHAT_CONVERSATIONAL_REACT_DESCRIPTION,
    max_iterations=10,
    memory=memory,
    handle_parsing_errors=True,
)
```
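
How the agent is invoked from the chat loop is not shown in this document. A hedged sketch using Langchain's Streamlit callback handler to render tool calls and the streamed answer in the UI (`user_query` is a placeholder; the actual call in app.py may differ):

```python
import streamlit as st
from langchain_community.callbacks.streamlit import StreamlitCallbackHandler

with st.chat_message("assistant"):
    # The callback handler draws the agent's intermediate steps live in the container
    st_cb = StreamlitCallbackHandler(st.container(), expand_new_thoughts=False)
    response = search_agent.run(user_query, callbacks=[st_cb])
    st.write(response)
```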

## Setup Requirements

1. Groq API key
2. Hugging Face token (for embeddings)
3. Python environment with required dependencies

## Installation Instructions

Install the required packages using:

```bash
pip install -r requirements.txt
```

Required packages include:

- arxiv
- wikipedia
- langchain, langchain-community, langchain-huggingface, langchain-groq
- openai
- duckduckgo-search
- ollama, langchain-ollama (for local model support)

## Environment Variables

Create a `.env` file with the following variables:

```
GROQ_API_KEY=your_groq_api_key_here
HF_TOKEN=your_huggingface_token_here
```
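
A sketch of how these variables might be loaded at startup, assuming the python-dotenv package (not listed in the requirements above) is available:

```python
import os
from dotenv import load_dotenv

load_dotenv()  # reads .env from the working directory into os.environ
groq_api_key = os.getenv("GROQ_API_KEY")
hf_token = os.getenv("HF_TOKEN")
```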

## Usage

1. Start the application using Streamlit:

   ```bash
   streamlit run app.py
   ```

2. Enter your Groq API key in the sidebar when prompted
3. Type your research question in the chat input box
4. The agent will search across available sources and provide a comprehensive response
5. Your conversation history will be maintained throughout the session

## Example Queries

- "What are the latest developments in quantum computing?"
- "Explain the concept of transformer models in NLP"
- "What were the key findings from the recent climate change report?"
- "Tell me about the history and applications of reinforcement learning"

## Deployment

This project is configured to deploy to Hugging Face Spaces using GitHub Actions. The workflow in `.github/workflows/main.yaml` automatically syncs the repository to Hugging Face when changes are pushed to the main branch.

### Live Application

The app is currently deployed and accessible at: [Hugging Face Spaces](https://huggingface.co/spaces/ashutoshchoudhari/Search-Engine-LLM-app)

### Local Development

For local development, you can use:

```bash
streamlit run app.py
```

## License

This project is licensed under the MIT License - see the [LICENSE](LICENSE) file for details.

## Acknowledgments

- Langchain for providing the agent and tool framework
- Groq for the LLM API access
- Hugging Face for embeddings and hosting capabilities

## Contributing

Contributions, issues, and feature requests are welcome. Feel free to check the issues page if you want to contribute.