CLAUDE.md

This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository.

Project Overview

This is a Hugging Face Spaces application that serves text embeddings from 15+ state-of-the-art embedding models, including Nomic, BGE, Snowflake Arctic, IBM Granite, and sentence-transformers models. It runs on CPU and provides both a web interface and API endpoints for generating text embeddings with model selection.

Key Commands

Local Development

# Install dependencies
pip install -r requirements.txt

# Run the application locally
python app.py

Git Operations

# Push to Hugging Face Spaces (requires authentication)
git push origin main

# Note: May need to authenticate with:
huggingface-cli login

Architecture

The application consists of a single app.py file with:

  • Model Configuration: Dictionary of 15+ embedding models with trust_remote_code settings (lines 10-26)
  • Model Caching: Dynamic model loading with caching to avoid reloading (lines 32-42)
  • FastAPI App: Direct HTTP endpoints at /embed and /models (lines 44, 57-102)
  • Embedding Function: Multi-model wrapper that calls model.encode() (lines 49-53)
  • Gradio Interface: UI with model dropdown selector and API endpoint (lines 106-135)
  • Dual Server: Gradio UI mounted onto the FastAPI app and served with uvicorn (lines 214-219); see the sketch below
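
A minimal sketch of this layout, assuming the standard gr.mount_gradio_app pattern; the model choice and names such as EmbedRequest and embed_text are illustrative, not copied from app.py:

import gradio as gr
import uvicorn
from fastapi import FastAPI
from pydantic import BaseModel
from sentence_transformers import SentenceTransformer

app = FastAPI()

# One of the predefined models; it requires trust_remote_code=True
model = SentenceTransformer("nomic-ai/nomic-embed-text-v1.5",
                            trust_remote_code=True, device="cpu")

class EmbedRequest(BaseModel):
    text: str

@app.post("/embed")
def embed(req: EmbedRequest):
    # Direct HTTP endpoint: bypasses the Gradio queue entirely
    return {"embedding": model.encode(req.text).tolist()}

def embed_text(text: str) -> str:
    return str(model.encode(text).tolist())

demo = gr.Interface(fn=embed_text, inputs="text", outputs="text")

# Gradio is mounted onto the FastAPI app, then both are served by uvicorn
app = gr.mount_gradio_app(app, demo, path="/")

if __name__ == "__main__":
    uvicorn.run(app, host="0.0.0.0", port=7860)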

Important Configuration Details

  • Queue: Hugging Face Spaces enforces queuing at the infrastructure level, even without .queue() in the code
  • CPU Mode: Explicitly set to CPU to avoid GPU requirements
  • Trust Remote Code: Only the predefined models in the MODELS dict are loaded with trust_remote_code=True
  • Any HF Model: The API accepts any Hugging Face model name but falls back to trust_remote_code=False for unlisted models (see the sketch after this list)
  • API Access: Direct HTTP is available via the FastAPI endpoints
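
A sketch of that gating combined with the model cache, assuming MODELS maps each model name to its trust_remote_code flag; load_model and _cache are hypothetical names, not the actual helpers in app.py:

from sentence_transformers import SentenceTransformer

MODELS = {
    "nomic-ai/nomic-embed-text-v1.5": True,      # predefined: remote code allowed
    "mixedbread-ai/mxbai-embed-large-v1": False,
}

_cache: dict[str, SentenceTransformer] = {}

def load_model(name: str) -> SentenceTransformer:
    if name not in _cache:
        # Unlisted models fall back to trust_remote_code=False, so arbitrary
        # Hugging Face models can be requested without executing their code
        trust = MODELS.get(name, False)
        _cache[name] = SentenceTransformer(name, trust_remote_code=trust,
                                           device="cpu")
    return _cache[name]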

API Usage

Two options for API access:

  1. Direct FastAPI endpoint (no queue; a Python requests equivalent appears after this list):
# List models
curl https://ipepe-nomic-embeddings.hf.space/models

# Generate embedding with specific model
curl -X POST https://ipepe-nomic-embeddings.hf.space/embed \
  -H "Content-Type: application/json" \
  -d '{"text": "your text", "model": "mixedbread-ai/mxbai-embed-large-v1"}'
  2. Gradio client (handles the queue automatically):
from gradio_client import Client
client = Client("ipepe/nomic-embeddings")
result = client.predict("text to embed", "model-name", api_name="/predict")
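
For scripts, a requests equivalent of the direct curl call above might look like the following; the exact response schema is defined in app.py, so the JSON is printed as-is here:

import requests

resp = requests.post(
    "https://ipepe-nomic-embeddings.hf.space/embed",
    json={"text": "your text", "model": "mixedbread-ai/mxbai-embed-large-v1"},
    timeout=60,
)
resp.raise_for_status()
print(resp.json())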

Deployment Notes

Development Constraints

  • There is no Python installed locally; everything must be deployed to Hugging Face first