CLAUDE.md

This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository.

Project Overview

This is a Hugging Face Spaces application that serves text embeddings from 15+ state-of-the-art embedding models, including Nomic, BGE, Snowflake Arctic, IBM Granite, and sentence-transformers models. It runs on CPU and provides both a web interface and API endpoints for generating text embeddings with model selection.

Key Commands

Local Development

# Install dependencies
pip install -r requirements.txt

# Run the application locally
python app.py

Git Operations

# Push to Hugging Face Spaces (requires authentication)
git push origin main

# Note: May need to authenticate with:
huggingface-cli login

Architecture

The application consists of a single app.py file with:

  • Model Configuration: Dictionary of 15+ embedding models with trust_remote_code settings (lines 10-26)
  • Model Caching: Dynamic model loading with caching to avoid reloading (lines 32-42)
  • FastAPI App: Direct HTTP endpoints at /embed and /models (lines 44, 57-102)
  • Embedding Function: Multi-model wrapper that calls model.encode() (lines 49-53)
  • Gradio Interface: UI with model dropdown selector and API endpoint (lines 106-135)
  • Dual Server: Gradio UI mounted onto the FastAPI app and served with uvicorn (lines 214-219); see the sketch below
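
A minimal sketch of this layout, assuming the standard gr.mount_gradio_app pattern; the model choice and names such as EmbedRequest and embed_text are illustrative, not copied from app.py:

import gradio as gr
import uvicorn
from fastapi import FastAPI
from pydantic import BaseModel
from sentence_transformers import SentenceTransformer

app = FastAPI()

# One of the predefined models; it requires trust_remote_code=True
model = SentenceTransformer("nomic-ai/nomic-embed-text-v1.5",
                            trust_remote_code=True, device="cpu")

class EmbedRequest(BaseModel):
    text: str

@app.post("/embed")
def embed(req: EmbedRequest):
    # Direct HTTP endpoint: bypasses the Gradio queue entirely
    return {"embedding": model.encode(req.text).tolist()}

def embed_text(text: str) -> str:
    return str(model.encode(text).tolist())

demo = gr.Interface(fn=embed_text, inputs="text", outputs="text")

# Gradio is mounted onto the FastAPI app, then both are served by uvicorn
app = gr.mount_gradio_app(app, demo, path="/")

if __name__ == "__main__":
    uvicorn.run(app, host="0.0.0.0", port=7860)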

Important Configuration Details

  • Queue: Hugging Face Spaces enforces queuing at the infrastructure level, even without .queue() in the code
  • CPU Mode: Explicitly set to CPU to avoid GPU requirements
  • Trust Remote Code: Only the predefined models in the MODELS dict are loaded with trust_remote_code=True
  • Any HF Model: The API accepts any Hugging Face model name but falls back to trust_remote_code=False for unlisted models (see the sketch after this list)
  • API Access: Direct HTTP is available via the FastAPI endpoints
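
A sketch of that gating combined with the model cache, assuming MODELS maps each model name to its trust_remote_code flag; load_model and _cache are hypothetical names, not the actual helpers in app.py:

from sentence_transformers import SentenceTransformer

MODELS = {
    "nomic-ai/nomic-embed-text-v1.5": True,      # predefined: remote code allowed
    "mixedbread-ai/mxbai-embed-large-v1": False,
}

_cache: dict[str, SentenceTransformer] = {}

def load_model(name: str) -> SentenceTransformer:
    if name not in _cache:
        # Unlisted models fall back to trust_remote_code=False, so arbitrary
        # Hugging Face models can be requested without executing their code
        trust = MODELS.get(name, False)
        _cache[name] = SentenceTransformer(name, trust_remote_code=trust,
                                           device="cpu")
    return _cache[name]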

API Usage

Two options for API access:

  1. Direct FastAPI endpoint (no queue; a Python requests equivalent appears after this list):
# List models
curl https://ipepe-nomic-embeddings.hf.space/models

# Generate embedding with specific model
curl -X POST https://ipepe-nomic-embeddings.hf.space/embed \
  -H "Content-Type: application/json" \
  -d '{"text": "your text", "model": "mixedbread-ai/mxbai-embed-large-v1"}'
  2. Gradio client (handles the queue automatically):
from gradio_client import Client
client = Client("ipepe/nomic-embeddings")
result = client.predict("text to embed", "model-name", api_name="/predict")
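
For scripts, a requests equivalent of the direct curl call above might look like the following; the exact response schema is defined in app.py, so the JSON is printed as-is here:

import requests

resp = requests.post(
    "https://ipepe-nomic-embeddings.hf.space/embed",
    json={"text": "your text", "model": "mixedbread-ai/mxbai-embed-large-v1"},
    timeout=60,
)
resp.raise_for_status()
print(resp.json())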

Deployment Notes

Development Constraints

  • There is no Python installed locally; everything must be deployed to Hugging Face first