foundational-model

A semantic product recommendation model that matches user profiles (free text) to products. Uses a frozen multilingual MiniLM encoder with trainable projection heads and chunk attention for user encoding.

Model description

  • Architecture: Dual-encoder (user encoder + item encoder)
  • Base model: paraphrase-multilingual-MiniLM-L12-v2 (frozen)
  • Trainable params: ~148k (projection head + chunk attention)
  • Input: User profile text + product name + description
  • Output: Cosine similarity scores for ranking
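
The trainable pieces listed above can be sketched as follows. This is a hypothetical reconstruction, not the repo's actual `model_arch1` code: the module names, the single-linear projection, and the 384-d width (MiniLM-L12's output size, which makes a 384×384 linear come out near the stated ~148k parameters) are all assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class ChunkAttention(nn.Module):
    """Attention pooling over a user's chunk embeddings (sketch)."""
    def __init__(self, dim=384):
        super().__init__()
        self.score = nn.Linear(dim, 1)  # learned relevance score per chunk

    def forward(self, chunks):  # chunks: (num_chunks, dim)
        weights = torch.softmax(self.score(chunks), dim=0)  # sums to 1 over chunks
        return (weights * chunks).sum(dim=0)  # weighted mean -> (dim,)

class ProjectionHead(nn.Module):
    """Maps the frozen encoder's output into the shared ranking space."""
    def __init__(self, dim=384):
        super().__init__()
        self.proj = nn.Linear(dim, dim)  # 384*384 + 384 = 147,840 params

    def forward(self, x):
        return F.normalize(self.proj(x), dim=-1)  # unit-norm for cosine scoring

# Random tensors stand in for frozen MiniLM chunk embeddings
chunks = torch.randn(5, 384)
user_emb = ProjectionHead()(ChunkAttention()(chunks))
print(user_emb.shape)
```

In this sketch only `ChunkAttention` and `ProjectionHead` would receive gradients; the base MiniLM stays frozen.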

Intended use

Product recommendation from user free-text profiles, e.g. "Marcos, gosto de videogames e de música, sou de Rio de janeiro" ("Marcos, I like video games and music, I'm from Rio de Janeiro"). Trained on synthetic e-commerce interactions in Portuguese.

How to use

from transformers import AutoTokenizer
import torch
from huggingface_hub import hf_hub_download

# Download checkpoint
checkpoint = hf_hub_download(repo_id="oristides/foundational-model", filename="pytorch_model.bin")

# Load model (requires model_arch1.RecSysModel - see repo for architecture)
from model.model_arch1 import RecSysModel
tokenizer = AutoTokenizer.from_pretrained("sentence-transformers/paraphrase-multilingual-MiniLM-L12-v2")
model = RecSysModel()
model.load_state_dict(torch.load(checkpoint, map_location="cpu", weights_only=True))
model.eval()

# Encode user and items, then: scores = user_emb @ item_embs.T
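
With embeddings in hand, ranking is just a normalized dot product and a top-k. A minimal sketch of the scoring step above, using placeholder tensors in place of the model's actual encoder outputs:

```python
import torch
import torch.nn.functional as F

# Placeholder embeddings; in practice these come from the model's encoders
user_emb = torch.randn(384)        # one user profile embedding
item_embs = torch.randn(100, 384)  # 100 candidate product embeddings

# L2-normalize so the dot product equals cosine similarity
user_emb = F.normalize(user_emb, dim=-1)
item_embs = F.normalize(item_embs, dim=-1)

scores = item_embs @ user_emb              # (100,) cosine similarities
top_scores, top_idx = scores.topk(k=10)    # top-10 products, best first
print(top_idx.tolist())
```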

Or use the recommender CLI in this repo: uv run projects/reneguirecsys/model/recommender.py "your profile" -k 10

Training

  • Loss: In-batch multi-negative cross-entropy
  • Split: Leave-one-out per user
  • Eval metrics: AUC, NDCG@10, MRR
  • Max sequence length: 256 (user chunks), 128 (items)
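
The in-batch multi-negative cross-entropy loss named above is the standard InfoNCE-style objective: each user's positive item sits at the same batch index, and every other item in the batch serves as a negative. A self-contained sketch (the temperature value is an assumption, not taken from the repo):

```python
import torch
import torch.nn.functional as F

def in_batch_multi_negative_loss(user_embs, item_embs, temperature=0.05):
    """In-batch multi-negative cross-entropy over a (B, B) similarity matrix.

    Row i's correct class is column i (the user's positive item); the
    remaining B-1 columns act as in-batch negatives.
    """
    user_embs = F.normalize(user_embs, dim=-1)
    item_embs = F.normalize(item_embs, dim=-1)
    logits = user_embs @ item_embs.T / temperature  # (B, B) cosine / temperature
    labels = torch.arange(logits.size(0))           # diagonal = positives
    return F.cross_entropy(logits, labels)

loss = in_batch_multi_negative_loss(torch.randn(8, 384), torch.randn(8, 384))
print(float(loss))
```

A low temperature sharpens the softmax, which is why dual-encoder setups typically pair it with L2-normalized embeddings as done here.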

Citation

@misc{oristides-foundational-model-2025,
  author = {oristides},
  title = {Foundational Model for Product Recommendation},
  year = {2025},
  publisher = {Hugging Face},
  url = {https://huggingface.co/oristides/foundational-model}
}

License

MIT
