
Revolutionizing E-commerce with a Foundational Model

1. Introduction

This repository contains a fine-tuned version of the RWKV v4 430M Pile language model, a recurrent neural network (RNN) with 430 million parameters, optimized for e-commerce tasks using the eCeLLM dataset (specifically, the ECInstruct dataset). This fine-tuned model enhances the base model's capabilities to understand and generate text tailored to e-commerce contexts, such as product descriptions, user reviews, and query-product interactions.

What is RWKV?

  • RWKV (pronounced "RwaKuv") is an innovative RNN architecture designed to deliver strong performance on large language model (LLM) tasks.
  • It combines the strengths of RNNs and Transformers:
    • Strong performance: Comparable to Transformer-based models.
    • Linear time complexity: Unlike the quadratic complexity of self-attention in Transformers.
    • Constant memory usage: No key-value (KV) cache required, unlike Transformers.
    • Fast training speed: Supports Transformer-like parallelization during training.
    • Infinite context length: Capable of handling long contexts effectively.
    • Free sentence embeddings: Naturally provides embeddings without additional computation.
  • The current state-of-the-art is RWKV-7 "Goose," but this model is based on RWKV v4, a well-established version.

RWKV v4 430M Pile

  • A specific instantiation of RWKV v4 with 430 million parameters.
  • Originally trained on the Pile dataset, a diverse corpus for general language modeling.
  • Configuration estimate: The published RWKV v4 0.4B configuration uses --n_layer 24 and --n_embd 1024, so this model most likely follows the same setup (see the sketch below for a quick way to verify this against the checkpoint).
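If you want to confirm the layer count and embedding width of your copy of the weights, one option is to inspect the checkpoint keys directly. A minimal sketch, assuming the fine-tuned weights are a standard RWKV v4 .pth state dict with the usual emb.weight and blocks.<i>.* key naming; the local filename is a placeholder:

import torch

# Placeholder path; point this at your downloaded fine-tuned checkpoint.
ckpt_path = 'rwkv-430m-ecom.pth'

# RWKV v4 checkpoints are plain state dicts; load on CPU for inspection only.
state_dict = torch.load(ckpt_path, map_location='cpu')

# Embedding width = hidden size (n_embd).
n_embd = state_dict['emb.weight'].shape[1]

# Layers appear as blocks.<i>.*, so count the distinct block indices.
n_layer = len({int(k.split('.')[1]) for k in state_dict if k.startswith('blocks.')})

print(f'n_layer={n_layer}, n_embd={n_embd}')  # expected for the 0.4B model: 24 and 1024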

Differences from Traditional Models (e.g., Transformers)

| Feature | RWKV (RNN) | Transformers |
|---|---|---|
| Architecture | Recurrent neural network | Self-attention mechanism |
| KV cache | None (constant memory usage) | Required (memory grows with context) |
| Parallelism | Transformer-like parallelism during training | Fully parallel |
| Long-context handling | Potentially infinite context | Limited by memory and design |
| Hardware friendliness | ASIC-friendly, suitable for edge devices | Less efficient on edge devices |

These properties make RWKV well suited to long-context applications and to deployment in resource-constrained environments.
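To make the constant-memory point concrete, here is a minimal sketch of stateful inference with the rwkv pip package: the recurrent state is a fixed-size collection of tensors that is passed back in on every call, so memory does not grow with context length. The model path is a placeholder, and the strategy string should match your hardware.

from rwkv.model import RWKV

# Placeholder path to local fine-tuned weights; use 'cuda fp16' on a GPU.
model = RWKV(model='path/to/ecom-foundational-model', strategy='cpu fp32')

state = None  # fixed-size recurrent state; this replaces a growing KV cache
for token in [310, 247, 1986]:  # example token ids
    logits, state = model.forward([token], state)

# 'state' stays the same size no matter how many tokens have been processed,
# whereas a Transformer's KV cache grows linearly with the sequence length.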


2. Fine-tuning Objective

The primary goals of fine-tuning the RWKV v4 430M Pile model on the eCeLLM dataset are:

  • Enhance e-commerce task performance: Improve accuracy and relevance on tasks like product understanding, user sentiment analysis, and query-product matching.
  • Align with e-commerce contexts: Enable the model to generate and interpret text specific to e-commerce, such as product descriptions, user reviews, and recommendations.
  • Analyze key interactions: Strengthen the model's ability to process product information, user behavior, and user-product interactions in real-world e-commerce settings.

3. Fine-tune Dataset Description (eCeLLM Dataset)

This model was fine-tuned on the ECInstruct dataset, a cornerstone of the eCeLLM project (eCeLLM: Generalizing Large Language Models for E-commerce from Large-scale, High-quality Instruction Data).

Overview

  • Size: 116,528 samples.
  • Scope: Covers 10 real-world e-commerce tasks across 4 categories.
  • Purpose: A high-quality instruction benchmark dataset designed to develop and evaluate LLMs for e-commerce applications.

Task Categories and Examples

  1. Product Understanding:

    • Attribute Value Extraction (AVE): Extract attribute values (e.g., "color: blue") from product descriptions.
    • Product Matching (PM): Determine if two product listings refer to the same item.
    • Product Relation Prediction (PRP): Predict relationships between products (e.g., co-purchase likelihood).
  2. User Understanding:

    • Sentiment Analysis (SA): Assess sentiment in user reviews (e.g., positive, negative).
    • Sequential Recommendation (SR): Predict the next product a user might purchase based on history.
  3. Query-Product Matching:

    • Multi-class Product Classification (MPC): Match a query to a product category.
    • Product Substitute Identification (PSI): Identify functional substitutes for a queried product.
    • Query-Product Ranking (QPR): Rank products by relevance to a user query.
  4. Product Question Answering:

    • Answerability Prediction (AP): Determine if a question about a product can be answered from available data.
    • Answer Generation (AG): Generate answers to product-related questions.

Characteristics

  • Source: Collected from real-world e-commerce platforms.
  • Quality: Underwent rigorous curation and filtering (e.g., removing overlaps, ensuring English-only data, manual inspection).
  • Evaluation Splits: Includes in-domain (IND) and out-of-domain (OOD) test sets for 6 tasks, with OOD featuring unseen products to assess generalization.
  • Tokenizer: The base RWKV v4 Pile model uses the GPT-NeoX 20B tokenizer (20B_tokenizer.json) rather than the newer RWKV "World" tokenizer; which one applies here depends on the fine-tuning setup.
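To inspect the instruction data itself, the dataset can be pulled with the Hugging Face datasets library. A minimal sketch, assuming the dataset is published as NingLab/ECInstruct (verify the exact dataset id and any configuration names against the eCeLLM repository):

from datasets import load_dataset

# Assumed dataset id; if the dataset defines multiple configurations,
# pass the desired one as the second argument to load_dataset.
ds = load_dataset('NingLab/ECInstruct')

# Show the available splits, then the columns and first record of one split.
print(ds)
first_split = next(iter(ds.values()))
print(first_split.column_names)
print(first_split[0])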

4. How to Use the Fine-tuned Model

Requirements

  • rwkv library: Install via pip install rwkv (ensure compatibility with RWKV v4).
  • Python libraries: torch (for model operations).
  • Model files:
    • Fine-tuned model weights.
    • Model configuration.
    • Tokenizer file.
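Since the rwkv loader expects a local weights file, you will typically need to download it from the Hugging Face repository first. A minimal sketch with huggingface_hub; the filename argument is a placeholder, so check the repository's file list for the actual checkpoint name:

from huggingface_hub import hf_hub_download

# 'model.pth' is a placeholder; replace it with the real checkpoint filename.
weights_path = hf_hub_download(
    repo_id='CommerAI/ecom-foundational-model',
    filename='model.pth',
)
print(weights_path)  # local cached path to pass to RWKV(model=...)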

Load Model

Here’s how to load the model in Python:

import os

# Optional: the rwkv package reads these environment variables at import time.
os.environ['RWKV_JIT_ON'] = '1'   # enable the TorchScript JIT for faster inference
os.environ['RWKV_CUDA_ON'] = '0'  # set to '1' to compile the custom CUDA kernel

from rwkv.model import RWKV

# Path to the locally downloaded fine-tuned weights (a .pth checkpoint, e.g. the
# path returned by hf_hub_download above); the loader takes a local file path,
# not the 'CommerAI/ecom-foundational-model' repository id.
model_path = 'path/to/ecom-foundational-model.pth'

# Initialize the model (adjust 'strategy' based on your hardware, e.g. 'cuda fp16' or 'cpu fp32')
model = RWKV(model=model_path, strategy='cuda fp16')

Input Preprocessing

  • Tokenization: Use the same tokenizer as the base model (the GPT-NeoX 20B tokenizer, 20B_tokenizer.json, for Pile-trained RWKV v4) unless it was customized during fine-tuning.
  • Steps:
    1. Convert input text to tokens using the tokenizer.
    2. Ensure input format matches the training data (e.g., instruction + context).

Example (a minimal sketch using the PIPELINE helper from the rwkv pip package, assuming the 20B_tokenizer.json file from the RWKV-4 Pile release is available locally):

from rwkv.utils import PIPELINE

# Wrap the loaded model together with the tokenizer file
# (requires the 'tokenizers' package and a local 20B_tokenizer.json).
pipeline = PIPELINE(model, '20B_tokenizer.json')

input_text = "This product is great!"
tokens = pipeline.encode(input_text)
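Once the pipeline is set up, generation can be run either token by token via model.forward with your own sampling loop, or with the pipeline's built-in generate helper. A minimal sketch, assuming the model and pipeline objects from the snippets above and a prompt format matching your fine-tuning data:

from rwkv.utils import PIPELINE_ARGS

prompt = (
    "Instruction: Classify the sentiment of the following product review as positive or negative.\n"
    "Input: This product is great!\n"
    "Response:"
)

# Sampling settings; tune temperature and top_p for your task.
args = PIPELINE_ARGS(temperature=1.0, top_p=0.7)

# Generate up to 64 new tokens, streaming each decoded chunk to stdout.
output = pipeline.generate(prompt, token_count=64, args=args, callback=print)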

5. Model Performance After Fine-tuning

Evaluation Metrics

The model’s performance is assessed using metrics aligned with eCeLLM tasks:

  • Accuracy: Proportion of correct predictions (e.g., MPC, AP).
  • F1-score: Harmonic mean of precision and recall (e.g., AVE, SA, PSI).
  • nDCG: Normalized Discounted Cumulative Gain for ranking tasks (e.g., QPR).
  • HR@1: Hit Rate at 1 for recommendation tasks (e.g., SR).
  • FBERT, RBERT, PBERT, BLEURT: Specialized metrics for generation tasks (e.g., AG), measuring semantic similarity and fluency.
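For the classification-style tasks, the standard scikit-learn implementations of these metrics are sufficient. A minimal sketch with placeholder predictions; the label strings are illustrative, not the official ECInstruct label set:

from sklearn.metrics import accuracy_score, f1_score

# Illustrative gold labels and model predictions for a sentiment-analysis-style task.
y_true = ['positive', 'negative', 'positive', 'neutral']
y_pred = ['positive', 'negative', 'negative', 'neutral']

print('Accuracy:', accuracy_score(y_true, y_pred))
print('Macro F1:', f1_score(y_true, y_pred, average='macro'))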

Evaluation Results

(Note: Replace this section with your actual results if available.)

Since specific evaluation results are not provided, users are encouraged to evaluate the model on the ECInstruct test sets (IND and OOD splits) using the metrics above. For reference, the eCeLLM paper reports:

  • In-domain (IND): eCeLLM models achieved a 10.7% average improvement over baselines (e.g., GPT-4, task-specific SoTA models).
  • Out-of-domain (OOD): 9.3% improvement on unseen products, showcasing strong generalization.

Example table format for results (fill in your data):

| Task | Metric | Base RWKV v4 430M | Fine-tuned Model | Improvement (%) |
|------|--------|-------------------|------------------|-----------------|
| SA | Macro F1 | 0.470 | 0.639 | 36.0 |
| PM | F1 | 0.867 | 0.995 | 14.8 |

6. References

  • RWKV Paper: Peng, B., Alcaide, E., Anthony, Q., et al. (2023). RWKV: Reinventing RNNs for the Transformer Era. Findings of the Association for Computational Linguistics: EMNLP 2023.
  • RWKV GitHub Repository: https://github.com/BlinkDL/RWKV-LM
  • eCeLLM Paper: Peng, B., Ling, X., Chen, Z., Sun, H., & Ning, X. (2024). eCeLLM: Generalizing Large Language Models for E-commerce from Large-scale, High-quality Instruction Data. Proceedings of the 41st International Conference on Machine Learning, PMLR 235.
  • eCeLLM GitHub Repository: https://github.com/ninglab/eCeLLM

7. Additional Information

  • Fine-tuning Details: [Specify your process, e.g., epochs, learning rate, hardware used, if available.]
  • Model Limitations: May struggle with tasks requiring extensive world knowledge beyond e-commerce or extremely rare edge cases in ECInstruct.
  • Contributions: We welcome community feedback and contributions! Please submit issues or PRs to the repository.
  • License: [Specify your license, e.g., MIT, Apache 2.0, or refer to RWKV/eCeLLM licenses if applicable.]
  • Contact: For inquiries, reach out to hello@astadeus.com.
  • Hugging Face Model: Available at CommerAI/ecom-foundational-model.
