
Revolutionizing E-commerce with a Foundational Model

1. Introduction

This repository contains a fine-tuned version of the RWKV v4 430M Pile language model, a recurrent neural network (RNN) with 430 million parameters, optimized for e-commerce tasks using the eCeLLM dataset (specifically, the ECInstruct dataset). This fine-tuned model enhances the base model's capabilities to understand and generate text tailored to e-commerce contexts, such as product descriptions, user reviews, and query-product interactions.

What is RWKV?

  • RWKV (pronounced "RwaKuv") is an innovative RNN architecture designed to deliver strong performance on large language model (LLM) tasks.
  • It combines the strengths of RNNs and Transformers:
    • Strong performance: Comparable to Transformer-based models.
    • Linear time complexity: Unlike the quadratic complexity of self-attention in Transformers.
    • Constant memory usage: No key-value (KV) cache required, unlike Transformers.
    • Fast training speed: Supports Transformer-like parallelization during training.
    • Infinite context length: Capable of handling long contexts effectively.
    • Free sentence embeddings: Naturally provides embeddings without additional computation.
  • The current state-of-the-art is RWKV-7 "Goose," but this model is based on RWKV v4, a well-established version.

RWKV v4 430M Pile

  • A specific instantiation of RWKV v4 with 430 million parameters.
  • Originally trained on the Pile dataset, a diverse corpus for general language modeling.
  • Configuration estimate: The published RWKV v4 0.4B configuration uses --n_layer 24 and --n_embd 1024, so this model most likely follows the same setup (see the sketch below for a quick way to verify this against the checkpoint).
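If you want to confirm the layer count and embedding width of your copy of the weights, one option is to inspect the checkpoint keys directly. A minimal sketch, assuming the fine-tuned weights are a standard RWKV v4 .pth state dict with the usual emb.weight and blocks.<i>.* key naming; the local filename is a placeholder:

import torch

# Placeholder path; point this at your downloaded fine-tuned checkpoint.
ckpt_path = 'rwkv-430m-ecom.pth'

# RWKV v4 checkpoints are plain state dicts; load on CPU for inspection only.
state_dict = torch.load(ckpt_path, map_location='cpu')

# Embedding width = hidden size (n_embd).
n_embd = state_dict['emb.weight'].shape[1]

# Layers appear as blocks.<i>.*, so count the distinct block indices.
n_layer = len({int(k.split('.')[1]) for k in state_dict if k.startswith('blocks.')})

print(f'n_layer={n_layer}, n_embd={n_embd}')  # expected for the 0.4B model: 24 and 1024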

Differences from Traditional Models (e.g., Transformers)

| Feature | RWKV (RNN) | Transformers |
|---|---|---|
| Architecture | Recurrent neural network | Self-attention mechanism |
| KV cache | None (constant memory usage) | Required (memory grows with context) |
| Parallelism | Transformer-like parallelism during training | Fully parallel |
| Long-context handling | Potentially infinite context | Limited by memory and design |
| Hardware friendliness | ASIC-friendly, suitable for edge devices | Less efficient on edge devices |

These properties make RWKV well suited to long-context applications and to deployment in resource-constrained environments.
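To make the constant-memory point concrete, here is a minimal sketch of stateful inference with the rwkv pip package: the recurrent state is a fixed-size collection of tensors that is passed back in on every call, so memory does not grow with context length. The model path is a placeholder, and the strategy string should match your hardware.

from rwkv.model import RWKV

# Placeholder path to local fine-tuned weights; use 'cuda fp16' on a GPU.
model = RWKV(model='path/to/ecom-foundational-model', strategy='cpu fp32')

state = None  # fixed-size recurrent state; this replaces a growing KV cache
for token in [310, 247, 1986]:  # example token ids
    logits, state = model.forward([token], state)

# 'state' stays the same size no matter how many tokens have been processed,
# whereas a Transformer's KV cache grows linearly with the sequence length.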


2. Fine-tuning Objective

The primary goals of fine-tuning the RWKV v4 430M Pile model on the eCeLLM dataset are:

  • Enhance e-commerce task performance: Improve accuracy and relevance on tasks like product understanding, user sentiment analysis, and query-product matching.
  • Align with e-commerce contexts: Enable the model to generate and interpret text specific to e-commerce, such as product descriptions, user reviews, and recommendations.
  • Analyze key interactions: Strengthen the model's ability to process product information, user behavior, and user-product interactions in real-world e-commerce settings.

3. Fine-tune Dataset Description (eCeLLM Dataset)

This model was fine-tuned on the ECInstruct dataset, a cornerstone of the eCeLLM project (eCeLLM: Generalizing Large Language Models for E-commerce from Large-scale, High-quality Instruction Data).

Overview

  • Size: 116,528 samples.
  • Scope: Covers 10 real-world e-commerce tasks across 4 categories.
  • Purpose: A high-quality instruction benchmark dataset designed to develop and evaluate LLMs for e-commerce applications.

Task Categories and Examples

  1. Product Understanding:

    • Attribute Value Extraction (AVE): Extract attribute values (e.g., "color: blue") from product descriptions.
    • Product Matching (PM): Determine if two product listings refer to the same item.
    • Product Relation Prediction (PRP): Predict relationships between products (e.g., co-purchase likelihood).
  2. User Understanding:

    • Sentiment Analysis (SA): Assess sentiment in user reviews (e.g., positive, negative).
    • Sequential Recommendation (SR): Predict the next product a user might purchase based on history.
  3. Query-Product Matching:

    • Multi-class Product Classification (MPC): Match a query to a product category.
    • Product Substitute Identification (PSI): Identify functional substitutes for a queried product.
    • Query-Product Ranking (QPR): Rank products by relevance to a user query.
  4. Product Question Answering:

    • Answerability Prediction (AP): Determine if a question about a product can be answered from available data.
    • Answer Generation (AG): Generate answers to product-related questions.

Characteristics

  • Source: Collected from real-world e-commerce platforms.
  • Quality: Underwent rigorous curation and filtering (e.g., removing overlaps, ensuring English-only data, manual inspection).
  • Evaluation Splits: Includes in-domain (IND) and out-of-domain (OOD) test sets for 6 tasks, with OOD featuring unseen products to assess generalization.
  • Tokenizer: The base RWKV v4 Pile model uses the GPT-NeoX 20B tokenizer (20B_tokenizer.json) rather than the newer RWKV "World" tokenizer; which one applies here depends on the fine-tuning setup.
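To inspect the instruction data itself, the dataset can be pulled with the Hugging Face datasets library. A minimal sketch, assuming the dataset is published as NingLab/ECInstruct (verify the exact dataset id and any configuration names against the eCeLLM repository):

from datasets import load_dataset

# Assumed dataset id; if the dataset defines multiple configurations,
# pass the desired one as the second argument to load_dataset.
ds = load_dataset('NingLab/ECInstruct')

# Show the available splits, then the columns and first record of one split.
print(ds)
first_split = next(iter(ds.values()))
print(first_split.column_names)
print(first_split[0])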

4. How to Use the Fine-tuned Model

Requirements

  • rwkv library: Install via pip install rwkv (ensure compatibility with RWKV v4).
  • Python libraries: torch (for model operations).
  • Model files:
    • Fine-tuned model weights.
    • Model configuration.
    • Tokenizer file.
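Since the rwkv loader expects a local weights file, you will typically need to download it from the Hugging Face repository first. A minimal sketch with huggingface_hub; the filename argument is a placeholder, so check the repository's file list for the actual checkpoint name:

from huggingface_hub import hf_hub_download

# 'model.pth' is a placeholder; replace it with the real checkpoint filename.
weights_path = hf_hub_download(
    repo_id='CommerAI/ecom-foundational-model',
    filename='model.pth',
)
print(weights_path)  # local cached path to pass to RWKV(model=...)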

Load Model

Here’s how to load the model in Python:

import os

# Optional: the rwkv package reads these environment variables at import time.
os.environ['RWKV_JIT_ON'] = '1'   # enable the TorchScript JIT for faster inference
os.environ['RWKV_CUDA_ON'] = '0'  # set to '1' to compile the custom CUDA kernel

from rwkv.model import RWKV

# Path to the locally downloaded fine-tuned weights (a .pth checkpoint, e.g. the
# path returned by hf_hub_download above); the loader takes a local file path,
# not the 'CommerAI/ecom-foundational-model' repository id.
model_path = 'path/to/ecom-foundational-model.pth'

# Initialize the model (adjust 'strategy' based on your hardware, e.g. 'cuda fp16' or 'cpu fp32')
model = RWKV(model=model_path, strategy='cuda fp16')

Input Preprocessing

  • Tokenization: Use the same tokenizer as the base model (the GPT-NeoX 20B tokenizer, 20B_tokenizer.json, for Pile-trained RWKV v4) unless it was customized during fine-tuning.
  • Steps:
    1. Convert input text to tokens using the tokenizer.
    2. Ensure input format matches the training data (e.g., instruction + context).

Example (a minimal sketch using the PIPELINE helper from the rwkv pip package, assuming the 20B_tokenizer.json file from the RWKV-4 Pile release is available locally):

from rwkv.utils import PIPELINE

# Wrap the loaded model together with the tokenizer file
# (requires the 'tokenizers' package and a local 20B_tokenizer.json).
pipeline = PIPELINE(model, '20B_tokenizer.json')

input_text = "This product is great!"
tokens = pipeline.encode(input_text)
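Once the pipeline is set up, generation can be run either token by token via model.forward with your own sampling loop, or with the pipeline's built-in generate helper. A minimal sketch, assuming the model and pipeline objects from the snippets above and a prompt format matching your fine-tuning data:

from rwkv.utils import PIPELINE_ARGS

prompt = (
    "Instruction: Classify the sentiment of the following product review as positive or negative.\n"
    "Input: This product is great!\n"
    "Response:"
)

# Sampling settings; tune temperature and top_p for your task.
args = PIPELINE_ARGS(temperature=1.0, top_p=0.7)

# Generate up to 64 new tokens, streaming each decoded chunk to stdout.
output = pipeline.generate(prompt, token_count=64, args=args, callback=print)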

5. Model Performance After Fine-tuning

Evaluation Metrics

The model’s performance is assessed using metrics aligned with eCeLLM tasks:

  • Accuracy: Proportion of correct predictions (e.g., MPC, AP).
  • F1-score: Harmonic mean of precision and recall (e.g., AVE, SA, PSI).
  • nDCG: Normalized Discounted Cumulative Gain for ranking tasks (e.g., QPR).
  • HR@1: Hit Rate at 1 for recommendation tasks (e.g., SR).
  • FBERT, RBERT, PBERT, BLEURT: Specialized metrics for generation tasks (e.g., AG), measuring semantic similarity and fluency.
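For the classification-style tasks, the standard scikit-learn implementations of these metrics are sufficient. A minimal sketch with placeholder predictions; the label strings are illustrative, not the official ECInstruct label set:

from sklearn.metrics import accuracy_score, f1_score

# Illustrative gold labels and model predictions for a sentiment-analysis-style task.
y_true = ['positive', 'negative', 'positive', 'neutral']
y_pred = ['positive', 'negative', 'negative', 'neutral']

print('Accuracy:', accuracy_score(y_true, y_pred))
print('Macro F1:', f1_score(y_true, y_pred, average='macro'))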

Evaluation Results

(Note: Replace this section with your actual results if available.)

Since specific evaluation results are not provided, users are encouraged to evaluate the model on the ECInstruct test sets (IND and OOD splits) using the metrics above. For reference, the eCeLLM paper reports:

  • In-domain (IND): eCeLLM models achieved a 10.7% average improvement over baselines (e.g., GPT-4, task-specific SoTA models).
  • Out-of-domain (OOD): 9.3% improvement on unseen products, showcasing strong generalization.

Example table format for results (fill in your data):

| Task | Metric | Base RWKV v4 430M | Fine-tuned Model | Improvement (%) |
|------|--------|-------------------|------------------|-----------------|
| SA | Macro F1 | 0.470 | 0.639 | 36.0 |
| PM | F1 | 0.867 | 0.995 | 14.8 |

6. References

  • RWKV Paper: Peng, B., Alcaide, E., Anthony, Q., et al. (2023). RWKV: Reinventing RNNs for the Transformer Era. Findings of the Association for Computational Linguistics: EMNLP 2023.
  • RWKV GitHub Repository: https://github.com/BlinkDL/RWKV-LM
  • eCeLLM Paper: Peng, B., Ling, X., Chen, Z., Sun, H., & Ning, X. (2024). eCeLLM: Generalizing Large Language Models for E-commerce from Large-scale, High-quality Instruction Data. Proceedings of the 41st International Conference on Machine Learning, PMLR 235.
  • eCeLLM GitHub Repository: https://github.com/ninglab/eCeLLM

7. Additional Information

  • Fine-tuning Details: [Specify your process, e.g., epochs, learning rate, hardware used, if available.]
  • Model Limitations: May struggle with tasks requiring extensive world knowledge beyond e-commerce or extremely rare edge cases in ECInstruct.
  • Contributions: We welcome community feedback and contributions! Please submit issues or PRs to the repository.
  • License: [Specify your license, e.g., MIT, Apache 2.0, or refer to RWKV/eCeLLM licenses if applicable.]
  • Contact: For inquiries, reach out to hello@astadeus.com.
  • Hugging Face Model: Available at CommerAI/ecom-foundational-model.
