imt-ml Track Prediction — Ensemble (2025-09-06 12:47:55)

Predicts which MBTA commuter rail track/platform a train will use, using a small tabular neural-network ensemble trained on historical assignments. This card documents the artifacts in output/ensemble_20250906_124755.

Model Summary

  • Task: Tabular multi-class classification (13 track classes)
  • Library: Keras (TensorFlow backend)
  • Architecture: 6-model ensemble (diverse dense nets with embeddings + cyclical time features); softmax outputs averaged at inference
  • Inputs (preprocessed):
    • Categorical: station_id (int index), route_id (int index), direction_id (0/1)
    • Time (cyclical): hour_sin, hour_cos, minute_sin, minute_cos, day_sin, day_cos
    • Continuous: scheduled_timestamp (float seconds since epoch; normalized in-model)
  • Outputs: Probability over 13 track labels (softmax)
  • License: MIT

Files in This Repo

  • track_prediction_ensemble_model_0_final.keras through track_prediction_ensemble_model_5_final.keras — individual ensemble members
  • track_prediction_ensemble_model_*_best.keras — best checkpoints during training (may match final)
  • training_report.md — training configuration and metrics

Note: Ensemble training currently does not emit a *_vocab.json. See “Preprocessing & Vocab” below.

Preprocessing & Vocab

Models expect integer indices for station_id and route_id, and the raw direction_id value (0 or 1). In training, the indices are produced by lookup tables built from the dataset vocabularies. To reproduce inference exactly, use the same station, route, and track vocabularies that were present at training time, or otherwise guarantee an identical string-to-index mapping. A different mapping feeds the wrong embedding rows to the models and yields meaningless predictions.

What to use:

  • The training pipeline’s dataset loader (imt_ml.dataset.create_feature_engineering_fn) defines the exact feature mapping. If you need the vocab files, re-run training or an export step to generate them for your data snapshot, and save the vocab mapping alongside the model.
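As an illustration of saving the vocab mapping alongside the model, here is a minimal sketch. It does not reproduce the mapping used in training: the sorted-order indexing, the file name vocab.json, the helper names, and the example identifiers are all assumptions made for this example. Unseen values raise an error rather than guessing an index, since the card does not say how the pipeline handles them.

```python
import json
import tempfile
from pathlib import Path


def build_vocab(values):
    """Map each distinct raw string to an integer index (sorted order; an assumption)."""
    return {value: index for index, value in enumerate(sorted(set(values)))}


def save_vocabs(vocabs, path):
    """Write all vocabularies to a single JSON file."""
    Path(path).write_text(json.dumps(vocabs, indent=2, sort_keys=True))


def load_vocabs(path):
    """Read the vocabularies back from JSON."""
    return json.loads(Path(path).read_text())


def encode(vocab, raw_value):
    """Look up the index for a raw value; fail loudly on unseen values."""
    if raw_value not in vocab:
        raise KeyError(f"{raw_value!r} is not in the saved vocabulary")
    return vocab[raw_value]


# Placeholder identifiers, used only for illustration.
vocabs = {
    "station_id": build_vocab(["place-north", "place-sstat", "place-bbsta"]),
    "route_id": build_vocab(["CR-Worcester", "CR-Fitchburg"]),
}

with tempfile.TemporaryDirectory() as tmp_dir:
    vocab_path = Path(tmp_dir) / "vocab.json"  # hypothetical file name
    save_vocabs(vocabs, vocab_path)
    loaded = load_vocabs(vocab_path)

station_index = encode(loaded["station_id"], "place-north")
print(station_index)
```

Keeping one JSON file next to the .keras files means the same lookup can be applied at inference time without re-deriving it from the dataset.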

Metrics (validation)

From training_report.md:

  • Average validation loss: 1.2251
  • Average validation accuracy: 0.5957
  • Best individual accuracy: 0.6049
  • Worst individual accuracy: 0.5812
  • Ensemble accuracy stdev: 0.0087
  • Dataset size: 24,832 records (310 train steps/epoch, 77 val steps/epoch)
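The step counts can be sanity-checked with a little arithmetic. The sketch below uses the batch size of 64 listed under Training Procedure; the 80/20 train/validation split and the dropping of partial batches are assumptions, not something this card states, although they happen to reproduce the reported numbers.

```python
records = 24_832
batch_size = 64          # from the Training Procedure section
train_fraction = 0.8     # assumption: 80/20 split, not stated in this card

train_records = int(records * train_fraction)
val_records = records - train_records

# Assumption: partial batches are dropped, hence floor division.
train_steps = train_records // batch_size
val_steps = val_records // batch_size

print(train_steps, val_steps)
```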

These metrics summarize the six members individually (mean, best, worst, and spread); they are not ensemble-level results. At inference time, average the softmax probabilities across all 6 models to produce ensemble predictions.

Example Usage (local Python)

This snippet loads all six Keras models and averages their softmax outputs. Replace the feature values with your preprocessed tensors/arrays, ensuring they match the training feature schema and index mappings.

import numpy as np
import keras

# Load ensemble members
paths = [
    "track_prediction_ensemble_model_0_final.keras",
    "track_prediction_ensemble_model_1_final.keras",
    "track_prediction_ensemble_model_2_final.keras",
    "track_prediction_ensemble_model_3_final.keras",
    "track_prediction_ensemble_model_4_final.keras",
    "track_prediction_ensemble_model_5_final.keras",
]
models = [keras.models.load_model(p, compile=False) for p in paths]

# Prepare one example (batch size 1) — values shown are placeholders.
# You must convert raw strings to indices using the same vocab mapping used in training.
features = {
    "station_id": np.array([12], dtype=np.int64),     # int index
    "route_id": np.array([3], dtype=np.int64),        # int index
    "direction_id": np.array([1], dtype=np.int64),    # 0 or 1
    "hour_sin": np.array([0.707], dtype=np.float32),
    "hour_cos": np.array([0.707], dtype=np.float32),
    "minute_sin": np.array([0.0], dtype=np.float32),
    "minute_cos": np.array([1.0], dtype=np.float32),
    "day_sin": np.array([0.433], dtype=np.float32),
    "day_cos": np.array([0.901], dtype=np.float32),
    "scheduled_timestamp": np.array([1.7260e9], dtype=np.float32),
}

# Predict per model and average probabilities
probs = [m.predict(features, verbose=0) for m in models]
avg_prob = np.mean(probs, axis=0)   # shape: (batch, num_tracks)
pred_class = int(np.argmax(avg_prob, axis=-1)[0])
print({"predicted_track_index": pred_class, "probabilities": avg_prob[0].tolist()})

Tip: If you have the track vocabulary used at training time, you can map pred_class back to its track label string by indexing into that track_vocab list.
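A minimal sketch of that lookup, extended to return the top few candidates. The track_vocab list and the probability row here are hypothetical placeholders; the real label order must come from the training-time vocabulary.

```python
def top_k_tracks(prob_row, track_vocab, k=3):
    """Return the k most likely (track_label, probability) pairs."""
    if len(prob_row) != len(track_vocab):
        raise ValueError("probability vector and track vocabulary differ in length")
    ranked = sorted(range(len(prob_row)), key=lambda i: prob_row[i], reverse=True)
    return [(track_vocab[i], float(prob_row[i])) for i in ranked[:k]]


# Hypothetical 13-label vocabulary; the real order comes from training.
track_vocab = [str(n) for n in range(1, 14)]

# Placeholder standing in for avg_prob[0] from the snippet above.
prob_row = [0.02] * 13
prob_row[4] = 0.50
prob_row[7] = 0.28

best = top_k_tracks(prob_row, track_vocab, k=2)
print(best)
```

The length check guards against pairing the model output with a vocabulary from a different data snapshot.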

Training Data

  • Source: Historical MBTA track assignments exported from Redis to TFRecord
  • Features:
    • Categorical: station_id, route_id, direction_id
    • Temporal: hour, minute, day_of_week (encoded as sin/cos pairs)
    • Continuous: scheduled_timestamp (seconds since epoch)
  • Target: track_number (13 classes)
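The sin/cos encoding of the temporal features can be sketched as follows. The periods (24 hours, 60 minutes, 7 days), the Monday = 0 convention, and the use of a naive local datetime are assumptions for illustration; the authoritative mapping is the one in imt_ml.dataset.create_feature_engineering_fn.

```python
import math
from datetime import datetime


def cyclical(value, period):
    """Encode a periodic value as a (sin, cos) pair on the unit circle."""
    angle = 2.0 * math.pi * value / period
    return math.sin(angle), math.cos(angle)


def time_features(dt):
    """Build the six cyclical time features from a datetime (assumed periods)."""
    hour_sin, hour_cos = cyclical(dt.hour, 24)
    minute_sin, minute_cos = cyclical(dt.minute, 60)
    day_sin, day_cos = cyclical(dt.weekday(), 7)  # assumption: Monday = 0
    return {
        "hour_sin": hour_sin, "hour_cos": hour_cos,
        "minute_sin": minute_sin, "minute_cos": minute_cos,
        "day_sin": day_sin, "day_cos": day_cos,
    }


time_feats = time_features(datetime(2025, 9, 6, 3, 0))
print(time_feats)
```

Encoding each value as a point on a circle keeps 23:00 and 00:00 close together, which a raw integer hour would not.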

Training Procedure

  • Command: ensemble
  • Num models: 6 (architectural diversity: deep, wide, standard)
  • Epochs: 150
  • Batch size: 64
  • Base learning rate: 0.001 (varied 0.8x–1.2x per model)
  • Regularization: L1/L2, Dropout, BatchNorm; cosine LR scheduling and early stopping when enabled
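To make the learning-rate settings concrete, here is a small sketch. The even spacing of the 0.8x to 1.2x multipliers across the six members and the plain cosine decay to zero over 150 epochs are assumptions; the card gives only the range and the schedule family, so the actual pipeline may differ.

```python
import math

base_lr = 0.001
num_models = 6
total_epochs = 150

# Assumption: multipliers are evenly spaced between 0.8 and 1.2.
multipliers = [0.8 + 0.4 * i / (num_models - 1) for i in range(num_models)]
member_lrs = [base_lr * m for m in multipliers]


def cosine_lr(initial_lr, epoch, total_epochs):
    """Cosine decay from initial_lr at epoch 0 down to 0 at total_epochs."""
    progress = min(epoch, total_epochs) / total_epochs
    return 0.5 * initial_lr * (1.0 + math.cos(math.pi * progress))


for lr in member_lrs:
    midpoint = cosine_lr(lr, total_epochs // 2, total_epochs)
    print(f"start={lr:.5f} midpoint={midpoint:.5f}")
```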

Intended Use & Limitations

  • Intended to assist with real-time track/platform predictions for MBTA commuter rail.
  • Not a safety system; always defer to official dispatch/operations.
  • Sensitive to concept drift (schedule/operational changes) and to unseen stations/routes.
  • Requires consistent categorical index mapping between training and inference.