Temporal Twins: A Matched-Control Benchmark for Temporal Fraud Detection

Temporal Twins is a synthetic, UPI-style temporal transaction benchmark in which fraud and benign trajectories are matched on static and prefix-level summaries but differ in delayed event-order structure.

Installation

Recommended Python: 3.11+

pip install -r requirements.txt

If you prefer Conda:

conda env create -f environment.yml
conda activate temporal-twins

Repository Structure

src/: synthetic user, transaction, risk, fraud, graph, and temporal benchmark generation code
models/: SeqGRU, static baselines, audit/probe models, and temporal GNN wrappers
experiments/: deterministic benchmark runner and matched-prefix evaluation utilities
config/: base YAML configs used by the experiment runner
configs/: release-facing config snapshots for calibration and paper-suite reproduction
docs/: determinism and supporting documentation
metadata/: MLCommons Croissant metadata and validation notes
results/: lightweight frozen paper-suite summaries and interpretation notes

Quick Smoke Test

PYTHONPATH=. python3 experiments/run_all.py \
  --fast \
  --seed 0 \
  --benchmark-mode temporal_twins_oracle_calib \
  --experiments audit \
  --device cpu

Exact Paper-Scale Reproduction

The checked-in CLI exposes --benchmark-mode, --seed, --seeds, --fast, --device, and --experiments, but it has no separate --difficulty, --num-users, or --simulation-days flags. To run the exact grouped paper-scale configurations, use the helper below from the repository root.

Define this shell helper once:

run_group() {
  local group="$1"
  local seed="$2"
  local out_json="$3"

  PYTHONPATH=. python3 - "$group" "$seed" "$out_json" <<'PY'
import json
import math
import sys
import time
from pathlib import Path

from src.core.config_loader import load_config
from experiments.run_all import (
    build_gate_pool_from_frames,
    gate_volume_is_sufficient,
    generate_single_difficulty,
    offset_gate_namespace,
    prepare_gate_subset,
    run_motif_validity_check,
    set_global_determinism,
)


def normalize(value):
    if isinstance(value, dict):
        return {k: normalize(v) for k, v in value.items()}
    if isinstance(value, (list, tuple)):
        return [normalize(v) for v in value]
    if hasattr(value, "item"):
        try:
            value = value.item()
        except Exception:
            pass
    if isinstance(value, float) and not math.isfinite(value):
        return None
    return value


group = sys.argv[1]
seed = int(sys.argv[2])
out_json = Path(sys.argv[3])

if group == "oracle_calib":
    benchmark_mode = "temporal_twins_oracle_calib"
    difficulty = "easy"
    hard_abort = True
else:
    benchmark_mode = "temporal_twins"
    difficulty = group
    hard_abort = False

cfg = load_config("config/default.yaml")
cfg = cfg.model_copy(
    update={
        "num_users": 350,
        "simulation_days": 45,
        "benchmark_mode": benchmark_mode,
        "random_seed": seed,
    }
)

set_global_determinism(seed)
pool = generate_single_difficulty(
    cfg,
    difficulty=difficulty,
    seed=seed,
    benchmark_mode=benchmark_mode,
)
gate = prepare_gate_subset(pool, seed=seed, fast_mode=False)
pack_count = 1

while (not gate_volume_is_sufficient(gate["volume"], False)) and pack_count <= 6:
    extra_seed = seed + pack_count * 10007
    extra_pack = generate_single_difficulty(
        cfg,
        difficulty=difficulty,
        seed=extra_seed,
        benchmark_mode=benchmark_mode,
    )
    extra_pack = offset_gate_namespace(extra_pack, pack_count)
    pool = build_gate_pool_from_frames([pool, extra_pack])
    gate = prepare_gate_subset(pool, seed=seed, fast_mode=False)
    pack_count += 1

gate["source_pool_events"] = int(len(pool))
gate["source_pool_pairs"] = int(pool.loc[pool["twin_pair_id"] >= 0, "twin_pair_id"].nunique()) if "twin_pair_id" in pool.columns else 0
gate["source_pool_packs"] = int(pack_count)

start = time.time()
gate_pass, report = run_motif_validity_check(
    df=pool,
    config=cfg,
    seed=seed,
    device="cpu",
    num_epochs=3,
    node_epochs=150,
    n_checkpoints=8,
    hard_abort=hard_abort,
    benchmark_mode=benchmark_mode,
    fast_mode=False,
    force_temporal_models=True,
    prebuilt_gate=gate,
)
elapsed = time.time() - start

result = {
    "benchmark_group": group,
    "benchmark_mode": benchmark_mode,
    "seed": seed,
    "primary_metric_label": report["audit_metric_label"],
    "secondary_metric_label": report["raw_metric_label"],
    "gate_pass": bool(gate_pass),
    "run_wall_time_sec": float(elapsed),
    **report,
}

out_json.parent.mkdir(parents=True, exist_ok=True)
out_json.write_text(json.dumps(normalize(result), indent=2) + "\n")
print(f"Wrote {out_json}")
PY
}

Reproduce `oracle_calib`

run_group oracle_calib 0 results/paper_suite_repro/jobs/oracle_calib_0.json

Reproduce `easy`

run_group easy 0 results/paper_suite_repro/jobs/easy_0.json

Reproduce `medium`

run_group medium 0 results/paper_suite_repro/jobs/medium_0.json

Reproduce `hard`

run_group hard 0 results/paper_suite_repro/jobs/hard_0.json

Reproduce the Full Paper Suite

mkdir -p results/paper_suite_repro/jobs

for group in oracle_calib easy medium hard; do
  for seed in 0 1 2 3 4; do
    run_group "$group" "$seed" "results/paper_suite_repro/jobs/${group}_${seed}.json"
  done
done
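
Once the loop finishes, the per-job files can be rolled up for a quick sanity check. The following is a minimal sketch, not the script that produced the frozen summaries. It reads only the keys that the run_group helper writes explicitly (benchmark_group, seed, gate_pass, run_wall_time_sec); the function name summarize_jobs is hypothetical.

```python
import json
from collections import defaultdict
from pathlib import Path


def summarize_jobs(jobs_dir):
    """Group per-job JSON files by benchmark_group; count gate passes and sum wall time."""
    by_group = defaultdict(list)
    for path in sorted(Path(jobs_dir).glob("*.json")):
        job = json.loads(path.read_text())
        by_group[job["benchmark_group"]].append(job)

    summary = {}
    for group, jobs in by_group.items():
        summary[group] = {
            # Seeds found on disk for this group.
            "seeds": sorted(job["seed"] for job in jobs),
            # Number of runs whose validity gate passed.
            "gate_passes": sum(1 for job in jobs if job["gate_pass"]),
            # Total wall-clock time across the group's runs.
            "total_wall_time_sec": sum(job["run_wall_time_sec"] for job in jobs),
        }
    return summary
```

Calling summarize_jobs("results/paper_suite_repro/jobs") from the repository root returns one entry per group, which makes it easy to spot a missing seed before comparing against the frozen outputs.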

The frozen reference outputs for the final deterministic suite are already included in results/:

paper_suite_summary.csv
paper_suite_summary.md
paper_suite_runtime.csv
paper_suite_meta.json
paper_suite_runs.csv
PAPER_GATE_INTERPRETATION.md

Expected Headline Results

Benchmark	XGBoost ROC-AUC	StaticGNN ROC-AUC	SeqGRU ROC-AUC	SeqGRU Shuffle Delta
`oracle_calib`	`0.5000`	`0.5222`	`1.0000`	`-0.5032`
`easy`	`0.5000`	`0.4946`	`1.0000`	`-0.5003`
`medium`	`0.5000`	`0.4922`	`0.8391`	`-0.3337`
`hard`	`0.5000`	`0.5026`	`0.6876`	`-0.1883`
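
The Shuffle Delta column is most naturally read as the change in ROC-AUC when the event order inside each sequence is randomly permuted at evaluation time. That reading is an assumption here, since the exact definition lives in the evaluation code. The toy sketch below illustrates the idea with a hand-written order-sensitive scorer in place of SeqGRU; the event names, the scorer, and the data are all hypothetical.

```python
import random


def roc_auc(labels, scores):
    """Rank-based ROC-AUC: chance a positive outscores a negative (ties count 0.5)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def order_score(events):
    """Toy order-sensitive scorer: 1.0 if a 'probe' occurs before a later 'burst'."""
    seen_probe = False
    for event in events:
        if event == "probe":
            seen_probe = True
        elif event == "burst" and seen_probe:
            return 1.0
    return 0.0


def shuffle_delta(sequences, labels, rng):
    """Return (original AUC, shuffled AUC, shuffled minus original)."""
    base = roc_auc(labels, [order_score(s) for s in sequences])
    shuffled = [rng.sample(s, len(s)) for s in sequences]  # permute each sequence
    perm = roc_auc(labels, [order_score(s) for s in shuffled])
    return base, perm, perm - base


# Twin pair: identical event multisets, different order (hypothetical events).
fraud = ["pay", "probe", "pay", "burst"]
benign = ["pay", "burst", "pay", "probe"]
sequences = [fraud, benign] * 200
labels = [1, 0] * 200

base_auc, shuffled_auc, delta = shuffle_delta(sequences, labels, random.Random(0))
print(f"original={base_auc:.3f} shuffled={shuffled_auc:.3f} delta={delta:.3f}")
```

Because both twins contain the same events, any order-blind summary scores them identically, while the order-sensitive scorer separates them perfectly until the order is destroyed. A delta near -0.5 therefore indicates that nearly all of the signal was carried by event order, which matches the pattern in the `oracle_calib` and `easy` rows.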

Determinism

Deterministic CPU execution is enabled. Running with the same seed should reproduce identical matched-prefix data and metrics. Deterministic torch settings can slow runs, especially in the non-fast paper-scale suite.
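
One way to check this claim locally is to run the same group and seed twice and compare the two output files. The sketch below is a hedged example rather than part of the repository: it assumes run_wall_time_sec is the only top-level key that legitimately varies between runs, and the names VOLATILE_KEYS, stable_digest, and runs_match are hypothetical. If the report contains other timing fields, add them to the set.

```python
import hashlib
import json
from pathlib import Path

# Keys expected to differ between otherwise identical runs (assumption: wall time only).
VOLATILE_KEYS = {"run_wall_time_sec"}


def stable_digest(result):
    """Hash a result dict after dropping volatile top-level keys."""
    stable = {k: v for k, v in result.items() if k not in VOLATILE_KEYS}
    # sort_keys makes the serialization independent of key order.
    payload = json.dumps(stable, sort_keys=True).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()


def runs_match(path_a, path_b):
    """Return True if two job JSON files agree on everything except volatile keys."""
    result_a = json.loads(Path(path_a).read_text())
    result_b = json.loads(Path(path_b).read_text())
    return stable_digest(result_a) == stable_digest(result_b)
```

For example, writing two `easy` runs with seed 0 to different paths and passing both paths to runs_match should return True if the run is deterministic on your machine.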

Data Note

This code repository contains source code, metadata, documentation, and lightweight result summaries only. The generated synthetic dataset and full release artifacts are hosted separately at the dataset repository:

https://huggingface.co/datasets/temporal-twins-benchmark/temporal-twins

Privacy Note

Synthetic data only
No real UPI transactions
No real users
No real bank accounts
No personal financial records

License

Code: Apache-2.0
Dataset and generated benchmark artifacts: CC-BY-4.0

Citation

Anonymous NeurIPS 2026 submission; final citation to be added after review.
