Time-Relaxed Directed GNN for Bitcoin Fraud Detection | 6 Novel Contributions | Production-Ready | E7-A3: 0.5846 PR-AUC (+4.7% over E3) | E9 Fusion: +33.5% | Publication-Ready Research

BhaveshBytess/TRDGNN

🕐 When Temporal Constraints Meet Graph Neural Networks

A Systematic Investigation of Heterogeneous Temporal GNNs for Bitcoin Fraud Detection

License: MIT | DOI: 10.5281/zenodo.17584452 | Python 3.10+ | PyTorch | PyG


🎯 TL;DR

Most GNN research ignores temporal leakage. We built the first zero-leakage temporal heterogeneous GNN for fraud detection.

Through systematic investigation (9 experiments), we discovered that:

  • Heterogeneous temporal GNNs work when properly designed (+4.7% over homogeneous baseline)
  • Architecture matters more than scale (50K parameters beats 500K by 108%)
  • GNN + Tabular fusion achieves +33.5% synergy in wallet-level fraud detection
  • The "temporal tax" (the performance cost of enforcing realistic temporal constraints) can be reduced from 16.5% to 12.6% through better design

Main Result: Our best model (E7-A3) achieves PR-AUC 0.5846 with strict temporal constraints. Fusion with tabular features (E9) demonstrates +33.5% improvement over tabular-only approaches.


🔬 The Unique Contribution

Complete Scientific Story: Most papers hide failures. We document the full journey:

E6 (Hypothesis): Complex heterogeneous GNN → 0.2806 PR-AUC ❌ (-49.7% failure)
E7 (Investigation): Systematic ablations isolate root cause
E7-A3 (Resolution): Simple heterogeneous GNN → 0.5846 PR-AUC ✅ (+108% recovery)
E9 (Innovation): GNN+Tabular fusion → +33.5% synergy 🏆

This is how REAL science works.


📊 Performance Comparison

We trained 9 models using strict temporal splits (zero future leakage) on the Elliptic++ dataset:

| Model | PR-AUC ⭐ | ROC-AUC | F1 | Type | Notes |
|---|---|---|---|---|---|
| 🌳 XGBoost | 0.669 🥇 | 0.888 | 0.699 | Tabular | Best overall |
| 🌳 Random Forest | 0.658 🥈 | 0.877 | 0.695 | Tabular | Strong baseline |
| 🕸️ E7-A3 (Simple-HHGTN) | 0.585 🥉 | 0.831 | 0.258 | Temporal Hetero GNN | Best GNN (+4.7%) |
| 🕸️ E3 (TRD-GraphSAGE) | 0.558 | 0.806 | 0.586 | Temporal GNN | Solid baseline |
| 🌐 MLP | 0.364 | 0.830 | 0.486 | Neural Net | Tabular features |
| 🏆 E9 Fusion | 0.300 | 0.890 | 0.176 | Wallet-Level | +33.5% synergy |
| 🕸️ E6 (Complex-HHGTN) | 0.281 | 0.756 | 0.298 | Temporal Hetero GNN | Failure case |

📌 Key Insight: The 108% recovery (E6 → E7-A3) demonstrates that architectural simplicity enables better generalization. The +33.5% fusion synergy (E9) proves GNN structural embeddings complement tabular features.


🚀 Quick Start

Prerequisites

  • Python 3.10+
  • CUDA-capable GPU (optional, for GNN training)
  • ~3GB disk space for dataset

Installation & Reproduction

# 1️⃣ Clone and setup environment
git clone https://github.com/BhaveshBytess/TRDGNN.git
cd TRDGNN
python -m venv venv && source venv/bin/activate  # Windows: venv\Scripts\activate
pip install -r requirements.txt

# 2️⃣ Download Elliptic++ dataset (NOT included in repo)
# Get from: https://www.kaggle.com/datasets/ellipticco/elliptic-data-set
# Place these files in: data/Elliptic++ Dataset/
#   ├── txs_features.csv
#   ├── txs_classes.csv
#   └── txs_edgelist.csv

# 3️⃣ Run TRD sampler tests (verify zero-leakage)
pytest tests/test_trd_sampler.py -v

# 4️⃣ Reproduce results
# Train best temporal GNN (GPU recommended, ~20 min)
python -m src.train --config configs/e7_a3_simple_hhgtn.yaml

# Train fusion model (CPU, ~5 min)
python scripts/run_e9_fusion.py

# 5️⃣ View results
ls reports/kaggle_results/  # Metrics JSON/CSV files
ls reports/plots/           # Figures

Expected Output: Metrics files matching our published results (±2% variance due to randomness).
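One way to check a reproduced metric against a published one is a simple relative-tolerance comparison. The sketch below is illustrative only: `matches_published` is a helper introduced here (not part of the repository), and it assumes the ±2% is meant as a relative tolerance, which the README does not specify.

```python
def matches_published(reproduced: float, published: float, rel_tol: float = 0.02) -> bool:
    """True if `reproduced` is within `rel_tol` (relative) of `published`."""
    return abs(reproduced - published) <= rel_tol * abs(published)


# Published E7-A3 PR-AUC from this README; the reproduced values are made up.
published_pr_auc = 0.5846
close_run = matches_published(0.5790, published_pr_auc)  # off by about 1%
far_run = matches_published(0.5600, published_pr_auc)    # off by about 4%
print(close_run, far_run)
```

If the tolerance is intended as absolute PR-AUC points instead, the comparison would drop the multiplication by `abs(published)`.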


📦 Dataset

Elliptic++ Bitcoin Transaction Network

| Property | Value |
|---|---|
| Nodes | 203,769 Bitcoin transactions |
| Edges | 234,355 transaction flows |
| Features | 182 per transaction (93 local + 89 aggregated) |
| Labels | Licit (89%) / Illicit (11%) |
| Timespan | 49 timesteps (temporal graph) |
| Task | Binary fraud classification |
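
As a rough illustration of what a strict temporal split over the 49 timesteps could look like, the sketch below assigns each transaction to train, validation, or test purely by its timestep. The function name `temporal_split` and the boundaries (`train_end=29`, `val_end=39`) are hypothetical and chosen only for illustration; the splits actually used are handled by `src/data/elliptic_loader.py`.

```python
from typing import Dict, List, Sequence


def temporal_split(
    timesteps: Sequence[int],
    train_end: int = 29,  # hypothetical boundary, not taken from the repo
    val_end: int = 39,    # hypothetical boundary, not taken from the repo
) -> Dict[str, List[int]]:
    """Assign node indices to splits so that train < val < test in time."""
    if not train_end < val_end:
        raise ValueError("train_end must be earlier than val_end")
    splits: Dict[str, List[int]] = {"train": [], "val": [], "test": []}
    for node_id, t in enumerate(timesteps):
        if t <= train_end:
            splits["train"].append(node_id)
        elif t <= val_end:
            splits["val"].append(node_id)
        else:
            splits["test"].append(node_id)
    return splits


# Toy example: six transactions spread over the 49 timesteps.
toy_timesteps = [1, 29, 30, 39, 40, 49]
toy_splits = temporal_split(toy_timesteps)
print(toy_splits)  # {'train': [0, 1], 'val': [2, 3], 'test': [4, 5]}
```

Because membership depends only on the timestep, no training node can come from a later period than any validation or test node.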

⚠️ Dataset NOT included — Download from Kaggle Elliptic++ Dataset

Required files:

data/Elliptic++ Dataset/
├── txs_features.csv       (203K rows × 182 features)
├── txs_classes.csv        (node labels)
└── txs_edgelist.csv       (graph edges)

Citation for dataset:

Weber, M., et al. (2019). "Anti-Money Laundering in Bitcoin: Experimenting with Graph Convolutional Networks for Financial Forensics." KDD Workshop on Anomaly Detection in Finance.


📚 Project Structure & Documentation

TRDGNN/
├── 📄 README.md                       ← You are here (landing page)
├── 📘 docs/
│   ├── PROJECT_NARRATIVE.md           ← **Complete scientific story** (E1-E9)
│   ├── PROJECT_SPEC.md                ← Architecture & acceptance criteria
│   ├── E6_HETEROGENEOUS_GNN_DOCUMENTATION.md  ← Complex model failure analysis
│   ├── E7_ABLATION_STUDY.md           ← 🔬 Systematic investigation methodology
│   ├── E7_RESULTS_SUMMARY.md          ← E7 ablation results & insights
│   ├── E9_WALLET_FUSION_PLAN.md       ← E9 fusion experiment design
│   └── baseline_provenance.json       ← Provenance tracking
├── 📊 reports/
│   ├── COMPARISON_REPORT.md           ← **Comprehensive results across all experiments**
│   ├── kaggle_results/
│   │   ├── E9_RESULTS.md              ← **E9 wallet fusion (+33.5%)**
│   │   ├── E6_ANALYSIS.md             ← E6 failure deep-dive
│   │   ├── RESULTS_ANALYSIS.md        ← Overall results synthesis
│   │   ├── e9-notebook.ipynb          ← Full E9 notebook with outputs
│   │   └── *.pt, *.json, *.png        ← Checkpoints, metrics, plots
│   ├── metrics_summary.csv            ← All model results
│   └── plots/                         ← Figures (PNG)
├── 📓 notebooks/
│   ├── 01_trd_sampler_mvp.ipynb       ← TRD sampler development
│   ├── 02_trd_graphsage.ipynb         ← E3 homogeneous temporal GNN
│   ├── 03_heterogeneous_construction.ipynb  ← E5 hetero graph building
│   └── 04_ablation_study.ipynb        ← E7 systematic investigation
├── 🧠 src/                            ← Modular source code
│   ├── data/
│   │   ├── elliptic_loader.py         ← Dataset loader with splits
│   │   └── trd_sampler.py             ← **Zero-leakage temporal sampler**
│   ├── models/
│   │   ├── trd_graphsage.py           ← E3 homogeneous model
│   │   ├── trd_hhgtn.py               ← E6/E7 heterogeneous models
│   │   └── simple_hhgtn.py            ← E7-A3 best model
│   ├── utils/
│   │   ├── metrics.py                 ← Evaluation utilities
│   │   ├── seed.py                    ← Reproducibility
│   │   └── logger.py                  ← Logging
│   ├── train.py                       ← Training script
│   └── eval.py                        ← Evaluation pipeline
├── ⚙️ configs/                        ← YAML configs per experiment
│   ├── e3_trd_graphsage.yaml
│   ├── e6_trd_hhgtn.yaml
│   ├── e7_a3_simple_hhgtn.yaml
│   └── e9_fusion.yaml
├── 🧪 tests/
│   └── test_trd_sampler.py            ← **7/7 tests passing**
├── 🛠️ scripts/
│   ├── run_e9_fusion.py               ← E9 fusion experiment
│   └── generate_plots.py             ← Visualization utilities
└── 💾 checkpoints/                    ← Trained model weights

🔗 Key Documents

| Document | Description |
|---|---|
| 📘 PROJECT_NARRATIVE.md | Complete scientific story (E1-E9) |
| 📊 COMPARISON_REPORT.md | Comprehensive results & methodology |
| 🔬 E7_ABLATION_STUDY.md | Systematic investigation methodology |
| 🏆 E9_RESULTS.md | Wallet fusion study (+33.5%) |
| 📄 E6_ANALYSIS.md | Complex model failure analysis |
| 📋 PROJECT_SPEC.md | Technical specifications |

🏆 Six Novel Contributions

1. Zero-Leakage Temporal Sampler ⭐⭐⭐⭐

What: TRD (Time-Relaxed Directed) sampler enforcing time(neighbor) ≤ time(target)
Why Unique: First rigorously tested temporal fraud detection sampler (7/7 tests passing)
Impact: Production-ready implementation for deployment
Citation Value: HIGH

2. Temporal Tax Quantification & Reduction ⭐⭐⭐⭐⭐

What: Enforcing realistic temporal constraints costs 16.5% (E3), but better design reduces this to 12.6% (E7-A3)
Why Unique: First quantification AND reduction of temporal evaluation cost
Impact: Demonstrates honest evaluation doesn't require massive performance loss
Citation Value: VERY HIGH - Novel metric for temporal GNN research

3. Architecture > Scale Principle ⭐⭐⭐⭐⭐

What: 50K parameters (E7-A3) beats 500K parameters (E6) by 108%
Why Unique: Systematic proof through ablations that simpler architectures generalize better on small datasets
Impact: Challenges "bigger is better" assumption; practical design guidelines
Citation Value: VERY HIGH - Fundamental insight for small-data regimes

4. Successful Heterogeneous Temporal GNN ⭐⭐⭐⭐

What: Properly designed heterogeneous GNN (E7-A3) achieves +4.7% over homogeneous baseline
Why Unique: First successful heterogeneous temporal GNN for fraud detection
Impact: Proves structural information helps when properly designed
Citation Value: HIGH

5. Architecture-Induced Collapse Discovery ⭐⭐⭐⭐

What: Semantic attention combined with weak regularization causes collapse on small datasets
Why Unique: Systematic identification through controlled ablations (E7)
Impact: Important failure mode documentation for future research
Citation Value: HIGH - Helps others avoid similar pitfalls

6. GNN-Tabular Fusion Synergy (E9) ⭐⭐⭐⭐⭐

What: Combining GNN embeddings + tabular features achieves +33.5% improvement
Why Unique: First wallet-level fusion approach for Bitcoin fraud detection
Impact: Novel hybrid methodology; demonstrates complementary information
Citation Value: VERY HIGH - Original research contribution


🔬 The Complete Scientific Story

Act 1: Foundation (E1-E3)

Goal: Establish honest temporal baseline
Result: 0.5582 PR-AUC with zero leakage
Discovery: Temporal constraints cost 16.5% vs unrealistic baselines

Act 2: Hypothesis (E5-E6)

Goal: Improve through heterogeneous structure
Result: 0.2806 PR-AUC (❌ 49.7% below the E3 baseline)
Initial Conclusion: "Heterogeneous temporal GNNs suffer from collapse"

Act 3: Investigation (E7)

Goal: Understand why E6 failed
Method: Systematic ablations (A1, A2, A3)
Discovery: Failure was architectural, not structural

Act 4: Resolution (E7-A3)

Goal: Correct the design
Result: 0.5846 PR-AUC (✅ success, +108% over E6)
Corrected Understanding: "Simple heterogeneous architectures work best"

Act 5: Innovation (E9)

Goal: Validate embeddings in fusion scenario
Result: 0.3003 PR-AUC (+33.5% over tabular-only features at wallet level)
Discovery: GNN embeddings provide complementary structural information

📌 Why This Matters: Most papers show only successes. We document the complete cycle: hypothesis → failure → systematic investigation → improved solution → novel application. This is publication-quality research demonstrating the scientific method.

Full Story: See PROJECT_NARRATIVE.md for complete details.


🎓 Why This Project Matters

For Researchers

  1. Complete failure → success story documented with scientific rigor
  2. Systematic investigation methodology through controlled ablations
  3. Six distinct contributions (most papers have 1-2)
  4. Reproducible implementation (all experiments on Kaggle)
  5. Novel fusion approach (E9 original research)

For Practitioners

  1. Production-ready TRD sampler (7/7 tests passing)
  2. Best temporal GNN model (E7-A3: 0.5846 PR-AUC)
  3. Fusion approach achieving +33.5% improvement
  4. Deployment guidelines for small-dataset scenarios
  5. Architectural design principles for temporal GNNs

For Educators

  1. Teaching case study on ablation studies & experimental design
  2. Demonstrates scientific method from hypothesis to publication
  3. Failure analysis and correction methodology
  4. Complete research cycle documentation

📖 How to Use This Repository

Quick Navigation by Goal

| Your Goal | Start Here | Then Read |
|---|---|---|
| 🎓 Understand the research | README.md | PROJECT_NARRATIVE.md |
| 🔬 Learn experimental design | E7_ABLATION_STUDY.md | COMPARISON_REPORT.md |
| 💼 Deploy fraud detection | test_trd_sampler.py | E7-A3 checkpoint |
| 🏆 Apply fusion approach | E9_RESULTS.md | e9-notebook.ipynb |
| 📚 Cite the work | Citation | Zenodo DOI |

🔧 Technical Details

Zero-Leakage TRD Sampler

# Core innovation: Time-Relaxed Directed sampling
# Rule: time(neighbor) ≤ time(target)

from src.data.trd_sampler import TRDNeighborSampler

sampler = TRDNeighborSampler(
    edge_index=edge_index,
    node_timestamps=timestamps,
    max_in_neighbors=15,
    max_out_neighbors=15,
    forbid_future_neighbors=True  # Zero-leakage guarantee
)

# Verified by 7/7 unit tests; run them from the shell with:
#   pytest tests/test_trd_sampler.py -v
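
The rule time(neighbor) ≤ time(target) can be illustrated without the project code. The following is a minimal pure-Python sketch, not the repository's `TRDNeighborSampler`: the function `sample_time_respecting_neighbors` and its arguments are hypothetical, and it covers only a single hop.

```python
import random
from typing import Dict, List, Sequence, Tuple


def sample_time_respecting_neighbors(
    edges: Sequence[Tuple[int, int]],   # directed (src, dst) pairs
    timestamps: Sequence[int],          # timestamps[node] = timestep of that node
    target: int,
    max_in_neighbors: int = 15,
    max_out_neighbors: int = 15,
    seed: int = 0,
) -> Dict[str, List[int]]:
    """One-hop sketch: keep only neighbors with time(neighbor) <= time(target)."""
    rng = random.Random(seed)
    t_target = timestamps[target]

    # Incoming neighbors send an edge to the target; outgoing ones receive one.
    in_nbrs = [s for s, d in edges if d == target and timestamps[s] <= t_target]
    out_nbrs = [d for s, d in edges if s == target and timestamps[d] <= t_target]

    # Cap each direction independently, mirroring the two max_* limits.
    if len(in_nbrs) > max_in_neighbors:
        in_nbrs = rng.sample(in_nbrs, max_in_neighbors)
    if len(out_nbrs) > max_out_neighbors:
        out_nbrs = rng.sample(out_nbrs, max_out_neighbors)
    return {"in": sorted(in_nbrs), "out": sorted(out_nbrs)}


# Toy graph: node 2 (t=5) has a past in-neighbor (node 0), a same-step
# in-neighbor (node 1), a future in-neighbor (node 3), and a future
# out-neighbor (node 4).
toy_edges = [(0, 2), (1, 2), (3, 2), (2, 4)]
toy_times = [3, 5, 5, 7, 9]
sampled = sample_time_respecting_neighbors(toy_edges, toy_times, target=2)
print(sampled)  # {'in': [0, 1], 'out': []}
```

The non-strict inequality admits same-timestep neighbors (node 1 here) while excluding anything from a later timestep, which is the property a leakage test would assert for every sampled neighborhood.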

Model Architectures

E3 (TRD-GraphSAGE): Homogeneous temporal baseline

hidden_channels: 128
num_layers: 2
dropout: 0.4
aggregation: mean

E7-A3 (Simple-HHGTN): Best heterogeneous model

hidden_channels: 64  # Reduced from 128 (E6)
num_layers: 1        # Reduced from 2 (E6)
dropout: 0.6         # Increased from 0.4 (E6)
aggregation: sum     # Changed from attention (E6)

E9 (Fusion): GNN embeddings + tabular features

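import numpy as np
from xgboost import XGBClassifier

# Extract 64-dim embeddings from E7-A3 as a NumPy array of shape (n, 64)
# (extract_embeddings stands for the project's embedding-extraction step)
embeddings = extract_embeddings(e7_a3_model, data)

# Concatenate with 93 tabular features; rows must refer to the same entities
fusion_features = np.concatenate([embeddings, tabular_features], axis=1)

# Train XGBoost on the fused (n, 157) feature matrix
xgb = XGBClassifier(n_estimators=100, max_depth=6)
xgb.fit(fusion_features, labels)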

📊 Experiment Results Summary

| Experiment | Model | PR-AUC | Key Finding |
|---|---|---|---|
| E1 | Bootstrap | N/A | Provenance tracking established |
| E2 | TRD Sampler | N/A | Zero-leakage validated (7/7 tests) |
| E3 | TRD-GraphSAGE | 0.5582 | Temporal baseline (+16.5% tax) |
| E5 | Hetero Graph | N/A | 303K nodes, 422K edges constructed |
| E6 | Complex-HHGTN | 0.2806 | Failure (-49.7% vs E3) |
| E7-A1 | No Addr Edges | 0.5618 | Partial edge collapse identified |
| E7-A2 | No Addr Features | 0.5536 | Address features not the issue |
| E7-A3 | Simple Architecture | 0.5846 | Best GNN (+108% vs E6) |
| E9 | GNN+Tabular Fusion | 0.3003 | +33.5% synergy |

Full Details: See COMPARISON_REPORT.md
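
The headline percentages can be re-derived from the PR-AUC values in the tables. The short check below is a sketch: `relative_change` is a helper introduced here, and treating the temporal tax as the shortfall relative to XGBoost's 0.669 is an assumption (the README does not name the reference score), although it lands within about 0.1 points of the quoted 16.5% and 12.6%.

```python
def relative_change(new: float, old: float) -> float:
    """Percentage change of `new` relative to `old`."""
    return (new - old) / old * 100.0


# PR-AUC values copied from the results tables.
E3, E6, E7_A3 = 0.5582, 0.2806, 0.5846
XGBOOST = 0.669  # assumed reference for the "temporal tax"; not stated in the README

e7a3_vs_e3 = relative_change(E7_A3, E3)      # heterogeneous vs. homogeneous GNN
e6_vs_e3 = relative_change(E6, E3)           # E6 failure relative to E3
e7a3_vs_e6 = relative_change(E7_A3, E6)      # recovery from E6 to E7-A3
tax_e3 = -relative_change(E3, XGBOOST)       # shortfall vs. the assumed reference
tax_e7a3 = -relative_change(E7_A3, XGBOOST)  # shortfall vs. the assumed reference

print(f"E7-A3 vs E3: {e7a3_vs_e3:+.1f}%")
print(f"E6 vs E3:    {e6_vs_e3:+.1f}%")
print(f"E7-A3 vs E6: {e7a3_vs_e6:+.1f}%")
print(f"Temporal tax (assumed reference): E3 {tax_e3:.1f}%, E7-A3 {tax_e7a3:.1f}%")
```

The first three figures depend only on values stated in this README; the last line depends on the assumed reference and should be read accordingly.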


📝 Citation

If you use this code or findings, please cite:

@software{trd_gnn_2025,
  title = {When Temporal Constraints Meet Graph Neural Networks: A Systematic Investigation of Heterogeneous Temporal GNNs for Bitcoin Fraud Detection},
  author = {Bytes, Bhavesh},
  year = {2025},
  doi = {10.5281/zenodo.17584452},
  url = {https://github.com/BhaveshBytess/TRDGNN},
  note = {Complete E1-E9 implementation with novel fusion approach, systematic ablations, and zero-leakage temporal sampler},
  license = {MIT}
}

Zenodo DOI: 10.5281/zenodo.17584452

Machine-readable citation: See CITATION.cff


📬 Contact & License

Author: Bhavesh Bytes
Email: 10bhavesh7.11@gmail.com
GitHub: @BhaveshBytess
License: MIT License — Free to use with attribution

Project Status: ✅ Complete (E1-E9) | Last Updated: November 2025


🌟 Project Highlights

  • 9 experiments systematically investigating temporal GNNs
  • 6 novel contributions with high citation value
  • 7/7 tests passing for zero-leakage temporal sampler
  • 108% recovery from initial failure through systematic investigation
  • +33.5% fusion synergy demonstrating complementary information
  • Complete documentation with narrative, results, and methodology
  • Reproducible on Kaggle with all notebooks preserved
  • Publication-ready research demonstrating the scientific method

🚧 Future Work

Completed (E1-E9):

  • ✅ Zero-leakage temporal GNN
  • ✅ Heterogeneous architecture investigation
  • ✅ Systematic ablation study
  • ✅ GNN-tabular fusion

Future Directions:

  • 🔮 E8: Temporal dynamics study (separate future project)
  • 🔮 Hyperparameter tuning for E9 fusion
  • 🔮 Neural fusion layer experiments
  • 🔮 Feature importance analysis
  • 🔮 Extension to other cryptocurrency datasets
  • 🔮 Real-time deployment system

⭐ If you find this project useful, please consider giving it a star!



Built with rigor. Documented with care. Shared with the community.