
Model Versioning in Production: Beyond Git for Machine Learning

1/20/2025

Git works well for code, but ML models need more than code versioning. A model checkpoint is the product of specific training data, hyperparameters, and dependencies, and its evaluation metrics determine whether it is ready for production.

The Model Versioning Stack

Here's the four-layer approach we implement for clients:

1. Semantic Versioning for Models

# Not this:
model_v1.pkl
model_v2_final.pkl
model_v2_final_ACTUALLY_FINAL.pkl

# This:
fraud-detector-v2.3.1
# Major: Breaking API changes (input schema, output format)
# Minor: Accuracy improvements, new features
# Patch: Bug fixes, performance tuning
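
A version scheme only helps if tooling can parse and compare it. The sketch below assumes model IDs follow the pattern `<name>-v<major>.<minor>.<patch>` used above; the helper names (`parse_model_id`, `bump`, `is_drop_in_replacement`) are hypothetical and not part of any library.

```python
import re
from typing import NamedTuple

# Assumed naming convention: "<model-name>-v<major>.<minor>.<patch>"
_MODEL_ID_RE = re.compile(
    r"^(?P<name>[a-z0-9-]+)-v(?P<major>\d+)\.(?P<minor>\d+)\.(?P<patch>\d+)$"
)


class ModelVersion(NamedTuple):
    name: str
    major: int
    minor: int
    patch: int

    def __str__(self) -> str:
        return f"{self.name}-v{self.major}.{self.minor}.{self.patch}"


def parse_model_id(model_id: str) -> ModelVersion:
    """Split a model ID into its name and numeric version parts."""
    match = _MODEL_ID_RE.match(model_id)
    if match is None:
        raise ValueError(f"not a valid model ID: {model_id!r}")
    return ModelVersion(
        match["name"], int(match["major"]), int(match["minor"]), int(match["patch"])
    )


def bump(version: ModelVersion, change: str) -> ModelVersion:
    """Return the next version for a 'major', 'minor', or 'patch' change."""
    if change == "major":
        return version._replace(major=version.major + 1, minor=0, patch=0)
    if change == "minor":
        return version._replace(minor=version.minor + 1, patch=0)
    if change == "patch":
        return version._replace(patch=version.patch + 1)
    raise ValueError(f"unknown change type: {change!r}")


def is_drop_in_replacement(current: ModelVersion, candidate: ModelVersion) -> bool:
    """True when the candidate keeps the same major version (same input and
    output contract) and is not older than the current version."""
    return (
        candidate.name == current.name
        and candidate.major == current.major
        and (candidate.minor, candidate.patch) >= (current.minor, current.patch)
    )
```

A deployment script could call `is_drop_in_replacement` to decide whether a new version can be swapped in without touching callers, and require a coordinated release whenever the major version changes.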

2. Model Registry with Metadata

Every model version stores:

{
  "model_id": "fraud-detector-v2.3.1",
  "training_data": "s3://data/fraud-2025-01-15.parquet",
  "data_hash": "sha256:a3b2c1...",
  "framework": "pytorch==2.1.0",
  "metrics": {
    "precision": 0.94,
    "recall": 0.89,
    "f1": 0.91,
    "auc_roc": 0.96
  },
  "training_duration": "3h 24m",
  "artifact_uri": "s3://models/fraud-detector/2.3.1/",
  "stage": "production",
  "promoted_at": "2025-01-20T10:00:00Z",
  "promoted_by": "user@company.com"
}
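
A registry record is only useful if it is complete, so it can help to validate it before it is stored. This is a minimal sketch: the set of required fields, the allowed stage names, and the assumption that every metric lies in the range 0 to 1 are our choices for this example, not rules of any particular registry.

```python
# Assumed required fields, taken from the example record above
REQUIRED_FIELDS = {
    "model_id", "training_data", "data_hash",
    "framework", "metrics", "artifact_uri",
}
# Assumed stage names
ALLOWED_STAGES = {"staging", "production", "archived"}


def validate_metadata(record: dict) -> list[str]:
    """Return a list of problems; an empty list means the record looks complete."""
    problems = [
        f"missing field: {name}"
        for name in sorted(REQUIRED_FIELDS - set(record))
    ]

    stage = record.get("stage", "staging")
    if stage not in ALLOWED_STAGES:
        problems.append(f"unknown stage: {stage!r}")

    # Assumption: all tracked metrics (precision, recall, f1, AUC) lie in [0, 1]
    for name, value in record.get("metrics", {}).items():
        if not isinstance(value, (int, float)) or not 0.0 <= value <= 1.0:
            problems.append(f"metric out of range [0, 1]: {name}={value!r}")

    # The hash should name its algorithm so it can be re-verified later
    data_hash = record.get("data_hash")
    if data_hash is not None and not str(data_hash).startswith("sha256:"):
        problems.append("data_hash should start with 'sha256:'")

    return problems
```

Rejecting incomplete records at registration time is cheaper than discovering during an audit that a production model has no recorded training data.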

3. Lineage Tracking

Connect models to their origins:

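# Using MLflow (a custom tracker would log the same fields).
# config, data, metrics, signature, and train() come from your own training code.
import mlflow

with mlflow.start_run():
    # Log dataset and code versions so the run can be traced back to its inputs
    mlflow.log_param("dataset_version", "2025-01-15")
    mlflow.log_param("feature_engineering_commit", "abc123")

    # Log hyperparameters (a dict of name -> value)
    mlflow.log_params(config)

    # Train model
    model = train(data, config)

    # Log metrics (a dict of name -> number)
    mlflow.log_metrics(metrics)

    # Log model with signature and register it as a new version.
    # mlflow.sklearn assumes a scikit-learn model; use the flavor that matches
    # your framework (for example mlflow.pytorch for PyTorch models).
    mlflow.sklearn.log_model(
        model,
        "model",
        signature=signature,
        registered_model_name="fraud-detector"
    )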
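
A dataset version label such as "2025-01-15" says which file was intended, not which bytes were actually used. One way to make lineage verifiable is to log a content hash next to the label. The sketch below uses only the standard library; `fingerprint_file` and `lineage_params` are hypothetical helper names, and the `sha256:` prefix follows the `data_hash` field shown earlier.

```python
import hashlib
from pathlib import Path


def fingerprint_file(path, chunk_size: int = 1024 * 1024) -> str:
    """Stream a file through SHA-256 so large datasets never sit fully in memory."""
    digest = hashlib.sha256()
    with Path(path).open("rb") as handle:
        # Read fixed-size chunks until read() returns an empty bytes object
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return f"sha256:{digest.hexdigest()}"


def lineage_params(dataset_path, dataset_version: str, code_commit: str) -> dict:
    """Collect the lineage fields for one training run as a flat dict."""
    return {
        "dataset_version": dataset_version,
        "data_hash": fingerprint_file(dataset_path),
        "feature_engineering_commit": code_commit,
    }
```

The returned dict is flat, so it can be passed to `mlflow.log_params(...)` in one call. Recomputing the hash later and comparing it with the logged value shows whether the training file has changed since the run.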

4. Staged Rollouts & Rollbacks

Never deploy directly to 100% traffic:

# Deployment strategy
stages:
  - name: staging
    traffic: 0%
    purpose: Integration testing
    
  - name: canary
    traffic: 5%
    duration: 2h
    abort_on:
      - latency_p95 > 200ms
      - error_rate > 0.5%
      
  - name: production
    traffic: 100%
    rollback_to: v2.2.4  # Previous stable version
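
The abort conditions in the canary stage can be checked by a small piece of code that runs against each monitoring window. This is a sketch under assumptions: the thresholds mirror the config above, while the shape of a monitoring window (a dict with `latency_p95_ms` and `error_rate` keys) and the function names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CanaryLimits:
    latency_p95_ms: float = 200.0  # latency_p95 > 200ms aborts
    error_rate: float = 0.005      # error_rate > 0.5% aborts


def canary_violations(window: dict, limits: CanaryLimits = CanaryLimits()) -> list[str]:
    """Return the abort conditions that one monitoring window violates."""
    violations = []
    if window["latency_p95_ms"] > limits.latency_p95_ms:
        violations.append(
            f"latency_p95 {window['latency_p95_ms']}ms > {limits.latency_p95_ms}ms"
        )
    if window["error_rate"] > limits.error_rate:
        violations.append(
            f"error_rate {window['error_rate']:.2%} > {limits.error_rate:.2%}"
        )
    return violations


def next_action(windows: list[dict], limits: CanaryLimits = CanaryLimits()) -> str:
    """Return 'abort' as soon as any window violates a limit, else 'promote'."""
    for window in windows:
        if canary_violations(window, limits):
            return "abort"
    return "promote"
```

In practice the windows would come from your metrics system over the 2h canary period, and an "abort" result would trigger the rollback path rather than just returning a string.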

Implementation Example

Here's a minimal model registry using DynamoDB:

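import json
from datetime import datetime, timezone
from decimal import Decimal

import boto3
from boto3.dynamodb.conditions import Attr


def _utc_now():
    """Current UTC time as an ISO 8601 string"""
    return datetime.now(timezone.utc).isoformat()


class ModelRegistry:
    """Minimal registry; assumes a table whose partition key is 'model_id'"""

    def __init__(self, table_name='model-registry'):
        self.table = boto3.resource('dynamodb').Table(table_name)

    def _scan_stage(self, stage):
        """Return every item in a stage, following scan pagination"""
        kwargs = {'FilterExpression': Attr('stage').eq(stage)}
        items = []
        while True:
            response = self.table.scan(**kwargs)
            items.extend(response['Items'])
            if 'LastEvaluatedKey' not in response:
                return items
            kwargs['ExclusiveStartKey'] = response['LastEvaluatedKey']

    def register_model(self, model_id, metadata):
        """Register a new model version in the staging stage"""
        # DynamoDB rejects Python floats, so convert metric values to Decimal
        metadata = json.loads(json.dumps(metadata), parse_float=Decimal)
        item = {
            **metadata,
            # Registry-owned fields come last so metadata cannot override them
            'model_id': model_id,
            'registered_at': _utc_now(),
            'stage': 'staging',
        }
        # Fail rather than silently overwrite an already registered version
        self.table.put_item(
            Item=item,
            ConditionExpression='attribute_not_exists(model_id)'
        )
        return model_id

    def get_production_model(self):
        """Get currently deployed production model"""
        items = self._scan_stage('production')
        return items[0] if items else None

    def promote_to_production(self, model_id):
        """Promote model to production stage"""
        timestamp = _utc_now()

        # Get current production model
        current = self.get_production_model()
        if current and current['model_id'] == model_id:
            return  # Already in production

        # Promote first, so an unknown model_id fails before anything is demoted.
        # The condition stops update_item from creating an unregistered item.
        self.table.update_item(
            Key={'model_id': model_id},
            UpdateExpression='SET #s = :prod, promoted_at = :time',
            ConditionExpression='attribute_exists(model_id)',
            ExpressionAttributeNames={'#s': 'stage'},
            ExpressionAttributeValues={
                ':prod': 'production',
                ':time': timestamp
            }
        )

        # Demote previous model to archived (preserve promoted_at for rollback).
        # The two updates are not atomic; a DynamoDB transaction would close the gap.
        if current:
            self.table.update_item(
                Key={'model_id': current['model_id']},
                UpdateExpression='SET #s = :archived, archived_at = :time',
                ExpressionAttributeNames={'#s': 'stage'},
                ExpressionAttributeValues={
                    ':archived': 'archived',
                    ':time': timestamp
                }
            )

    def rollback(self, to_version=None):
        """Rollback to previous or specified version; returns the restored model_id"""
        if to_version is None:
            archived = self._scan_stage('archived')
            if not archived:
                return None
            # The archived model with the latest promoted_at was last in production
            latest = max(archived, key=lambda x: x.get('promoted_at', ''))
            to_version = latest['model_id']
        self.promote_to_production(to_version)
        return to_version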
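
To exercise promotion and rollback logic without an AWS account, an in-memory stand-in with the same method names can be handy in unit tests. This is a hypothetical test double, not part of the registry above; it assumes duplicate registrations should be rejected and that the archived model with the latest `promoted_at` is the rollback target.

```python
from datetime import datetime, timezone


def _now() -> str:
    return datetime.now(timezone.utc).isoformat()


class InMemoryModelRegistry:
    """Dict-backed test double exposing the same methods as ModelRegistry."""

    def __init__(self):
        self._items = {}

    def register_model(self, model_id, metadata):
        if model_id in self._items:
            raise ValueError(f"{model_id} is already registered")
        self._items[model_id] = {
            **metadata,
            "model_id": model_id,
            "registered_at": _now(),
            "stage": "staging",
        }
        return model_id

    def get_production_model(self):
        return next(
            (item for item in self._items.values() if item["stage"] == "production"),
            None,
        )

    def promote_to_production(self, model_id):
        if model_id not in self._items:
            raise KeyError(model_id)
        timestamp = _now()
        current = self.get_production_model()
        # Archive the previous production model, keeping its promoted_at
        if current is not None and current["model_id"] != model_id:
            current.update(stage="archived", archived_at=timestamp)
        self._items[model_id].update(stage="production", promoted_at=timestamp)

    def rollback(self, to_version=None):
        if to_version is None:
            archived = [i for i in self._items.values() if i["stage"] == "archived"]
            if not archived:
                return None
            to_version = max(archived, key=lambda i: i["promoted_at"])["model_id"]
        self.promote_to_production(to_version)
        return to_version


# Usage: promote two versions, then revert to the previous one
registry = InMemoryModelRegistry()
registry.register_model("fraud-detector-v2.2.4", {})
registry.register_model("fraud-detector-v2.3.1", {})
registry.promote_to_production("fraud-detector-v2.2.4")
registry.promote_to_production("fraud-detector-v2.3.1")
registry.rollback()
print(registry.get_production_model()["model_id"])  # fraud-detector-v2.2.4
```

Because the stand-in keeps everything in a dict, it says nothing about DynamoDB behavior such as pagination or conditional writes; those still need an integration test against a real or local table.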

Key Takeaways

  1. Version everything: Code, data, configs, dependencies
  2. Store metadata: Metrics, lineage, timestamps, authors
  3. Automate promotion: Manual approval + automated checks
  4. Plan for rollback: One-command revert to last known good
  5. Audit trail: Who promoted what, when, and why

For regulated industries (banking, financial services, insurance, healthcare), this isn't optional: it's a compliance requirement.
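
The third takeaway, manual approval plus automated checks, can be sketched as a single gate function. Everything here is an assumption made for illustration: the function name, the rule that higher is better for every metric, and the 0.01 tolerance for regression against the current production model.

```python
from typing import Optional


def promotion_decision(
    candidate_metrics: dict,
    production_metrics: dict,
    approved_by: Optional[str] = None,
    max_regression: float = 0.01,  # assumed tolerance, tune per metric in practice
) -> tuple[bool, list[str]]:
    """Return (allowed, reasons). Promotion is allowed only when no metric
    regresses beyond the tolerance and a human approver is recorded."""
    reasons = []

    # Automated check: compare each production metric with the candidate's value
    for name, baseline in production_metrics.items():
        value = candidate_metrics.get(name)
        if value is None:
            reasons.append(f"missing metric: {name}")
        elif value < baseline - max_regression:
            reasons.append(f"{name} regressed: {value:.3f} < {baseline:.3f}")

    # Manual check: someone must have signed off, which also feeds the audit trail
    if not approved_by:
        reasons.append("no human approval recorded")

    return (not reasons, reasons)
```

Storing the returned reasons alongside the approver and timestamp gives the "who promoted what, when, and why" record that an audit asks for.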

Want help setting up model versioning for your team? Get in touch.