Productify Autoscaler

The autoscaler component for the Productify Framework, built primarily for use with Nomad. This repository holds two components; each has its own README with detailed usage and configuration.

Overview

The autoscaler uses a time-based cache with one-second granularity to improve scaling accuracy:

  1. The optimizer service forecasts resource requirements at second-level granularity
  2. It returns a list of desired replica counts for the next N seconds (configurable via cache_size, default: 10)
  3. The nomadscaler plugin caches these values with a timestamp
  4. Each cached value corresponds to one second of elapsed time
  5. The plugin selects the appropriate cached value based on seconds elapsed since cache update
  6. After the cache has been used twice, the plugin requests fresh predictions (falling back to the cached values on failure)

This approach allows for granular, second-by-second scaling decisions based on the optimizer's MILP-calculated resource requirements over time.
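The caching cycle above can be sketched in Python. This is an illustrative model only: the real plugin is written in Go, and the class, method, and parameter names here are mine, not the plugin's.

```python
import time


class PredictionCache:
    """Illustrative second-level prediction cache (the real plugin is Go)."""

    def __init__(self, fetch, max_uses=2):
        self.fetch = fetch        # callable returning a list of desired replica counts
        self.max_uses = max_uses  # request fresh predictions after this many cache reads
        self.values = []
        self.fetched_at = 0.0
        self.uses = 0

    def desired_replicas(self, now=None):
        now = time.time() if now is None else now
        if self.uses >= self.max_uses or not self.values:
            try:
                self.values = self.fetch()
                self.fetched_at = now
                self.uses = 0
            except Exception:
                pass  # optimizer unreachable: fall back to the stale cache
        self.uses += 1
        # Index by whole seconds elapsed since the cache was filled,
        # clamping to the last prediction if the horizon is exceeded.
        idx = min(int(now - self.fetched_at), len(self.values) - 1)
        return self.values[idx]
```

For example, with cached predictions `[3, 3, 4, ...]`, a read at the moment the cache is filled yields 3 replicas, and a read two seconds later yields 4.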

Components

  • nomadscaler — the Nomad Autoscaler target plugin (Go); reads scaling policies, manages the second-level cache, and updates Nomad job counts
  • optimizer — the prediction service (Python); runs SARIMAX forecasting and MILP optimization

Prediction Horizon

The prediction horizon is configurable:

  • Default: 10 seconds (returns 10 values)
  • Configure via cache_size parameter in the scaling policy
  • Optimizer uses SARIMAX time-series forecasting for predictions
  • Historical metrics, sampled over the past minutes, are used to forecast demand for the next several seconds
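A hypothetical scaling-policy fragment showing where cache_size might be set. The surrounding blocks follow standard Nomad scaling-policy syntax, but the exact key placement this plugin accepts is an assumption here; consult the component READMEs for the real schema.

```hcl
scaling {
  min = 1
  max = 10

  policy {
    # Placement of cache_size is illustrative; see the nomadscaler README
    # for the exact key the target plugin accepts.
    target {
      cache_size = 10   # prediction horizon: 10 one-second values
    }
  }
}
```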

Architecture

┌───────────────────────────────────────────────────┐
│         Nomad Autoscaler Framework                │
│  ┌────────────────────────────────────────────┐   │
│  │   Nomadscaler Plugin (Target Plugin)       │   │
│  │   - Reads scaling policies                 │   │
│  │   - Manages cache (second-level)           │   │
│  │   - Updates Nomad job counts               │   │
│  └────────────────────────────────────────────┘   │
└───────────────────────────────────────────────────┘
                    │ HTTP API

┌───────────────────────────────────────────────────┐
│         Optimizer Service (Python)                │
│  - SARIMAX time-series forecasting                │
│  - MILP optimization                              │
│  - Returns N second predictions                   │
│  - Historical metrics analysis                    │
└───────────────────────────────────────────────────┘


          ┌───────────────────┐
          │   Nomad Metrics   │
          │  (APM, Telemetry) │
          └───────────────────┘

How It Works

1. Prediction Request

Nomadscaler plugin requests predictions from optimizer service:

```json
{
  "check": {
    "metric_app_name": "my-app"
  },
  "current_replicas": 3,
  "min_replicas": 1,
  "max_replicas": 10,
  "cache_size": 10
}
```

2. Time-Series Forecasting

Optimizer analyzes historical metrics and forecasts demand for next N seconds.
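The optimizer's actual model is SARIMAX; as a dependency-free illustration of the forecasting step, here is a seasonal-naive stand-in (my simplification, not the service's model) that projects the last observed cycle forward:

```python
def seasonal_naive_forecast(history, horizon, season=60):
    """Forecast `horizon` future points by repeating the last full season.

    A deliberately simple stand-in for SARIMAX: it assumes demand
    roughly repeats with period `season` samples.
    """
    if len(history) < season:
        season = len(history)  # fall back to repeating the whole history
    last_cycle = history[-season:]
    return [last_cycle[i % season] for i in range(horizon)]
```

Unlike this stand-in, SARIMAX also captures trend and autocorrelation, which is what makes proactive scaling on gradual ramps possible.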

3. MILP Optimization

For each second in the prediction horizon, optimizer calculates optimal replica count considering:

  • Forecasted resource demand
  • Min/max constraints
  • Cost optimization
  • Performance requirements
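The per-second calculation can be illustrated with a greatly simplified stand-in for the MILP: choose the fewest replicas that cover the forecasted demand, clamped to the min/max bounds. All names here (including capacity_per_replica) are illustrative; the real MILP also weighs cost and performance terms jointly.

```python
import math


def replica_plan(forecast, capacity_per_replica, min_replicas, max_replicas):
    """One replica count per forecasted second: the cheapest count that
    covers demand, within [min_replicas, max_replicas]."""
    plan = []
    for demand in forecast:
        needed = math.ceil(demand / capacity_per_replica)
        plan.append(max(min_replicas, min(needed, max_replicas)))
    return plan
```

For a forecast of `[90, 250, 500]` requests/s at 100 requests/s per replica with bounds 1..4, this yields `[1, 3, 4]` — the last value clamped by max_replicas.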

4. Second-Level Caching

Optimizer returns array of desired replica counts:

```json
{
  "desired": [3, 3, 4, 4, 5, 5, 6, 6, 7, 7]
}
```

Each value corresponds to one second in the future.

5. Cache-Based Scaling

Plugin caches predictions and selects value based on elapsed time:

  • Second 0: Use desired[0] = 3 replicas
  • Second 1: Use desired[1] = 3 replicas
  • Second 2: Use desired[2] = 4 replicas
  • ...and so on

6. Automatic Refresh

After the cache has been used twice, the plugin requests fresh predictions, falling back to the cached values if the optimizer is unreachable.

Use Cases

Traffic Spike Prevention

Predict and scale before traffic increases:

Historical Pattern:
  09:00 - Low traffic
  09:30 - Gradual increase
  10:00 - Peak traffic

Autoscaler Action:
  09:25 - Start scaling up (proactive)
  09:30 - Continue scaling
  10:00 - Ready for peak (no lag)

Cost Optimization

Scale down during predictable low-traffic periods:

Forecasted Demand:
  23:00 - High traffic ends
  00:00 - Low traffic predicted

Autoscaler Action:
  23:00 - Begin gradual scale down
  00:00 - Minimal replicas (cost savings)
  07:00 - Proactive scale up for morning traffic

Event-Driven Scaling

Handle scheduled events:

Event: Product launch at 12:00
Historical: Similar launches caused 10x traffic

Autoscaler Action:
  11:50 - Start scaling to predicted capacity
  12:00 - Fully scaled for event
  12:30 - Gradual scale down as traffic normalizes

Benefits

Accuracy

  • Second-level predictions vs. minute-level in traditional autoscalers
  • MILP optimization for mathematically optimal scaling decisions
  • Historical pattern recognition for accurate forecasting

Performance

  • Proactive scaling reduces reaction lag
  • Smooth transitions with gradual scaling
  • Caching dampens scaling oscillations

Cost Efficiency

  • Minimize over-provisioning with accurate predictions
  • Optimize resource usage with MILP
  • Scheduled scale-down during predictable low-traffic periods

Reliability

  • Fallback mechanisms maintain service during optimizer outages
  • Cache redundancy ensures continuous operation
  • Constraint enforcement prevents under/over-scaling

Requirements

  • Nomad 1.4+ with the Nomad Autoscaler
  • Go 1.25+ (for building nomadscaler)
  • Python 3.12+ (for optimizer service)
  • Metrics source (Nomad APM, Prometheus, etc.)

Next Steps

  1. Read the Architecture overview
  2. Follow the Quick Start guide
  3. Configure Scaling Policies
  4. Deploy to Nomad