Autoscaler Architecture
The Productify Autoscaler uses a two-tier architecture that combines time-series forecasting with mathematical optimization to make intelligent scaling decisions.
System Overview
┌────────────────────────────────────────────────────────┐
│ Nomad Autoscaler Framework │
│ │
│ ┌──────────────────────────────────────────────────┐ │
│ │ APM Plugin (Metrics Source) │ │
│ │ - Collects application metrics │ │
│ │ - CPU, Memory, Request count │ │
│ └──────────────────────────────────────────────────┘ │
│ │ │
│ ▼ │
│ ┌──────────────────────────────────────────────────┐ │
│ │ Strategy Plugin (Scaling Logic) │ │
│ │ - Evaluates scaling policies │ │
│ │ - Determines when to scale │ │
│ └──────────────────────────────────────────────────┘ │
│ │ │
│ ▼ │
│ ┌──────────────────────────────────────────────────┐ │
│ │ Target Plugin (Productify Nomadscaler) │ │
│ │ - Manages second-level cache │ │
│ │ - Calls optimizer for predictions │ │
│ │ - Updates Nomad job counts │ │
│ └──────────────────────────────────────────────────┘ │
└────────────────────────────────────────────────────────┘
│
│ HTTP API
▼
┌────────────────────────────────────────────────────────┐
│ Optimizer Service (Python) │
│ │
│ ┌──────────────────────────────────────────────────┐ │
│ │ Time-Series Forecasting │ │
│ │ - SARIMAX model │ │
│ │ - Historical data analysis │ │
│ │ - Future demand prediction │ │
│ └──────────────────────────────────────────────────┘ │
│ │ │
│ ▼ │
│ ┌──────────────────────────────────────────────────┐ │
│ │ MILP Optimization │ │
│ │ - Constraint-based optimization │ │
│ │ - Cost vs. performance balance │ │
│ └──────────────────────────────────────────────────┘ │
│ │ │
│ ▼ │
│ ┌──────────────────────────────────────────────────┐ │
│ │ Prediction Cache Generation │ │
│ │ - N seconds of predictions │ │
│ │ - One value per second │ │
│ │ - Includes metadata │ │
│ └──────────────────────────────────────────────────┘ │
└────────────────────────────────────────────────────────┘
Components
Nomadscaler Plugin (Go)
Role: Nomad autoscaler target plugin
Responsibilities:
- Receive scaling policy evaluations from Nomad autoscaler
- Request predictions from optimizer service
- Manage second-level prediction cache
- Select appropriate cached value based on elapsed time
- Update Nomad job replica counts
- Handle optimizer failures gracefully
Key Features:
- Second-level time-based cache
- Automatic cache refresh strategy
- Fallback to cached values on optimizer failure
- Constraint enforcement (min/max replicas)
Optimizer Service (Python)
Role: Time-series forecasting and optimization engine
Responsibilities:
- Fetch historical metrics from Nomad
- Train SARIMAX time-series model
- Forecast demand for next N seconds
- Calculate optimal replica counts using MILP
- Provide HTTP API for predictions
- Manage model persistence
Key Features:
- SARIMAX time-series forecasting
- Mixed-Integer Linear Programming
- Configurable prediction horizon
- Multi-objective optimization
- Token-based authentication
Data Flow
1. Metrics Collection
Application → Nomad Metrics → Historical Database
Metrics collected:
- CPU usage (%)
- Memory usage (MB)
- Request rate (req/s)
- Active connections
- Custom application metrics
2. Policy Evaluation
Nomad Autoscaler → Check Policy → Evaluate Strategy
Policy defines:
- Target metric
- Evaluation interval
- Min/max replicas
- Strategy plugin to use
3. Prediction Request
Nomadscaler Plugin → HTTP POST /optimize → Optimizer Service
Request payload:
{
  "token": "SECRET",
  "check": {
    "metric_app_name": "my-app"
  },
  "current_replicas": 3,
  "min_replicas": 1,
  "max_replicas": 10,
  "cache_size": 10
}
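For illustration, the equivalent call in Python (the plugin itself is written in Go, and the optimizer URL below is a placeholder, not a real endpoint):

import requests

payload = {
    "token": "SECRET",
    "check": {"metric_app_name": "my-app"},
    "current_replicas": 3,
    "min_replicas": 1,
    "max_replicas": 10,
    "cache_size": 10,
}

# Placeholder address for wherever the optimizer service is reachable
OPTIMIZER_URL = "http://optimizer.example.internal:8000"

resp = requests.post(f"{OPTIMIZER_URL}/optimize", json=payload, timeout=5)
resp.raise_for_status()
desired = resp.json()["desired"]   # e.g. [3, 3, 4, 4, 5, 5, 6, 6, 7, 7]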
4. Time-Series Forecasting
Optimizer → Fetch Metrics → Train SARIMAX → Forecast Demand
SARIMAX model:
- Seasonal
- AutoRegressive
- Integrated
- Moving Average with eXogenous variables
The optimizer automatically selects the best SARIMAX order from multiple candidates (sketched in code after this list):
- Candidate orders: (1,1,1), (1,0,1), (2,1,1), (1,1,0)
- Selection criterion: minimum AIC (Akaike Information Criterion)
- Exogenous variables: avg_response_time, authentication_awaiting_users, queue_waiting, avg_processing_time
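A minimal sketch of that selection step, assuming statsmodels with a pandas demand series and an exogenous DataFrame (function names are illustrative, not the optimizer's actual code):

import warnings

import pandas as pd
from statsmodels.tsa.statespace.sarimax import SARIMAX

CANDIDATE_ORDERS = [(1, 1, 1), (1, 0, 1), (2, 1, 1), (1, 1, 0)]

def fit_best_sarimax(demand: pd.Series, exog: pd.DataFrame):
    """Fit every candidate order and keep the model with the lowest AIC."""
    best_fit, best_aic = None, float("inf")
    for order in CANDIDATE_ORDERS:
        try:
            with warnings.catch_warnings():
                warnings.simplefilter("ignore")
                fit = SARIMAX(demand, exog=exog, order=order).fit(disp=False)
        except Exception:
            continue  # skip orders that fail to converge
        if fit.aic < best_aic:
            best_fit, best_aic = fit, fit.aic
    return best_fit

def forecast_demand(fit, future_exog: pd.DataFrame, horizon: int):
    """Forecast demand for the next `horizon` steps (one per second)."""
    return fit.forecast(steps=horizon, exog=future_exog)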
5. MILP Optimization
Forecasted Demand → MILP Solver → Optimal Replica Counts
Optimization objective (a solver sketch follows the constraints):
Minimize: Σ (replica_cost × replicas[t] +
penalty × sla_shortfall[t] +
startup_cost × scale_up[t] +
shutdown_cost × scale_down[t])
Subject to:
min_replicas ≤ replicas[t] ≤ max_replicas ∀t
capacity × replicas[t] + sla_shortfall[t] ≥ demand[t] ∀t
scale_down[t] ≤ replicas[t-1] ∀t
scale_up[t] ≤ max_scale_up ∀t
scale_down[t] ≤ max_scale_down ∀t
replicas[t] ∈ Integer ∀t
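A minimal PuLP-style sketch of this model follows. The default costs, the replica-balance link between consecutive steps, and the CBC solver are assumptions for illustration, not the optimizer's exact formulation:

from pulp import LpMinimize, LpProblem, LpVariable, lpSum, PULP_CBC_CMD

def optimize_replicas(demand, current, min_r, max_r, capacity,
                      replica_cost=1.0, penalty=100.0,
                      startup_cost=0.5, shutdown_cost=0.5,
                      max_scale_up=2, max_scale_down=2):
    T = range(len(demand))
    prob = LpProblem("autoscaler", LpMinimize)

    replicas = LpVariable.dicts("replicas", T, min_r, max_r, cat="Integer")
    shortfall = LpVariable.dicts("sla_shortfall", T, 0)
    up = LpVariable.dicts("scale_up", T, 0, max_scale_up, cat="Integer")
    down = LpVariable.dicts("scale_down", T, 0, max_scale_down, cat="Integer")

    # Objective: balance replica cost, SLA shortfall, and scaling churn
    prob += lpSum(replica_cost * replicas[t] + penalty * shortfall[t]
                  + startup_cost * up[t] + shutdown_cost * down[t] for t in T)

    for t in T:
        prev = current if t == 0 else replicas[t - 1]
        prob += capacity * replicas[t] + shortfall[t] >= demand[t]  # serve demand
        prob += replicas[t] == prev + up[t] - down[t]               # replica balance (assumed link)
        prob += down[t] <= prev                                     # cannot stop more than exist

    prob.solve(PULP_CBC_CMD(msg=False, timeLimit=5))                # bounded solve time
    return [int(replicas[t].value()) for t in T]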
6. Response Generation
Optimizer → Generate N values → Return to Plugin
Response:
{
  "desired": [3, 3, 4, 4, 5, 5, 6, 6, 7, 7]
}
The array contains one replica count per second for the next 10 seconds.
7. Cache Management
Plugin → Cache Response → Serve Values → Refresh
Cache behavior (sketched in code after this list):
- Store all N values with timestamp
- Select value based on seconds_elapsed % cache_size
- After 2 cache uses, request fresh predictions
- On optimizer failure, continue using cache
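A simplified Python sketch of this cache behavior (the production plugin is written in Go; class, method, and argument names are illustrative):

import time

class PredictionCache:
    """Time-based cache over the optimizer's per-second replica predictions."""

    def __init__(self, fetch_predictions, cache_size=10, max_uses=2):
        self.fetch = fetch_predictions   # callable returning the "desired" list
        self.cache_size = cache_size
        self.max_uses = max_uses
        self.values, self.fetched_at, self.uses = [], 0.0, 0

    def desired_replicas(self, fallback):
        # Refresh after max_uses lookups (or when the cache is empty)
        if not self.values or self.uses >= self.max_uses:
            try:
                self.values = self.fetch()
                self.fetched_at = time.time()
                self.uses = 0
            except Exception:
                pass                     # optimizer unavailable: keep the stale cache
        if not self.values:
            return fallback              # never had a cache: conservative default (assumed)
        self.uses += 1
        elapsed = int(time.time() - self.fetched_at)
        return self.values[elapsed % self.cache_size]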
8. Scaling Execution
Plugin → Update Nomad Job → Nomad Schedules Tasks
Nomad actions:
- Calculate delta (desired - current), see the sketch after this list
- Schedule new allocations (scale up)
- Stop allocations (scale down)
- Wait for health checks
- Update service catalog
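The delta calculation and min/max clamping reduce to a few lines (illustrative helper, not the plugin's actual code):

def scaling_delta(desired, current, min_replicas, max_replicas):
    """Clamp the desired count to policy bounds and return (target, delta)."""
    target = max(min_replicas, min(max_replicas, desired))
    return target, target - current   # positive delta: scale up, negative: scale down

# e.g. scaling_delta(desired=7, current=3, min_replicas=1, max_replicas=10) -> (7, 4)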
Failure Modes & Handling
Optimizer Unavailable
Behavior: Plugin uses cached predictions
Impact: Continues scaling with last known good predictions
Recovery: Automatic reconnection on next refresh attempt
Stale Metrics
Detection: Check metric timestamp
Behavior: Use default conservative scaling
Mitigation: Alert on metric staleness
MILP Solver Timeout
Behavior: Fall back to simple linear interpolation (see the sketch below)
Impact: Less optimal but still functional scaling
Configuration: Set solver timeout in config
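One plausible form of that fallback, interpolating linearly from the current replica count toward the count implied by forecast demand at the end of the horizon (the details are assumptions):

import math

def fallback_replicas(current, forecast, capacity, min_r, max_r):
    """Linear interpolation from the current count toward the demand-implied count."""
    horizon = len(forecast)
    end_target = max(min_r, min(max_r, math.ceil(forecast[-1] / capacity)))
    step = (end_target - current) / max(horizon - 1, 1)
    return [max(min_r, min(max_r, round(current + step * t))) for t in range(horizon)]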
Network Partition
Behavior: Plugin caches last 10 seconds of predictions
Impact: Up to 10 seconds of scaling with stale data
Recovery: Resume normal operation when network restored
Scaling Patterns
Proactive Scale-Up
09:25 - Forecast predicts 09:30 traffic increase
09:25 - Start adding replicas
09:30 - Fully scaled when traffic arrives
Gradual Scale-Down
23:00 - Forecast predicts decreasing demand
23:00 - Begin removing replicas one by one
00:00 - Minimal replicas for overnight
Event-Driven
11:50 - Scheduled event at 12:00
11:50 - Rapidly scale to predicted capacity
12:00 - Ready for event traffic
12:30 - Gradual scale down
See Also
- Quick Start - Get started quickly
- Nomadscaler Plugin - Plugin configuration
- Optimizer Service - Optimizer setup
- Scaling Policies - Policy examples