Productify Autoscaler
The autoscaler component for the Productify FrameWork, primarily for use with Nomad. This repository contains two components, each with its own README covering detailed usage and configuration.
Overview
The autoscaler uses a second-level, time-based caching mechanism for improved accuracy:

- The optimizer service forecasts resource requirements at second-level granularity
- It returns a list of desired replica counts for the next N seconds (configurable via `cache_size`, default: 10)
- The nomadscaler plugin caches these values together with a timestamp
- Each cached value corresponds to one second of elapsed time
- The plugin selects the appropriate cached value based on the seconds elapsed since the cache was last updated
- After the cache has been used twice, the plugin requests fresh predictions (falling back to the cache on failure)

This approach enables granular, second-by-second scaling decisions based on the optimizer's MILP-calculated resource requirements over time.
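The selection step can be sketched in a few lines of Python. This is an illustrative model of the mechanism described above, not the plugin's actual code; the `PredictionCache` name and its methods are hypothetical.

```python
import time

class PredictionCache:
    """Holds one optimizer response: a per-second list of desired replica
    counts plus the time it was stored. Hypothetical sketch of the cache
    described above, not the plugin's real type."""

    def __init__(self, desired, stored_at=None):
        self.desired = desired                     # e.g. [3, 3, 4, 4, 5, ...]
        self.stored_at = stored_at or time.time()  # timestamp of the cache update

    def current_target(self, now=None):
        """Pick the cached value for the seconds elapsed since the update."""
        now = now if now is not None else time.time()
        elapsed = int(now - self.stored_at)
        # Clamp to the last prediction if we have outlived the horizon.
        index = min(max(elapsed, 0), len(self.desired) - 1)
        return self.desired[index]

cache = PredictionCache([3, 3, 4, 4, 5, 5, 6, 6, 7, 7], stored_at=1000.0)
print(cache.current_target(now=1000.5))  # second 0 -> 3
print(cache.current_target(now=1002.2))  # second 2 -> 4
print(cache.current_target(now=1099.0))  # past the horizon -> last value, 7
```

Clamping to the last value means a stale cache degrades gracefully rather than erroring, which matters for the fallback behaviour described above.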
Components
- nomadscaler/ — Nomad autoscaler target plugin (Go). See Nomadscaler Plugin for build and usage instructions.
- optimizer/ — MILP optimizer service (Python). See Optimizer Service for installation, configuration and API details.
Prediction Horizon
The prediction horizon is configurable:
- Default: 10 seconds (returns 10 values)
- Configure via the `cache_size` parameter in the scaling policy
- The optimizer uses SARIMAX time-series forecasting for predictions
- Historical metrics (minutes) are used to forecast future demand (seconds)
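To illustrate the idea of projecting historical samples forward over the horizon, here is a deliberately naive stand-in for the forecasting step. The real optimizer fits a SARIMAX model; this sketch only extrapolates the most recent linear trend, and `naive_forecast` is a hypothetical name.

```python
def naive_forecast(history, horizon):
    """Stand-in for the SARIMAX forecast: extrapolate the last observed trend.

    `history` is a list of past demand samples; returns `horizon` future
    values. Illustrates turning recorded history into a per-second forecast,
    nothing more -- the actual service uses a fitted SARIMAX model.
    """
    if not history:
        return [0.0] * horizon
    if len(history) < 2:
        return [history[-1]] * horizon
    slope = history[-1] - history[-2]  # last observed change per step
    last = history[-1]
    return [last + slope * (i + 1) for i in range(horizon)]

print(naive_forecast([10, 12, 14, 16], 5))  # [18, 20, 22, 24, 26]
```

A proper SARIMAX model additionally captures seasonality (e.g. daily traffic cycles), which a linear trend cannot.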
Architecture
```
┌───────────────────────────────────────────────────┐
│           Nomad Autoscaler Framework              │
│  ┌─────────────────────────────────────────────┐  │
│  │  Nomadscaler Plugin (Target Plugin)         │  │
│  │  - Reads scaling policies                   │  │
│  │  - Manages cache (second-level)             │  │
│  │  - Updates Nomad job counts                 │  │
│  └─────────────────────────────────────────────┘  │
└───────────────────────────────────────────────────┘
                         │ HTTP API
                         ▼
┌───────────────────────────────────────────────────┐
│  Optimizer Service (Python)                       │
│  - SARIMAX time-series forecasting                │
│  - MILP optimization                              │
│  - Returns N second predictions                   │
│  - Historical metrics analysis                    │
└───────────────────────────────────────────────────┘
                         │
                         ▼
                ┌───────────────────┐
                │   Nomad Metrics   │
                │  (APM, Telemetry) │
                └───────────────────┘
```

How It Works
1. Prediction Request
Nomadscaler plugin requests predictions from optimizer service:
```json
{
  "check": {
    "metric_app_name": "my-app"
  },
  "current_replicas": 3,
  "min_replicas": 1,
  "max_replicas": 10,
  "cache_size": 10
}
```

2. Time-Series Forecasting
Optimizer analyzes historical metrics and forecasts demand for next N seconds.
3. MILP Optimization
For each second in the prediction horizon, optimizer calculates optimal replica count considering:
- Forecasted resource demand
- Min/max constraints
- Cost optimization
- Performance requirements
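With a single resource dimension and a linear cost, the per-second optimum collapses to a clamped ceiling division, which makes for a compact illustration. The sketch below is a brute-force stand-in, not the actual MILP formulation; `replicas_for_demand` and `capacity_per_replica` are hypothetical names, and the real optimizer solves a MILP over all seconds of the horizon jointly.

```python
import math

def replicas_for_demand(demand, capacity_per_replica, min_replicas, max_replicas):
    """Pick the cheapest replica count that covers the forecasted demand,
    subject to the policy's min/max bounds. Illustrative stand-in for one
    per-second decision inside the MILP."""
    needed = math.ceil(demand / capacity_per_replica)
    return max(min_replicas, min(needed, max_replicas))

forecast = [180, 210, 260, 320, 410]  # hypothetical demand per second
desired = [replicas_for_demand(d, capacity_per_replica=100,
                               min_replicas=1, max_replicas=10)
           for d in forecast]
print(desired)  # [2, 3, 3, 4, 5]
```

Solving the horizon jointly (as the MILP does) also lets the optimizer smooth transitions between seconds, which this per-second version cannot express.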
4. Second-Level Caching
Optimizer returns array of desired replica counts:
```json
{
  "desired": [3, 3, 4, 4, 5, 5, 6, 6, 7, 7]
}
```

Each value corresponds to one second in the future.
5. Cache-Based Scaling
Plugin caches predictions and selects value based on elapsed time:
- Second 0: Use `desired[0]` = 3 replicas
- Second 1: Use `desired[1]` = 3 replicas
- Second 2: Use `desired[2]` = 4 replicas
- ...and so on
6. Automatic Refresh
After the cache has been used twice, the plugin requests fresh predictions, falling back to the cached values if the optimizer is unreachable.
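The refresh-with-fallback policy can be sketched as follows. This is a simplified model of the behaviour described above; `get_target` and `fetch_predictions` are hypothetical names, with `fetch_predictions` standing in for the HTTP call to the optimizer service.

```python
def get_target(cache, use_count, fetch_predictions):
    """Refresh policy sketch: after the cache has been used twice, request
    fresh predictions; on failure, keep serving the stale cache."""
    if use_count >= 2:
        try:
            cache = fetch_predictions()  # fresh per-second predictions
            use_count = 0
        except Exception:
            pass                         # optimizer down: keep the stale cache
    return cache, use_count + 1

def failing_fetch():
    raise ConnectionError("optimizer unreachable")

def fresh_fetch():
    return [5, 5, 6]

cache, uses = get_target([3, 3, 4], 2, failing_fetch)
print(cache)  # [3, 3, 4] -- stale cache retained after a failed refresh
cache, uses = get_target(cache, 2, fresh_fetch)
print(cache)  # [5, 5, 6] -- successful refresh replaces the cache
```

Keeping the stale cache on failure trades prediction freshness for availability, which matches the reliability goals listed below under Benefits.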
Use Cases
Traffic Spike Prevention
Predict and scale before traffic increases:
```
Historical Pattern:
09:00 - Low traffic
09:30 - Gradual increase
10:00 - Peak traffic

Autoscaler Action:
09:25 - Start scaling up (proactive)
09:30 - Continue scaling
10:00 - Ready for peak (no lag)
```

Cost Optimization
Scale down during predictable low-traffic periods:
```
Forecasted Demand:
23:00 - High traffic ends
00:00 - Low traffic predicted

Autoscaler Action:
23:00 - Begin gradual scale down
00:00 - Minimal replicas (cost savings)
07:00 - Proactive scale up for morning traffic
```

Event-Driven Scaling
Handle scheduled events:
```
Event: Product launch at 12:00
Historical: Similar launches caused 10x traffic

Autoscaler Action:
11:50 - Start scaling to predicted capacity
12:00 - Fully scaled for event
12:30 - Gradual scale down as traffic normalizes
```

Benefits
Accuracy
- Second-level predictions vs. minute-level in traditional autoscalers
- MILP optimization for mathematically optimal scaling decisions
- Historical pattern recognition for accurate forecasting
Performance
- Proactive scaling reduces reaction lag
- Smooth transitions with gradual scaling
- Caching dampens scaling oscillations
Cost Efficiency
- Minimize over-provisioning with accurate predictions
- Optimize resource usage with MILP
- Scheduled scale-down during predictable low-traffic periods
Reliability
- Fallback mechanisms maintain service during optimizer outages
- Cache redundancy ensures continuous operation
- Constraint enforcement prevents under/over-scaling
Quick Links
- Architecture - Detailed architecture overview
- Quick Start - Get started in minutes
- Nomadscaler Plugin - Plugin configuration
- Optimizer Service - Optimizer setup
- Scaling Policies - Policy examples
Requirements
- Nomad 1.4+ with autoscaler
- Go 1.25+ (for building nomadscaler)
- Python 3.12+ (for optimizer service)
- Metrics source (Nomad APM, Prometheus, etc.)
Next Steps
- Read the Architecture overview
- Follow the Quick Start guide
- Configure Scaling Policies
- Deploy to Nomad