Productify Autoscaler — Optimizer

A small optimizer service used by Productify's autoscaler. It provides calculation utilities and an HTTP endpoint for running optimization calculations.

The optimizer uses second-level granularity for predictions, returning a list of desired replica counts for the next N seconds (configurable via cache_size, default: 10 seconds).

Minimum requirements: Python 3.12+ and Poetry.

Overview

The Optimizer:

  • Forecasts future resource demand using SARIMAX
  • Optimizes scaling decisions with MILP
  • Returns second-level predictions for caching
  • Exposes REST API for integration

Installation

Via Poetry

  1. Install Poetry (if you don't have it):

    Follow the instructions at https://python-poetry.org/docs/

  2. Install project dependencies:

bash
cd optimizer
poetry install

Usage

Run the example calculation (quick check):

bash
poetry run testcalc

Start the HTTP endpoint (starts a small web service exposing the optimizer):

bash
poetry run web

Via Docker

There is a Dockerfile in this folder. To build and run the image locally:

bash
cd optimizer
docker build -t ghcr.io/productifyfw/optimizer:latest .

# Run the optimizer
docker run --rm -p 8015:8015 ghcr.io/productifyfw/optimizer:latest

Adjust published ports to match the service configuration.

Configuration

config.ini

The optimizer reads configuration from a config.ini file in the project root. Below is an example config.ini and a short explanation of each value. Do not commit secrets — keep tokens and credentials out of version control.

Example config.ini:

ini
[main]
loglevel=debug
api_loglevel=warning
only_test_data=true
token=SUPER_SECRET_TOKEN

Keys:

  • loglevel — Controls logging verbosity for the optimizer. Typical values: debug, info, warning, error.
  • api_loglevel — Controls logging for API/request handling.
  • only_test_data — Set to true to force the optimizer to use bundled test/sample data instead of real inputs (useful for local testing).
  • token — API token used to authenticate requests. Treat this as a secret; prefer injecting it via a secrets manager or environment variable in production.

Usage notes:

  • Edit config.ini before starting the service.
  • If your deployment platform supports secrets or environment variables, use those instead of storing tokens in plaintext (see the sketch below).
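
One way to keep the token out of config.ini is to override it from the environment at startup. A minimal sketch, assuming the config is read with Python's configparser; the OPTIMIZER_TOKEN variable name is hypothetical:

python
# Minimal sketch: prefer an environment variable over the token stored in
# config.ini. OPTIMIZER_TOKEN is a hypothetical variable name.
import configparser
import os

config = configparser.ConfigParser()
config.read("config.ini")

token = os.environ.get("OPTIMIZER_TOKEN") or config["main"].get("token", "")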

API

POST /optimize

The main optimization endpoint that returns desired replica counts.

Request Body:

json
{
  "token": "SUPER_SECRET_TOKEN",
  "check": {
    "name": "scaling-check",
    "source": "nomad-apm",
    "group": "webapp",
    "metric_app_name": "my-app"
  },
  "current_replicas": 3,
  "min_replicas": 1,
  "max_replicas": 10,
  "cache_size": 10
}

Parameters:

  • cache_size (optional, default: 10): Number of seconds to predict ahead. The optimizer will return this many values in the desired array.

Response:

json
{
  "desired": [3, 3, 4, 4, 5, 5, 6, 6, 7, 7]
}

The response contains a list of desired replica counts, one for each second in the prediction horizon. The nomadscaler plugin caches these values and serves the appropriate value based on elapsed time.
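
For reference, a minimal client sketch for this endpoint. It assumes the service listens on localhost:8015 (the port published in the Docker example) and that the requests library is installed:

python
import requests

# Build the request body shown above.
payload = {
    "token": "SUPER_SECRET_TOKEN",
    "check": {
        "name": "scaling-check",
        "source": "nomad-apm",
        "group": "webapp",
        "metric_app_name": "my-app",
    },
    "current_replicas": 3,
    "min_replicas": 1,
    "max_replicas": 10,
    "cache_size": 10,
}

resp = requests.post("http://localhost:8015/optimize", json=payload, timeout=30)
resp.raise_for_status()
print(resp.json()["desired"])  # one replica count per second, e.g. [3, 3, 4, ...]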

GET /health

Health check endpoint.
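
If you want to script a readiness check, a one-line probe sketch (host and port assumed as in the Docker example):

python
import requests

# Liveness probe: succeeds if the service answers with a 2xx status.
assert requests.get("http://localhost:8015/health", timeout=5).ok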

GET /metrics

Prometheus-compatible metrics. Only available when the enable_test_metrics setting is enabled.

SARIMAX Forecasting

Automatic Order Selection

The optimizer automatically selects the best SARIMAX model from multiple candidate orders:

python
from statsmodels.tsa.statespace.sarimax import SARIMAX

candidate_orders = [(1, 1, 1), (1, 0, 1), (2, 1, 1), (1, 1, 0)]
best_result = None
best_aic = float("inf")

for order in candidate_orders:
    try:
        model = SARIMAX(
            y,  # historical request rate
            exog=X,  # exogenous variables
            order=order,
            enforce_stationarity=False,
            enforce_invertibility=False,
        )
        res = model.fit(disp=False)
        if res.aic < best_aic:
            best_aic = res.aic
            best_result = res
    except Exception:
        continue

# Use the best model for forecasting
if best_result is None:
    raise RuntimeError("no SARIMAX candidate order could be fitted")
forecast = best_result.get_forecast(
    steps=forecast_horizon_seconds,
    exog=exog_forecast
)

Exogenous Variables

The model uses the following exogenous variables:

  • avg_response_time: Average response time per request
  • authentication_awaiting_users: Number of users waiting for authentication
  • queue_waiting: Queue waiting time
  • avg_processing_time: Average processing time per request
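
A sketch of assembling these metrics into the exog matrix X used in the snippet above. The dummy values are purely illustrative; the real service feeds in observed metrics:

python
import numpy as np
import pandas as pd

n = 120  # seconds of metric history (illustrative)
rng = np.random.default_rng(0)
X = pd.DataFrame({
    "avg_response_time": rng.uniform(0.05, 0.3, n),
    "authentication_awaiting_users": rng.integers(0, 20, n),
    "queue_waiting": rng.uniform(0.0, 5.0, n),
    "avg_processing_time": rng.uniform(0.01, 0.2, n),
})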

Demand Calculation

Forecasted request rate is converted to demand:

python
demand[t] = (
    forecast_requests[t] * avg_response_time * 0.05
    + queue_waiting * 0.1
    + authentication_awaiting_users * 0.2
)

MILP Optimization

OR-Tools SCIP Solver

The optimizer uses OR-Tools with the SCIP solver for Mixed-Integer Linear Programming:

python
from ortools.linear_solver import pywraplp

solver = pywraplp.Solver.CreateSolver("SCIP")
if solver is None:
    raise RuntimeError("SCIP solver backend is not available")

Objective Function

Minimize total cost including replica costs, SLA penalties, and scaling costs:

python
total_cost = solver.Sum(
    replica_cost * x[t] +           # Cost of running replicas
    penalty * sla[t] +              # SLA violation penalty
    startup_cost * up[t] +          # Cost to start instances
    shutdown_cost * down[t] +       # Cost to stop instances
    extra_penalty * no_rep[t]       # Penalty for no replicas when needed
    for t in range(T)
)
solver.Minimize(total_cost)

Decision Variables

python
# Integer variables
x = [solver.IntVar(min_replicas, max_replicas, f"x_{t}")   # replicas at time t
     for t in range(T)]
up = [solver.IntVar(0, max_scale_up, f"up_{t}")            # instances started (default limit 2)
      for t in range(T)]
down = [solver.IntVar(0, max_scale_down, f"down_{t}")      # instances stopped (default limit 2)
        for t in range(T)]
no_rep = [solver.BoolVar(f"no_rep_{t}")                    # 1 if no replicas are running
          for t in range(T)]

# Continuous variables
sla = [solver.NumVar(0, solver.infinity(), f"sla_{t}")     # SLA shortfall / unmet demand
       for t in range(T)]

Constraints

python
# 1. Meet predicted demand (with SLA slack), for every t
solver.Add(x[t] * capacity_per_replica + sla[t] >= demand[t])

# 2. Replica count evolution, for t >= 1
solver.Add(x[t] == x[t-1] + up[t] - down[t])

# 3. Scale-down limit (can't stop more than you have), for t >= 1
solver.Add(down[t] <= x[t-1])

# 4. Initial state
solver.Add(x[0] == initial_replicas)
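
Putting the pieces together, a minimal sketch of solving the model and reading back the per-second replica plan (names follow the snippets above; the fallback branch is illustrative, not the service's actual behaviour):

python
# Solve the MILP and extract the desired replica counts.
status = solver.Solve()
if status == pywraplp.Solver.OPTIMAL:
    desired = [int(x[t].solution_value()) for t in range(T)]
else:
    desired = [initial_replicas] * T  # illustrative fallback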

Parameters

Default values:

  • capacity_per_replica: Based on allocated CPU/memory
  • replica_cost: Based on CPU/memory pricing (configurable weights)
  • penalty: 1.0 (SLA violation cost)
  • startup_cost: 0.5
  • shutdown_cost: 0.3
  • max_scale_up: 2 (instances per time step)
  • max_scale_down: 2 (instances per time step)
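
For illustration, these parameters could be gathered into a single structure. The dataclass itself is hypothetical; only the default values come from the list above:

python
from dataclasses import dataclass

@dataclass
class OptimizerParams:
    capacity_per_replica: float   # derived from allocated CPU/memory
    replica_cost: float           # derived from CPU/memory pricing weights
    penalty: float = 1.0          # SLA violation cost
    startup_cost: float = 0.5
    shutdown_cost: float = 0.3
    max_scale_up: int = 2         # instances per time step
    max_scale_down: int = 2       # instances per time step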

Troubleshooting

Slow Predictions

Optimize:

  • Reduce forecast horizon
  • Simplify SARIMAX parameters
  • Increase solver time limit (see the sketch below)
  • Add more CPU resources
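
A one-line sketch of adjusting the solver's time limit via the pywraplp API (the 10-second value is illustrative):

python
# Allow SCIP up to 10 seconds per solve (value is in milliseconds).
solver.SetTimeLimit(10_000)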

Inaccurate Forecasts

Improve:

  • Provide more historical metrics
  • Tune SARIMAX parameters
  • Adjust seasonal period
  • Filter metric outliers

Memory Issues

Solutions:

  • Limit metric history size
  • Reduce forecast horizon
  • Implement metric sampling
  • Increase memory limits

Monitoring

Logging

python
import logging

# Configure logging
logging.basicConfig(
    level=logging.INFO,
    format='%(asctime)s - %(name)s - %(levelname)s - %(message)s'
)

Development

If you plan on developing the optimizer library:

  • Use poetry install to install dev dependencies.
  • Run the web service locally with poetry run web.
  • Run unit tests frequently and add tests for new behaviour.

Testing

Run the project's test suite with:

bash
poetry run test

See Also