Productify Autoscaler — Optimizer

A small optimizer service used by Productify's autoscaler. It provides calculation utilities and an HTTP endpoint for running optimization calculations.

The optimizer uses second-level granularity for predictions, returning a list of desired replica counts for the next N seconds (configurable via cache_size, default: 10 seconds).

Minimum requirements: Python 3.12+ and Poetry.

Overview

The Optimizer:

  • Forecasts future resource demand using SARIMAX
  • Optimizes scaling decisions with MILP
  • Returns second-level predictions for caching
  • Exposes REST API for integration

Installation

Via Poetry

  1. Install Poetry (if you don't have it):

    Follow the instructions at https://python-poetry.org/docs/

  2. Install project dependencies:

bash
cd optimizer
poetry install

Usage

Run the example calculation (quick check):

bash
poetry run testcalc

Start the HTTP endpoint (starts a small web service exposing the optimizer):

bash
poetry run web

Via Docker

There is a Dockerfile in this folder. To build and run the image locally:

bash
cd optimizer
docker build -t ghcr.io/productifyfw/optimizer:latest .

# Run the optimizer
docker run --rm -p 8015:8015 ghcr.io/productifyfw/optimizer:latest

Adjust published ports to match the service configuration.

Configuration

config.ini

The optimizer reads configuration from a config.ini file in the project root. Below is an example config.ini and a short explanation of each value. Do not commit secrets — keep tokens and credentials out of version control.

Example config.ini:

ini
[main]
loglevel=debug
api_loglevel=warning
only_test_data=true
token=SUPER_SECRET_TOKEN

Keys:

  • loglevel — Controls logging verbosity for the optimizer. Typical values: debug, info, warning, error.
  • api_loglevel — Controls logging for API/request handling.
  • only_test_data — Set to true to force the optimizer to use bundled test/sample data instead of real inputs (useful for local testing).
  • token — API token used to authenticate requests. Treat this as a secret; prefer injecting it via a secrets manager or environment variable in production.

Usage notes:

  • Edit config.ini before starting the service.
  • If your deployment platform supports secrets or environment variables, use those instead of storing tokens in plaintext (see the sketch below).
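
One way to keep the token out of config.ini is to override it from the environment at startup. A minimal sketch, assuming the config is read with Python's configparser; the OPTIMIZER_TOKEN variable name is hypothetical:

python
# Minimal sketch: prefer an environment variable over the token stored in
# config.ini. OPTIMIZER_TOKEN is a hypothetical variable name.
import configparser
import os

config = configparser.ConfigParser()
config.read("config.ini")

token = os.environ.get("OPTIMIZER_TOKEN") or config["main"].get("token", "")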

API

POST /optimize

The main optimization endpoint that returns desired replica counts.

Request Body:

json
{
  "token": "SUPER_SECRET_TOKEN",
  "check": {
    "name": "scaling-check",
    "source": "nomad-apm",
    "group": "webapp",
    "metric_app_name": "my-app"
  },
  "current_replicas": 3,
  "min_replicas": 1,
  "max_replicas": 10,
  "cache_size": 10
}

Parameters:

  • cache_size (optional, default: 10): Number of seconds to predict ahead. The optimizer will return this many values in the desired array.

Response:

json
{
  "desired": [3, 3, 4, 4, 5, 5, 6, 6, 7, 7]
}

The response contains a list of desired replica counts, one for each second in the prediction horizon. The nomadscaler plugin caches these values and serves the appropriate value based on elapsed time.
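
For reference, a minimal client sketch for this endpoint. It assumes the service listens on localhost:8015 (the port published in the Docker example) and that the requests library is installed:

python
import requests

# Build the request body shown above.
payload = {
    "token": "SUPER_SECRET_TOKEN",
    "check": {
        "name": "scaling-check",
        "source": "nomad-apm",
        "group": "webapp",
        "metric_app_name": "my-app",
    },
    "current_replicas": 3,
    "min_replicas": 1,
    "max_replicas": 10,
    "cache_size": 10,
}

resp = requests.post("http://localhost:8015/optimize", json=payload, timeout=30)
resp.raise_for_status()
print(resp.json()["desired"])  # one replica count per second, e.g. [3, 3, 4, ...]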

GET /health

Health check endpoint.
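
If you want to script a readiness check, a one-line probe sketch (host and port assumed as in the Docker example):

python
import requests

# Liveness probe: succeeds if the service answers with a 2xx status.
assert requests.get("http://localhost:8015/health", timeout=5).ok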

GET /metrics

Prometheus-compatible metrics. Only available when the enable_test_metrics setting is enabled.

SARIMAX Forecasting

Automatic Order Selection

The optimizer automatically selects the best SARIMAX model from multiple candidate orders:

python
from statsmodels.tsa.statespace.sarimax import SARIMAX

candidate_orders = [(1, 1, 1), (1, 0, 1), (2, 1, 1), (1, 1, 0)]
best_result = None
best_aic = float("inf")

for order in candidate_orders:
    try:
        model = SARIMAX(
            y,  # historical request rate
            exog=X,  # exogenous variables
            order=order,
            enforce_stationarity=False,
            enforce_invertibility=False,
        )
        res = model.fit(disp=False)
        if res.aic < best_aic:
            best_aic = res.aic
            best_result = res
    except Exception:
        continue

# Use the best model for forecasting
if best_result is None:
    raise RuntimeError("no SARIMAX candidate order could be fitted")
forecast = best_result.get_forecast(
    steps=forecast_horizon_seconds,
    exog=exog_forecast
)

Exogenous Variables

The model uses the following exogenous variables:

  • avg_response_time: Average response time per request
  • authentication_awaiting_users: Number of users waiting for authentication
  • queue_waiting: Queue waiting time
  • avg_processing_time: Average processing time per request
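
A sketch of assembling these metrics into the exog matrix X used in the snippet above. The dummy values are purely illustrative; the real service feeds in observed metrics:

python
import numpy as np
import pandas as pd

n = 120  # seconds of metric history (illustrative)
rng = np.random.default_rng(0)
X = pd.DataFrame({
    "avg_response_time": rng.uniform(0.05, 0.3, n),
    "authentication_awaiting_users": rng.integers(0, 20, n),
    "queue_waiting": rng.uniform(0.0, 5.0, n),
    "avg_processing_time": rng.uniform(0.01, 0.2, n),
})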

Demand Calculation

Forecasted request rate is converted to demand:

python
demand[t] = (
    forecast_requests[t] * avg_response_time * 0.05
    + queue_waiting * 0.1
    + authentication_awaiting_users * 0.2
)

MILP Optimization

OR-Tools SCIP Solver

The optimizer uses OR-Tools with the SCIP solver for Mixed-Integer Linear Programming:

python
from ortools.linear_solver import pywraplp

solver = pywraplp.Solver.CreateSolver("SCIP")
if solver is None:
    raise RuntimeError("SCIP solver backend is not available")

Objective Function

Minimize total cost including replica costs, SLA penalties, and scaling costs:

python
total_cost = solver.Sum(
    replica_cost * x[t] +           # Cost of running replicas
    penalty * sla[t] +              # SLA violation penalty
    startup_cost * up[t] +          # Cost to start instances
    shutdown_cost * down[t] +       # Cost to stop instances
    extra_penalty * no_rep[t]       # Penalty for no replicas when needed
    for t in range(T)
)
solver.Minimize(total_cost)

Decision Variables

python
# Integer variables
x = [solver.IntVar(min_replicas, max_replicas, f"x_{t}")   # replicas at time t
     for t in range(T)]
up = [solver.IntVar(0, max_scale_up, f"up_{t}")            # instances started (default limit 2)
      for t in range(T)]
down = [solver.IntVar(0, max_scale_down, f"down_{t}")      # instances stopped (default limit 2)
        for t in range(T)]
no_rep = [solver.BoolVar(f"no_rep_{t}")                    # 1 if no replicas are running
          for t in range(T)]

# Continuous variables
sla = [solver.NumVar(0, solver.infinity(), f"sla_{t}")     # SLA shortfall / unmet demand
       for t in range(T)]

Constraints

python
# 1. Meet predicted demand (with SLA slack), for every t
solver.Add(x[t] * capacity_per_replica + sla[t] >= demand[t])

# 2. Replica count evolution, for t >= 1
solver.Add(x[t] == x[t-1] + up[t] - down[t])

# 3. Scale-down limit (can't stop more than you have), for t >= 1
solver.Add(down[t] <= x[t-1])

# 4. Initial state
solver.Add(x[0] == initial_replicas)
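
Putting the pieces together, a minimal sketch of solving the model and reading back the per-second replica plan (names follow the snippets above; the fallback branch is illustrative, not the service's actual behaviour):

python
# Solve the MILP and extract the desired replica counts.
status = solver.Solve()
if status == pywraplp.Solver.OPTIMAL:
    desired = [int(x[t].solution_value()) for t in range(T)]
else:
    desired = [initial_replicas] * T  # illustrative fallback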

Parameters

Default values:

  • capacity_per_replica: Based on allocated CPU/memory
  • replica_cost: Based on CPU/memory pricing (configurable weights)
  • penalty: 1.0 (SLA violation cost)
  • startup_cost: 0.5
  • shutdown_cost: 0.3
  • max_scale_up: 2 (instances per time step)
  • max_scale_down: 2 (instances per time step)
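
For illustration, these parameters could be gathered into a single structure. The dataclass itself is hypothetical; only the default values come from the list above:

python
from dataclasses import dataclass

@dataclass
class OptimizerParams:
    capacity_per_replica: float   # derived from allocated CPU/memory
    replica_cost: float           # derived from CPU/memory pricing weights
    penalty: float = 1.0          # SLA violation cost
    startup_cost: float = 0.5
    shutdown_cost: float = 0.3
    max_scale_up: int = 2         # instances per time step
    max_scale_down: int = 2       # instances per time step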

Troubleshooting

Slow Predictions

Optimize:

  • Reduce forecast horizon
  • Simplify SARIMAX parameters
  • Increase solver time limit (see the sketch below)
  • Add more CPU resources
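
A one-line sketch of adjusting the solver's time limit via the pywraplp API (the 10-second value is illustrative):

python
# Allow SCIP up to 10 seconds per solve (value is in milliseconds).
solver.SetTimeLimit(10_000)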

Inaccurate Forecasts

Improve:

  • Provide more historical metrics
  • Tune SARIMAX parameters
  • Adjust seasonal period
  • Filter metric outliers

Memory Issues

Solutions:

  • Limit metric history size
  • Reduce forecast horizon
  • Implement metric sampling
  • Increase memory limits

Monitoring

Logging

python
import logging

# Configure logging
logging.basicConfig(
    level=logging.INFO,
    format='%(asctime)s - %(name)s - %(levelname)s - %(message)s'
)

Development

If you plan on developing the optimizer library:

  • Use poetry install to install dev dependencies.
  • Run the web service locally with poetry run web.
  • Run unit tests frequently and add tests for new behaviour.

Testing

Run the project's test suite with:

bash
poetry run test

See Also