# Backend Selection
fast-vollib supports four numeric backends. All backends expose the same API and produce numerically equivalent results.
| Backend | When to use |
|---|---|
| `numpy` | Default; works everywhere; no GPU required |
| `numba` | JIT-compiled CPU loops; faster than NumPy for large batches |
| `torch` | GPU acceleration on CUDA hardware |
| `jax` | JIT-compiled CPU/GPU/TPU; functional programming style |
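The equivalence claim is easy to spot-check by pricing the same option on two backends. A minimal sketch, assuming the `numba` extra is installed and using the `return_as="numpy"` output mode described below:

```python
import numpy as np
import fast_vollib

# Identical inputs, two CPU backends
kwargs = dict(flag="c", S=100, K=100, t=0.25, r=0.05, sigma=0.20,
              return_as="numpy")

p_numpy = fast_vollib.fast_black_scholes(backend="numpy", **kwargs)
p_numba = fast_vollib.fast_black_scholes(backend="numba", **kwargs)

assert np.allclose(p_numpy, p_numba)
```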
## Automatic resolution
When `backend="auto"` (the default), fast-vollib resolves the backend at call time using this priority order:
1. Explicit `backend=` kwarg on the function call
2. `fast_vollib.set_backend(name)` process-level override
3. `FAST_VOLLIB_BACKEND` environment variable
4. `torch`, if `torch.cuda.is_available()` returns `True`
5. `jax`, if JAX is importable
6. `numpy`, always available as the final fallback
> **Note:** The `numba` backend is not part of the auto-resolution chain because, like NumPy, it is CPU-only. Use it explicitly via `backend="numba"` or `FAST_VOLLIB_BACKEND=numba` when you want JIT-compiled CPU acceleration without a GPU.
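For example, the explicit kwarg outranks a process-level override. A short sketch using only the calls documented on this page:

```python
import fast_vollib

fast_vollib.set_backend("jax")  # priority 2: process-level override

# Priority 1: an explicit kwarg wins for this call only
price = fast_vollib.fast_black_scholes(
    flag="c", S=100, K=100, t=0.25, r=0.05, sigma=0.20,
    backend="numpy",
)

fast_vollib.set_backend("auto")  # restore auto-resolution
```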
## Setting a backend
### Per-session (process-level)
```python
import fast_vollib

fast_vollib.set_backend("torch")  # all subsequent calls use PyTorch

# Reset to auto-resolution
fast_vollib.set_backend("auto")
```
### Per-call

Every pricing, IV, and Greek function accepts a `backend` keyword:
```python
price = fast_vollib.fast_black_scholes(
    flag="c", S=100, K=100, t=0.25, r=0.05, sigma=0.20,
    backend="numpy",
)
```
### Via environment variable
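Set `FAST_VOLLIB_BACKEND` to any backend name to override auto-resolution for the whole process. Because resolution happens at call time, the variable can also be set from Python before the first pricing call; a minimal sketch:

```python
import os

# Equivalent to `export FAST_VOLLIB_BACKEND=numba` in the shell;
# set it before the first call resolves backend="auto"
os.environ["FAST_VOLLIB_BACKEND"] = "numba"

import fast_vollib

print(fast_vollib.get_backend())  # expected: "numba"
```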
## Inspecting the active backend
```python
print(fast_vollib.get_backend())         # resolved backend for "auto"
print(fast_vollib.get_backend("torch"))  # pass an explicit value to validate it
```
## Native tensor output
Most public functions default to `return_as="dataframe"`, which materializes a `pandas.DataFrame`. Pass `return_as="numpy"` if you want a `numpy.ndarray`, or pass `return_native=True` on the PyTorch and JAX backends to receive the backend's native type instead:
```python
# Returns a torch.Tensor (float64)
price = fast_vollib.fast_black_scholes(
    flag="c", S=100, K=100, t=0.25, r=0.05, sigma=0.20,
    backend="torch",
    return_native=True,
)
```
> **Note:** `return_native=True` has no effect on the NumPy backend; NumPy arrays are already the native type.
For `get_all_greeks`, `return_native=True` returns a dict mapping each Greek name to a native backend array or tensor.
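A sketch of the dict form, assuming `get_all_greeks` shares the pricing functions' keyword arguments (the `"delta"` key name here is illustrative):

```python
import fast_vollib

greeks = fast_vollib.get_all_greeks(
    flag="c", S=100, K=100, t=0.25, r=0.05, sigma=0.20,
    backend="torch",
    return_native=True,
)

# Each value is a torch.Tensor on this backend
print(type(greeks["delta"]))
```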
## Backend availability
| Backend | pip install | Notes |
|---|---|---|
| `numpy` | bundled | Always available |
| `numba` | `pip install "fast-vollib[numba]"` | CPU-only; JIT-compiled parallel loops |
| `torch` | `pip install "fast-vollib[torch]"` | CPU wheels are cross-platform; GPU requires CUDA |
| `jax` | `pip install "fast-vollib[jax]"` | CPU-only by default; add `jax[cuda13]` for GPU |
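A standard-library check shows which optional backends are importable in the current environment (importability only; it says nothing about GPU availability):

```python
import importlib.util

for name in ("numpy", "numba", "torch", "jax"):
    installed = importlib.util.find_spec(name) is not None
    print(f"{name}: {'available' if installed else 'not installed'}")
```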
## Numba backend
The Numba backend compiles Black-Scholes pricing, Greeks, and the Halley+bisection IV solver to native machine code via `@numba.njit(parallel=True)`. Each batch incurs a single Python→native dispatch; the Halley loop and bisection fallback execute entirely within the compiled kernel.

Kernels are compiled on first call and cached to `__pycache__` for subsequent process starts. Expect a ~1–2 s warm-up on the first call; all subsequent calls hit the cache and are near-instant.
```python
import fast_vollib

# Use the numba backend for a single call
price = fast_vollib.fast_black_scholes(
    flag="c", S=100, K=100, t=0.25, r=0.05, sigma=0.20,
    backend="numba",
)

# Or set it process-wide
fast_vollib.set_backend("numba")
```
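The compile-then-cache behaviour is easy to observe by timing the first and second call. A sketch, assuming `fast_black_scholes` accepts NumPy array inputs:

```python
import time

import numpy as np
import fast_vollib

S = np.random.uniform(80.0, 120.0, 100_000)

def timed_call() -> float:
    t0 = time.perf_counter()
    fast_vollib.fast_black_scholes(
        flag="c", S=S, K=100.0, t=0.25, r=0.05, sigma=0.20,
        backend="numba",
    )
    return time.perf_counter() - t0

print(f"first call:  {timed_call():.3f} s")  # JIT warm-up or cache load
print(f"second call: {timed_call():.3f} s")  # compiled kernel only
```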
### When to choose `numba` over `numpy`
- Large batches (≥ 10 k options) on a CPU-only machine
- Environments where PyTorch / JAX cannot be installed (e.g. minimal Docker images)
- When you need deterministic, portable CPU performance without GPU drivers
## Jäckel IV: a separate high-precision solver
The `fast_vollib.jackel` module is not routed through the backend system described above. It is a self-contained implementation of Peter Jäckel's "Let's Be Rational" algorithm and exposes one function per backend:
| Backend | Import | Notes |
|---|---|---|
| NumPy + Numba (CPU) | `fast_vollib.jackel.jackel_iv.jackel_iv_black` | Parallel Numba kernels; ~8.5 ms / 100k |
| PyTorch (GPU) | `fast_vollib.jackel.torch_backend.jackel_iv_black_torch` | `torch.compile` fused; ~2.7 ms / 100k |
| JAX (GPU) | `fast_vollib.jackel.jax_backend.jackel_iv_black_jax` | XLA fused; ~2.4 ms / 100k |
| Triton (GPU) | `fast_vollib.jackel.triton_kernels.jackel_iv_triton` | Single-pass kernel; 0.056 ms / 100k |
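As an illustration only, a call to the CPU solver might look like the sketch below. The argument names and order are assumptions (sketched as a Black-model solver taking undiscounted prices, forwards, strikes, expiries, and ±1 call/put flags); consult the Jäckel IV page for the actual signature:

```python
import numpy as np

from fast_vollib.jackel.jackel_iv import jackel_iv_black

# Hypothetical argument names/order; verify against the Jäckel IV docs
undiscounted_price = np.array([4.0])
forward = np.array([100.0])
strike = np.array([100.0])
expiry = np.array([0.25])
flag = np.array([1.0])  # +1 = call, -1 = put

iv = jackel_iv_black(undiscounted_price, forward, strike, expiry, flag)
print(iv)
```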
See Jäckel IV for full documentation and usage examples.