# Backend Selection
fast-vollib supports four numeric backends. All backends expose the same API and produce numerically equivalent results.
| Backend | When to use |
|---|---|
| `numpy` | Default; works everywhere; no GPU required |
| `numba` | JIT-compiled CPU loops; faster than NumPy for large batches |
| `torch` | GPU acceleration on CUDA hardware |
| `jax` | JIT-compiled CPU/GPU/TPU; functional programming style |
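The equivalence claim is easy to spot-check by pricing the same option on two backends. A minimal sketch, assuming the `numba` extra is installed and using the `return_as="numpy"` output mode described below:

```python
import numpy as np
import fast_vollib

# Identical inputs, two CPU backends
kwargs = dict(flag="c", S=100, K=100, t=0.25, r=0.05, sigma=0.20,
              return_as="numpy")

p_numpy = fast_vollib.fast_black_scholes(backend="numpy", **kwargs)
p_numba = fast_vollib.fast_black_scholes(backend="numba", **kwargs)

assert np.allclose(p_numpy, p_numba)
```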
## Automatic resolution
When `backend="auto"` (the default), fast-vollib resolves the backend at call time using this priority order:
1. Explicit `backend=` kwarg on the function call
2. `fast_vollib.set_backend(name)` process-level override
3. `FAST_VOLLIB_BACKEND` environment variable
4. `torch`, if `torch.cuda.is_available()` returns `True`
5. `jax`, if JAX is importable
6. `numpy`, always available as the final fallback
> **Note:** The `numba` backend is not part of the auto-resolution chain because, like NumPy, it is CPU-only. Use it explicitly via `backend="numba"` or `FAST_VOLLIB_BACKEND=numba` when you want JIT-compiled CPU acceleration without a GPU.
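For example, the explicit kwarg outranks a process-level override. A short sketch using only the calls documented on this page:

```python
import fast_vollib

fast_vollib.set_backend("jax")  # priority 2: process-level override

# Priority 1: an explicit kwarg wins for this call only
price = fast_vollib.fast_black_scholes(
    flag="c", S=100, K=100, t=0.25, r=0.05, sigma=0.20,
    backend="numpy",
)

fast_vollib.set_backend("auto")  # restore auto-resolution
```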
## Setting a backend
### Per-session (process-level)
```python
import fast_vollib

fast_vollib.set_backend("torch")  # all subsequent calls use PyTorch

# Reset to auto-resolution
fast_vollib.set_backend("auto")
```
### Per-call

Every pricing, IV, and Greek function accepts a `backend` keyword:
```python
price = fast_vollib.fast_black_scholes(
    flag="c", S=100, K=100, t=0.25, r=0.05, sigma=0.20,
    backend="numpy",
)
```
### Via environment variable
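Set `FAST_VOLLIB_BACKEND` to any backend name to override auto-resolution for the whole process. Because resolution happens at call time, the variable can also be set from Python before the first pricing call; a minimal sketch:

```python
import os

# Equivalent to `export FAST_VOLLIB_BACKEND=numba` in the shell;
# set it before the first call resolves backend="auto"
os.environ["FAST_VOLLIB_BACKEND"] = "numba"

import fast_vollib

print(fast_vollib.get_backend())  # expected: "numba"
```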
## Inspecting the active backend
```python
print(fast_vollib.get_backend())         # resolved backend for "auto"
print(fast_vollib.get_backend("torch"))  # pass an explicit value to validate it
```
## Native tensor output
Most public functions default to `return_as="dataframe"`, which materializes a `pandas.DataFrame`. Pass `return_as="numpy"` if you want a `numpy.ndarray`, or pass `return_native=True` on the PyTorch and JAX backends to receive the backend's native type instead:
```python
# Returns a torch.Tensor (float64)
price = fast_vollib.fast_black_scholes(
    flag="c", S=100, K=100, t=0.25, r=0.05, sigma=0.20,
    backend="torch",
    return_native=True,
)
```
> **Note:** `return_native=True` has no effect on the NumPy backend; NumPy arrays are already the native type.
For `get_all_greeks`, `return_native=True` returns a dict mapping each Greek name to a native backend array or tensor.
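A sketch of the dict form, assuming `get_all_greeks` shares the pricing functions' keyword arguments (the `"delta"` key name here is illustrative):

```python
import fast_vollib

greeks = fast_vollib.get_all_greeks(
    flag="c", S=100, K=100, t=0.25, r=0.05, sigma=0.20,
    backend="torch",
    return_native=True,
)

# Each value is a torch.Tensor on this backend
print(type(greeks["delta"]))
```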
## Backend availability
| Backend | pip install | Notes |
|---|---|---|
| `numpy` | bundled | Always available |
| `numba` | `pip install "fast-vollib[numba]"` | CPU-only; JIT-compiled parallel loops |
| `torch` | `pip install "fast-vollib[torch]"` | CPU wheels are cross-platform; GPU requires CUDA |
| `jax` | `pip install "fast-vollib[jax]"` | CPU-only by default; add `jax[cuda13]` for GPU |
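A standard-library check shows which optional backends are importable in the current environment (importability only; it says nothing about GPU availability):

```python
import importlib.util

for name in ("numpy", "numba", "torch", "jax"):
    installed = importlib.util.find_spec(name) is not None
    print(f"{name}: {'available' if installed else 'not installed'}")
```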
## Numba backend
The Numba backend compiles Black-Scholes pricing, Greeks, and the Halley+bisection IV solver to native machine code via `@numba.njit(parallel=True)`. Each batch incurs a single Python→native dispatch; the Halley loop and bisection fallback execute entirely within the compiled kernel.

Kernels are compiled on first call and cached to `__pycache__` for subsequent process starts. Expect a ~1–2 s warm-up on the first call; all subsequent calls hit the cache and are near-instant.
```python
import fast_vollib

# Use the numba backend for a single call
price = fast_vollib.fast_black_scholes(
    flag="c", S=100, K=100, t=0.25, r=0.05, sigma=0.20,
    backend="numba",
)

# Or set it process-wide
fast_vollib.set_backend("numba")
```
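The compile-then-cache behaviour is easy to observe by timing the first and second call. A sketch, assuming `fast_black_scholes` accepts NumPy array inputs:

```python
import time

import numpy as np
import fast_vollib

S = np.random.uniform(80.0, 120.0, 100_000)

def timed_call() -> float:
    t0 = time.perf_counter()
    fast_vollib.fast_black_scholes(
        flag="c", S=S, K=100.0, t=0.25, r=0.05, sigma=0.20,
        backend="numba",
    )
    return time.perf_counter() - t0

print(f"first call:  {timed_call():.3f} s")  # JIT warm-up or cache load
print(f"second call: {timed_call():.3f} s")  # compiled kernel only
```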
### When to choose `numba` over `numpy`
- Large batches (≥ 10 k options) on a CPU-only machine
- Environments where PyTorch / JAX cannot be installed (e.g. minimal Docker images)
- When you need deterministic, portable CPU performance without GPU drivers
## Jäckel IV: a separate high-precision solver
The `fast_vollib.jackel` module is not routed through the backend system described above. It is a self-contained implementation of Peter Jäckel's "Let's Be Rational" algorithm and exposes one function per backend:
| Backend | Import | Notes |
|---|---|---|
| NumPy + Numba (CPU) | `fast_vollib.jackel.jackel_iv.jackel_iv_black` | Parallel Numba kernels; ~8.5 ms / 100k |
| PyTorch (GPU) | `fast_vollib.jackel.torch_backend.jackel_iv_black_torch` | `torch.compile` fused; ~2.7 ms / 100k |
| JAX (GPU) | `fast_vollib.jackel.jax_backend.jackel_iv_black_jax` | XLA fused; ~2.4 ms / 100k |
| Triton (GPU) | `fast_vollib.jackel.triton_kernels.jackel_iv_triton` | Single-pass kernel; 0.056 ms / 100k |
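As an illustration only, a call to the CPU solver might look like the sketch below. The argument names and order are assumptions (sketched as a Black-model solver taking undiscounted prices, forwards, strikes, expiries, and ±1 call/put flags); consult the Jäckel IV page for the actual signature:

```python
import numpy as np

from fast_vollib.jackel.jackel_iv import jackel_iv_black

# Hypothetical argument names/order; verify against the Jäckel IV docs
undiscounted_price = np.array([4.0])
forward = np.array([100.0])
strike = np.array([100.0])
expiry = np.array([0.25])
flag = np.array([1.0])  # +1 = call, -1 = put

iv = jackel_iv_black(undiscounted_price, forward, strike, expiry, flag)
print(iv)
```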
See Jäckel IV for full documentation and usage examples.