vLLM for IBM Power Linux ppc64le — CPU LLM Inference | LibrePower
v0.9.2 · ppc64le · Ubuntu · RHEL · Fedora

vLLM

OpenAI-compatible LLM inference
on IBM POWER. No GPU required.

Pre-built ppc64le packages. Skip the source build, skip the GPU bill — POWER10 and POWER11 do the work with MMA acceleration. Same OpenAI API your apps already speak.

Packaged by LibrePower · Backend for lpai
Ubuntu 22 / 24
Debian
RHEL 9 / 10
Rocky 9
AlmaLinux 9
Fedora 41 / 42
Why vLLM on POWER

The hardware is already there.
Stop renting GPUs to talk to it.

Your POWER systems already run mission-critical workloads. Adding LLM inference shouldn't mean a separate GPU farm, a new vendor relationship, and a compliance review. vLLM on ppc64le runs the same OpenAI-compatible API every framework speaks — on the CPUs you already paid for.

CPU-Only Inference

No GPU, no CUDA, no driver headaches. Run on POWER9, POWER10 and POWER11 with bfloat16 weights. POWER10+ uses MMA (Matrix Math Assist) for substantial speedups on 7B+ models.
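To see which POWER generation a box reports (and therefore whether MMA is available), a quick look at /proc/cpuinfo is enough. This is a generic Linux check, not something the package requires:

```shell
# Print the CPU model line; MMA is present on POWER10 and POWER11,
# while POWER9 falls back to VSX SIMD.
grep -m1 '^cpu' /proc/cpuinfo
```

On a POWER10 LPAR this typically prints a line like `cpu : POWER10 (architected)`.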

OpenAI-Compatible API

Drop-in replacement for the OpenAI endpoint. Every SDK, framework, and tool that speaks OpenAI works unchanged — including lpai, LangChain, and LlamaIndex.
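In practice, "works unchanged" usually means repointing two environment variables that the official OpenAI SDKs read. A minimal sketch, assuming the server runs on its default port 8000:

```shell
# Point any OpenAI SDK at the local vLLM server instead of api.openai.com
export OPENAI_BASE_URL="http://localhost:8000/v1"
# vLLM only validates the key if it was started with --api-key
export OPENAI_API_KEY="unused"
```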

Pre-Built. No Compiling.

Native .deb and .rpm packages for ppc64le. apt install python3-vllm and you're running. No source builds, no missing wheels.

Tested on POWER hardware

Pick a model.
Match it to your workload.

Every model below has been tested with CPU inference on real POWER systems. RAM figures are bfloat16, single instance.

Qwen2.5-0.5B-Instruct · 0.5B params · 1 GB RAM · Real-time

Fast classification, filtering, real-time log monitoring. Perfect for lpai classify and lpai watch.

Qwen2.5-1.5B-Instruct · 1.5B params · 3 GB RAM · Fast

Improved quality while staying fast. Sweet spot for routing, simple summaries, and structured extraction tasks.
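The RAM figures follow directly from bfloat16's two bytes per parameter (plus a little overhead for the KV cache and runtime), which you can check with a one-liner:

```shell
# bfloat16 stores 2 bytes per parameter; approximate weight size in GB:
python3 -c 'print(round(0.5e9 * 2 / 1e9, 1), "GB")'   # 0.5B params -> 1.0 GB
python3 -c 'print(round(1.5e9 * 2 / 1e9, 1), "GB")'   # 1.5B params -> 3.0 GB
```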

How it fits together

One server.
Every client speaks it.

CLIENTS — lpai, LangChain, your apps, curl / SDK
INFERENCE SERVER — vLLM, OpenAI-compatible /v1/chat/completions on port 8000
HARDWARE — IBM POWER: POWER9 (VSX SIMD), POWER10 (+ MMA), POWER11 (+ MMA gen2)
Install in seconds

One package. Pick your distro.

# Ubuntu / Debian
# Add LibrePower repository (one time)
curl -fsSL https://linux.librepower.org/install.sh | sudo sh
# Install vLLM
sudo apt update
sudo apt install python3-vllm

# RHEL / Rocky / AlmaLinux / Fedora
# Add LibrePower repository (one time)
curl -fsSL https://linux.librepower.org/install.sh | sudo sh
# Install vLLM
sudo dnf install python3-vllm

PyTorch dependency

vLLM depends on PyTorch. On ppc64le, install the CPU build from the official PyTorch index:

pip3 install torch --extra-index-url https://download.pytorch.org/whl/cpu

Quick start

# Start vLLM with a small model
python3 -m vllm.entrypoints.openai.api_server \
    --model Qwen/Qwen2.5-0.5B-Instruct \
    --device cpu --dtype bfloat16 --port 8000

# Query it (OpenAI-compatible)
curl -s localhost:8000/v1/chat/completions \
    -H 'Content-Type: application/json' \
    -d '{"model":"Qwen/Qwen2.5-0.5B-Instruct",
        "messages":[{"role":"user","content":"Hello"}]}'
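Once the server is up, the standard /v1/models endpoint confirms what is loaded, and a jq filter (jq is an assumption here, not a dependency of the package) trims the chat response down to just the reply text:

```shell
# Confirm the server is up and which model it serves
curl -s localhost:8000/v1/models

# Pipe a chat response through jq to extract the assistant's reply
curl -s localhost:8000/v1/chat/completions \
    -H 'Content-Type: application/json' \
    -d '{"model":"Qwen/Qwen2.5-0.5B-Instruct","messages":[{"role":"user","content":"Hello"}]}' \
  | jq -r '.choices[0].message.content'
```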
Companion Tool

Pair it with lpai.

vLLM is the recommended local backend for lpai — the AI-powered sysadmin toolkit for POWER. Together they let you classify logs, diagnose incidents, decode error codes and audit security — entirely on your own hardware, with zero data leaving the machine.

22 commands, 40 code translation pairs, 5 compliance frameworks. All powered by the model you choose, hosted by vLLM, running on POWER.

admin@power10:~ — vllm + lpai
# Install both
sudo apt install python3-vllm lpai

# Start vLLM in background
python3 -m vllm.entrypoints.openai.api_server \
    --model Qwen/Qwen2.5-7B-Instruct \
    --device cpu --dtype bfloat16 &

# Use lpai — all data stays on your machine
journalctl --since today | lpai classify
lpai decode "CPF4131"
cat report.txt | lpai mask > safe.txt
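Because the server is started in the background, the first lpai call can race against model loading. A small polling helper avoids that; wait_for_vllm is a hypothetical name, just a few lines of shell, not part of either package:

```shell
# Hypothetical helper: block until the vLLM server answers on /v1/models
wait_for_vllm() {
    local url="${1:-http://localhost:8000/v1/models}"
    until curl -sf "$url" >/dev/null 2>&1; do
        sleep 2   # weights for a 7B model can take a while to load
    done
}
```

Call `wait_for_vllm` after launching the server and before the first `lpai classify`, so journalctl output isn't piped into a server that is still loading weights.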

 Zero network, zero cloud, full POWER.

Need help sizing vLLM for your POWER fleet?

SIXE can help you pick models, tune memory and integrate with your existing stack.

Click here to contact us

Run LLMs
on POWER.

Subscribe for releases, model recommendations and POWER community news.