vLLM
OpenAI-compatible LLM inference
on IBM POWER. No GPU required.
Pre-built ppc64le packages. Skip the source build, skip the GPU bill — POWER10 and POWER11 do the work with MMA acceleration. Same OpenAI API your apps already speak.
The hardware is already there.
Stop renting GPUs to talk to it.
Your POWER systems already run mission-critical workloads. Adding LLM inference shouldn't mean a separate GPU farm, a new vendor relationship, and a compliance review. vLLM on ppc64le runs the same OpenAI-compatible API every framework speaks — on the CPUs you already paid for.
CPU-Only Inference
No GPU, no CUDA, no driver headaches. Run on POWER9, POWER10 and POWER11 with bfloat16 weights. POWER10+ uses MMA (Matrix Math Assist) for substantial speedups on 7B+ models.
OpenAI-Compatible API
Drop-in replacement for the OpenAI endpoint. Every SDK, framework and tool that speaks OpenAI works unchanged — including lpai, LangChain, LlamaIndex.
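To make that compatibility concrete, the sketch below builds the standard chat-completions request using nothing but the Python standard library. The host, port, and model name are assumptions borrowed from the quick start further down, not fixed values.

```python
import json
import urllib.request

def chat_request(base_url: str, model: str, prompt: str) -> urllib.request.Request:
    # The same JSON body every OpenAI SDK sends to /v1/chat/completions
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }
    return urllib.request.Request(
        f"{base_url}/v1/chat/completions",
        data=json.dumps(payload).encode(),
        headers={"Content-Type": "application/json"},
    )

# Endpoint and model are placeholders; point them at your own vLLM server.
req = chat_request("http://localhost:8000", "Qwen/Qwen2.5-7B-Instruct", "Hello")
# urllib.request.urlopen(req) would return the standard OpenAI JSON response
```

Any OpenAI SDK works the same way: set its base URL to your vLLM server's address and the rest of your code is unchanged.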
Pre-Built. No Compiling.
Native .deb and .rpm packages for ppc64le. apt install python3-vllm and you're running. No source builds, no missing wheels.
Pick a model.
Match it to your workload.
Every model below has been tested with CPU inference on real POWER systems. RAM figures assume bfloat16 weights and a single server instance.
Fast classification, filtering, real-time log monitoring. Perfect for lpai classify and lpai watch.
Improved quality while staying fast. Sweet spot for routing, simple summaries, and structured extraction tasks.
Diagnosis, masking, error decoding, multi-step reasoning. The all-rounder for serious sysadmin work — and where MMA acceleration starts to shine.
Specialized for code-related tasks. RPG IV analysis, COBOL translation, refactoring suggestions, test case generation.
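If you want to size a tier yourself, the arithmetic behind the RAM figures is simple: bfloat16 stores two bytes per parameter, so the weights alone need roughly 2 GB per billion parameters, with the KV cache and runtime overhead on top. A minimal sketch (the model sizes below are illustrative, not a supported-model list):

```python
def bf16_weight_gb(params_billion: float) -> float:
    """Approximate weight footprint: 2 bytes per parameter in bfloat16."""
    return params_billion * 2  # 1e9 params * 2 bytes = 2 GB per billion

# Example sizes only; budget extra headroom for KV cache and Python runtime.
for size in (0.5, 1.5, 7, 14):
    print(f"{size:>4}B params -> ~{bf16_weight_gb(size):.0f} GB weights + KV cache")
```

A 7B model therefore needs about 14 GB for weights before the KV cache, which is why that tier is where MMA-equipped POWER10 systems with generous memory pay off.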
One server.
Every client speaks it.
One package. Pick your distro.
PyTorch dependency
vLLM requires PyTorch. On ppc64le, install the CPU-only build from the official PyTorch index:
pip3 install torch --extra-index-url https://download.pytorch.org/whl/cpu
Quick start
# Start vLLM with a small model
python3 -m vllm.entrypoints.openai.api_server \
  --model Qwen/Qwen2.5-0.5B-Instruct \
  --device cpu --dtype bfloat16 --port 8000

# Query it (OpenAI-compatible)
curl -s localhost:8000/v1/chat/completions \
  -H 'Content-Type: application/json' \
  -d '{"model":"Qwen/Qwen2.5-0.5B-Instruct",
       "messages":[{"role":"user","content":"Hello"}]}'
Pair it with lpai.
vLLM is the recommended local backend for lpai — the AI-powered sysadmin toolkit for POWER. Together they let you classify logs, diagnose incidents, decode error codes and audit security — entirely on your own hardware, with zero data leaving the machine.
22 commands, 40 code translation pairs, 5 compliance frameworks. All powered by the model you choose, hosted by vLLM, running on POWER.
# Install both
sudo apt install python3-vllm lpai

# Start vLLM in background
python3 -m vllm.entrypoints.openai.api_server \
  --model Qwen/Qwen2.5-7B-Instruct \
  --device cpu --dtype bfloat16 &

# Use lpai — all data stays on your machine
journalctl --since today | lpai classify
lpai decode "CPF4131"
cat report.txt | lpai mask > safe.txt

✓ Zero network, zero cloud, full POWER.
Need help sizing vLLM for your POWER fleet?
SIXE can help you pick models, tune memory and integrate with your existing stack.
Run LLMs
on POWER.
Subscribe for releases, model recommendations and POWER community news.