Llama-AIX — LLM Inference on IBM AIX (POWER9+) | LibrePower
Experimental · Proof of Concept

Llama on AIX

LLM inference on IBM AIX. CPU-only, no GPUs.
llama.cpp compiled for AIX 7.3 POWER9+. Running Liquid AI's LFM2.5 at ~27 tok/s.

CPU-only · LFM2.5-1.2B · On-premises · POWER9+ · Proof of concept

llama.cpp compiled for AIX. Running Liquid AI's LFM2.5 natively.

~27 tokens/s, <750 MB memory. An honest technical exploration of LLMs on Power.

What is this

Beyond the GPU hype.

Not every LLM workload needs a rack of GPUs. Llama-AIX explores where CPU-only inference makes sense — on the infrastructure you already have.

An honest technical exploration

Llama-AIX is a proof of concept for running LLM inference on IBM AIX 7.3 using only CPU and memory — no GPUs. It's llama.cpp compiled for POWER9+ with IBM Open XL C/C++, running Liquid AI's LFM2.5-1.2B model.

LFM2.5 is a hybrid architecture combining convolutional blocks for speed and attention layers for context, with a 128k token context window. On a POWER9 S924, it achieves ~27 tokens/s using just 8 cores in SMT-2 mode (16 threads) and less than 750 MB of memory. Small enough to run, smart enough to be useful for sysadmin tasks, RAG, and document QA.
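
The SMT setting is worth checking before you measure anything. Below is a minimal sketch of that threading setup using standard AIX commands; the core counts and the prompt are illustrative, so adjust them to your LPAR.

# Show the current SMT mode and the available logical processors
$ smtctl
$ bindprocessor -q
# Switch to SMT-2 immediately, no reboot required
$ smtctl -t 2 -w now
# One inference thread per hardware thread: 8 cores x SMT-2 = 16
$ ./llama-cli -m lfm2.5-1.2b.Q4_0.gguf -t 16 -p "Hello from AIX"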

What it is

  • A technical PoC
  • An honest exploration
  • An engineering exercise
  • Open source (GPLv3)

What it isn't

  • Not a benchmark
  • Not a complete AI platform
  • Not competing with GPU solutions
  • Not "AI marketing"
llama-aix — AIX 7.3 POWER9 S924
# Clone and build
aix $ git clone https://gitlab.com/librepower/llama-aix.git
Cloning into 'llama-aix'... done.
aix $ cd llama-aix && gmake -j64 CC=xlc CXX=xlC
Building llama.cpp for AIX ppc64...
[ 1%] Compiling ggml-cpu.c (VSX enabled)
[ 34%] Compiling llama.cpp
[ 78%] Compiling sampling.cpp
[100%] Build complete ✓
# Optimize threading — sweet spot: 8 cores, SMT-2
aix $ smtctl -t 2 -w now
# Run Liquid AI's LFM2.5-1.2B — CPU only
aix $ ./llama-cli -m lfm2.5-1.2b.Q4_0.gguf -t 16
llama_model_load: format = GGUF V3
llama_model_load: arch = LFM2.5 (hybrid)
llama_model_load: ctx = 128k tokens
llama_model_load: using 16 threads on POWER9 (8 cores SMT-2)
Model loaded. Memory: 742 MB (CPU)
> Audit this errpt: Power/Cooling subsystem Unrecovered Error, bypassed with loss of redundancy. FRU: PWRSPLY
The error usually points to a problem with the power or
cooling hardware (like a fan or power supply) that couldn't
be fixed automatically. Check the fans, especially the power
fans. Replace the FRU at location U78D2.001.WZS00P4...
──────────────────────────────────────────
~27 tok/s | CPU-only ✓ | <750 MB | No GPU
Why Power

The architecture advantage.

Power's design philosophy — sustained throughput, massive memory bandwidth, hardware coherence — aligns naturally with LLM inference workloads.

~27 tokens/s with LFM2.5-1.2B (8 cores SMT-2)

Liquid AI's hybrid architecture achieves real-world inference speeds on POWER9 CPU — fast enough for interactive sysadmin tasks and document QA.

<750 MB memory footprint (Q4_0 GGUF)

The quantized LFM2.5-1.2B model fits in under 750 MB. No GPU VRAM needed — just system memory on your existing Power.

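If you only have a higher-precision GGUF, llama.cpp's bundled quantize tool can produce the Q4_0 file. This is a sketch that assumes the tool builds on AIX alongside llama-cli; the input filename is illustrative.

# Convert a higher-precision GGUF to Q4_0 (input filename is illustrative)
$ ./llama-quantize lfm2.5-1.2b.f16.gguf lfm2.5-1.2b.Q4_0.gguf Q4_0
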
128k token context window (hybrid arch)

LFM2.5's hybrid architecture (shortconv + GQA) supports 128k tokens — enough to read thousands of lines of logs without losing context.

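A sketch of using a big slice of that window with the standard llama-cli flags; the paths and the 32k size are illustrative, and larger -c values cost more memory and prompt-processing time.

# Put the question and a long log into one prompt file, then widen the context
$ { echo "Summarize the warnings in the log below:"; cat /tmp/node1-syslog.txt; } > /tmp/prompt.txt
$ ./llama-cli -m lfm2.5-1.2b.Q4_0.gguf -f /tmp/prompt.txt -c 32768 -n 256 -t 16
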
0 GPUs required (CPU-only)

For lightweight models doing on-prem QA, sysadmin assistance, and semantic search — CPU and memory are enough.

$0 new hardware investment (existing infrastructure)

Many Power customers run stable, amortized, mission-critical infrastructure. Llama-AIX lets you explore AI on hardware you already own — no new CAPEX, no GPU procurement, no cloud bills.

Use cases

Where CPU inference makes sense.

Not everything needs a GPU farm. These are the scenarios where lightweight, local LLM inference on Power adds real value.

AIX sysadmin assistant

Audit errpt logs, review /etc/passwd, analyze lssrc services. LFM2.5 provides actionable sysadmin guidance directly on your Power system.
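
A hedged sketch of that workflow, wiring errpt output straight into the model; the paths and line counts are illustrative.

# Capture recent error report entries and ask the model to audit them
$ { echo "Audit this errpt output and list likely FRU actions:"; errpt -a | head -300; } > /tmp/errpt-prompt.txt
$ ./llama-cli -m models/lfm2.5-1.2b.Q4_0.gguf -f /tmp/errpt-prompt.txt -n 256 -t 16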

RAG pipelines

Retrieval Augmented Generation over your internal documentation. Feed context from your vector store, generate answers locally.
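
A deliberately minimal sketch of the pattern, with plain grep standing in for a real vector store; every path, query, and filename here is illustrative.

# Retrieve a few relevant lines, then generate the answer locally
$ QUESTION="How do we rotate the audit logs?"
$ grep -ri "audit log" /docs/runbooks | head -40 > /tmp/context.txt
$ { echo "Answer using only this context:"; cat /tmp/context.txt; echo "Question: $QUESTION"; } > /tmp/rag-prompt.txt
$ ./llama-cli -m models/lfm2.5-1.2b.Q4_0.gguf -f /tmp/rag-prompt.txt -n 256 -t 16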

Document QA

Ask questions about internal reports, manuals, and knowledge bases. Get answers without sending data to any cloud.

On-prem assistants

Internal technical assistants that understand your infrastructure. Help desk, runbook automation, operational guidance.
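
One way to expose such an assistant on the LAN is llama.cpp's bundled HTTP server, which speaks an OpenAI-compatible API. Whether llama-server builds cleanly on AIX is an assumption here, and the host, port, and hostname are illustrative.

# Serve the model over HTTP for internal tools (assumes llama-server builds on AIX)
$ ./llama-server -m models/lfm2.5-1.2b.Q4_0.gguf --host 0.0.0.0 --port 8080 -t 16
# Query it from another machine on the network
$ curl http://aix-host:8080/v1/chat/completions -H "Content-Type: application/json" \
    -d '{"messages":[{"role":"user","content":"Which lssrc subsystems look abnormal?"}]}'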

Text analysis

Summarize logs, classify support tickets, extract entities from documents. Local text intelligence for operational data.

Platform

Native on AIX 7.3 POWER9+

Compiled with IBM Open XL C/C++ for AIX. No emulation layers. Direct POWER9+ binary execution.

AIX 7.3 · POWER9+ · GGUF models
quickstart.sh
# Clone and build on AIX 7.3
$ git clone https://gitlab.com/librepower/llama-aix.git
$ cd llama-aix

# Build with IBM Open XL C/C++
$ gmake -j64 CC=xlc CXX=xlC

# Optimize threading (sweet spot: 8 cores SMT-2)
$ smtctl -t 2 -w now

# Run Liquid AI's LFM2.5-1.2B (GGUF)
$ ./llama-cli \
    -m models/lfm2.5-1.2b.Q4_0.gguf \
    -p "Audit this AIX errpt log" \
    -n 256 -t 16

# ~27 tok/s | <750 MB | CPU-only ✓
Explore

LLMs on your Power infrastructure

Explore CPU-only LLM inference on AIX with Liquid AI's LFM2.5. No GPUs, no cloud, no hype — just engineering.