Llama on AIX
LLM inference on IBM AIX. CPU-only, no GPUs.
llama.cpp compiled for AIX 7.3 POWER9+. Running Liquid AI's LFM2.5 natively.
~27 tokens/s, <750 MB memory. An honest technical exploration of LLMs on Power.
Beyond the GPU hype.
Not every LLM workload needs a rack of GPUs. Llama-AIX explores where CPU-only inference makes sense — on the infrastructure you already have.
An honest technical exploration
Llama-AIX is a proof of concept for running LLM inference on IBM AIX 7.3 using only CPU and memory — no GPUs. It's llama.cpp compiled for POWER9+ with IBM Open XL C/C++, running Liquid AI's LFM2.5-1.2B model.
LFM2.5 is a hybrid architecture combining convolutional blocks for speed and attention layers for context, with a 128k token context window. On a POWER9 S924, it achieves ~27 tokens/s using just 8 cores in SMT-2 mode (16 threads) and less than 750 MB of memory. Small enough to run, smart enough to be useful for sysadmin tasks, RAG, and document QA.
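Before running, it is worth confirming the core count and SMT mode on the LPAR. A minimal sketch using standard AIX commands (the exact partition layout will differ on your system):

# Show the current SMT mode per processor
$ smtctl

# Show partition sizing: online virtual CPUs, entitlement, memory
$ lparstat -i

# Switch to SMT-2 if needed; -w now applies it immediately
$ smtctl -t 2 -w now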
What it is
- A technical PoC
- An honest exploration
- An engineering exercise
- Open source (GPLv3)
What it isn't
- Not a benchmark
- Not a complete AI platform
- Not competing with GPU solutions
- Not "AI marketing"
The architecture advantage.
Power's design philosophy — sustained throughput, massive memory bandwidth, hardware coherence — aligns naturally with LLM inference workloads.
~27 tokens/s with LFM2.5-1.2B
Liquid AI's hybrid architecture achieves real-world inference speeds on POWER9 CPU — fast enough for interactive sysadmin tasks and document QA.
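The throughput figure can be reproduced with llama-bench, which is built alongside llama-cli. A sketch, assuming the same model file and thread count as in the build example further down:

# Measure prompt-processing and generation speed on CPU only
$ ./llama-bench -m models/lfm2.5-1.2b.Q4_0.gguf -t 16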
8 cores SMT-2
<750 MB memory footprint
The quantized LFM2.5-1.2B model fits in under 750 MB. No GPU VRAM needed — just system memory on your existing Power.
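One way to verify the footprint on a live system is svmon; the PID lookup and the -O unit option here are illustrative:

# Find the running llama-cli process, then inspect its real memory usage
$ ps -ef | grep llama-cli
$ svmon -P <pid> -O unit=MB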
Q4_0 GGUF
128k token context window
LFM2.5's hybrid architecture (shortconv + GQA) supports 128k tokens — enough to read thousands of lines of logs without losing context.
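A sketch of feeding a long log dump through that context window; the file path and the -c value are assumptions to adapt to your workload, and larger contexts cost more memory and prompt-processing time:

# Analyze a long log dump with an enlarged context window
$ ./llama-cli \
    -m models/lfm2.5-1.2b.Q4_0.gguf \
    -c 32768 \
    -f /tmp/errpt-dump.txt \
    -n 512 -t 16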
hybrid arch
0 GPUs required
For lightweight models doing on-prem QA, sysadmin assistance, and semantic search — CPU and memory are enough.
CPU-only
0 new hardware investment
Many Power customers run stable, amortized, mission-critical infrastructure. Llama-AIX lets you explore AI on hardware you already own — no new CAPEX, no GPU procurement, no cloud bills.
existing infrastructure
Where CPU inference makes sense.
Not everything needs a GPU farm. These are the scenarios where lightweight, local LLM inference on Power adds real value.
AIX sysadmin assistant
Audit errpt logs, review /etc/passwd, analyze lssrc services. LFM2.5 provides actionable sysadmin guidance directly on your Power system.
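A minimal sketch of that workflow; the paths and the wording of the instruction are assumptions:

# Dump recent error-log entries, append an instruction, and let the model triage them
$ errpt -a | head -200 > /tmp/errpt.txt
$ printf '\nSummarize these AIX errpt entries and flag anything urgent:\n' >> /tmp/errpt.txt
$ ./llama-cli -m models/lfm2.5-1.2b.Q4_0.gguf -f /tmp/errpt.txt -n 256 -t 16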
RAG pipelines
Retrieval Augmented Generation over your internal documentation. Feed context from your vector store, generate answers locally.
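A rough sketch of the two llama.cpp pieces involved; the embedding model name and file paths are assumptions, and the vector store itself sits outside llama.cpp:

# 1) Embed queries and document chunks with a dedicated GGUF embedding model
$ ./llama-embedding -m models/embedding-model.Q4_0.gguf -p "How do I rotate errpt logs?"

# 2) Generate an answer from a prompt file holding the retrieved context plus the question
$ ./llama-cli -m models/lfm2.5-1.2b.Q4_0.gguf -f /tmp/question-plus-context.txt -n 256 -t 16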
Document QA
Ask questions about internal reports, manuals, and knowledge bases. Get answers without sending data to any cloud.
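For example, a manual excerpt plus a question can go into a single prompt file (the file names and the question are illustrative):

# Ask a question about an internal runbook, entirely on the local system
$ cat docs/backup-runbook.txt > /tmp/qa.txt
$ printf '\nQuestion: which mksysb options does this runbook require?\n' >> /tmp/qa.txt
$ ./llama-cli -m models/lfm2.5-1.2b.Q4_0.gguf -f /tmp/qa.txt -n 256 -t 16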
On-prem assistants
Internal technical assistants that understand your infrastructure. Help desk, runbook automation, operational guidance.
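One possible setup uses llama-server, which exposes an OpenAI-compatible HTTP endpoint that internal tools can call; the host, port, and sample request are assumptions:

# Serve the model locally on the LPAR
$ ./llama-server -m models/lfm2.5-1.2b.Q4_0.gguf -t 16 --host 127.0.0.1 --port 8080

# Query it from any internal tool (curl from the AIX Toolbox, for example)
$ curl -s http://127.0.0.1:8080/v1/chat/completions \
    -H 'Content-Type: application/json' \
    -d '{"messages":[{"role":"user","content":"Draft a runbook step for a hung NFS mount on AIX"}]}'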
Text analysis
Summarize logs, classify support tickets, extract entities from documents. Local text intelligence for operational data.
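A small sketch of ticket classification with a constrained prompt (the ticket text and categories are made up):

# Classify a support ticket and summarize it in one line
$ ./llama-cli -m models/lfm2.5-1.2b.Q4_0.gguf \
    -p "Classify this ticket as HARDWARE, STORAGE, NETWORK or OTHER, then summarize it in one line: hdisk3 reports a PVID mismatch after SAN migration" \
    -n 64 -t 16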
Native on AIX 7.3 POWER9+
Compiled with IBM Open XL C/C++ for AIX. No emulation layers. Direct POWER9+ binary execution.
# Clone and build on AIX 7.3
$ git clone https://gitlab.com/librepower/llama-aix.git
$ cd llama-aix

# Build with IBM Open XL C/C++
$ gmake -j64 CC=xlc CXX=xlC

# Optimize threading (sweet spot: 8 cores SMT-2)
$ smtctl -t 2 -w now

# Run Liquid AI's LFM2.5-1.2B (GGUF)
$ ./llama-cli \
    -m models/lfm2.5-1.2b.Q4_0.gguf \
    -p "Audit this AIX errpt log" \
    -n 256 -t 16

# ~27 tok/s | <750 MB | CPU-only ✓
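A quick, optional sanity check that the result is a native AIX binary and actually loads the model (output details depend on your toolchain):

# Confirm a native XCOFF binary for Power, then smoke-test it
$ file ./llama-cli
$ ./llama-cli -m models/lfm2.5-1.2b.Q4_0.gguf -p "Hello from AIX" -n 16 -t 16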
LLMs on your Power infrastructure
Explore CPU-only LLM inference on AIX with Liquid AI's LFM2.5. No GPUs, no cloud, no hype — just engineering.