Llama on AIX
LLM inference on IBM AIX. CPU-only, no GPUs.
llama.cpp compiled for AIX 7.3 POWER9+. Running Liquid AI's LFM2.5 natively.
~27 tokens/s, <750 MB memory. An honest technical exploration of LLMs on Power.
Beyond the GPU hype.
Not every LLM workload needs a rack of GPUs. Llama-AIX explores where CPU-only inference makes sense — on the infrastructure you already have.
An honest technical exploration
Llama-AIX is a proof of concept for running LLM inference on IBM AIX 7.3 using only CPU and memory — no GPUs. It's llama.cpp compiled for POWER9+ with IBM Open XL C/C++, running Liquid AI's LFM2.5-1.2B model.
LFM2.5 is a hybrid architecture combining convolutional blocks for speed and attention layers for context, with a 128k token context window. On a POWER9 S924, it achieves ~27 tokens/s using just 8 cores in SMT-2 mode (16 threads) and less than 750 MB of memory. Small enough to run, smart enough to be useful for sysadmin tasks, RAG, and document QA.
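Before running, it is worth confirming the core count and SMT mode on the LPAR. A minimal sketch using standard AIX commands (the exact partition layout will differ on your system):

# Show the current SMT mode per processor
$ smtctl

# Show partition sizing: online virtual CPUs, entitlement, memory
$ lparstat -i

# Switch to SMT-2 if needed; -w now applies it immediately
$ smtctl -t 2 -w now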
What it is
- A technical PoC
- An honest exploration
- An engineering exercise
- Open source (GPLv3)
What it isn't
- Not a benchmark
- Not a complete AI platform
- Not competing with GPU solutions
- Not "AI marketing"
The architecture advantage.
Power's design philosophy — sustained throughput, massive memory bandwidth, hardware coherence — aligns naturally with LLM inference workloads.
~27 tokens/s with LFM2.5-1.2B
Liquid AI's hybrid architecture achieves real-world inference speeds on POWER9 CPU — fast enough for interactive sysadmin tasks and document QA.
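The throughput figure can be reproduced with llama-bench, which is built alongside llama-cli. A sketch, assuming the same model file and thread count as in the build example further down:

# Measure prompt-processing and generation speed on CPU only
$ ./llama-bench -m models/lfm2.5-1.2b.Q4_0.gguf -t 16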
8 cores SMT-2
<750 MB memory footprint
The quantized LFM2.5-1.2B model fits in under 750 MB. No GPU VRAM needed — just system memory on your existing Power.
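One way to verify the footprint on a live system is svmon; the PID lookup and the -O unit option here are illustrative:

# Find the running llama-cli process, then inspect its real memory usage
$ ps -ef | grep llama-cli
$ svmon -P <pid> -O unit=MB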
Q4_0 GGUF
128k token context window
LFM2.5's hybrid architecture (shortconv + GQA) supports 128k tokens — enough to read thousands of lines of logs without losing context.
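A sketch of feeding a long log dump through that context window; the file path and the -c value are assumptions to adapt to your workload, and larger contexts cost more memory and prompt-processing time:

# Analyze a long log dump with an enlarged context window
$ ./llama-cli \
    -m models/lfm2.5-1.2b.Q4_0.gguf \
    -c 32768 \
    -f /tmp/errpt-dump.txt \
    -n 512 -t 16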
hybrid arch
0 GPUs required
For lightweight models doing on-prem QA, sysadmin assistance, and semantic search — CPU and memory are enough.
CPU-only
0 new hardware investment
Many Power customers run stable, amortized, mission-critical infrastructure. Llama-AIX lets you explore AI on hardware you already own — no new CAPEX, no GPU procurement, no cloud bills.
existing infrastructure
Where CPU inference makes sense.
Not everything needs a GPU farm. These are the scenarios where lightweight, local LLM inference on Power adds real value.
AIX sysadmin assistant
Audit errpt logs, review /etc/passwd, analyze lssrc services. LFM2.5 provides actionable sysadmin guidance directly on your Power system.
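A minimal sketch of that workflow; the paths and the wording of the instruction are assumptions:

# Dump recent error-log entries, append an instruction, and let the model triage them
$ errpt -a | head -200 > /tmp/errpt.txt
$ printf '\nSummarize these AIX errpt entries and flag anything urgent:\n' >> /tmp/errpt.txt
$ ./llama-cli -m models/lfm2.5-1.2b.Q4_0.gguf -f /tmp/errpt.txt -n 256 -t 16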
RAG pipelines
Retrieval Augmented Generation over your internal documentation. Feed context from your vector store, generate answers locally.
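A rough sketch of the two llama.cpp pieces involved; the embedding model name and file paths are assumptions, and the vector store itself sits outside llama.cpp:

# 1) Embed queries and document chunks with a dedicated GGUF embedding model
$ ./llama-embedding -m models/embedding-model.Q4_0.gguf -p "How do I rotate errpt logs?"

# 2) Generate an answer from a prompt file holding the retrieved context plus the question
$ ./llama-cli -m models/lfm2.5-1.2b.Q4_0.gguf -f /tmp/question-plus-context.txt -n 256 -t 16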
Document QA
Ask questions about internal reports, manuals, and knowledge bases. Get answers without sending data to any cloud.
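For example, a manual excerpt plus a question can go into a single prompt file (the file names and the question are illustrative):

# Ask a question about an internal runbook, entirely on the local system
$ cat docs/backup-runbook.txt > /tmp/qa.txt
$ printf '\nQuestion: which mksysb options does this runbook require?\n' >> /tmp/qa.txt
$ ./llama-cli -m models/lfm2.5-1.2b.Q4_0.gguf -f /tmp/qa.txt -n 256 -t 16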
On-prem assistants
Internal technical assistants that understand your infrastructure. Help desk, runbook automation, operational guidance.
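One possible setup uses llama-server, which exposes an OpenAI-compatible HTTP endpoint that internal tools can call; the host, port, and sample request are assumptions:

# Serve the model locally on the LPAR
$ ./llama-server -m models/lfm2.5-1.2b.Q4_0.gguf -t 16 --host 127.0.0.1 --port 8080

# Query it from any internal tool (curl from the AIX Toolbox, for example)
$ curl -s http://127.0.0.1:8080/v1/chat/completions \
    -H 'Content-Type: application/json' \
    -d '{"messages":[{"role":"user","content":"Draft a runbook step for a hung NFS mount on AIX"}]}'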
Text analysis
Summarize logs, classify support tickets, extract entities from documents. Local text intelligence for operational data.
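A small sketch of ticket classification with a constrained prompt (the ticket text and categories are made up):

# Classify a support ticket and summarize it in one line
$ ./llama-cli -m models/lfm2.5-1.2b.Q4_0.gguf \
    -p "Classify this ticket as HARDWARE, STORAGE, NETWORK or OTHER, then summarize it in one line: hdisk3 reports a PVID mismatch after SAN migration" \
    -n 64 -t 16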
Native on AIX 7.3 POWER9+
Compiled with IBM Open XL C/C++ for AIX. No emulation layers. Direct POWER9+ binary execution.
# Clone and build on AIX 7.3
$ git clone https://gitlab.com/librepower/llama-aix.git
$ cd llama-aix

# Build with IBM Open XL C/C++
$ gmake -j64 CC=xlc CXX=xlC

# Optimize threading (sweet spot: 8 cores SMT-2)
$ smtctl -t 2 -w now

# Run Liquid AI's LFM2.5-1.2B (GGUF)
$ ./llama-cli \
    -m models/lfm2.5-1.2b.Q4_0.gguf \
    -p "Audit this AIX errpt log" \
    -n 256 -t 16

# ~27 tok/s | <750 MB | CPU-only ✓
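A quick, optional sanity check that the result is a native AIX binary and actually loads the model (output details depend on your toolchain):

# Confirm a native XCOFF binary for Power, then smoke-test it
$ file ./llama-cli
$ ./llama-cli -m models/lfm2.5-1.2b.Q4_0.gguf -p "Hello from AIX" -n 16 -t 16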
LLMs on your Power infrastructure
Explore CPU-only LLM inference on AIX with Liquid AI's LFM2.5. No GPUs, no cloud, no hype — just engineering.