Build llama-cpp-python wheel for Astra Machina Board

This tutorial walks you through building and installing the llama-cpp-python wheel on the Synaptics Astra Machina SL Series board, enabling local LLM inference with llama.cpp using Python bindings.

Prerequisites

Ensure your Astra Machina board is running the Astra SDK OOBE image; if it is not, update to the OOBE image first. Get the latest SDK from here.

Step-by-Step Instructions

1. Clone the repo with submodules

git clone --recurse-submodules https://github.com/abetlen/llama-cpp-python.git
cd llama-cpp-python

The --recurse-submodules flag is essential to pull in the llama.cpp backend C++ code.
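If the repository was already cloned without that flag, the backend sources can still be fetched afterwards with a standard git command (not specific to this project):

```shell
# Fetch the llama.cpp submodule into vendor/llama.cpp after a plain clone
git submodule update --init --recursive
```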

2. Set up a Python virtual environment

python3 -m venv .venv 
source .venv/bin/activate

This keeps your environment clean and avoids polluting the system Python.
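To confirm the virtual environment is active before installing anything, you can compare the interpreter's prefix paths (a small stdlib check; it prints True inside a venv and False outside):

```shell
# sys.prefix differs from sys.base_prefix only inside a virtual environment
python3 -c 'import sys; print(sys.prefix != sys.base_prefix)'
```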


3. Install build tools for Python

pip install build wheel

These tools are needed to build the Python wheel locally from source.
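pip covers the Python side, but the wheel is compiled from the C++ sources in vendor/llama.cpp, so the board also needs CMake and a C/C++ toolchain. A quick presence check (install any tool it reports missing with the image's package manager):

```shell
# Report which native build tools are missing, if any
for tool in cmake gcc g++ make; do
    command -v "$tool" >/dev/null || echo "$tool not found"
done
```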

4. Build an optimized wheel for Astra

CMAKE_ARGS="-DLLAMA_NATIVE=ON" python -m build --wheel

This command:

  • Compiles the llama.cpp backend with native CPU optimizations for the board's Arm Cortex cores
  • Builds a .whl file in the dist/ directory
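Note that recent llama.cpp versions renamed their CMake options from the LLAMA_ prefix to GGML_; if -DLLAMA_NATIVE=ON appears to have no effect with the llama.cpp revision vendored in your checkout, the equivalent invocation is:

```shell
# Same build, using the newer GGML_-prefixed option name
CMAKE_ARGS="-DGGML_NATIVE=ON" python -m build --wheel
```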

5. Install the built wheel

After building, the wheel will be located in the dist/ directory inside your llama-cpp-python folder.

Example:

ls dist/
# Output:
# llama_cpp_python-0.3.16-cp312-cp312-linux_aarch64.whl
pip install dist/llama_cpp_python-0.3.16-cp312-cp312-linux_aarch64.whl

This will install the native-compiled Python binding and shared library.
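The tags in the wheel filename must match the Python on the board: cp312 is the CPython 3.12 interpreter and ABI, and linux_aarch64 is the platform. A small stdlib sketch showing how the example filename above decomposes:

```python
# Split a wheel filename into its distribution, version, and compatibility tags
wheel = "llama_cpp_python-0.3.16-cp312-cp312-linux_aarch64.whl"
dist, version, py_tag, abi_tag, plat_tag = wheel[: -len(".whl")].split("-")
print(py_tag, abi_tag, plat_tag)  # cp312 cp312 linux_aarch64
```

If pip reports the wheel "is not a supported wheel on this platform", these tags are the first thing to compare against `python3 --version` and `uname -m` on the board.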

Directory Structure After Build

llama-cpp-python/
├── llama_cpp/ ← Python wrapper
├── vendor/llama.cpp/ ← Submodule (C++ backend)
├── dist/ ← Built wheel file lives here
└── .venv/ ← Python virtual environment
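Once the wheel is installed, inference runs entirely on the board. A minimal usage sketch, assuming a GGUF-format model has already been downloaded to a hypothetical path:

```python
from llama_cpp import Llama

# Hypothetical model path; place any GGUF-format model on the board first
llm = Llama(model_path="/home/root/models/model-q4_k_m.gguf", n_ctx=512)

# Completion-style call; returns an OpenAI-like response dict
out = llm("Q: Name the planets in the solar system. A:", max_tokens=32)
print(out["choices"][0]["text"])
```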
