Build llama-cpp-python wheel for Astra Machina Board
This tutorial walks you through building and installing the llama-cpp-python wheel on the Synaptics Astra Machina SL Series board, enabling local LLM inference with llama.cpp using Python bindings.
Prerequisites
Ensure your Astra Machina board is running the Astra SDK OOBE image; if it is not, update to the OOBE image first. Get the latest SDK from here.
Step-by-Step Instructions
1. Clone the repo with submodules
git clone --recurse-submodules https://github.com/abetlen/llama-cpp-python.git
cd llama-cpp-python
The --recurse-submodules flag is essential to pull in the llama.cpp backend C++ code.
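If you already cloned without that flag, you do not need to re-clone; the standard git recovery is to initialize the submodules in place (run from inside the clone):

```shell
# Fetch the llama.cpp backend submodule into an existing clone.
# Safe to run even if the submodule is already present.
git submodule update --init --recursive
```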
2. Set up a Python virtual environment
python3 -m venv .venv
source .venv/bin/activate
This keeps your environment clean and avoids polluting the system Python.
3. Install build tools for Python
pip install build wheel
These tools are needed to build the Python wheel locally from source.
4. Build an optimized wheel for Astra
CMAKE_ARGS="-DLLAMA_NATIVE=ON" python -m build --wheel
This command:
- Uses the current CPU (ARM Cortex on Astra) to optimize the llama.cpp backend with -march=native
- Builds a .whl file in the dist/ directory
5. Install the built wheel
After building, the wheel will be located in the dist/ directory inside your llama-cpp-python folder.
Example:
ls -l dist/
# Output:
# llama_cpp_python-0.3.16-cp312-cp312-linux_aarch64.whl
pip install dist/llama_cpp_python-0.3.16-cp312-cp312-linux_aarch64.whl
This will install the native-compiled Python binding and shared library.
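The wheel filename encodes compatibility tags in a fixed scheme (PEP 427): package, version, Python tag, ABI tag, and platform tag. Before copying a wheel between boards, you can sanity-check that its Python tag matches the interpreter you will install it into. A small sketch, using the example filename above (illustrative only; substitute your actual wheel):

```python
# Parse the PEP 427 compatibility tags out of a wheel filename and compare
# the Python tag against the running interpreter. The filename is the
# example wheel from the build step above.
import sys

wheel = "llama_cpp_python-0.3.16-cp312-cp312-linux_aarch64.whl"
name, version, py_tag, abi_tag, plat_tag = wheel.removesuffix(".whl").split("-")

expected = f"cp{sys.version_info.major}{sys.version_info.minor}"
print(f"package {name} {version}: built for {py_tag}, running {expected}")
print(f"platform tag: {plat_tag}")  # linux_aarch64 on the 64-bit Astra board
```

If the tags disagree (e.g. a cp312 wheel on a Python 3.11 board), pip will refuse to install it; rebuild inside the matching interpreter's venv instead.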
Directory Structure After Build
llama-cpp-python/
├── llama_cpp/ ← Python wrapper
├── vendor/llama.cpp/ ← Submodule (C++ backend)
├── dist/ ← Built wheel file lives here
└── .venv/ ← Python virtual environment
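Once installed, the binding can be exercised with a short script. This is a minimal sketch: the model path below is a placeholder (no model ships with the wheel), so download a GGUF model to the board first, and set n_threads to match the Astra CPU core count:

```python
# Minimal inference sketch using the installed binding.
# "/path/to/model.gguf" is a placeholder -- point it at a real GGUF file.
from llama_cpp import Llama

llm = Llama(model_path="/path/to/model.gguf", n_ctx=2048, n_threads=4)
out = llm("Q: Name a prime number. A:", max_tokens=8)
print(out["choices"][0]["text"])
```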