Llama 3.2 for AI Assistants on Edge Devices

Meta recently announced Llama 3.2 with new lightweight text-only models (1B and 3B) designed specifically for edge devices! These include both pre-trained and instruction-tuned versions with a 128K token context length.
Building for Synaptics Astra SL1680
To run Llama 3.2 1B on the Synaptics Astra SL1680, follow these detailed steps:
1. Download the Llama 3.2 1B GGUF Model
- Visit the Llama-3.2-1B-Instruct-GGUF page on Hugging Face.
- Download a quantized model file, such as Llama-3.2-1B-Instruct-Q8_0.gguf. This version offers a good balance between performance and memory usage.
- Transfer the model file to your Astra SL1680 board. You can use scp (secure copy) from your local machine:
scp Llama-3.2-1B-Instruct-Q8_0.gguf root@astra-ip-address:/home/root/
Replace root and astra-ip-address with your board's username and IP.
2. Connect to Your Astra SL1680 Board
- Open a terminal on your development machine.
- Start an SSH session to the Astra SL1680:
ssh root@astra-ip-address
- Ensure you have internet access on the board for the next steps.
3. Build llama.cpp Natively
Next, build llama.cpp using these commands:
Once the llama-cli binary is built, run:
./build/bin/llama-cli -m ~/Llama-3.2-1B-Instruct-Q8_0.gguf -p "You are a helpful assistant" -cnv -c 4096 -b 512
llama.cpp will run in interactive mode, so you can chat just as you would with ChatGPT, but on the edge, and of course with a much smaller model, although it is surprisingly good and fast:
main: interactive mode on. sampler seed: 184923642
sampler params:
repeat_last_n = 64, repeat_penalty = 1.000, frequency_penalty = 0.000, presence_penalty = 0.000
top_k = 40, tfs_z = 1.000, top_p = 0.950, min_p = 0.050, typical_p = 1.000, temp = 0.800
mirostat = 0, mirostat_lr = 0.100, mirostat_ent = 5.000
sampler chain: logits -> logit-bias -> penalties -> top-k -> tail-free -> typical -> top-p -> min-p -> temp-ext -> softmax -> dist
generate: n_ctx = 4096, n_batch = 512, n_predict = -1, n_keep = 1
= Running in interactive mode. =
- Press Ctrl+C to interject at any time.
- Press Return to return control to the AI.
- To return control without starting a new line, end your input with '/'.
- If you want to submit another line, end your input with '\'.
system
You are a helpful assistant
> Who are Synaptics Inc?
Synaptics Inc. is a technology company that specializes in designing, manufacturing, and selling semiconductor solutions for various industries, including:
Memory Usage on Synaptics Astra SL1680
The table below shows memory consumption when running different Llama 3.2 models on the SL1680:
| Llama-3.2 Model | Memory (MB) | SL1680 Mem % |
|---|---|---|
| 1B-Instruct-Q4_K_M.gguf | 1393 | 41% |
| 1B-Instruct-Q8_0.gguf | 1882 | 55% |
| 3B-Instruct-Q4_K_M.gguf | 2845 | 83% |
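As a sanity check, the three rows above are mutually consistent: each implies roughly the same total usable memory (about 3.4 GB, an inferred figure rather than one from the datasheet). A quick calculation confirms this:

```python
# Memory figures (MB) and reported usage percentages from the table above
models = {
    "1B-Instruct-Q4_K_M.gguf": (1393, 41),
    "1B-Instruct-Q8_0.gguf": (1882, 55),
    "3B-Instruct-Q4_K_M.gguf": (2845, 83),
}

# Implied total usable memory = MB / (percent / 100)
for name, (mb, pct) in models.items():
    print(f"{name}: implied total ~ {mb / (pct / 100):.0f} MB")
```

All three rows work out to roughly 3400 MB, which suggests the percentages were measured against the same memory baseline.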
Further Reading
For those eager to explore further, here are some detailed, step-by-step tutorials to guide you through the process.