
Llama 3.2 for AI Assistants on Edge Devices

· 3 min read
Astra Team
Synaptics

Meta recently announced Llama 3.2 with new lightweight text-only models (1B and 3B) designed specifically for edge devices! These include both pre-trained and instruction-tuned versions with a 128K token context length.

Building for Synaptics Astra SL1680

To run Llama 3.2 1B on the Synaptics Astra SL1680, follow these detailed steps:

1. Download the Llama 3.2 1B GGUF Model

  • Visit the Llama-3.2-1B-Instruct-GGUF page on Hugging Face.
  • Download the quantized model file, such as Llama-3.2-1B-Instruct-Q8_0.gguf. This version offers a good balance between performance and memory usage.
  • Transfer the model file to your Astra SL1680 board. You can use scp (secure copy) from your local machine:
    scp Llama-3.2-1B-Instruct-Q8_0.gguf root@astra-ip-address:/home/root/
    Replace astra-ip-address with your board's IP address.
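Alternatively, if the board already has internet access, you can fetch the model directly on it with the Hugging Face CLI. This is a sketch, not the only way: the repository name below is an assumption, so substitute the actual Llama-3.2-1B-Instruct-GGUF repo you are using, and it assumes Python 3 and pip are available on the board:

```shell
# Install the Hugging Face CLI (assumes Python 3 and pip are available)
pip install -U "huggingface_hub[cli]"

# Download only the Q8_0 file straight to the home directory.
# The repo name is an assumption -- use the GGUF repo you actually chose.
huggingface-cli download bartowski/Llama-3.2-1B-Instruct-GGUF \
    Llama-3.2-1B-Instruct-Q8_0.gguf --local-dir ~/
```

This skips the intermediate copy on your development machine, which matters when the quantized file is over a gigabyte.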

2. Connect to Your Astra SL1680 Board

  • Open a terminal on your development machine.
  • Start an SSH session to the Astra SL1680:
    ssh root@astra-ip-address
  • Ensure you have internet access on the board for the next steps.

3. Build llama.cpp Natively

Next, build llama.cpp using these commands:
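A typical native CMake build looks like the following. This sketch assumes git and CMake are installed on the board; check the llama.cpp README for the current build instructions:

```shell
# Clone the llama.cpp repository
git clone https://github.com/ggerganov/llama.cpp
cd llama.cpp

# Configure and build in Release mode, using all available cores
cmake -B build -DCMAKE_BUILD_TYPE=Release
cmake --build build --config Release -j"$(nproc)"
```

The resulting binaries, including llama-cli, land in build/bin/.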

Once the llama-cli binary is built, run:

./build/bin/llama-cli -m ~/Llama-3.2-1B-Instruct-Q8_0.gguf -p "You are a helpful assistant" -cnv -c 4096 -b 512

llama.cpp will run in interactive mode, so you can chat just as you would with ChatGPT, but on the edge, and of course with a much smaller model, though it is surprisingly good and fast:

main: interactive mode on. sampler seed: 184923642
sampler params:
repeat_last_n = 64, repeat_penalty = 1.000, frequency_penalty = 0.000, presence_penalty = 0.000
top_k = 40, tfs_z = 1.000, top_p = 0.950, min_p = 0.050, typical_p = 1.000, temp = 0.800
mirostat = 0, mirostat_lr = 0.100, mirostat_ent = 5.000
sampler chain: logits -> logit-bias -> penalties -> top-k -> tail-free -> typical -> top-p -> min-p -> temp-ext -> softmax -> dist
generate: n_ctx = 4096, n_batch = 512, n_predict = -1, n_keep = 1
= Running in interactive mode. =
- Press Ctrl+C to interject at any time.
- Press Return to return control to the AI.
- To return control without starting a new line, end your input with '/'.
- If you want to submit another line, end your input with '\'.
system
You are a helpful assistant

> Who are Synaptics Inc?

Synaptics Inc. is a technology company that specializes in designing, manufacturing, and selling semiconductor solutions for various industries, including:
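Interactive chat is one way to use the model; llama.cpp also builds a llama-server binary alongside llama-cli, which exposes an OpenAI-compatible HTTP endpoint so other devices on your network can query the board. A minimal sketch, assuming the same model path as above:

```shell
# Start the server on the board (listens on port 8080 by default)
./build/bin/llama-server -m ~/Llama-3.2-1B-Instruct-Q8_0.gguf -c 4096 --host 0.0.0.0

# From another machine, query the OpenAI-compatible chat endpoint
curl http://astra-ip-address:8080/v1/chat/completions \
    -H "Content-Type: application/json" \
    -d '{"messages": [{"role": "user", "content": "Who are Synaptics Inc?"}]}'
```

This turns the SL1680 into a small self-hosted assistant backend rather than a single interactive session.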

Memory Usage on Synaptics Astra SL1680

The table below shows memory consumption when running different Llama 3.2 models on the SL1680:

| Llama-3.2 Model         | Memory (MB) | SL1680 Mem % |
| ----------------------- | ----------- | ------------ |
| 1B-Instruct-Q4_K_M.gguf | 1393        | 41%          |
| 1B-Instruct-Q8_0.gguf   | 1882        | 55%          |
| 3B-Instruct-Q4_K_M.gguf | 2845        | 83%          |
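These numbers roughly track the model file sizes plus runtime overhead. As a back-of-the-envelope check, you can estimate a GGUF file's footprint from the parameter count times the quantization's bits per weight; the figures below (about 1.23e9 parameters for the 1B model, roughly 8.5 bits per weight for Q8_0) are approximate assumptions, and actual memory use also includes the KV cache and runtime buffers:

```shell
# Rough weight-size estimate: parameters * bits-per-weight / 8 bytes.
# 1.23e9 params and 8.5 bits/weight (Q8_0) are approximate assumptions.
awk 'BEGIN { printf "%.0f MB\n", 1.23e9 * 8.5 / 8 / 1e6 }'
```

That puts the Q8_0 weights at roughly 1.3 GB, so the 1882 MB measured above is the weights plus the 4096-token context and runtime buffers.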

Further Reading

For those eager to explore further, here are some detailed, step-by-step tutorials to guide you through the process.