
Embeddings

Embedding models transform text, images, or other signals into vectors that represent meaning. Because these vectors preserve semantic relationships in a structured way, they are useful for edge AI tasks such as natural language understanding, semantic search, recommendation systems, and context awareness. Embedding models are encoder-only transformers (i.e. they are transformers, but not generative). This limitation gives them two useful properties for constrained edge AI applications compared to LLMs: the models are smaller, and they do not hallucinate.

Figure: the four words plotted on two feature axes, v1 (feature 1) and v2 (feature 2).

Word    v1    v2
Queen   0.0   1.0
King    1.0   1.0
Woman   0.0   0.0
Man     1.0   0.0

King - Man + Woman = Queen

Expressed as Python arrays:

embedding_vectors = {
    "King":  [1.0, 1.0],
    "Queen": [0.0, 1.0],
    "Man":   [1.0, 0.0],
    "Woman": [0.0, 0.0],
}

The vector arithmetic holds:

King - Man + Woman = Queen
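This relationship can be checked directly in Python with the toy two-dimensional vectors above (a minimal sketch; real embedding vectors have hundreds of dimensions, but the arithmetic is the same):

```python
embedding_vectors = {
    "King":  [1.0, 1.0],
    "Queen": [0.0, 1.0],
    "Man":   [1.0, 0.0],
    "Woman": [0.0, 0.0],
}

def vec_add(a, b):
    # Element-wise addition of two vectors.
    return [x + y for x, y in zip(a, b)]

def vec_sub(a, b):
    # Element-wise subtraction of two vectors.
    return [x - y for x, y in zip(a, b)]

# King - Man + Woman
result = vec_add(vec_sub(embedding_vectors["King"], embedding_vectors["Man"]),
                 embedding_vectors["Woman"])
print(result)  # [0.0, 1.0], which is the vector for "Queen"
```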

Tokens

LLMs and embedding models operate on numbers, not raw text. Before text reaches a language model, a tokenizer preprocesses it into an array of integers, each representing a word, subword, or character.
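As an illustration, here is a toy word-level tokenizer with a small hand-made vocabulary (hypothetical; real tokenizers such as GPT-4's learn subword vocabularies with tens of thousands of entries from large corpora):

```python
# A toy vocabulary mapping words to integer IDs (hypothetical; real
# tokenizers learn subword vocabularies rather than whole words).
vocab = {"<unk>": 0, "the": 1, "cat": 2, "sat": 3, "on": 4, "mat": 5}

def tokenize(text):
    # Split on whitespace and map each word to its ID,
    # falling back to the <unk> (unknown) token.
    return [vocab.get(word, vocab["<unk>"]) for word in text.lower().split()]

print(tokenize("The cat sat on the mat"))  # [1, 2, 3, 4, 1, 5]
```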

Installing llama-cpp-python

note

This assumes you are familiar with setting up your Astra board. If not, please refer to the setup tutorial.

info

This quick guide is compatible with all SL16xx boards. While inference performance may vary, the steps remain the same across all Astra SL-Series processors.

To run embeddings on your Astra board, we will use the llama-cpp-python package, which provides convenient Python bindings for Georgi Gerganov's llama.cpp. The wheel is already installed with requirements.txt (for Astra OOBE SDK 1.8 (kirkstone) and below) or requirements-py312.txt (for Astra OOBE SDK 2.0 (scarthgap) and above).

SQLite3 is required for certain AI model operations. Astra SDK OOBE v1.7 images and above ship with SQLite3 pre-installed. For earlier versions, install it using the following commands:

Embeddings on the edge

Let's try running a full text embedding model, sentence-transformers/all-MiniLM-L6-v2, one of the most popular models on Hugging Face. The model was fine-tuned on 1 billion sentence pairs extracted from sources including Reddit, Stack Overflow, and Yahoo! Answers.

Our code example runs the model with a string input and returns a 384-dimensional vector:

python3 -m embeddings.minilm "synaptics astra example!"

You will see a long list of numbers. In the next quick guide, AI Text Assistant, we will put these vectors to practical use by building a simple question-answering assistant.
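Embedding vectors become useful when compared. A common measure is cosine similarity: semantically similar texts produce vectors that point in similar directions. Below is a minimal pure-Python sketch, using toy 3-dimensional vectors in place of real 384-dimensional MiniLM outputs:

```python
import math

def cosine_similarity(a, b):
    # Cosine of the angle between two vectors: 1.0 means identical
    # direction, 0.0 means orthogonal (unrelated).
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

# Toy 3-dimensional stand-ins for real embedding outputs.
query = [0.9, 0.1, 0.2]
doc_similar = [0.8, 0.2, 0.1]    # close in meaning to the query
doc_unrelated = [0.1, 0.9, 0.8]  # different topic

print(cosine_similarity(query, doc_similar))    # close to 1.0
print(cosine_similarity(query, doc_unrelated))  # much lower
```

In a semantic search application, the same comparison is applied between a query vector and the stored vectors of many documents, and the highest-scoring documents are returned.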