At the Department of Information, Electronics and Telecommunications Engineering (DIET) of Sapienza University of Rome we fondly remember Prof. Elio Di Claudio, full professor of Circuit Theory and of several Master's degree courses, who passed away too soon. In his course "Sensor Arrays" he strongly advised students to study "Matrix Computations", the very influential book on numerical linear algebra first published in 1983 by Gene H. Golub and Charles F. Van Loan. In the vast landscape of artificial intelligence and deep learning, it is easy to overlook the foundational algorithms that silently power even the most advanced models. Yet beneath every Transformer, every Large Language Model, and every GPU-accelerated training loop, humble, decades-old operations are still doing the heavy lifting. Among these are SAXPY and GAXPY, terms that may sound obscure to non-specialists today, but which remain essential even in the age of ChatGPT and GPT-4.
"Matrix computation" remains an essential guide today in low-level routines that require advanced algebraic calculations and Machine Learning is full of it, as is generative AI. Hence, this book remains a reference point for anyone working with matrix algorithms, from theoretical researchers to engineers designing high-performance scientific computing systems.
Let's do a very brief review of the SAXPY and GAXPY operations.
SAXPY stands for Scalar A times X Plus Y, and it's a simple vector update operation. Formally, it computes

y ← α x + y

where α is a scalar and x and y are vectors of the same length.
This operation appears in Level 1 of the BLAS (Basic Linear Algebra Subprograms), and expresses one of the most frequent patterns in linear algebra: updating a vector based on another scaled vector. Here’s a basic Python implementation using NumPy:
import numpy as np

alpha = 2.0
x = np.array([1.0, 2.0, 3.0])
y = np.array([4.0, 5.0, 6.0])

# SAXPY update: y <- alpha * x + y
y = alpha * x + y
print(y)  # Output: [ 6.  9. 12.]
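The same update can also be delegated to the BLAS routine itself. A minimal sketch, assuming SciPy is installed, using its low-level wrapper scipy.linalg.blas.daxpy (the double-precision counterpart of SAXPY, with the scalar passed as the keyword a):

import numpy as np
from scipy.linalg.blas import daxpy  # double-precision AXPY from the BLAS library SciPy links against

alpha = 2.0
x = np.array([1.0, 2.0, 3.0])
y = np.array([4.0, 5.0, 6.0])

# z = alpha * x + y, computed by the optimized BLAS routine
z = daxpy(x, y, a=alpha)
print(z)  # [ 6.  9. 12.]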
GAXPY, on the other hand, generalizes this idea to matrices. It stands for Generalized A times X Plus Y and describes a column-oriented approach to matrix-vector multiplication. Instead of computing the dot product of each row of a matrix A with x, it accumulates the columns of A, each scaled by the corresponding entry of x:

y ← y + Σ_j x_j A(:, j)

where A is an m×n matrix, x is a vector of length n, and y is a vector of length m, so the update is equivalent to y ← A x + y. Here's a column-oriented implementation using NumPy:
A = np.array([[1, 2],
              [3, 4],
              [5, 6]])
x = np.array([10, 20])
y = np.zeros(3)

# GAXPY: accumulate the columns of A, each scaled by the corresponding entry of x
for j in range(len(x)):
    y += x[j] * A[:, j]  # one SAXPY per column

print(y)  # Output: [ 50. 110. 170.]
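As a quick sanity check (my own addition, reusing A, x and y from the snippet above), the row-oriented view, one dot product per row, and NumPy's built-in matrix-vector product give the same result as the column-oriented GAXPY loop:

# Row-oriented view: one dot product per row of A
y_rows = np.array([A[i, :] @ x for i in range(A.shape[0])], dtype=float)

# NumPy's built-in matrix-vector product
y_builtin = A @ x

print(y_rows)                     # [ 50. 110. 170.]
print(np.allclose(y, y_rows))     # True
print(np.allclose(y, y_builtin))  # True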
Now, you might be wondering: what do these old operations have to do with modern deep learning models like Transformers or GPT? The answer is—everything.
At the heart of every neural network layer, especially in Transformers, lie massive matrix multiplications. The attention mechanism alone involves computing

Attention(Q, K, V) = softmax(Q K^T / √d_k) V

and every matrix product in it is assembled, column by column, from scaled vector updates of the form y ← α x + y, which is essentially a SAXPY operation again. Deep learning frameworks like PyTorch, TensorFlow, and JAX don't expose SAXPY directly, but they all build on top of libraries that implement it. Under the hood, PyTorch uses cuBLAS for NVIDIA GPUs and MKL or OpenBLAS for CPUs. These libraries include high-performance versions of SAXPY, GEMV, GEMM, and related routines, and they are the building blocks of every forward and backward pass in neural networks.
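As a minimal sketch of how that attention formula reduces to plain matrix products (a single head, no masking or batching, random data chosen only for illustration), here it is in NumPy:

import numpy as np

def softmax(z, axis=-1):
    z = z - z.max(axis=axis, keepdims=True)  # subtract the max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

rng = np.random.default_rng(0)
seq_len, d_k = 4, 8
Q = rng.standard_normal((seq_len, d_k))
K = rng.standard_normal((seq_len, d_k))
V = rng.standard_normal((seq_len, d_k))

# Attention(Q, K, V) = softmax(Q K^T / sqrt(d_k)) V: two matrix products plus a row-wise softmax
scores = Q @ K.T / np.sqrt(d_k)
attention = softmax(scores, axis=-1) @ V
print(attention.shape)  # (4, 8)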
On GPUs, especially when training large models, these operations are optimized using techniques like kernel fusion. A single SAXPY might not be efficient on its own because it’s memory-bound, but when fused with other operations, or applied over millions of parameters in parallel, it becomes incredibly effective. Libraries like cuBLAS, XLA (in JAX), and Triton (used by some PyTorch kernels) apply massive parallelism and scheduling strategies to run these operations efficiently on thousands of GPU cores.
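To make the memory-bound point concrete, here is a small sketch in PyTorch (assuming it is installed; it illustrates only the avoidance of a temporary buffer via the in-place Tensor.add_ with its alpha argument, not the full kernel fusion performed by Triton or XLA):

import torch

alpha = 2.0
x = torch.tensor([1.0, 2.0, 3.0])
y = torch.tensor([4.0, 5.0, 6.0])

# Naive form: allocates a temporary tensor for alpha * x, then adds it to y
y_naive = alpha * x + y

# In-place AXPY-style update: y <- y + alpha * x, no extra temporary
y.add_(x, alpha=alpha)

print(y_naive)  # tensor([ 6.,  9., 12.])
print(y)        # tensor([ 6.,  9., 12.])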
So even though today’s machine learning models deal with billions of parameters and require massive compute, the core operations remain surprisingly simple. The genius lies in the layering, optimization, and orchestration—not in reinventing the algebra.
To understand modern AI, it’s worth remembering that every Transformer is still built on operations described in Matrix Computations. SAXPY and GAXPY are not relics of the past; they are the silent workhorses of today’s AI revolution.
As Golub and Van Loan reminded us decades ago, understanding these basic patterns is not only useful—it's essential. Because while the models have changed, the math hasn't.
---
This post is written in memory of Prof. Elio Di Claudio, who did not get to see the wonders of algebraic and matrix techniques at work in LLMs and generative AI: he passed away a few months before the fateful November 30, 2022, the date ChatGPT was launched to the general public. I am sure he would have studied these architectures and known them perfectly, and that he would have been available, as always, to systematize his already extensive technical knowledge.