CIPARLABS-research

Thursday, July 2, 2026

CIPARLABS Team at IEEE WCCI 2026

CIPARLABS took part in IEEE WCCI 2026 with a broad set of scientific contributions and organizational activities, reflecting the group’s research agenda across Computational Intelligence, machine learning, complex systems, smart grids, bioinformatics, and sustainable energy management.

Within the Special Session «Scientific ML and Bio-Physical Sensing», the group presented the paper «Decoding Functional Multiplicity: Graph Learning Approaches to Multifunctional Proteins». The work addresses the prediction of protein multifunctionality from three-dimensional molecular structures. Proteins are represented as residue contact networks, enabling the use of graph-based learning methods, including simplicial-complex embeddings, graph kernels, and Graph Neural Networks. The study frames the problem as a multi-label classification task over first-level Enzyme Commission classes and shows that topological structural representations can capture meaningful signals related to the functional multiplicity of proteins.

Two further contributions were presented in the Special Session «Smart Energy, Grid, and Infrastructure», including IJCNN SS35 «Computational Intelligence Techniques for Observable Smart Grid and Sustainable Energy Systems».

The first paper, «Forward–Forward Learning for Imbalanced Tabular Predictive Maintenance on a Real-World Smart-Grid Fault Dataset», investigates a stabilized Forward–Forward learning formulation for predictive maintenance in medium-voltage smart grids. The study evaluates layer-local learning on an imbalanced real-world fault dataset from the Rome power distribution grid, comparing it with standard tabular baselines and back-propagation-based MLPs. The results show that Forward–Forward learning is competitive in terms of PR-AUC and F1-score, while Random Forests remain the strongest overall baseline in the considered setting. The work also analyzes calibration, goodness-margin dynamics, stabilization mechanisms, and timing performance, offering a detailed view of the practical viability of Forward–Forward learning for smart-grid fault detection.

The second paper, «Simulation of Microgrid Energy Management under Battery Degradation Costs: a PPO-Based Reinforcement Learning Approach», focuses on residential microgrid energy management with photovoltaic generation and battery storage. The study introduces a degradation-aware simulation framework in which the battery system is modeled through a Battery Management System based on an equivalent circuit model with State-of-Health-dependent parameters. A Proximal Policy Optimization controller is trained to determine battery charge and discharge setpoints and is compared with a rule-based controller and an oracle Model Predictive Control benchmark. The results show that the learned policies outperform the rule-based baseline and approach the oracle MPC behavior in medium battery-utilization regimes, balancing short-term energy costs with long-term battery degradation.

Alongside the scientific presentations, CIPARLABS also contributed to the organization of IEEE WCCI 2026 through two Special Sessions.

The Special Session «AI for Energy and Resource Analytics», chaired by Enrico De Santis, included the session «Computational Intelligence and AI Applications for Sustainable Energy Management in Smart Grids and Energy Communities», CISEM. The session was organized by Enrico De Santis, Antonello Rizzi, and Danial Zendehdel from the Department of Information Engineering, Electronics and Telecommunications at Sapienza University of Rome.

Session page
https://sites.google.com/uniroma1.it/wcci-ijcnn-cisem2026/home-page

A second Special Session, «AICS: Computational Intelligence for Complex Systems», was organized by Alessio Martino, from the Department of AI, Data and Decision Sciences at LUISS University, together with Enrico De Santis and Antonello Rizzi, from the Department of Information Engineering, Electronics and Telecommunications at Sapienza University of Rome. The session gathered contributions devoted to the use of Computational Intelligence for modeling, analyzing, and interpreting complex systems.

Session page
https://sites.google.com/uniroma1.it/wcci-ijcnn-aics2026

Overall, the participation of CIPARLABS at IEEE WCCI 2026 highlights the group’s interdisciplinary research activity, ranging from graph learning for biological systems to predictive maintenance in power infrastructures and reinforcement learning for intelligent energy management. These contributions document the group’s commitment to developing advanced Computational Intelligence methods for scientific, technological, and industrial problems with significant societal and environmental impact.

Conference Proceedings can be downloade here.

See you at IJCNN 2027 in South Africa!

Saturday, April 18, 2026

The CIPARLABS team participates in the annual Open Diet 2026 event at the Faculty of Engineering, Department of Information Engineering, Electronics and Telecommunications (DIET), University of Rome "La Sapienza"

The CIPARLABS team is participating in the annual Open Diet 2026 event. This event is held at the Faculty of Engineering, Department of Information Engineering, Electronics and Telecommunications (DIET), University of Rome "La Sapienza".

It's an opportunity to meet high school students or undergraduates and showcase both the educational offerings and the research conducted in the laboratories by the various research teams. CIPARLABS strongly believes in this event because, in addition to disseminating knowledge, it also provides a team-building opportunity and a meeting point for all the DIET staff participating.

Our stand presents the Pattern Recognition and Computational Intelligence courses offered for the Master's Degree in Electronic Engineering and the Master's Degree in Telecommunications Engineering.

We presented a Renewable Energy Community simulator with a graphical interface that provides a simple understanding of how Artificial Intelligence and optimization techniques can be used to manage energy flows, piloting renewable sources and energy storage. The tool was developed for educational purposes.

See you next year at Open Diet 2027!

Thursday, March 12, 2026

Beyond Perplexity: A Multi-Faceted Analysis of a Novel Densely Connected Transformer

We are pleased to announce the publication of our new paper, Beyond Perplexity: A Multi-Faceted Analysis of a Novel Densely Connected Transformer. This study examines a central question in contemporary language modeling research. If a Transformer decoder is redesigned so that each layer can access the representations produced by all previous layers, can this richer internal connectivity improve performance in a meaningful way?

The idea is appealing for a clear reason. In several areas of deep learning, dense connectivity has been associated with improved feature reuse, shorter information paths, and potentially more favorable optimization behavior. In our paper, we test whether this intuition also holds for decoder-only autoregressive Transformers, the architectural family underlying many modern large language models.

To address this question rigorously, we designed a methodology aimed at isolating the effect of connectivity from other factors that often confound architectural comparisons. We compared a standard baseline Transformer decoder with a densely connected decoder on two well-known benchmarks for language modeling, Penn Treebank and WikiText-2. The comparison was carried out under two controlled fairness regimes. In the first, both model families were evaluated under the same training recipe, with shared optimization settings and learning-rate search. In the second, the comparison was constrained by the same parameter budget, so that the dense model could not exceed the baseline in parameter count. This distinction was important because it allowed us to separate the possible effect of dense historical connectivity from the simpler effect of adding more capacity.

The experimental setup was complemented by a precise implementation choice. Both datasets were processed with word-level tokenization, using official train, validation, and test splits, and all perplexity values were computed consistently within that vocabulary space. The two architectures shared the same general decoder-only scaffold, while differing in their internal organization. The baseline followed the standard residual Transformer design, whereas the dense variant introduced concatenation-based historical connections followed by learned projection, allowing each layer to reuse information from earlier layers more directly.

The paper was guided by three main research questions

Does dense historical connectivity improve test perplexity compared with a standard Transformer decoder under controlled comparison regimes?
Which architectural factors matter most within the explored design space, including model width, feed-forward size, depth, and number of attention heads?
Do dense and baseline models generate texts with different long-range structural signatures, even when standard predictive metrics do not show a clear advantage?

The answer that emerges is both interesting and methodologically instructive. Dense connectivity does not lead to a systematic reduction in perplexity. On WikiText-2, the baseline remains stronger in both fairness regimes. On Penn Treebank, the gains of the dense model are limited and depend on the comparison setting. This matters because it shows that an architectural idea may be theoretically plausible and still fail to deliver a robust practical advantage once tested under controlled conditions.

A particularly relevant result comes from the ablation study. Within the dense family, the most reliable improvements are associated with depth and feed-forward capacity, rather than with dense connectivity alone. This suggests that much of the observed performance variation is better explained by how model capacity is allocated than by the presence of cross-layer concatenation in itself. In other words, the study helps clarify that dense connectivity interacts with more fundamental architectural factors rather than replacing them as the main driver of performance.

Another important aspect of the work lies in the decision to evaluate the models beyond perplexity. Perplexity remains the standard metric for next-token prediction, but it does not capture every relevant aspect of generated language. For this reason, the paper also includes analyses of learning dynamics, attention behavior, targeted probes, and long-form text generation. The probing tasks and attention diagnostics do not reveal a clear linguistic advantage for the dense architecture in the explored setting, although they do highlight behavioral differences between the two model families.

One of the most original contributions of the paper is the use of Zipf–RQA for analyzing generated text. This framework combines Zipf-rank encoding with Recurrence Quantification Analysis in order to study long-range structural regularities in long-form outputs. Here, the results become especially interesting. Even when perplexity does not improve, the dense and baseline models show systematic structural differences in the organization of generated text. This suggests that architectural changes may alter the global form of language generation even when they do not produce better scores on standard predictive metrics.

From a broader perspective, this is the main message of the article. Evaluating a language model through a single headline number is rarely sufficient for understanding what an architecture is actually doing. A richer methodology, one that combines predictive performance, internal diagnostics, and structural analysis of generated text, can reveal differences that would otherwise remain invisible.

Overall, this publication offers a controlled and transparent contribution to Transformer research. Rather than presenting densification as a simple improvement, it shows where its limits emerge, which design factors matter most, and why multi-faceted evaluation is necessary for understanding architectural innovation in language models. For our lab, this work reflects a broader research direction devoted to studying neural language models as complex systems, whose behavior deserves to be analyzed from several complementary viewpoints.

Please cite as:

De Santis, E., Martino, A., & Rizzi, A. (2026). Beyond Perplexity: A Multi-Faceted Analysis of a Novel Densely Connected Transformer. Applied Sciences, 16(6), 2721. https://doi.org/10.3390/app16062721

BibTeX:

@article{deSantis2026BeyondPerplexity,
author = {De Santis, Enrico and Martino, Alessio and Rizzi, Antonello},
title = {Beyond Perplexity: A Multi-Faceted Analysis of a Novel Densely Connected Transformer},
journal = {Applied Sciences},
year = {2026},
volume = {16},
number = {6},
pages = {2721},
doi = {10.3390/app16062721},
url = {https://doi.org/10.3390/app16062721}
}

Sunday, October 26, 2025

Degradation-Aware Reinforcement Learning for Smarter Energy Management at IJCCI 2025, Marbella (Spain)

Towards Sustainable and Intelligent Microgrids

At the International Joint Conference on Computational Intelligence (IJCCI 2025), held in Marbella (Spain) from 22 to 24 October, CIPARLABS presented new research on degradation-aware energy management for residential microgrids. The paper, authored by Danial Zendehdel, Gianluca Ferro, Enrico De Santis, and Antonello Rizzi, introduces a reinforcement learning framework that intelligently manages battery usage to balance economic efficiency and battery longevity.

Residential microgrids and Renewable Energy Communities are redefining the energy landscape by promoting decentralized and cooperative energy production. Within these systems, lithium-ion batteries play a crucial role in storing excess solar power and ensuring supply continuity. Yet, their performance is constrained by degradation processes that reduce both capacity and lifespan.

Traditional control strategies often neglect this degradation, focusing only on short-term cost optimization. The work presented by the CIPARLABS team moves beyond that limitation by embedding battery health awareness directly into the control policy.

Reinforcement Learning Meets Battery Physics

The study proposes a Reinforcement Learning (RL) framework based on the Soft Actor-Critic (SAC) algorithm, implemented with Stable Baselines3, to learn optimal energy dispatch policies in real time.

What sets this approach apart is the explicit integration of battery State of Health (SoH) feedback into the agent’s learning loop. The RL agent learns to maximize long-term economic rewards while implicitly minimizing degradation, guided by a simplified but empirically calibrated degradation model derived from NASA’s Li-ion cell datasets.

The microgrid environment modeled in the study includes photovoltaic generation, household consumption profiles, and time-of-use electricity tariffs. The RL agent decides, at every timestep, whether to charge or discharge the battery or exchange energy with the grid. Its performance was benchmarked against a Model Predictive Control (MPC) strategy based on Mixed-Integer Linear Programming.

Learning to Preserve Energy and Battery Life

Simulations were conducted using high-resolution load and solar data from the Pecan Street Inc. Dataport. Over both one-year and ten-year scenarios, the RL-based controller demonstrated remarkable improvements over the MPC baseline.

For a typical household, the RL agent extended battery life by up to 6.3 % compared to the MPC benchmark.
It reduced energy purchased from the grid by 45–60 %, while maintaining or improving economic performance.
Over long-term simulations, the degradation-aware SAC agent lowered total battery wear cost by 6.4 %, reflecting more efficient use of the storage system without compromising availability.

These outcomes reveal that the RL framework not only optimizes daily dispatch decisions but also learns non-linear, context-dependent policies that capture the intricate balance between short-term gain and long-term sustainability.

Implications and Future Work

The results suggest a promising pathway for deploying AI-driven, degradation-aware control systems in residential and community microgrids. Such systems could operate autonomously, adapting to changing conditions and maximizing both user savings and battery lifespan.

The research team plans to extend this work through real-world validation and multi-agent reinforcement learning experiments, where multiple prosumers within a Renewable Energy Community coordinate energy exchanges. The framework is being developed within the MOST – Sustainable Mobility Center and supported by European Union Next-GenerationEU funding under the Italian PNRR program.

Danial Zendehdel at IJCCI 2025

This work continues CIPARLABS’ mission to merge computational intelligence and sustainable energy research, paving the way for smarter, more resilient energy ecosystems.

Cite as:

@inproceedings{Zendehdel2026IJCCI,
author = {Danial Zendehdel and Gianluca Ferro and Enrico De Santis and Antonello Rizzi},
title = {Degradation-Aware Energy Management in Residential Microgrids: A Reinforcement Learning Framework},
booktitle = {Proceedings of the 17th International Joint Conference on Computational Intelligence (IJCCI 2025)},
year = {2026},
address = {Marbella, Spain},
month = {October 22--24},
publisher = {SCITEPRESS -- Science and Technology Publications},
keywords = {Reinforcement Learning, Battery Management System, Energy Management, Lithium-ion Batteries, Degradation Modeling, Microgrids},
note = {(Presented at IJCCI 2025, Marbella, Spain)},
}

Tuesday, July 8, 2025

CIPAR Labs at IJCNN 2025: Shaping the Future of AI and Complex Systems in Rome

From June 30 to July 5, 2025, our team had the honor of actively contributing to the International Joint Conference on Neural Networks (IJCNN 2025), held this year in the heart of Rome, at the prestigious Pontifical Gregorian University. With nearly 2,000 attendees from all over the world, IJCNN 2025 confirmed its role as one of the premier venues for advancing neural network theory, applications, and interdisciplinary dialogue in AI.

While the Roman summer heat proved intense, the conference was intellectually energizing. Our team from CIPAR Labs (Department of Information Engineering, Electronics and Telecommunications, Sapienza University of Rome) took part in a variety of roles: as organizers, speakers, and authors of innovative research at the intersection of AI, smart grids, and complexity science.

One of the standout moments of IJCNN 2025 was the keynote speech by Samy Bengio, Head of AI and Machine Learning at Apple Inc., and a foundational figure in deep learning research. In his talk, Bengio offered a nuanced and thought-provoking perspective on the current state and limitations of large language models (LLMs). While expressing healthy skepticism about their reasoning capabilities, he also highlighted their immense untapped potential, especially when trained on carefully structured and sufficiently rich datasets. After the keynote, we had the opportunity to engage in a stimulating conversation with him, discussing the future trajectories of neural architectures and the balance between performance and interpretability. It was a meaningful exchange that reaffirmed how even at the frontiers of AI, curiosity and critical thinking remain essential drivers of innovation.

Organizing CISEM 2025: Computational Intelligence for Sustainable Energy Management

We proudly organized the CISEM 2025 Workshop — Computational Intelligence for Sustainable Energy Management in Microgrids and Renewable Energy Communities. The workshop addressed the growing demand for intelligent, interpretable, and efficient energy systems, presenting novel techniques for:

Smart grid optimization and predictive maintenance
Time-series forecasting
Multi-agent systems in decentralized energy infrastructures
Integration of Explainable AI (XAI) and cybersecurity frameworks

The workshop attracted a multidisciplinary audience and was praised for its balance between theory and real-world applications. Contributions came from both academia and industry, reinforcing our commitment to bridging AI research and sustainable energy innovation.

Thanks to Giulia Tanoni (Università Politecnica delle Marche (UNIVPM)) for her tutorial by the title: "Unlocking the potential of industrial non-intrusive load monitoring and future opportunities".

Co-Chairing the AICS Special Session: Artificial Intelligence and Complex Systems

Another milestone was co-organizing the first edition of the AICS Special Session – Artificial Intelligence and Complex Systems. This track explored deep connections between:

Complexity theory and neural computation
Emergence, stochastic dynamics, and agent-based modeling
Philosophical, linguistic, and cognitive perspectives on AI systems
Foundations of intelligence and model explainability

The AICS session stood out for its interdisciplinary ambition, offering a space where cognitive science, philosophy, and deep learning could meet. We are excited to evolve this session into a full workshop at next year’s conference.

Tutorial: Predictive Maintenance Meets Industry 5.0

During the DLT-4-OSGE Workshop (Deep Learning Techniques for Observable Smart Grid and Sustainable Energy Systems), Prof. Enrico De Santis delivered the tutorial:

“Machine Learning Techniques for Predictive Maintenance, Energy Efficiency and Industry 5.0 Applications”

The session focused on:

Practical AI architectures for maintenance in industrial and energy systems
Use of interpretable models (like, Clustering, KANs and SHAP) for actionable insight
Applications in smart buildings, plants, and Renewable Energy Communities

The tutorial generated vibrant discussions and established the relevance of XAI in high-stakes energy contexts.

Research Papers: Seven Contributions Across Three Tracks

DLT-4-OSGE Workshop

On a Fast and Explainable REC HEMS Based on Kolmogorov-Arnold Networks
Capillo, De Santis, Rizzi
KANs optimized with Genetic Algorithms for REC energy management, delivering high accuracy and 10% faster inference with explainable output.
Decision Focused Forecasting for Smart Grid Energy Management Systems
Ferro, De Santis, Capillo, Rizzi
End-to-end LSTM forecasting via Decision Focused Learning, reducing operational costs by 11% over standard pipelines.
Multi-Objective Battery Dispatching using an Enhanced SAC Algorithm
Zendehdel, De Santis, Capillo, Odonkor, Rizzi
A Lagrangian-penalized SAC algorithm to optimize battery dispatch, improving self-sufficiency ratio (SSR) by 18.2%.

CISEM 2025 Workshop

Graph-Augmented LSTM with Weighted Loss for Enhanced Energy Forecasting
Taghdisi Rastkar, Jamili, De Santis, Rizzi
Incorporates a GAT-based feature graph and peak-aware loss for accurate, interpretable load forecasting during critical periods.
A KAN–SHAP Framework for Fault Detection and Analysis in Smart Grids
De Santis, Ferro, Rizzi
Combines Kolmogorov–Arnold Networks with SHAP to classify grid faults with ROC AUC 0.993, enabling actionable fault interpretation and predictive maintenance in the ACEA MV grid.

AICS Special Session

LSTM in Recursive Feedback Loops: A Study on Textual Evolution and Complexity
De Santis, Martino, Ronci, Rizzi
Theoretical study of semantic self-organization in recursive LSTM generations, analyzing how complexity and feedback impact text evolution.
2025: A GPT Odyssey. Deconstructing Intelligence by Gradual Dissolution of a Transformer
De Santis, Martino, Bruno, Rizzi
A HAL 9000-inspired investigation: sequential ablation of GPT-2 layers reveals semantic and structural roles in generative performance, opening paths to explainability in large language models.

Final Reflections

Beyond the papers and presentations, what made IJCNN 2025 so memorable was the spirit of collaboration: colleagues from Brazil, UAE, Spain, UK, and Italy shared not only research, but laughter, meals, and vision.

In a world often divided, the conference served as a reminder that scientific cooperation transcends borders, and that intelligence – whether artificial or human – grows best through connection.

We’re grateful to all our co-authors, collaborators, organizers, and peers who contributed to this vibrant experience.

See you next year in Maastricht at WCCI 2026!

Friday, May 9, 2025

Solar and Wind Forecasting: Rethinking the Future with a Multi-Site Mindset

A review of solar and wind energy forecasting: From single-site to multi-site paradigm

The global energy transition is no longer a distant vision — it’s unfolding now, rapidly, and with it comes a crucial question: how do we predict the unpredictable? When it comes to solar and wind energy, forecasting isn’t just a technical detail; it’s a keystone of modern energy systems. In a new paper just published in Applied Energy, Alessio Verdone, Massimo Panella, Enrico De Santis, and Antonello Rizzi from Sapienza University of Rome offer a timely and in-depth exploration of how forecasting methods have evolved to meet this challenge.

Their work is more than just a literature review. It’s a methodological study around the transformation of forecasting paradigms, tracing the field’s progress from early single-site statistical models to the latest deep learning architectures that analyze spatio-temporal data from entire networks of plants. The authors shed light on how our understanding — and our tools — have shifted alongside the growing complexity of renewable energy infrastructures.

At the heart of the paper is a simple but powerful idea: renewable energy production is no longer a local matter. Today’s systems consist of distributed solar panels and wind farms spread across vast areas. By treating each site in isolation, we miss the chance to capture valuable correlations between them. This is where multi-site forecasting comes into play, allowing models to learn not just from the past of a single plant, but from the coordinated behavior of many. And thanks to innovations in machine learning—particularly Graph Neural Networks, Transformers, and hybrid architectures — we now have the tools to make this possible.

The paper is rich with insights. It offers a structured classification of forecasting methods and benchmarks, highlights the most commonly used datasets (and the difficulty in accessing reliable public data), and discusses the metrics used to evaluate performance. But what makes this work stand out is the authors’ critical perspective. They don’t just describe methods — they ask what works, what doesn’t, and why. Their analysis of how spatial and temporal data can be integrated to boost performance speaks directly to current needs in grid management and renewable energy communities.

For researchers, engineers, and energy planners, this review is a valuable resource. It connects the dots between methodological innovation and practical application, offering a clear picture of where the field stands and where it’s heading. More importantly, it invites readers to think systemically: to see renewable energy forecasting not as a single algorithmic task, but as a complex, multi-layered problem with implications for sustainability, policy, and technology.

If you’re interested in the intersection of AI and energy, or if you’re working on Smart Grids or Renewable Energy Communities, forecasting tools, or the design of future energy systems, this is a paper worth diving into.

Read the full paper here!

Monday, March 31, 2025

SAXPY and GAXPY: The Algebra Behind Modern AI

At the Department of Information, Electronics and Telecommunications Engineering (DIET) at Sapienza University of Rome we fondly remember Prof. Elio Di Claudio, full professor of Circuit Theory and Master's degree courses, who passed away too soon. In the course "Sensor Arrays" he used to strongly advise students to study the so-called "Matrix computation", a very influential book in numerical algebra from 1983, by Gene H. Golub and Charles F. Van Loan. In the vast landscape of artificial intelligence and deep learning, it's easy to overlook the foundational algorithms that silently power even the most advanced models. But beneath every Transformer, every Large Language Model, and every GPU-accelerated training loop, there are humble, decades-old operations still doing the heavy lifting. Among these are SAXPY and GAXPY — terms that may sound obscure today for non-specialists, but which remain essential even in the age of ChatGPT and GPT-4.

"Matrix computation" remains an essential guide today in low-level routines that require advanced algebraic calculations and Machine Learning is full of it, as is generative AI. Hence, this book remains a reference point for anyone working with matrix algorithms, from theoretical researchers to engineers designing high-performance scientific computing systems.

Let's do a very brief review on SAXPY and GAXPY operations.

SAXPY stands for Scalar A times X Plus Y, and it's a simple vector update operation. Formally, it computes:

$$ y := \alpha x + y $$

Where $x$ and $y$ are vectors of the same length, and $\alpha$ is a scalar. In component form:

$$ y_i := \alpha x_i + y_i \quad \text{for all } i $$

This operation appears in Level 1 of the BLAS (Basic Linear Algebra Subprograms), and expresses one of the most frequent patterns in linear algebra: updating a vector based on another scaled vector. Here’s a basic Python implementation using NumPy:

import numpy as np

alpha = 2.0
x = np.array([1, 2, 3])
y = np.array([4, 5, 6])

y = alpha * x + y
print(y)  # Output: [6. 9. 12.]

GAXPY, on the other hand, generalizes this idea to matrices. It stands for Generalized A times X Plus Y and describes a column-oriented approach to matrix-vector multiplication. Instead of computing the dot product of each row of a matrix $A$ with a vector $x$, as in the standard GEMV approach, GAXPY computes:

$$ y := \sum_j x_j a_j $$

Where $a_j$ is the j-th column of the matrix $A$, and $x_j$ is the j-th component of the vector $x$. Each iteration is essentially a SAXPY operation. Here's a small Python example:

A = np.array([[1, 2],
              [3, 4],
              [5, 6]])

x = np.array([10, 20])
y = np.zeros(3)

for j in range(len(x)):
    y += x[j] * A[:, j]

print(y)  # Output: [ 50. 110. 170.]

Now, you might be wondering: what do these old operations have to do with modern deep learning models like Transformers or GPT? The answer is—everything.

At the heart of every neural network layer, especially in Transformers, lie massive matrix multiplications. The attention mechanism alone involves computing $QK^T$, applying softmax, and then computing the result of that with $V$. These are all dense matrix-matrix or matrix-vector multiplications. When we train models, gradients are computed via backpropagation, and the parameter updates—such as in SGD or Adam—apply operations that are, at their core, vector updates of the form:

$$ \theta := \theta - \alpha \nabla_\theta L $$

Which is essentially a SAXPY operation again. Deep learning frameworks like PyTorch, TensorFlow, and JAX don't expose SAXPY directly, but they all build on top of libraries that implement it. Under the hood, PyTorch uses cuBLAS for NVIDIA GPUs and MKL or OpenBLAS for CPUs. These libraries include high-performance versions of SAXPY, GEMV, GEMM, and related routines. These are the building blocks of every forward and backward pass in neural networks.

On GPUs, especially when training large models, these operations are optimized using techniques like kernel fusion. A single SAXPY might not be efficient on its own because it’s memory-bound, but when fused with other operations, or applied over millions of parameters in parallel, it becomes incredibly effective. Libraries like cuBLAS, XLA (in JAX), and Triton (used by some PyTorch kernels) apply massive parallelism and scheduling strategies to run these operations efficiently on thousands of GPU cores.

So even though today’s machine learning models deal with billions of parameters and require massive compute, the core operations remain surprisingly simple. The genius lies in the layering, optimization, and orchestration—not in reinventing the algebra.

To understand modern AI, it’s worth remembering that every Transformer is still built on operations described in Matrix Computations. SAXPY and GAXPY are not relics of the past; they are the silent workhorses of today’s AI revolution.

As Golub and Van Loan reminded us decades ago, understanding these basic patterns is not only useful—it's essential. Because while the models have changed, the math hasn't.

---

This post is written in memory of Prof. Elio Di Claudio for not having been able to see the wonders of algebraic and matrix techniques in LLMs and generative AI, as he passed away a few months before the fateful November 30, 2022, the launch date of ChatGPT to the general public. I am sure that he would have studied and known these architectures perfectly and would have been available, as always, to systematize his already extensive technical knowledge.

Visit ou main website