Saturday, October 12, 2024

Understanding Time Complexity in Machine Learning: Training vs. Testing Phases


Machine Learning (ML) algorithms are the foundation of modern artificial intelligence applications, ranging from image recognition to predictive modeling. Whether you're building a machine learning model to recommend products or forecast energy consumption, every ML algorithm goes through two critical phases: the training phase and the testing (or inference) phase. The time it takes for an algorithm to complete these phases can vary greatly, and this is where time complexity comes into play. In this blog post, we will break down these two phases and delve into how to interpret time complexity formulas for common ML algorithms using Big "O" notation.

1. The Training Phase vs. the Testing Phase in Machine Learning

In any supervised machine learning workflow, we can identify two main phases: training and testing. These phases are distinct but complementary, each playing a vital role in building and using a predictive model.

  • Training Phase: This is where the algorithm learns from the data. During training, the model is fed data (input) along with the corresponding labels (output), and it optimizes its parameters to minimize the error between predicted and actual outputs. This phase can be computationally expensive as it requires the algorithm to process a large amount of data and adjust its internal parameters accordingly. The training complexity refers to how long it takes the algorithm to build this model.

  • Testing Phase (Inference): Once the model is trained, it is tested on new, unseen data to evaluate its performance. The goal of the testing phase is to use the trained model to make predictions. The complexity of the testing phase often determines how fast the model can provide predictions in real-time.

Understanding the computational complexity in both phases helps in optimizing the choice of algorithm depending on the problem at hand, the size of the data, and the need for real-time predictions.

2. What is Complexity? Big "O" Notation Explained

Time complexity is a way to describe how the computational resources required by an algorithm scale as the size of the input increases. The most common notation for describing time complexity is Big "O" notation. Big "O" focuses on the upper bound, describing the worst-case scenario for how an algorithm behaves as the input size grows. In the complexity formulas used throughout this post, the symbols have the following meanings:

  • n: Typically represents the size of the dataset, i.e., the number of training examples.
  • p: Represents the number of features (or dimensions) in each data point.
  • T: The number of trees in ensemble methods like Random Forest or Gradient Boosting.
  • l: The number of iterations (often seen in iterative algorithms like K-Means or neural networks).
  • k: The number of clusters (for K-Means) or the number of neighbors (for K-Nearest Neighbors).
  • m: The number of components (for methods like Principal Component Analysis).

For example, if an algorithm has a time complexity of O(n^2), it means that doubling the size of the input data approximately quadruples the time it takes to run.
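
To make the quadratic scaling concrete, here is a minimal Python sketch (our own toy example, not tied to any library): a pairwise double loop over n items performs n^2 operations, so doubling n should roughly quadruple the measured time. The helper name pairwise_sum and the sizes are arbitrary choices for illustration.

    import time

    def pairwise_sum(data):
        total = 0.0
        for x in data:            # n iterations
            for y in data:        # n iterations each, so n^2 pairs in total
                total += x * y
        return total

    for n in (1_000, 2_000):
        values = list(range(n))
        start = time.perf_counter()
        pairwise_sum(values)
        print(f"n={n}: {time.perf_counter() - start:.3f}s")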

3. Complexity of Common Machine Learning Algorithms

Let's now explore the time complexity of several popular machine learning algorithms, both for the training and testing phases, to give you a clearer understanding of how to interpret these complexities.

Linear Regression

  • Training Time: O(np^2 + p^3)
  • Inference Time: O(p)

Explanation: During training, Linear Regression solves a system of linear equations, often by inverting a matrix. Matrix inversion contributes the O(p^3) term, while forming the normal equations requires O(np^2), which depends on the number of data points n and the number of features p. Once the model is trained, making a prediction (inference) only requires a dot product between the input features and the learned weights, which has a complexity of O(p).
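
As a rough illustration of where these terms come from, the following NumPy sketch solves the normal equations on synthetic data (the sizes n and p are arbitrary assumptions): forming X^T X costs O(np^2), solving the p-by-p system costs O(p^3), and a prediction is a single O(p) dot product.

    import numpy as np

    rng = np.random.default_rng(0)
    n, p = 10_000, 50                      # assumed synthetic sizes
    X = rng.normal(size=(n, p))
    y = rng.normal(size=n)

    # Training: forming X^T X costs O(n p^2); solving the p x p system costs O(p^3).
    gram = X.T @ X
    w = np.linalg.solve(gram, X.T @ y)

    # Inference: one dot product per sample, O(p).
    x_new = rng.normal(size=p)
    y_pred = x_new @ w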

Logistic Regression

  • Training Time: O(np^2 + p^3)
  • Inference Time: O(p)

Explanation: Logistic Regression uses iterative methods such as gradient descent or the Newton-Raphson method to optimize its cost function. The complexity is similar to Linear Regression because each Newton-style iteration requires O(np^2), and solving the resulting linear system can take O(p^3). For inference, the complexity remains O(p), as predicting the class probability also involves a dot product.
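
For intuition, here is a hedged sketch of plain gradient descent for logistic regression on assumed data: each iteration costs O(np) for the gradient, and the total training cost scales with the number of iterations; a Newton-Raphson variant would add the O(p^3) cost of the Hessian solve noted above. The learning rate and iteration count are arbitrary.

    import numpy as np

    def sigmoid(z):
        return 1.0 / (1.0 + np.exp(-z))

    def fit_logistic(X, y, lr=0.1, n_iters=100):
        # Plain gradient descent: O(n p) per iteration, repeated n_iters times.
        n, p = X.shape
        w = np.zeros(p)
        for _ in range(n_iters):
            preds = sigmoid(X @ w)            # O(n p)
            grad = X.T @ (preds - y) / n      # O(n p)
            w -= lr * grad
        return w

    def predict_proba(x, w):
        return sigmoid(x @ w)                 # O(p) per sample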

Naive Bayes

  • Training Time: O(np)
  • Inference Time: O(p)

Explanation: Naive Bayes assumes conditional independence among features, making the training process very efficient. Each feature's probability is calculated individually, resulting in a linear complexity of O(np). For inference, it computes the posterior probability for each class, which requires iterating over all features, hence the complexity O(p).
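
A minimal Gaussian Naive Bayes sketch (one of several Naive Bayes variants, with our own helper names) shows where the O(np) training cost comes from: a single pass over the data per class to collect per-feature statistics, and an O(p) scan over features to score a new point.

    import numpy as np

    def fit_gaussian_nb(X, y):
        # One pass per class over the n x p matrix: O(n p) overall.
        params = {}
        for c in np.unique(y):
            Xc = X[y == c]
            params[c] = (
                Xc.mean(axis=0),              # per-feature mean
                Xc.var(axis=0) + 1e-9,        # per-feature variance (smoothed)
                len(Xc) / len(X),             # class prior
            )
        return params

    def log_posterior(x, params):
        # Scoring one point iterates over all p features per class: O(p).
        scores = {}
        for c, (mu, var, prior) in params.items():
            log_lik = -0.5 * np.sum(np.log(2 * np.pi * var) + (x - mu) ** 2 / var)
            scores[c] = np.log(prior) + log_lik
        return scores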

Decision Tree

  • Training Time: O(n log n) (average), O(n^2) (worst)
  • Inference Time: O(log n) (average), O(n) (worst)

Explanation: Building a Decision Tree involves splitting the data recursively, and each split requires sorting or scanning the data, which has an average complexity of O(n log n). For balanced trees this results in O(n log n) training complexity, but in the worst case (unbalanced trees) it can degrade to O(n^2). During inference, making a prediction involves traversing the tree from root to leaf, which takes O(log n) for balanced trees but can be as bad as O(n) for unbalanced trees.
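
The sketch below illustrates the inference side only: prediction walks a single root-to-leaf path, so its cost is proportional to the tree depth (about log n for a balanced tree). The Node layout is our own assumption for illustration, not a library API.

    from dataclasses import dataclass
    from typing import Optional

    @dataclass
    class Node:
        feature: int = 0
        threshold: float = 0.0
        value: Optional[float] = None          # set only on leaf nodes
        left: Optional["Node"] = None
        right: Optional["Node"] = None

    def predict(node, x):
        # One comparison per level: O(depth), i.e. O(log n) for a balanced tree.
        while node.value is None:
            node = node.left if x[node.feature] <= node.threshold else node.right
        return node.value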

Random Forest

  • Training Time: O(T · n log n)
  • Inference Time: O(T · log n)

Explanation: Random Forest is an ensemble of decision trees, and each tree is built with complexity O(n log n). Since there are T trees in the forest, the overall training complexity is O(T · n log n). Inference time is also proportional to the number of trees, as predictions must be aggregated from each tree, giving O(T · log n).
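
As a quick empirical check, one could vary n_estimators (the T above) in scikit-learn's RandomForestClassifier and observe that both fitting and prediction times grow roughly linearly with it. The dataset below is synthetic and the sizes are arbitrary assumptions.

    import time
    import numpy as np
    from sklearn.ensemble import RandomForestClassifier

    rng = np.random.default_rng(0)
    X = rng.normal(size=(5_000, 20))           # synthetic data, arbitrary sizes
    y = (X[:, 0] > 0).astype(int)

    for T in (10, 100):
        model = RandomForestClassifier(n_estimators=T, random_state=0)
        start = time.perf_counter()
        model.fit(X, y)                        # roughly T times one tree's cost
        fit_time = time.perf_counter() - start
        start = time.perf_counter()
        model.predict(X)                       # aggregates T tree traversals
        pred_time = time.perf_counter() - start
        print(f"T={T}: fit {fit_time:.2f}s, predict {pred_time:.2f}s")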

Gradient Boosted Trees

  • Training Time: O(T · n log n)
  • Inference Time: O(T · log n)

Explanation: Similar to Random Forest, Gradient Boosted Trees train T decision trees iteratively. However, each tree is trained on the residuals of the previous ones. The training complexity remains O(T · n log n), and inference also involves traversing each of the T trees, resulting in O(T · log n).

Principal Component Analysis (PCA)

  • Training Time: O(np^2 + p^3)
  • Inference Time: O(pm)

Explanation: PCA computes the covariance matrix, which has complexity O(np^2), and then performs an eigenvalue decomposition, which costs O(p^3). After training, projecting a data point onto the top m principal components requires O(pm), making inference relatively efficient.
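
The following NumPy sketch, on synthetic data with arbitrary sizes, mirrors the three costs mentioned above: building the covariance matrix (O(np^2)), its eigendecomposition (O(p^3)), and the O(pm) projection of a new point onto the top m components.

    import numpy as np

    rng = np.random.default_rng(0)
    n, p, m = 5_000, 30, 5                     # assumed synthetic sizes
    X = rng.normal(size=(n, p))

    mean = X.mean(axis=0)
    X_centered = X - mean
    cov = (X_centered.T @ X_centered) / (n - 1)               # O(n p^2)
    eigvals, eigvecs = np.linalg.eigh(cov)                     # O(p^3)
    components = eigvecs[:, np.argsort(eigvals)[::-1][:m]]    # top-m directions

    x_new = rng.normal(size=p)
    projection = (x_new - mean) @ components                   # O(p m) per point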

K-Nearest Neighbors (K-NN)

  • Training Time: O(1)
  • Inference Time: O(np)

Explanation: K-NN does not have an explicit training phase; it simply stores the dataset. For inference, K-NN computes the distance between the query point and all n stored data points, with each distance calculation requiring O(p) operations, leading to a total complexity of O(np).
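
A brute-force K-NN query can be sketched in a few lines (assuming integer class labels; the helper name knn_predict is our own): the distance computation against all n stored points dominates and costs O(np).

    import numpy as np

    def knn_predict(X_train, y_train, x_query, k=5):
        # Distances to all n stored points, each costing O(p): O(n p) total.
        dists = np.linalg.norm(X_train - x_query, axis=1)
        nearest = np.argsort(dists)[:k]                  # indices of the k closest
        return np.bincount(y_train[nearest]).argmax()    # majority vote (integer labels)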

K-Means

  • Training Time: O(l · k · n · p)
  • Inference Time: O(k · p)

Explanation: K-Means is an iterative clustering algorithm where l is the number of iterations, k is the number of clusters, n is the number of data points, and p is the number of features. Each iteration involves computing distances from every point to every centroid and updating the centroids, resulting in O(l · k · n · p) complexity. Inference involves assigning a new point to the nearest centroid, which has complexity O(k · p).
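
A single K-Means iteration can be sketched as follows (our own helper names; empty clusters are ignored for simplicity): the n-by-k distance matrix over p features is the O(k · n · p) step repeated for l iterations, while assigning one new point costs O(k · p).

    import numpy as np

    def kmeans_step(X, centroids):
        # (n, k) distance matrix over p features: the O(k n p) step of each iteration.
        dists = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        new_centroids = np.array([X[labels == j].mean(axis=0)
                                  for j in range(len(centroids))])
        return labels, new_centroids

    def assign(x, centroids):
        # Inference: compare one point against the k centroids, O(k p).
        return np.linalg.norm(centroids - x, axis=1).argmin()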

Dense Neural Networks

  • Training Time: O(l · n · p · h)
  • Inference Time: O(p · h)

Explanation: Training a dense neural network involves repeated forward and backward passes through the network, with h representing the number of hidden units. Each pass computes activations and gradients for every example, giving O(l · n · p · h) complexity. During inference, only a single forward pass is required, leading to O(p · h) complexity.
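
For a one-hidden-layer network, the inference cost is dominated by the p-by-h and h-by-1 matrix products, as in this sketch with arbitrary sizes and an arbitrarily chosen ReLU activation.

    import numpy as np

    rng = np.random.default_rng(0)
    p, h = 20, 64                              # assumed input and hidden sizes
    W1, b1 = rng.normal(size=(p, h)), np.zeros(h)
    W2, b2 = rng.normal(size=(h, 1)), np.zeros(1)

    def forward(x):
        hidden = np.maximum(0, x @ W1 + b1)    # O(p h)
        return hidden @ W2 + b2                # O(h)

    y_hat = forward(rng.normal(size=p))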

 

Reading & watching

  • Andrew Ng - An Introduction to Machine Learning
  • GeeksforGeeks Team - Understanding Time Complexity in Algorithms
  • Dheeraj Singh Tomar - Big O Notation in Machine Learning
  • Marc Peter Deisenroth, A. Aldo Faisal, and Cheng Soon Ong - Mathematics for Machine Learning
  • Analytics Vidhya Team - Understanding Random Forest
  • Scikit-learn Documentation - K-Means Clustering Explained



    Thursday, October 10, 2024

    Artificial Intelligence and Complex Systems series: Intelligence at the Edge of Chaos — Featured paper


    Authors: Shiyang Zhang, Aakash Patel, Syed A Rizvi, Nianchen Liu, Sizhuang He, Amin Karbasi, Emanuele Zappala, and David van Dijk.

    The paper "Intelligence at the Edge of Chaos" by Zhang et al. (link) explores how intelligence in artificial systems may emerge not from exposure to inherently intelligent data but through interactions with complex environments. By studying Elementary Cellular Automata (ECA), a class of rule-based systems capable of generating diverse behaviors, the authors investigate how system complexity influences the performance of large language models (LLMs) in reasoning and prediction tasks.

    The key finding of the paper is the identification of an "edge of chaos" — a critical point between complete order and randomness — where models trained on systems with moderate complexity exhibit superior intelligence. When LLMs are exposed to data generated by ECA rules that balance structure and unpredictability, the models develop more sophisticated reasoning abilities compared to those trained on simple or highly chaotic systems. This finding suggests that complex systems can foster the emergence of intelligence even when the data itself lacks inherent cognitive structure.

    The research demonstrates that complexity plays a critical role in shaping the learning process. Models trained on systems with optimal complexity learn to leverage historical information, displaying non-trivial solutions to problems. This principle could be applied across various domains in AI, highlighting the importance of complexity analysis in understanding how intelligence emerges both in machines and natural systems.

    The paper contributes to our understanding of emergent intelligence in AI, suggesting that a balanced exposure to complexity is key for developing models that can generalize, reason, and perform effectively on tasks that require adaptability. It provides a framework for further research on how computational systems can replicate cognitive-like behaviors, making it a significant work for those interested in complexity theory, cognitive science, and artificial intelligence.


    Bibliography

    1. Zhang, S., Patel, A., Rizvi, S.A., Liu, N., He, S., Karbasi, A., Zappala, E., van Dijk, D. (2024). Intelligence at the Edge of Chaos. arXiv.
    2. Langton, C.G. (1990). Computation at the edge of chaos. Physica D: Nonlinear Phenomena.

    Wednesday, October 9, 2024

    AI's Cognitive Revolution: A Nobel Prize for Deep Learning Pioneers

     


    In 2024, a profound moment in the history of artificial intelligence was marked by the awarding of the Nobel Prize in Physics to two pioneers in the field: Geoffrey Hinton and John Hopfield. Their recognition signals both the impact of their work and the broader "cognitive revolution" that AI has catalyzed in recent years. This revolution involves a paradigm shift in how we understand intelligence, cognition, and the potential of machines to replicate—or even surpass—human intellectual abilities.

     

    The Cognitive Revolution and the Nobel Prize

    The awarding of the Nobel Prize in Physics to Hinton and Hopfield represents a deep acknowledgment of how AI is transforming fields traditionally associated with human cognition. For centuries, physics was viewed as the study of the material universe. Today, with the incorporation of machine learning and neural networks into various scientific disciplines, the boundaries of physics have expanded to include the study of artificial systems that mimic cognitive processes.

    The contributions of Hinton and Hopfield have driven this revolution forward. John Hopfield, who is primarily recognized as a statistical physicist, created the Hopfield network in 1982—a system that uses physical concepts like energy minimization to model associative memory and pattern recognition. This breakthrough demonstrated that neural networks could be used to simulate cognitive functions, such as memory retrieval, and allowed physicists and AI researchers to view brain-like computation from a physical and energetic perspective.

    Meanwhile, Geoffrey Hinton, often referred to as the "Godfather of AI," is known for revolutionizing deep learning. Hinton’s work on backpropagation, a learning algorithm that enables multi-layer neural networks to adjust their internal parameters, has been one of the most influential in modern AI. His innovations, particularly in convolutional neural networks and the later development of capsule networks, have enabled machines to achieve remarkable performance in image recognition, language understanding, and many other tasks. These systems, built upon functional models of biological neural networks, now outperform classical AI systems, which relied on more rigid, rule-based algorithms.

    The recognition of AI through the lens of the Nobel Prize symbolizes its critical role in reshaping both science and society. As Geoffrey Hinton stated after winning the award, advancements in neural networks will likely have an influence on humanity comparable to the Industrial Revolution.

     

    The Careers and Contributions of Hinton and Hopfield

    The lives and careers of both Geoffrey Hinton and John Hopfield are deeply intertwined with the development of modern AI, though they approach it from distinct scientific disciplines.

    Geoffrey Hinton was born in London in 1947 into a family of intellectuals. He pursued his studies in cognitive psychology before transitioning into computer science, a journey that ultimately led him to work on artificial neural networks. Hinton’s early work on distributed representations laid the groundwork for representing concepts within neural networks as patterns of activity across many units. His development of backpropagation in the 1980s, along with David Rumelhart and Ronald Williams, allowed neural networks to be trained on large datasets by adjusting internal weights—a method that became the cornerstone of deep learning.

    One of Hinton’s most significant contributions came in the 2010s with the success of AlexNet, a deep neural network developed by his student Alex Krizhevsky, which won the 2012 ImageNet competition. AlexNet’s success demonstrated the power of deep learning to solve complex tasks in computer vision. This breakthrough, in turn, inspired a wave of research that advanced both AI technology and applications.

    Hinton’s work in deep learning goes beyond technical advances; it represents a shift from classical AI approaches based on logic and rules to systems that learn from experience. More recently, Hinton has worked on Capsule Networks and the Forward-Forward Algorithm, innovations that address some of the limitations of current deep learning models by improving how neural networks handle spatial hierarchies and learn in more biologically plausible ways.

     Backpropagation scheme

    Hinton also contributed to the development and popularization of the backpropagation algorithm, a fundamental method for training multi-layer neural networks. This algorithm, introduced in the influential 1986 paper "Learning Representations by Back-Propagating Errors", co-authored with David Rumelhart and Ronald J. Williams, revolutionized machine learning by enabling networks to adjust their weights based on the error of their predictions. This breakthrough laid the foundation for modern deep learning techniques and has been essential in advancing artificial intelligence applications.
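
    As a rough illustration of the idea (not the original 1986 formulation), the following NumPy sketch trains a one-hidden-layer network on synthetic data: a forward pass, a squared-error signal, and gradients propagated backwards through the chain rule to update the weights. The sizes and the learning rate are arbitrary assumptions.

        import numpy as np

        rng = np.random.default_rng(0)
        X = rng.normal(size=(100, 3))          # toy inputs
        y = rng.normal(size=(100, 1))          # toy targets
        W1, W2 = rng.normal(size=(3, 8)), rng.normal(size=(8, 1))
        lr = 0.01

        for _ in range(200):
            # Forward pass
            h = np.tanh(X @ W1)
            y_hat = h @ W2
            # Backward pass: propagate the error through the chain rule
            err = (y_hat - y) / len(X)                       # mean squared-error gradient
            grad_W2 = h.T @ err
            grad_W1 = X.T @ ((err @ W2.T) * (1 - h ** 2))    # tanh'(z) = 1 - tanh(z)^2
            W1 -= lr * grad_W1
            W2 -= lr * grad_W2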

     

    John Hopfield, born in 1933, trained as a physicist but became a leader in computational neuroscience. His 1982 paper, “Neural Networks and Physical Systems with Emergent Collective Computational Abilities,” introduced the Hopfield network, a recurrent neural network that could store memories and retrieve them based on incomplete information. His approach, rooted in statistical physics, introduced the idea that memory retrieval could be framed as an energy minimization problem, where the system seeks the state of lowest energy that corresponds to a stored memory.

    Hopfield’s contributions are essential to understanding how physical systems, like the brain, can perform complex computational tasks. His work laid the groundwork for many models of neural computation that are still in use today, influencing both AI and neuroscience.

    Example of Hopfield network

    Hopfield networks are a form of recurrent neural network designed to model associative memory. These networks consist of a set of neurons that are fully connected to each other and use binary threshold units. Hopfield's key insight was to apply concepts from statistical physics to demonstrate that the network could store and retrieve memories as stable states, known as attractors, through a process of energy minimization. When presented with partial or noisy inputs, the network can converge to a stored memory by minimizing its energy function, simulating how the brain might retrieve incomplete memories. This model provided foundational insights into both neural computation and machine learning and remains influential in studying memory, optimization problems, and collective computation.
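
    A minimal sketch of the standard Hopfield formulation (our own toy code, with arbitrary sizes): binary ±1 patterns are stored with a Hebbian rule, and asynchronous threshold updates drive the energy function downhill until the state settles into a stored attractor.

        import numpy as np

        def train_hopfield(patterns):
            # Hebbian storage: patterns is (num_patterns, n) with entries in {-1, +1}.
            n = patterns.shape[1]
            W = patterns.T @ patterns / n
            np.fill_diagonal(W, 0)             # no self-connections
            return W

        def energy(W, s):
            return -0.5 * s @ W @ s

        def recall(W, state, steps=200, seed=0):
            # Asynchronous updates: each flip can only lower (or keep) the energy.
            rng = np.random.default_rng(seed)
            s = state.copy()
            for _ in range(steps):
                i = rng.integers(len(s))
                s[i] = 1 if W[i] @ s >= 0 else -1
            return s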


    Understanding Complex Systems

    The contributions of Geoffrey Hinton and John Hopfield have significantly advanced our understanding of complex systems by providing models that capture the emergent behavior of neural networks, which are key examples of such systems. Hopfield's work on associative memory networks, grounded in statistical physics, revealed how collective computational abilities emerge from interconnected neurons, pushing the boundaries of neuroscience and computational modeling. Similarly, Hinton’s breakthroughs in deep learning have shown how complex patterns can be learned and generalized from data, reflecting the dynamic nature of biological cognition. These models, inspired by the brain’s neural architecture, have paved the way for AI to solve complex tasks, from image recognition to decision-making.

    The multidisciplinary approaches of both scientists are crucial to their success. Hopfield’s background in physics, combined with computational and biological insights, and Hinton’s expertise in psychology, cognitive science, and computer science, exemplify the power of integrating knowledge across fields. This cross-disciplinary perspective has allowed them to bridge abstract theory with practical applications, transforming AI into a field that addresses complex, real-world problems.

     

    The Sapienza School of Neural Networks

    The history of neural networks is not only a global story but also has deep roots in specific academic institutions. For example, Sapienza University of Rome, where CIPAR LABS are located, played a pioneering role in neural network research thanks to Prof. Giuseppe Martinelli, who spearheaded work on neural networks in the 1980s. Under his guidance, the circuital approach to neural networks was developed further, treating these systems not as abstract algorithms but as models of electrical circuits with nonlinear components. This approach predated the widespread use of neural networks in computer science and laid the foundation for viewing AI through the lens of electrical engineering.

    The circuital approach emphasizes the physical realizability of neural networks, focusing on how circuits can model the behavior of neurons in a way that is both efficient and scalable. In this tradition, the Department of Information Engineering, Electronics, and Telecommunications (DIET) at Sapienza continues to produce leading research in AI and neural networks. Many of the current professors in the department were trained by Martinelli, and they continue to carry forward this tradition of blending theoretical advances with practical, circuit-based models of neural computation.

    This tradition underscores the idea that before neural networks became algorithmic abstractions, they were rooted in models of real-world systems, providing a physical grounding for what has since become a dominant paradigm in computer science and AI.

     

      Fondamenti di Reti Neurali - Giuseppe Martinelli (courtesy of Prof. Fabio Massimo Frattale Mascioli)


    Key Papers and Video Materials of Hinton and Hopfield

    While both Hinton and Hopfield have written extensively on AI and neural networks, some of their most influential works include:

     

    Bibliography

    Monday, August 26, 2024

    A summary of the European AI Act

     

     

    The advent of generative AI marks a profound "cognitive revolution", transforming how we interact with technology and unlocking new possibilities across various domains. As a game-changing technology, generative AI, exemplified by models like GPT and DALL-E, has captured public attention by demonstrating capabilities in generating text, images, and even complex problem-solving that mimic human creativity. This surge in public awareness was fueled by several high-profile developments, including the release of ChatGPT, which showcased AI's potential in everyday applications and sparked widespread discussions about its implications.

    Generative AI differs from traditional AI in its ability to create novel content rather than simply analyzing or categorizing existing data. While traditional AI focuses on recognizing patterns and making decisions based on predefined rules or data, generative AI uses complex algorithms, such as deep learning and neural networks, to produce new, original outputs, such as text, images, or music, that were not explicitly programmed.

    The impact of generative AI is expected to be vast, influencing fields such as education, healthcare, entertainment, and content creation. It will drive innovation in these sectors, but also raises important questions about ethics, ownership, and the potential for misuse. Given its transformative power, it is crucial to establish a robust governance framework to guide the responsible development and deployment of generative AI technologies, ensuring they benefit society while minimizing risks.

     Governing the development and deployment of AI through regulatory institutions like the EU is essential to ensuring that these powerful technologies are aligned with public interest, ethical standards, and human rights. Regulatory oversight helps mitigate risks such as bias, misuse, and privacy violations, which can have far-reaching consequences in an AI-driven society. By setting clear guidelines and standards, institutions like the EU can foster innovation while ensuring that AI systems are transparent, accountable, and safe. This governance is crucial not only for protecting citizens but also for maintaining global competitiveness and leadership in the responsible development of AI technologies.

    The European AI Act, which officially came into force on August 1, 2024, marks a significant milestone in the global regulation of artificial intelligence. As the first comprehensive AI regulation worldwide, this Act aims to establish a harmonized legal framework across the European Union, addressing the potential risks associated with AI while fostering innovation.

    Motivation and Objectives

    The AI Act was motivated by the need to manage the dual challenges of promoting AI's economic benefits while safeguarding fundamental rights, health, and safety. The regulation adopts a risk-based approach, classifying AI systems into categories such as minimal risk, high risk, and unacceptable risk. This classification determines the level of regulatory scrutiny and compliance required.

    Key Features

    1. Risk-Based Regulation: AI systems are categorized based on the potential risk they pose. High-risk systems, such as those used in critical infrastructure or law enforcement, must meet strict requirements, including transparency, human oversight, and rigorous testing. Unacceptable risk AI systems, such as those enabling social scoring by governments, are outright banned.

    2. Transparency and Accountability: The Act requires that AI systems provide clear information to users, particularly when the AI interacts directly with individuals, ensuring users are aware they are engaging with AI and can understand the decision-making process.

    3. Support for Innovation: To balance regulation with innovation, the Act includes provisions for regulatory sandboxes, where developers can test AI technologies under the supervision of regulatory bodies. This fosters innovation while ensuring compliance with EU standards.

    4. Global Impact: The AI Act's extraterritorial application ensures that any AI system used within the EU, regardless of where it was developed, must comply with EU regulations. This positions the EU as a global leader in AI governance, influencing international standards and practices.

    The Act's phased implementation, with key obligations for high-risk AI systems becoming enforceable by 2026, provides time for businesses and governments to adapt. This legislation is seen as a crucial step in ensuring that AI development is both innovative and aligned with fundamental European values.

    For further information, please consult the following sources:

    European Commission, Goodwin.

    We will now provide a summary of the main parts of the document (“AI Act”) issued by the European institutions.


    1. Title and Preamble

    Title:

    • The document is titled the "Artificial Intelligence Act," officially labeled as "Regulation (EU) 2024/... of the European Parliament and of the Council laying down harmonized rules on artificial intelligence (AI) and amending certain Union Legislative Acts."


    Preamble:

    • The preamble provides the legal basis and the motivation for the regulation. It states that the purpose of the Act is to establish a uniform legal framework within the EU for the development, deployment, and use of AI systems. The regulation aims to promote AI uptake while ensuring that AI systems are aligned with EU values, such as fundamental rights, democracy, the rule of law, and environmental protection.
    • It emphasizes the importance of AI systems being human-centric, trustworthy, and safe, and it underlines the need to prevent fragmentation of the internal market due to divergent national regulations.
    • The preamble also highlights the balance the regulation seeks to achieve between fostering innovation and protecting public interests like health, safety, and fundamental rights.

    2. Recitals

    The recitals provide detailed reasoning and context behind the specific provisions of the AI Act. Here is a summary of the key points covered in the recitals:

    • Objective of the Regulation: The regulation aims to improve the functioning of the internal market by establishing harmonized rules for AI systems in the EU, promoting innovation while ensuring the protection of health, safety, and fundamental rights.

    • Scope and Application: The regulation is designed to apply across different sectors and industries, given the broad applicability of AI systems. It also takes into account the rapid evolution of AI technologies, ensuring that the legal framework remains relevant and adaptable.

    • Human-Centric AI: The regulation emphasizes that AI systems should be developed and used in a way that enhances human well-being, with a focus on human dignity, autonomy, and fundamental rights.

    • Risk-Based Approach: A core principle of the regulation is the risk-based approach, where AI systems are classified based on the level of risk they pose to public interests and fundamental rights. This approach dictates the level of regulatory scrutiny and obligations imposed on AI providers and users.

    • Prohibited AI Practices: The recitals identify certain AI practices that are deemed unacceptable, such as those that manipulate human behavior, exploit vulnerabilities, or involve social scoring by public authorities.

    • High-Risk AI Systems: The recitals outline the rationale for categorizing certain AI systems as high-risk, particularly those that could significantly impact health, safety, or fundamental rights. These systems are subject to stricter requirements to ensure their safe and ethical use.

    • Transparency and Accountability: The regulation requires transparency in AI systems, ensuring that users and affected individuals are aware when they are interacting with AI and can understand the decision-making processes involved.

    • Governance and Oversight: The recitals stress the need for strong governance mechanisms, including the establishment of supervisory authorities and the European Artificial Intelligence Board, to ensure compliance with the regulation.

    • International Cooperation: The regulation also addresses the international dimension, ensuring that AI systems used in the EU, even if developed outside, comply with EU standards. This aims to prevent regulatory evasion and protect EU citizens from harmful AI practices.

       

      3. General Provisions

      The General Provisions section of the AI Act sets the foundational elements of the regulation, including its scope, definitions, and overarching principles. This section is crucial as it establishes the framework within which the entire regulation operates.

      Key Points:

      Scope (Article 1):

    • The regulation applies to the development, placement on the market, and use of AI systems within the EU. It covers AI systems regardless of whether they are standalone or embedded within other products.
    • The regulation aims to ensure that AI systems are safe, transparent, and align with EU values, particularly concerning fundamental rights and public interests like health and safety.
    • Exemptions are provided for AI systems developed and used exclusively for military, defense, or national security purposes, as these areas are outside the regulation's scope.

    Definitions (Article 2):

    • The regulation provides clear definitions of key terms used throughout the document, such as "AI system," "provider," "user," "high-risk AI system," and "biometric data."
    • AI System: Defined as software developed using machine learning, logic-based, or knowledge-based approaches that can, for a given set of human-defined objectives, generate outputs such as predictions, recommendations, or decisions influencing the environments they interact with.
    • Provider: Refers to any person or entity that develops, places on the market, or puts into service an AI system.
    • User: Any person or entity using an AI system under their authority, except for personal, non-professional use.

    Risk Classification (Article 3):

    • The regulation adopts a risk-based approach to classify AI systems into different categories: unacceptable risk, high risk, and minimal risk. The level of regulation and obligations imposed depends on the risk classification:
      • Unacceptable Risk: AI systems that pose a clear threat to the safety, livelihoods, and rights of people are banned outright.
      • High-Risk AI Systems: Systems that could significantly affect individuals' safety or rights are subject to strict requirements, including mandatory conformity assessments, transparency obligations, and human oversight.
      • Minimal Risk AI Systems: These are subject to minimal regulation, primarily focusing on transparency.

    General Obligations (Article 4):

    • AI providers must ensure that their systems comply with the requirements laid down in the regulation before placing them on the market or putting them into service.
    • Providers are responsible for ensuring that their AI systems undergo conformity assessments, are accompanied by the necessary documentation, and meet the required safety, transparency, and robustness standards.
    • Providers must also establish and maintain risk management systems throughout the lifecycle of the AI systems to ensure continued compliance with the regulation.

    Union-Level Cooperation (Article 5):

    • The regulation mandates the establishment of the European Artificial Intelligence Board (EAIB), which will coordinate and support the consistent application of the AI Act across the EU.
    • The EAIB will work closely with national authorities to ensure that AI systems deployed within the EU comply with the regulation, fostering a harmonized approach across Member States.

    4. Prohibited AI Practices

    This section outlines the AI practices that are explicitly banned under the regulation due to their potential to harm individuals or society at large. These prohibitions are essential to ensure that AI technologies do not undermine fundamental rights, democracy, or public safety.

    Key Points:

    Prohibited AI Practices (Article 6):

    • Manipulative AI Systems: AI systems that deploy subliminal techniques beyond an individual's conscious perception to materially distort their behavior in a way that may cause physical or psychological harm are prohibited.
    • Exploitation of Vulnerabilities: AI systems that exploit vulnerabilities of specific groups, such as children, disabled individuals, or economically disadvantaged persons, to materially distort their behavior in a way that causes harm, are banned.
    • Social Scoring: The use of AI systems by public authorities or private entities to evaluate or classify individuals based on their social behavior, known or inferred personal characteristics, or other subjective factors over a period, which leads to detrimental or unfair treatment, is prohibited.
    • Biometric Surveillance: Real-time remote biometric identification systems in publicly accessible spaces for law enforcement purposes are generally prohibited, with narrowly defined exceptions (e.g., for severe public safety threats).

    Exceptions and Specific Conditions (Article 7):

    • The regulation does provide certain exceptions where the use of otherwise prohibited AI practices may be justified. For instance, the use of real-time biometric identification systems by law enforcement is permitted in cases of significant public interest, such as the prevention of terrorism or the search for missing children, but only under strict conditions and oversight.

       

      5. High-Risk AI Systems

      The section on High-Risk AI Systems is one of the most critical parts of the AI Act. It outlines the specific obligations and requirements for AI systems that are classified as high risk due to their potential impact on people's safety, rights, or significant public interest.

      Key Points:

      Classification of High-Risk AI Systems (Article 8):

    • High-risk AI systems are identified based on their intended purpose, the context of their use, and their potential impact on health, safety, and fundamental rights.
    • The regulation provides a list of high-risk AI systems in an annex, which includes AI systems used in critical infrastructure (like energy and transport), educational settings, employment, law enforcement, and biometric identification.

    Obligations for Providers of High-Risk AI Systems (Article 9):

    • Risk Management: Providers must implement a comprehensive risk management system throughout the lifecycle of the AI system. This includes identifying, analyzing, and mitigating risks associated with the AI system before it is placed on the market and continuously during its use.
    • Data Governance: High-risk AI systems must be developed using high-quality, relevant, and representative data sets. Providers are required to establish data governance practices that ensure data quality, relevance, and traceability, with a focus on minimizing bias and inaccuracies.
    • Technical Documentation: Providers must create and maintain detailed technical documentation that demonstrates the AI system’s compliance with the AI Act. This documentation must include a description of the system’s architecture, data, algorithms, and performance metrics.
    • Record-Keeping: Providers must keep logs of the AI system's operations, particularly when the system makes decisions that impact individuals’ rights or safety. These records should be available for audit by regulatory authorities.
    • Transparency and Provision of Information: Providers must ensure that their AI systems are transparent, meaning that users can understand how the system operates and how decisions are made. This includes providing clear instructions for use, limitations, and any risks associated with the system.

    Obligations for Users of High-Risk AI Systems (Article 10):

    • Monitoring and Reporting: Users of high-risk AI systems must monitor the operation of these systems and report any incidents or malfunctions that could affect compliance with the AI Act to the provider or relevant authorities.
    • Human Oversight: Users must ensure that AI systems are used under human oversight. This includes setting up mechanisms for intervention when the AI system operates unexpectedly or in a way that could lead to harm.
    • Use in Accordance with Instructions: Users must operate high-risk AI systems strictly according to the instructions provided by the system's provider, especially regarding safety and performance limitations.

    Conformity Assessment (Article 11):

    • High-risk AI systems must undergo a conformity assessment before they can be placed on the market or put into service. This assessment can be conducted either by the provider (for some types of systems) or by a third-party conformity assessment body.
    • The conformity assessment includes evaluating the system’s compliance with the AI Act’s requirements, particularly in terms of risk management, data governance, and technical documentation.
    • Post-Market Surveillance: Providers must establish and implement a post-market surveillance plan to monitor the performance of high-risk AI systems after they have been deployed. This ensures ongoing compliance with the AI Act and addresses any emerging risks or non-compliances.

    Registration of High-Risk AI Systems (Article 12):

    • Providers of high-risk AI systems must register their systems in an EU-wide database managed by the European Artificial Intelligence Board (EAIB). This database is intended to ensure transparency and facilitate monitoring and enforcement by authorities.

     

    6. Transparency Requirements

    This section focuses on the obligations related to transparency, which is crucial for ensuring that AI systems are understandable and accountable, particularly for users and those affected by AI decisions.

    Key Points:

    Transparency Obligations (Article 13):

    • Disclosure to Users: Providers must ensure that users are informed that they are interacting with an AI system, especially in cases where it may not be obvious. This is particularly relevant for AI systems that generate or manipulate content, like chatbots or deepfakes.
    • Explainability: AI systems must be designed and deployed in a way that allows users to understand the rationale behind the decisions made by the AI. This is essential for maintaining trust and accountability, especially in high-risk applications.
    • Disclosure of Capabilities and Limitations: Providers must clearly communicate the capabilities and limitations of the AI system, including any conditions under which the system might fail or produce biased results.

    Human-Machine Interaction (Article 14):

    • AI systems that interact with humans must be designed to clearly indicate when the user is interacting with a machine, not a human. This is intended to prevent deception and ensure that users are fully aware of the nature of their interaction.
    • Feedback Mechanisms: Providers must include mechanisms within the AI system that allow users to provide feedback or report issues. This feedback is crucial for continuous improvement and addressing any unintended consequences or errors in the system’s operation.

     

    7. Governance and Oversight

    The Governance and Oversight section of the AI Act outlines the structures and mechanisms put in place to ensure compliance with the regulation across the European Union. This section is crucial for maintaining the integrity and effectiveness of the AI Act through coordinated supervision, enforcement, and cooperation between various authorities.

    Key Points:

    European Artificial Intelligence Board (EAIB) (Article 15):

    • The AI Act establishes the European Artificial Intelligence Board (EAIB) to oversee the implementation and enforcement of the regulation across the EU.
    • Composition: The EAIB will be composed of representatives from each Member State's national supervisory authority, the European Commission, and other relevant EU bodies. It will be chaired by a representative from the European Commission.
    • Functions: The EAIB will play a central role in ensuring a consistent application of the AI Act across the EU. Its duties include:
      • Guidance: Providing guidelines, recommendations, and best practices for the implementation of the AI Act.
      • Coordination: Facilitating cooperation between national supervisory authorities and ensuring consistent enforcement of the AI Act across Member States.
      • Advisory Role: Advising the European Commission on matters related to AI, including updates to the list of high-risk AI systems and other regulatory aspects.
      • Monitoring: Overseeing the operation of the EU-wide database of high-risk AI systems and ensuring that the registration and reporting requirements are met.

    National Supervisory Authorities (Article 16):

    • Each Member State is required to designate one or more national supervisory authorities responsible for enforcing the AI Act within their jurisdiction.
    • Powers: These authorities are granted wide-ranging powers to investigate AI systems, conduct audits, require the disclosure of documentation, and impose penalties for non-compliance.
    • Responsibilities: National authorities must monitor AI systems' compliance with the regulation, particularly in relation to high-risk AI systems. They are also responsible for ensuring that providers and users adhere to their obligations under the AI Act.
    • Cooperation: National supervisory authorities are required to cooperate closely with the EAIB and other national authorities to ensure uniform enforcement of the regulation across the EU.

    Market Surveillance and Enforcement (Article 17):

    • The regulation empowers national market surveillance authorities to take necessary actions to ensure that AI systems placed on the market or put into service comply with the AI Act.
    • Enforcement Actions: Authorities can:
      • Conduct Inspections: Inspect premises, products, and documentation related to AI systems.
      • Issue Corrective Measures: Require providers to bring non-compliant AI systems into conformity with the regulation, withdraw them from the market, or recall them.
      • Penalties: Impose administrative fines and other penalties on providers or users who fail to comply with the AI Act.
    • Complaint Mechanism: Individuals or organizations can lodge complaints with national supervisory authorities if they believe that an AI system violates the AI Act. Authorities are required to investigate such complaints and take appropriate action.

    Reporting Obligations (Article 18):

    • Providers of high-risk AI systems must report any serious incidents or malfunctions to the relevant national supervisory authority. These reports help authorities monitor the ongoing compliance of AI systems and address any emerging risks.
    • Annual Reports: National authorities must submit annual reports to the EAIB on the use, risks, and incidents related to AI systems within their jurisdiction. This information helps the EAIB in its coordination and monitoring role.

    Regulatory Sandboxes (Article 19):

    • The AI Act encourages Member States to establish regulatory sandboxes for AI systems, which are controlled environments where AI providers can test innovative solutions under the supervision of regulatory authorities.
    • Purpose: Sandboxes allow providers to experiment with new AI technologies in a way that ensures compliance with the AI Act, while also fostering innovation and allowing regulators to gain insights into emerging technologies.
    • Conditions: Participation in regulatory sandboxes is voluntary and subject to specific conditions, including requirements related to risk management, transparency, and human oversight.

     

    8. International Aspects

    This section addresses the international dimension of the AI Act, particularly how it applies to AI systems developed or operated outside the EU but used within its borders. The AI Act has a global reach, ensuring that AI systems affecting EU citizens comply with EU standards, even if they originate elsewhere.

    Key Points:

    Extraterritorial Application (Article 20):

    • The AI Act applies to providers and users of AI systems that are established outside the EU but place AI systems on the EU market or use them within the EU. This provision ensures that AI systems developed abroad are subject to the same rules as those developed within the EU if they have an impact on EU citizens.
    • Responsibility of EU-Based Entities: EU-based entities that import or distribute AI systems developed outside the EU are responsible for ensuring that these systems comply with the AI Act.

    International Cooperation (Article 21):

    • The regulation encourages international cooperation on AI standards and governance, promoting the EU’s approach to trustworthy AI on a global scale.
    • Bilateral and Multilateral Agreements: The European Commission is empowered to negotiate agreements with third countries and international organizations to facilitate the exchange of information, cooperation on enforcement, and alignment of AI standards.

    Transfer of AI Systems (Article 22):

    • The AI Act regulates the transfer of AI systems to third countries, ensuring that such transfers do not compromise the protection of fundamental rights as outlined in the regulation.
    • Conditions for Transfer: AI systems can only be transferred to third countries if the receiving entity ensures equivalent levels of protection for fundamental rights and complies with the requirements of the AI Act.

     

    9. Annexes

    The Annexes of the AI Act provide detailed supplementary information that is critical for the practical implementation and enforcement of the regulation. These sections often include specific lists, technical standards, and procedural guidelines that help clarify and operationalize the broader provisions laid out in the main body of the regulation.

    Key Points:

    Annex I: List of High-Risk AI Systems

    • Classification: This annex provides a detailed list of AI systems that are classified as high-risk under the regulation. These are systems that have a significant impact on individuals' safety, rights, or well-being and thus require strict compliance with the AI Act.
    • Examples:
      • Critical Infrastructure: AI systems used in managing critical infrastructure, such as energy or transport networks, where failures could have severe consequences.
      • Educational and Vocational Training: AI systems used to evaluate students or applicants, which could determine access to education or employment.
      • Employment, Workers Management, and Access to Self-Employment: AI systems that make decisions about hiring, performance evaluation, promotion, and termination of employment.
      • Law Enforcement: AI systems used in predictive policing, criminal risk assessments, or surveillance.
      • Biometric Identification and Categorization: AI systems used for biometric identification (e.g., facial recognition) in public spaces.
      • Access to and Use of Essential Private and Public Services: AI systems that determine access to credit, public benefits, or emergency services.

    Annex II: Requirements for High-Risk AI Systems

    • Technical Documentation: Detailed requirements for the technical documentation that providers must prepare for high-risk AI systems. This includes descriptions of the system's architecture, algorithms, data management processes, and risk management strategies.
    • Risk Management: Specific guidelines for implementing a risk management framework throughout the AI system’s lifecycle, including the identification, analysis, and mitigation of potential risks.
    • Data Governance: Standards for ensuring data quality, relevance, and representativeness, particularly to avoid bias and ensure the fairness and accuracy of AI systems.
    • Transparency and Information Provision: Requirements for making AI systems transparent to users, including clear instructions on the system's capabilities, limitations, and conditions of use.
    • Human Oversight: Guidelines for implementing human oversight mechanisms to ensure that AI systems can be monitored and intervened with when necessary, preventing harmful outcomes.

    Annex III: Conformity Assessment Procedures

    • Self-Assessment: For certain high-risk AI systems, providers are allowed to conduct internal conformity assessments to verify compliance with the AI Act’s requirements.
    • Third-Party Assessment: For other high-risk AI systems, an external conformity assessment by a notified body is mandatory. This annex outlines the procedures for such assessments, including the roles and responsibilities of the notified bodies.
    • Post-Market Surveillance: Detailed procedures for the ongoing monitoring and surveillance of AI systems after they have been placed on the market. This includes guidelines for reporting incidents and updating the AI system in response to new risks or regulatory changes.

    Annex IV: Standards and Specifications

    • Harmonized Standards: This annex lists the harmonized European standards that AI systems should comply with to meet the requirements of the AI Act. These standards are developed by recognized European standardization organizations and cover various aspects of AI system development, such as safety, transparency, and data governance.
    • Technical Specifications: In the absence of harmonized standards, this annex provides technical specifications that can be used as a reference for compliance. These may include guidelines on algorithm design, data handling, and system security.

    Annex V: Registration of High-Risk AI Systems

    • EU-Wide Database: This annex details the process for registering high-risk AI systems in the EU-wide database managed by the European Artificial Intelligence Board (EAIB). It includes the information that must be provided during registration, such as the system’s purpose, risk classification, and conformity assessment results.
    • Reporting Obligations: Guidelines on the reporting obligations for providers and users of high-risk AI systems, including how to report serious incidents or breaches of compliance.

    Annex VI: List of Prohibited Practices

    • Detailed Description: This annex provides a comprehensive list of AI practices that are banned under the AI Act. Each prohibited practice is described in detail, including the rationale for its prohibition and the specific risks it poses to individuals or society.
    • Examples:
      • AI systems that manipulate human behavior in ways that are harmful or deceptive.
      • AI systems that exploit vulnerabilities of specific groups (e.g., children, disabled persons).
      • AI systems used for social scoring by public authorities or private entities, leading to discriminatory outcomes.

     We remark that the AI Act represents a significant regulatory framework aimed at ensuring that AI systems developed, marketed, or used within the EU are safe, transparent, and aligned with fundamental European values, including the protection of human rights and the promotion of trustworthy AI.

     

    AI Act: A Citizen's Perspective

    The European AI Act represents a significant step towards safeguarding the rights, freedoms, and safety of citizens in the face of rapidly advancing artificial intelligence technologies. From a citizen's point of view, the AI Act provides several important protections and assurances:

    1. Protection of Fundamental Rights

    • Human-Centric AI: The AI Act is grounded in the principle that AI systems must serve people, not the other way around. This ensures that AI technologies are designed and used in ways that respect human dignity, autonomy, and the fundamental rights enshrined in the European Union’s legal framework.
    • Prohibition of Harmful AI Practices: The Act explicitly bans AI systems that can manipulate or exploit individuals in harmful ways. For example, AI systems that use subliminal techniques to influence behavior without a person’s conscious awareness, or those that exploit vulnerable groups such as children, are strictly prohibited. This ensures that citizens are protected from technologies that could otherwise harm their physical or psychological well-being.

    2. Transparency and Awareness

    • Right to Know: Citizens are given the right to know when they are interacting with an AI system. Whether it’s through online platforms, customer service, or automated decision-making tools, the AI Act mandates that these systems must clearly disclose their nature as AI. This transparency empowers individuals to make informed decisions about their interactions and engagements with AI technologies.
    • Explainability of AI Decisions: In scenarios where AI systems make decisions that impact individuals—such as determining eligibility for services, loans, or even employment—citizens are entitled to understand the reasoning behind these decisions. The AI Act requires that these systems be designed to provide clear, understandable explanations, thereby reducing the risk of opaque or biased decision-making.

    3. Safety and Accountability

    • High-Risk AI Systems: For AI systems deemed high-risk—such as those used in healthcare, law enforcement, or critical infrastructure—the AI Act imposes stringent safety and accountability measures. Citizens can feel assured that these systems are subject to rigorous testing, ongoing monitoring, and strict oversight to ensure they operate safely and fairly.
    • Recourse and Redress: If an AI system causes harm or operates in a way that infringes on a person’s rights, the AI Act ensures that citizens have clear avenues for recourse. Individuals can report incidents, lodge complaints, and seek redress through national supervisory authorities, which are empowered to take corrective actions and impose penalties on non-compliant entities.

    4. Privacy and Data Protection

    • Safeguarding Personal Data: AI systems often rely on vast amounts of data, including personal information. The AI Act reinforces existing EU data protection laws by ensuring that AI systems processing personal data do so in a way that respects privacy rights. Citizens can expect that their data will be handled with care, security, and integrity, minimizing risks such as unauthorized access or misuse.
    • Biometric Data Protections: Given the sensitivity of biometric data, the AI Act places specific restrictions on AI systems that use such data for identification or categorization purposes. This includes stringent controls on the use of facial recognition technologies in public spaces, limiting their application to exceptional cases of significant public interest, such as preventing terrorism.

    5. Empowerment Through AI Literacy

    • AI Literacy Initiatives: The AI Act encourages the development of AI literacy programs, ensuring that citizens are equipped with the knowledge to understand, interact with, and critically assess AI systems. These initiatives aim to empower individuals by providing them with the tools to navigate the AI-driven aspects of modern life, from understanding AI in consumer products to recognizing their rights in digital environments.

    6. Public Consultation and Engagement

    • Involvement in AI Governance: The AI Act promotes the idea that citizens should have a voice in how AI technologies are governed. Through public consultations and participatory mechanisms, individuals can contribute to shaping the policies and standards that will influence the development and deployment of AI systems. This ensures that the governance of AI is not solely in the hands of technologists and policymakers but reflects the broader societal values and concerns.

    This comprehensive framework established by the AI Act is designed to protect and empower citizens as they navigate an increasingly AI-driven world. It ensures that while AI technologies continue to advance, they do so in ways that are safe, transparent, and aligned with the values of European society.


    AI Act: A Government and Institutional Perspective

    From the viewpoint of governments and institutions, the European AI Act serves as a robust regulatory framework designed to manage the development, deployment, and oversight of artificial intelligence across the European Union. The Act provides a structured approach to ensure that AI technologies align with public policy objectives, safeguard citizens' rights, and promote innovation within a controlled and ethical environment.

    1. Regulatory Framework and Compliance

    • Harmonization Across Member States: One of the primary objectives of the AI Act is to establish a uniform regulatory environment across the EU. This harmonization prevents fragmentation within the internal market, ensuring that AI systems can be developed, marketed, and used under consistent rules, regardless of the Member State. For governments, this facilitates easier cross-border cooperation and reduces the complexity of enforcing AI regulations.
    • Conformity Assessments: Governments are tasked with ensuring that high-risk AI systems undergo rigorous conformity assessments. These assessments, either conducted internally by providers or through third-party bodies, ensure that AI systems comply with the strict safety, transparency, and ethical standards mandated by the AI Act. This system helps maintain public trust in AI technologies and ensures that only compliant and safe AI systems enter the market.

    2. Institutional Oversight and Enforcement

    • National Supervisory Authorities: Each Member State is required to establish or designate one or more national supervisory authorities responsible for enforcing the AI Act. These authorities are empowered to monitor AI systems, conduct investigations, and impose penalties for non-compliance. The role of these authorities is crucial in maintaining the integrity of AI deployments and ensuring that providers and users adhere to the regulations.
    • European Artificial Intelligence Board (EAIB): The AI Act establishes the EAIB as a central body to coordinate the enforcement of the Act across the EU. The EAIB ensures consistency in the application of the regulation, provides guidance to national authorities, and facilitates cooperation between Member States. For governments, the EAIB serves as a vital resource and partner in managing the challenges associated with AI governance.
    • Market Surveillance: Governments are responsible for market surveillance to ensure that AI systems placed on the market are safe and compliant. This includes the authority to conduct inspections, mandate corrective actions, and, if necessary, remove non-compliant AI systems from the market. Effective market surveillance protects citizens and reinforces the credibility of the regulatory framework.

    3. Promotion of Innovation and Ethical AI

    • Regulatory Sandboxes: To foster innovation while ensuring compliance, the AI Act encourages the creation of regulatory sandboxes. These controlled environments allow AI providers to test and develop new technologies under the supervision of regulatory authorities. For governments, these sandboxes are instrumental in balancing the need for technological advancement with the obligation to protect public interests. They also provide insights into emerging technologies and help refine regulatory approaches.
    • Support for SMEs and Startups: Recognizing the role of small and medium-sized enterprises (SMEs) and startups in AI innovation, the AI Act includes provisions that offer these entities specific support. This includes tailored guidance on compliance and easier access to regulatory sandboxes. Governments are encouraged to provide additional resources and support to these enterprises, ensuring that they can compete in the AI market while adhering to the highest standards of safety and ethics.

    4. International Cooperation and Global Leadership

    • Extraterritorial Application: The AI Act has a global reach, applying not only to AI systems developed within the EU but also to systems developed elsewhere that are placed on the market or used within the EU. For governments, this extraterritorial application ensures that AI systems affecting EU citizens are subject to EU standards, thereby preventing regulatory arbitrage and protecting citizens from potentially harmful technologies developed abroad.
    • Global Standards and Diplomacy: The AI Act positions the EU as a global leader in AI governance. Governments and institutions are encouraged to engage in international dialogues and negotiations to promote the EU’s approach to AI regulation globally. This not only helps in setting international AI standards but also ensures that European values, such as human rights and ethical AI use, are reflected in global AI governance frameworks.

    5. Data Governance and Privacy Protection

    • Alignment with GDPR: The AI Act reinforces existing data protection laws, particularly the General Data Protection Regulation (GDPR). Governments and institutions must ensure that AI systems handling personal data do so in compliance with GDPR. This includes overseeing the use of biometric data and ensuring that AI systems respect individuals’ privacy rights.
    • Biometric Data Regulation: Specific provisions within the AI Act regulate the use of biometric data, especially in high-risk scenarios such as biometric identification in public spaces. Governments are responsible for enforcing these provisions, ensuring that the use of such technologies is strictly controlled and limited to scenarios that serve significant public interests.

    6. Public Engagement and AI Literacy

    • Public Consultation: The AI Act promotes the involvement of citizens in AI governance through public consultations. Governments are encouraged to facilitate these consultations, ensuring that the development and deployment of AI systems are informed by public opinion and societal values. This helps in building public trust and ensuring that AI policies are responsive to the concerns of citizens.
    • AI Literacy Programs: Governments are also encouraged to implement AI literacy programs, helping citizens understand and engage with AI technologies. These programs aim to equip individuals with the knowledge to navigate an AI-driven world, making informed decisions and understanding their rights in the digital age.

    Conclusion

    From a government and institutional perspective, the European AI Act provides a comprehensive framework for managing the opportunities and challenges presented by AI technologies. It empowers national authorities, fosters innovation, and ensures that AI systems are developed and deployed in ways that align with European values and public policy objectives. By implementing the AI Act effectively, governments can protect citizens, promote ethical AI use, and position the EU as a global leader in AI governance.
