Friday, October 25, 2024

Artificial Intelligence: From Origins to Modern Breakthroughs

 


Artificial intelligence (AI) has grown at an unprecedented pace in recent years, advancing from simple automation to complex systems capable of processing language, recognizing images, and even generating human-like content. This progress has cemented AI's role in everyday life, transforming industries, reshaping economies, and challenging our ideas about intelligence and creativity. Despite the recent surge, AI as a concept is not new. The term itself, "Artificial Intelligence", was coined in 1956 during the Dartmouth Summer Research Project on Artificial Intelligence, a pivotal event that laid the groundwork for decades of inquiry into what it means to simulate human intelligence.

The Dartmouth Conference, held in the summer of 1956 at Dartmouth College, gathered some of the most innovative minds of the time: John McCarthy, Marvin Minsky, Nathaniel Rochester, and Claude Shannon. These four pioneers set forth with a radical idea: that "every aspect of learning or any other feature of intelligence can, in principle, be so precisely described that a machine can be made to simulate it". This ambitious hypothesis propelled the conference and became the foundation of AI research. Their collective goal was to investigate how machines could use language, form abstractions, improve themselves, and solve complex problems. In retrospect, this vision was not only ambitious but also prescient, as many of the themes discussed at Dartmouth remain central to AI research today.

Dartmouth Summer Research Project on Artificial Intelligence, 1956

The original document, housed in archives at Dartmouth College and Stanford University, reveals a surprising focus on neural networks — a topic that would later undergo decades of refinement and evolution. At the time, the potential of neuron-like structures in machines was seen as a way to model concepts and mimic aspects of human thought. Even today, neural networks underpin much of AI, from image recognition systems to Large Language Models (LLMs), such as GPT-4, Gemini or Claude. These models, through layers of simulated neurons, can learn to detect complex patterns in data, leading to groundbreaking applications in natural language processing, autonomous systems, and beyond. The roots of today’s neural networks can be traced back to early theories discussed at the Dartmouth conference, showing the remarkable foresight of those early researchers in understanding the potential of such architectures.

The conference discussed numerous foundational themes, many of which continue to inform AI research today. One primary focus was the idea of Automatic Computers, or the potential of computers to simulate human thought processes. While hardware limitations existed, the key challenge was not computational power but developing programs capable of intelligent reasoning. The attendees foresaw a future where software, not just hardware, would define machine intelligence—a vision realized in the sophisticated software frameworks of modern AI.

Another critical area was language processing, where participants speculated that much of human thought might be reducible to manipulating language according to reasoning rules. The idea here was that if machines could mimic the way humans generalize, infer, and reason with language, they could achieve a form of “thinking.” This concept was crucial in the development of modern natural language processing (NLP), which allows machines not only to understand but also to generate coherent text, opening new frontiers in human-machine communication.

Neuron nets, or early neural networks, were also prominently discussed. The idea of mimicking the brain’s interconnected neurons was fascinating and, despite technical limitations, the participants speculated on how machines might form concepts and learn through networked neuron structures. Today’s neural networks—layered, deep, and capable of autonomous learning — owe much to these early musings. As evidenced by today’s LLMs, neural networks have indeed become a formidable tool for learning and pattern recognition, demonstrating how the concept has evolved and scaled.

Calculation Complexity was another focus, with the attendees debating ways to measure the efficiency of computations. Recognizing that brute-force computation was impractical for complex problems, they discussed the need for efficient algorithms and theoretical measures of computational complexity. This discussion prefigured the development of complexity theory and the role it would play in optimizing machine learning algorithms, search methods, and decision-making processes.

In addition to complexity, the theme of self-improvement surfaced as an essential aspect of intelligence. The group theorized that true intelligence would likely involve self-modifying behavior, a notion that would later become key to machine learning algorithms capable of adjusting based on feedback. The concept of abstraction also intrigued the group; they suggested that machines capable of recognizing and forming abstractions could learn and adapt to new contexts, a capability we continue to develop today through methods like transfer learning.

The document also touches on randomness and creativity, suggesting that creativity might involve an element of controlled randomness, guided by intuition to produce unexpected insights. This early notion of “educated guessing” laid the groundwork for areas of AI focused on exploration, optimization, and heuristic-driven creativity.

But to truly understand AI's origins and its philosophical underpinnings, one must trace back further —much further, in fact, to ancient Greek philosophy. The concept of artificial intelligence, as the pinnacle of human technological achievement, is rooted in the Greek pursuit of knowledge and understanding. The Greek philosophers, beginning with figures like Thales, proposed that reality might be reducible to a fundamental principle or arche, marking the dawn of a rationalist, principle-driven approach to understanding the world. Later, with Plato, this intellectual journey deepened. Plato’s theory of Forms and his emphasis on abstract reasoning laid the groundwork for ideas central to AI. Just as Plato posited that abstract entities (Forms) represent ultimate reality, AI researchers, in their own way, pursue abstract representations of intelligence, perception, and understanding within machines.

In essence, artificial intelligence reflects humanity's age-old desire to understand, model, and extend the principles of thought and knowledge. It is the expression of our ability to translate abstract reasoning into tangible systems, thus bridging philosophy with engineering and science. From the early visions of the Dartmouth conference to the advanced neural architectures we use today, AI remains a testament to our intellectual curiosity and the drive to simulate — and perhaps one day fully understand — our own intelligence. As we continue this journey, AI not only reveals the potential of machines but also, fundamentally, offers us new insights into what it means to be human.

 

References

McCarthy, J., Minsky, M., Rochester, N., & Shannon, C. (1955). A Proposal for the Dartmouth Summer Research Project on Artificial Intelligence. AI Magazine, 27(4), 12-14.

 

 

Saturday, October 12, 2024

Understanding Time Complexity in Machine Learning: Training vs. Testing Phases


Machine Learning (ML) algorithms are the foundation of modern artificial intelligence applications, ranging from image recognition to predictive modeling. Whether you're building a machine learning model to recommend products or forecast energy consumption, every ML algorithm goes through two critical phases: the training phase and the testing (or inference) phase. The time it takes for an algorithm to complete these phases can vary greatly, and this is where time complexity comes into play. In this blog post, we will break down these two phases and delve into how to interpret time complexity formulas for common ML algorithms using Big "O" notation.

1. The Training Phase vs. the Testing Phase in Machine Learning

In any supervised machine learning workflow, we can identify two main phases: training and testing. These phases are distinct but complementary, each playing a vital role in building and using a predictive model.

  • Training Phase: This is where the algorithm learns from the data. During training, the model is fed data (input) along with the corresponding labels (output), and it optimizes its parameters to minimize the error between predicted and actual outputs. This phase can be computationally expensive as it requires the algorithm to process a large amount of data and adjust its internal parameters accordingly. The training complexity refers to how long it takes the algorithm to build this model.

  • Testing Phase (Inference): Once the model is trained, it is tested on new, unseen data to evaluate its performance. The goal of the testing phase is to use the trained model to make predictions. The complexity of the testing phase often determines how fast the model can provide predictions in real-time.

Understanding the computational complexity in both phases helps in optimizing the choice of algorithm depending on the problem at hand, the size of the data, and the need for real-time predictions.

2. What is Complexity? Big "O" Notation Explained

Time complexity is a way to describe how the computational resources required for an algorithm scale as the size of the input increases. The most common notation to describe time complexity is Big "O" notation. Big "O" focuses on the upper bound, describing the worst-case scenario for how an algorithm behaves as the input size increases.

  • n: Typically represents the size of the dataset, i.e., the number of training examples.
  • p: Represents the number of features (or dimensions) in each data point.
  • T: The number of trees in ensemble methods like Random Forest or Gradient Boosting.
  • l: The number of iterations (often seen in iterative algorithms like K-Means or neural networks).
  • k: The number of clusters (for K-Means) or the number of neighbors (for K-Nearest Neighbors).
  • m: The number of components (for methods like Principal Component Analysis).

For example, if an algorithm has a time complexity of O(n2)O(n^2), it means that doubling the size of the input data approximately quadruples the time it will take to run.

3. Complexity of Common Machine Learning Algorithms

Let's now explore the time complexity of several popular machine learning algorithms, both for the training and testing phases, to give you a clearer understanding of how to interpret these complexities.

Linear Regression

  • Training Time: O(np2+p3)O(np^2 + p^3)
  • Inference Time: O(p)O(p)

Explanation: During training, Linear Regression solves a system of linear equations, often by inverting a matrix. Matrix inversion contributes to the O(p3)O(p^3) complexity, while calculating the normal equations requires O(np2)O(np^2), which depends on the number of data points nn and features pp. Once the model is trained, making predictions (inference) only requires a dot product between the input features and learned weights, which has a complexity of O(p)O(p).

Logistic Regression

  • Training Time: O(np2+p3)O(np^2 + p^3)
  • Inference Time: O(p)O(p)

Explanation: Logistic Regression uses iterative methods like gradient descent or the Newton-Raphson method to optimize its cost function. The complexity is similar to Linear Regression because each iteration requires O(np2)O(np^2), and solving the normal equations can take O(p3)O(p^3). For inference, the complexity remains O(p)O(p), as predicting the class probability also involves a dot product.

Naive Bayes

  • Training Time: O(np)O(np)
  • Inference Time: O(p)O(p)

Explanation: Naive Bayes assumes conditional independence among features, making the training process very efficient. Each feature's probability is calculated individually, resulting in a linear complexity of O(np)O(np). For inference, it computes the posterior probabilities for each class, which requires iterating over all features, hence the complexity O(p)O(p).

Decision Tree

  • Training Time: O(Tnlogn)O(T \cdot n \log n) (average), O(n2)O(n^2) (worst)
  • Inference Time: O(Tlogn)O(T \cdot \log n) (average), O(n)O(n) (worst)

Explanation: Building a Decision Tree involves splitting the data recursively, where each split requires sorting the data, which has an average complexity of O(nlogn)O(n \log n). For balanced trees, this results in O(nlogn)O(n \log n) complexity, but in the worst case (unbalanced trees), the complexity can degrade to O(n2)O(n^2). During inference, making a prediction involves traversing the tree, which takes O(logn)O(\log n) for balanced trees but could be as bad as O(n)O(n) for unbalanced trees.

Random Forest

  • Training Time: O(Tnlogn)O(T \cdot n \log n)
  • Inference Time: O(Tlogn)O(T \cdot \log n)

Explanation: Random Forest is an ensemble of decision trees, and each tree is built with complexity O(nlogn)O(n \log n). Since there are T trees in the forest, the overall complexity is O(Tnlogn)O(T \cdot n \log n). Inference time is also proportional to the number of trees, as predictions need to be aggregated from each tree, giving O(Tlogn)O(T \cdot \log n).

Gradient Boosted Trees

  • Training Time: O(Tnlogn)O(T \cdot n \log n)
  • Inference Time: O(Tlogn)O(T \cdot \log n)

Explanation: Similar to Random Forest, Gradient Boosted Trees iteratively train T decision trees. However, each tree is trained on the residuals of the previous one. The complexity remains O(Tnlogn)O(T \cdot n \log n), and inference also involves traversing each of the T trees, resulting in O(Tlogn)O(T \cdot \log n).

Principal Component Analysis (PCA)

  • Training Time: O(np2+p3)O(np^2 + p^3)
  • Inference Time: O(pm)O(pm)

Explanation: PCA computes the covariance matrix, which has complexity O(np2)O(np^2), and then performs eigenvalue decomposition, which costs O(p3)O(p^3). After training, projecting a data point onto the top m principal components requires O(pm)O(pm), making inference relatively efficient.

K-Nearest Neighbors (K-NN)

  • Training Time: O(1)O(1)
  • Inference Time: O(np)O(np)

Explanation: K-NN does not have an explicit training phase; it simply stores the dataset. For inference, K-NN computes the distance between the query point and all nn data points, with each distance calculation requiring O(p)O(p) operations, leading to a total complexity of O(np)O(np).

K-Means

  • Training Time: O(lknp)O(l \cdot k \cdot n \cdot p)
  • Inference Time: O(kp)O(k \cdot p)

Explanation: K-Means is an iterative clustering algorithm where ll is the number of iterations, kk is the number of clusters, nn is the number of data points, and pp is the number of features. Each iteration involves calculating distances and updating the cluster centroids, resulting in O(lknp)O(l \cdot k \cdot n \cdot p) complexity. Inference involves assigning new points to the nearest cluster, which has complexity O(kp)O(k \cdot p).

Dense Neural Networks

  • Training Time: O(lnph)O(l \cdot n \cdot p \cdot h)
  • Inference Time: O(ph)O(p \cdot h)

Explanation: Training a dense neural network involves multiple forward and backward passes through the network, with hh representing the number of hidden units. Each pass involves calculating activations and gradients, giving O(lnph)O(l \cdot n \cdot p \cdot h) complexity. During inference, only the forward pass is required, leading to O(ph)O(p \cdot h) complexity.

 

Reading&watching

  • Andrew Ng - An Introduction to Machine Learning
  • GeeksforGeeks Team - Understanding Time Complexity in Algorithms
  • Dheeraj Singh Tomar - Big O Notation in Machine Learning
  • Marc Peter Deisenroth, A. Aldo Faisal, and Cheng Soon Ong - Mathematics for Machine Learning
  • Analytics Vidhya Team - Understanding Random Forest
  • Scikit-learn Documentation - K-Means Clustering Explained



  • Thursday, October 10, 2024

    Artificial Intelligence and Complex Systems series: Intelligence at the Edge of Chaos — Featured paper


    Authors: Shiyang Zhang, Aakash Patel, Syed A Rizvi, Nianchen Liu, Sizhuang He, Amin Karbasi, Emanuele Zappala, and David van Dijk.

    The paper "Intelligence at the Edge of Chaos" by Zhang et al. (link) explores how intelligence in artificial systems may emerge not from exposure to inherently intelligent data but through interactions with complex environments. By studying Elementary Cellular Automata (ECA), a class of rule-based systems capable of generating diverse behaviors, the authors investigate how system complexity influences the performance of large language models (LLMs) in reasoning and prediction tasks.

    The key finding of the paper is the identification of an "edge of chaos" — a critical point between complete order and randomness — where models trained on systems with moderate complexity exhibit superior intelligence. When LLMs are exposed to data generated by ECA rules that balance structure and unpredictability, the models develop more sophisticated reasoning abilities compared to those trained on simple or highly chaotic systems. This finding suggests that complex systems can foster the emergence of intelligence even when the data itself lacks inherent cognitive structure.

    The research demonstrates that complexity plays a critical role in shaping the learning process. Models trained on systems with optimal complexity learn to leverage historical information, displaying non-trivial solutions to problems. This principle could be applied across various domains in AI, highlighting the importance of complexity analysis in understanding how intelligence emerges both in machines and natural systems.

    The paper contributes to our understanding of emergent intelligence in AI, suggesting that a balanced exposure to complexity is key for developing models that can generalize, reason, and perform effectively on tasks that require adaptability. It provides a framework for further research on how computational systems can replicate cognitive-like behaviors, making it a significant work for those interested in complexity theory, cognitive science, and artificial intelligence.


    Bibliography

    1. Zhang, S., Patel, A., Rizvi, S.A., Liu, N., He, S., Karbasi, A., Zappala, E., van Dijk, D. (2024). Intelligence at the Edge of Chaos. arXiv.
    2. Langton, C.G. (1990). Computation at the edge of chaos. Physica D: Nonlinear Phenomena.

    Wednesday, October 9, 2024

    AI's Cognitive Revolution: A Nobel Prize for Deep Learning Pioneers

     


    In 2024, a profound moment in the history of artificial intelligence was marked by the awarding of the Nobel Prize in Physics to two pioneers in the field: Geoffrey Hinton and John Hopfield. Their recognition signals both the impact of their work and the broader "cognitive revolution" that AI has catalyzed in recent years. This revolution involves a paradigm shift in how we understand intelligence, cognition, and the potential of machines to replicate—or even surpass—human intellectual abilities.

     

    The Cognitive Revolution and the Nobel Prize

    The awarding of the Nobel Prize in Physics to Hinton and Hopfield represents a deep acknowledgment of how AI is transforming fields traditionally associated with human cognition. For centuries, physics was viewed as the study of the material universe. Today, with the incorporation of machine learning and neural networks into various scientific disciplines, the boundaries of physics have expanded to include the study of artificial systems that mimic cognitive processes.

    The contributions of Hinton and Hopfield have driven this revolution forward. John Hopfield, who is primarily recognized as a statistical physicist, created the Hopfield network in 1982—a system that uses physical concepts like energy minimization to model associative memory and pattern recognition. This breakthrough demonstrated that neural networks could be used to simulate cognitive functions, such as memory retrieval, and allowed physicists and AI researchers to view brain-like computation from a physical and energetic perspective.

    Meanwhile, Geoffrey Hinton, often referred to as the "Godfather of AI," is known for revolutionizing deep learning. Hinton’s work on backpropagation, a learning algorithm that enables multi-layer neural networks to adjust their internal parameters, has been one of the most influential in modern AI. His innovations, particularly in convolutional neural networks and the later development of capsule networks, have enabled machines to achieve remarkable performance in image recognition, language understanding, and many other tasks. These systems, built upon functional models of biological neural networks, now outperform classical AI systems, which relied on more rigid, rule-based algorithms.

    The recognition of AI through the lens of the Nobel Prize symbolizes its critical role in reshaping both science and society. As Geoffrey Hinton stated after winning the award, advancements in neural networks will likely have an influence on humanity comparable to the Industrial Revolution.

     

    The Careers and Contributions of Hinton and Hopfield

    The lives and careers of both Geoffrey Hinton and John Hopfield are deeply intertwined with the development of modern AI, though they approach it from distinct scientific disciplines.

    Geoffrey Hinton was born in London in 1947 into a family of intellectuals. He pursued his studies in cognitive psychology before transitioning into computer science, a journey that ultimately led him to work on artificial neural networks. Hinton’s early work on distributed representations laid the groundwork for representing concepts within neural networks as patterns of activity across many units. His development of backpropagation in the 1980s, along with David Rumelhart and Ronald Williams, allowed neural networks to be trained on large datasets by adjusting internal weights—a method that became the cornerstone of deep learning.

    One of Hinton’s most significant contributions came in the 2010s with the success of AlexNet, a deep neural network developed by his student Alex Krizhevsky, which won the 2012 ImageNet competition. AlexNet’s success demonstrated the power of deep learning to solve complex tasks in computer vision. This breakthrough, in turn, inspired a wave of research that advanced both AI technology and applications.

    Hinton’s work in deep learning goes beyond technical advances; it represents a shift from classical AI approaches based on logic and rules to systems that learn from experience. More recently, Hinton has worked on Capsule Networks and the Forward-Forward Algorithm, innovations that address some of the limitations of current deep learning models by improving how neural networks handle spatial hierarchies and learn in more biologically plausible ways.

     Backpropagation scheme

    Hinton contributed also to the development and popularization of the backpropagation algorithm, a fundamental method for training multi-layer neural networks. This algorithm, introduced in the influential 1986 paper "Learning Representations by Back-Propagating Errors" co-authored with David Rumelhart and Ronald J. Williams, revolutionized machine learning by enabling networks to adjust their weights based on the error of their predictions. This breakthrough laid the foundation for modern deep learning techniques and has been essential in advancing artificial intelligence applications.

     

    John Hopfield, born in 1933, trained as a physicist but became a leader in computational neuroscience. His 1982 paper, “Neural Networks and Physical Systems with Emergent Collective Computational Abilities,” introduced the Hopfield network, a recurrent neural network that could store memories and retrieve them based on incomplete information. His approach, rooted in statistical physics, introduced the idea that memory retrieval could be framed as an energy minimization problem, where the system seeks the state of lowest energy that corresponds to a stored memory.

    Hopfield’s contributions are essential to understanding how physical systems, like the brain, can perform complex computational tasks. His work laid the groundwork for many models of neural computation that are still in use today, influencing both AI and neuroscience.

    Example of Hopfield network

    Hopfield networks are a form of recurrent neural network designed to model associative memory. These networks consist of a set of neurons that are fully connected to each other and use binary threshold units. Hopfield's key insight was to apply concepts from statistical physics to demonstrate that the network could store and retrieve memories as stable states, known as attractors, through a process of energy minimization. When presented with partial or noisy inputs, the network can converge to a stored memory by minimizing its energy function, simulating how the brain might retrieve incomplete memories. This model provided foundational insights into both neural computation and machine learning and remains influential in studying memory, optimization problems, and collective computation


    Understanding Complex Systems

    The contributions of Geoffrey Hinton and John Hopfield have significantly advanced our understanding of complex systems by providing models that capture the emergent behavior of neural networks, which are key examples of such systems. Hopfield's work on associative memory networks, grounded in statistical physics, revealed how collective computational abilities emerge from interconnected neurons, pushing the boundaries of neuroscience and computational modeling. Similarly, Hinton’s breakthroughs in deep learning have shown how complex patterns can be learned and generalized from data, reflecting the dynamic nature of biological cognition. These models, inspired by the brain’s neural architecture, have paved the way for AI to solve complex tasks, from image recognition to decision-making.

    The multidisciplinary approaches of both scientists are crucial to their success. Hopfield’s background in physics, combined with computational and biological insights, and Hinton’s expertise in psychology, cognitive science, and computer science, exemplify the power of integrating knowledge across fields. This cross-disciplinary perspective has allowed them to bridge abstract theory with practical applications, transforming AI into a field that addresses complex, real-world problems.

     

    The Sapienza School of Neural Networks

    The history of neural networks is not only a global story but also has deep roots in specific academic institutions. For example, Sapienza University of Rome—where CIPAR LABS are locatedplayed a pioneering role in neural network research, thanks to Prof. Giuseppe Martinelli, who spearheaded work on neural networks in the 1980s. Under his guidance, the circuital approach to neural networks was continued to be developed, treating these systems not as abstract algorithms but as models of electrical circuits with nonlinear components. This approach predated the widespread use of neural networks in computer science and laid the foundation for viewing AI through the lens of electrical engineering.

    The circuital approach emphasizes the physical realizability of neural networks, focusing on how circuits can model the behavior of neurons in a way that is both efficient and scalable. In this tradition, the Department of Information Engineering, Electronics, and Telecommunications (DIET) at Sapienza continues to produce leading research in AI and neural networks. Many of the current professors in the department were trained by Martinelli, and they continue to carry forward this tradition of blending theoretical advances with practical, circuit-based models of neural computation.

    This tradition underscores the idea that before neural networks became algorithmic abstractions, they were rooted in models of real-world systems, providing a physical grounding for what has since become a dominant paradigm in computer science and AI.

     

      Fondamenti di Reti Neurali - Giuseppe Martinelli (with courtesy of the Prof. Fabio Massimo Frattale Mascioli)


    Key Papers and Video Materials of Hinton and Hopfield

    While both Hinton and Hopfield have written extensively on AI and neural networks, some of their most influential works include:

     

    Bibliography

    The Future of Lithium-Ion Battery Diagnostics: Insights from Degradation Mechanisms and Differential Curve Modeling

      Featured Research paper: Degradation mechanisms and differential curve modeling for non-invasive diagnostics of lithium cells: An overview...