Abstract
The paper explores how coaxing desired behaviors from pretrained models has shifted NLP from a scientific engineering discipline to a complex systems science. The authors emphasize the need for a systematic decomposition of language model behaviors into categories that can explain cross-task performance and guide future research.
1. The Newformer: A Thought Experiment
1.1 How should we study the Newformer?
- Introduces a fictional model called the Newformer, which outperforms existing models but has a completely different architecture.
- Emphasizes the importance of identifying high-level behaviors shared with older models to guide lower-level mechanism investigations.
1.2 Top-down behavioral taxonomy guides bottom-up mechanistic explanation
- Advocates for a hierarchical taxonomy of language model behavior to guide investigations.
- Highlights the need for behavioral categories to prevent misdirection in model interpretation.
1.3 The Transformer is the old Newformer
- Discusses the switch from Recurrent Neural Networks (RNNs) to Transformers and the ongoing challenge of explaining these models' behaviors.
- Suggests the value of black-box interpretation methods that focus on model outputs.
1.4 Are we there yet?
- Reflects on whether significant advancements have occurred in NLP, noting that the gap for training large models is widening.
- Stresses the importance of formalizing what models are doing to guide bottom-up explanations.
2. The Behavioral Bottleneck
2.1 A working definition of “behavior”
- Defines behavior in terms of a function mapping inputs to features, aiming to explain model tendencies with simple rules.
- Introduces the concept of mutual information to measure how much a behavior reveals about model outputs.
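As a rough illustration of the mutual-information idea in 2.1, the sketch below treats a behavior as a function from a prompt to a discrete feature and estimates how many bits that feature reveals about a coarse output label. The feature `ends_with_question_mark`, the output labels, and the four observations are hypothetical. The plug-in estimator is an assumption of this sketch, not the paper's own formulation.

```python
from collections import Counter
from math import log2


def mutual_information(pairs):
    """Plug-in estimate of I(B; Y) in bits from (behavior_feature, output_label) pairs."""
    n = len(pairs)
    joint = Counter(pairs)
    b_counts = Counter(b for b, _ in pairs)
    y_counts = Counter(y for _, y in pairs)
    mi = 0.0
    for (b, y), c in joint.items():
        p_joint = c / n
        # p(b, y) / (p(b) * p(y)) written with raw counts
        mi += p_joint * log2(c * n / (b_counts[b] * y_counts[y]))
    return mi


def ends_with_question_mark(prompt):
    """Hypothetical behavior feature: does the prompt end with a question mark?"""
    return prompt.strip().endswith("?")


# Hypothetical observations: (prompt, coarse label of what the model produced).
observations = [
    ("What is 2+2?", "answer"),
    ("Is it raining?", "answer"),
    ("Tell me a story.", "narrative"),
    ("Write a poem.", "narrative"),
]

pairs = [(ends_with_question_mark(p), y) for p, y in observations]
mi_bits = mutual_information(pairs)
print(f"I(feature; output) = {mi_bits:.3f} bits")
```

In this toy data the feature fully determines the output label, so the estimate equals the entropy of the label. A feature that is independent of the outputs would score zero bits.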
2.2 Can benchmarks discover new behavior?
- Discusses the limitations of benchmarks in discovering new behaviors, as benchmarks often test known capabilities rather than revealing new ones.
2.3 Can benchmarks characterize behavior?
- Observes that benchmarks do not fully describe human behavior and are likewise insufficient to characterize model behavior.
- Suggests that benchmarks indicate contexts to examine rather than precisely what models do.
2.4 Behaviors: building blocks for evaluation
- Proposes focusing on behavior to understand models better and guide evaluations.
- Suggests that behaviors can serve as building blocks for creating new benchmarks.
3. Generative Models as a Complex Systems Science
3.1 What is a complex system?
- Defines complex systems as those with interacting parts that exhibit emergent behaviors.
- Notes the increasing interest in emergent behaviors in NLP, typically defined in terms of scaling.
3.2 Emergent behaviors in LMs are discovered, not designed
- Illustrates that behaviors in language models often emerge rather than being explicitly designed.
- Uses the example of copying behavior in Transformers to show how behaviors can emerge from model architectures.
3.3 Neuronal explanations are limited by our understanding of behavior
- Emphasizes the difficulty of explaining model outputs without a good description of behaviors.
- Compares the study of LMs to organic chemistry without biology, highlighting the need for a behavioral taxonomy.
3.4 Access is not a silver bullet
- Argues that access to pretrained models alone is insufficient to understand them.
- Stresses the need for explanations that account for models' broad capabilities and errors.
3.5 (Generated) data represents behavior
- Explains that model behavior can be characterized by the distribution of data it generates.
- Visualizes different behavioral mappings between training data and inference data.
4. A Different Kind of Complex System
4.1 Two kinds of complex systems simulations
- Differentiates between simplified mathematical models and comprehensive simulations.
- Positions generative models as a distinct type of complex system.
4.2 Generative Models: the easiest complex system to study
- Lists advantages of studying generative models: perfect-fidelity state encoding, a complete theory of low-level dynamics, exact repeatability, ease of perturbation, and the absence of observer effects.
- Highlights that generative models produce human-understandable media, facilitating the study of patterns.
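Two of the advantages listed in 4.2, exact repeatability and ease of perturbation, can be sketched with a toy sampler. The bigram table `BIGRAMS` and the function `generate` are hypothetical stand-ins for a real generative model. The only assumption is that sampling is driven by a seeded pseudo-random generator.

```python
import random

# Hypothetical toy "model": next-token weights conditioned on the previous token.
BIGRAMS = {
    "<s>": {"the": 3, "a": 1},
    "the": {"cat": 2, "dog": 2},
    "a": {"cat": 1, "dog": 1},
    "cat": {"sat": 1, "</s>": 1},
    "dog": {"sat": 1, "</s>": 1},
    "sat": {"</s>": 1},
}


def generate(table, seed, max_len=10):
    """Sample a token sequence from the table using a seeded RNG."""
    rng = random.Random(seed)
    token, out = "<s>", []
    for _ in range(max_len):
        options = table[token]
        token = rng.choices(list(options), weights=list(options.values()))[0]
        if token == "</s>":
            break
        out.append(token)
    return out


# Exact repeatability: the same seed and the same table give the same sample.
run_a = generate(BIGRAMS, seed=0)
run_b = generate(BIGRAMS, seed=0)

# Ease of perturbation: edit the table directly so "dog" can never be produced,
# then observe the effect on generated samples.
perturbed = {prev: dict(nxt) for prev, nxt in BIGRAMS.items()}
perturbed["the"] = {"cat": 1}
perturbed["a"] = {"cat": 1}
samples = [generate(perturbed, seed=s) for s in range(20)]

print(run_a == run_b)
print(any("dog" in s for s in samples))
```

The same pattern of fixing the seed, changing one component, and comparing outputs is what makes such systems unusually convenient to experiment on. A real model would replace the table edit with an intervention on weights or activations.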
4.3 The necessity of open-source models
- Emphasizes the importance of open-source models for reproducible science.
- Argues that proprietary APIs hinder the ability to fully study and understand models.
5. Conclusion
- Calls for a shift in focus towards understanding what models are doing.
- Advocates for a behavioral approach to studying generative models to form the basis for deeper understanding and future evaluations.
- Emphasizes the importance of open-source models in enabling this research.
6. Limitations
- Acknowledges that the perspective presented is one of many and does not capture the totality of viewpoints.
- Notes the exclusion of a comprehensive survey of related issues and the subjective nature of some assessments.
Final Summary and Main Considerations
The paper by Holtzman et al. argues that NLP has shifted from an engineering discipline to a complex systems science due to the emergent behaviors of large language models. The authors propose a systematic effort to categorize and understand these behaviors to guide future research and improve model interpretability. They highlight the limitations of current benchmarks and the necessity of focusing on behavioral explanations. The paper underscores the advantages of studying generative models, such as their simulability and human-understandable outputs, but stresses the need for open-source models to enable reproducible science. Ultimately, the authors call for a focus on understanding what models do as a foundation for further research into how and why they exhibit certain behaviors.
Paper source: https://arxiv.org/abs/2308.00189