Trustworthiness and fact storage in GPT
A detailed analysis of how facts are represented in the GPT architecture
PSV
8/29/2024 · 1 min read


A central challenge for large language models is trustworthiness.
Take a look at this paper, which investigates how factual associations are represented and stored in GPT:
Locating and Editing Factual Associations in GPT
Kevin Meng* (MIT CSAIL), David Bau* (Northeastern University), Alex Andonian (MIT CSAIL), Yonatan Belinkov† (Technion – IIT)
Abstract
We analyze the storage and recall of factual associations in autoregressive transformer language models, finding evidence that these associations correspond to localized, directly-editable computations. We first develop a causal intervention for identifying neuron activations that are decisive in a model’s factual predictions. This reveals a distinct set of steps in middle-layer feed-forward modules that mediate factual predictions while processing subject tokens. To test our hypothesis that these computations correspond to factual association recall, we modify feedforward weights to update specific factual associations using Rank-One Model Editing (ROME). We find that ROME is effective on a standard zero-shot relation extraction (zsRE) model-editing task. We also evaluate ROME on a new dataset of difficult counterfactual assertions, on which it simultaneously maintains both specificity and generalization, whereas other methods sacrifice one or another. Our results confirm an important role for mid-layer feed-forward modules in storing factual associations and suggest that direct manipulation of computational mechanisms may be a feasible approach for model editing. The code, dataset, visualizations, and an interactive demo notebook are available at https://rome.baulab.info/.
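To make the abstract more concrete, here is a toy numpy sketch of the rank-one update at the heart of ROME, following the paper's view of a mid-layer MLP projection as a linear associative memory (W k ≈ v). Everything below is illustrative: the dimensions, the random activations standing in for key statistics, and the choice of k* and v* are placeholders, whereas the paper derives these vectors from the model itself.

```python
import numpy as np

rng = np.random.default_rng(0)
d = 64  # hidden dimension (toy size; real MLP projections are much larger)

# Treat the MLP output projection as a linear associative memory: W k ~ v.
W = rng.normal(size=(d, d))

# Second-moment statistics of keys, C = E[k k^T], estimated here from
# random stand-in "activations" (the paper estimates C from real text).
K = rng.normal(size=(d, 10_000))
C = K @ K.T / K.shape[1]
C_inv = np.linalg.inv(C)

# New association to insert: key k* should now map to value v*.
k_star = rng.normal(size=d)
v_star = rng.normal(size=d)

# Rank-one update: W' = W + (v* - W k*) (C^{-1} k*)^T / (k*^T C^{-1} k*)
u = C_inv @ k_star
residual = v_star - W @ k_star
W_edited = W + np.outer(residual, u) / (k_star @ u)

# The edited weight now recalls the new value for k* exactly...
assert np.allclose(W_edited @ k_star, v_star)
# ...while the output for an unrelated key changes comparatively little.
k_other = rng.normal(size=d)
print(np.linalg.norm(W_edited @ k_other - W @ k_other))
```

The C⁻¹ factor whitens the key statistics so the edit concentrates along directions where k* is distinctive relative to typical activations, which is what lets the update stay specific while still generalizing across phrasings of the edited fact.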