publications

publications by category in reverse chronological order. generated by jekyll-scholar.

2025

  1. ACL
    Position-aware Automatic Circuit Discovery
    Tal Haklay, Hadas Orgad, David Bau, and 2 more authors
    CoRR, 2025
  2. ICML
    MIB: A Mechanistic Interpretability Benchmark
    Aaron Mueller, Atticus Geiger, Sarah Wiegreffe, and 20 more authors
    CoRR, 2025

2024

  1. ICLR
    Fine-Tuning Enhances Existing Mechanisms: A Case Study on Entity Tracking
    Nikhil Prakash, Tamar Rott Shaham, Tal Haklay, and 2 more authors
    In The Twelfth International Conference on Learning Representations, ICLR 2024, Vienna, Austria, May 7-11, 2024
  2. ICLR - SPOTLIGHT
    Linearity of Relation Decoding in Transformer Language Models
    Evan Hernandez, Arnab Sen Sharma, Tal Haklay, and 5 more authors
    In The Twelfth International Conference on Learning Representations, ICLR 2024, Vienna, Austria, May 7-11, 2024
    spotlight paper