Publications

What In-Context Learning “Learns” In-Context: Disentangling Task Recognition and Task Learning

Published in Findings of ACL, 2023

…we characterize two ways through which ICL leverages demonstrations. Task recognition (TR) captures the extent to which LLMs can recognize a task through demonstrations, even without ground-truth labels, and apply their pre-trained priors, whereas task learning (TL) is the ability to capture new input-label mappings unseen in pre-training. Using a wide range of classification datasets and three LLM families (GPT-3, LLaMA and OPT), we design controlled experiments to disentangle the roles of TR and TL in ICL…

Recommended citation: Pan, J., Gao, T., Chen, H., & Chen, D. (2023). What In-Context Learning "Learns" In-Context: Disentangling Task Recognition and Task Learning. Findings of the Association for Computational Linguistics: ACL 2023. [arXiv](https://arxiv.org/abs/2305.09731)

Spontaneous Reward Hacking in Iterative Self-Refinement

Published as a preprint, 2024

…Using an essay editing task, we show that iterative self-refinement leads to deviation between the language model evaluator and human judgment, demonstrating that reward hacking can occur spontaneously in-context with the use of iterative self-refinement. In addition, we study conditions under which reward hacking occurs and observe two factors that affect reward hacking severity: model size and context sharing between the generator and the evaluator.

Recommended citation: Pan, J., He, H., Bowman, S., & Feng, S. (2024). Spontaneous Reward Hacking in Iterative Self-Refinement. [arXiv](https://arxiv.org/abs/2407.04549)