Publications
Measuring LLM Novelty As The Frontier Of Original And High-Quality Output
Published in preprint, 2025
…We introduce a new novelty metric for LLM generations that balances originality and quality: the harmonic mean of the fraction of n-grams unseen during training and a task-specific quality score (see the sketch after the citation below).
Recommended citation: V Padmakumar, C Yueh-Han, J Pan, V Chen, H He (2025). Measuring LLM Novelty As The Frontier Of Original And High-Quality Output. https://arxiv.org/abs/2504.09389
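A minimal sketch of how such a metric could be computed, assuming the quality score is already scaled to [0, 1]; the function names and the choice of n = 5 are illustrative, not taken from the paper:

```python
def unseen_ngram_fraction(tokens, training_ngrams, n=5):
    """Originality term: fraction of length-n token spans in a generation
    that do not appear in the set of training-corpus n-grams."""
    spans = [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]
    if not spans:
        return 0.0
    return sum(span not in training_ngrams for span in spans) / len(spans)


def novelty_score(originality, quality):
    """Harmonic mean of the originality term and a task-specific quality
    score, both assumed to lie in [0, 1]."""
    if originality + quality == 0:
        return 0.0
    return 2 * originality * quality / (originality + quality)
```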
When Benchmarks Talk: Re-Evaluating Code LLMs with Interactive Feedback
Published in Findings of ACL, 2025
…We introduce an interactive evaluation pipeline to examine how LLMs incorporate different types of feedback in a collaborative setting.
Recommended citation: J Pan, R Shar, J Pfau, A Talwalkar, H He, V Chen (2025). When Benchmarks Talk: Re-Evaluating Code LLMs with Interactive Feedback https://arxiv.org/abs/2502.18413
Retrieval augmented scientific claim verification
Published in JAMIA Open, 2024
…We developed CliVER, an end-to-end scientific Claim VERification system that leverages retrieval-augmented techniques to automatically retrieve relevant clinical trial abstracts, extract pertinent sentences, and use the PICO framework to support or refute a scientific claim.
Recommended citation: H Liu, A Soroush, JG Nestor, E Park, B Idnay, Y Fang, J Pan, S Liao, M Bernard, Y Peng, C Weng (2024). Retrieval augmented scientific claim verification https://academic.oup.com/jamiaopen/article/7/1/ooae021/7612234
What In-Context Learning “Learns” In-Context: Disentangling Task Recognition and Task Learning
Published in Findings of ACL, 2023
…we characterize two ways through which ICL leverages demonstrations. Task recognition (TR) captures the extent to which LLMs can recognize a task through demonstrations, even without ground-truth labels, and apply their pre-trained priors, whereas task learning (TL) is the ability to capture new input-label mappings unseen in pre-training. Using a wide range of classification datasets and three LLM families (GPT-3, LLaMA and OPT), we design controlled experiments to disentangle the roles of TR and TL in ICL…
Recommended citation: Pan, J., Gao, T., Chen, H., & Chen, D. (2023). What In-Context Learning "Learns" In-Context: Disentangling Task Recognition and Task Learning. Annual Meeting of the Association for Computational Linguistics. https://arxiv.org/abs/2305.09731
Spontaneous Reward Hacking in Iterative Self-Refinement
Published in preprint, 2024
…Using an essay editing task, we show that iterative self-refinement leads to deviation between the language model evaluator and human judgment, demonstrating that reward hacking can occur spontaneously in-context with the use of iterative self-refinement. In addition, we study conditions under which reward hacking occurs and observe two factors that affect reward hacking severity: model size and context sharing between the generator and the evaluator.
Recommended citation: Pan, J., He, H., Bowman, S., & Feng, S. (2024). Spontaneous Reward Hacking in Iterative Self-Refinement. https://arxiv.org/abs/2407.04549
