Toward Transparent and Aligned Language Models, Andreas Weller, Sara Hooker, Mark M. D. E. K. M. Neerincx, Jessica Clark, 2023. Proceedings of the AAAI Conference on Artificial Intelligence, Vol. 37 (AAAI Press). DOI: 10.1609/aaai.v37i13.26463 - This survey paper reviews the progress and challenges in achieving transparency and alignment in LLMs, highlighting the role of interpretability in understanding and controlling model behavior for safety.