LLM Task Underspecification Detection
Evaluate gendered pronoun resolution in text
Explore and calibrate model predictions to better understand probabilities
Generate text answers to various prompts
Generate code snippets in Python, Java, and JavaScript