IBM

company

Verified

https://www.ibm.com/

AI & ML interests

Enterprise AI and ML, Foundation Models, Responsible AI

Recent Activity

DhavalPatel submitted a paper 2 days ago

Beyond Static Leaderboards: Predictive Validity for the Evaluation of LLM Agents

DhavalPatel submitted a paper about 1 month ago

Evaluating Temporal Semantic Caching and Workflow Optimization in Agentic Plan-Execute Pipelines

DhavalPatel submitted a paper about 1 month ago

Code-Guided Reasoning for Small Language Models: Evaluating Executable MCQA Scaffolds

Papers

Beyond Static Leaderboards: Predictive Validity for the Evaluation of LLM Agents

Evaluating Temporal Semantic Caching and Workflow Optimization in Agentic Plan-Execute Pipelines

ibm's Spaces (7)

BenchBench Leaderboard

Compare benchmarks for language models

Unitxt

Risk Atlas Nexus

Evaluate AI risks with common risk taxonomies

JuStRank

Display ranked LLM judges based on performance metrics

README

Biomed-multi-alignment unified demo with PPI and TDI examples

Demo for the MAMMAL approach on multiple domains

Llm Rank Themselves

Rank and compare language models using benchmarks