evals

Evaluation of LLMs on a type 1 diabetes care benchmark.

Note: t1dbench is still under development. The official release, planned for a later date, will include a benchmark suite for evaluating how well language models can assist people living with diabetes.
I am seeking qualified medical professionals (e.g. endocrinologists) and AI researchers who are interested in contributing to the evaluation benchmark. If you would like to contribute, please reach out to me at [email protected].