PREP Research Associate - Reliability of Human and LLM Annotations for AI Risk Assessment

PREP0004176

February 18, 2026

This position is part of the National Institute of Standards and Technology (NIST) Professional Research Experience Program (PREP). NIST recognizes that its research staff may want to collaborate with researchers at academic institutions on specific projects of mutual interest and, therefore, requires those institutions to be recipients of a PREP award. The PREP program involves staff from a wide range of backgrounds conducting scientific research across various fields. Individuals in this position will perform technical work supporting the collaboration's scientific research.


Research Title:

Reliability of Human and LLM Annotations for AI Risk Assessment


The work will entail:

This project focuses on using Large Language Models (LLMs) to annotate evaluation data (i.e., LLM-as-judge) and on the design of an Inter-Annotator Agreement study to assess the reliability of both human and LLM annotations. The candidate will explore indicators of a given AI-related risk, determine how to identify them, and provide annotators with examples for annotating the presence of various risks. The project aims to develop an annotation framework for AI risk assessment and establish metrics for data quality in AI risk research, supporting broader work at NIST on assessing and measuring the validity and reliability of AI-related risks in data annotation.
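To illustrate the kind of reliability measurement an Inter-Annotator Agreement study involves, here is a minimal sketch of Cohen's kappa, a standard chance-corrected agreement statistic for two annotators (e.g., a human and an LLM judge). The label names and data below are hypothetical, purely for illustration; a real study would choose its own label scheme and likely additional metrics (e.g., Krippendorff's alpha for more than two annotators).

```python
from collections import Counter

def cohens_kappa(ann_a, ann_b):
    """Cohen's kappa: chance-corrected agreement between two annotators.

    ann_a, ann_b: equal-length sequences of categorical labels,
    one label per annotated item.
    """
    assert len(ann_a) == len(ann_b) and len(ann_a) > 0
    n = len(ann_a)
    # Observed agreement: fraction of items both annotators labeled identically.
    p_o = sum(a == b for a, b in zip(ann_a, ann_b)) / n
    # Expected chance agreement, from each annotator's marginal label distribution.
    counts_a, counts_b = Counter(ann_a), Counter(ann_b)
    labels = set(ann_a) | set(ann_b)
    p_e = sum((counts_a[lbl] / n) * (counts_b[lbl] / n) for lbl in labels)
    if p_e == 1.0:  # degenerate case: both annotators use a single label
        return 1.0
    return (p_o - p_e) / (1 - p_e)

# Hypothetical labels: does each item exhibit a given risk indicator?
human = ["risk", "risk", "none", "none", "risk", "none", "none", "risk"]
llm   = ["risk", "none", "none", "none", "risk", "none", "risk", "risk"]
print(cohens_kappa(human, llm))  # prints 0.5
```

Kappa of 1.0 means perfect agreement, 0.0 means agreement no better than chance; interpreting intermediate values (and choosing what counts as "reliable enough") is exactly the kind of design question such a study addresses.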


Candidates must be eligible to obtain a Department of Commerce background check for facility access.


Key responsibilities will include but are not limited to:


  • Gain familiarity with existing literature on data annotation and LLM-as-judge
  • Understand NIST’s role and ongoing efforts in assessing and measuring the validity and reliability of AI-related risks in data annotation
  • Contribute to developing an annotation framework for AI risk assessment
  • Collaborate effectively with cross-functional and interdisciplinary stakeholders to ensure successful project outcomes

Deliverables

  • Contributions to a NIST report supporting ongoing NIST AI evaluation efforts focused on the design of an Inter-Annotator Agreement study to assess the reliability of both human and LLM annotations.


Qualifications

  • Background in Computer Science, Data Science, or a related field
  • Education level: Bachelor's or graduate degree
  • Strong interest in data annotation and AI risks
  • Familiarity with scientific reading and technical writing

Apply Here

The university is an Equal Employment Opportunity employer that does not unlawfully discriminate in any of its programs or activities on the basis of race, color, religion, sex, national origin, age, disability, veteran status, sexual orientation, gender identity or expression, or on any other basis prohibited by applicable law.