This position is part of the National Institute of Standards and Technology (NIST) Professional Research Experience Program (PREP). NIST recognizes that its research staff may want to collaborate with researchers at academic institutions on specific projects of mutual interest and therefore requires those institutions to be recipients of a PREP award. PREP involves staff from a wide range of backgrounds conducting scientific research across various fields. Individuals in this position will perform technical work supporting the collaboration's scientific research.
Research Title: Statistical Analysis and Tool Development for the NIST GenAI Evaluation Program
The work will entail statistical analysis and analysis tool development supporting the NIST GenAI evaluation series (https://ai-challenges.nist.gov/genai) within NIST's Information Technology Laboratory. The primary project is a Testing and Evaluation (T&E) framework for generative AI watermarking, in which the associate will lead the statistical analysis of evaluation results and develop the statistical engine behind NIST-CARAT (Calibrated Risk Assessment Tool for authentication technologies), an interactive tool enabling policymakers to explore the empirical performance consequences of compliance-threshold choices. This work centers on characterizing the tradeoff between content quality and watermark resilience under routine image handling, using rigorous detection-performance analysis, calibration assessment, uncertainty quantification, and Bayes-risk estimation to produce internationally defensible threshold claims.
Beyond this project, the associate will contribute statistical and analytical support across the wider NIST GenAI evaluation portfolio, which spans the evaluation of generative AI technologies across multiple modalities (text, voice, image, video, and code). This includes experimental design, metric development, analysis of evaluation outputs, and the development of reproducible analysis pipelines and interactive reporting tools. The associate will actively participate in NIST measurement science and contribute to cutting-edge research and evaluation in generative AI.
Candidates must be eligible to obtain a Department of Commerce background check for facility access.
Key responsibilities include, but are not limited to:
Analyzing generative AI evaluation results using detection and classification performance methods (ROC, partial AUC, equal error rate, Brier score) and Bayes-risk characterization
Performing uncertainty quantification and calibration assessment to support defensible threshold and compliance claims
Developing the statistical backend of NIST-CARAT, including the models mapping compliance-threshold choices to expected operational consequences
Designing and implementing reproducible analysis pipelines from raw evaluation outputs through to estimates with calibrated uncertainty
Building interactive analysis and reporting tools (e.g., dashboards, interactive plots) to communicate evaluation results to technical and policy audiences
Processing large evaluation datasets, including work in GPU-accelerated, high-performance computing environments
Exploring data through descriptive statistics and graphical displays
Contributing to internal reports, workshop white papers, and submissions to international standards bodies (e.g., ISO/IEC JTC 1/SC 42 and SC 29)
Explaining statistical and metrological concepts to non-statisticians, including policy and standards stakeholders
Qualifications
A Ph.D. in statistics, biostatistics, or a closely related quantitative field
Mastery of statistical analysis methods, including experimental design, detection/classification evaluation (ROC, partial AUC, EER, Brier score), calibration, uncertainty quantification, and Bayes-risk characterization
Experience with Bayesian modeling and with Monte Carlo and Markov chain Monte Carlo (MCMC) methods
Proficiency in a statistical computing and scripting language (e.g., R, Python) and in shell scripting, with version-controlled, reproducible analysis workflows
Experience building interactive analysis tools and dashboards (e.g., R Shiny, interactive plotting libraries, Jupyter notebooks)
Familiarity with generative AI tools, including large language models, and with AI test and evaluation
Experience with high-performance or GPU-accelerated computing environments, and with containerization and workflow tooling (e.g., Docker, Argo Workflows) in a data-analysis context
Strong communication skills in speaking, writing, and graphical display, including the ability to explain statistical concepts to non-statisticians
The university is an Equal Employment Opportunity employer that does not unlawfully discriminate in any of its programs or activities on the basis of race, color, religion, sex, national origin, age, disability, veteran status, sexual orientation, gender identity or expression, or on any other basis prohibited by applicable law.