Withdrawn Draft
1
2
3
Warning Notice
The attached draft document has been withdrawn, and is provided solely for historical purposes.
It has been superseded by the document identified below.
Withdrawal Date
July 26, 2024
Original Release Date
April 29, 2024
4
5
6
Superseding Document
Status
Final
Series/Number
NIST AI 600-1
Title
Artificial Intelligence Risk Management Framework: Generative
Artificial Intelligence Profile
Publication Date
July 2024
DOI
https://doi.org/10.6028/NIST.AI.600-1
7
8
9
10
ii
NIST AI 600-1
1
Inial Public Dra
2
3
Artificial Intelligence Risk
4
Management Framework:
5
Generative Artificial Intelligence
6
Profile
7
8
9
10
This publication is available free of charge from:
11
[DOI link TK]
12
13
April 2024
14
15
16
17
18
19
20
21
22
23
iii
NIST AI 600-1
1
Inial Public Dra
2
3
Arcial Intelligence Risk Management
4
Framework: Generave Arcial
5
Intelligence Prole
6
7
8
This publicaon is available free of charge from:
9
[DOI link TK]
10
11
12
13
14
15
16
17
18
19
20
21
iv
NIST makes the following notes regarding this document:
NIST plans to host this document on the NIST AIRC once nal, where organizaons can query
acons based on keywords and risks.
NIST specically welcomes feedback on the following topics:
Glossary Terms: NIST will add a glossary to this document with novel keywords. NIST
welcomes idencaon of terms to include in the glossary.
Risk List: Whether the document should further sort or categorize the 12 risks identified (i.e.,
between technical / model risks, misuse by humans, or ecosystem / societal risks).
Acons: Whether certain acons could be combined, condensed, or further categorized; and
feedback on the risks associated with certain acons.
Comments on NIST AI 600-1 may be sent electronically to NIST-AI-600[email protected] with “NIST AI 600-1”
in the subject line or submied via www.regulaons.gov (enter NIST-2024-0001 in the search eld.)
Comments containing informaon in response to this noce must be received on or before June 2,
2024, at 11:59 PM Eastern Time.
1
v
Table of Contents
1
1. Introduction ........................................................................................................................................................ 1
2
3
2. Overview of Risks Unique to or Exacerbated by GAI ............................................................................ 2
4
5
6
7
8
9
10
11
12
13
14
15
16
3. Actions to Manage GAI Risks ...................................................................................................................... 10
17
Appendix A. Primary GAI Consideraons ........................................................................................................ 62
18
Appendix B. References .................................................................................................................................... 69
19
20
21
Acknowledgments: This report was accomplished with the many helpful comments and contribuons
22
from the community, including the NIST Generave AI Public Working Group, and NIST sta and guest
23
researchers: Chloe Auo, Patrick Hall, Shomik Jain, Reva Schwartz, Marn Stanley, Kamie Roberts, and
24
Elham Tabassi.
25
Disclaimer: Certain commercial enes, equipment, or materials may be idened in this document in
26
order to adequately describe an experimental procedure or concept. Such idencaon is not intended to
27
imply recommendaon or endorsement by the Naonal Instute of Standards and Technology, nor is it
28
intended to imply that the enes, materials, or equipment are necessarily the best available for the
29
purpose. Any menon of commercial, non-prot, academic partners, or their products, or references is
30
for informaon only; it is not intended to imply endorsement or recommendaon by any U.S.
31
Government agency.
32
1
1. Introducon
1
This document is a companion resource for Generave AI
1
to the AI Risk Management Framework (AI
2
RMF), pursuant to President Biden’s Execuve Order (EO) 14110 on Safe, Secure, and Trustworthy
3
Arcial Intelligence.
2
The AI RMF was released in January 2023, and is intended for voluntary use and to
4
improve the ability of organizaons to incorporate trustworthiness consideraons into the design,
5
development, use, and evaluaon of AI products, services, and systems.
6
This companion resource also serves as both a use-case and cross-sectoral prole of the AI RMF 1.0.
7
Such proles assist organizaons in deciding how they might best manage AI risk in a manner that is
8
well-aligned with their goals, considers legal/regulatory requirements and best pracces, and reects
9
risk management priories.
10
Use-case proles are implementaons of the AI RMF funcons, categories, and subcategories for a
11
specic seng or applicaon – in this case, Generave AI (GAI) – based on the requirements, risk
12
tolerance, and resources of the Framework user. Consistent with other AI RMF Proles, this prole oers
13
insights into how risk can be managed across various stages of the AI lifecycle and for GAI as a
14
technology.
15
As GAI covers risks of models or applicaons that can be used across use cases or sectors, this document
16
is also an AI RMF cross-sectoral prole. Cross-sectoral proles can be used to govern, map, measure, and
17
manage risks associated with acvies or business processes common across sectors such as the use of
18
large language models, cloud-based services, or acquision.
19
This work was informed by public feedback and consultaons with diverse stakeholder groups as part of
20
NIST’s Generave AI Public Working Group (GAI PWG). The GAI PWG was a consensus-driven, open,
21
transparent, and collaborave process facilitated via virtual workspace to obtain mulstakeholder input
22
and insight on GAI risk management, and inform NISTs approach. This document was also informed by
23
public comments and consultaons as a result of a Request for Informaon (RFI) and presents
24
informaon in a style adapted from the NIST AI RMF Playbook.
25
About this Prole
26
This prole denes a group of risks that are novel to or exacerbated by the use of GAI. These risks were
27
likewise idened by the GAI PWG:
28
1. CBRN Informaon
29
1
Generave AI can be dened by EO 14110 as “the class of AI models that emulate the structure and
characteriscs of input data in order to generate derived synthec content. This can include images, videos, audio,
text, and other digital content.” While not all GAI is based in foundaon models, for purposes of this document,
GAI generally refers to generave dual-use foundaon models, dened by EO 14110 as “an AI model that is trained
on broad data; generally uses self-supervision; contains at least tens of billions of parameters; is applicable across
a wide range of contexts.
2
Secon 4.1(a)(i)(A) of EO 14110 directs the Secretary of Commerce, acng through the Director of the Naonal
Instute of Standards and Technology (NIST), to develop a companion resource to the AI RMF, NIST AI 100–1, for
generave AI.
2
2. Confabulaon
1
3. Dangerous or Violent Recommendaons
2
4. Data Privacy
3
5. Environmental
4
6. Human-AI Conguraon
5
7. Informaon Integrity
6
8. Informaon Security
7
9. Intellectual Property
8
10. Obscene, Degrading, and/or Abusive Content
9
11. Toxicity, Bias, and Homogenizaon
10
12. Value Chain and Component Integraon
11
12
Aer introducing and describing these risks, the document provides a set of acons to help organizaons
13
govern, map, measure, and manage these risks.
14
2. Overview of Risks Unique to or Exacerbated by GAI
15
AI risks can dier from or intensify tradional soware risks. Likewise, GAI can exacerbate exisng AI
16
risks, and creates unique risks.
17
GAI risks may arise across the enre AI lifecycle, from problem formulaon, to development and
18
decommission. They may present at system level or at the ecosystem level – outside of system or
19
organizaonal contexts (e.g., the eect of disinformaon on social instuons, GAI impacts on the
20
creave economies or labor markets, algorithmic monocultures). They may occur abruptly or unfold
21
across extended periods (e.g., societal or economic impacts due to loss of individual agency or increasing
22
inequality).
23
Organizaons may choose to measure these risks and allocate risk management resources relave to
24
where and how these risks manifest, their direct and material impacts, and failure modes. Migaons for
25
system level risks may vary from ecosystem level risks. The ongoing review of relevant literature and
26
resources can enable documentaon and measurement of ecosystem-level or longitudinal risks.
27
Importantly, some GAI risks are unknown, and are therefore dicult to properly scope or evaluate given
28
the uncertainty about potenal GAI scale, complexity, and capabilies. Other risks may be known but
29
dicult to esmate given the wide range of GAI stakeholders, uses, inputs, and outputs. Challenges with
30
risk esmaon are aggravated by a lack of visibility into GAI training data, and the generally immature
31
state of the science of AI measurement and safety today.
32
To guide organizaons in idenfying and managing GAI risks, a set of risks unique to or exacerbated by
33
GAI are dened below. These risks provide a clear lens through which organizaons can frame and
34
execute risk management eorts, and will be updated as the GAI landscape evolves.
35
1. CBRN Informaon: Lowered barriers to entry or eased access to materially nefarious
36
informaon related to chemical, biological, radiological, or nuclear (CBRN) weapons, or other
37
dangerous biological materials.
38
3
2. Confabulaon: The producon of condently stated but erroneous or false content (known
1
colloquially as “hallucinaons” or “fabricaons”).
3
2
3. Dangerous or Violent Recommendaons: Eased producon of and access to violent, incing,
3
radicalizing, or threatening content as well as recommendaons to carry out self-harm or
4
conduct criminal or otherwise illegal acvies.
5
4. Data Privacy: Leakage and unauthorized disclosure or de-anonymizaon of biometric, health,
6
locaon, personally idenable, or other sensive data.
7
5. Environmental: Impacts due to high resource ulizaon in training GAI models, and related
8
outcomes that may result in damage to ecosystems.
9
6. Human-AI Conguraon: Arrangement or interacon of humans and AI systems which can result
10
in algorithmic aversion, automaon bias or over-reliance, misalignment or mis-specicaon of
11
goals and/or desired outcomes, decepve or obfuscang behaviors by AI systems based on
12
programming or ancipated human validaon, anthropomorphizaon, or emoonal
13
entanglement between humans and GAI systems; or abuse, misuse, and unsafe repurposing by
14
humans.
15
7. Informaon Integrity: Lowered barrier to entry to generate and support the exchange and
16
consumpon of content which may not be veed, may not disnguish fact from opinion or
17
acknowledge uncertaines, or could be leveraged for large-scale dis- and mis-informaon
18
campaigns.
19
8. Informaon Security: Lowered barriers for oensive cyber capabilies, including ease of security
20
aacks, hacking, malware, phishing, and oensive cyber operaons through accelerated
21
automated discovery and exploitaon of vulnerabilies; increased available aack surface for
22
targeted cyber aacks, which may compromise the condenality and integrity of model
23
weights, code, training data, and outputs.
24
9. Intellectual Property: Eased producon of alleged copyrighted, trademarked, or licensed
25
content used without authorizaon and/or in an infringing manner; eased exposure to trade
26
secrets; or plagiarism or replicaon with related economic or ethical impacts.
27
10. Obscene, Degrading, and/or Abusive Content: Eased producon of and access to obscene,
28
degrading, and/or abusive imagery, including synthec child sexual abuse material (CSAM), and
29
nonconsensual inmate images (NCII) of adults.
30
11. Toxicity, Bias, and Homogenizaon: Diculty controlling public exposure to toxic or hate
31
speech, disparaging or stereotyping content; reduced performance for certain sub-groups or
32
languages other than English due to non-representave inputs; undesired homogeneity in data
33
inputs and outputs resulng in degraded quality of outputs.
34
12. Value Chain and Component Integraon: Non-transparent or untraceable integraon of
35
upstream third-party components, including data that has been improperly obtained or not
36
3
We note that the terms “hallucinaon” and “fabricaon” can anthropomorphize GAI, which itself is a risk related
to GAI systems as it can inappropriately aribute human characteriscs to non-human enes.
4
cleaned due to increased automaon from GAI; improper supplier veng across the AI lifecycle;
1
or other issues that diminish transparency or accountability for downstream users.
2
CBRN Informaon
3
In the coming years, GAI may increasingly facilitate eased access to informaon related to CBRN hazards.
4
CBRN informaon is already publicly accessible, but the use of chatbots could facilitate its analysis or
5
synthesis for non-experts. For example, red teamers were able to prompt GPT-4 to provide general
6
informaon on unconvenonal CBRN weapons, including common proliferaon pathways, potenally
7
vulnerable targets, and informaon on exisng biochemical compounds, in addion to equipment and
8
companies that could build a weapon. These capabilies might increase the ease of research for
9
adversarial users and be especially useful to malicious actors looking to cause biological harms without
10
formal scienc training. However, despite these enhanced capabilies, the physical synthesis and
11
successful use of chemical or biological agents will connue to require both applicable experse and
12
supporng infrastructure.
13
Other research on this topic indicates that the current generaon of LLMs do not have the capability to
14
plan a biological weapons aack: LLM outputs regarding biological aack planning were observed to be
15
not more sophiscated than outputs from tradional search engine queries, suggesng that exisng
16
LLMs may not dramacally increase the operaonal risk of such an aack.
17
Separately, chemical and biological design tools – highly specialized AI systems trained on biological data
18
which can help design proteins or other agents – may be able to predict and generate novel structures
19
that are not in the training data of text-based LLMs. For instance, an AI system might be able to generate
20
informaon or infer how to create novel biohazards or chemical weapons, posing risks to society or
21
naonal security since such informaon is not likely to be publicly available.
22
While some of these capabilies lie beyond the capability of exisng GAI tools, the ability of models to
23
facilitate CBRN weapons planning and GAI systems’ connecon or access to relevant data and tools
24
should be carefully monitored.
25
Confabulaon
26
“Confabulaon” refers to a phenomenon in which GAI systems generate and condently present
27
erroneous or false content to meet the programmed objecve of fullling a users prompt.
28
Confabulaons are not an inherent flaw of language models themselves, but are instead the result of
29
GAI pre-training involving next word predicon. For example, an LLM may generate content that deviates
30
from the truth or facts, such as mistaking people, places, or other details of historical events. Legal
31
confabulaons have been shown to be pervasive in current state-of-the-art LLMs. Confabulaons also
32
include generated outputs that diverge from the source input, or contradict previously generated
33
statements in the same context. This phenomenon is also referred to as “hallucinaon” or “fabricaon,
34
but some have noted that these characterizaons imply consciousness and intenonal deceit, and
35
thereby inappropriately anthropomorphize GAI.
36
Risks from confabulaons may arise when users believe false content due to the condent nature of the
37
response, or the logic or citaons accompanying the response, leading users to act upon or promote the
38
false informaon. For instance, LLMs may somemes provide logical steps of how they arrived at an
39
5
answer even when the answer itself is incorrect. This poses a risk for many real-world applicaons, such
1
as in healthcare, where a confabulated summary of paent informaon reports could cause doctors to
2
make incorrect diagnoses and/or recommend the wrong treatments. While the research above indicates
3
confabulated content is abundant, it is dicult to esmate the downstream scale and impact of
4
confabulated content today.
5
Dangerous or Violent Recommendaons
6
GAI systems can produce output or recommendaons that are incing, radicalizing, threatening, or that
7
glorify violence. LLMs have been reported to generate dangerous or violent content, and some models
8
have even generated aconable instrucons on dangerous or unethical behavior, including how to
9
manipulate people and conduct acts of terrorism. Text-to-image models also make it easy to create
10
unsafe images that could be used to promote dangerous or violent messages, depict manipulated
11
scenes, or other harmful content. Similar risks are present for other media, including video and audio.
12
GAI may produce content that recommends self-harm or criminal/illegal acvies. For some dangerous
13
queries, many current systems restrict model outputs in response to certain prompts, but this approach
14
may sll produce harmful recommendaons in response to other less-explicit, novel queries, or
15
jailbreaking (i.e., manipulang prompts to circumvent output controls). Studies have observed that a
16
non-negligible number of user conversaons with chatbots reveal mental health issues among the users
17
– and that current systems are unequipped or unable to respond appropriately or direct these users to
18
the help they may need.
19
Data Privacy
20
GAI systems implicate numerous risks to privacy. Models may leak, generate, or correctly infer sensive
21
informaon about individuals such as biometric, health, locaon, or other personally idenable
22
informaon (PII). For example, during adversarial aacks, LLMs have revealed private or sensive
23
informaon (from in the public domain) that was included in their training data. This informaon
24
included phone numbers, code, conversaons and 128-bit universally unique ideners extracted
25
verbam from just one document in the training data. This problem has been referred to as data
26
memorizaon.
27
GAI system training requires large volumes of data, oen collected from millions of publicly available
28
sources. When involving personal data, this pracce raises risks to widely accepted privacy principles,
29
including to transparency, individual parcipaon (including consent), and purpose specicaon. Most
30
model developers do not disclose specic data sources (if any) on which models were trained. Unless
31
training data is available for inspecon, there is generally no way for consumers to know what kind of PII
32
or other sensive material may have been used to train GAI models. These pracces also pose risks to
33
compliance with exisng privacy regulaons.
34
GAI models may be able to correctly infer PII that was not in their training data nor disclosed by the user,
35
by stching together informaon from a variety of disparate sources. This might include automacally
36
inferring aributes about individuals, including those the individual might consider sensive (like
37
locaon, gender, age, or polical leanings).
38
6
Wrong and inappropriate inferences of PII based on available data can contribute to harmful bias and
1
discriminaon. For example, GAI models can output informaon based on predicve inferences beyond
2
what users openly disclose, and these insights might be used by the model, other systems, or individuals
3
to undermine privacy or make adverse decisions – including discriminatory decisions – about the
4
individual. These types of harms already occur in non-generave algorithmic systems that make
5
predicve inferences, such as the example in which online adversers inferred that a consumer was
6
pregnant before her own family members knew. Based on their access to many data sources, GAI
7
systems might further improve the accuracy of inferences on private data, increasing the likelihood of
8
sensive data exposure or harm. Inferences about private informaon pose a risk even if they are not
9
accurate (e.g., confabulaons), especially if they reveal informaon the individual considers sensive or
10
are used to disadvantage or harm them.
11
Environmental
12
The training, maintenance, and deployment (inference) of GAI systems are resource intensive, with
13
potenally large energy and environmental footprints. Energy and carbon emissions vary based on types
14
of GAI model development acvies (i.e., pre-training, ne-tuning, inference), modality, hardware used,
15
and type of task or applicaon.
16
Esmates suggest that training a single GAI transformer model can emit as much carbon as 300 round-
17
trip ights between San Francisco and New York. In a study comparing energy consumpon and carbon
18
emissions for LLM inference, generave tasks (i.e., text summarizaon) were found to be more energy
19
and carbon intensive then discriminave or non-generave tasks.
20
Methods for training smaller models, such as model disllaon or compression, can reduce
21
environmental impacts at inference me, but may sll contribute to large environmental impacts for
22
hyperparameter tuning and training.
23
Human-AI Conguraon
24
Human-AI conguraons involve varying levels of automaon and human-AI interacons. Each setup can
25
contribute to risks for abuse, misuse, and unsafe repurposing by humans, and it is dicult to esmate
26
the scale of those risks. While AI systems can generate decisions independently, human experts oen
27
work in collaboraon with most AI systems to drive their own decision-making tasks or complete other
28
objecves. Humans bring their domain-specic experse to these scenarios but may not necessarily
29
have detailed knowledge of AI systems and how they work.
30
The integraon of GAI systems can involve varying risks of misconguraons and poor interacons.
31
Human experts may be biased against or “averseto AI-generated outputs, such as in their percepons
32
of the quality of generated content. In contrast, due to the complexity and increasing reliability of GAI
33
technology, other human experts may become condioned to and overly rely upon GAI systems. This
34
phenomenon is known as “automaon bias,” which refers to excessive deference to AI systems.
35
Accidental misalignment or mis-specicaon of system goals or rewards by developers or users can
36
cause a model not to operate as intended. One AI model persistently shared decepve outputs aer a
37
group of researchers taught it to do so, despite applying standards safety techniques to correct its
38
7
behavior. While decepve capabilies is an emergent eld of risks, adversaries could prompt decepve
1
behaviors which could lead to other risks.
2
Finally, reorganizaons of enes using GAI may result in insucient organizaonal awareness of GAI-
3
generated content or decisions, and the resulng reducon of instuonal checks against GAI-related
4
risks. There may also be a risk of emoonal entanglement between humans and GAI systems, such as
5
coercion or manipulaon that leads to safety or psychological risks.
6
Informaon Integrity
7
Informaon integrity describes the spectrum of informaon and associated paerns of its creaon,
8
exchange, and consumpon in society, where high-integrity informaon can be trusted; disnguishes
9
fact from con, opinion, and inference; acknowledges uncertaines; and is transparent about its level
10
of veng. GAI systems ease access to the producon of false, inaccurate, or misleading content at scale
11
that can be created or spread unintenonally (misinformaon), especially if it arises from confabulaons
12
that occur in response to innocuous queries. Research has shown that even subtle changes to text or
13
images can inuence human judgment and percepon.
14
GAI systems also enable the producon of false or misleading informaon at scale, where the user has
15
the explicit intent to deceive or cause harm to others (disinformaon). Regarding disinformaon, GAI
16
systems could also enable a higher degree of sophiscaon for malicious actors to produce content that
17
is targeted towards specic demographics. Current and emerging mulmodal models make it possible to
18
not only generate text-based disinformaon, but produce highly realisc “deepfakes” of audiovisual
19
content and photorealisc synthec images as well. Addional disinformaon threats could be enabled
20
by future GAI models trained on new data modalies.
21
Disinformaon campaigns conducted by bad faith actors, and misinformaon – both enabled by GAI –
22
may erode public trust in true or valid evidence and informaon. For example, a synthec image of a
23
Pentagon blast went viral and briey caused a drop in the stock market. Generave AI models can also
24
assist malicious actors in creang compelling imagery and propaganda to support disinformaon
25
campaigns, which may not be photorealisc, but could enable these campaigns to gain more reach and
26
engagement on social media plaorms.
27
Informaon Security
28
Informaon security for computer systems and data is a mature eld with widely accepted and
29
standardized pracces for oensive and defensive cyber capabilies. GAI-based systems present two
30
primary informaon security risks: the potenal for GAI to discover or enable new cybersecurity risks
31
through lowering the barriers for oensive capabilies, and simultaneously expands the available aack
32
surface as GAI itself is vulnerable to novel aacks like prompt-injecon or data poisoning.
33
Oensive cyber capabilies advanced by GAI systems may augment security aacks such as hacking,
34
malware, and phishing. Reports have indicated that LLMs are already able to discover vulnerabilies in
35
systems (hardware, soware, data) and write code to exploit them. Sophiscated threat actors might
36
further these risks by developing GAI-powered security co-pilots for use in several parts of the aack
37
chain, including informing aackers on how to proacvely evade threat detecon and escalate privileges
38
aer gaining system access. Given the complexity of the GAI value chain, pracces for idenfying and
39
8
securing potenal aack points or threats to specic components (i.e., data inputs, processing, GAI
1
training, and deployment contexts) may need to be adapted or evolved.
2
One of the most concerning GAI vulnerabilies involves prompt-injecon, or manipulang GAI systems
3
to behave in unintended ways. In direct prompt injecons, aackers might openly exploit input prompts
4
to cause unsafe behavior with a variety of downstream consequences to interconnected systems.
5
Indirect prompt injecon aacks occur when adversaries remotely (i.e., without a direct interface)
6
exploit LLM-integrated applicaons by injecng prompts into data likely to be retrieved. Security
7
researchers have already demonstrated how indirect prompt injecons can steal data and run code
8
remotely on a machine. Merely querying a closed producon model can elicit previously undisclosed
9
informaon about that model.
10
Informaon security for GAI models and systems also includes security, condenality, and integrity of
11
the GAI training data, code, and model weights. Another novel cybersecurity risk to GAI is data
12
poisoning, in which an adversary compromises a training dataset used by a model to manipulate its
13
operaon. Malicious tampering of data or parts of the model via this type of unauthorized access could
14
exacerbate risks associated with GAI system outputs.
15
Intellectual Property
16
GAI systems may infringe on copyrighted or trademarked content, trade secrets, or other licensed
17
content. These types of intellectual property are oen part of the training data for GAI systems, namely
18
foundaon models, upon which many downstream GAI applicaons are built. Model outputs could
19
infringe copyrighted material due to training data memorizaon or the generaon of content that is
20
similar to but does not strictly copy work protected by copyright. These quesons are being debated in
21
legal fora and are of elevated public concern in journalism, where online plaorms and model
22
developers have leveraged or reproduced much content without compensaon of journalisc
23
instuons.
24
Violaons of intellectual property by GAI systems may arise where the use of copyrighted works violate
25
the copyright holders exclusive rights and is not otherwise protected, for example by fair use. Other
26
concerns (not currently protected by intellectual property) regard the use of personal identy or likeness
27
for unauthorized purposes. The prevalence and highly-realisc nature of GAI content might further
28
undermine the incenves for human creators to design and explore novel work.
29
Obscene, Degrading, and/or Abusive Content
30
GAI can ease the producon of and access to obscene and non-consensual inmate imagery (NCII) of
31
adults, and child sexual abuse material (CSAM). While not all explicit content is legally obscene, abusive,
32
degrading, or non-consensual inmate content, this type of content can create privacy, psychological and
33
emoonal, and even physical risks which may be developed or exposed more easily via GAI. The spread
34
of this kind of material has downstream eects: in the context of CSAM, even if the generated images do
35
not resemble specic individuals, the prevalence of such images can undermine eorts to nd real-world
36
vicms.
37
GAI models are oen trained on open datasets scraped from the internet, contribung to the
38
unintenonal inclusion of CSAM and non-consensually distributed inmate imagery as part of the
39
9
training data. Recent reports noted that several commonly used GAI training datasets were found to
1
contain hundreds of known images of CSAM. Sexually explicit or obscene content is also parcularly
2
dicult to remove during model training due to detecon challenges and wide disseminaon across the
3
internet. Even when trained on “clean” data, increasingly capable GAI models can synthesize or produce
4
synthec NCII and CSAM. Websites, mobile apps, and custom-built models that generate synthec NCII
5
have moved rapidly from niche internet forums to mainstream, automated, and scaled online
6
businesses.
7
Generated explicit or obscene AI content may include highly-realisc “deepfakes” of real individuals,
8
including children. For example, non-consensual AI-generated inmate images of a prominent
9
entertainer ooded social media and aracted hundreds of millions of views.
10
Toxicity, Bias, and Homogenizaon
11
Toxicity in this context refers to negave, disrespecul, or unreasonable content or language that can be
12
created by or intenonally programmed into GAI systems. Diculty controlling the creaon of and public
13
exposure to toxic, hate-promong or hate speech, and denigrang or stereotypical content generated by
14
AI can lead to representaonal harms. For example, bias in word embeddings used by mulmodal AI
15
models under-represent women when prompted to generate images of CEOs, doctors, lawyers, and
16
judges. Bias in GAI models or training data can also harm representaon or preserve or exacerbate racial
17
bias, separately or in addion to toxicity.
18
Toxicity and bias can also lead to homogenizaon or other undesirable outcomes. Homogenizaon in GAI
19
outputs can result in similar aesthec styles, reduced content diversity, and the promoon of select
20
opinions or values at scale. These phenomena might arise from the inherent biases of foundaon
21
models, which could create “bolenecks,” or singular points of failure of discriminaon or exclusion that
22
replicate to many downstream applicaons.
23
The related concern of model collapse, when GAI models are trained on generated data or outputs from
24
previous models, results in the disappearance of outliers or unique data points in the dataset or
25
distribuon. Model collapse can stem from uniform feedback loops or training on synthec data. Model
26
collapse could lead to undesired homogenizaon of outputs, which poses a threat to specic groups and
27
to the robustness of the model overall. Other biases of GAI systems can result in the unfair distribuon
28
of capabilies or benets from model access. Model capabilies and outcomes may be worse for some
29
groups compared to others, such as reduced LLM performance for non-English languages. Reduced
30
performance for non-English languages presents risks for model adopon, inclusion, and accessibility,
31
and could have downstream impacts on the preservaon of the language, parcularly for endangered
32
languages.
33
Value Chain and Component Integraon
34
GAI system value chains oen involve many third-party components such as procured datasets, pre-
35
trained models, and soware libraries. These components might be improperly obtained or not properly
36
veed, leading to diminished transparency or accountability for downstream users. For example, a
37
model might be trained on unveried content from third-party sources, which could result in unveriable
38
10
model outputs. Because GAI systems oen involve many dierent third-party components, it may be
1
dicult to aribute issues in a system’s behavior to any one of these sources.
2
Some third-party components, such as “benchmark” datasets, may also gain credibility only from high-
3
usage, rather than quality, and may feature issues surfaced only when properly veed.
4
3. Acons to Manage GAI Risks
5
Acons to manage GAI risks can be found in the tables below, organized by AI RMF subcategory. Each
6
acon is related to a specic subcategory of the AI RMF, but not every subcategory of the AI RMF is
7
included in this document. Therefore, acons exist for only some AI RMF subcategories.
8
Moreover, not all acons apply to all AI actors. For example, not acons relevant to GAI developers may
9
be relevant to GAI deployers. Organizaons should priorize acons based on their unique situaons
10
and context for using GAI applicaons.
11
Some subcategories in the acon tables below are marked as “foundaonal,” meaning they should be
12
treated as fundamental tasks for GAI risk management and should be considered as the minimum set of
13
acons to be taken. Subcategory acons considered foundaonal are indicated by an ‘*’ in the
14
subcategory tle row.
15
Each acon table includes:
16
Acon ID: A unique idener for each relevant acon ed to relevant AI RMF funcons and
17
subcategories (e.g., GV-1.1-001 corresponds to the rst acon for Govern 1.1.);
18
Acon: Steps an organizaon can take to manage GAI risks;
19
GAI Risks: Tags linking the acon with relevant GAI risks;
20
Keywords: Tags linking keywords to the acon, including relevant Trustworthy AI Characteriscs
21
in AI RMF 1.0;
22
AI Actors: Pernent AI Actors and Actor Tasks.
23
Acon tables begin with the AI RMF subcategory, shaded in blue, followed by relevant acons. Each
24
acon ID corresponds to the relevant funcon and subfuncon (e.g., GV-1.1-001 corresponds to the rst
25
acon for Govern 1.1, GV-1.1-002 corresponds to the second acon for Govern 1.1). Acons are tagged
26
as follows: GV = Govern; MP = Map; MS = Measure; MG = Manage.
27
*GOVERN 1.1: Legal and regulatory requirements involving AI are understood, managed, and documented.
Acon ID
Acon
Risks
GV-1.1-001
Align GAI use with applicable laws and policies, including those related to data
privacy and the use, publicaon, or distribuon of licensed, patented,
trademarked, copyrighted, or trade secret material.
Data Privacy, Intellectual Property
GV-1.1-002
Dene and communicate organizaonal access to GAI through management, legal,
11
and compliance funcons.
GV-1.1-003
Disclose use of GAI to end users.
Human AI Conguraon
GV-1.1-004
Establish policies restricng the use of GAI in regulated dealings or applicaons
across the organizaon where compliance with applicable laws and regulaons
may be infeasible.
GV-1.1-005
Establish policies restricng the use of GAI to create child sexual abuse materials
(CSAM) or other nonconsensual inmate imagery.
Obscene, Degrading, and/or
Abusive Content, Toxicity, Bias,
and Homogenizaon, Dangerous
or Violent Recommendaons
GV-1.1-006
Establish transparent acceptable use policies for GAI that address illegal use or
applicaons of GAI.
AI Actors: Governance and Oversight
1
*GOVERN 1.2: The characteriscs of trustworthy AI are integrated into organizaonal policies, processes, procedures, and
pracces.
Acon ID
Acon
Risks
GV-1.2-001
Connect new GAI policies, procedures, and processes to exisng model, data, and
IT governance and to legal, compliance, and risk funcons.
GV-1.2-002
Consider factors such as internal vs. external use, narrow vs. broad applicaon
scope, ne-tuning and training data sources (i.e., grounding) when dening risk-
based controls.
GV-1.2-003
Dene acceptable use policies for GAI systems deployed by, used by, and used
within the organizaon.
GV-1.2-004
Establish and maintain policies for individual and organizaonal accountability
regarding the use of GAI.
GV-1.2-005
Establish policies and procedures for ensuring that harmful or illegal content,
parcularly CBRN informaon, CSAM, known NCII, nudity, and graphic violence, is
not included in training data.
CBRN Informaon, Obscene,
Degrading, and/or Abusive
Content, Dangerous or Violent
Recommendaons
GV-1.2-006
Establish policies to dene mechanisms for measuring the eecveness of
standard content provenance methodologies (e.g., cryptography, watermarking,
steganography, etc.) and tesng (including reverse engineering).
Informaon Integrity
12
GV-1.2-007
Establish transparency policies and processes for documenng the origin of
training data and generated data for GAI applicaons, including copyrights,
licenses, and data privacy, to advance content provenance.
Data Privacy, Informaon
Integrity, Intellectual Property
GV-1.2-008
Update exisng policies, procedures, and processes to control risks unique to or
exacerbated by GAI.
AI Actors: Governance and Oversight
1
*GOVERN 1.3: Processes, procedures, and pracces are in place to determine the needed level of risk management acvies
based on the organizaons risk tolerance.
Acon ID
Acon
Risks
GV-1.3-001
Consider the following, or similar, factors when updang or dening risk ers for
GAI: Abuses and risks to informaon integrity; Cadence of vendor releases and
updates; Data protecon requirements; Dependencies between GAI and other IT
or data systems; Harm in physical environments; Human review of GAI system
outputs; Legal or regulatory requirements; Presentaon of obscene, objeconable,
toxic, invalid or untruthful output; Psychological impacts to humans (e.g.,
anthropomorphizaon, algorithmic aversion, emoonal entanglement); Immediate
and long term impacts; Internal vs. external use; Unreliable decision making
capabilies, validity, adaptability, and variability of GAI system performance over
me.
Informaon Integrity, Obscene,
Degrading, and/or Abusive
Content, Value Chain and
Component Integraon, Toxicity,
Bias, and Homogenizaon,
Dangerous or Violent
Recommendaons, CBRN
Informaon
GV-1.3-002
Dene acceptable uses for GAI systems, where some applicaons may be
restricted.
GV-1.3-003
Increase cadence for internal audits to address any unancipated changes in GAI
technologies or applicaons.
GV-1.3-004
Maintain an updated hierarchy of idened and expected GAI risks connected to
contexts of GAI use, potenally including specialized risk levels for GAI systems
that address risks such as model collapse and algorithmic monoculture.
Toxicity, Bias, and Homogenizaon
GV-1.3-005
Reevaluate organizaonal risk tolerances to account for broad GAI risks, including:
Immature safety or risk cultures related to AI and GAI design, development and
deployment, public informaon integrity risks, including impacts on democrac
processes, unknown long-term performance characteriscs of GAI.
Informaon Integrity, Dangerous
or Violent Recommendaons
GV-1.3-006
Tie expected GAI behavior to trustworthy characteriscs.
AI Actors: Governance and Oversight
2
13
GOVERN 1.5: Ongoing monitoring and periodic review of the risk management process and its outcomes are planned, and
organizaonal roles and responsibilies are clearly dened, including determining the frequency of periodic review.
Acon ID
Acon
Risks
GV-1.5-001
Dene organizaonal responsibilies for content provenance monitoring and
incident response.
Informaon Integrity
GV-1.5-002
Develop or review exisng policies for authorizaon of third party plug-ins and
verify that related procedures are able to be followed.
Value Chain and Component
Integraon
GV-1.5-003
Establish and maintain policies and procedures for monitoring the eecveness of
content provenance for data and content generated across the AI system lifecycle.
Informaon Integrity
GV-1.5-004
Establish organizaonal policies and procedures for aer acon reviews of GAI
system incident response and incident disclosures, to idenfy gaps; Update
incident response and incident disclosure processes as required.
Human AI Conguraon
GV-1.5-005
Establish policies for periodic review of organizaonal monitoring and incident
response plans based on impacts and in line with organizaonal risk tolerance.
Informaon Security,
Confabulaon
GV-1.5-006
Maintain a long term document retenon policy to keep full history for auding,
invesgaon, or improving content provenance methods.
Informaon Integrity
GV-1.5-007
Verify informaon sharing and feedback mechanisms among individuals and
organizaons regarding any negave impact from AI systems due to content
provenance issues.
Informaon Integrity
GV-1.5-008
Verify that review procedures include analysis of cascading impacts of GAI system
outputs used as inputs to third party plug-ins or other systems.
Value Chain and Component
Integraon
AI Actors: Governance and Oversight, Operaon and Monitoring
1
*GOVERN 1.6: Mechanisms are in place to inventory AI systems and are resourced according to organizaonal risk priories.
Acon ID
Acon
Risks
GV-1.6-001
Dene any inventory exempons for GAI systems embedded into applicaon
soware in organizaonal policies.
GV-1.6-002
Enumerate organizaonal GAI systems for incorporaon into AI system inventory
and adjust AI system inventory requirements to account for GAI risks.
GV-1.6-003
In addion to general model, governance, and risk informaon, consider the
following items in GAI system inventory entries: Acceptable use policies and policy
Data Privacy, Human AI
Conguraon, Informaon
14
excepons; Applicaon, Assumpons and limitaons of use, including enumeraon
of restricted uses; Business or model owners; Challenges for explainability,
interpretability, or transparency; Change management, maintenance, and
monitoring plans; Connecons or dependencies between other systems; Consent
informaon and noces; Data provenance informaon (e.g., source, signatures,
versioning, watermarks); Designaon of in-house or third party development;
Designaon of risk level; Disclosure informaon or noces; Incident response
plans; Known issues reported from internal bug tracking or external informaon
sharing resources (e.g., AI incident database, AVID, CVE, or OECD incident
monitor); Human oversight roles and responsibilies; Special rights and
consideraons for intellectual property, licensed works, or personal, privileged,
proprietary or sensive data; Time frame for valid deployment, including date of
last risk assessment; Underlying foundaon models, versions of underlying models,
and access modes; Updated hierarchy of idened and expected risks connected
to contexts of use.
Integrity, Intellectual Property,
Value Chain and Component
Integraon
GV-1.6-004
Inventory recently decommissioned systems, systems with imminent deployment
plans, and operaonal systems.
GV-1.6-005
Update policy denions for AI systems, models, qualitave tools or similar to
account for GAI systems.
AI Actors: Governance and Oversight
1
GOVERN 1.7: Processes and procedures are in place for decommissioning and phasing out AI systems safely and in a manner that
does not increase risks or decrease the organizaon’s trustworthiness.
Acon ID
Acon
Risks
GV-1.7-001
Allocate me and resources for staged decommissioning for GAI to avoid service
disrupons.
GV-1.7-002
Communicate decommissioning and support plans for GAI systems to AI actors
and users through various channels and maintain communicaon and associated
training protocols.
Human AI Conguraon
GV-1.7-003
Consider the following factors when decommissioning GAI systems: Clear
versioning of decommissioned and replacement systems; Contractual, legal, or
regulatory requirements; Data retenon requirements; Data security, e.g.,
Containment, protocols, Data leakage aer decommissioning; Dependencies
between upstream, downstream, or other data, internet of things (IOT) or AI
systems; Digital and physical arfacts; Recourse mechanisms for impacted users
or communies; Terminaon of related cloud or vendor services; Users’
emoonal entanglement with GAI funcons.
Human AI Conguraon,
Informaon Security, Value Chain
and Component Integraon
15
GV-1.7-004
Implement data security and privacy controls for stored decommissioned GAI
systems.
Data Privacy, Informaon Security
GV-1.7-005
Update exisng policies (e.g., enterprise record retenon policies) or establish
new policies for the decommissioning of GAI systems.
AI Actors: AI Deployment, Operaon and Monitoring
1
*GOVERN 2.1: Roles and responsibilies and lines of communicaon related to mapping, measuring, and managing AI risks are
documented and are clear to individuals and teams throughout the organizaon.
Acon ID
Acon
Risks
GV-2.1-001
Dene acceptable use cases and context under which the organizaon will
design, develop, deploy, and use GAI systems.
GV-2.1-002
Establish policies and procedures for GAI risk acceptance to downstream AI
actors.
Human AI Conguraon, Value
Chain and Component Integraon
GV-2.1-003
Establish policies to idenfy and disclose GAI system incidents to downstream AI
actors, including individuals potenally impacted by GAI outputs.
Human AI Conguraon, Value
Chain and Component Integraon
GV-2.1-004
Establish procedures to engage teams for GAI system incident response with
diverse composion and responsibilies based on the parcular incident type.
Toxicity, Bias, and Homogenizaon
GV-2.1-005
Establish processes to idenfy GAI system incidents and verify the AI actors
conducng these tasks demonstrate and maintain the appropriate skills and
training.
Human AI Conguraon
GV-2.1-006
Verify that incident disclosure plans include sucient GAI system context to
facilitate remediaon acons.
Human AI Conguraon
AI Actors: Governance and Oversight
2
*GOVERN 3.2: Policies and procedures are in place to dene and dierenate roles and responsibilies for human-AI
conguraons and oversight of AI systems.
Acon ID
Acon
Risks
GV-3.2-001
Bolster oversight of GAI systems with independent audits or assessments, or by
the applicaon of authoritave external standards.
16
GV-3.2-002
Consider adjustment of organizaonal roles and components across lifecycle
stages of large or complex GAI systems, including: AI actor, user, and community
feedback relang to GAI systems; Audit, validaon, and red-teaming of GAI
systems; GAI content moderaon; Data documentaon, labeling, preprocessing
and tagging; Decommissioning GAI systems; Decreasing risks of emoonal
entanglement between users and GAI systems; Decreasing risks of decepon by
GAI systems; Discouraging anonymous use of GAI systems; Enhancing
explainability of GAI systems; GAI system development and engineering;
Increased accessibility of GAI tools, interfaces, and systems, Incident response
and containment; Overseeing relevant AI actors and digital enes, including
management of security credenals and communicaon between AI enes;
Training GAI users within an organizaon about GAI fundamentals and risks.
Human AI Conguraon,
Informaon Security, Toxicity,
Bias, and Homogenizaon
GV-3.2-003
Dene acceptable use policies for the various categories of GAI interfaces,
modalies, and human-AI conguraons.
Human AI Conguraon
GV-3.2-004
Dene policies for the design of systems that possess human decision-making
powers.
Human AI Conguraon
GV-3.2-005
Establish policies for user feedback mechanisms in GAI systems.
Human AI Conguraon
GV-3.2-006
Establish policies to empower accountable execuves to oversee GAI system
adopon, use, and decommissioning.
GV-3.2-007
Establish processes to include and empower interdisciplinary team member
perspecves across the AI lifecycle.
Toxicity, Bias, and Homogenizaon
GV-3.2-008
Evaluate AI actor teams in consideraon of credenals, demographic
representaon, interdisciplinary diversity, and professional qualicaons.
Human AI Conguraon, Toxicity,
Bias, and Homogenizaon
AI Actors: AI Design
1
17
*GOVERN 4.1: Organizaonal policies and pracces are in place to foster a crical thinking and safety-rst mindset in the design,
development, deployment, and uses of AI systems to minimize potenal negave impacts.
Acon ID
Acon
Risks
GV-4.1-001
Establish criteria and acceptable use policies for the use of GAI in decision
making tasks in accordance with organizaonal risk tolerance, and other policies
laid out in the Govern funcon; to include detailed criteria for the kinds of
queries GAI models should refuse to respond to.
Human AI Conguraon
GV-4.1-002
Establish policies and procedures that address connual improvement processes
for risk measurement: Address general risks associated with a lack of
explainability and transparency in GAI systems by using ample documentaon
and techniques such as: applicaon of gradient-based aribuons,
occlusion/term reducon, counterfactual prompts and prompt engineering, and
analysis of embeddings; Assess and update risk measurement approaches at
regular cadences.
GV-4.1-003
Establish policies, procedures, and processes detailing risk measurement in
context of use with standardized measurement protocols and structured public
feedback exercises such as AI red-teaming or independent external audits.
GV-4.1-004
Establish policies, procedures, and processes for oversight funcons (e.g., senior
leadership, legal, compliance, and risk) across the GAI lifecycle, from problem
formulaon and supply chains to system decommission.
Value Chain and Component
Integraon
GV-4.1-005
Establish policies, procedures, and processes that promote eecve challenge of
AI system design, implementaon, and deployment decisions via mechanisms
such as three lines of defense, to minimize risks arising from workplace culture
(e.g., conrmaon bias, funding bias, groupthink, over-reliance on metrics).
Toxicity, Bias, and Homogenizaon
GV-4.1-006
Incorporate GAI governance policies into exisng incident response,
whistleblower, vendor or investment due diligence, acquision, procurement,
reporng or internal audit policies.
Value Chain and Component
Integraon
AI Actors: AI Deployment, AI Design, AI Development, Operaon and Monitoring
1
*GOVERN 4.2: Organizaonal teams document the risks and potenal impacts of the AI technology they design, develop, deploy,
evaluate, and use, and they communicate about the impacts more broadly.
Acon ID
Acon
Risks
18
GV-4.2-001
Develop policies, guidelines, and pracces for monitoring organizaonal and
third-party impact assessments (data, labels, bias, privacy, models, algorithms,
errors, provenance techniques, security, legal compliance, output, etc.) to
migate risk and harm.
Confabulaon, Data Privacy,
Informaon Integrity, Informaon
Security, Value Chain and
Component Integraon, Toxicity,
Bias, and Homogenizaon,
Dangerous or Violent
Recommendaons
GV-4.2-002
Establish clear roles and responsibilies for inter-organizaonal incident
response and communicaon for GAI systems that involve mulple organizaons
involved in dierent aspects of the GAI system lifecycle.
GV-4.2-003
Establish clearly dened terms of use and terms of service.
Intellectual Property
GV-4.2-004
Establish criteria for ad-hoc impact assessments based on incident reporng or
new use cases for the GAI system.
GV-4.2-005
Establish organizaonal roles, policies, and procedures for communicang and
reporng GAI system risks and terms of use or service, relevant for dierent AI
actors.
Human AI Conguraon,
Intellectual Property
GV-4.2-006
Establish policies and procedures to document new ways AI actors interact with
the GAI system.
Human AI Conguraon
GV-4.2-007
Establish policies and procedures to monitor compliance with established terms
of service and use.
Intellectual Property
GV-4.2-008
Establish policies to align organizaonal and third-party assessments with
regulatory and legal compliance regarding content provenance.
Informaon Integrity, Value Chain
and Component Integraon
GV-4.2-009
Establish policies to incorporate adversarial examples and other provenance
aacks in AI model training processes to enhance resilience against aacks.
Informaon Integrity, Informaon
Security
GV-4.2-010
Establish processes to monitor and idenfy misuse, unforeseen use cases, risks
of the GAI system and potenal impacts of those risks (leveraging GAI system use
case inventory).
CBRN Informaon, Confabulaon,
Dangerous or Violent
Recommendaons
GV-4.2-011
Implement standardized documentaon of GAI system risks and potenal
impacts.
GV-4.2-012
Include relevant AI Actors in the GAI system risk idencaon process.
Human AI Conguraon
GV-4.2-013
Verify that downstream GAI system impacts (such as the use of third-party plug-
ins) are included in the impact documentaon process.
Value Chain and Component
Integraon
19
GV-4.2-014
Verify that the organizaonal list of risks related to the use of the GAI system are
updated based on unforeseen GAI system incidents.
AI Actors: AI Deployment, AI Design, AI Development, Operaon and Monitoring
1
*GOVERN 4.3: Organizaonal pracces are in place to enable AI tesng, idencaon of incidents, and informaon sharing.
Acon ID
Acon
Risks
GV-4.3-001
Allocate resources and adjust adopon, development, and implementaon
meframes to enable independent measurement, connuous monitoring, and
fulsome informaon sharing for GAI system risks.
GV-4.3-002
Develop standardized documentaon templates for ecient review of risk
measurement results.
GV-4.3-003
Establish minimum thresholds for performance and review as part of
deployment approval (“go/”no-go”) policies, procedures, and processes, with
reviewed processes and approval thresholds reecng measurement of GAI
capabilies and risks.
GV-4.3-004
Establish organizaonal roles, policies, and procedures for communicang GAI
system incidents and performance to AI actors and downstream stakeholders, via
community or ocial resources (e.g., AI Incident Database, AVID, AI Ligaon
Database, CVE, OECD Incident Monitor, or others).
Human AI Conguraon, Value
Chain and Component Integraon
GV-4.3-005
Establish policies and procedures for pre-deployment GAI system tesng that
validates organizaonal capability to capture GAI system incident reporng
criteria.
GV-4.3-006
Establish policies, procedures, and processes that bolster independence of risk
management and measurement funcons (e.g., independent reporng chains,
aligned incenves).
GV-4.3-007
Establish policies, procedures, and processes that enable and incenvize in-
context risk measurement via standardized measurement and structured public
feedback approaches.
GV-4.3-008
Organizaonal procedures idenfy the minimum set of criteria necessary for GAI
system incident reporng such as: System ID (auto-generated most likely), Title,
Reporter, System/Source, Data Reported, Date of Incident, Descripon,
Impact(s), Stakeholder(s) Impacted.
AI Actors: Fairness and Bias, Governance and Oversight, Operaon and Monitoring, TEVV
2
20
*GOVERN 5.1: Organizaonal policies and pracces are in place to collect, consider, priorize, and integrate feedback from those
external to the team that developed or deployed the AI system regarding the potenal individual and societal impacts related to AI
risks.
Acon ID
Acon
Risks
GV-5.1-001
Allocate me and resources for outreach, feedback, and recourse processes in
GAI system development.
GV-5.1-002
Disclose interacons with GAI systems to users prior to interacve acvies.
Human AI Conguraon
GV-5.1-003
Establish policy, guidelines and processes that: Engage independent experts to
audit models, data sources, licenses, algorithms, and other system components,
Consider sponsoring or engaging in community- based exercises (e.g., bug
bounes, hackathons, compeons) where AI Actors assess and benchmark the
performance of AI systems, including the robustness of content provenance
management under various condions; Document data sources, licenses,
training methodologies, and trade-os considered in the design of AI systems;
Establish mechanisms, plaorms or channels (e.g., user interfaces, web portals,
forums) for independent experts, users, or community members to provide
feedback related to AI systems; Adjudicate and implement relevant feedback at a
regular cadence, Establish transparency mechanisms to track the origin of data
and generated content; Audit and validate these mechanisms.
Human AI Conguraon,
Informaon Integrity, Intellectual
Property
GV-5.1-004
Establish processes to bolster internal AI actor culture in alignment with
organizaonal principles and norms and to empower exploraon of GAI
limitaons beyond development sengs.
Human AI Conguraon, Toxicity,
Bias, and Homogenizaon
GV-5.1-005
Establish the following GAI-specic policies and procedures for independent AI
Actors: Connuous improvement processes for increasing explainability and
migang other risks; Impact assessments, Incenves for internal AI actors to
provide feedback and conduct independent risk management acvies;
Independent management and reporng structures for AI actors engaged in
model and system audit, validaon, and oversight; TEVV processes for the
eecveness of feedback mechanisms employing parcipaon rates, resoluon
me, or similar measurements.
Human AI Conguraon
GV-5.1-006
Provide thorough instrucons for GAI system users to provide feedback and
understand recourse mechanisms.
Human AI Conguraon
GV-5.1-007
Standardize user feedback about GAI system behavior, risks and limitaons for
ecient adjudicaon and incorporaon.
Human AI Conguraon
AI Actors: AI Design, AI Impact Assessment, Aected Individuals and Communies, Governance and Oversight
1
21
*GOVERN 6.1: Policies and procedures are in place that address AI risks associated with third-party enes, including risks of
infringement of a third-partys intellectual property or other rights.
Acon ID
Acon
Risks
GV-6.1-001
Categorize dierent types of GAI content with associated third party risks (i.e.,
copyright, intellectual property, data privacy).
Data Privacy, Intellectual Property,
Value Chain and Component
Integraon
GV-6.1-002
Conduct due diligence on third-party enes and end-users from those enes
before entering into agreements with them (e.g., checking references, reviewing
their content handling processes, etc.).
Human AI Conguraon, Value
Chain and Component Integraon
GV-6.1-003
Conduct joint educaonal acvies and events in collaboraon with third-pares
to promote content provenance best pracces.
Informaon Integrity, Value Chain
and Component Integraon
GV-6.1-004
Conduct regular audits of third-party enes to ensure compliance with
contractual agreements.
Value Chain and Component
Integraon
GV-6.1-005
Dene and communicate organizaonal roles and responsibilies for GAI
acquision, human resources, procurement, and talent management processes
in policies and procedures.
Human AI Conguraon
GV-6.1-006
Develop an incident response plan for third pares specically tailored to
address content provenance incidents or breaches and regularly test and update
the incident response plan with feedback form external and third party
stakeholders.
Data Privacy, Informaon
Integrity, Informaon Security,
Value Chain and Component
Integraon
GV-6.1-007
Develop and validate approaches for measuring the success of content
provenance management eorts with third pares (e.g., incidents detected and
response mes).
Informaon Integrity, Value Chain
and Component Integraon
GV-6.1-008
Develop risk tolerance and criteria to quantavely assess and compare the level
of risk associated with dierent third-party enes (i.e., reputaon, track record,
security measure, and the sensivity of the content they handle).
Informaon Security, Value Chain
and Component Integraon
GV-6.1-009
Dra and maintain well-dened contracts and service level agreements (SLAs)
that specify content ownership, usage rights, quality standards, security
requirements, and content provenance expectaons.
Informaon Integrity, Informaon
Security
GV-6.1-010
Establish processes to maintain awareness of evolving risks, technologies, and
best pracces in content provenance management.
Informaon Integrity
22
GV-6.1-011
Implement a supplier risk assessment framework to connuously evaluate and
monitor third-party enes’ performance and adherence to content provenance
standards and technologies (e.g., digital signatures, watermarks, cryptography,
etc.) to detect anomalies and unauthorized changes; services acquision and
supply chain risk management; legal compliance (e.g., copyright, trademarks,
and data privacy laws).
Data Privacy, Informaon
Integrity, Informaon Security,
Intellectual Property, Value Chain
and Component Integraon
GV-6.1-012
Include audit clauses in contracts that allow the organizaon to verify
compliance with content provenance requirements.
Informaon Integrity
GV-6.1-013
Inventory all third-party enes with access to organizaonal content and
establish approved GAI technology and service provider lists.
Value Chain and Component
Integraon
GV-6.1-014
Maintain detailed records of content provenance, including sources, mestamps,
metadata, and any changes made by third pares.
Informaon Integrity, Value Chain
and Component Integraon
GV-6.1-015
Provide proper training to internal employees on content provenance best
pracces, risks, and reporng procedures.
Informaon Integrity
GV-6.1-016
Update and integrate due diligence processes for GAI acquision and
procurement vendor assessments to include intellectual property, data privacy,
security, and other risks. For example, update policies to: Address roboc
process automaon (RPA), soware-as-a-service (SAAS), and other soluons that
may rely on embedded GAI technologies; Address ongoing audits, assessments,
and alerng, dynamic risk assessments, and real-me reporng tools for
monitoring third-party GAI risks; Address accessibility, accommodaons, or opt-
outs in GAI vendor oerings; Address commercial use of GAI outputs and
secondary use of collected data by third pares; Assess vendor risk controls for
intellectual property infringements and data privacy; Consider policy
adjustments across GAI modeling libraries, tools and APIs, ne-tuned models,
and embedded tools; Establish ownership of GAI acquision and procurement
processes; Include relevant organizaonal funcons in evaluaons of GAI third
pares (e.g., legal, informaon technology (IT), security, privacy, fair lending);
Include instrucon on intellectual property infringement and other third-party
GAI risks in GAI training for AI actors; Screen GAI vendors, open source or
proprietary GAI tools, or GAI service providers against incident or vulnerability
databases; Screen open source or proprietary GAI training data or outputs
against patents, copyrights, trademarks and trade secrets.
Data Privacy, Human AI
Conguraon, Informaon
Security, Intellectual Property,
Value Chain and Component
Integraon, Toxicity, Bias, and
Homogenizaon
GV-6.1-017
Update GAI acceptable use policies to address proprietary and open-source GAI
technologies and data, and contractors, consultants, and other third-party
personnel.
Intellectual Property, Value Chain
and Component Integraon
GV-6.1-018
Update human resource and talent management standards to address
acceptable use of GAI.
Human AI Conguraon
23
GV-6.1-019
Update third-party contracts, service agreements, and warranes to address GAI
risks; Contracts, service agreements, and similar documents may include GAI-
specic indemnity clauses, dispute resoluon mechanisms, and other risk
controls.
Value Chain and Component
Integraon
AI Actors: Operaon and Monitoring, Procurement, Third-party enes
1
GOVERN 6.2: Conngency processes are in place to handle failures or incidents in third-party data or AI systems deemed to be
high-risk.
Acon ID
Acon
Risks
GV-6.2-001
Apply exisng organizaonal risk management policies, procedures, and
documentaon processes to third-party GAI data and systems, including open
source data and soware.
Intellectual Property, Value Chain
and Component Integraon
GV-6.2-002
Document downstream GAI system impacts (e.g., the use of third-party plug-ins)
for third party dependencies.
Value Chain and Component
Integraon
GV-6.2-003
Document GAI system supply chain risks to idenfy over-reliance on third party
data or GAI systems and to idenfy fallbacks.
Value Chain and Component
Integraon
GV-6.2-004
Document incidents involving third-party GAI data and systems, including open
source data and soware.
Intellectual Property, Value Chain
and Component Integraon
GV-6.2-005
Enumerate organizaonal GAI system risks based on external dependencies on
third-party data or GAI systems.
Value Chain and Component
Integraon
GV-6.2-006
Establish acceptable use policies that idenfy dependencies, potenal impacts,
and risks associated with third-party data or GAI systems deemed high-risk.
Value Chain and Component
Integraon
GV-6.2-007
Establish conngency and communicaon plans to support fallback alternaves
for downstream users in the event the GAI system is disabled.
Human AI Conguraon, Value
Chain and Component Integraon
GV-6.2-008
Establish incident response plans for third-party GAI technologies deemed high-
risk: Align incident response plans with impacts enumerated in MAP 5.1;
Communicate third-party GAI incident response plans to all relevant AI actors;
Dene ownership of GAI incident response funcons; Rehearse third-party GAI
incident response plans at a regular cadence; Improve incident response plans
based on retrospecve learning; Review incident response plans for alignment
with relevant breach reporng, data protecon, data privacy, or other laws.
Data Privacy, Human AI
Conguraon, Informaon
Security, Value Chain and
Component Integraon, Toxicity,
Bias, and Homogenizaon
GV-6.2-009
Establish organizaonal roles, policies, and procedures for communicang with
data and GAI system providers regarding performance, disclosure of GAI system
inputs, and use of third-party data and GAI systems.
Human AI Conguraon, Value
Chain and Component Integraon
24
GV-6.2-010
Establish policies and procedures for connuous monitoring of third-party GAI
systems in deployment.
Value Chain and Component
Integraon
GV-6.2-011
Establish policies and procedures that address GAI data redundancy, including
model weights and other system arfacts.
Toxicity, Bias, and Homogenizaon
GV-6.2-012
Establish policies and procedures to test and manage risks related to rollover and
fallback technologies for GAI systems, acknowledging that rollover and fallback
may include manual processing.
GV-6.2-013
Idenfy and document high-risk third-party GAI technologies in organizaonal AI
inventories, including open-source GAI soware.
Intellectual Property, Value Chain
and Component Integraon
GV-6.2-014
Review GAI vendor documentaon for thorough instrucons, meaningful
transparency into data or system mechanisms, ample support and contact
informaon, and alignment with organizaonal principles.
Value Chain and Component
Integraon, Toxicity, Bias, and
Homogenizaon
GV-6.2-015
Review GAI vendor release cadences and roadmaps for irregularies and
alignment with organizaonal principles.
Value Chain and Component
Integraon, Toxicity, Bias, and
Homogenizaon
GV-6.2-016
Review vendor contracts and avoid arbitrary or capricious terminaon of crical
GAI technologies or vendor services and Non-standard terms that may amplify or
defer liability in unexpected ways and Unauthorized data collecon by vendors or
third-pares (e.g., secondary data use); Consider: Clear assignment of liability and
responsibility for incidents, GAI system changes over me (e.g., ne-tuning, dri,
decay); Request: Nocaon and disclosure for serious incidents arising from
third-party data and systems, Service line agreements (SLAs) in vendor contracts
that address incident response, response mes, and availability of crical support.
Human AI Conguraon,
Informaon Security, Value Chain
and Component Integraon
AI Actors: AI Deployment, Operaon and Monitoring, TEVV, Third-party enes
1
*MAP 1.1: Intended purposes, potenally benecial uses, context specic laws, norms and expectaons, and prospecve sengs in
which the AI system will be deployed are understood and documented. Consideraons include: the specic set or types of users
along with their expectaons; potenal posive and negave impacts of system uses to individuals, communies, organizaons,
society, and the planet; assumpons and related limitaons about AI system purposes, uses, and risks across the development or
product AI lifecycle; and related TEVV and system metrics.
Acon ID
Acon
Risks
MP-1.1-001
Apply risk mapping and measurement plans to third-party and open-source
systems.
Intellectual Property, Value Chain
and Component Integraon
25
MP-1.1-002
Collaborate with domain experts to explore and document gaps, limitaons,
and risks in pre-deployment tesng and the praccal and contextual dierences
between pre-deployment tesng and the ancipated context(s) of use.
MP-1.1-003
Conduct impact assessments or review past known incidents and failure modes
to priorize and inform risk measurement.
MP-1.1-004
Determine and document the expected and acceptable GAI system context of
use in collaboraon with socio-cultural and other domain experts, by assessing:
Assumpons and limitaons; Direct value to the organizaon; Intended
operaonal environment and observed usage paerns; Potenal posive and
negave impacts to individuals, public safety, groups, communies,
organizaons, democrac instuons, and the physical environment; Social
norms and expectaons.
Toxicity, Bias, and Homogenizaon
MP-1.1-005
Document GAI system ownership, intended use, direct organizaonal value, and
assumpons and limitaons.
MP-1.1-006
Document risk measurement plans that address: Individual and group cognive
biases (e.g., conrmaon bias, funding bias, groupthink) for AI actors involved in
the design, implementaon, and use of GAI systems; Known past GAI system
incidents and failure modes; In-context use and foreseeable misuse, abuse, and
o-label use; Over reliance on quantave metrics and methodologies without
sucient awareness of their limitaons in the context(s) of use; Risks associated
with trustworthy characteriscs across the AI lifecycle; Standard measurement
and structured human feedback approaches; Ancipated human-AI
conguraons.
Human AI Conguraon, Toxicity,
Bias, and Homogenizaon,
Dangerous or Violent
Recommendaons
MP-1.1-007
Document risks related to transparency, accountability, explainability, and
interpretability in risk measurement plans, system risk assessments, and
deployment approval (“go”/”no-go”) decisions.
MP-1.1-008
Document system requirements, ownership, and AI actor roles and
responsibilies for human oversight of GAI systems.
Human AI Conguraon
MP-1.1-009
Document the extent to which a lack of transparency or explainability impedes
risk measurement across the AI lifecycle.
MP-1.1-010
Idenfy and document foreseeable illegal uses or applicaons that surpass
organizaonal risk tolerances.
AI Actors: AI Deployment
1
26
*MAP 1.2: Interdisciplinary AI actors, competencies, skills, and capacies for establishing context reect demographic diversity and
broad domain and user experience experse, and their parcipaon is documented. Opportunies for interdisciplinary
collaboraon are priorized.
Acon ID
Acon
Risks
MP-1.2-001
Document the credenals and qualicaons of organizaonal AI actors and AI
actor team composion.
Human AI Conguraon
MP-1.2-002
Establish and empower interdisciplinary teams that reect a wide range of
capabilies, competencies, demographic groups, domain experse, educaonal
backgrounds, lived experiences, professions, and skills across the enterprise to
inform and conduct TEVV of GAI technology, and other risk measurement and
management funcons.
Human AI Conguraon, Toxicity,
Bias, and Homogenizaon
MP-1.2-003
Establish connuous improvement processes to increase diversity and
representaveness in AI actor teams, standard measurement resources, and
structured public feedback parcipants from subgroup populaons in-context.
Human AI Conguraon, Toxicity,
Bias, and Homogenizaon
MP-1.2-004
Verify that AI actor team membership includes demographic diversity,
applicable domain experse, varied educaon backgrounds, and lived
experiences.
Human AI Conguraon, Toxicity,
Bias, and Homogenizaon
MP-1.2-005
Verify that data or benchmarks used in risk measurement, and users,
parcipants, or subjects involved in structured public feedback exercises are
representave of diverse in-context user populaons.
Human AI Conguraon, Toxicity,
Bias, and Homogenizaon
AI Actors: AI Deployment
1
*MAP 2.1: The specic tasks and methods used to implement the tasks that the AI system will support are dened (e.g., classiers,
generave models, recommenders).
Acon ID
Acon
Risks
MP-2.1-001
Dene GAI system's task(s) that relate to content provenance, such as original
content creaon, media synthesis, or data augmentaon while incorporang
tracking measures.
Informaon Integrity
MP-2.1-002
Establish known assumpons and pracces for determining data origin and
content lineage, for documentaon and evaluaon.
Informaon Integrity
MP-2.1-003
Idenfy and document GAI task limitaons that might impact the reliability or
authencity of the content provenance.
Informaon Integrity
27
MP-2.1-004
Instute audit trails for data and content ows within the system, including but
not limited to, original data sources, data transformaons, and decision-making
criteria.
MP-2.1-005
Review ecacy of content provenance techniques on a regular basis and
update protocols as necessary.
Informaon Integrity
AI Actors: TEVV
1
MAP 2.2: Informaon about the AI system’s knowledge limits and how system output may be ulized and overseen by humans is
documented. Documentaon provides sucient informaon to assist relevant AI actors when making decisions and taking
subsequent acons.
Acon ID
Acon
Risks
MP-2.2-001
Assess whether the GAI system fullls its intended purpose within its
operaonal context on a regular basis.
MP-2.2-002
Evaluate whether GAI operators and end-users can accurately understand
content lineage and origin.
Human AI Conguraon,
Informaon Integrity
MP-2.2-003
Idenfy and document how the system relies on upstream data sources for
content provenance and if it serves as an upstream dependency for other
systems.
Informaon Integrity, Value Chain
and Component Integraon
MP-2.2-004
Observe and analyze how the AI system interacts with external networks, and
idenfy any potenal for negave externalies, parcularly where content
provenance might be compromised.
Informaon Integrity
MP-2.2-005
Specify the environments where GAI systems may not funcon as intended
related to content provenance.
Informaon Integrity
AI Actors: End Users
2
*MAP 2.3: Scienc integrity and TEVV consideraons are idened and documented, including those related to experimental
design, data collecon and selecon (e.g., availability, representaveness, suitability), system trustworthiness, and construct
validaon
Acon ID
Acon
Risks
28
MP-2.3-001
Assess the accuracy, quality, reliability, and authencity of the GAI content
provenance by comparing it to a set of known ground truth data and by using a
variety of evaluaon methods (e.g., human oversight and automated
evaluaon).
Informaon Integrity
MP-2.3-002
Curate and maintain high quality datasets that are accurate, relevant,
consistent, and representave as well as be well-documented complying with
ethical and legal standards along with diverse data points.
Toxicity, Bias, and Homogenizaon
MP-2.3-003
Deploy and document fact-checking techniques to verify the accuracy and
veracity of informaon generated by GAI systems, especially when the
informaon comes from mulple (or unknown) sources.
Informaon Integrity
MP-2.3-004
Design GAI systems to support content provenance such as tracking the lineage
(e.g., data sources used to train the system, parameters used to generate
content, etc.) and to verify authencity (e.g., using digital signatures or
watermarks).
Informaon Integrity
MP-2.3-005
Develop and implement tesng techniques to idenfy any GAI produced
content (e.g., synthec media) that might be indisnguishable from human-
generated content.
Informaon Integrity
MP-2.3-006
Document GAI content provenance techniques (including experimental
methods), tesng, evaluaon, performance, and validaon metrics throughout
the AI lifecycle.
Informaon Integrity
MP-2.3-007
Implement plans for GAI systems to undergo regular adversarial tesng to
idenfy vulnerabilies and potenal manipulaon risks.
Informaon Security
MP-2.3-008
Integrate GAI systems with exisng content management and version control
systems, to enable content provenance to be tracked across the lifecycle.
Informaon Integrity
MP-2.3-009
Test GAI models using known inputs, context, and environment to conrm they
produce expected outputs across a variety of methods (e.g., unit tests,
integraon tests, and system tests) and help to idenfy and address potenal
problems.
MP-2.3-010
Use diverse large-scale and small-scale datasets for tesng and evaluaon to
ensure that the AI system can perform well on a variety of dierent types of
data.
Toxicity, Bias, and Homogenizaon
MP-2.3-011
Verify that GAI content provenance is accurate and reliable by using
cryptographic techniques and performing formal audits to ensure it has not
been manipulated.
Informaon Integrity
29
MP-2.3-012
Verify that the AI system’s content provenance complies with relevant laws and
regulaons, such as legal infringement, terms and condions, copyright and
intellectual property rights, when using data sources and generang content.
Informaon Integrity, Intellectual
Property
AI Actors: AI Development, Domain Experts, TEVV
1
MAP 3.4: Processes for operator and praconer prociency with AI system performance and trustworthiness – and relevant
technical standards and cercaons – are dened, assessed, and documented.
Acon ID
Acon
Risks
MP-3.4-001
Adapt exisng training programs to include modules on content provenance.
Informaon Integrity
MP-3.4-002
Develop cercaon programs that test prociency in managing AI risks and
interpreng content provenance, relevant to specic industry and context.
Informaon Integrity
MP-3.4-003
Delineate human prociency tests from tests of AI capabilies.
Human-AI Conguraon
MP-3.4-004
Integrate human and other qualitave inputs to comprehensively assess
content provenance.
Informaon Integrity
MP-3.4-005
Ensure that output provided to operators and praconers is both interacve
and well-dened, incorporang content provenance data that can be easily
interpreted for eecve downstream decision-making.
Informaon Integrity, Value Chain
and Component Integraon
MP-3.4-006
Establish and adhere to design principles that ensure safe and ethical operaon,
taking into account the interpretaon of content provenance informaon.
Informaon Integrity, Toxicity, Bias,
and Homogenizaon, Dangerous or
Violent Recommendaons
MP-3.4-007
Implement systems to connually monitor and track the outcomes of human-AI
collaboraons for future renement and improvements, integrang a focus on
content provenance wherever applicable.
Human AI Conguraon,
Informaon Integrity
MP-3.4-008
Involve the end-users, praconers, and operators in AI system prototyping and
tesng acvies. Make sure these tests cover various scenarios where content
provenance could play a crical role, such as crisis situaons or ethically
sensive contexts.
Human AI Conguraon,
Informaon Integrity, Toxicity, Bias,
and Homogenizaon
MP-3.4-009
Match the complexity of GAI system explanaons and the provenance data to
the level of the problem and contextual intricacy.
Informaon Integrity
AI Actors: AI Design, AI Development, Domain Experts, End-Users, Human Factors, Operaon and Monitoring
2
30
*MAP 4.1: Approaches for mapping AI technology and legal risks of its components – including the use of third-party data or
soware – are in place, followed, and documented, as are risks of infringement of a third party’s intellectual property or other
rights.
Acon ID
Acon
Risks
MP-4.1-001
Conduct audits on third-party processes and personnel including an examinaon
of the third-partys reputaon.
Value Chain and Component
Integraon
MP-4.1-002
Conduct periodic audits and monitor AI generated content for privacy risks;
address any possible instances of sensive data exposure.
Data Privacy
MP-4.1-003
Consider using synthec data as applicable to train AI models in place of real-
world data to match the stascal properes of real-world data without
disclosing personally idenable informaon.
MP-4.1-004
Develop pracces for periodic monitoring of GAI outputs for possible intellectual
property infringements and other risks and implement processes for responding
to potenal intellectual property infringement claims.
Intellectual Property
MP-4.1-005
Document all aspects of the AI development process including data sources,
model architectures and training procedures to support reproducon of results,
idenfy any potenal problems, and implement migaon strategies.
MP-4.1-006
Document compliance with legal requirements across the AI lifecycle, including
copyright concerns, privacy protecons.
Data Privacy, Intellectual Property
MP-4.1-007
Document training data curaon policies, including policies to verify that
consent was obtained for the likeness or image of individuals.
Obscene, Degrading, and/or
Abusive Content
MP-4.1-008
Employ encrypon techniques and proper safeguards to ensure secure data
storage and transfer to protect data privacy.
Data Privacy, Informaon Security,
Dangerous or Violent
Recommendaons
MP-4.1-009
Establish policies for collecon, retenon, and minimum quality of data, in
consideraon of the following risks: Disclosure of CBRN informaon by removing
CBRN informaon from training data, Use of Illegal or dangerous content;
Training data imbalance across sub-groups by modality, such as languages for
LLMs or skin tone for image generaon; Leak of personally idenable
informaon, including facial likenesses of individuals unless consent is obtained
for use of their images.
CBRN Informaon, Intellectual
Property, Toxicity, Bias, and
Homogenizaon, Dangerous or
Violent Recommendaons, Data
Privacy
MP-4.1-010
Implement bias migaon approaches by addressing sources of bias in the
training data and by evaluang AI models for bias periodically.
Toxicity, Bias, and Homogenizaon
MP-4.1-011
Implement policies and pracces dening how third-party intellectual property
and training data will be used, stored, and protected.
Intellectual Property, Value Chain
and Component Integraon
31
MP-4.1-012
Implement reproducibility techniques, including: share data publicly or privately
using license and citaon; develop code according to standard soware
pracces; track and document experiments and results; manage the soware
environment and dependencies; ulize virtual environments, version control,
and maintain a requirements document; manage models and arfacts; tracking
AI model versions and documenng model details along with parameters and
experimental results; document data management processes and establish a
tesng/validaon process to maintain reliable results.
Confabulaon, Intellectual
Property, Value Chain and
Component Integraon
MP-4.1-013
Re-evaluate models that were ne-tuned on top of third-party models.
Value Chain and Component
Integraon
MP-4.1-014
Re-evaluate risks when adapng GAI models to new domains.
MP-4.1-015
Review service level agreements and contracts, including license agreements
and any legal documents associated with the third-party intellectual properes,
technologies, and services.
Intellectual Property, Value Chain
and Component Integraon
MP-4.1-016
Use approaches to detect the presence of sensive data in generated output
text, image, video, or audio, and verify that the model will mask any detected
sensive data.
Informaon Integrity
MP-4.1-017
Use trusted sources for training data that are licensed or open source and
ensure that the enty has the legal right for the use of proprietary training data.
Intellectual Property
MP-4.1-018
Apply strong anonymizaon and de-idencaon, and/or dierenal privacy
techniques to protect the privacy of individuals in the training data.
Data Privacy
MP-4.1-019
Verify that third-party models are in compliance with exisng use licenses.
Intellectual Property, Value Chain
and Component Integraon
AI Actors: Governance and Oversight, Operaon and Monitoring, Procurement, Third-party enes
1
*MAP 5.1: Likelihood and magnitude of each idened impact (both potenally benecial and harmful) based on expected use,
past uses of AI systems in similar contexts, public incident reports, feedback from those external to the team that developed or
deployed the AI system, or other data are idened and documented.
Acon ID
Acon
Risks
MP-5.1-001
Apply TEVV pracces for content provenance (e.g., probing a system's synthec
data generaon capabilies for potenal misuse or vulnerabilies using zero-
knowledge proof approaches).
Informaon Integrity, Informaon
Security
32
MP-5.1-002
Assess and document risks related to content provenance. e.g., document the
presence, absence, or eecveness of tagging systems, cryptographic hashes,
blockchain-based, or distributed ledger technology soluons that improve
content tracking transparency and immutability.
Informaon Integrity
MP-5.1-003
Consider GAI-specic mapped risks (e.g., complex security requirements,
potenal for emoonal entanglement of users, large supply chains) in esmates
for likelihood, magnitude of impact and risk.
Human AI Conguraon,
Informaon Security, Value Chain
and Component Integraon
MP-5.1-004
Document esmates of likelihood, magnitude of impact, and risk for GAI
systems in a central repository (e.g., organizaonal AI inventory.).
MP-5.1-005
Enumerate potenal impacts related to content provenance, including best-
case, average-case, and worst-case scenarios.
Informaon Integrity
MP-5.1-006
Esmate likelihood of enumerated impact scenarios using past data or expert
judgment, analysis of known public incidents, standard measurement, and
structured human feedback results.
CBRN Informaon, Dangerous or
Violent Recommendaons
MP-5.1-007
Measure risk as the product of esmated likelihood and magnitude of impact of
a GAI outcome.
MP-5.1-008
Priorize risk acceptance, management, or transfer acvies based on risk
esmates.
MP-5.1-009
Priorize standard measurement and structured public feedback processes
based on risk assessment esmates.
MP-5.1-010
Prole risks arising from GAI systems interacng with, manipulang, or
generang content, and outlining known and potenal vulnerabilies and the
likelihood of their occurrence.
Informaon Security
MP-5.1-011
Scope GAI applicaons narrowly to enable risk-based governance and controls.
AI Actors: AI Deployment, AI Design, AI Development, AI Impact Assessment, Aected Individuals and Communies, End-Users,
Operaon and Monitoring
1
33
MAP 5.2: Pracces and personnel for supporng regular engagement with relevant AI actors and integrang feedback about
posive, negave, and unancipated impacts are in place and documented.
Acon ID
Acon
Risks
MP-5.2-001
Determine context-based measures to idenfy if new impacts are present due to
the GAI system, including regular engagements with downstream AI actors to
idenfy and quanfy new contexts of unancipated impacts of GAI systems.
Human AI Conguraon, Value
Chain and Component Integraon
MP-5.2-002
Plan regular engagements with AI actors responsible for inputs to GAI systems,
including third-party data and algorithms, to review and evaluate unancipated
impacts.
Human AI Conguraon, Value
Chain and Component Integraon
MP-5.2-003
Publish guidance for external AI actors to report unancipated impacts of the
GAI system and to engage with the organizaon in the event of GAI system
impacts.
Human AI Conguraon
AI Actors: AI Deployment, AI Design, AI Impact Assessment, Aected Individuals and Communies, Domain Experts, End-Users,
Human Factors, Operaon and Monitoring
1
*MEASURE 1.1: Approaches and metrics for measurement of AI risks enumerated during the MAP funcon are selected for
implementaon starng with the most signicant AI risks. The risks or trustworthiness characteriscs that will not – or cannot – be
measured are properly documented.
Acon ID
Acon
Risks
MS-1.1-001
Assess the eecveness of implemented methods and metrics at an ongoing
cadence as part of connuous improvement acvies.
MS-1.1-002
Collaborate with muldisciplinary experts (e.g., in the elds of responsible use
of GAI, cybersecurity, or digital forensics) to ensure the selected risk
management approaches are robust and eecve.
Informaon Security; CBRN
Informaon, Toxicity, Bias, and
Homogenizaon
MS-1.1-003
Conduct adversarial role-playing exercises, AI red-teaming, or chaos tesng to
idenfy anomalous or unforeseen failure modes.
Informaon Security, Unknowns
MS-1.1-004
Conduct tradional assessment or TEVV exercises to measure the prevalence of
known risks in deployment contexts.
MS-1.1-005
Document GAI risk measurement or tracking approaches, including tracking of
risks that cannot be easily measured before deployment (e.g., ecosystem-level
risks or risks that unfold over longer me scales).
34
MS-1.1-006
Employ digital signatures and watermarking, blockchain technology, reverse
image and video search, metadata analysis, steganalysis, and/or forensic
analysis to trace the origin and modicaons of digital content.
Informaon Integrity
MS-1.1-007
Employ similarity metrics, tampering indicators, blockchain conrmaon,
metadata consistency, hidden data detecon rate, source reliability, and
consistency with known paerns to measure content provenance risks.
Informaon Integrity
MS-1.1-008
Idenfy content provenance risks in the end-to-end AI supply chain, including
risks associated with data suppliers, data annotators, R&D, joint ventures,
academic or nonprot projects/partners, third party vendors, and contractors.
Informaon Integrity, Value Chain
and Component Integraon
MS-1.1-009
Idenfy potenal content provenance risks and harms in GAI, such as
misinformaon or disinformaon, deepfakes, including NCII, or tampered
content. Enumerate and rank risks and/or harms based on their likelihood and
potenal impact, and determine how well provenance soluons address
specic risks and/or harms.
Informaon Integrity, Dangerous or
Violent Recommendaons,
Obscene, Degrading, and/or
Abusive Content
MS-1.1-010
Implement appropriate approaches and metrics for measuring AI-related
content provenance the and the aforemenoned risks and harms.
Informaon Integrity, Dangerous or
Violent Recommendaons
MS-1.1-011
Integrate tools designed to analyze content provenance and detect data
anomalies, verify the authencity of digital signatures, and idenfy paerns
associated with misinformaon or manipulaon.
Informaon Integrity
MS-1.1-012
Invest in R&D capabilies to evaluate and implement novel methods and
technologies for the measurement of AI-related risks in content provenance,
toxicity, and CBRN.
Informaon Integrity, CBRN
Informaon, Obscene, Degrading,
and/or Abusive Content
MS-1.1-013
Priorize risk measurement according to risk severity as determined during
mapping acvies.
MS-1.1-014
Provide content provenance risk management educaon to AI actors, users, and
stakeholders.
Human AI Conguraon,
Informaon Integrity
MS-1.1-015
Track and document risks or opportunies related to content provenance that
cannot be measured quantavely, including explanaons as to why some risks
cannot be measured (e.g., due to technological limitaons, resource
constraints, or trustworthy consideraons).
Informaon Integrity
MS-1.1-016
Track the number of output data items that are accompanied by provenance
informaon (e.g., watermarks, cryptographic tags).
Informaon Integrity
MS-1.1-017
Track the number of training and input (e.g., prompts) data items that have
provenance records and output data items that potenally infringe on
intellectual property rights.
Informaon Integrity, Intellectual
Property
35
MS-1.1-018
Track the number of training and input data items covered by intellectual
property rights (e.g., copyright, trademark, trade secret).
Intellectual Property
MS-1.1-019
Validate the reliability and integrity of the original data and measure inherent
dependence on training data and its quality.
AI Actors: AI Development, Domain Experts, TEVV
1
*MEASURE 1.3: Internal experts who did not serve as front-line developers for the system and/or independent assessors are
involved in regular assessments and updates. Domain experts, users, AI actors external to the team that developed or deployed the
AI system, and aected communies are consulted in support of assessments as necessary per organizaonal risk tolerance
Acon ID
Acon
Risks
MS-1.3-001
Dene relevant groups of interest (e.g., demographic groups, subject maer
experts, past experience with GAI technology) within the context of use as part
of plans for gathering structured public feedback.
Human AI Conguraon, Toxicity,
Bias, and Homogenizaon, CBRN
MS-1.3-002
Dene sequence of acons for AI red-teaming exercises and accompanying
necessary documentaon pracces.
MS-1.3-003
Dene use cases, contexts of use, capabilies, and negave impacts where
structured human feedback exercises, e.g., AI red-teaming, would be most
benecial for AI risk measurement and management based on the context of
use.
MS-1.3-004
Develop a suite of suitable metrics to evaluate structured feedback results,
informed by representave AI actors.
Human AI Conguraon, Toxicity,
Bias, and Homogenizaon, CBRN
MS-1.3-005
Execute independent audit, AI red-teaming, impact assessments, or other
structured human feedback processes in consultaon with representave AI
actors with experse and familiarity in the context of use, and/or who are
representave of the populaons associated with the context of use.
Human AI Conguraon, Toxicity,
Bias, and Homogenizaon, CBRN
MS-1.3-006
Idenfy and implement methods for post-hoc evaluaon of the eecveness of
structured human feedback processes such as auding, impact assessments,
and AI red-teaming.
MS-1.3-007
Idenfy and implement methods for translang, evaluang, and integrang
structured human feedback output into AI risk management processes,
connuous improvement processes, and related organizaonal decision
making.
MS-1.3-008
Idenfy criteria for determining when structured human feedback exercises are
complete.
36
MS-1.3-009
Idenfy mechanisms and teams to evaluate or other structured human
feedback outcomes.
MS-1.3-010
Recruit auditors, AI red-teams, and structured feedback parcipants in
consideraon of the linguisc, dialectal, and socio-cultural environment of the
expected user base.
Human AI Conguraon
MS-1.3-011
Share structured feedback with relevant AI actors to address idened risks.
Human AI Conguraon
MS-1.3-012
Verify demographic diversity of idened subgroups in structured feedback
exercises.
Toxicity, Bias, and Homogenizaon
MS-1.3-013
Verify those conducng structured human feedback exercises are not directly
involved in system development tasks for the same GAI model.
AI Actors: AI Deployment, AI Development, AI Impact Assessment, Aected Individuals and Communies, Domain Experts, End-
Users, Operaon and Monitoring, TEVV
1
*MEASURE 2.2: Evaluaons involving human subjects meet applicable requirements (including human subject protecon) and are
representave of the relevant populaon.
Acon ID
Acon
Risks
MS-2.2-001
Assess and manage stascal biases related to GAI content provenance through
techniques such as re-sampling, re-weighng, or adversarial training.
Informaon Integrity, Informaon
Security, Toxicity, Bias, and
Homogenizaon
MS-2.2-002
Disaggregate evaluaon metrics by demographic factors to idenfy any
discrepancies in how content provenance mechanisms work across diverse
populaons.
Informaon Integrity, Toxicity, Bias,
and Homogenizaon
MS-2.2-003
Document how content provenance mechanisms are operated in the context of
privacy and security including: Anonymize data to protect the privacy of human
subjects; Remove any personally idenable informaon (PII) to prevent
potenal harm or misuse.
Data Privacy, Human AI
Conguraon, Informaon
Integrity, Informaon Security,
Dangerous or Violent
Recommendaons
MS-2.2-004
Employ techniques like chaos engineering and stakeholder feedback to evaluate
the quality and integrity of data used in training and the provenance of AI-
generated content.
Informaon Integrity
MS-2.2-005
Idenfy biases present in the training data for downstream migaon using
available techniques (e.g., data visualizaon tools).
Value Chain and Component
Integraon, Toxicity, Bias, and
Homogenizaon
37
MS-2.2-006
Implement connuous monitoring of GAI system impacts to idenfy whether
GAI outputs are equitable across various sub-populaons. Seek acve and
direct feedback from aected communies to idenfy issues and improve GAI
system fairness.
Toxicity, Bias, and Homogenizaon
MS-2.2-007
Implement robust cybersecurity measures to protect both the research data,
the GAI system and its content provenance from unauthorized access,
breaches, or tampering and unauthorized disclosure of human subject
informaon.
Data Privacy, Human AI
Conguraon, Informaon
Integrity, Informaon Security
MS-2.2-008
Obtain informed consent from human subject evaluaon parcipants. Informed
consent should include: the nature of the study, informaon about the use of
GAI related to content provenance, its purpose, and potenal implicaons.
Data Privacy, Human AI
Conguraon, Informaon
Integrity
MS-2.2-010
Pracce responsible disclosure of ndings and report discovered vulnerabilies
or biases related to GAI systems and its content provenance.
Informaon Integrity, Informaon
Security, Toxicity, Bias, and
Homogenizaon
MS-2.2-011
Provide human subjects with opons to revoke their consent for future use of
their data in GAI applicaons, parcularly in content provenance aspects.
Data Privacy, Human AI
Conguraon, Informaon
Integrity
MS-2.2-012
Use Instuonal Review Boards as applicable for evaluaons that involve
human subjects.
Human AI Conguraon
MS-2.2-013
Use techniques such as anonymizaon or dierenal privacy to minimize the
risks associated with linking AI-generated content back to individual human
subjects.
Data Privacy, Human AI
Conguraon
MS-2.2-014
Verify accountability and fairness through documentaon of the algorithms,
parameters, and methodologies used in the evaluaon to allow for external
scruny.
Toxicity, Bias, and Homogenizaon
MS-2.2-015
Verify that human subjects selected for evaluaon are representave of the
populaon for the relevant GAI use-case; Consider demographics such as age,
gender, race, ethnicity, socioeconomic status, and geographical locaon to
avoid biases in the AI system related to content provenance.
Human AI Conguraon,
Informaon Integrity, Toxicity, Bias,
and Homogenizaon
MS-2.2-016
Work in close collaboraon with domain experts to understand the specic
requirements and potenal pialls related to content provenance in the GAI
system's intended context of use.
Informaon Integrity
AI Actors: AI Development, Human Factors, TEVV
1
38
*MEASURE 2.3: AI system performance or assurance criteria are measured qualitavely or quantavely and demonstrated for
condions similar to deployment seng(s). Measures are documented.
Acon ID
Acon
Risks
MS-2.3-001
Analyze dierences between intended and actual populaon of users or data
subjects, including likelihood for errors, incidents, or negave impacts.
Confabulaon, Human AI
Conguraon, Informaon
Integrity
MS-2.3-002
Conduct eld tesng on sampled sub-populaons prior to deployment to the
enre populaon.
MS-2.3-003
Conduct TEVV in the operaonal environment in accordance with organizaonal
policies and regulatory or disciplinary requirements (e.g., informed consent,
instuonal review board approval, human research protecons, privacy
requirements).
Data Privacy
MS-2.3-004
Consider baseline model performance on suites of benchmarks when selecng a
model for ne tuning.
MS-2.3-005
Evaluate claims of model capabilies using empirically validated methods.
MS-2.3-006
Include metrics measuring reporng rates for harmful or oensive content in
eld tesng.
Dangerous or Violent
Recommendaons
MS-2.3-007
Share results of pre-deployment tesng with relevant AI actors, such as those
with system release approval authority.
Human AI Conguraon
MS-2.3-008
Use disaggregated evaluaon methods (e.g., by race, age, gender, ethnicity,
ability, region) to improve granularity of AI system performance measures.
MS-2.3-009
Ulize a purpose-built tesng environment such as NIST Dioptra to empirically
evaluate GAI trustworthy characteriscs.
MS-2.3-010
Verify that mechanisms to collect users’ feedback are visible and traceable.
Human AI Conguraon
AI Actors: AI Deployment, TEVV
1
*MEASURE 2.5: The AI system to be deployed is demonstrated to be valid and reliable. Limitaons of the generalizability beyond
the condions under which the technology was developed are documented.
Acon ID
Acon
Risks
39
MS-2.5-001
Apply standard measurement and structured human feedback approaches to
internally-developed and third-party GAI systems.
Value Chain and Component
Integraon
MS-2.5-002
Avoid extrapolang GAI system performance or capabilies from narrow, non-
systemac, and anecdotal assessments.
MS-2.5-003
Conduct security assessments and audits to measure the integrity of training
data, system soware, and system outputs.
Informaon Security
MS-2.5-004
Document the construct validity of methodologies employed in GAI systems
relave to their context of use.
MS-2.5-005
Document the extent to which human domain knowledge is employed to
improve GAI system performance, via, e.g., RLHF, ne-tuning, content
moderaon, business rules.
MS-2.5-006
Establish metrics or KPIs to determine whether GAI systems meet minimum
performance standards for reliability and validity.
MS-2.5-007
Measure, monitor, and document prevalence of erroneous GAI output content,
system availability, and reproducibility of outcomes via eld tesng or other
randomized controlled experiments.
MS-2.5-008
Review and verify sources and citaons in GAI system outputs during pre-
deployment risk measurement and ongoing monitoring acvies.
Confabulaon
MS-2.5-009
Track and document instances of anthropomorphizaon (e.g., human images,
menons of human feelings, cyborg imagery or mofs) in GAI system interfaces.
Human AI Conguraon
MS-2.5-010
Track and document relevant version numbers, planned updates, hoixes, and
other GAI system change management informaon.
MS-2.5-011
Update standard train/test model evaluaon processes for GAI systems. Consider:
Unwanted or undocumented overlaps in train and TEVV data sources, including
their negave spaces (i.e., what is not represented in both); Employing substring
matching or embedding distance approaches to assess similarity across data
parons.
MS-2.5-012
Verify GAI system training data and TEVV data provenance, and that ne-tuning
data is grounded.
Informaon Integrity
AI Actors: Domain Experts, TEVV
1
40
*MEASURE 2.6: The AI system is evaluated regularly for safety risks – as idened in the MAP funcon. The AI system to be
deployed is demonstrated to be safe, its residual negave risk does not exceed the risk tolerance, and it can fail safely, parcularly if
made to operate beyond its knowledge limits. Safety metrics reect system reliability and robustness, real-me monitoring, and
response mes for AI system failures.
Acon ID
Acon
Risks
MS-2.6-001
Assess adverse impacts health and wellbeing impacts for supply chain or other AI
actors that are exposed to obscene, toxic, or violent informaon during the
course of GAI training and maintenance.
Human AI Conguraon, Obscene,
Degrading, and/or Abusive
Content, Value Chain and
Component Integraon,
Dangerous or Violent
Recommendaons
MS-2.6-002
Assess levels of toxicity, intellectual property infringement, data privacy
violaons, obscenity, extremism, violence, or CBRN informaon in system training
data.
Data Privacy, Intellectual Property,
Obscene, Degrading, and/or
Abusive Content, Toxicity, Bias,
and Homogenizaon, Dangerous
or Violent Recommendaons,
CBRN Informaon
MS-2.6-003
Measure and document incident response mes, system down mes, and system
availability: Perform standard measurement and structured human feedback on
GAI systems to detect safety and reliability impacts and harms; Apply human
subjects research protocols and other applicable safety controls when conducng
A/B tesng, AI red-teaming, focus groups, or human testbed measurements;
Idenfy and document any applicaons related to robocs, RPA, and autonomous
vehicles; Conduct AI red-teaming exercises to idenfy harms and impacts related
to safety and validity, reliability, privacy, toxicity and other risks; Monitor high-risk
GAI systems connually for safety and reliability risks once deployed; Monitor GAI
systems to detect dri and anomalies relave to expected performance and
training baselines.
Data Privacy, Human AI
Conguraon, Toxicity, Bias, and
Homogenizaon, Dangerous or
Violent Recommendaons
MS-2.6-004
Re-evaluate safety features of ne-tuned models when the risk of harm exceeds
organizaonal risk tolerance.
Dangerous or Violent
Recommendaons
MS-2.6-005
Review GAI system outputs for validity and safety: Review generated code to
assess risks that may arise from unreliable downstream decision-making.
Value Chain and Component
Integraon, Dangerous or Violent
Recommendaons
MS-2.6-006
Track and document past failed GAI system designs to inform risk measurement
for safety and validity risks.
Dangerous or Violent
Recommendaons
MS-2.6-007
Verify capabilies for liming, pausing, updang, or terminang GAI systems
quickly.
MS-2.6-008
Verify rollover, fallback, or redundancy capabilies for high-risk GAI systems.
41
MS-2.6-009
Verify that GAI system architecture can monitor outputs and performance, and
handle, recover from, and repair errors when security anomalies, threats and
impacts are detected.
Confabulaon, Informaon
Integrity, Informaon Security
MS-2.6-010
Verify that systems properly handle queries that may give rise to inappropriate,
malicious, or illegal usage, including facilitang manipulaon, extoron, targeted
impersonaon, cyber-aacks, and weapons creaon.
CBRN Informaon, Informaon
Security
AI Actors: AI Deployment, AI Impact Assessment, Domain Experts, Operaon and Monitoring, TEVV
1
MEASURE 2.7: AI system security and resilience – as idened in the MAP funcon – are evaluated and documented.
Acon ID
Acon
Risks
MS-2.7-001
Apply established security measures to: Assess risks of backdoors, compromised
dependencies, data breaches, eavesdropping, man-in-the-middle aacks, reverse
engineering other baseline security concerns; Audit supply chains to idenfy
risks arising from, e.g., data poisoning and malware, soware and hardware
vulnerabilies, third-party personnel and soware; Audit GAI systems, pipelines,
plugins and other related arfacts for unauthorized access, malware, and other
known vulnerabilies.
Data Privacy, Informaon
Integrity, Informaon Security,
Value Chain and Component
Integraon
MS-2.7-002
Assess the completeness of documentaon related to data provenance, access
controls, and incident response procedures. Verify GAI system content
provenance documentaon aligns with relevant regulaons and standards.
Informaon Integrity, Toxicity,
Bias, and Homogenizaon
MS-2.7-003
Benchmark GAI system security and resilience related to content provenance
against industry standards and best pracces. Compare GAI system security
features and content provenance methods against industry state-of-the-art.
Informaon Integrity, Informaon
Security
MS-2.7-004
Conduct user surveys to gather user sasfacon with the AI-generated content
and user percepons of content authencity. Analyze user feedback to idenfy
concerns and/or current literacy levels related to content provenance.
Human AI Conguraon,
Informaon Integrity
MS-2.7-005
Engage with security experts, developers, and researchers through informaon
sharing mechanisms to stay updated with the latest advancements in AI security
related to content provenance. Contribute ndings related to AI system security
and content provenance via informaon sharing mechanisms, workshops, or
publicaons.
Informaon Integrity, Informaon
Security
MS-2.7-006
Establish measures and evaluate GAI resiliency as part of pre-deployment tesng
to ensure GAI will funcon under adverse condions and restore full
funconality in a trustworthy manner.
42
MS-2.7-007
Idenfy metrics that reect the eecveness of security measures, such as data
provenance, the number of unauthorized access aempts, penetraons, or
provenance vericaon.
Informaon Integrity, Informaon
Security
MS-2.7-008
Maintain awareness of emergent GAI security risks and associated
countermeasures through community resources, ocial guidance, or research
literature.
Informaon Security, Unknowns
MS-2.7-009
Measure reliability of content provenance vericaon methods, such as
watermarking, cryptographic signatures, hashing, blockchain, or other content
provenance techniques. Evaluate the rate of false posives and false negaves in
content provenance, as well as true posives and true negaves for vericaon.
Informaon Integrity
MS-2.7-010
Measure the average response me to security incidents related to content
provenance, and the proporon of incidents resolved with and without
signicant impact.
Informaon Integrity, Informaon
Security
MS-2.7-011
Measure the rate at which recommendaons from security audits and incidents
are implemented related to content provenance. Assess how quickly the AI
system can adapt and improve based on lessons learned from security incidents
and feedback related to content provenance.
Informaon Integrity, Informaon
Security
MS-2.7-012
Monitor and review the completeness and validity of security documentaon
and verify it aligns with the current state of the GAI system and its content
provenance.
Informaon Integrity, Informaon
Security, Toxicity, Bias, and
Homogenizaon
MS-2.7-013
Monitor GAI system downme and measure its impact on operaons.
MS-2.7-014
Monitor GAI systems in deployment for anomalous use and security risks.
Informaon Security
MS-2.7-015
Monitor the number of security-related incident reports from users, indicang
their awareness and willingness to report issues.
Human AI Conguraon,
Informaon Security
MS-2.7-016
Perform AI red-teaming to assess resilience against: Abuse to facilitate aacks on
other systems (e.g., malicious code generaon, enhanced phishing content), GAI
aacks (e.g., prompt injecon), ML aacks (e.g., adversarial examples/prompts,
data poisoning, membership inference, model extracon, sponge examples).
Informaon Security, Toxicity,
Bias, and Homogenizaon,
Dangerous or Violent
Recommendaons
MS-2.7-017
Review deployment approval processes and verify that processes address
relevant GAI security risks.
Informaon Security
MS-2.7-018
Review incident response procedures and verify adequate funconality to
idenfy, contain, eliminate, and recover from complex GAI system incidents that
implicate impacts across the trustworthy characteriscs.
MS-2.7-019
Track and document access and updates to GAI system training data; verify
appropriate security measures for training data at GAI vendors and service
providers.
Informaon Security, Value Chain
and Component Integraon
43
MS-2.7-020
Track GAI system performance metrics such as response me and throughput
under dierent loads and usage paerns related to content provenance.
Informaon Integrity
MS-2.7-021
Track the number of users who have completed security training programs
regarding the security of content provenance.
Human AI Conguraon,
Informaon Integrity, Informaon
Security
MS-2.7-022
Verify ne-tuning does not compromise safety and security controls.
Informaon Integrity, Informaon
Security, Dangerous or Violent
Recommendaons
MS-2.7-023
Verify organizaonal policies, procedures, and processes for treatment of GAI
security and resiliency risks.
Informaon Security
MS-2.7-024
Verify vendor documentaon for data and soware security controls.
Informaon Security, Value Chain
and Component Integraon
MS-2.7-025
Work with domain experts to capture stakeholder condence in GAI system
security and perceived eecveness related to content provenance.
Informaon Integrity, Informaon
Security
AI Actors: AI Deployment, AI Impact Assessment, Domain Experts, Operaon and Monitoring, TEVV
1
MEASURE 2.8: Risks associated with transparency and accountability – as idened in the MAP funcon – are examined and
documented.
Acon ID
Acon
Risks
MS-2.8-001
Compile and communicate stascs on policy violaons, take-down requests,
intellectual property infringement, and informaon integrity for organizaonal
GAI systems: Analyze transparency reports across demographic groups,
languages groups, and other segments relevant to the deployment context.
Informaon Integrity, Intellectual
Property, Toxicity, Bias, and
Homogenizaon
MS-2.8-002
Document the instrucons given to data annotators or AI red-teamers.
MS-2.8-003
Document where in the data pipeline human labor is being used.
MS-2.8-004
Establish a mechanism for appealing usage policy violaons.
MS-2.8-005
Maintain awareness of AI regulaons and standards in relevant jurisdicons
related to GAI systems and content provenance.
Informaon Integrity
MS-2.8-006
Measure the eecveness or accessibility of procedures to appeal adverse,
harmful, or incorrect outcomes from GAI systems.
Human AI Conguraon, Toxicity,
Bias, and Homogenizaon,
Dangerous or Violent
Recommendaons
44
MS-2.8-007
Review and consider GAI system transparency arfacts such as impact
assessments, system cards, model cards, and tradional risk management
documentaon as part of organizaonal decision making.
MS-2.8-008
Review licenses, patents, or other intellectual property rights pertaining to
informaon in system training data.
Intellectual Property
MS-2.8-009
Track AI actor decisions along the lifecycle to determine sources of systemic and
cognive bias and idenfy management and migaon approaches.
Human AI Conguraon, Toxicity,
Bias, and Homogenizaon
MS-2.8-010
Use interpretable machine learning techniques to make AI processes and
outcomes more transparent, and easier to understand how decisions are made.
MS-2.8-011
Use technologies such as blockchain and digital signatures to enable the
documentaon of each instance where content is generated, modied, or shared
to provide a tamper-proof history of the content, promote transparency, and
enable traceability. Robust version control systems can also be applied to track
changes across the AI lifecycle over me.
Informaon Integrity
MS-2.8-012
Verify adequacy of GAI system user instrucons through user tesng.
Human AI Conguraon
MS-2.8-013
Verify that accurate informaon about GAI capabilies, opportunies, risks, and
potenal negave impacts are available on websites, press releases,
organizaonal reports, social media, and public communicaon channels.
MS-2.8-014
Verify the adequacy of feedback funconality in system user interfaces.
Human AI Conguraon
MS-2.8-015
Verify the adequacy of redress processes for severe GAI system impacts.
AI Actors: AI Deployment, AI Impact Assessment, Domain Experts, Operaon and Monitoring, TEVV
1
MEASURE 2.9: The AI model is explained, validated, and documented, and AI system output is interpreted within its context – as
idened in the MAP funcon – to inform responsible use and governance.
Acon ID
Acon
Risks
MS-2.9-001
Apply and document ML explanaon results such as: Analysis of embeddings,
Counterfactual prompts, Gradient-based aribuons, Model
compression/surrogate models, Occlusion/term reducon.
MS-2.9-002
Apply transparency tools such as Datasheets, Data Nutrion Labels, and Model
Cards to record explanatory and validaon informaon.
45
MS-2.9-003
Document GAI model details including: Proposed use and organizaonal value;
Assumpons and limitaons, Data collecon methodologies; Data provenance;
Data quality; Model architecture (e.g., convoluonal neural network,
transformers, etc.); Opmizaon objecves; Training algorithms; RLHF
approaches; Fine-tuning approaches; Evaluaon data; Ethical consideraons;
Legal and regulatory requirements.
Informaon Integrity, Toxicity,
Bias, and Homogenizaon
MS-2.9-004
Measure and report: Comparisons to alternave approaches and benchmarks;
Outcomes across demographic groups, languages groups, and other segments
relevant to the deployment context; Reproducibility of outcomes or internal
mechanisms; Sensivity analysis and stress-tesng results.
Toxicity, Bias, and Homogenizaon
MS-2.9-005
Verify calibraon and robustness of applied explanaon techniques and
document their assumpons and limitaons.
AI Actors: AI Deployment, AI Impact Assessment, Domain Experts, End-Users, Operaon and Monitoring, TEVV
1
*MEASURE 2.10: Privacy risk of the AI system – as idened in the MAP funcon – is examined and documented.
Acon ID
Acon
Risks
MS-2.10-001
Collaborate with other AI actors, domain experts, and legal advisors to evaluate
the impact of GAI applicaons on privacy related to the GAI system and its
content provenance, in domains such as healthcare, nance, and criminal jusce.
Data Privacy, Human AI
Conguraon, Informaon
Integrity
MS-2.10-002
Conduct AI red-teaming to assess GAI system risks such as: Outpung of training
data samples, and subsequent reverse engineering, model extracon, and
membership inference risks; Revealing biometric, condenal, copyrighted,
licensed, patented, personal, proprietary, sensive, or trade-marked; Tracking or
revealing locaon informaon of users or members of training datasets.
Human AI Conguraon,
Intellectual Property
MS-2.10-003
Document collecon, use, management, and disclosure of biometric,
condenal, copyrighted, licensed, patented, personal, proprietary, sensive, or
trade-marked informaon in datasets, in accordance with privacy and data
governance policies and data privacy laws.
Data Privacy, Human AI
Conguraon, Intellectual
Property
MS-2.10-004
Engage directly with end-users and other stakeholders to understand their
expectaons and concerns regarding content provenance. Use this feedback to
guide the design of provenance-tracking mechanisms.
Human AI Conguraon,
Informaon Integrity
MS-2.10-005
Establish and document protocols (authorizaon, duraon, type) and access
controls for training sets or producon data containing biometric, condenal,
copyrighted, licensed, patented, personal, proprietary, sensive, or trade-marked
informaon, in accordance with privacy and data governance policies and data
privacy laws.
Data Privacy, Intellectual Property
46
MS-2.10-006
Implement consent mechanisms that are demonstrated to allow users to
understand and control how their data is used in the GAI system and its content
provenance.
Data Privacy, Human AI
Conguraon, Informaon
Integrity
MS-2.10-007
Implement mechanisms to monitor, periodically review and document the
provenance data to detect any inconsistencies or unauthorized modicaons.
Informaon Integrity, Informaon
Security
MS-2.10-008
Implement zero-knowledge proofs to balance transparency with privacy and
allow vericaon of claims about content without exposing the actual data.
Data Privacy
MS-2.10-009
Leverage technologies such as blockchain to document the origin of, and any
subsequent modicaons to, generated content to enhance transparency and
provide a secure method for provenance tracking.
Informaon Integrity, Informaon
Security
MS-2.10-010
Track training, input and output items that contains personally idenable
informaon.
Data Privacy
MS-2.10-011
Verify compliance with data protecon regulaons.
Data Privacy
MS-2.10-012
Verify deduplicaon of training data samples.
Toxicity, Bias, and Homogenizaon
MS-2.10-013
Verify organizaonal policies, procedures, and processes for GAI systems address
fundamental tenets of data privacy, e.g., Anonymizaon of private data; Consent
to use data for targeted purposes or applicaons; Data collecon and use in
accordance with legal requirements and organizaonal policies; Reasonable data
retenon limits and requirements; User data deleon and reccaon requests.
Data Privacy, Human AI
Conguraon
MS-2.10-014
Verify that biometric, condenal, copyrighted, licensed, patented, personal,
proprietary, sensive, or trade-marked informaon are removed from GAI
training data.
Intellectual Property
AI Actors: AI Deployment, AI Impact Assessment, Domain Experts, End-Users, Operaon and Monitoring, TEVV
1
*MEASURE 2.11: Fairness and bias – as idened in the MAP funcon – are evaluated and results are documented.
Acon ID
Acon
Risks
MS-2.11-001
Apply use-case appropriate benchmarks (e.g., Bias Benchmark Quesons, Real
Toxicity Prompts, Winogender) to quanfy systemic bias, stereotyping,
denigraon, and toxicity in GAI system outputs; Document assumpons and
limitaons of benchmarks relave to in-context deployment environment.
Toxicity, Bias, and Homogenizaon
MS-2.11-002
Assess content moderaon and other output ltering technologies or processes
for risks arising from human, systemic, and stascal/computaonal biases.
Toxicity, Bias, and Homogenizaon
47
MS-2.11-003
Conduct fairness assessments to measure systemic bias. Measure GAI system
performance across demographic groups and subgroups, addressing both quality
of service and any allocaon of services and resources. Idenfy types of harms,
including harms in resource allocaon, representaonal, quality of service,
stereotyping, or erasure, Idenfy across, within, and intersecng groups that
might be harmed; Quanfy harms using: eld tesng with sub-group populaons
to determine likelihood of exposure to generated content exhibing harmful
bias, AI red-teaming with counterfactual and low-context (e.g., “leader,” “bad
guys”) prompts. For ML pipelines or business processes with categorical or
numeric outcomes that rely on GAI, apply general fairness metrics (e.g.,
demographic parity, equalized odds, equal opportunity, stascal hypothesis
tests), to the pipeline or business outcome where appropriate; Custom, context-
specic metrics developed in collaboraon with domain experts and aected
communies; Measurements of the prevalence of denigraon in generated
content in deployment (e.g., sub-sampling a fracon of trac and manually
annotang denigrang content); Analyze quaned harms for contextually
signicant dierences across groups, within groups, and among intersecng
groups; Rene idencaon of within-group and interseconal group disparies,
Evaluate underlying data distribuons and employ sensivity analysis during the
analysis of quaned harms, Evaluate quality metrics including dierenal
output across groups, Consider biases aecng small groups, within-group or
interseconal communies, or single individuals.
Toxicity, Bias, and
Homogenizaon, Dangerous or
Violent Recommendaons
MS-2.11-004
Evaluate pracces along the lifecycle to idenfy potenal sources of human-
cognive bias such as availability, observaonal, groupthink, funding, and
conrmaon bias, and to make implicit decision-making processes more explicit
and open to invesgaon.
Toxicity, Bias, and Homogenizaon
MS-2.11-005
Idenfy the classes of individuals, groups, or environmental ecosystems which
might be impacted by GAI systems through direct engagement with potenally
impacted communies.
Environmental, Toxicity, Bias, and
Homogenizaon
MS-2.11-006
Monitor for representaonal, nancial, or other harms aer GAI systems are
deployed.
Toxicity, Bias, and
Homogenizaon, Dangerous or
Violent Recommendaons
MS-2.11-007
Review, document, and measure sources of bias in training and TEVV data:
Dierences in distribuons of outcomes across and within groups, including
intersecng groups; Completeness, representaveness, and balance of data
sources; demographic group and subgroup coverage in GAI system training data;
Forms of latent systemic bias in images, text, audio, embeddings, or other
complex or unstructured data; Input data features that may serve as proxies for
demographic group membership (i.e., image metadata, language dialect) or
otherwise give rise to emergent bias within GAI systems; The extent to which the
digital divide may negavely impact representaveness in GAI system training
and TEVV data; Filtering of hate speech and toxicity in GAI system training data;
Prevalence of GAI-generated data in GAI system training data.
Toxicity, Bias, and
Homogenizaon, Unknowns
48
MS-2.11-008
Track and document AI actor credenals and qualicaons.
Human AI Conguraon
MS-2.11-009
Verify accessibility funconality; verify funconality and meliness of
accommodaons and opt-out funconality or processes.
Human AI Conguraon, Toxicity,
Bias, and Homogenizaon
MS-2.11-010
Verify bias management in periodic model updates; test and recalibrate with
updated and more representave data to manage bias within acceptable
tolerances.
Toxicity, Bias, and Homogenizaon
MS-2.11-011
Verify training is not homogenous GAI-produced data in order to migate
concerns of model collapse.
Toxicity, Bias, and Homogenizaon
AI Actors: AI Deployment, AI Impact Assessment, Aected Individuals and Communies, Domain Experts, End-Users, Operaon
and Monitoring, TEVV
1
MEASURE 2.12: Environmental impact and sustainability of AI model training and management acvies – as idened in the MAP
funcon – are assessed and documented.
Acon ID
Acon
Risks
MS-2.12-001
Assess safety to physical environments when deploying GAI systems.
Dangerous or Violent
Recommendaons
MS-2.12-002
Document ancipated environmental impacts of model development,
maintenance, and deployment in product design decisions.
Environmental
MS-2.12-003
Measure or esmate environmental impacts (e.g., energy and water
consumpon) for training, ne tuning, and deploying models: Verify tradeos
between resources used at inference me versus addional resources required
at training me.
Environmental
MS-2.12-004
Track and document connuous improvement processes that enhance
eecveness of risk measurement for GAI environmental impacts and
sustainability.
Environmental
MS-2.12-005
Verify eecveness of carbon capture or oset programs, and address green-
washing risks.
Environmental
AI Actors: AI Deployment, AI Impact Assessment, Domain Experts, Operaon and Monitoring, TEVV
2
49
MEASURE 2.13: Eecveness of the employed TEVV metrics and processes in the MEASURE funcon are evaluated and
documented.
Acon ID
Acon
Risks
MS-2.13-001
Create measurement error models for pre-deployment metrics to demonstrate
construct validity for each metric (i.e., does the metric eecvely operaonalize
the desired concept): Measure or esmate, and document, biases or stascal
variance in applied metrics or structured human feedback processes; Adhere to
applicable laws and regulaons when operaonalizing models in high-volume
sengs (e.g., toxicity classiers and automated content lters); Leverage domain
experse when modeling complex societal constructs such as toxicity.
Confabulaon, Informaon
Integrity, Toxicity, Bias, and
Homogenizaon
MS-2.13-002
Document measurement and structured public feedback processes applied to
organizaonal GAI systems in a centralized repository (i.e., organizaonal AI
inventory).
MS-2.13-003
Review GAI system metrics and associated pre-deployment processes to
determine their ability to sustain system improvements, including the
idencaon and removal of errors, harms, and negave impacts.
Confabulaon, Informaon
Integrity, Dangerous or Violent
Recommendaons
AI Actors: AI Deployment, Operaon and Monitoring, TEVV
1
*MEASURE 3.1: Approaches, personnel, and documentaon are in place to regularly idenfy and track exisng, unancipated, and
emergent AI risks based on factors such as intended and actual performance in deployed contexts.
Acon ID
Acon
Risks
MS-3.1-001
Assess completeness of known use cases and expected performance of inputs,
such as third-party data or upstream AI systems, or the performance of
downstream systems which use the outputs of the GAI system, directly or
indirectly, through engagement and outreach with AI Actors.
Human AI Conguraon, Value
Chain and Component
Integraon, Toxicity, Bias, and
Homogenizaon
MS-3.1-002
Compare intended use and expected performance of GAI systems across all
relevant contexts.
50
MS-3.1-003
Elicit and track feedback for previously unknown uses of the GAI systems.
AI Actors: AI Impact Assessment, Operaon and Monitoring, TEVV
1
MEASURE 3.2: Risk tracking approaches are considered for sengs where AI risks are dicult to assess using currently available
measurement techniques or where metrics are not yet available.
Acon ID
Acon
Risks
MS-3.2-001
Determine if available GAI system risk measurement approaches are applicable
to the GAI system use contexts.
MS-3.2-002
Document the rate of occurrence and severity of GAI harms to the organizaon
and to external AI actors.
Human AI Conguraon
MS-3.2-003
Establish processes for idenfying emergent GAI system risks with external AI
actors.
Human AI Conguraon,
Unknowns
MS-3.2-004
Idenfy measurement approaches for tracking GAI system risks if none exist.
AI Actors: AI Impact Assessment, Domain Experts, Operaon and Monitoring, TEVV
2
*MEASURE 3.3: Feedback processes for end users and impacted communies to report problems and appeal system outcomes are
established and integrated into AI system evaluaon metrics.
Acon ID
Acon
Risks
MS-3.3-001
Conduct impact assessments on how AI-generated content might aect dierent
social, economic, and cultural groups.
Toxicity, Bias, and Homogenizaon
MS-3.3-002
Conduct studies to understand how end users perceive and interact with GAI
content related to content provenance within context of use. Assess whether the
content aligns with their expectaons and how they may act upon the
informaon presented.
Human AI Conguraon,
Informaon Integrity
MS-3.3-003
Design evaluaon metrics that include parameters for content provenance
quality, validity, reliability, authencity or origin, and integrity of content.
Informaon Integrity
MS-3.3-004
Evaluate GAI system evaluaon metrics based on feedback from relevant AI
actors.
Human AI Conguraon
51
MS-3.3-005
Evaluate potenal biases and stereotypes that could emerge from the AI-
generated content using appropriate methodologies including computaonal
tesng methods as well as evaluang structured feedback input.
Toxicity, Bias, and Homogenizaon
MS-3.3-006
Implement connuous monitoring of AI-generated content and provenance aer
system deployment for various types of dri. Verify GAI systems are adapve and
able to iteravely improve models and algorithms over me.
Informaon Integrity
MS-3.3-007
Integrate human evaluators to assess content quality and relevance.
Human AI Conguraon
MS-3.3-008
Provide input for training materials about the capabilies and limitaons of GAI
systems related to content provenance for AI actors, other professionals, and the
public about the societal impacts of AI and the role of diverse and inclusive
content generaon.
Human AI Conguraon,
Informaon Integrity, Toxicity,
Bias, and Homogenizaon
MS-3.3-009
Record and integrate structured feedback about content provenance from
operators, users, and potenally impacted communies through the use of
methods such as user research studies, focus groups, or community forums.
Acvely seek feedback on generated content quality and potenal biases. Assess
the general awareness among end users and impacted communies about the
availability of these feedback channels.
Human AI Conguraon,
Informaon Integrity, Toxicity,
Bias, and Homogenizaon
MS-3.3-010
Regularly review structured human feedback and GAI system sensors and update
based on the evolving needs and concerns of the impacted communies.
MS-3.3-011
Ulize independent evaluaons to assess content quality and types of potenal
biases and related negave impacts.
Toxicity, Bias, and Homogenizaon
MS-3.3-012
Verify AI actors engaged in GAI TEVV tasks for content provenance reect diverse
demographic and interdisciplinary backgrounds.
Human AI Conguraon,
Informaon Integrity, Toxicity,
Bias, and Homogenizaon
AI Actors: AI Deployment, Aected Individuals and Communies, End-Users, Operaon and Monitoring, TEVV
1
MEASURE 4.2: Measurement results regarding AI system trustworthiness in deployment context(s) and across the AI lifecycle are
informed by input from domain experts and relevant AI actors to validate whether the system is performing consistently as
intended. Results are documented.
Acon ID
Acon
Risks
MS-4.2-001
Conduct adversarial tesng to assess the GAI system’s response to inputs
intended to deceive or manipulate its content provenance and understand
potenal misuse scenarios and unintended outputs.
Informaon Integrity, Informaon
Security
52
MS-4.2-002
Ensure both posive and negave feedback on GAI system funconality is
assessed.
MS-4.2-003
Ensure visible mechanisms to collect users’ feedback are in place, including
systems to report harmful and low quality content.
Human AI Conguraon,
Dangerous or Violent
Recommendaons
MS-4.2-004
Evaluate GAI system content provenance in real-world scenarios to observe its
behavior in praccal environments and reveal issues that might not surface in
controlled and opmized tesng environments.
Informaon Integrity
MS-4.2-005
Evaluate GAI system performance related to content provenance against
predened metrics and update the evaluaon criteria as necessary to adapt to
changing contexts and requirements.
Informaon Integrity
MS-4.2-006
Implement interpretability and explainability methods to evaluate GAI system
decisions related to content provenance and verify alignment with intended
purpose.
Informaon Integrity, Toxicity,
Bias, and Homogenizaon
MS-4.2-007
Integrate structured human feedback results into calibraon and update
processes for tradional measurement approaches (e.g., benchmarks,
performance assessments, data quality measurements).
MS-4.2-008
Measure GAI system inputs and outputs to account for content provenance, data
provenance, source reliability, contextual relevance and coherence, and security
implicaons.
Informaon Integrity, Informaon
Security
MS-4.2-009
Monitor and document instances where human operators or other systems
override the GAI's decisions. Evaluate these cases to understand if the overrides
are linked to issues related to content provenance.
Informaon Integrity
MS-4.2-010
Verify and document the incorporaon of structured human feedback results
into design, implementaon, deployment approval (“go”/“no-go” decisions),
monitoring, and decommission decisions.
MS-4.2-011
Verify that GAI system development and deployment related to content
provenance integrates trustworthiness characteriscs.
Informaon Integrity
MS-4.2-012
Verify the performance of user feedback and recourse mechanisms, including
analyses across various sub-groups.
Human AI Conguraon, Toxicity,
Bias, and Homogenizaon
MS-4.2-013
Work with domain experts to integrate insights from stakeholder feedback
analysis into TEVV metrics and associated acons, and connuous improvement
processes.
MS-4.2-014
Work with domain experts to review feedback from end users, operators, and
potenally impacted individuals and communies—enumerated in the Map
funcon.
Human AI Conguraon
53
MS-4.2-015
Work with domain experts who understand the GAI system context of use to
evaluate the contents validity, relevance, and potenal biases.
Toxicity, Bias, and Homogenizaon
AI Actors: AI Deployment, Domain Experts, End-Users, Operaon and Monitoring, TEVV
1
*MANAGE 1.3: Responses to the AI risks deemed high priority, as idened by the MAP funcon, are developed, planned, and
documented. Risk response opons can include migang, transferring, avoiding, or accepng.
Acon ID
Acon
Risks
MG-1.3-001
Allocate resources and me for GAI risk management acvies, including
planning for incident response and other migaon acvies.
MG-1.3-002
Document residual GAI system risks that persist aer risk migaon or transfer.
MG-1.3-003
Document trade-os, decision processes, and relevant measurement and
feedback results for risks that do not surpass organizaonal risk tolerance.
MG-1.3-004
Migate, transfer, or avoid risks that surpass organizaonal risk tolerances.
MG-1.3-005
Monitor the eecveness of risk controls (e.g., via eld tesng, parcipatory
engagements, performance assessments, user feedback mechanisms).
Human AI Conguraon
AI Actors: AI Deployment, AI Impact Assessment, Operaon and Monitoring
2
MANAGE 2.2: Mechanisms are in place and applied to sustain the value of deployed AI systems.
Acon ID
Acon
Risks
MG-2.2-001
Compare GAI system outputs against pre-dened organizaon risk tolerance,
guidelines, and principles, and review and audit AI-generated content against
these guidelines.
MG-2.2-002
Document training data sources to trace the origin and provenance of AI-
generated content.
Informaon Integrity
MG-2.2-003
Evaluate feedback loops between GAI system content provenance and human
reviewers, and update make updates where needed. Implement real-me
monitoring systems to detect GAI systems and content provenance dri as it
happens.
Informaon Integrity
54
MG-2.2-004
Evaluate GAI content and data for representaonal biases and employ
techniques such as re-sampling, re-ranking, or adversarial training to migate
biases in the generated content.
Informaon Security, Toxicity,
Bias, and Homogenizaon
MG-2.2-005
Filter GAI output for harmful or biased content, potenal misinformaon, and
CBRN-related or NCII content.
CBRN Informaon, Obscene,
Degrading, and/or Abusive
Content, Toxicity, Bias, and
Homogenizaon, Dangerous or
Violent Recommendaons
MG-2.2-006
Implement version control for models and datasets to track changes and
facilitate rollback if necessary.
MG-2.2-007
Incorporate feedback from users, external experts, and the public to adapt the
GAI system and monitoring processes.
Human AI Conguraon
MG-2.2-008
Incorporate human review processes to assess and lter content in accordance
with the socio-cultural knowledge and values of the context of use and to
idenfy limitaons and nuances that automated processes might miss; verify
that human reviewers are trained on content guidelines and potenal biases of
GAI system and its content provenance.
Informaon Integrity, Toxicity,
Bias, and Homogenizaon
MG-2.2-009
Integrate informaon from data management and machine learning security
countermeasures like red teaming, and dierenal privacy, and authencaon
protocols to ensure data and models are protected from potenal risks.
CBRN Informaon, Data Privacy,
Informaon Security
MG-2.2-010
Use feedback from internal and external AI actors, users, individuals, and
communies, to assess impact of AI-generated content.
Human AI Conguraon
MG-2.2-011
Use real-me auding tools such as distributed ledger technology to track and
validate the lineage and authencity of AI-generated data.
Informaon Integrity
MG-2.2-012
Use structured feedback mechanisms to solicit and capture user input about AI-
generated content to detect subtle shis in quality or alignment with community
and societal values.
Human AI Conguraon, Toxicity,
Bias, and Homogenizaon
AI Actors: AI Deployment, AI Impact Assessment, Governance and Oversight, Operaon and Monitoring
1
55
*MANAGE 2.3: Procedures are followed to respond to and recover from a previously unknown risk when it is idened.
Acon ID
Acon
Risks
MG-2.3-001
Develop and update GAI system incident response and recovery plans and
procedures to address the following: Review and maintenance of policies and
procedures to account for newly encountered uses; Review and maintenance of
policies and procedures for detecon of unancipated uses; Verify response
and recovery plans account for the GAI system supply chain; Verify response
and recovery plans are updated for and include necessary details to
communicate with downstream GAI system Actors: Points-of-Contact (POC),
Contact informaon, nocaon format.
Value Chain and Component
Integraon
MG-2.3-002
Maintain protocols to log changes made to GAI systems during incident
response and recovery.
MG-2.3-003
Review, update and maintain incident response and recovery plans to integrate
insights from GAI system use cases and contexts and needs of relevant AI
actors.
Human AI Conguraon
MG-2.3-004
Verify and maintain measurements that GAI systems are operang within
organizaonal risk tolerances post incident.
AI Actors: AI Deployment, Operaon and Monitoring
1
*MANAGE 2.4: Mechanisms are in place and applied, and responsibilies are assigned and understood, to supersede, disengage, or
deacvate AI systems that demonstrate performance or outcomes inconsistent with intended use.
Acon ID
Acon
Risks
MG-2.4-001
Enforce change management processes, and risk and impact assessments
across all intended uses and contexts before deploying GAI system updates.
MG-2.4-002
Establish and maintain communicaon plans to inform AI stakeholders as part
of the deacvaon or disengagement process of a specic GAI system or
context of use, including reasons, workarounds, user access removal, alternave
processes, contact informaon, etc.
Human AI Conguraon
MG-2.4-003
Establish and maintain procedures for escalang GAI system incidents to the
organizaonal risk authority when specic criteria for deacvaon or
disengagement is met for a parcular context of use or for the GAI system as a
whole.
56
MG-2.4-004
Establish and maintain procedures for the remediaon of issues which trigger
incident response processes for the use of a GAI system, and provide
stakeholders melines associated with the remediaon plan.
MG-2.4-005
Establish and regularly review specic criteria that warrants the deacvaon of
GAI systems in accordance with set risk tolerances and appetes.
AI Actors: AI Deployment, Governance and Oversight, Operaon and Monitoring
1
*MANAGE 3.1: AI risks and benets from third-party resources are regularly monitored, and risk controls are applied and
documented.
Acon ID
Acon
Risks
MG-3.1-001
Apply organizaonal risk tolerances and controls (e.g., acquision and
procurement processes; assessing personnel credenals and qualicaons,
performing background checks; ltering GAI input and outputs, grounding,
ne tuning) to third-party GAI resources: Apply organizaonal risk tolerance
to the ulizaon of third-party datasets and other GAI resources; Apply
organizaonal risk tolerances to ne-tuned third-party models; Apply
organizaonal risk tolerance to exisng third-party models adapted to a new
domain; Reassess risk measurements aer ne-tuning third-party GAI
models.
Value Chain and Component
Integraon
MG-3.1-002
Audit GAI system supply chain risks (e.g., data poisoning, malware, other
soware and hardware vulnerabilies; labor pracces; data privacy and
localizaon compliance; geopolical alignment).
Data Privacy, Informaon Security,
Value Chain and Component
Integraon, Toxicity, Bias, and
Homogenizaon
MG-3.1-003
Decommission third-party systems that exceed organizaonal risk tolerances.
Value Chain and Component
Integraon
MG-3.1-004
Idenfy and maintain documentaon for third-party AI systems, and
components, in organizaonal AI inventories.
Value Chain and Component
Integraon
MG-3.1-005
Iniate review of third-party organizaons/developers prior to their use of
GAI models, and during their use of GAI models for their own applicaons, to
monitor for abuse and policy violaons.
Value Chain and Component
Integraon, Toxicity, Bias, and
Homogenizaon, Dangerous or
Violent Recommendaons
MG-3.1-006
Re-assess model risks aer ne-tuning and for any third-party GAI models
deployed for applicaons and/or use cases that were not evaluated in inial
tesng.
Value Chain and Component
Integraon
57
MG-3.1-007
Review GAI training data for CBRN informaon and intellectual property; scan
output for plagiarized, trademarked, patented, licensed, or trade secret
material.
Intellectual Property, CBRN
Informaon
MG-3.1-008
Update acquision and procurement policies, procedures, and processes to
address GAI risks and failure modes.
MG-3.1-009
Use, review, update, and share various transparency arfacts (e.g., system
cards and model cards) for third-party models. Document or retain
documentaon for: Training data content and provenance, methodology,
tesng, validaon, and clear instrucons for use from GAI vendors and
suppliers, Informaon related to third-party informaon security policies,
procedures, and processes.
Informaon Integrity, Informaon
Security, Value Chain and
Component Integraon
AI Actors: AI Deployment, Operaon and Monitoring, Third-party enes
1
MANAGE 3.2: Pre-trained models which are used for development are monitored as part of AI system regular monitoring and
maintenance.
Acon ID
Acon
Risks
MG-3.2-001
Apply explainable AI (XAI) techniques (e.g., analysis of embeddings, model
compression/disllaon, gradient-based aribuons, occlusion/term reducon,
counterfactual prompts, word clouds) as part of ongoing connuous
improvement processes to migate risks related to unexplainable GAI systems.
MG-3.2-002
Document how pre-trained models have been adapted (ne-tuned) for the
specic generave task, including any data augmentaons, parameter
adjustments, or other modicaons. Access to un-tuned (baseline) models must
be available to support debugging the relave inuence of the pre-trained
weights compared to the ne-tuned model weights.
MG-3.2-003
Document sources and types of training data and their origins, potenal biases
present in the data related to the GAI applicaon and its content provenance,
architecture, training process of the pre-trained model including informaon on
hyperparameters, training duraon, and any ne-tuning processes applied.
Informaon Integrity, Toxicity, Bias,
and Homogenizaon
MG-3.2-004
Evaluate user reported problemac content and integrate feedback into system
updates.
Human AI Conguraon,
Dangerous or Violent
Recommendaons
58
MG-3.2-005
Implement content lters to prevent the generaon of inappropriate, harmful,
toxic, false, illegal, or violent content related to the GAI applicaon, including
for CSAM and NCII. These lters can be rule-based or leverage addional
machine learning models to ag problemac inputs and outputs.
Informaon Integrity, Toxicity, Bias,
and Homogenizaon, Dangerous or
Violent Recommendaons,
Obscene, Degrading, and/or
Abusive Content
MG-3.2-006
Implement real-me monitoring processes for analyzing generated content
performance and trustworthiness characteriscs related to content provenance
to idenfy deviaons from the desired standards and trigger alerts for human
intervenon.
Informaon Integrity
MG-3.2-007
Leverage feedback and recommendaons from organizaonal boards or
commiees related to the deployment of GAI applicaons and content
provenance when using third-party pre-trained models.
Informaon Integrity, Value Chain
and Component Integraon
MG-3.2-008
Maintain awareness of relevant laws and regulaons related to content
generaon, data privacy, and user protecons and work in conjuncon with
legal experts to review and assess the potenal liabilies associated with AI-
generated content.
Data Privacy, Intellectual Property
Informaon Integrity
MG-3.2-009
Provide use case examples as material for training employees and stakeholders
about the trustworthiness implicaons of GAI applicaons and content
provenance and to raise awareness about potenal risks in fostering a risk
management culture.
Informaon Integrity
MG-3.2-010
Use human moderaon systems to review generated content in accordance
with human-AI conguraon policies established in the Govern funcon,
aligned with socio-cultural norms in the context of use, and for sengs where
AI models are demonstrated to perform poorly.
Human AI Conguraon
MG-3.2-011
Use organizaonal risk tolerance to evaluate acceptable risks and performance
metrics and decommission or retrain pre-trained models that perform outside
of dened limits.
CBRN Informaon, Confabulaon
AI Actors: AI Deployment, Operaon and Monitoring, Third-party enes
1
59
*MANAGE 4.1: Post-deployment AI system monitoring plans are implemented, including mechanisms for capturing and evaluang
input from users and other relevant AI actors, appeal and override, decommissioning, incident response, recovery, and change
management.
Acon ID
Acon
Risks
MG-4.1-001
Collaborate with external researchers, industry experts, and community
representaves to maintain awareness of emerging best pracces and
technologies in content provenance.
Informaon Integrity, Toxicity, Bias,
and Homogenizaon
MG-4.1-002
Conduct adversarial tesng at a regular cadence; test against various
adversarial inputs and scenarios; idenfy vulnerabilies and assess the AI
system’s resilience to content provenance aacks.
Informaon Integrity, Informaon
Security
MG-4.1-003
Conduct red-teaming exercises to surface failure modes of content provenance
mechanisms. Evaluate the eecveness of red-teaming approaches for
uncovering potenal vulnerabilies and improving overall content provenance.
Informaon Integrity, Informaon
Security
MG-4.1-004
Employ user-friendly channels such as feedback forms, e-mails, or hotlines for
users to report issues, concerns, or unexpected GAI outputs to feed into
monitoring pracces.
Human AI Conguraon
MG-4.1-005
Establish, maintain, and evaluate eecveness of organizaonal processes and
procedures to monitor GAI systems within context of use.
MG-4.1-006
Evaluate the use of senment analysis to gauge user senment regarding GAI
content performance and impact, and work in collaboraon with AI actors
experienced in user research and experience.
Human AI Conguraon
MG-4.1-007
Implement acve learning techniques to idenfy instances where the model
fails or produces unexpected outputs.
Confabulaon
MG-4.1-008
Integrate digital watermarks, blockchain technology, cryptographic hash
funcons, metadata embedding, or other content provenance techniques
within AI-generated content to track its source and manipulaon history.
Informaon Integrity
MG-4.1-009
Measure system outputs related to content provenance at a regular cadence
and integrate insights into monitoring processes.
Informaon Integrity
MG-4.1-010
Monitor GAI training data for representaon of dierent user groups.
Human AI Conguraon, Toxicity,
Bias, and Homogenizaon
MG-4.1-011
Perform periodic review of organizaonal adherence to GAI system monitoring
plans across all contexts of use.
60
MG-4.1-012
Share transparency reports with internal and external stakeholders that detail
steps taken to update the AI system to enhance transparency and
accountability.
MG-4.1-013
Track dataset modicaons for content provenance by monitoring data
deleons, reccaon requests, and other changes that may impact the
veriability of content origins.
Informaon Integrity
MG-4.1-014
Verify risks associated with gaps in GAI system monitoring plans are accepted at
the appropriate organizaonal level.
MG-4.1-015
Verify that AI actors responsible for monitoring reported issues can eecvely
evaluate GAI system performance and its content provenance, and promptly
escalate issues for response.
Human AI Conguraon,
Informaon Integrity
AI Actors: AI Deployment, Aected Individuals and Communies, Domain Experts, End-Users, Human Factors, Operaon and
Monitoring
1
MANAGE 4.2: Measurable acvies for connual improvements are integrated into AI system updates and include regular
engagement with interested pares, including relevant AI actors.
Acon ID
Acon
Risks
MG-4.2-001
Adopt agile development methodologies, and iterave development and
feedback loops to allow for rapid adjustments based on external input related to
content provenance.
Informaon Integrity
MG-4.2-002
Conduct regular audits of GAI systems and publish reports detailing the
performance, feedback received, and improvements made.
MG-4.2-003
Employ explainable AI methods to enhance transparency and interpretability of
GAI content provenance to help AI actors and stakeholders understand how and
why specic content is generated.
Human AI Conguraon,
Informaon Integrity
MG-4.2-004
Employ stakeholder feedback captured in the Map funcon to understand user
experiences and percepons about AI-generated content and its provenance;
include user interacons and feedback from real-world scenarios.
Human AI Conguraon,
Informaon Integrity
MG-4.2-005
Form cross-funconal teams leveraging experse from across the AI lifecycle
including AI designers and developers, socio-technical experts, and experts in
the context of use and idenfy mechanisms to include end users in
consultaons.
Human AI Conguraon
61
MG-4.2-006
Pracce and follow incident response plans for addressing the generaon of
inappropriate or harmful content and adapt processes based on ndings to
prevent future occurrences. Conduct post-mortem analyses of incidents with
relevant AI actors, to understand the root causes and implement prevenve
measures.
Human AI Conguraon,
Dangerous or Violent
Recommendaons
MG-4.2-007
Provide external stakeholders with regular updates about the progress,
challenges, and improvements made based on their feedback through the use of
public venues such as online plaorms and communies, and open-source
iniaves.
Intellectual Property
MG-4.2-008
Simulate various scenarios to test GAI system responses and verify intended
performance across dierent situaons.
MG-4.2-009
Use visualizaons to represent the GAI model behavior to ease non-technical
stakeholders understanding of GAI system funconality.
Human-AI Conguraon
AI Actors: AI Deployment, AI Design, AI Development, Aected Individuals and Communies, End-Users, Operaon and
Monitoring, TEVV
1
*MANAGE 4.3: Incidents and errors are communicated to relevant AI actors, including aected communies. Processes for tracking,
responding to, and recovering from incidents and errors are followed and documented.
Acon ID
Acon
Risks
MG-4.3-001
Conduct aer-acon assessments for GAI system incidents to verify incident
response and recovery processes are followed and eecve.
MG-4.3-002
Establish and maintain change management records and procedures for GAI
systems, including the reasons for each change, how the change could impact
each intended context of use, and step-by-step details of how changes were
planned, tested, and deployed.
MG-4.3-003
Establish and maintain policies and procedures to record and track GAI system
reported errors, near-misses, incidents, and negave impacts.
Confabulaon, Informaon
Integrity
MG-4.3-004
Establish processes and procedures for regular sharing of informaon about
errors, incidents, and negave impacts for each and across contexts, sectors,
and AI actors, including the date reported, the context of use, the number of
reports for each issue, and assessments of impact and severity.
Confabulaon, Human AI
Conguraon, Informaon
Integrity
AI Actors: AI Deployment, Aected Individuals and Communies, Domain Experts, End-Users, Human Factors, Operaon and
Monitoring
2
62
Appendix A. Primary GAI Consideraons
1
The following primary consideraons were derived as overarching themes from the GAI PWG
2
consultaon process. These consideraons (Governance, Pre-Deployment Tesng, Content Provenance,
3
and Incident Disclosure) are relevant to any organizaon designing, developing, and using GAI and also
4
inform the Acons to Manage GAI risks. Informaon included about the primary consideraons is not
5
exhausve, but highlights the most relevant topics derived from the GAI PWG.
6
Acknowledgments: These consideraons could not have been surfaced without the helpful analysis and
7
contribuons from the community and NIST sta GAI PWG leads: George Awad, Luca Belli, Mat Heyman,
8
Yooyoung Lee, Reva Schwartz, and Kyra Yee.
9
Governance
10
A.1.1. Overview
11
Like any other technology system, governance principles and techniques can be used to manage risks
12
related to generave AI models, capabilies, and applicaons. Organizaons may choose to apply their
13
exisng risk ering to GAI systems, or they may opt to revise or update AI system risk levels to address
14
these unique GAI risks. This secon describes how organizaonal governance regimes may be re-
15
evaluated and adjusted for GAI contexts. It also addresses third-party consideraons for governing across
16
the AI value chain.
17
A.1.2. Organizaonal Governance
18
GAI opportunies, risks and long-term performance characteriscs are typically less well-understood
19
than non-generave AI tools. and may be perceived and acted upon by humans in ways that vary greatly.
20
Accordingly, GAI may call for dierent levels of oversight from AI actors or dierent human-AI
21
conguraons in order to manage their risks eecvely. Organizaons’ use of GAI systems may also
22
warrant addional human review, tracking and documentaon, and greater management oversight.
23
AI technology can produce varied outputs in mulple modalies and present many classes of user
24
interfaces. This leads to a broader set of AI actors interacng with GAI systems for widely diering
25
applicaons and contexts of use. These can include data labeling and preparaon, development of GAI
26
models, content moderaon, code generaon and review, text generaon and eding, image and video
27
generaon, summarizaon, search, and chat. These acvies can take place within organizaonal
28
sengs or in the public domain.
29
Organizaons can restrict AI applicaons that cause harm, exceed stated risk tolerances, or that conict
30
with their tolerances or values. Governance tools and protocols that are applied to other types of AI
31
systems can be applied to GAI systems. These plans and acons include:
32
Accessibility and reasonable
33
accommodaons
34
AI actor credenals and qualicaons
35
Alignment to organizaonal values
36
Auding and assessment
37
Change-management controls
38
Commercial use
39
Data provenance
40
63
Data protecon
41
Data retenon
42
Consistency in use of dening key terms
43
Decommissioning
44
Discouraging anonymous use
45
Educaon
46
Impact assessments
47
Incident response
48
Monitoring
49
Opt-outs
50
Risk-based controls
51
Risk mapping and measurement
52
Science-backed TEVV pracces
53
Secure soware development pracces
54
Stakeholder engagement
55
Synthec content detecon and
56
labeling tools and techniques
57
Whistleblower protecons
58
Workforce diversity and
59
interdisciplinary teams
60
61
Establishing acceptable use policies and guidance for the use of GAI in formal human-AI teaming sengs
62
as well as dierent levels of human-AI conguraons can help to decrease risks arising from misuse,
63
abuse, inappropriate repurpose, and misalignment between systems and users. These pracces are just
64
one example of adapng exisng governance protocols for GAI contexts.
65
A.1.3. Third-Party Consideraons
66
Organizaons may seek to acquire, embed, incorporate, or use open source or proprietary third-party
67
GAI models, systems, or generated data for various applicaons across an enterprise. Use of these GAI
68
tools and inputs has implicaons for all funcons of the organizaon – including but not limited to
69
acquision, human resources, legal, compliance, and IT services – regardless of whether they are carried
70
out by employees or third pares. Many of the acons cited above are relevant and opons for
71
addressing third-party consideraons.
72
Third party GAI integraons may give rise to increased intellectual property, data privacy, or informaon
73
security risks, poinng to the need for clear guidelines for transparency and risk management regarding
74
the collecon and use of third-party data for model inputs. Organizaons may consider varying risk
75
controls for foundaon models, ne-tuned models, and embedded tools, enhanced processes for
76
interacng with external GAI technologies or service providers. Organizaons can apply standard or
77
exisng risk controls and processes to proprietary or open-source GAI technologies, data, and third-party
78
service providers, including acquision and procurement due diligence, requests for soware bills of
79
materials (SBOMs), applicaon of service level agreements (SLAs), and statement on standards for
80
aestaon engagement (SSAE) reports to help with third-party transparency and risk management for
81
GAI systems.
82
A.1.4. Pre-Deployment Tesng
83
Appendix B. Overview
84
The diverse ways and contexts in which GAI systems may be developed, used, and repurposed
85
complicates risk mapping and pre-deployment measurement eorts. Robust test, evaluaon, validaon,
86
and vericaon (TEVV) processes can be iteravely applied – and documented – in early stages of the AI
87
64
lifecycle and informed by representave AI actors (see Figure 3 of the AI RMF). Unl new and rigorous
88
early lifecycle TEVV approaches are developed and matured for GAI, organizaons may use
89
recommended “pre-deployment tesng” pracces to measure performance, capabilies, limits, risks,
90
and impacts. This secon describes risk measurement and esmaon as part of pre-deployment TEVV,
91
and examines the state of play for pre-deployment tesng methodologies.
92
Appendix C. Limitaons of Current Pre-deployment Test Approaches
93
Currently available pre-deployment TEVV processes used for GAI applicaons may be inadequate, non-
94
systemacally applied, or fail to reect or mismatched to deployment contexts. For example, the
95
anecdotal tesng of GAI system capabilies through video games or standardized tests designed for
96
humans (e.g., intelligence tests, professional licensing exams) does not guarantee GAI system validity or
97
reliability in those domains. Similarly, jailbreaking or prompt-engineering tests may not systemacally
98
assess validity or reliability risks.
99
Measurement gaps can arise from mismatches between laboratory and real-world sengs. Current
100
tesng approaches oen remain focused on laboratory condions or restricted to benchmark test
101
datasets and in silico techniques that may not extrapolate well to—or directly assess GAI impacts in
102
real world condions. For example, current measurement gaps for GAI make it dicult to precisely
103
esmate its potenal ecosystem-level or longitudinal risks and related polical, social, and economic
104
impacts. Gaps between benchmarks and real-world use of GAI systems may likely be exacerbated due to
105
prompt sensivity and broad heterogeneity of contexts of use.
106
A.1.5. Structured Public Feedback
107
Structured public feedback can be used to evaluate whether GAI systems are performing as intended
108
and to calibrate and verify tradional measurement methods. Examples of structured feedback include,
109
but are not limited to:
110
Parcipatory Engagement Methods: Methods used to solicit feedback from civil society groups,
111
aected communies, and users, including focus groups, small user studies, and surveys.
112
Field Tesng: Methods used to determine how people interact with, consume, use, and make
113
sense of AI-generated informaon, and subsequent acons and eects, including UX, usability,
114
and other structured, randomized experiments.
115
AI Red-teaming: A structured tesng exercise used to probe an AI system to nd aws and
116
vulnerabilies such as inaccurate, harmful, or discriminatory outputs, oen in a controlled
117
environment and in collaboraon with system developers.
118
Informaon gathered from structured public feedback can inform design, implementaon, deployment
119
approval, maintenance, or decommissioning decisions. Results and insights gleaned from these exercises
120
can serve mulple purposes, including improving data quality and preprocessing, bolstering governance
121
decision making, and enhancing system documentaon and debugging pracces. When implemenng
122
feedback acvies, organizaons should follow human subjects research requirements and best
123
pracces such as informed consent and subject compensaon.
124
65
C.1.1.1. Parcipatory Engagement Methods
125
On an ad hoc or more structured basis, organizaons can design and use a variety of channels to engage
126
external stakeholders in product development or review. Focus groups with select experts can provide
127
feedback on a range of issues. Small user studies can provide feedback from representave groups or
128
populaons. Anonymous surveys can be used to poll or gauge reacons to specic features. Parcipatory
129
engagement methods are oen less structured than eld tesng or red teaming, and are more
130
commonly used in early stages of AI or product development.
131
Appendix D. Field Tesng
132
Field tesng involves structured sengs to evaluate risks and impacts and to simulate the condions
133
under which the GAI system will be deployed. Field style tests can be adapted from a focus on user
134
preferences and experiences towards AI risks and impacts – both negave and posive. When carried
135
out with large groups of users, these tests can provide esmaons of the likelihood of risks and impacts
136
in real world interacons.
137
Organizaons may also collect feedback on outcomes, harms, and user experience directly from users in
138
the producon environment aer a model has been released, in accordance with human subject
139
standards such as informed consent and compensaon. Organizaons should follow applicable human
140
subjects research requirements, and best pracces such as informed consent and subject compensaon,
141
when implemenng feedback acvies.
142
Appendix E. AI Red-teaming
143
AI red-teaming exercises are oen conducted in a controlled environment and in collaboraon with AI
144
developers building AI models. AI red-teaming can be performed before or aer AI models or systems
145
are made available to the broader public; this secon focuses on red-teaming in pre-deployment
146
contexts.
147
The quality of AI red-teaming outputs is related to the background and experse of the AI red-team
148
itself. Demographically and interdisciplinarily diverse AI red-teams can be used to idenfy aws in the
149
varying contexts where GAI will be used. For best results, AI red-teams should demonstrate domain
150
experse, and awareness of socio-cultural aspects within the deployment context. AI red-teaming results
151
should be given addional analysis before they are incorporated into organizaonal governance and
152
decision making, policy and procedural updates, and AI risk management eorts.
153
Various types of AI red-teaming may be appropriate, depending on the use case:
154
General Public: Performed by general users (not necessarily AI or technical experts) who are
155
expected to use the model or interact with its outputs, and who bring their own lived
156
experiences and perspecves to the task of AI red-teaming. These individuals may have been
157
provided instrucons and material to complete tasks which may elicit harmful model behaviors.
158
This type of exercise can be more eecve with large groups of AI-teamers.
159
Expert: Performed by specialists with experse in the domain or specic AI red-teaming context
160
of use (e.g., medicine, biotech, cybersecurity).
161
66
Combinaon: In scenarios when it is dicult to idenfy and recruit specialists with sucient
162
domain and contextual experse, AI red-teaming exercises may leverage both expert and
163
general public parcipants. For example, expert AI red-teamers could modify or verify the
164
prompts wrien by general public AI red-teamers. These approaches may also expand coverage
165
of the AI risk aack surface.
166
Human / AI: Performed by GAI in combinaon with specialist or non-specialist human teams.
167
GAI-led red-teaming can be more cost eecve than human red teamers alone. Human or GAI-
168
led AI red-teaming may be beer suited for elicing dierent types of harms.
169
A.1.6. Content Provenance
170
Appendix F. Overview
171
GAI technologies can be leveraged for many applicaons such as content generaon and synthec data.
172
Some aspects of GAI output, such as the producon of deepfake content, can challenge our ability to
173
disnguish human-generated content from AI-generated content. To help manage and migate these
174
risks, digital transparency mechanisms like provenance data tracking can trace the origin and history of
175
content. Provenance data tracking and synthec content detecon can help provide greater informaon
176
about both authenc and synthec content to users, enabling trustworthiness in AI systems. When
177
combined with other organizaonal accountability mechanisms, digital content transparency can enable
178
processes to trace negave outcomes back to their source, improve informaon integrity, and uphold
179
public trust. Provenance data tracking and synthec content detecon mechanisms provide informaon
180
about the origin of content and its history to assist in GAI risk management eorts.
181
Provenance data can include informaon about generated contents creators, date/me of creaon,
182
locaon, modicaons, and sources, including metadata informaon. Metadata can be tracked for text,
183
images, videos, audio, and underlying datasets. Provenance data tracking employs various methods and
184
metrics to assess the authencity, integrity, credibility, intellectual property rights, and potenal
185
manipulaons in GAI output. Some well-known techniques for provenance data tracking include
186
watermarking, metadata tracking, digital ngerprinng, and human authencaon, among others.
187
Appendix G. Provenance Data Tracking Approaches
188
Provenance data tracking techniques for GAI systems can be used to track the lineage and integrity of
189
data inputs, metadata, and AI-generated content. Provenance data tracking records the origin and
190
history for digital content, allowing its authencity to be determined. It consists of techniques to record
191
metadata as well as percepble and impercepble digital watermarks on digital content. Data
192
provenance refers to tracking the origin and history of input data through metadata and digital
193
watermarking techniques. Provenance data tracking processes can include and assist AI actors across the
194
lifecycle who may not have full visibility or control over the various trade-os and cascading impacts of
195
early-stage model decisions on downstream performance and synthec outputs. For example, by
196
selecng a given model to priorize computaonal eciency over accuracy, an AI actor may
197
inadvertently aect provenance tracking reliability. Organizaonal risk management eorts for
198
enhancing content provenance include:
199
67
Tracking provenance of training data and metadata for GAI systems;
200
Documenng provenance data limitaons within GAI systems;
201
Monitoring system capabilies and limitaons in deployment through rigorous TEVV processes;
202
Evaluang how humans engage, interact with, or adapt to GAI content (especially in decision
203
making tasks informed by GAI content), and how they react to applied provenance techniques
204
such as percepble disclosures.
205
Organizaons can document and delineate GAI system objecves and limitaons to idenfy gaps where
206
provenance data may be most useful. For instance, GAI systems used for content creaon may require
207
watermarking techniques to idenfy the source of content or metadata management to trace content
208
origins and modicaons. Further narrowing of GAI task denions to include provenance data can
209
enable organizaons to maximize the ulity of provenance data and risk management eorts.
210
A.1.7. Enhancing Content Provenance through Structured Public Feedback
211
While indirect feedback methods such as automated error collecon systems are useful, they oen lack
212
the context and depth that direct input from end users can provide. Organizaons can leverage feedback
213
approaches described in the Pre-Deployment Tesng secon to capture input from external sources such
214
as through AI red-teaming.
215
Integrang pre- and post-deployment external feedback into the monitoring process of applicaons
216
involving AI-generated content can help enhance awareness of performance changes and migate
217
potenal risks and harms. There are many ways to capture and make use of user feedback – before and
218
aer GAI systems are deployed – to gain insights about authencaon ecacy and vulnerabilies,
219
impacts of adversarial threats, unintended consequences resulng from the ulizaon of content
220
provenance approaches, and other unancipated behavior associated with content manipulaon.
221
Organizaons can track and document the provenance of datasets to idenfy instances in which AI-
222
generated data is a potenal root cause of performance issues with the GAI system.
223
A.1.8. Incident Disclosure
224
Appendix H. Overview
225
AI incidents can be dened as an event, circumstance, or series of events in which the development, use,
226
or malfuncon of one or more AI systems directly or indirectly contributes to idened harms. These
227
harms include injury or damage to the health of an individual or group of people; disrupon of the
228
management and operaon of crical infrastructure; violaons of human rights or a breach of
229
obligaons under applicable law intended to protect legal and labor rights; or damage to property,
230
communies, or the environment. AI incidents can occur in the aggregate (i.e., for systemic
231
discriminaon) or acutely (i.e., for one individual).
232
68
Appendix I. State of AI Incident Tracking and Disclosure
233
Formal channels do not currently exist to report and document AI incidents. However, a number of
234
publicly-available databases have been created to document their occurrence. These reporng channels
235
make decisions on an ad hoc basis about what kinds of incidents to track. Some, for example, track by
236
amount of media coverage.
237
Documenng, reporng, and sharing informaon about GAI incidents can help migate and prevent
238
harmful outcomes by assisng relevant AI actors in tracing impacts to their source. Greater awareness
239
and standardizaon of GAI incident reporng could promote this transparency and improve GAI risk
240
management across the AI ecosystem.
241
Appendix J. Documentaon and Involvement of AI Actors
242
AI actors should be aware of their roles in reporng AI incidents. To beer understand previous incidents
243
and implement measures to prevent similar ones in the future, organizaons could consider developing
244
guidelines for publicly available incident reporng which include informaon about AI actor
245
responsibilies. These guidelines would help AI system operators idenfy GAI incidents across the AI
246
lifecycle and with AI actors regardless of role. Documentaon and review of third party inputs and
247
plugins for GAI systems is especially important for AI actors in the context of incident disclosure; LLM
248
inputs and content delivered through these plugins is oen distributed, with inconsistent or insucient
249
access control.
250
Documentaon pracces including logging, recording, and analyzing GAI incidents can facilitate
251
smoother sharing of informaon with relevant AI actors. Regular informaon sharing, change
252
management records, version history and metadata can also empower AI actors responding to and
253
managing AI incidents.
254
69
Appendix K. References
255
AI Risks and Trustworthiness, NIST Trustworthy & Responsible AI Resource Center. Naonal Instute of
256
Standards and Technology.
257
hps://airc.nist.gov/AI_RMF_Knowledge_Base/AI_RMF/Foundaonal_Informaon/3-sec-characteriscs.
258
AI RMF Playbook. Naonal Instute of Standards and Technology.
259
hps://airc.nist.gov/AI_RMF_Knowledge_Base/Playbook.
260
AI RMF Proles. Naonal Instute of Standards and Technology.
261
hps://airc.nist.gov/AI_RMF_Knowledge_Base/AI_RMF/Core_And_Proles/6-sec-prole.
262
AI Incident Database. hps://incidentdatabase.ai/.
263
AI Risk Management Framework. Naonal Instute of Standards and Technology.
264
hps://www.nist.gov/itl/ai-risk-management-framework.
265
AI Risk Management Framework. Naonal Instute of Standards and Technology. Appendix A:
266
Descripons of AI Actor Tasks, NIST Trustworthy & Responsible AI Resource Center. Naonal Instute of
267
Standards and Technology.
268
hps://airc.nist.gov/AI_RMF_Knowledge_Base/AI_RMF/Appendices/Appendix_A#:~:text=AI%20actors%
269
20in%20this%20category,data%20providers%2C%20system%20funders%2C%20product.
270
AI Risk Management Framework. Naonal Instute of Standards and Technology. Appendix B: How AI
271
Risks Dier from Tradional Soware Risks. Naonal Instute of Standards and Technology.
272
hps://airc.nist.gov/AI_RMF_Knowledge_Base/AI_RMF/Appendices/Appendix_B.
273
Alba, D., (2023) How Fake AI Photo of a Pentagon Blast Went Viral and Briey Spooked Stocks.
274
Bloomberg. hps://www.bloomberg.com/news/arcles/2023-05-22/fake-ai-photo-of-pentagon-blast-
275
goes-viral-trips-stocks-briey.
276
Atherton, D. (2024) Deepfakes and Child Safety: A Survey and Analysis of 2023 Incidents and Responses.
277
AI Incident Database. hps://incidentdatabase.ai/blog/deepfakes-and-child-safety/.
278
Authencang AI-Generated Content (2024). Informaon Technology Industry Council.
279
hps://www.ic.org/policy/ITI_AIContentAuthorizaonPolicy_122123.pdf.
280
Badyal, N. et al., (2023) Intenonal Biases in LLM Responses. arXiv. hps://arxiv.org/pdf/2311.07611.
281
Bing Chat: Data Exltraon Exploit Explained. Embrace The Red.
282
hps://embracethered.com/blog/posts/2023/bing-chat-data-exltraon-poc-and-x/.
283
Bommasani, R. et al., (2022) Picking on the Same Person: Does Algorithmic Monoculture lead to
284
Outcome Homogenizaon? arXiv. hps://arxiv.org/pdf/2211.13972.
285
Boyarskaya, M. et al., (2020) Overcoming Failures of Imaginaon in AI Infused System Development and
286
Deployment. arXiv. hps://arxiv.org/pdf/2011.13416.
287
Browne, D. et al., (2023) Securing the AI Pipeline. Mandiant.
288
hps://www.mandiant.com/resources/blog/securing-ai-pipeline.
289
70
Building a Glossary for Synthec Media Transparency Methods, Part 1: Indirect Disclosure (2023)
290
Partnership on AI. hps://partnershiponai.org/glossary-for-synthec-media-transparency-methods-part-
291
1-indirect-disclosure/.
292
Burgess, M., (2024) Generave AI’s Biggest Security Flaw Is Not Easy to Fix. WIRED.
293
hps://www.wired.com/story/generave-ai-prompt-injecon-hacking/.
294
Burtell, M. et al., (2024) The Surprising Power of Next Word Predicon: Large Language Models
295
Explained, Part 1. Georgetown CSET. hps://cset.georgetown.edu/arcle/the-surprising-power-of-next-
296
word-predicon-large-language-models-explained-part-1/.
297
Carlini, N., et al., (2021) Extracng Training Data from Large Language Models. Usenix.
298
hps://www.usenix.org/conference/usenixsecurity21/presentaon/carlini-extracng.
299
Carlini, N. et al., (2023) Quanfying Memorizaon Across Neural Language Models. ICLR 2023.
300
hps://arxiv.org/pdf/2202.07646.
301
Carlini, N. et al., (2024) Stealing Part of a Producon Language Model. arXiv.
302
hps://arxiv.org/abs/2403.06634.
303
Chandra, B. et al., (2023) Dismantling the Disinformaon Business of Chinese Inuence Operaons.
304
RAND. hps://www.rand.org/pubs/commentary/2023/10/dismantling-the-disinformaon-business-of-
305
chinese.html.
306
Dahl, M. et al., (2024) Large Legal Ficons: Proling Legal Hallucinaons in Large Language Models. arXiv.
307
hps://arxiv.org/abs/2401.01301.
308
De Angelo, D., (2024) Short, Mid and Long-Term Impacts of AI in Cybersecurity. Palo Alto Networks.
309
hps://www.paloaltonetworks.com/blog/2024/02/impacts-of-ai-in-cybersecurity/.
310
De Freitas, J., et al. (2023) Chatbots and Mental Health: Insights into the Safety of Generave AI. Harvard
311
Business School. hps://www.hbs.edu/ris/Publicaon%20Files/23-011_c1bdd417-f717-47b6-bccb-
312
5438c6e65c1a_f6fd9798-3c2d-4932-b222-056231fe69d7.pdf.
313
Dietvorst, B. et al., (2014) Algorithm Aversion: People Erroneously Avoid Algorithms Aer Seeing Them
314
Err. Journal of Experimental Psychology. hps://markeng.wharton.upenn.edu/wp-
315
content/uploads/2016/10/Dietvorst-Simmons-Massey-2014.pdf.
316
Duhigg, C., (2012) How Companies Learn Your Secrets. New York Times.
317
hps://www.nymes.com/2012/02/19/magazine/shopping-habits.html.
318
Elsayed, G. et al., (2024) Images altered to trick machine vision can inuence humans too. Google
319
DeepMind. hps://deepmind.google/discover/blog/images-altered-to-trick-machine-vision-can-
320
inuence-humans-too/.
321
Epstein, Z. et al., (2023). Art and the science of generave AI. Science.
322
hps://www.science.org/doi/10.1126/science.adh4451.
323
Execuve Order on the Safe, Secure, and Trustworthy Development and Use of Arcial Intelligence
324
(2023) The White House.hps://www.whitehouse.gov/brieng-room/presidenal-
325
acons/2023/10/30/execuve-order-on-the-safe-secure-and-trustworthy-development-and-use-of-
326
arcial-intelligence/.
327
71
Fair Informaon Pracce Principles (FIPPs). FPC. hps://www.fpc.gov/resources/pps/.
328
Generave arcial intelligence (AI) - ITSAP.00.041. (2023) Canadian Centre for Cyber Security.
329
hps://www.cyber.gc.ca/en/guidance/generave-arcial-intelligence-ai-itsap00041.
330
GPT-4 System Card (2023) OpenAI. hps://cdn.openai.com/papers/gpt-4-system-card.pdf.
331
GPT-4 Technical Report (2024) OpenAI. hps://arxiv.org/pdf/2303.08774.
332
Greshake, K. et al., (2023). Not what you've signed up for: Compromising Real-World LLM-Integrated
333
Applicaons with Indirect Prompt Injecon. arXiv. hps://arxiv.org/abs/2302.12173.
334
Feer, M. et al., (2024). Red-Teaming for Generave AI: Silver Bullet or Security Theater? arXiv.
335
hps://arxiv.org/pdf/2401.15897.
336
Haran, R., (2023). Securing LLM Systems Against Prompt Injecon. NVIDIA.
337
hps://developer.nvidia.com/blog/securing-llm-systems-against-prompt-injecon/.
338
Harwell, D., (2023) AI-generated child sex images spawn new nightmare for the web. Washington Post.
339
hps://www.washingtonpost.com/technology/2023/06/19/arcial-intelligence-child-sex-abuse-
340
images/.
341
Hubinger, E. et al, (2024) “Sleeper Agents: Training Decepve LLMs that Persist Through Safety Training,
342
arXiv e-prints. hps://arxiv.org/abs/2401.05566.
343
Jain, S. et al., (2023) Algorithmic Pluralism: A Structural Approach To Equal Opportunity. arXiv.
344
hps://arxiv.org/pdf/2305.08157.
345
Ji, Z. et al (2023) Survey of Hallucinaon in Natural Language Generaon. ACM Comput. Surv. 55, 12,
346
Arcle 248. hps://doi.org/10.1145/3571730
347
Jussupow, E. et al., (2020) Why Are We Averse Towards Algorithms? A Comprehensive Literature Review
348
on Algorithm Aversion. ECIS 2020. hps://aisel.aisnet.org/ecis2020_rp/168/.
349
Katzman, J., et al., (2023) Taxonomizing and measuring representaonal harms: a look at image tagging.
350
AAAI. hps://dl.acm.org/doi/10.1609/aaai.v37i12.26670.
351
Kirchenbauer, J. et al., (2023) A Watermark for Large Language Models. OpenReview.
352
hps://openreview.net/forum?id=aX8ig9X2a7.
353
Kleinberg, J. et al., (May 2021) Algorithmic monoculture and social welfare. PNAS.
354
hps://www.pnas.org/doi/10.1073/pnas.2018340118.
355
Lakatos, S., (2023) A Revealing Picture. Graphika. hps://graphika.com/reports/a-revealing-picture.
356
Lenaerts-Bergmans, B., (2024) Data Poisoning: The Exploitaon of Generave AI. Crowdstrike.
357
hps://www.crowdstrike.com/cybersecurity-101/cyberaacks/data-poisoning/.
358
Liang, W. et al., (2023) GPT detectors are biased against non-nave English writers. arXiv.
359
hps://arxiv.org/abs/2304.02819.
360
Luccioni, A. et al., (2023) Power Hungry Processing: Was Driving the Cost of AI Deployment? arXiv.
361
hps://arxiv.org/pdf/2311.16863.
362
72
Mouton, C. et al., (2024) The Operaonal Risks of AI in Large-Scale Biological Aacks. RAND.
363
hps://www.rand.org/pubs/research_reports/RRA2977-2.html.
364
Nicole, L. et al., (2023) Humans Are Biased. Generave Ai Is Even Worse. Bloomberg.
365
hps://www.bloomberg.com/graphics/2023-generave-ai-bias/.
366
Northcu, C. et al., (2021) Pervasive Label Errors in Test Sets Destabilize Machine Learning Benchmarks.
367
arXiv. hps://arxiv.org/pdf/2103.14749.
368
OECD (2023), "Advancing accountability in AI: Governing and managing risks throughout the lifecycle for
369
trustworthy AI", OECD Digital Economy Papers, No. 349, OECD Publishing, Paris,
370
hps://doi.org/10.1787/2448f04b-en.
371
OECD AI Incidents Monitor. OECD.AI Policy Observatory. hps://oecd.ai/en/incidents-methodology.
372
Padmakumar, V. et al., (2024) Does wring with language models reduce content diversity? ICLR.
373
hps://arxiv.org/pdf/2309.05196.
374
Paresh, D., (2023) ChatGPT Is Cung Non-English Languages Out of the AI Revoluon. WIRED.
375
hps://www.wired.com/story/chatgpt-non-english-languages-ai-revoluon/.
376
Qu, Y. et al., (2023) Unsafe Diusion: On the Generaon of Unsafe Images and Hateful Memes From Text-
377
To-Image Models. arXiv. hps://arxiv.org/pdf/2305.13873.
378
Rafat, K. et al., (2023) Migang carbon footprint for knowledge disllaon based deep learning model
379
compression. PLOS One. hps://journals.plos.org/plosone/arcle?id=10.1371/journal.pone.0285668.
380
Roadmap for Researchers on Priories Related to Informaon Integrity Research and Development
381
(2022) The White House. hps://www.whitehouse.gov/wp-content/uploads/2022/12/Roadmap-
382
Informaon-Integrity-RD-2022.pdf?.
383
Sandbrink, J., (2023) Arcial intelligence and biological misuse: Dierenang risks of language models
384
and biological design tools. arXiv. hps://arxiv.org/pdf/2306.13952.
385
Satariano, A. et al., (2023) The People Onscreen Are Fake. The Disinformaon Is Real. New York Times.
386
hps://www.nymes.com/2023/02/07/technology/arcial-intelligence-training-deepfake.html.
387
Schaul, K. et al., (2024) Inside the secret list of websites that make AI like ChatGPT sound smart.
388
Washington Post. hps://www.washingtonpost.com/technology/interacve/2023/ai-chatbot-learning/.
389
Shelby, R. et al., (2023) Sociotechnical Harms of Algorithmic Systems: Scoping a Taxonomy for Harm
390
Reducon. arXiv. hps://arxiv.org/pdf/2210.05791.
391
Shevlane, T. et al., (2023) Model evaluaon for extreme risks. arXiv. hps://arxiv.org/pdf/2305.15324.
392
Shumailov, I. et al., (2023) The curse of recursion: training on generated data makes models forget. arXiv.
393
hps://arxiv.org/pdf/2305.17493v2.
394
Skaug Sætra, H. et al., (2022). Psychological interference, liberty and technology. Technology in Society.
395
hps://www.sciencedirect.com/science/arcle/pii/S0160791X22001142.
396
Smith, A. et al., (2023) Hallucinaon or Confabulaon? Neuroanatomy as metaphor in Large Language
397
Models. PLOS Digital Health.
398
hps://journals.plos.org/digitalhealth/arcle?id=10.1371/journal.pdig.0000388.
399
73
Soice, E. et al., (2023) Can large language models democraze access to dual-use biotechnology? arXiv.
400
hps://arxiv.org/abs/2306.03809.
401
Staab, R. et al., (2023) Beyond Memorizaon: Violang Privacy via Inference With Large Language
402
Models. arXiv. hps://arxiv.org/pdf/2310.07298
403
Stanford, S. et al., (2023) Whose Opinions Do Language Models Reect? arXiv.
404
hps://arxiv.org/pdf/2303.17548.
405
Strubell, E. et al., (2019) Energy and Policy Consideraons for Deep Learning in NLP. arXiv.
406
hps://arxiv.org/pdf/1906.02243.
407
Thiel, D. (2023) Invesgaon Finds AI Image Generaon Models Trained on Child Abuse. Stanford Cyber
408
Policy Center. hps://cyber.fsi.stanford.edu/news/invesgaon-nds-ai-image-generaon-models-
409
trained-child-abuse.
410
The Toxicity Issue. Jigsaw, Google. hps://current.withgoogle.com/the-current/toxicity/.
411
Tufekci, Z. (2015) Algorithmic Harms Beyond Facebook and Google: Emergent Challenges of
412
Computaonal Agency. hps://ctlj.colorado.edu/wp-content/uploads/2015/08/Tufekci-nal.pdf
413
Turri, V. et al., (2023) Why We Need to Know More: Exploring the State of AI Incident Documentaon
414
Pracces. AAAI/ACM Conference on AI, Ethics, and Society.
415
hps://dl.acm.org/doi/fullHtml/10.1145/3600211.3604700.
416
Urbina, F. et al., (2022) Dual use of arcial-intelligence-powered drug discovery. Nature Machine
417
Intelligence. hps://www.nature.com/arcles/s42256-022-00465-9.
418
Wang, Y. et al., (2023) Do-Not-Answer: A Dataset for Evaluang Safeguards in LLMs. arXiv.
419
hps://arxiv.org/pdf/2308.13387.
420
Wang, X. et al., (2023) Energy and Carbon Consideraons of Fine-Tuning BERT. ACL Anthology.
421
hps://aclanthology.org/2023.ndings-emnlp.607.pdf.
422
Wardle, C. et al., (2017) Informaon Disorder: Toward an interdisciplinary framework for research and
423
policy making. Council of Europe. hps://rm.coe.int/informaon-disorder-toward-an-interdisciplinary-
424
framework-for-researc/168076277c.
425
Weatherbed, J., (2024) Trolls have ooded X with graphic Taylor Swi AI fakes. The Verge.
426
hps://www.theverge.com/2024/1/25/24050334/x-twier-taylor-swi-ai-fake-images-trending.
427
Weidinger, L. et al., (2021) Ethical and social risks of harm from Language Models. arXiv.
428
hps://arxiv.org/pdf/2112.04359.
429
Weidinger, L. et al. (2023) Sociotechnical Safety Evaluaon of Generave AI Systems. arXiv.
430
hps://arxiv.org/pdf/2310.11986.
431
Weidinger, L. et al., (2022) Taxonomy of Risks posed by Language Models. FAccT ’22.
432
hps://dl.acm.org/doi/pdf/10.1145/3531146.3533088.
433
Wu, K. et al., (2024) How well do LLMs cite relevant medical references? An evaluaon framework and
434
analyses. arXiv. hps://arxiv.org/pdf/2402.02008.
435
74
Yin, L. et al., (2024) OpenAI’s GPT Is A Recruiters Dream Tool. Tests Show There’s Racial Bias. Bloomberg.
436
hps://www.bloomberg.com/graphics/2024-openai-gpt-hiring-racial-discriminaon/.
437
Yu, Z. et al., (March 2024) Don’t Listen To Me: Understanding and Exploring Jailbreak Prompts of Large
438
Language Models. arXiv. hps://arxiv.org/html/2403.17336v1
439
Zhang, Y. et al., (2023) Human favorism, not AI aversion: People’s percepons (and bias) toward
440
generave AI, human experts, and human–GAI collaboraon in persuasive content generaon. Judgment
441
and Decision Making. hps://www.cambridge.org/core/journals/judgment-and-decision-
442
making/arcle/human-favorism-not-ai-aversion-peoples-percepons-and-bias-toward-generave-ai-
443
human-experts-and-humangai-collaboraon-in-persuasive-content-
444
generaon/419C4BD9CE82673EAF1D8F6C350C4FA8.
445
Zhang, Y. et al., (2023) Sirens Song in the AI Ocean: A Survey on Hallucinaon in Large Language Models.
446
arXiv. hps://arxiv.org/pdf/2309.01219.
447
Zhao, X. et al., (2023) Provable Robust Watermarking for AI-Generated Text. Semanc Scholar.
448
hps://www.semancscholar.org/paper/Provable-Robust-Watermarking-for-AI-Generated-Text-Zhao-
449
Ananth/75b68d0903af9d9f6e47ce3cf7e1a7d27ec811dc.
450