Amazon Bedrock Runtime Cheat Sheet
- Amazon Bedrock Runtime is a high-performance, serverless API that enables developers to make inference requests to Foundation Models (FMs) available in Amazon Bedrock.
- It serves as the primary runtime interface for building generative AI applications, supporting use cases including text generation, multi-turn conversations, real-time streaming, image generation, embeddings, and more. The API is optimized for low latency and high throughput and provides unified access across multiple model providers.
Amazon Bedrock Runtime Actions
| Action | Description |
| --- | --- |
| ApplyGuardrail | Evaluates user input or model output against a specific guardrail configuration without invoking a Foundation Model. Useful for independent safety checks, PII redaction, or content filtering. |
| Converse | Sends messages to a specified Amazon Bedrock model via the standard Converse interface. Enables code reuse across different models while retaining the ability to pass model-specific inference parameters. |
| ConverseStream | Sends messages to any supported Amazon Bedrock model and returns the response as a stream. Uses the same unified interface as Converse, so you can write logic once and apply it to different models, while still passing unique inference parameters when necessary. |
| CountTokens | Calculates the token count for a given inference input without invoking the model. Because tokenization varies by model, the returned count matches the billing for the equivalent InvokeModel or Converse request. Use it to forecast costs, optimize prompt length, and manage application quotas. |
| GetAsyncInvoke | Retrieves the current status and details (e.g., InProgress, Completed, Failed) of a specific asynchronous invocation job using its invocation ARN. |
| InvokeModel | Invokes a specified Amazon Bedrock model to run inference on the provided prompt and parameters. Supports generating text, images, and embeddings; requires the bedrock:InvokeModel permission. |
| InvokeModelWithBidirectionalStream | Invokes a specified Amazon Bedrock model over a bidirectional stream that remains open for up to 8 minutes. Supports multi-turn sessions in which audio prompts are processed to return both spoken audio and text transcriptions. Requires the bedrock:InvokeModel permission. |
| InvokeModelWithResponseStream | Invokes a specified Amazon Bedrock model to run inference and returns the response as a continuous stream. Requires the bedrock:InvokeModelWithResponseStream permission. |
| ListAsyncInvokes | Lists the asynchronous invocation jobs in the account, with optional filtering by status. |
| StartAsyncInvoke | Starts an asynchronous invocation job. Requires the bedrock:InvokeModel permission. |
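A minimal sketch of calling Converse with boto3 (the model ID below is an example; substitute one enabled in your account and region):

```python
# Example model ID; substitute a model enabled in your account/region.
MODEL_ID = "anthropic.claude-3-haiku-20240307-v1:0"

def build_converse_request(prompt: str, max_tokens: int = 512) -> dict:
    """Assemble the keyword arguments for a bedrock-runtime Converse call."""
    return {
        "modelId": MODEL_ID,
        # Each message is a role plus a list of content blocks.
        "messages": [{"role": "user", "content": [{"text": prompt}]}],
        "inferenceConfig": {"maxTokens": max_tokens, "temperature": 0.5},
    }

def call_converse(prompt: str) -> str:
    """Send the request; requires boto3 and AWS credentials (not run here)."""
    import boto3
    client = boto3.client("bedrock-runtime", region_name="us-east-1")
    resp = client.converse(**build_converse_request(prompt))
    # The generated text lives in the first content block of the output message.
    return resp["output"]["message"]["content"][0]["text"]
```

Because Converse is uniform across providers, swapping models is typically just a change of `MODEL_ID`.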
Key Data Types
- `Message`: Represents a single turn in a conversation, consisting of a `role` (`user` or `assistant`) and a list of `content` blocks. Used as both input and output for the `Converse` and `ConverseStream` operations.
- `ContentBlock`: A unit of content within a `Message`. A `ContentBlock` can contain text, an image (base64-encoded), a document (PDF, CSV, etc.), a `toolUse` block (for invoking external tools), or a `toolResult` block (for returning tool results), depending on the model and operation.
- `SystemContentBlock`: A special content block that provides system instructions, such as context, persona, or behavioral guidelines for the model. It is passed separately from the normal user/assistant conversation flow.
- `InferenceConfiguration`: An object specifying parameters that control model inference, including randomness (`temperature`, `topP`) and response length (`maxTokens`, `stopSequences`); other model-specific settings such as presence or frequency penalties can be passed when the model supports them.
- `GuardrailConfiguration`: Specifies the guardrail identifier and version used to apply responsible AI policies, content moderation, and sensitive-information filters to inference requests.
- `Tool`: Defines an external function or resource that the model may invoke. Each `Tool` object wraps a `ToolSpecification`, which describes its behavior and input requirements.
- `ToolSpecification`: Contains the schema for a `Tool`, including its name, description, and `inputSchema` (in JSON Schema format) specifying the parameters required to invoke the tool.
- `ToolConfiguration`: Configures which tools are available to the model during a `Converse` request. It includes the set of `Tool` objects and the `toolChoice` parameter, which determines whether the model must use a specific tool, must use some tool, or may choose automatically.
- `ToolUseBlock`: A content block produced by the model when it determines a tool should be invoked. It includes the `toolUseId`, the tool's name, and the input parameters for the tool.
- `ToolResultBlock`: A content block returned to the model after external tool execution. It contains the `toolUseId` (to correlate with the request), the tool's output content, and a status indicating success or error.
- `TokenUsage`: An object in the response summarizing the number of `inputTokens` (sent to the model) and `outputTokens` (generated by the model), used for monitoring usage and billing.
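The tool-related types above can be sketched as plain request dictionaries following the Converse shapes; the `get_weather` tool here is purely hypothetical:

```python
def build_tool_config() -> dict:
    """A minimal toolConfig with one hypothetical tool and automatic choice."""
    return {
        "tools": [{
            "toolSpec": {  # ToolSpecification: name, description, inputSchema
                "name": "get_weather",
                "description": "Look up the current weather for a city.",
                "inputSchema": {"json": {
                    "type": "object",
                    "properties": {"city": {"type": "string"}},
                    "required": ["city"],
                }},
            }
        }],
        "toolChoice": {"auto": {}},  # let the model decide whether to call a tool
    }

def build_tool_result_message(tool_use_id: str, payload: dict) -> dict:
    """A user-role message carrying a toolResult back to the model."""
    return {
        "role": "user",
        "content": [{"toolResult": {
            "toolUseId": tool_use_id,  # correlates with the model's toolUse block
            "content": [{"json": payload}],
            "status": "success",
        }}],
    }
```

In a real loop, you would pass `build_tool_config()` as the `toolConfig` argument to `converse`, execute the tool when the response contains a `toolUse` block, and send the result back with a follow-up `converse` call.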
Inference Concepts
- Streaming vs. synchronous: Streaming operations (such as `ConverseStream` or `InvokeModelWithResponseStream`) deliver tokens as they are generated, enabling real-time user experiences. Synchronous operations (such as `Converse` or `InvokeModel`) return the full response only after completion.
- Statelessness: The runtime is stateless; each request must include the entire conversation or context, as no session data is stored between requests.
- Unified API: Using standardized operations such as `Converse` decouples your application code from provider-specific model schemas, making it easier to swap between Foundation Models (e.g., Anthropic Claude, Amazon Titan, Meta Llama, Cohere Command) without refactoring integration logic.
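A sketch of consuming a streaming response, assuming the event shapes returned by ConverseStream (text arrives in `contentBlockDelta` events):

```python
def collect_stream_text(event_stream) -> str:
    """Concatenate the text deltas from a ConverseStream event stream."""
    parts = []
    for event in event_stream:
        # Each streamed event is a dict keyed by its type; text chunks
        # arrive under contentBlockDelta -> delta -> text.
        delta = event.get("contentBlockDelta", {}).get("delta", {})
        if "text" in delta:
            parts.append(delta["text"])
    return "".join(parts)

# With a live client (credentials required), the call would look like:
#   resp = client.converse_stream(modelId=MODEL_ID, messages=messages)
#   text = collect_stream_text(resp["stream"])
```

In an interactive UI you would render each delta as it arrives rather than joining them at the end.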
Security
- VPC endpoints: You can keep API traffic private within your AWS environment by using Interface VPC Endpoints (AWS PrivateLink) for Bedrock Runtime, enhancing security and compliance.
- IAM permissions: Fine-grained IAM actions (such as `bedrock:InvokeModel` and `bedrock:InvokeModelWithResponseStream`, which also authorize the `Converse` and `ConverseStream` operations) let you control and audit which models and operations users or roles can access.
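For example, a policy granting invocation of any foundation model in one region might look like the following (the region and resource pattern are illustrative; scope them to specific model ARNs in production):

```json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": [
        "bedrock:InvokeModel",
        "bedrock:InvokeModelWithResponseStream"
      ],
      "Resource": "arn:aws:bedrock:us-east-1::foundation-model/*"
    }
  ]
}
```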
Pricing
- Input tokens: Charged per 1,000 tokens sent to the model as input for inference.
- Output tokens: Charged per 1,000 tokens generated by the model as output during inference.
- Guardrails: Additional charges may apply for evaluating content against guardrails, priced per text or image unit evaluated, including when guardrails are used independently of inference (via ApplyGuardrail).
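The per-1,000-token pricing model makes cost estimation a small calculation; the prices below are hypothetical placeholders, so check the current Bedrock pricing page for your model:

```python
# Hypothetical per-1,000-token prices (USD); substitute real rates per model.
INPUT_PRICE_PER_1K = 0.00025
OUTPUT_PRICE_PER_1K = 0.00125

def estimate_cost(input_tokens: int, output_tokens: int) -> float:
    """Estimate the inference cost from the TokenUsage counts in a response."""
    return (input_tokens / 1000) * INPUT_PRICE_PER_1K \
         + (output_tokens / 1000) * OUTPUT_PRICE_PER_1K
```

The `inputTokens` and `outputTokens` fields of `TokenUsage` (or the CountTokens operation, before invoking) supply the counts to feed into such an estimate.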
Amazon Bedrock Runtime Cheat Sheet References:
https://docs.aws.amazon.com/bedrock/latest/APIReference/API_Operations_Amazon_Bedrock_Runtime.html
https://docs.aws.amazon.com/bedrock/latest/APIReference/API_Types_Amazon_Bedrock_Runtime.html
https://docs.aws.amazon.com/bedrock/latest/userguide/conversation-inference.html