Amazon Bedrock Runtime Cheat Sheet
- Amazon Bedrock Runtime is a high-performance, serverless API that enables developers to make inference requests to Foundation Models (FMs) available in Amazon Bedrock.
- It serves as the primary runtime interface for building generative AI applications, supporting use cases including text generation, multi-turn conversations, real-time streaming, image generation, embeddings, and more. The API is optimized for low latency and high throughput and provides unified access across multiple model providers.
Amazon Bedrock Runtime Actions
| Action | Description |
| --- | --- |
| ApplyGuardrail | Evaluates user input or model output against a specific guardrail configuration without invoking a Foundation Model. Useful for independent safety checks, PII redaction, or content filtering. |
| Converse | Sends messages to a specified Amazon Bedrock model via the standard Converse interface. Enables code reuse across different models while retaining the ability to pass model-specific inference parameters. |
| ConverseStream | Sends messages to any supported Amazon Bedrock model and returns the response as a stream. Uses the same unified interface as Converse, so you can write logic once and apply it to different models, while still passing unique inference parameters when necessary. |
| CountTokens | Calculates the token count for a given inference input without invoking the model. Because tokenization varies by model, the returned count matches the billing for the equivalent InvokeModel or Converse request. Use it to forecast costs, optimize prompt length, and manage application quotas. |
| GetAsyncInvoke | Retrieves the current status and details (e.g., InProgress, Completed, Failed) of a specific asynchronous invocation job using its invocation ARN. |
| InvokeModel | Invokes a specified Amazon Bedrock model to run inference on the provided prompt and parameters. Supports generating text, images, and embeddings; requires the bedrock:InvokeModel permission. |
| InvokeModelWithBidirectionalStream | Invokes a specified Amazon Bedrock model over a bidirectional stream that remains open for up to 8 minutes. Supports multi-turn sessions in which audio prompts are processed to return both spoken audio and text transcriptions. Requires the bedrock:InvokeModel permission. |
| InvokeModelWithResponseStream | Invokes a specified Amazon Bedrock model to run inference and returns the response as a continuous stream. Requires the bedrock:InvokeModelWithResponseStream permission. |
| ListAsyncInvokes | Lists the asynchronous invocation jobs in the account, with optional filtering by status. |
| StartAsyncInvoke | Starts an asynchronous invocation job. Requires the bedrock:InvokeModel permission. |
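A minimal sketch of calling Converse with boto3 (the model ID below is an example; substitute one enabled in your account and region):

```python
# Example model ID; substitute a model enabled in your account/region.
MODEL_ID = "anthropic.claude-3-haiku-20240307-v1:0"

def build_converse_request(prompt: str, max_tokens: int = 512) -> dict:
    """Assemble the keyword arguments for a bedrock-runtime Converse call."""
    return {
        "modelId": MODEL_ID,
        # Each message is a role plus a list of content blocks.
        "messages": [{"role": "user", "content": [{"text": prompt}]}],
        "inferenceConfig": {"maxTokens": max_tokens, "temperature": 0.5},
    }

def call_converse(prompt: str) -> str:
    """Send the request; requires boto3 and AWS credentials (not run here)."""
    import boto3
    client = boto3.client("bedrock-runtime", region_name="us-east-1")
    resp = client.converse(**build_converse_request(prompt))
    # The generated text lives in the first content block of the output message.
    return resp["output"]["message"]["content"][0]["text"]
```

Because Converse is uniform across providers, swapping models is typically just a change of `MODEL_ID`.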
Key Data Types
- `Message`: Represents a single turn in a conversation, consisting of a `role` (`user` or `assistant`) and a list of `content` blocks. Used as both input and output for the `Converse` and `ConverseStream` operations.
- `ContentBlock`: A unit of content within a `Message`. A `ContentBlock` can contain text, an image (base64-encoded), a document (PDF, CSV, etc.), a `toolUse` block (for invoking external tools), or a `toolResult` block (for returning tool results), depending on the model and operation.
- `SystemContentBlock`: A special content block that provides system instructions, such as context, persona, or behavioral guidelines for the model. It is passed separately from the normal user/assistant conversation flow.
- `InferenceConfiguration`: An object specifying parameters that control model inference, including randomness (`temperature`, `topP`) and response length (`maxTokens`, `stopSequences`); other model-specific settings such as presence or frequency penalties can be passed when the model supports them.
- `GuardrailConfiguration`: Specifies the guardrail identifier and version used to apply responsible AI policies, content moderation, and sensitive-information filters to inference requests.
- `Tool`: Defines an external function or resource that the model may invoke. Each `Tool` object wraps a `ToolSpecification`, which describes its behavior and input requirements.
- `ToolSpecification`: Contains the schema for a `Tool`, including its name, description, and `inputSchema` (in JSON Schema format) specifying the parameters required to invoke the tool.
- `ToolConfiguration`: Configures which tools are available to the model during a `Converse` request. It includes the set of `Tool` objects and the `toolChoice` parameter, which determines whether the model must use a specific tool, must use some tool, or may choose automatically.
- `ToolUseBlock`: A content block produced by the model when it determines a tool should be invoked. It includes the `toolUseId`, the tool's name, and the input parameters for the tool.
- `ToolResultBlock`: A content block returned to the model after external tool execution. It contains the `toolUseId` (to correlate with the request), the tool's output content, and a status indicating success or error.
- `TokenUsage`: An object in the response summarizing the number of `inputTokens` (sent to the model) and `outputTokens` (generated by the model), used for monitoring usage and billing.
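The tool-related types above can be sketched as plain request dictionaries following the Converse shapes; the `get_weather` tool here is purely hypothetical:

```python
def build_tool_config() -> dict:
    """A minimal toolConfig with one hypothetical tool and automatic choice."""
    return {
        "tools": [{
            "toolSpec": {  # ToolSpecification: name, description, inputSchema
                "name": "get_weather",
                "description": "Look up the current weather for a city.",
                "inputSchema": {"json": {
                    "type": "object",
                    "properties": {"city": {"type": "string"}},
                    "required": ["city"],
                }},
            }
        }],
        "toolChoice": {"auto": {}},  # let the model decide whether to call a tool
    }

def build_tool_result_message(tool_use_id: str, payload: dict) -> dict:
    """A user-role message carrying a toolResult back to the model."""
    return {
        "role": "user",
        "content": [{"toolResult": {
            "toolUseId": tool_use_id,  # correlates with the model's toolUse block
            "content": [{"json": payload}],
            "status": "success",
        }}],
    }
```

In a real loop, you would pass `build_tool_config()` as the `toolConfig` argument to `converse`, execute the tool when the response contains a `toolUse` block, and send the result back with a follow-up `converse` call.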
Inference Concepts
- Streaming vs. synchronous: Streaming operations (such as `ConverseStream` or `InvokeModelWithResponseStream`) deliver tokens as they are generated, enabling real-time user experiences. Synchronous operations (such as `Converse` or `InvokeModel`) return the full response only after completion.
- Statelessness: The runtime is stateless; each request must include the entire conversation or context, as no session data is stored between requests.
- Unified API: Using standardized operations such as `Converse` decouples your application code from provider-specific model schemas, making it easier to swap between Foundation Models (e.g., Anthropic Claude, Amazon Titan, Meta Llama, Cohere Command) without refactoring integration logic.
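A sketch of consuming a streaming response, assuming the event shapes returned by ConverseStream (text arrives in `contentBlockDelta` events):

```python
def collect_stream_text(event_stream) -> str:
    """Concatenate the text deltas from a ConverseStream event stream."""
    parts = []
    for event in event_stream:
        # Each streamed event is a dict keyed by its type; text chunks
        # arrive under contentBlockDelta -> delta -> text.
        delta = event.get("contentBlockDelta", {}).get("delta", {})
        if "text" in delta:
            parts.append(delta["text"])
    return "".join(parts)

# With a live client (credentials required), the call would look like:
#   resp = client.converse_stream(modelId=MODEL_ID, messages=messages)
#   text = collect_stream_text(resp["stream"])
```

In an interactive UI you would render each delta as it arrives rather than joining them at the end.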
Security
- VPC endpoints: You can keep API traffic private within your AWS environment by using Interface VPC Endpoints (AWS PrivateLink) for Bedrock Runtime, enhancing security and compliance.
- IAM permissions: Fine-grained IAM actions (such as `bedrock:InvokeModel` and `bedrock:InvokeModelWithResponseStream`, which also authorize the `Converse` and `ConverseStream` operations) let you control and audit which models and operations users or roles can access.
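For example, a policy granting invocation of any foundation model in one region might look like the following (the region and resource pattern are illustrative; scope them to specific model ARNs in production):

```json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": [
        "bedrock:InvokeModel",
        "bedrock:InvokeModelWithResponseStream"
      ],
      "Resource": "arn:aws:bedrock:us-east-1::foundation-model/*"
    }
  ]
}
```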
Pricing
- Input tokens: Charged per 1,000 tokens sent to the model as input for inference.
- Output tokens: Charged per 1,000 tokens generated by the model as output during inference.
- Guardrails: Additional charges may apply for evaluating content against guardrails, priced per text or image unit evaluated, including when guardrails are used independently of inference (via ApplyGuardrail).
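The per-1,000-token pricing model makes cost estimation a small calculation; the prices below are hypothetical placeholders, so check the current Bedrock pricing page for your model:

```python
# Hypothetical per-1,000-token prices (USD); substitute real rates per model.
INPUT_PRICE_PER_1K = 0.00025
OUTPUT_PRICE_PER_1K = 0.00125

def estimate_cost(input_tokens: int, output_tokens: int) -> float:
    """Estimate the inference cost from the TokenUsage counts in a response."""
    return (input_tokens / 1000) * INPUT_PRICE_PER_1K \
         + (output_tokens / 1000) * OUTPUT_PRICE_PER_1K
```

The `inputTokens` and `outputTokens` fields of `TokenUsage` (or the CountTokens operation, before invoking) supply the counts to feed into such an estimate.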
Amazon Bedrock Runtime Cheat Sheet References:
https://docs.aws.amazon.com/bedrock/latest/APIReference/API_Operations_Amazon_Bedrock_Runtime.html
https://docs.aws.amazon.com/bedrock/latest/APIReference/API_Types_Amazon_Bedrock_Runtime.html
https://docs.aws.amazon.com/bedrock/latest/userguide/conversation-inference.html