Amazon Bedrock Runtime

Amazon Bedrock Runtime Cheat Sheet

  • Amazon Bedrock Runtime is a high-performance, serverless API that enables developers to make inference requests to Foundation Models (FMs) available in Amazon Bedrock.
  • It serves as the primary runtime interface for building generative AI applications, supporting use cases including text generation, multi-turn conversations, real-time streaming, image generation, embeddings, and more. The API is optimized for low latency and high throughput and provides unified access across multiple model providers.
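A minimal sketch of what an inference request looks like. The model ID and the Titan Text body schema are assumptions for illustration; each model family defines its own request body for InvokeModel.

```python
import json

# Hypothetical model ID; Amazon Titan Text Express shown as an example.
MODEL_ID = "amazon.titan-text-express-v1"

def build_invoke_model_request(prompt: str, max_tokens: int = 256) -> dict:
    """Build keyword arguments for a bedrock-runtime InvokeModel call.

    The request body is model-specific; this sketch uses the Titan Text
    schema (inputText / textGenerationConfig).
    """
    body = {
        "inputText": prompt,
        "textGenerationConfig": {"maxTokenCount": max_tokens, "temperature": 0.5},
    }
    return {
        "modelId": MODEL_ID,
        "contentType": "application/json",
        "accept": "application/json",
        "body": json.dumps(body),
    }

# With AWS credentials configured, the actual call would be:
#   client = boto3.client("bedrock-runtime")
#   response = client.invoke_model(**build_invoke_model_request("Hello"))
#   result = json.loads(response["body"].read())
```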

 

Amazon Bedrock Runtime Actions

The following operations are supported by Amazon Bedrock Runtime for model inference.

ApplyGuardrail
Evaluates user input or model output against a specific Guardrail configuration without invoking a Foundation Model. Useful for independent safety checks, PII redaction, or content filtering.

Converse
Sends messages to a specified Amazon Bedrock model via the standard Converse interface. This API enables code reusability across different models while retaining the ability to pass model-specific inference parameters.

ConverseStream
Sends messages to any supported Amazon Bedrock model and returns the response as a stream. This consistent API allows you to write logic once and apply it to different models, while still accepting unique inference parameters when necessary.

CountTokens
Calculates the token count for a specific inference input without invoking the model. Because tokenization varies by model, this operation returns a count that matches the billing for InvokeModel or Converse requests. Use this to forecast costs, optimize prompt length, and manage application quotas.

GetAsyncInvoke
Retrieves the current status and details (e.g., InProgress, Completed, Failed) of a specific asynchronous invocation job using its invocation ARN.

InvokeModel
Invokes a specified Amazon Bedrock model to run inference based on the provided prompt and parameters. This operation supports generating text, images, and embeddings and requires the bedrock:InvokeModel permission.

InvokeModelWithBidirectionalStream
Invokes a specified Amazon Bedrock model via a bidirectional stream that remains open for up to 8 minutes. This operation facilitates multi-turn sessions where audio prompts are processed to return both spoken audio and text transcriptions. Requires bedrock:InvokeModel permissions.

InvokeModelWithResponseStream
Invokes a specified Amazon Bedrock model to run inference, returning the response as a continuous stream. This operation uses the provided prompt and parameters and requires the bedrock:InvokeModelWithResponseStream permission.

ListAsyncInvokes
Lists the asynchronous invocation jobs in your account, with optional filtering (e.g., by status).

StartAsyncInvoke
Initiates an asynchronous invocation job for long-running tasks; results are delivered to a specified Amazon S3 location. This operation requires bedrock:InvokeModel permissions.
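The Converse operation above can be sketched as follows. The request is built as a plain dictionary so the shape is visible; the model ID is an assumption, and any Converse-compatible model accepts the same message structure.

```python
# Hypothetical Converse-compatible model ID.
MODEL_ID = "anthropic.claude-3-haiku-20240307-v1:0"

def build_converse_request(user_text: str) -> dict:
    """Build keyword arguments for a bedrock-runtime Converse call."""
    return {
        "modelId": MODEL_ID,
        # Messages alternate between "user" and "assistant" roles.
        "messages": [
            {"role": "user", "content": [{"text": user_text}]},
        ],
        # System instructions go in a separate top-level field.
        "system": [{"text": "You are a concise assistant."}],
        "inferenceConfig": {"maxTokens": 512, "temperature": 0.7},
    }

# With credentials configured, the call itself is:
#   client = boto3.client("bedrock-runtime")
#   response = client.converse(**build_converse_request("What is Amazon Bedrock?"))
#   print(response["output"]["message"]["content"][0]["text"])
```

Because the runtime is stateless, a multi-turn conversation simply appends the assistant's previous reply and the next user message to the `messages` list on each request.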

 

Key Data Types

  • Message

    • Represents a single turn in a conversation, consisting of a role (user or assistant) and a list of content blocks. Used as both input and output for the Converse and ConverseStream operations. System instructions are supplied separately as SystemContentBlock entries rather than as a message role.
  • ContentBlock

    • A unit of content within a Message. A ContentBlock can contain text, image (base64-encoded), document (PDF, CSV, etc.), toolUse (for invoking external tools), or toolResult (for returning tool results) depending on model and operation.
  • SystemContentBlock

    • A special content block used to provide system instructions, such as context, persona, or behavioral guidelines for the model. This is independent of the normal conversation flow between user and assistant.
  • InferenceConfiguration 

    • An object specifying parameters that control model inference, including randomness (temperature, topP), response length (maxTokens, stopSequences), and other model-specific settings such as presence or frequency penalties (when supported by the model).
  • GuardrailConfiguration

    • Specifies the guardrail identifier and version used to apply responsible AI policies, content moderation, and sensitive information filters to inference requests, enforcing content safety standards at inference time.
  • Tool

    • Defines an external function or resource that the model may invoke. Each Tool object wraps a ToolSpecification, which describes its behavior and input requirements.
  • ToolSpecification 

    • Contains the schema for a Tool, including its name, description, and inputSchema (in JSON format) specifying the parameters required for invoking the tool.
  • ToolConfiguration 

    • Configures which tools are available for the model to use during a Converse request. It includes the set of Tool objects and the toolChoice parameter, which determines if the model must use a specific tool, any tool, or can choose automatically.
  • ToolUseBlock 

    • A content block produced by the model when it determines a tool should be invoked. It includes the toolUseId, the tool’s name, and the input parameters for the tool.
  • ToolResultBlock

    • A content block returned to the model after external tool execution. Contains the toolUseId (to correlate with the request), the tool’s output content, and a status indicating success or error.
  • TokenUsage 

    • An object in the response summarizing the number of inputTokens (sent to the model) and outputTokens (generated by the model), used for monitoring usage and billing.
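The tool-related data types above can be sketched as plain dictionaries. The tool name, schema, IDs, and result values below are hypothetical, but the field names follow the Converse API shapes (ToolSpecification, ToolConfiguration, ToolUseBlock, ToolResultBlock).

```python
# A ToolConfiguration: the set of available tools plus a toolChoice.
tool_config = {
    "tools": [
        {
            "toolSpec": {  # ToolSpecification
                "name": "get_weather",
                "description": "Return the current weather for a city.",
                "inputSchema": {
                    "json": {
                        "type": "object",
                        "properties": {"city": {"type": "string"}},
                        "required": ["city"],
                    }
                },
            }
        }
    ],
    "toolChoice": {"auto": {}},  # let the model decide whether to use a tool
}

# A ToolUseBlock as the model might emit it in its response...
tool_use = {"toolUseId": "tooluse-123", "name": "get_weather", "input": {"city": "Manila"}}

# ...and the matching ToolResultBlock, sent back in a new user message.
tool_result_message = {
    "role": "user",
    "content": [
        {
            "toolResult": {
                "toolUseId": tool_use["toolUseId"],  # correlates with the request
                "content": [{"json": {"tempC": 31}}],
                "status": "success",
            }
        }
    ],
}
```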

 

Inference Concepts

  • Streaming vs. Synchronous:
    • Streaming operations (such as ConverseStream or InvokeModelWithResponseStream) deliver tokens as they are generated, enabling real-time user experiences. Synchronous operations (such as Converse or InvokeModel) return the full response after completion.
  • Statelessness:
    • The runtime is stateless; each request must include the entire conversation or context, as no session data is stored between requests.
  • Unified API:
    • Using standardized operations, such as Converse, decouples your application code from provider-specific model schemas, making it easier to swap between Foundation Models (e.g., Anthropic Claude, Amazon Titan, Meta Llama, Cohere Command) without refactoring integration logic.
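The streaming model above can be sketched without a live connection: ConverseStream yields a sequence of typed events, and text arrives incrementally in contentBlockDelta events. The simulated events below mirror that shape; the exact text values are invented for the example.

```python
def collect_streamed_text(events) -> str:
    """Accumulate text deltas from a ConverseStream-style event sequence.

    Each event is a dict keyed by its event type; text fragments arrive
    in contentBlockDelta events and are concatenated in order.
    """
    parts = []
    for event in events:
        delta = event.get("contentBlockDelta", {}).get("delta", {})
        if "text" in delta:
            parts.append(delta["text"])
    return "".join(parts)

# Simulated events in the shape yielded by converse_stream()["stream"]:
fake_stream = [
    {"messageStart": {"role": "assistant"}},
    {"contentBlockDelta": {"delta": {"text": "Hello, "}, "contentBlockIndex": 0}},
    {"contentBlockDelta": {"delta": {"text": "world!"}, "contentBlockIndex": 0}},
    {"messageStop": {"stopReason": "end_turn"}},
]
print(collect_streamed_text(fake_stream))  # Hello, world!
```

In a real application the same loop runs over the live event stream, flushing each fragment to the UI as it arrives instead of buffering the full response.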

 

Security

  • VPC Endpoints:
    • You can keep API traffic private within your AWS environment by using Interface VPC Endpoints (AWS PrivateLink) for Bedrock Runtime, enhancing security and compliance.
  • IAM Permissions:
    • Fine-grained IAM permissions (such as bedrock:InvokeModel, bedrock:Converse, etc.) allow you to control and audit which models and operations users or roles can access.
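A least-privilege policy along these lines can be expressed as follows; it is shown as a Python dict for illustration, and the region and model ARN are placeholders you would replace with your own.

```python
# Sketch of an IAM policy granting inference on one specific foundation model.
# Region and model ID in the ARN are placeholders.
invoke_policy = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": [
                "bedrock:InvokeModel",
                "bedrock:InvokeModelWithResponseStream",
            ],
            "Resource": (
                "arn:aws:bedrock:us-east-1::foundation-model/"
                "anthropic.claude-3-haiku-20240307-v1:0"
            ),
        }
    ],
}
```

Scoping `Resource` to specific model ARNs (rather than `*`) is what lets you control exactly which models a role can invoke.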

 

Pricing

  • Input Tokens:
    • Charged per 1,000 tokens sent to the model as input for inference.
  • Output Tokens:
    • Charged per 1,000 tokens generated by the model as output during inference.
  • Guardrails:
    • Additional charges may apply for evaluating content against guardrails, priced per text or image unit evaluated when guardrail features are used independently of inference.
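The per-1,000-token pricing model reduces to simple arithmetic. The rates below are hypothetical placeholders; always check the Bedrock pricing page for the current per-model rates.

```python
def estimate_cost(input_tokens: int, output_tokens: int,
                  price_in_per_1k: float, price_out_per_1k: float) -> float:
    """Estimate on-demand inference cost from token counts.

    Prices are passed in as USD per 1,000 tokens; values used below are
    hypothetical, not actual Bedrock rates.
    """
    return (input_tokens / 1000) * price_in_per_1k \
         + (output_tokens / 1000) * price_out_per_1k

# Example: 2,000 input and 500 output tokens at $0.003 / $0.015 per 1K tokens.
cost = estimate_cost(2000, 500, 0.003, 0.015)
print(round(cost, 4))  # 0.0135
```

Pairing this with the CountTokens operation lets you forecast a request's input cost before actually invoking the model.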

 

Amazon Bedrock Runtime Cheat Sheet References:

https://docs.aws.amazon.com/bedrock/latest/APIReference/API_Operations_Amazon_Bedrock_Runtime.html

https://docs.aws.amazon.com/bedrock/latest/APIReference/API_Types_Amazon_Bedrock_Runtime.html

https://docs.aws.amazon.com/bedrock/latest/userguide/conversation-inference.html

https://aws.amazon.com/bedrock/pricing/


Written by: Cristieneil Ceballos

Cristieneil Ceballos, “Cris” for short, is a Computer Science student at the University of the Philippines Mindanao and an IT Intern at Tutorials Dojo. Passionate about continuous learning, she volunteers and engages with various tech communities—viewing each experience as both a chance to contribute and an opportunity to explore areas she’s interested in.
