Amazon Elastic Inference
- Allows attaching low-cost GPU-powered inference acceleration to EC2 instances, SageMaker instances, or ECS tasks.
- Reduces machine learning inference costs by up to 75%.
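As a sketch of how an accelerator is attached at launch, the following builds boto3-style `RunInstances` parameters with an `ElasticInferenceAccelerators` entry; the AMI ID, subnet, and accelerator size are placeholder assumptions:

```python
# Sketch: request parameters for launching an EC2 instance with an
# Elastic Inference accelerator attached (boto3-style RunInstances
# arguments; the AMI ID and instance sizes are placeholders).
run_instances_params = {
    "ImageId": "ami-0123456789abcdef0",      # placeholder deep learning AMI
    "InstanceType": "c5.xlarge",             # CPU instance hosting the model server
    "MinCount": 1,
    "MaxCount": 1,
    "ElasticInferenceAccelerators": [
        {"Type": "eia2.medium", "Count": 1}  # low-cost GPU-powered accelerator
    ],
}

# e.g. boto3.client("ec2").run_instances(**run_instances_params)
accel = run_instances_params["ElasticInferenceAccelerators"][0]
print(accel["Type"])
```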
Common use cases
- Computer vision
- Natural language processing
- Speech recognition
- An Elastic Inference accelerator is a GPU-powered hardware device provisioned for your instance.
- It is not part of the hardware on which your instance is hosted.
- Uses AWS PrivateLink endpoint service to attach to the instance over the network.
- Only a single endpoint service is required per Availability Zone to connect Elastic Inference accelerators to instances.
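The PrivateLink connection above can be sketched as the parameters for an interface VPC endpoint; the VPC and subnet IDs are placeholders, and the service name follows the `com.amazonaws.<region>.elastic-inference.runtime` pattern:

```python
# Sketch: parameters for the interface VPC endpoint (AWS PrivateLink)
# that instances use to reach their Elastic Inference accelerators.
# VPC/subnet IDs are placeholders.
region = "us-east-1"

create_endpoint_params = {
    "VpcEndpointType": "Interface",
    "VpcId": "vpc-0123456789abcdef0",            # placeholder VPC
    "ServiceName": f"com.amazonaws.{region}.elastic-inference.runtime",
    "SubnetIds": ["subnet-0123456789abcdef0"],   # one subnet per AZ in use
    "PrivateDnsEnabled": True,
}

# e.g. boto3.client("ec2").create_vpc_endpoint(**create_endpoint_params)
print(create_endpoint_params["ServiceName"])
```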
- Supports TensorFlow, Apache MXNet, PyTorch, and ONNX models.
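For hosted models, an accelerator can also be attached through SageMaker. A sketch of a boto3-style `CreateEndpointConfig` request with an `AcceleratorType` on the production variant follows; the model and endpoint-config names are placeholders:

```python
# Sketch: a SageMaker endpoint-config production variant that attaches
# an Elastic Inference accelerator to a hosted model (boto3-style
# CreateEndpointConfig arguments; names are placeholders).
endpoint_config_params = {
    "EndpointConfigName": "my-model-with-eia",
    "ProductionVariants": [
        {
            "VariantName": "AllTraffic",
            "ModelName": "my-tensorflow-model",   # placeholder model name
            "InstanceType": "ml.m5.large",        # CPU host instance
            "InitialInstanceCount": 1,
            "AcceleratorType": "ml.eia2.medium",  # Elastic Inference accelerator
        }
    ],
}

# e.g. boto3.client("sagemaker").create_endpoint_config(**endpoint_config_params)
variant = endpoint_config_params["ProductionVariants"][0]
print(variant["AcceleratorType"])
```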
- Can provide 1 to 32 trillion floating-point operations per second (TFLOPS) per accelerator.
- The accelerator attached to each instance in an Auto Scaling group scales according to your application's compute demand.
- You are charged for the accelerator hours you consume.
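The accelerator-hour billing model can be sketched as simple arithmetic; the hourly rate below is hypothetical, since actual rates vary by accelerator size and region:

```python
# Sketch: accelerator-hour billing, assuming a hypothetical hourly
# rate (actual rates vary by accelerator size and region).
def accelerator_cost(hours: float, hourly_rate: float) -> float:
    """Return the charge for the accelerator hours consumed."""
    return round(hours * hourly_rate, 2)

# e.g. 30 days of a single always-on accelerator at a hypothetical $0.12/hr
print(accelerator_cost(hours=24 * 30, hourly_rate=0.12))  # 86.4
```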