March 6, 2021

Scalable serving of Python machine learning models

Scaling out the serving of machine learning models in real deployments is hard: wrapping your model in a Flask API does not cut it. Ray Serve is a pure-Python, framework- and infrastructure-agnostic way of tackling this problem. It takes care of load balancing, batching, and resource management (CPU, GPU), whether you run it in your local environment or in the cloud.
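As a rough illustration of what "batching" means in this context, a server can briefly hold incoming requests and pass them to the model together. This trades a small, bounded wait for fewer model invocations. The stdlib-only sketch below is a toy that shows the general technique. It is not a description of Ray Serve's internals. `MicroBatcher`, `predict_batch`, `max_batch_size`, and `timeout_s` are hypothetical names chosen for illustration, and the "model" is a stand-in function that doubles its inputs. Current Ray Serve releases expose this behavior through their own `@serve.batch` decorator instead.

```python
import asyncio
from typing import Callable, List, Tuple


class MicroBatcher:
    """Toy request batcher (hypothetical helper, not Ray Serve's API).

    Callers `submit` one input each; a background loop groups queued
    inputs and calls `predict_batch` once per group. A batch is flushed
    when it reaches `max_batch_size` or when `timeout_s` has elapsed
    since its first item was taken off the queue.
    """

    def __init__(self, predict_batch: Callable[[List[float]], List[float]],
                 max_batch_size: int = 4, timeout_s: float = 0.01) -> None:
        self.predict_batch = predict_batch
        self.max_batch_size = max_batch_size
        self.timeout_s = timeout_s
        self.queue: asyncio.Queue = asyncio.Queue()
        self.batch_sizes: List[int] = []  # recorded so batching is observable

    async def submit(self, x: float) -> float:
        # Each caller gets a future that the batch loop resolves later.
        fut = asyncio.get_running_loop().create_future()
        await self.queue.put((x, fut))
        return await fut

    async def run(self) -> None:
        loop = asyncio.get_running_loop()
        while True:
            # Wait (indefinitely) for the first request of the next batch.
            items = [await self.queue.get()]
            deadline = loop.time() + self.timeout_s
            # Keep collecting until the batch is full or the deadline passes.
            while len(items) < self.max_batch_size:
                remaining = deadline - loop.time()
                if remaining <= 0:
                    break
                try:
                    items.append(await asyncio.wait_for(self.queue.get(), remaining))
                except asyncio.TimeoutError:
                    break
            inputs = [x for x, _ in items]
            outputs = self.predict_batch(inputs)  # one "model" call per batch
            self.batch_sizes.append(len(inputs))
            for (_, fut), y in zip(items, outputs):
                fut.set_result(y)


async def demo() -> Tuple[List[float], List[int]]:
    # Stand-in "model": doubles every input in the batch.
    batcher = MicroBatcher(lambda xs: [2.0 * x for x in xs], max_batch_size=4)
    worker = asyncio.create_task(batcher.run())
    results = await asyncio.gather(*(batcher.submit(float(i)) for i in range(10)))
    # Stop the background loop once every request has been answered.
    worker.cancel()
    try:
        await worker
    except asyncio.CancelledError:
        pass
    return list(results), batcher.batch_sizes


if __name__ == "__main__":
    results, sizes = asyncio.run(demo())
    print(results)
    print(sizes)
```

Each caller still receives its own result. The model, however, is called once per batch of at most `max_batch_size` inputs rather than once per request. The timeout caps how long an early request waits for batch-mates. Tuning these two knobs is the latency/throughput trade-off that a serving layer manages for you.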