Is your feature request related to a problem? Please describe the problem.
In experimenting with the performance characteristics of YARP, it became clear that there is a balancing act between how many requests are handled at once and the resulting RPS, latency, CPU usage, and working set. By default, Kestrel + ASP.NET Core does not throttle the number of requests handled concurrently: if the server is hit with a high load, it keeps dispatching tasks to the thread pool for incoming requests. That may produce higher RPS on average, but at the cost of higher latency for each request, higher CPU usage, and potentially unbounded working-set growth in memory.
Describe the solution you'd like
We should have a rate-limiting component that caps the number of active requests processed by ASP.NET. Requests beyond that cap should be queued and then handled in order, unless the request/connection times out. Ideally this throttling happens between Kestrel and the early stages of ASP.NET so that minimal work is done on each new request, i.e. before allocating an HttpContext or parsing the stream for the request line, headers, etc.
There should be a way to specify a fixed value for this cap on simultaneous requests, a max queue size after which new requests are rejected, and a max queue duration for each waiting request.
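As a rough illustration of those three knobs, here is a language-agnostic sketch in Python, with asyncio standing in for the server's request dispatch. `ConcurrencyLimiter`, `handle`, and the 503 return values are all hypothetical names for this sketch, not an existing YARP or ASP.NET API:

```python
import asyncio


class ConcurrencyLimiter:
    """Caps in-flight requests; excess requests queue up to a bound,
    and each queued request waits at most queue_timeout seconds."""

    def __init__(self, max_concurrent, max_queue, queue_timeout):
        self._semaphore = asyncio.Semaphore(max_concurrent)
        self._max_queue = max_queue
        self._queue_timeout = queue_timeout
        self._waiting = 0  # requests currently parked in the queue

    async def handle(self, request_handler):
        if self._semaphore.locked():  # no free slot: this request must queue
            if self._waiting >= self._max_queue:
                return 503  # queue full: reject immediately
            self._waiting += 1
            try:
                await asyncio.wait_for(
                    self._semaphore.acquire(), self._queue_timeout)
            except asyncio.TimeoutError:
                return 503  # waited too long in the queue
            finally:
                self._waiting -= 1
        else:
            await self._semaphore.acquire()  # free slot: start right away
        try:
            return await request_handler()
        finally:
            self._semaphore.release()
```

In a real implementation the reject paths would close or fail the connection before any per-request allocation, which is the whole point of putting the limiter ahead of HttpContext creation.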
In addition, we should have an algorithmic solution that dynamically adjusts max_simultaneous_requests to balance it against the CPU usage and working set of the application. I suspect the latter will be most useful in practice in containerized applications: the memory target can be set below the container's OOM-kill threshold so that the process manages its own resources.
I am not a mathematician, but I suspect some form of PID controller could drive the value of max_simultaneous_requests to keep CPU usage and working set within bounds.
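A minimal sketch of what such a controller might look like, assuming the working set is sampled periodically. `PidLimitController`, the relative-error formulation, and the gains are all illustrative choices for this sketch, not a tuned or proposed design:

```python
class PidLimitController:
    """PID loop that nudges a concurrency cap so the measured working
    set tracks a memory budget. Gains here are placeholders, not tuned."""

    def __init__(self, memory_budget_bytes, kp=0.5, ki=0.05, kd=0.0,
                 min_limit=1, max_limit=10_000):
        self.setpoint = memory_budget_bytes
        self.kp, self.ki, self.kd = kp, ki, kd
        self.min_limit, self.max_limit = min_limit, max_limit
        self._integral = 0.0
        self._prev_error = 0.0

    def update(self, measured_working_set, current_limit, dt=1.0):
        # Positive error = headroom below the budget -> allow more concurrency;
        # negative error = over budget -> shrink the cap.
        error = (self.setpoint - measured_working_set) / self.setpoint
        self._integral += error * dt
        derivative = (error - self._prev_error) / dt
        self._prev_error = error
        adjustment = (self.kp * error
                      + self.ki * self._integral
                      + self.kd * derivative)
        new_limit = current_limit * (1.0 + adjustment)
        return int(min(self.max_limit, max(self.min_limit, new_limit)))
```

With only the proportional term active, a process at half its memory budget would be allowed more concurrency, and one 50% over budget would be throttled; the integral and derivative terms would then deal with steady-state offset and oscillation, which is where actual tuning effort would go.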
Additional context
Using request latency as the control variable is probably not practical: while you could measure the time ASP.NET Core takes to handle a request, the observed latency would also include the queue duration, so any limit based on it would be of limited use.