Previously nginx used to mark backend again as live as soon as fail_timeout
passes (10s by default) since last failure. On the other hand, detecting
dead backend takes up to 60s (proxy_connect_timeout) in typical situation
"backend is down and doesn't respond to any packets". This resulted in
suboptimal behaviour in the above situation (up to 23% of requests were
directed to dead backend with default settings).
More detailed description of the problem may be found here (in Russian):
http://mailman.nginx.org/pipermail/nginx-ru/2011-August/042172.html
Fix is to only allow one request after fail_timeout passes, and
mark backend as "live" only if this request succeeds.
Note that with new code backend will not be marked "live" unless "check"
request is completed, and this may take a while in some specific workloads
(e.g. streaming). This is believed to be acceptable.
The following configuration causes nginx to hog cpu due to infinite loop
in ngx_http_upstream_get_peer():
upstream backend {
server 127.0.0.1:8080 down;
server 127.0.0.1:8080 down;
}
server {
...
location / {
proxy_pass http://backend;
}
}
Make sure we don't loop infinitely in ngx_http_upstream_get_peer() but stop
after resetting peer weights once.
Return 0 if we are stuck. This is guaranteed to work as peer 0 always exists,
and eventually ngx_http_upstream_get_round_robin_peer() will do the right
thing falling back to backup servers or returning NGX_BUSY.
by ngx_http_upstream_create_round_robin_peer(), since the peer lives
only during request so the saved SSL session will never be used again
and just causes memory leak
patch by Maxim Dounin