Set User-Agent: header field in HTTP request for curl downloads

Some servers (for example wikimedia.org) reject requests that carry
no User-Agent header, which is libcurl's default, and respond with
HTTP status 403, so OCR for images on such servers fails.

Setting the user agent to "Tesseract OCR" allows OCR for images
on those servers.

Signed-off-by: Stefan Weil <sw@weilnetz.de>
Stefan Weil 2024-01-18 08:38:49 +01:00
parent bcd6144ca5
commit 1bb72501dd

@@ -1184,6 +1184,10 @@ bool TessBaseAPI::ProcessPagesInternal(const char *filename, const char *retry_c
 if (curlcode != CURLE_OK) {
   return error("curl_easy_setopt");
 }
+curlcode = curl_easy_setopt(curl, CURLOPT_USERAGENT, "Tesseract OCR");
+if (curlcode != CURLE_OK) {
+  return error("curl_easy_setopt");
+}
 curlcode = curl_easy_perform(curl);
 if (curlcode != CURLE_OK) {
   return error("curl_easy_perform");