Set User-Agent: header field in HTTP request for curl downloads

Some servers (for example wikimedia.org) don't allow downloads with the default user agent of libcurl and send HTTP status 403, so OCR for images on such servers fails. Setting the user agent to "Tesseract OCR" allows OCR for images on those servers.

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2024-11-27 12:49:35 +08:00 · 2024-01-18 08:38:49 +01:00 · 1bb72501dd
commit 1bb72501dd
parent bcd6144ca5
1 changed file with 4 additions and 0 deletions
--- a/src/api/baseapi.cpp
+++ b/src/api/baseapi.cpp
@@ -1184,6 +1184,10 @@ bool TessBaseAPI::ProcessPagesInternal(const char *filename, const char *retry_c
      if (curlcode != CURLE_OK) {
        return error("curl_easy_setopt");
      }
+      curlcode = curl_easy_setopt(curl, CURLOPT_USERAGENT, "Tesseract OCR");
+      if (curlcode != CURLE_OK) {
+        return error("curl_easy_setopt");
+      }
      curlcode = curl_easy_perform(curl);
      if (curlcode != CURLE_OK) {
        return error("curl_easy_perform");