update ChangeLog;

remove ReleaseNotes (a relevant information are in Changelog file and there is Release note wiki online)
2024-11-24 02:59:07 +08:00 · 2016-03-14 23:03:44 +01:00 · 2016-03-14 23:03:44 +01:00 · ddd3cad8c6
commit ddd3cad8c6
parent 2c675dcc77
3 changed files with 41 additions and 326 deletions
--- a/42
+++ b/42
@ -1,4 +1,42 @@
-2014-02-04 v3.03
+2015-02-17 - V3.04.01
+  * Added OSD renderer for psm 0. Works for single page and multi-page images.
+  * Improve tesstrain.sh script.
+  * Simplify build and run of ScrollView.
+  * Improved PDF output for OS X Preview utility.
+  * INCOMPATIBLE fix to hOCR line height information - commit 134ebc3.
+  * Added option to build Tesseract without Cube OCR engine (-DNO_CUBE_BUILD).
+  * Enable OpenMP support.
+  * Many bug fixes.
+
+2015-07-11 - V3.04.00
+  * Tesseract development is now done with Git and hosted at github.com (Previously we used Subversion as a VCS and code.google.com for hosting).
+  * Tesseract now requires leptonica 1.71 or a higher version.
+  * Removed official support for VS 2008.
+  * Added support for 39 additional scripts/languages, including: amh, asm, aze_cyrl, bod, bos, ceb, cym, dzo, fas, gle, guj, hat, iku, jav, kat, kat_old, kaz, khm, kir, kur, lao, lat, mar, mya, nep, ori, pan, pus, san, sin, srp_latn, syr, tgk, tir, uig, urd, uzb, uzb_cyrl, yid
+  * Major updates to training system as a result of extensive testing on 100 languages.
+  * New training data for over 100 languages
+  * Improved performance with PIC compilation option.
+  * Significant change to invisible font system in pdf output to improve correctness and compatibility with external programs, particularly ghostscript.
+  * Improved font identification.
+  * Major change to improve layout analysis for heavily diacritic languages: Thai, Vietnamese, Kannada, Telugu etc.
+  * Fixed problems with shifted baselines so recognition can recover from layout analysis errors.
+  * Major refactor to improve speed on difficult images, especially when running a heap checker.
+  * Moved params from global in page layout to tesseractclass.
+  * Improved single column layout analysis.
+  * Allow ocr output to multiple formats using tesseract command line executable.
+  * Fixed issues with mixed eng+ara scripts.
+  * Improved script consistency in numbers.
+  * Major refactor of control.cpp to enable line recognition.
+  * Added tesstrain.sh - a master training script.
+  * Added ability to text2image training tool to just list available fonts.
+  * Added ability to text2image to underline words.
+  * Improved efficiency of image processing for PDF output.
+  * Added parameter description for each parameter listed with 'print-parameters' command line option.
+  * Added font info to hOCR output.
+  * Enabled streaming input and output of multi-page documents.
+  * Many bug fixes.
+
+2014-02-04 - V3.03(rc1)
  * Added new training tool text2image to generate box/tif file pairs from
    text and truetype fonts.
  * Added support for PDF output with searchable text.
@ -16,7 +54,7 @@
  * Many bug fixes.
  * More training source data included.

-2012-02-01 - v3.02
+2012-02-01 - V3.02
  * Moved ResultIterator/PageIterator to ccmain.
  * Added Right-to-left/Bidi capability in the output iterators for Hebrew/Arabic.
  * Added paragraph detection in layout analysis/post OCR.
--- a/Makefile.am
+++ b/Makefile.am
@ -22,7 +22,7 @@ if !NO_CUBE_BUILD
 endif
 SUBDIRS += ccmain api . tessdata doc

-EXTRA_DIST = ReleaseNotes README.md\
+EXTRA_DIST = README.md\
 	aclocal.m4 config configure.ac autogen.sh contrib \
 	tesseract.pc.in $(TRAINING_SUBDIR) java doc testing

--- a/323
+++ b/323
@ -1,323 +0,0 @@
-= Tesseract release notes July 11 2015 - V3.04.01 =
-  * Added OSD renderer for psm 0. Works for single page and multi-page images.
-  * Improve tesstrain.sh script.
-  * Simplify build and run of ScrollView.
-  * Improved PDF output for OS X Preview utility.
-  * INCOMPATIBLE fix to hOCR line height information - commit 134ebc3.
-  * Added option to build Tesseract without Cube OCR engine (-DNO_CUBE_BUILD).
-  * Enable OpenMP support.
-  * Many bug fixes.
-
-= Tesseract release notes July 11 2015 - V3.04.00 =
-  * Tesseract development is now done with Git and hosted at github.com (Previously we used Subversion as a VCS and code.google.com for hosting).
-  * Tesseract now requires leptonica 1.71 or a higher version.
-  * Removed official support for VS 2008.
-  * Added support for 39 additional scripts/languages, including: amh, asm, aze_cyrl, bod, bos, ceb, cym, dzo, fas, gle, guj, hat, iku, jav, kat, kat_old, kaz, khm, kir, kur, lao, lat, mar, mya, nep, ori, pan, pus, san, sin, srp_latn, syr, tgk, tir, uig, urd, uzb, uzb_cyrl, yid
-  * Major updates to training system as a result of extensive testing on 100 languages.
-  * New training data for over 100 languages
-  * Improved performance with PIC compilation option.
-  * Significant change to invisible font system in pdf output to improve correctness and compatibility with external programs, particularly ghostscript.
-  * Improved font identification.
-  * Major change to improve layout analysis for heavily diacritic languages: Thai, Vietnamese, Kannada, Telugu etc.
-  * Fixed problems with shifted baselines so recognition can recover from layout analysis errors.
-  * Major refactor to improve speed on difficult images, especially when running a heap checker.
-  * Moved params from global in page layout to tesseractclass.
-  * Improved single column layout analysis.
-  * Allow ocr output to multiple formats using tesseract command line executable.
-  * Fixed issues with mixed eng+ara scripts.
-  * Improved script consistency in numbers.
-  * Major refactor of control.cpp to enable line recognition.
-  * Added tesstrain.sh - a master training script.
-  * Added ability to text2image training tool to just list available fonts.
-  * Added ability to text2image to underline words.
-  * Improved efficiency of image processing for PDF output.
-  * Added parameter description for each parameter listed with 'print-parameters' command line option.
-  * Added font info to hOCR output.
-  * Enabled streaming input and output of multi-page documents.
-  * Many bug fixes.
-
-= Tesseract release notes Feb 4 2014 - V3.03(rc1) =
-  * Added OpenCL support (experimental).
-  * Added new training tool text2image to generate box/tif file pairs from text and truetype fonts.
-  * Added support for PDF output with searchable text.
-  * Removed entire IMAGE class and all code in image directory.
-  * Tesseract executable: support for output to stdout; limited support for one page images from stdin (especially on Windows)
-  * Added Renderer to API to allow document-level processing and output of document      formats, like hOCR, PDF.
-  * Major refactor of word-level recognition, beam search, eliminating dead code.
-  * Refactored classifier to make it easier to add new ones.
-  * Generalized feature extractor to allow feature extraction from greyscale.
-  * Improved sub/superscript treatment.
-  * Improved baseline fit.
-  * Added set_unicharset_properties to training tools.
-  * Many bug fixes.
-  * More training source data included.
-
-= Tesseract release notes Feb 01 2012 - V3.02 =
-  * Added Right-to-left/Bidi capability in the output iterators for Hebrew/Arabic.
-  * Added paragraph detection in layout analysis/post OCR.
-  * Added simultaneous multi-language capability.
-  * Added experimental equation detector.
-  * Major improvements to layout analysis for better image detection, diacritic detection, better textline finding, better tabstop finding.
-  * Improved line detection and removal.
-  * Added fixed pitch chopper for CJK.
-  * Added word bigram correction.
-  * Added new uniform classifier API.
-  * Added new training error counter.
-  * More detailed changes recorded in ChangeLog.
-
-
-= Tesseract release notes Oct 21 2011 - V3.01 =
-  * Thread-safety! Moved all critical globals and statics to members of the appropriate class. Tesseract is now thread-safe (multiple instances can be used in parallel in multiple threads.) with the minor exception that some control parameters are still global and affect all threads.
-  * Added Cube, a new recognizer for Arabic. Cube can also be used in combination with normal Tesseract for other languages with an improvement in accuracy at the cost of (much) lower speed. *There is no training module for Cube yet.*
-  * `OcrEngineMode` in `Init` replaces `AccuracyVSpeed` to control cube.
-  * Greatly improved segmentation search with consequent accuracy and speed improvements, especially for Chinese.
-  * Added `PageIterator` and `ResultIterator` as cleaner ways to get the full results out of Tesseract, that are not currently provided by any of the `TessBaseAPI::Get*` methods. All other methods, such as the `ETEXT_STRUCT` in particular are deprecated and will be deleted in the future.
-  * ApplyBoxes totally rewritten to make training easier. It can now cope with touching/overlapping training characters, and a new boxfile format allows word boxes instead of character boxes, BUT to use that you have to have already boostrapped the language with character boxes. "Cyclic dependency" on traineddata.
-  * Auto orientation and script detection added to page layout analysis.
-  * Deleted *lots* of dead code.
-  * Fixxht module replaced with scalable data-driven module.
-  * Output font characteristics accuracy improved.
-  * Removed the double conversion at each classification.
-  * Upgraded oldest structs to be classes and deprecated PBLOB.
-  * Removed non-deterministic baseline fit.
-  * Added fixed length dawgs for Chinese.
-  * Handling of vertical text improved.
-  * Handling of leader dots improved.
-  * Table detection greatly improved.
-  * Fixed a couple of memory leaks.
-  * Fixed font labels on output text. (Not perfect, but a lot better than before.)
-  * Cleanup and more bug fixes
-  * Special treatments for Hindi.
-  * Support for build in VS2010 with Microsoft Windows SDK for Windows 7 (thanks to Michael Lutz)
-
-Tesseract release notes Sep 30 2010 - V3.00
-  * Preparations for thread safety:
-     * Changed TessBaseAPI methods to be non-static
-     * Created a class hierarchy for the directories to hold instance data,
-       and began moving code into the classes.
-     * Moved thresholding code to a separate class.
-  * Added major new page layout analysis module.
-  * Added HOCR output.
-  * Added Leptonica as main image I/O and handling. Currently optional,
-    but in future releases linking with Leptonica will be mandatory.
-  * Ambiguity table rewritten to allow definite replacements in place
-    of fix_quotes.
-  * Added TessdataManager to combine data files into a single file.
-  * Some dead code deleted.
-  * VC++6 no longer supported. It can't cope with the use of templates.
-  * Many more languages added. 
-  * Doxygenation of most of the function header comments.
-
-Tesseract release notes June 30 2009 - V2.04
-Integrated patches for portability and to remove some of the
-"access" macros.
-Removed dependence on lua from the viewer making it a *lot*
-faster. Also the viewer now compiles and works (on Linux.)
-Fixed the following issues:
-1, 63, 67, 71, 76, 79, 81, 82, 84, 106, 108, 111, 112, 128, 129, 130, 133, 135,
-142, 143, 145, 146, 147, 153, 154, 160, 165, 169, 170, 175, 177, 187, 192,
-195, 199, 201, 205, 209.
-This is the last version to support VC++6!
-This may also be the last version to compile without leptonica!
-Windows version now outputs to stderr by default, fixing a lot of the problems with lack of visible meaningful error messages.
-
-Tesseract release notes April 22 2008 - V2.03
-2.02 was unrunnable, due to a last-minute "simple" change.
-2.03 fixes the problem and also adds an include check for leptonica
-to make it more usable.
-
-Tesseract release notes April 21 2008 - V2.02
-Improvements to clustering, training and classifier.
-Major internationalization improvements for large-character-set
-languages, eg Kannada.
-Removed some compiler warnings.
-Added multipage tiff support for training and running.
-Updated graphics output to talk to new java-based viewer.
-Added ability to save n-best lists.
-Added leptonica support for more file types.
-Improved Init/End to make them safe.
-Reduced memory use of dictionaries.
-Added some new APIs to TessBaseAPI.
-Fixed namespace collisions with jpeg library (INT32).
-Portability fixes for Windows for new code.
-Updates to autoconf system for new code.
-
-Tesseract release notes August 27 2007 - V2.01
-Fixed UTF8 input problems with box file reader.
-Fixed various infinite loops and crashes in dawg code.
-Removed include of config_auto.h from host.h.
-Added automatic wctype encoding to unicharset_extractor.
-Fixed dawg table too full error.
-Removed svn files from tarball.
-Added new functions to tessdll.
-Increased maximum utf8 string in a classification result to 8.
-
-Tesseract release notes July 17, 2007 - V2.00
-
-First release of the International version.
-This version recognizes the following languages:
-English - eng
-French  - fra
-Italian - ita
-German  - deu
-Spanish - spa
-Dutch   - nld
-The language codes follow ISO 639-2. The default language is English.
-To recognize another language:
-tesseract inputimage outputbase -l langcode
-
-To train on a new language, see separate documentation.
-More languages will be appearing over time.
-
-List of changes in this release:
-  Converted internal character handling to UTF8.
-  Trained with 6 languages.
-  Added unicharset_extractor, wordlist2dawg.
-  Added boxfile creation mode.
-  Added UNLV regression test capability.
-  Fixed problems with copyright and registered symbols.
-  Fixed extern "C" declarations problem.
-  Made some improvements to consistency of accuracy across platforms.
-  Added vc++ express support.
-
-Instructions for downloading and building version 2.00.
-Things have changed quite a bit since the previous versions so please read carefully.
-*All users*
-The tarballs are split into pieces.
-tesseract-2.00.tar.gz contains all the source code.
-tesseract-2.00.<lang>.tar.gt contains the data files for <lang>. You need at least one of these or tesseract will not work.
-tesseract-2.00.exe.tar.gz is not for the 'exe' language. It is windows executables. They are built with VC++ express and come with absolutely no warranty. If they work for you then great, otherwise get visual C++ express (and the platform sdk) and build from the source.
-
-*Non-windows users*
-As with 1.04, this version works with make install.
-*New* there is a tesseract.spec for making rpms. (Thanks to Andrew Ziem for the help.)
-It might work with your OS if you know how to do that sort of thing.
-If you are linking to the libraries, as with Ocropus, there is now a single master
-library called libtesseract_full.a.
-
-*Windows users*
-If you are building from the sources, there are still dsw and dsp files for vc++6 and also
-sln and vcproj files for vc++ express.
-The dll has been updated to allow input of non-binary images. (Thanks to Glen of Jetsoft.)
-
-
-Tesseract release notes May 15, 2007 - V1.04.
-
-=== Windows users only ===
-Added a dll interface for windows. Thanks to Glen at Jetsoft for contributing
-this. To use the dll, include tessdll.h, import tessdll.lib and put tessdll.dll
-somewhere where the system can find it. There is also a small dlltest program
-to test the dll. Run with:
-dlltest phototest.tif phototest.txt
-It will output the text from phototest.tif with bounding box information.
-**New for Windows** the distribution now includes tesseract.exe and tessdll.dll
-which *might* work out of the box! There are no guarantees as you need
-VC++6 versions of mfc and crt (at least) for it to work. (Batteries not
-included, and certainly no installshield.)
-
-== Important note for anyone building with make: i.e. anyone except devstudio
-users ==
-This release includes new standardization for the data directory. To enable
-Tesseract to find its data files, you must either:
-./configure
-make
-make install
-to move the data files to the standard place, or:
-export TESSDATA_PREFIX="directory in which your tessdata resides/"
-(or equivalent) in your .profile or whatever or setenv to set the environment
-variable. Note that the directory must end in a /
-HAVING tesseract and tessdata IN THE SAME DIRECTORY DOES NOT WORK ANY MORE.
-
-== All users ==
-Fixed a bunch of name collisions - mostly with stl.
-Made some preliminary changes for unicode compatibility. Includes a new data
-file (unicharset) and renaming of the other data files to eng.* to support
-different languages.
-There are also several other minor bug fixes and portability improvements
-for 64 bit, the latest visual studio compiler etc. Thanks to all who have
-contributed these fixes.
-
-NOTE: This is likely to be the last English-only release!
-Apologies in advance to non-windows users for bloating the distribution with
-windows executables. This will probably get fixed in the next release with
-the multi-language capability, since that will also bloat the distribution.
-
-
-Tesseract release notes Feb 2, 2007 - V1.03.
-Added mftraining and cntraining. Using an image with a box file, tesseract
-generates .tr output files. cntraining runs on the .tr files to make
-normproto that lives in tessdata. mftraining runs on the .tr files to
-make inttemp and pffmtable in tessdata. These are the main data files
-that tesseract uses to recognize characters. At present, the code to make
-dictionary files is not yet available, nor are any sample box files or
-rebuilt inttemp or documentation to create any of these. Recognition is
-still limited to the ASCII set, but when this problem is fixed, documentation
-will follow.
-
-Added a new API with adaptive thresholding for grey and color images.
-See ccmain/baseapi.h/cpp for details. The main program has been converted
-to use the API as an example. See main() in ccmain/tesseractmain.cpp for
-details. The API is designed to make it easy to add subclasses with ability
-to output the bounding boxes etc from the internal structures. The adaptive
-thresholding improves accuracy (most of the time) on non-binary images.
-
-Many memory leaks have been fixed. There are no known leaks left from using
-the API correctly.
-
-The adaptive classifier was not operating correctly. This bug, and several
-others have been fixed, including poor chopping, an indefinite (if not quite
-infinite) loop in the number parser, and a couple of crash bugs. Thanks to
-all that have contributed bugs and bug fixes.
-
-It is now possible to build without any of the graphics support to save code
-size using #define GRAPHICS_DISABLED. There is also a new EMBEDDED define
-for use on operating systems with limited library support.
-
-64-bit and Mac OSX buildability is now included in the mainline source tree.
-Thanks to all that have contributed patches and comments to help with that.
-1.03 is also endian-independent, apart from the tiff i/o, so if you use
-libtiff, the code should run on all platforms, even if you get/create new
-data files of a different endinanness.
-
-Some of the bug fixes improve accuracy, and so do some of the changes to
-DangAmbigs and user-words.
-
-Tesseract release notes, Oct 4 2006 - V1.02.
-Removed dependency on aspirin. *All* code is now licensed under Apache2.0.
-
-Tesseract release notes, Sep 7 2006 - V1.01.
-
-Fixes for this release:
-Added mfcpch.cpp and getopt.cpp for VC++.
-Fixed problem with greyscale images and no libtiff.
-Stopped debug window from being used for the usage output.
-Fixed load of inttemp for big-endian architectures.
-Fixed some Mac compilation issues.
-
-This version should read uncompressed 8 bit grey and 24 bit color tiffs
-without having to have libtiff. It does a dumb threshold though, so don't
-expect good results from poor contrast or images of natural scenes etc.
-
-If you just run tesseract with no command line args you should now get a
-sensible usage message on linux, with or without X-windows.
-
-If you can get it to compile on a PPC Mac, it may now run correctly,
-although not all the build issues are fixed yet.
-
-Building Tesseract:
-Windows:
-Unpack the tar.gz archive
-Open tesseract.dsw in DevStudio (preferably version 6, higher versions will be more difficult)
-Set Win32 - Release as the active configuration.
-Build.
-Copy tesseract.exe from bin.rel up one directory level.
-Run tesseract phototest.tif phototest
-This will create phototest.txt.
-
-Linux:
-Unpack the tar.gz archive
-./configure
-make
-Copy tesseract from ccmain up one directory level (or create a symbolic link)
-Run tesseract phototest.tif phototest
-This will create phototest.txt.