mirror of
https://github.com/tesseract-ocr/tesseract.git
synced 2024-11-27 20:59:36 +08:00
7a54f5f950
git-svn-id: https://tesseract-ocr.googlecode.com/svn/trunk@541 d0cd1f9f-072b-0410-8dd7-cf729c803f20
156 lines
7.7 KiB
Plaintext
156 lines
7.7 KiB
Plaintext
2010-11-29 - V3.01
|
|
* Removed old/dead serialise/deserialze methods on *LISTIZED classes.
|
|
* Total rewrite of DENORM to better encapsulate operation and make
|
|
for potential to extract features from images.
|
|
* Thread-safety! Moved all critical globals and statics to
|
|
members of the appropriate class. Tesseract is now
|
|
thread-safe (multiple instances can be used in parallel
|
|
in multiple threads.) with the minor exception that some
|
|
control parameters are still global and affect all threads.
|
|
* Added Cube, a new recognizer for Arabic. Cube can also be
|
|
used in combination with normal Tesseract for other languages
|
|
with an improvement in accuracy at the cost of (much) lower speed.
|
|
There is no training module for Cube yet.
|
|
* OcrEngineMode in Init replaces AccuracyVSpeed to control cube.
|
|
* Greatly improved segmentation search with consequent accuracy and
|
|
speed improvements, especially for Chinese.
|
|
* Added PageIterator and ResultIterator as cleaner ways to get the
|
|
full results out of Tesseract, that are not currently provided
|
|
by any of the TessBaseAPI::Get* methods.
|
|
All other methods, such as the ETEXT_STRUCT in particular are
|
|
deprecated and will be deleted in the future.
|
|
* ApplyBoxes totally rewritten to make training easier.
|
|
It can now cope with touching/overlapping training characters,
|
|
and a new boxfile format allows word boxes instead of character
|
|
boxes, BUT to use that you have to have already boostrapped the
|
|
language with character boxes. "Cyclic dependency" on traineddata.
|
|
* Auto orientation and script detection added to page layout analysis.
|
|
* Deleted *lots* of dead code.
|
|
* Fixxht module replaced with scalable data-driven module.
|
|
* Output font characteristics accuracy improved.
|
|
* Removed the double conversion at each classification.
|
|
* Upgraded oldest structs to be classes and deprecated PBLOB.
|
|
* Removed non-deterministic baseline fit.
|
|
* Added fixed length dawgs for Chinese.
|
|
* Handling of vertical text improved.
|
|
* Handling of leader dots improved.
|
|
* Table detection greatly improved.
|
|
|
|
2010-09-21 - V3.00
|
|
* Preparations for thread safety:
|
|
* Changed TessBaseAPI methods to be non-static
|
|
* Created a class hierarchy for the directories to hold instance data,
|
|
and began moving code into the classes.
|
|
* Moved thresholding code to a separate class.
|
|
* Added major new page layout analysis module.
|
|
* Added HOCR output (issues 221, 263: thanks to amkryukov).
|
|
* Added Leptonica as main image I/O and handling. Currently optional,
|
|
but in future releases linking with Leptonica will be mandatory.
|
|
* Ambiguity table rewritten to allow definite replacements in place
|
|
of fix_quotes.
|
|
* Added TessdataManager to combine data files into a single file.
|
|
* Some dead code deleted.
|
|
* VC++6 no longer supported. It can't cope with the use of templates.
|
|
* Many more languages added.
|
|
* Doxygenation of most of the function header comments.
|
|
* Added man pages.
|
|
* Added bash completion script (issue 247: thanks to neskiem)
|
|
* Fix integer overview in thresholding (issue 366: thanks to Cyanide.Drake)
|
|
* Add Danish Fraktur support (issues 300, 360: thanks to
|
|
dsl602230@vip.cybercity.dk)
|
|
* Fix file pointer leak (issue 359, thanks to yukihiro.nakadaira)
|
|
* Fix an error using user-words (Issue 345: thanks to max.markin)
|
|
* Fix a memory leak in tablefind.cpp (Issue 342, thanks to zdravco)
|
|
* Fix a segfault due to double fclose (Issue 320, thanks to souther)
|
|
* Fix an automake error (Issue 318, thanks to ichanjz)
|
|
* Fix a Win32 crash on fileFormatIsTiff() (Issues 304, 316, 317, 330, 347,
|
|
349, 352: thanks to nguyenq87, max.markin, zdenop)
|
|
* Fixed a number of errors in newer (stricter) versions of VC++ (Issues
|
|
301, among others)
|
|
|
|
2010-07-19 gettextize <bug-gnu-gettext@gnu.org>
|
|
|
|
* m4/gettext.m4: New file, from gettext-0.17.
|
|
* m4/iconv.m4: New file, from gettext-0.17.
|
|
* m4/lib-ld.m4: New file, from gettext-0.17.
|
|
* m4/lib-link.m4: New file, from gettext-0.17.
|
|
* m4/lib-prefix.m4: New file, from gettext-0.17.
|
|
* m4/nls.m4: New file, from gettext-0.17.
|
|
* m4/po.m4: New file, from gettext-0.17.
|
|
* m4/progtest.m4: New file, from gettext-0.17.
|
|
* Makefile.am (SUBDIRS): Add po.
|
|
(EXTRA_DIST): Add config/config.rpath.
|
|
* configure.ac (AC_CONFIG_FILES): Add po/Makefile.in.
|
|
|
|
June 2006 - V1.0 of open source Tesseract checked-in.
|
|
Sep 7 2006 - V1.01.
|
|
Added mfcpch.cpp and getopt.cpp for VC++.
|
|
Fixed problem with greyscale images and no libtiff.
|
|
Stopped debug window from being used for the usage output.
|
|
Fixed load of inttemp for big-endian architectures.
|
|
Fixed some Mac compilation issues.
|
|
Oct 4 2006 - V1.02
|
|
Removed dependency on Aspirin.
|
|
Fixed a few missing Apache license headers.
|
|
Removed $log.
|
|
Feb 2 2007 - V1.03
|
|
Added mftraining and cntraining.
|
|
Added baseapi with adaptive thresholding for grey and color.
|
|
Fixed many memory leaks.
|
|
Fixed several bugs including lack of use of adaptive classifier.
|
|
Added ifdefs to eliminate graphics code and add embedded platform support.
|
|
Incorporated several patches, including 64-bit builds, Mac builds.
|
|
Minor accuracy improvements.
|
|
May 15 2007 - V1.04
|
|
Added dll exports for Windows.
|
|
Fixed name collisions with stl etc.
|
|
Made some preliminary changes ready for unicodeization.
|
|
Several bug fixes discovered during unicodeization.
|
|
July 02 2007 - V2.00
|
|
Converted internal character handling to UTF8.
|
|
Trained with 6 languages.
|
|
Added unicharset_extractor, wordlist2dawg.
|
|
Added boxfile creation mode.
|
|
Added UNLV regression test capability.
|
|
Fixed problems with copyright and registered symbols.
|
|
Fixed extern "C" declarations problem.
|
|
August 27 2007 - V2.01
|
|
Fixed UTF8 input problems with box file reader.
|
|
Fixed various infinite loops and crashes in dawg code.
|
|
Removed include of config_auto.h from host.h.
|
|
Added automatic wctype encoding to unicharset_extractor.
|
|
Fixed dawg table too full error.
|
|
Removed svn files from tarball.
|
|
Added new functions to tessdll.
|
|
Increased maximum utf8 string in a classification result to 8.
|
|
|
|
January 23 2008 - V2.02
|
|
Improvements to clustering, training and classifier.
|
|
Major internationalization improvements for large-character-set
|
|
languages, eg Kannada.
|
|
Removed some compiler warnings.
|
|
Added multipage tiff support for training and running.
|
|
Updated graphics output to talk to new java-based viewer.
|
|
Added ability to save n-best lists.
|
|
Added leptonica support for more file types.
|
|
Improved Init/End to make them safe.
|
|
Reduced memory use of dictionaries.
|
|
Added some new APIs to TessBaseAPI.
|
|
April 21 2008 - V2.02 (again)
|
|
Fixed namespace collisions with jpeg library (INT32).
|
|
Portability fixes for Windows for new code.
|
|
Updates to autoconf system for new code.
|
|
April 22 2008 - V2.03
|
|
Fixed crash introduced in 2.02.
|
|
Fixed lack of tessembedded.cpp in distribution.
|
|
Added test for leptonica header files and conditional test for lib.
|
|
June 30 2009 - V2.04
|
|
Integrated bug fixes and patches and misc changes for portability.
|
|
Integrated a patch to remove some of the "access" macros.
|
|
Removed dependence on lua from the viewer, speeding it up
|
|
dramatically.
|
|
Fixed the viewer so it compiles and runs properly!
|
|
Specifically fixing issues: 1, 63, 67, 71, 76, 81, 82, 106, 111,
|
|
112, 128, 129, 130, 133, 135, 142, 143, 145, 147, 153, 154, 160,
|
|
165, 170, 175, 177, 187, 192, 195, 199, 201, 205, 209, 108, 169
|