top-skimming import from sf.net

git-svn-id: https://tesseract-ocr.googlecode.com/svn/trunk/trunk@2 d0cd1f9f-072b-0410-8dd7-cf729c803f20
tmbdev 2007-03-07 20:03:40 +00:00
commit 425d593ebe
539 changed files with 163745 additions and 0 deletions

6
.cvsignore Normal file

@ -0,0 +1,6 @@
BUILD
OWNERS
Makefile
README.google
runautoconf
config_auto.h

8
AUTHORS Normal file

@ -0,0 +1,8 @@
Ray Smith (lead developer) <theraysmith@users.sourceforge.net>
Phil Cheatle
Simon Crouch
Dan Johnson
Mark Seaman
Sheelagh Huddleston
Chris Newton
... and several others.

23
COPYING Normal file

@ -0,0 +1,23 @@
This package contains the Tesseract Open Source OCR Engine.
Originally developed at Hewlett Packard Laboratories Bristol and
at Hewlett Packard Co, Greeley Colorado, all the code
in this distribution is now licensed under the Apache License:
** Licensed under the Apache License, Version 2.0 (the "License");
** you may not use this file except in compliance with the License.
** You may obtain a copy of the License at
** http://www.apache.org/licenses/LICENSE-2.0
** Unless required by applicable law or agreed to in writing, software
** distributed under the License is distributed on an "AS IS" BASIS,
** WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
** See the License for the specific language governing permissions and
** limitations under the License.
Other Dependencies and Licenses:
================================
The Aspirin/MIGRAINES system is no longer used.
Tesseract can also make use of the libtiff library. (www.libtiff.org)
Without libtiff, Tesseract can only read uncompressed and G3 compressed
TIFF files.

20
ChangeLog Normal file

@ -0,0 +1,20 @@
June 2006 - V1.0 of open source Tesseract checked-in.
Sep 7 2006 - V1.01.
Added mfcpch.cpp and getopt.cpp for VC++.
Fixed problem with greyscale images and no libtiff.
Stopped debug window from being used for the usage output.
Fixed load of inttemp for big-endian architectures.
Fixed some Mac compilation issues.
Oct 4 2006 - V1.02
Removed dependency on Aspirin.
Fixed a few missing Apache license headers.
Removed $log.
Feb 2 2007 - V1.03
Added mftraining and cntraining.
Added baseapi with adaptive thresholding for grey and color.
Fixed many memory leaks.
Fixed several bugs including lack of use of adaptive classifier.
Added ifdefs to eliminate graphics code and add embedded platform support.
Incorporated several patches, including 64-bit builds, Mac builds.
Minor accuracy improvements.

229
INSTALL Normal file

@ -0,0 +1,229 @@
Copyright 1994, 1995, 1996, 1999, 2000, 2001, 2002 Free Software
Foundation, Inc.
This file is free documentation; the Free Software Foundation gives
unlimited permission to copy, distribute and modify it.
Basic Installation
==================
These are generic installation instructions.
The `configure' shell script attempts to guess correct values for
various system-dependent variables used during compilation. It uses
those values to create a `Makefile' in each directory of the package.
It may also create one or more `.h' files containing system-dependent
definitions. Finally, it creates a shell script `config.status' that
you can run in the future to recreate the current configuration, and a
file `config.log' containing compiler output (useful mainly for
debugging `configure').
It can also use an optional file (typically called `config.cache'
and enabled with `--cache-file=config.cache' or simply `-C') that saves
the results of its tests to speed up reconfiguring. (Caching is
disabled by default to prevent problems with accidental use of stale
cache files.)
If you need to do unusual things to compile the package, please try
to figure out how `configure' could check whether to do them, and mail
diffs or instructions to the address given in the `README' so they can
be considered for the next release. If you are using the cache, and at
some point `config.cache' contains results you don't want to keep, you
may remove or edit it.
The file `configure.ac' (or `configure.in') is used to create
`configure' by a program called `autoconf'. You only need
`configure.ac' if you want to change it or regenerate `configure' using
a newer version of `autoconf'.
The simplest way to compile this package is:
  1. `cd' to the directory containing the package's source code and type
     `./configure' to configure the package for your system. If you're
     using `csh' on an old version of System V, you might need to type
     `sh ./configure' instead to prevent `csh' from trying to execute
     `configure' itself.
     Running `configure' takes a while. While running, it prints some
     messages telling which features it is checking for.
  2. Type `make' to compile the package.
  3. Optionally, type `make check' to run any self-tests that come with
     the package.
  4. Type `make install' to install the programs and any data files and
     documentation.
  5. You can remove the program binaries and object files from the
     source code directory by typing `make clean'. To also remove the
     files that `configure' created (so you can compile the package for
     a different kind of computer), type `make distclean'. There is
     also a `make maintainer-clean' target, but that is intended mainly
     for the package's developers. If you use it, you may have to get
     all sorts of other programs in order to regenerate files that came
     with the distribution.
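Steps 1 to 4 above can be sketched as a small shell function. This is an illustration only: `build_and_install` and the `RUN` variable are names invented here, not part of the package. It assumes you are already in the package's source directory and that you have permission to write to the installation directories.

```shell
# RUN is a hypothetical dry-run switch: leave it empty to execute the
# commands, or set RUN=echo to print them instead.
RUN=${RUN:-}

build_and_install() {
    # Each step runs only if the previous one succeeded.
    $RUN ./configure &&
    $RUN make &&
    $RUN make check &&
    $RUN make install
}
```

Calling `build_and_install` after setting `RUN=echo` lists the four commands without running anything, which is a cheap way to check the sequence first.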
Compilers and Options
=====================
Some systems require unusual options for compilation or linking that
the `configure' script does not know about. Run `./configure --help'
for details on some of the pertinent environment variables.
You can give `configure' initial values for configuration parameters
by setting variables in the command line or in the environment. Here
is an example:
     ./configure CC=c89 CFLAGS=-O2 LIBS=-lposix
*Note Defining Variables::, for more details.
Compiling For Multiple Architectures
====================================
You can compile the package for more than one kind of computer at the
same time, by placing the object files for each architecture in their
own directory. To do this, you must use a version of `make' that
supports the `VPATH' variable, such as GNU `make'. `cd' to the
directory where you want the object files and executables to go and run
the `configure' script. `configure' automatically checks for the
source code in the directory that `configure' is in and in `..'.
If you have to use a `make' that does not support the `VPATH'
variable, you have to compile the package for one architecture at a
time in the source code directory. After you have installed the
package for one architecture, use `make distclean' before reconfiguring
for another architecture.
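A build in a separate object directory, as described above, might look like the following sketch. The function name `vpath_build`, the `RUN` switch, and the directory names in the usage note are assumptions made for this example; a `make' with `VPATH' support is required.

```shell
# RUN is a hypothetical dry-run switch (RUN=echo prints the commands).
RUN=${RUN:-}

# vpath_build SRCDIR BUILDDIR
# SRCDIR should be an absolute path, because the function changes
# directory before it invokes configure.
vpath_build() {
    srcdir=$1
    builddir=$2
    mkdir -p "$builddir" || return 1
    (
        cd "$builddir" || exit 1
        # configure looks for the sources in the directory it lives in,
        # so running it by path from here yields an out-of-tree build.
        $RUN "$srcdir/configure" &&
        $RUN make
    )
}
```

For instance, `vpath_build "$PWD" "$PWD/../obj-linux"` would leave the object files and executables in `../obj-linux' and keep the source tree clean for a second architecture.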
Installation Names
==================
By default, `make install' will install the package's files in
`/usr/local/bin', `/usr/local/man', etc. You can specify an
installation prefix other than `/usr/local' by giving `configure' the
option `--prefix=PATH'.
You can specify separate installation prefixes for
architecture-specific files and architecture-independent files. If you
give `configure' the option `--exec-prefix=PATH', the package will use
PATH as the prefix for installing programs and libraries.
Documentation and other data files will still use the regular prefix.
In addition, if you use an unusual directory layout you can give
options like `--bindir=PATH' to specify different values for particular
kinds of files. Run `configure --help' for a list of the directories
you can set and what kinds of files go in them.
If the package supports it, you can cause programs to be installed
with an extra prefix or suffix on their names by giving `configure' the
option `--program-prefix=PREFIX' or `--program-suffix=SUFFIX'.
Optional Features
=================
Some packages pay attention to `--enable-FEATURE' options to
`configure', where FEATURE indicates an optional part of the package.
They may also pay attention to `--with-PACKAGE' options, where PACKAGE
is something like `gnu-as' or `x' (for the X Window System). The
`README' should mention any `--enable-' and `--with-' options that the
package recognizes.
For packages that use the X Window System, `configure' can usually
find the X include and library files automatically, but if it doesn't,
you can use the `configure' options `--x-includes=DIR' and
`--x-libraries=DIR' to specify their locations.
Specifying the System Type
==========================
There may be some features `configure' cannot figure out
automatically, but needs to determine by the type of machine the package
will run on. Usually, assuming the package is built to be run on the
_same_ architectures, `configure' can figure that out, but if it prints
a message saying it cannot guess the machine type, give it the
`--build=TYPE' option. TYPE can either be a short name for the system
type, such as `sun4', or a canonical name which has the form:
     CPU-COMPANY-SYSTEM
where SYSTEM can have one of these forms:
     OS KERNEL-OS
See the file `config.sub' for the possible values of each field. If
`config.sub' isn't included in this package, then this package doesn't
need to know the machine type.
If you are _building_ compiler tools for cross-compiling, you should
use the `--target=TYPE' option to select the type of system they will
produce code for.
If you want to _use_ a cross compiler that generates code for a
platform different from the build platform, you should specify the
"host" platform (i.e., that on which the generated programs will
eventually be run) with `--host=TYPE'.
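As a sketch of the `--build' and `--host' options together, the function below configures a package on one machine type to run on another. The two canonical names are examples only, and `cross_configure' and `RUN' are hypothetical names; it is assumed that a cross compiler for the host type is already installed and that `config.sub' accepts both names.

```shell
# RUN is a hypothetical dry-run switch (RUN=echo prints the command).
RUN=${RUN:-}

# Example system types in CPU-COMPANY-SYSTEM form; substitute your own.
BUILD_TYPE=x86_64-pc-linux-gnu        # machine that runs configure and make
HOST_TYPE=powerpc-unknown-linux-gnu   # machine that will run the programs

cross_configure() {
    $RUN ./configure --build="$BUILD_TYPE" --host="$HOST_TYPE"
}
```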
Sharing Defaults
================
If you want to set default values for `configure' scripts to share,
you can create a site shell script called `config.site' that gives
default values for variables like `CC', `cache_file', and `prefix'.
`configure' looks for `PREFIX/share/config.site' if it exists, then
`PREFIX/etc/config.site' if it exists. Or, you can set the
`CONFIG_SITE' environment variable to the location of the site script.
A warning: not all `configure' scripts look for a site script.
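A site script is an ordinary shell fragment. The sketch below shows what a hypothetical `PREFIX/share/config.site' could contain; every value is an example. It assumes that `configure' initializes `prefix' to `NONE' when no `--prefix' option is given, which is how Autoconf-generated scripts usually behave.

```shell
# Hypothetical config.site; all values are examples, not recommendations.

# Use gcc unless a compiler was already chosen.
: "${CC:=gcc}"

# Assumption: prefix is NONE until the user passes --prefix.
if test "${prefix-NONE}" = NONE; then
    prefix=/opt/local
fi

# Caching is disabled while cache_file is /dev/null; turn it on here.
if test "${cache_file-/dev/null}" = /dev/null; then
    cache_file=config.cache
fi
```

Guarding each assignment keeps the site defaults from overriding a value the user selected explicitly.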
Defining Variables
==================
Variables not defined in a site shell script can be set in the
environment passed to `configure'. However, some packages may run
configure again during the build, and the customized values of these
variables may be lost. In order to avoid this problem, you should set
them in the `configure' command line, using `VAR=value'. For example:
     ./configure CC=/usr/local2/bin/gcc
will cause the specified gcc to be used as the C compiler (unless it is
overridden in the site shell script).
`configure' Invocation
======================
`configure' recognizes the following options to control how it
operates.
`--help'
`-h'
     Print a summary of the options to `configure', and exit.
`--version'
`-V'
     Print the version of Autoconf used to generate the `configure'
     script, and exit.
`--cache-file=FILE'
     Enable the cache: use and save the results of the tests in FILE,
     traditionally `config.cache'. FILE defaults to `/dev/null' to
     disable caching.
`--config-cache'
`-C'
     Alias for `--cache-file=config.cache'.
`--quiet'
`--silent'
`-q'
     Do not print messages saying which checks are being made. To
     suppress all normal output, redirect it to `/dev/null' (any error
     messages will still be shown).
`--srcdir=DIR'
     Look for the package's source code in directory DIR. Usually
     `configure' can determine that directory automatically.
`configure' also accepts some other, not widely useful, options. Run
`configure --help' for more details.

15
Makefile.am Normal file

@ -0,0 +1,15 @@
# TODO(luc) Add 'doc' to this list when ready
SUBDIRS = ccstruct ccutil classify cutil dict display image textord viewer wordrec ccmain training
EXTRA_DIST = tessdata phototest.tif tesseract.dsp tesseract.dsw
#EXTRA_DIST = doc/html doc/@PACKAGE_NAME@_@PACKAGE_VERSION@.pdf doc/@PACKAGE_NAME@_@PACKAGE_VERSION@.ps.gz
dist-hook:
# Need to remove CVS directories from directories
# added using EXTRA_DIST. $(distdir)/tessdata would in
# theory suffice.
	rm -rf `find $(distdir) -name CVS`
# Also remove extra files not needed in a distribution
	rm -rf `find $(distdir) -name configure.ac`
	rm -rf `find $(distdir) -name acinclude.m4`
	rm -rf `find $(distdir) -name aclocal.m4`
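The dist-hook above passes the output of `find' to `rm' through backquotes, which splits paths on whitespace. The following is a sketch of an alternative, not what this commit does: it lets `find' prune and delete the directories itself. The function name `strip_cvs_dirs' is invented for the example.

```shell
# strip_cvs_dirs DIR: remove every directory named CVS below DIR.
# -prune stops find from descending into a directory that is about to
# be deleted, and -exec ... {} + passes the paths to rm without any
# word splitting.
strip_cvs_dirs() {
    find "$1" -type d -name CVS -prune -exec rm -rf {} +
}
```

Inside the Makefile rule the equivalent recipe line would presumably be `find $(distdir) -type d -name CVS -prune -exec rm -rf {} +`.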

628
Makefile.in Normal file

@ -0,0 +1,628 @@
# Makefile.in generated by automake 1.9.6 from Makefile.am.
# @configure_input@
# Copyright (C) 1994, 1995, 1996, 1997, 1998, 1999, 2000, 2001, 2002,
# 2003, 2004, 2005 Free Software Foundation, Inc.
# This Makefile.in is free software; the Free Software Foundation
# gives unlimited permission to copy and/or distribute it,
# with or without modifications, as long as this notice is preserved.
# This program is distributed in the hope that it will be useful,
# but WITHOUT ANY WARRANTY, to the extent permitted by law; without
# even the implied warranty of MERCHANTABILITY or FITNESS FOR A
# PARTICULAR PURPOSE.
@SET_MAKE@
srcdir = @srcdir@
top_srcdir = @top_srcdir@
VPATH = @srcdir@
pkgdatadir = $(datadir)/@PACKAGE@
pkglibdir = $(libdir)/@PACKAGE@
pkgincludedir = $(includedir)/@PACKAGE@
top_builddir = .
am__cd = CDPATH="$${ZSH_VERSION+.}$(PATH_SEPARATOR)" && cd
INSTALL = @INSTALL@
install_sh_DATA = $(install_sh) -c -m 644
install_sh_PROGRAM = $(install_sh) -c
install_sh_SCRIPT = $(install_sh) -c
INSTALL_HEADER = $(INSTALL_DATA)
transform = $(program_transform_name)
NORMAL_INSTALL = :
PRE_INSTALL = :
POST_INSTALL = :
NORMAL_UNINSTALL = :
PRE_UNINSTALL = :
POST_UNINSTALL = :
build_triplet = @build@
host_triplet = @host@
DIST_COMMON = README $(am__configure_deps) $(srcdir)/Makefile.am \
$(srcdir)/Makefile.in $(top_srcdir)/config/config.h.in \
$(top_srcdir)/configure AUTHORS COPYING ChangeLog INSTALL NEWS \
config/config.guess config/config.sub config/depcomp \
config/install-sh config/missing config/mkinstalldirs
subdir = .
ACLOCAL_M4 = $(top_srcdir)/aclocal.m4
am__aclocal_m4_deps = $(top_srcdir)/acinclude.m4 \
$(top_srcdir)/config/ac_define_versionlevel.m4 \
$(top_srcdir)/config/acinclude_custom.m4 \
$(top_srcdir)/configure.ac
am__configure_deps = $(am__aclocal_m4_deps) $(CONFIGURE_DEPENDENCIES) \
$(ACLOCAL_M4)
am__CONFIG_DISTCLEAN_FILES = config.status config.cache config.log \
configure.lineno configure.status.lineno
mkinstalldirs = $(SHELL) $(top_srcdir)/config/mkinstalldirs
CONFIG_HEADER = config_auto.h
CONFIG_CLEAN_FILES =
SOURCES =
DIST_SOURCES =
RECURSIVE_TARGETS = all-recursive check-recursive dvi-recursive \
html-recursive info-recursive install-data-recursive \
install-exec-recursive install-info-recursive \
install-recursive installcheck-recursive installdirs-recursive \
pdf-recursive ps-recursive uninstall-info-recursive \
uninstall-recursive
ETAGS = etags
CTAGS = ctags
DIST_SUBDIRS = $(SUBDIRS)
DISTFILES = $(DIST_COMMON) $(DIST_SOURCES) $(TEXINFOS) $(EXTRA_DIST)
distdir = $(PACKAGE)-$(VERSION)
top_distdir = $(distdir)
am__remove_distdir = \
{ test ! -d $(distdir) \
|| { find $(distdir) -type d ! -perm -200 -exec chmod u+w {} ';' \
&& rm -fr $(distdir); }; }
DIST_ARCHIVES = $(distdir).tar.gz
GZIP_ENV = --best
distuninstallcheck_listfiles = find . -type f -print
distcleancheck_listfiles = find . -type f -print
ACLOCAL = @ACLOCAL@
AMDEP_FALSE = @AMDEP_FALSE@
AMDEP_TRUE = @AMDEP_TRUE@
AMTAR = @AMTAR@
AUTOCONF = @AUTOCONF@
AUTOHEADER = @AUTOHEADER@
AUTOMAKE = @AUTOMAKE@
AWK = @AWK@
CC = @CC@
CCDEPMODE = @CCDEPMODE@
CFLAGS = @CFLAGS@
CPPFLAGS = @CPPFLAGS@
CXX = @CXX@
CXXCPP = @CXXCPP@
CXXDEPMODE = @CXXDEPMODE@
CXXFLAGS = @CXXFLAGS@
CXXRPOFLAGS = @CXXRPOFLAGS@
CYGPATH_W = @CYGPATH_W@
DEFS = @DEFS@
DEPDIR = @DEPDIR@
ECHO_C = @ECHO_C@
ECHO_N = @ECHO_N@
ECHO_T = @ECHO_T@
EGREP = @EGREP@
EXEEXT = @EXEEXT@
GNUWIN32_DIR = @GNUWIN32_DIR@
HAVE_GNUWIN32_FALSE = @HAVE_GNUWIN32_FALSE@
HAVE_GNUWIN32_TRUE = @HAVE_GNUWIN32_TRUE@
HAVE_LIBTIFF_FALSE = @HAVE_LIBTIFF_FALSE@
HAVE_LIBTIFF_TRUE = @HAVE_LIBTIFF_TRUE@
INSTALL_DATA = @INSTALL_DATA@
INSTALL_PROGRAM = @INSTALL_PROGRAM@
INSTALL_SCRIPT = @INSTALL_SCRIPT@
INSTALL_STRIP_PROGRAM = @INSTALL_STRIP_PROGRAM@
LDFLAGS = @LDFLAGS@
LIBOBJS = @LIBOBJS@
LIBS = @LIBS@
LIBTIFF_CFLAGS = @LIBTIFF_CFLAGS@
LIBTIFF_LIBS = @LIBTIFF_LIBS@
LTLIBOBJS = @LTLIBOBJS@
MAINT = @MAINT@
MAINTAINER_MODE_FALSE = @MAINTAINER_MODE_FALSE@
MAINTAINER_MODE_TRUE = @MAINTAINER_MODE_TRUE@
MAKEINFO = @MAKEINFO@
OBJEXT = @OBJEXT@
OPTS = @OPTS@
PACKAGE = @PACKAGE@
PACKAGE_BUGREPORT = @PACKAGE_BUGREPORT@
PACKAGE_DATE = @PACKAGE_DATE@
PACKAGE_NAME = @PACKAGE_NAME@
PACKAGE_STRING = @PACKAGE_STRING@
PACKAGE_TARNAME = @PACKAGE_TARNAME@
PACKAGE_VERSION = @PACKAGE_VERSION@
PACKAGE_YEAR = @PACKAGE_YEAR@
PATH_SEPARATOR = @PATH_SEPARATOR@
RANLIB = @RANLIB@
RPO_NO = @RPO_NO@
RPO_YES = @RPO_YES@
SET_MAKE = @SET_MAKE@
SHELL = @SHELL@
STRIP = @STRIP@
USING_CL_FALSE = @USING_CL_FALSE@
USING_CL_TRUE = @USING_CL_TRUE@
VERSION = @VERSION@
ac_ct_CC = @ac_ct_CC@
ac_ct_CXX = @ac_ct_CXX@
ac_ct_RANLIB = @ac_ct_RANLIB@
ac_ct_STRIP = @ac_ct_STRIP@
am__fastdepCC_FALSE = @am__fastdepCC_FALSE@
am__fastdepCC_TRUE = @am__fastdepCC_TRUE@
am__fastdepCXX_FALSE = @am__fastdepCXX_FALSE@
am__fastdepCXX_TRUE = @am__fastdepCXX_TRUE@
am__include = @am__include@
am__leading_dot = @am__leading_dot@
am__quote = @am__quote@
am__tar = @am__tar@
am__untar = @am__untar@
bindir = @bindir@
build = @build@
build_alias = @build_alias@
build_cpu = @build_cpu@
build_os = @build_os@
build_vendor = @build_vendor@
datadir = @datadir@
exec_prefix = @exec_prefix@
host = @host@
host_alias = @host_alias@
host_cpu = @host_cpu@
host_os = @host_os@
host_vendor = @host_vendor@
includedir = @includedir@
infodir = @infodir@
install_sh = @install_sh@
libdir = @libdir@
libexecdir = @libexecdir@
localstatedir = @localstatedir@
mandir = @mandir@
mkdir_p = @mkdir_p@
oldincludedir = @oldincludedir@
prefix = @prefix@
program_transform_name = @program_transform_name@
sbindir = @sbindir@
sharedstatedir = @sharedstatedir@
sysconfdir = @sysconfdir@
target_alias = @target_alias@
# TODO(luc) Add 'doc' to this list when ready
SUBDIRS = ccstruct ccutil classify cutil dict display image textord viewer wordrec ccmain training
EXTRA_DIST = tessdata phototest.tif tesseract.dsp tesseract.dsw
all: config_auto.h
$(MAKE) $(AM_MAKEFLAGS) all-recursive
.SUFFIXES:
am--refresh:
@:
$(srcdir)/Makefile.in: @MAINTAINER_MODE_TRUE@ $(srcdir)/Makefile.am $(am__configure_deps)
@for dep in $?; do \
case '$(am__configure_deps)' in \
*$$dep*) \
echo ' cd $(srcdir) && $(AUTOMAKE) --gnu '; \
cd $(srcdir) && $(AUTOMAKE) --gnu \
&& exit 0; \
exit 1;; \
esac; \
done; \
echo ' cd $(top_srcdir) && $(AUTOMAKE) --gnu Makefile'; \
cd $(top_srcdir) && \
$(AUTOMAKE) --gnu Makefile
.PRECIOUS: Makefile
Makefile: $(srcdir)/Makefile.in $(top_builddir)/config.status
@case '$?' in \
*config.status*) \
echo ' $(SHELL) ./config.status'; \
$(SHELL) ./config.status;; \
*) \
echo ' cd $(top_builddir) && $(SHELL) ./config.status $@ $(am__depfiles_maybe)'; \
cd $(top_builddir) && $(SHELL) ./config.status $@ $(am__depfiles_maybe);; \
esac;
$(top_builddir)/config.status: $(top_srcdir)/configure $(CONFIG_STATUS_DEPENDENCIES)
$(SHELL) ./config.status --recheck
$(top_srcdir)/configure: @MAINTAINER_MODE_TRUE@ $(am__configure_deps)
cd $(srcdir) && $(AUTOCONF)
$(ACLOCAL_M4): @MAINTAINER_MODE_TRUE@ $(am__aclocal_m4_deps)
cd $(srcdir) && $(ACLOCAL) $(ACLOCAL_AMFLAGS)
config_auto.h: stamp-h1
@if test ! -f $@; then \
rm -f stamp-h1; \
$(MAKE) stamp-h1; \
else :; fi
stamp-h1: $(top_srcdir)/config/config.h.in $(top_builddir)/config.status
@rm -f stamp-h1
cd $(top_builddir) && $(SHELL) ./config.status config_auto.h
$(top_srcdir)/config/config.h.in: @MAINTAINER_MODE_TRUE@ $(am__configure_deps)
cd $(top_srcdir) && $(AUTOHEADER)
rm -f stamp-h1
touch $@
distclean-hdr:
-rm -f config_auto.h stamp-h1
uninstall-info-am:
# This directory's subdirectories are mostly independent; you can cd
# into them and run `make' without going through this Makefile.
# To change the values of `make' variables: instead of editing Makefiles,
# (1) if the variable is set in `config.status', edit `config.status'
# (which will cause the Makefiles to be regenerated when you run `make');
# (2) otherwise, pass the desired values on the `make' command line.
$(RECURSIVE_TARGETS):
@failcom='exit 1'; \
for f in x $$MAKEFLAGS; do \
case $$f in \
*=* | --[!k]*);; \
*k*) failcom='fail=yes';; \
esac; \
done; \
dot_seen=no; \
target=`echo $@ | sed s/-recursive//`; \
list='$(SUBDIRS)'; for subdir in $$list; do \
echo "Making $$target in $$subdir"; \
if test "$$subdir" = "."; then \
dot_seen=yes; \
local_target="$$target-am"; \
else \
local_target="$$target"; \
fi; \
(cd $$subdir && $(MAKE) $(AM_MAKEFLAGS) $$local_target) \
|| eval $$failcom; \
done; \
if test "$$dot_seen" = "no"; then \
$(MAKE) $(AM_MAKEFLAGS) "$$target-am" || exit 1; \
fi; test -z "$$fail"
mostlyclean-recursive clean-recursive distclean-recursive \
maintainer-clean-recursive:
@failcom='exit 1'; \
for f in x $$MAKEFLAGS; do \
case $$f in \
*=* | --[!k]*);; \
*k*) failcom='fail=yes';; \
esac; \
done; \
dot_seen=no; \
case "$@" in \
distclean-* | maintainer-clean-*) list='$(DIST_SUBDIRS)' ;; \
*) list='$(SUBDIRS)' ;; \
esac; \
rev=''; for subdir in $$list; do \
if test "$$subdir" = "."; then :; else \
rev="$$subdir $$rev"; \
fi; \
done; \
rev="$$rev ."; \
target=`echo $@ | sed s/-recursive//`; \
for subdir in $$rev; do \
echo "Making $$target in $$subdir"; \
if test "$$subdir" = "."; then \
local_target="$$target-am"; \
else \
local_target="$$target"; \
fi; \
(cd $$subdir && $(MAKE) $(AM_MAKEFLAGS) $$local_target) \
|| eval $$failcom; \
done && test -z "$$fail"
tags-recursive:
list='$(SUBDIRS)'; for subdir in $$list; do \
test "$$subdir" = . || (cd $$subdir && $(MAKE) $(AM_MAKEFLAGS) tags); \
done
ctags-recursive:
list='$(SUBDIRS)'; for subdir in $$list; do \
test "$$subdir" = . || (cd $$subdir && $(MAKE) $(AM_MAKEFLAGS) ctags); \
done
ID: $(HEADERS) $(SOURCES) $(LISP) $(TAGS_FILES)
list='$(SOURCES) $(HEADERS) $(LISP) $(TAGS_FILES)'; \
unique=`for i in $$list; do \
if test -f "$$i"; then echo $$i; else echo $(srcdir)/$$i; fi; \
done | \
$(AWK) ' { files[$$0] = 1; } \
END { for (i in files) print i; }'`; \
mkid -fID $$unique
tags: TAGS
TAGS: tags-recursive $(HEADERS) $(SOURCES) $(TAGS_DEPENDENCIES) \
$(TAGS_FILES) $(LISP)
tags=; \
here=`pwd`; \
if ($(ETAGS) --etags-include --version) >/dev/null 2>&1; then \
include_option=--etags-include; \
empty_fix=.; \
else \
include_option=--include; \
empty_fix=; \
fi; \
list='$(SUBDIRS)'; for subdir in $$list; do \
if test "$$subdir" = .; then :; else \
test ! -f $$subdir/TAGS || \
tags="$$tags $$include_option=$$here/$$subdir/TAGS"; \
fi; \
done; \
list='$(SOURCES) $(HEADERS) $(LISP) $(TAGS_FILES)'; \
unique=`for i in $$list; do \
if test -f "$$i"; then echo $$i; else echo $(srcdir)/$$i; fi; \
done | \
$(AWK) ' { files[$$0] = 1; } \
END { for (i in files) print i; }'`; \
if test -z "$(ETAGS_ARGS)$$tags$$unique"; then :; else \
test -n "$$unique" || unique=$$empty_fix; \
$(ETAGS) $(ETAGSFLAGS) $(AM_ETAGSFLAGS) $(ETAGS_ARGS) \
$$tags $$unique; \
fi
ctags: CTAGS
CTAGS: ctags-recursive $(HEADERS) $(SOURCES) $(TAGS_DEPENDENCIES) \
$(TAGS_FILES) $(LISP)
tags=; \
here=`pwd`; \
list='$(SOURCES) $(HEADERS) $(LISP) $(TAGS_FILES)'; \
unique=`for i in $$list; do \
if test -f "$$i"; then echo $$i; else echo $(srcdir)/$$i; fi; \
done | \
$(AWK) ' { files[$$0] = 1; } \
END { for (i in files) print i; }'`; \
test -z "$(CTAGS_ARGS)$$tags$$unique" \
|| $(CTAGS) $(CTAGSFLAGS) $(AM_CTAGSFLAGS) $(CTAGS_ARGS) \
$$tags $$unique
GTAGS:
here=`$(am__cd) $(top_builddir) && pwd` \
&& cd $(top_srcdir) \
&& gtags -i $(GTAGS_ARGS) $$here
distclean-tags:
-rm -f TAGS ID GTAGS GRTAGS GSYMS GPATH tags
distdir: $(DISTFILES)
$(am__remove_distdir)
mkdir $(distdir)
$(mkdir_p) $(distdir)/config
@srcdirstrip=`echo "$(srcdir)" | sed 's|.|.|g'`; \
topsrcdirstrip=`echo "$(top_srcdir)" | sed 's|.|.|g'`; \
list='$(DISTFILES)'; for file in $$list; do \
case $$file in \
$(srcdir)/*) file=`echo "$$file" | sed "s|^$$srcdirstrip/||"`;; \
$(top_srcdir)/*) file=`echo "$$file" | sed "s|^$$topsrcdirstrip/|$(top_builddir)/|"`;; \
esac; \
if test -f $$file || test -d $$file; then d=.; else d=$(srcdir); fi; \
dir=`echo "$$file" | sed -e 's,/[^/]*$$,,'`; \
if test "$$dir" != "$$file" && test "$$dir" != "."; then \
dir="/$$dir"; \
$(mkdir_p) "$(distdir)$$dir"; \
else \
dir=''; \
fi; \
if test -d $$d/$$file; then \
if test -d $(srcdir)/$$file && test $$d != $(srcdir); then \
cp -pR $(srcdir)/$$file $(distdir)$$dir || exit 1; \
fi; \
cp -pR $$d/$$file $(distdir)$$dir || exit 1; \
else \
test -f $(distdir)/$$file \
|| cp -p $$d/$$file $(distdir)/$$file \
|| exit 1; \
fi; \
done
list='$(DIST_SUBDIRS)'; for subdir in $$list; do \
if test "$$subdir" = .; then :; else \
test -d "$(distdir)/$$subdir" \
|| $(mkdir_p) "$(distdir)/$$subdir" \
|| exit 1; \
distdir=`$(am__cd) $(distdir) && pwd`; \
top_distdir=`$(am__cd) $(top_distdir) && pwd`; \
(cd $$subdir && \
$(MAKE) $(AM_MAKEFLAGS) \
top_distdir="$$top_distdir" \
distdir="$$distdir/$$subdir" \
distdir) \
|| exit 1; \
fi; \
done
$(MAKE) $(AM_MAKEFLAGS) \
top_distdir="$(top_distdir)" distdir="$(distdir)" \
dist-hook
-find $(distdir) -type d ! -perm -777 -exec chmod a+rwx {} \; -o \
! -type d ! -perm -444 -links 1 -exec chmod a+r {} \; -o \
! -type d ! -perm -400 -exec chmod a+r {} \; -o \
! -type d ! -perm -444 -exec $(SHELL) $(install_sh) -c -m a+r {} {} \; \
|| chmod -R a+r $(distdir)
dist-gzip: distdir
tardir=$(distdir) && $(am__tar) | GZIP=$(GZIP_ENV) gzip -c >$(distdir).tar.gz
$(am__remove_distdir)
dist-bzip2: distdir
tardir=$(distdir) && $(am__tar) | bzip2 -9 -c >$(distdir).tar.bz2
$(am__remove_distdir)
dist-tarZ: distdir
tardir=$(distdir) && $(am__tar) | compress -c >$(distdir).tar.Z
$(am__remove_distdir)
dist-shar: distdir
shar $(distdir) | GZIP=$(GZIP_ENV) gzip -c >$(distdir).shar.gz
$(am__remove_distdir)
dist-zip: distdir
-rm -f $(distdir).zip
zip -rq $(distdir).zip $(distdir)
$(am__remove_distdir)
dist dist-all: distdir
tardir=$(distdir) && $(am__tar) | GZIP=$(GZIP_ENV) gzip -c >$(distdir).tar.gz
$(am__remove_distdir)
# This target untars the dist file and tries a VPATH configuration. Then
# it guarantees that the distribution is self-contained by making another
# tarfile.
distcheck: dist
case '$(DIST_ARCHIVES)' in \
*.tar.gz*) \
GZIP=$(GZIP_ENV) gunzip -c $(distdir).tar.gz | $(am__untar) ;;\
*.tar.bz2*) \
bunzip2 -c $(distdir).tar.bz2 | $(am__untar) ;;\
*.tar.Z*) \
uncompress -c $(distdir).tar.Z | $(am__untar) ;;\
*.shar.gz*) \
GZIP=$(GZIP_ENV) gunzip -c $(distdir).shar.gz | unshar ;;\
*.zip*) \
unzip $(distdir).zip ;;\
esac
chmod -R a-w $(distdir); chmod a+w $(distdir)
mkdir $(distdir)/_build
mkdir $(distdir)/_inst
chmod a-w $(distdir)
dc_install_base=`$(am__cd) $(distdir)/_inst && pwd | sed -e 's,^[^:\\/]:[\\/],/,'` \
&& dc_destdir="$${TMPDIR-/tmp}/am-dc-$$$$/" \
&& cd $(distdir)/_build \
&& ../configure --srcdir=.. --prefix="$$dc_install_base" \
$(DISTCHECK_CONFIGURE_FLAGS) \
&& $(MAKE) $(AM_MAKEFLAGS) \
&& $(MAKE) $(AM_MAKEFLAGS) dvi \
&& $(MAKE) $(AM_MAKEFLAGS) check \
&& $(MAKE) $(AM_MAKEFLAGS) install \
&& $(MAKE) $(AM_MAKEFLAGS) installcheck \
&& $(MAKE) $(AM_MAKEFLAGS) uninstall \
&& $(MAKE) $(AM_MAKEFLAGS) distuninstallcheck_dir="$$dc_install_base" \
distuninstallcheck \
&& chmod -R a-w "$$dc_install_base" \
&& ({ \
(cd ../.. && umask 077 && mkdir "$$dc_destdir") \
&& $(MAKE) $(AM_MAKEFLAGS) DESTDIR="$$dc_destdir" install \
&& $(MAKE) $(AM_MAKEFLAGS) DESTDIR="$$dc_destdir" uninstall \
&& $(MAKE) $(AM_MAKEFLAGS) DESTDIR="$$dc_destdir" \
distuninstallcheck_dir="$$dc_destdir" distuninstallcheck; \
} || { rm -rf "$$dc_destdir"; exit 1; }) \
&& rm -rf "$$dc_destdir" \
&& $(MAKE) $(AM_MAKEFLAGS) dist \
&& rm -rf $(DIST_ARCHIVES) \
&& $(MAKE) $(AM_MAKEFLAGS) distcleancheck
$(am__remove_distdir)
@(echo "$(distdir) archives ready for distribution: "; \
list='$(DIST_ARCHIVES)'; for i in $$list; do echo $$i; done) | \
sed -e '1{h;s/./=/g;p;x;}' -e '$${p;x;}'
distuninstallcheck:
@cd $(distuninstallcheck_dir) \
&& test `$(distuninstallcheck_listfiles) | wc -l` -le 1 \
|| { echo "ERROR: files left after uninstall:" ; \
if test -n "$(DESTDIR)"; then \
echo " (check DESTDIR support)"; \
fi ; \
$(distuninstallcheck_listfiles) ; \
exit 1; } >&2
distcleancheck: distclean
@if test '$(srcdir)' = . ; then \
echo "ERROR: distcleancheck can only run from a VPATH build" ; \
exit 1 ; \
fi
@test `$(distcleancheck_listfiles) | wc -l` -eq 0 \
|| { echo "ERROR: files left in build directory after distclean:" ; \
$(distcleancheck_listfiles) ; \
exit 1; } >&2
check-am: all-am
check: check-recursive
all-am: Makefile config_auto.h
installdirs: installdirs-recursive
installdirs-am:
install: install-recursive
install-exec: install-exec-recursive
install-data: install-data-recursive
uninstall: uninstall-recursive
install-am: all-am
@$(MAKE) $(AM_MAKEFLAGS) install-exec-am install-data-am
installcheck: installcheck-recursive
install-strip:
$(MAKE) $(AM_MAKEFLAGS) INSTALL_PROGRAM="$(INSTALL_STRIP_PROGRAM)" \
install_sh_PROGRAM="$(INSTALL_STRIP_PROGRAM)" INSTALL_STRIP_FLAG=-s \
`test -z '$(STRIP)' || \
echo "INSTALL_PROGRAM_ENV=STRIPPROG='$(STRIP)'"` install
mostlyclean-generic:
clean-generic:
distclean-generic:
-test -z "$(CONFIG_CLEAN_FILES)" || rm -f $(CONFIG_CLEAN_FILES)
maintainer-clean-generic:
@echo "This command is intended for maintainers to use"
@echo "it deletes files that may require special tools to rebuild."
clean: clean-recursive
clean-am: clean-generic mostlyclean-am
distclean: distclean-recursive
-rm -f $(am__CONFIG_DISTCLEAN_FILES)
-rm -f Makefile
distclean-am: clean-am distclean-generic distclean-hdr distclean-tags
dvi: dvi-recursive
dvi-am:
html: html-recursive
info: info-recursive
info-am:
install-data-am:
install-exec-am:
install-info: install-info-recursive
install-man:
installcheck-am:
maintainer-clean: maintainer-clean-recursive
-rm -f $(am__CONFIG_DISTCLEAN_FILES)
-rm -rf $(top_srcdir)/autom4te.cache
-rm -f Makefile
maintainer-clean-am: distclean-am maintainer-clean-generic
mostlyclean: mostlyclean-recursive
mostlyclean-am: mostlyclean-generic
pdf: pdf-recursive
pdf-am:
ps: ps-recursive
ps-am:
uninstall-am: uninstall-info-am
uninstall-info: uninstall-info-recursive
.PHONY: $(RECURSIVE_TARGETS) CTAGS GTAGS all all-am am--refresh check \
check-am clean clean-generic clean-recursive ctags \
ctags-recursive dist dist-all dist-bzip2 dist-gzip dist-hook \
dist-shar dist-tarZ dist-zip distcheck distclean \
distclean-generic distclean-hdr distclean-recursive \
distclean-tags distcleancheck distdir distuninstallcheck dvi \
dvi-am html html-am info info-am install install-am \
install-data install-data-am install-exec install-exec-am \
install-info install-info-am install-man install-strip \
installcheck installcheck-am installdirs installdirs-am \
maintainer-clean maintainer-clean-generic \
maintainer-clean-recursive mostlyclean mostlyclean-generic \
mostlyclean-recursive pdf pdf-am ps ps-am tags tags-recursive \
uninstall uninstall-am uninstall-info-am
#EXTRA_DIST = doc/html doc/@PACKAGE_NAME@_@PACKAGE_VERSION@.pdf doc/@PACKAGE_NAME@_@PACKAGE_VERSION@.ps.gz
dist-hook:
# Need to remove CVS directories from directories
# added using EXTRA_DIST. $(distdir)/tessdata would in
# theory suffice.
rm -rf `find $(distdir) -name CVS`
# Also remove extra files not needed in a distribution
rm -rf `find $(distdir) -name configure.ac`
rm -rf `find $(distdir) -name acinclude.m4`
rm -rf `find $(distdir) -name aclocal.m4`
# Tell versions [3.59,3.63) of GNU make to not export all variables.
# Otherwise a system limit (for SysV at least) may be exceeded.
.NOEXPORT:
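The `$(RECURSIVE_TARGETS)' rule in the Makefile.in above scans `MAKEFLAGS' to decide what to do when a subdirectory fails: stop at once, or record the failure and carry on if `-k' was given. The sketch below isolates that decision in a standalone function so it can be tried by hand. `pick_failcom' is a name invented here; the `case' patterns are copied from the rule, with the doubled `$$' of the Makefile written as a single `$'.

```shell
# pick_failcom FLAGS: print the command the recursive rule would eval
# after a sub-make fails, given the words of MAKEFLAGS in FLAGS.
pick_failcom() {
    failcom='exit 1'
    # The leading x keeps the word list non-empty, as in the rule.
    for f in x $1; do
        case $f in
            # Skip variable assignments and long options not starting with k.
            *=* | --[!k]*) ;;
            # Any remaining word containing k means keep-going was requested.
            *k*) failcom='fail=yes' ;;
        esac
    done
    echo "$failcom"
}
```

With `fail=yes' the loop continues through the remaining subdirectories, and the final `test -z "$$fail"' in the rule reports the failure once at the end.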

1
NEWS Normal file

@ -0,0 +1 @@
Stub file. To be populated at a later stage.

85
README Normal file

@ -0,0 +1,85 @@
Introduction
============
This package contains the Tesseract Open Source OCR Engine.
Originally developed at Hewlett Packard Laboratories Bristol and
at Hewlett Packard Co, Greeley Colorado, all the code
in this distribution is now licensed under the Apache License:
** Licensed under the Apache License, Version 2.0 (the "License");
** you may not use this file except in compliance with the License.
** You may obtain a copy of the License at
** http://www.apache.org/licenses/LICENSE-2.0
** Unless required by applicable law or agreed to in writing, software
** distributed under the License is distributed on an "AS IS" BASIS,
** WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
** See the License for the specific language governing permissions and
** limitations under the License.
Other Dependencies and Licenses:
================================
The Aspirin/MIGRAINES system is no longer required.
Tesseract can also make use of the libtiff library. (www.libtiff.org)
Without libtiff, Tesseract can only read uncompressed and G3 compressed
TIFF files.
History:
========
The engine was developed at Hewlett Packard Laboratories Bristol and
at Hewlett Packard Co, Greeley Colorado between 1985 and 1994, with some
more changes made in 1996 to port to Windows, and some C++izing in 1998.
A lot of the code was written in C, and then some more was written in C++.
Since then all the code has been converted to at least compile with a C++
compiler. Currently it builds under Linux with gcc2.95 and under Windows
with VC++6. The C++ code makes heavy use of a list system using macros.
This predates stl, was portable before stl, and is more efficient than stl
lists, but has the big negative that if you do get a segmentation violation,
it is hard to debug. Another "feature" of the C/C++ split is that the C++
data structures get converted to C data structures to call the low-level C
code. This is ugly, and the C++izing of the C code is a step towards
eliminating the conversion, but it has not happened yet.
Directory Structure (ordered by dependency):
============================================
ccmain    Top-level code. The main program resides in tesseractmain.cpp.
display   An "editor" to view and operate on the internal structures.
          (Requires a working viewer - batteries not included.)
wordrec   The word-level recognizer.
textord   The module that organizes (orders) text into lines and words.
classify  The low-level character classifiers.
ccstruct  Classes to hold information about a page as it is being processed.
viewer    The client side of a client server viewing system.
          Unfortunately, at this time, the server side is not available.
image     Image class and processing functions.
dict      Language model code.
cutil     Code for file I/O, lists, heaps etc, from the old C code.
ccutil    Somewhat newer code for lists, memory allocation etc from the
          old C++ code.
About the Engine
================
This code is a raw OCR engine. It has NO PAGE LAYOUT ANALYSIS, NO OUTPUT
FORMATTING, and NO UI. It can only process an image of a single column
and create text from it. It can detect fixed pitch vs proportional text.
Having said that, in 1995, this engine was in the top 3 in terms of character
accuracy, and it compiles and runs on both Linux and Windows. Another current
limitation is that it only recognizes English and its character set is only
US-ASCII. Training code is NOT included in the open source release however, and
will be included in a future release.
Using the Engine
================
The usage of both Windows and Linux versions is the same.
The executable must reside in the same directory as the tessdata directory.
The command line is:
tesseract <image.tif> <output> batch
The image file requires a .tif extension for its type to be recognized
correctly. If a file exists with the .tif extension replaced by .uzn, then it
will be interpreted as a UNLV-style zone file. (See www.isri.unlv.edu for
details of the zone files.)
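
The naming rule above can be sketched in shell. This is only an illustration:
zone_file_for is a hypothetical helper, not part of Tesseract, and page.tif is
a placeholder image name. It derives the .uzn name that corresponds to a .tif
image and reports whether such a file is present.

```shell
# Hypothetical helper (not part of Tesseract): print the zone file name
# that corresponds to a .tif image, i.e. the same name with .uzn instead.
zone_file_for() {
    case "$1" in
        *.tif) printf '%s\n' "${1%.tif}.uzn" ;;
        *)     echo "expected a .tif image: $1" >&2; return 1 ;;
    esac
}

# Placeholder image name, used for illustration only.
image=page.tif
zones=$(zone_file_for "$image")
if [ -f "$zones" ]; then
    echo "zone file found: $zones"
else
    echo "no zone file: $zones"
fi

# The engine itself would then be run as described above, for example:
#   tesseract "$image" page batch
```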

78
ReleaseNotes Normal file

@ -0,0 +1,78 @@
Tesseract release notes Feb 2, 2007 - V1.03.
Added mftraining and cntraining. Using an image with a box file, tesseract
generates .tr output files. cntraining runs on the .tr files to make
normproto that lives in tessdata. mftraining runs on the .tr files to
make inttemp and pffmtable in tessdata. These are the main data files
that tesseract uses to recognize characters. At present, the code to make
dictionary files is not yet available, nor are any sample box files or
rebuilt inttemp or documentation to create any of these. Recognition is
still limited to the ASCII set, but when this problem is fixed, documentation
will follow.
Added a new API with adaptive thresholding for grey and color images.
See ccmain/baseapi.h/cpp for details. The main program has been converted
to use the API as an example. See main() in ccmain/tesseractmain.cpp for
details. The API is designed to make it easy to add subclasses with ability
to output the bounding boxes etc from the internal structures. The adaptive
thresholding improves accuracy (most of the time) on non-binary images.
Many memory leaks have been fixed. There are no known leaks left from using
the API correctly.
The adaptive classifier was not operating correctly. This bug, and several
others have been fixed, including poor chopping, an indefinite (if not quite
infinite) loop in the number parser, and a couple of crash bugs. Thanks to
all that have contributed bugs and bug fixes.
It is now possible to build without any of the graphics support to save code
size using #define GRAPHICS_DISABLED. There is also a new EMBEDDED define
for use on operating systems with limited library support.
64-bit and Mac OSX buildability is now included in the mainline source tree.
Thanks to all that have contributed patches and comments to help with that.
1.03 is also endian-independent, apart from the tiff i/o, so if you use
libtiff, the code should run on all platforms, even if you get/create new
data files of a different endianness.
Some of the bug fixes improve accuracy, and so do some of the changes to
DangAmbigs and user-words.
Tesseract release notes, Oct 4 2006 - V1.02.
Removed dependency on aspirin. *All* code is now licensed under Apache2.0.
Tesseract release notes, Sep 7 2006 - V1.01.
Fixes for this release:
Added mfcpch.cpp and getopt.cpp for VC++.
Fixed problem with greyscale images and no libtiff.
Stopped debug window from being used for the usage output.
Fixed load of inttemp for big-endian architectures.
Fixed some Mac compilation issues.
This version should read uncompressed 8 bit grey and 24 bit color tiffs
without having to have libtiff. It does a dumb threshold though, so don't
expect good results from poor contrast or images of natural scenes etc.
If you just run tesseract with no command line args you should now get a
sensible usage message on linux, with or without X-windows.
If you can get it to compile on a PPC Mac, it may now run correctly,
although not all the build issues are fixed yet.
Building Tesseract:
Windows:
Unpack the tar.gz archive
Open tesseract.dsw in DevStudio (preferably version 6, higher versions will be more difficult)
Set Win32 - Release as the active configuration.
Build.
Copy tesseract.exe from bin.rel up one directory level.
Run tesseract phototest.tif phototest
This will create phototest.txt.
Linux:
Unpack the tar.gz archive
./configure
make
Copy tesseract from ccmain up one directory level (or create a symbolic link)
Run tesseract phototest.tif phototest
This will create phototest.txt.
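
As a rough sketch, the Linux steps above could be collected into one shell
function. The function name build_and_smoke_test is made up for this example,
and it assumes it is run from the top of the unpacked source tree, with
phototest.tif present there.

```shell
# Sketch of the Linux steps listed above; build_and_smoke_test is a
# hypothetical name. Run it from the top of the unpacked source tree.
build_and_smoke_test() {
    ./configure || return 1
    make || return 1
    # Symbolic link rather than a copy, so that rebuilds are picked up.
    ln -sf ccmain/tesseract tesseract || return 1
    ./tesseract phototest.tif phototest || return 1
    # The run should have produced a non-empty phototest.txt.
    test -s phototest.txt
}
```

Calling build_and_smoke_test returns a non-zero status at the first step that
fails, which makes it easy to see whether configure, make or the test run is
the problem.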

10
acinclude.m4 Normal file

@ -0,0 +1,10 @@
# Master include for AC macros. This directory structure allows
# for more flexibility with respect to CVS modules.
#
# Author: Luc Vincent
### m4_include(config/ac_compile_check_sizeof.m4)dnl
#m4_include(config/ac_create_stdint_h.m4)dnl
#m4_include(config/ax_create_stdint_h.m4)dnl
m4_include(config/ac_define_versionlevel.m4)dnl
m4_include(config/acinclude_custom.m4)dnl

920
aclocal.m4 vendored Normal file

@ -0,0 +1,920 @@
# generated automatically by aclocal 1.9.6 -*- Autoconf -*-
# Copyright (C) 1996, 1997, 1998, 1999, 2000, 2001, 2002, 2003, 2004,
# 2005 Free Software Foundation, Inc.
# This file is free software; the Free Software Foundation
# gives unlimited permission to copy and/or distribute it,
# with or without modifications, as long as this notice is preserved.
# This program is distributed in the hope that it will be useful,
# but WITHOUT ANY WARRANTY, to the extent permitted by law; without
# even the implied warranty of MERCHANTABILITY or FITNESS FOR A
# PARTICULAR PURPOSE.
# Copyright (C) 2002, 2003, 2005 Free Software Foundation, Inc.
#
# This file is free software; the Free Software Foundation
# gives unlimited permission to copy and/or distribute it,
# with or without modifications, as long as this notice is preserved.
# AM_AUTOMAKE_VERSION(VERSION)
# ----------------------------
# Automake X.Y traces this macro to ensure aclocal.m4 has been
# generated from the m4 files accompanying Automake X.Y.
AC_DEFUN([AM_AUTOMAKE_VERSION], [am__api_version="1.9"])
# AM_SET_CURRENT_AUTOMAKE_VERSION
# -------------------------------
# Call AM_AUTOMAKE_VERSION so it can be traced.
# This function is AC_REQUIREd by AM_INIT_AUTOMAKE.
AC_DEFUN([AM_SET_CURRENT_AUTOMAKE_VERSION],
[AM_AUTOMAKE_VERSION([1.9.6])])
# AM_AUX_DIR_EXPAND -*- Autoconf -*-
# Copyright (C) 2001, 2003, 2005 Free Software Foundation, Inc.
#
# This file is free software; the Free Software Foundation
# gives unlimited permission to copy and/or distribute it,
# with or without modifications, as long as this notice is preserved.
# For projects using AC_CONFIG_AUX_DIR([foo]), Autoconf sets
# $ac_aux_dir to `$srcdir/foo'. In other projects, it is set to
# `$srcdir', `$srcdir/..', or `$srcdir/../..'.
#
# Of course, Automake must honor this variable whenever it calls a
# tool from the auxiliary directory. The problem is that $srcdir (and
# therefore $ac_aux_dir as well) can be either absolute or relative,
# depending on how configure is run. This is pretty annoying, since
# it makes $ac_aux_dir quite unusable in subdirectories: in the top
# source directory, any form will work fine, but in subdirectories a
# relative path needs to be adjusted first.
#
# $ac_aux_dir/missing
# fails when called from a subdirectory if $ac_aux_dir is relative
# $top_srcdir/$ac_aux_dir/missing
# fails if $ac_aux_dir is absolute,
# fails when called from a subdirectory in a VPATH build with
# a relative $ac_aux_dir
#
# The reason for the latter failure is that $top_srcdir and $ac_aux_dir
# are both prefixed by $srcdir. In an in-source build this is usually
# harmless because $srcdir is `.', but things will break when you
# start a VPATH build or use an absolute $srcdir.
#
# So we could use something similar to $top_srcdir/$ac_aux_dir/missing,
# iff we strip the leading $srcdir from $ac_aux_dir. That would be:
# am_aux_dir='\$(top_srcdir)/'`expr "$ac_aux_dir" : "$srcdir//*\(.*\)"`
# and then we would define $MISSING as
# MISSING="\${SHELL} $am_aux_dir/missing"
# This will work as long as MISSING is not called from configure, because
# unfortunately $(top_srcdir) has no meaning in configure.
# However there are other variables, like CC, which are often used in
# configure, and could therefore not use this "fixed" $ac_aux_dir.
#
# Another solution, used here, is to always expand $ac_aux_dir to an
# absolute PATH. The drawback is that using absolute paths prevents a
# configured tree from being moved without reconfiguration.
AC_DEFUN([AM_AUX_DIR_EXPAND],
[dnl Rely on autoconf to set up CDPATH properly.
AC_PREREQ([2.50])dnl
# expand $ac_aux_dir to an absolute path
am_aux_dir=`cd $ac_aux_dir && pwd`
])
# AM_CONDITIONAL -*- Autoconf -*-
# Copyright (C) 1997, 2000, 2001, 2003, 2004, 2005
# Free Software Foundation, Inc.
#
# This file is free software; the Free Software Foundation
# gives unlimited permission to copy and/or distribute it,
# with or without modifications, as long as this notice is preserved.
# serial 7
# AM_CONDITIONAL(NAME, SHELL-CONDITION)
# -------------------------------------
# Define a conditional.
AC_DEFUN([AM_CONDITIONAL],
[AC_PREREQ(2.52)dnl
ifelse([$1], [TRUE], [AC_FATAL([$0: invalid condition: $1])],
[$1], [FALSE], [AC_FATAL([$0: invalid condition: $1])])dnl
AC_SUBST([$1_TRUE])
AC_SUBST([$1_FALSE])
if $2; then
$1_TRUE=
$1_FALSE='#'
else
$1_TRUE='#'
$1_FALSE=
fi
AC_CONFIG_COMMANDS_PRE(
[if test -z "${$1_TRUE}" && test -z "${$1_FALSE}"; then
AC_MSG_ERROR([[conditional "$1" was never defined.
Usually this means the macro was only invoked conditionally.]])
fi])])
# Copyright (C) 1999, 2000, 2001, 2002, 2003, 2004, 2005
# Free Software Foundation, Inc.
#
# This file is free software; the Free Software Foundation
# gives unlimited permission to copy and/or distribute it,
# with or without modifications, as long as this notice is preserved.
# serial 8
# There are a few dirty hacks below to avoid letting `AC_PROG_CC' be
# written in clear, in which case automake, when reading aclocal.m4,
# will think it sees a *use*, and therefore will trigger all its
# C support machinery. Also note that it means that autoscan, seeing
# CC etc. in the Makefile, will ask for an AC_PROG_CC use...
# _AM_DEPENDENCIES(NAME)
# ----------------------
# See how the compiler implements dependency checking.
# NAME is "CC", "CXX", "GCJ", or "OBJC".
# We try a few techniques and use that to set a single cache variable.
#
# We don't AC_REQUIRE the corresponding AC_PROG_CC since the latter was
# modified to invoke _AM_DEPENDENCIES(CC); we would have a circular
# dependency, and given that the user is not expected to run this macro,
# just rely on AC_PROG_CC.
AC_DEFUN([_AM_DEPENDENCIES],
[AC_REQUIRE([AM_SET_DEPDIR])dnl
AC_REQUIRE([AM_OUTPUT_DEPENDENCY_COMMANDS])dnl
AC_REQUIRE([AM_MAKE_INCLUDE])dnl
AC_REQUIRE([AM_DEP_TRACK])dnl
ifelse([$1], CC, [depcc="$CC" am_compiler_list=],
[$1], CXX, [depcc="$CXX" am_compiler_list=],
[$1], OBJC, [depcc="$OBJC" am_compiler_list='gcc3 gcc'],
[$1], GCJ, [depcc="$GCJ" am_compiler_list='gcc3 gcc'],
[depcc="$$1" am_compiler_list=])
AC_CACHE_CHECK([dependency style of $depcc],
[am_cv_$1_dependencies_compiler_type],
[if test -z "$AMDEP_TRUE" && test -f "$am_depcomp"; then
# We make a subdir and do the tests there. Otherwise we can end up
# making bogus files that we don't know about and never remove. For
# instance it was reported that on HP-UX the gcc test will end up
# making a dummy file named `D' -- because `-MD' means `put the output
# in D'.
mkdir conftest.dir
# Copy depcomp to subdir because otherwise we won't find it if we're
# using a relative directory.
cp "$am_depcomp" conftest.dir
cd conftest.dir
# We will build objects and dependencies in a subdirectory because
# it helps to detect inapplicable dependency modes. For instance
# both Tru64's cc and ICC support -MD to output dependencies as a
# side effect of compilation, but ICC will put the dependencies in
# the current directory while Tru64 will put them in the object
# directory.
mkdir sub
am_cv_$1_dependencies_compiler_type=none
if test "$am_compiler_list" = ""; then
am_compiler_list=`sed -n ['s/^#*\([a-zA-Z0-9]*\))$/\1/p'] < ./depcomp`
fi
for depmode in $am_compiler_list; do
# Set up a source with many dependencies, because some compilers
# like to wrap large dependency lists on column 80 (with \), and
# we should not choose a depcomp mode which is confused by this.
#
# We need to recreate these files for each test, as the compiler may
# overwrite some of them when testing with obscure command lines.
# This happens at least with the AIX C compiler.
: > sub/conftest.c
for i in 1 2 3 4 5 6; do
echo '#include "conftst'$i'.h"' >> sub/conftest.c
# Using `: > sub/conftst$i.h' creates only sub/conftst1.h with
# Solaris 8's {/usr,}/bin/sh.
touch sub/conftst$i.h
done
echo "${am__include} ${am__quote}sub/conftest.Po${am__quote}" > confmf
case $depmode in
nosideeffect)
# after this tag, mechanisms are not by side-effect, so they'll
# only be used when explicitly requested
if test "x$enable_dependency_tracking" = xyes; then
continue
else
break
fi
;;
none) break ;;
esac
# We check with `-c' and `-o' for the sake of the "dashmstdout"
# mode. It turns out that the SunPro C++ compiler does not properly
# handle `-M -o', and we need to detect this.
if depmode=$depmode \
source=sub/conftest.c object=sub/conftest.${OBJEXT-o} \
depfile=sub/conftest.Po tmpdepfile=sub/conftest.TPo \
$SHELL ./depcomp $depcc -c -o sub/conftest.${OBJEXT-o} sub/conftest.c \
>/dev/null 2>conftest.err &&
grep sub/conftst6.h sub/conftest.Po > /dev/null 2>&1 &&
grep sub/conftest.${OBJEXT-o} sub/conftest.Po > /dev/null 2>&1 &&
${MAKE-make} -s -f confmf > /dev/null 2>&1; then
# icc doesn't choke on unknown options, it will just issue warnings
# or remarks (even with -Werror). So we grep stderr for any message
# that says an option was ignored or not supported.
# When given -MP, icc 7.0 and 7.1 complain thusly:
# icc: Command line warning: ignoring option '-M'; no argument required
# The diagnosis changed in icc 8.0:
# icc: Command line remark: option '-MP' not supported
if (grep 'ignoring option' conftest.err ||
grep 'not supported' conftest.err) >/dev/null 2>&1; then :; else
am_cv_$1_dependencies_compiler_type=$depmode
break
fi
fi
done
cd ..
rm -rf conftest.dir
else
am_cv_$1_dependencies_compiler_type=none
fi
])
AC_SUBST([$1DEPMODE], [depmode=$am_cv_$1_dependencies_compiler_type])
AM_CONDITIONAL([am__fastdep$1], [
test "x$enable_dependency_tracking" != xno \
&& test "$am_cv_$1_dependencies_compiler_type" = gcc3])
])
# AM_SET_DEPDIR
# -------------
# Choose a directory name for dependency files.
# This macro is AC_REQUIREd in _AM_DEPENDENCIES
AC_DEFUN([AM_SET_DEPDIR],
[AC_REQUIRE([AM_SET_LEADING_DOT])dnl
AC_SUBST([DEPDIR], ["${am__leading_dot}deps"])dnl
])
# AM_DEP_TRACK
# ------------
AC_DEFUN([AM_DEP_TRACK],
[AC_ARG_ENABLE(dependency-tracking,
[ --disable-dependency-tracking speeds up one-time build
--enable-dependency-tracking do not reject slow dependency extractors])
if test "x$enable_dependency_tracking" != xno; then
am_depcomp="$ac_aux_dir/depcomp"
AMDEPBACKSLASH='\'
fi
AM_CONDITIONAL([AMDEP], [test "x$enable_dependency_tracking" != xno])
AC_SUBST([AMDEPBACKSLASH])
])
# Generate code to set up dependency tracking. -*- Autoconf -*-
# Copyright (C) 1999, 2000, 2001, 2002, 2003, 2004, 2005
# Free Software Foundation, Inc.
#
# This file is free software; the Free Software Foundation
# gives unlimited permission to copy and/or distribute it,
# with or without modifications, as long as this notice is preserved.
#serial 3
# _AM_OUTPUT_DEPENDENCY_COMMANDS
# ------------------------------
AC_DEFUN([_AM_OUTPUT_DEPENDENCY_COMMANDS],
[for mf in $CONFIG_FILES; do
# Strip MF so we end up with the name of the file.
mf=`echo "$mf" | sed -e 's/:.*$//'`
# Check whether this is an Automake generated Makefile or not.
# We used to match only the files named `Makefile.in', but
# some people rename them; so instead we look at the file content.
# Grep'ing the first line is not enough: some people post-process
# each Makefile.in and add a new line on top of each file to say so.
# So let's grep the whole file.
if grep '^#.*generated by automake' $mf > /dev/null 2>&1; then
dirpart=`AS_DIRNAME("$mf")`
else
continue
fi
# Extract the definition of DEPDIR, am__include, and am__quote
# from the Makefile without running `make'.
DEPDIR=`sed -n 's/^DEPDIR = //p' < "$mf"`
test -z "$DEPDIR" && continue
am__include=`sed -n 's/^am__include = //p' < "$mf"`
test -z "$am__include" && continue
am__quote=`sed -n 's/^am__quote = //p' < "$mf"`
# When using ansi2knr, U may be empty or an underscore; expand it
U=`sed -n 's/^U = //p' < "$mf"`
# Find all dependency output files, they are included files with
# $(DEPDIR) in their names. We invoke sed twice because it is the
# simplest approach to changing $(DEPDIR) to its actual value in the
# expansion.
for file in `sed -n "
s/^$am__include $am__quote\(.*(DEPDIR).*\)$am__quote"'$/\1/p' <"$mf" | \
sed -e 's/\$(DEPDIR)/'"$DEPDIR"'/g' -e 's/\$U/'"$U"'/g'`; do
# Make sure the directory exists.
test -f "$dirpart/$file" && continue
fdir=`AS_DIRNAME(["$file"])`
AS_MKDIR_P([$dirpart/$fdir])
# echo "creating $dirpart/$file"
echo '# dummy' > "$dirpart/$file"
done
done
])# _AM_OUTPUT_DEPENDENCY_COMMANDS
# AM_OUTPUT_DEPENDENCY_COMMANDS
# -----------------------------
# This macro should only be invoked once -- use via AC_REQUIRE.
#
# This code is only required when automatic dependency tracking
# is enabled. FIXME. This creates each `.P' file that we will
# need in order to bootstrap the dependency handling code.
AC_DEFUN([AM_OUTPUT_DEPENDENCY_COMMANDS],
[AC_CONFIG_COMMANDS([depfiles],
[test x"$AMDEP_TRUE" != x"" || _AM_OUTPUT_DEPENDENCY_COMMANDS],
[AMDEP_TRUE="$AMDEP_TRUE" ac_aux_dir="$ac_aux_dir"])
])
# Copyright (C) 1996, 1997, 2000, 2001, 2003, 2005
# Free Software Foundation, Inc.
#
# This file is free software; the Free Software Foundation
# gives unlimited permission to copy and/or distribute it,
# with or without modifications, as long as this notice is preserved.
# serial 8
# AM_CONFIG_HEADER is obsolete. It has been replaced by AC_CONFIG_HEADERS.
AU_DEFUN([AM_CONFIG_HEADER], [AC_CONFIG_HEADERS($@)])
# Do all the work for Automake. -*- Autoconf -*-
# Copyright (C) 1996, 1997, 1998, 1999, 2000, 2001, 2002, 2003, 2004, 2005
# Free Software Foundation, Inc.
#
# This file is free software; the Free Software Foundation
# gives unlimited permission to copy and/or distribute it,
# with or without modifications, as long as this notice is preserved.
# serial 12
# This macro actually does too much. Some checks are only needed if
# your package does certain things. But this isn't really a big deal.
# AM_INIT_AUTOMAKE(PACKAGE, VERSION, [NO-DEFINE])
# AM_INIT_AUTOMAKE([OPTIONS])
# -----------------------------------------------
# The call with PACKAGE and VERSION arguments is the old style
# call (pre autoconf-2.50), which is being phased out. PACKAGE
# and VERSION should now be passed to AC_INIT and removed from
# the call to AM_INIT_AUTOMAKE.
# We support both call styles for the transition. After
# the next Automake release, Autoconf can make the AC_INIT
# arguments mandatory, and then we can depend on a new Autoconf
# release and drop the old call support.
AC_DEFUN([AM_INIT_AUTOMAKE],
[AC_PREREQ([2.58])dnl
dnl Autoconf wants to disallow AM_ names. We explicitly allow
dnl the ones we care about.
m4_pattern_allow([^AM_[A-Z]+FLAGS$])dnl
AC_REQUIRE([AM_SET_CURRENT_AUTOMAKE_VERSION])dnl
AC_REQUIRE([AC_PROG_INSTALL])dnl
# test to see if srcdir already configured
if test "`cd $srcdir && pwd`" != "`pwd`" &&
test -f $srcdir/config.status; then
AC_MSG_ERROR([source directory already configured; run "make distclean" there first])
fi
# test whether we have cygpath
if test -z "$CYGPATH_W"; then
if (cygpath --version) >/dev/null 2>/dev/null; then
CYGPATH_W='cygpath -w'
else
CYGPATH_W=echo
fi
fi
AC_SUBST([CYGPATH_W])
# Define the identity of the package.
dnl Distinguish between old-style and new-style calls.
m4_ifval([$2],
[m4_ifval([$3], [_AM_SET_OPTION([no-define])])dnl
AC_SUBST([PACKAGE], [$1])dnl
AC_SUBST([VERSION], [$2])],
[_AM_SET_OPTIONS([$1])dnl
AC_SUBST([PACKAGE], ['AC_PACKAGE_TARNAME'])dnl
AC_SUBST([VERSION], ['AC_PACKAGE_VERSION'])])dnl
_AM_IF_OPTION([no-define],,
[AC_DEFINE_UNQUOTED(PACKAGE, "$PACKAGE", [Name of package])
AC_DEFINE_UNQUOTED(VERSION, "$VERSION", [Version number of package])])dnl
# Some tools Automake needs.
AC_REQUIRE([AM_SANITY_CHECK])dnl
AC_REQUIRE([AC_ARG_PROGRAM])dnl
AM_MISSING_PROG(ACLOCAL, aclocal-${am__api_version})
AM_MISSING_PROG(AUTOCONF, autoconf)
AM_MISSING_PROG(AUTOMAKE, automake-${am__api_version})
AM_MISSING_PROG(AUTOHEADER, autoheader)
AM_MISSING_PROG(MAKEINFO, makeinfo)
AM_PROG_INSTALL_SH
AM_PROG_INSTALL_STRIP
AC_REQUIRE([AM_PROG_MKDIR_P])dnl
# We need awk for the "check" target. The system "awk" is bad on
# some platforms.
AC_REQUIRE([AC_PROG_AWK])dnl
AC_REQUIRE([AC_PROG_MAKE_SET])dnl
AC_REQUIRE([AM_SET_LEADING_DOT])dnl
_AM_IF_OPTION([tar-ustar], [_AM_PROG_TAR([ustar])],
[_AM_IF_OPTION([tar-pax], [_AM_PROG_TAR([pax])],
[_AM_PROG_TAR([v7])])])
_AM_IF_OPTION([no-dependencies],,
[AC_PROVIDE_IFELSE([AC_PROG_CC],
[_AM_DEPENDENCIES(CC)],
[define([AC_PROG_CC],
defn([AC_PROG_CC])[_AM_DEPENDENCIES(CC)])])dnl
AC_PROVIDE_IFELSE([AC_PROG_CXX],
[_AM_DEPENDENCIES(CXX)],
[define([AC_PROG_CXX],
defn([AC_PROG_CXX])[_AM_DEPENDENCIES(CXX)])])dnl
])
])
# When config.status generates a header, we must update the stamp-h file.
# This file resides in the same directory as the config header
# that is generated. The stamp files are numbered to have different names.
# Autoconf calls _AC_AM_CONFIG_HEADER_HOOK (when defined) in the
# loop where config.status creates the headers, so we can generate
# our stamp files there.
AC_DEFUN([_AC_AM_CONFIG_HEADER_HOOK],
[# Compute $1's index in $config_headers.
_am_stamp_count=1
for _am_header in $config_headers :; do
case $_am_header in
$1 | $1:* )
break ;;
* )
_am_stamp_count=`expr $_am_stamp_count + 1` ;;
esac
done
echo "timestamp for $1" >`AS_DIRNAME([$1])`/stamp-h[]$_am_stamp_count])
# Copyright (C) 2001, 2003, 2005 Free Software Foundation, Inc.
#
# This file is free software; the Free Software Foundation
# gives unlimited permission to copy and/or distribute it,
# with or without modifications, as long as this notice is preserved.
# AM_PROG_INSTALL_SH
# ------------------
# Define $install_sh.
AC_DEFUN([AM_PROG_INSTALL_SH],
[AC_REQUIRE([AM_AUX_DIR_EXPAND])dnl
install_sh=${install_sh-"$am_aux_dir/install-sh"}
AC_SUBST(install_sh)])
# Copyright (C) 2003, 2005 Free Software Foundation, Inc.
#
# This file is free software; the Free Software Foundation
# gives unlimited permission to copy and/or distribute it,
# with or without modifications, as long as this notice is preserved.
# serial 2
# Check whether the underlying file-system supports filenames
# with a leading dot. For instance MS-DOS doesn't.
AC_DEFUN([AM_SET_LEADING_DOT],
[rm -rf .tst 2>/dev/null
mkdir .tst 2>/dev/null
if test -d .tst; then
am__leading_dot=.
else
am__leading_dot=_
fi
rmdir .tst 2>/dev/null
AC_SUBST([am__leading_dot])])
# Add --enable-maintainer-mode option to configure. -*- Autoconf -*-
# From Jim Meyering
# Copyright (C) 1996, 1998, 2000, 2001, 2002, 2003, 2004, 2005
# Free Software Foundation, Inc.
#
# This file is free software; the Free Software Foundation
# gives unlimited permission to copy and/or distribute it,
# with or without modifications, as long as this notice is preserved.
# serial 4
AC_DEFUN([AM_MAINTAINER_MODE],
[AC_MSG_CHECKING([whether to enable maintainer-specific portions of Makefiles])
dnl maintainer-mode is disabled by default
AC_ARG_ENABLE(maintainer-mode,
[ --enable-maintainer-mode enable make rules and dependencies not useful
(and sometimes confusing) to the casual installer],
USE_MAINTAINER_MODE=$enableval,
USE_MAINTAINER_MODE=no)
AC_MSG_RESULT([$USE_MAINTAINER_MODE])
AM_CONDITIONAL(MAINTAINER_MODE, [test $USE_MAINTAINER_MODE = yes])
MAINT=$MAINTAINER_MODE_TRUE
AC_SUBST(MAINT)dnl
]
)
AU_DEFUN([jm_MAINTAINER_MODE], [AM_MAINTAINER_MODE])
# Check to see how 'make' treats includes. -*- Autoconf -*-
# Copyright (C) 2001, 2002, 2003, 2005 Free Software Foundation, Inc.
#
# This file is free software; the Free Software Foundation
# gives unlimited permission to copy and/or distribute it,
# with or without modifications, as long as this notice is preserved.
# serial 3
# AM_MAKE_INCLUDE()
# -----------------
# Check to see how make treats includes.
AC_DEFUN([AM_MAKE_INCLUDE],
[am_make=${MAKE-make}
cat > confinc << 'END'
am__doit:
@echo done
.PHONY: am__doit
END
# If we don't find an include directive, just comment out the code.
AC_MSG_CHECKING([for style of include used by $am_make])
am__include="#"
am__quote=
_am_result=none
# First try GNU make style include.
echo "include confinc" > confmf
# We grep out `Entering directory' and `Leaving directory'
# messages which can occur if `w' ends up in MAKEFLAGS.
# In particular we don't look at `^make:' because GNU make might
# be invoked under some other name (usually "gmake"), in which
# case it prints its new name instead of `make'.
if test "`$am_make -s -f confmf 2> /dev/null | grep -v 'ing directory'`" = "done"; then
am__include=include
am__quote=
_am_result=GNU
fi
# Now try BSD make style include.
if test "$am__include" = "#"; then
echo '.include "confinc"' > confmf
if test "`$am_make -s -f confmf 2> /dev/null`" = "done"; then
am__include=.include
am__quote="\""
_am_result=BSD
fi
fi
AC_SUBST([am__include])
AC_SUBST([am__quote])
AC_MSG_RESULT([$_am_result])
rm -f confinc confmf
])
# Copyright (C) 1999, 2000, 2001, 2003, 2005 Free Software Foundation, Inc.
#
# This file is free software; the Free Software Foundation
# gives unlimited permission to copy and/or distribute it,
# with or without modifications, as long as this notice is preserved.
# serial 3
# AM_PROG_CC_C_O
# --------------
# Like AC_PROG_CC_C_O, but changed for automake.
AC_DEFUN([AM_PROG_CC_C_O],
[AC_REQUIRE([AC_PROG_CC_C_O])dnl
AC_REQUIRE([AM_AUX_DIR_EXPAND])dnl
# FIXME: we rely on the cache variable name because
# there is no other way.
set dummy $CC
ac_cc=`echo $[2] | sed ['s/[^a-zA-Z0-9_]/_/g;s/^[0-9]/_/']`
if eval "test \"`echo '$ac_cv_prog_cc_'${ac_cc}_c_o`\" != yes"; then
# Losing compiler, so override with the script.
# FIXME: It is wrong to rewrite CC.
# But if we don't then we get into trouble of one sort or another.
# A longer-term fix would be to have automake use am__CC in this case,
# and then we could set am__CC="\$(top_srcdir)/compile \$(CC)"
CC="$am_aux_dir/compile $CC"
fi
])
# Fake the existence of programs that GNU maintainers use. -*- Autoconf -*-
# Copyright (C) 1997, 1999, 2000, 2001, 2003, 2005
# Free Software Foundation, Inc.
#
# This file is free software; the Free Software Foundation
# gives unlimited permission to copy and/or distribute it,
# with or without modifications, as long as this notice is preserved.
# serial 4
# AM_MISSING_PROG(NAME, PROGRAM)
# ------------------------------
AC_DEFUN([AM_MISSING_PROG],
[AC_REQUIRE([AM_MISSING_HAS_RUN])
$1=${$1-"${am_missing_run}$2"}
AC_SUBST($1)])
# AM_MISSING_HAS_RUN
# ------------------
# Define MISSING if not defined so far and test if it supports --run.
# If it does, set am_missing_run to use it, otherwise, to nothing.
AC_DEFUN([AM_MISSING_HAS_RUN],
[AC_REQUIRE([AM_AUX_DIR_EXPAND])dnl
test x"${MISSING+set}" = xset || MISSING="\${SHELL} $am_aux_dir/missing"
# Use eval to expand $SHELL
if eval "$MISSING --run true"; then
am_missing_run="$MISSING --run "
else
am_missing_run=
AC_MSG_WARN([`missing' script is too old or missing])
fi
])
# Copyright (C) 2003, 2004, 2005 Free Software Foundation, Inc.
#
# This file is free software; the Free Software Foundation
# gives unlimited permission to copy and/or distribute it,
# with or without modifications, as long as this notice is preserved.
# AM_PROG_MKDIR_P
# ---------------
# Check whether `mkdir -p' is supported, fall back to mkinstalldirs otherwise.
#
# Automake 1.8 used `mkdir -m 0755 -p --' to ensure that directories
# created by `make install' are always world readable, even if the
# installer happens to have an overly restrictive umask (e.g. 077).
# This was a mistake. There are at least two reasons why we must not
# use `-m 0755':
# - it causes special bits like SGID to be ignored,
# - it may be too restrictive (some setups expect 775 directories).
#
# Do not use -m 0755 and let people choose whatever they expect by
# setting umask.
#
# We cannot accept any implementation of `mkdir' that recognizes `-p'.
# Some implementations (such as Solaris 8's) are not thread-safe: if a
# parallel make tries to run `mkdir -p a/b' and `mkdir -p a/c'
# concurrently, both versions can detect that a/ is missing, but only
# one can create it and the other will error out. Consequently we
# restrict ourselves to GNU mkdir (using the --version option ensures
# this.)
AC_DEFUN([AM_PROG_MKDIR_P],
[if mkdir -p --version . >/dev/null 2>&1 && test ! -d ./--version; then
# We used to keep the `.' as first argument, in order to
# allow $(mkdir_p) to be used without argument. As in
# $(mkdir_p) $(somedir)
# where $(somedir) is conditionally defined. However this is wrong
# for two reasons:
# 1. if the package is installed by a user who cannot write `.'
# make install will fail,
# 2. the above comment should most certainly read
# $(mkdir_p) $(DESTDIR)$(somedir)
# so it does not work when $(somedir) is undefined and
# $(DESTDIR) is not.
# To support the latter case, we have to write
# test -z "$(somedir)" || $(mkdir_p) $(DESTDIR)$(somedir),
# so the `.' trick is pointless.
mkdir_p='mkdir -p --'
else
# On NextStep and OpenStep, the `mkdir' command does not
# recognize any option. It will interpret all options as
# directories to create, and then abort because `.' already
# exists.
for d in ./-p ./--version;
do
test -d $d && rmdir $d
done
# $(mkinstalldirs) is defined by Automake if mkinstalldirs exists.
if test -f "$ac_aux_dir/mkinstalldirs"; then
mkdir_p='$(mkinstalldirs)'
else
mkdir_p='$(install_sh) -d'
fi
fi
AC_SUBST([mkdir_p])])
# Helper functions for option handling. -*- Autoconf -*-
# Copyright (C) 2001, 2002, 2003, 2005 Free Software Foundation, Inc.
#
# This file is free software; the Free Software Foundation
# gives unlimited permission to copy and/or distribute it,
# with or without modifications, as long as this notice is preserved.
# serial 3
# _AM_MANGLE_OPTION(NAME)
# -----------------------
AC_DEFUN([_AM_MANGLE_OPTION],
[[_AM_OPTION_]m4_bpatsubst($1, [[^a-zA-Z0-9_]], [_])])
# _AM_SET_OPTION(NAME)
# ------------------------------
# Set option NAME. Presently that only means defining a flag for this option.
AC_DEFUN([_AM_SET_OPTION],
[m4_define(_AM_MANGLE_OPTION([$1]), 1)])
# _AM_SET_OPTIONS(OPTIONS)
# ----------------------------------
# OPTIONS is a space-separated list of Automake options.
AC_DEFUN([_AM_SET_OPTIONS],
[AC_FOREACH([_AM_Option], [$1], [_AM_SET_OPTION(_AM_Option)])])
# _AM_IF_OPTION(OPTION, IF-SET, [IF-NOT-SET])
# -------------------------------------------
# Execute IF-SET if OPTION is set, IF-NOT-SET otherwise.
AC_DEFUN([_AM_IF_OPTION],
[m4_ifset(_AM_MANGLE_OPTION([$1]), [$2], [$3])])
# Check to make sure that the build environment is sane. -*- Autoconf -*-
# Copyright (C) 1996, 1997, 2000, 2001, 2003, 2005
# Free Software Foundation, Inc.
#
# This file is free software; the Free Software Foundation
# gives unlimited permission to copy and/or distribute it,
# with or without modifications, as long as this notice is preserved.
# serial 4
# AM_SANITY_CHECK
# ---------------
AC_DEFUN([AM_SANITY_CHECK],
[AC_MSG_CHECKING([whether build environment is sane])
# Just in case
sleep 1
echo timestamp > conftest.file
# Do `set' in a subshell so we don't clobber the current shell's
# arguments. Must try -L first in case configure is actually a
# symlink; some systems play weird games with the mod time of symlinks
# (eg FreeBSD returns the mod time of the symlink's containing
# directory).
if (
set X `ls -Lt $srcdir/configure conftest.file 2> /dev/null`
if test "$[*]" = "X"; then
# -L didn't work.
set X `ls -t $srcdir/configure conftest.file`
fi
rm -f conftest.file
if test "$[*]" != "X $srcdir/configure conftest.file" \
&& test "$[*]" != "X conftest.file $srcdir/configure"; then
# If neither matched, then we have a broken ls. This can happen
# if, for instance, CONFIG_SHELL is bash and it inherits a
# broken ls alias from the environment. This has actually
# happened. Such a system could not be considered "sane".
AC_MSG_ERROR([ls -t appears to fail. Make sure there is not a broken
alias in your environment])
fi
test "$[2]" = conftest.file
)
then
# Ok.
:
else
AC_MSG_ERROR([newly created file is older than distributed files!
Check your system clock])
fi
AC_MSG_RESULT(yes)])
# Copyright (C) 2001, 2003, 2005 Free Software Foundation, Inc.
#
# This file is free software; the Free Software Foundation
# gives unlimited permission to copy and/or distribute it,
# with or without modifications, as long as this notice is preserved.
# AM_PROG_INSTALL_STRIP
# ---------------------
# One issue with vendor `install' (even GNU) is that you can't
# specify the program used to strip binaries. This is especially
# annoying in cross-compiling environments, where the build's strip
# is unlikely to handle the host's binaries.
# Fortunately install-sh will honor a STRIPPROG variable, so we
# always use install-sh in `make install-strip', and initialize
# STRIPPROG with the value of the STRIP variable (set by the user).
AC_DEFUN([AM_PROG_INSTALL_STRIP],
[AC_REQUIRE([AM_PROG_INSTALL_SH])dnl
# Installed binaries are usually stripped using `strip' when the user
# runs `make install-strip'. However `strip' might not be the right
# tool to use in cross-compilation environments, therefore Automake
# will honor the `STRIP' environment variable to overrule this program.
dnl Don't test for $cross_compiling = yes, because it might be `maybe'.
if test "$cross_compiling" != no; then
AC_CHECK_TOOL([STRIP], [strip], :)
fi
INSTALL_STRIP_PROGRAM="\${SHELL} \$(install_sh) -c -s"
AC_SUBST([INSTALL_STRIP_PROGRAM])])
# Check how to create a tarball. -*- Autoconf -*-
# Copyright (C) 2004, 2005 Free Software Foundation, Inc.
#
# This file is free software; the Free Software Foundation
# gives unlimited permission to copy and/or distribute it,
# with or without modifications, as long as this notice is preserved.
# serial 2
# _AM_PROG_TAR(FORMAT)
# --------------------
# Check how to create a tarball in format FORMAT.
# FORMAT should be one of `v7', `ustar', or `pax'.
#
# Substitute a variable $(am__tar) that is a command
# writing to stdout a FORMAT-tarball containing the directory
# $tardir.
# tardir=directory && $(am__tar) > result.tar
#
# Substitute a variable $(am__untar) that extracts such
# a tarball read from stdin.
# $(am__untar) < result.tar
AC_DEFUN([_AM_PROG_TAR],
[# Always define AMTAR for backward compatibility.
AM_MISSING_PROG([AMTAR], [tar])
m4_if([$1], [v7],
[am__tar='${AMTAR} chof - "$$tardir"'; am__untar='${AMTAR} xf -'],
[m4_case([$1], [ustar],, [pax],,
[m4_fatal([Unknown tar format])])
AC_MSG_CHECKING([how to create a $1 tar archive])
# Loop over all known methods to create a tar archive until one works.
_am_tools='gnutar m4_if([$1], [ustar], [plaintar]) pax cpio none'
_am_tools=${am_cv_prog_tar_$1-$_am_tools}
# Do not fold the above two lines into one, because Tru64 sh and
# Solaris sh will not grok spaces in the rhs of `-'.
for _am_tool in $_am_tools
do
case $_am_tool in
gnutar)
for _am_tar in tar gnutar gtar;
do
AM_RUN_LOG([$_am_tar --version]) && break
done
am__tar="$_am_tar --format=m4_if([$1], [pax], [posix], [$1]) -chf - "'"$$tardir"'
am__tar_="$_am_tar --format=m4_if([$1], [pax], [posix], [$1]) -chf - "'"$tardir"'
am__untar="$_am_tar -xf -"
;;
plaintar)
# Must skip GNU tar: if it does not support --format= it doesn't create
# ustar tarball either.
(tar --version) >/dev/null 2>&1 && continue
am__tar='tar chf - "$$tardir"'
am__tar_='tar chf - "$tardir"'
am__untar='tar xf -'
;;
pax)
am__tar='pax -L -x $1 -w "$$tardir"'
am__tar_='pax -L -x $1 -w "$tardir"'
am__untar='pax -r'
;;
cpio)
am__tar='find "$$tardir" -print | cpio -o -H $1 -L'
am__tar_='find "$tardir" -print | cpio -o -H $1 -L'
am__untar='cpio -i -H $1 -d'
;;
none)
am__tar=false
am__tar_=false
am__untar=false
;;
esac
# If the value was cached, stop now. We just wanted to have am__tar
# and am__untar set.
test -n "${am_cv_prog_tar_$1}" && break
# tar/untar a dummy directory, and stop if the command works
rm -rf conftest.dir
mkdir conftest.dir
echo GrepMe > conftest.dir/file
AM_RUN_LOG([tardir=conftest.dir && eval $am__tar_ >conftest.tar])
rm -rf conftest.dir
if test -s conftest.tar; then
AM_RUN_LOG([$am__untar <conftest.tar])
grep GrepMe conftest.dir/file >/dev/null 2>&1 && break
fi
done
rm -rf conftest.dir
AC_CACHE_VAL([am_cv_prog_tar_$1], [am_cv_prog_tar_$1=$_am_tool])
AC_MSG_RESULT([$am_cv_prog_tar_$1])])
AC_SUBST([am__tar])
AC_SUBST([am__untar])
]) # _AM_PROG_TAR
m4_include([acinclude.m4])
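
Note on aclocal.m4: the AM_PROG_MKDIR_P probe can be tried outside of configure. The following is a minimal shell sketch of that probe, not Automake's own code. The function name `probe_mkdir_p`, the use of `mktemp -d` for a scratch directory, and the literal fallback string `install-sh -d` are assumptions made for illustration only.

```shell
#!/bin/sh
# Hypothetical helper that mirrors the AM_PROG_MKDIR_P test.
# Prints the command a build could use to create directory trees.
probe_mkdir_p() {
  # Run in a scratch directory so that a mkdir which treats options
  # as directory names cannot litter the caller's tree.
  scratch=$(mktemp -d) || return 1
  (
    cd "$scratch" || exit 1
    # A mkdir that understands --version prints version text and
    # creates nothing; one that does not may create ./--version.
    if mkdir -p --version . >/dev/null 2>&1 && test ! -d ./--version; then
      echo 'mkdir -p --'
    else
      # Clean up directories created by a mkdir that ignores options.
      for d in ./-p ./--version; do
        test -d "$d" && rmdir "$d"
      done
      # Illustrative fallback; the macro itself substitutes
      # $(mkinstalldirs) or $(install_sh) -d.
      echo 'install-sh -d'
    fi
  )
  status=$?
  rm -rf "$scratch"
  return $status
}
```

Calling `probe_mkdir_p` prints one of the two strings, depending on which `mkdir` is first in `PATH`.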

41
ccmain/Makefile.am Normal file

@ -0,0 +1,41 @@
SUBDIRS =
AM_CPPFLAGS = \
-I$(top_srcdir)/ccutil -I$(top_srcdir)/ccstruct \
-I$(top_srcdir)/image -I$(top_srcdir)/viewer \
-I$(top_srcdir)/ccops -I$(top_srcdir)/dict \
-I$(top_srcdir)/classify -I$(top_srcdir)/display \
-I$(top_srcdir)/wordrec -I$(top_srcdir)/cutil \
-I$(top_srcdir)/textord
EXTRA_DIST = \
adaptions.h applybox.h baseapi.h blobcmp.h \
callnet.h charcut.h \
control.h docqual.h expandblob.h fixspace.h fixxht.h \
imgscale.h matmatch.h output.h paircmp.h reject.h scaleimg.h \
tessbox.h tessedit.h tesseractmain.h tessvars.h tfacep.h \
tessembedded.h tfacepp.h tstruct.h werdit.h
noinst_LIBRARIES = libtesseract_main.a
libtesseract_main_a_SOURCES = \
tessedit.cpp adaptions.cpp applybox.cpp \
baseapi.cpp blobcmp.cpp \
callnet.cpp charcut.cpp charsample.cpp control.cpp \
docqual.cpp expandblob.cpp fixspace.cpp fixxht.cpp \
imgscale.cpp matmatch.cpp output.cpp paircmp.cpp \
reject.cpp scaleimg.cpp tessbox.cpp tessvars.cpp \
tfacepp.cpp tstruct.cpp werdit.cpp
bin_PROGRAMS = tesseract
tesseract_SOURCES = tesseractmain.cpp
tesseract_LDADD = \
libtesseract_main.a \
../display/libtesseract_display.a \
../textord/libtesseract_textord.a \
../wordrec/libtesseract_wordrec.a \
../classify/libtesseract_classify.a \
../dict/libtesseract_dict.a \
../viewer/libtesseract_viewer.a \
../image/libtesseract_image.a \
../cutil/libtesseract_cutil.a \
../ccstruct/libtesseract_ccstruct.a \
../ccutil/libtesseract_ccutil.a
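
Note on ccmain/Makefile.am: files listed in `EXTRA_DIST` are shipped by `make dist`, so it can be useful to compare that list against `libtesseract_main_a_SOURCES`. The sketch below is a hypothetical check, not part of the build. The helper name `missing_headers` is an assumption, and the two lists are copied by hand from the Makefile.am above. Whether a reported omission is intentional is not stated in this commit.

```shell
#!/bin/sh
# Header list copied from EXTRA_DIST in ccmain/Makefile.am.
headers="adaptions.h applybox.h baseapi.h blobcmp.h \
callnet.h charcut.h \
control.h docqual.h expandblob.h fixspace.h fixxht.h \
imgscale.h matmatch.h output.h paircmp.h reject.h scaleimg.h \
tessbox.h tessedit.h tesseractmain.h tessvars.h tfacep.h \
tessembedded.h tfacepp.h tstruct.h werdit.h"

# Source list copied from libtesseract_main_a_SOURCES.
sources="tessedit.cpp adaptions.cpp applybox.cpp \
baseapi.cpp blobcmp.cpp \
callnet.cpp charcut.cpp charsample.cpp control.cpp \
docqual.cpp expandblob.cpp fixspace.cpp fixxht.cpp \
imgscale.cpp matmatch.cpp output.cpp paircmp.cpp \
reject.cpp scaleimg.cpp tessbox.cpp tessvars.cpp \
tfacepp.cpp tstruct.cpp werdit.cpp"

# Hypothetical helper: print every source stem that has no
# matching NAME.h entry in the header list.
missing_headers() {
  for src in $sources; do
    stem=${src%.cpp}
    # Pad with spaces so that tfacep.h does not match tfacepp.h.
    case " $headers " in
      *" $stem.h "*) ;;
      *) echo "$stem" ;;
    esac
  done
}

missing_headers   # prints: charsample
```

With the lists as committed, the only stem reported is `charsample`: `charsample.cpp` is compiled, but `charsample.h` does not appear in `EXTRA_DIST`.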

636
ccmain/Makefile.in Normal file

@ -0,0 +1,636 @@
# Makefile.in generated by automake 1.9.6 from Makefile.am.
# @configure_input@
# Copyright (C) 1994, 1995, 1996, 1997, 1998, 1999, 2000, 2001, 2002,
# 2003, 2004, 2005 Free Software Foundation, Inc.
# This Makefile.in is free software; the Free Software Foundation
# gives unlimited permission to copy and/or distribute it,
# with or without modifications, as long as this notice is preserved.
# This program is distributed in the hope that it will be useful,
# but WITHOUT ANY WARRANTY, to the extent permitted by law; without
# even the implied warranty of MERCHANTABILITY or FITNESS FOR A
# PARTICULAR PURPOSE.
@SET_MAKE@
srcdir = @srcdir@
top_srcdir = @top_srcdir@
VPATH = @srcdir@
pkgdatadir = $(datadir)/@PACKAGE@
pkglibdir = $(libdir)/@PACKAGE@
pkgincludedir = $(includedir)/@PACKAGE@
top_builddir = ..
am__cd = CDPATH="$${ZSH_VERSION+.}$(PATH_SEPARATOR)" && cd
INSTALL = @INSTALL@
install_sh_DATA = $(install_sh) -c -m 644
install_sh_PROGRAM = $(install_sh) -c
install_sh_SCRIPT = $(install_sh) -c
INSTALL_HEADER = $(INSTALL_DATA)
transform = $(program_transform_name)
NORMAL_INSTALL = :
PRE_INSTALL = :
POST_INSTALL = :
NORMAL_UNINSTALL = :
PRE_UNINSTALL = :
POST_UNINSTALL = :
build_triplet = @build@
host_triplet = @host@
bin_PROGRAMS = tesseract$(EXEEXT)
subdir = ccmain
DIST_COMMON = $(srcdir)/Makefile.am $(srcdir)/Makefile.in
ACLOCAL_M4 = $(top_srcdir)/aclocal.m4
am__aclocal_m4_deps = $(top_srcdir)/acinclude.m4 \
$(top_srcdir)/config/ac_define_versionlevel.m4 \
$(top_srcdir)/config/acinclude_custom.m4 \
$(top_srcdir)/configure.ac
am__configure_deps = $(am__aclocal_m4_deps) $(CONFIGURE_DEPENDENCIES) \
$(ACLOCAL_M4)
mkinstalldirs = $(SHELL) $(top_srcdir)/config/mkinstalldirs
CONFIG_HEADER = $(top_builddir)/config_auto.h
CONFIG_CLEAN_FILES =
LIBRARIES = $(noinst_LIBRARIES)
AR = ar
ARFLAGS = cru
libtesseract_main_a_AR = $(AR) $(ARFLAGS)
libtesseract_main_a_LIBADD =
am_libtesseract_main_a_OBJECTS = tessedit.$(OBJEXT) \
adaptions.$(OBJEXT) applybox.$(OBJEXT) baseapi.$(OBJEXT) \
blobcmp.$(OBJEXT) callnet.$(OBJEXT) charcut.$(OBJEXT) \
charsample.$(OBJEXT) control.$(OBJEXT) docqual.$(OBJEXT) \
expandblob.$(OBJEXT) fixspace.$(OBJEXT) fixxht.$(OBJEXT) \
imgscale.$(OBJEXT) matmatch.$(OBJEXT) output.$(OBJEXT) \
paircmp.$(OBJEXT) reject.$(OBJEXT) scaleimg.$(OBJEXT) \
tessbox.$(OBJEXT) tessvars.$(OBJEXT) tfacepp.$(OBJEXT) \
tstruct.$(OBJEXT) werdit.$(OBJEXT)
libtesseract_main_a_OBJECTS = $(am_libtesseract_main_a_OBJECTS)
am__installdirs = "$(DESTDIR)$(bindir)"
binPROGRAMS_INSTALL = $(INSTALL_PROGRAM)
PROGRAMS = $(bin_PROGRAMS)
am_tesseract_OBJECTS = tesseractmain.$(OBJEXT)
tesseract_OBJECTS = $(am_tesseract_OBJECTS)
tesseract_DEPENDENCIES = libtesseract_main.a \
../display/libtesseract_display.a \
../textord/libtesseract_textord.a \
../wordrec/libtesseract_wordrec.a \
../classify/libtesseract_classify.a \
../dict/libtesseract_dict.a ../viewer/libtesseract_viewer.a \
../image/libtesseract_image.a ../cutil/libtesseract_cutil.a \
../ccstruct/libtesseract_ccstruct.a \
../ccutil/libtesseract_ccutil.a
DEFAULT_INCLUDES = -I. -I$(srcdir) -I$(top_builddir)
depcomp = $(SHELL) $(top_srcdir)/config/depcomp
am__depfiles_maybe = depfiles
CXXCOMPILE = $(CXX) $(DEFS) $(DEFAULT_INCLUDES) $(INCLUDES) \
$(AM_CPPFLAGS) $(CPPFLAGS) $(AM_CXXFLAGS) $(CXXFLAGS)
CXXLD = $(CXX)
CXXLINK = $(CXXLD) $(AM_CXXFLAGS) $(CXXFLAGS) $(AM_LDFLAGS) $(LDFLAGS) \
-o $@
SOURCES = $(libtesseract_main_a_SOURCES) $(tesseract_SOURCES)
DIST_SOURCES = $(libtesseract_main_a_SOURCES) $(tesseract_SOURCES)
RECURSIVE_TARGETS = all-recursive check-recursive dvi-recursive \
html-recursive info-recursive install-data-recursive \
install-exec-recursive install-info-recursive \
install-recursive installcheck-recursive installdirs-recursive \
pdf-recursive ps-recursive uninstall-info-recursive \
uninstall-recursive
ETAGS = etags
CTAGS = ctags
DIST_SUBDIRS = $(SUBDIRS)
DISTFILES = $(DIST_COMMON) $(DIST_SOURCES) $(TEXINFOS) $(EXTRA_DIST)
ACLOCAL = @ACLOCAL@
AMDEP_FALSE = @AMDEP_FALSE@
AMDEP_TRUE = @AMDEP_TRUE@
AMTAR = @AMTAR@
AUTOCONF = @AUTOCONF@
AUTOHEADER = @AUTOHEADER@
AUTOMAKE = @AUTOMAKE@
AWK = @AWK@
CC = @CC@
CCDEPMODE = @CCDEPMODE@
CFLAGS = @CFLAGS@
CPPFLAGS = @CPPFLAGS@
CXX = @CXX@
CXXCPP = @CXXCPP@
CXXDEPMODE = @CXXDEPMODE@
CXXFLAGS = @CXXFLAGS@
CXXRPOFLAGS = @CXXRPOFLAGS@
CYGPATH_W = @CYGPATH_W@
DEFS = @DEFS@
DEPDIR = @DEPDIR@
ECHO_C = @ECHO_C@
ECHO_N = @ECHO_N@
ECHO_T = @ECHO_T@
EGREP = @EGREP@
EXEEXT = @EXEEXT@
GNUWIN32_DIR = @GNUWIN32_DIR@
HAVE_GNUWIN32_FALSE = @HAVE_GNUWIN32_FALSE@
HAVE_GNUWIN32_TRUE = @HAVE_GNUWIN32_TRUE@
HAVE_LIBTIFF_FALSE = @HAVE_LIBTIFF_FALSE@
HAVE_LIBTIFF_TRUE = @HAVE_LIBTIFF_TRUE@
INSTALL_DATA = @INSTALL_DATA@
INSTALL_PROGRAM = @INSTALL_PROGRAM@
INSTALL_SCRIPT = @INSTALL_SCRIPT@
INSTALL_STRIP_PROGRAM = @INSTALL_STRIP_PROGRAM@
LDFLAGS = @LDFLAGS@
LIBOBJS = @LIBOBJS@
LIBS = @LIBS@
LIBTIFF_CFLAGS = @LIBTIFF_CFLAGS@
LIBTIFF_LIBS = @LIBTIFF_LIBS@
LTLIBOBJS = @LTLIBOBJS@
MAINT = @MAINT@
MAINTAINER_MODE_FALSE = @MAINTAINER_MODE_FALSE@
MAINTAINER_MODE_TRUE = @MAINTAINER_MODE_TRUE@
MAKEINFO = @MAKEINFO@
OBJEXT = @OBJEXT@
OPTS = @OPTS@
PACKAGE = @PACKAGE@
PACKAGE_BUGREPORT = @PACKAGE_BUGREPORT@
PACKAGE_DATE = @PACKAGE_DATE@
PACKAGE_NAME = @PACKAGE_NAME@
PACKAGE_STRING = @PACKAGE_STRING@
PACKAGE_TARNAME = @PACKAGE_TARNAME@
PACKAGE_VERSION = @PACKAGE_VERSION@
PACKAGE_YEAR = @PACKAGE_YEAR@
PATH_SEPARATOR = @PATH_SEPARATOR@
RANLIB = @RANLIB@
RPO_NO = @RPO_NO@
RPO_YES = @RPO_YES@
SET_MAKE = @SET_MAKE@
SHELL = @SHELL@
STRIP = @STRIP@
USING_CL_FALSE = @USING_CL_FALSE@
USING_CL_TRUE = @USING_CL_TRUE@
VERSION = @VERSION@
ac_ct_CC = @ac_ct_CC@
ac_ct_CXX = @ac_ct_CXX@
ac_ct_RANLIB = @ac_ct_RANLIB@
ac_ct_STRIP = @ac_ct_STRIP@
am__fastdepCC_FALSE = @am__fastdepCC_FALSE@
am__fastdepCC_TRUE = @am__fastdepCC_TRUE@
am__fastdepCXX_FALSE = @am__fastdepCXX_FALSE@
am__fastdepCXX_TRUE = @am__fastdepCXX_TRUE@
am__include = @am__include@
am__leading_dot = @am__leading_dot@
am__quote = @am__quote@
am__tar = @am__tar@
am__untar = @am__untar@
bindir = @bindir@
build = @build@
build_alias = @build_alias@
build_cpu = @build_cpu@
build_os = @build_os@
build_vendor = @build_vendor@
datadir = @datadir@
exec_prefix = @exec_prefix@
host = @host@
host_alias = @host_alias@
host_cpu = @host_cpu@
host_os = @host_os@
host_vendor = @host_vendor@
includedir = @includedir@
infodir = @infodir@
install_sh = @install_sh@
libdir = @libdir@
libexecdir = @libexecdir@
localstatedir = @localstatedir@
mandir = @mandir@
mkdir_p = @mkdir_p@
oldincludedir = @oldincludedir@
prefix = @prefix@
program_transform_name = @program_transform_name@
sbindir = @sbindir@
sharedstatedir = @sharedstatedir@
sysconfdir = @sysconfdir@
target_alias = @target_alias@
SUBDIRS =
AM_CPPFLAGS = \
-I$(top_srcdir)/ccutil -I$(top_srcdir)/ccstruct \
-I$(top_srcdir)/image -I$(top_srcdir)/viewer \
-I$(top_srcdir)/ccops -I$(top_srcdir)/dict \
-I$(top_srcdir)/classify -I$(top_srcdir)/display \
-I$(top_srcdir)/wordrec -I$(top_srcdir)/cutil \
-I$(top_srcdir)/textord
EXTRA_DIST = \
adaptions.h applybox.h baseapi.h blobcmp.h \
callnet.h charcut.h \
control.h docqual.h expandblob.h fixspace.h fixxht.h \
imgscale.h matmatch.h output.h paircmp.h reject.h scaleimg.h \
tessbox.h tessedit.h tesseractmain.h tessvars.h tfacep.h \
tessembedded.h tfacepp.h tstruct.h werdit.h
noinst_LIBRARIES = libtesseract_main.a
libtesseract_main_a_SOURCES = \
tessedit.cpp adaptions.cpp applybox.cpp \
baseapi.cpp blobcmp.cpp \
callnet.cpp charcut.cpp charsample.cpp control.cpp \
docqual.cpp expandblob.cpp fixspace.cpp fixxht.cpp \
imgscale.cpp matmatch.cpp output.cpp paircmp.cpp \
reject.cpp scaleimg.cpp tessbox.cpp tessvars.cpp \
tfacepp.cpp tstruct.cpp werdit.cpp
tesseract_SOURCES = tesseractmain.cpp
tesseract_LDADD = \
libtesseract_main.a \
../display/libtesseract_display.a \
../textord/libtesseract_textord.a \
../wordrec/libtesseract_wordrec.a \
../classify/libtesseract_classify.a \
../dict/libtesseract_dict.a \
../viewer/libtesseract_viewer.a \
../image/libtesseract_image.a \
../cutil/libtesseract_cutil.a \
../ccstruct/libtesseract_ccstruct.a \
../ccutil/libtesseract_ccutil.a
all: all-recursive
.SUFFIXES:
.SUFFIXES: .cpp .o .obj
$(srcdir)/Makefile.in: @MAINTAINER_MODE_TRUE@ $(srcdir)/Makefile.am $(am__configure_deps)
@for dep in $?; do \
case '$(am__configure_deps)' in \
*$$dep*) \
cd $(top_builddir) && $(MAKE) $(AM_MAKEFLAGS) am--refresh \
&& exit 0; \
exit 1;; \
esac; \
done; \
echo ' cd $(top_srcdir) && $(AUTOMAKE) --gnu ccmain/Makefile'; \
cd $(top_srcdir) && \
$(AUTOMAKE) --gnu ccmain/Makefile
.PRECIOUS: Makefile
Makefile: $(srcdir)/Makefile.in $(top_builddir)/config.status
@case '$?' in \
*config.status*) \
cd $(top_builddir) && $(MAKE) $(AM_MAKEFLAGS) am--refresh;; \
*) \
echo ' cd $(top_builddir) && $(SHELL) ./config.status $(subdir)/$@ $(am__depfiles_maybe)'; \
cd $(top_builddir) && $(SHELL) ./config.status $(subdir)/$@ $(am__depfiles_maybe);; \
esac;
$(top_builddir)/config.status: $(top_srcdir)/configure $(CONFIG_STATUS_DEPENDENCIES)
cd $(top_builddir) && $(MAKE) $(AM_MAKEFLAGS) am--refresh
$(top_srcdir)/configure: @MAINTAINER_MODE_TRUE@ $(am__configure_deps)
cd $(top_builddir) && $(MAKE) $(AM_MAKEFLAGS) am--refresh
$(ACLOCAL_M4): @MAINTAINER_MODE_TRUE@ $(am__aclocal_m4_deps)
cd $(top_builddir) && $(MAKE) $(AM_MAKEFLAGS) am--refresh
clean-noinstLIBRARIES:
-test -z "$(noinst_LIBRARIES)" || rm -f $(noinst_LIBRARIES)
libtesseract_main.a: $(libtesseract_main_a_OBJECTS) $(libtesseract_main_a_DEPENDENCIES)
-rm -f libtesseract_main.a
$(libtesseract_main_a_AR) libtesseract_main.a $(libtesseract_main_a_OBJECTS) $(libtesseract_main_a_LIBADD)
$(RANLIB) libtesseract_main.a
install-binPROGRAMS: $(bin_PROGRAMS)
@$(NORMAL_INSTALL)
test -z "$(bindir)" || $(mkdir_p) "$(DESTDIR)$(bindir)"
@list='$(bin_PROGRAMS)'; for p in $$list; do \
p1=`echo $$p|sed 's/$(EXEEXT)$$//'`; \
if test -f $$p \
; then \
f=`echo "$$p1" | sed 's,^.*/,,;$(transform);s/$$/$(EXEEXT)/'`; \
echo " $(INSTALL_PROGRAM_ENV) $(binPROGRAMS_INSTALL) '$$p' '$(DESTDIR)$(bindir)/$$f'"; \
$(INSTALL_PROGRAM_ENV) $(binPROGRAMS_INSTALL) "$$p" "$(DESTDIR)$(bindir)/$$f" || exit 1; \
else :; fi; \
done
uninstall-binPROGRAMS:
@$(NORMAL_UNINSTALL)
@list='$(bin_PROGRAMS)'; for p in $$list; do \
f=`echo "$$p" | sed 's,^.*/,,;s/$(EXEEXT)$$//;$(transform);s/$$/$(EXEEXT)/'`; \
echo " rm -f '$(DESTDIR)$(bindir)/$$f'"; \
rm -f "$(DESTDIR)$(bindir)/$$f"; \
done
clean-binPROGRAMS:
-test -z "$(bin_PROGRAMS)" || rm -f $(bin_PROGRAMS)
tesseract$(EXEEXT): $(tesseract_OBJECTS) $(tesseract_DEPENDENCIES)
@rm -f tesseract$(EXEEXT)
$(CXXLINK) $(tesseract_LDFLAGS) $(tesseract_OBJECTS) $(tesseract_LDADD) $(LIBS)
mostlyclean-compile:
-rm -f *.$(OBJEXT)
distclean-compile:
-rm -f *.tab.c
@AMDEP_TRUE@@am__include@ @am__quote@./$(DEPDIR)/adaptions.Po@am__quote@
@AMDEP_TRUE@@am__include@ @am__quote@./$(DEPDIR)/applybox.Po@am__quote@
@AMDEP_TRUE@@am__include@ @am__quote@./$(DEPDIR)/baseapi.Po@am__quote@
@AMDEP_TRUE@@am__include@ @am__quote@./$(DEPDIR)/blobcmp.Po@am__quote@
@AMDEP_TRUE@@am__include@ @am__quote@./$(DEPDIR)/callnet.Po@am__quote@
@AMDEP_TRUE@@am__include@ @am__quote@./$(DEPDIR)/charcut.Po@am__quote@
@AMDEP_TRUE@@am__include@ @am__quote@./$(DEPDIR)/charsample.Po@am__quote@
@AMDEP_TRUE@@am__include@ @am__quote@./$(DEPDIR)/control.Po@am__quote@
@AMDEP_TRUE@@am__include@ @am__quote@./$(DEPDIR)/docqual.Po@am__quote@
@AMDEP_TRUE@@am__include@ @am__quote@./$(DEPDIR)/expandblob.Po@am__quote@
@AMDEP_TRUE@@am__include@ @am__quote@./$(DEPDIR)/fixspace.Po@am__quote@
@AMDEP_TRUE@@am__include@ @am__quote@./$(DEPDIR)/fixxht.Po@am__quote@
@AMDEP_TRUE@@am__include@ @am__quote@./$(DEPDIR)/imgscale.Po@am__quote@
@AMDEP_TRUE@@am__include@ @am__quote@./$(DEPDIR)/matmatch.Po@am__quote@
@AMDEP_TRUE@@am__include@ @am__quote@./$(DEPDIR)/output.Po@am__quote@
@AMDEP_TRUE@@am__include@ @am__quote@./$(DEPDIR)/paircmp.Po@am__quote@
@AMDEP_TRUE@@am__include@ @am__quote@./$(DEPDIR)/reject.Po@am__quote@
@AMDEP_TRUE@@am__include@ @am__quote@./$(DEPDIR)/scaleimg.Po@am__quote@
@AMDEP_TRUE@@am__include@ @am__quote@./$(DEPDIR)/tessbox.Po@am__quote@
@AMDEP_TRUE@@am__include@ @am__quote@./$(DEPDIR)/tessedit.Po@am__quote@
@AMDEP_TRUE@@am__include@ @am__quote@./$(DEPDIR)/tesseractmain.Po@am__quote@
@AMDEP_TRUE@@am__include@ @am__quote@./$(DEPDIR)/tessvars.Po@am__quote@
@AMDEP_TRUE@@am__include@ @am__quote@./$(DEPDIR)/tfacepp.Po@am__quote@
@AMDEP_TRUE@@am__include@ @am__quote@./$(DEPDIR)/tstruct.Po@am__quote@
@AMDEP_TRUE@@am__include@ @am__quote@./$(DEPDIR)/werdit.Po@am__quote@
.cpp.o:
@am__fastdepCXX_TRUE@ if $(CXXCOMPILE) -MT $@ -MD -MP -MF "$(DEPDIR)/$*.Tpo" -c -o $@ $<; \
@am__fastdepCXX_TRUE@ then mv -f "$(DEPDIR)/$*.Tpo" "$(DEPDIR)/$*.Po"; else rm -f "$(DEPDIR)/$*.Tpo"; exit 1; fi
@AMDEP_TRUE@@am__fastdepCXX_FALSE@ source='$<' object='$@' libtool=no @AMDEPBACKSLASH@
@AMDEP_TRUE@@am__fastdepCXX_FALSE@ DEPDIR=$(DEPDIR) $(CXXDEPMODE) $(depcomp) @AMDEPBACKSLASH@
@am__fastdepCXX_FALSE@ $(CXXCOMPILE) -c -o $@ $<
.cpp.obj:
@am__fastdepCXX_TRUE@ if $(CXXCOMPILE) -MT $@ -MD -MP -MF "$(DEPDIR)/$*.Tpo" -c -o $@ `$(CYGPATH_W) '$<'`; \
@am__fastdepCXX_TRUE@ then mv -f "$(DEPDIR)/$*.Tpo" "$(DEPDIR)/$*.Po"; else rm -f "$(DEPDIR)/$*.Tpo"; exit 1; fi
@AMDEP_TRUE@@am__fastdepCXX_FALSE@ source='$<' object='$@' libtool=no @AMDEPBACKSLASH@
@AMDEP_TRUE@@am__fastdepCXX_FALSE@ DEPDIR=$(DEPDIR) $(CXXDEPMODE) $(depcomp) @AMDEPBACKSLASH@
@am__fastdepCXX_FALSE@ $(CXXCOMPILE) -c -o $@ `$(CYGPATH_W) '$<'`
uninstall-info-am:
# This directory's subdirectories are mostly independent; you can cd
# into them and run `make' without going through this Makefile.
# To change the values of `make' variables: instead of editing Makefiles,
# (1) if the variable is set in `config.status', edit `config.status'
# (which will cause the Makefiles to be regenerated when you run `make');
# (2) otherwise, pass the desired values on the `make' command line.
$(RECURSIVE_TARGETS):
@failcom='exit 1'; \
for f in x $$MAKEFLAGS; do \
case $$f in \
*=* | --[!k]*);; \
*k*) failcom='fail=yes';; \
esac; \
done; \
dot_seen=no; \
target=`echo $@ | sed s/-recursive//`; \
list='$(SUBDIRS)'; for subdir in $$list; do \
echo "Making $$target in $$subdir"; \
if test "$$subdir" = "."; then \
dot_seen=yes; \
local_target="$$target-am"; \
else \
local_target="$$target"; \
fi; \
(cd $$subdir && $(MAKE) $(AM_MAKEFLAGS) $$local_target) \
|| eval $$failcom; \
done; \
if test "$$dot_seen" = "no"; then \
$(MAKE) $(AM_MAKEFLAGS) "$$target-am" || exit 1; \
fi; test -z "$$fail"
mostlyclean-recursive clean-recursive distclean-recursive \
maintainer-clean-recursive:
@failcom='exit 1'; \
for f in x $$MAKEFLAGS; do \
case $$f in \
*=* | --[!k]*);; \
*k*) failcom='fail=yes';; \
esac; \
done; \
dot_seen=no; \
case "$@" in \
distclean-* | maintainer-clean-*) list='$(DIST_SUBDIRS)' ;; \
*) list='$(SUBDIRS)' ;; \
esac; \
rev=''; for subdir in $$list; do \
if test "$$subdir" = "."; then :; else \
rev="$$subdir $$rev"; \
fi; \
done; \
rev="$$rev ."; \
target=`echo $@ | sed s/-recursive//`; \
for subdir in $$rev; do \
echo "Making $$target in $$subdir"; \
if test "$$subdir" = "."; then \
local_target="$$target-am"; \
else \
local_target="$$target"; \
fi; \
(cd $$subdir && $(MAKE) $(AM_MAKEFLAGS) $$local_target) \
|| eval $$failcom; \
done && test -z "$$fail"
tags-recursive:
list='$(SUBDIRS)'; for subdir in $$list; do \
test "$$subdir" = . || (cd $$subdir && $(MAKE) $(AM_MAKEFLAGS) tags); \
done
ctags-recursive:
list='$(SUBDIRS)'; for subdir in $$list; do \
test "$$subdir" = . || (cd $$subdir && $(MAKE) $(AM_MAKEFLAGS) ctags); \
done
ID: $(HEADERS) $(SOURCES) $(LISP) $(TAGS_FILES)
list='$(SOURCES) $(HEADERS) $(LISP) $(TAGS_FILES)'; \
unique=`for i in $$list; do \
if test -f "$$i"; then echo $$i; else echo $(srcdir)/$$i; fi; \
done | \
$(AWK) ' { files[$$0] = 1; } \
END { for (i in files) print i; }'`; \
mkid -fID $$unique
tags: TAGS
TAGS: tags-recursive $(HEADERS) $(SOURCES) $(TAGS_DEPENDENCIES) \
$(TAGS_FILES) $(LISP)
tags=; \
here=`pwd`; \
if ($(ETAGS) --etags-include --version) >/dev/null 2>&1; then \
include_option=--etags-include; \
empty_fix=.; \
else \
include_option=--include; \
empty_fix=; \
fi; \
list='$(SUBDIRS)'; for subdir in $$list; do \
if test "$$subdir" = .; then :; else \
test ! -f $$subdir/TAGS || \
tags="$$tags $$include_option=$$here/$$subdir/TAGS"; \
fi; \
done; \
list='$(SOURCES) $(HEADERS) $(LISP) $(TAGS_FILES)'; \
unique=`for i in $$list; do \
if test -f "$$i"; then echo $$i; else echo $(srcdir)/$$i; fi; \
done | \
$(AWK) ' { files[$$0] = 1; } \
END { for (i in files) print i; }'`; \
if test -z "$(ETAGS_ARGS)$$tags$$unique"; then :; else \
test -n "$$unique" || unique=$$empty_fix; \
$(ETAGS) $(ETAGSFLAGS) $(AM_ETAGSFLAGS) $(ETAGS_ARGS) \
$$tags $$unique; \
fi
ctags: CTAGS
CTAGS: ctags-recursive $(HEADERS) $(SOURCES) $(TAGS_DEPENDENCIES) \
$(TAGS_FILES) $(LISP)
tags=; \
here=`pwd`; \
list='$(SOURCES) $(HEADERS) $(LISP) $(TAGS_FILES)'; \
unique=`for i in $$list; do \
if test -f "$$i"; then echo $$i; else echo $(srcdir)/$$i; fi; \
done | \
$(AWK) ' { files[$$0] = 1; } \
END { for (i in files) print i; }'`; \
test -z "$(CTAGS_ARGS)$$tags$$unique" \
|| $(CTAGS) $(CTAGSFLAGS) $(AM_CTAGSFLAGS) $(CTAGS_ARGS) \
$$tags $$unique
GTAGS:
here=`$(am__cd) $(top_builddir) && pwd` \
&& cd $(top_srcdir) \
&& gtags -i $(GTAGS_ARGS) $$here
distclean-tags:
-rm -f TAGS ID GTAGS GRTAGS GSYMS GPATH tags
distdir: $(DISTFILES)
@srcdirstrip=`echo "$(srcdir)" | sed 's|.|.|g'`; \
topsrcdirstrip=`echo "$(top_srcdir)" | sed 's|.|.|g'`; \
list='$(DISTFILES)'; for file in $$list; do \
case $$file in \
$(srcdir)/*) file=`echo "$$file" | sed "s|^$$srcdirstrip/||"`;; \
$(top_srcdir)/*) file=`echo "$$file" | sed "s|^$$topsrcdirstrip/|$(top_builddir)/|"`;; \
esac; \
if test -f $$file || test -d $$file; then d=.; else d=$(srcdir); fi; \
dir=`echo "$$file" | sed -e 's,/[^/]*$$,,'`; \
if test "$$dir" != "$$file" && test "$$dir" != "."; then \
dir="/$$dir"; \
$(mkdir_p) "$(distdir)$$dir"; \
else \
dir=''; \
fi; \
if test -d $$d/$$file; then \
if test -d $(srcdir)/$$file && test $$d != $(srcdir); then \
cp -pR $(srcdir)/$$file $(distdir)$$dir || exit 1; \
fi; \
cp -pR $$d/$$file $(distdir)$$dir || exit 1; \
else \
test -f $(distdir)/$$file \
|| cp -p $$d/$$file $(distdir)/$$file \
|| exit 1; \
fi; \
done
list='$(DIST_SUBDIRS)'; for subdir in $$list; do \
if test "$$subdir" = .; then :; else \
test -d "$(distdir)/$$subdir" \
|| $(mkdir_p) "$(distdir)/$$subdir" \
|| exit 1; \
distdir=`$(am__cd) $(distdir) && pwd`; \
top_distdir=`$(am__cd) $(top_distdir) && pwd`; \
(cd $$subdir && \
$(MAKE) $(AM_MAKEFLAGS) \
top_distdir="$$top_distdir" \
distdir="$$distdir/$$subdir" \
distdir) \
|| exit 1; \
fi; \
done
check-am: all-am
check: check-recursive
all-am: Makefile $(LIBRARIES) $(PROGRAMS)
installdirs: installdirs-recursive
installdirs-am:
for dir in "$(DESTDIR)$(bindir)"; do \
test -z "$$dir" || $(mkdir_p) "$$dir"; \
done
install: install-recursive
install-exec: install-exec-recursive
install-data: install-data-recursive
uninstall: uninstall-recursive
install-am: all-am
@$(MAKE) $(AM_MAKEFLAGS) install-exec-am install-data-am
installcheck: installcheck-recursive
install-strip:
$(MAKE) $(AM_MAKEFLAGS) INSTALL_PROGRAM="$(INSTALL_STRIP_PROGRAM)" \
install_sh_PROGRAM="$(INSTALL_STRIP_PROGRAM)" INSTALL_STRIP_FLAG=-s \
`test -z '$(STRIP)' || \
echo "INSTALL_PROGRAM_ENV=STRIPPROG='$(STRIP)'"` install
mostlyclean-generic:
clean-generic:
distclean-generic:
-test -z "$(CONFIG_CLEAN_FILES)" || rm -f $(CONFIG_CLEAN_FILES)
maintainer-clean-generic:
@echo "This command is intended for maintainers to use"
@echo "it deletes files that may require special tools to rebuild."
clean: clean-recursive
clean-am: clean-binPROGRAMS clean-generic clean-noinstLIBRARIES \
mostlyclean-am
distclean: distclean-recursive
-rm -rf ./$(DEPDIR)
-rm -f Makefile
distclean-am: clean-am distclean-compile distclean-generic \
distclean-tags
dvi: dvi-recursive
dvi-am:
html: html-recursive
info: info-recursive
info-am:
install-data-am:
install-exec-am: install-binPROGRAMS
install-info: install-info-recursive
install-man:
installcheck-am:
maintainer-clean: maintainer-clean-recursive
-rm -rf ./$(DEPDIR)
-rm -f Makefile
maintainer-clean-am: distclean-am maintainer-clean-generic
mostlyclean: mostlyclean-recursive
mostlyclean-am: mostlyclean-compile mostlyclean-generic
pdf: pdf-recursive
pdf-am:
ps: ps-recursive
ps-am:
uninstall-am: uninstall-binPROGRAMS uninstall-info-am
uninstall-info: uninstall-info-recursive
.PHONY: $(RECURSIVE_TARGETS) CTAGS GTAGS all all-am check check-am \
clean clean-binPROGRAMS clean-generic clean-noinstLIBRARIES \
clean-recursive ctags ctags-recursive distclean \
distclean-compile distclean-generic distclean-recursive \
distclean-tags distdir dvi dvi-am html html-am info info-am \
install install-am install-binPROGRAMS install-data \
install-data-am install-exec install-exec-am install-info \
install-info-am install-man install-strip installcheck \
installcheck-am installdirs installdirs-am maintainer-clean \
maintainer-clean-generic maintainer-clean-recursive \
mostlyclean mostlyclean-compile mostlyclean-generic \
mostlyclean-recursive pdf pdf-am ps ps-am tags tags-recursive \
uninstall uninstall-am uninstall-binPROGRAMS uninstall-info-am
# Tell versions [3.59,3.63) of GNU make to not export all variables.
# Otherwise a system limit (for SysV at least) may be exceeded.
.NOEXPORT:
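
Note on ccmain/Makefile.in: the recursive targets begin by scanning `$$MAKEFLAGS` to decide whether a failure in one subdirectory should stop the whole run (`failcom='exit 1'`) or only be recorded (`failcom='fail=yes'`, used when `make -k` is in effect). The shell sketch below isolates that scan. The function name `keep_going_requested` and passing the flags as an argument are assumptions made for illustration; the generated recipe reads `$$MAKEFLAGS` inline.

```shell
#!/bin/sh
# Hypothetical helper mirroring the MAKEFLAGS scan in the
# $(RECURSIVE_TARGETS) recipe. Returns 0 when a keep-going flag
# is present in the flag string given as $1, and 1 otherwise.
keep_going_requested() {
  # $1 is left unquoted on purpose so that it splits into words,
  # just as $$MAKEFLAGS does in the recipe.
  for f in x $1; do
    case $f in
      # Skip variable assignments and long options that do not
      # start with --k.
      *=* | --[!k]*) ;;
      # Any remaining word containing k counts as keep-going.
      *k*) return 0 ;;
    esac
  done
  return 1
}
```

For example, `keep_going_requested "-k"` succeeds, while `keep_going_requested "CFLAGS=-k"` does not, because the assignment pattern is tested first.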

1078
ccmain/adaptions.cpp Normal file

File diff suppressed because it is too large

109
ccmain/adaptions.h Normal file

@ -0,0 +1,109 @@
/**********************************************************************
* File: adaptions.h (Formerly adaptions.h)
* Description: Functions used to adapt to blobs already confidently
* identified
* Author: Chris Newton
* Created: Thu Oct 7 10:17:28 BST 1993
*
* (C) Copyright 1992, Hewlett-Packard Ltd.
** Licensed under the Apache License, Version 2.0 (the "License");
** you may not use this file except in compliance with the License.
** You may obtain a copy of the License at
** http://www.apache.org/licenses/LICENSE-2.0
** Unless required by applicable law or agreed to in writing, software
** distributed under the License is distributed on an "AS IS" BASIS,
** WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
** See the License for the specific language governing permissions and
** limitations under the License.
*
**********************************************************************/
#ifndef ADAPTIONS_H
#define ADAPTIONS_H
#include "charsample.h"
#include "charcut.h"
#include "notdll.h"
extern BOOL_VAR_H (tessedit_reject_ems, FALSE, "Reject all m's");
extern BOOL_VAR_H (tessedit_reject_suspect_ems, FALSE, "Reject suspect m's");
extern double_VAR_H (tessedit_cluster_t1, 0.20,
"t1 threshold for clustering samples");
extern double_VAR_H (tessedit_cluster_t2, 0.40,
"t2 threshold for clustering samples");
extern double_VAR_H (tessedit_cluster_t3, 0.12,
"Extra threshold for clustering samples, only keep a new sample if best score greater than this value");
extern double_VAR_H (tessedit_cluster_accept_fraction, 0.80,
"Largest fraction of characters in cluster for it to be used for adaption");
extern INT_VAR_H (tessedit_cluster_min_size, 3,
"Smallest number of samples in a cluster for it to be used for adaption");
extern BOOL_VAR_H (tessedit_cluster_debug, FALSE,
"Generate and print debug information for adaption by clustering");
extern BOOL_VAR_H (tessedit_use_best_sample, FALSE,
"Use best sample from cluster when adapting");
extern BOOL_VAR_H (tessedit_test_cluster_input, FALSE,
"Set reject map to enable cluster input to be measured");
extern BOOL_VAR_H (tessedit_matrix_match, TRUE, "Use matrix matcher");
extern BOOL_VAR_H (tessedit_old_matrix_match, FALSE, "Use matrix matcher");
extern BOOL_VAR_H (tessedit_mm_use_non_adaption_set, FALSE,
"Don't try to adapt to characters on this list");
extern STRING_VAR_H (tessedit_non_adaption_set, ",.;:'~@*",
"Characters to be avoided when adapting");
extern BOOL_VAR_H (tessedit_mm_adapt_using_prototypes, TRUE,
"Use prototypes when adapting");
extern BOOL_VAR_H (tessedit_mm_use_prototypes, TRUE,
"Use prototypes as clusters are built");
extern BOOL_VAR_H (tessedit_mm_use_rejmap, FALSE,
"Adapt to characters using reject map");
extern BOOL_VAR_H (tessedit_mm_all_rejects, FALSE,
"Adapt to all characters using matrix matcher");
extern BOOL_VAR_H (tessedit_mm_only_match_same_char, FALSE,
"Only match samples against clusters for the same character");
extern BOOL_VAR_H (tessedit_process_rns, FALSE, "Handle m - rn ambigs");
extern BOOL_VAR_H (tessedit_demo_adaption, FALSE,
"Display cut images and matrix match for demo purposes");
extern INT_VAR_H (tessedit_demo_word1, 62,
"Word number of first word to display");
extern INT_VAR_H (tessedit_demo_word2, 64,
"Word number of second word to display");
extern STRING_VAR_H (tessedit_demo_file, "academe",
"Name of document containing demo words");
BOOL8 word_adaptable( //should we adapt?
WERD_RES *word,
UINT16 mode);
void collect_ems_for_adaption(WERD_RES *word,
CHAR_SAMPLES_LIST *char_clusters,
CHAR_SAMPLE_LIST *chars_waiting);
void collect_characters_for_adaption(WERD_RES *word,
CHAR_SAMPLES_LIST *char_clusters,
CHAR_SAMPLE_LIST *chars_waiting);
void cluster_sample(CHAR_SAMPLE *sample,
CHAR_SAMPLES_LIST *char_clusters,
CHAR_SAMPLE_LIST *chars_waiting);
void check_wait_list(CHAR_SAMPLE_LIST *chars_waiting,
CHAR_SAMPLE *sample,
CHAR_SAMPLES *best_cluster);
void complete_clustering(CHAR_SAMPLES_LIST *char_clusters,
CHAR_SAMPLE_LIST *chars_waiting);
void adapt_to_good_ems(WERD_RES *word,
CHAR_SAMPLES_LIST *char_clusters,
CHAR_SAMPLE_LIST *chars_waiting);
void adapt_to_good_samples(WERD_RES *word,
CHAR_SAMPLES_LIST *char_clusters,
CHAR_SAMPLE_LIST *chars_waiting);
void print_em_stats(CHAR_SAMPLES_LIST *char_clusters,
CHAR_SAMPLE_LIST *chars_waiting);
//lines of the image
CHAR_SAMPLE *clip_sample(PIXROW *pixrow,
IMAGELINE *imlines,
BOX pix_box, //box of imlines extent
BOOL8 white_on_black,
char c);
void display_cluster_prototypes(CHAR_SAMPLES_LIST *char_clusters);
void reject_all_ems(WERD_RES *word);
void reject_all_fullstops(WERD_RES *word);
void reject_suspect_ems(WERD_RES *word);
void reject_suspect_fullstops(WERD_RES *word);
BOOL8 suspect_em(WERD_RES *word, INT16 index);
BOOL8 suspect_fullstop(WERD_RES *word, INT16 i);
#endif

859
ccmain/applybox.cpp Normal file

@ -0,0 +1,859 @@
/**********************************************************************
* File: applybox.cpp (Formerly applybox.c)
* Description: Re segment rows according to box file data
* Author: Phil Cheatle
* Created: Wed Nov 24 09:11:23 GMT 1993
*
* (C) Copyright 1993, Hewlett-Packard Ltd.
** Licensed under the Apache License, Version 2.0 (the "License");
** you may not use this file except in compliance with the License.
** You may obtain a copy of the License at
** http://www.apache.org/licenses/LICENSE-2.0
** Unless required by applicable law or agreed to in writing, software
** distributed under the License is distributed on an "AS IS" BASIS,
** WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
** See the License for the specific language governing permissions and
** limitations under the License.
*
**********************************************************************/
/*
Define SECURE_NAMES for code versions which go to UNLV to stop tessedit
including all the newdiff stuff (which contains lots of text indicating
what measures we are interested in).
*/
/* #define SECURE_NAMES done in secnames.h when necessary*/
#include "mfcpch.h"
#include "applybox.h"
#include <ctype.h>
#include <string.h>
#ifdef __UNIX__
#include <assert.h>
#include <errno.h>
#endif
#include "mainblk.h"
#include "genblob.h"
#include "fixxht.h"
#include "control.h"
#include "tessbox.h"
#include "globals.h"
#include "secname.h"
#define SECURE_NAMES
#ifndef SECURE_NAMES
#include "wordstats.h"
#endif
#define EXTERN
EXTERN BOOL_VAR (applybox_rebalance, TRUE, "Drop dead");
EXTERN INT_VAR (applybox_debug, 0, "Debug level");
EXTERN STRING_VAR (applybox_test_exclusions, "|",
"Chars ignored for testing");
EXTERN double_VAR (applybox_error_band, 0.15, "Err band as fract of xht");
/*************************************************************************
* The code re-assigns outlines to form words each with ONE labelled blob.
* Noise is left in UNLABELLED words. The chars on the page are checked crudely
* for sensible position relative to baseline and xht. Failed boxes are
* compensated for by duplicating other believable instances of the character.
*
* The box file is assumed to contain box definitions, one per line, of the
* following format:
* <Char> <left> <bottom> <right> <top> ... arbitrary trailing fields unused
*
* The approach taken is to search the WHOLE page for stuff overlapping each box.
* - This is not too inefficient and is SAFE.
* - We can detect overlapping blobs as we will be attempting to put a blob
* from a LABELLED word into the current word.
* - When all the boxes have been processed we can detect any stuff which is
* being ignored - it is the unlabelled words left on the page.
*
* A box should only overlap one row.
*
* A warning is given if the box is on the same row as the previous box, but NOT
* on the same row as the previous blob.
*
* Any OUTLINE which overlaps the box is put into the new word.
*
* ascender chars must ascend above xht significantly
* xht chars must not rise above row xht significantly
* bl chars must not descend below baseline significantly
* descender chars must descend below baseline significantly
*
* ?? Certain chars are DROPPED - to limit the training data.
*
*************************************************************************/
void apply_boxes(BLOCK_LIST *block_list //real blocks
) {
INT16 boxfile_lineno = 0;
INT16 boxfile_charno = 0;
BOX box; //boxfile box
char ch[2]; //correct ch from boxfile
ROW *row;
ROW *prev_row = NULL;
INT16 prev_box_right = MAX_INT16;
INT16 block_id;
INT16 row_id;
INT16 box_count = 0;
INT16 box_failures = 0;
INT16 labels_ok;
INT16 rows_ok;
INT16 bad_blobs;
INT16 tgt_char_counts[128]; //No. of box samples
// INT16 labelled_char_counts[128]; //No. of unique labelled samples
INT16 i;
INT16 rebalance_count = 0;
char min_char;
INT16 min_samples;
INT16 final_labelled_blob_count;
for (i = 0; i < 128; i++)
tgt_char_counts[i] = 0;
FILE* box_file;
STRING filename = imagefile;
filename += ".box";
if (!(box_file = fopen (filename.string(), "r"))) {
CANTOPENFILE.error ("read_next_box", EXIT,
"Can't open box file %s %d",
filename.string(), errno);
}
ch[1] = '\0';
clear_any_old_text(block_list);
while (read_next_box (box_file, &box, &ch[0])) {
box_count++;
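//only 7-bit ASCII chars have a counter; avoids indexing outside the array
if ((unsigned char) ch[0] < 128)
tgt_char_counts[(unsigned char) ch[0]]++;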
row = find_row_of_box (block_list, box, block_id, row_id);
if (box.left () < prev_box_right) {
boxfile_lineno++;
boxfile_charno = 1;
}
else
boxfile_charno++;
if (row == NULL) {
box_failures++;
report_failed_box (boxfile_lineno, boxfile_charno, box, ch,
"FAILURE! box overlaps no blobs or blobs in multiple rows");
}
else {
if ((box.left () >= prev_box_right) && (row != prev_row))
report_failed_box (boxfile_lineno, boxfile_charno, box, ch,
"WARNING! false row break");
box_failures += resegment_box (row, box, ch, block_id, row_id,
boxfile_lineno, boxfile_charno);
prev_row = row;
}
prev_box_right = box.right ();
}
tidy_up(block_list,
labels_ok,
rows_ok,
bad_blobs,
tgt_char_counts,
rebalance_count,
min_char,
min_samples,
final_labelled_blob_count);
tprintf ("APPLY_BOXES:\n");
tprintf (" Boxes read from boxfile: %6d\n", box_count);
tprintf (" Initially labelled blobs: %6d in %d rows\n",
labels_ok, rows_ok);
tprintf (" Box failures detected: %6d\n", box_failures);
tprintf (" Duped blobs for rebalance:%6d\n", rebalance_count);
tprintf (" \"%c\" has fewest samples:%6d\n", min_char, min_samples);
tprintf (" Total unlabelled words: %6d\n",
bad_blobs);
tprintf (" Final labelled words: %6d\n",
final_labelled_blob_count);
}
void clear_any_old_text( //remove correct text
BLOCK_LIST *block_list //real blocks
) {
BLOCK_IT block_it(block_list);
ROW_IT row_it;
WERD_IT word_it;
for (block_it.mark_cycle_pt ();
!block_it.cycled_list (); block_it.forward ()) {
row_it.set_to_list (block_it.data ()->row_list ());
for (row_it.mark_cycle_pt (); !row_it.cycled_list (); row_it.forward ()) {
word_it.set_to_list (row_it.data ()->word_list ());
for (word_it.mark_cycle_pt ();
!word_it.cycled_list (); word_it.forward ()) {
word_it.data ()->set_text ("");
}
}
}
}
BOOL8 read_next_box(FILE* box_file, //
BOX *box,
char *ch) {
char buff[256]; //boxfile read buffer
char *buffptr = buff;
STRING box_filename;
static INT16 line = 0;
INT32 x_min;
INT32 y_min;
INT32 x_max;
INT32 y_max;
INT32 count = 0;
while (fgets (buff, sizeof (buff), box_file) != NULL) {
line++;
/* Check for blank lines in box file */
for (buffptr = buff; isspace (*buffptr); buffptr++)
;
if (*buffptr != '\0') {
count =
sscanf (buff,
"%c " INT32FORMAT " " INT32FORMAT " " INT32FORMAT " "
INT32FORMAT, ch, &x_min, &y_min, &x_max, &y_max);
if (count != 5) {
tprintf ("Box file format error on line %i ignored\n", line);
}
else {
*box = BOX (ICOORD (x_min, y_min), ICOORD (x_max, y_max));
return TRUE; //read a box ok
}
}
}
return FALSE; //EOF
}
ROW *find_row_of_box( //
BLOCK_LIST *block_list, //real blocks
BOX box, //from boxfile
INT16 &block_id,
INT16 &row_id_to_process) {
BLOCK_IT block_it(block_list);
BLOCK *block;
ROW_IT row_it;
ROW *row;
ROW *row_to_process = NULL;
INT16 row_id;
WERD_IT word_it;
WERD *word;
BOOL8 polyg;
PBLOB_IT blob_it;
PBLOB *blob;
OUTLINE_IT outline_it;
OUTLINE *outline;
/*
Find row to process - error if box REALLY overlaps more than one row. (I.e.
it overlaps blobs in the row - not just overlaps the bounding box of the
whole row.)
*/
block_id = 0;
for (block_it.mark_cycle_pt ();
!block_it.cycled_list (); block_it.forward ()) {
block_id++;
row_id = 0;
block = block_it.data ();
if (block->bounding_box ().overlap (box)) {
row_it.set_to_list (block->row_list ());
for (row_it.mark_cycle_pt ();
!row_it.cycled_list (); row_it.forward ()) {
row_id++;
row = row_it.data ();
if (row->bounding_box ().overlap (box)) {
word_it.set_to_list (row->word_list ());
for (word_it.mark_cycle_pt ();
!word_it.cycled_list (); word_it.forward ()) {
word = word_it.data ();
polyg = word->flag (W_POLYGON);
if (word->bounding_box ().overlap (box)) {
blob_it.set_to_list (word->gblob_list ());
for (blob_it.mark_cycle_pt ();
!blob_it.cycled_list (); blob_it.forward ()) {
blob = blob_it.data ();
if (gblob_bounding_box (blob, polyg).
overlap (box)) {
outline_it.
set_to_list (gblob_out_list
(blob, polyg));
for (outline_it.mark_cycle_pt ();
!outline_it.cycled_list ();
outline_it.forward ()) {
outline = outline_it.data ();
if (goutline_bounding_box
(outline, polyg).major_overlap (box)) {
if ((row_to_process == NULL) ||
(row_to_process == row)) {
row_to_process = row;
row_id_to_process = row_id;
}
else
/* RETURN ERROR Box overlaps blobs in more than one row */
return NULL;
}
}
}
}
}
}
}
}
}
}
return row_to_process;
}
INT16 resegment_box( //
ROW *row,
BOX box,
char *ch,
INT16 block_id,
INT16 row_id,
INT16 boxfile_lineno,
INT16 boxfile_charno) {
WERD_IT word_it;
WERD *word;
WERD *new_word = NULL;
BOOL8 polyg = false;
PBLOB_IT blob_it;
PBLOB_IT new_blob_it;
PBLOB *blob;
PBLOB *new_blob;
OUTLINE_IT outline_it;
OUTLINE_LIST dummy; // Just to initialize new_outline_it.
OUTLINE_IT new_outline_it = &dummy;
OUTLINE *outline;
BOX new_word_box;
float word_x_centre;
float baseline;
INT16 error_count = 0; //number of chars lost
word_it.set_to_list (row->word_list ());
for (word_it.mark_cycle_pt (); !word_it.cycled_list (); word_it.forward ()) {
word = word_it.data ();
polyg = word->flag (W_POLYGON);
if (word->bounding_box ().overlap (box)) {
blob_it.set_to_list (word->gblob_list ());
for (blob_it.mark_cycle_pt ();
!blob_it.cycled_list (); blob_it.forward ()) {
blob = blob_it.data ();
if (gblob_bounding_box (blob, polyg).overlap (box)) {
outline_it.set_to_list (gblob_out_list (blob, polyg));
for (outline_it.mark_cycle_pt ();
!outline_it.cycled_list (); outline_it.forward ()) {
outline = outline_it.data ();
if (goutline_bounding_box (outline, polyg).
major_overlap (box)) {
if (strlen (word->text ()) > 0) {
if (error_count == 0) {
error_count = 1;
if (applybox_debug > 4)
report_failed_box (boxfile_lineno,
boxfile_charno,
box, ch,
"FAILURE! box overlaps blob in labelled word");
}
if (applybox_debug > 4)
tprintf
("APPLY_BOXES: ALSO ignoring corrupted char blk:%d row:%d \"%s\"\n",
block_id, row_id,
word_it.data ()->text ());
word_it.data ()->set_text ("");
//UN label it
error_count++;
}
if (error_count == 0) {
if (new_word == NULL) {
/* Make a new word with a single blob */
new_word = word->shallow_copy ();
new_word->set_text (ch);
if (polyg)
new_blob = new PBLOB;
else
new_blob = (PBLOB *) new C_BLOB;
new_blob_it.set_to_list (new_word->
gblob_list ());
new_blob_it.add_to_end (new_blob);
new_outline_it.
set_to_list (gblob_out_list
(new_blob, polyg));
}
new_outline_it.add_to_end (outline_it.
extract ());
//move blob
}
}
}
//no outlines in blob
if (outline_it.empty ())
//so delete blob
delete blob_it.extract ();
}
}
if (blob_it.empty ()) //no blobs in word
//so delete word
delete word_it.extract ();
}
}
if (error_count > 0)
return error_count;
if (new_word != NULL) {
gblob_sort_list (new_word->gblob_list (), polyg);
word_it.add_to_end (new_word);
new_word_box = new_word->bounding_box ();
word_x_centre = (new_word_box.left () + new_word_box.right ()) / 2.0f;
baseline = row->base_line (word_x_centre);
if (STRING (chs_caps_ht).contains (ch[0]) &&
(new_word_box.top () <
baseline + (1 + applybox_error_band) * row->x_height ())) {
report_failed_box (boxfile_lineno, boxfile_charno, box, ch,
"FAILURE! caps-ht char didn't ascend");
new_word->set_text ("");
return 1;
}
if (STRING (chs_odd_top).contains (ch[0]) &&
(new_word_box.top () <
baseline + (1 - applybox_error_band) * row->x_height ())) {
report_failed_box (boxfile_lineno, boxfile_charno, box, ch,
"FAILURE! Odd top char below xht");
new_word->set_text ("");
return 1;
}
if (STRING (chs_x_ht).contains (ch[0]) &&
((new_word_box.top () >
baseline + (1 + applybox_error_band) * row->x_height ()) ||
(new_word_box.top () <
baseline + (1 - applybox_error_band) * row->x_height ()))) {
report_failed_box (boxfile_lineno, boxfile_charno, box, ch,
"FAILURE! x-ht char didn't have top near xht");
new_word->set_text ("");
return 1;
}
if (STRING (chs_non_ambig_bl).contains (ch[0]) &&
((new_word_box.bottom () <
baseline - applybox_error_band * row->x_height ()) ||
(new_word_box.bottom () >
baseline + applybox_error_band * row->x_height ()))) {
report_failed_box (boxfile_lineno, boxfile_charno, box, ch,
"FAILURE! non ambig BL char didn't have bottom near baseline");
new_word->set_text ("");
return 1;
}
if (STRING (chs_odd_bot).contains (ch[0]) &&
(new_word_box.bottom () >
baseline + applybox_error_band * row->x_height ())) {
report_failed_box (boxfile_lineno, boxfile_charno, box, ch,
"FAILURE! Odd bottom char above baseline");
new_word->set_text ("");
return 1;
}
if (STRING (chs_desc).contains (ch[0]) &&
(new_word_box.bottom () >
baseline - applybox_error_band * row->x_height ())) {
report_failed_box (boxfile_lineno, boxfile_charno, box, ch,
"FAILURE! Descender doesn't descend");
new_word->set_text ("");
return 1;
}
return 0;
}
else {
report_failed_box (boxfile_lineno, boxfile_charno, box, ch,
"FAILURE! Couldn't find any blobs");
return 1;
}
}
/*************************************************************************
* tidy_up()
* - report >1 block
* - sort the words in each row.
* - report any rows with no labelled words.
* - report any remaining unlabelled words
* - report total labelled words
*
*************************************************************************/
void tidy_up( //
BLOCK_LIST *block_list, //real blocks
INT16 &ok_char_count,
INT16 &ok_row_count,
INT16 &unlabelled_words,
INT16 *tgt_char_counts,
INT16 &rebalance_count,
char &min_char,
INT16 &min_samples,
INT16 &final_labelled_blob_count) {
BLOCK_IT block_it(block_list);
ROW_IT row_it;
ROW *row;
WERD_IT word_it;
WERD *word;
WERD *duplicate_word;
INT16 block_idx = 0;
INT16 row_idx;
INT16 all_row_idx = 0;
BOOL8 row_ok;
BOOL8 rebalance_needed = FALSE;
//No. of unique labelled samples
INT16 labelled_char_counts[128];
INT16 i;
char ch;
char prev_ch = '\0';
BOOL8 at_dupe_of_prev_word;
ROW *prev_row = NULL;
INT16 left;
INT16 prev_left = -1;
for (i = 0; i < 128; i++)
labelled_char_counts[i] = 0;
ok_char_count = 0;
ok_row_count = 0;
unlabelled_words = 0;
if ((applybox_debug > 4) && (block_it.length () != 1))
tprintf ("APPLY_BOXES: More than one block??\n");
for (block_it.mark_cycle_pt ();
!block_it.cycled_list (); block_it.forward ()) {
block_idx++;
row_idx = 0;
row_ok = FALSE;
row_it.set_to_list (block_it.data ()->row_list ());
for (row_it.mark_cycle_pt (); !row_it.cycled_list (); row_it.forward ()) {
row_idx++;
row_ok = FALSE; //reset for each row
all_row_idx++;
row = row_it.data ();
word_it.set_to_list (row->word_list ());
word_it.sort (word_comparator);
for (word_it.mark_cycle_pt ();
!word_it.cycled_list (); word_it.forward ()) {
word = word_it.data ();
if (strlen (word->text ()) == 0) {
unlabelled_words++;
if (applybox_debug > 4) {
tprintf
("APPLY_BOXES: Unlabelled word blk:%d row:%d allrows:%d\n",
block_idx, row_idx, all_row_idx);
}
}
else {
if (word->gblob_list ()->length () != 1)
tprintf
("APPLY_BOXES: FATALITY - MULTIBLOB Labelled word blk:%d row:%d allrows:%d\n",
block_idx, row_idx, all_row_idx);
ok_char_count++;
//only 7-bit ASCII chars have a counter
if ((unsigned char) *word->text () < 128)
labelled_char_counts[(unsigned char) *word->text ()]++;
row_ok = TRUE;
}
}
if (row_ok)
ok_row_count++;
else if (applybox_debug > 4) {
tprintf
("APPLY_BOXES: Row with no labelled words blk:%d row:%d allrows:%d\n",
block_idx, row_idx, all_row_idx);
}
}
}
min_samples = 9999;
for (i = 0; i < 128; i++) {
if (tgt_char_counts[i] > labelled_char_counts[i]) {
if (labelled_char_counts[i] <= 1) {
tprintf
("APPLY_BOXES: FATALITY - %d labelled samples of \"%c\" - target is %d\n",
labelled_char_counts[i], (char) i, tgt_char_counts[i]);
}
else {
rebalance_needed = TRUE;
if (applybox_debug > 0)
tprintf
("APPLY_BOXES: REBALANCE REQD \"%c\" - target of %d from %d labelled samples\n",
(char) i, tgt_char_counts[i], labelled_char_counts[i]);
}
}
if ((min_samples > labelled_char_counts[i]) && (tgt_char_counts[i] > 0)) {
min_samples = labelled_char_counts[i];
min_char = (char) i;
}
}
while (applybox_rebalance && rebalance_needed) {
block_it.set_to_list (block_list);
for (block_it.mark_cycle_pt ();
!block_it.cycled_list (); block_it.forward ()) {
row_it.set_to_list (block_it.data ()->row_list ());
for (row_it.mark_cycle_pt ();
!row_it.cycled_list (); row_it.forward ()) {
row = row_it.data ();
word_it.set_to_list (row->word_list ());
for (word_it.mark_cycle_pt ();
!word_it.cycled_list (); word_it.forward ()) {
word = word_it.data ();
left = word->bounding_box ().left ();
ch = *word->text ();
at_dupe_of_prev_word = ((row == prev_row) &&
(left == prev_left) &&
(ch == prev_ch));
if ((ch != '\0') &&
((unsigned char) ch < 128) && //counters cover 7-bit ASCII only
(labelled_char_counts[ch] > 1) &&
(tgt_char_counts[ch] > labelled_char_counts[ch]) &&
(!at_dupe_of_prev_word)) {
/* Duplicate the word to rebalance the labelled samples */
if (applybox_debug > 9) {
tprintf ("Duping \"%c\" from ", ch);
word->bounding_box ().print ();
}
duplicate_word = new WERD;
*duplicate_word = *word;
word_it.add_after_then_move (duplicate_word);
rebalance_count++;
labelled_char_counts[ch]++;
}
prev_row = row;
prev_left = left;
prev_ch = ch;
}
}
}
rebalance_needed = FALSE;
for (i = 0; i < 128; i++) {
if ((tgt_char_counts[i] > labelled_char_counts[i]) &&
(labelled_char_counts[i] > 1)) {
rebalance_needed = TRUE;
break;
}
}
}
/* Now final check - count labelled blobs */
final_labelled_blob_count = 0;
block_it.set_to_list (block_list);
for (block_it.mark_cycle_pt ();
!block_it.cycled_list (); block_it.forward ()) {
row_it.set_to_list (block_it.data ()->row_list ());
for (row_it.mark_cycle_pt (); !row_it.cycled_list (); row_it.forward ()) {
row = row_it.data ();
word_it.set_to_list (row->word_list ());
word_it.sort (word_comparator);
for (word_it.mark_cycle_pt ();
!word_it.cycled_list (); word_it.forward ()) {
word = word_it.data ();
if ((strlen (word->text ()) == 1) &&
(word->gblob_list ()->length () == 1))
final_labelled_blob_count++;
}
}
}
}
void report_failed_box(INT16 boxfile_lineno,
INT16 boxfile_charno,
BOX box,
char *box_ch,
const char *err_msg) {
if (applybox_debug > 4)
tprintf ("APPLY_BOXES: boxfile %1d/%1d/%s ((%1d,%1d),(%1d,%1d)): %s\n",
boxfile_lineno,
boxfile_charno,
box_ch,
box.left (), box.bottom (), box.right (), box.top (), err_msg);
}
void apply_box_training(BLOCK_LIST *block_list) {
BLOCK_IT block_it(block_list);
ROW_IT row_it;
ROW *row;
WERD_IT word_it;
WERD *word;
WERD *bln_word;
WERD copy_outword; // copy to denorm
PBLOB_IT blob_it;
DENORM denorm;
INT16 count = 0;
char ch[2];
ch[1] = '\0';
tprintf ("Generating training data\n");
for (block_it.mark_cycle_pt ();
!block_it.cycled_list (); block_it.forward ()) {
row_it.set_to_list (block_it.data ()->row_list ());
for (row_it.mark_cycle_pt (); !row_it.cycled_list (); row_it.forward ()) {
row = row_it.data ();
word_it.set_to_list (row->word_list ());
for (word_it.mark_cycle_pt ();
!word_it.cycled_list (); word_it.forward ()) {
word = word_it.data ();
if ((strlen (word->text ()) == 1) &&
(word->gblob_list ()->length () == 1)) {
/* Here is a word with a single char label and a single blob so train on it */
bln_word =
make_bln_copy (word, row, row->x_height (), &denorm);
blob_it.set_to_list (bln_word->blob_list ());
ch[0] = *word->text ();
tess_training_tester (blob_it.data (),
//single blob
&denorm, TRUE, //correct
ch, //correct ASCII char
1, //ASCII length
NULL);
copy_outword = *(bln_word);
copy_outword.baseline_denormalise (&denorm);
blob_it.set_to_list (copy_outword.blob_list ());
ch[0] = *word->text ();
delete bln_word;
count++;
}
}
}
}
tprintf ("Generated training data for %d blobs\n", count);
}
void apply_box_testing(BLOCK_LIST *block_list) {
BLOCK_IT block_it(block_list);
ROW_IT row_it;
ROW *row;
INT16 row_count = 0;
WERD_IT word_it;
WERD *word;
WERD *bln_word;
INT16 word_count = 0;
PBLOB_IT blob_it;
DENORM denorm;
INT16 count = 0;
char ch[2];
WERD *outword; //bln best choice
//segmentation
WERD_CHOICE *best_choice; //tess output
WERD_CHOICE *raw_choice; //top choice permuter
//detailed results
BLOB_CHOICE_LIST_CLIST blob_choices;
INT16 char_count = 0;
INT16 correct_count = 0;
INT16 err_count = 0;
INT16 rej_count = 0;
#ifndef SECURE_NAMES
WERDSTATS wordstats; //As from newdiff
#endif
char tess_rej_str[3];
char tess_long_str[3];
ch[1] = '\0';
strcpy (tess_rej_str, "|A");
strcpy (tess_long_str, "|B");
for (block_it.mark_cycle_pt ();
!block_it.cycled_list (); block_it.forward ()) {
row_it.set_to_list (block_it.data ()->row_list ());
for (row_it.mark_cycle_pt (); !row_it.cycled_list (); row_it.forward ()) {
row = row_it.data ();
row_count++;
word_count = 0;
word_it.set_to_list (row->word_list ());
for (word_it.mark_cycle_pt ();
!word_it.cycled_list (); word_it.forward ()) {
word = word_it.data ();
word_count++;
if ((strlen (word->text ()) == 1) &&
!STRING (applybox_test_exclusions).contains (*word->text ())
&& (word->gblob_list ()->length () == 1)) {
/* Here is a word with a single char label and a single blob so test it */
bln_word =
make_bln_copy (word, row, row->x_height (), &denorm);
blob_it.set_to_list (bln_word->blob_list ());
ch[0] = *word->text ();
char_count++;
best_choice = tess_segment_pass1 (bln_word,
&denorm,
tess_default_matcher,
raw_choice,
&blob_choices, outword);
/*
Test for TESS screw up on word. Recog_word has already ensured that the
choice list, outword blob lists and best_choice string are the same
length. A TESS screw up is indicated by a blank filled or 0 length string.
*/
if ((best_choice->string ().length () == 0) ||
(strspn (best_choice->string ().string (), " ") ==
best_choice->string ().length ())) {
rej_count++;
tprintf ("%d:%d: \"%s\" -> TESS FAILED\n",
row_count, word_count, ch);
#ifndef SECURE_NAMES
wordstats.word (tess_rej_str, 2, ch, 1);
#endif
}
else {
if ((best_choice->string ().length () !=
outword->blob_list ()->length ()) ||
(best_choice->string ().length () !=
blob_choices.length ())) {
tprintf
("ASSERT FAIL String:\"%s\"; Strlen=%d; #Blobs=%d; #Choices=%d\n",
best_choice->string ().string (),
best_choice->string ().length (),
outword->blob_list ()->length (),
blob_choices.length ());
}
ASSERT_HOST (best_choice->string ().length () ==
outword->blob_list ()->length ());
ASSERT_HOST (best_choice->string ().length () ==
blob_choices.length ());
fix_quotes ((char *) best_choice->string ().string (),
//turn to double
outword, &blob_choices);
if (strcmp (best_choice->string ().string (), ch) != 0) {
err_count++;
tprintf ("%d:%d: \"%s\" -> \"%s\"\n",
row_count, word_count, ch,
best_choice->string ().string ());
}
else
correct_count++;
#ifndef SECURE_NAMES
if (best_choice->string ().length () > 2)
wordstats.word (tess_long_str, 2, ch, 1);
else
wordstats.word ((char *) best_choice->string ().
string (),
best_choice->string ().length (), ch,
1);
#endif
}
delete bln_word;
delete outword;
delete best_choice;
delete raw_choice;
blob_choices.deep_clear ();
count++;
}
}
}
}
#ifndef SECURE_NAMES
wordstats.print (1, 100.0);
wordstats.conf_matrix ();
tprintf ("Tested %d chars: %d correct; %d rejected by tess; %d errs\n",
char_count, correct_count, rej_count, err_count);
#endif
}
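
The box-file line format described in the applybox.cpp header comment (`<Char> <left> <bottom> <right> <top>`, with trailing fields unused) can be sketched in isolation as follows. This is a minimal illustration under stated assumptions, not the code from this commit: `BoxRecord` and `ParseBoxLine` are hypothetical names, plain `int` stands in for INT32, and leading whitespace is skipped before the character, which the sscanf format in read_next_box does not do.

```cpp
#include <cassert>
#include <cstdio>
#include <string>

// Hypothetical stand-in for one box-file entry; not the Tesseract BOX class.
struct BoxRecord {
  char ch;      // labelled character
  int left;     // x_min
  int bottom;   // y_min
  int right;    // x_max
  int top;      // y_max
};

// Parses one line of the form "<Char> <left> <bottom> <right> <top> ...".
// Anything after the fifth field is ignored. Returns false for blank or
// malformed lines and leaves *out untouched in that case.
bool ParseBoxLine(const std::string& line, BoxRecord* out) {
  assert(out != nullptr);
  BoxRecord r;
  // " %c" skips leading whitespace before reading the character.
  int count = std::sscanf(line.c_str(), " %c %d %d %d %d",
                          &r.ch, &r.left, &r.bottom, &r.right, &r.top);
  if (count != 5)
    return false;  // covers EOF (-1) on blank lines and short lines
  *out = r;
  return true;
}
```

A caller would typically read the file line by line, skip lines for which the function returns false, and build a bounding box from the four coordinates of each accepted record.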

71
ccmain/applybox.h Normal file

@ -0,0 +1,71 @@
/**********************************************************************
* File: applybox.h (Formerly applybox.h)
* Description: Re segment rows according to box file data
* Author: Phil Cheatle
* Created: Wed Nov 24 09:11:23 GMT 1993
*
* (C) Copyright 1993, Hewlett-Packard Ltd.
** Licensed under the Apache License, Version 2.0 (the "License");
** you may not use this file except in compliance with the License.
** You may obtain a copy of the License at
** http://www.apache.org/licenses/LICENSE-2.0
** Unless required by applicable law or agreed to in writing, software
** distributed under the License is distributed on an "AS IS" BASIS,
** WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
** See the License for the specific language governing permissions and
** limitations under the License.
*
**********************************************************************/
#ifndef APPLYBOX_H
#define APPLYBOX_H
#include "varable.h"
#include "ocrblock.h"
#include "ocrrow.h"
#include "notdll.h"
extern BOOL_VAR_H (applybox_rebalance, TRUE, "Drop dead");
extern INT_VAR_H (applybox_debug, 0, "Debug level");
extern STRING_VAR_H (applybox_test_exclusions, "|",
"Chars ignored for testing");
extern double_VAR_H (applybox_error_band, 0.15, "Err band as fract of xht");
void apply_boxes(BLOCK_LIST *block_list //real blocks
);
void clear_any_old_text( //remove correct text
BLOCK_LIST *block_list //real blocks
);
BOOL8 read_next_box(FILE* box_file, //
BOX *box,
char *ch);
ROW *find_row_of_box( //
BLOCK_LIST *block_list, //real blocks
BOX box, //from boxfile
INT16 &block_id,
INT16 &row_id_to_process);
INT16 resegment_box( //
ROW *row,
BOX box,
char *ch,
INT16 block_id,
INT16 row_id,
INT16 boxfile_lineno,
INT16 boxfile_charno);
void tidy_up( //
BLOCK_LIST *block_list, //real blocks
INT16 &ok_char_count,
INT16 &ok_row_count,
INT16 &unlabelled_words,
INT16 *tgt_char_counts,
INT16 &rebalance_count,
char &min_char,
INT16 &min_samples,
INT16 &final_labelled_blob_count);
void report_failed_box(INT16 boxfile_lineno,
INT16 boxfile_charno,
BOX box,
char *box_ch,
const char *err_msg);
void apply_box_training(BLOCK_LIST *block_list);
void apply_box_testing(BLOCK_LIST *block_list);
#endif
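
The vertical position checks that resegment_box applies (x-height characters must top out near the x-height, descenders must drop below the baseline) all compare a box edge against the baseline plus or minus a fraction of the row's x-height, the fraction being applybox_error_band (0.15 by default). The sketch below restates two of those checks with simplified, hypothetical types: `RowMetrics`, `XHeightTopOk` and `DescenderBottomOk` are illustrative names, and a constant baseline is assumed, whereas the real code evaluates row->base_line() at the word's x centre.

```cpp
#include <cassert>

// Hypothetical simplified row description; the real ROW class computes the
// baseline as a function of x.
struct RowMetrics {
  float baseline;   // y of the baseline at the character's x centre
  float x_height;   // row x-height
};

// An x-height character passes when its top lies within the error band
// around baseline + x_height.
bool XHeightTopOk(float top, const RowMetrics& row, float error_band = 0.15f) {
  assert(row.x_height > 0.0f);
  const float upper = row.baseline + (1.0f + error_band) * row.x_height;
  const float lower = row.baseline + (1.0f - error_band) * row.x_height;
  return top >= lower && top <= upper;
}

// A descender passes when its bottom is at least error_band * x_height
// below the baseline.
bool DescenderBottomOk(float bottom, const RowMetrics& row,
                       float error_band = 0.15f) {
  assert(row.x_height > 0.0f);
  return bottom <= row.baseline - error_band * row.x_height;
}
```

With a baseline of 100 and an x-height of 20, the accepted band for an x-height character's top is roughly 117 to 123, and a descender's bottom must be at or below roughly 97.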

395
ccmain/baseapi.cpp Normal file

@ -0,0 +1,395 @@
/**********************************************************************
* File: baseapi.cpp
* Description: Simple API for calling tesseract.
* Author: Ray Smith
* Created: Fri Oct 06 15:35:01 PDT 2006
*
* (C) Copyright 2006, Google Inc.
** Licensed under the Apache License, Version 2.0 (the "License");
** you may not use this file except in compliance with the License.
** You may obtain a copy of the License at
** http://www.apache.org/licenses/LICENSE-2.0
** Unless required by applicable law or agreed to in writing, software
** distributed under the License is distributed on an "AS IS" BASIS,
** WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
** See the License for the specific language governing permissions and
** limitations under the License.
*
**********************************************************************/
#include "baseapi.h"
#include "tessedit.h"
#include "pageres.h"
#include "tessvars.h"
#include "control.h"
#include "applybox.h"
#include "pgedit.h"
#include "varabled.h"
#include "adaptmatch.h"
BOOL_VAR(tessedit_resegment_from_boxes, FALSE,
"Take segmentation and labeling from box file");
BOOL_VAR(tessedit_train_from_boxes, FALSE,
"Generate training data from boxed chars");
// Minimum sensible image size to be worth running tesseract.
const int kMinRectSize = 10;
// Start tesseract.
// The datapath must be the name of the data directory or some other file
// in which the data directory resides (for instance argv[0].)
// The configfile is the name of a file in the tessconfigs directory
// (eg batch) or NULL to run on defaults.
// Outputbase may also be NULL, and is the basename of various output files.
// If the output of any of these files is enabled, then a name must be given.
// If numeric_mode is true, only possible digits and roman numbers are
// returned. Returns 0 if successful. Crashes if not.
// The argc and argv may be 0 and NULL respectively. They are used for
// providing config files for debug/display purposes.
// TODO(rays) get the facts straight. Is it OK to call
// it more than once? Make it properly check for errors and return them.
int TessBaseAPI::Init(const char* datapath, const char* outputbase,
const char* configfile, bool numeric_mode,
int argc, char* argv[]) {
int result = init_tesseract(datapath, outputbase, configfile, argc, argv);
bln_numericmode.set_value(numeric_mode);
return result;
}
// Recognize a rectangle from an image and return the result as a string.
// May be called many times for a single Init.
// Currently has no error checking.
// Greyscale of 8 and color of 24 or 32 bits per pixel may be given.
// Palette color images will not work properly and must be converted to
// 24 bit.
// Binary images of 1 bit per pixel may also be given but they must be
// byte packed with the MSB of the first byte being the first pixel, and a
// one pixel is WHITE. For binary images set bytes_per_pixel=0.
// The recognized text is returned as a char* which (in future will be coded
// as UTF8 and) must be freed with the delete [] operator.
char* TessBaseAPI::TesseractRect(const UINT8* imagedata,
int bytes_per_pixel,
int bytes_per_line,
int left, int top,
int width, int height) {
if (width < kMinRectSize || height < kMinRectSize)
return NULL; // Nothing worth doing.
// Copy/Threshold the image to the tesseract global page_image.
CopyImageToTesseract(imagedata, bytes_per_pixel, bytes_per_line,
left, top, width, height);
return RecognizeToString();
}
// Call between pages or documents etc to free up memory and forget
// adaptive data.
void TessBaseAPI::ClearAdaptiveClassifier() {
ResetAdaptiveClassifier();
}
// Close down tesseract and free up memory.
void TessBaseAPI::End() {
ResetAdaptiveClassifier();
end_tesseract();
}
// Dump the internal binary image to a PGM file.
void TessBaseAPI::DumpPGM(const char* filename) {
IMAGELINE line;
line.init(page_image.get_xsize());
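FILE *fp = fopen(filename, "wb"); // Binary mode: P5 pixel data is raw bytes.
if (fp == NULL)
return; // Could not open the output file.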
fprintf(fp, "P5 " INT32FORMAT " " INT32FORMAT " 255\n", page_image.get_xsize(),
page_image.get_ysize());
for (int j = page_image.get_ysize()-1; j >= 0 ; --j) {
page_image.get_line(0, j, page_image.get_xsize(), &line, 0);
for (int i = 0; i < page_image.get_xsize(); ++i) {
UINT8 b = line.pixels[i] ? 255 : 0;
fwrite(&b, 1, 1, fp);
}
}
fclose(fp);
}
// Copy the given image rectangle to Tesseract, with adaptive thresholding
// if the image is not already binary.
void TessBaseAPI::CopyImageToTesseract(const UINT8* imagedata,
int bytes_per_pixel,
int bytes_per_line,
int left, int top,
int width, int height) {
if (bytes_per_pixel > 0) {
// Threshold grey or color.
int* thresholds = new int[bytes_per_pixel];
int* hi_values = new int[bytes_per_pixel];
// Compute the thresholds.
OtsuThreshold(imagedata, bytes_per_pixel, bytes_per_line,
left, top, left + width, top + height,
thresholds, hi_values);
// Threshold the image to the tesseract global page_image.
ThresholdRect(imagedata, bytes_per_pixel, bytes_per_line,
left, top, width, height,
thresholds, hi_values);
delete [] thresholds;
delete [] hi_values;
} else {
CopyBinaryRect(imagedata, bytes_per_line, left, top, width, height);
}
}
// Compute the Otsu threshold(s) for the given image rectangle, making one
// for each channel. Each channel is always one byte per pixel.
// Returns an array of threshold values and an array of hi_values, such
// that a pixel value >threshold[channel] is considered foreground if
// hi_values[channel] is 0 or background if 1. A hi_value of -1 indicates
// that there is no apparent foreground. At least one hi_value will not be -1.
// thresholds and hi_values are assumed to be of bytes_per_pixel size.
void TessBaseAPI::OtsuThreshold(const UINT8* imagedata,
int bytes_per_pixel,
int bytes_per_line,
int left, int top, int right, int bottom,
int* thresholds,
int* hi_values) {
// Of all channels with no good hi_value, keep the best so we can always
// produce at least one answer.
int best_hi_value = 0;
int best_hi_index = 0;
bool any_good_hivalue = false;
double best_hi_dist = 0.0;
for (int ch = 0; ch < bytes_per_pixel; ++ch) {
thresholds[ch] = 0;
hi_values[ch] = -1;
// Compute the histogram of the image rectangle.
int histogram[256];
HistogramRect(imagedata + ch, bytes_per_pixel, bytes_per_line,
left, top, right, bottom, histogram);
int H;
int best_omega_0;
int best_t = OtsuStats(histogram, &H, &best_omega_0);
// To be a convincing foreground we must have a small fraction of H
// or to be a convincing background we must have a large fraction of H.
// In between we assume this channel contains no thresholding information.
int hi_value = best_omega_0 < H * 0.5;
thresholds[ch] = best_t;
if (best_omega_0 > H * 0.75) {
any_good_hivalue = true;
hi_values[ch] = 0;
}
else if (best_omega_0 < H * 0.25) {
any_good_hivalue = true;
hi_values[ch] = 1;
}
else {
// In case all channels are like this, keep the best of the bad lot.
double hi_dist = hi_value ? (H - best_omega_0) : best_omega_0;
if (hi_dist > best_hi_dist) {
best_hi_dist = hi_dist;
best_hi_value = hi_value;
best_hi_index = ch;
}
}
}
if (!any_good_hivalue) {
// Use the best of the ones that were not good enough.
hi_values[best_hi_index] = best_hi_value;
}
}
// Compute the histogram for the given image rectangle, and the given
// channel. (Channel pointed to by imagedata.) Each channel is always
// one byte per pixel.
// Bytes per pixel is used to skip channels not being
// counted with this call in a multi-channel (pixel-major) image.
// Histogram is always a 256 element array to count occurrences of
// each pixel value.
void TessBaseAPI::HistogramRect(const UINT8* imagedata,
int bytes_per_pixel,
int bytes_per_line,
int left, int top, int right, int bottom,
int* histogram) {
int width = right - left;
memset(histogram, 0, sizeof(*histogram) * 256);
const UINT8* pix = imagedata +
top*bytes_per_line +
left*bytes_per_pixel;
for (int y = top; y < bottom; ++y) {
for (int x = 0; x < width; ++x) {
++histogram[pix[x * bytes_per_pixel]];
}
pix += bytes_per_line;
}
}
// Compute the Otsu threshold(s) for the given histogram.
// Also returns H = total count in histogram, and
// omega0 = count of histogram at or below threshold.
int TessBaseAPI::OtsuStats(const int* histogram,
int* H_out,
int* omega0_out) {
int H = 0;
double mu_T = 0.0;
for (int i = 0; i < 256; ++i) {
H += histogram[i];
mu_T += i * static_cast<double>(histogram[i]); // Avoid int overflow.
}
// Now maximize sig_sq_B over t.
// http://www.ctie.monash.edu.au/hargreave/Cornall_Terry_328.pdf
int best_t = -1;
int omega_0, omega_1;
int best_omega_0 = 0;
double best_sig_sq_B = 0.0;
double mu_0, mu_1, mu_t;
omega_0 = 0;
mu_t = 0.0;
for (int t = 0; t < 255; ++t) {
omega_0 += histogram[t];
mu_t += t * static_cast<double>(histogram[t]);
if (omega_0 == 0)
continue;
omega_1 = H - omega_0;
mu_0 = mu_t / omega_0;
mu_1 = (mu_T - mu_t) / omega_1;
double sig_sq_B = mu_1 - mu_0;
sig_sq_B *= sig_sq_B * omega_0 * omega_1;
if (best_t < 0 || sig_sq_B > best_sig_sq_B) {
best_sig_sq_B = sig_sq_B;
best_t = t;
best_omega_0 = omega_0;
}
}
if (H_out != NULL) *H_out = H;
if (omega0_out != NULL) *omega0_out = best_omega_0;
return best_t;
}
// Threshold the given grey or color image into the tesseract global
// image ready for recognition. Requires thresholds and hi_value
// produced by OtsuThreshold above.
void TessBaseAPI::ThresholdRect(const UINT8* imagedata,
int bytes_per_pixel,
int bytes_per_line,
int left, int top,
int width, int height,
const int* thresholds,
const int* hi_values) {
IMAGELINE line;
page_image.create(width, height, 1);
line.init(width);
// For each line in the image, fill the IMAGELINE class and put it into the
// Tesseract global page_image. Note that Tesseract stores images with the
// bottom at y=0 and 0 is black, so we need 2 kinds of inversion.
const UINT8* data = imagedata + top*bytes_per_line + left*bytes_per_pixel;
for (int y = height - 1 ; y >= 0; --y) {
const UINT8* pix = data;
for (int x = 0; x < width; ++x, pix += bytes_per_pixel) {
line.pixels[x] = 1;
for (int ch = 0; ch < bytes_per_pixel; ++ch) {
if (hi_values[ch] >= 0 &&
(pix[ch] > thresholds[ch]) == (hi_values[ch] == 0)) {
line.pixels[x] = 0;
break;
}
}
}
page_image.put_line(0, y, width, &line, 0);
data += bytes_per_line;
}
}
// Cut out the requested rectangle of the binary image to the
// tesseract global image ready for recognition.
void TessBaseAPI::CopyBinaryRect(const UINT8* imagedata,
int bytes_per_line,
int left, int top,
int width, int height) {
// Copy binary image, cutting out the required rectangle.
IMAGE image;
image.capture(const_cast<UINT8*>(imagedata),
bytes_per_line*8, top + height, 1);
page_image.create(width, height, 1);
copy_sub_image(&image, left, top, width, height, &page_image, 0, 0, false);
}
// Low-level function to recognize the current global image to a string.
char* TessBaseAPI::RecognizeToString() {
BLOCK_LIST block_list;
FindLines(&block_list);
// Now run the main recognition.
PAGE_RES* page_res = Recognize(&block_list, NULL);
return TesseractToText(page_res);
}
// Find lines from the image making the BLOCK_LIST.
void TessBaseAPI::FindLines(BLOCK_LIST* block_list) {
STRING input_file = "noname.tif";
// The following call creates a full-page block and then runs connected
// component analysis and text line creation.
pgeditor_read_file(input_file, block_list);
}
// Recognize the tesseract global image and return the result as Tesseract
// internal structures.
PAGE_RES* TessBaseAPI::Recognize(BLOCK_LIST* block_list, ETEXT_DESC* monitor) {
if (tessedit_resegment_from_boxes)
apply_boxes(block_list);
if (edit_variables)
start_variables_editor();
PAGE_RES* page_res = new PAGE_RES(block_list);
if (interactive_mode) {
pgeditor_main(block_list); //pgeditor user I/F
} else if (tessedit_train_from_boxes) {
apply_box_training(block_list);
} else {
// Now run the main recognition.
recog_all_words(page_res, monitor);
}
return page_res;
}
// Make a text string from the internal data structures.
// The input page_res is deleted.
char* TessBaseAPI::TesseractToText(PAGE_RES* page_res) {
if (page_res != NULL) {
int total_length = 2;
PAGE_RES_IT page_res_it(page_res);
// Iterate over the data structures to extract the recognition result.
for (page_res_it.restart_page(); page_res_it.word () != NULL;
page_res_it.forward()) {
WERD_RES *word = page_res_it.word();
WERD_CHOICE* choice = word->best_choice;
if (choice != NULL) {
total_length += choice->string().length() + 1;
}
}
char* result = new char[total_length];
char* ptr = result;
for (page_res_it.restart_page(); page_res_it.word () != NULL;
page_res_it.forward()) {
WERD_RES *word = page_res_it.word();
WERD_CHOICE* choice = word->best_choice;
if (choice != NULL) {
strcpy(ptr, choice->string().string());
ptr += strlen(ptr);
if (word->word->flag(W_EOL))
*ptr++ = '\n';
else
*ptr++ = ' ';
}
}
*ptr++ = '\n';
*ptr = '\0';
delete page_res;
return result;
}
return NULL;
}
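
// Illustrative sketch only, not part of baseapi.cpp: a standalone version of
// the between-class variance search that OtsuStats performs above. The name
// OtsuThresholdSketch and the use of std::vector are assumptions made for
// this example. Unlike OtsuStats, it skips any split that leaves either
// class empty and returns -1 when no split separates two non-empty classes.

```cpp
#include <cassert>
#include <vector>

// Hypothetical helper: returns the threshold t in [0, 254] that maximizes
// omega_0 * omega_1 * (mu_1 - mu_0)^2, where pixels <= t form class 0 and
// pixels > t form class 1. Returns -1 if no such split exists.
int OtsuThresholdSketch(const std::vector<int>& histogram) {
  assert(histogram.size() == 256);
  double total = 0.0;     // H: total pixel count.
  double weighted = 0.0;  // Sum of value * count over the whole histogram.
  for (int i = 0; i < 256; ++i) {
    total += histogram[i];
    weighted += i * static_cast<double>(histogram[i]);
  }
  int best_t = -1;
  double best_score = -1.0;
  double omega_0 = 0.0;  // Count of pixels at or below t.
  double sum_0 = 0.0;    // Sum of value * count at or below t.
  for (int t = 0; t < 255; ++t) {
    omega_0 += histogram[t];
    sum_0 += t * static_cast<double>(histogram[t]);
    const double omega_1 = total - omega_0;
    if (omega_0 == 0.0 || omega_1 == 0.0)
      continue;  // One class is empty, so there is nothing to separate.
    const double diff = (weighted - sum_0) / omega_1 - sum_0 / omega_0;
    const double score = diff * diff * omega_0 * omega_1;
    if (score > best_score) {
      best_score = score;
      best_t = t;
    }
  }
  return best_t;
}
```

// For a histogram with equal peaks at values 10 and 200, every t from 10 to
// 199 scores the same, and the strict comparison keeps the first one (10).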

154
ccmain/baseapi.h Normal file

@ -0,0 +1,154 @@
///////////////////////////////////////////////////////////////////////
// File: baseapi.h
// Description: Simple API for calling tesseract.
// Author: Ray Smith
// Created: Fri Oct 06 15:35:01 PDT 2006
//
// (C) Copyright 2006, Google Inc.
// Licensed under the Apache License, Version 2.0 (the "License");
// you may not use this file except in compliance with the License.
// You may obtain a copy of the License at
// http://www.apache.org/licenses/LICENSE-2.0
// Unless required by applicable law or agreed to in writing, software
// distributed under the License is distributed on an "AS IS" BASIS,
// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
// See the License for the specific language governing permissions and
// limitations under the License.
//
///////////////////////////////////////////////////////////////////////
#ifndef THIRD_PARTY_TESSERACT_CCMAIN_BASEAPI_H__
#define THIRD_PARTY_TESSERACT_CCMAIN_BASEAPI_H__
#include <string>
#include "host.h"
#include "ocrclass.h"
class PAGE_RES;
class BLOCK_LIST;
// Base class for all tesseract APIs.
// Specific classes can add ability to work on different inputs or produce
// different outputs.
class TessBaseAPI {
public:
// Start tesseract.
// The datapath must be the name of the data directory or some other file
// in which the data directory resides (for instance argv[0].)
// The configfile is the name of a file in the tessconfigs directory
// (eg batch) or NULL to run on defaults.
// Outputbase may also be NULL, and is the basename of various output files.
// If the output of any of these files is enabled, then a name must be given.
// If numeric_mode is true, only possible digits and roman numbers are
// returned. Returns 0 if successful. Crashes if not.
// The argc and argv may be 0 and NULL respectively. They are used for
// providing config files for debug/display purposes.
// TODO(rays) get the facts straight. Is it OK to call
// it more than once? Make it properly check for errors and return them.
static int Init(const char* datapath, const char* outputbase,
const char* configfile, bool numeric_mode,
int argc, char* argv[]);
// Recognize a rectangle from an image and return the result as a string.
// May be called many times for a single Init.
// Currently has no error checking.
// Greyscale of 8 and color of 24 or 32 bits per pixel may be given.
// Palette color images will not work properly and must be converted to
// 24 bit.
// Binary images of 1 bit per pixel may also be given but they must be
// byte packed with the MSB of the first byte being the first pixel, and a
// 1 represents WHITE. For binary images set bytes_per_pixel=0.
// The recognized text is returned as a char* which (in future will be coded
// as UTF8 and) must be freed with the delete [] operator.
static char* TesseractRect(const UINT8* imagedata,
int bytes_per_pixel,
int bytes_per_line,
int left, int top, int width, int height);
// Call between pages or documents etc to free up memory and forget
// adaptive data.
static void ClearAdaptiveClassifier();
// Close down tesseract and free up memory.
static void End();
// Dump the internal binary image to a PGM file.
static void DumpPGM(const char* filename);
protected:
// Copy the given image rectangle to Tesseract, with adaptive thresholding
// if the image is not already binary.
static void CopyImageToTesseract(const UINT8* imagedata,
int bytes_per_pixel,
int bytes_per_line,
int left, int top, int width, int height);
// Compute the Otsu threshold(s) for the given image rectangle, making one
// for each channel. Each channel is always one byte per pixel.
// Returns an array of threshold values and an array of hi_values, such
// that a pixel value >threshold[channel] is considered foreground if
// hi_values[channel] is 0 or background if 1. A hi_value of -1 indicates
// that there is no apparent foreground. At least one hi_value will not be -1.
// thresholds and hi_values are assumed to be of bytes_per_pixel size.
static void OtsuThreshold(const UINT8* imagedata,
int bytes_per_pixel,
int bytes_per_line,
int left, int top, int right, int bottom,
int* thresholds,
int* hi_values);
// Compute the histogram for the given image rectangle, and the given
// channel. (Channel pointed to by imagedata.) Each channel is always
// one byte per pixel.
// Bytes per pixel is used to skip channels not being
// counted with this call in a multi-channel (pixel-major) image.
// Histogram is always a 256 element array to count occurrences of
// each pixel value.
static void HistogramRect(const UINT8* imagedata,
int bytes_per_pixel,
int bytes_per_line,
int left, int top, int right, int bottom,
int* histogram);
// Compute the Otsu threshold(s) for the given histogram.
// Also returns H = total count in histogram, and
// omega0 = count of histogram at or below threshold.
static int OtsuStats(const int* histogram,
int* H_out,
int* omega0_out);
// Threshold the given grey or color image into the tesseract global
// image ready for recognition. Requires thresholds and hi_value
// produced by OtsuThreshold above.
static void ThresholdRect(const UINT8* imagedata,
int bytes_per_pixel,
int bytes_per_line,
int left, int top,
int width, int height,
const int* thresholds,
const int* hi_values);
// Cut out the requested rectangle of the binary image to the
// tesseract global image ready for recognition.
static void CopyBinaryRect(const UINT8* imagedata,
int bytes_per_line,
int left, int top,
int width, int height);
// Low-level function to recognize the current global image to a string.
static char* RecognizeToString();
// Find lines from the image making the BLOCK_LIST.
static void FindLines(BLOCK_LIST* block_list);
// Recognize the tesseract global image and return the result as Tesseract
// internal structures.
static PAGE_RES* Recognize(BLOCK_LIST* block_list, ETEXT_DESC* monitor);
// Convert (and free) the internal data structures into a text string.
static char* TesseractToText(PAGE_RES* page_res);
};
#endif // THIRD_PARTY_TESSERACT_CCMAIN_BASEAPI_H__
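
// Illustrative sketch only, not part of baseapi.h: one way a caller might
// build the 1 bit per pixel layout that the TesseractRect comment describes
// (byte packed, MSB of the first byte is the first pixel, 1 means WHITE).
// PackRowMsbFirst and IsWhiteAt are hypothetical names, and zero-filling the
// unused bits of the last byte is an assumption of this example.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Pack one row of pixels, MSB first. A set bit means a white pixel.
std::vector<unsigned char> PackRowMsbFirst(const std::vector<bool>& is_white) {
  std::vector<unsigned char> packed((is_white.size() + 7) / 8, 0);
  for (std::size_t x = 0; x < is_white.size(); ++x) {
    if (is_white[x])
      packed[x / 8] |= static_cast<unsigned char>(0x80 >> (x % 8));
  }
  return packed;
}

// Read pixel x back out of a packed row.
bool IsWhiteAt(const std::vector<unsigned char>& packed, std::size_t x) {
  assert(x / 8 < packed.size());
  return (packed[x / 8] & (0x80 >> (x % 8))) != 0;
}
```

// With rows packed this way, bytes_per_line would be the packed row size
// and bytes_per_pixel would be 0, as the comment on TesseractRect requires.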

76
ccmain/blobcmp.cpp Normal file

@ -0,0 +1,76 @@
/**********************************************************************
* File: blobcmp.cpp (Formerly blobcmp.c)
* Description: Code to compare blobs using the adaptive matcher.
* Author: Ray Smith
* Created: Wed Apr 21 09:28:51 BST 1993
*
* (C) Copyright 1993, Hewlett-Packard Ltd.
** Licensed under the Apache License, Version 2.0 (the "License");
** you may not use this file except in compliance with the License.
** You may obtain a copy of the License at
** http://www.apache.org/licenses/LICENSE-2.0
** Unless required by applicable law or agreed to in writing, software
** distributed under the License is distributed on an "AS IS" BASIS,
** WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
** See the License for the specific language governing permissions and
** limitations under the License.
*
**********************************************************************/
#include "mfcpch.h"
#include "fxdefs.h"
#include "ocrfeatures.h"
#include "intmatcher.h"
#include "intproto.h"
#include "adaptive.h"
#include "adaptmatch.h"
#include "const.h"
#include "tessvars.h"
#define CMP_CLASS 'x'
/**********************************************************************
* compare_tess_blobs
*
* Match 2 blobs using the adaptive classifier.
**********************************************************************/
float compare_tess_blobs(TBLOB *blob1,
TEXTROW *row1,
TBLOB *blob2,
TEXTROW *row2) {
int fcount; /*number of features */
ADAPT_TEMPLATES ad_templates;
LINE_STATS line_stats1, line_stats2;
INT_FEATURE_ARRAY int_features;
FEATURE_SET float_features;
INT_RESULT_STRUCT int_result; /*output */
BIT_VECTOR AllProtosOn = NewBitVector (MAX_NUM_PROTOS);
BIT_VECTOR AllConfigsOn = NewBitVector (MAX_NUM_CONFIGS);
set_all_bits (AllProtosOn, WordsInVectorOfSize (MAX_NUM_PROTOS));
set_all_bits (AllConfigsOn, WordsInVectorOfSize (MAX_NUM_CONFIGS));
EnterClassifyMode;
ad_templates = NewAdaptedTemplates ();
GetLineStatsFromRow(row1, &line_stats1);
/*copy baseline stuff */
GetLineStatsFromRow(row2, &line_stats2);
MakeNewAdaptedClass(blob1, &line_stats1, CMP_CLASS, ad_templates);
fcount = GetAdaptiveFeatures (blob2, &line_stats2,
int_features, &float_features);
if (fcount > 0) {
SetBaseLineMatch();
IntegerMatcher (ClassForClassId (ad_templates->Templates, CMP_CLASS),
AllProtosOn, AllConfigsOn, fcount, fcount,
int_features, 0, 0, &int_result, testedit_match_debug);
FreeFeatureSet(float_features);
if (int_result.Rating < 0)
int_result.Rating = MAX_FLOAT32;
}
free_adapted_templates(ad_templates);
FreeBitVector(AllConfigsOn);
FreeBitVector(AllProtosOn);
return fcount > 0 ? int_result.Rating * fcount : MAX_FLOAT32;
}

29
ccmain/blobcmp.h Normal file

@ -0,0 +1,29 @@
/**********************************************************************
* File: blobcmp.h
* Description: Code to compare blobs using the adaptive matcher.
* Author: Ray Smith
* Created: Wed Apr 21 09:28:51 BST 1993
*
* (C) Copyright 1993, Hewlett-Packard Ltd.
** Licensed under the Apache License, Version 2.0 (the "License");
** you may not use this file except in compliance with the License.
** You may obtain a copy of the License at
** http://www.apache.org/licenses/LICENSE-2.0
** Unless required by applicable law or agreed to in writing, software
** distributed under the License is distributed on an "AS IS" BASIS,
** WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
** See the License for the specific language governing permissions and
** limitations under the License.
*
**********************************************************************/
#ifndef BLOBCMP_H
#define BLOBCMP_H
#include "tstruct.h"
float compare_tess_blobs(TBLOB *blob1,
TEXTROW *row1,
TBLOB *blob2,
TEXTROW *row2);
#endif

93
ccmain/callnet.cpp Normal file

@ -0,0 +1,93 @@
/**********************************************************************
* File: callnet.cpp (Formerly callnet.c)
* Description: Interface to Neural Net matcher
* Author: Phil Cheatle
* Created: Wed Nov 18 10:35:00 GMT 1992
*
* (C) Copyright 1992, Hewlett-Packard Ltd.
** Licensed under the Apache License, Version 2.0 (the "License");
** you may not use this file except in compliance with the License.
** You may obtain a copy of the License at
** http://www.apache.org/licenses/LICENSE-2.0
** Unless required by applicable law or agreed to in writing, software
** distributed under the License is distributed on an "AS IS" BASIS,
** WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
** See the License for the specific language governing permissions and
** limitations under the License.
*
**********************************************************************/
#include "mfcpch.h"
#include "errcode.h"
//#include "nmatch.h"
#include "globals.h"
#define OUTPUT_NODES 94
const ERRCODE NETINIT = "NN init error";
//extern "C"
//{
//extern char* demodir; /* where program lives */
void init_net() { /* Initialise net */
#ifdef ASPIRIN_INCLUDED
char wts_filename[256];
if (nmatch_init_network () != 0) {
NETINIT.error ("Init_net", EXIT, "Errcode %s", nmatch_error_string ());
}
strcpy(wts_filename, demodir);
strcat (wts_filename, "tessdata/netwts");
if (nmatch_load_network (wts_filename) != 0) {
NETINIT.error ("Init_net", EXIT, "Weights failed, Errcode %s",
nmatch_error_string ());
}
#endif
}
void callnet( /* Apply image to net */
float *input_vector,
char *top,
float *top_score,
char *next,
float *next_score) {
#ifdef ASPIRIN_INCLUDED
float *output_vector;
int i;
int max_out_i = 0;
int next_max_out_i = 0;
float max_out = -9;
float next_max_out = -9;
nmatch_set_input(input_vector);
nmatch_propagate_forward();
output_vector = nmatch_get_output ();
/* Now find top two choices */
for (i = 0; i < OUTPUT_NODES; i++) {
if (output_vector[i] > max_out) {
next_max_out = max_out;
max_out = output_vector[i];
next_max_out_i = max_out_i;
max_out_i = i;
}
else {
if (output_vector[i] > next_max_out) {
next_max_out = output_vector[i];
next_max_out_i = i;
}
}
}
*top = max_out_i + '!';
*next = next_max_out_i + '!';
*top_score = max_out;
*next_score = next_max_out;
#endif
}
//};
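
// Illustrative sketch only, not part of callnet.cpp: the top-two selection
// that callnet performs over the network outputs, written as a standalone
// function. TopTwoIndices is a hypothetical name. This variant seeds the
// search from the first two scores instead of the -9 sentinel used above,
// so it assumes at least two outputs.

```cpp
#include <cassert>
#include <utility>
#include <vector>

// Returns (index of the largest score, index of the second largest score).
std::pair<int, int> TopTwoIndices(const std::vector<float>& scores) {
  assert(scores.size() >= 2);
  int best = scores[0] >= scores[1] ? 0 : 1;
  int second = 1 - best;
  for (int i = 2; i < static_cast<int>(scores.size()); ++i) {
    if (scores[i] > scores[best]) {
      second = best;  // The old winner becomes the runner-up.
      best = i;
    } else if (scores[i] > scores[second]) {
      second = i;
    }
  }
  return std::make_pair(best, second);
}
```

// callnet then maps each index to a character by adding '!', so output node
// 0 corresponds to '!' and the 94 nodes cover the printable ASCII range.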

32
ccmain/callnet.h Normal file

@ -0,0 +1,32 @@
/**********************************************************************
* File: callnet.h (Formerly callnet.h)
* Description: Interface to Neural Net matcher
* Author: Phil Cheatle
* Created: Wed Nov 18 10:35:00 GMT 1992
*
* (C) Copyright 1992, Hewlett-Packard Ltd.
** Licensed under the Apache License, Version 2.0 (the "License");
** you may not use this file except in compliance with the License.
** You may obtain a copy of the License at
** http://www.apache.org/licenses/LICENSE-2.0
** Unless required by applicable law or agreed to in writing, software
** distributed under the License is distributed on an "AS IS" BASIS,
** WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
** See the License for the specific language governing permissions and
** limitations under the License.
*
**********************************************************************/
#ifndef CALLNET_H
#define CALLNET_H
// extern "C" {
void init_net(); /* Initialise net */
void callnet( /* Apply image to net */
float *input_vector,
char *top,
float *top_score,
char *next,
float *next_score);
// };
#endif

710
ccmain/charcut.cpp Normal file

@ -0,0 +1,710 @@
/**********************************************************************
* File: charcut.cpp (Formerly charclip.c)
* Description: Code for character clipping
* Author: Phil Cheatle
* Created: Wed Nov 11 08:35:15 GMT 1992
*
* (C) Copyright 1992, Hewlett-Packard Ltd.
** Licensed under the Apache License, Version 2.0 (the "License");
** you may not use this file except in compliance with the License.
** You may obtain a copy of the License at
** http://www.apache.org/licenses/LICENSE-2.0
** Unless required by applicable law or agreed to in writing, software
** distributed under the License is distributed on an "AS IS" BASIS,
** WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
** See the License for the specific language governing permissions and
** limitations under the License.
*
**********************************************************************/
#include "mfcpch.h"
#include "charcut.h"
#include "imgs.h"
#include "showim.h"
#include "evnts.h"
#include "notdll.h"
#define LARGEST(a,b) ( (a) > (b) ? (a) : (b) )
#define SMALLEST(a,b) ( (a) > (b) ? (b) : (a) )
#define BUG_OFFSET 1
#define EXTERN
EXTERN INT_VAR (pix_word_margin, 3, "How far outside word BB to grow");
extern IMAGE page_image;
ELISTIZE (PIXROW)
/*************************************************************************
* PIXROW::PIXROW()
*
* Constructor for a specified size PIXROW from a blob
*************************************************************************/
PIXROW::PIXROW(INT16 pos, INT16 count, PBLOB *blob) {
OUTLINE_LIST *outline_list;
OUTLINE_IT outline_it;
POLYPT_LIST *pts_list;
POLYPT_IT pts_it;
INT16 i;
FCOORD pt;
FCOORD vec;
float y_coord;
INT16 x_coord;
row_offset = pos;
row_count = count;
min = (INT16 *) alloc_mem (count * sizeof (INT16));
max = (INT16 *) alloc_mem (count * sizeof (INT16));
outline_list = blob->out_list ();
outline_it.set_to_list (outline_list);
for (i = 0; i < count; i++) {
min[i] = MAX_INT16 - 1;
max[i] = -MAX_INT16 + 1;
y_coord = row_offset + i + 0.5;
for (outline_it.mark_cycle_pt ();
!outline_it.cycled_list (); outline_it.forward ()) {
pts_list = outline_it.data ()->polypts ();
pts_it.set_to_list (pts_list);
for (pts_it.mark_cycle_pt ();
!pts_it.cycled_list (); pts_it.forward ()) {
pt = pts_it.data ()->pos;
vec = pts_it.data ()->vec;
if ((vec.y () != 0) &&
(((pt.y () <= y_coord) && (pt.y () + vec.y () >= y_coord))
|| ((pt.y () >= y_coord)
&& (pt.y () + vec.y () <= y_coord)))) {
/* The segment crosses y_coord so find x-point and check for min/max. */
x_coord = (INT16) floor ((y_coord -
pt.y ()) * vec.x () / vec.y () +
pt.x () + 0.5);
if (x_coord < min[i])
min[i] = x_coord;
x_coord--; //to get pix to left of line
if (x_coord > max[i])
max[i] = x_coord;
}
}
}
}
}
/*************************************************************************
* PIXROW::plot()
*
* Draw the PIXROW
*************************************************************************/
#ifndef GRAPHICS_DISABLED
void PIXROW::plot(WINDOW fd //where to paint
) const {
INT16 i;
INT16 y_coord;
for (i = 0; i < row_count; i++) {
y_coord = row_offset + i;
if (min[i] <= max[i]) {
rectangle (fd, min[i], y_coord, max[i] + 1, y_coord + 1);
}
}
}
#endif
/*************************************************************************
* PIXROW::bad_box()
*
* Return true if the bounding box extends outside the image
*************************************************************************/
bool PIXROW::bad_box( //return true if box exceeds image
int xsize,
int ysize) const {
BOX bbox = bounding_box ();
if (bbox.left () < 0 || bbox.right () > xsize
|| bbox.top () > ysize || bbox.bottom () < 0) {
tprintf("Box (%d,%d)->(%d,%d) bad compared to %d,%d\n",
bbox.left(),bbox.bottom(), bbox.right(), bbox.top(),
xsize, ysize);
return true;
}
return false;
}
/*************************************************************************
* PIXROW::bounding_box()
*
* Generate bounding box for blob image
*************************************************************************/
BOX PIXROW::bounding_box() const {
INT16 i;
INT16 y_coord;
INT16 min_x = MAX_INT16 - 1;
INT16 min_y = MAX_INT16 - 1;
INT16 max_x = -MAX_INT16 + 1;
INT16 max_y = -MAX_INT16 + 1;
for (i = 0; i < row_count; i++) {
y_coord = row_offset + i;
if (min[i] <= max[i]) {
if (y_coord < min_y)
min_y = y_coord;
if (y_coord + 1 > max_y)
max_y = y_coord + 1;
if (min[i] < min_x)
min_x = min[i];
if (max[i] + 1 > max_x)
max_x = max[i] + 1;
}
}
if (min_x > max_x || min_y > max_y)
return BOX ();
else
return BOX (ICOORD (min_x, min_y), ICOORD (max_x, max_y));
}
/*************************************************************************
* PIXROW::contract()
*
* Reduce the mins and maxs so that they end on black pixels
*************************************************************************/
void PIXROW::contract( //image array
IMAGELINE *imlines,
INT16 x_offset, //of pixels[0]
INT16 foreground_colour //0 or 1
) {
INT16 i;
UINT8 *line_pixels;
for (i = 0; i < row_count; i++) {
if (min[i] > max[i])
continue;
line_pixels = imlines[i].pixels;
while (line_pixels[min[i] - x_offset] != foreground_colour) {
if (min[i] == max[i]) {
min[i] = MAX_INT16 - 1;
max[i] = -MAX_INT16 + 1;
goto nextline;
}
else
min[i]++;
}
while (line_pixels[max[i] - x_offset] != foreground_colour) {
if (min[i] == max[i]) {
min[i] = MAX_INT16 - 1;
max[i] = -MAX_INT16 + 1;
goto nextline;
}
else
max[i]--;
}
nextline:;
//goto label!
}
}
/*************************************************************************
* PIXROW::extend()
*
* 1 pixel extension in each direction to cover extra black area
*************************************************************************/
BOOL8 PIXROW::extend( //image array
IMAGELINE *imlines,
BOX &imbox,
PIXROW *prev, //for prev blob
PIXROW *next, //for next blob
INT16 foreground_colour) {
INT16 i;
INT16 x_offset = imbox.left ();
INT16 limit;
INT16 left_limit;
INT16 right_limit;
UINT8 *pixels = NULL;
UINT8 *pixels_below = NULL; //row below current
UINT8 *pixels_above = NULL; //row above current
BOOL8 changed = FALSE;
pixels_above = imlines[0].pixels;
for (i = 0; i < row_count; i++) {
pixels_below = pixels;
pixels = pixels_above;
if (i < (row_count - 1))
pixels_above = imlines[i + 1].pixels;
else
pixels_above = NULL;
/* Extend Left by one pixel*/
if (prev == NULL || prev->max[i] < prev->min[i])
limit = imbox.left ();
else
limit = prev->max[i] + 1;
if ((min[i] <= max[i]) &&
(min[i] > limit) &&
(pixels[min[i] - 1 - x_offset] == foreground_colour)) {
min[i]--;
changed = TRUE;
}
/* Extend Right by one pixel*/
if (next == NULL || next->min[i] > next->max[i])
limit = imbox.right () - 1;//-1 to index inside pix
else
limit = next->min[i] - 1;
if ((min[i] <= max[i]) &&
(max[i] < limit) &&
(pixels[max[i] + 1 - x_offset] == foreground_colour)) {
max[i]++;
changed = TRUE;
}
/* Extend down by one row */
if (pixels_below != NULL) {
if (min[i] < min[i - 1]) { //row goes left of row below
if (prev == NULL || prev->max[i - 1] < prev->min[i - 1])
left_limit = min[i];
else
left_limit = LARGEST (min[i], prev->max[i - 1] + 1);
}
else
left_limit = min[i - 1];
if (max[i] > max[i - 1]) { //row goes right of row below
if (next == NULL || next->min[i - 1] > next->max[i - 1])
right_limit = max[i];
else
right_limit = SMALLEST (max[i], next->min[i - 1] - 1);
}
else
right_limit = max[i - 1];
while ((left_limit <= right_limit) &&
(pixels_below[left_limit - x_offset] != foreground_colour))
left_limit++; //find black extremity
if ((left_limit <= right_limit) && (left_limit < min[i - 1])) {
min[i - 1] = left_limit; //widen left if poss
changed = TRUE;
}
while ((left_limit <= right_limit) &&
(pixels_below[right_limit - x_offset] != foreground_colour))
right_limit--; //find black extremity
if ((left_limit <= right_limit) && (right_limit > max[i - 1])) {
max[i - 1] = right_limit;//widen right if poss
changed = TRUE;
}
}
/* Extend up by one row */
if (pixels_above != NULL) {
if (min[i] < min[i + 1]) { //row goes left of row above
if (prev == NULL || prev->min[i + 1] > prev->max[i + 1])
left_limit = min[i];
else
left_limit = LARGEST (min[i], prev->max[i + 1] + 1);
}
else
left_limit = min[i + 1];
if (max[i] > max[i + 1]) { //row goes right of row above
if (next == NULL || next->min[i + 1] > next->max[i + 1])
right_limit = max[i];
else
right_limit = SMALLEST (max[i], next->min[i + 1] - 1);
}
else
right_limit = max[i + 1];
while ((left_limit <= right_limit) &&
(pixels_above[left_limit - x_offset] != foreground_colour))
left_limit++; //find black extremity
if ((left_limit <= right_limit) && (left_limit < min[i + 1])) {
min[i + 1] = left_limit; //widen left if poss
changed = TRUE;
}
while ((left_limit <= right_limit) &&
(pixels_above[right_limit - x_offset] != foreground_colour))
right_limit--; //find black extremity
if ((left_limit <= right_limit) && (right_limit > max[i + 1])) {
max[i + 1] = right_limit;//widen right if poss
changed = TRUE;
}
}
}
return changed;
}
/*************************************************************************
* PIXROW::char_clip_image()
* Cut out a sub image for a character
*************************************************************************/
void PIXROW::char_clip_image(
IMAGELINE *imlines, //image lines of word
BOX &im_box, //box of imlines extent
ROW *row, //row containing word
IMAGE &clip_image, //unscaled sq subimage
float &baseline_pos //baseline ht in image
) {
INT16 clip_image_xsize; //sub image x size
INT16 clip_image_ysize; //sub image y size
INT16 x_shift; //from pixrow to subim
INT16 y_shift; //from pixrow to subim
BOX char_pix_box; //bbox of char pixels
INT16 y_dest;
INT16 x_min;
INT16 x_max;
INT16 x_min_dest;
INT16 x_max_dest;
INT16 x_width;
INT16 y;
clip_image_xsize = clip_image.get_xsize ();
clip_image_ysize = clip_image.get_ysize ();
char_pix_box = bounding_box ();
/*
The y shift is calculated by first finding the coord of the bottom of the
character's bounding box relative to the image lines, then reducing this by
the margin needed to centre the character vertically in the clip image.
*/
y_shift = char_pix_box.bottom () - row_offset -
(INT16) floor ((clip_image_ysize - char_pix_box.height () + 0.5) / 2);
/*
The x_shift is the shift to be applied to the page coord in the pixrow to
generate a centred char in the clip image. Thus the left hand edge of the
char is shifted to the margin width of the centred character.
*/
x_shift = char_pix_box.left () -
(INT16) floor ((clip_image_xsize - char_pix_box.width () + 0.5) / 2);
for (y = 0; y < row_count; y++) {
/*
Check that there is something in this row of the source that will fit in the
sub image. If there is, reduce x range if necessary, then copy it
*/
y_dest = y - y_shift;
if ((min[y] <= max[y]) && (y_dest >= 0) && (y_dest < clip_image_ysize)) {
x_min = min[y];
x_min_dest = x_min - x_shift;
if (x_min_dest < 0) {
x_min = x_min - x_min_dest;
x_min_dest = 0;
}
x_max = max[y];
x_max_dest = x_max - x_shift;
if (x_max_dest > clip_image_xsize - 1) {
x_max = x_max - (x_max_dest - (clip_image_xsize - 1));
x_max_dest = clip_image_xsize - 1;
}
x_width = x_max - x_min + 1;
if (x_width > 0) {
x_min -= im_box.left ();
//offset pixel ptr
imlines[y].pixels += x_min;
clip_image.put_line (x_min_dest, y_dest, x_width, imlines + y,
0);
imlines[y].init (); //reset pixel ptr
}
}
}
/*
Baseline position relative to clip image: First find the baseline relative
to the page origin at the x coord of the centre of the character. Then
make this relative to the character bottom. Finally shift by the margin
between the bottom of the character and the bottom of the clip image.
*/
if (row == NULL)
baseline_pos = 0; //Not needed
else
baseline_pos = row->base_line ((char_pix_box.left () +
char_pix_box.right ()) / 2.0)
- char_pix_box.bottom ()
+ ((clip_image_ysize - char_pix_box.height ()) / 2);
}
/*************************************************************************
* char_clip_word()
*
* Generate a PIXROW_LIST with one element for each blob in the word, together
* with the image lines for the whole word.
*************************************************************************/
void char_clip_word( //
WERD *word, //word to be processed
IMAGE &bin_image, //whole image
PIXROW_LIST *&pixrow_list, //pixrows built
IMAGELINE *&imlines, //lines cut from image
BOX &pix_box //box defining imlines
) {
BOX word_box = word->bounding_box ();
PBLOB_LIST *blob_list;
PBLOB_IT blob_it;
PIXROW_IT pixrow_it;
INT16 pix_offset; //Y pos of pixrow[0]
INT16 row_height; //No of pix rows
INT16 imlines_x_offset;
PIXROW *prev;
PIXROW *next;
PIXROW *current;
BOOL8 changed; //still improving
BOOL8 just_changed; //last extend changed it
INT16 iteration_count = 0;
INT16 foreground_colour;
if (word->flag (W_INVERSE))
foreground_colour = 1;
else
foreground_colour = 0;
/* Define region for max pixrow expansion */
pix_box = word_box;
pix_box.move_bottom_edge (-pix_word_margin);
pix_box.move_top_edge (pix_word_margin);
pix_box.move_left_edge (-pix_word_margin);
pix_box.move_right_edge (pix_word_margin);
pix_box -= BOX (ICOORD (0, 0 + BUG_OFFSET),
ICOORD (bin_image.get_xsize (),
bin_image.get_ysize () - BUG_OFFSET));
/* Generate pixrows list */
pix_offset = pix_box.bottom ();
row_height = pix_box.height ();
blob_list = word->blob_list ();
blob_it.set_to_list (blob_list);
pixrow_list = new PIXROW_LIST;
pixrow_it.set_to_list (pixrow_list);
for (blob_it.mark_cycle_pt (); !blob_it.cycled_list (); blob_it.forward ()) {
PIXROW *row = new PIXROW (pix_offset, row_height, blob_it.data ());
ASSERT_HOST (!row->
bad_box (bin_image.get_xsize (), bin_image.get_ysize ()));
pixrow_it.add_after_then_move (row);
}
imlines = generate_imlines (bin_image, pix_box);
/* Contract pixrows - shrink min and max back to black pixels */
imlines_x_offset = pix_box.left ();
pixrow_it.move_to_first ();
for (pixrow_it.mark_cycle_pt ();
!pixrow_it.cycled_list (); pixrow_it.forward ()) {
ASSERT_HOST (!pixrow_it.data ()->
bad_box (bin_image.get_xsize (), bin_image.get_ysize ()));
pixrow_it.data ()->contract (imlines, imlines_x_offset,
foreground_colour);
ASSERT_HOST (!pixrow_it.data ()->
bad_box (bin_image.get_xsize (), bin_image.get_ysize ()));
}
/* Expand pixrows iteratively 1 pixel at a time */
do {
changed = FALSE;
pixrow_it.move_to_first ();
prev = NULL;
current = NULL;
next = pixrow_it.data ();
for (pixrow_it.mark_cycle_pt ();
!pixrow_it.cycled_list (); pixrow_it.forward ()) {
prev = current;
current = next;
if (pixrow_it.at_last ())
next = NULL;
else
next = pixrow_it.data_relative (1);
just_changed = current->extend (imlines, pix_box, prev, next,
foreground_colour);
ASSERT_HOST (!current->
bad_box (bin_image.get_xsize (),
bin_image.get_ysize ()));
changed = changed || just_changed;
}
iteration_count++;
}
while (changed);
}
/*************************************************************************
* generate_imlines()
* Get an array of IMAGELINES holding a portion of an image
*************************************************************************/
IMAGELINE *generate_imlines( //get some imagelines
IMAGE &bin_image, //from here
BOX &pix_box) {
IMAGELINE *imlines; //array of lines
int i;
imlines = new IMAGELINE[pix_box.height ()];
for (i = 0; i < pix_box.height (); i++) {
imlines[i].init (pix_box.width ());
//coord to start at
bin_image.fast_get_line (pix_box.left (),
pix_box.bottom () + i + BUG_OFFSET,
//line to get
pix_box.width (), //width to get
imlines + i); //dest imline
}
return imlines;
}
/*************************************************************************
* display_clip_image()
* All the boring user interface bits to let you see what's going on
*************************************************************************/
#ifndef GRAPHICS_DISABLED
WINDOW display_clip_image(WERD *word, //word to be processed
IMAGE &bin_image, //whole image
PIXROW_LIST *pixrow_list, //pixrows built
BOX &pix_box //box of subimage
) {
WINDOW clip_window; //window for debug
BOX word_box = word->bounding_box ();
int border = word_box.height () / 2;
BOX display_box = word_box;
display_box.move_bottom_edge (-border);
display_box.move_top_edge (border);
display_box.move_left_edge (-border);
display_box.move_right_edge (border);
display_box -= BOX (ICOORD (0, 0 - BUG_OFFSET),
ICOORD (bin_image.get_xsize (),
bin_image.get_ysize () - BUG_OFFSET));
pgeditor_msg ("Creating Clip window...");
clip_window =
create_window ("Clipped Blobs",
SCROLLINGWIN,
editor_word_xpos, editor_word_ypos,
3 * (word_box.width () + 2 * border),
3 * (word_box.height () + 2 * border),
//window width,height
// xmin, xmax
display_box.left (), display_box.right (),
display_box.bottom () - BUG_OFFSET,
display_box.top () - BUG_OFFSET,
// ymin, ymax
TRUE, FALSE, FALSE, TRUE); // down event & key only
pgeditor_msg ("Creating Clip window...Done");
clear_view_surface(clip_window);
show_sub_image (&bin_image,
display_box.left (),
display_box.bottom (),
display_box.width (),
display_box.height (),
clip_window,
display_box.left (), display_box.bottom () - BUG_OFFSET);
word->plot (clip_window, RED);
word_box.plot (clip_window, INT_HOLLOW, TRUE, BLUE, BLUE);
pix_box.plot (clip_window, INT_HOLLOW, TRUE, BLUE, BLUE);
plot_pixrows(pixrow_list, clip_window);
overlap_picture_ops(TRUE);
return clip_window;
}
/*************************************************************************
* display_images()
* Show a pair of clip and scaled character images and wait for key before
* continuing.
*************************************************************************/
void display_images(IMAGE &clip_image, IMAGE &scaled_image) {
WINDOW clip_im_window; //window for debug
WINDOW scale_im_window; //window for debug
INT16 i;
GRAPHICS_EVENT event; // c;
// xmin xmax ymin ymax
clip_im_window = create_window ("Clipped Blob", SCROLLINGWIN, editor_word_xpos - 20, editor_word_ypos - 100, 5 * clip_image.get_xsize (), 5 * clip_image.get_ysize (), 0, clip_image.get_xsize (), 0, clip_image.get_ysize (),
TRUE, FALSE, FALSE, TRUE); // down event & key only
clear_view_surface(clip_im_window);
show_sub_image (&clip_image,
0, 0,
clip_image.get_xsize (), clip_image.get_ysize (),
clip_im_window, 0, 0);
line_color_index(clip_im_window, RED);
for (i = 1; i < clip_image.get_xsize (); i++) {
move2d (clip_im_window, i, 0);
//vertical grid line spans the image height
draw2d (clip_im_window, i, clip_image.get_ysize ());
}
for (i = 1; i < clip_image.get_ysize (); i++) {
move2d (clip_im_window, 0, i);
draw2d (clip_im_window, clip_image.get_xsize (), i);
}
// xmin xmax ymin ymax
scale_im_window = create_window ("Scaled Blob", SCROLLINGWIN, editor_word_xpos + 300, editor_word_ypos - 100, 5 * scaled_image.get_xsize (), 5 * scaled_image.get_ysize (), 0, scaled_image.get_xsize (), 0, scaled_image.get_ysize (),
TRUE, FALSE, FALSE, TRUE); // down event & key only
clear_view_surface(scale_im_window);
show_sub_image (&scaled_image,
0, 0,
scaled_image.get_xsize (), scaled_image.get_ysize (),
scale_im_window, 0, 0);
line_color_index(scale_im_window, RED);
for (i = 1; i < scaled_image.get_xsize (); i++) {
move2d (scale_im_window, i, 0);
//vertical grid line spans the image height
draw2d (scale_im_window, i, scaled_image.get_ysize ());
}
for (i = 1; i < scaled_image.get_ysize (); i++) {
move2d (scale_im_window, 0, i);
draw2d (scale_im_window, scaled_image.get_xsize (), i);
}
overlap_picture_ops(TRUE);
await_event(scale_im_window, TRUE, ANY_EVENT, &event);
destroy_window(clip_im_window);
destroy_window(scale_im_window);
}
/*************************************************************************
* plot_pixrows()
* Display a list of pixrows
*************************************************************************/
void plot_pixrows( //plot for all blobs
PIXROW_LIST *pixrow_list,
WINDOW win) {
PIXROW_IT pixrow_it(pixrow_list);
INT16 colour = RED;
for (pixrow_it.mark_cycle_pt ();
!pixrow_it.cycled_list (); pixrow_it.forward ()) {
if (colour > RED + 7)
colour = RED;
perimeter_color_index (win, (COLOUR) colour);
interior_style(win, INT_HOLLOW, TRUE);
pixrow_it.data ()->plot (win);
colour++;
}
}
#endif
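
The centring and clipping arithmetic used by PIXROW::char_clip_image() can be sketched in isolation. This is a minimal illustration and not part of the commit: centring_shift, clip_span and ClippedSpan are hypothetical names, and plain int stands in for INT16.

```cpp
#include <cassert>
#include <cmath>

// Hypothetical helper: the shift that maps page coordinates into a clip
// image of clip_extent pixels so that a box of box_extent pixels starting
// at box_low ends up centred (same formula as x_shift / y_shift above).
int centring_shift(int box_low, int box_extent, int clip_extent) {
  assert(box_extent > 0 && clip_extent > 0);
  return box_low -
         static_cast<int>(std::floor((clip_extent - box_extent + 0.5) / 2));
}

// Result of trimming one pixrow span to the clip image (hypothetical type).
struct ClippedSpan {
  int src_min;   // first source x to copy, in page coordinates
  int dest_min;  // x at which it lands in the clip image
  int width;     // number of pixels to copy; <= 0 means nothing is visible
};

// Trim the span [x_min, x_max] so that, after subtracting x_shift, it lies
// inside [0, clip_xsize - 1]. Mirrors the x_min / x_max adjustments in the
// per-row loop of char_clip_image().
ClippedSpan clip_span(int x_min, int x_max, int x_shift, int clip_xsize) {
  int dest_min = x_min - x_shift;
  if (dest_min < 0) {        // sticks out on the left: drop leading pixels
    x_min -= dest_min;
    dest_min = 0;
  }
  int dest_max = x_max - x_shift;
  if (dest_max > clip_xsize - 1) {  // sticks out on the right
    x_max -= dest_max - (clip_xsize - 1);
  }
  return ClippedSpan{x_min, dest_min, x_max - x_min + 1};
}
```

For a 10-pixel-wide character starting at page x = 100 and a 20-pixel clip image, the shift is 95, so the character occupies clip columns 5 to 14 with equal margins on both sides.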

119
ccmain/charcut.h Normal file

@ -0,0 +1,119 @@
/**********************************************************************
* File: charcut.h (Formerly charclip.h)
* Description: Code for character clipping
* Author: Phil Cheatle
* Created: Wed Nov 11 08:35:15 GMT 1992
*
* (C) Copyright 1991, Hewlett-Packard Ltd.
** Licensed under the Apache License, Version 2.0 (the "License");
** you may not use this file except in compliance with the License.
** You may obtain a copy of the License at
** http://www.apache.org/licenses/LICENSE-2.0
** Unless required by applicable law or agreed to in writing, software
** distributed under the License is distributed on an "AS IS" BASIS,
** WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
** See the License for the specific language governing permissions and
** limitations under the License.
*
**********************************************************************/
#ifndef CHARCUT_H
#define CHARCUT_H
#include "pgedit.h"
#include "notdll.h"
/*************************************************************************
* CLASS PIXROW
*
* This class describes the pixels occupied by a blob. It uses two arrays, (min
* and max), each with one element per row, to identify the min and max x
* coordinates of the black pixels in the character on that row of the image.
* The number of rows used to describe the blob is held in row_count - note that
* some rows may be unoccupied - signified by max < min. The page coordinate of
* the row defined by min[0] and max[0] is held in row_offset.
*************************************************************************/
class PIXROW:public ELIST_LINK
{
public:
INT16 row_offset; //y coord of min[0]
INT16 row_count; //length of arrays
INT16 *min; //array of min x
INT16 *max; //array of max x
PIXROW() { //empty constructor
row_offset = 0;
row_count = 0;
min = NULL;
max = NULL;
}
PIXROW( //specified size
INT16 pos,
INT16 count,
PBLOB *blob);
~PIXROW () { //destructor
if (min != NULL)
free_mem(min);
if (max != NULL)
free_mem(max);
max = NULL;
}
void plot( //use current settings
WINDOW fd) const; //where to paint
BOX bounding_box() const; //return bounding box
//return true if box exceeds image
bool bad_box(int xsize, int ysize) const;
void contract( //force end on black
IMAGELINE *imlines, //image array
INT16 x_offset, //of pixels[0]
INT16 foreground_colour); //0 or 1
//image array
BOOL8 extend(IMAGELINE *imlines,
BOX &imbox,
PIXROW *prev, //for prev blob
PIXROW *next, //for next blob
INT16 foreground_colour); //0 or 1
//box of imlines extnt
void char_clip_image(IMAGELINE *imlines,
BOX &im_box,
ROW *row, //row containing word
IMAGE &clip_image, //unscaled char image
float &baseline_pos); //baseline ht in image
};
ELISTIZEH (PIXROW)
extern INT_VAR_H (pix_word_margin, 3, "How far outside word BB to grow");
extern BOOL_VAR_H (show_char_clipping, TRUE, "Show clip image window?");
extern INT_VAR_H (net_image_width, 40, "NN input image width");
extern INT_VAR_H (net_image_height, 36, "NN input image height");
extern INT_VAR_H (net_image_x_height, 22, "NN input image x_height");
void char_clip_word( //
WERD *word, //word to be processed
IMAGE &bin_image, //whole image
PIXROW_LIST *&pixrow_list, //pixrows built
IMAGELINE *&imlines, //lines cut from image
BOX &pix_box //box defining imlines
);
IMAGELINE *generate_imlines( //get some imagelines
IMAGE &bin_image, //from here
BOX &pix_box);
//word to be processed
WINDOW display_clip_image(WERD *word,
IMAGE &bin_image, //whole image
PIXROW_LIST *pixrow_list, //pixrows built
BOX &pix_box //box of subimage
);
void display_images(IMAGE &clip_image, IMAGE &scaled_image);
void plot_pixrows( //plot for all blobs
PIXROW_LIST *pixrow_list,
WINDOW win);
#endif
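
The min/max row representation described in the PIXROW class comment can be illustrated with a small stand-alone sketch. It is an assumption-laden illustration, not the PIXROW implementation: RowSpans, Extent and extent_of are hypothetical names, std::vector replaces the raw INT16 arrays, and all edges are treated as inclusive pixel coordinates.

```cpp
#include <cassert>
#include <climits>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for PIXROW: one [min, max] x span per image row.
// A row with max < min is unoccupied, as in the class comment above.
struct RowSpans {
  int row_offset;        // page y coordinate of min[0] / max[0]
  std::vector<int> min;  // leftmost black x on each row
  std::vector<int> max;  // rightmost black x on each row
};

// Inclusive extent of all occupied rows (hypothetical type).
struct Extent {
  bool empty;  // true if no row is occupied
  int left, right, bottom, top;
};

Extent extent_of(const RowSpans& rows) {
  assert(rows.min.size() == rows.max.size());
  Extent e{true, INT_MAX, INT_MIN, 0, 0};
  for (std::size_t i = 0; i < rows.min.size(); ++i) {
    if (rows.max[i] < rows.min[i]) continue;  // unoccupied row
    const int y = rows.row_offset + static_cast<int>(i);
    if (e.empty) {  // first occupied row gives the bottom edge
      e.bottom = y;
      e.empty = false;
    }
    e.top = y;      // last occupied row seen so far
    if (rows.min[i] < e.left) e.left = rows.min[i];
    if (rows.max[i] > e.right) e.right = rows.max[i];
  }
  return e;
}
```

Skipping rows with max < min is what lets a blob with gaps (for example the dot and stem of an "i") be held in a single structure.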

698
ccmain/charsample.cpp Normal file

@ -0,0 +1,698 @@
/**********************************************************************
* File: charsample.cpp (Formerly charsample.c)
* Description: Class to contain character samples and match scores
* to be used for adaption
* Author: Chris Newton
* Created: Thu Oct 7 13:40:37 BST 1993
*
* (C) Copyright 1993, Hewlett-Packard Ltd.
** Licensed under the Apache License, Version 2.0 (the "License");
** you may not use this file except in compliance with the License.
** You may obtain a copy of the License at
** http://www.apache.org/licenses/LICENSE-2.0
** Unless required by applicable law or agreed to in writing, software
** distributed under the License is distributed on an "AS IS" BASIS,
** WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
** See the License for the specific language governing permissions and
** limitations under the License.
*
**********************************************************************/
#include "mfcpch.h"
#include <stdio.h>
#include <ctype.h>
#include <math.h>
#ifdef __UNIX__
#include <assert.h>
#include <unistd.h>
#endif
#include "memry.h"
#include "tessvars.h"
#include "statistc.h"
#include "charsample.h"
#include "paircmp.h"
#include "matmatch.h"
#include "adaptions.h"
#include "secname.h"
#include "notdll.h"
extern INT32 demo_word; // Hack for demos
ELISTIZE (CHAR_SAMPLE)
ELISTIZE (CHAR_SAMPLES)
CHAR_SAMPLE::CHAR_SAMPLE () {
sample_blob = NULL;
sample_denorm = NULL;
sample_image = NULL;
ch = '\0';
n_samples_matched = 0;
total_match_scores = 0.0;
sumsq_match_scores = 0.0;
}
CHAR_SAMPLE::CHAR_SAMPLE(PBLOB *blob, DENORM *denorm, char c) {
sample_blob = blob;
sample_denorm = denorm;
sample_image = NULL;
ch = c;
n_samples_matched = 0;
total_match_scores = 0.0;
sumsq_match_scores = 0.0;
}
CHAR_SAMPLE::CHAR_SAMPLE(IMAGE *image, char c) {
sample_blob = NULL;
sample_denorm = NULL;
sample_image = image;
ch = c;
n_samples_matched = 0;
total_match_scores = 0.0;
sumsq_match_scores = 0.0;
}
float CHAR_SAMPLE::match_sample( // Update match scores
CHAR_SAMPLE *test_sample,
BOOL8 updating) {
float score1;
float score2;
IMAGE *image = test_sample->image ();
if (sample_blob != NULL && test_sample->blob () != NULL) {
PBLOB *blob = test_sample->blob ();
DENORM *denorm = test_sample->denorm ();
score1 = compare_bln_blobs (sample_blob, sample_denorm, blob, denorm);
score2 = compare_bln_blobs (blob, denorm, sample_blob, sample_denorm);
score1 = (score1 > score2) ? score1 : score2;
}
else if (sample_image != NULL && image != NULL) {
CHAR_PROTO *sample = new CHAR_PROTO (this);
score1 = matrix_match (sample_image, image);
delete sample;
}
else
return BAD_SCORE;
if ((tessedit_use_best_sample || tessedit_cluster_debug) && updating) {
n_samples_matched++;
total_match_scores += score1;
sumsq_match_scores += score1 * score1;
}
return score1;
}
double CHAR_SAMPLE::mean_score() {
if (n_samples_matched > 0)
return (total_match_scores / n_samples_matched);
else
return BAD_SCORE;
}
double CHAR_SAMPLE::variance() {
double mean = mean_score ();
if (n_samples_matched > 0) {
return (sumsq_match_scores / n_samples_matched) - mean * mean;
}
else
return BAD_SCORE;
}
void CHAR_SAMPLE::print(FILE *f) {
if (!tessedit_cluster_debug)
return;
if (n_samples_matched > 0)
fprintf (f,
"%c - sample matched against " INT32FORMAT
" blobs, mean: %f, var: %f\n", ch, n_samples_matched,
mean_score (), variance ());
else
fprintf (f, "No matches for this sample (%c)\n", ch);
}
void CHAR_SAMPLE::reset_match_statistics() {
n_samples_matched = 0;
total_match_scores = 0.0;
sumsq_match_scores = 0.0;
}
CHAR_SAMPLES::CHAR_SAMPLES() {
type = UNKNOWN;
samples.clear ();
ch = '\0';
best_sample = NULL;
proto = NULL;
}
CHAR_SAMPLES::CHAR_SAMPLES(CHAR_SAMPLE *sample) {
CHAR_SAMPLE_IT sample_it = &samples;
ASSERT_HOST (sample->image () != NULL || sample->blob () != NULL);
if (sample->image () != NULL)
type = IMAGE_CLUSTER;
else if (sample->blob () != NULL)
type = BLOB_CLUSTER;
samples.clear ();
sample_it.add_to_end (sample);
if (tessedit_mm_only_match_same_char)
ch = sample->character ();
else
ch = '\0';
best_sample = NULL;
proto = NULL;
}
void CHAR_SAMPLES::add_sample(CHAR_SAMPLE *sample) {
CHAR_SAMPLE_IT sample_it = &samples;
if (tessedit_use_best_sample || tessedit_cluster_debug)
for (sample_it.mark_cycle_pt ();
!sample_it.cycled_list (); sample_it.forward ()) {
sample_it.data ()->match_sample (sample, TRUE);
sample->match_sample (sample_it.data (), TRUE);
}
sample_it.add_to_end (sample);
if (tessedit_mm_use_prototypes && type == IMAGE_CLUSTER) {
if (samples.length () == tessedit_mm_prototype_min_size)
this->build_prototype ();
else if (samples.length () > tessedit_mm_prototype_min_size)
this->add_sample_to_prototype (sample);
}
}
void CHAR_SAMPLES::add_sample_to_prototype(CHAR_SAMPLE *sample) {
BOOL8 rebuild = FALSE;
INT32 new_xsize = proto->x_size ();
INT32 new_ysize = proto->y_size ();
INT32 sample_xsize = sample->image ()->get_xsize ();
INT32 sample_ysize = sample->image ()->get_ysize ();
if (sample_xsize > new_xsize) {
new_xsize = sample_xsize;
rebuild = TRUE;
}
if (sample_ysize > new_ysize) {
new_ysize = sample_ysize;
rebuild = TRUE;
}
if (rebuild)
proto->enlarge_prototype (new_xsize, new_ysize);
proto->add_sample (sample);
}
void CHAR_SAMPLES::build_prototype() {
CHAR_SAMPLE_IT sample_it = &samples;
CHAR_SAMPLE *sample;
INT32 proto_xsize = 0;
INT32 proto_ysize = 0;
if (type != IMAGE_CLUSTER
|| samples.length () < tessedit_mm_prototype_min_size)
return;
for (sample_it.mark_cycle_pt ();
!sample_it.cycled_list (); sample_it.forward ()) {
sample = sample_it.data ();
if (sample->image ()->get_xsize () > proto_xsize)
proto_xsize = sample->image ()->get_xsize ();
if (sample->image ()->get_ysize () > proto_ysize)
proto_ysize = sample->image ()->get_ysize ();
}
proto = new CHAR_PROTO (proto_xsize, proto_ysize, 0, 0, '\0');
for (sample_it.mark_cycle_pt ();
!sample_it.cycled_list (); sample_it.forward ())
this->add_sample_to_prototype (sample_it.data ());
}
void CHAR_SAMPLES::find_best_sample() {
CHAR_SAMPLE_IT sample_it = &samples;
double score;
double best_score = MAX_INT32;
if (ch == '\0' || samples.length () < tessedit_mm_prototype_min_size)
return;
for (sample_it.mark_cycle_pt ();
!sample_it.cycled_list (); sample_it.forward ()) {
score = sample_it.data ()->mean_score ();
if (score < best_score) {
best_score = score;
best_sample = sample_it.data ();
}
}
#ifndef SECURE_NAMES
if (tessedit_cluster_debug) {
tprintf ("Best sample for this %c cluster:\n", ch);
best_sample->print (debug_fp);
}
#endif
}
float CHAR_SAMPLES::match_score(CHAR_SAMPLE *sample) {
if (tessedit_mm_only_match_same_char && sample->character () != ch)
return BAD_SCORE;
if (tessedit_use_best_sample && best_sample != NULL)
return best_sample->match_sample (sample, FALSE);
else if ((tessedit_mm_use_prototypes
|| tessedit_mm_adapt_using_prototypes) && proto != NULL)
return proto->match_sample (sample);
else
return this->nn_match_score (sample);
}
float CHAR_SAMPLES::nn_match_score(CHAR_SAMPLE *sample) {
CHAR_SAMPLE_IT sample_it = &samples;
float score;
float min_score = MAX_INT32;
for (sample_it.mark_cycle_pt ();
!sample_it.cycled_list (); sample_it.forward ()) {
score = sample_it.data ()->match_sample (sample, FALSE);
if (score < min_score)
min_score = score;
}
return min_score;
}
void CHAR_SAMPLES::assign_to_char() {
STATS char_frequency(FIRST_CHAR, LAST_CHAR);
CHAR_SAMPLE_IT sample_it = &samples;
INT32 i;
INT32 max_index = 0;
INT32 max_freq = 0;
if (samples.length () == 0 || tessedit_mm_only_match_same_char)
return;
for (sample_it.mark_cycle_pt ();
!sample_it.cycled_list (); sample_it.forward ())
char_frequency.add ((INT32) sample_it.data ()->character (), 1);
for (i = FIRST_CHAR; i <= LAST_CHAR; i++)
if (char_frequency.pile_count (i) > max_freq) {
max_index = i;
max_freq = char_frequency.pile_count (i);
}
if (samples.length () >= tessedit_cluster_min_size
&& max_freq > samples.length () * tessedit_cluster_accept_fraction)
ch = (char) max_index;
}
void CHAR_SAMPLES::print(FILE *f) {
CHAR_SAMPLE_IT sample_it = &samples;
fprintf (f, "Collected " INT32FORMAT " samples\n", samples.length ());
#ifndef SECURE_NAMES
if (tessedit_cluster_debug)
for (sample_it.mark_cycle_pt ();
!sample_it.cycled_list (); sample_it.forward ())
sample_it.data ()->print (f);
if (ch == '\0')
fprintf (f, "\nCluster not used for adaption\n");
else
fprintf (f, "\nCluster used to adapt to '%c's\n", ch);
#endif
}
CHAR_PROTO::CHAR_PROTO() {
xsize = 0;
ysize = 0;
ch = '\0';
nsamples = 0;
proto_data = NULL;
proto = NULL;
}
CHAR_PROTO::CHAR_PROTO(INT32 x_size,
INT32 y_size,
INT32 n_samples,
float initial_value,
char c) {
INT32 x;
INT32 y;
xsize = x_size;
ysize = y_size;
ch = c;
nsamples = n_samples;
ALLOC_2D_ARRAY(xsize, ysize, proto_data, proto, float);
for (y = 0; y < ysize; y++)
for (x = 0; x < xsize; x++)
proto[x][y] = initial_value;
}
CHAR_PROTO::CHAR_PROTO(CHAR_SAMPLE *sample) {
INT32 x;
INT32 y;
IMAGELINE imline_s;
if (sample->image () == NULL) {
xsize = 0;
ysize = 0;
ch = '\0';
nsamples = 0;
proto_data = NULL;
proto = NULL;
}
else {
ch = sample->character ();
xsize = sample->image ()->get_xsize ();
ysize = sample->image ()->get_ysize ();
nsamples = 1;
ALLOC_2D_ARRAY(xsize, ysize, proto_data, proto, float);
for (y = 0; y < ysize; y++) {
sample->image ()->fast_get_line (0, y, xsize, &imline_s);
for (x = 0; x < xsize; x++)
if (imline_s.pixels[x] == BINIM_WHITE)
proto[x][y] = 1.0;
else
proto[x][y] = -1.0;
}
}
}
CHAR_PROTO::~CHAR_PROTO () {
if (proto_data != NULL)
FREE_2D_ARRAY(proto_data, proto);
}
float CHAR_PROTO::match_sample(CHAR_SAMPLE *test_sample) {
CHAR_PROTO *test_proto;
float score;
if (test_sample->image () != NULL) {
test_proto = new CHAR_PROTO (test_sample);
if (xsize > test_proto->x_size ())
score = this->match (test_proto);
else {
demo_word = -demo_word; // Flag different call
score = test_proto->match (this);
}
}
else
return BAD_SCORE;
delete test_proto;
return score;
}
float CHAR_PROTO::match(CHAR_PROTO *test_proto) {
INT32 xsize2 = test_proto->x_size ();
INT32 y_size;
INT32 y_size2;
INT32 x_offset;
INT32 y_offset;
INT32 x;
INT32 y;
CHAR_PROTO *match_proto;
float score;
float sum = 0.0;
ASSERT_HOST (xsize >= xsize2);
x_offset = (xsize - xsize2) / 2;
if (ysize < test_proto->y_size ()) {
y_size = test_proto->y_size ();
y_size2 = ysize;
y_offset = (y_size - y_size2) / 2;
match_proto = new CHAR_PROTO (xsize,
y_size,
nsamples * test_proto->n_samples (),
0, '\0');
for (y = 0; y < y_offset; y++) {
for (x = 0; x < xsize2; x++) {
match_proto->data ()[x + x_offset][y] =
test_proto->data ()[x][y] * nsamples;
sum += match_proto->data ()[x + x_offset][y];
}
}
for (y = y_offset + y_size2; y < y_size; y++) {
for (x = 0; x < xsize2; x++) {
match_proto->data ()[x + x_offset][y] =
test_proto->data ()[x][y] * nsamples;
sum += match_proto->data ()[x + x_offset][y];
}
}
for (y = y_offset; y < y_offset + y_size2; y++) {
for (x = 0; x < x_offset; x++) {
match_proto->data ()[x][y] = proto[x][y - y_offset] *
test_proto->n_samples ();
sum += match_proto->data ()[x][y];
}
for (x = x_offset + xsize2; x < xsize; x++) {
match_proto->data ()[x][y] = proto[x][y - y_offset] *
test_proto->n_samples ();
sum += match_proto->data ()[x][y];
}
for (x = x_offset; x < x_offset + xsize2; x++) {
match_proto->data ()[x][y] =
proto[x][y - y_offset] * test_proto->data ()[x - x_offset][y];
sum += match_proto->data ()[x][y];
}
}
}
else {
y_size = ysize;
y_size2 = test_proto->y_size ();
y_offset = (y_size - y_size2) / 2;
match_proto = new CHAR_PROTO (xsize,
y_size,
nsamples * test_proto->n_samples (),
0, '\0');
for (y = 0; y < y_offset; y++)
for (x = 0; x < xsize; x++) {
match_proto->data ()[x][y] =
proto[x][y] * test_proto->n_samples ();
sum += match_proto->data ()[x][y];
}
for (y = y_offset + y_size2; y < y_size; y++)
for (x = 0; x < xsize; x++) {
match_proto->data ()[x][y] =
proto[x][y] * test_proto->n_samples ();
sum += match_proto->data ()[x][y];
}
for (y = y_offset; y < y_offset + y_size2; y++) {
for (x = 0; x < x_offset; x++) {
match_proto->data ()[x][y] =
proto[x][y] * test_proto->n_samples ();
sum += match_proto->data ()[x][y];
}
for (x = x_offset + xsize2; x < xsize; x++) {
match_proto->data ()[x][y] =
proto[x][y] * test_proto->n_samples ();
sum += match_proto->data ()[x][y];
}
for (x = x_offset; x < x_offset + xsize2; x++) {
match_proto->data ()[x][y] = proto[x][y] *
test_proto->data ()[x - x_offset][y - y_offset];
sum += match_proto->data ()[x][y];
}
}
}
score = (1.0 - sum /
(xsize * y_size * nsamples * test_proto->n_samples ()));
if (tessedit_mm_debug) {
if (score < 0) {
tprintf ("Match score %f\n", score);
tprintf ("x: %d, y: %d, ns: %d, nt: %d, dx: %d, dy: %d\n",
xsize, y_size, nsamples, test_proto->n_samples (),
x_offset, y_offset);
for (y = 0; y < y_size; y++) {
tprintf ("\n%d", y);
for (x = 0; x < xsize; x++)
//cells are floats, so print with %g
tprintf ("\t%g", match_proto->data ()[x][y]);
}
tprintf ("\n");
fflush(debug_fp);
}
}
#ifndef GRAPHICS_DISABLED
if (tessedit_display_mm) {
tprintf ("Match score %f\n", score);
display_images (this->make_image (),
test_proto->make_image (), match_proto->make_image ());
}
else if (demo_word != 0) {
if (demo_word > 0)
display_image (test_proto->make_image (), "Test sample",
300, 400, FALSE);
else
display_image (this->make_image (), "Test sample", 300, 400, FALSE);
display_image (match_proto->make_image (), "Best match",
700, 400, TRUE);
}
#endif
delete match_proto;
return score;
}
void CHAR_PROTO::enlarge_prototype(INT32 new_xsize, INT32 new_ysize) {
float *old_proto_data = proto_data;
float **old_proto = proto;
INT32 old_xsize = xsize;
INT32 old_ysize = ysize;
INT32 x_offset;
INT32 y_offset;
INT32 x;
INT32 y;
ASSERT_HOST (new_xsize >= xsize && new_ysize >= ysize);
xsize = new_xsize;
ysize = new_ysize;
ALLOC_2D_ARRAY(xsize, ysize, proto_data, proto, float);
x_offset = (xsize - old_xsize) / 2;
y_offset = (ysize - old_ysize) / 2;
for (y = 0; y < y_offset; y++)
for (x = 0; x < xsize; x++)
proto[x][y] = nsamples;
for (y = y_offset + old_ysize; y < ysize; y++)
for (x = 0; x < xsize; x++)
proto[x][y] = nsamples;
for (y = y_offset; y < y_offset + old_ysize; y++) {
for (x = 0; x < x_offset; x++)
proto[x][y] = nsamples;
for (x = x_offset + old_xsize; x < xsize; x++)
proto[x][y] = nsamples;
for (x = x_offset; x < x_offset + old_xsize; x++)
proto[x][y] = old_proto[x - x_offset][y - y_offset];
}
FREE_2D_ARRAY(old_proto_data, old_proto);
}
void CHAR_PROTO::add_sample(CHAR_SAMPLE *sample) {
INT32 x_offset;
INT32 y_offset;
INT32 x;
INT32 y;
IMAGELINE imline_s;
INT32 sample_xsize = sample->image ()->get_xsize ();
INT32 sample_ysize = sample->image ()->get_ysize ();
x_offset = (xsize - sample_xsize) / 2;
y_offset = (ysize - sample_ysize) / 2;
ASSERT_HOST (x_offset >= 0 && y_offset >= 0);
for (y = 0; y < y_offset; y++)
for (x = 0; x < xsize; x++)
proto[x][y]++; // Treat pixels outside the
// range as white
for (y = y_offset + sample_ysize; y < ysize; y++)
for (x = 0; x < xsize; x++)
proto[x][y]++;
for (y = y_offset; y < y_offset + sample_ysize; y++) {
sample->image ()->fast_get_line (0,
y - y_offset, sample_xsize, &imline_s);
for (x = x_offset; x < x_offset + sample_xsize; x++) {
if (imline_s.pixels[x - x_offset] == BINIM_WHITE)
proto[x][y]++;
else
proto[x][y]--;
}
for (x = 0; x < x_offset; x++)
proto[x][y]++;
for (x = x_offset + sample_xsize; x < xsize; x++)
proto[x][y]++;
}
nsamples++;
}
IMAGE *CHAR_PROTO::make_image() {
IMAGE *image;
IMAGELINE imline_p;
INT32 x;
INT32 y;
ASSERT_HOST (nsamples != 0);
image = new (IMAGE);
image->create (xsize, ysize, 8);
for (y = 0; y < ysize; y++) {
image->fast_get_line (0, y, xsize, &imline_p);
for (x = 0; x < xsize; x++) {
imline_p.pixels[x] = 128 +
(UINT8) ((proto[x][y] * 128.0) / (0.00001 + nsamples));
}
image->fast_put_line (0, y, xsize, &imline_p);
}
return image;
}
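
The vote accumulation in CHAR_PROTO::add_sample() and the score computed at the end of CHAR_PROTO::match() can be sketched for the simplest case. This is a simplified illustration under stated assumptions, not the code from this commit: VoteProto and match_score are hypothetical names, both prototypes are assumed to have the same size (so the offset and padding handling is left out), and images are passed as flat row-major arrays of "is white" flags.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical prototype: each cell holds the sum of per-sample votes,
// +1 for a white pixel and -1 for a black one.
struct VoteProto {
  int width;
  int height;
  int nsamples;
  std::vector<double> votes;  // row-major, width * height cells

  VoteProto(int w, int h)
      : width(w), height(h), nsamples(0),
        votes(static_cast<std::size_t>(w) * static_cast<std::size_t>(h), 0.0) {
    assert(w > 0 && h > 0);
  }

  // Fold one binary image into the prototype.
  void add_sample(const std::vector<bool>& white) {
    assert(white.size() == votes.size());
    for (std::size_t i = 0; i < votes.size(); ++i)
      votes[i] += white[i] ? 1.0 : -1.0;
    ++nsamples;
  }
};

// Score = 1 - sum(a * b) / (area * na * nb). Identical single-sample
// prototypes give 0; fully opposite ones give 2, so lower is better.
double match_score(const VoteProto& a, const VoteProto& b) {
  assert(a.width == b.width && a.height == b.height);
  assert(a.nsamples > 0 && b.nsamples > 0);
  double sum = 0.0;
  for (std::size_t i = 0; i < a.votes.size(); ++i)
    sum += a.votes[i] * b.votes[i];
  const double norm =
      static_cast<double>(a.width) * a.height * a.nsamples * b.nsamples;
  return 1.0 - sum / norm;
}
```

Dividing by the product of both sample counts keeps the score in the range 0 to 2 however many samples each prototype has absorbed, which is why CHAR_SAMPLES::nn_match_score() can compare scores across clusters and keep the minimum.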

1668
ccmain/control.cpp Normal file

File diff suppressed because it is too large

193
ccmain/control.h Normal file

@ -0,0 +1,193 @@
/**********************************************************************
* File: control.h (Formerly control.h)
* Description: Module-independent matcher controller.
* Author: Ray Smith
* Created: Thu Apr 23 11:09:58 BST 1992
*
* (C) Copyright 1992, Hewlett-Packard Ltd.
** Licensed under the Apache License, Version 2.0 (the "License");
** you may not use this file except in compliance with the License.
** You may obtain a copy of the License at
** http://www.apache.org/licenses/LICENSE-2.0
** Unless required by applicable law or agreed to in writing, software
** distributed under the License is distributed on an "AS IS" BASIS,
** WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
** See the License for the specific language governing permissions and
** limitations under the License.
*
**********************************************************************/
#ifndef CONTROL_H
#define CONTROL_H
#include "varable.h"
#include "ocrblock.h"
//#include "epapdest.h"
#include "ratngs.h"
#include "statistc.h"
//#include "epapconv.h"
#include "ocrshell.h"
#include "pageres.h"
#include "charsample.h"
#include "notdll.h"
enum ACCEPTABLE_WERD_TYPE
{
AC_UNACCEPTABLE, //Unacceptable word
AC_LOWER_CASE, //ALL lower case
AC_UPPER_CASE, //ALL upper case
AC_INITIAL_CAP, //Initial cap, rest lc
AC_LC_ABBREV, //a.b.c.
AC_UC_ABBREV //A.B.C.
};
typedef BOOL8 (*BLOB_REJECTOR) (PBLOB *, BLOB_CHOICE_IT *, void *);
extern INT_VAR_H (tessedit_single_match, FALSE, "Top choice only from CP");
//extern BOOL_VAR_H(tessedit_small_match,FALSE,"Use small matrix matcher");
extern BOOL_VAR_H (tessedit_print_text, FALSE, "Write text to stdout");
extern BOOL_VAR_H (tessedit_draw_words, FALSE, "Draw source words");
extern BOOL_VAR_H (tessedit_draw_outwords, FALSE, "Draw output words");
extern BOOL_VAR_H (tessedit_training_wiseowl, FALSE,
"Call WO to learn blobs");
extern BOOL_VAR_H (tessedit_training_tess, FALSE, "Call Tess to learn blobs");
extern BOOL_VAR_H (tessedit_matcher_is_wiseowl, FALSE, "Call WO to classify");
extern BOOL_VAR_H (tessedit_dump_choices, FALSE, "Dump char choices");
extern BOOL_VAR_H (tessedit_fix_fuzzy_spaces, TRUE,
"Try to improve fuzzy spaces");
extern BOOL_VAR_H (tessedit_unrej_any_wd, FALSE,
"Dont bother with word plausibility");
extern BOOL_VAR_H (tessedit_fix_hyphens, TRUE, "Crunch double hyphens?");
extern BOOL_VAR_H (tessedit_reject_fullstops, FALSE, "Reject all fullstops");
extern BOOL_VAR_H (tessedit_reject_suspect_fullstops, FALSE,
"Reject suspect fullstops");
extern BOOL_VAR_H (tessedit_redo_xheight, TRUE, "Check/Correct x-height");
extern BOOL_VAR_H (tessedit_cluster_adaption_on, TRUE,
"Do our own adaption - ems only");
extern BOOL_VAR_H (tessedit_enable_doc_dict, TRUE,
"Add words to the document dictionary");
extern BOOL_VAR_H (word_occ_first, FALSE, "Do word occ before re-est xht");
extern BOOL_VAR_H (tessedit_xht_fiddles_on_done_wds, TRUE,
"Apply xht fix up even if done");
extern BOOL_VAR_H (tessedit_xht_fiddles_on_no_rej_wds, TRUE,
"Apply xht fix up even in no rejects");
extern INT_VAR_H (x_ht_check_word_occ, 2, "Check Char Block occupancy");
extern INT_VAR_H (x_ht_stringency, 1, "How many confirmed a/n to accept?");
extern BOOL_VAR_H (x_ht_quality_check, TRUE, "Dont allow worse quality");
extern BOOL_VAR_H (tessedit_debug_block_rejection, FALSE,
"Block and Row stats");
extern INT_VAR_H (debug_x_ht_level, 0, "Reestimate debug");
extern BOOL_VAR_H (rej_use_xht, TRUE, "Individual rejection control");
extern BOOL_VAR_H (debug_acceptable_wds, FALSE, "Dump word pass/fail chk");
extern STRING_VAR_H (chs_leading_punct, "('`\"", "Leading punctuation");
extern
STRING_VAR_H (chs_trailing_punct1, ").,;:?!", "1st Trailing punctuation");
extern STRING_VAR_H (chs_trailing_punct2, ")'`\"",
"2nd Trailing punctuation");
extern double_VAR_H (quality_rej_pc, 0.08,
"good_quality_doc lte rejection limit");
extern double_VAR_H (quality_blob_pc, 0.0,
"good_quality_doc gte good blobs limit");
extern double_VAR_H (quality_outline_pc, 1.0,
"good_quality_doc lte outline error limit");
extern double_VAR_H (quality_char_pc, 0.95,
"good_quality_doc gte good char limit");
extern INT_VAR_H (quality_min_initial_alphas_reqd, 2,
"alphas in a good word");
extern BOOL_VAR_H (tessedit_tess_adapt_to_rejmap, FALSE,
"Use reject map to control Tesseract adaption");
extern INT_VAR_H (tessedit_tess_adaption_mode, 3,
"Adaptation decision algorithm for tess");
extern INT_VAR_H (tessedit_em_adaption_mode, 62,
"Adaptation decision algorithm for ems matrix matcher");
extern BOOL_VAR_H (tessedit_cluster_adapt_after_pass1, FALSE,
"Adapt using clusterer after pass 1");
extern BOOL_VAR_H (tessedit_cluster_adapt_after_pass2, FALSE,
"Adapt using clusterer after pass 2");
extern BOOL_VAR_H (tessedit_cluster_adapt_after_pass3, FALSE,
"Adapt using clusterer after pass 3");
extern BOOL_VAR_H (tessedit_cluster_adapt_before_pass1, FALSE,
"Adapt using clusterer before Tess adapting during pass 1");
extern INT_VAR_H (tessedit_cluster_adaption_mode, 0,
"Adaptation decision algorithm for matrix matcher");
extern BOOL_VAR_H (tessedit_adaption_debug, FALSE,
"Generate and print debug information for adaption");
extern BOOL_VAR_H (tessedit_minimal_rej_pass1, FALSE,
"Do minimal rejection on pass 1 output");
extern BOOL_VAR_H (tessedit_test_adaption, FALSE,
"Test adaption criteria");
extern BOOL_VAR_H (tessedit_global_adaption, FALSE,
"Adapt to all docs over time");
extern BOOL_VAR_H (tessedit_matcher_log, FALSE, "Log matcher activity");
extern INT_VAR_H (tessedit_test_adaption_mode, 3,
"Adaptation decision algorithm for tess");
extern BOOL_VAR_H (test_pt, FALSE, "Test for point");
extern double_VAR_H (test_pt_x, 99999.99, "xcoord");
extern double_VAR_H (test_pt_y, 99999.99, "ycoord");
void recog_pseudo_word( //recognize blobs
BLOCK_LIST *block_list, //blocks to check
BOX &selection_box);
BOOL8 recog_interactive( //recognize blobs
BLOCK *, //block
ROW *row, //row of word
WERD *word //word to recognize
);
void recog_all_words( //process words
PAGE_RES *page_res, //page structure
volatile ETEXT_DESC *monitor //progress monitor
);
void classify_word_pass1( //recog one word
WERD_RES *word, //word to do
ROW *row,
BOOL8 cluster_adapt,
CHAR_SAMPLES_LIST *char_clusters,
CHAR_SAMPLE_LIST *chars_waiting);
//word to do
void classify_word_pass2(WERD_RES *word, ROW *row);
void match_word_pass2( //recog one word
WERD_RES *word, //word to do
ROW *row,
float x_height);
void fix_rep_char( //Repeated char word
WERD_RES *word //word to do
);
void fix_quotes( //make double quotes
char *string, //string to fix
WERD *word, //word to do //char choices
BLOB_CHOICE_LIST_CLIST *blob_choices);
void fix_hyphens( //crunch double hyphens
char *string, //string to fix
WERD *word, //word to do //char choices
BLOB_CHOICE_LIST_CLIST *blob_choices);
void merge_blobs( //combine 2 blobs
PBLOB *blob1, //dest blob
PBLOB *blob2 //source blob
);
void choice_dump_tester( //dump chars in word
PBLOB *, //blob
DENORM *, //de-normaliser
BOOL8 correct, //ly segmented
char *text, //correct text
INT32 count, //chars in text
BLOB_CHOICE_LIST *ratings //list of results
);
WERD *make_bln_copy(WERD *src_word, ROW *row, float x_height, DENORM *denorm);
ACCEPTABLE_WERD_TYPE acceptable_word_string(const char *s);
BOOL8 check_debug_pt(WERD_RES *word, int location);
void set_word_fonts( //good chars in word
WERD_RES *word, //word to adapt to //detailed results
BLOB_CHOICE_LIST_CLIST *blob_choices);
void font_recognition_pass( //good chars in word
PAGE_RES_IT &page_res_it);
void add_in_one_row( //good chars in word
ROW_RES *row, //current row
STATS *fonts, //font stats
INT8 *italic, //output count
INT8 *bold //output count
);
void find_modal_font( //good chars in word
STATS *fonts, //font stats
INT8 *font_out, //output font
INT8 *font_count //output count
);
#endif

1453
ccmain/docqual.cpp Normal file

File diff suppressed because it is too large

155
ccmain/docqual.h Normal file

@ -0,0 +1,155 @@
/******************************************************************
* File: docqual.h (Formerly docqual.h)
* Description: Document Quality Metrics
* Author: Phil Cheatle
* Created: Mon May 9 11:27:28 BST 1994
*
* (C) Copyright 1994, Hewlett-Packard Ltd.
** Licensed under the Apache License, Version 2.0 (the "License");
** you may not use this file except in compliance with the License.
** You may obtain a copy of the License at
** http://www.apache.org/licenses/LICENSE-2.0
** Unless required by applicable law or agreed to in writing, software
** distributed under the License is distributed on an "AS IS" BASIS,
** WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
** See the License for the specific language governing permissions and
** limitations under the License.
*
**********************************************************************/
#ifndef DOCQUAL_H
#define DOCQUAL_H
#include "control.h"
#include "notdll.h"
enum GARBAGE_LEVEL
{
G_NEVER_CRUNCH,
G_OK,
G_DODGY,
G_TERRIBLE
};
extern STRING_VAR_H (outlines_odd, "%| ", "Non standard number of outlines");
extern STRING_VAR_H (outlines_2, "ij!?%\":;",
"Non standard number of outlines");
extern BOOL_VAR_H (docqual_excuse_outline_errs, FALSE,
"Allow outline errs in unrejection?");
extern BOOL_VAR_H (tessedit_good_quality_unrej, TRUE,
"Reduce rejection on good docs");
extern BOOL_VAR_H (tessedit_use_reject_spaces, TRUE, "Reject spaces?");
extern double_VAR_H (tessedit_reject_doc_percent, 65.00,
"%rej allowed before rej whole doc");
extern double_VAR_H (tessedit_reject_block_percent, 45.00,
"%rej allowed before rej whole block");
extern double_VAR_H (tessedit_reject_row_percent, 40.00,
"%rej allowed before rej whole row");
extern double_VAR_H (tessedit_whole_wd_rej_row_percent, 70.00,
"%of row rejects in whole word rejects which prevents whole row rejection");
extern BOOL_VAR_H (tessedit_preserve_blk_rej_perfect_wds, TRUE,
"Only rej partially rejected words in block rejection");
extern BOOL_VAR_H (tessedit_preserve_row_rej_perfect_wds, TRUE,
"Only rej partially rejected words in row rejection");
extern BOOL_VAR_H (tessedit_dont_blkrej_good_wds, FALSE,
"Use word segmentation quality metric");
extern BOOL_VAR_H (tessedit_dont_rowrej_good_wds, FALSE,
"Use word segmentation quality metric");
extern INT_VAR_H (tessedit_preserve_min_wd_len, 2,
"Only preserve wds longer than this");
extern BOOL_VAR_H (tessedit_row_rej_good_docs, TRUE,
"Apply row rejection to good docs");
extern double_VAR_H (tessedit_good_doc_still_rowrej_wd, 1.1,
"rej good doc wd if more than this fraction rejected");
extern BOOL_VAR_H (tessedit_reject_bad_qual_wds, TRUE,
"Reject all bad quality wds");
extern BOOL_VAR_H (tessedit_debug_doc_rejection, FALSE, "Page stats");
extern BOOL_VAR_H (tessedit_debug_quality_metrics, FALSE,
"Output data to debug file");
extern BOOL_VAR_H (bland_unrej, FALSE, "unrej potential with no checks");
extern double_VAR_H (quality_rowrej_pc, 1.1,
"good_quality_doc gte good char limit");
extern BOOL_VAR_H (unlv_tilde_crunching, TRUE,
"Mark v.bad words for tilde crunch");
extern BOOL_VAR_H (crunch_early_merge_tess_fails, TRUE,
"Before word crunch?");
extern BOOL_VAR_H (crunch_early_convert_bad_unlv_chs, FALSE,
"Take out ~^ early?");
extern double_VAR_H (crunch_terrible_rating, 80.0, "crunch rating lt this");
extern BOOL_VAR_H (crunch_terrible_garbage, TRUE, "As it says");
extern double_VAR_H (crunch_poor_garbage_cert, -9.0,
"crunch garbage cert lt this");
extern double_VAR_H (crunch_poor_garbage_rate, 60,
"crunch garbage rating lt this");
extern double_VAR_H (crunch_pot_poor_rate, 40,
"POTENTIAL crunch rating lt this");
extern double_VAR_H (crunch_pot_poor_cert, -8.0,
"POTENTIAL crunch cert lt this");
extern BOOL_VAR_H (crunch_pot_garbage, TRUE, "POTENTIAL crunch garbage");
extern double_VAR_H (crunch_del_rating, 60,
"POTENTIAL crunch rating lt this");
extern double_VAR_H (crunch_del_cert, -10.0, "POTENTIAL crunch cert lt this");
extern double_VAR_H (crunch_del_min_ht, 0.7, "Del if word ht lt xht x this");
extern double_VAR_H (crunch_del_max_ht, 3.0, "Del if word ht gt xht x this");
extern double_VAR_H (crunch_del_min_width, 3.0,
"Del if word width lt xht x this");
extern double_VAR_H (crunch_del_high_word, 1.5,
"Del if word gt xht x this above bl");
extern double_VAR_H (crunch_del_low_word, 0.5,
"Del if word gt xht x this below bl");
extern double_VAR_H (crunch_small_outlines_size, 0.6,
"Small if lt xht x this");
extern INT_VAR_H (crunch_rating_max, 10, "For adj length in rating per ch");
extern INT_VAR_H (crunch_pot_indicators, 1,
"How many potential indicators needed");
extern BOOL_VAR_H (crunch_leave_ok_strings, TRUE,
"Dont touch sensible strings");
extern BOOL_VAR_H (crunch_accept_ok, TRUE, "Use acceptability in okstring");
extern BOOL_VAR_H (crunch_leave_accept_strings, FALSE,
"Dont pot crunch sensible strings");
extern BOOL_VAR_H (crunch_include_numerals, FALSE, "Fiddle alpha figures");
extern INT_VAR_H (crunch_leave_lc_strings, 4,
"Dont crunch words with long lower case strings");
extern INT_VAR_H (crunch_leave_uc_strings, 4,
"Dont crunch words with long upper case strings");
extern INT_VAR_H (crunch_long_repetitions, 3,
"Crunch words with long repetitions");
extern INT_VAR_H (crunch_debug, 0, "As it says");
INT16 word_blob_quality( //Blob seg changes
WERD_RES *word,
ROW *row);
BOOL8 crude_match_blobs(PBLOB *blob1, PBLOB *blob2);
INT16 word_outline_errs( //Outline count errs
WERD_RES *word);
void word_char_quality( //Blob seg changes
WERD_RES *word,
ROW *row,
INT16 *match_count,
INT16 *accepted_match_count);
void unrej_good_chs(WERD_RES *word, ROW *row);
void print_boxes(WERD *word);
INT16 count_outline_errs(char c, INT16 outline_count);
void quality_based_rejection(PAGE_RES_IT &page_res_it, BOOL8 good_quality_doc);
void unrej_good_quality_words( //unreject potential
PAGE_RES_IT &page_res_it);
void doc_and_block_rejection( //reject big chunks
PAGE_RES_IT &page_res_it,
BOOL8 good_quality_doc);
void reject_whole_page(PAGE_RES_IT &page_res_it);
void tilde_crunch(PAGE_RES_IT &page_res_it);
BOOL8 terrible_word_crunch(WERD_RES *word, GARBAGE_LEVEL garbage_level);
BOOL8 potential_word_crunch(WERD_RES *word,
GARBAGE_LEVEL garbage_level,
BOOL8 ok_dict_word);
void tilde_delete(PAGE_RES_IT &page_res_it);
//word to do
void convert_bad_unlv_chs(WERD_RES *word_res);
//word to do
void merge_tess_fails(WERD_RES *word_res);
GARBAGE_LEVEL garbage_word(WERD_RES *word, BOOL8 ok_dict_word);
CRUNCH_MODE word_deletable(WERD_RES *word, INT16 &delete_mode);
INT16 failure_count(WERD_RES *word);
BOOL8 noise_outlines(WERD *word);
//word to do
void insert_rej_cblobs(WERD_RES *word);
#endif

82
ccmain/expandblob.cpp Normal file

@ -0,0 +1,82 @@
/**************************************************************************
* Revision 5.1 89/07/27 11:46:53 11:46:53 ray ()
* (C) Copyright 1989, Hewlett-Packard Ltd.
** Licensed under the Apache License, Version 2.0 (the "License");
** you may not use this file except in compliance with the License.
** You may obtain a copy of the License at
** http://www.apache.org/licenses/LICENSE-2.0
** Unless required by applicable law or agreed to in writing, software
** distributed under the License is distributed on an "AS IS" BASIS,
** WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
** See the License for the specific language governing permissions and
** limitations under the License.
*
**************************************************************************/
#include "mfcpch.h"
#include "expandblob.h"
#include "tessclas.h"
#include "const.h"
#include "structures.h"
#include "freelist.h"
/***********************************************************************
free_blob(blob) frees the blob and everything it is connected to,
i.e. outlines, nodes, edgepts, bytevecs, ratings etc
*************************************************************************/
void free_blob( /*blob to free */
register TBLOB *blob) {
if (blob == NULL)
return; /*duff blob */
free_tree (blob->outlines); /*do the tree of outlines */
oldblob(blob); /*free the actual blob */
}
/***************************************************************************
free_tree(outline) frees the current outline
and then its sub-tree
*****************************************************************************/
void free_tree( /*outline to free */
register TESSLINE *outline) {
if (outline == NULL)
return; /*duff outline */
if (outline->next != NULL)
free_tree (outline->next);
if (outline->child != NULL)
free_tree (outline->child); /*and sub-tree */
free_outline(outline); /*free the outline */
}
/*******************************************************************************
free_outline(outline) frees an outline and anything connected to it
*********************************************************************************/
void free_outline( /*outline to free */
register TESSLINE *outline) {
if (outline->compactloop != NULL)
/*free compact loop */
memfree (outline->compactloop);
if (outline->loop != NULL)
free_loop (outline->loop);
oldoutline(outline);
}
/*********************************************************************************
free_loop(startpt) frees all the elements of the closed loop
starting at startpt
***********************************************************************************/
void free_loop( /*loop to free */
register EDGEPT *startpt) {
register EDGEPT *edgept; /*current point */
if (startpt == NULL)
return;
edgept = startpt;
do {
edgept = oldedgept (edgept); /*free it and move on */
}
while (edgept != startpt);
}
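/* Editorial sketch, not part of the original source. free_loop() walks a
   closed ring of EDGEPTs and relies on oldedgept() to free a point and hand
   back its successor. The stand-alone version below shows the same walk on a
   hypothetical RingNode type. RingNode, make_ring and free_ring are
   illustrative names only, not Tesseract APIs. As a deliberate variation, it
   frees the start node last so the loop never compares against a pointer
   that has already been freed. */

```cpp
#include <cassert>
#include <cstddef>

namespace expandblob_sketch {

struct RingNode {              // hypothetical stand-in for EDGEPT
  RingNode *next;
};

// Build a closed ring of count nodes; returns NULL when count <= 0.
inline RingNode *make_ring(int count) {
  if (count <= 0)
    return NULL;
  RingNode *start = new RingNode;
  RingNode *tail = start;
  for (int i = 1; i < count; i++) {
    tail->next = new RingNode;
    tail = tail->next;
  }
  tail->next = start;          // close the loop
  return start;
}

// Free every node of the ring and return how many were freed.
inline int free_ring(RingNode *start) {
  if (start == NULL)
    return 0;                  // nothing to free, as in free_loop()
  int freed = 0;
  RingNode *node = start->next;
  while (node != start) {
    assert(node != NULL);      // a closed ring has no NULL links
    RingNode *next = node->next;  // remember the successor before freeing
    delete node;
    freed++;
    node = next;
  }
  delete start;                // the start node goes last
  return freed + 1;
}

}  // namespace expandblob_sketch
```

/* Usage: expandblob_sketch::free_ring(expandblob_sketch::make_ring(3))
   releases all three nodes and reports the count. */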

13
ccmain/expandblob.h Normal file

@ -0,0 +1,13 @@
#ifndef EXPANDBLOB_H
#define EXPANDBLOB_H
#include "tessclas.h"
void free_blob(register TBLOB *blob);
void free_tree(register TESSLINE *outline);
void free_outline(register TESSLINE *outline);
void free_loop(register EDGEPT *startpt);
#endif

974
ccmain/fixspace.cpp Normal file

@ -0,0 +1,974 @@
/******************************************************************
* File: fixspace.cpp (Formerly fixspace.c)
* Description: Implements a pass over the page res, exploring the alternative
* spacing possibilities, trying to use context to improve the
* word spacing
* Author: Phil Cheatle
* Created: Thu Oct 21 11:38:43 BST 1993
*
* (C) Copyright 1993, Hewlett-Packard Ltd.
** Licensed under the Apache License, Version 2.0 (the "License");
** you may not use this file except in compliance with the License.
** You may obtain a copy of the License at
** http://www.apache.org/licenses/LICENSE-2.0
** Unless required by applicable law or agreed to in writing, software
** distributed under the License is distributed on an "AS IS" BASIS,
** WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
** See the License for the specific language governing permissions and
** limitations under the License.
*
**********************************************************************/
#include "mfcpch.h"
#include <ctype.h>
#include "reject.h"
#include "statistc.h"
#include "genblob.h"
#include "control.h"
#include "fixspace.h"
#include "tessvars.h"
#include "tessbox.h"
#include "secname.h"
#define EXTERN
EXTERN BOOL_VAR (fixsp_check_for_fp_noise_space, TRUE,
"Try turning noise to space in fixed pitch");
EXTERN BOOL_VAR (fixsp_fp_eval, TRUE, "Use alternate evaluation for fp");
EXTERN BOOL_VAR (fixsp_noise_score_fixing, TRUE, "More sophisticated?");
EXTERN INT_VAR (fixsp_non_noise_limit, 1,
"How many non-noise blbs either side?");
EXTERN double_VAR (fixsp_small_outlines_size, 0.28, "Small if lt xht x this");
EXTERN BOOL_VAR (fixsp_ignore_punct, TRUE, "In uniform spacing calc");
EXTERN BOOL_VAR (fixsp_numeric_fix, TRUE, "Try to deal with numeric punct");
EXTERN BOOL_VAR (fixsp_prefer_joined_1s, TRUE, "Arbitrary boost");
EXTERN BOOL_VAR (tessedit_test_uniform_wd_spacing, FALSE,
"Limit context word spacing");
EXTERN BOOL_VAR (tessedit_prefer_joined_punct, FALSE,
"Reward punctuation joins");
EXTERN INT_VAR (fixsp_done_mode, 1, "What constitutes done for spacing");
EXTERN INT_VAR (debug_fix_space_level, 0, "Contextual fixspace debug");
EXTERN STRING_VAR (numeric_punctuation, ".,",
"Punct. chs expected WITHIN numbers");
#define PERFECT_WERDS 999
#define MAXSPACING 128 /*max expected spacing in pix */
/*************************************************************************
* fix_fuzzy_spaces()
* Walk over the page finding sequences of words joined by fuzzy spaces. Extract
* them as a sublist, process the sublist to find the optimal arrangement of
* spaces then replace the sublist in the ROW_RES.
*************************************************************************/
void fix_fuzzy_spaces( //find fuzzy words
volatile ETEXT_DESC *monitor, //progress monitor
INT32 word_count, //count of words in doc
PAGE_RES *page_res) {
BLOCK_RES_IT block_res_it; //iterators
ROW_RES_IT row_res_it;
WERD_RES_IT word_res_it_from;
WERD_RES_IT word_res_it_to;
WERD_RES *word_res;
WERD_RES_LIST fuzzy_space_words;
INT16 new_length;
BOOL8 prevent_null_wd_fixsp; //DONT process blobless wds
INT32 word_index; //current word
block_res_it.set_to_list (&page_res->block_res_list);
word_index = 0;
for (block_res_it.mark_cycle_pt ();
!block_res_it.cycled_list (); block_res_it.forward ()) {
row_res_it.set_to_list (&block_res_it.data ()->row_res_list);
for (row_res_it.mark_cycle_pt ();
!row_res_it.cycled_list (); row_res_it.forward ()) {
word_res_it_from.set_to_list (&row_res_it.data ()->word_res_list);
while (!word_res_it_from.at_last ()) {
word_res = word_res_it_from.data ();
while (!word_res_it_from.at_last () &&
!(word_res->combination ||
word_res_it_from.data_relative (1)->
word->flag (W_FUZZY_NON) ||
word_res_it_from.data_relative (1)->
word->flag (W_FUZZY_SP))) {
fix_sp_fp_word (word_res_it_from, row_res_it.data ()->row);
word_res = word_res_it_from.forward ();
word_index++;
if (monitor != NULL) {
monitor->ocr_alive = TRUE;
monitor->progress = 90 + 5 * word_index / word_count;
}
}
if (!word_res_it_from.at_last ()) {
word_res_it_to = word_res_it_from;
prevent_null_wd_fixsp =
word_res->word->gblob_list ()->empty ();
if (check_debug_pt (word_res, 60))
debug_fix_space_level.set_value (10);
word_res_it_to.forward ();
word_index++;
if (monitor != NULL) {
monitor->ocr_alive = TRUE;
monitor->progress = 90 + 5 * word_index / word_count;
}
while (!word_res_it_to.at_last () &&
(word_res_it_to.data_relative (1)->
word->flag (W_FUZZY_NON) ||
word_res_it_to.data_relative (1)->
word->flag (W_FUZZY_SP))) {
if (check_debug_pt (word_res, 60))
debug_fix_space_level.set_value (10);
if (word_res->word->gblob_list ()->empty ())
prevent_null_wd_fixsp = TRUE;
word_res = word_res_it_to.forward ();
}
if (check_debug_pt (word_res, 60))
debug_fix_space_level.set_value (10);
if (word_res->word->gblob_list ()->empty ())
prevent_null_wd_fixsp = TRUE;
if (prevent_null_wd_fixsp)
word_res_it_from = word_res_it_to;
else {
fuzzy_space_words.assign_to_sublist (&word_res_it_from,
&word_res_it_to);
fix_fuzzy_space_list (fuzzy_space_words,
row_res_it.data ()->row);
new_length = fuzzy_space_words.length ();
word_res_it_from.add_list_before (&fuzzy_space_words);
for (;
(!word_res_it_from.at_last () &&
(new_length > 0)); new_length--) {
word_res_it_from.forward ();
}
}
if (test_pt)
debug_fix_space_level.set_value (0);
}
fix_sp_fp_word (word_res_it_from, row_res_it.data ()->row);
//Last word in row
}
}
}
}
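/* Editorial sketch, not part of the original source. fix_fuzzy_spaces()
   reports progress as 90 + 5 * word_index / word_count, which in integer
   arithmetic maps the word index onto the range 90..95. The helper below
   isolates that arithmetic. fixspace_progress is a hypothetical name, and
   the precondition word_count > 0 is an assumption made for the sketch. */

```cpp
#include <cassert>

namespace fixspace_sketch {

// Progress value for word_index out of word_count words, using the same
// integer expression as fix_fuzzy_spaces().
inline int fixspace_progress(int word_index, int word_count) {
  assert(word_count > 0);      // assumed: the page holds at least one word
  return 90 + 5 * word_index / word_count;
}

}  // namespace fixspace_sketch
```

/* Because the division truncates, the value moves in whole steps: halfway
   through a 200-word page it reads 92, not 92.5. */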
void fix_fuzzy_space_list( //space explorer
WERD_RES_LIST &best_perm,
ROW *row) {
INT16 best_score;
WERD_RES_LIST current_perm;
INT16 current_score;
BOOL8 improved = FALSE;
//default score
best_score = eval_word_spacing (best_perm);
dump_words (best_perm, best_score, 1, improved);
if (best_score != PERFECT_WERDS)
initialise_search(best_perm, current_perm);
while ((best_score != PERFECT_WERDS) && !current_perm.empty ()) {
match_current_words(current_perm, row);
current_score = eval_word_spacing (current_perm);
dump_words (current_perm, current_score, 2, improved);
if (current_score > best_score) {
best_perm.clear ();
best_perm.deep_copy (&current_perm);
best_score = current_score;
improved = TRUE;
}
if (current_score < PERFECT_WERDS)
transform_to_next_perm(current_perm);
}
dump_words (best_perm, best_score, 3, improved);
}
void initialise_search(WERD_RES_LIST &src_list, WERD_RES_LIST &new_list) {
WERD_RES_IT src_it(&src_list);
WERD_RES_IT new_it(&new_list);
WERD_RES *src_wd;
WERD_RES *new_wd;
for (src_it.mark_cycle_pt (); !src_it.cycled_list (); src_it.forward ()) {
src_wd = src_it.data ();
if (!src_wd->combination) {
new_wd = new WERD_RES (*src_wd);
new_wd->combination = FALSE;
new_wd->part_of_combo = FALSE;
new_it.add_after_then_move (new_wd);
}
}
}
void match_current_words(WERD_RES_LIST &words, ROW *row) {
WERD_RES_IT word_it(&words);
WERD_RES *word;
for (word_it.mark_cycle_pt (); !word_it.cycled_list (); word_it.forward ()) {
word = word_it.data ();
if ((!word->part_of_combo) && (word->outword == NULL))
classify_word_pass2(word, row);
}
}
/*************************************************************************
* eval_word_spacing()
* The basic measure is the number of characters in contextually confirmed
* words. (I.e the word is done)
* If all words are contextually confirmed the evaluation is deemed perfect.
*
* Some fiddles are done to handle "1"s as these are VERY frequent causes of
* fuzzy spaces. The problem with the basic measure is that "561 63" would score
* the same as "56163", though given our knowledge that the space is fuzzy, and
* that there is a "1" next to the fuzzy space, we need to ensure that "56163"
* is preferred.
*
* The solution is to NOT COUNT the score of any word which has a digit at one
* end and a "1Il" as the character the other side of the space.
*
* Conversely, any character next to a "1" within a word is counted as a positive
* score. Thus "561 63" would score 4 (3 chars in a numeric word plus 1 side of
* the "1" joined). "56163" would score 7 - all chars in a numeric word + 2
* sides of a "1" joined.
*
* The joined 1 rule is applied to any word REGARDLESS of contextual
* confirmation. Thus "PS7a71 3/7a" scores 1 (neither word is contextually
* confirmed; the only score is from the joined 1). "PS7a713/7a" scores 2.
*
*************************************************************************/
INT16 eval_word_spacing(WERD_RES_LIST &word_res_list) {
WERD_RES_IT word_res_it(&word_res_list);
INT16 total_score = 0;
INT16 word_count = 0;
INT16 done_word_count = 0;
INT16 word_len;
INT16 i;
WERD_RES *word; //current word
INT16 prev_word_score = 0;
BOOL8 prev_word_done = FALSE;
BOOL8 prev_char_1 = FALSE; //prev ch a "1/I/l"?
BOOL8 prev_char_digit = FALSE; //prev ch 2..9 or 0
BOOL8 current_char_1 = FALSE;
BOOL8 current_word_ok_so_far;
STRING punct_chars = "!\"`',.:;";
BOOL8 prev_char_punct = FALSE;
BOOL8 current_char_punct = FALSE;
BOOL8 word_done = FALSE;
do {
word = word_res_it.data ();
word_done = fixspace_thinks_word_done (word);
word_count++;
if (word->tess_failed) {
total_score += prev_word_score;
if (prev_word_done)
done_word_count++;
prev_word_score = 0;
prev_char_1 = FALSE;
prev_char_digit = FALSE;
prev_word_done = FALSE;
}
else {
/*
Can we add the prev word score and potentially count this word?
Yes IF it didn't end in a 1 when the first char of this word is a digit
AND it didn't end in a digit when the first char of this word is a 1
*/
word_len = word->reject_map.length ();
current_word_ok_so_far = FALSE;
if (!((prev_char_1 &&
digit_or_numeric_punct (word,
word->best_choice->string ()[0])) ||
(prev_char_digit &&
((word_done &&
(word->best_choice->string ()[0] == '1')) ||
(!word_done &&
STRING (conflict_set_I_l_1).contains (word->best_choice->
string ()[0])))))) {
total_score += prev_word_score;
if (prev_word_done)
done_word_count++;
current_word_ok_so_far = word_done;
}
if ((current_word_ok_so_far) &&
(!tessedit_test_uniform_wd_spacing ||
((word->best_choice->permuter () == NUMBER_PERM) ||
uniformly_spaced (word)))) {
prev_word_done = TRUE;
prev_word_score = word_len;
}
else {
prev_word_done = FALSE;
prev_word_score = 0;
}
if (fixsp_prefer_joined_1s) {
/* Add 1 to total score for every joined 1 regardless of context and rejtn */
for (i = 0, prev_char_1 = FALSE; i < word_len; i++) {
current_char_1 = word->best_choice->string ()[i] == '1';
if (prev_char_1 || (current_char_1 && (i > 0)))
total_score++;
prev_char_1 = current_char_1;
}
}
/* Add 1 to total score for every joined punctuation regardless of context
and rejtn */
if (tessedit_prefer_joined_punct) {
for (i = 0, prev_char_punct = FALSE; i < word_len; i++) {
current_char_punct =
punct_chars.contains (word->best_choice->string ()[i]);
if (prev_char_punct || (current_char_punct && (i > 0)))
total_score++;
prev_char_punct = current_char_punct;
}
}
prev_char_digit = digit_or_numeric_punct (word,
word->best_choice->
string ()[word_len - 1]);
prev_char_1 =
((word_done
&& (word->best_choice->string ()[word_len - 1] == '1'))
|| (!word_done
&& STRING (conflict_set_I_l_1).contains (word->best_choice->
string ()[word_len -
1])));
}
/* Find next word */
do
word_res_it.forward ();
while (word_res_it.data ()->part_of_combo);
}
while (!word_res_it.at_first ());
total_score += prev_word_score;
if (prev_word_done)
done_word_count++;
if (done_word_count == word_count)
return PERFECT_WERDS;
else
return total_score;
}
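/* Editorial sketch, not part of the original source. The joined "1" bonus
   described in the header of eval_word_spacing() can be checked in
   isolation. The helper below repeats the fixsp_prefer_joined_1s loop on a
   plain std::string, which stands in for the word's best_choice string.
   joined_ones_bonus and check_header_examples are hypothetical names. */

```cpp
#include <cassert>
#include <string>

namespace fixspace_sketch {

// Add 1 for every character that sits next to a '1' inside the same word:
// once for a '1' that has a left neighbour, once for the character that
// follows a '1'.
inline int joined_ones_bonus(const std::string &text) {
  int bonus = 0;
  bool prev_char_1 = false;
  for (std::string::size_type i = 0; i < text.size(); i++) {
    const bool current_char_1 = (text[i] == '1');
    if (prev_char_1 || (current_char_1 && i > 0))
      bonus++;
    prev_char_1 = current_char_1;
  }
  return bonus;
}

// The worked examples from the header comment, bonus part only.
inline void check_header_examples() {
  assert(joined_ones_bonus("561") == 1);         // "561 63": 3 chars + 1 = 4
  assert(joined_ones_bonus("63") == 0);
  assert(joined_ones_bonus("56163") == 2);       // "56163": 5 chars + 2 = 7
  assert(joined_ones_bonus("PS7a71") == 1);      // "PS7a71 3/7a" scores 1
  assert(joined_ones_bonus("PS7a713/7a") == 2);  // "PS7a713/7a" scores 2
}

}  // namespace fixspace_sketch
```

/* The character counts (3 and 5 above) come from the contextually confirmed
   words and are added separately by eval_word_spacing(); this sketch covers
   only the bonus term. */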
BOOL8 digit_or_numeric_punct(WERD_RES *word, char ch) {
return (isdigit (ch) ||
(fixsp_numeric_fix &&
(word->best_choice->permuter () == NUMBER_PERM) &&
STRING (numeric_punctuation).contains (ch)));
}
/*************************************************************************
* transform_to_next_perm()
* Examines the current word list to find the smallest word gap size. Then walks
* the word list closing any gaps of this size by either inserting new
* combination words, or extending existing ones.
*
* The routine COULD be limited to stop it building words longer than N blobs.
*
* If there are no more gaps then it DELETES the entire list and returns the
* empty list to cause termination.
*************************************************************************/
void transform_to_next_perm(WERD_RES_LIST &words) {
WERD_RES_IT word_it(&words);
WERD_RES_IT prev_word_it(&words);
WERD_RES *word;
WERD_RES *prev_word;
WERD_RES *combo;
WERD *copy_word;
INT16 prev_right = -1;
BOX box;
INT16 gap;
INT16 min_gap = MAX_INT16;
for (word_it.mark_cycle_pt (); !word_it.cycled_list (); word_it.forward ()) {
word = word_it.data ();
if (!word->part_of_combo) {
box = word->word->bounding_box ();
if (prev_right >= 0) {
gap = box.left () - prev_right;
if (gap < min_gap)
min_gap = gap;
}
prev_right = box.right ();
}
}
if (min_gap < MAX_INT16) {
prev_right = -1; //back to start
word_it.set_to_list (&words);
for (; //can't use cycle pt due to inserted combos at start of list
(prev_right < 0) || !word_it.at_first (); word_it.forward ()) {
word = word_it.data ();
if (!word->part_of_combo) {
box = word->word->bounding_box ();
if (prev_right >= 0) {
gap = box.left () - prev_right;
if (gap <= min_gap) {
prev_word = prev_word_it.data ();
if (prev_word->combination)
combo = prev_word;
else {
/* Make a new combination and insert before the first word being joined */
copy_word = new WERD;
*copy_word = *(prev_word->word);
//deep copy
combo = new WERD_RES (copy_word);
combo->combination = TRUE;
prev_word->part_of_combo = TRUE;
prev_word_it.add_before_then_move (combo);
}
combo->word->set_flag (W_EOL, word->word->flag (W_EOL));
if (word->combination) {
combo->word->join_on (word->word);
//Move blbs to combo
//old combo no longer needed
delete word_it.extract ();
}
else {
//Cpy current wd to combo
combo->copy_on (word);
word->part_of_combo = TRUE;
}
combo->done = FALSE;
if (combo->outword != NULL) {
delete combo->outword;
delete combo->best_choice;
delete combo->raw_choice;
combo->outword = NULL;
combo->best_choice = NULL;
combo->raw_choice = NULL;
}
}
else
//catch up
prev_word_it = word_it;
}
prev_right = box.right ();
}
}
}
else
words.clear (); //signal termination
}
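/* Editorial sketch, not part of the original source. One step of
   transform_to_next_perm() can be pictured on bare horizontal extents: find
   the smallest gap between neighbouring words, join every neighbour pair
   whose gap is no larger than that, and return an empty list once no gaps
   remain. Span, make_span and close_smallest_gaps are hypothetical names.
   The sketch assumes the words are ordered left to right and ignores the
   WERD_RES bookkeeping (combination flags, cached choices). */

```cpp
#include <algorithm>
#include <cassert>
#include <climits>
#include <cstddef>
#include <vector>

namespace fixspace_sketch {

struct Span {                  // hypothetical stand-in for a word's BOX
  int left;
  int right;
};

inline Span make_span(int left, int right) {
  Span span;
  span.left = left;
  span.right = right;
  return span;
}

// Close all gaps of the minimum size. An empty result signals termination,
// mirroring words.clear() in transform_to_next_perm().
inline std::vector<Span> close_smallest_gaps(const std::vector<Span> &words) {
  std::vector<Span> next;
  if (words.size() < 2)
    return next;               // no gaps left to close
  int min_gap = INT_MAX;
  for (std::size_t i = 1; i < words.size(); i++) {
    assert(words[i].left >= words[i - 1].left);  // assumed left-to-right
    min_gap = std::min(min_gap, words[i].left - words[i - 1].right);
  }
  next.push_back(words[0]);
  for (std::size_t i = 1; i < words.size(); i++) {
    const int gap = words[i].left - words[i - 1].right;
    if (gap <= min_gap)
      next.back().right = words[i].right;  // join onto the running combo
    else
      next.push_back(words[i]);            // gap stays open for now
  }
  return next;
}

}  // namespace fixspace_sketch
```

/* Calling close_smallest_gaps() repeatedly on its own output visits ever
   coarser groupings until a single span remains, after which the next call
   returns the empty list. */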
void dump_words(WERD_RES_LIST &perm, INT16 score, INT16 mode, BOOL8 improved) {
WERD_RES_IT word_res_it(&perm);
static STRING initial_str;
if (debug_fix_space_level > 0) {
if (mode == 1) {
initial_str = "";
for (word_res_it.mark_cycle_pt ();
!word_res_it.cycled_list (); word_res_it.forward ()) {
if (!word_res_it.data ()->part_of_combo) {
initial_str += word_res_it.data ()->best_choice->string ();
initial_str += ' ';
}
}
}
#ifndef SECURE_NAMES
if (debug_fix_space_level > 1) {
switch (mode) {
case 1:
tprintf ("EXTRACTED (%d): \"", score);
break;
case 2:
tprintf ("TESTED (%d): \"", score);
break;
case 3:
tprintf ("RETURNED (%d): \"", score);
break;
}
for (word_res_it.mark_cycle_pt ();
!word_res_it.cycled_list (); word_res_it.forward ()) {
if (!word_res_it.data ()->part_of_combo)
tprintf ("%s/%1d ",
word_res_it.data ()->best_choice->string ().
string (),
(int) word_res_it.data ()->best_choice->permuter ());
}
tprintf ("\"\n");
}
else if (improved) {
tprintf ("FIX SPACING \"%s\" => \"", initial_str.string ());
for (word_res_it.mark_cycle_pt ();
!word_res_it.cycled_list (); word_res_it.forward ()) {
if (!word_res_it.data ()->part_of_combo)
tprintf ("%s/%1d ",
word_res_it.data ()->best_choice->string ().
string (),
(int) word_res_it.data ()->best_choice->permuter ());
}
tprintf ("\"\n");
}
#endif
}
}
/*************************************************************************
* uniformly_spaced()
* Return true if any of the following is true:
* - All inter-char gaps are the same width
* - The largest gap is no larger than twice the mean/median of the others
* - The largest gap is < 64/5 = 13 and all others are <= 0
* **** REMEMBER - WE'RE NOW WORKING WITH A BLN WERD !!!
*************************************************************************/
BOOL8 uniformly_spaced( //sensible word
WERD_RES *word) {
PBLOB_IT blob_it;
BOX box;
INT16 prev_right = -MAX_INT16;
INT16 gap;
INT16 max_gap = -MAX_INT16;
INT16 max_gap_count = 0;
STATS gap_stats (0, MAXSPACING);
BOOL8 result;
const ROW *row = word->denorm.row ();
float max_non_space;
float normalised_max_nonspace;
INT16 i = 0;
STRING punct_chars = "\"`',.:;";
blob_it.set_to_list (word->outword->blob_list ());
for (blob_it.mark_cycle_pt (); !blob_it.cycled_list (); blob_it.forward ()) {
box = blob_it.data ()->bounding_box ();
if ((prev_right > -MAX_INT16) &&
(!fixsp_ignore_punct ||
(!punct_chars.contains (word->best_choice->string ()[i - 1]) &&
!punct_chars.contains (word->best_choice->string ()[i])))) {
gap = box.left () - prev_right;
if (gap < max_gap)
gap_stats.add (gap, 1);
else if (gap == max_gap)
max_gap_count++;
else {
if (max_gap_count > 0)
gap_stats.add (max_gap, max_gap_count);
max_gap = gap;
max_gap_count = 1;
}
}
prev_right = box.right ();
i++;
}
max_non_space = (row->space () + 3 * row->kern ()) / 4;
normalised_max_nonspace = max_non_space * bln_x_height / row->x_height ();
result = ((gap_stats.get_total () == 0) ||
(max_gap <= normalised_max_nonspace) ||
((gap_stats.get_total () > 2) &&
(max_gap <= 2 * gap_stats.median ())) ||
((gap_stats.get_total () <= 2) &&
(max_gap <= 2 * gap_stats.mean ())));
#ifndef SECURE_NAMES
if ((debug_fix_space_level > 1)) {
if (result)
tprintf
("ACCEPT SPACING FOR: \"%s\" norm_maxnon = %f max=%d maxcount=%d total=%d mean=%f median=%f\n",
word->best_choice->string ().string (), normalised_max_nonspace,
max_gap, max_gap_count, gap_stats.get_total (), gap_stats.mean (),
gap_stats.median ());
else
tprintf
("REJECT SPACING FOR: \"%s\" norm_maxnon = %f max=%d maxcount=%d total=%d mean=%f median=%f\n",
word->best_choice->string ().string (), normalised_max_nonspace,
max_gap, max_gap_count, gap_stats.get_total (), gap_stats.mean (),
gap_stats.median ());
}
#endif
return result;
}
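/*************************************************************************
* Illustrative sketch only, not part of the original source.
* The acceptance rule in uniformly_spaced() can be tried out in isolation
* on a plain list of gaps. The names gaps_look_uniform, gaps and
* max_non_space are hypothetical, and the median here is a simple
* upper-middle pick, which is an assumption: STATS::median() may compute
* it differently.

```cpp
#include <algorithm>
#include <numeric>
#include <vector>

// Hypothetical stand-in for uniformly_spaced(). gaps holds the inter-blob
// gaps in BLN units; max_non_space plays the role of normalised_max_nonspace.
bool gaps_look_uniform(const std::vector<int> &gaps, double max_non_space) {
  if (gaps.empty())
    return true;                        // single blob: nothing to compare
  const int max_gap = *std::max_element(gaps.begin(), gaps.end());
  std::vector<int> others;              // every gap narrower than the widest
  for (int g : gaps)
    if (g < max_gap)
      others.push_back(g);
  if (others.empty())
    return true;                        // all gaps share one width
  if (max_gap <= max_non_space)
    return true;                        // widest gap is too small to be a space
  std::sort(others.begin(), others.end());
  double typical;
  if (others.size() > 2)
    typical = others[others.size() / 2];  // simplified median (assumption)
  else
    typical = std::accumulate(others.begin(), others.end(), 0.0) /
              others.size();              // mean when there are few samples
  return max_gap <= 2 * typical;
}
```

* With max_non_space 5, gaps {2, 3, 2, 9} are rejected (9 > 2 * 2), while
* {2, 3, 2, 4} are accepted (4 <= 2 * 2).
*************************************************************************/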
BOOL8 fixspace_thinks_word_done(WERD_RES *word) {
if (word->done)
return TRUE;
/*
Use all the standard pass 2 conditions for mode 5 in set_done() in
reject.c BUT DONT REJECT IF THE WERD IS AMBIGUOUS - FOR SPACING WE DONT
CARE WHETHER WE HAVE of/at on/an etc.
*/
if ((fixsp_done_mode > 0) &&
(word->tess_accepted ||
((fixsp_done_mode == 2) &&
(word->reject_map.reject_count () == 0)) ||
(fixsp_done_mode == 3)) &&
(strchr (word->best_choice->string ().string (), ' ') == NULL) &&
((word->best_choice->permuter () == SYSTEM_DAWG_PERM) ||
(word->best_choice->permuter () == FREQ_DAWG_PERM) ||
(word->best_choice->permuter () == USER_DAWG_PERM) ||
(word->best_choice->permuter () == NUMBER_PERM)))
return TRUE;
else
return FALSE;
}
/*************************************************************************
* fix_sp_fp_word()
* Test the current word to see if it can be split by deleting noise blobs. If
* so, do the business.
* Return with the iterator pointing to the same place if the word is unchanged,
* or the last of the replacement words.
*************************************************************************/
void fix_sp_fp_word(WERD_RES_IT &word_res_it, ROW *row) {
WERD_RES *word_res;
WERD_RES_LIST sub_word_list;
WERD_RES_IT sub_word_list_it(&sub_word_list);
INT16 blob_index;
INT16 new_length;
float junk;
word_res = word_res_it.data ();
if (!fixsp_check_for_fp_noise_space ||
word_res->word->flag (W_REP_CHAR) ||
word_res->combination ||
word_res->part_of_combo || !word_res->word->flag (W_DONT_CHOP))
return;
blob_index = worst_noise_blob (word_res, &junk);
if (blob_index < 0)
return;
#ifndef SECURE_NAMES
if (debug_fix_space_level > 1) {
tprintf ("FP fixspace working on \"%s\"\n",
word_res->best_choice->string ().string ());
}
#endif
gblob_sort_list ((PBLOB_LIST *) word_res->word->rej_cblob_list (), FALSE);
sub_word_list_it.add_after_stay_put (word_res_it.extract ());
fix_noisy_space_list(sub_word_list, row);
new_length = sub_word_list.length ();
word_res_it.add_list_before (&sub_word_list);
for (; (!word_res_it.at_last () && (new_length > 1)); new_length--) {
word_res_it.forward ();
}
}
void fix_noisy_space_list(WERD_RES_LIST &best_perm, ROW *row) {
INT16 best_score;
WERD_RES_IT best_perm_it(&best_perm);
WERD_RES_LIST current_perm;
WERD_RES_IT current_perm_it(&current_perm);
WERD_RES *old_word_res;
WERD_RES *new_word_res;
INT16 current_score;
BOOL8 improved = FALSE;
//default score
best_score = fp_eval_word_spacing (best_perm);
dump_words (best_perm, best_score, 1, improved);
new_word_res = new WERD_RES;
old_word_res = best_perm_it.data ();
//Kludge to force deep copy
old_word_res->combination = TRUE;
*new_word_res = *old_word_res; //deep copy
//Undo kludge
old_word_res->combination = FALSE;
//Undo kludge
new_word_res->combination = FALSE;
current_perm_it.add_to_end (new_word_res);
break_noisiest_blob_word(current_perm);
while ((best_score != PERFECT_WERDS) && !current_perm.empty ()) {
match_current_words(current_perm, row);
current_score = fp_eval_word_spacing (current_perm);
dump_words (current_perm, current_score, 2, improved);
if (current_score > best_score) {
best_perm.clear ();
best_perm.deep_copy (&current_perm);
best_score = current_score;
improved = TRUE;
}
if (current_score < PERFECT_WERDS)
break_noisiest_blob_word(current_perm);
}
dump_words (best_perm, best_score, 3, improved);
}
/*************************************************************************
* break_noisiest_blob_word()
* Find the word with the blob which looks like the worst noise.
* Break the word into two, deleting the noise blob.
*************************************************************************/
void break_noisiest_blob_word(WERD_RES_LIST &words) {
WERD_RES_IT word_it(&words);
WERD_RES_IT worst_word_it;
float worst_noise_score = 9999;
int worst_blob_index = -1; //noisiest blb of noisiest wd
int blob_index; //of wds noisiest blb
float noise_score; //of wds noisiest blb
WERD_RES *word_res;
C_BLOB_IT blob_it;
C_BLOB_IT rej_cblob_it;
C_BLOB_LIST new_blob_list;
C_BLOB_IT new_blob_it;
C_BLOB_IT new_rej_cblob_it;
WERD *new_word;
INT16 start_of_noise_blob;
INT16 i;
for (word_it.mark_cycle_pt (); !word_it.cycled_list (); word_it.forward ()) {
blob_index = worst_noise_blob (word_it.data (), &noise_score);
if ((blob_index > -1) && (worst_noise_score > noise_score)) {
worst_noise_score = noise_score;
worst_blob_index = blob_index;
worst_word_it = word_it;
}
}
if (worst_blob_index < 0) {
words.clear (); //signal termination
return;
}
/* Now split the worst_word_it */
word_res = worst_word_it.data ();
/* Move blobs before noise blob to a new bloblist */
new_blob_it.set_to_list (&new_blob_list);
blob_it.set_to_list (word_res->word->cblob_list ());
for (i = 0; i < worst_blob_index; i++, blob_it.forward ()) {
new_blob_it.add_after_then_move (blob_it.extract ());
}
start_of_noise_blob = blob_it.data ()->bounding_box ().left ();
delete blob_it.extract (); //throw out noise blb
new_word = new WERD (&new_blob_list, word_res->word);
new_word->set_flag (W_EOL, FALSE);
word_res->word->set_flag (W_BOL, FALSE);
word_res->word->set_blanks (1);//After break
new_rej_cblob_it.set_to_list (new_word->rej_cblob_list ());
rej_cblob_it.set_to_list (word_res->word->rej_cblob_list ());
for (;
(!rej_cblob_it.empty () &&
(rej_cblob_it.data ()->bounding_box ().left () <
start_of_noise_blob)); rej_cblob_it.forward ()) {
new_rej_cblob_it.add_after_then_move (rej_cblob_it.extract ());
}
worst_word_it.add_before_then_move (new WERD_RES (new_word));
word_res->done = FALSE;
if (word_res->outword != NULL) {
delete word_res->outword;
delete word_res->best_choice;
delete word_res->raw_choice;
word_res->outword = NULL;
word_res->best_choice = NULL;
word_res->raw_choice = NULL;
}
}
INT16 worst_noise_blob(WERD_RES *word_res, float *worst_noise_score) {
PBLOB_IT blob_it;
INT16 blob_count;
float noise_score[512];
int i;
int min_noise_blob; //1st contender
int max_noise_blob; //last contender
int non_noise_count;
int worst_noise_blob; //Worst blob
float small_limit = bln_x_height * fixsp_small_outlines_size;
float non_noise_limit = bln_x_height * 0.8;
blob_it.set_to_list (word_res->outword->blob_list ());
//normalised
blob_count = blob_it.length ();
ASSERT_HOST (blob_count <= 512);
if (blob_count < 5)
return -1; //too short to split
/* Get the noise scores for all blobs */
#ifndef SECURE_NAMES
if (debug_fix_space_level > 5)
tprintf ("FP fixspace Noise metrics for \"%s\": ",
word_res->best_choice->string ().string ());
#endif
for (i = 0; i < blob_count; i++, blob_it.forward ()) {
if (word_res->reject_map[i].accepted ())
noise_score[i] = non_noise_limit;
else
noise_score[i] = blob_noise_score (blob_it.data ());
if (debug_fix_space_level > 5)
tprintf ("%1.1f ", noise_score[i]);
}
if (debug_fix_space_level > 5)
tprintf ("\n");
/* Now find the worst one which is far enough away from the end of the word */
non_noise_count = 0;
for (i = 0;
(i < blob_count) && (non_noise_count < fixsp_non_noise_limit); i++) {
if (noise_score[i] >= non_noise_limit)
non_noise_count++;
}
if (non_noise_count < fixsp_non_noise_limit)
return -1;
min_noise_blob = i;
non_noise_count = 0;
for (i = blob_count - 1;
(i >= 0) && (non_noise_count < fixsp_non_noise_limit); i--) {
if (noise_score[i] >= non_noise_limit)
non_noise_count++;
}
if (non_noise_count < fixsp_non_noise_limit)
return -1;
max_noise_blob = i;
if (min_noise_blob > max_noise_blob)
return -1;
*worst_noise_score = small_limit;
worst_noise_blob = -1;
for (i = min_noise_blob; i <= max_noise_blob; i++) {
if (noise_score[i] < *worst_noise_score) {
worst_noise_blob = i;
*worst_noise_score = noise_score[i];
}
}
return worst_noise_blob;
}
float blob_noise_score(PBLOB *blob) {
OUTLINE_IT outline_it;
BOX box; //BB of outline
INT16 outline_count = 0;
INT16 max_dimension;
INT16 largest_outline_dimension = 0;
outline_it.set_to_list (blob->out_list ());
for (outline_it.mark_cycle_pt ();
!outline_it.cycled_list (); outline_it.forward ()) {
outline_count++;
box = outline_it.data ()->bounding_box ();
if (box.height () > box.width ())
max_dimension = box.height ();
else
max_dimension = box.width ();
if (largest_outline_dimension < max_dimension)
largest_outline_dimension = max_dimension;
}
if (fixsp_noise_score_fixing) {
if (outline_count > 5)
//penalise blobs made of LOTS of outlines
largest_outline_dimension *= 2;
box = blob->bounding_box ();
if ((box.bottom () > bln_baseline_offset * 4) ||
(box.top () < bln_baseline_offset / 2))
//Be lax if the blob sits high or low
largest_outline_dimension /= 2;
}
return largest_outline_dimension;
}
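/*************************************************************************
* Illustrative sketch only, not part of the original source.
* The core of blob_noise_score() is the largest width or height over all
* outlines of a blob; worst_noise_blob() and fp_eval_word_spacing() compare
* that score with bln_x_height * fixsp_small_outlines_size. OutlineBox and
* both function names are hypothetical, and the extra adjustments made when
* fixsp_noise_score_fixing is set are deliberately left out.

```cpp
#include <algorithm>
#include <vector>

struct OutlineBox {   // hypothetical bounding box of one outline
  int width;
  int height;
};

// Largest single dimension over all outlines; 0 for a blob with no outlines.
int largest_outline_dimension(const std::vector<OutlineBox> &outlines) {
  int largest = 0;
  for (const OutlineBox &b : outlines)
    largest = std::max(largest, std::max(b.width, b.height));
  return largest;
}

// A blob is a noise candidate when its score falls below the small limit,
// i.e. x_height * small_fraction.
bool looks_like_noise(const std::vector<OutlineBox> &outlines,
                      double x_height, double small_fraction) {
  return largest_outline_dimension(outlines) < x_height * small_fraction;
}
```

* Assuming a normalised x-height of 128 and the default fraction 0.28, the
* small limit is 35.84, so a 10 x 30 outline is a noise candidate and a
* 40 x 5 outline is not.
*************************************************************************/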
void fixspace_dbg(WERD_RES *word) {
BOX box = word->word->bounding_box ();
BOOL8 show_map_detail = FALSE;
INT16 i;
box.print ();
#ifndef SECURE_NAMES
tprintf (" \"%s\" ", word->best_choice->string ().string ());
tprintf ("Blob count: %d (word); %d/%d (outword)\n",
word->word->gblob_list ()->length (),
word->outword->gblob_list ()->length (),
word->outword->rej_blob_list ()->length ());
word->reject_map.print (debug_fp);
tprintf ("\n");
if (show_map_detail) {
tprintf ("\"%s\"\n", word->best_choice->string ().string ());
for (i = 0; word->best_choice->string ()[i] != '\0'; i++) {
tprintf ("**** \"%c\" ****\n", word->best_choice->string ()[i]);
word->reject_map[i].full_print (debug_fp);
}
}
tprintf ("Tess Accepted: %s\n", word->tess_accepted ? "TRUE" : "FALSE");
tprintf ("Done flag: %s\n\n", word->done ? "TRUE" : "FALSE");
#endif
}
/*************************************************************************
* fp_eval_word_spacing()
* Evaluation function for fixed pitch word lists.
*
* Basically, count the number of "nice" characters - those which are in tess
* acceptable words or in dict words and are not rejected.
* Penalise any potential noise chars
*************************************************************************/
INT16 fp_eval_word_spacing(WERD_RES_LIST &word_res_list) {
WERD_RES_IT word_it(&word_res_list);
WERD_RES *word;
PBLOB_IT blob_it;
INT16 word_length;
INT16 score = 0;
INT16 i;
const char *chs;
float small_limit = bln_x_height * fixsp_small_outlines_size;
if (!fixsp_fp_eval)
return (eval_word_spacing (word_res_list));
for (word_it.mark_cycle_pt (); !word_it.cycled_list (); word_it.forward ()) {
word = word_it.data ();
word_length = word->reject_map.length ();
chs = word->best_choice->string ().string ();
if ((word->done ||
word->tess_accepted) ||
(word->best_choice->permuter () == SYSTEM_DAWG_PERM) ||
(word->best_choice->permuter () == FREQ_DAWG_PERM) ||
(word->best_choice->permuter () == USER_DAWG_PERM) ||
(safe_dict_word (chs) > 0)) {
blob_it.set_to_list (word->outword->blob_list ());
for (i = 0; i < word_length; i++, blob_it.forward ()) {
if ((chs[i] == ' ') ||
(blob_noise_score (blob_it.data ()) < small_limit))
score -= 1; //penalise possibly erroneous non-space
else if (word->reject_map[i].accepted ())
score++;
}
}
}
if (score < 0)
score = 0;
return score;
}

72
ccmain/fixspace.h Normal file

@ -0,0 +1,72 @@
/******************************************************************
* File: fixspace.h (Formerly fixspace.h)
* Description: Implements a pass over the page res, exploring the alternative
* spacing possibilities, trying to use context to improve the
* word spacing
* Author: Phil Cheatle
* Created: Thu Oct 21 11:38:43 BST 1993
*
* (C) Copyright 1993, Hewlett-Packard Ltd.
** Licensed under the Apache License, Version 2.0 (the "License");
** you may not use this file except in compliance with the License.
** You may obtain a copy of the License at
** http://www.apache.org/licenses/LICENSE-2.0
** Unless required by applicable law or agreed to in writing, software
** distributed under the License is distributed on an "AS IS" BASIS,
** WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
** See the License for the specific language governing permissions and
** limitations under the License.
*
**********************************************************************/
#ifndef FIXSPACE_H
#define FIXSPACE_H
#include "pageres.h"
#include "varable.h"
#include "ocrclass.h"
#include "notdll.h"
extern BOOL_VAR_H (fixsp_check_for_fp_noise_space, TRUE,
"Try turning noise to space in fixed pitch");
extern BOOL_VAR_H (fixsp_fp_eval, TRUE, "Use alternate evaluation for fp");
extern BOOL_VAR_H (fixsp_noise_score_fixing, TRUE, "More sophisticated?");
extern INT_VAR_H (fixsp_non_noise_limit, 1,
"How many non-noise blbs either side?");
extern double_VAR_H (fixsp_small_outlines_size, 0.28,
"Small if lt xht x this");
extern BOOL_VAR_H (fixsp_ignore_punct, TRUE, "In uniform spacing calc");
extern BOOL_VAR_H (fixsp_numeric_fix, TRUE, "Try to deal with numeric punct");
extern BOOL_VAR_H (fixsp_prefer_joined_1s, TRUE, "Arbitrary boost");
extern BOOL_VAR_H (tessedit_test_uniform_wd_spacing, FALSE,
"Limit context word spacing");
extern BOOL_VAR_H (tessedit_prefer_joined_punct, FALSE,
"Reward punctuation joins");
extern INT_VAR_H (fixsp_done_mode, 1, "What constitutes done for spacing");
extern INT_VAR_H (debug_fix_space_level, 0, "Contextual fixspace debug");
extern STRING_VAR_H (numeric_punctuation, ".,",
"Punct. chs expected WITHIN numbers");
void fix_fuzzy_spaces( //find fuzzy words
volatile ETEXT_DESC *monitor, //progress monitor
INT32 word_count, //count of words in doc
PAGE_RES *page_res);
void fix_fuzzy_space_list( //space explorer
WERD_RES_LIST &best_perm,
ROW *row);
void initialise_search(WERD_RES_LIST &src_list, WERD_RES_LIST &new_list);
void match_current_words(WERD_RES_LIST &words, ROW *row);
INT16 eval_word_spacing(WERD_RES_LIST &word_res_list);
BOOL8 digit_or_numeric_punct(WERD_RES *word, char ch);
void transform_to_next_perm(WERD_RES_LIST &words);
void dump_words(WERD_RES_LIST &perm, INT16 score, INT16 mode, BOOL8 improved);
BOOL8 uniformly_spaced( //sensible word
WERD_RES *word);
BOOL8 fixspace_thinks_word_done(WERD_RES *word);
void fix_sp_fp_word(WERD_RES_IT &word_res_it, ROW *row);
void fix_noisy_space_list(WERD_RES_LIST &best_perm, ROW *row);
void break_noisiest_blob_word(WERD_RES_LIST &words);
INT16 worst_noise_blob(WERD_RES *word_res, float *worst_noise_score);
float blob_noise_score(PBLOB *blob);
void fixspace_dbg(WERD_RES *word);
INT16 fp_eval_word_spacing(WERD_RES_LIST &word_res_list);
#endif

790
ccmain/fixxht.cpp Normal file

@ -0,0 +1,790 @@
/**********************************************************************
* File: fixxht.cpp (Formerly fixxht.c)
* Description: Improve x_ht and look out for case inconsistencies
* Author: Phil Cheatle
* Created: Thu Aug 5 14:11:08 BST 1993
*
* (C) Copyright 1992, Hewlett-Packard Ltd.
** Licensed under the Apache License, Version 2.0 (the "License");
** you may not use this file except in compliance with the License.
** You may obtain a copy of the License at
** http://www.apache.org/licenses/LICENSE-2.0
** Unless required by applicable law or agreed to in writing, software
** distributed under the License is distributed on an "AS IS" BASIS,
** WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
** See the License for the specific language governing permissions and
** limitations under the License.
*
**********************************************************************/
#include "mfcpch.h"
#include <string.h>
#include <ctype.h>
#include <math.h> //for fabs
#include "varable.h"
#include "tessvars.h"
#include "control.h"
#include "reject.h"
#include "fixxht.h"
#include "secname.h"
#define EXTERN
EXTERN double_VAR (x_ht_fraction_of_caps_ht, 0.7,
"Fract of cps ht est of xht");
EXTERN double_VAR (x_ht_variation, 0.35,
"Err band as fract of caps/xht dist");
EXTERN double_VAR (x_ht_sub_variation, 0.5,
"Err band as fract of caps/xht dist");
EXTERN BOOL_VAR (rej_trial_ambigs, TRUE,
"reject x-ht ambigs when under trial");
EXTERN BOOL_VAR (x_ht_conservative_ambigs, FALSE,
"Dont rely on ambigs + maxht");
EXTERN BOOL_VAR (x_ht_check_est, TRUE, "Cross check estimates");
EXTERN BOOL_VAR (x_ht_case_flip, FALSE, "Flip or reject suspect case");
EXTERN BOOL_VAR (x_ht_include_dodgy_blobs, TRUE,
"Include blobs with possible noise?");
EXTERN BOOL_VAR (x_ht_limit_flip_trials, TRUE,
"Dont do trial flips when ambigs are close to xht?");
EXTERN BOOL_VAR (rej_use_check_block_occ, TRUE,
"Analyse rejection behaviour");
EXTERN STRING_VAR (chs_non_ambig_caps_ht,
"!#$%&()/12346789?ABDEFGHIKLNQRT[]\\bdfhkl",
"Reliable ascenders");
EXTERN STRING_VAR (chs_x_ht, "acegmnopqrsuvwxyz", "X height chars");
EXTERN STRING_VAR (chs_non_ambig_x_ht, "aenqr", "reliable X height chars");
EXTERN STRING_VAR (chs_ambig_caps_x, "cCmMoO05sSuUvVwWxXzZ",
"X ht or caps ht chars");
EXTERN STRING_VAR (chs_bl_ambig_caps_x, "pPyY", " Caps or descender ambigs");
/* The following aren't used in this module but are used in applybox.c */
EXTERN STRING_VAR (chs_caps_ht,
"!#$%&()/0123456789?ABCDEFGHIJKLMNOPQRSTUVWXYZ[]\\bdfhkl{|}",
"Ascender chars");
EXTERN STRING_VAR (chs_desc, "gjpqy", "Descender chars");
EXTERN STRING_VAR (chs_non_ambig_bl,
"!#$%&01246789?ABCDEFGHIKLMNORSTUVWXYZabcdehiklmnorstuvwxz",
"Reliable baseline chars");
EXTERN STRING_VAR (chs_odd_top, "ijt", "Chars with funny ascender region");
EXTERN STRING_VAR (chs_odd_bot, "()35JQ[]\\/{}|", "Chars with funny base");
/* The following aren't used but are defined for completeness */
EXTERN STRING_VAR (chs_bl,
"!#$%&()/01246789?ABCDEFGHIJKLMNOPRSTUVWXYZ[]\\abcdefhiklmnorstuvwxz{}",
"Baseline chars");
EXTERN STRING_VAR (chs_non_ambig_desc, "gq", "Reliable descender chars");
/*************************************************************************
* re_estimate_x_ht()
*
* Walk the blobs in the word together with the text string and reject map.
* NOTE: All evaluation is done on the baseline normalised word. This is so that
* the BOX class can be used (integer). The reasons for this are:
* a) We must use the outword - ie the Tess result
* b) The outword is always converted to integer representation as that is how
* Tess works
* c) We would like to use the BOX class, cos it's there - this is integer
* precision.
* d) If we de-normed the outword we would get rounding errors and would find
* that integers are too imprecise (x-height around 15 pixels instead of a
* scale of 128 in bln form).
* CONVINCED?
*
* A) Try to re-estimate x-ht and caps ht from confirmed pts in word.
*
* FOR each non reject blob
* IF char is baseline posn ambiguous
* Remove ambiguity by comparing its posn with respect to baseline.
* IF char is a confirmed x-ht char
* Add x-ht posn to confirmed_x_ht pts for word
* IF char is a confirmed caps-ht char
* Add blob_ht to caps ht pts for word
*
* IF Std Dev of caps hts < 2 (AND # samples > 0)
* Use mean as caps ht estimate (Dont use median as we can expect a
* fair variation between the heights of the NON_AMBIG_CAPS_HT_CHS)
* IF Std Dev of caps hts >= 2 (AND # samples > 0)
* Suspect small caps font.
* Look for 2 clusters, each with Std Dev < 2.
* IF 2 clusters found
* Pick the smaller median as the caps ht estimate of the smallcaps.
*
* IF failed to estimate a caps ht
* Use the median caps ht if there is one,
* ELSE use the caps ht estimate of the previous word. NO!!!
*
*
* IF there are confirmed x-height chars
* Estimate confirmed x-height as the median value
* ELSE IF there is a confirmed caps ht
* Estimate confirmed x-height as a fraction of confirmed caps ht value
* ELSE
* Use the value for the previous word or the row value if this is the
* first word in the block. NO!!!
*
* B) Add in case ambiguous blobs based on confirmed x-ht/caps ht, changing case
* as necessary. Reestimate caps ht and x-ht as in A, using the extended
* clusters.
*
* C) If word contains rejects, and x-ht estimate significantly differs from
* original estimate, return TRUE so that the word can be rematched
*************************************************************************/
void re_estimate_x_ht( //improve for 1 word
WERD_RES *word_res, //word to do
float *trial_x_ht //new match value
) {
PBLOB_IT blob_it;
INT16 blob_ht_above_baseline;
const char *word_str;
INT16 i;
STATS all_blobs_ht (0, 300); //every blob in word
STATS x_ht (0, 300); //confirmed pts in wd
STATS caps_ht (0, 300); //confirmed pts in wd
STATS case_ambig (0, 300); //lower case ambigs
INT16 rej_blobs_count = 0;
INT16 rej_blobs_max_height = 0;
INT32 rej_blobs_max_area = 0;
float x_ht_ok_variation;
float max_blob_ht;
float marginally_above_x_ht;
BOX blob_box; //blob bounding box
float est_x_ht = 0.0; //word estimate
float est_caps_ht = 0.0; //word estimate
//based on hard data?
BOOL8 est_caps_ht_certain = FALSE;
BOOL8 est_x_ht_certain = FALSE;//based on hard data?
BOOL8 trial = FALSE; //Speculative values?
BOOL8 no_comment = FALSE; //No change in xht
float ambig_lc_x_est;
float ambig_uc_caps_est;
INT16 x_ht_ambigs = 0;
INT16 caps_ht_ambigs = 0;
/* Calculate default variation of blob x_ht from bln x_ht for bln word */
x_ht_ok_variation =
(bln_x_height / x_ht_fraction_of_caps_ht - bln_x_height) * x_ht_variation;
word_str = word_res->best_choice->string ().string ();
/*
Cycle blobs, allocating to one of the stats sets when possible.
*/
blob_it.set_to_list (word_res->outword->blob_list ());
for (blob_it.mark_cycle_pt (), i = 0;
!blob_it.cycled_list (); blob_it.forward (), i++) {
if (!dodgy_blob (blob_it.data ())) {
blob_box = blob_it.data ()->bounding_box ();
blob_ht_above_baseline = blob_box.top () - bln_baseline_offset;
all_blobs_ht.add (blob_ht_above_baseline, 1);
if (word_res->reject_map[i].rejected ()) {
rej_blobs_count++;
if (blob_box.height () > rej_blobs_max_height)
rej_blobs_max_height = blob_box.height ();
if (blob_box.area () > rej_blobs_max_area)
rej_blobs_max_area = blob_box.area ();
}
else {
if (STRING (chs_non_ambig_x_ht).contains (word_str[i]))
x_ht.add (blob_ht_above_baseline, 1);
if (STRING (chs_non_ambig_caps_ht).contains (word_str[i]))
caps_ht.add (blob_ht_above_baseline, 1);
if (STRING (chs_ambig_caps_x).contains (word_str[i])) {
case_ambig.add (blob_ht_above_baseline, 1);
if (STRING (chs_x_ht).contains (word_str[i]))
x_ht_ambigs++;
else
caps_ht_ambigs++;
}
if (STRING (chs_bl_ambig_caps_x).contains (word_str[i])) {
if (STRING (chs_x_ht).contains (word_str[i])) {
/* confirm x_height provided > 15% total height below baseline */
if ((bln_baseline_offset - blob_box.bottom ()) /
(float) blob_box.height () > 0.15)
x_ht.add (blob_ht_above_baseline, 1);
}
else {
/* confirm caps_height provided < 5% total height below baseline */
if ((bln_baseline_offset - blob_box.bottom ()) /
(float) blob_box.height () < 0.05)
caps_ht.add (blob_ht_above_baseline, 1);
}
}
}
}
}
est_caps_ht = estimate_from_stats (caps_ht);
est_x_ht = estimate_from_stats (x_ht);
est_ambigs(word_res, case_ambig, &ambig_lc_x_est, &ambig_uc_caps_est);
max_blob_ht = all_blobs_ht.ile (0.9999);
#ifndef SECURE_NAMES
if (debug_x_ht_level >= 20) {
tprintf ("Mode20:A: %s ", word_str);
word_res->reject_map.print (debug_fp);
tprintf (" XHT:%f CAP:%f MAX:%f AMBIG X:%f CAP:%f\n",
est_x_ht, est_caps_ht, max_blob_ht,
ambig_lc_x_est, ambig_uc_caps_est);
}
#endif
if (!x_ht_conservative_ambigs &&
(ambig_lc_x_est > 0) &&
(ambig_lc_x_est == ambig_uc_caps_est) &&
(max_blob_ht > ambig_lc_x_est + x_ht_ok_variation)) {
//may be zero but believe xht
ambig_uc_caps_est = est_caps_ht;
#ifndef SECURE_NAMES
if (debug_x_ht_level >= 20)
tprintf ("Mode20:B: Fiddle ambig_uc_caps_est to %f\n",
ambig_uc_caps_est);
#endif
}
/* Now make some estimates */
if ((est_x_ht > 0) ||
(est_caps_ht > 0) ||
((ambig_lc_x_est > 0) && (ambig_lc_x_est != ambig_uc_caps_est))) {
/* There is some sensible data to go on so make the most of it. */
if (debug_x_ht_level >= 20)
tprintf ("Mode20:C: Sensible Data\n");
if (est_x_ht > 0) {
est_x_ht_certain = TRUE;
if (est_caps_ht == 0) {
if ((ambig_uc_caps_est > ambig_lc_x_est) &&
(ambig_uc_caps_est > est_x_ht + x_ht_ok_variation))
est_caps_ht = ambig_uc_caps_est;
else
est_caps_ht = est_x_ht / x_ht_fraction_of_caps_ht;
}
if (case_ambig.get_total () > 0)
improve_estimate(word_res, est_x_ht, est_caps_ht, x_ht, caps_ht);
est_caps_ht_certain = caps_ht.get_total () > 0;
#ifndef SECURE_NAMES
if (debug_x_ht_level >= 20)
tprintf ("Mode20:D: Est from xht XHT:%f CAP:%f\n",
est_x_ht, est_caps_ht);
#endif
}
else if (est_caps_ht > 0) {
est_caps_ht_certain = TRUE;
if ((ambig_lc_x_est > 0) &&
(ambig_lc_x_est < est_caps_ht - x_ht_ok_variation))
est_x_ht = ambig_lc_x_est;
else
est_x_ht = est_caps_ht * x_ht_fraction_of_caps_ht;
if (ambig_lc_x_est + ambig_uc_caps_est > 0)
improve_estimate(word_res, est_x_ht, est_caps_ht, x_ht, caps_ht);
est_x_ht_certain = x_ht.get_total () > 0;
#ifndef SECURE_NAMES
if (debug_x_ht_level >= 20)
tprintf ("Mode20:E: Est from caps XHT:%f CAP:%f\n",
est_x_ht, est_caps_ht);
#endif
}
else {
/* Do something based on case ambig chars alone - we have guessed that the
ambigs are lower case. */
est_x_ht = ambig_lc_x_est;
est_x_ht_certain = TRUE;
if (ambig_uc_caps_est > ambig_lc_x_est) {
est_caps_ht = ambig_uc_caps_est;
est_caps_ht_certain = TRUE;
}
else
est_caps_ht = est_x_ht / x_ht_fraction_of_caps_ht;
#ifndef SECURE_NAMES
if (debug_x_ht_level >= 20)
tprintf ("Mode20:F: Est from ambigs XHT:%f CAP:%f\n",
est_x_ht, est_caps_ht);
#endif
}
/* Check for sane interpretation of evidence:
Try shifting caps ht if min certain caps ht is not significantly greater
than the estimated x ht or the max certain x ht is not significantly less
than the estimated caps ht. */
if (x_ht_check_est) {
if ((caps_ht.get_total () > 0) &&
(est_x_ht + x_ht_ok_variation >= caps_ht.ile (0.0001))) {
trial = TRUE;
est_caps_ht = est_x_ht;
est_x_ht = x_ht_fraction_of_caps_ht * est_caps_ht;
#ifndef SECURE_NAMES
if (debug_x_ht_level >= 20)
tprintf ("Mode20:G: Trial XHT:%f CAP:%f\n",
est_x_ht, est_caps_ht);
#endif
}
else if ((x_ht.get_total () > 0) &&
(est_caps_ht - x_ht_ok_variation <= x_ht.ile (0.9999))) {
trial = TRUE;
est_x_ht = est_caps_ht;
est_caps_ht = est_x_ht / x_ht_fraction_of_caps_ht;
#ifndef SECURE_NAMES
if (debug_x_ht_level >= 20)
tprintf ("Mode20:H: Trial XHT:%f CAP:%f\n",
est_x_ht, est_caps_ht);
#endif
}
}
}
else {
/* There is no sensible data so we're in the dark. */
marginally_above_x_ht = bln_x_height +
x_ht_ok_variation * x_ht_sub_variation;
/*
If there are no rejects, or the only rejects have a narrow height, or have
a small area compared to a normal char, then estimate the x-height as the
original one. (I.e dont fiddle about if the only rejects look like
punctuation) - we use max height as mean or median will be too low if
there are only two blobs - Eg "F."
*/
if (debug_x_ht_level >= 20)
tprintf ("Mode20:I: In the dark\n");
if ((rej_blobs_count == 0) ||
(rej_blobs_max_height < 0.3 * max_blob_ht) ||
(rej_blobs_max_area < 0.3 * max_blob_ht * max_blob_ht)) {
no_comment = TRUE;
if (debug_x_ht_level >= 20)
tprintf ("Mode20:J: No comment due to no rejects\n");
}
else if (x_ht_limit_flip_trials &&
((max_blob_ht < marginally_above_x_ht) ||
((ambig_lc_x_est > 0) &&
(ambig_lc_x_est == ambig_uc_caps_est) &&
(ambig_lc_x_est < marginally_above_x_ht)))) {
no_comment = TRUE;
if (debug_x_ht_level >= 20)
tprintf ("Mode20:K: No comment as close to xht %f < %f\n",
ambig_lc_x_est, marginally_above_x_ht);
}
else if (x_ht_conservative_ambigs && (ambig_uc_caps_est > 0)) {
trial = TRUE;
est_caps_ht = ambig_lc_x_est;
est_x_ht = x_ht_fraction_of_caps_ht * est_caps_ht;
#ifndef SECURE_NAMES
if (debug_x_ht_level >= 20)
tprintf ("Mode20:L: Trial XHT:%f CAP:%f\n",
est_x_ht, est_caps_ht);
#endif
}
/*
If the top of the word is nowhere near where we expect ascenders to be
(less than half the x_ht -> caps_ht distance) - suspect an all caps word
at the x-ht. Estimate x-ht accordingly - but only as a TRIAL!
NOTE we do NOT check location of baseline. Commas can descend as much as
real descenders so we would need to do something to make sure that any
disqualifying descenders were not at the end.
*/
else {
if (max_blob_ht <
(bln_x_height + bln_x_height / x_ht_fraction_of_caps_ht) / 2.0) {
trial = TRUE;
est_x_ht = x_ht_fraction_of_caps_ht * max_blob_ht;
est_caps_ht = max_blob_ht;
#ifndef SECURE_NAMES
if (debug_x_ht_level >= 20)
tprintf ("Mode20:M: Trial XHT:%f CAP:%f\n",
est_x_ht, est_caps_ht);
#endif
}
else {
no_comment = TRUE;
if (debug_x_ht_level >= 20)
tprintf ("Mode20:N: No comment as nothing else matched\n");
}
}
}
/* Sanity check - reject word if fails */
if (!no_comment &&
((est_x_ht > 2 * bln_x_height) ||
(est_x_ht / word_res->denorm.scale () <= min_sane_x_ht_pixels) ||
(est_caps_ht <= est_x_ht) || (est_caps_ht >= 2.5 * est_x_ht))) {
no_comment = TRUE;
if (!trial && rej_use_xht) {
if (debug_x_ht_level >= 2) {
tprintf ("Sanity check rejecting %s ", word_str);
word_res->reject_map.print (debug_fp);
tprintf ("\n");
}
word_res->reject_map.rej_word_xht_fixup ();
}
if (debug_x_ht_level >= 20)
tprintf ("Mode20:O: No comment as sanity check failed\n");
}
if (no_comment || trial) {
word_res->x_height = bln_x_height / word_res->denorm.scale ();
word_res->guessed_x_ht = TRUE;
word_res->caps_height = (bln_x_height / x_ht_fraction_of_caps_ht) /
word_res->denorm.scale ();
word_res->guessed_caps_ht = TRUE;
/*
Reject ambigs in the current word if we are uncertain and:
there are rejects OR
there is only one char which is an ambig OR
there is conflict between the case of the ambigs even though there is
no height separation Eg "Ms" recognised from "MS"
*/
if (rej_trial_ambigs &&
((word_res->reject_map.reject_count () > 0) ||
(word_res->reject_map.length () == 1) ||
((x_ht_ambigs > 0) && (caps_ht_ambigs > 0)))) {
#ifndef SECURE_NAMES
if (debug_x_ht_level >= 2) {
tprintf ("TRIAL Rej Ambigs %s ", word_str);
word_res->reject_map.print (debug_fp);
}
#endif
reject_ambigs(word_res);
if (debug_x_ht_level >= 2) {
tprintf (" ");
word_res->reject_map.print (debug_fp);
tprintf ("\n");
}
}
}
else {
word_res->x_height = est_x_ht / word_res->denorm.scale ();
word_res->guessed_x_ht = !est_x_ht_certain;
word_res->caps_height = est_caps_ht / word_res->denorm.scale ();
word_res->guessed_caps_ht = !est_caps_ht_certain;
}
if (!no_comment && (fabs (est_x_ht - bln_x_height) > x_ht_ok_variation))
*trial_x_ht = est_x_ht / word_res->denorm.scale ();
else
*trial_x_ht = 0.0;
#ifndef SECURE_NAMES
if (((*trial_x_ht > 0) && (debug_x_ht_level >= 3)) ||
(debug_x_ht_level >= 5)) {
tprintf ("%s ", word_str);
word_res->reject_map.print (debug_fp);
tprintf
(" X:%0.2f Cps:%0.2f Mxht:%0.2f RJ MxHt:%d MxAr:%d Rematch:%c\n",
est_x_ht, est_caps_ht, max_blob_ht, rej_blobs_max_height,
rej_blobs_max_area, *trial_x_ht > 0 ? '*' : ' ');
}
#endif
}
/*************************************************************************
* check_block_occ()
* Checks word for coarse block occupancy, rejecting more chars and flipping
* case of case ambiguous chars as required.
*************************************************************************/
void check_block_occ(WERD_RES *word_res) {
PBLOB_IT blob_it;
STRING new_string;
REJMAP new_map = word_res->reject_map;
WERD_CHOICE *new_choice;
const char *word_str = word_res->best_choice->string ().string ();
INT16 i;
INT16 reject_count = 0;
char confirmed_char;
float x_ht;
float caps_ht;
if (word_res->x_height > 0)
x_ht = word_res->x_height * word_res->denorm.scale ();
else
x_ht = bln_x_height;
if (word_res->caps_height > 0)
caps_ht = word_res->caps_height * word_res->denorm.scale ();
else
caps_ht = x_ht / x_ht_fraction_of_caps_ht;
blob_it.set_to_list (word_res->outword->blob_list ());
for (blob_it.mark_cycle_pt (), i = 0;
!blob_it.cycled_list (); blob_it.forward (), i++) {
new_string += word_str[i]; //default copy
if (word_res->reject_map[i].accepted ()) {
confirmed_char = check_blob_occ (word_str[i],
blob_it.data ()->bounding_box ().
top () - bln_baseline_offset, x_ht,
caps_ht);
if (confirmed_char == '\0') {
if (rej_use_check_block_occ) {
new_map[i].setrej_xht_fixup ();
reject_count++;
}
}
else
new_string[i] = confirmed_char;
}
}
if ((reject_count > 0) || (new_string != word_str)) {
if (debug_x_ht_level >= 2) {
tprintf ("Shape Verification: %s ", word_str);
word_res->reject_map.print (debug_fp);
tprintf (" -> %s ", new_string.string ());
new_map.print (debug_fp);
tprintf ("\n");
}
new_choice = new WERD_CHOICE (new_string.string (),
word_res->best_choice->rating (),
word_res->best_choice->certainty (),
word_res->best_choice->permuter ());
delete word_res->best_choice;
word_res->best_choice = new_choice;
word_res->reject_map = new_map;
}
}
/*************************************************************************
* check_blob_occ()
*
* Checks the height of the blob top above the baseline against the x-height
* and caps-height bands expected for the proposed character.
* Returns 0 for reject, or (possibly case shifted) confirmed char
*************************************************************************/
char check_blob_occ(char proposed_char,
INT16 blob_ht_above_baseline,
float x_ht,
float caps_ht) {
BOOL8 blob_definite_x_ht;
BOOL8 blob_definite_caps_ht;
float acceptable_variation;
acceptable_variation = (caps_ht - x_ht) * x_ht_variation;
/* ??? REJECT if expected descender and nothing significantly below BL */
/* ??? REJECT if expected ascender and nothing significantly above x-ht */
/*
IF AMBIG_CAPS_X_CHS
IF blob is definitely an ascender ( > xht + xht err )AND
char is an x-ht char
THEN
flip case
IF blob is definitely an x-ht ( <= xht + xht err ) AND
char is an ascender char
THEN
flip case
*/
blob_definite_x_ht = blob_ht_above_baseline <= x_ht + acceptable_variation;
blob_definite_caps_ht = blob_ht_above_baseline >=
caps_ht - acceptable_variation;
if (STRING (chs_ambig_caps_x).contains (proposed_char)) {
if ((!blob_definite_x_ht && !blob_definite_caps_ht) ||
(proposed_char == '0' && !blob_definite_caps_ht) ||
(proposed_char == 'o' && !blob_definite_x_ht))
return '\0';
else if (blob_definite_caps_ht &&
STRING (chs_x_ht).contains (proposed_char)) {
if (x_ht_case_flip)
//flip to upper case
return (char) toupper (proposed_char);
else
return '\0';
}
else if (blob_definite_x_ht &&
!STRING (chs_x_ht).contains (proposed_char)) {
if (x_ht_case_flip)
//flip to lower case
return (char) tolower (proposed_char);
else
return '\0';
}
}
else
if ((STRING (chs_non_ambig_x_ht).contains (proposed_char)
&& !blob_definite_x_ht)
|| (STRING (chs_non_ambig_caps_ht).contains (proposed_char)
&& !blob_definite_caps_ht))
return '\0';
return proposed_char;
}
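
The height bands tested by check_blob_occ() above can be sketched in isolation. This is a minimal illustration only, not the Tesseract code: classify_height, HeightBand and kVariation are hypothetical names, and the 0.35 tolerance is assumed from the x_ht_variation default declared in fixxht.h.

```cpp
// Hypothetical stand-in for the x_ht_variation tunable (default 0.35).
static const float kVariation = 0.35f;

enum HeightBand { BAND_X_HT, BAND_CAPS_HT, BAND_BOTH, BAND_NEITHER };

// Classify the height of a blob top above the baseline against the two bands.
// The tolerance is a fraction of the gap between caps height and x-height.
HeightBand classify_height(float ht_above_baseline, float x_ht, float caps_ht) {
  const float tol = (caps_ht - x_ht) * kVariation;
  const bool is_x = ht_above_baseline <= x_ht + tol;        // x-height band
  const bool is_caps = ht_above_baseline >= caps_ht - tol;  // caps-height band
  if (is_x && is_caps) return BAND_BOTH;
  if (is_x) return BAND_X_HT;
  if (is_caps) return BAND_CAPS_HT;
  return BAND_NEITHER;  // falls in the gap between the two bands
}
```

With made-up heights x_ht = 128 and caps_ht = 183 the tolerance is 19.25, so a top at 150 falls between the bands; for a case-ambiguous character, that is the situation in which check_blob_occ() returns '\0'.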
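/*************************************************************************
* estimate_from_stats()
*
* Returns 0 if there are no samples, the median if there are at least 3
* samples, and the mean otherwise.
*************************************************************************/
float estimate_from_stats(STATS &stats) {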
if (stats.get_total () <= 0)
return 0.0;
else if (stats.get_total () >= 3)
return stats.ile (0.5); //median
else
return stats.mean ();
}
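/*************************************************************************
* improve_estimate()
*
* Rechecks each case ambiguous char in a non-dodgy blob against the current
* estimates. Confirmed chars add their height to the x_ht or caps_ht stats,
* and both estimates are then recomputed from the stats.
*************************************************************************/
void improve_estimate(WERD_RES *word_res,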
float &est_x_ht,
float &est_caps_ht,
STATS &x_ht,
STATS &caps_ht) {
PBLOB_IT blob_it;
INT16 blob_ht_above_baseline;
const char *word_str;
INT16 i;
BOX blob_box; //blob bounding box
char confirmed_char;
float new_val;
/* IMPROVE estimates here - if we have good estimates and case ambiguous
chars, rescan the blobs to fix the case ambiguous blobs and re-estimate the
heights. ??? Maybe always do it after deciding x-height.
*/
blob_it.set_to_list (word_res->outword->blob_list ());
word_str = word_res->best_choice->string ().string ();
for (blob_it.mark_cycle_pt (), i = 0;
!blob_it.cycled_list (); blob_it.forward (), i++) {
if ((STRING (chs_ambig_caps_x).contains (word_str[i])) &&
(!dodgy_blob (blob_it.data ()))) {
blob_box = blob_it.data ()->bounding_box ();
blob_ht_above_baseline = blob_box.top () - bln_baseline_offset;
confirmed_char = check_blob_occ (word_str[i],
blob_ht_above_baseline,
est_x_ht, est_caps_ht);
if (confirmed_char != '\0') {
if (STRING (chs_x_ht).contains (confirmed_char))
x_ht.add (blob_ht_above_baseline, 1);
else
caps_ht.add (blob_ht_above_baseline, 1);
}
}
}
new_val = estimate_from_stats (x_ht);
if (new_val > 0)
est_x_ht = new_val;
new_val = estimate_from_stats (caps_ht);
if (new_val > 0)
est_caps_ht = new_val;
}
void reject_ambigs( //rej any accepted xht ambig chars
WERD_RES *word) {
const char *word_str;
int i = 0;
word_str = word->best_choice->string ().string ();
while (*word_str != '\0') {
if (STRING (chs_ambig_caps_x).contains (*word_str))
word->reject_map[i].setrej_xht_fixup ();
word_str++;
i++;
}
}
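/*************************************************************************
* est_ambigs()
*
* Estimates x-height and caps-height from the heights of the case ambiguous
* chars collected in stats. If the heights span less than the acceptable
* variation, both estimates are set to the mean. Otherwise the accepted,
* non-dodgy ambiguous blobs are reclustered into short and tall groups and
* the group means are used, falling back to the overall mean if the two
* groups are not clearly separated. Both estimates are 0 if stats is empty.
*************************************************************************/
void est_ambigs( //xht ambig ht stats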
WERD_RES *word_res,
STATS &stats,
float *ambig_lc_x_est, //xht est
float *ambig_uc_caps_est //caps est
) {
float x_ht_ok_variation;
STATS short_ambigs (0, 300);
STATS tall_ambigs (0, 300);
PBLOB_IT blob_it;
BOX blob_box; //blob bounding box
INT16 blob_ht_above_baseline;
const char *word_str;
INT16 i;
float min; //min ambig ch ht
float max; //max ambig ch ht
float short_limit; // for lower case
float tall_limit; // for upper case
x_ht_ok_variation =
(bln_x_height / x_ht_fraction_of_caps_ht - bln_x_height) * x_ht_variation;
if (stats.get_total () == 0) {
*ambig_lc_x_est = 0;
*ambig_uc_caps_est = 0;
}
else {
min = stats.ile (0.0);
max = stats.ile (0.99999);
if ((max - min) < x_ht_ok_variation) {
*ambig_lc_x_est = *ambig_uc_caps_est = stats.mean ();
//close enough
}
else {
/* Try reclustering into lower and upper case chars */
short_limit = min + (max - min) * x_ht_variation;
tall_limit = max - (max - min) * x_ht_variation;
word_str = word_res->best_choice->string ().string ();
blob_it.set_to_list (word_res->outword->blob_list ());
for (blob_it.mark_cycle_pt (), i = 0;
!blob_it.cycled_list (); blob_it.forward (), i++) {
if (word_res->reject_map[i].accepted () &&
STRING (chs_ambig_caps_x).contains (word_str[i]) &&
(!dodgy_blob (blob_it.data ()))) {
blob_box = blob_it.data ()->bounding_box ();
blob_ht_above_baseline =
blob_box.top () - bln_baseline_offset;
if (blob_ht_above_baseline <= short_limit)
short_ambigs.add (blob_ht_above_baseline, 1);
else if (blob_ht_above_baseline >= tall_limit)
tall_ambigs.add (blob_ht_above_baseline, 1);
}
}
*ambig_lc_x_est = short_ambigs.mean ();
*ambig_uc_caps_est = tall_ambigs.mean ();
/* Cop out if we haven't got sensible clusters. */
if (*ambig_uc_caps_est - *ambig_lc_x_est <= x_ht_ok_variation)
*ambig_lc_x_est = *ambig_uc_caps_est = stats.mean ();
//close enough
}
}
}
/*************************************************************************
* dodgy_blob()
* Returns true if the blob has more than one outline, one above the other.
* These are dodgy as the top blob could be noise, causing the bounding box xht
* to be misleading
*************************************************************************/
BOOL8 dodgy_blob(PBLOB *blob) {
OUTLINE_IT outline_it = blob->out_list ();
INT16 highest_bottom = -MAX_INT16;
INT16 lowest_top = MAX_INT16;
BOX outline_box;
if (x_ht_include_dodgy_blobs)
return FALSE; //no blob is ever dodgy
for (outline_it.mark_cycle_pt ();
!outline_it.cycled_list (); outline_it.forward ()) {
outline_box = outline_it.data ()->bounding_box ();
if (lowest_top > outline_box.top ())
lowest_top = outline_box.top ();
if (highest_bottom < outline_box.bottom ())
highest_bottom = outline_box.bottom ();
}
return highest_bottom >= lowest_top;
}
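
The vertical-stacking test in dodgy_blob() above can be tried on plain data. The sketch below is an illustration under assumptions: VBox and stacked_outlines are hypothetical names, and a vector of boxes stands in for the outline list.

```cpp
#include <climits>
#include <vector>

// Hypothetical minimal box: only the vertical extent matters for this test.
struct VBox {
  int bottom;
  int top;
};

// True when one outline starts at or above the top of another, i.e. the
// outlines are stacked vertically rather than lying side by side.
bool stacked_outlines(const std::vector<VBox> &outlines) {
  int highest_bottom = INT_MIN;
  int lowest_top = INT_MAX;
  for (const VBox &b : outlines) {
    if (b.top < lowest_top) lowest_top = b.top;
    if (b.bottom > highest_bottom) highest_bottom = b.bottom;
  }
  return highest_bottom >= lowest_top;
}
```

A single outline never counts as stacked because its bottom lies below its own top. A speck at heights 110 to 120 over a body at 0 to 100 does, which is the case where the top of the bounding box is a poor height estimate.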

92
ccmain/fixxht.h Normal file

@ -0,0 +1,92 @@
/**********************************************************************
* File: fixxht.h (Formerly fixxht.h)
* Description: Improve x_ht and look out for case inconsistencies
* Author: Phil Cheatle
* Created: Thu Aug 5 14:11:08 BST 1993
*
* (C) Copyright 1992, Hewlett-Packard Ltd.
** Licensed under the Apache License, Version 2.0 (the "License");
** you may not use this file except in compliance with the License.
** You may obtain a copy of the License at
** http://www.apache.org/licenses/LICENSE-2.0
** Unless required by applicable law or agreed to in writing, software
** distributed under the License is distributed on an "AS IS" BASIS,
** WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
** See the License for the specific language governing permissions and
** limitations under the License.
*
**********************************************************************/
#ifndef FIXXHT_H
#define FIXXHT_H
#include "varable.h"
#include "statistc.h"
#include "pageres.h"
#include "notdll.h"
extern double_VAR_H (x_ht_fraction_of_caps_ht, 0.7,
"Fract of cps ht est of xht");
extern double_VAR_H (x_ht_variation, 0.35,
"Err band as fract of caps/xht dist");
extern double_VAR_H (x_ht_sub_variation, 0.5,
"Err band as fract of caps/xht dist");
extern BOOL_VAR_H (rej_trial_ambigs, TRUE,
"reject x-ht ambigs when under trial");
extern BOOL_VAR_H (x_ht_conservative_ambigs, FALSE,
"Dont rely on ambigs + maxht");
extern BOOL_VAR_H (x_ht_check_est, TRUE, "Cross check estimates");
extern BOOL_VAR_H (x_ht_case_flip, FALSE, "Flip or reject suspect case");
extern BOOL_VAR_H (x_ht_include_dodgy_blobs, TRUE,
"Include blobs with possible noise?");
extern BOOL_VAR_H (x_ht_limit_flip_trials, TRUE,
"Dont do trial flips when ambigs are close to xht?");
extern BOOL_VAR_H (rej_use_check_block_occ, TRUE,
"Analyse rejection behaviour");
extern STRING_VAR_H (chs_non_ambig_caps_ht,
"!#$%&()/12346789?ABDEFGHIKLNQRT[]\\bdfhkl",
"Reliable ascenders");
extern STRING_VAR_H (chs_x_ht, "acegmnopqrsuvwxyz", "X height chars");
extern STRING_VAR_H (chs_non_ambig_x_ht, "aenqr", "reliable X height chars");
extern STRING_VAR_H (chs_ambig_caps_x, "cCmMoO05sSuUvVwWxXzZ",
"X ht or caps ht chars");
extern STRING_VAR_H (chs_bl_ambig_caps_x, "pPyY",
" Caps or descender ambigs");
extern STRING_VAR_H (chs_caps_ht,
"!#$%&()/0123456789?ABCDEFGHIJKLMNOPQRSTUVWXYZ[]\\bdfhkl{|}",
"Ascender chars");
extern STRING_VAR_H (chs_desc, "gjpqy", "Descender chars");
extern STRING_VAR_H (chs_non_ambig_bl,
"!#$%&01246789?ABCDEFGHIKLMNORSTUVWXYZabcdehiklmnorstuvwxz",
"Reliable baseline chars");
extern STRING_VAR_H (chs_odd_top, "ijt", "Chars with funny ascender region");
extern STRING_VAR_H (chs_odd_bot, "()35JQ[]\\/{}|", "Chars with funny base");
extern STRING_VAR_H (chs_bl,
"!#$%&()/01246789?ABCDEFGHIJKLMNOPRSTUVWXYZ[]\\abcdefhiklmnorstuvwxz{}",
"Baseline chars");
extern STRING_VAR_H (chs_non_ambig_desc, "gq", "Reliable descender chars");
void re_estimate_x_ht( //improve for 1 word
WERD_RES *word_res, //word to do
float *trial_x_ht //new match value
);
void check_block_occ(WERD_RES *word_res);
char check_blob_occ(char proposed_char,
INT16 blob_ht_above_baseline,
float x_ht,
float caps_ht);
float estimate_from_stats(STATS &stats);
void improve_estimate(WERD_RES *word_res,
float &est_x_ht,
float &est_caps_ht,
STATS &x_ht,
STATS &caps_ht);
void reject_ambigs( //rej any accepted xht ambig chars
WERD_RES *word);
//xht ambig ht stats
void est_ambigs(WERD_RES *word_res,
STATS &stats,
float *ambig_lc_x_est, //xht est
float *ambig_uc_caps_est //caps est
);
BOOL8 dodgy_blob(PBLOB *blob);
#endif

154
ccmain/imgscale.cpp Normal file

@ -0,0 +1,154 @@
/**********************************************************************
* File: imgscale.cpp (Formerly dyn_prog.c)
* Description: Dynamic programming for smart scaling of images.
* Author: Phil Cheatle
* Created: Wed Nov 18 16:12:03 GMT 1992
*
* (C) Copyright 1992, Hewlett-Packard Ltd.
** Licensed under the Apache License, Version 2.0 (the "License");
** you may not use this file except in compliance with the License.
** You may obtain a copy of the License at
** http://www.apache.org/licenses/LICENSE-2.0
** Unless required by applicable law or agreed to in writing, software
** distributed under the License is distributed on an "AS IS" BASIS,
** WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
** See the License for the specific language governing permissions and
** limitations under the License.
*
**********************************************************************/
/*************************************************************************
* This is really Sheelagh's code that I've hacked into a more usable form.
* It is used by scaleim.c. All I did to it was to change "factor" from int to
* float.
*************************************************************************/
/************************************************************************
* This version uses the result of the previous row to influence the
* current row's calculation.
************************************************************************/
#include "mfcpch.h"
#include <stdio.h>
#include <stdlib.h>
#include "errcode.h"
#define f(xc, yc) (((xc) - factor*(yc))*((xc) - factor*(yc)))
#define g(oldyc, yc, oldxc, xc) (factor*factor*((oldyc) - (yc))*((oldyc) - (yc))/(abs((oldxc) - (xc)) + 1))
void
dyn_exit (const char s[]) {
fprintf (stderr, "%s", s);
err_exit();
}
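/*************************************************************************
* dyn_prog()
*
* Chooses strictly increasing output positions y[0..n-1] in [0, ymax) for
* the n input transitions x[0..n-1], minimising the sum of f(x[i], y[i])
* plus a penalty g() for deviating from a previous-row transition of the
* same sign (oldx, oldy, oldn) that lies within factor pixels.
*************************************************************************/
void dyn_prog( //The clever bit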
int n,
int *x,
int *y,
int ymax,
int *oldx,
int *oldy,
int oldn,
float factor) {
int i, z, j, matchflag;
int **ymin;
float **F, fz;
/* F[i][z] gives minimum over y <= z */
F = (float **) calloc (n, sizeof (float *));
ymin = (int **) calloc (n, sizeof (int *));
if ((F == NULL) || (ymin == NULL))
dyn_exit ("Error in calloc\n");
for (i = 0; i < n; i++) {
F[i] = (float *) calloc (ymax - n + i + 1, sizeof (float));
ymin[i] = (int *) calloc (ymax - n + i + 1, sizeof (int));
if ((F[i] == NULL) || (ymin[i] == NULL))
dyn_exit ("Error in calloc\n");
}
F[0][0] = f (x[0], 0);
/* find nearest transition of same sign (white to black) */
j = 0;
while ((j < oldn) && (oldx[j] < x[0]))
j += 2;
if (j >= oldn)
j -= 2;
else if ((j - 2 >= 0) && ((x[0] - oldx[j - 2]) < (oldx[j] - x[0])))
j -= 2;
if (abs (oldx[j] - x[0]) < factor) {
matchflag = 1;
F[0][0] += g (oldy[j], 0, oldx[j], x[0]);
}
else
matchflag = 0;
ymin[0][0] = 0;
for (z = 1; z < ymax - n + 1; z++) {
fz = f (x[0], z);
/* add penalty for deviating from previous row if necessary */
if (matchflag)
fz += g (oldy[j], z, oldx[j], x[0]);
if (fz < F[0][z - 1]) {
F[0][z] = fz;
ymin[0][z] = z;
}
else {
F[0][z] = F[0][z - 1];
ymin[0][z] = ymin[0][z - 1];
}
}
for (i = 1; i < n; i++) {
F[i][i] = f (x[i], i) + F[i - 1][i - 1];
/* add penalty for deviating from previous row if necessary */
if (j > 0)
j--;
else
j++;
while ((j < oldn) && (oldx[j] < x[i]))
j += 2;
if (j >= oldn)
j -= 2;
else if ((j - 2 >= 0) && ((x[i] - oldx[j - 2]) < (oldx[j] - x[i])))
j -= 2;
if (abs (oldx[j] - x[i]) < factor) {
matchflag = 1;
F[i][i] += g (oldy[j], i, oldx[j], x[i]);
}
else
matchflag = 0;
ymin[i][i] = i;
for (z = i + 1; z < ymax - n + i + 1; z++) {
fz = f (x[i], z) + F[i - 1][z - 1];
/* add penalty for deviating from previous row if necessary */
if (matchflag)
fz += g (oldy[j], z, oldx[j], x[i]);
if (fz < F[i][z - 1]) {
F[i][z] = fz;
ymin[i][z] = z;
}
else {
F[i][z] = F[i][z - 1];
ymin[i][z] = ymin[i][z - 1];
}
}
}
y[n - 1] = ymin[n - 1][ymax - 1];
for (i = n - 2; i >= 0; i--)
y[i] = ymin[i][y[i + 1] - 1];
for (i = 0; i < n; i++) {
free (F[i]);
free (ymin[i]);
}
free(F);
free(ymin);
return;
}
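
The recurrence in dyn_prog() above can be followed more easily with the previous-row penalty left out. The sketch below is a simplified illustration, not the original routine: fit_transitions is a hypothetical name, it uses std::vector instead of calloc, and it assumes 1 <= n <= ymax.

```cpp
#include <limits>
#include <vector>

// Choose strictly increasing y[0..n-1] in [0, ymax) minimising
// sum_i (x[i] - factor * y[i])^2. Assumes 1 <= x.size() <= ymax.
std::vector<int> fit_transitions(const std::vector<int> &x, int ymax,
                                 float factor) {
  const int n = static_cast<int>(x.size());
  const float kInf = std::numeric_limits<float>::infinity();
  // best[i][z]: minimum cost of placing x[0..i] with y[i] <= z.
  // arg[i][z]: the value of y[i] that achieves best[i][z].
  std::vector<std::vector<float> > best(n, std::vector<float>(ymax, kInf));
  std::vector<std::vector<int> > arg(n, std::vector<int>(ymax, -1));
  for (int i = 0; i < n; i++) {
    // y[i] needs i slots below it and n - 1 - i slots above it.
    for (int z = i; z < ymax - (n - 1 - i); z++) {
      const float d = x[i] - factor * z;
      const float prev = (i == 0) ? 0.0f : best[i - 1][z - 1];
      const float cost = d * d + prev;
      if (z > i && best[i][z - 1] <= cost) {
        best[i][z] = best[i][z - 1];  // a smaller y[i] is at least as good
        arg[i][z] = arg[i][z - 1];
      } else {
        best[i][z] = cost;
        arg[i][z] = z;
      }
    }
  }
  // Walk back from the last transition, forcing y[i] < y[i + 1].
  std::vector<int> y(n);
  y[n - 1] = arg[n - 1][ymax - 1];
  for (int i = n - 2; i >= 0; i--)
    y[i] = arg[i][y[i + 1] - 1];
  return y;
}
```

The table is indexed by "y[i] is at most z" so that the backward walk can pick each earlier position in constant time; the original adds the g() term to the cost whenever a nearby previous-row transition was matched.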

32
ccmain/imgscale.h Normal file

@ -0,0 +1,32 @@
/**********************************************************************
* File: imgscale.h (Formerly dyn_prog.h)
* Description: Dynamic programming for smart scaling of images.
* Author: Phil Cheatle
* Created: Wed Nov 18 16:12:03 GMT 1992
*
* (C) Copyright 1992, Hewlett-Packard Ltd.
** Licensed under the Apache License, Version 2.0 (the "License");
** you may not use this file except in compliance with the License.
** You may obtain a copy of the License at
** http://www.apache.org/licenses/LICENSE-2.0
** Unless required by applicable law or agreed to in writing, software
** distributed under the License is distributed on an "AS IS" BASIS,
** WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
** See the License for the specific language governing permissions and
** limitations under the License.
*
**********************************************************************/
#ifndef IMGSCALE_H
#define IMGSCALE_H
void dyn_prog( //The clever bit
int n,
int *x,
int *y,
int ymax,
int *oldx,
int *oldy,
int oldn,
float factor);
#endif

404
ccmain/matmatch.cpp Normal file

@ -0,0 +1,404 @@
/**********************************************************************
* File: matmatch.cpp (Formerly matrix_match.c)
* Description: matrix matching routines for Tessedit
* Author: Chris Newton
* Created: Wed Nov 24 15:57:41 GMT 1993
*
* (C) Copyright 1993, Hewlett-Packard Ltd.
** Licensed under the Apache License, Version 2.0 (the "License");
** you may not use this file except in compliance with the License.
** You may obtain a copy of the License at
** http://www.apache.org/licenses/LICENSE-2.0
** Unless required by applicable law or agreed to in writing, software
** distributed under the License is distributed on an "AS IS" BASIS,
** WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
** See the License for the specific language governing permissions and
** limitations under the License.
*
**********************************************************************/
#include "mfcpch.h"
#include <stdlib.h>
#include <math.h>
#include <string.h>
#include <ctype.h>
#ifdef __UNIX__
#include <assert.h>
#endif
#include "tessvars.h"
#include "stderr.h"
#include "img.h"
#include "evnts.h"
#include "showim.h"
#include "hosthplb.h"
#include "grphics.h"
#include "evnts.h"
#include "adaptions.h"
#include "matmatch.h"
#include "secname.h"
#define EXTERN
EXTERN BOOL_VAR (tessedit_display_mm, FALSE, "Display matrix matches");
EXTERN BOOL_VAR (tessedit_mm_debug, FALSE,
"Print debug information for matrix matcher");
EXTERN INT_VAR (tessedit_mm_prototype_min_size, 3,
"Smallest number of samples in a cluster for a prototype to be used");
// Colours for displaying the match
#define BB_COLOUR 0
#define BW_COLOUR 1
#define WB_COLOUR 3
#define UB_COLOUR 5
#define BU_COLOUR 7
#define UU_COLOUR 9
#define WU_COLOUR 11
#define UW_COLOUR 13
#define WW_COLOUR 15
#define BINIM_BLACK 0
#define BINIM_WHITE 1
float matrix_match( // returns match score
IMAGE *image1,
IMAGE *image2) {
ASSERT_HOST (image1->get_bpp () == 1 && image2->get_bpp () == 1);
if (image1->get_xsize () >= image2->get_xsize ())
return match1 (image1, image2);
else
return match1 (image2, image1);
}
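/*************************************************************************
* match1()
*
* Centres the two images on each other (image_w must be at least as wide
* as image_n) and compares them pixel by pixel. Agreeing pixels in the
* overlap add 1 to the sum; disagreeing pixels, and black pixels outside
* the overlap, subtract 1. The result is 1 - sum / area, so lower means a
* closer match and identical images score 0.
*************************************************************************/
float match1( /* returns match score */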
IMAGE *image_w,
IMAGE *image_n) {
INT32 x_offset;
INT32 y_offset;
INT32 x_size = image_w->get_xsize ();
INT32 y_size;
INT32 x_size2 = image_n->get_xsize ();
INT32 y_size2;
IMAGE match_image;
IMAGELINE imline_w;
IMAGELINE imline_n;
IMAGELINE match_imline;
INT32 x;
INT32 y;
float sum = 0.0;
x_offset = (image_w->get_xsize () - image_n->get_xsize ()) / 2;
ASSERT_HOST (x_offset >= 0);
match_imline.init (x_size);
sum = 0;
if (image_w->get_ysize () < image_n->get_ysize ()) {
y_size = image_n->get_ysize ();
y_size2 = image_w->get_ysize ();
y_offset = (y_size - y_size2) / 2;
if (tessedit_display_mm && !tessedit_mm_use_prototypes)
tprintf ("I1 (%d, %d), I2 (%d, %d), MI (%d, %d)\n", x_size,
image_w->get_ysize (), x_size2, image_n->get_ysize (),
x_size, y_size);
match_image.create (x_size, y_size, 4);
for (y = 0; y < y_offset; y++) {
image_n->fast_get_line (0, y, x_size2, &imline_n);
for (x = 0; x < x_size2; x++) {
if (imline_n.pixels[x] == BINIM_BLACK) {
sum += -1;
match_imline.pixels[x] = UB_COLOUR;
}
else {
match_imline.pixels[x] = UW_COLOUR;
}
}
match_image.fast_put_line (x_offset, y, x_size2, &match_imline);
}
for (y = y_offset + y_size2; y < y_size; y++) {
image_n->fast_get_line (0, y, x_size2, &imline_n);
for (x = 0; x < x_size2; x++) {
if (imline_n.pixels[x] == BINIM_BLACK) {
sum += -1.0;
match_imline.pixels[x] = UB_COLOUR;
}
else {
match_imline.pixels[x] = UW_COLOUR;
}
}
match_image.fast_put_line (x_offset, y, x_size2, &match_imline);
}
for (y = y_offset; y < y_offset + y_size2; y++) {
image_w->fast_get_line (0, y - y_offset, x_size, &imline_w);
image_n->fast_get_line (0, y, x_size2, &imline_n);
for (x = 0; x < x_offset; x++) {
if (imline_w.pixels[x] == BINIM_BLACK) {
sum += -1.0;
match_imline.pixels[x] = BU_COLOUR;
}
else {
match_imline.pixels[x] = WU_COLOUR;
}
}
for (x = x_offset + x_size2; x < x_size; x++) {
if (imline_w.pixels[x] == BINIM_BLACK) {
sum += -1.0;
match_imline.pixels[x] = BU_COLOUR;
}
else {
match_imline.pixels[x] = WU_COLOUR;
}
}
for (x = x_offset; x < x_offset + x_size2; x++) {
if (imline_n.pixels[x - x_offset] == imline_w.pixels[x]) {
sum += 1.0;
if (imline_w.pixels[x] == BINIM_BLACK)
match_imline.pixels[x] = BB_COLOUR;
else
match_imline.pixels[x] = WW_COLOUR;
}
else {
sum += -1.0;
if (imline_w.pixels[x] == BINIM_BLACK)
match_imline.pixels[x] = BW_COLOUR;
else
match_imline.pixels[x] = WB_COLOUR;
}
}
match_image.fast_put_line (0, y, x_size, &match_imline);
}
}
else {
y_size = image_w->get_ysize ();
y_size2 = image_n->get_ysize ();
y_offset = (y_size - y_size2) / 2;
if (tessedit_display_mm && !tessedit_mm_use_prototypes)
tprintf ("I1 (%d, %d), I2 (%d, %d), MI (%d, %d)\n", x_size,
image_w->get_ysize (), x_size2, image_n->get_ysize (),
x_size, y_size);
match_image.create (x_size, y_size, 4);
for (y = 0; y < y_offset; y++) {
image_w->fast_get_line (0, y, x_size, &imline_w);
for (x = 0; x < x_size; x++) {
if (imline_w.pixels[x] == BINIM_BLACK) {
sum += -1;
match_imline.pixels[x] = BU_COLOUR;
}
else {
match_imline.pixels[x] = WU_COLOUR;
}
}
match_image.fast_put_line (0, y, x_size, &match_imline);
}
for (y = y_offset + y_size2; y < y_size; y++) {
image_w->fast_get_line (0, y, x_size, &imline_w);
for (x = 0; x < x_size; x++) {
if (imline_w.pixels[x] == BINIM_BLACK) {
sum += -1;
match_imline.pixels[x] = BU_COLOUR;
}
else {
match_imline.pixels[x] = WU_COLOUR;
}
}
match_image.fast_put_line (0, y, x_size, &match_imline);
}
for (y = y_offset; y < y_offset + y_size2; y++) {
image_w->fast_get_line (0, y, x_size, &imline_w);
image_n->fast_get_line (0, y - y_offset, x_size2, &imline_n);
for (x = 0; x < x_offset; x++) {
if (imline_w.pixels[x] == BINIM_BLACK) {
sum += -1.0;
match_imline.pixels[x] = BU_COLOUR;
}
else {
match_imline.pixels[x] = WU_COLOUR;
}
}
for (x = x_offset + x_size2; x < x_size; x++) {
if (imline_w.pixels[x] == BINIM_BLACK) {
sum += -1.0;
match_imline.pixels[x] = BU_COLOUR;
}
else {
match_imline.pixels[x] = WU_COLOUR;
}
}
for (x = x_offset; x < x_offset + x_size2; x++) {
if (imline_n.pixels[x - x_offset] == imline_w.pixels[x]) {
sum += 1.0;
if (imline_w.pixels[x] == BINIM_BLACK)
match_imline.pixels[x] = BB_COLOUR;
else
match_imline.pixels[x] = WW_COLOUR;
}
else {
sum += -1.0;
if (imline_w.pixels[x] == BINIM_BLACK)
match_imline.pixels[x] = BW_COLOUR;
else
match_imline.pixels[x] = WB_COLOUR;
}
}
match_image.fast_put_line (0, y, x_size, &match_imline);
}
}
#ifndef GRAPHICS_DISABLED
if (tessedit_display_mm && !tessedit_mm_use_prototypes) {
tprintf ("Match score %f\n", 1.0 - sum / (x_size * y_size));
display_images(image_w, image_n, &match_image);
}
#endif
if (tessedit_mm_debug)
tprintf ("Match score %f\n", 1.0 - sum / (x_size * y_size));
return (1.0 - sum / (x_size * y_size));
}
/*************************************************************************
* display_images()
*
* Show a pair of images, plus the match image
*
*************************************************************************/
#ifndef GRAPHICS_DISABLED
void display_images(IMAGE *image_w, IMAGE *image_n, IMAGE *match_image) {
WINDOW w_im_window;
WINDOW n_im_window;
WINDOW match_window;
GRAPHICS_EVENT event; //output event
INT16 i;
// xmin xmax ymin ymax
w_im_window = create_window ("Image 1", SCROLLINGWIN, 20, 100, 10 * image_w->get_xsize (), 10 * image_w->get_ysize (), 0, image_w->get_xsize (), 0, image_w->get_ysize (),
TRUE, FALSE, FALSE, TRUE); // down event & key only
clear_view_surface(w_im_window);
show_sub_image (image_w,
0, 0,
image_w->get_xsize (), image_w->get_ysize (),
w_im_window, 0, 0);
line_color_index(w_im_window, RED);
for (i = 1; i < image_w->get_xsize (); i++) {
move2d (w_im_window, i, 0);
draw2d (w_im_window, i, image_w->get_ysize ());
}
for (i = 1; i < image_w->get_ysize (); i++) {
move2d (w_im_window, 0, i);
draw2d (w_im_window, image_w->get_xsize (), i);
}
// xmin xmax ymin ymax
n_im_window = create_window ("Image 2", SCROLLINGWIN, 240, 100, 10 * image_n->get_xsize (), 10 * image_n->get_ysize (), 0, image_n->get_xsize (), 0, image_n->get_ysize (),
TRUE, FALSE, FALSE, TRUE); // down event & key only
clear_view_surface(n_im_window);
show_sub_image (image_n,
0, 0,
image_n->get_xsize (), image_n->get_ysize (),
n_im_window, 0, 0);
line_color_index(n_im_window, RED);
for (i = 1; i < image_n->get_xsize (); i++) {
move2d (n_im_window, i, 0);
draw2d (n_im_window, i, image_n->get_ysize ());
}
for (i = 1; i < image_n->get_ysize (); i++) {
move2d (n_im_window, 0, i);
draw2d (n_im_window, image_n->get_xsize (), i);
}
overlap_picture_ops(TRUE);
// xmin xmax ymin ymax
match_window = create_window ("Match Result", SCROLLINGWIN, 460, 100, 10 * match_image->get_xsize (), 10 * match_image->get_ysize (), 0, match_image->get_xsize (), 0, match_image->get_ysize (),
TRUE, FALSE, FALSE, TRUE); // down event & key only
clear_view_surface(match_window);
show_sub_image (match_image,
0, 0,
match_image->get_xsize (), match_image->get_ysize (),
match_window, 0, 0);
line_color_index(match_window, RED);
for (i = 1; i < match_image->get_xsize (); i++) {
move2d (match_window, i, 0);
draw2d (match_window, i, match_image->get_ysize ());
}
for (i = 1; i < match_image->get_ysize (); i++) {
move2d (match_window, 0, i);
draw2d (match_window, match_image->get_xsize (), i);
}
overlap_picture_ops(TRUE);
await_event(match_window, TRUE, ANY_EVENT, &event);
destroy_window(w_im_window);
destroy_window(n_im_window);
destroy_window(match_window);
}
/*************************************************************************
* display_image()
*
* Show a single image
*
*************************************************************************/
WINDOW display_image(IMAGE *image,
const char *title,
INT32 x,
INT32 y,
BOOL8 wait) {
WINDOW im_window;
INT16 i;
GRAPHICS_EVENT event; //output event
// xmin xmax ymin ymax
im_window = create_window (title, SCROLLINGWIN, x, y, 10 * image->get_xsize (), 10 * image->get_ysize (), 0, image->get_xsize (), 0, image->get_ysize (),
TRUE, FALSE, FALSE, TRUE); // down event & key only
clear_view_surface(im_window);
show_sub_image (image,
0, 0,
image->get_xsize (), image->get_ysize (), im_window, 0, 0);
line_color_index(im_window, RED);
for (i = 1; i < image->get_xsize (); i++) {
move2d (im_window, i, 0);
draw2d (im_window, i, image->get_ysize ());
}
for (i = 1; i < image->get_ysize (); i++) {
move2d (im_window, 0, i);
draw2d (im_window, image->get_xsize (), i);
}
overlap_picture_ops(TRUE);
if (wait)
await_event(im_window, TRUE, ANY_EVENT, &event);
return im_window;
}
#endif
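
The score returned by match1() above reduces to a simple formula when both images have the same size. The sketch below illustrates that case only and makes several assumptions: match_score_same_size is a hypothetical name, pixels are stored one byte each in a vector rather than in an IMAGE, and the padding rows and columns handled by match1() are left out.

```cpp
#include <cstddef>
#include <vector>

// Score two equally sized binary images stored row-major, one byte per pixel.
// Agreeing pixels add 1 and disagreeing pixels subtract 1; the score is
// 1 - sum / pixel_count. Both vectors must have the same non-zero length.
float match_score_same_size(const std::vector<unsigned char> &a,
                            const std::vector<unsigned char> &b) {
  float sum = 0.0f;
  for (std::size_t i = 0; i < a.size(); i++)
    sum += (a[i] == b[i]) ? 1.0f : -1.0f;
  return 1.0f - sum / static_cast<float>(a.size());
}
```

Under this formula identical images score 0 and images that differ in every pixel score 2, so the value behaves as a distance rather than a similarity.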

48
ccmain/matmatch.h Normal file

@ -0,0 +1,48 @@
/**********************************************************************
* File: matmatch.h (Formerly matrix_match.h)
* Description: matrix matching routines for Tessedit
* Author: Chris Newton
* Created: Wed Nov 24 15:57:41 GMT 1993
*
* (C) Copyright 1993, Hewlett-Packard Ltd.
** Licensed under the Apache License, Version 2.0 (the "License");
** you may not use this file except in compliance with the License.
** You may obtain a copy of the License at
** http://www.apache.org/licenses/LICENSE-2.0
** Unless required by applicable law or agreed to in writing, software
** distributed under the License is distributed on an "AS IS" BASIS,
** WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
** See the License for the specific language governing permissions and
** limitations under the License.
*
**********************************************************************/
#ifndef MATMATCH_H
#define MATMATCH_H
#include "img.h"
#include "hosthplb.h"
#include "notdll.h"
#define BINIM_BLACK 0
#define BINIM_WHITE 1
#define BAD_MATCH 9999.0
extern BOOL_VAR_H (tessedit_display_mm, FALSE, "Display matrix matches");
extern BOOL_VAR_H (tessedit_mm_debug, FALSE,
"Print debug information for matrix matcher");
extern INT_VAR_H (tessedit_mm_prototype_min_size, 3,
"Smallest number of samples in a cluster for a prototype to be used");
float matrix_match( // returns match score
IMAGE *image1,
IMAGE *image2);
float match1( /* returns match score */
IMAGE *image_w,
IMAGE *image_n);
void display_images(IMAGE *image_w, IMAGE *image_n, IMAGE *match_image);
WINDOW display_image(IMAGE *image,
const char *title,
INT32 x,
INT32 y,
BOOL8 wait);
#endif

1185
ccmain/output.cpp Normal file

File diff suppressed because it is too large

112
ccmain/output.h Normal file

@ -0,0 +1,112 @@
/******************************************************************
* File: output.h (Formerly output.h)
* Description: Output pass
* Author: Phil Cheatle
* Created: Thu Aug 4 10:56:08 BST 1994
*
* (C) Copyright 1994, Hewlett-Packard Ltd.
** Licensed under the Apache License, Version 2.0 (the "License");
** you may not use this file except in compliance with the License.
** You may obtain a copy of the License at
** http://www.apache.org/licenses/LICENSE-2.0
** Unless required by applicable law or agreed to in writing, software
** distributed under the License is distributed on an "AS IS" BASIS,
** WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
** See the License for the specific language governing permissions and
** limitations under the License.
*
**********************************************************************/
#ifndef OUTPUT_H
#define OUTPUT_H
#include "varable.h"
//#include "epapconv.h"
#include "pageres.h"
#include "notdll.h"
extern BOOL_EVAR_H (tessedit_write_block_separators, TRUE,
"Write block separators in output");
extern BOOL_VAR_H (tessedit_write_raw_output, FALSE,
"Write raw stuff to name.raw");
extern BOOL_EVAR_H (tessedit_write_output, TRUE, "Write text to name.txt");
extern BOOL_EVAR_H (tessedit_write_txt_map, TRUE,
"Write .txt to .etx map file");
extern BOOL_EVAR_H (tessedit_write_rep_codes, TRUE,
"Write repetition char code");
extern BOOL_EVAR_H (tessedit_write_unlv, FALSE, "Write .unlv output file");
extern STRING_EVAR_H (unrecognised_char, "|",
"Output char for unidentified blobs");
extern INT_EVAR_H (suspect_level, 99, "Suspect marker level");
extern INT_VAR_H (suspect_space_level, 100,
"Min suspect level for rejecting spaces");
extern INT_VAR_H (suspect_short_words, 2,
"Dont Suspect dict wds longer than this");
extern BOOL_VAR_H (suspect_constrain_1Il, FALSE,
"UNLV keep 1Il chars rejected");
extern double_VAR_H (suspect_rating_per_ch, 999.9,
"Dont touch bad rating limit");
extern double_VAR_H (suspect_accept_rating, -999.9,
"Accept good rating limit");
extern BOOL_EVAR_H (tessedit_minimal_rejection, FALSE,
"Only reject tess failures");
extern BOOL_VAR_H (tessedit_zero_rejection, FALSE, "Dont reject ANYTHING");
extern BOOL_VAR_H (tessedit_word_for_word, FALSE,
"Make output have exactly one word per WERD");
extern BOOL_VAR_H (tessedit_consistent_reps, TRUE,
"Force all rep chars the same");
void output_pass( //Tess output pass //send to api
PAGE_RES_IT &page_res_it,
BOOL8 write_to_shm);
void write_results( //output a word
PAGE_RES_IT &page_res_it, //full info
char newline_type, //type of newline
BOOL8 force_eol, //override tilde crunch?
BOOL8 write_to_shm //send to api
);
WERD_CHOICE *make_epaper_choice( //convert one word
WERD_RES *word, //word to do
char newline_type //type of newline
);
INT16 make_reject ( //make reject code
BOX * inset_box, //bounding box
INT16 prevright, //previous char
INT16 nextleft, //next char
DENORM * denorm, //de-normalizer
char word_string[] //output string
);
char determine_newline_type( //test line ends
WERD *word, //word to do
BLOCK *block, //current block
WERD *next_word, //next word
BLOCK *next_block //block of next word
);
void write_cooked_text( //write output
WERD *word, //word to do
const STRING &text, //text to write
BOOL8 acceptable, //good stuff
BOOL8 pass2, //done on pass2
FILE *fp //file to write
);
void write_shm_text( //write output
WERD_RES *word, //word to do
BLOCK *block, //block it is from
ROW_RES *row, //row it is from
const STRING &text //text to write
);
void write_map( //output a map file
FILE *mapfile, //mapfile to write to
WERD_RES *word);
FILE *open_outfile( //open .map & .unlv file
const char *extension);
void write_unlv_text(WERD_RES *word);
char get_rep_char( // what char is repeated?
WERD_RES *word);
void ensure_rep_chars_are_consistent(WERD_RES *word);
void set_unlv_suspects(WERD_RES *word);
INT16 count_alphas( //how many alphas
const char *s);
INT16 count_alphanums( //how many alphanums
const char *s);
BOOL8 acceptable_number_string(const char *s);
#endif

107
ccmain/paircmp.cpp Normal file

@ -0,0 +1,107 @@
/**********************************************************************
* File: paircmp.cpp (Formerly paircmp.c)
* Description: Code to compare two blobs using the adaptive matcher
* Author: Ray Smith
* Created: Wed Apr 21 09:31:02 BST 1993
*
* (C) Copyright 1993, Hewlett-Packard Ltd.
** Licensed under the Apache License, Version 2.0 (the "License");
** you may not use this file except in compliance with the License.
** You may obtain a copy of the License at
** http://www.apache.org/licenses/LICENSE-2.0
** Unless required by applicable law or agreed to in writing, software
** distributed under the License is distributed on an "AS IS" BASIS,
** WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
** See the License for the specific language governing permissions and
** limitations under the License.
*
**********************************************************************/
#include "mfcpch.h"
#include "blobcmp.h"
#include "tfacep.h"
#include "paircmp.h"
#define EXTERN
/**********************************************************************
* compare_blob_pairs
*
* A blob processor to compare pairs of selected blobs.
**********************************************************************/
BOOL8 compare_blob_pairs( //blob processor
BLOCK *,
ROW *row, //row it came from
WERD *,
PBLOB *blob //blob to compare
) {
static ROW *prev_row = NULL; //other in pair
static PBLOB *prev_blob = NULL;
float rating; //from matcher
if (prev_row == NULL || prev_blob == NULL) {
prev_row = row;
prev_blob = blob;
}
else {
rating = compare_blobs (prev_blob, prev_row, blob, row);
tprintf ("Rating=%g\n", rating);
prev_row = NULL;
prev_blob = NULL;
}
return TRUE;
}
/**********************************************************************
* compare_blobs
*
* Compare 2 blobs and return the rating.
**********************************************************************/
float compare_blobs( //match 2 blobs
PBLOB *blob1, //first blob
ROW *row1, //row it came from
PBLOB *blob2, //other blob
ROW *row2) {
PBLOB *bn_blob1; //baseline norm
PBLOB *bn_blob2;
DENORM denorm1, denorm2;
float rating; //match result
bn_blob1 = blob1->baseline_normalise (row1, &denorm1);
bn_blob2 = blob2->baseline_normalise (row2, &denorm2);
rating = compare_bln_blobs (bn_blob1, &denorm1, bn_blob2, &denorm2);
delete bn_blob1;
delete bn_blob2;
return rating;
}
/**********************************************************************
* compare_bln_blobs
*
* Compare 2 baseline normalised blobs and return the rating.
**********************************************************************/
float compare_bln_blobs( //match 2 blobs
PBLOB *blob1, //first blob
DENORM *denorm1,
PBLOB *blob2, //other blob
DENORM *denorm2) {
TBLOB *tblob1; //tessblobs
TBLOB *tblob2;
TEXTROW tessrow1, tessrow2; //tess rows
float rating; //match result
tblob1 = make_tess_blob (blob1, TRUE);
make_tess_row(denorm1, &tessrow1);
tblob2 = make_tess_blob (blob2, TRUE);
make_tess_row(denorm2, &tessrow2);
rating = compare_tess_blobs (tblob1, &tessrow1, tblob2, &tessrow2);
free_blob(tblob1);
free_blob(tblob2);
return rating;
}
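
Note on compare_blob_pairs: the processor above keeps the first blob of each pair in static variables and only produces a rating on every second call. The sketch below shows the same pairing pattern in isolation. `Item`, `compare_items` and `PairComparer` are hypothetical stand-ins invented for illustration (the real code works on PBLOB and ROW pointers and calls compare_blobs), and the state lives in an object rather than in function statics.

```cpp
// Hypothetical stand-in for a blob; the real code passes PBLOB* and ROW*.
struct Item {
  float value;
};

// Hypothetical stand-in for compare_blobs(): here simply the absolute difference.
static float compare_items(const Item &a, const Item &b) {
  float d = a.value - b.value;
  return d < 0 ? -d : d;
}

// Remembers the first item of a pair and rates the pair on the second call,
// mirroring the prev_row/prev_blob statics in compare_blob_pairs().
class PairComparer {
 public:
  PairComparer() : has_pending_(false), pending_() {}

  // Returns true and fills *rating on every second call, false otherwise.
  bool feed(const Item &item, float *rating) {
    if (!has_pending_) {
      pending_ = item;       // first of the pair: just remember it
      has_pending_ = true;
      return false;
    }
    *rating = compare_items(pending_, item);
    has_pending_ = false;    // pair consumed: the next call starts a new pair
    return true;
  }

 private:
  bool has_pending_;
  Item pending_;
};
```

Keeping the pending item in an object instead of a function-level static means two independent traversals cannot disturb each other's pairing, which is one possible reason to prefer this shape in new code.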

43
ccmain/paircmp.h Normal file

@ -0,0 +1,43 @@
/**********************************************************************
* File: paircmp.h (Formerly paircmp.h)
* Description: Code to compare two blobs using the adaptive matcher
* Author: Ray Smith
* Created: Wed Apr 21 09:31:02 BST 1993
*
* (C) Copyright 1993, Hewlett-Packard Ltd.
** Licensed under the Apache License, Version 2.0 (the "License");
** you may not use this file except in compliance with the License.
** You may obtain a copy of the License at
** http://www.apache.org/licenses/LICENSE-2.0
** Unless required by applicable law or agreed to in writing, software
** distributed under the License is distributed on an "AS IS" BASIS,
** WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
** See the License for the specific language governing permissions and
** limitations under the License.
*
**********************************************************************/
#ifndef PAIRCMP_H
#define PAIRCMP_H
#include "ocrblock.h"
#include "varable.h"
#include "notdll.h"
BOOL8 compare_blob_pairs( //blob processor
BLOCK *,
ROW *row, //row it came from
WERD *,
PBLOB *blob //blob to compare
);
float compare_blobs( //match 2 blobs
PBLOB *blob1, //first blob
ROW *row1, //row it came from
PBLOB *blob2, //other blob
ROW *row2);
float compare_bln_blobs( //match 2 blobs
PBLOB *blob1, //first blob
DENORM *denorm1,
PBLOB *blob2, //other blob
DENORM *denorm2);
#endif

1655
ccmain/reject.cpp Normal file

File diff suppressed because it is too large

175
ccmain/reject.h Normal file

@ -0,0 +1,175 @@
/**********************************************************************
* File: reject.h (Formerly reject.h)
* Description: Rejection functions used in tessedit
* Author: Phil Cheatle
* Created: Wed Sep 23 16:50:21 BST 1992
*
* (C) Copyright 1992, Hewlett-Packard Ltd.
** Licensed under the Apache License, Version 2.0 (the "License");
** you may not use this file except in compliance with the License.
** You may obtain a copy of the License at
** http://www.apache.org/licenses/LICENSE-2.0
** Unless required by applicable law or agreed to in writing, software
** distributed under the License is distributed on an "AS IS" BASIS,
** WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
** See the License for the specific language governing permissions and
** limitations under the License.
*
**********************************************************************/
#ifndef REJECT_H
#define REJECT_H
#include "varable.h"
#include "pageres.h"
#include "notdll.h"
extern INT_VAR_H (tessedit_reject_mode, 5, "Rejection algorithm");
extern INT_VAR_H (tessedit_ok_mode, 5, "Acceptance decision algorithm");
extern BOOL_VAR_H (tessedit_use_nn, TRUE, "");
extern BOOL_VAR_H (tessedit_rejection_debug, FALSE, "Adaption debug");
extern BOOL_VAR_H (tessedit_rejection_stats, FALSE, "Show NN stats");
extern BOOL_VAR_H (tessedit_flip_0O, TRUE, "Contextual 0O O0 flips");
extern double_VAR_H (tessedit_lower_flip_hyphen, 1.5,
"Aspect ratio dot/hyphen test");
extern double_VAR_H (tessedit_upper_flip_hyphen, 1.8,
"Aspect ratio dot/hyphen test");
extern BOOL_VAR_H (rej_trust_doc_dawg, FALSE,
"Use DOC dawg in 11l conf. detector");
extern BOOL_VAR_H (rej_1Il_use_dict_word, FALSE, "Use dictword test");
extern BOOL_VAR_H (rej_1Il_trust_permuter_type, TRUE, "Dont double check");
extern BOOL_VAR_H (one_ell_conflict_default, TRUE,
"one_ell_conflict default");
extern BOOL_VAR_H (show_char_clipping, FALSE, "Show clip image window?");
extern BOOL_VAR_H (nn_debug, FALSE, "NN DEBUGGING?");
extern BOOL_VAR_H (nn_reject_debug, FALSE, "NN DEBUG each char?");
extern BOOL_VAR_H (nn_lax, FALSE, "Use 2nd rate matches");
extern BOOL_VAR_H (nn_double_check_dict, FALSE, "Double check");
extern BOOL_VAR_H (nn_conf_double_check_dict, TRUE,
"Double check for confusions");
extern BOOL_VAR_H (nn_conf_1Il, TRUE, "NN use 1Il conflicts");
extern BOOL_VAR_H (nn_conf_Ss, TRUE, "NN use Ss conflicts");
extern BOOL_VAR_H (nn_conf_hyphen, TRUE, "NN hyphen conflicts");
extern BOOL_VAR_H (nn_conf_test_good_qual, FALSE, "NN dodgy 1Il cross check");
extern BOOL_VAR_H (nn_conf_test_dict, TRUE, "NN dodgy 1Il cross check");
extern BOOL_VAR_H (nn_conf_test_sensible, TRUE, "NN dodgy 1Il cross check");
extern BOOL_VAR_H (nn_conf_strict_on_dodgy_chs, TRUE,
"Require stronger NN match");
extern double_VAR_H (nn_dodgy_char_threshold, 0.99, "min accept score");
extern INT_VAR_H (nn_conf_accept_level, 4, "NN accept dodgy 1Il matches? ");
extern INT_VAR_H (nn_conf_initial_i_level, 3,
"NN accept initial Ii match level ");
extern BOOL_VAR_H (no_unrej_dubious_chars, TRUE,
"Dubious chars next to reject?");
extern BOOL_VAR_H (no_unrej_no_alphanum_wds, TRUE,
"Stop unrej of non A/N wds?");
extern BOOL_VAR_H (no_unrej_1Il, FALSE, "Stop unrej of 1Ilchars?");
extern BOOL_VAR_H (rej_use_tess_accepted, TRUE,
"Individual rejection control");
extern BOOL_VAR_H (rej_use_tess_blanks, TRUE, "Individual rejection control");
extern BOOL_VAR_H (rej_use_good_perm, TRUE, "Individual rejection control");
extern BOOL_VAR_H (rej_use_sensible_wd, FALSE, "Extend permuter check");
extern BOOL_VAR_H (rej_alphas_in_number_perm, FALSE, "Extend permuter check");
extern double_VAR_H (rej_whole_of_mostly_reject_word_fract, 0.85,
"if >this fract");
extern INT_VAR_H (rej_mostly_reject_mode, 1,
"0-never, 1-afterNN, 2-after new xht");
extern double_VAR_H (tessed_fullstop_aspect_ratio, 1.2,
"if >this fract then reject");
extern INT_VAR_H (net_image_width, 40, "NN input image width");
extern INT_VAR_H (net_image_height, 36, "NN input image height");
extern INT_VAR_H (net_image_x_height, 22, "NN input image x_height");
extern INT_VAR_H (tessedit_image_border, 2, "Rej blbs near image edge limit");
extern INT_VAR_H (net_bl_nodes, 20, "Number of baseline nodes");
extern double_VAR_H (nn_reject_threshold, 0.5, "NN min accept score");
extern double_VAR_H (nn_reject_head_and_shoulders, 0.6,
"top scores sep factor");
extern STRING_VAR_H (ok_single_ch_non_alphanum_wds, "-?\075",
"Allow NN to unrej");
extern STRING_VAR_H (ok_repeated_ch_non_alphanum_wds, "-?*\075",
"Allow NN to unrej");
extern STRING_VAR_H (conflict_set_I_l_1, "Il1[]", "Il1 conflict set");
extern STRING_VAR_H (conflict_set_S_s, "Ss$", "Ss conflict set");
extern STRING_VAR_H (conflict_set_hyphen, "-_~", "hyphen conflict set");
extern STRING_VAR_H (dubious_chars_left_of_reject, "!'+`()-./\\<>;:^_,~\"",
"Unreliable chars");
extern STRING_VAR_H (dubious_chars_right_of_reject, "!'+`()-./\\<>;:^_,~\"",
"Unreliable chars");
extern INT_VAR_H (min_sane_x_ht_pixels, 8,
"Reject any x-ht lt or eq than this");
void set_done( //set done flag
WERD_RES *word,
INT16 pass);
void make_reject_map( //make rej map for wd //detailed results
WERD_RES *word,
BLOB_CHOICE_LIST_CLIST *blob_choices,
ROW *row,
INT16 pass //1st or 2nd?
);
void reject_blanks(WERD_RES *word);
void reject_I_1_L(WERD_RES *word);
//detailed results
void reject_poor_matches(WERD_RES *word, BLOB_CHOICE_LIST_CLIST *blob_choices);
float compute_reject_threshold( //compute threshold //detailed results
BLOB_CHOICE_LIST_CLIST *blob_choices);
int sort_floats( //qsort function
const void *arg1, //ptrs to floats
const void *arg2);
void reject_edge_blobs(WERD_RES *word);
BOOL8 one_ell_conflict(WERD_RES *word_res, BOOL8 update_map);
INT16 first_alphanum_pos(const char *word);
INT16 alpha_count(const char *word);
BOOL8 word_contains_non_1_digit(const char *word);
BOOL8 test_ambig_word( //test for ambiguity
WERD_RES *word);
//original word
BOOL8 ambig_word(const char *start_word,
char *temp_word, //alterable copy
INT16 test_char_pos //idx to char to alter
);
const char *char_ambiguities(char c);
#ifndef EMBEDDED
void test_ambigs(const char *word);
#endif
void nn_recover_rejects(WERD_RES *word, ROW *row);
void nn_match_word( //Match a word
WERD_RES *word,
ROW *row);
//of character
INT16 nn_match_char(IMAGE &scaled_image,
float baseline_pos, //rel to scaled_image
BOOL8 dict_word, //part of dict wd?
BOOL8 checked_dict_word, //part of dict wd?
BOOL8 sensible_word, //part acceptable str?
BOOL8 centre, //not at word ends?
BOOL8 good_quality_word, //initial segmentation
char tess_ch //confirm this?
);
INT16 evaluate_net_match(char top,
float top_score,
char next,
float next_score,
char tess_ch,
BOOL8 dict_word,
BOOL8 checked_dict_word,
BOOL8 sensible_word,
BOOL8 centre,
BOOL8 good_quality_word);
void dont_allow_dubious_chars(WERD_RES *word);
void dont_allow_1Il(WERD_RES *word);
INT16 count_alphanums( //how many alphanums
WERD_RES *word);
void reject_mostly_rejects( //rej all if most rejectd
WERD_RES *word);
BOOL8 repeated_nonalphanum_wd(WERD_RES *word, ROW *row);
BOOL8 repeated_ch_string(const char *rep_ch_str);
INT16 safe_dict_word(const char *s);
void flip_hyphens(WERD_RES *word);
void flip_0O(WERD_RES *word);
BOOL8 non_O_upper(char c);
BOOL8 non_0_digit(char c);
#endif
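
Note on sort_floats: the header declares it as a qsort comparison function taking pointers to floats, but its body is in reject.cpp, whose diff is suppressed here. The sketch below shows what a comparator of that shape generally looks like. The names `compare_floats_ascending` and `sort_ascending` are hypothetical, and ascending order is an assumption, not something this header confirms.

```cpp
#include <cstdlib>

// Hypothetical comparator in the form qsort expects: both arguments point to floats.
// Ascending order is an assumption made for this sketch.
int compare_floats_ascending(const void *arg1, const void *arg2) {
  float a = *static_cast<const float *>(arg1);
  float b = *static_cast<const float *>(arg2);
  if (a < b) return -1;
  if (a > b) return 1;
  return 0;
}

// Sort an array of floats in place using the comparator above.
void sort_ascending(float *values, int count) {
  std::qsort(values, count, sizeof(float), compare_floats_ascending);
}
```

Returning -1, 0 or 1 from explicit comparisons, rather than casting the difference a - b to int, avoids losing small differences to truncation.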

366
ccmain/scaleimg.cpp Normal file

@ -0,0 +1,366 @@
/**********************************************************************
* File: scaleimg.cpp (Formerly scaleim.c)
* Description: Smart scaling of images.
* Author: Phil Cheatle
* Created: Wed Nov 18 16:12:03 GMT 1992
*
* (C) Copyright 1992, Hewlett-Packard Ltd.
** Licensed under the Apache License, Version 2.0 (the "License");
** you may not use this file except in compliance with the License.
** You may obtain a copy of the License at
** http://www.apache.org/licenses/LICENSE-2.0
** Unless required by applicable law or agreed to in writing, software
** distributed under the License is distributed on an "AS IS" BASIS,
** WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
** See the License for the specific language governing permissions and
** limitations under the License.
*
**********************************************************************/
/*************************************************************************
* This is really Sheelagh's code that I've hacked into a more usable form.
* You simply call scale_image() passing in source and target images. The target
* image should be empty, but created - in order to define the destination
* size.
*************************************************************************/
#include "mfcpch.h"
#include <stdlib.h>
#include <string.h>
#include "fileerr.h"
#include "tprintf.h"
#include "grphics.h"
#include "img.h"
//#include "basefile.h"
#include "imgscale.h"
#include "scaleimg.h"
void scale_image( //scale an image
IMAGE &image, //source image
IMAGE &target_image //target image
) {
INT32 xsize, ysize, new_xsize, new_ysize;
IMAGELINE line, new_line;
int *hires, *lores, *oldhires, *oldlores;
int i, j, n, oldn, row, col;
int offset = 0; //not used here
float factor;
UINT8 curr_colour, new_colour;
int dummy = -1;
IMAGE image2; //horiz scaled image
xsize = image.get_xsize ();
ysize = image.get_ysize ();
new_xsize = target_image.get_xsize ();
new_ysize = target_image.get_ysize ();
if (new_ysize > new_xsize)
new_line.init (new_ysize);
else
new_line.init (new_xsize);
factor = (float) xsize / (float) new_xsize;
hires = (int *) calloc (xsize, sizeof (int));
lores = (int *) calloc (new_xsize, sizeof (int));
oldhires = (int *) calloc (xsize, sizeof (int));
oldlores = (int *) calloc (new_xsize, sizeof (int));
if ((hires == NULL) || (lores == NULL) || (oldhires == NULL)
|| (oldlores == NULL)) {
fprintf (stderr, "Calloc error in scale_image\n");
err_exit();
}
image2.create (new_xsize, ysize, image.get_bpp ());
oldn = 0;
/* do first row separately because hires[col-1] doesn't make sense here */
image.fast_get_line (0, 0, xsize, &line);
/* each line nominally begins with white */
curr_colour = 1;
n = 0;
for (i = 0; i < xsize; i++) {
new_colour = *(line.pixels + i);
if (new_colour != curr_colour) {
hires[n] = i;
n++;
curr_colour = new_colour;
}
}
if (offset != 0)
for (i = 0; i < n; i++)
hires[i] += offset;
if (n > new_xsize) {
tprintf ("Too many transitions (%d) on line 0\n", n);
scale_image_cop_out(image,
target_image,
factor,
hires,
lores,
oldhires,
oldlores);
return;
}
else if (n > 0)
dyn_prog (n, hires, lores, new_xsize, &dummy, &dummy, 0, factor);
else
lores[0] = new_xsize;
curr_colour = 1;
j = 0;
for (i = 0; i < new_xsize; i++) {
if (j < n && lores[j] == i) { //guard: lores holds only new_xsize entries
curr_colour = 1 - curr_colour;
j++;
}
*(new_line.pixels + i) = curr_colour;
}
image2.put_line (0, 0, new_xsize, &new_line, 0);
for (i = 0; i < n; i++) {
oldhires[i] = hires[i];
oldlores[i] = lores[i];
}
for (i = n; i < oldn; i++) {
oldhires[i] = 0;
oldlores[i] = 0;
}
oldn = n;
for (row = 1; row < ysize; row++) {
image.fast_get_line (0, row, xsize, &line);
/* each line nominally begins with white */
curr_colour = 1;
n = 0;
for (i = 0; i < xsize; i++) {
new_colour = *(line.pixels + i);
if (new_colour != curr_colour) {
hires[n] = i;
n++;
curr_colour = new_colour;
}
}
for (i = n; i < oldn; i++) {
hires[i] = 0;
lores[i] = 0;
}
if (offset != 0)
for (i = 0; i < n; i++)
hires[i] += offset;
if (n > new_xsize) {
tprintf ("Too many transitions (%d) on line %d\n", n, row);
scale_image_cop_out(image,
target_image,
factor,
hires,
lores,
oldhires,
oldlores);
return;
}
else if (n > 0)
dyn_prog(n, hires, lores, new_xsize, oldhires, oldlores, oldn, factor);
else
lores[0] = new_xsize;
curr_colour = 1;
j = 0;
for (i = 0; i < new_xsize; i++) {
if (j < n && lores[j] == i) { //guard: lores holds only new_xsize entries
curr_colour = 1 - curr_colour;
j++;
}
*(new_line.pixels + i) = curr_colour;
}
image2.put_line (0, row, new_xsize, &new_line, 0);
for (i = 0; i < n; i++) {
oldhires[i] = hires[i];
oldlores[i] = lores[i];
}
for (i = n; i < oldn; i++) {
oldhires[i] = 0;
oldlores[i] = 0;
}
oldn = n;
}
free(hires);
free(lores);
free(oldhires);
free(oldlores);
/* NOW DO THE VERTICAL SCALING from image2 to target_image*/
xsize = new_xsize;
factor = (float) ysize / (float) new_ysize;
offset = 0;
hires = (int *) calloc (ysize, sizeof (int));
lores = (int *) calloc (new_ysize, sizeof (int));
oldhires = (int *) calloc (ysize, sizeof (int));
oldlores = (int *) calloc (new_ysize, sizeof (int));
if ((hires == NULL) || (lores == NULL) || (oldhires == NULL)
|| (oldlores == NULL)) {
fprintf (stderr, "Calloc error in scale_image (vert)\n");
err_exit();
}
oldn = 0;
/* do first col separately because hires[col-1] doesn't make sense here */
image2.get_column (0, 0, ysize, &line, 0);
/* each column nominally begins with white */
curr_colour = 1;
n = 0;
for (i = 0; i < ysize; i++) {
new_colour = *(line.pixels + i);
if (new_colour != curr_colour) {
hires[n] = i;
n++;
curr_colour = new_colour;
}
}
if (offset != 0)
for (i = 0; i < n; i++)
hires[i] += offset;
if (n > new_ysize) {
tprintf ("Too many transitions (%d) on column 0\n", n);
scale_image_cop_out(image,
target_image,
factor,
hires,
lores,
oldhires,
oldlores);
return;
}
else if (n > 0)
dyn_prog (n, hires, lores, new_ysize, &dummy, &dummy, 0, factor);
else
lores[0] = new_ysize;
curr_colour = 1;
j = 0;
for (i = 0; i < new_ysize; i++) {
if (lores[j] == i) {
curr_colour = 1 - curr_colour;
j++;
}
*(new_line.pixels + i) = curr_colour;
}
target_image.put_column (0, 0, new_ysize, &new_line, 0);
for (i = 0; i < n; i++) {
oldhires[i] = hires[i];
oldlores[i] = lores[i];
}
for (i = n; i < oldn; i++) {
oldhires[i] = 0;
oldlores[i] = 0;
}
oldn = n;
for (col = 1; col < xsize; col++) {
image2.get_column (col, 0, ysize, &line, 0);
/* each column nominally begins with white */
curr_colour = 1;
n = 0;
for (i = 0; i < ysize; i++) {
new_colour = *(line.pixels + i);
if (new_colour != curr_colour) {
hires[n] = i;
n++;
curr_colour = new_colour;
}
}
for (i = n; i < oldn; i++) {
hires[i] = 0;
lores[i] = 0;
}
if (offset != 0)
for (i = 0; i < n; i++)
hires[i] += offset;
if (n > new_ysize) {
tprintf ("Too many transitions (%d) on column %d\n", n, col);
scale_image_cop_out(image,
target_image,
factor,
hires,
lores,
oldhires,
oldlores);
return;
}
else if (n > 0)
dyn_prog(n, hires, lores, new_ysize, oldhires, oldlores, oldn, factor);
else
lores[0] = new_ysize;
curr_colour = 1;
j = 0;
for (i = 0; i < new_ysize; i++) {
if (lores[j] == i) {
curr_colour = 1 - curr_colour;
j++;
}
*(new_line.pixels + i) = curr_colour;
}
target_image.put_column (col, 0, new_ysize, &new_line, 0);
for (i = 0; i < n; i++) {
oldhires[i] = hires[i];
oldlores[i] = lores[i];
}
for (i = n; i < oldn; i++) {
oldhires[i] = 0;
oldlores[i] = 0;
}
oldn = n;
}
free(hires);
free(lores);
free(oldhires);
free(oldlores);
}
/**********************************************************************
* scale_image_cop_out
*
* Cop-out of scale_image by doing it the easy way and free the data.
**********************************************************************/
void scale_image_cop_out( //scale an image
IMAGE &image, //source image
IMAGE &target_image, //target image
float factor, //scale factor
int *hires,
int *lores,
int *oldhires,
int *oldlores) {
INT32 xsize, ysize, new_xsize, new_ysize;
xsize = image.get_xsize ();
ysize = image.get_ysize ();
new_xsize = target_image.get_xsize ();
new_ysize = target_image.get_ysize ();
if (factor <= 0.5)
reduce_sub_image (&image, 0, 0, xsize, ysize,
&target_image, 0, 0, (INT32) (1.0 / factor), FALSE);
else if (factor >= 2)
enlarge_sub_image (&image, 0, 0, &target_image,
0, 0, new_xsize, new_ysize, (INT32) factor, FALSE);
else
copy_sub_image (&image, 0, 0, xsize, ysize, &target_image, 0, 0, FALSE);
free(hires);
free(lores);
free(oldhires);
free(oldlores);
}
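
Note on scale_image: each row (and later each column) is reduced to a list of positions where the pixel value changes, starting from a nominal white, and the output line is rebuilt by flipping the colour at each mapped position. The sketch below shows only that encode and rebuild step on a plain vector of 0/1 pixels. `find_transitions` and `render_row` are hypothetical names for illustration, and the mapping of positions from source to target resolution (dyn_prog in the code above) is not reproduced.

```cpp
#include <cstddef>
#include <vector>

// Record the positions where the pixel value changes along a row.
// The row is assumed to begin nominally white (1), as in scale_image().
std::vector<int> find_transitions(const std::vector<unsigned char> &pixels) {
  std::vector<int> transitions;
  unsigned char current = 1;
  for (std::size_t i = 0; i < pixels.size(); ++i) {
    if (pixels[i] != current) {
      transitions.push_back(static_cast<int>(i));
      current = pixels[i];
    }
  }
  return transitions;
}

// Rebuild a row of the given width by flipping the colour at each transition.
std::vector<unsigned char> render_row(const std::vector<int> &transitions,
                                      int width) {
  std::vector<unsigned char> pixels(width);
  unsigned char current = 1;
  std::size_t j = 0;
  for (int i = 0; i < width; ++i) {
    // The bounds check on j keeps the read inside the transition list.
    if (j < transitions.size() && transitions[j] == i) {
      current = static_cast<unsigned char>(1 - current);
      ++j;
    }
    pixels[i] = current;
  }
  return pixels;
}
```

With the transition positions left unchanged, render_row reproduces the input row exactly. Scaling would replace each position with its counterpart at the target resolution before rendering.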

35
ccmain/scaleimg.h Normal file

@ -0,0 +1,35 @@
/**********************************************************************
* File: scaleimg.h (Formerly scaleim.h)
* Description: Smart scaling of images.
* Author: Phil Cheatle
* Created: Wed Nov 18 16:12:03 GMT 1992
*
* (C) Copyright 1992, Hewlett-Packard Ltd.
** Licensed under the Apache License, Version 2.0 (the "License");
** you may not use this file except in compliance with the License.
** You may obtain a copy of the License at
** http://www.apache.org/licenses/LICENSE-2.0
** Unless required by applicable law or agreed to in writing, software
** distributed under the License is distributed on an "AS IS" BASIS,
** WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
** See the License for the specific language governing permissions and
** limitations under the License.
*
**********************************************************************/
#ifndef SCALEIMG_H
#define SCALEIMG_H
#include "img.h" //declares IMAGE, used in the prototypes below
void scale_image( //scale an image
IMAGE &image, //source image
IMAGE &target_image //target image
);
void scale_image_cop_out( //scale an image
IMAGE &image, //source image
IMAGE &target_image, //target image
float factor, //scale factor
int *hires,
int *lores,
int *oldhires,
int *oldlores);
#endif

370
ccmain/tessbox.cpp Normal file

@ -0,0 +1,370 @@
/**********************************************************************
* File: tessbox.cpp (Formerly tessbox.c)
* Description: Black boxed Tess for developing a resaljet.
* Author: Ray Smith
* Created: Thu Apr 23 11:03:36 BST 1992
*
* (C) Copyright 1992, Hewlett-Packard Ltd.
** Licensed under the Apache License, Version 2.0 (the "License");
** you may not use this file except in compliance with the License.
** You may obtain a copy of the License at
** http://www.apache.org/licenses/LICENSE-2.0
** Unless required by applicable law or agreed to in writing, software
** distributed under the License is distributed on an "AS IS" BASIS,
** WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
** See the License for the specific language governing permissions and
** limitations under the License.
*
**********************************************************************/
#include "mfcpch.h"
#include "tfacep.h"
#include "tfacepp.h"
#include "tessbox.h"
#include "mfoutline.h"
#define EXTERN
/**********************************************************************
* tess_segment_pass1
*
* Segment a word using the pass1 conditions of the tess segmenter.
**********************************************************************/
WERD_CHOICE *tess_segment_pass1( //recog one word
WERD *word, //bln word to do
DENORM *denorm, //de-normaliser
POLY_MATCHER matcher, //matcher function
WERD_CHOICE *&raw_choice, //raw result //list of blob lists
BLOB_CHOICE_LIST_CLIST *blob_choices,
WERD *&outword //bln word output
) {
WERD_CHOICE *result; //return value
int saved_enable_assoc = 0;
int saved_chop_enable = 0;
if (word->flag (W_DONT_CHOP)) {
saved_enable_assoc = enable_assoc;
saved_chop_enable = chop_enable;
enable_assoc = 0;
chop_enable = 0;
if (word->flag (W_REP_CHAR))
permute_only_top = 1;
}
set_pass1();
// tprintf("pass1 chop on=%d, seg=%d, onlytop=%d",chop_enable,enable_assoc,permute_only_top);
result = recog_word (word, denorm, matcher, NULL, NULL, FALSE,
raw_choice, blob_choices, outword);
if (word->flag (W_DONT_CHOP)) {
enable_assoc = saved_enable_assoc;
chop_enable = saved_chop_enable;
permute_only_top = 0;
}
return result;
}
/**********************************************************************
* tess_segment_pass2
*
* Segment a word using the pass2 conditions of the tess segmenter.
**********************************************************************/
WERD_CHOICE *tess_segment_pass2( //recog one word
WERD *word, //bln word to do
DENORM *denorm, //de-normaliser
POLY_MATCHER matcher, //matcher function
WERD_CHOICE *&raw_choice, //raw result //list of blob lists
BLOB_CHOICE_LIST_CLIST *blob_choices,
WERD *&outword //bln word output
) {
WERD_CHOICE *result; //return value
int saved_enable_assoc = 0;
int saved_chop_enable = 0;
if (word->flag (W_DONT_CHOP)) {
saved_enable_assoc = enable_assoc;
saved_chop_enable = chop_enable;
enable_assoc = 0;
chop_enable = 0;
if (word->flag (W_REP_CHAR))
permute_only_top = 1;
}
set_pass2();
result = recog_word (word, denorm, matcher, NULL, NULL, FALSE,
raw_choice, blob_choices, outword);
if (word->flag (W_DONT_CHOP)) {
enable_assoc = saved_enable_assoc;
chop_enable = saved_chop_enable;
permute_only_top = 0;
}
return result;
}
/**********************************************************************
* correct_segment_pass2
*
* Segment a word correctly using the pass2 conditions of the tess segmenter.
* Then call the tester with all the correctly segmented blobs.
* If the correct segmentation cannot be found, the tester is called
* with the segmentation found by tess and all the correct flags set to
* false and all strings are NULL.
**********************************************************************/
WERD_CHOICE *correct_segment_pass2( //recog one word
WERD *word, //bln word to do
DENORM *denorm, //de-normaliser
POLY_MATCHER matcher, //matcher function
POLY_TESTER tester, //tester function
WERD_CHOICE *&raw_choice, //raw result //list of blob lists
BLOB_CHOICE_LIST_CLIST *blob_choices,
WERD *&outword //bln word output
) {
set_pass2();
return recog_word (word, denorm, matcher, NULL, tester, TRUE,
raw_choice, blob_choices, outword);
}
/**********************************************************************
* test_segment_pass2
*
* Segment a word correctly using the pass2 conditions of the tess segmenter.
* Then call the tester on all words used by tess in its search.
* Do this only on words where the correct segmentation could be found.
**********************************************************************/
WERD_CHOICE *test_segment_pass2( //recog one word
WERD *word, //bln word to do
DENORM *denorm, //de-normaliser
POLY_MATCHER matcher, //matcher function
POLY_TESTER tester, //tester function
WERD_CHOICE *&raw_choice, //raw result //list of blob lists
BLOB_CHOICE_LIST_CLIST *blob_choices,
WERD *&outword //bln word output
) {
set_pass2();
return recog_word (word, denorm, matcher, tester, NULL, TRUE,
raw_choice, blob_choices, outword);
}
/**********************************************************************
* tess_acceptable_word
*
* Return true if the word is regarded as "good enough".
**********************************************************************/
BOOL8 tess_acceptable_word( //test acceptability
WERD_CHOICE *word_choice, //after context
WERD_CHOICE *raw_choice //before context
) {
A_CHOICE choice; //after context
A_CHOICE tess_raw; //before
choice.rating = word_choice->rating ();
choice.certainty = word_choice->certainty ();
choice.string = (char *) word_choice->string ().string ();
tess_raw.rating = raw_choice->rating ();
tess_raw.certainty = raw_choice->certainty ();
tess_raw.string = (char *) raw_choice->string ().string ();
//call tess
return AcceptableResult (&choice, &tess_raw);
}
/**********************************************************************
* tess_adaptable_word
*
* Return true if the word is regarded as good enough to adapt to.
**********************************************************************/
BOOL8 tess_adaptable_word( //test adaptability
WERD *word, //word to test
WERD_CHOICE *word_choice, //after context
WERD_CHOICE *raw_choice //before context
) {
TWERD *tessword; //converted word
INT32 result; //answer
tessword = make_tess_word (word, NULL);
result = AdaptableWord (tessword, word_choice->string ().string (),
raw_choice->string ().string ());
delete_word(tessword);
return result != 0;
}
/**********************************************************************
* tess_cn_matcher
*
* Match a blob using the Tess Char Normalized (non-adaptive) matcher
* only.
**********************************************************************/
void tess_cn_matcher( //call tess
PBLOB *pblob, //previous blob
PBLOB *blob, //blob to match
PBLOB *nblob, //next blob
WERD *word, //word it came from
DENORM *denorm, //de-normaliser
BLOB_CHOICE_LIST &ratings //list of results
) {
LIST result; //tess output
TBLOB *tessblob; //converted blob
TEXTROW tessrow; //dummy row
tess_cn_matching = TRUE; //turn it on
tess_bn_matching = FALSE;
//convert blob
tessblob = make_tess_blob (blob, TRUE);
//make dummy row
make_tess_row(denorm, &tessrow);
//classify
result = AdaptiveClassifier (tessblob, NULL, &tessrow);
free_blob(tessblob);
//make our format
convert_choice_list(result, ratings);
}
/**********************************************************************
* tess_bn_matcher
*
* Match a blob using the Tess Baseline Normalized (adaptive) matcher
* only.
**********************************************************************/
void tess_bn_matcher( //call tess
PBLOB *pblob, //previous blob
PBLOB *blob, //blob to match
PBLOB *nblob, //next blob
WERD *word, //word it came from
DENORM *denorm, //de-normaliser
BLOB_CHOICE_LIST &ratings //list of results
) {
LIST result; //tess output
TBLOB *tessblob; //converted blob
TEXTROW tessrow; //dummy row
tess_bn_matching = TRUE; //turn it on
tess_cn_matching = FALSE;
//convert blob
tessblob = make_tess_blob (blob, TRUE);
//make dummy row
make_tess_row(denorm, &tessrow);
//classify
result = AdaptiveClassifier (tessblob, NULL, &tessrow);
free_blob(tessblob);
//make our format
convert_choice_list(result, ratings);
}
/**********************************************************************
* tess_default_matcher
*
* Match a blob using the default functionality of the Tess matcher.
**********************************************************************/
void tess_default_matcher( //call tess
PBLOB *pblob, //previous blob
PBLOB *blob, //blob to match
PBLOB *nblob, //next blob
WERD *word, //word it came from
DENORM *denorm, //de-normaliser
BLOB_CHOICE_LIST &ratings //list of results
) {
LIST result; //tess output
TBLOB *tessblob; //converted blob
TEXTROW tessrow; //dummy row
tess_bn_matching = FALSE; //turn it off
tess_cn_matching = FALSE;
//convert blob
tessblob = make_tess_blob (blob, TRUE);
//make dummy row
make_tess_row(denorm, &tessrow);
//classify
result = AdaptiveClassifier (tessblob, NULL, &tessrow);
free_blob(tessblob);
//make our format
convert_choice_list(result, ratings);
}
/**********************************************************************
* tess_training_tester
*
* Matcher tester function which actually trains tess.
**********************************************************************/
void tess_training_tester( //call tess
PBLOB *blob, //blob to match
DENORM *denorm, //de-normaliser
BOOL8 correct, //correctly segmented
char *text, //correct text
INT32 count, //chars in text
BLOB_CHOICE_LIST *ratings //list of results
) {
TBLOB *tessblob; //converted blob
TEXTROW tessrow; //dummy row
if (correct) {
NormMethod = character; //Force char norm spc 30/11/93
tess_bn_matching = FALSE; //turn it off
tess_cn_matching = FALSE;
//convert blob
tessblob = make_tess_blob (blob, TRUE);
//make dummy row
make_tess_row(denorm, &tessrow);
//learn it
LearnBlob(tessblob, &tessrow, text, count);
free_blob(tessblob);
}
}
/**********************************************************************
* tess_adapter
*
* Adapt to the word using the Tesseract mechanism.
**********************************************************************/
void tess_adapter( //adapt to word
WERD *word, //bln word
DENORM *denorm, //de-normalise
const char *string, //string for word
const char *raw_string, //before context
const char *rejmap //reject map
) {
TWERD *tessword; //converted word
static TEXTROW tessrow; //dummy row
//make dummy row
make_tess_row(denorm, &tessrow);
//make a word
tessword = make_tess_word (word, &tessrow);
//adapt to it
AdaptToWord(tessword, &tessrow, string, raw_string, rejmap);
delete_word(tessword); //free it
}
/**********************************************************************
* tess_add_doc_word
*
* Add the given word to the document dictionary
**********************************************************************/
void tess_add_doc_word( //add word to doc dict
WERD_CHOICE *word_choice //after context
) {
A_CHOICE choice; //converted choice
choice.rating = word_choice->rating ();
choice.certainty = word_choice->certainty ();
choice.string = (char *) word_choice->string ().string ();
add_document_word(&choice);
}

110
ccmain/tessbox.h Normal file

@ -0,0 +1,110 @@
/**********************************************************************
* File: tessbox.h (Formerly tessbox.h)
* Description: Black boxed Tess for developing a resaljet.
* Author: Ray Smith
* Created: Thu Apr 23 11:03:36 BST 1992
*
* (C) Copyright 1992, Hewlett-Packard Ltd.
** Licensed under the Apache License, Version 2.0 (the "License");
** you may not use this file except in compliance with the License.
** You may obtain a copy of the License at
** http://www.apache.org/licenses/LICENSE-2.0
** Unless required by applicable law or agreed to in writing, software
** distributed under the License is distributed on an "AS IS" BASIS,
** WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
** See the License for the specific language governing permissions and
** limitations under the License.
*
**********************************************************************/
#ifndef TESSBOX_H
#define TESSBOX_H
#include "ratngs.h"
#include "notdll.h"
WERD_CHOICE *tess_segment_pass1( //recog one word
WERD *word, //bln word to do
DENORM *denorm, //de-normaliser
POLY_MATCHER matcher, //matcher function
WERD_CHOICE *&raw_choice, //raw result
BLOB_CHOICE_LIST_CLIST *blob_choices, //list of blob lists
WERD *&outword //bln word output
);
WERD_CHOICE *tess_segment_pass2( //recog one word
WERD *word, //bln word to do
DENORM *denorm, //de-normaliser
POLY_MATCHER matcher, //matcher function
WERD_CHOICE *&raw_choice, //raw result
BLOB_CHOICE_LIST_CLIST *blob_choices, //list of blob lists
WERD *&outword //bln word output
);
//recog one word
WERD_CHOICE *correct_segment_pass2(WERD *word, //bln word to do
DENORM *denorm, //de-normaliser
POLY_MATCHER matcher, //matcher function
POLY_TESTER tester, //tester function
WERD_CHOICE *&raw_choice, //raw result
BLOB_CHOICE_LIST_CLIST *blob_choices, //list of blob lists
WERD *&outword //bln word output
);
WERD_CHOICE *test_segment_pass2( //recog one word
WERD *word, //bln word to do
DENORM *denorm, //de-normaliser
POLY_MATCHER matcher, //matcher function
POLY_TESTER tester, //tester function
WERD_CHOICE *&raw_choice, //raw result
BLOB_CHOICE_LIST_CLIST *blob_choices, //list of blob lists
WERD *&outword //bln word output
);
BOOL8 tess_acceptable_word( //test acceptability
WERD_CHOICE *word_choice, //after context
WERD_CHOICE *raw_choice //before context
);
BOOL8 tess_adaptable_word( //test adaptability
WERD *word, //word to test
WERD_CHOICE *word_choice, //after context
WERD_CHOICE *raw_choice //before context
);
void tess_cn_matcher( //call tess
PBLOB *pblob, //previous blob
PBLOB *blob, //blob to match
PBLOB *nblob, //next blob
WERD *word, //word it came from
DENORM *denorm, //de-normaliser
BLOB_CHOICE_LIST &ratings //list of results
);
void tess_bn_matcher( //call tess
PBLOB *pblob, //previous blob
PBLOB *blob, //blob to match
PBLOB *nblob, //next blob
WERD *word, //word it came from
DENORM *denorm, //de-normaliser
BLOB_CHOICE_LIST &ratings //list of results
);
void tess_default_matcher( //call tess
PBLOB *pblob, //previous blob
PBLOB *blob, //blob to match
PBLOB *nblob, //next blob
WERD *word, //word it came from
DENORM *denorm, //de-normaliser
BLOB_CHOICE_LIST &ratings //list of results
);
void tess_training_tester( //call tess
PBLOB *blob, //blob to match
DENORM *denorm, //de-normaliser
BOOL8 correct, //correctly segmented
char *text, //correct text
INT32 count, //chars in text
BLOB_CHOICE_LIST *ratings //list of results
);
void tess_adapter( //adapt to word
WERD *word, //bln word
DENORM *denorm, //de-normalise
const char *string, //string for word
const char *raw_string, //before context
const char *rejmap);
void tess_add_doc_word( //add word to doc dict
WERD_CHOICE *word_choice //after context
);
#endif

321
ccmain/tessedit.cpp Normal file

@ -0,0 +1,321 @@
/**********************************************************************
* File: tessedit.cpp (Formerly tessedit.c)
* Description: Main program for merge of tess and editor.
* Author: Ray Smith
* Created: Tue Jan 07 15:21:46 GMT 1992
*
* (C) Copyright 1992, Hewlett-Packard Ltd.
** Licensed under the Apache License, Version 2.0 (the "License");
** you may not use this file except in compliance with the License.
** You may obtain a copy of the License at
** http://www.apache.org/licenses/LICENSE-2.0
** Unless required by applicable law or agreed to in writing, software
** distributed under the License is distributed on an "AS IS" BASIS,
** WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
** See the License for the specific language governing permissions and
** limitations under the License.
*
**********************************************************************/
#include "mfcpch.h"
//#include <osfcn.h>
//#include <signal.h>
//#include <time.h>
//#include <unistd.h>
#include "tfacep.h" //must be before main.h
//#include "fileerr.h"
#include "stderr.h"
#include "basedir.h"
#include "tessvars.h"
//#include "debgwin.h"
//#include "epapdest.h"
#include "control.h"
#include "imgs.h"
#include "reject.h"
#include "pageres.h"
//#include "gpapdest.h"
#include "mainblk.h"
#include "nwmain.h"
#include "pgedit.h"
#include "ocrshell.h"
#include "tprintf.h"
//#include "ipeerr.h"
//#include "restart.h"
#include "tessedit.h"
//#include "fontfind.h"
#include "permute.h"
#include "permdawg.h"
#include "permnum.h"
#include "stopper.h"
#include "adaptmatch.h"
#include "intmatcher.h"
#include "chop.h"
#include "globals.h"
//extern "C" {
#include "callnet.h" //phils nn stuff
//}
#include "notdll.h"
#define VARDIR "configs/" /*variables files */
//config under api
#define API_CONFIG "configs/api_config"
#define EXTERN
EXTERN BOOL_EVAR (tessedit_write_vars, FALSE, "Write all vars to file");
EXTERN BOOL_VAR (tessedit_tweaking_tess_vars, FALSE,
"Fiddle tess config values");
EXTERN INT_VAR (tweak_ReliableConfigThreshold, 2, "Tess VAR");
EXTERN double_VAR (tweak_garbage, 1.5, "Tess VAR");
EXTERN double_VAR (tweak_ok_word, 1.25, "Tess VAR");
EXTERN double_VAR (tweak_good_word, 1.1, "Tess VAR");
EXTERN double_VAR (tweak_freq_word, 1.0, "Tess VAR");
EXTERN double_VAR (tweak_ok_number, 1.4, "Tess VAR");
EXTERN double_VAR (tweak_good_number, 1.1, "Tess VAR");
EXTERN double_VAR (tweak_non_word, 1.25, "Tess VAR");
EXTERN double_VAR (tweak_CertaintyPerChar, -0.5, "Tess VAR");
EXTERN double_VAR (tweak_NonDictCertainty, -2.5, "Tess VAR");
EXTERN double_VAR (tweak_RejectCertaintyOffset, 1.0, "Tess VAR");
EXTERN double_VAR (tweak_GoodAdaptiveMatch, 0.125, "Tess VAR");
EXTERN double_VAR (tweak_GreatAdaptiveMatch, 0.10, "Tess VAR");
EXTERN INT_VAR (tweak_AdaptProtoThresh, 230, "Tess VAR");
EXTERN INT_VAR (tweak_AdaptFeatureThresh, 230, "Tess VAR");
EXTERN INT_VAR (tweak_min_outline_points, 6, "Tess VAR");
EXTERN INT_VAR (tweak_min_outline_area, 2000, "Tess VAR");
EXTERN double_VAR (tweak_good_split, 50.0, "Tess VAR");
EXTERN double_VAR (tweak_ok_split, 100.0, "Tess VAR");
extern INT16 XOFFSET;
extern INT16 YOFFSET;
extern int NO_BLOCK;
//progress monitor
ETEXT_DESC *global_monitor = NULL;
int init_tesseract(const char *arg0,
const char *textbase,
const char *configfile,
int configc,
const char *const *configv) {
FILE *var_file;
static char c_path[MAX_PATH]; //path for c code
// Set the basename, compute the data directory and read C++ configs.
main_setup(arg0, textbase, configc, configv);
debug_window_on.set_value (FALSE);
if (tessedit_write_vars) {
var_file = fopen ("edited.cfg", "w");
if (var_file != NULL) {
print_variables(var_file);
fclose(var_file);
}
}
strcpy (c_path, datadir.string ());
c_path[strlen (c_path) - strlen (m_data_sub_dir.string ())] = '\0';
demodir = c_path;
start_recog(configfile, textbase);
ReliableConfigThreshold = tweak_ReliableConfigThreshold;
set_tess_tweak_vars();
if (tessedit_use_nn) //phils nn stuff
init_net();
return 0; //Normal exit
}
void end_tesseract() {
end_recog();
}
#ifdef _TIFFIO_
void read_tiff_image(TIFF* tif, IMAGE* image) {
tdata_t buf;
uint32 image_width, image_height;
uint16 photometric;
short bpp;
TIFFGetField(tif, TIFFTAG_IMAGEWIDTH, &image_width);
TIFFGetField(tif, TIFFTAG_IMAGELENGTH, &image_height);
TIFFGetField(tif, TIFFTAG_BITSPERSAMPLE, &bpp);
TIFFGetField(tif, TIFFTAG_PHOTOMETRIC, &photometric);
// Tesseract's internal representation is 0-is-black,
// so if the photometric is 1 (min is black) then high-valued pixels
// are 1 (white), otherwise they are 0 (black).
UINT8 high_value = photometric == 1;
image->create(image_width, image_height, bpp);
IMAGELINE line;
line.init(image_width);
buf = _TIFFmalloc(TIFFScanlineSize(tif));
int bytes_per_line = (image_width*bpp + 7)/8;
UINT8* dest_buf = image->get_buffer();
// This will go badly wrong with one of the more exotic tiff formats,
// but the majority will work OK.
for (uint32 y = 0; y < image_height; ++y) {
TIFFReadScanline(tif, buf, y);
memcpy(dest_buf, buf, bytes_per_line);
dest_buf += bytes_per_line;
}
if (high_value == 0)
invert_image(image);
_TIFFfree(buf);
}
#endif
/* Define command type identifiers */
enum CMD_EVENTS
{
ACTION_1_CMD_EVENT,
RECOG_WERDS,
RECOG_PSEUDO,
ACTION_2_CMD_EVENT
};
/**********************************************************************
* extend_menu()
*
* Function called by pgeditor to let you extend the command menu.
* Items can be added to the "MODES" and "OTHER" menus. The modes_id_base
* and other_id_base parameters are required to offset your command event ids
* from those of pgeditor, and to let pgeditor know which commands are mode
* changes and which are unmoded commands. (Sorry if you think these offsets
* are a bit kludgy, the alternative would be to duplicate all the menu
* constructor modes within pgeditor so that the offsets could be hidden.)
*
* Items for the "MODES" menu may only be simple menu items (just a name and
* id). Items for the "OTHER" menu can be editable parameters or boolean
* toggles. Refer to menu.h to see how to build different types.
**********************************************************************/
void extend_menu( //handle for "MODES"
RADIO_MENU *modes_menu,
INT16 modes_id_base, //mode cmd ids offset
NON_RADIO_MENU *other_menu, //handle for "OTHER"
INT16 other_id_base //other cmd ids offset
) {
/* Example new mode */
modes_menu->add_child (new RADIO_MENU_LEAF ("Recog Words",
modes_id_base + RECOG_WERDS));
modes_menu->add_child (new RADIO_MENU_LEAF ("Recog Blobs",
modes_id_base + RECOG_PSEUDO));
/* Example toggle
other_menu->add_child(
new TOGGLE_MENU_LEAF( "Action 2", //Display string
other_id_base + ACTION_2_CMD_EVENT, //offset command id
FALSE ) ); //Initial value
Example text parm (commented out)
other_menu->add_child(
new VARIABLE_MENU_LEAF( "Parm change", //Display string
other_id_base + ACTION_3_CMD_EVENT, //offset command id
"default value" ) ); //default value string
*/
}
/**********************************************************************
* extend_moded_commands()
*
* Function called by pgeditor when the user is in one of the extended modes
* defined by extend_menu() and the user has selected an area in the image
* window.
**********************************************************************/
void extend_moded_commands( //current mode
INT32 mode,
BOX selection_box //area selected
) {
char msg[MAX_CHARS + 1];
switch (mode) {
case RECOG_WERDS:
command_window->msg ("Recogging selected words");
/* This is how to apply a "word processor" function to each selected word */
process_selected_words(current_block_list,
selection_box,
&recog_interactive);
break;
case RECOG_PSEUDO:
command_window->msg ("Recogging selected blobs");
/* This is how to build a pseudo word from the selected blobs and recognise it */
recog_pseudo_word(current_block_list, selection_box);
break;
default:
sprintf (msg, "Unexpected extended mode " INT32FORMAT, mode);
command_window->msg (msg);
}
}
/**********************************************************************
* extend_unmoded_commands()
*
* Function called by pgeditor when the user has selected one of the unmoded
* extended menu options.
**********************************************************************/
void extend_unmoded_commands( //command event
INT32 cmd_event,
char *new_value //changed value if any
) {
char msg[MAX_CHARS + 1];
switch (cmd_event) {
case ACTION_2_CMD_EVENT: //a toggle event
if (new_value[0] == 'T')
//Display message
command_window->msg ("Extended Action 2 ON!!");
else
command_window->msg ("Extended Action 2 OFF!!");
break;
default:
sprintf (msg, "Unrecognised extended command " INT32FORMAT " (%s)",
cmd_event, new_value);
command_window->msg (msg);
break;
}
}
/*************************************************************************
* set_tess_tweak_vars()
* Set TESS vars from the tweak values. This is only really of use during a
* search of the space of tess configs; at other times the default values are set.
*
*************************************************************************/
void set_tess_tweak_vars() {
if (tessedit_tweaking_tess_vars) {
garbage = tweak_garbage;
ok_word = tweak_ok_word;
good_word = tweak_good_word;
freq_word = tweak_freq_word;
ok_number = tweak_ok_number;
good_number = tweak_good_number;
non_word = tweak_non_word;
CertaintyPerChar = tweak_CertaintyPerChar;
NonDictCertainty = tweak_NonDictCertainty;
RejectCertaintyOffset = tweak_RejectCertaintyOffset;
GoodAdaptiveMatch = tweak_GoodAdaptiveMatch;
GreatAdaptiveMatch = tweak_GreatAdaptiveMatch;
AdaptProtoThresh = tweak_AdaptProtoThresh;
AdaptFeatureThresh = tweak_AdaptFeatureThresh;
min_outline_points = tweak_min_outline_points;
min_outline_area = tweak_min_outline_area;
good_split = tweak_good_split;
ok_split = tweak_ok_split;
}
// if (expiry_day * 24 * 60 * 60 < time(NULL))
// err_exit();
}
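Note on read_tiff_image() above: the copy loop relies on each packed scanline occupying (image_width*bpp + 7)/8 bytes, and the image is inverted afterwards when the photometric tag is not 1 (min-is-black). A minimal sketch of that arithmetic follows. The names packed_row_bytes and invert_rows are hypothetical stand-ins for illustration only; they are not Tesseract or libtiff functions, and the real code calls invert_image() on an IMAGE.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper: bytes needed for one packed row of pixels,
// rounded up to a whole byte, mirroring
//   bytes_per_line = (image_width*bpp + 7)/8
static std::size_t packed_row_bytes(std::size_t width, std::size_t bpp) {
  assert(bpp > 0);
  return (width * bpp + 7) / 8;
}

// Hypothetical stand-in for invert_image(): flip every bit so that
// a min-is-white buffer becomes 0-is-black.
static void invert_rows(std::vector<std::uint8_t>& pixels) {
  for (std::uint8_t& b : pixels) {
    b = static_cast<std::uint8_t>(~b);
  }
}
```

For a 1-bit image 10 pixels wide this gives 2 bytes per row, so the last 6 bits of each row are padding. That padding is one reason the source comment warns that the more exotic TIFF layouts may not survive a plain memcpy.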

67
ccmain/tessedit.h Normal file

@ -0,0 +1,67 @@
/**********************************************************************
* File: tessedit.h (Formerly tessedit.h)
* Description: Main program for merge of tess and editor.
* Author: Ray Smith
* Created: Tue Jan 07 15:21:46 GMT 1992
*
* (C) Copyright 1992, Hewlett-Packard Ltd.
** Licensed under the Apache License, Version 2.0 (the "License");
** you may not use this file except in compliance with the License.
** You may obtain a copy of the License at
** http://www.apache.org/licenses/LICENSE-2.0
** Unless required by applicable law or agreed to in writing, software
** distributed under the License is distributed on an "AS IS" BASIS,
** WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
** See the License for the specific language governing permissions and
** limitations under the License.
*
**********************************************************************/
#ifndef TESSEDIT_H
#define TESSEDIT_H
#include "tessclas.h"
#include "ocrclass.h"
#include "pgedit.h"
#include "notdll.h"
// Includes libtiff if HAVE_LIBTIFF is defined
#ifdef HAVE_LIBTIFF
#ifdef GOOGLE3
#include "third_party/tiff/tiffio.h"
#else
#include "tiffio.h"
#endif
#endif
//progress monitor
extern ETEXT_DESC *global_monitor;
int init_tesseract(const char *arg0,
const char *textbase,
const char *configfile,
int configc,
const char *const *configv);
void recognize_page(STRING& image_name);
void end_tesseract();
#ifdef _TIFFIO_
void read_tiff_image(TIFF* tif, IMAGE* image);
#endif
//handle for "MODES"
void extend_menu(RADIO_MENU *modes_menu,
INT16 modes_id_base, //mode cmd ids offset
NON_RADIO_MENU *other_menu, //handle for "OTHER"
INT16 other_id_base //other cmd ids offset
);
//current mode
void extend_moded_commands(INT32 mode,
BOX selection_box //area selected
);
//command event
void extend_unmoded_commands(INT32 cmd_event,
char *new_value //changed value if any
);
void set_tess_tweak_vars();
#endif

38
ccmain/tessembedded.h Normal file

@ -0,0 +1,38 @@
/**********************************************************************
* File: tessembedded.h
* Description: Access to initialization functions in embedded environment
* Author: Marius Renn
* Created: Sun Oct 21
*
* (C) Copyright 1992, Hewlett-Packard Ltd.
** Licensed under the Apache License, Version 2.0 (the "License");
** you may not use this file except in compliance with the License.
** You may obtain a copy of the License at
** http://www.apache.org/licenses/LICENSE-2.0
** Unless required by applicable law or agreed to in writing, software
** distributed under the License is distributed on an "AS IS" BASIS,
** WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
** See the License for the specific language governing permissions and
** limitations under the License.
*
**********************************************************************/
#ifndef TESSEMBEDDED_H
#define TESSEMBEDDED_H
#include "ocrblock.h"
#include "varable.h"
#include "notdll.h"
int init_tessembedded(const char *arg0,
const char *textbase,
const char *configfile,
int configc,
const char *const *configv);
void tessembedded_read_file(STRING &name,
BLOCK_LIST *blocks);
void end_tessembedded();
#endif

311
ccmain/tesseractmain.cpp Normal file

@ -0,0 +1,311 @@
/**********************************************************************
* File: tesseractmain.cpp (Formerly tessedit.c)
* Description: Main program for merge of tess and editor.
* Author: Ray Smith
* Created: Tue Jan 07 15:21:46 GMT 1992
*
* (C) Copyright 1992, Hewlett-Packard Ltd.
** Licensed under the Apache License, Version 2.0 (the "License");
** you may not use this file except in compliance with the License.
** You may obtain a copy of the License at
** http://www.apache.org/licenses/LICENSE-2.0
** Unless required by applicable law or agreed to in writing, software
** distributed under the License is distributed on an "AS IS" BASIS,
** WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
** See the License for the specific language governing permissions and
** limitations under the License.
*
**********************************************************************/
#include "mfcpch.h"
#include "applybox.h"
#include "control.h"
#include "tessvars.h"
#include "tessedit.h"
#include "baseapi.h"
#include "pageres.h"
#include "imgs.h"
#include "varabled.h"
#include "tprintf.h"
#include "tesseractmain.h"
#include "stderr.h"
#include "notdll.h"
#include "mainblk.h"
#include "globals.h"
#include "tfacep.h"
#include "callnet.h"
#define VARDIR "configs/" /*variables files */
//config under api
#define API_CONFIG "configs/api_config"
#define EXTERN
EXTERN BOOL_VAR (tessedit_read_image, TRUE, "Ensure the image is read");
EXTERN BOOL_VAR (tessedit_write_images, FALSE,
"Capture the image from the IPE");
EXTERN BOOL_VAR (tessedit_debug_to_screen, FALSE, "Don't use debug file");
extern INT16 XOFFSET;
extern INT16 YOFFSET;
extern int NO_BLOCK;
const ERRCODE USAGE = "Usage";
char szAppName[] = "Tessedit"; //app name
/**********************************************************************
* main()
*
**********************************************************************/
#ifndef GRAPHICS_DISABLED
int main(int argc, char **argv) {
STRING outfile; //output file
if (argc < 3) {
USAGE.error (argv[0], EXIT,
"%s imagename outputbase [configfile [[+|-]varfile]...]\n", argv[0]);
}
if (argc == 3)
TessBaseAPI::Init(argv[0], argv[1], NULL, false, 0, argv + 2);
else
TessBaseAPI::Init(argv[0], argv[1], argv[3], false, argc - 4, argv + 4);
tprintf ("Tesseract Open Source OCR Engine\n");
IMAGE image;
#ifdef _TIFFIO_
TIFF* tif = TIFFOpen(argv[1], "r");
if (tif) {
read_tiff_image(tif, &image);
TIFFClose(tif);
} else {
READFAILED.error (argv[0], EXIT, argv[1]);
}
#else
if (image.read_header(argv[1]) < 0)
READFAILED.error (argv[0], EXIT, argv[1]);
if (image.read(image.get_ysize ()) < 0) {
MEMORY_OUT.error(argv[0], EXIT, "Read of image %s",
argv[1]);
}
#endif
int bytes_per_line = check_legal_image_size(image.get_xsize(),
image.get_ysize(),
image.get_bpp());
char* text = TessBaseAPI::TesseractRect(image.get_buffer(), image.get_bpp()/8,
bytes_per_line, 0, 0,
image.get_xsize(), image.get_ysize());
outfile = argv[2];
outfile += ".txt";
FILE* fp = fopen(outfile.string(), "w");
if (fp != NULL) {
fwrite(text, 1, strlen(text), fp);
fclose(fp);
}
delete [] text;
TessBaseAPI::End();
return 0; //Normal exit
}
#else
int main(int argc, char **argv) {
UINT16 lang; //language
STRING pagefile; //input file
if (argc < 4) {
USAGE.error (argv[0], EXIT,
"%s imagename outputbase configfile [[+|-]varfile]...\n", argv[0]);
}
time_t t_start = time(NULL);
init_tessembedded (argv[0], argv[2], argv[3], argc - 4, argv + 4);
tprintf ("Tesseract Open Source OCR Engine (graphics disabled)\n");
if (tessedit_read_image) {
#ifdef _TIFFIO_
TIFF* tif = TIFFOpen(argv[1], "r");
if (tif) {
read_tiff_image(tif, &page_image);
TIFFClose(tif);
} else
READFAILED.error (argv[0], EXIT, argv[1]);
#else
if (page_image.read_header (argv[1]) < 0)
READFAILED.error (argv[0], EXIT, argv[1]);
if (page_image.read (page_image.get_ysize ()) < 0) {
MEMORY_OUT.error (argv[0], EXIT, "Read of image %s",
argv[1]);
}
#endif
}
pagefile = argv[1];
BLOCK_LIST current_block_list;
tessembedded_read_file(pagefile, &current_block_list);
tprintf ("Done reading files.\n");
PAGE_RES page_res(&current_block_list);
recog_all_words(&page_res, NULL);
current_block_list.clear();
ResetAdaptiveClassifier();
time_t t_end = time(NULL);
double secs = difftime(t_end, t_start);
tprintf ("Done. Number of seconds: %d\n", (int)secs);
return 0; //Normal exit
}
#endif
int initialized = 0;
#ifdef __MSW32__
/**********************************************************************
* WinMain
*
* Main function for a windows program.
**********************************************************************/
int WINAPI WinMain( //main for windows //command line
HINSTANCE hInstance,
HINSTANCE hPrevInstance,
LPSTR lpszCmdLine,
int nCmdShow) {
WNDCLASS wc;
HWND hwnd;
MSG msg;
char **argv;
char *argsin[2];
int argc;
int exit_code;
wc.style = CS_NOCLOSE | CS_OWNDC;
wc.lpfnWndProc = (WNDPROC) WndProc;
wc.cbClsExtra = 0;
wc.cbWndExtra = 0;
wc.hInstance = hInstance;
wc.hIcon = NULL; //LoadIcon (NULL, IDI_APPLICATION);
wc.hCursor = NULL; //LoadCursor (NULL, IDC_ARROW);
wc.hbrBackground = (HBRUSH) (COLOR_WINDOW + 1);
wc.lpszMenuName = NULL;
wc.lpszClassName = szAppName;
RegisterClass(&wc);
hwnd = CreateWindow (szAppName, szAppName,
WS_OVERLAPPEDWINDOW | WS_DISABLED,
CW_USEDEFAULT, CW_USEDEFAULT, CW_USEDEFAULT,
CW_USEDEFAULT, HWND_DESKTOP, NULL, hInstance, NULL);
argsin[0] = strdup (szAppName);
argsin[1] = strdup (lpszCmdLine);
/*allocate memory for the args. There can never be more than half*/
/*the total number of characters in the arguments.*/
argv =
(char **) malloc (((strlen (argsin[0]) + strlen (argsin[1])) / 2 + 1) *
sizeof (char *));
/*now construct argv as it should be for C.*/
argc = parse_args (2, argsin, argv);
// ShowWindow (hwnd, nCmdShow);
// UpdateWindow (hwnd);
if (initialized) {
exit_code = main (argc, argv);
free (argsin[0]);
free (argsin[1]);
free(argv);
return exit_code;
}
while (GetMessage (&msg, NULL, 0, 0)) {
TranslateMessage(&msg);
DispatchMessage(&msg);
if (initialized) {
exit_code = main (argc, argv);
break;
}
else
exit_code = msg.wParam;
}
free (argsin[0]);
free (argsin[1]);
free(argv);
return exit_code;
}
/**********************************************************************
* WndProc
*
* Function to respond to messages.
**********************************************************************/
LONG WINAPI WndProc( //message handler
HWND hwnd, //window with message
UINT msg, //message type
WPARAM wParam,
LPARAM lParam) {
HDC hdc;
if (msg == WM_CREATE) {
//
// Create a rendering context.
//
hdc = GetDC (hwnd);
ReleaseDC(hwnd, hdc);
initialized = 1;
return 0;
}
return DefWindowProc (hwnd, msg, wParam, lParam);
}
/**********************************************************************
* parse_args
*
* Turn a list of args into a new list of args with each separate
* whitespace spaced string being an arg.
**********************************************************************/
int
parse_args ( /*refine arg list */
int argc, /*no of input args */
char *argv[], /*input args */
char *arglist[] /*output args */
) {
int argcount; /*converted argc */
char *testchar; /*char in option string */
int arg; /*current argument */
argcount = 0; /*no of options */
for (arg = 0; arg < argc; arg++) {
testchar = argv[arg]; /*start of arg */
do {
while (*testchar
&& (*testchar == ' ' || *testchar == '\n'
|| *testchar == '\t'))
testchar++; /*skip white space */
if (*testchar) {
/*new arg */
arglist[argcount++] = testchar;
/*skip to white space */
for (testchar++; *testchar && *testchar != ' ' && *testchar != '\n' && *testchar != '\t'; testchar++);
if (*testchar)
*testchar++ = '\0'; /*turn to separate args */
}
}
while (*testchar);
}
return argcount; /*new number of args */
}
#endif
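Note on the graphics-enabled main() above: the usage string "imagename outputbase [configfile [[+|-]varfile]...]" maps onto the Init() call by position. A minimal sketch of that slicing follows. CliLayout and layout_args are hypothetical names used only for illustration, not part of Tesseract, and the sketch reports no trailing variable files as a null pointer rather than passing argv + 2 as the original does.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical record of how main() slices argv; not a Tesseract type.
struct CliLayout {
  const char* image;        // argv[1]
  const char* output_base;  // argv[2]
  const char* config_file;  // argv[3], or nullptr when absent
  int var_count;            // number of trailing [+|-]varfile arguments
  const char* const* vars;  // first trailing argument, or nullptr
};

static CliLayout layout_args(int argc, const char* const* argv) {
  assert(argc >= 3);  // main() reports a usage error below this
  CliLayout out;
  out.image = argv[1];
  out.output_base = argv[2];
  if (argc == 3) {
    // No config file and no variable files.
    out.config_file = nullptr;
    out.var_count = 0;
    out.vars = nullptr;
  } else {
    // argv[3] is the config file; everything after it is a varfile.
    out.config_file = argv[3];
    out.var_count = argc - 4;
    out.vars = (argc > 4) ? argv + 4 : nullptr;
  }
  return out;
}
```

Note that the graphics-disabled main() differs: it requires the config file (argc >= 4) and passes argv[2] rather than argv[1] as the text base.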

58
ccmain/tesseractmain.h Normal file

@ -0,0 +1,58 @@
/**********************************************************************
* File: tesseractmain.h (Formerly tessedit.h)
* Description: Main program for merge of tess and editor.
* Author: Ray Smith
* Created: Tue Jan 07 15:21:46 GMT 1992
*
* (C) Copyright 1992, Hewlett-Packard Ltd.
** Licensed under the Apache License, Version 2.0 (the "License");
** you may not use this file except in compliance with the License.
** You may obtain a copy of the License at
** http://www.apache.org/licenses/LICENSE-2.0
** Unless required by applicable law or agreed to in writing, software
** distributed under the License is distributed on an "AS IS" BASIS,
** WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
** See the License for the specific language governing permissions and
** limitations under the License.
*
**********************************************************************/
#ifndef TESSERACTMAIN_H
#define TESSERACTMAIN_H
#include "varable.h"
#include "tessclas.h"
#include "notdll.h"
#include "tessembedded.h"
extern BOOL_VAR_H (tessedit_read_image, TRUE, "Ensure the image is read");
INT32 api_main( //run from api
const char *arg0, //program name
UINT16 lang //language
);
INT16 setup_info( //setup dummy engine info
UINT16 lang, //user language
const char *name, //of engine
const char *version //of engine
);
INT16 read_image( //read dummy image info
IMAGE *im_out //output image
);
#ifdef __MSW32__
int WINAPI WinMain( //main for windows //command line
HINSTANCE hInstance,
HINSTANCE hPrevInstance,
LPSTR lpszCmdLine,
int nCmdShow);
LONG WINAPI WndProc( //message handler
HWND hwnd, //window with message
UINT msg, //message type
WPARAM wParam,
LPARAM lParam);
int parse_args ( /*refine arg list */
int argc, /*no of input args */
char *argv[], /*input args */
char *arglist[] /*output args */
);
#endif
#endif
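Note on parse_args() declared above: it splits each input string in place, turning every run of non-whitespace into one argument by overwriting the following whitespace character with a terminator. WinMain sizes the output array on the basis that an N-character string can hold at most N/2 + 1 arguments. A minimal single-string sketch of both ideas follows. The names split_in_place, max_arg_count and max_args are hypothetical, and the explicit capacity check is an addition that the original does not have.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Upper bound on the number of arguments in a string: every argument
// except the last needs at least one character plus one separator.
static std::size_t max_arg_count(const char* text) {
  return std::strlen(text) / 2 + 1;
}

static bool is_blank(char c) {
  return c == ' ' || c == '\n' || c == '\t';
}

// Split `text` in place; out[i] points at the start of argument i.
// Returns the number of arguments found (at most max_args).
static int split_in_place(char* text, char* out[], int max_args) {
  assert(text != nullptr && out != nullptr);
  int count = 0;
  char* p = text;
  while (*p != '\0' && count < max_args) {
    while (is_blank(*p)) ++p;              // skip white space
    if (*p == '\0') break;
    out[count++] = p;                      // new arg starts here
    while (*p != '\0' && !is_blank(*p)) ++p;
    if (*p != '\0') *p++ = '\0';           // turn into a separate arg
  }
  return count;
}
```

Because the arguments alias the input buffer, the buffer has to outlive the argument array, which is why WinMain frees argsin[0] and argsin[1] only after main() returns.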

38
ccmain/tessvars.cpp Normal file

@ -0,0 +1,38 @@
/**********************************************************************
* File: tessvars.cpp (Formerly tessvars.c)
* Description: Variables and other globals for tessedit.
* Author: Ray Smith
* Created: Mon Apr 13 13:13:23 BST 1992
*
* (C) Copyright 1992, Hewlett-Packard Ltd.
** Licensed under the Apache License, Version 2.0 (the "License");
** you may not use this file except in compliance with the License.
** You may obtain a copy of the License at
** http://www.apache.org/licenses/LICENSE-2.0
** Unless required by applicable law or agreed to in writing, software
** distributed under the License is distributed on an "AS IS" BASIS,
** WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
** See the License for the specific language governing permissions and
** limitations under the License.
*
**********************************************************************/
#include "mfcpch.h"
#include "tessvars.h"
#define EXTERN
EXTERN INT_VAR (tessedit_adapt_kludge, 0,
"Use acceptable result or dangambigs");
EXTERN BOOL_VAR (interactive_mode, FALSE, "Run interactively?");
EXTERN BOOL_VAR (edit_variables, FALSE, "Variables Editor Window?");
// xiaofan EXTERN STRING_VAR(file_type,".bl","Filename extension");
EXTERN STRING_VAR (file_type, ".tif", "Filename extension");
EXTERN INT_VAR (testedit_match_debug, 0, "Integer match debug ctrl");
EXTERN INT_VAR (tessedit_dangambigs_chop, FALSE,
"Use DangAmbigs to direct chop");
EXTERN INT_VAR (tessedit_dangambigs_assoc, FALSE,
"Use DangAmbigs to direct assoc");
EXTERN IMAGE page_image; //image of page
EXTERN FILE *debug_fp; //write debug stuff here

48
ccmain/tessvars.h Normal file

@ -0,0 +1,48 @@
/**********************************************************************
* File: tessvars.h (Formerly tessvars.h)
* Description: Variables and other globals for tessedit.
* Author: Ray Smith
* Created: Mon Apr 13 13:13:23 BST 1992
*
* (C) Copyright 1992, Hewlett-Packard Ltd.
** Licensed under the Apache License, Version 2.0 (the "License");
** you may not use this file except in compliance with the License.
** You may obtain a copy of the License at
** http://www.apache.org/licenses/LICENSE-2.0
** Unless required by applicable law or agreed to in writing, software
** distributed under the License is distributed on an "AS IS" BASIS,
** WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
** See the License for the specific language governing permissions and
** limitations under the License.
*
**********************************************************************/
#ifndef TESSVARS_H
#define TESSVARS_H
#include "varable.h"
#include "img.h"
#include "tordmain.h"
#include "notdll.h"
extern INT_VAR_H (tessedit_adapt_kludge, 0,
"Use acceptable result or dangambigs");
extern BOOL_VAR_H (interactive_mode, FALSE, "Run interactively?");
extern BOOL_VAR_H (edit_variables, FALSE, "Variables Editor Window?");
//xiaofan extern STRING_VAR_H(file_type,".bl","Filename extension");
extern STRING_VAR_H (file_type, ".tif", "Filename extension");
extern INT_VAR_H (tessedit_truncate_wordchoice_log, 10,
"Max words to keep in list");
extern INT_VAR_H (testedit_match_debug, 0, "Integer match debug ctrl");
extern INT_VAR_H (tessedit_truncate_chopper, 1,
"Shorten chopper seam search");
extern INT_VAR_H (tessedit_fix_sideways_chops, 1,
"Fix sideways chop problem");
extern INT_VAR_H (tessedit_dangambigs_chop, FALSE,
"Use DangAmbigs to direct chop");
extern INT_VAR_H (tessedit_dangambigs_assoc, FALSE,
"Use DangAmbigs to direct assoc");
extern IMAGE page_image; //image of page
extern FILE *debug_fp; //write debug stuff here
#endif

121
ccmain/tfacep.h Normal file

@ -0,0 +1,121 @@
/**********************************************************************
* File: tfacep.h (Formerly tfacep.h)
* Description: Declarations of C functions and C owned data.
* Author: Ray Smith
* Created: Mon Apr 27 12:51:28 BST 1992
*
* (C) Copyright 1992, Hewlett-Packard Ltd.
** Licensed under the Apache License, Version 2.0 (the "License");
** you may not use this file except in compliance with the License.
** You may obtain a copy of the License at
** http://www.apache.org/licenses/LICENSE-2.0
** Unless required by applicable law or agreed to in writing, software
** distributed under the License is distributed on an "AS IS" BASIS,
** WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
** See the License for the specific language governing permissions and
** limitations under the License.
*
**********************************************************************/
#ifndef TFACEP_H
#define TFACEP_H
#include "hosthplb.h"
#include "tessclas.h"
#include "tessarray.h"
#include "tstruct.h"
#include "notdll.h"
#include "choices.h"
#include "oldlist.h"
#include "hyphen.h"
#include "tface.h"
#include "permute.h"
#include "adaptmatch.h"
#include "blobclass.h"
#include "stopper.h"
#include "associate.h"
#include "chop.h"
#include "expandblob.h"
#include "tordvars.h"
#include "metrics.h"
#include "tface.h"
#include "badwords.h"
#include "structures.h"
#define BLOB_MATCHING_ON
typedef void (*TESS_TESTER) (TBLOB *, BOOL8, char *, INT32, LIST);
typedef LIST (*TESS_MATCHER) (TBLOB *, TBLOB *, TBLOB *, void *, TEXTROW *);
extern "C"
{
/*
int start_recog( //Real main in C
int argc,
char *argv[]);
void program_editup2( //after forking part
int argc,
char** argv);
int end_recog( //Real main in C
int argc,
char *argv[]);
void set_interactive_pass();
void set_pass1();
void set_pass2();
//ARRAY cc_recog(TWERD*,TESS_CHOICE*,TESS_CHOICE*,TESS_TESTER,
// TESS_TESTER);*/
//void wo_learn_blob(TBLOB*,TEXTROW*,char*,INT32);
//LIST AdaptiveClassifier(TBLOB*,TBLOB*,TEXTROW*);
//void LearnBlob(TBLOB*,TEXTROW*,char*,INT32);
//TWERD *newword();
//TBLOB *newblob();
//TESSLINE *newoutline();
//EDGEPT *newedgept();
//void oldedgept(EDGEPT*);
//void destroy_nodes(void*,void (*)(void*));
//TESS_LIST *append_choice(TESS_LIST*,char*,double,double,char);
//void fix_quotes (char*);
//void record_certainty(double,int);
//int AcceptableResult(A_CHOICE*,A_CHOICE*);
//int AdaptableWord(TWERD*,const char*,const char*);
//void delete_word(TWERD*);
//void free_blob(TBLOB*);
//void add_document_word(A_CHOICE*);
//void AdaptToWord(TWERD*,TEXTROW*,const char*,const char*,const char*);
//void SaveBadWord(const char*,double);
//void free_choice(TESS_CHOICE*);
//TWERD *newword();
//TBLOB *newblob();
//void free_blob( //free a blob
// TBLOB *blob); //blob to free
//int dict_word( const char* );
//extern int tess_cn_matching;
//extern int tess_bn_matching;
//extern int last_word_on_line;
extern TEXTROW normalized_row;
//extern TESS_MATCHER blob_matchers[];
//extern FILE *rawfile;
//extern FILE *textfile;
//extern int character_count;
//extern int word_count;
//extern int enable_assoc;
//extern int chop_enable;
//extern int permute_only_top;
extern int display_ratings;
};
#if 0
#define strsave(s) \
((s) ? \
((char*) strcpy ((char*)alloc_string (strlen(s)+1), s)) : \
(NULL))
#endif
#define BOLD_ON "&dB(s3B"
#define BOLD_OFF "&d@(s0B"
#define UNDERLINE_ON "&dD"
#define UNDERLINE_OFF "&d@"
#endif

411
ccmain/tfacepp.cpp Normal file

@ -0,0 +1,411 @@
/**********************************************************************
* File: tfacepp.cpp (Formerly tface++.c)
* Description: C++ side of the C/C++ Tess/Editor interface.
* Author: Ray Smith
* Created: Thu Apr 23 15:39:23 BST 1992
*
* (C) Copyright 1992, Hewlett-Packard Ltd.
** Licensed under the Apache License, Version 2.0 (the "License");
** you may not use this file except in compliance with the License.
** You may obtain a copy of the License at
** http://www.apache.org/licenses/LICENSE-2.0
** Unless required by applicable law or agreed to in writing, software
** distributed under the License is distributed on an "AS IS" BASIS,
** WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
** See the License for the specific language governing permissions and
** limitations under the License.
*
**********************************************************************/
#include "mfcpch.h"
#ifdef __UNIX__
#include <assert.h>
#endif
#include "errcode.h"
#include "tessarray.h"
//#include "fxtop.h"
#include "werd.h"
#include "tfacep.h"
#include "tstruct.h"
#include "tfacepp.h"
#include "tessvars.h"
#include "reject.h"
#define EXTERN
EXTERN BOOL_VAR (tessedit_override_permuter, TRUE, "According to dict_word");
static POLY_MATCHER tess_matcher;//current matcher
static POLY_TESTER tess_tester; //current tester
static POLY_TESTER tess_trainer; //current trainer
static DENORM *tess_denorm; //current denorm
static WERD *tess_word; //current word
#define MAX_UNDIVIDED_LENGTH 24
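/**********************************************************************
* recog_word
*
* Recognize one word. An empty word gets empty choices; anything else is
* passed to recog_word_recursive. The result is checked so that the
* string, the output blobs and the blob choice lists have equal lengths,
* and the permuter type is optionally overridden by a dictionary check.
**********************************************************************/
WERD_CHOICE *recog_word( //recog one word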
WERD *word, //word to do
DENORM *denorm, //de-normaliser
POLY_MATCHER matcher, //matcher function
POLY_TESTER tester, //tester function
POLY_TESTER trainer, //trainer function
BOOL8 testing, //true if answer driven
WERD_CHOICE *&raw_choice, //raw result
BLOB_CHOICE_LIST_CLIST *blob_choices, //list of blob lists
WERD *&outword //bln word output
) {
WERD_CHOICE *word_choice;
UINT8 perm_type;
UINT8 real_dict_perm_type;
if (word->blob_list ()->empty ()) {
word_choice = new WERD_CHOICE ("", 10.0f, -1.0f, TOP_CHOICE_PERM);
raw_choice = new WERD_CHOICE ("", 10.0f, -1.0f, TOP_CHOICE_PERM);
outword = word->poly_copy (denorm->row ()->x_height ());
}
else
word_choice = recog_word_recursive (word, denorm, matcher, tester,
trainer, testing, raw_choice,
blob_choices, outword);
if ((word_choice->string ().length () !=
outword->blob_list ()->length ()) ||
(word_choice->string ().length () != blob_choices->length ())) {
tprintf
("recog_word ASSERT FAIL String:\"%s\"; Strlen=%d; #Blobs=%d; #Choices=%d\n",
word_choice->string ().string (), word_choice->string ().length (),
outword->blob_list ()->length (), blob_choices->length ());
}
ASSERT_HOST (word_choice->string ().length () ==
outword->blob_list ()->length ());
ASSERT_HOST (word_choice->string ().length () == blob_choices->length ());
/* Copy any reject blobs into the outword */
outword->rej_blob_list ()->deep_copy (word->rej_blob_list ());
if (tessedit_override_permuter) {
/* Override the permuter type if a straight dictionary check disagrees. */
perm_type = word_choice->permuter ();
if ((perm_type != SYSTEM_DAWG_PERM) &&
(perm_type != FREQ_DAWG_PERM) && (perm_type != USER_DAWG_PERM)) {
real_dict_perm_type = dict_word (word_choice->string ().string ());
if (((real_dict_perm_type == SYSTEM_DAWG_PERM) ||
(real_dict_perm_type == FREQ_DAWG_PERM) ||
(real_dict_perm_type == USER_DAWG_PERM)) &&
(alpha_count (word_choice->string ().string ()) > 0))
word_choice->set_permuter (real_dict_perm_type);
//Use dict perm
}
if (tessedit_rejection_debug && perm_type != word_choice->permuter ()) {
tprintf ("Permuter Type Flipped from %d to %d\n",
perm_type, word_choice->permuter ());
}
}
assert ((word_choice == NULL) == (raw_choice == NULL));
return word_choice;
}
/**********************************************************************
* recog_word_recursive
*
* Convert the word to tess form and pass it to the tess segmenter.
* Convert the output back to editor form.
* Words with more than MAX_UNDIVIDED_LENGTH blobs are handed to
* split_and_recog_word instead.
**********************************************************************/
WERD_CHOICE *recog_word_recursive( //recog one word
WERD *word, //word to do
DENORM *denorm, //de-normaliser
POLY_MATCHER matcher, //matcher function
POLY_TESTER tester, //tester function
POLY_TESTER trainer, //trainer function
BOOL8 testing, //true if answer driven
WERD_CHOICE *&raw_choice, //raw result
BLOB_CHOICE_LIST_CLIST *blob_choices, //list of blob lists
WERD *&outword //bln word output
) {
INT32 initial_blob_choice_len;
INT32 word_length; //no of blobs
STRING word_string; //converted from tess
ARRAY tess_ratings; //tess results
A_CHOICE tess_choice; //best word
A_CHOICE tess_raw; //raw result
TWERD *tessword; //tess format
BLOB_CHOICE_LIST *choice_list; //fake list
//iterator
BLOB_CHOICE_LIST_C_IT choice_it;
tess_matcher = matcher; //install matcher
tess_tester = testing ? tester : NULL;
tess_trainer = testing ? trainer : NULL;
tess_denorm = denorm;
tess_word = word;
// blob_matchers[1]=call_matcher;
if (word->blob_list ()->length () > MAX_UNDIVIDED_LENGTH) {
return split_and_recog_word (word, denorm, matcher, tester, trainer,
testing, raw_choice, blob_choices,
outword);
}
else {
if (word->flag (W_EOL))
last_word_on_line = TRUE;
else
last_word_on_line = FALSE;
initial_blob_choice_len = blob_choices->length ();
tessword = make_tess_word (word, NULL);
tess_ratings = cc_recog (tessword, &tess_choice, &tess_raw,
testing
&& tester != NULL /* ? call_tester : NULL */ ,
testing
&& trainer !=
NULL /* ? call_train_tester : NULL */ );
//convert word
outword = make_ed_word (tessword, word);
if (outword == NULL) {
outword = word->poly_copy (denorm->row ()->x_height ());
}
delete_word(tessword); //get rid of it
//no of blobs
word_length = outword->blob_list ()->length ();
//convert all ratings
convert_choice_lists(tess_ratings, blob_choices);
//copy string
word_string = tess_raw.string;
while (word_string.length () < word_length)
word_string += " "; //pad with blanks
raw_choice = new WERD_CHOICE (word_string.string (),
tess_raw.rating, tess_raw.certainty,
tess_raw.permuter);
word_string = tess_choice.string;
if (word_string.length () > word_length) {
tprintf ("recog_word: Discarded long string \"%s\"\n",
word_string.string ());
word_string = NULL; //should never happen
}
if (blob_choices->length () - initial_blob_choice_len != word_length) {
word_string = NULL; //force rejection
tprintf ("recog_word: Choices list len:%d; blob lists len:%d\n",
blob_choices->length (), word_length);
//list of lists
choice_it.set_to_list (blob_choices);
while (blob_choices->length () - initial_blob_choice_len <
word_length) {
//get fake one
choice_list = new BLOB_CHOICE_LIST;
//add to list
choice_it.add_to_end (choice_list);
tprintf ("recog_word: Added dummy choice list\n");
}
while (blob_choices->length () - initial_blob_choice_len >
word_length) {
choice_it.move_to_last ();
//should never happen
delete choice_it.extract ();
tprintf ("recog_word: Deleted choice list\n");
}
}
while (word_string.length () < word_length)
word_string += " "; //pad with blanks
assert (raw_choice != NULL);
if (tess_choice.string)
strfree(tess_choice.string);
if (tess_raw.string)
strfree(tess_raw.string);
return new WERD_CHOICE (word_string.string (),
tess_choice.rating, tess_choice.certainty,
tess_choice.permuter);
}
}
/**********************************************************************
* split_and_recog_word
*
* Split the word at the widest gap between adjacent blobs, recognize
* each half with recog_word_recursive, and join the two results.
**********************************************************************/
WERD_CHOICE *split_and_recog_word( //recog one word
WERD *word, //word to do
DENORM *denorm, //de-normaliser
POLY_MATCHER matcher, //matcher function
POLY_TESTER tester, //tester function
POLY_TESTER trainer, //trainer function
BOOL8 testing, //true if answer driven
WERD_CHOICE *&raw_choice, //raw result
BLOB_CHOICE_LIST_CLIST *blob_choices, //list of blob lists
WERD *&outword //bln word output
) {
// INT32 outword1_len;
// INT32 outword2_len;
WERD *first_word; //poly copy of word
WERD *second_word; //fabricated word
WERD *outword2; //2nd output word
PBLOB *blob;
WERD_CHOICE *result; //return value
WERD_CHOICE *result2; //output of 2nd word
WERD_CHOICE *raw_choice2; //raw version of 2nd
float gap; //blob gap
float bestgap; //biggest gap
PBLOB_LIST new_blobs; //list of gathered blobs
PBLOB_IT blob_it;
//iterator
PBLOB_IT new_blob_it = &new_blobs;
first_word = word->poly_copy (denorm->row ()->x_height ());
blob_it.set_to_list (first_word->blob_list ());
bestgap = -MAX_INT32;
while (!blob_it.at_last ()) {
blob = blob_it.data ();
//gap to next
gap = blob_it.data_relative (1)->bounding_box ().left () - blob->bounding_box ().right ();
blob_it.forward ();
if (gap > bestgap) {
bestgap = gap; //find biggest
new_blob_it = blob_it; //save position
}
}
//take 2nd half
new_blobs.assign_to_sublist (&new_blob_it, &blob_it);
//make it a word
second_word = new WERD (&new_blobs, 1, NULL);
ASSERT_HOST (word->blob_list ()->length () ==
first_word->blob_list ()->length () +
second_word->blob_list ()->length ());
result = recog_word_recursive (first_word, denorm, matcher,
tester, trainer, testing, raw_choice,
blob_choices, outword);
delete first_word; //done that one
result2 = recog_word_recursive (second_word, denorm, matcher,
tester, trainer, testing, raw_choice2,
blob_choices, outword2);
delete second_word; //done that too
*result += *result2; //combine ratings
delete result2;
*raw_choice += *raw_choice2;
delete raw_choice2; //finished with it
// outword1_len= outword->blob_list()->length();
// outword2_len= outword2->blob_list()->length();
outword->join_on (outword2); //join words
delete outword2;
// if ( outword->blob_list()->length() != outword1_len + outword2_len )
// tprintf( "Split&Recog: part1len=%d; part2len=%d; combinedlen=%d\n",
// outword1_len, outword2_len, outword->blob_list()->length() );
// ASSERT_HOST( outword->blob_list()->length() == outword1_len + outword2_len );
return result;
}
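
The split step of split_and_recog_word can be looked at in isolation. The sketch below is not Tesseract code: `Box` and `widest_gap_split` are hypothetical names, and a plain `std::vector` stands in for the PBLOB_LIST and its iterators. It only illustrates how the strict `gap > bestgap` comparison picks the first blob of the second half.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for a blob's bounding box; only the horizontal
// extent matters for this sketch.
struct Box {
  int left;
  int right;
};

// Returns the index of the first box of the second half, i.e. the box
// that follows the widest horizontal gap between neighbours.
// Assumes at least two boxes, ordered left to right.
std::size_t widest_gap_split(const std::vector<Box>& boxes) {
  assert(boxes.size() >= 2);
  std::size_t split = 1;
  int best_gap = boxes[1].left - boxes[0].right;
  for (std::size_t i = 2; i < boxes.size(); ++i) {
    const int gap = boxes[i].left - boxes[i - 1].right;
    if (gap > best_gap) {  // strict test: the first widest gap wins ties
      best_gap = gap;
      split = i;
    }
  }
  return split;
}
```

For boxes spanning 0-10, 12-20, 30-40 and 41-50 the gaps are 2, 10 and 1, so the second half would start at index 2.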
/**********************************************************************
* call_matcher
*
* Called from Tess with a blob in tess form.
* Convert the blob to editor form.
* Call the matcher setup by the segmenter in tess_matcher.
* Convert the output choices back to tess form.
**********************************************************************/
LIST call_matcher( //call a matcher
TBLOB *ptblob, //previous
TBLOB *tessblob, //blob to match
TBLOB *ntblob, //next
void *, //unused parameter
TEXTROW * //always null anyway
) {
PBLOB *pblob; //converted previous blob
PBLOB *blob; //converted blob
PBLOB *nblob; //converted next blob
LIST result; //tess output
BLOB_CHOICE *choice; //current choice
char string[2]; //char converted
BLOB_CHOICE_LIST ratings; //matcher result
BLOB_CHOICE_IT it; //iterator
blob = make_ed_blob (tessblob);//convert blob
if (blob == NULL)
return NULL; //can't do it
pblob = ptblob != NULL ? make_ed_blob (ptblob) : NULL;
nblob = ntblob != NULL ? make_ed_blob (ntblob) : NULL;
(*tess_matcher) (pblob, blob, nblob, tess_word, tess_denorm, ratings);
//match it
delete blob; //don't need that now
if (pblob != NULL)
delete pblob;
if (nblob != NULL)
delete nblob;
it.set_to_list (&ratings); //get list
result = NULL;
string[1] = '\0';
for (it.mark_cycle_pt (); !it.cycled_list (); it.forward ()) {
choice = it.data ();
string[0] = choice->char_class ();
result = append_choice (result, string,
choice->rating (), choice->certainty (),
choice->config ());
}
return result; //converted list
}
/**********************************************************************
* call_tester
*
* Called from Tess with a blob in tess form.
* Convert the blob to editor form.
* Call the tester setup by the segmenter in tess_tester.
**********************************************************************/
void call_tester( //call a tester
TBLOB *tessblob, //blob to test
BOOL8 correct_blob, //true if good
char *text, //source text
INT32 count, //chars in text
LIST result //output of matcher
) {
PBLOB *blob; //converted blob
BLOB_CHOICE_LIST ratings; //matcher result
blob = make_ed_blob (tessblob);//convert blob
if (blob == NULL)
return;
//make it right type
convert_choice_list(result, ratings);
if (tess_tester != NULL)
(*tess_tester) (blob, tess_denorm, correct_blob, text, count, &ratings);
delete blob; //don't need that now
}
/**********************************************************************
* call_train_tester
*
* Called from Tess with a blob in tess form.
* Convert the blob to editor form.
* Call the trainer setup by the segmenter in tess_trainer.
**********************************************************************/
void call_train_tester( //call a trainer
TBLOB *tessblob, //blob to test
BOOL8 correct_blob, //true if good
char *text, //source text
INT32 count, //chars in text
LIST result //output of matcher
) {
PBLOB *blob; //converted blob
BLOB_CHOICE_LIST ratings; //matcher result
blob = make_ed_blob (tessblob);//convert blob
if (blob == NULL)
return;
//make it right type
convert_choice_list(result, ratings);
if (tess_trainer != NULL)
(*tess_trainer) (blob, tess_denorm, correct_blob, text, count, &ratings);
delete blob; //don't need that now
}

85
ccmain/tfacepp.h Normal file

@ -0,0 +1,85 @@
/**********************************************************************
* File: tfacepp.h (Formerly tface++.h)
* Description: C++ side of the C/C++ Tess/Editor interface.
* Author: Ray Smith
* Created: Thu Apr 23 15:39:23 BST 1992
*
* (C) Copyright 1992, Hewlett-Packard Ltd.
** Licensed under the Apache License, Version 2.0 (the "License");
** you may not use this file except in compliance with the License.
** You may obtain a copy of the License at
** http://www.apache.org/licenses/LICENSE-2.0
** Unless required by applicable law or agreed to in writing, software
** distributed under the License is distributed on an "AS IS" BASIS,
** WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
** See the License for the specific language governing permissions and
** limitations under the License.
*
**********************************************************************/
#ifndef TFACEPP_H
#define TFACEPP_H
#include "varable.h"
#include "tstruct.h"
#include "ratngs.h"
#include "tessclas.h"
#include "notdll.h"
extern BOOL_VAR_H (tessedit_override_permuter, TRUE,
"According to dict_word");
WERD_CHOICE *recog_word( //recog one word
WERD *word, //word to do
DENORM *denorm, //de-normaliser
POLY_MATCHER matcher, //matcher function
POLY_TESTER tester, //tester function
POLY_TESTER trainer, //trainer function
BOOL8 testing, //true if answer driven
WERD_CHOICE *&raw_choice, //raw result
BLOB_CHOICE_LIST_CLIST *blob_choices, //list of blob lists
WERD *&outword //bln word output
);
//recog one word
WERD_CHOICE *recog_word_recursive(WERD *word, //word to do
DENORM *denorm, //de-normaliser
POLY_MATCHER matcher, //matcher function
POLY_TESTER tester, //tester function
POLY_TESTER trainer, //trainer function
BOOL8 testing, //true if answer driven
WERD_CHOICE *&raw_choice, //raw result
BLOB_CHOICE_LIST_CLIST *blob_choices, //list of blob lists
WERD *&outword //bln word output
);
//recog one word
WERD_CHOICE *split_and_recog_word(WERD *word, //word to do
DENORM *denorm, //de-normaliser
POLY_MATCHER matcher, //matcher function
POLY_TESTER tester, //tester function
POLY_TESTER trainer, //trainer function
BOOL8 testing, //true if answer driven
WERD_CHOICE *&raw_choice, //raw result
BLOB_CHOICE_LIST_CLIST *blob_choices, //list of blob lists
WERD *&outword //bln word output
);
LIST call_matcher( //call a matcher
TBLOB *ptblob, //previous
TBLOB *tessblob, //blob to match
TBLOB *ntblob, //next
void *, //unused parameter
TEXTROW * //always null anyway
);
void call_tester( //call a tester
TBLOB *tessblob, //blob to test
BOOL8 correct_blob, //true if good
char *text, //source text
INT32 count, //chars in text
LIST result //output of matcher
);
void call_train_tester( //call a trainer
TBLOB *tessblob, //blob to test
BOOL8 correct_blob, //true if good
char *text, //source text
INT32 count, //chars in text
LIST result //output of matcher
);
#endif

511
ccmain/tstruct.cpp Normal file

@ -0,0 +1,511 @@
/**********************************************************************
* File: tstruct.cpp (Formerly tstruct.c)
* Description: Code to manipulate the structures of the C++/C interface.
* Author: Ray Smith
* Created: Thu Apr 23 15:49:29 BST 1992
*
* (C) Copyright 1992, Hewlett-Packard Ltd.
** Licensed under the Apache License, Version 2.0 (the "License");
** you may not use this file except in compliance with the License.
** You may obtain a copy of the License at
** http://www.apache.org/licenses/LICENSE-2.0
** Unless required by applicable law or agreed to in writing, software
** distributed under the License is distributed on an "AS IS" BASIS,
** WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
** See the License for the specific language governing permissions and
** limitations under the License.
*
**********************************************************************/
#include "mfcpch.h"
#include "tfacep.h"
#include "tstruct.h"
//#include "structures.h"
static ERRCODE BADFRAGMENTS = "Couldn't find matching fragment ends";
ELISTIZE (FRAGMENT)
//extern /*"C"*/ oldoutline(TESSLINE*);
/**********************************************************************
* FRAGMENT::FRAGMENT
*
* Constructor for fragments.
**********************************************************************/
FRAGMENT::FRAGMENT ( //constructor
EDGEPT * head_pt, //start point
EDGEPT * tail_pt //end point
):head (head_pt->pos.x, head_pt->pos.y), tail (tail_pt->pos.x,
tail_pt->pos.y) {
headpt = head_pt; //save ptrs
tailpt = tail_pt;
}
/**********************************************************************
* make_ed_word
*
* Make an editor format word from the tess style word.
**********************************************************************/
WERD *make_ed_word( //construct word
TWERD *tessword, //word to convert
WERD *clone //clone this one
) {
WERD *word; //converted word
TBLOB *tblob; //current blob
PBLOB *blob; //new blob
PBLOB_LIST blobs; //list of blobs
PBLOB_IT blob_it = &blobs; //iterator
for (tblob = tessword->blobs; tblob != NULL; tblob = tblob->next) {
blob = make_ed_blob (tblob);
if (blob != NULL)
blob_it.add_after_then_move (blob);
}
if (!blobs.empty ())
word = new WERD (&blobs, clone);
else
word = NULL;
return word;
}
/**********************************************************************
* make_ed_blob
*
* Make an editor format blob from the tess style blob.
**********************************************************************/
PBLOB *make_ed_blob( //construct blob
TBLOB *tessblob //blob to convert
) {
TESSLINE *tessol; //tess outline
FRAGMENT_LIST fragments; //list of fragments
OUTLINE *outline; //current outline
OUTLINE_LIST out_list; //list of outlines
OUTLINE_IT out_it = &out_list; //iterator
for (tessol = tessblob->outlines; tessol != NULL; tessol = tessol->next) {
//stick in list
register_outline(tessol, &fragments);
}
while (!fragments.empty ()) {
outline = make_ed_outline (&fragments);
if (outline != NULL)
out_it.add_after_then_move (outline);
}
if (out_it.empty())
return NULL; //couldn't do it
return new PBLOB (&out_list); //turn to blob
}
/**********************************************************************
* make_ed_outline
*
* Make an editor format outline from the list of fragments.
**********************************************************************/
OUTLINE *make_ed_outline( //construct outline
FRAGMENT_LIST *list //list of fragments
) {
FRAGMENT *fragment; //current fragment
EDGEPT *edgept; //current point
ICOORD headpos; //coords of head
ICOORD tailpos; //coords of tail
FCOORD pos; //coords of edgept
FCOORD vec; //vector of edgept
POLYPT *polypt; //current point
POLYPT_LIST poly_list; //list of points
POLYPT_IT poly_it = &poly_list;//iterator
FRAGMENT_IT fragment_it = list;//fragment
headpos = fragment_it.data ()->head;
do {
fragment = fragment_it.data ();
edgept = fragment->headpt; //start of segment
do {
pos = FCOORD (edgept->pos.x, edgept->pos.y);
vec = FCOORD (edgept->vec.x, edgept->vec.y);
polypt = new POLYPT (pos, vec);
//add to list
poly_it.add_after_then_move (polypt);
edgept = edgept->next;
}
while (edgept != fragment->tailpt);
tailpos = ICOORD (edgept->pos.x, edgept->pos.y);
//get rid of it
delete fragment_it.extract ();
if (tailpos != headpos) {
if (fragment_it.empty ()) {
// tprintf("Bad tailpos (%d,%d), Head=(%d,%d), no fragments.\n",
// fragment->head.x(),fragment->head.y(),
// headpos.x(),headpos.y());
return NULL;
}
fragment_it.forward ();
//find next segment
for (fragment_it.mark_cycle_pt (); !fragment_it.cycled_list () && fragment_it.data ()->head != tailpos;
fragment_it.forward ());
if (fragment_it.data ()->head != tailpos) {
// tprintf("Bad tailpos (%d,%d), Fragments are:\n",
// tailpos.x(),tailpos.y());
for (fragment_it.mark_cycle_pt ();
!fragment_it.cycled_list (); fragment_it.forward ()) {
fragment = fragment_it.extract ();
// tprintf("Head=(%d,%d), tail=(%d,%d)\n",
// fragment->head.x(),fragment->head.y(),
// fragment->tail.x(),fragment->tail.y());
delete fragment;
}
return NULL; //can't do it
// BADFRAGMENTS.error("make_ed_blob",ABORT,NULL);
}
}
}
while (tailpos != headpos);
return new OUTLINE (&poly_it); //turn to outline
}
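
The chaining performed by make_ed_outline can be sketched on its own. This is a simplified illustration under stated assumptions, not the original algorithm: `Point`, `Fragment` and `chain_fragments` are hypothetical, standard containers replace the ELIST types, and the search for the next fragment always starts from the front of the pool rather than from the current iterator position. As in the original, a chain that cannot be closed is abandoned and the remaining fragments are discarded.

```cpp
#include <cassert>
#include <list>
#include <utility>
#include <vector>

using Point = std::pair<int, int>;

// Hypothetical fragment: a run of outline points known only by its ends.
struct Fragment {
  Point head;
  Point tail;
};

// Builds one closed chain starting from the first fragment in the pool.
// Each step looks for a fragment whose head matches the current tail.
// Returns true with the chain filled in when the loop closes; otherwise
// empties both the chain and the pool and returns false.
bool chain_fragments(std::list<Fragment>& pool, std::vector<Fragment>& chain) {
  chain.clear();
  if (pool.empty())
    return false;
  Fragment current = pool.front();
  pool.pop_front();
  const Point start = current.head;
  chain.push_back(current);
  while (current.tail != start) {
    auto next = pool.begin();
    while (next != pool.end() && next->head != current.tail)
      ++next;
    if (next == pool.end()) {  // no fragment continues the chain
      pool.clear();
      chain.clear();
      return false;
    }
    current = *next;
    pool.erase(next);
    chain.push_back(current);
  }
  assert(chain.back().tail == start);  // the loop is closed
  return true;
}
```

Fragments that are not needed to close the first loop stay in the pool, which is why the caller in the original keeps calling until the fragment list is empty.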
/**********************************************************************
* register_outline
*
* Add the fragments in the given outline to the list
**********************************************************************/
void register_outline( //add fragments
TESSLINE *outline, //tess format
FRAGMENT_LIST *list //list to add to
) {
EDGEPT *startpt; //start of outline
EDGEPT *headpt; //start of fragment
EDGEPT *tailpt; //end of fragment
FRAGMENT *fragment; //new fragment
FRAGMENT_IT it = list; //iterator
startpt = outline->loop;
do {
startpt = startpt->next;
if (startpt == NULL)
return; //illegal!
}
while (startpt->flags[0] == 0 && startpt != outline->loop);
headpt = startpt;
do
startpt = startpt->next;
while (startpt->flags[0] != 0 && startpt != headpt);
if (startpt->flags[0] != 0)
return; //all hidden!
headpt = startpt;
do {
tailpt = headpt;
do
tailpt = tailpt->next;
while (tailpt->flags[0] == 0 && tailpt != startpt);
fragment = new FRAGMENT (headpt, tailpt);
it.add_after_then_move (fragment);
while (tailpt->flags[0] != 0)
tailpt = tailpt->next;
headpt = tailpt;
}
while (tailpt != startpt);
}
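
The way register_outline cuts a closed loop into visible runs can be shown with indices instead of EDGEPT pointers. The sketch below is an assumption-laden simplification: `visible_runs` is a hypothetical name, `hidden[i]` plays the role of a non-zero `flags[0]`, and when nothing is hidden the single run is reported as starting at index 0 (the original starts one point further round the loop). Each run is a pair of the first visible index and the index where the run ends, which is the first hidden point after it.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Splits a circular sequence of points into runs of visible points.
// Returns (head, tail) index pairs; tail is the first hidden point after
// the run, or head itself when the whole loop is visible.
std::vector<std::pair<std::size_t, std::size_t>> visible_runs(
    const std::vector<bool>& hidden) {
  std::vector<std::pair<std::size_t, std::size_t>> runs;
  const std::size_t n = hidden.size();
  if (n == 0)
    return runs;
  // Look for a visible point that directly follows a hidden one.
  std::size_t start = n;  // n means "not found"
  for (std::size_t i = 0; i < n; ++i) {
    if (!hidden[i] && hidden[(i + n - 1) % n]) {
      start = i;
      break;
    }
  }
  if (start == n) {
    if (!hidden[0])
      runs.push_back({0, 0});  // nothing hidden: one run closing on itself
    return runs;               // otherwise everything is hidden: no runs
  }
  assert(!hidden[start]);
  std::size_t head = start;
  do {
    std::size_t tail = head;
    do {
      tail = (tail + 1) % n;  // walk to the end of the visible run
    } while (!hidden[tail] && tail != start);
    runs.push_back({head, tail});
    while (hidden[tail])
      tail = (tail + 1) % n;  // skip the hidden stretch
    head = tail;
  } while (head != start);
  return runs;
}
```

With the flags visible, visible, hidden, visible, hidden, hidden this would give two runs, (0, 2) and (3, 4).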
/**********************************************************************
* convert_choice_lists
*
* Convert the ARRAY of TESS_LIST of TESS_CHOICEs into a BLOB_CHOICE_LIST_CLIST
* (one BLOB_CHOICE_LIST per array element).
**********************************************************************/
void convert_choice_lists( //convert lists
ARRAY tessarray, //list from tess
BLOB_CHOICE_LIST_CLIST *ratings //list of results
) {
INT32 length; //elements in array
INT32 index; //index to array
LIST result; //tess output
//iterator
BLOB_CHOICE_LIST_C_IT it = ratings;
BLOB_CHOICE_LIST *choice; //created choice
if (tessarray != NULL) {
length = array_count (tessarray);
for (index = 0; index < length; index++) {
result = (LIST) array_value (tessarray, index);
//make one
choice = new BLOB_CHOICE_LIST;
//convert blob choices
convert_choice_list(result, *choice);
//add to super list
it.add_after_then_move (choice);
}
free_mem(tessarray); //lists already freed
}
}
/**********************************************************************
* convert_choice_list
*
* Convert the LIST of TESS_CHOICEs into a BLOB_CHOICE_LIST.
**********************************************************************/
void convert_choice_list( //convert lists
LIST list, //list from tess
BLOB_CHOICE_LIST &ratings //list of results
) {
LIST result; //tess output
BLOB_CHOICE_IT it = &ratings; //iterator
BLOB_CHOICE *choice; //created choice
A_CHOICE *tesschoice; //choice to convert
for (result = list; result != NULL; result = result->next) {
//traverse list
tesschoice = (A_CHOICE *) result->node;
//make one
choice = new BLOB_CHOICE (tesschoice->string[0], tesschoice->rating, tesschoice->certainty, tesschoice->config);
it.add_after_then_move (choice);
}
destroy_nodes (list, (void (*)(void *)) free_choice);
//get rid of it
}
/**********************************************************************
* make_tess_row
*
* Make a fake row structure to pass to the tesseract matchers.
**********************************************************************/
void make_tess_row( //make fake row
DENORM *denorm, //row info
TEXTROW *tessrow //output row
) {
tessrow->baseline.segments = 1;
tessrow->baseline.xstarts[0] = -32767;
tessrow->baseline.xstarts[1] = 32767;
tessrow->baseline.quads[0].a = 0;
tessrow->baseline.quads[0].b = 0;
tessrow->baseline.quads[0].c = bln_baseline_offset;
tessrow->xheight.segments = 1;
tessrow->xheight.xstarts[0] = -32767;
tessrow->xheight.xstarts[1] = 32767;
tessrow->xheight.quads[0].a = 0;
tessrow->xheight.quads[0].b = 0;
tessrow->xheight.quads[0].c = bln_x_height + bln_baseline_offset;
tessrow->lineheight = bln_x_height;
tessrow->ascrise = denorm->row ()->ascenders () * denorm->scale ();
tessrow->descdrop = denorm->row ()->descenders () * denorm->scale ();
}
/**********************************************************************
* make_tess_word
*
* Convert the word to Tess format.
**********************************************************************/
TWERD *make_tess_word( //convert word
WERD *word, //word to do
TEXTROW *row //fake row
) {
TWERD *tessword; //tess format
tessword = newword (); //use old allocator
tessword->row = row; //give them something
//copy string
tessword->correct = strsave (word->text ());
tessword->guess = NULL;
tessword->blobs = make_tess_blobs (word->blob_list ());
tessword->blanks = 1;
tessword->blobcount = word->blob_list ()->length ();
tessword->next = NULL;
return tessword;
}
/**********************************************************************
* make_tess_blobs
*
* Make Tess style blobs from a list of BLOBs.
**********************************************************************/
TBLOB *make_tess_blobs( //make tess blobs
PBLOB_LIST *bloblist //list to convert
) {
PBLOB_IT it = bloblist; //iterator
PBLOB *blob; //current blob
TBLOB *head; //output list
TBLOB *tail; //end of list
TBLOB *tessblob;
head = NULL;
tail = NULL;
for (it.mark_cycle_pt (); !it.cycled_list (); it.forward ()) {
blob = it.data ();
tessblob = make_tess_blob (blob, TRUE);
if (head)
tail->next = tessblob;
else
head = tessblob;
tail = tessblob;
}
return head;
}
/**********************************************************************
* make_tess_blob
*
* Make a single Tess style blob
**********************************************************************/
TBLOB *make_tess_blob( //make tess blob
PBLOB *blob, //blob to convert
BOOL8 flatten //flatten outline structure
) {
INT32 index;
TBLOB *tessblob;
tessblob = newblob ();
tessblob->outlines = (struct olinestruct *)
make_tess_outlines (blob->out_list (), flatten);
for (index = 0; index < TBLOBFLAGS; index++)
tessblob->flags[index] = 0; //!!
tessblob->correct = 0;
tessblob->guess = 0;
for (index = 0; index < MAX_WO_CLASSES; index++) {
tessblob->classes[index] = 0;
tessblob->values[index] = 0;
}
tessblob->next = NULL;
return tessblob;
}
/**********************************************************************
* make_tess_outlines
*
* Make Tess style outlines from a list of OUTLINEs.
**********************************************************************/
TESSLINE *make_tess_outlines( //make tess outlines
OUTLINE_LIST *outlinelist, //list to convert
BOOL8 flatten //flatten outline structure
) {
OUTLINE_IT it = outlinelist; //iterator
OUTLINE *outline; //current outline
TESSLINE *head; //output list
TESSLINE *tail; //end of list
TESSLINE *tessoutline;
head = NULL;
tail = NULL;
for (it.mark_cycle_pt (); !it.cycled_list (); it.forward ()) {
outline = it.data ();
tessoutline = newoutline ();
tessoutline->compactloop = NULL;
tessoutline->loop = make_tess_edgepts (outline->polypts (),
tessoutline->topleft,
tessoutline->botright);
if (tessoutline->loop == NULL) {
oldoutline(tessoutline);
continue;
}
tessoutline->start = tessoutline->loop->pos;
tessoutline->node = NULL;
tessoutline->next = NULL;
tessoutline->child = NULL;
if (!outline->child ()->empty ()) {
if (flatten)
tessoutline->next = (struct olinestruct *)
make_tess_outlines (outline->child (), flatten);
else {
tessoutline->next = NULL;
tessoutline->child = (struct olinestruct *)
make_tess_outlines (outline->child (), flatten);
}
}
else
tessoutline->next = NULL;
if (head)
tail->next = tessoutline;
else
head = tessoutline;
while (tessoutline->next != NULL)
tessoutline = tessoutline->next;
tail = tessoutline;
}
return head;
}
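Reviewer sketch, not part of the commit: the `flatten` flag above decides whether child outlines are spliced into the parent's `next` chain or hung off `child`. The stand-alone example below mirrors that logic with hypothetical `Outline` and `FlatNode` types (neither is a Tesseract type) and assumes a C++17 compiler.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical stand-ins for OUTLINE and TESSLINE; not the Tesseract types.
struct Outline {
  std::string name;
  std::vector<Outline> children;
};

struct FlatNode {
  std::string name;
  FlatNode *next;
  FlatNode *child;
};

// Converts a list of outlines. With flatten set, children follow their
// parent in the next chain; otherwise they are attached through child.
FlatNode *convert(const std::vector<Outline> &outlines, bool flatten) {
  FlatNode *head = nullptr;
  FlatNode *tail = nullptr;
  for (const Outline &outline : outlines) {
    FlatNode *node = new FlatNode{outline.name, nullptr, nullptr};
    if (!outline.children.empty()) {
      FlatNode *kids = convert(outline.children, flatten);
      if (flatten)
        node->next = kids;
      else
        node->child = kids;
    }
    if (head)
      tail->next = node;
    else
      head = node;
    while (node->next != nullptr)  // skip past any spliced-in children
      node = node->next;
    tail = node;
    assert(tail->next == nullptr);
  }
  return head;
}

// Joins the names found along the next chain, each followed by a space.
std::string names(const FlatNode *node) {
  std::string out;
  for (; node != nullptr; node = node->next)
    out += node->name + " ";
  return out;
}

// Releases a converted list, including any child lists.
void free_nodes(FlatNode *node) {
  while (node != nullptr) {
    FlatNode *next = node->next;
    free_nodes(node->child);
    delete node;
    node = next;
  }
}
```

For an outline `A` with one child `A1`, followed by an outline `B`, flattening walks `A`, `A1`, `B` along `next`, while the unflattened form walks `A`, `B` and keeps `A1` under `A`'s `child` pointer.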
/**********************************************************************
* make_tess_edgepts
*
* Make Tess style edgepts from a list of POLYPTs.
**********************************************************************/
EDGEPT *make_tess_edgepts( //make tess edgepts
POLYPT_LIST *edgeptlist, //list to convert
TPOINT &tl, //bounding box
TPOINT &br) {
INT32 index;
POLYPT_IT it = edgeptlist; //iterator
POLYPT *edgept; //current edgept
EDGEPT *head; //output list
EDGEPT *tail; //end of list
EDGEPT *tessedgept;
head = NULL;
tail = NULL;
tl.x = MAX_INT16;
tl.y = -MAX_INT16;
br.x = -MAX_INT16;
br.y = MAX_INT16;
for (it.mark_cycle_pt (); !it.cycled_list ();) {
edgept = it.data ();
tessedgept = newedgept ();
tessedgept->pos.x = (INT16) edgept->pos.x ();
tessedgept->pos.y = (INT16) edgept->pos.y ();
if (tessedgept->pos.x < tl.x)
tl.x = tessedgept->pos.x;
if (tessedgept->pos.x > br.x)
br.x = tessedgept->pos.x;
if (tessedgept->pos.y > tl.y)
tl.y = tessedgept->pos.y;
if (tessedgept->pos.y < br.y)
br.y = tessedgept->pos.y;
if (head != NULL && tessedgept->pos.x == tail->pos.x
&& tessedgept->pos.y == tail->pos.y) {
oldedgept(tessedgept);
}
else {
for (index = 0; index < EDGEPTFLAGS; index++)
tessedgept->flags[index] = 0;
if (head != NULL) {
tail->vec.x = tessedgept->pos.x - tail->pos.x;
tail->vec.y = tessedgept->pos.y - tail->pos.y;
tessedgept->prev = tail;
}
tessedgept->next = head;
if (head)
tail->next = tessedgept;
else
head = tessedgept;
tail = tessedgept;
}
it.forward ();
}
if (head == NULL)
return NULL; //empty list: nothing to close
head->prev = tail;
tail->vec.x = head->pos.x - tail->pos.x;
tail->vec.y = head->pos.y - tail->pos.y;
if (head == tail) {
oldedgept(head);
return NULL; //single point: degenerate loop
}
return head;
}
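Reviewer sketch, not part of the commit: make_tess_edgepts builds a circular doubly linked loop, drops points that repeat their predecessor, and tracks a bounding box with y growing upwards. The stand-alone example below illustrates the same idea under stated assumptions: `Point`, `Node`, `Box`, `make_loop` and `free_loop` are hypothetical names, and the sketch returns null for empty input as well as for a single distinct point.

```cpp
#include <cassert>
#include <vector>

// Hypothetical stand-ins for TPOINT and EDGEPT; not the Tesseract types.
struct Point { int x; int y; };

struct Node {
  Point pos;
  Point vec;   // offset from this node to the next one
  Node *next;
  Node *prev;
};

struct Box { int left; int top; int right; int bottom; };  // y grows upwards

// Frees every node of a circular loop.
void free_loop(Node *head) {
  if (head == nullptr) return;
  Node *node = head->next;
  while (node != head) {
    Node *after = node->next;
    delete node;
    node = after;
  }
  delete head;
}

// Builds a circular doubly linked loop from pts, skipping points equal to
// their predecessor. Returns nullptr when fewer than two distinct points
// remain; box is left untouched when pts is empty.
Node *make_loop(const std::vector<Point> &pts, Box &box) {
  Node *head = nullptr;
  Node *tail = nullptr;
  bool first = true;
  for (const Point &p : pts) {
    if (first) {
      box = {p.x, p.y, p.x, p.y};
      first = false;
    } else {
      if (p.x < box.left) box.left = p.x;
      if (p.x > box.right) box.right = p.x;
      if (p.y > box.top) box.top = p.y;
      if (p.y < box.bottom) box.bottom = p.y;
    }
    if (tail != nullptr && p.x == tail->pos.x && p.y == tail->pos.y)
      continue;  // duplicate of the previous point
    Node *node = new Node{p, {0, 0}, nullptr, tail};
    if (tail != nullptr) {
      tail->vec = {p.x - tail->pos.x, p.y - tail->pos.y};
      tail->next = node;
    } else {
      head = node;
    }
    tail = node;
  }
  if (head == nullptr) return nullptr;  // empty input
  if (head == tail) {                   // one distinct point only
    delete head;
    return nullptr;
  }
  tail->next = head;                    // close the loop
  head->prev = tail;
  tail->vec = {head->pos.x - tail->pos.x, head->pos.y - tail->pos.y};
  assert(head->prev == tail && tail->next == head);
  return head;
}
```

Checking for a null head before closing the loop means an empty point list is reported as "no outline" rather than dereferencing a null pointer.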

108
ccmain/tstruct.h Normal file

@ -0,0 +1,108 @@
/**********************************************************************
* File: tstruct.h (Formerly tstruct.h)
* Description: Code to manipulate the structures of the C++/C interface.
* Author: Ray Smith
* Created: Thu Apr 23 15:49:29 BST 1992
*
* (C) Copyright 1992, Hewlett-Packard Ltd.
** Licensed under the Apache License, Version 2.0 (the "License");
** you may not use this file except in compliance with the License.
** You may obtain a copy of the License at
** http://www.apache.org/licenses/LICENSE-2.0
** Unless required by applicable law or agreed to in writing, software
** distributed under the License is distributed on an "AS IS" BASIS,
** WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
** See the License for the specific language governing permissions and
** limitations under the License.
*
**********************************************************************/
#ifndef TSTRUCT_H
#define TSTRUCT_H
#include "tessarray.h"
#include "werd.h"
#include "tessclas.h"
#include "ratngs.h"
#include "notdll.h"
#include "oldlist.h"
/*
struct TESS_LIST
{
TESS_LIST *node; //data
TESS_LIST *next; //next in list
};
struct TESS_CHOICE
{
float rating; //scaled
float certainty; //absolute
char permuter; //which permuter code
INT8 config; //which config
char* string; //really can!
};
*/
class FRAGMENT:public ELIST_LINK
{
public:
FRAGMENT() { //constructor
}
FRAGMENT(EDGEPT *head_pt, //start
EDGEPT *tail_pt); //end
ICOORD head; //coords of start
ICOORD tail; //coords of end
EDGEPT *headpt; //start point
EDGEPT *tailpt; //end point
NEWDELETE2 (FRAGMENT)
};
ELISTIZEH (FRAGMENT)
WERD *make_ed_word( //construct word
TWERD *tessword, //word to convert
WERD *clone //clone this one
);
PBLOB *make_ed_blob( //construct blob
TBLOB *tessblob //blob to convert
);
OUTLINE *make_ed_outline( //construct outline
FRAGMENT_LIST *list //list of fragments
);
void register_outline( //add fragments
TESSLINE *outline, //tess format
FRAGMENT_LIST *list //list to add to
);
void convert_choice_lists( //convert lists
ARRAY tessarray, //list from tess
BLOB_CHOICE_LIST_CLIST *ratings //list of results
);
void convert_choice_list( //convert lists
LIST list, //list from tess
BLOB_CHOICE_LIST &ratings //list of results
);
void make_tess_row( //make fake row
DENORM *denorm, //row info
TEXTROW *tessrow //output row
);
TWERD *make_tess_word( //convert word
WERD *word, //word to do
TEXTROW *row //fake row
);
TBLOB *make_tess_blobs( //make tess blobs
PBLOB_LIST *bloblist //list to convert
);
TBLOB *make_tess_blob( //make tess blob
PBLOB *blob, //blob to convert
BOOL8 flatten //flatten outline structure
);
TESSLINE *make_tess_outlines( //make tess outlines
OUTLINE_LIST *outlinelist, //list to convert
BOOL8 flatten //flatten outline structure
);
EDGEPT *make_tess_edgepts( //make tess edgepts
POLYPT_LIST *edgeptlist, //list to convert
TPOINT &tl, //bounding box
TPOINT &br);
#endif

193
ccmain/werdit.cpp Normal file

@ -0,0 +1,193 @@
/**********************************************************************
* File: werdit.cpp (Formerly wordit.c)
* Description: An iterator for passing over all the words in a document.
* Author: Ray Smith
* Created: Mon Apr 27 08:51:22 BST 1992
*
* (C) Copyright 1992, Hewlett-Packard Ltd.
** Licensed under the Apache License, Version 2.0 (the "License");
** you may not use this file except in compliance with the License.
** You may obtain a copy of the License at
** http://www.apache.org/licenses/LICENSE-2.0
** Unless required by applicable law or agreed to in writing, software
** distributed under the License is distributed on an "AS IS" BASIS,
** WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
** See the License for the specific language governing permissions and
** limitations under the License.
*
**********************************************************************/
#include "mfcpch.h"
#include "werdit.h"
#define EXTERN
//EXTERN BOOL_VAR(wordit_linearc,FALSE,"Pass poly of linearc to Tess");
/**********************************************************************
* WERDIT::start_page
*
* Get ready to iterate over the page by setting the iterators.
**********************************************************************/
void WERDIT::start_page( //set iterators
BLOCK_LIST *block_list //blocks to check
) {
block_it.set_to_list (block_list);
block_it.mark_cycle_pt ();
do {
while (block_it.data ()->row_list ()->empty ()
&& !block_it.cycled_list ()) {
block_it.forward ();
}
if (!block_it.data ()->row_list ()->empty ()) {
row_it.set_to_list (block_it.data ()->row_list ());
row_it.mark_cycle_pt ();
while (row_it.data ()->word_list ()->empty ()
&& !row_it.cycled_list ()) {
row_it.forward ();
}
if (!row_it.data ()->word_list ()->empty ()) {
word_it.set_to_list (row_it.data ()->word_list ());
word_it.mark_cycle_pt ();
}
}
}
while (!block_it.cycled_list () && row_it.data ()->word_list ()->empty ());
}
/**********************************************************************
* WERDIT::forward
*
* Give the next word on the page, or NULL if none left.
* This code assumes all rows to be non-empty, but blocks are allowed
* to be empty as eventually we will have non-text blocks.
* The output is always a copy and needs to be deleted by somebody.
**********************************************************************/
WERD *WERDIT::forward() { //use iterators
WERD *word; //actual word
// WERD *larc_word; //linearc copy
WERD *result; //output word
ROW *row; //row of word
if (word_it.cycled_list ()) {
return NULL; //finished page
}
else {
word = word_it.data ();
row = row_it.data ();
word_it.forward ();
if (word_it.cycled_list ()) {
row_it.forward (); //finished row
if (row_it.cycled_list ()) {
do {
block_it.forward (); //finished block
if (!block_it.cycled_list ()) {
row_it.set_to_list (block_it.data ()->row_list ());
row_it.mark_cycle_pt ();
}
}
//find non-empty block
while (!block_it.cycled_list ()
&& row_it.cycled_list ());
}
if (!row_it.cycled_list ()) {
word_it.set_to_list (row_it.data ()->word_list ());
word_it.mark_cycle_pt ();
}
}
// if (wordit_linearc && !word->flag(W_POLYGON))
// {
// larc_word=word->larc_copy(row->x_height());
// result=larc_word->poly_copy(row->x_height());
// delete larc_word;
// }
// else
result = word->poly_copy (row->x_height ());
return result;
}
}
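Reviewer sketch, not part of the commit: WERDIT flattens a block / row / word hierarchy into a single word stream and has to step over empty containers on the way. The stand-alone example below shows one way to express that with index cursors over nested `std::vector`s. `Row`, `Block`, `Page` and `WordCursor` are hypothetical names, and unlike WERDIT::forward the sketch hands back a pointer into the page instead of a copy the caller must delete.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical stand-ins for the ROW, BLOCK and BLOCK_LIST containers.
using Row = std::vector<std::string>;
using Block = std::vector<Row>;
using Page = std::vector<Block>;

class WordCursor {
 public:
  // The page must outlive the cursor; only a reference is stored.
  explicit WordCursor(const Page &page) : page_(page) { settle(); }

  // Returns the next word on the page, or nullptr when none are left.
  const std::string *next() {
    if (block_ >= page_.size()) return nullptr;  // finished page
    assert(word_ < page_[block_][row_].size());
    const std::string *word = &page_[block_][row_][word_];
    ++word_;
    settle();
    return word;
  }

 private:
  // Advances the indices until they address a real word or the page is
  // exhausted, skipping empty rows and empty blocks.
  void settle() {
    while (block_ < page_.size()) {
      const Block &block = page_[block_];
      if (row_ >= block.size()) {          // finished block
        ++block_;
        row_ = 0;
        word_ = 0;
      } else if (word_ >= block[row_].size()) {  // finished row
        ++row_;
        word_ = 0;
      } else {
        return;                            // positioned on a word
      }
    }
  }

  const Page &page_;
  std::size_t block_ = 0;
  std::size_t row_ = 0;
  std::size_t word_ = 0;
};
```

Keeping all of the skipping in one `settle()` helper means construction and advancing share the same empty-container handling.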
/**********************************************************************
* make_pseudo_word
*
* Make all the blobs inside a selection into a single word.
* The word is always a copy and needs to be deleted.
**********************************************************************/
WERD *make_pseudo_word( //make fake word
BLOCK_LIST *block_list, //blocks to check
BOX &selection_box,
BLOCK *&pseudo_block, //block of selection
ROW *&pseudo_row //row of selection
) {
BLOCK_IT block_it(block_list);
BLOCK *block;
ROW_IT row_it;
ROW *row;
WERD_IT word_it;
WERD *word;
PBLOB_IT blob_it;
PBLOB *blob;
PBLOB_LIST new_blobs; //list of gathered blobs
//iterator
PBLOB_IT new_blob_it = &new_blobs;
WERD *pseudo_word; //fabricated word
WERD *poly_word; //poly copy of word
// WERD *larc_word; //linearc copy
for (block_it.mark_cycle_pt ();
!block_it.cycled_list (); block_it.forward ()) {
block = block_it.data ();
if (block->bounding_box ().overlap (selection_box)) {
pseudo_block = block;
row_it.set_to_list (block->row_list ());
for (row_it.mark_cycle_pt ();
!row_it.cycled_list (); row_it.forward ()) {
row = row_it.data ();
if (row->bounding_box ().overlap (selection_box)) {
word_it.set_to_list (row->word_list ());
for (word_it.mark_cycle_pt ();
!word_it.cycled_list (); word_it.forward ()) {
word = word_it.data ();
if (word->bounding_box ().overlap (selection_box)) {
// if (wordit_linearc && !word->flag(W_POLYGON))
// {
// larc_word=word->larc_copy(row->x_height());
// poly_word=larc_word->poly_copy(row->x_height());
// delete larc_word;
// }
// else
poly_word = word->poly_copy (row->x_height ());
blob_it.set_to_list (poly_word->blob_list ());
for (blob_it.mark_cycle_pt ();
!blob_it.cycled_list (); blob_it.forward ()) {
blob = blob_it.data ();
if (blob->bounding_box ().
overlap (selection_box)) {
new_blob_it.add_after_then_move (blob_it.
extract
());
//steal off list
pseudo_row = row;
}
}
delete poly_word; //get rid of it
}
}
}
}
}
}
if (!new_blobs.empty ()) {
//make new word
pseudo_word = new WERD (&new_blobs, 1, NULL);
}
else
pseudo_word = NULL;
return pseudo_word;
}

67
ccmain/werdit.h Normal file

@ -0,0 +1,67 @@
/**********************************************************************
* File: werdit.h
* Description: An iterator for passing over all the words in a document.
* Author: Ray Smith
* Created: Mon Apr 27 08:51:22 BST 1992
*
* (C) Copyright 1992, Hewlett-Packard Ltd.
** Licensed under the Apache License, Version 2.0 (the "License");
** you may not use this file except in compliance with the License.
** You may obtain a copy of the License at
** http://www.apache.org/licenses/LICENSE-2.0
** Unless required by applicable law or agreed to in writing, software
** distributed under the License is distributed on an "AS IS" BASIS,
** WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
** See the License for the specific language governing permissions and
** limitations under the License.
*
**********************************************************************/
#ifndef WERDIT_H
#define WERDIT_H
#include "varable.h"
#include "ocrblock.h"
#include "notdll.h"
class WERDIT
{
public:
WERDIT() {
} //empty constructor
WERDIT( //constructor
BLOCK_LIST *blocklist) { //blocks on page
start_page(blocklist); //ready to scan
}
void start_page( //get ready
BLOCK_LIST *blocklist); //blocks on page
WERD *forward(); //get next word
WERD *next_word() { //get next word
return word_it.data (); //already at next
}
ROW *row() { //get current row
return word_it.cycled_list ()? NULL : row_it.data ();
}
ROW *next_row() { //get next row
return row_it.data_relative (1);
}
BLOCK *block() { //get current block
return block_it.data ();
}
private:
BLOCK_IT block_it; //iterators
ROW_IT row_it;
WERD_IT word_it;
};
//extern BOOL_VAR_H(wordit_linearc,FALSE,"Pass poly of linearc to Tess");
WERD *make_pseudo_word( //make fake word
BLOCK_LIST *block_list, //blocks to check
BOX &selection_box,
BLOCK *&pseudo_block, //block of selection
ROW *&pseudo_row //row of selection
);
#endif

25
ccstruct/Makefile.am Normal file

@ -0,0 +1,25 @@
SUBDIRS =
AM_CPPFLAGS = \
-I$(top_srcdir)/ccutil -I$(top_srcdir)/cutil \
-I$(top_srcdir)/image -I$(top_srcdir)/viewer
EXTRA_DIST = \
blckerr.h blobbox.h blobs.h blread.h coutln.h crakedge.h \
genblob.h hpddef.h hpdsizes.h ipoints.h labls.h linlsq.h \
lmedsq.h mod128.h normalis.h ocrblock.h ocrrow.h pageblk.h \
pageres.h pdblock.h pdclass.h points.h polyaprx.h polyblk.h \
polyblob.h polyvert.h poutline.h quadlsq.h quadratc.h \
quspline.h ratngs.h rect.h rejctmap.h rwpoly.h statistc.h \
stepblob.h txtregn.h vecfuncs.h werd.h
noinst_LIBRARIES = libtesseract_ccstruct.a
libtesseract_ccstruct_a_SOURCES = \
blobbox.cpp blobs.cpp blread.cpp callcpp.cpp \
coutln.cpp genblob.cpp labls.cpp linlsq.cpp \
lmedsq.cpp mod128.cpp normalis.cpp ocrblock.cpp \
ocrrow.cpp pageblk.cpp pageres.cpp pdblock.cpp \
points.cpp polyaprx.cpp polyblk.cpp polyblob.cpp \
polyvert.cpp poutline.cpp quadlsq.cpp quadratc.cpp \
quspline.cpp ratngs.cpp rect.cpp rejctmap.cpp \
rwpoly.cpp statistc.cpp stepblob.cpp txtregn.cpp \
vecfuncs.cpp werd.cpp

587
ccstruct/Makefile.in Normal file

@ -0,0 +1,587 @@
# Makefile.in generated by automake 1.9.6 from Makefile.am.
# @configure_input@
# Copyright (C) 1994, 1995, 1996, 1997, 1998, 1999, 2000, 2001, 2002,
# 2003, 2004, 2005 Free Software Foundation, Inc.
# This Makefile.in is free software; the Free Software Foundation
# gives unlimited permission to copy and/or distribute it,
# with or without modifications, as long as this notice is preserved.
# This program is distributed in the hope that it will be useful,
# but WITHOUT ANY WARRANTY, to the extent permitted by law; without
# even the implied warranty of MERCHANTABILITY or FITNESS FOR A
# PARTICULAR PURPOSE.
@SET_MAKE@
srcdir = @srcdir@
top_srcdir = @top_srcdir@
VPATH = @srcdir@
pkgdatadir = $(datadir)/@PACKAGE@
pkglibdir = $(libdir)/@PACKAGE@
pkgincludedir = $(includedir)/@PACKAGE@
top_builddir = ..
am__cd = CDPATH="$${ZSH_VERSION+.}$(PATH_SEPARATOR)" && cd
INSTALL = @INSTALL@
install_sh_DATA = $(install_sh) -c -m 644
install_sh_PROGRAM = $(install_sh) -c
install_sh_SCRIPT = $(install_sh) -c
INSTALL_HEADER = $(INSTALL_DATA)
transform = $(program_transform_name)
NORMAL_INSTALL = :
PRE_INSTALL = :
POST_INSTALL = :
NORMAL_UNINSTALL = :
PRE_UNINSTALL = :
POST_UNINSTALL = :
build_triplet = @build@
host_triplet = @host@
subdir = ccstruct
DIST_COMMON = $(srcdir)/Makefile.am $(srcdir)/Makefile.in
ACLOCAL_M4 = $(top_srcdir)/aclocal.m4
am__aclocal_m4_deps = $(top_srcdir)/acinclude.m4 \
$(top_srcdir)/config/ac_define_versionlevel.m4 \
$(top_srcdir)/config/acinclude_custom.m4 \
$(top_srcdir)/configure.ac
am__configure_deps = $(am__aclocal_m4_deps) $(CONFIGURE_DEPENDENCIES) \
$(ACLOCAL_M4)
mkinstalldirs = $(SHELL) $(top_srcdir)/config/mkinstalldirs
CONFIG_HEADER = $(top_builddir)/config_auto.h
CONFIG_CLEAN_FILES =
LIBRARIES = $(noinst_LIBRARIES)
AR = ar
ARFLAGS = cru
libtesseract_ccstruct_a_AR = $(AR) $(ARFLAGS)
libtesseract_ccstruct_a_LIBADD =
am_libtesseract_ccstruct_a_OBJECTS = blobbox.$(OBJEXT) blobs.$(OBJEXT) \
blread.$(OBJEXT) callcpp.$(OBJEXT) coutln.$(OBJEXT) \
genblob.$(OBJEXT) labls.$(OBJEXT) linlsq.$(OBJEXT) \
lmedsq.$(OBJEXT) mod128.$(OBJEXT) normalis.$(OBJEXT) \
ocrblock.$(OBJEXT) ocrrow.$(OBJEXT) pageblk.$(OBJEXT) \
pageres.$(OBJEXT) pdblock.$(OBJEXT) points.$(OBJEXT) \
polyaprx.$(OBJEXT) polyblk.$(OBJEXT) polyblob.$(OBJEXT) \
polyvert.$(OBJEXT) poutline.$(OBJEXT) quadlsq.$(OBJEXT) \
quadratc.$(OBJEXT) quspline.$(OBJEXT) ratngs.$(OBJEXT) \
rect.$(OBJEXT) rejctmap.$(OBJEXT) rwpoly.$(OBJEXT) \
statistc.$(OBJEXT) stepblob.$(OBJEXT) txtregn.$(OBJEXT) \
vecfuncs.$(OBJEXT) werd.$(OBJEXT)
libtesseract_ccstruct_a_OBJECTS = \
$(am_libtesseract_ccstruct_a_OBJECTS)
DEFAULT_INCLUDES = -I. -I$(srcdir) -I$(top_builddir)
depcomp = $(SHELL) $(top_srcdir)/config/depcomp
am__depfiles_maybe = depfiles
CXXCOMPILE = $(CXX) $(DEFS) $(DEFAULT_INCLUDES) $(INCLUDES) \
$(AM_CPPFLAGS) $(CPPFLAGS) $(AM_CXXFLAGS) $(CXXFLAGS)
CXXLD = $(CXX)
CXXLINK = $(CXXLD) $(AM_CXXFLAGS) $(CXXFLAGS) $(AM_LDFLAGS) $(LDFLAGS) \
-o $@
SOURCES = $(libtesseract_ccstruct_a_SOURCES)
DIST_SOURCES = $(libtesseract_ccstruct_a_SOURCES)
RECURSIVE_TARGETS = all-recursive check-recursive dvi-recursive \
html-recursive info-recursive install-data-recursive \
install-exec-recursive install-info-recursive \
install-recursive installcheck-recursive installdirs-recursive \
pdf-recursive ps-recursive uninstall-info-recursive \
uninstall-recursive
ETAGS = etags
CTAGS = ctags
DIST_SUBDIRS = $(SUBDIRS)
DISTFILES = $(DIST_COMMON) $(DIST_SOURCES) $(TEXINFOS) $(EXTRA_DIST)
ACLOCAL = @ACLOCAL@
AMDEP_FALSE = @AMDEP_FALSE@
AMDEP_TRUE = @AMDEP_TRUE@
AMTAR = @AMTAR@
AUTOCONF = @AUTOCONF@
AUTOHEADER = @AUTOHEADER@
AUTOMAKE = @AUTOMAKE@
AWK = @AWK@
CC = @CC@
CCDEPMODE = @CCDEPMODE@
CFLAGS = @CFLAGS@
CPPFLAGS = @CPPFLAGS@
CXX = @CXX@
CXXCPP = @CXXCPP@
CXXDEPMODE = @CXXDEPMODE@
CXXFLAGS = @CXXFLAGS@
CXXRPOFLAGS = @CXXRPOFLAGS@
CYGPATH_W = @CYGPATH_W@
DEFS = @DEFS@
DEPDIR = @DEPDIR@
ECHO_C = @ECHO_C@
ECHO_N = @ECHO_N@
ECHO_T = @ECHO_T@
EGREP = @EGREP@
EXEEXT = @EXEEXT@
GNUWIN32_DIR = @GNUWIN32_DIR@
HAVE_GNUWIN32_FALSE = @HAVE_GNUWIN32_FALSE@
HAVE_GNUWIN32_TRUE = @HAVE_GNUWIN32_TRUE@
HAVE_LIBTIFF_FALSE = @HAVE_LIBTIFF_FALSE@
HAVE_LIBTIFF_TRUE = @HAVE_LIBTIFF_TRUE@
INSTALL_DATA = @INSTALL_DATA@
INSTALL_PROGRAM = @INSTALL_PROGRAM@
INSTALL_SCRIPT = @INSTALL_SCRIPT@
INSTALL_STRIP_PROGRAM = @INSTALL_STRIP_PROGRAM@
LDFLAGS = @LDFLAGS@
LIBOBJS = @LIBOBJS@
LIBS = @LIBS@
LIBTIFF_CFLAGS = @LIBTIFF_CFLAGS@
LIBTIFF_LIBS = @LIBTIFF_LIBS@
LTLIBOBJS = @LTLIBOBJS@
MAINT = @MAINT@
MAINTAINER_MODE_FALSE = @MAINTAINER_MODE_FALSE@
MAINTAINER_MODE_TRUE = @MAINTAINER_MODE_TRUE@
MAKEINFO = @MAKEINFO@
OBJEXT = @OBJEXT@
OPTS = @OPTS@
PACKAGE = @PACKAGE@
PACKAGE_BUGREPORT = @PACKAGE_BUGREPORT@
PACKAGE_DATE = @PACKAGE_DATE@
PACKAGE_NAME = @PACKAGE_NAME@
PACKAGE_STRING = @PACKAGE_STRING@
PACKAGE_TARNAME = @PACKAGE_TARNAME@
PACKAGE_VERSION = @PACKAGE_VERSION@
PACKAGE_YEAR = @PACKAGE_YEAR@
PATH_SEPARATOR = @PATH_SEPARATOR@
RANLIB = @RANLIB@
RPO_NO = @RPO_NO@
RPO_YES = @RPO_YES@
SET_MAKE = @SET_MAKE@
SHELL = @SHELL@
STRIP = @STRIP@
USING_CL_FALSE = @USING_CL_FALSE@
USING_CL_TRUE = @USING_CL_TRUE@
VERSION = @VERSION@
ac_ct_CC = @ac_ct_CC@
ac_ct_CXX = @ac_ct_CXX@
ac_ct_RANLIB = @ac_ct_RANLIB@
ac_ct_STRIP = @ac_ct_STRIP@
am__fastdepCC_FALSE = @am__fastdepCC_FALSE@
am__fastdepCC_TRUE = @am__fastdepCC_TRUE@
am__fastdepCXX_FALSE = @am__fastdepCXX_FALSE@
am__fastdepCXX_TRUE = @am__fastdepCXX_TRUE@
am__include = @am__include@
am__leading_dot = @am__leading_dot@
am__quote = @am__quote@
am__tar = @am__tar@
am__untar = @am__untar@
bindir = @bindir@
build = @build@
build_alias = @build_alias@
build_cpu = @build_cpu@
build_os = @build_os@
build_vendor = @build_vendor@
datadir = @datadir@
exec_prefix = @exec_prefix@
host = @host@
host_alias = @host_alias@
host_cpu = @host_cpu@
host_os = @host_os@
host_vendor = @host_vendor@
includedir = @includedir@
infodir = @infodir@
install_sh = @install_sh@
libdir = @libdir@
libexecdir = @libexecdir@
localstatedir = @localstatedir@
mandir = @mandir@
mkdir_p = @mkdir_p@
oldincludedir = @oldincludedir@
prefix = @prefix@
program_transform_name = @program_transform_name@
sbindir = @sbindir@
sharedstatedir = @sharedstatedir@
sysconfdir = @sysconfdir@
target_alias = @target_alias@
SUBDIRS =
AM_CPPFLAGS = \
-I$(top_srcdir)/ccutil -I$(top_srcdir)/cutil \
-I$(top_srcdir)/image -I$(top_srcdir)/viewer
EXTRA_DIST = \
blckerr.h blobbox.h blobs.h blread.h coutln.h crakedge.h \
genblob.h hpddef.h hpdsizes.h ipoints.h labls.h linlsq.h \
lmedsq.h mod128.h normalis.h ocrblock.h ocrrow.h pageblk.h \
pageres.h pdblock.h pdclass.h points.h polyaprx.h polyblk.h \
polyblob.h polyvert.h poutline.h quadlsq.h quadratc.h \
quspline.h ratngs.h rect.h rejctmap.h rwpoly.h statistc.h \
stepblob.h txtregn.h vecfuncs.h werd.h
noinst_LIBRARIES = libtesseract_ccstruct.a
libtesseract_ccstruct_a_SOURCES = \
blobbox.cpp blobs.cpp blread.cpp callcpp.cpp \
coutln.cpp genblob.cpp labls.cpp linlsq.cpp \
lmedsq.cpp mod128.cpp normalis.cpp ocrblock.cpp \
ocrrow.cpp pageblk.cpp pageres.cpp pdblock.cpp \
points.cpp polyaprx.cpp polyblk.cpp polyblob.cpp \
polyvert.cpp poutline.cpp quadlsq.cpp quadratc.cpp \
quspline.cpp ratngs.cpp rect.cpp rejctmap.cpp \
rwpoly.cpp statistc.cpp stepblob.cpp txtregn.cpp \
vecfuncs.cpp werd.cpp
all: all-recursive
.SUFFIXES:
.SUFFIXES: .cpp .o .obj
$(srcdir)/Makefile.in: @MAINTAINER_MODE_TRUE@ $(srcdir)/Makefile.am $(am__configure_deps)
@for dep in $?; do \
case '$(am__configure_deps)' in \
*$$dep*) \
cd $(top_builddir) && $(MAKE) $(AM_MAKEFLAGS) am--refresh \
&& exit 0; \
exit 1;; \
esac; \
done; \
echo ' cd $(top_srcdir) && $(AUTOMAKE) --gnu ccstruct/Makefile'; \
cd $(top_srcdir) && \
$(AUTOMAKE) --gnu ccstruct/Makefile
.PRECIOUS: Makefile
Makefile: $(srcdir)/Makefile.in $(top_builddir)/config.status
@case '$?' in \
*config.status*) \
cd $(top_builddir) && $(MAKE) $(AM_MAKEFLAGS) am--refresh;; \
*) \
echo ' cd $(top_builddir) && $(SHELL) ./config.status $(subdir)/$@ $(am__depfiles_maybe)'; \
cd $(top_builddir) && $(SHELL) ./config.status $(subdir)/$@ $(am__depfiles_maybe);; \
esac;
$(top_builddir)/config.status: $(top_srcdir)/configure $(CONFIG_STATUS_DEPENDENCIES)
cd $(top_builddir) && $(MAKE) $(AM_MAKEFLAGS) am--refresh
$(top_srcdir)/configure: @MAINTAINER_MODE_TRUE@ $(am__configure_deps)
cd $(top_builddir) && $(MAKE) $(AM_MAKEFLAGS) am--refresh
$(ACLOCAL_M4): @MAINTAINER_MODE_TRUE@ $(am__aclocal_m4_deps)
cd $(top_builddir) && $(MAKE) $(AM_MAKEFLAGS) am--refresh
clean-noinstLIBRARIES:
-test -z "$(noinst_LIBRARIES)" || rm -f $(noinst_LIBRARIES)
libtesseract_ccstruct.a: $(libtesseract_ccstruct_a_OBJECTS) $(libtesseract_ccstruct_a_DEPENDENCIES)
-rm -f libtesseract_ccstruct.a
$(libtesseract_ccstruct_a_AR) libtesseract_ccstruct.a $(libtesseract_ccstruct_a_OBJECTS) $(libtesseract_ccstruct_a_LIBADD)
$(RANLIB) libtesseract_ccstruct.a
mostlyclean-compile:
-rm -f *.$(OBJEXT)
distclean-compile:
-rm -f *.tab.c
@AMDEP_TRUE@@am__include@ @am__quote@./$(DEPDIR)/blobbox.Po@am__quote@
@AMDEP_TRUE@@am__include@ @am__quote@./$(DEPDIR)/blobs.Po@am__quote@
@AMDEP_TRUE@@am__include@ @am__quote@./$(DEPDIR)/blread.Po@am__quote@
@AMDEP_TRUE@@am__include@ @am__quote@./$(DEPDIR)/callcpp.Po@am__quote@
@AMDEP_TRUE@@am__include@ @am__quote@./$(DEPDIR)/coutln.Po@am__quote@
@AMDEP_TRUE@@am__include@ @am__quote@./$(DEPDIR)/genblob.Po@am__quote@
@AMDEP_TRUE@@am__include@ @am__quote@./$(DEPDIR)/labls.Po@am__quote@
@AMDEP_TRUE@@am__include@ @am__quote@./$(DEPDIR)/linlsq.Po@am__quote@
@AMDEP_TRUE@@am__include@ @am__quote@./$(DEPDIR)/lmedsq.Po@am__quote@
@AMDEP_TRUE@@am__include@ @am__quote@./$(DEPDIR)/mod128.Po@am__quote@
@AMDEP_TRUE@@am__include@ @am__quote@./$(DEPDIR)/normalis.Po@am__quote@
@AMDEP_TRUE@@am__include@ @am__quote@./$(DEPDIR)/ocrblock.Po@am__quote@
@AMDEP_TRUE@@am__include@ @am__quote@./$(DEPDIR)/ocrrow.Po@am__quote@
@AMDEP_TRUE@@am__include@ @am__quote@./$(DEPDIR)/pageblk.Po@am__quote@
@AMDEP_TRUE@@am__include@ @am__quote@./$(DEPDIR)/pageres.Po@am__quote@
@AMDEP_TRUE@@am__include@ @am__quote@./$(DEPDIR)/pdblock.Po@am__quote@
@AMDEP_TRUE@@am__include@ @am__quote@./$(DEPDIR)/points.Po@am__quote@
@AMDEP_TRUE@@am__include@ @am__quote@./$(DEPDIR)/polyaprx.Po@am__quote@
@AMDEP_TRUE@@am__include@ @am__quote@./$(DEPDIR)/polyblk.Po@am__quote@
@AMDEP_TRUE@@am__include@ @am__quote@./$(DEPDIR)/polyblob.Po@am__quote@
@AMDEP_TRUE@@am__include@ @am__quote@./$(DEPDIR)/polyvert.Po@am__quote@
@AMDEP_TRUE@@am__include@ @am__quote@./$(DEPDIR)/poutline.Po@am__quote@
@AMDEP_TRUE@@am__include@ @am__quote@./$(DEPDIR)/quadlsq.Po@am__quote@
@AMDEP_TRUE@@am__include@ @am__quote@./$(DEPDIR)/quadratc.Po@am__quote@
@AMDEP_TRUE@@am__include@ @am__quote@./$(DEPDIR)/quspline.Po@am__quote@
@AMDEP_TRUE@@am__include@ @am__quote@./$(DEPDIR)/ratngs.Po@am__quote@
@AMDEP_TRUE@@am__include@ @am__quote@./$(DEPDIR)/rect.Po@am__quote@
@AMDEP_TRUE@@am__include@ @am__quote@./$(DEPDIR)/rejctmap.Po@am__quote@
@AMDEP_TRUE@@am__include@ @am__quote@./$(DEPDIR)/rwpoly.Po@am__quote@
@AMDEP_TRUE@@am__include@ @am__quote@./$(DEPDIR)/statistc.Po@am__quote@
@AMDEP_TRUE@@am__include@ @am__quote@./$(DEPDIR)/stepblob.Po@am__quote@
@AMDEP_TRUE@@am__include@ @am__quote@./$(DEPDIR)/txtregn.Po@am__quote@
@AMDEP_TRUE@@am__include@ @am__quote@./$(DEPDIR)/vecfuncs.Po@am__quote@
@AMDEP_TRUE@@am__include@ @am__quote@./$(DEPDIR)/werd.Po@am__quote@
.cpp.o:
@am__fastdepCXX_TRUE@ if $(CXXCOMPILE) -MT $@ -MD -MP -MF "$(DEPDIR)/$*.Tpo" -c -o $@ $<; \
@am__fastdepCXX_TRUE@ then mv -f "$(DEPDIR)/$*.Tpo" "$(DEPDIR)/$*.Po"; else rm -f "$(DEPDIR)/$*.Tpo"; exit 1; fi
@AMDEP_TRUE@@am__fastdepCXX_FALSE@ source='$<' object='$@' libtool=no @AMDEPBACKSLASH@
@AMDEP_TRUE@@am__fastdepCXX_FALSE@ DEPDIR=$(DEPDIR) $(CXXDEPMODE) $(depcomp) @AMDEPBACKSLASH@
@am__fastdepCXX_FALSE@ $(CXXCOMPILE) -c -o $@ $<
.cpp.obj:
@am__fastdepCXX_TRUE@ if $(CXXCOMPILE) -MT $@ -MD -MP -MF "$(DEPDIR)/$*.Tpo" -c -o $@ `$(CYGPATH_W) '$<'`; \
@am__fastdepCXX_TRUE@ then mv -f "$(DEPDIR)/$*.Tpo" "$(DEPDIR)/$*.Po"; else rm -f "$(DEPDIR)/$*.Tpo"; exit 1; fi
@AMDEP_TRUE@@am__fastdepCXX_FALSE@ source='$<' object='$@' libtool=no @AMDEPBACKSLASH@
@AMDEP_TRUE@@am__fastdepCXX_FALSE@ DEPDIR=$(DEPDIR) $(CXXDEPMODE) $(depcomp) @AMDEPBACKSLASH@
@am__fastdepCXX_FALSE@ $(CXXCOMPILE) -c -o $@ `$(CYGPATH_W) '$<'`
uninstall-info-am:
# This directory's subdirectories are mostly independent; you can cd
# into them and run `make' without going through this Makefile.
# To change the values of `make' variables: instead of editing Makefiles,
# (1) if the variable is set in `config.status', edit `config.status'
# (which will cause the Makefiles to be regenerated when you run `make');
# (2) otherwise, pass the desired values on the `make' command line.
$(RECURSIVE_TARGETS):
@failcom='exit 1'; \
for f in x $$MAKEFLAGS; do \
case $$f in \
*=* | --[!k]*);; \
*k*) failcom='fail=yes';; \
esac; \
done; \
dot_seen=no; \
target=`echo $@ | sed s/-recursive//`; \
list='$(SUBDIRS)'; for subdir in $$list; do \
echo "Making $$target in $$subdir"; \
if test "$$subdir" = "."; then \
dot_seen=yes; \
local_target="$$target-am"; \
else \
local_target="$$target"; \
fi; \
(cd $$subdir && $(MAKE) $(AM_MAKEFLAGS) $$local_target) \
|| eval $$failcom; \
done; \
if test "$$dot_seen" = "no"; then \
$(MAKE) $(AM_MAKEFLAGS) "$$target-am" || exit 1; \
fi; test -z "$$fail"
mostlyclean-recursive clean-recursive distclean-recursive \
maintainer-clean-recursive:
@failcom='exit 1'; \
for f in x $$MAKEFLAGS; do \
case $$f in \
*=* | --[!k]*);; \
*k*) failcom='fail=yes';; \
esac; \
done; \
dot_seen=no; \
case "$@" in \
distclean-* | maintainer-clean-*) list='$(DIST_SUBDIRS)' ;; \
*) list='$(SUBDIRS)' ;; \
esac; \
rev=''; for subdir in $$list; do \
if test "$$subdir" = "."; then :; else \
rev="$$subdir $$rev"; \
fi; \
done; \
rev="$$rev ."; \
target=`echo $@ | sed s/-recursive//`; \
for subdir in $$rev; do \
echo "Making $$target in $$subdir"; \
if test "$$subdir" = "."; then \
local_target="$$target-am"; \
else \
local_target="$$target"; \
fi; \
(cd $$subdir && $(MAKE) $(AM_MAKEFLAGS) $$local_target) \
|| eval $$failcom; \
done && test -z "$$fail"
tags-recursive:
list='$(SUBDIRS)'; for subdir in $$list; do \
test "$$subdir" = . || (cd $$subdir && $(MAKE) $(AM_MAKEFLAGS) tags); \
done
ctags-recursive:
list='$(SUBDIRS)'; for subdir in $$list; do \
test "$$subdir" = . || (cd $$subdir && $(MAKE) $(AM_MAKEFLAGS) ctags); \
done
ID: $(HEADERS) $(SOURCES) $(LISP) $(TAGS_FILES)
list='$(SOURCES) $(HEADERS) $(LISP) $(TAGS_FILES)'; \
unique=`for i in $$list; do \
if test -f "$$i"; then echo $$i; else echo $(srcdir)/$$i; fi; \
done | \
$(AWK) ' { files[$$0] = 1; } \
END { for (i in files) print i; }'`; \
mkid -fID $$unique
tags: TAGS
TAGS: tags-recursive $(HEADERS) $(SOURCES) $(TAGS_DEPENDENCIES) \
$(TAGS_FILES) $(LISP)
tags=; \
here=`pwd`; \
if ($(ETAGS) --etags-include --version) >/dev/null 2>&1; then \
include_option=--etags-include; \
empty_fix=.; \
else \
include_option=--include; \
empty_fix=; \
fi; \
list='$(SUBDIRS)'; for subdir in $$list; do \
if test "$$subdir" = .; then :; else \
test ! -f $$subdir/TAGS || \
tags="$$tags $$include_option=$$here/$$subdir/TAGS"; \
fi; \
done; \
list='$(SOURCES) $(HEADERS) $(LISP) $(TAGS_FILES)'; \
unique=`for i in $$list; do \
if test -f "$$i"; then echo $$i; else echo $(srcdir)/$$i; fi; \
done | \
$(AWK) ' { files[$$0] = 1; } \
END { for (i in files) print i; }'`; \
if test -z "$(ETAGS_ARGS)$$tags$$unique"; then :; else \
test -n "$$unique" || unique=$$empty_fix; \
$(ETAGS) $(ETAGSFLAGS) $(AM_ETAGSFLAGS) $(ETAGS_ARGS) \
$$tags $$unique; \
fi
ctags: CTAGS
CTAGS: ctags-recursive $(HEADERS) $(SOURCES) $(TAGS_DEPENDENCIES) \
$(TAGS_FILES) $(LISP)
tags=; \
here=`pwd`; \
list='$(SOURCES) $(HEADERS) $(LISP) $(TAGS_FILES)'; \
unique=`for i in $$list; do \
if test -f "$$i"; then echo $$i; else echo $(srcdir)/$$i; fi; \
done | \
$(AWK) ' { files[$$0] = 1; } \
END { for (i in files) print i; }'`; \
test -z "$(CTAGS_ARGS)$$tags$$unique" \
|| $(CTAGS) $(CTAGSFLAGS) $(AM_CTAGSFLAGS) $(CTAGS_ARGS) \
$$tags $$unique
GTAGS:
here=`$(am__cd) $(top_builddir) && pwd` \
&& cd $(top_srcdir) \
&& gtags -i $(GTAGS_ARGS) $$here
distclean-tags:
-rm -f TAGS ID GTAGS GRTAGS GSYMS GPATH tags
distdir: $(DISTFILES)
@srcdirstrip=`echo "$(srcdir)" | sed 's|.|.|g'`; \
topsrcdirstrip=`echo "$(top_srcdir)" | sed 's|.|.|g'`; \
list='$(DISTFILES)'; for file in $$list; do \
case $$file in \
$(srcdir)/*) file=`echo "$$file" | sed "s|^$$srcdirstrip/||"`;; \
$(top_srcdir)/*) file=`echo "$$file" | sed "s|^$$topsrcdirstrip/|$(top_builddir)/|"`;; \
esac; \
if test -f $$file || test -d $$file; then d=.; else d=$(srcdir); fi; \
dir=`echo "$$file" | sed -e 's,/[^/]*$$,,'`; \
if test "$$dir" != "$$file" && test "$$dir" != "."; then \
dir="/$$dir"; \
$(mkdir_p) "$(distdir)$$dir"; \
else \
dir=''; \
fi; \
if test -d $$d/$$file; then \
if test -d $(srcdir)/$$file && test $$d != $(srcdir); then \
cp -pR $(srcdir)/$$file $(distdir)$$dir || exit 1; \
fi; \
cp -pR $$d/$$file $(distdir)$$dir || exit 1; \
else \
test -f $(distdir)/$$file \
|| cp -p $$d/$$file $(distdir)/$$file \
|| exit 1; \
fi; \
done
list='$(DIST_SUBDIRS)'; for subdir in $$list; do \
if test "$$subdir" = .; then :; else \
test -d "$(distdir)/$$subdir" \
|| $(mkdir_p) "$(distdir)/$$subdir" \
|| exit 1; \
distdir=`$(am__cd) $(distdir) && pwd`; \
top_distdir=`$(am__cd) $(top_distdir) && pwd`; \
(cd $$subdir && \
$(MAKE) $(AM_MAKEFLAGS) \
top_distdir="$$top_distdir" \
distdir="$$distdir/$$subdir" \
distdir) \
|| exit 1; \
fi; \
done
check-am: all-am
check: check-recursive
all-am: Makefile $(LIBRARIES)
installdirs: installdirs-recursive
installdirs-am:
install: install-recursive
install-exec: install-exec-recursive
install-data: install-data-recursive
uninstall: uninstall-recursive
install-am: all-am
@$(MAKE) $(AM_MAKEFLAGS) install-exec-am install-data-am
installcheck: installcheck-recursive
install-strip:
$(MAKE) $(AM_MAKEFLAGS) INSTALL_PROGRAM="$(INSTALL_STRIP_PROGRAM)" \
install_sh_PROGRAM="$(INSTALL_STRIP_PROGRAM)" INSTALL_STRIP_FLAG=-s \
`test -z '$(STRIP)' || \
echo "INSTALL_PROGRAM_ENV=STRIPPROG='$(STRIP)'"` install
mostlyclean-generic:
clean-generic:
distclean-generic:
-test -z "$(CONFIG_CLEAN_FILES)" || rm -f $(CONFIG_CLEAN_FILES)
maintainer-clean-generic:
@echo "This command is intended for maintainers to use"
@echo "it deletes files that may require special tools to rebuild."
clean: clean-recursive
clean-am: clean-generic clean-noinstLIBRARIES mostlyclean-am
distclean: distclean-recursive
-rm -rf ./$(DEPDIR)
-rm -f Makefile
distclean-am: clean-am distclean-compile distclean-generic \
distclean-tags
dvi: dvi-recursive
dvi-am:
html: html-recursive
info: info-recursive
info-am:
install-data-am:
install-exec-am:
install-info: install-info-recursive
install-man:
installcheck-am:
maintainer-clean: maintainer-clean-recursive
-rm -rf ./$(DEPDIR)
-rm -f Makefile
maintainer-clean-am: distclean-am maintainer-clean-generic
mostlyclean: mostlyclean-recursive
mostlyclean-am: mostlyclean-compile mostlyclean-generic
pdf: pdf-recursive
pdf-am:
ps: ps-recursive
ps-am:
uninstall-am: uninstall-info-am
uninstall-info: uninstall-info-recursive
.PHONY: $(RECURSIVE_TARGETS) CTAGS GTAGS all all-am check check-am \
clean clean-generic clean-noinstLIBRARIES clean-recursive \
ctags ctags-recursive distclean distclean-compile \
distclean-generic distclean-recursive distclean-tags distdir \
dvi dvi-am html html-am info info-am install install-am \
install-data install-data-am install-exec install-exec-am \
install-info install-info-am install-man install-strip \
installcheck installcheck-am installdirs installdirs-am \
maintainer-clean maintainer-clean-generic \
maintainer-clean-recursive mostlyclean mostlyclean-compile \
mostlyclean-generic mostlyclean-recursive pdf pdf-am ps ps-am \
tags tags-recursive uninstall uninstall-am uninstall-info-am
# Tell versions [3.59,3.63) of GNU make to not export all variables.
# Otherwise a system limit (for SysV at least) may be exceeded.
.NOEXPORT:

29
ccstruct/blckerr.h Normal file

@ -0,0 +1,29 @@
/**********************************************************************
* File: blckerr.h (Formerly blockerr.h)
* Description: Error codes for the page block classes.
* Author: Ray Smith
* Created: Tue Mar 19 17:43:30 GMT 1991
*
* (C) Copyright 1991, Hewlett-Packard Ltd.
** Licensed under the Apache License, Version 2.0 (the "License");
** you may not use this file except in compliance with the License.
** You may obtain a copy of the License at
** http://www.apache.org/licenses/LICENSE-2.0
** Unless required by applicable law or agreed to in writing, software
** distributed under the License is distributed on an "AS IS" BASIS,
** WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
** See the License for the specific language governing permissions and
** limitations under the License.
*
**********************************************************************/
#ifndef BLCKERR_H
#define BLCKERR_H
#include "errcode.h"
const ERRCODE BADBLOCKLINE = "Y coordinate in block out of bounds";
const ERRCODE LOSTBLOCKLINE = "Can't find rectangle for line";
const ERRCODE ILLEGAL_GRADIENT = "Gradient wrong side of edge step!";
const ERRCODE WRONG_WORD = "Word doesn't have blobs of that type";
#endif

778
ccstruct/blobbox.cpp Normal file

@ -0,0 +1,778 @@
/**********************************************************************
* File: blobbox.cpp (Formerly blobnbox.c)
* Description: Code for the textord blob class.
* Author: Ray Smith
* Created: Thu Jul 30 09:08:51 BST 1992
*
* (C) Copyright 1992, Hewlett-Packard Ltd.
** Licensed under the Apache License, Version 2.0 (the "License");
** you may not use this file except in compliance with the License.
** You may obtain a copy of the License at
** http://www.apache.org/licenses/LICENSE-2.0
** Unless required by applicable law or agreed to in writing, software
** distributed under the License is distributed on an "AS IS" BASIS,
** WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
** See the License for the specific language governing permissions and
** limitations under the License.
*
**********************************************************************/
#include "mfcpch.h"
#include "blobbox.h"
#define PROJECTION_MARGIN 10 //arbitrary
#define EXTERN
EXTERN double_VAR (textord_error_weight, 3,
"Weighting for error in believability");
EXTERN BOOL_VAR (pitsync_projection_fix, TRUE,
"Fix bug in projection profile");
ELISTIZE (BLOBNBOX)
ELIST2IZE (TO_ROW)
ELISTIZE (TO_BLOCK)
/**********************************************************************
* BLOBNBOX::merge
*
* Merge this blob with the given blob, which should be after this.
**********************************************************************/
void BLOBNBOX::merge( //merge blobs
BLOBNBOX *nextblob //blob to join with
) {
box += nextblob->box; //merge boxes
nextblob->joined = TRUE;
}
/**********************************************************************
* BLOBNBOX::chop
*
* Chop this blob into equal sized pieces using the x height as a guide.
* The blob is not actually chopped. Instead, fake blobs are inserted
* with the relevant bounding boxes.
**********************************************************************/
void BLOBNBOX::chop( //chop blobs
BLOBNBOX_IT *start_it, //location of this
BLOBNBOX_IT *end_it, //iterator
FCOORD rotation, //for landscape
float xheight //of line
) {
INT16 blobcount; //no of blobs
BLOBNBOX *newblob; //fake blob
BLOBNBOX *blob; //current blob
INT16 blobindex; //number of chop
INT16 leftx; //left edge of blob
float blobwidth; //width of each
float rightx; //right edge to scan
float ymin, ymax; //limits of new blob
float test_ymin, test_ymax; //limits of part blob
ICOORD bl, tr; //corners of box
BLOBNBOX_IT blob_it; //blob iterator
//get no of chops
blobcount = (INT16) floor (box.width () / xheight);
if (blobcount > 1 && (blob_ptr != NULL || cblob_ptr != NULL)) {
//width of each
blobwidth = (float) (box.width () + 1) / blobcount;
for (blobindex = blobcount - 1, rightx = box.right ();
blobindex >= 0; blobindex--, rightx -= blobwidth) {
ymin = (float) MAX_INT32;
ymax = (float) -MAX_INT32;
blob_it = *start_it;
do {
blob = blob_it.data ();
if (blob->blob_ptr != NULL)
find_blob_limits (blob->blob_ptr, rightx - blobwidth, rightx,
rotation, test_ymin, test_ymax);
else
find_cblob_vlimits (blob->cblob_ptr, rightx - blobwidth,
rightx,
/*rotation, */ test_ymin, test_ymax);
blob_it.forward ();
if (test_ymin < ymin)
ymin = test_ymin;
if (test_ymax > ymax)
ymax = test_ymax;
}
while (blob != end_it->data ());
if (ymin < ymax) {
leftx = (INT16) floor (rightx - blobwidth);
if (leftx < box.left ())
leftx = box.left (); //clip to real box
bl = ICOORD (leftx, (INT16) floor (ymin));
tr = ICOORD ((INT16) ceil (rightx), (INT16) ceil (ymax));
if (blobindex == 0)
box = BOX (bl, tr); //change box
else {
newblob = new BLOBNBOX;
//box is all it has
newblob->box = BOX (bl, tr);
//stay on current
end_it->add_after_stay_put (newblob);
}
}
}
}
}
/**********************************************************************
* find_blob_limits
*
* Scan the outlines of the blob to locate the y min and max
* between the given x limits.
**********************************************************************/
void find_blob_limits( //get y limits
PBLOB *blob, //blob to search
float leftx, //x limits
float rightx,
FCOORD rotation, //for landscape
float &ymin, //output y limits
float &ymax) {
float testy; //y intercept
FCOORD pos; //rotated
FCOORD vec;
POLYPT *polypt; //current point
//outlines
OUTLINE_IT out_it = blob->out_list ();
POLYPT_IT poly_it; //outline pts
ymin = (float) MAX_INT32;
ymax = (float) -MAX_INT32;
for (out_it.mark_cycle_pt (); !out_it.cycled_list (); out_it.forward ()) {
//get points
poly_it.set_to_list (out_it.data ()->polypts ());
for (poly_it.mark_cycle_pt (); !poly_it.cycled_list ();
poly_it.forward ()) {
polypt = poly_it.data ();
pos = polypt->pos;
pos.rotate (rotation);
vec = polypt->vec;
vec.rotate (rotation);
if ((pos.x () < leftx && pos.x () + vec.x () > leftx)
|| (pos.x () > leftx && pos.x () + vec.x () < leftx)) {
testy = pos.y () + vec.y () * (leftx - pos.x ()) / vec.x ();
//intercept of boundary
if (testy < ymin)
ymin = testy;
if (testy > ymax)
ymax = testy;
}
if (pos.x () >= leftx && pos.x () <= rightx) {
if (pos.y () > ymax)
ymax = pos.y ();
if (pos.y () < ymin)
ymin = pos.y ();
}
if ((pos.x () > rightx && pos.x () + vec.x () < rightx)
|| (pos.x () < rightx && pos.x () + vec.x () > rightx)) {
testy = pos.y () + vec.y () * (rightx - pos.x ()) / vec.x ();
//intercept of boundary
if (testy < ymin)
ymin = testy;
if (testy > ymax)
ymax = testy;
}
}
}
}
/**********************************************************************
* find_cblob_limits
*
* Scan the outlines of the cblob to locate the y min and max
* between the given x limits.
**********************************************************************/
void find_cblob_limits( //get y limits
C_BLOB *blob, //blob to search
float leftx, //x limits
float rightx,
FCOORD rotation, //for landscape
float &ymin, //output y limits
float &ymax) {
INT16 stepindex; //current point
ICOORD pos; //current coords
ICOORD vec; //rotated step
C_OUTLINE *outline; //current outline
//outlines
C_OUTLINE_IT out_it = blob->out_list ();
ymin = (float) MAX_INT32;
ymax = (float) -MAX_INT32;
for (out_it.mark_cycle_pt (); !out_it.cycled_list (); out_it.forward ()) {
outline = out_it.data ();
pos = outline->start_pos (); //get coords
pos.rotate (rotation);
for (stepindex = 0; stepindex < outline->pathlength (); stepindex++) {
//inside
if (pos.x () >= leftx && pos.x () <= rightx) {
if (pos.y () > ymax)
ymax = pos.y ();
if (pos.y () < ymin)
ymin = pos.y ();
}
vec = outline->step (stepindex);
vec.rotate (rotation);
pos += vec; //move to next
}
}
}
/**********************************************************************
* find_cblob_vlimits
*
* Scan the outlines of the cblob to locate the y min and max
* between the given x limits. No rotation is applied.
**********************************************************************/
void find_cblob_vlimits( //get y limits
C_BLOB *blob, //blob to search
float leftx, //x limits
float rightx,
float &ymin, //output y limits
float &ymax) {
INT16 stepindex; //current point
ICOORD pos; //current coords
ICOORD vec; //edge step
C_OUTLINE *outline; //current outline
//outlines
C_OUTLINE_IT out_it = blob->out_list ();
ymin = (float) MAX_INT32;
ymax = (float) -MAX_INT32;
for (out_it.mark_cycle_pt (); !out_it.cycled_list (); out_it.forward ()) {
outline = out_it.data ();
pos = outline->start_pos (); //get coords
for (stepindex = 0; stepindex < outline->pathlength (); stepindex++) {
//inside
if (pos.x () >= leftx && pos.x () <= rightx) {
if (pos.y () > ymax)
ymax = pos.y ();
if (pos.y () < ymin)
ymin = pos.y ();
}
vec = outline->step (stepindex);
pos += vec; //move to next
}
}
}
/**********************************************************************
* find_cblob_hlimits
*
* Scan the outlines of the cblob to locate the x min and max
* between the given y limits.
**********************************************************************/
void find_cblob_hlimits( //get x limits
C_BLOB *blob, //blob to search
float bottomy, //y limits
float topy,
float &xmin, //output x limits
float &xmax) {
INT16 stepindex; //current point
ICOORD pos; //current coords
ICOORD vec; //edge step
C_OUTLINE *outline; //current outline
//outlines
C_OUTLINE_IT out_it = blob->out_list ();
xmin = (float) MAX_INT32;
xmax = (float) -MAX_INT32;
for (out_it.mark_cycle_pt (); !out_it.cycled_list (); out_it.forward ()) {
outline = out_it.data ();
pos = outline->start_pos (); //get coords
for (stepindex = 0; stepindex < outline->pathlength (); stepindex++) {
//inside
if (pos.y () >= bottomy && pos.y () <= topy) {
if (pos.x () > xmax)
xmax = pos.x ();
if (pos.x () < xmin)
xmin = pos.x ();
}
vec = outline->step (stepindex);
pos += vec; //move to next
}
}
}
/**********************************************************************
* rotate_blob
*
* Deep copy the poly blob and rotate the copy by the given vector.
**********************************************************************/
PBLOB *rotate_blob( //rotate a copy
PBLOB *blob, //blob to rotate
FCOORD rotation //vector to rotate by
) {
PBLOB *copy; //copy of blob
POLYPT *polypt; //current point
OUTLINE_IT out_it;
POLYPT_IT poly_it; //outline pts
copy = new PBLOB;
*copy = *blob; //deep copy
out_it.set_to_list (copy->out_list ());
for (out_it.mark_cycle_pt (); !out_it.cycled_list (); out_it.forward ()) {
//get points
poly_it.set_to_list (out_it.data ()->polypts ());
for (poly_it.mark_cycle_pt (); !poly_it.cycled_list ();
poly_it.forward ()) {
polypt = poly_it.data ();
//rotate it
polypt->pos.rotate (rotation);
polypt->vec.rotate (rotation);
}
out_it.data ()->compute_bb ();
}
return copy;
}
/**********************************************************************
* rotate_cblob
*
* Make a polygonal approximation of the cblob and rotate it by the given vector.
**********************************************************************/
PBLOB *rotate_cblob( //rotate it
C_BLOB *blob, //blob to rotate
float xheight, //for poly approx
FCOORD rotation //for landscape
) {
PBLOB *copy; //copy of blob
POLYPT *polypt; //current point
OUTLINE_IT out_it;
POLYPT_IT poly_it; //outline pts
copy = new PBLOB (blob, xheight);
out_it.set_to_list (copy->out_list ());
for (out_it.mark_cycle_pt (); !out_it.cycled_list (); out_it.forward ()) {
//get points
poly_it.set_to_list (out_it.data ()->polypts ());
for (poly_it.mark_cycle_pt (); !poly_it.cycled_list ();
poly_it.forward ()) {
polypt = poly_it.data ();
//rotate it
polypt->pos.rotate (rotation);
polypt->vec.rotate (rotation);
}
out_it.data ()->compute_bb ();
}
return copy;
}
/**********************************************************************
* crotate_cblob
*
* Build a rotated copy of the cblob and return it as a new C_BLOB.
**********************************************************************/
C_BLOB *crotate_cblob( //rotate it
C_BLOB *blob, //blob to rotate
FCOORD rotation //for landscape
) {
C_OUTLINE_LIST out_list; //output outlines
//input outlines
C_OUTLINE_IT in_it = blob->out_list ();
//output outlines
C_OUTLINE_IT out_it = &out_list;
for (in_it.mark_cycle_pt (); !in_it.cycled_list (); in_it.forward ()) {
out_it.add_after_then_move (new C_OUTLINE (in_it.data (), rotation));
}
return new C_BLOB (&out_list);
}
/**********************************************************************
* box_next
*
* Compute the bounding box of this blob with merging of x overlaps
* but no pre-chopping.
* Then move the iterator on to the start of the next blob.
**********************************************************************/
BOX box_next( //get bounding box
BLOBNBOX_IT *it //iterator to blobs
) {
BLOBNBOX *blob; //current blob
BOX result; //total box
blob = it->data ();
result = blob->bounding_box ();
do {
it->forward ();
blob = it->data ();
if (blob->blob () == NULL && blob->cblob () == NULL)
//was pre-chopped
result += blob->bounding_box ();
}
//until next real blob
while ((blob->blob () == NULL && blob->cblob () == NULL) || blob->joined_to_prev ());
return result;
}
/**********************************************************************
* box_next_pre_chopped
*
* Compute the bounding box of this blob with merging of x overlaps
* but WITH pre-chopping.
* Then move the iterator on to the start of the next pre-chopped blob.
**********************************************************************/
BOX box_next_pre_chopped( //get bounding box
BLOBNBOX_IT *it //iterator to blobs
) {
BLOBNBOX *blob; //current blob
BOX result; //total box
blob = it->data ();
result = blob->bounding_box ();
do {
it->forward ();
blob = it->data ();
}
//until next real blob
while (blob->joined_to_prev ());
return result;
}
/**********************************************************************
* TO_ROW::TO_ROW
*
* Constructor to make a row from a blob.
**********************************************************************/
TO_ROW::TO_ROW ( //constructor
BLOBNBOX * blob, //first blob
float top, //corrected top
float bottom, //of row
float row_size //ideal
):y_min (bottom), y_max (top), initial_y_min (bottom) {
float diff; //in size
BLOBNBOX_IT it = &blobs; //list of blobs
it.add_to_end (blob);
diff = top - bottom - row_size;
if (diff > 0) {
y_max -= diff / 2;
y_min += diff / 2;
}
//very small object
else if ((top - bottom) * 3 < row_size) {
diff = row_size / 3 + bottom - top;
y_max += diff / 2;
y_min -= diff / 2;
}
}
/**********************************************************************
* TO_ROW::add_blob
*
* Add the blob to the end of the row and grow the row limits towards
* the blob without letting the row exceed row_size.
**********************************************************************/
void TO_ROW::add_blob( //add to row
BLOBNBOX *blob, //blob to add
float top, //corrected top
float bottom, //of row
float row_size //ideal
) {
float allowed; //allowed expansion
float available; //expansion
BLOBNBOX_IT it = &blobs; //list of blobs
it.add_to_end (blob);
allowed = row_size + y_min - y_max;
if (allowed > 0) {
available = top > y_max ? top - y_max : 0;
if (bottom < y_min)
//total available
available += y_min - bottom;
if (available > 0) {
available += available; //do it gradually
if (available < allowed)
available = allowed;
if (bottom < y_min)
y_min -= (y_min - bottom) * allowed / available;
if (top > y_max)
y_max += (top - y_max) * allowed / available;
}
}
}
/**********************************************************************
* TO_ROW::insert_blob
*
* Insert the blob into the row, keeping the blobs sorted by left edge.
**********************************************************************/
void TO_ROW::insert_blob( //insert in order
BLOBNBOX *blob //blob to insert
) {
BLOBNBOX_IT it = &blobs; //list of blobs
if (it.empty ())
it.add_before_then_move (blob);
else {
it.mark_cycle_pt ();
while (!it.cycled_list ()
&& it.data ()->bounding_box ().left () <=
blob->bounding_box ().left ())
it.forward ();
if (it.cycled_list ())
it.add_to_end (blob);
else
it.add_before_stay_put (blob);
}
}
/**********************************************************************
* TO_ROW::compute_vertical_projection
*
* Compute the vertical projection of a TO_ROW from its blobs.
**********************************************************************/
void TO_ROW::compute_vertical_projection() { //project whole row
BOX row_box; //bound of row
BLOBNBOX *blob; //current blob
BOX blob_box; //bounding box
BLOBNBOX_IT blob_it = blob_list ();
if (blob_it.empty ())
return;
row_box = blob_it.data ()->bounding_box ();
for (blob_it.mark_cycle_pt (); !blob_it.cycled_list (); blob_it.forward ())
row_box += blob_it.data ()->bounding_box ();
projection.set_range (row_box.left () - PROJECTION_MARGIN,
row_box.right () + PROJECTION_MARGIN);
projection_left = row_box.left () - PROJECTION_MARGIN;
projection_right = row_box.right () + PROJECTION_MARGIN;
for (blob_it.mark_cycle_pt (); !blob_it.cycled_list (); blob_it.forward ()) {
blob = blob_it.data ();
if (blob->blob () != NULL)
vertical_blob_projection (blob->blob (), &projection);
else if (blob->cblob () != NULL)
vertical_cblob_projection (blob->cblob (), &projection);
}
}
/**********************************************************************
* vertical_blob_projection
*
* Compute the vertical projection of a blob from its outlines
* and add to the given STATS.
**********************************************************************/
void vertical_blob_projection( //project outlines
PBLOB *blob, //blob to project
STATS *stats //output
) {
//outlines of blob
OUTLINE_IT out_it = blob->out_list ();
for (out_it.mark_cycle_pt (); !out_it.cycled_list (); out_it.forward ()) {
vertical_outline_projection (out_it.data (), stats);
}
}
/**********************************************************************
* vertical_outline_projection
*
* Compute the vertical projection of an outline and its child outlines
* and add to the given STATS.
**********************************************************************/
void vertical_outline_projection( //project outlines
OUTLINE *outline, //outline to project
STATS *stats //output
) {
POLYPT *polypt; //current point
INT32 xcoord; //current pixel coord
float end_x; //end of vec
POLYPT_IT poly_it = outline->polypts ();
OUTLINE_IT out_it = outline->child ();
float ymean; //amount to add
float width; //amount of x
for (poly_it.mark_cycle_pt (); !poly_it.cycled_list (); poly_it.forward ()) {
polypt = poly_it.data ();
end_x = polypt->pos.x () + polypt->vec.x ();
if (polypt->vec.x () > 0) {
for (xcoord = (INT32) floor (polypt->pos.x ());
xcoord < end_x; xcoord++) {
if (polypt->pos.x () < xcoord) {
width = (float) xcoord;
ymean =
polypt->vec.y () * (xcoord -
polypt->pos.x ()) / polypt->vec.x () +
polypt->pos.y ();
}
else {
width = polypt->pos.x ();
ymean = polypt->pos.y ();
}
if (end_x > xcoord + 1) {
width -= xcoord + 1;
ymean +=
polypt->vec.y () * (xcoord + 1 -
polypt->pos.x ()) / polypt->vec.x () +
polypt->pos.y ();
}
else {
width -= end_x;
ymean += polypt->pos.y () + polypt->vec.y ();
}
ymean = ymean * width / 2;
stats->add (xcoord, (INT32) floor (ymean + 0.5));
}
}
else if (polypt->vec.x () < 0) {
for (xcoord = (INT32) floor (end_x);
xcoord < polypt->pos.x (); xcoord++) {
if (polypt->pos.x () > xcoord + 1) {
width = xcoord + 1.0f;
ymean =
polypt->vec.y () * (xcoord + 1 -
polypt->pos.x ()) / polypt->vec.x () +
polypt->pos.y ();
}
else {
width = polypt->pos.x ();
ymean = polypt->pos.y ();
}
if (end_x < xcoord) {
width -= xcoord;
ymean +=
polypt->vec.y () * (xcoord -
polypt->pos.x ()) / polypt->vec.x () +
polypt->pos.y ();
}
else {
width -= end_x;
ymean += polypt->pos.y () + polypt->vec.y ();
}
ymean = ymean * width / 2;
stats->add (xcoord, (INT32) floor (ymean + 0.5));
}
}
}
for (out_it.mark_cycle_pt (); !out_it.cycled_list (); out_it.forward ()) {
vertical_outline_projection (out_it.data (), stats);
}
}
/**********************************************************************
* vertical_cblob_projection
*
* Compute the vertical projection of a cblob from its outlines
* and add to the given STATS.
**********************************************************************/
void vertical_cblob_projection( //project outlines
C_BLOB *blob, //blob to project
STATS *stats //output
) {
//outlines of blob
C_OUTLINE_IT out_it = blob->out_list ();
for (out_it.mark_cycle_pt (); !out_it.cycled_list (); out_it.forward ()) {
vertical_coutline_projection (out_it.data (), stats);
}
}
/**********************************************************************
* vertical_coutline_projection
*
* Compute the vertical projection of a C_OUTLINE and its child outlines
* and add to the given STATS.
**********************************************************************/
void vertical_coutline_projection( //project outlines
C_OUTLINE *outline, //outline to project
STATS *stats //output
) {
ICOORD pos; //current point
ICOORD step; //edge step
INT32 length; //of outline
INT16 stepindex; //current step
C_OUTLINE_IT out_it = outline->child ();
pos = outline->start_pos ();
length = outline->pathlength ();
for (stepindex = 0; stepindex < length; stepindex++) {
step = outline->step (stepindex);
if (step.x () > 0) {
if (pitsync_projection_fix)
stats->add (pos.x (), -pos.y ());
else
stats->add (pos.x (), pos.y ());
}
else if (step.x () < 0) {
if (pitsync_projection_fix)
stats->add (pos.x () - 1, pos.y ());
else
stats->add (pos.x () - 1, -pos.y ());
}
pos += step;
}
for (out_it.mark_cycle_pt (); !out_it.cycled_list (); out_it.forward ()) {
vertical_coutline_projection (out_it.data (), stats);
}
}
/**********************************************************************
* TO_BLOCK::TO_BLOCK
*
* Constructor to make a TO_BLOCK from a real block.
**********************************************************************/
TO_BLOCK::TO_BLOCK( //make a block
BLOCK *src_block //real block
) {
block = src_block;
}
static void clear_blobnboxes(BLOBNBOX_LIST* boxes) {
BLOBNBOX_IT it = boxes;
// A BLOBNBOX generally doesn't own its blobs, so when it does, they
// have to be deleted explicitly.
for (it.mark_cycle_pt(); !it.cycled_list(); it.forward()) {
BLOBNBOX* box = it.data();
if (box->blob() != NULL)
delete box->blob();
if (box->cblob() != NULL)
delete box->cblob();
}
}
TO_BLOCK::~TO_BLOCK() {
// Any residual BLOBNBOXes at this stage own their blobs, so delete them.
clear_blobnboxes(&blobs);
clear_blobnboxes(&underlines);
clear_blobnboxes(&noise_blobs);
clear_blobnboxes(&small_blobs);
clear_blobnboxes(&large_blobs);
}

381
ccstruct/blobbox.h Normal file

@ -0,0 +1,381 @@
/**********************************************************************
* File: blobbox.h (Formerly blobnbox.h)
* Description: Code for the textord blob class.
* Author: Ray Smith
* Created: Thu Jul 30 09:08:51 BST 1992
*
* (C) Copyright 1992, Hewlett-Packard Ltd.
** Licensed under the Apache License, Version 2.0 (the "License");
** you may not use this file except in compliance with the License.
** You may obtain a copy of the License at
** http://www.apache.org/licenses/LICENSE-2.0
** Unless required by applicable law or agreed to in writing, software
** distributed under the License is distributed on an "AS IS" BASIS,
** WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
** See the License for the specific language governing permissions and
** limitations under the License.
*
**********************************************************************/
#ifndef BLOBBOX_H
#define BLOBBOX_H
#include "varable.h"
#include "clst.h"
#include "elst2.h"
#include "werd.h"
#include "ocrblock.h"
#include "statistc.h"
extern double_VAR_H (textord_error_weight, 3,
"Weighting for error in believability");
enum PITCH_TYPE
{
PITCH_DUNNO, //insufficient data
PITCH_DEF_FIXED, //definitely fixed
PITCH_MAYBE_FIXED, //could be
PITCH_DEF_PROP,
PITCH_MAYBE_PROP,
PITCH_CORR_FIXED,
PITCH_CORR_PROP
};
class BLOBNBOX;
ELISTIZEH (BLOBNBOX)
class BLOBNBOX:public ELIST_LINK
{
public:
BLOBNBOX() { //empty
blob_ptr = NULL;
cblob_ptr = NULL;
joined = FALSE;
reduced = FALSE;
area = 0;
}
BLOBNBOX( //constructor
PBLOB *srcblob) {
blob_ptr = srcblob;
cblob_ptr = NULL;
box = srcblob->bounding_box ();
joined = FALSE;
reduced = FALSE;
area = (int) srcblob->area ();
}
BLOBNBOX( //constructor
C_BLOB *srcblob) {
blob_ptr = NULL;
cblob_ptr = srcblob;
box = srcblob->bounding_box ();
joined = FALSE;
reduced = FALSE;
area = (int) srcblob->area ();
}
//get bounding box
const BOX &bounding_box() const {
return box;
}
//get reduced box
const BOX &reduced_box() const {
return red_box;
}
void set_reduced_box( //set other box
BOX new_box) {
red_box = new_box;
reduced = TRUE;
}
INT32 enclosed_area() const { //get area
return area;
}
void rotate_box( //just box
FCOORD vec) {
box.rotate (vec);
}
BOOL8 joined_to_prev() const { //access function
return joined != 0;
}
BOOL8 red_box_set() const { //access function
return reduced != 0;
}
void merge( //merge with next
BLOBNBOX *nextblob);
void chop( //fake chop blob
BLOBNBOX_IT *start_it, //location of this
BLOBNBOX_IT *blob_it, //iterator
FCOORD rotation, //for landscape
float xheight); //line height
PBLOB *blob() { //access function
return blob_ptr;
}
C_BLOB *cblob() { //access function
return cblob_ptr;
}
#ifndef GRAPHICS_DISABLED
void plot( //draw one
WINDOW window, //window to draw in
COLOUR blob_colour, //for outer bits
COLOUR child_colour) { //for holes
if (blob_ptr != NULL)
blob_ptr->plot (window, blob_colour, child_colour);
if (cblob_ptr != NULL)
cblob_ptr->plot (window, blob_colour, child_colour);
}
#endif
NEWDELETE2 (BLOBNBOX)
private:
int area:30; //enclosed area
int joined:1; //joined to prev
int reduced:1; //reduced box set
BOX box; //bounding box
BOX red_box; //reduced bounding box
PBLOB *blob_ptr; //poly blob
C_BLOB *cblob_ptr; //edgestep blob
};
class TO_ROW:public ELIST2_LINK
{
public:
TO_ROW() {
} //empty
TO_ROW( //constructor
BLOBNBOX *blob, //from first blob
float top, //top of row
float bottom, //bottom of row
float row_size); //target height
float max_y() const { //access function
return y_max;
}
float min_y() const {
return y_min;
}
float mean_y() const {
return (y_min + y_max) / 2.0f;
}
float initial_min_y() const {
return initial_y_min;
}
float line_m() const { //access to line fit
return m;
}
float line_c() const {
return c;
}
float line_error() const {
return error;
}
float parallel_c() const {
return para_c;
}
float parallel_error() const {
return para_error;
}
float believability() const { //baseline goodness
return credibility;
}
float intercept() const { //real parallel_c
return y_origin;
}
void add_blob( //put in row
BLOBNBOX *blob, //blob to add
float top, //top of row
float bottom, //bottom of row
float row_size); //target height
void insert_blob( //put in row in order
BLOBNBOX *blob);
BLOBNBOX_LIST *blob_list() { //get list
return &blobs;
}
void set_line( //set line spec
float new_m, //line to set
float new_c,
float new_error) {
m = new_m;
c = new_c;
error = new_error;
}
void set_parallel_line( //set fixed gradient line
float gradient, //page gradient
float new_c,
float new_error) {
para_c = new_c;
para_error = new_error;
credibility =
(float) (blobs.length () - textord_error_weight * new_error);
y_origin = (float) (new_c / sqrt (1 + gradient * gradient));
//real intercept
}
void set_limits( //set min,max
float new_min, //bottom and
float new_max) { //top of row
y_min = new_min;
y_max = new_max;
}
void compute_vertical_projection(); //get projection
NEWDELETE2 (TO_ROW)
BOOL8 merged; //true when dead
BOOL8 all_caps; //had no ascenders
BOOL8 used_dm_model; //in guessing pitch
INT16 projection_left; //start of projection
INT16 projection_right; //end of projection
PITCH_TYPE pitch_decision; //how strong is decision
float fixed_pitch; //pitch or 0
float fp_space; //sp if fixed pitch
float fp_nonsp; //nonsp if fixed pitch
float pr_space; //sp if prop
float pr_nonsp; //non sp if prop
float spacing; //to "next" row
float xheight; //of line
float ascrise; //ascenders
float descdrop; //descenders
INT32 min_space; //min size for real space
INT32 max_nonspace; //max size of non-space
INT32 space_threshold; //space vs nonspace
float kern_size; //average non-space
float space_size; //average space
WERD_LIST rep_words; //repeated chars
ICOORDELT_LIST char_cells; //fixed pitch cells
QSPLINE baseline; //curved baseline
STATS projection; //vertical projection
private:
BLOBNBOX_LIST blobs; //blobs in row
float y_min; //coords
float y_max;
float initial_y_min;
float m, c; //line spec
float error; //line error
float para_c; //constrained fit
float para_error;
float y_origin; //rotated para_c;
float credibility; //baseline believability
};
ELIST2IZEH (TO_ROW)
class TO_BLOCK:public ELIST_LINK
{
public:
TO_BLOCK() {
} //empty
TO_BLOCK( //constructor
BLOCK *src_block); //real block
~TO_BLOCK();
TO_ROW_LIST *get_rows() { //access function
return &row_list;
}
void print_rows() { //debug info
TO_ROW_IT row_it = &row_list;
TO_ROW *row;
for (row_it.mark_cycle_pt (); !row_it.cycled_list ();
row_it.forward ()) {
row = row_it.data ();
printf ("Row range (%g,%g), para_c=%g, blobcount=" INT32FORMAT
"\n", row->min_y (), row->max_y (), row->parallel_c (),
row->blob_list ()->length ());
}
}
BLOBNBOX_LIST blobs; //medium size
BLOBNBOX_LIST underlines; //underline blobs
BLOBNBOX_LIST noise_blobs; //very small
BLOBNBOX_LIST small_blobs; //fairly small
BLOBNBOX_LIST large_blobs; //big blobs
BLOCK *block; //real block
PITCH_TYPE pitch_decision; //how strong is decision
float line_spacing; //estimate
float line_size; //estimate
float max_blob_size; //line assignment limit
float baseline_offset; //phase shift
float xheight; //median blob size
float fixed_pitch; //pitch or 0
float kern_size; //average non-space
float space_size; //average space
INT32 min_space; //min definite space
INT32 max_nonspace; //max definite
float fp_space; //sp if fixed pitch
float fp_nonsp; //nonsp if fixed pitch
float pr_space; //sp if prop
float pr_nonsp; //non sp if prop
TO_ROW *key_row; //starting row
NEWDELETE2 (TO_BLOCK) private:
TO_ROW_LIST row_list; //temporary rows
};
ELISTIZEH (TO_BLOCK)
extern double_VAR_H (textord_error_weight, 3,
"Weighting for error in believability");
void find_blob_limits( //get y limits
PBLOB *blob, //blob to search
float leftx, //x limits
float rightx,
FCOORD rotation, //for landscape
float &ymin, //output y limits
float &ymax);
void find_cblob_limits( //get y limits
C_BLOB *blob, //blob to search
float leftx, //x limits
float rightx,
FCOORD rotation, //for landscape
float &ymin, //output y limits
float &ymax);
void find_cblob_vlimits( //get y limits
C_BLOB *blob, //blob to search
float leftx, //x limits
float rightx,
float &ymin, //output y limits
float &ymax);
void find_cblob_hlimits( //get x limits
C_BLOB *blob, //blob to search
float bottomy, //y limits
float topy,
float &xmin, //output x limits
float &xmax);
PBLOB *rotate_blob( //rotate it
PBLOB *blob, //blob to search
FCOORD rotation //vector to rotate by
);
PBLOB *rotate_cblob( //rotate it
C_BLOB *blob, //blob to search
float xheight, //for poly approx
FCOORD rotation //for landscape
);
C_BLOB *crotate_cblob( //rotate it
C_BLOB *blob, //blob to search
FCOORD rotation //for landscape
);
BOX box_next( //get bounding box
BLOBNBOX_IT *it //iterator to blobs
);
BOX box_next_pre_chopped( //get bounding box
BLOBNBOX_IT *it //iterator to blobs
);
void vertical_blob_projection( //project outlines
PBLOB *blob, //blob to project
STATS *stats //output
);
//project outlines
void vertical_outline_projection(OUTLINE *outline, //outline to project
STATS *stats //output
);
void vertical_cblob_projection( //project outlines
C_BLOB *blob, //blob to project
STATS *stats //output
);
void vertical_coutline_projection( //project outlines
C_OUTLINE *outline, //outline to project
STATS *stats //output
);
#endif

247
ccstruct/blobs.cpp Normal file

@ -0,0 +1,247 @@
/* -*-C-*-
********************************************************************************
*
* File: blobs.c (Formerly blobs.c)
* Description: Blob definition
* Author: Mark Seaman, OCR Technology
* Created: Fri Oct 27 15:39:52 1989
* Modified: Thu Mar 28 15:33:26 1991 (Mark Seaman) marks@hpgrlt
* Language: C
* Package: N/A
* Status: Experimental (Do Not Distribute)
*
* (c) Copyright 1989, Hewlett-Packard Company.
** Licensed under the Apache License, Version 2.0 (the "License");
** you may not use this file except in compliance with the License.
** You may obtain a copy of the License at
** http://www.apache.org/licenses/LICENSE-2.0
** Unless required by applicable law or agreed to in writing, software
** distributed under the License is distributed on an "AS IS" BASIS,
** WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
** See the License for the specific language governing permissions and
** limitations under the License.
*
*********************************************************************************/
/*----------------------------------------------------------------------
I n c l u d e s
----------------------------------------------------------------------*/
#include "mfcpch.h"
#include "blobs.h"
#include "cutil.h"
#include "emalloc.h"
#include "structures.h"
/*----------------------------------------------------------------------
F u n c t i o n s
----------------------------------------------------------------------*/
/**********************************************************************
* blob_origin
*
* Compute the origin of a compound blob, defined to be the centre
* of the bounding box.
**********************************************************************/
void blob_origin(TBLOB *blob, /*blob to compute on */
TPOINT *origin) { /*return value */
TPOINT topleft; /*bounding box */
TPOINT botright;
/*find bounding box */
blob_bounding_box(blob, &topleft, &botright);
/*centre of box */
origin->x = (topleft.x + botright.x) / 2;
origin->y = (topleft.y + botright.y) / 2;
}
/**********************************************************************
* blob_bounding_box
*
* Compute the bounding_box of a compound blob, defined to be the
* extreme coordinate values of the bounding boxes of all the top-level
* outlines in the blob.
**********************************************************************/
void blob_bounding_box(TBLOB *blob, /*blob to compute on */
register TPOINT *topleft, /*bounding box */
register TPOINT *botright) {
register TESSLINE *outline; /*current outline */
if (blob == NULL || blob->outlines == NULL) {
topleft->x = topleft->y = 0;
*botright = *topleft; /*default value */
}
else {
outline = blob->outlines;
*topleft = outline->topleft;
*botright = outline->botright;
for (outline = outline->next; outline != NULL; outline = outline->next) {
if (outline->topleft.x < topleft->x)
/*find extremes */
topleft->x = outline->topleft.x;
if (outline->botright.x > botright->x)
/*find extremes */
botright->x = outline->botright.x;
if (outline->topleft.y > topleft->y)
/*find extremes */
topleft->y = outline->topleft.y;
if (outline->botright.y < botright->y)
/*find extremes */
botright->y = outline->botright.y;
}
}
}
/**********************************************************************
* blobs_bounding_box
*
* Compute the smallest bounding box that contains all the blobs of this word.
**********************************************************************/
void blobs_bounding_box(TBLOB *blobs, TPOINT *topleft, TPOINT *botright) {
TPOINT tl;
TPOINT br;
TBLOB *blob;
/* Start with first blob */
blob_bounding_box(blobs, topleft, botright);
iterate_blobs(blob, blobs) {
blob_bounding_box(blob, &tl, &br);
if (tl.x < topleft->x)
topleft->x = tl.x;
if (tl.y > topleft->y)
topleft->y = tl.y;
if (br.x > botright->x)
botright->x = br.x;
if (br.y < botright->y)
botright->y = br.y;
}
}
/**********************************************************************
* blobs_origin
*
* Compute the origin of a list of blobs, defined to be the centre
* of the bounding box of the whole list.
**********************************************************************/
void blobs_origin(TBLOB *blobs, /*blob to compute on */
TPOINT *origin) { /*return value */
TPOINT topleft; /*bounding box */
TPOINT botright;
/*find bounding box */
blobs_bounding_box(blobs, &topleft, &botright);
/*center of box */
origin->x = (topleft.x + botright.x) / 2;
origin->y = (topleft.y + botright.y) / 2;
}
/**********************************************************************
* blobs_widths
*
* Compute the widths of a list of blobs. Return a record of the blob
* widths and the gaps between them, alternating: width, gap, width, ...
**********************************************************************/
WIDTH_RECORD *blobs_widths(TBLOB *blobs) { /*blob to compute on */
WIDTH_RECORD *width_record;
TPOINT topleft; /*bounding box */
TPOINT botright;
TBLOB *blob; /*blob to compute on */
int i = 0;
int blob_end;
int num_blobs = count_blobs (blobs);
/* Get memory for num_chars plus (2 * num_blobs - 1) widths and gaps */
width_record = (WIDTH_RECORD *) memalloc (sizeof (int) * num_blobs * 2);
width_record->num_chars = num_blobs;
blob_bounding_box(blobs, &topleft, &botright);
width_record->widths[i++] = botright.x - topleft.x;
/* First width */
blob_end = botright.x;
iterate_blobs (blob, blobs->next) {
blob_bounding_box(blob, &topleft, &botright);
width_record->widths[i++] = topleft.x - blob_end;
width_record->widths[i++] = botright.x - topleft.x;
blob_end = botright.x;
}
return (width_record);
}
/**********************************************************************
* count_blobs
*
* Return a count of the number of blobs attached to this one.
**********************************************************************/
int count_blobs(TBLOB *blobs) {
TBLOB *b;
int x = 0;
iterate_blobs (b, blobs) x++;
return (x);
}
/**********************************************************************
* delete_word
*
* Reclaim the memory taken by this word structure and all of its
* lower level structures.
**********************************************************************/
void delete_word(TWERD *word) {
TBLOB *blob;
TBLOB *nextblob;
TESSLINE *outline;
TESSLINE *nextoutline;
TESSLINE *child;
TESSLINE *nextchild;
for (blob = word->blobs; blob; blob = nextblob) {
nextblob = blob->next;
for (outline = blob->outlines; outline; outline = nextoutline) {
nextoutline = outline->next;
delete_edgepts (outline->loop);
for (child = outline->child; child; child = nextchild) {
nextchild = child->next;
delete_edgepts (child->loop);
oldoutline(child);
}
oldoutline(outline);
}
oldblob(blob);
}
if (word->correct != NULL)
strfree (word->correct); /* Reclaim memory */
oldword(word);
}
/**********************************************************************
* delete_edgepts
*
* Delete a list of EDGEPT structures.
**********************************************************************/
void delete_edgepts(register EDGEPT *edgepts) {
register EDGEPT *this_edge;
register EDGEPT *next_edge;
if (edgepts == NULL)
return;
this_edge = edgepts;
do {
next_edge = this_edge->next;
oldedgept(this_edge);
this_edge = next_edge;
}
while (this_edge != edgepts);
}

119
ccstruct/blobs.h Normal file

@ -0,0 +1,119 @@
/* -*-C-*-
********************************************************************************
*
* File: blobs.h (Formerly blobs.h)
* Description: Blob definition
* Author: Mark Seaman, OCR Technology
* Created: Fri Oct 27 15:39:52 1989
* Modified: Thu Mar 28 15:33:38 1991 (Mark Seaman) marks@hpgrlt
* Language: C
* Package: N/A
* Status: Experimental (Do Not Distribute)
*
* (c) Copyright 1989, Hewlett-Packard Company.
** Licensed under the Apache License, Version 2.0 (the "License");
** you may not use this file except in compliance with the License.
** You may obtain a copy of the License at
** http://www.apache.org/licenses/LICENSE-2.0
** Unless required by applicable law or agreed to in writing, software
** distributed under the License is distributed on an "AS IS" BASIS,
** WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
** See the License for the specific language governing permissions and
** limitations under the License.
*
*********************************************************************************/
#ifndef BLOBS_H
#define BLOBS_H
/*----------------------------------------------------------------------
I n c l u d e s
----------------------------------------------------------------------*/
#include "vecfuncs.h"
#include "tessclas.h"
/*----------------------------------------------------------------------
T y p e s
----------------------------------------------------------------------*/
typedef struct
{ /* Widths of pieces */
int num_chars;
int widths[1];
} WIDTH_RECORD;
/*----------------------------------------------------------------------
M a c r o s
----------------------------------------------------------------------*/
/**********************************************************************
* free_widths
*
* Free the memory taken up by a width array.
**********************************************************************/
#define free_widths(w) \
if (w) memfree (w)
/*----------------------------------------------------------------------
F u n c t i o n s
----------------------------------------------------------------------*/
void blob_origin(TBLOB *blob, /*blob to compute on */
TPOINT *origin); /*return value */
/*blob to compute on */
void blob_bounding_box(TBLOB *blob,
register TPOINT *topleft, /*bounding box */
register TPOINT *botright);
void blobs_bounding_box(TBLOB *blobs, TPOINT *topleft, TPOINT *botright);
void blobs_origin(TBLOB *blobs, /*blob to compute on */
TPOINT *origin); /*return value */
/*blob to compute on */
WIDTH_RECORD *blobs_widths(TBLOB *blobs);
int count_blobs(TBLOB *blobs);
void delete_word(TWERD *word);
void delete_edgepts(register EDGEPT *edgepts);
/*
#if defined(__STDC__) || defined(__cplusplus)
# define _ARGS(s) s
#else
# define _ARGS(s) ()
#endif*/
/* blobs.c
void blob_origin
_ARGS((BLOB *blob,
TPOINT *origin));
void blob_bounding_box
_ARGS((BLOB *blob,
TPOINT *topleft,
TPOINT *botright));
void blobs_bounding_box
_ARGS((BLOB *blobs,
TPOINT *topleft,
TPOINT *botright));
void blobs_origin
_ARGS((BLOB *blobs,
TPOINT *origin));
WIDTH_RECORD *blobs_widths
_ARGS((BLOB *blobs));
int count_blobs
_ARGS((BLOB *blobs));
void delete_word
_ARGS((TWERD *word));
void delete_edgepts
_ARGS((EDGEPT *edgepts));
#undef _ARGS
*/
#endif

537
ccstruct/blread.cpp Normal file

@ -0,0 +1,537 @@
/**********************************************************************
* File: blread.cpp (Formerly pdread.c)
* Description: Friend function of BLOCK to read the uscan pd file.
* Author: Ray Smith
* Created: Mon Mar 18 14:39:00 GMT 1991
*
* (C) Copyright 1991, Hewlett-Packard Ltd.
** Licensed under the Apache License, Version 2.0 (the "License");
** you may not use this file except in compliance with the License.
** You may obtain a copy of the License at
** http://www.apache.org/licenses/LICENSE-2.0
** Unless required by applicable law or agreed to in writing, software
** distributed under the License is distributed on an "AS IS" BASIS,
** WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
** See the License for the specific language governing permissions and
** limitations under the License.
*
**********************************************************************/
#include "mfcpch.h"
#include <stdlib.h>
#ifdef __UNIX__
#include <assert.h>
#endif
#include "scanutils.h"
#include "fileerr.h"
#include "imgtiff.h"
#include "pdclass.h"
#include "rwpoly.h"
#include "blread.h"
#define PD_EXT ".pd"
#define VEC_EXT ".vec" //accupage file
#define HPD_EXT ".bl" //hand pd file
//unlv zone file
#define UNLV_EXT ".uzn"
#define BLOCK_EXPANSION 8 //boundary expansion
#define EXTERN
EXTERN BOOL_EVAR (ignore_weird_blocks, TRUE, "Don't read weird blocks");
static BOX convert_vec_block( //make non-rect block
VEC_ENTRY *entries, //vectors
UINT16 entry_count, //no of entries
INT32 ysize, //image size
ICOORDELT_IT *left_it, //block sides
ICOORDELT_IT *right_it);
/**********************************************************************
* BLOCK::read_pd_file
*
* Read a .bl, .vec, .uzn or .pd file (tried in that order) to make a list of blocks, or use the whole page.
**********************************************************************/
BOOL8 read_pd_file( //read pd file
STRING name, //basename of file
INT32 xsize, //image size
INT32 ysize, //image size
BLOCK_LIST *blocks //output list
) {
FILE *pdfp; //file pointer
BLOCK *block; //current block
INT32 block_count; //no of blocks
INT32 junk_count; //no of junks to read
INT32 junks[4]; //junk elements
INT32 vertex_count; //boundary vertices
INT32 xcoord; //current coords
INT32 ycoord;
INT32 prevx; //previous coords
INT32 prevy;
BLOCK_IT block_it = blocks; //block iterator
ICOORDELT_LIST dummy; //for constructor
ICOORDELT_IT left_it = &dummy; //iterator
ICOORDELT_IT right_it = &dummy;//iterator
if (read_hpd_file (name, xsize, ysize, blocks))
return TRUE; //succeeded
if (read_vec_file (name, xsize, ysize, blocks))
return TRUE; //succeeded
if (read_unlv_file (name, xsize, ysize, blocks))
return TRUE; //succeeded
name += PD_EXT; //add extension
if ((pdfp = fopen (name.string (), "rb")) == NULL) {
//make rect block
block = new BLOCK (name.string (), TRUE, 0, 0, 0, 0, xsize, ysize);
block_it.add_to_end (block); //on end of list
return FALSE; //didn't read one
}
else {
if (fread (&block_count, sizeof (block_count), 1, pdfp) != 1)
READFAILED.error ("read_pd_file", EXIT, "Block count");
tprintf ("%d blocks in .pd file.\n", block_count);
while (block_count > 0) {
if (fread (&junk_count, sizeof (junk_count), 1, pdfp) != 1)
READFAILED.error ("read_pd_file", EXIT, "Junk count");
if (fread (&vertex_count, sizeof (vertex_count), 1, pdfp) != 1)
READFAILED.error ("read_pd_file", EXIT, "Vertex count");
block = new BLOCK; //make a block
//on end of list
block_it.add_to_end (block);
left_it.set_to_list (&block->leftside);
right_it.set_to_list (&block->rightside);
//read a pair
get_pd_vertex (pdfp, xsize, ysize, &block->box, xcoord, ycoord);
vertex_count -= 2; //count read ones
prevx = xcoord;
do {
if (xcoord == prevx) {
if (!right_it.empty ()) {
if (right_it.data ()->x () <= xcoord + BLOCK_EXPANSION)
right_it.data ()->set_y (right_it.data ()->y () +
BLOCK_EXPANSION);
else
right_it.data ()->set_y (right_it.data ()->y () -
BLOCK_EXPANSION);
}
right_it.
add_before_then_move (new
ICOORDELT (xcoord + BLOCK_EXPANSION,
ycoord));
}
prevx = xcoord; //remember previous
prevy = ycoord;
get_pd_vertex (pdfp, xsize, ysize, &block->box, xcoord, ycoord);
vertex_count -= 2; //count read ones
}
while (ycoord <= prevy);
right_it.data ()->set_y (right_it.data ()->y () - BLOCK_EXPANSION);
//start of left
left_it.add_to_end (new ICOORDELT (prevx - BLOCK_EXPANSION, prevy - BLOCK_EXPANSION));
do {
prevx = xcoord; //remember previous
get_pd_vertex (pdfp, xsize, ysize, &block->box, xcoord, ycoord);
vertex_count -= 2;
if (xcoord != prevx && vertex_count > 0) {
if (xcoord > prevx)
left_it.
add_to_end (new
ICOORDELT (xcoord - BLOCK_EXPANSION,
ycoord + BLOCK_EXPANSION));
else
left_it.
add_to_end (new
ICOORDELT (xcoord - BLOCK_EXPANSION,
ycoord - BLOCK_EXPANSION));
}
else if (vertex_count == 0)
left_it.add_to_end (new ICOORDELT (prevx - BLOCK_EXPANSION,
ycoord + BLOCK_EXPANSION));
}
while (vertex_count > 0); //until all read
while (junk_count > 0) {
if (fread (junks, sizeof (INT32), 4, pdfp) != 4)
READFAILED.error ("read_pd_file", EXIT, "Junk coords");
junk_count--;
}
block_count--; //count read blocks
}
}
fclose(pdfp);
return TRUE; //read one
}
/**********************************************************************
* get_pd_vertex
*
* Read a pair of coords, invert the y and clip to the image limits,
* keeping a margin of BLOCK_EXPANSION inside each edge.
* Also expand the bounding box to include the new vertex plus a
* BLOCK_EXPANSION margin on every side.
**********************************************************************/
void get_pd_vertex( //get new vertex
FILE *pdfp, //file to read
INT32 xsize, //image size
INT32 ysize, //image size
BOX *box, //bounding box
INT32 &xcoord, //output coords
INT32 &ycoord) {
BOX new_coord; //expansion box
//get new coords
if (fread (&xcoord, sizeof (xcoord), 1, pdfp) != 1)
READFAILED.error ("read_pd_file", EXIT, "Xcoord");
if (fread (&ycoord, sizeof (ycoord), 1, pdfp) != 1)
READFAILED.error ("read_pd_file", EXIT, "Ycoord");
ycoord = ysize - ycoord; //invert y
if (xcoord < BLOCK_EXPANSION)
xcoord = BLOCK_EXPANSION; //clip to limits
if (xcoord > xsize - BLOCK_EXPANSION)
xcoord = xsize - BLOCK_EXPANSION;
if (ycoord < BLOCK_EXPANSION)
ycoord = BLOCK_EXPANSION;
if (ycoord > ysize - BLOCK_EXPANSION)
ycoord = ysize - BLOCK_EXPANSION;
new_coord =
BOX (ICOORD (xcoord - BLOCK_EXPANSION, ycoord - BLOCK_EXPANSION),
ICOORD (xcoord + BLOCK_EXPANSION, ycoord + BLOCK_EXPANSION));
(*box) += new_coord;
}
/**********************************************************************
* BLOCK::read_hpd_file
*
* Read a whole hpd file to make a list of blocks.
* Return FALSE if the .bl file cannot be found
**********************************************************************/
BOOL8 read_hpd_file( //read .bl file
STRING name, //basename of file
INT32 xsize, //image size
INT32 ysize, //image size
BLOCK_LIST *blocks //output list
) {
FILE *pdfp; //file pointer
PAGE_BLOCK_LIST *page_blocks;
INT32 block_no; //no of blocks
BLOCK_IT block_it = blocks; //block iterator
name += HPD_EXT; //add extension
if ((pdfp = fopen (name.string (), "r")) == NULL) {
return FALSE; //can't find it
}
fclose(pdfp);
page_blocks = read_poly_blocks (name.string ());
block_no = 0;
scan_hpd_blocks (name.string (), page_blocks, block_no, &block_it);
tprintf ("Text region count=%d\n", block_no);
return TRUE; //read one
}
/**********************************************************************
* BLOCK::scan_hpd_blocks
*
* Scan the page blocks (and, recursively, their children), making a
* BLOCK for each text region and each weird block that is not ignored.
**********************************************************************/
void scan_hpd_blocks( //make blocks from page
const char *name, //block label
PAGE_BLOCK_LIST *page_blocks, //head of full page
INT32 &block_no, //no of blocks
BLOCK_IT *block_it //block iterator
) {
BLOCK *block; //current block
//page blocks
PAGE_BLOCK_IT pb_it = page_blocks;
PAGE_BLOCK *current_block;
TEXT_REGION_IT tr_it;
TEXT_BLOCK *tb;
TEXT_REGION *tr;
BOX *block_box; //from text region
for (pb_it.mark_cycle_pt (); !pb_it.cycled_list (); pb_it.forward ()) {
current_block = pb_it.data ();
if (current_block->type () == PB_TEXT) {
tb = (TEXT_BLOCK *) current_block;
if (!tb->regions ()->empty ()) {
tr_it.set_to_list (tb->regions ());
for (tr_it.mark_cycle_pt ();
!tr_it.cycled_list (); tr_it.forward ()) {
block_no++;
tr = tr_it.data ();
block_box = tr->bounding_box ();
block = new BLOCK (name, TRUE, 0, 0,
block_box->left (), block_box->bottom (),
block_box->right (), block_box->top ());
block->hand_block = tr;
block->hand_poly = tr;
block_it->add_after_then_move (block);
}
}
}
else if (current_block->type () == PB_WEIRD
&& !ignore_weird_blocks
&& ((WEIRD_BLOCK *) current_block)->id_no () > 0) {
block_no++;
block_box = current_block->bounding_box ();
block = new BLOCK (name, TRUE, 0, 0,
block_box->left (), block_box->bottom (),
block_box->right (), block_box->top ());
block->hand_block = NULL;
block->hand_poly = current_block;
block_it->add_after_then_move (block);
}
if (!current_block->child ()->empty ())
scan_hpd_blocks (name, current_block->child (), block_no, block_it);
}
}
/**********************************************************************
* BLOCK::read_vec_file
*
* Read a whole vec file to make a list of blocks.
* Return FALSE if the .vec file cannot be found
**********************************************************************/
BOOL8 read_vec_file( //read .vec file
STRING name, //basename of file
INT32 xsize, //image size
INT32 ysize, //image size
BLOCK_LIST *blocks //output list
) {
FILE *pdfp; //file pointer
BLOCK *block; //current block
INT32 block_no; //no of blocks
INT32 block_index; //current blocks
INT32 vector_count; //total vectors
VEC_HEADER header; //file header
BLOCK_HEADER *vec_blocks; //blocks from file
VEC_ENTRY *vec_entries; //vectors from file
BLOCK_IT block_it = blocks; //block iterator
ICOORDELT_IT left_it; //iterators
ICOORDELT_IT right_it;
name += VEC_EXT; //add extension
if ((pdfp = fopen (name.string (), "rb")) == NULL) {
return FALSE; //can't find it
}
if (fread (&header, sizeof (header), 1, pdfp) != 1)
READFAILED.error ("read_vec_file", EXIT, "Header");
//from intel
header.filesize = reverse32 (header.filesize);
header.bytesize = reverse16 (header.bytesize);
header.arraysize = reverse16 (header.arraysize);
header.width = reverse16 (header.width);
header.height = reverse16 (header.height);
header.res = reverse16 (header.res);
header.bpp = reverse16 (header.bpp);
tprintf ("%d blocks in %s file:", header.arraysize, VEC_EXT);
vector_count = header.filesize - header.arraysize * sizeof (BLOCK_HEADER);
vector_count /= sizeof (VEC_ENTRY);
vec_blocks =
(BLOCK_HEADER *) alloc_mem (header.arraysize * sizeof (BLOCK_HEADER));
vec_entries = (VEC_ENTRY *) alloc_mem (vector_count * sizeof (VEC_ENTRY));
xsize = header.width; //real image size
ysize = header.height;
if (fread (vec_blocks, sizeof (BLOCK_HEADER), header.arraysize, pdfp)
!= static_cast<size_t>(header.arraysize))
READFAILED.error ("read_vec_file", EXIT, "Blocks");
if (fread (vec_entries, sizeof (VEC_ENTRY), vector_count, pdfp)
!= static_cast<size_t>(vector_count))
READFAILED.error ("read_vec_file", EXIT, "Vectors");
for (block_index = 0; block_index < header.arraysize; block_index++) {
vec_blocks[block_index].offset =
reverse16 (vec_blocks[block_index].offset);
vec_blocks[block_index].order =
reverse16 (vec_blocks[block_index].order);
vec_blocks[block_index].entries =
reverse16 (vec_blocks[block_index].entries);
vec_blocks[block_index].charsize =
reverse16 (vec_blocks[block_index].charsize);
}
for (block_index = 0; block_index < vector_count; block_index++) {
vec_entries[block_index].start =
ICOORD (reverse16 (vec_entries[block_index].start.x ()),
reverse16 (vec_entries[block_index].start.y ()));
vec_entries[block_index].end =
ICOORD (reverse16 (vec_entries[block_index].end.x ()),
reverse16 (vec_entries[block_index].end.y ()));
}
for (block_no = 1; block_no <= header.arraysize; block_no++) {
for (block_index = 0; block_index < header.arraysize; block_index++) {
if (vec_blocks[block_index].order == block_no
&& vec_blocks[block_index].valid) {
block = new BLOCK;
left_it.set_to_list (&block->leftside);
right_it.set_to_list (&block->rightside);
block->box =
convert_vec_block (&vec_entries
[vec_blocks[block_index].offset],
vec_blocks[block_index].entries, ysize,
&left_it, &right_it);
block->set_xheight (vec_blocks[block_index].charsize);
//on end of list
block_it.add_to_end (block);
// tprintf("Block at (%d,%d)->(%d,%d) has index %d and order %d\n",
// block->box.left(),
// block->box.bottom(),
// block->box.right(),
// block->box.top(),
// block_index,vec_blocks[block_index].order);
}
}
}
free_mem(vec_blocks);
free_mem(vec_entries);
tprintf ("%d valid\n", block_it.length ());
fclose(pdfp);
return TRUE; //read one
}
/**********************************************************************
* BLOCK::convert_vec_block
*
* Convert the vectors of one block to lists of left and right side
* points and return the bounding box, expanded by BLOCK_EXPANSION.
**********************************************************************/
static BOX convert_vec_block( //make non-rect block
VEC_ENTRY *entries, //vectors
UINT16 entry_count, //no of entries
INT32 ysize, //image size
ICOORDELT_IT *left_it, //block sides
ICOORDELT_IT *right_it) {
BOX block_box; //bounding box
BOX vec_box; //box of vec
ICOORD box_point; //expanded coord
ICOORD shift_vec; //for box expansion
ICOORD prev_pt; //previous coord
ICOORD end_pt; //end of vector
INT32 vertex_index; //boundary vertices
for (vertex_index = 0; vertex_index < entry_count; vertex_index++) {
entries[vertex_index].start = ICOORD (entries[vertex_index].start.x (),
ysize - 1 -
entries[vertex_index].start.y ());
entries[vertex_index].end =
ICOORD (entries[vertex_index].end.x (),
ysize - 1 - entries[vertex_index].end.y ());
vec_box = BOX (entries[vertex_index].start, entries[vertex_index].end);
block_box += vec_box; //find total bounds
}
for (vertex_index = 0; vertex_index < entry_count
&& (entries[vertex_index].start.y () != block_box.bottom ()
|| entries[vertex_index].end.y () != block_box.bottom ());
vertex_index++);
ASSERT_HOST (vertex_index < entry_count);
prev_pt = entries[vertex_index].start;
end_pt = entries[vertex_index].end;
do {
for (vertex_index = 0; vertex_index < entry_count
&& entries[vertex_index].start != end_pt; vertex_index++);
//found start of vertical
ASSERT_HOST (vertex_index < entry_count);
box_point = entries[vertex_index].start;
if (box_point.x () <= prev_pt.x ())
shift_vec = ICOORD (-BLOCK_EXPANSION, -BLOCK_EXPANSION);
else
shift_vec = ICOORD (-BLOCK_EXPANSION, BLOCK_EXPANSION);
left_it->add_to_end (new ICOORDELT (box_point + shift_vec));
prev_pt = box_point;
for (vertex_index = 0; vertex_index < entry_count
&& entries[vertex_index].start != end_pt; vertex_index++);
//found horizontal
ASSERT_HOST (vertex_index < entry_count);
end_pt = entries[vertex_index].end;
}
while (end_pt.y () < block_box.top ());
shift_vec = ICOORD (-BLOCK_EXPANSION, BLOCK_EXPANSION);
left_it->add_to_end (new ICOORDELT (end_pt + shift_vec));
for (vertex_index = 0; vertex_index < entry_count
&& (entries[vertex_index].start.y () != block_box.top ()
|| entries[vertex_index].end.y () != block_box.top ());
vertex_index++);
ASSERT_HOST (vertex_index < entry_count);
prev_pt = entries[vertex_index].start;
end_pt = entries[vertex_index].end;
do {
for (vertex_index = 0; vertex_index < entry_count
&& entries[vertex_index].start != end_pt; vertex_index++);
//found start of vertical
ASSERT_HOST (vertex_index < entry_count);
box_point = entries[vertex_index].start;
if (box_point.x () < prev_pt.x ())
shift_vec = ICOORD (BLOCK_EXPANSION, -BLOCK_EXPANSION);
else
shift_vec = ICOORD (BLOCK_EXPANSION, BLOCK_EXPANSION);
right_it->add_before_then_move (new ICOORDELT (box_point + shift_vec));
prev_pt = box_point;
for (vertex_index = 0; vertex_index < entry_count
&& entries[vertex_index].start != end_pt; vertex_index++);
//found horizontal
ASSERT_HOST (vertex_index < entry_count);
end_pt = entries[vertex_index].end;
}
while (end_pt.y () > block_box.bottom ());
shift_vec = ICOORD (BLOCK_EXPANSION, -BLOCK_EXPANSION);
right_it->add_before_then_move (new ICOORDELT (end_pt + shift_vec));
shift_vec = ICOORD (BLOCK_EXPANSION, BLOCK_EXPANSION);
box_point = block_box.botleft () - shift_vec;
end_pt = block_box.topright () + shift_vec;
return BOX (box_point, end_pt);
}
/**********************************************************************
* read_unlv_file
*
* Read a whole unlv zone file to make a list of blocks.
**********************************************************************/
BOOL8 read_unlv_file( //read .uzn file
STRING name, //basename of file
INT32 xsize, //image size
INT32 ysize, //image size
BLOCK_LIST *blocks //output list
) {
FILE *pdfp; //file pointer
BLOCK *block; //current block
int x; //current top-down coords
int y;
int width; //of current block
int height;
BLOCK_IT block_it = blocks; //block iterator
name += UNLV_EXT; //add extension
if ((pdfp = fopen (name.string (), "r")) == NULL) {
return FALSE; //didn't read one
}
else {
while (fscanf (pdfp, "%d %d %d %d %*s", &x, &y, &width, &height) >= 4) {
//make rect block
block = new BLOCK (name.string (), TRUE, 0, 0, (INT16) x, (INT16) (ysize - 1 - y - height), (INT16) (x + width), (INT16) (ysize - 1 - y));
//on end of list
block_it.add_to_end (block);
}
fclose(pdfp);
}
return TRUE; //read one
}

63
ccstruct/blread.h Normal file

@ -0,0 +1,63 @@
/**********************************************************************
* File: blread.h (Formerly pdread.h)
* Description: Friend function of BLOCK to read the uscan pd file.
* Author: Ray Smith
* Created: Mon Mar 18 14:39:00 GMT 1991
*
* (C) Copyright 1991, Hewlett-Packard Ltd.
** Licensed under the Apache License, Version 2.0 (the "License");
** you may not use this file except in compliance with the License.
** You may obtain a copy of the License at
** http://www.apache.org/licenses/LICENSE-2.0
** Unless required by applicable law or agreed to in writing, software
** distributed under the License is distributed on an "AS IS" BASIS,
** WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
** See the License for the specific language governing permissions and
** limitations under the License.
*
**********************************************************************/
#ifndef BLREAD_H
#define BLREAD_H
#include "varable.h"
#include "ocrblock.h"
BOOL8 read_pd_file( //read pd file
STRING name, //basename of file
INT32 xsize, //image size
INT32 ysize, //image size
BLOCK_LIST *blocks //output list
);
void get_pd_vertex( //get new vertex
FILE *pdfp, //file to read
INT32 xsize, //image size
INT32 ysize, //image size
BOX *box, //bounding box
INT32 &xcoord, //output coords
INT32 &ycoord);
BOOL8 read_hpd_file( //read .bl file
STRING name, //basename of file
INT32 xsize, //image size
INT32 ysize, //image size
BLOCK_LIST *blocks //output list
);
void scan_hpd_blocks( //make blocks from page
const char *name, //block label
PAGE_BLOCK_LIST *page_blocks, //head of full page
INT32 &block_no, //no of blocks
BLOCK_IT *block_it //block iterator
);
BOOL8 read_vec_file( //read .vec file
STRING name, //basename of file
INT32 xsize, //image size
INT32 ysize, //image size
BLOCK_LIST *blocks //output list
);
BOOL8 read_unlv_file( //read .uzn file
STRING name, //basename of file
INT32 xsize, //image size
INT32 ysize, //image size
BLOCK_LIST *blocks //output list
);
#endif

270
ccstruct/callcpp.cpp Normal file

@ -0,0 +1,270 @@
/**********************************************************************
* File: callcpp.cpp
* Description: extern C interface calling C++ from C.
* Author: Ray Smith
* Created: Sun Feb 04 20:39:23 MST 1996
*
* (C) Copyright 1996, Hewlett-Packard Co.
** Licensed under the Apache License, Version 2.0 (the "License");
** you may not use this file except in compliance with the License.
** You may obtain a copy of the License at
** http://www.apache.org/licenses/LICENSE-2.0
** Unless required by applicable law or agreed to in writing, software
** distributed under the License is distributed on an "AS IS" BASIS,
** WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
** See the License for the specific language governing permissions and
** limitations under the License.
*
**********************************************************************/
#include "mfcpch.h"
#include "errcode.h"
#ifdef __UNIX__
#include <assert.h>
#include <stdarg.h>
#endif
#include <time.h>
#include "memry.h"
#include "grphics.h"
#include "evnts.h"
#include "varable.h"
#include "callcpp.h"
#include "tprintf.h"
//#include "strace.h"
#include "host.h"
//extern "C" {
INT_VAR (tess_cp_mapping0, 0, "Mappings for class pruner distance");
INT_VAR (tess_cp_mapping1, 1, "Mappings for class pruner distance");
INT_VAR (tess_cp_mapping2, 2, "Mappings for class pruner distance");
INT_VAR (tess_cp_mapping3, 3, "Mappings for class pruner distance");
INT_VAR (stopper_numbers_on, 0, "Allow numbers to be acceptable choices");
INT_VAR (config_pruner_enabled, 0, "Turn on config pruner");
INT_VAR (feature_prune_percentile, 0, "Percent of features to use");
INT_VAR (newcp_ratings_on, 0, "Use new class pruner normalisation");
INT_VAR (record_matcher_output, 0, "Record detailed matcher info");
INT_VAR (il1_adaption_test, 0, "Dont adapt to i/I at beginning of word");
double_VAR (permuter_pending_threshold, 0.0,
"Worst conf for using pending dictionary");
double_VAR (newcp_duff_rating, 0.30, "Worst rating for calling real matcher");
double_VAR (newcp_prune_threshold, 1.2, "Ratio of best to prune");
double_VAR (tessedit_cp_ratio, 0.0, "Ratio from best to prune");
//Global matcher info from the class pruner.
INT32 cp_classes;
INT32 cp_bestindex;
INT32 cp_bestrating;
INT32 cp_bestconf;
char cp_chars[2];
INT32 cp_ratings[2];
INT32 cp_confs[2];
INT32 cp_maps[4];
//Global info to control writes of matcher info
INT32 blob_type; //write control
char blob_answer; //correct char
char *word_answer; //correct word
INT32 matcher_pass; //pass in chopper.c
INT32 bits_in_states; //no of bits in states
#ifndef __UNIX__
/**********************************************************************
* assert
*
* A version of assert for C on NT.
**********************************************************************/
void assert( //assert for C on NT
int testing //assert fail if false
) {
ASSERT_HOST(testing);
}
#endif
void setup_cp_maps() {
cp_maps[0] = tess_cp_mapping0;
cp_maps[1] = tess_cp_mapping1;
cp_maps[2] = tess_cp_mapping2;
cp_maps[3] = tess_cp_mapping3;
}
void trace_stack() { //Trace current stack
}
void
cprintf ( //Trace printf
const char *format, ... //special message
) {
va_list args; //variable args
char msg[1000];
va_start(args, format); //variable list
vsnprintf(msg, sizeof(msg), format, args); //Format into msg, truncating if too long
va_end(args);
tprintf ("%s", msg);
}
char *c_alloc_string( //allocate string
INT32 count //no of chars required
) {
return alloc_string (count);
}
void c_free_string( //free a string
char *string //string to free
) {
free_string(string);
}
void *c_alloc_struct( //allocate memory
INT32 count, //no of chars required
const char *name //class name
) {
return alloc_struct (count, name);
}
void c_free_struct( //free a structure
void *deadstruct, //structure to free
INT32 count, //no of bytes
const char *name //class name
) {
free_struct(deadstruct, count, name);
}
void *c_alloc_mem_p( //allocate permanent space
INT32 count //block size to allocate
) {
return alloc_mem_p (count);
}
void *c_alloc_mem( //get some memory
INT32 count //no of bytes to get
) {
return alloc_mem (count);
}
void c_free_mem( //free mem from alloc_mem
void *oldchunk //chunk to free
) {
free_mem(oldchunk);
}
void c_check_mem( //check consistency
const char *string, //context message
INT8 level //level of check
) {
check_mem(string, level);
}
#ifndef GRAPHICS_DISABLED
void *c_create_window( /*create a window */
const char *name, /*name/title of window */
INT16 xpos, /*coords of window */
INT16 ypos, /*coords of window */
INT16 xsize, /*size of window */
INT16 ysize, /*size of window */
double xmin, /*scrolling limits */
double xmax, /*to stop users */
double ymin, /*getting lost in */
double ymax /*empty space */
) {
return create_window (name, SCROLLINGWIN, xpos, ypos, xsize, ysize,
xmin, xmax, ymin, ymax, TRUE, FALSE, FALSE, TRUE);
}
void c_line_color_index( /*set color */
void *win,
C_COL index) {
WINDOW window = (WINDOW) win;
// ASSERT_HOST(index>=0 && index<=48);
if (index < 0 || index > 48)
index = (C_COL) 1;
window->Line_color_index ((COLOUR) index);
}
void c_move( /*move pen */
void *win,
double x,
double y) {
WINDOW window = (WINDOW) win;
window->Move2d (x, y);
}
void c_draw( /*draw line to point */
void *win,
double x,
double y) {
WINDOW window = (WINDOW) win;
window->Draw2d (x, y);
}
void c_make_current( /*make picture current */
void *win) {
WINDOW window = (WINDOW) win;
window->Make_picture_current ();
}
void c_clear_window( /*clear window */
void *win) {
WINDOW window = (WINDOW) win;
window->Clear_view_surface ();
}
char window_wait( /*wait for an event */
void *win) {
WINDOW window = (WINDOW) win;
GRAPHICS_EVENT event;
await_event(window, TRUE, ANY_EVENT, &event);
if (event.type == KEYPRESS_EVENT)
return event.key;
else
return '\0';
}
#endif
void reverse32(void *ptr) { //reverse byte order of a 32 bit value
char tmp;
char *cptr = (char *) ptr;
tmp = *cptr;
*cptr = *(cptr + 3);
*(cptr + 3) = tmp;
tmp = *(cptr + 1);
*(cptr + 1) = *(cptr + 2);
*(cptr + 2) = tmp;
}
void reverse16(void *ptr) { //reverse byte order of a 16 bit value
char tmp;
char *cptr = (char *) ptr;
tmp = *cptr;
*cptr = *(cptr + 1);
*(cptr + 1) = tmp;
}
//};
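
The reverse32 and reverse16 helpers above swap bytes in place, which is the usual way to convert values read from a file written on a machine of the opposite endianness. The following is a minimal standalone sketch of the same idea, not the Tesseract code itself; the names reverse32_sketch and swapped32 are hypothetical and exist only for illustration.

```cpp
#include <cassert>
#include <cstdint>

// In-place byte reversal of a 4-byte value, mirroring reverse32() above.
// Uses unsigned char so no sign extension can interfere with the swap.
void reverse32_sketch(void *ptr) {
  assert(ptr != nullptr);
  unsigned char *p = static_cast<unsigned char *>(ptr);
  unsigned char tmp = p[0];
  p[0] = p[3];
  p[3] = tmp;
  tmp = p[1];
  p[1] = p[2];
  p[2] = tmp;
}

// Hypothetical convenience wrapper: return the byte-swapped copy of a value.
uint32_t swapped32(uint32_t value) {
  reverse32_sketch(&value);
  return value;
}
```

Because the swap reverses the object representation, applying it twice restores the original value on any host byte order.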

604
ccstruct/coutln.cpp Normal file

@ -0,0 +1,604 @@
/**********************************************************************
* File: coutln.c (Formerly coutline.c)
* Description: Code for the C_OUTLINE class.
* Author: Ray Smith
* Created: Mon Oct 07 16:01:57 BST 1991
*
* (C) Copyright 1991, Hewlett-Packard Ltd.
** Licensed under the Apache License, Version 2.0 (the "License");
** you may not use this file except in compliance with the License.
** You may obtain a copy of the License at
** http://www.apache.org/licenses/LICENSE-2.0
** Unless required by applicable law or agreed to in writing, software
** distributed under the License is distributed on an "AS IS" BASIS,
** WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
** See the License for the specific language governing permissions and
** limitations under the License.
*
**********************************************************************/
#include "mfcpch.h"
#include <string.h>
#ifdef __UNIX__
#include <assert.h>
#endif
#include "coutln.h"
ELISTIZE_S (C_OUTLINE)
ICOORD C_OUTLINE::step_coords[4] = {
ICOORD (-1, 0), ICOORD (0, -1), ICOORD (1, 0), ICOORD (0, 1)
};
/**********************************************************************
* C_OUTLINE::C_OUTLINE
*
* Constructor to build a C_OUTLINE from a CRACKEDGE LOOP.
**********************************************************************/
C_OUTLINE::C_OUTLINE (
//constructor
CRACKEDGE * startpt, //outline to convert
ICOORD bot_left, //bounding box
ICOORD top_right, INT16 length //length of loop
):box (bot_left, top_right), start (startpt->pos) {
INT16 stepindex; //index to step
CRACKEDGE *edgept; //current point
stepcount = length; //no of steps
//get memory
steps = (UINT8 *) alloc_mem (step_mem());
memset(steps, 0, step_mem());
edgept = startpt;
for (stepindex = 0; stepindex < length; stepindex++) {
//set compact step
set_step (stepindex, edgept->stepdir);
edgept = edgept->next;
}
}
/**********************************************************************
* C_OUTLINE::C_OUTLINE
*
* Constructor to build a C_OUTLINE from a C_OUTLINE_FRAG.
**********************************************************************/
C_OUTLINE::C_OUTLINE (
//constructor
//steps to copy
ICOORD startpt, DIR128 * new_steps,
INT16 length //length of loop
):start (startpt) {
INT8 dirdiff; //direction difference
DIR128 prevdir; //previous direction
DIR128 dir; //current direction
DIR128 lastdir; //dir of last step
BOX new_box; //easy bounding
INT16 stepindex; //index to step
INT16 srcindex; //source steps
ICOORD pos; //current position
pos = startpt;
stepcount = length; //no of steps
//get memory
steps = (UINT8 *) alloc_mem (step_mem());
memset(steps, 0, step_mem());
lastdir = new_steps[length - 1];
prevdir = lastdir;
for (stepindex = 0, srcindex = 0; srcindex < length;
stepindex++, srcindex++) {
new_box = BOX (pos, pos);
box += new_box;
//copy steps
dir = new_steps[srcindex];
set_step(stepindex, dir);
dirdiff = dir - prevdir;
pos += step (stepindex);
if ((dirdiff == 64 || dirdiff == -64) && stepindex > 0) {
stepindex -= 2; //cancel there-and-back
prevdir = stepindex >= 0 ? step_dir (stepindex) : lastdir;
}
else
prevdir = dir;
}
ASSERT_HOST (pos.x () == startpt.x () && pos.y () == startpt.y ());
do {
dirdiff = step_dir (stepindex - 1) - step_dir (0);
if (dirdiff == 64 || dirdiff == -64) {
start += step (0);
stepindex -= 2; //cancel there-and-back
for (int i = 0; i < stepindex; ++i)
set_step(i, step_dir(i + 1));
}
}
while (stepindex > 1 && (dirdiff == 64 || dirdiff == -64));
stepcount = stepindex;
ASSERT_HOST (stepcount >= 4);
}
/**********************************************************************
* C_OUTLINE::C_OUTLINE
*
* Constructor to build a C_OUTLINE from a rotation of a C_OUTLINE.
**********************************************************************/
C_OUTLINE::C_OUTLINE( //constructor
C_OUTLINE *srcline, //outline to
FCOORD rotation //rotate
) {
BOX new_box; //easy bounding
INT16 stepindex; //index to step
INT16 dirdiff; //direction change
ICOORD pos; //current position
ICOORD prevpos; //previous dest point
ICOORD destpos; //destination point
INT16 destindex; //index to step
DIR128 dir; //coded direction
UINT8 new_step;
stepcount = srcline->stepcount * 2;
//get memory
steps = (UINT8 *) alloc_mem (step_mem());
memset(steps, 0, step_mem());
for (int iteration = 0; iteration < 2; ++iteration) {
DIR128 round1 = iteration == 0 ? 32 : 0;
DIR128 round2 = iteration != 0 ? 32 : 0;
pos = srcline->start;
prevpos = pos;
prevpos.rotate (rotation);
start = prevpos;
box = BOX (start, start);
destindex = 0;
for (stepindex = 0; stepindex < srcline->stepcount; stepindex++) {
pos += srcline->step (stepindex);
destpos = pos;
destpos.rotate (rotation);
if (destpos.x () != prevpos.x () || destpos.y () != prevpos.y ()) {
dir = DIR128 (FCOORD (destpos - prevpos));
dir += 64; //turn to step style
new_step = dir.get_dir ();
if (new_step & 31) {
set_step(destindex++, dir + round1);
if (destindex < 2
|| (dirdiff =
step_dir (destindex - 1) - step_dir (destindex - 2)) !=
-64 && dirdiff != 64)
set_step(destindex++, dir + round2);
else {
set_step(destindex - 1, dir + round2);
set_step(destindex++, dir + round1);
}
}
else {
set_step(destindex++, dir);
if (destindex >= 2
&&
((dirdiff =
step_dir (destindex - 1) - step_dir (destindex - 2)) ==
-64 || dirdiff == 64))
destindex -= 2; // Forget u turn
}
prevpos = destpos;
new_box = BOX (destpos, destpos);
box += new_box;
}
}
ASSERT_HOST (destpos.x () == start.x () && destpos.y () == start.y ());
dirdiff = step_dir (destindex - 1) - step_dir (0);
while ((dirdiff == 64 || dirdiff == -64) && destindex > 1) {
start += step (0);
destindex -= 2;
for (int i = 0; i < destindex; ++i)
set_step(i, step_dir(i + 1));
dirdiff = step_dir (destindex - 1) - step_dir (0);
}
if (destindex >= 4)
break;
}
stepcount = destindex;
destpos = start;
for (stepindex = 0; stepindex < stepcount; stepindex++) {
destpos += step (stepindex);
}
ASSERT_HOST (destpos.x () == start.x () && destpos.y () == start.y ());
}
/**********************************************************************
* C_OUTLINE::area
*
* Compute the area of the outline, including the areas of its children.
**********************************************************************/
INT32 C_OUTLINE::area() { //compute area
int stepindex; //current step
INT32 total_steps; //steps to do
INT32 total; //total area
ICOORD pos; //position of point
ICOORD next_step; //step to next pix
C_OUTLINE_IT it = child ();
pos = start_pos ();
total_steps = pathlength ();
total = 0;
for (stepindex = 0; stepindex < total_steps; stepindex++) {
//all intersected
next_step = step (stepindex);
if (next_step.x () < 0)
total += pos.y ();
else if (next_step.x () > 0)
total -= pos.y ();
pos += next_step;
}
for (it.mark_cycle_pt (); !it.cycled_list (); it.forward ())
total += it.data ()->area ();//add areas of children
return total;
}
/**********************************************************************
* C_OUTLINE::outer_area
*
* Compute the area of the outer outline only, ignoring any children.
**********************************************************************/
INT32 C_OUTLINE::outer_area() { //compute outer area
int stepindex; //current step
INT32 total_steps; //steps to do
INT32 total; //total area
ICOORD pos; //position of point
ICOORD next_step; //step to next pix
pos = start_pos ();
total_steps = pathlength ();
total = 0;
for (stepindex = 0; stepindex < total_steps; stepindex++) {
//all intersected
next_step = step (stepindex);
if (next_step.x () < 0)
total += pos.y ();
else if (next_step.x () > 0)
total -= pos.y ();
pos += next_step;
}
return total;
}
/**********************************************************************
* C_OUTLINE::count_transitions
*
* Compute the number of x and y maxes and mins in the outline.
**********************************************************************/
INT32 C_OUTLINE::count_transitions( //count maxima and minima
INT32 threshold //on size
) {
BOOL8 first_was_max_x; //what was first
BOOL8 first_was_max_y;
BOOL8 looking_for_max_x; //what is next
BOOL8 looking_for_min_x;
BOOL8 looking_for_max_y; //what is next
BOOL8 looking_for_min_y;
int stepindex; //current step
INT32 total_steps; //steps to do
//current limits
INT32 max_x, min_x, max_y, min_y;
INT32 initial_x, initial_y; //initial limits
INT32 total; //total changes
ICOORD pos; //position of point
ICOORD next_step; //step to next pix
pos = start_pos ();
total_steps = pathlength ();
total = 0;
max_x = min_x = pos.x ();
max_y = min_y = pos.y ();
looking_for_max_x = TRUE;
looking_for_min_x = TRUE;
looking_for_max_y = TRUE;
looking_for_min_y = TRUE;
first_was_max_x = FALSE;
first_was_max_y = FALSE;
initial_x = pos.x ();
initial_y = pos.y (); //stop uninit warning
for (stepindex = 0; stepindex < total_steps; stepindex++) {
//all intersected
next_step = step (stepindex);
pos += next_step;
if (next_step.x () < 0) {
if (looking_for_max_x && pos.x () < min_x)
min_x = pos.x ();
if (looking_for_min_x && max_x - pos.x () > threshold) {
if (looking_for_max_x) {
initial_x = max_x;
first_was_max_x = FALSE;
}
total++;
looking_for_max_x = TRUE;
looking_for_min_x = FALSE;
min_x = pos.x (); //reset min
}
}
else if (next_step.x () > 0) {
if (looking_for_min_x && pos.x () > max_x)
max_x = pos.x ();
if (looking_for_max_x && pos.x () - min_x > threshold) {
if (looking_for_min_x) {
initial_x = min_x; //remember first min
first_was_max_x = TRUE;
}
total++;
looking_for_max_x = FALSE;
looking_for_min_x = TRUE;
max_x = pos.x ();
}
}
else if (next_step.y () < 0) {
if (looking_for_max_y && pos.y () < min_y)
min_y = pos.y ();
if (looking_for_min_y && max_y - pos.y () > threshold) {
if (looking_for_max_y) {
initial_y = max_y; //remember first max
first_was_max_y = FALSE;
}
total++;
looking_for_max_y = TRUE;
looking_for_min_y = FALSE;
min_y = pos.y (); //reset min
}
}
else {
if (looking_for_min_y && pos.y () > max_y)
max_y = pos.y ();
if (looking_for_max_y && pos.y () - min_y > threshold) {
if (looking_for_min_y) {
initial_y = min_y; //remember first min
first_was_max_y = TRUE;
}
total++;
looking_for_max_y = FALSE;
looking_for_min_y = TRUE;
max_y = pos.y ();
}
}
}
if (first_was_max_x && looking_for_min_x) {
if (max_x - initial_x > threshold)
total++;
else
total--;
}
else if (!first_was_max_x && looking_for_max_x) {
if (initial_x - min_x > threshold)
total++;
else
total--;
}
if (first_was_max_y && looking_for_min_y) {
if (max_y - initial_y > threshold)
total++;
else
total--;
}
else if (!first_was_max_y && looking_for_max_y) {
if (initial_y - min_y > threshold)
total++;
else
total--;
}
return total;
}
/**********************************************************************
* C_OUTLINE::operator<
*
* Return TRUE if the left operand is inside the right one.
**********************************************************************/
BOOL8
C_OUTLINE::operator< ( //containment test
const C_OUTLINE & other //other outline
) const
{
INT16 count = 0; //winding count
ICOORD pos; //position of point
INT32 stepindex; //index to cstep
if (!box.overlap (other.box))
return FALSE; //can't be contained
pos = start;
for (stepindex = 0; stepindex < stepcount
&& (count = other.winding_number (pos)) == INTERSECTING; stepindex++)
pos += step (stepindex); //try all points
if (count == INTERSECTING) {
//all intersected
pos = other.start;
for (stepindex = 0; stepindex < other.stepcount
&& (count = winding_number (pos)) == INTERSECTING; stepindex++)
//try other way round
pos += other.step (stepindex);
return count == INTERSECTING || count == 0;
}
return count != 0;
}
/**********************************************************************
* C_OUTLINE::winding_number
*
* Return the winding number of the outline around the given point.
**********************************************************************/
INT16 C_OUTLINE::winding_number( //winding number
ICOORD point //point to wind around
) const {
INT16 stepindex; //index to cstep
INT16 count; //winding count
ICOORD vec; //to current point
ICOORD stepvec; //step vector
INT32 cross; //cross product
vec = start - point; //vector to it
count = 0;
for (stepindex = 0; stepindex < stepcount; stepindex++) {
stepvec = step (stepindex); //get the step
//crossing the line
if (vec.y () <= 0 && vec.y () + stepvec.y () > 0) {
cross = vec * stepvec; //cross product
if (cross > 0)
count++; //crossing right half
else if (cross == 0)
return INTERSECTING; //going through point
}
else if (vec.y () > 0 && vec.y () + stepvec.y () <= 0) {
cross = vec * stepvec;
if (cross < 0)
count--; //crossing back
else if (cross == 0)
return INTERSECTING; //illegal
}
vec += stepvec; //sum vectors
}
return count; //winding number
}
/**********************************************************************
* C_OUTLINE::turn_direction
*
* Return the sum direction delta of the outline.
**********************************************************************/
INT16 C_OUTLINE::turn_direction() const { //sum of direction changes
DIR128 prevdir; //previous direction
DIR128 dir; //current direction
INT16 stepindex; //index to cstep
INT8 dirdiff; //direction difference
INT16 count; //winding count
count = 0;
prevdir = step_dir (stepcount - 1);
for (stepindex = 0; stepindex < stepcount; stepindex++) {
dir = step_dir (stepindex);
dirdiff = dir - prevdir;
ASSERT_HOST (dirdiff == 0 || dirdiff == 32 || dirdiff == -32);
count += dirdiff;
prevdir = dir;
}
ASSERT_HOST (count == 128 || count == -128);
return count; //winding number
}
/**********************************************************************
* C_OUTLINE::reverse
*
* Reverse the direction of an outline.
**********************************************************************/
void C_OUTLINE::reverse() { //reverse direction
DIR128 halfturn = MODULUS / 2; //amount to shift
DIR128 stepdir; //direction of step
INT16 stepindex; //index to cstep
INT16 farindex; //index to other side
INT16 halfsteps; //half of stepcount
halfsteps = (stepcount + 1) / 2;
for (stepindex = 0; stepindex < halfsteps; stepindex++) {
farindex = stepcount - stepindex - 1;
stepdir = step_dir (stepindex);
set_step (stepindex, step_dir (farindex) + halfturn);
set_step (farindex, stepdir + halfturn);
}
}
/**********************************************************************
* C_OUTLINE::move
*
* Move C_OUTLINE by vector
**********************************************************************/
void C_OUTLINE::move( // reposition OUTLINE
const ICOORD vec // by vector
) {
C_OUTLINE_IT it(&children); // iterator
box.move (vec);
start += vec;
for (it.mark_cycle_pt (); !it.cycled_list (); it.forward ())
it.data ()->move (vec); // move child outlines
}
/**********************************************************************
* C_OUTLINE::plot
*
* Draw the outline in the given colour.
**********************************************************************/
#ifndef GRAPHICS_DISABLED
void C_OUTLINE::plot( //draw it
WINDOW window, //window to draw in
COLOUR colour //colour to draw in
) const {
INT16 stepindex; //index to cstep
ICOORD pos; //current position
DIR128 stepdir; //direction of step
DIR128 oldstepdir; //previous stepdir
pos = start; //current position
line_color_index(window, colour);
move2d (window, pos.x (), pos.y ());
stepindex = 0;
stepdir = step_dir (0); //get direction
while (stepindex < stepcount) {
do {
pos += step (stepindex); //step to next
stepindex++; //count steps
oldstepdir = stepdir;
//new direction
if (stepindex < stepcount) //don't read past the last step
stepdir = step_dir (stepindex);
}
while (stepindex < stepcount
&& oldstepdir.get_dir () == stepdir.get_dir ());
//merge straight lines
draw2d (window, pos.x (), pos.y ());
}
}
#endif
/**********************************************************************
* C_OUTLINE::operator=
*
* Assignment - deep copy data
**********************************************************************/
//assignment
C_OUTLINE & C_OUTLINE::operator= (
const C_OUTLINE & source //from this
) {
if (this == &source)
return *this; //self-assignment: nothing to copy
box = source.box;
start = source.start;
if (steps != NULL)
free_mem(steps);
stepcount = source.stepcount;
steps = (UINT8 *) alloc_mem (step_mem());
memmove (steps, source.steps, step_mem());
if (!children.empty ())
children.clear ();
children.deep_copy (&source.children);
return *this;
}
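
The area computation in C_OUTLINE::area and C_OUTLINE::outer_area walks the chain code once and accumulates the current y for every horizontal step: added on leftward steps, subtracted on rightward steps. The sketch below reproduces that loop on a plain vector of chain codes so it can be tried in isolation. It assumes the same code ordering as step_coords (0 = left, 1 = down, 2 = right, 3 = up); the names Step, kStepCoords and chain_area are hypothetical.

```cpp
#include <cassert>
#include <vector>

struct Step {
  int dx;
  int dy;
};

// Same ordering as C_OUTLINE::step_coords: left, down, right, up.
static const Step kStepCoords[4] = {{-1, 0}, {0, -1}, {1, 0}, {0, 1}};

// Signed area enclosed by a closed chain-code loop that starts at (x, y).
// With this convention a counter-clockwise loop gives a positive result.
int chain_area(int x, int y, const std::vector<int> &codes) {
  int total = 0;
  for (int code : codes) {
    assert(code >= 0 && code < 4);
    const Step &s = kStepCoords[code];
    if (s.dx < 0)
      total += y;  // leftward step: add the current height
    else if (s.dx > 0)
      total -= y;  // rightward step: subtract the current height
    x += s.dx;
    y += s.dy;
  }
  return total;
}
```

The result does not depend on the starting point as long as the loop is closed, because the leftward and rightward steps balance out and any vertical offset cancels.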

176
ccstruct/coutln.h Normal file

@ -0,0 +1,176 @@
/**********************************************************************
* File: coutln.h (Formerly: coutline.c)
* Description: Declaration of the C_OUTLINE class.
* Author: Ray Smith
* Created: Mon Oct 07 16:01:57 BST 1991
*
* (C) Copyright 1991, Hewlett-Packard Ltd.
** Licensed under the Apache License, Version 2.0 (the "License");
** you may not use this file except in compliance with the License.
** You may obtain a copy of the License at
** http://www.apache.org/licenses/LICENSE-2.0
** Unless required by applicable law or agreed to in writing, software
** distributed under the License is distributed on an "AS IS" BASIS,
** WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
** See the License for the specific language governing permissions and
** limitations under the License.
*
**********************************************************************/
#ifndef COUTLN_H
#define COUTLN_H
#include "grphics.h"
#include "crakedge.h"
#include "mod128.h"
#include "bits16.h"
#include "rect.h"
#include "blckerr.h"
#define INTERSECTING MAX_INT16//no winding number
//mask to get step
#define STEP_MASK 3
enum C_OUTLINE_FLAGS
{
COUT_INVERSE //White on black blob
};
class DLLSYM C_OUTLINE; //forward declaration
ELISTIZEH_S (C_OUTLINE)
class DLLSYM C_OUTLINE:public ELIST_LINK
{
public:
C_OUTLINE() { //empty constructor
steps = NULL;
}
C_OUTLINE( //constructor
CRACKEDGE *startpt, //from edge detector
ICOORD bot_left, //bounding box
ICOORD top_right,
INT16 length); //length of loop
C_OUTLINE(ICOORD startpt, //start of loop
DIR128 *new_steps, //steps in loop
INT16 length); //length of loop
//outline to copy
C_OUTLINE(C_OUTLINE *srcline, FCOORD rotation); //and rotate
~C_OUTLINE () { //destructor
if (steps != NULL)
free_mem(steps);
steps = NULL;
}
BOOL8 flag( //test flag
C_OUTLINE_FLAGS mask) const { //flag to test
return flags.bit (mask);
}
void set_flag( //set flag value
C_OUTLINE_FLAGS mask, //flag to set
BOOL8 value) { //value to set
flags.set_bit (mask, value);
}
C_OUTLINE_LIST *child() { //get child list
return &children;
}
//access function
const BOX &bounding_box() const {
return box;
}
void set_step( //set a step
INT16 stepindex, //index of step
INT8 stepdir) { //chain code
int shift = stepindex%4 * 2;
UINT8 mask = 3 << shift;
steps[stepindex/4] = ((stepdir << shift) & mask) |
(steps[stepindex/4] & ~mask);
//squeeze 4 into byte
}
void set_step( //set a step
INT16 stepindex, //index of step
DIR128 stepdir) { //direction
//clean it
INT8 chaindir = stepdir.get_dir() >> (DIRBITS - 2);
//difference
set_step(stepindex, chaindir);
//squeeze 4 into byte
}
//get start position
const ICOORD &start_pos() const {
return start;
}
INT32 pathlength() const { //get path length
return stepcount;
}
// Return step at a given index as a DIR128.
DIR128 step_dir(INT16 index) const {
return DIR128((INT16)(((steps[index/4] >> (index%4 * 2)) & STEP_MASK) <<
(DIRBITS - 2)));
}
// Return the step vector for the given outline position.
ICOORD step(INT16 index) const { //index of step
return step_coords[(steps[index/4] >> (index%4 * 2)) & STEP_MASK];
}
INT32 area(); //return area
INT32 outer_area(); //return area
INT32 count_transitions( //count maxima
INT32 threshold); //size threshold
BOOL8 operator< ( //containment test
const C_OUTLINE & other) const;
BOOL8 operator> ( //containment test
C_OUTLINE & other) const
{
return other < *this; //use the < to do it
}
INT16 winding_number( //get winding number
ICOORD testpt) const; //around this point
//get direction
INT16 turn_direction() const;
void reverse(); //reverse direction
void move( // reposition outline
const ICOORD vec); // by vector
void plot( //draw one
WINDOW window, //window to draw in
COLOUR colour) const; //colour to draw it
void prep_serialise() { //set ptrs to counts
children.prep_serialise ();
}
void dump( //write external bits
FILE *f) {
//step_mem() = # bytes (4 steps per byte)
serialise_bytes (f, (void *) steps, step_mem());
children.dump (f);
}
void de_dump( //read external bits
FILE *f) {
steps = (UINT8 *) de_serialise_bytes (f, step_mem());
children.de_dump (f);
}
//assignment
make_serialise (C_OUTLINE) C_OUTLINE & operator= (
const C_OUTLINE & source); //from this
private:
int step_mem() const { return (stepcount+3) / 4; }
BOX box; //bounding box
ICOORD start; //start coord
UINT8 *steps; //step array
INT16 stepcount; //no of steps
BITS16 flags; //flags about outline
C_OUTLINE_LIST children; //child elements
static ICOORD step_coords[4];
};
#endif
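
The set_step, step and step_mem members above store each step as a 2-bit code, four to a byte. The following standalone sketch shows the same packing arithmetic on a std::vector instead of the class's raw buffer; packed_bytes, set_code and get_code are hypothetical names, and the vector is an assumption made to keep the example self-contained.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Bytes needed to hold `count` 2-bit codes, as in step_mem().
inline int packed_bytes(int count) { return (count + 3) / 4; }

// Store a 2-bit code at position `index`, leaving its neighbours untouched.
inline void set_code(std::vector<uint8_t> &steps, int index, int code) {
  assert(index >= 0 && index / 4 < static_cast<int>(steps.size()));
  const int shift = (index % 4) * 2;
  const uint8_t mask = static_cast<uint8_t>(3 << shift);
  steps[index / 4] = static_cast<uint8_t>(((code << shift) & mask) |
                                          (steps[index / 4] & ~mask));
}

// Read back the 2-bit code stored at position `index`.
inline int get_code(const std::vector<uint8_t> &steps, int index) {
  assert(index >= 0 && index / 4 < static_cast<int>(steps.size()));
  return (steps[index / 4] >> ((index % 4) * 2)) & 3;
}
```

Masking before the OR is what allows a step to be overwritten in place, which the constructors rely on when they cancel there-and-back steps.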

39
ccstruct/crakedge.h Normal file

@ -0,0 +1,39 @@
/**********************************************************************
* File: crakedge.h (Formerly: crkedge.h)
* Description: Structures for the crack-following edge detector.
* Author: Ray Smith
* Created: Fri Mar 22 16:06:38 GMT 1991
*
* (C) Copyright 1991, Hewlett-Packard Ltd.
** Licensed under the Apache License, Version 2.0 (the "License");
** you may not use this file except in compliance with the License.
** You may obtain a copy of the License at
** http://www.apache.org/licenses/LICENSE-2.0
** Unless required by applicable law or agreed to in writing, software
** distributed under the License is distributed on an "AS IS" BASIS,
** WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
** See the License for the specific language governing permissions and
** limitations under the License.
*
**********************************************************************/
#ifndef CRAKEDGE_H
#define CRAKEDGE_H
#include "points.h"
#include "mod128.h"
class CRACKEDGE
{
public:
ICOORD pos; /*position of crack */
INT8 stepx; //edge step
INT8 stepy;
INT8 stepdir; //chaincode
CRACKEDGE *prev; /*previous point */
CRACKEDGE *next; /*next point */
NEWDELETE2 (CRACKEDGE) CRACKEDGE () {
} //empty constructor
};
#endif
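
The first C_OUTLINE constructor consumes a loop of CRACKEDGE nodes by following the next pointers for a known number of steps and recording each stepdir. The sketch below illustrates that traversal with a simplified node type. Edge and collect_codes are hypothetical stand-ins, not part of Tesseract, and the sketch assumes the loop is circular and contains at least `length` nodes.

```cpp
#include <cassert>
#include <vector>

// Simplified stand-in for CRACKEDGE: only the fields the traversal needs.
struct Edge {
  int stepdir;  // chain code, 0..3
  Edge *next;   // next edge in the circular loop
};

// Walk `length` edges from `start` and collect their chain codes.
std::vector<int> collect_codes(const Edge *start, int length) {
  assert(start != nullptr && length >= 0);
  std::vector<int> codes;
  codes.reserve(length);
  const Edge *edge = start;
  for (int i = 0; i < length; ++i) {
    codes.push_back(edge->stepdir & 3);  // keep only the 2-bit code
    edge = edge->next;
  }
  return codes;
}
```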

133
ccstruct/genblob.cpp Normal file

@ -0,0 +1,133 @@
/**********************************************************************
* File: genblob.cpp (Formerly gblob.c)
* Description: Generic Blob processing routines
* Author: Phil Cheatle
* Created: Mon Nov 25 10:53:26 GMT 1991
*
* (C) Copyright 1991, Hewlett-Packard Ltd.
** Licensed under the Apache License, Version 2.0 (the "License");
** you may not use this file except in compliance with the License.
** You may obtain a copy of the License at
** http://www.apache.org/licenses/LICENSE-2.0
** Unless required by applicable law or agreed to in writing, software
** distributed under the License is distributed on an "AS IS" BASIS,
** WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
** See the License for the specific language governing permissions and
** limitations under the License.
*
**********************************************************************/
#include "mfcpch.h"
#include "stepblob.h"
#include "polyblob.h"
#include "genblob.h"
/**********************************************************************
* blob_comparator()
*
* Blob comparator used to sort a blob list so that blobs are in increasing
* order of left edge.
**********************************************************************/
int blob_comparator( //sort blobs
const void *blob1p, //ptr to ptr to blob1
const void *blob2p //ptr to ptr to blob2
) {
PBLOB *blob1 = *(PBLOB **) blob1p;
PBLOB *blob2 = *(PBLOB **) blob2p;
return blob1->bounding_box ().left () - blob2->bounding_box ().left ();
}
/**********************************************************************
* c_blob_comparator()
*
* Blob comparator used to sort a blob list so that blobs are in increasing
* order of left edge.
**********************************************************************/
int c_blob_comparator( //sort blobs
const void *blob1p, //ptr to ptr to blob1
const void *blob2p //ptr to ptr to blob2
) {
C_BLOB *blob1 = *(C_BLOB **) blob1p;
C_BLOB *blob2 = *(C_BLOB **) blob2p;
return blob1->bounding_box ().left () - blob2->bounding_box ().left ();
}
/**********************************************************************
* gblob_bounding_box()
*
* Return the bounding box of a generic blob.
**********************************************************************/
BOX gblob_bounding_box( //Get bounding box
PBLOB *blob, //generic blob
BOOL8 polygonal //is blob polygonal?
) {
if (polygonal)
return blob->bounding_box ();
else
return ((C_BLOB *) blob)->bounding_box ();
}
/**********************************************************************
* gblob_sort_list()
*
* Sort a generic blob list into order of bounding box left edge
**********************************************************************/
void gblob_sort_list( //Sort a gblob list
PBLOB_LIST *blob_list, //generic blob list
BOOL8 polygonal //is list polygonal?
) {
PBLOB_IT b_it;
C_BLOB_IT c_it;
if (polygonal) {
b_it.set_to_list (blob_list);
b_it.sort (blob_comparator);
}
else {
c_it.set_to_list ((C_BLOB_LIST *) blob_list);
c_it.sort (c_blob_comparator);
}
}
/**********************************************************************
* gblob_out_list()
*
* Return the generic outline list of a generic blob.
**********************************************************************/
OUTLINE_LIST *gblob_out_list( //Get outline list
PBLOB *blob, //generic blob
BOOL8 polygonal //is blob polygonal?
) {
if (polygonal)
return blob->out_list ();
else
return (OUTLINE_LIST *) ((C_BLOB *) blob)->out_list ();
}
/**********************************************************************
* goutline_bounding_box()
*
* Return the bounding box of a generic outline.
**********************************************************************/
BOX goutline_bounding_box( //Get bounding box
OUTLINE *outline, //generic outline
BOOL8 polygonal //is outline polygonal?
) {
if (polygonal)
return outline->bounding_box ();
else
return ((C_OUTLINE *) outline)->bounding_box ();
}
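
The comparators above take pointers to pointers, which is the shape required when sorting an array of object pointers with a qsort-style routine. The sketch below shows that pattern with std::qsort directly. It is an assumption that the list sort in this code base works on such an array; Blob, blob_ptr_comparator and sort_by_left are hypothetical names. The comparison form used here avoids relying on a subtraction result fitting in an int, which is safe for 16-bit coordinates but not for arbitrary values.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>

// Hypothetical stand-in for a blob: only the left edge of its bounding box.
struct Blob {
  int left;
};

// qsort comparator over an array of Blob pointers (ptr to ptr to blob).
int blob_ptr_comparator(const void *a, const void *b) {
  const Blob *blob1 = *static_cast<const Blob *const *>(a);
  const Blob *blob2 = *static_cast<const Blob *const *>(b);
  return (blob1->left > blob2->left) - (blob1->left < blob2->left);
}

// Sort an array of blob pointers into increasing order of left edge.
void sort_by_left(Blob **blobs, std::size_t count) {
  assert(blobs != nullptr || count == 0);
  std::qsort(blobs, count, sizeof(Blob *), blob_ptr_comparator);
}
```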

52
ccstruct/genblob.h Normal file

@ -0,0 +1,52 @@
/**********************************************************************
* File: genblob.h (Formerly gblob.h)
* Description: Generic Blob processing routines
* Author: Phil Cheatle
* Created: Mon Nov 25 10:53:26 GMT 1991
*
* (C) Copyright 1991, Hewlett-Packard Ltd.
** Licensed under the Apache License, Version 2.0 (the "License");
** you may not use this file except in compliance with the License.
** You may obtain a copy of the License at
** http://www.apache.org/licenses/LICENSE-2.0
** Unless required by applicable law or agreed to in writing, software
** distributed under the License is distributed on an "AS IS" BASIS,
** WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
** See the License for the specific language governing permissions and
** limitations under the License.
*
**********************************************************************/
#ifndef GENBLOB_H
#define GENBLOB_H
#include "polyblob.h"
#include "hosthplb.h"
#include "rect.h"
#include "notdll.h"
int blob_comparator( //sort blobs
const void *blob1p, //ptr to ptr to blob1
const void *blob2p //ptr to ptr to blob2
);
int c_blob_comparator( //sort blobs
const void *blob1p, //ptr to ptr to blob1
const void *blob2p //ptr to ptr to blob2
);
BOX gblob_bounding_box( //Get bounding box
PBLOB *blob, //generic blob
BOOL8 polygonal //is blob polygonal?
);
void gblob_sort_list( //Sort a gblob list
PBLOB_LIST *blob_list, //generic blob list
BOOL8 polygonal //is list polygonal?
);
OUTLINE_LIST *gblob_out_list( //Get outline list
PBLOB *blob, //generic blob
BOOL8 polygonal //is blob polygonal?
);
BOX goutline_bounding_box( //Get bounding box
OUTLINE *outline, //generic outline
BOOL8 polygonal //is outline polygonal?
);
#endif

39
ccstruct/hpddef.h Normal file

@ -0,0 +1,39 @@
/**********************************************************************
* File: hpddef.h
* Description: Defines for dll symbols for handpd.dll.
* Author: Ray Smith
* Created: Tue Apr 30 17:15:01 MDT 1996
*
* (C) Copyright 1996, Hewlett-Packard Co.
** Licensed under the Apache License, Version 2.0 (the "License");
** you may not use this file except in compliance with the License.
** You may obtain a copy of the License at
** http://www.apache.org/licenses/LICENSE-2.0
** Unless required by applicable law or agreed to in writing, software
** distributed under the License is distributed on an "AS IS" BASIS,
** WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
** See the License for the specific language governing permissions and
** limitations under the License.
*
**********************************************************************/
//This file does NOT use the usual single inclusion code as it
//is necessary to allow it to be executed every time it is included.
//#ifndef HPDDEF_H
//#define HPDDEF_H
#undef DLLSYM
#ifndef __IPEDLL
# define DLLSYM
#else
# ifdef __BUILDING_HANDPD__
# define DLLSYM DLLEXPORT
# else
# define DLLSYM DLLIMPORT
# endif
#endif
#if defined(__CFM68K__) && !defined(__USING_STATIC_LIBS__)
# pragma import on
#endif
//#endif

8
ccstruct/hpdsizes.h Normal file

@ -0,0 +1,8 @@
#ifndef HPDSIZES_H
#define HPDSIZES_H
#define NUM_TEXT_ATTR 10
#define NUM_BLOCK_ATTR 7
#define MAXLENGTH 128
#define NUM_BACKGROUNDS 8
#endif

479
ccstruct/ipoints.h Normal file

@ -0,0 +1,479 @@
/**********************************************************************
* File: ipoints.h (Formerly icoords.h)
* Description: Inline functions for coords.h.
* Author: Ray Smith
* Created: Fri Jun 21 15:14:21 BST 1991
*
* (C) Copyright 1991, Hewlett-Packard Ltd.
** Licensed under the Apache License, Version 2.0 (the "License");
** you may not use this file except in compliance with the License.
** You may obtain a copy of the License at
** http://www.apache.org/licenses/LICENSE-2.0
** Unless required by applicable law or agreed to in writing, software
** distributed under the License is distributed on an "AS IS" BASIS,
** WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
** See the License for the specific language governing permissions and
** limitations under the License.
*
**********************************************************************/
#ifndef IPOINTS_H
#define IPOINTS_H
#include <math.h>
/**********************************************************************
* operator!
*
* Rotate an ICOORD 90 degrees anticlockwise.
**********************************************************************/
inline ICOORD
operator! ( //rotate 90 deg anti
const ICOORD & src //thing to rotate
) {
ICOORD result; //output
result.xcoord = -src.ycoord;
result.ycoord = src.xcoord;
return result;
}
/**********************************************************************
* operator-
*
* Unary minus of an ICOORD.
**********************************************************************/
inline ICOORD
operator- ( //unary minus
const ICOORD & src //thing to minus
) {
ICOORD result; //output
result.xcoord = -src.xcoord;
result.ycoord = -src.ycoord;
return result;
}
/**********************************************************************
* operator+
*
* Add 2 ICOORDS.
**********************************************************************/
inline ICOORD
operator+ ( //sum vectors
const ICOORD & op1, //operands
const ICOORD & op2) {
ICOORD sum; //result
sum.xcoord = op1.xcoord + op2.xcoord;
sum.ycoord = op1.ycoord + op2.ycoord;
return sum;
}
/**********************************************************************
* operator+=
*
* Add 2 ICOORDS.
**********************************************************************/
inline ICOORD &
operator+= ( //sum vectors
ICOORD & op1, //operands
const ICOORD & op2) {
op1.xcoord += op2.xcoord;
op1.ycoord += op2.ycoord;
return op1;
}
/**********************************************************************
* operator-
*
* Subtract 2 ICOORDS.
**********************************************************************/
inline ICOORD
operator- ( //subtract vectors
const ICOORD & op1, //operands
const ICOORD & op2) {
ICOORD sum; //result
sum.xcoord = op1.xcoord - op2.xcoord;
sum.ycoord = op1.ycoord - op2.ycoord;
return sum;
}
/**********************************************************************
* operator-=
*
* Subtract 2 ICOORDS.
**********************************************************************/
inline ICOORD &
operator-= ( //subtract vectors
ICOORD & op1, //operands
const ICOORD & op2) {
op1.xcoord -= op2.xcoord;
op1.ycoord -= op2.ycoord;
return op1;
}
/**********************************************************************
* operator%
*
* Scalar product of 2 ICOORDS.
**********************************************************************/
inline INT32
operator% ( //scalar product
const ICOORD & op1, //operands
const ICOORD & op2) {
return op1.xcoord * op2.xcoord + op1.ycoord * op2.ycoord;
}
/**********************************************************************
* operator*
*
* Cross product of 2 ICOORDS.
**********************************************************************/
inline INT32 operator *( //cross product
const ICOORD &op1, //operands
const ICOORD &op2) {
return op1.xcoord * op2.ycoord - op1.ycoord * op2.xcoord;
}
/**********************************************************************
* operator*
*
* Scalar multiply of an ICOORD.
**********************************************************************/
inline ICOORD operator *( //scalar multiply
const ICOORD &op1, //operands
INT16 scale) {
ICOORD result; //output
result.xcoord = op1.xcoord * scale;
result.ycoord = op1.ycoord * scale;
return result;
}
inline ICOORD operator *( //scalar multiply
INT16 scale,
const ICOORD &op1 //operands
) {
ICOORD result; //output
result.xcoord = op1.xcoord * scale;
result.ycoord = op1.ycoord * scale;
return result;
}
/**********************************************************************
* operator*=
*
* Scalar multiply of an ICOORD.
**********************************************************************/
inline ICOORD &
operator*= ( //scalar multiply
ICOORD & op1, //operands
INT16 scale) {
op1.xcoord *= scale;
op1.ycoord *= scale;
return op1;
}
/**********************************************************************
* operator/
*
* Scalar divide of an ICOORD.
**********************************************************************/
inline ICOORD
operator/ ( //scalar divide
const ICOORD & op1, //operands
INT16 scale) {
ICOORD result; //output
result.xcoord = op1.xcoord / scale;
result.ycoord = op1.ycoord / scale;
return result;
}
/**********************************************************************
* operator/=
*
* Scalar divide of an ICOORD.
**********************************************************************/
inline ICOORD &
operator/= ( //scalar divide
ICOORD & op1, //operands
INT16 scale) {
op1.xcoord /= scale;
op1.ycoord /= scale;
return op1;
}
/**********************************************************************
* ICOORD::rotate
*
* Rotate an ICOORD by the given (normalized) (cos,sin) vector.
**********************************************************************/
inline void ICOORD::rotate( //rotate by vector
const FCOORD& vec) {
INT16 tmp;
tmp = (INT16) floor (xcoord * vec.x () - ycoord * vec.y () + 0.5);
ycoord = (INT16) floor (ycoord * vec.x () + xcoord * vec.y () + 0.5);
xcoord = tmp;
}
/**********************************************************************
* operator!
*
* Rotate an FCOORD 90 degrees anticlockwise.
**********************************************************************/
inline FCOORD
operator! ( //rotate 90 deg anti
const FCOORD & src //thing to rotate
) {
FCOORD result; //output
result.xcoord = -src.ycoord;
result.ycoord = src.xcoord;
return result;
}
/**********************************************************************
* operator-
*
* Unary minus of an FCOORD.
**********************************************************************/
inline FCOORD
operator- ( //unary minus
const FCOORD & src //thing to minus
) {
FCOORD result; //output
result.xcoord = -src.xcoord;
result.ycoord = -src.ycoord;
return result;
}
/**********************************************************************
* operator+
*
* Add 2 FCOORDS.
**********************************************************************/
inline FCOORD
operator+ ( //sum vectors
const FCOORD & op1, //operands
const FCOORD & op2) {
FCOORD sum; //result
sum.xcoord = op1.xcoord + op2.xcoord;
sum.ycoord = op1.ycoord + op2.ycoord;
return sum;
}
/**********************************************************************
* operator+=
*
* Add 2 FCOORDS.
**********************************************************************/
inline FCOORD &
operator+= ( //sum vectors
FCOORD & op1, //operands
const FCOORD & op2) {
op1.xcoord += op2.xcoord;
op1.ycoord += op2.ycoord;
return op1;
}
/**********************************************************************
* operator-
*
* Subtract 2 FCOORDS.
**********************************************************************/
inline FCOORD
operator- ( //subtract vectors
const FCOORD & op1, //operands
const FCOORD & op2) {
FCOORD sum; //result
sum.xcoord = op1.xcoord - op2.xcoord;
sum.ycoord = op1.ycoord - op2.ycoord;
return sum;
}
/**********************************************************************
* operator-=
*
* Subtract 2 FCOORDS.
**********************************************************************/
inline FCOORD &
operator-= ( //subtract vectors
FCOORD & op1, //operands
const FCOORD & op2) {
op1.xcoord -= op2.xcoord;
op1.ycoord -= op2.ycoord;
return op1;
}
/**********************************************************************
* operator%
*
* Scalar product of 2 FCOORDS.
**********************************************************************/
inline float
operator% ( //scalar product
const FCOORD & op1, //operands
const FCOORD & op2) {
return op1.xcoord * op2.xcoord + op1.ycoord * op2.ycoord;
}
/**********************************************************************
* operator*
*
* Cross product of 2 FCOORDS.
**********************************************************************/
inline float operator *( //cross product
const FCOORD &op1, //operands
const FCOORD &op2) {
return op1.xcoord * op2.ycoord - op1.ycoord * op2.xcoord;
}
/**********************************************************************
* operator*
*
* Scalar multiply of an FCOORD.
**********************************************************************/
inline FCOORD operator *( //scalar multiply
const FCOORD &op1, //operands
float scale) {
FCOORD result; //output
result.xcoord = op1.xcoord * scale;
result.ycoord = op1.ycoord * scale;
return result;
}
inline FCOORD operator *( //scalar multiply
float scale,
const FCOORD &op1 //operands
) {
FCOORD result; //output
result.xcoord = op1.xcoord * scale;
result.ycoord = op1.ycoord * scale;
return result;
}
/**********************************************************************
* operator*=
*
* Scalar multiply of an FCOORD.
**********************************************************************/
inline FCOORD &
operator*= ( //scalar multiply
FCOORD & op1, //operands
float scale) {
op1.xcoord *= scale;
op1.ycoord *= scale;
return op1;
}
/**********************************************************************
* operator/
*
* Scalar divide of an FCOORD.
**********************************************************************/
inline FCOORD
operator/ ( //scalar divide
const FCOORD & op1, //operands
float scale) {
FCOORD result; //output
if (scale != 0) {
result.xcoord = op1.xcoord / scale;
result.ycoord = op1.ycoord / scale;
}
return result;
}
/**********************************************************************
* operator/=
*
* Scalar divide of an FCOORD.
**********************************************************************/
inline FCOORD &
operator/= ( //scalar divide
FCOORD & op1, //operands
float scale) {
if (scale != 0) {
op1.xcoord /= scale;
op1.ycoord /= scale;
}
return op1;
}
/**********************************************************************
* rotate
*
* Rotate an FCOORD by the given (normalized) (cos,sin) vector.
**********************************************************************/
inline void FCOORD::rotate( //rotate by vector
const FCOORD vec) {
float tmp;
tmp = xcoord * vec.x () - ycoord * vec.y ();
ycoord = ycoord * vec.x () + xcoord * vec.y ();
xcoord = tmp;
}
#endif

188
ccstruct/labls.cpp Normal file

@ -0,0 +1,188 @@
/**********************************************************************
* File: labls.c (Formerly labels.c)
* Description: Attribute definition tables
* Author: Sheelagh Lloyd?
* Created:
*
* (C) Copyright 1993, Hewlett-Packard Ltd.
** Licensed under the Apache License, Version 2.0 (the "License");
** you may not use this file except in compliance with the License.
** You may obtain a copy of the License at
** http://www.apache.org/licenses/LICENSE-2.0
** Unless required by applicable law or agreed to in writing, software
** distributed under the License is distributed on an "AS IS" BASIS,
** WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
** See the License for the specific language governing permissions and
** limitations under the License.
*
**********************************************************************/
#include "mfcpch.h"
#include "hpdsizes.h"
#include "labls.h"
/******************************************************************************
* TEXT REGIONS
*****************************************************************************/
DLLSYM INT32 tn[NUM_TEXT_ATTR] = {
3, //T_HORIZONTAL
4, //T_TEXT
2, //T_SERIF
2, //T_PROPORTIONAL
2, //T_NORMAL
2, //T_UPRIGHT
2, //T_SOLID
3, //T_BLACK
2, //T_NOTUNDER
2, //T_NOTDROP
};
DLLSYM char tlabel[NUM_TEXT_ATTR][4][MAXLENGTH] = { {
//T_HORIZONTAL
"Horizontal",
"Vertical",
"Skew",
""
},
{ //T_TEXT
"Text",
"Table",
"Form",
"Mixed"
},
{ //T_SERIF
"Serif",
"Sans-serif",
"",
""
},
{ //T_PROPORTIONAL
"Proportional",
"Fixed pitch",
"",
""
},
{ //T_NORMAL
"Normal",
"Bold",
"",
""
},
{ //T_UPRIGHT
"Upright",
"Italic",
"",
""
},
{ //T_SOLID
"Solid",
"Outline",
"",
""
},
{ //T_BLACK
"Black",
"White",
"Coloured",
""
},
{ //T_NOTUNDER
"Not underlined",
"Underlined",
"",
""
},
{ //T_NOTDROP
"Not drop caps",
"Drop Caps",
"",
""
}
};
DLLSYM INT32 bn[NUM_BLOCK_ATTR] = {
4, //G_MONOCHROME
2, //I_MONOCHROME
2, //I_SMOOTH
3, //R_SINGLE
3, //R_BLACK
3, //S_BLACK
2 //W_TEXT
};
DLLSYM INT32 tvar[NUM_TEXT_ATTR];
DLLSYM INT32 bvar[NUM_BLOCK_ATTR];
DLLSYM char blabel[NUM_BLOCK_ATTR][4][MAXLENGTH] = { {
//G_MONOCHROME
/****************************************************************************
* GRAPHICS
***************************************************************************/
"Monochrome ",
"Two colour ",
"Spot colour",
"Multicolour"
},
/****************************************************************************
* IMAGE
***************************************************************************/
{ //I_MONOCHROME
"Monochrome ",
"Colour ",
"",
""
},
{ //I_SMOOTH
"Smooth ",
"Grainy ",
"",
""
},
/****************************************************************************
* RULES
***************************************************************************/
{ //R_SINGLE
"Single ",
"Double ",
"Multiple",
""
},
{ //R_BLACK
"Black ",
"White ",
"Coloured",
""
},
/****************************************************************************
* SCRIBBLE
***************************************************************************/
{ //S_BLACK
"Black ",
"White ",
"Coloured",
""
},
/****************************************************************************
* WEIRD
***************************************************************************/
{ //W_TEXT
"No text ",
"Contains text",
"",
""
}
};
DLLSYM char backlabel[NUM_BACKGROUNDS][MAXLENGTH] = {
"White", //B_WHITE
"Black", //B_BLACK
"Coloured", //B_COLOURED
"Textured", //B_TEXTURED
"Patterned", //B_PATTERNED
"Gradient fill", //B_GRADIENTFILL
"Image", //B_IMAGE
"Text" //B_TEXT
};

38
ccstruct/labls.h Normal file

@ -0,0 +1,38 @@
/**********************************************************************
* File: labls.h (Formerly labels.h)
* Description: Attribute definition tables
* Author: Sheelagh Lloyd?
* Created:
*
* (C) Copyright 1993, Hewlett-Packard Ltd.
** Licensed under the Apache License, Version 2.0 (the "License");
** you may not use this file except in compliance with the License.
** You may obtain a copy of the License at
** http://www.apache.org/licenses/LICENSE-2.0
** Unless required by applicable law or agreed to in writing, software
** distributed under the License is distributed on an "AS IS" BASIS,
** WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
** See the License for the specific language governing permissions and
** limitations under the License.
*
**********************************************************************/
#ifndef LABLS_H
#define LABLS_H
#include "host.h"
#include "hpdsizes.h"
#include "hpddef.h" //must be last (handpd.dll)
extern DLLSYM INT32 tn[NUM_TEXT_ATTR];
extern DLLSYM char tlabel[NUM_TEXT_ATTR][4][MAXLENGTH];
extern DLLSYM INT32 bn[NUM_BLOCK_ATTR];
extern DLLSYM INT32 tvar[NUM_TEXT_ATTR];
extern DLLSYM INT32 bvar[NUM_BLOCK_ATTR];
extern DLLSYM char blabel[NUM_BLOCK_ATTR][4][MAXLENGTH];
extern DLLSYM char backlabel[NUM_BACKGROUNDS][MAXLENGTH];
#endif

249
ccstruct/linlsq.cpp Normal file

@ -0,0 +1,249 @@
/**********************************************************************
* File: linlsq.cpp (Formerly llsq.c)
* Description: Linear Least squares fitting code.
* Author: Ray Smith
* Created: Thu Sep 12 08:44:51 BST 1991
*
* (C) Copyright 1991, Hewlett-Packard Ltd.
** Licensed under the Apache License, Version 2.0 (the "License");
** you may not use this file except in compliance with the License.
** You may obtain a copy of the License at
** http://www.apache.org/licenses/LICENSE-2.0
** Unless required by applicable law or agreed to in writing, software
** distributed under the License is distributed on an "AS IS" BASIS,
** WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
** See the License for the specific language governing permissions and
** limitations under the License.
*
**********************************************************************/
#include "mfcpch.h"
#include <stdio.h>
#include <math.h>
#include "errcode.h"
#include "linlsq.h"
#ifndef __UNIX__
#define M_PI 3.14159265359
#endif
const ERRCODE EMPTY_LLSQ = "Can't delete from an empty LLSQ";
#define EXTERN
EXTERN double_VAR (pdlsq_posdir_ratio, 4e-6, "Mult of dir to cf pos");
EXTERN double_VAR (pdlsq_threshold_angleavg, 0.1666666,
"Frac of pi for simple fit");
/**********************************************************************
* LLSQ::clear
*
* Function to initialize a LLSQ.
**********************************************************************/
void LLSQ::clear() { //initialize
n = 0; //no elements
sigx = 0; //reset accumulators
sigy = 0;
sigxx = 0;
sigxy = 0;
sigyy = 0;
}
/**********************************************************************
* LLSQ::add
*
* Add an element to the accumulator.
**********************************************************************/
void LLSQ::add( //add an element
double x, //xcoord
double y //ycoord
) {
n++; //count elements
sigx += x; //update accumulators
sigy += y;
sigxx += x * x;
sigxy += x * y;
sigyy += y * y;
}
/**********************************************************************
* LLSQ::remove
*
* Delete an element from the accumulator.
**********************************************************************/
void LLSQ::remove( //delete an element
double x, //xcoord
double y //ycoord
) {
if (n <= 0)
//illegal
EMPTY_LLSQ.error ("LLSQ::remove", ABORT, NULL);
n--; //count elements
sigx -= x; //update accumulators
sigy -= y;
sigxx -= x * x;
sigxy -= x * y;
sigyy -= y * y;
}
/**********************************************************************
* LLSQ::m
*
* Return the gradient of the line fit.
**********************************************************************/
double LLSQ::m() { //get gradient
if (n > 1)
return (sigxy - sigx * sigy / n) / (sigxx - sigx * sigx / n);
else
return 0; //too little
}
/**********************************************************************
* LLSQ::c
*
* Return the constant of the line fit.
**********************************************************************/
double LLSQ::c( //get constant
double m //gradient to fit with
) {
if (n > 0)
return (sigy - m * sigx) / n;
else
return 0; //too little
}
/**********************************************************************
* LLSQ::rms
*
* Return the rms error of the fit.
**********************************************************************/
double LLSQ::rms( //get error
double m, //gradient to fit with
double c //constant to fit with
) {
double error; //total error
if (n > 0) {
error =
sigyy + m * (m * sigxx + 2 * (c * sigx - sigxy)) + c * (n * c -
2 * sigy);
if (error >= 0)
error = sqrt (error / n); //sqrt of mean
else
error = 0;
}
else
error = 0; //too little
return error;
}
/**********************************************************************
* LLSQ::spearman
*
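* Return the correlation coefficient of the accumulated points.
* Despite the name, the formula below is the Pearson product-moment
* correlation computed from the running sums, not a rank correlation.
**********************************************************************/
double LLSQ::spearman() { //get correlation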
double error; //total error
if (n > 1) {
error = (sigxx - sigx * sigx / n) * (sigyy - sigy * sigy / n);
if (error > 0) {
error = (sigxy - sigx * sigy / n) / sqrt (error);
}
else
error = 1;
}
else
error = 1; //too little
return error;
}
/**********************************************************************
* PDLSQ::fit
*
* Return all the parameters of the fit to pos/dir.
* The return value is the rms error.
**********************************************************************/
float PDLSQ::fit( //get fit
DIR128 &ang, //output angle
float &sin_ang, //r,theta parameterisation
float &cos_ang,
float &r) {
double a, b; //intermediates
double angle; //resulting angle
double avg_angle; //simple average
double error; //total error
double sinx, cosx; //return values
if (pos.n > 0) {
a = pos.sigxy - pos.sigx * pos.sigy / pos.n
+ pdlsq_posdir_ratio * dir.sigxy;
b =
pos.sigxx - pos.sigyy + (pos.sigy * pos.sigy -
pos.sigx * pos.sigx) / pos.n +
pdlsq_posdir_ratio * (dir.sigxx - dir.sigyy);
if (dir.sigy != 0 || dir.sigx != 0)
avg_angle = atan2 (dir.sigy, dir.sigx);
else
avg_angle = 0;
if ((a != 0 || b != 0) && pos.n > 1)
angle = atan2 (2 * a, b) / 2;
else
angle = avg_angle;
error = avg_angle - angle;
if (error > M_PI / 2) {
error -= M_PI;
angle += M_PI;
}
if (error < -M_PI / 2) {
error += M_PI;
angle -= M_PI;
}
if (error > M_PI * pdlsq_threshold_angleavg
|| error < -M_PI * pdlsq_threshold_angleavg)
angle = avg_angle; //go simple
//convert direction
ang = (INT16) (angle * MODULUS / (2 * M_PI));
sinx = sin (angle);
cosx = cos (angle);
r = (sinx * pos.sigx - cosx * pos.sigy) / pos.n;
// tprintf("x=%g, y=%g, xx=%g, xy=%g, yy=%g, a=%g, b=%g, ang=%g, r=%g\n",
// pos.sigx,pos.sigy,pos.sigxx,pos.sigxy,pos.sigyy,
// a,b,angle,r);
error = dir.sigxx * sinx * sinx + dir.sigyy * cosx * cosx
- 2 * dir.sigxy * sinx * cosx;
error *= pdlsq_posdir_ratio;
error += sinx * sinx * pos.sigxx + cosx * cosx * pos.sigyy
- 2 * sinx * cosx * pos.sigxy
- 2 * r * (sinx * pos.sigx - cosx * pos.sigy) + r * r * pos.n;
if (error >= 0)
//rms value
error = sqrt (error / pos.n);
else
error = 0; //-0
sin_ang = sinx;
cos_ang = cosx;
}
else {
sin_ang = 0.0f;
cos_ang = 0.0f;
ang = 0;
error = 0; //too little
}
return error;
}

102
ccstruct/linlsq.h Normal file

@ -0,0 +1,102 @@
/**********************************************************************
* File: linlsq.h (Formerly llsq.h)
* Description: Linear Least squares fitting code.
* Author: Ray Smith
* Created: Thu Sep 12 08:44:51 BST 1991
*
* (C) Copyright 1991, Hewlett-Packard Ltd.
** Licensed under the Apache License, Version 2.0 (the "License");
** you may not use this file except in compliance with the License.
** You may obtain a copy of the License at
** http://www.apache.org/licenses/LICENSE-2.0
** Unless required by applicable law or agreed to in writing, software
** distributed under the License is distributed on an "AS IS" BASIS,
** WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
** See the License for the specific language governing permissions and
** limitations under the License.
*
**********************************************************************/
#ifndef LINLSQ_H
#define LINLSQ_H
#include "points.h"
#include "mod128.h"
#include "varable.h"
class LLSQ
{
friend class PDLSQ; //pos & direction
public:
LLSQ() { //constructor
clear(); //set to zeros
}
void clear(); //initialize
void add( //add element
double x, //coords to add
double y);
void remove( //delete element
double x, //coords to delete
double y);
INT32 count() { //no of elements
return n;
}
double m(); //get gradient
double c( //get constant
double m); //gradient
double rms( //get error
double m, //gradient
double c); //constant
double spearman(); //get correlation
private:
INT32 n; //no of elements
double sigx; //sum of x
double sigy; //sum of y
double sigxx; //sum x squared
double sigxy; //sum of xy
double sigyy; //sum y squared
};
class PDLSQ
{
public:
PDLSQ() { //constructor
clear(); //set to zeros
}
void clear() { //initialize
pos.clear (); //clear both
dir.clear ();
}
void add( //add element
const ICOORD &addpos, //position of pt
const ICOORD &adddir) { //dir of pt
pos.add (addpos.x (), addpos.y ());
dir.add (adddir.x (), adddir.y ());
}
void remove( //remove element
const ICOORD &removepos, //position of pt
const ICOORD &removedir) { //dir of pt
pos.remove (removepos.x (), removepos.y ());
dir.remove (removedir.x (), removedir.y ());
}
INT32 count() { //no of elements
return pos.count ();
}
float fit( //get fit parameters
DIR128 &ang, //output angle
float &sin_ang, //output components
float &cos_ang,
float &r);
private:
LLSQ pos; //position
LLSQ dir; //directions
};
extern double_VAR_H (pdlsq_posdir_ratio, 4e-6, "Mult of dir to cf pos");
#endif

453
ccstruct/lmedsq.cpp Normal file

@ -0,0 +1,453 @@
/**********************************************************************
* File: lmedsq.cpp (Formerly lms.c)
* Description: Code for the LMS class.
* Author: Ray Smith
* Created: Fri Aug 7 09:30:53 BST 1992
*
* (C) Copyright 1992, Hewlett-Packard Ltd.
** Licensed under the Apache License, Version 2.0 (the "License");
** you may not use this file except in compliance with the License.
** You may obtain a copy of the License at
** http://www.apache.org/licenses/LICENSE-2.0
** Unless required by applicable law or agreed to in writing, software
** distributed under the License is distributed on an "AS IS" BASIS,
** WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
** See the License for the specific language governing permissions and
** limitations under the License.
*
**********************************************************************/
#include "mfcpch.h"
#include <stdlib.h>
#include "statistc.h"
#include "memry.h"
#include "statistc.h"
#include "lmedsq.h"
#define EXTERN
EXTERN INT_VAR (lms_line_trials, 12, "Number of line fits to do");
#define SEED1 0x1234 //default seeds
#define SEED2 0x5678
#define SEED3 0x9abc
#define LMS_MAX_FAILURES 3
#ifndef __UNIX__
UINT32 nrand48( //get random number
UINT16 *seeds //seeds to use
) {
static UINT32 seed = 0; //only seed
if (seed == 0) {
seed = seeds[0] ^ (seeds[1] << 8) ^ (seeds[2] << 16);
srand(seed);
}
//make 32 bit one
return rand () | (rand () << 16);
}
#endif
/**********************************************************************
* LMS::LMS
*
* Construct a LMS class, given the maximum number of samples it will hold.
**********************************************************************/
LMS::LMS ( //constructor
INT32 size //samplesize
):samplesize (size) {
samplecount = 0;
a = 0;
m = 0.0f;
c = 0.0f;
samples = (FCOORD *) alloc_mem (size * sizeof (FCOORD));
errors = (float *) alloc_mem (size * sizeof (float));
line_error = 0.0f;
fitted = FALSE;
}
/**********************************************************************
* LMS::~LMS
*
* Destruct a LMS class.
**********************************************************************/
LMS::~LMS ( //destructor
) {
free_mem(samples);
free_mem(errors);
}
/**********************************************************************
* LMS::clear
*
* Clear samples from array.
**********************************************************************/
void LMS::clear() { //clear sample
samplecount = 0;
fitted = FALSE;
}
/**********************************************************************
* LMS::add
*
* Add another sample. Samples beyond the constructed size are ignored.
**********************************************************************/
void LMS::add( //add sample
FCOORD sample //sample coords
) {
if (samplecount < samplesize)
//save it
samples[samplecount++] = sample;
fitted = FALSE;
}
/**********************************************************************
* LMS::fit
*
* Fit a line to the given sample points.
**********************************************************************/
void LMS::fit( //fit sample
float &out_m, //output line
float &out_c) {
INT32 index; //of median
INT32 trials; //no of medians
float test_m, test_c; //candidate line
float test_error; //error of test line
switch (samplecount) {
case 0:
m = 0.0f; //no info
c = 0.0f;
line_error = 0.0f;
break;
case 1:
m = 0.0f;
c = samples[0].y (); //horiz thru pt
line_error = 0.0f;
break;
case 2:
if (samples[0].x () != samples[1].x ()) {
m = (samples[1].y () - samples[0].y ())
/ (samples[1].x () - samples[0].x ());
c = samples[0].y () - m * samples[0].x ();
}
else {
m = 0.0f;
c = (samples[0].y () + samples[1].y ()) / 2;
}
line_error = 0.0f;
break;
default:
pick_line(m, c); //use pts at random
compute_errors(m, c); //from given line
index = choose_nth_item (samplecount / 2, errors, samplecount);
line_error = errors[index];
for (trials = 1; trials < lms_line_trials; trials++) {
//random again
pick_line(test_m, test_c);
compute_errors(test_m, test_c);
index = choose_nth_item (samplecount / 2, errors, samplecount);
test_error = errors[index];
if (test_error < line_error) {
//find least median
line_error = test_error;
m = test_m;
c = test_c;
}
}
}
fitted = TRUE;
out_m = m;
out_c = c;
a = 0;
}
/**********************************************************************
* LMS::fit_quadratic
*
* Fit a quadratic to the given sample points.
**********************************************************************/
void LMS::fit_quadratic( //fit sample
float outlier_threshold, //min outlier size
double &out_a, //x squared
float &out_b, //output line
float &out_c) {
INT32 trials; //no of medians
double test_a;
float test_b, test_c; //candidate line
float test_error; //error of test line
if (samplecount < 3) {
out_a = 0;
fit(out_b, out_c);
return;
}
pick_quadratic(a, m, c);
line_error = compute_quadratic_errors (outlier_threshold, a, m, c);
for (trials = 1; trials < lms_line_trials * 2; trials++) {
pick_quadratic(test_a, test_b, test_c);
test_error = compute_quadratic_errors (outlier_threshold,
test_a, test_b, test_c);
if (test_error < line_error) {
line_error = test_error; //find least median
a = test_a;
m = test_b;
c = test_c;
}
}
fitted = TRUE;
out_a = a;
out_b = m;
out_c = c;
}
/**********************************************************************
* LMS::constrained_fit
*
* Fit a line to the given sample points.
* The line must have the given gradient.
**********************************************************************/
void LMS::constrained_fit( //fit sample
float fixed_m, //forced gradient
float &out_c) {
INT32 index; //of median
INT32 trials; //no of medians
float test_c; //candidate line
static UINT16 seeds[3] = { SEED1, SEED2, SEED3 };
//for nrand
float test_error; //error of test line
m = fixed_m;
switch (samplecount) {
case 0:
c = 0.0f;
line_error = 0.0f;
break;
case 1:
//gradient m thru pt
c = samples[0].y () - m * samples[0].x ();
line_error = 0.0f;
break;
case 2:
c = (samples[0].y () + samples[1].y ()
- m * (samples[0].x () + samples[1].x ())) / 2;
line_error = m * samples[0].x () + c - samples[0].y ();
line_error *= line_error;
break;
default:
index = (INT32) nrand48 (seeds) % samplecount;
//compute line
c = samples[index].y () - m * samples[index].x ();
compute_errors(m, c); //from given line
index = choose_nth_item (samplecount / 2, errors, samplecount);
line_error = errors[index];
for (trials = 1; trials < lms_line_trials; trials++) {
index = (INT32) nrand48 (seeds) % samplecount;
test_c = samples[index].y () - m * samples[index].x ();
//compute line
compute_errors(m, test_c);
index = choose_nth_item (samplecount / 2, errors, samplecount);
test_error = errors[index];
if (test_error < line_error) {
//find least median
line_error = test_error;
c = test_c;
}
}
}
fitted = TRUE;
out_c = c;
a = 0;
}
/**********************************************************************
* LMS::pick_line
*
* Fit a line to a random pair of sample points.
**********************************************************************/
void LMS::pick_line( //random choice
float &line_m, //output gradient
float &line_c) {
INT16 trial_count; //no of attempts
static UINT16 seeds[3] = { SEED1, SEED2, SEED3 };
//for nrand
INT32 index1; //picked point
INT32 index2; //picked point
trial_count = 0;
do {
index1 = (INT32) nrand48 (seeds) % samplecount;
index2 = (INT32) nrand48 (seeds) % samplecount;
line_m = samples[index2].x () - samples[index1].x ();
trial_count++;
}
while (line_m == 0 && trial_count < LMS_MAX_FAILURES);
if (line_m == 0) {
line_c = (samples[index2].y () + samples[index1].y ()) / 2;
}
else {
line_m = (samples[index2].y () - samples[index1].y ()) / line_m;
line_c = samples[index1].y () - samples[index1].x () * line_m;
}
}
/**********************************************************************
* LMS::pick_quadratic
*
* Fit a quadratic to a random triplet of sample points.
**********************************************************************/
void LMS::pick_quadratic( //random choice
double &line_a, //x squared
float &line_m, //output gradient
float &line_c) {
INT16 trial_count; //no of attempts
static UINT16 seeds[3] = { SEED1, SEED2, SEED3 };
//for nrand
INT32 index1; //picked point
INT32 index2; //picked point
INT32 index3;
FCOORD x1x2; //vector
FCOORD x1x3;
FCOORD x3x2;
double bottom; //of a
trial_count = 0;
do {
if (trial_count >= LMS_MAX_FAILURES - 1) {
index1 = 0;
index2 = samplecount / 2;
index3 = samplecount - 1;
}
else {
index1 = (INT32) nrand48 (seeds) % samplecount;
index2 = (INT32) nrand48 (seeds) % samplecount;
index3 = (INT32) nrand48 (seeds) % samplecount;
}
x1x2 = samples[index2] - samples[index1];
x1x3 = samples[index3] - samples[index1];
x3x2 = samples[index2] - samples[index3];
bottom = x1x2.x () * x1x3.x () * x3x2.x ();
trial_count++;
}
while (bottom == 0 && trial_count < LMS_MAX_FAILURES);
if (bottom == 0) {
line_a = 0;
pick_line(line_m, line_c);
}
else {
line_a = x1x3 * x1x2 / bottom;
line_m = x1x2.y () - line_a * x1x2.x ()
* (samples[index2].x () + samples[index1].x ());
line_m /= x1x2.x ();
line_c = samples[index1].y () - samples[index1].x ()
* (samples[index1].x () * line_a + line_m);
}
}
/**********************************************************************
* LMS::compute_errors
*
* Compute the squared error from all the points.
**********************************************************************/
void LMS::compute_errors( //find errors
float line_m, //input gradient
float line_c) {
INT32 index; //picked point
for (index = 0; index < samplecount; index++) {
errors[index] =
line_m * samples[index].x () + line_c - samples[index].y ();
errors[index] *= errors[index];
}
}
/**********************************************************************
* LMS::compute_quadratic_errors
*
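* Compute the squared errors of all the points about the quadratic.
* Returns the mean of the errors at or below outlier_threshold, unless
* the outliers number at least a third of the inliers, in which case
* the median outlier error is returned.
**********************************************************************/
float LMS::compute_quadratic_errors( //find errors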
float outlier_threshold, //min outlier
double line_a,
float line_m, //input gradient
float line_c) {
INT32 outlier_count; //total outliers
INT32 index; //picked point
INT32 error_count; //no in total
double total_error; //summed squares
total_error = 0;
outlier_count = 0;
error_count = 0;
for (index = 0; index < samplecount; index++) {
errors[error_count] = line_c + samples[index].x ()
* (line_m + samples[index].x () * line_a) - samples[index].y ();
errors[error_count] *= errors[error_count];
if (errors[error_count] > outlier_threshold) {
outlier_count++;
errors[samplecount - outlier_count] = errors[error_count];
}
else {
total_error += errors[error_count++];
}
}
if (outlier_count * 3 < error_count)
return total_error / error_count;
else {
index = choose_nth_item (outlier_count / 2,
errors + samplecount - outlier_count,
outlier_count);
//median outlier
return errors[samplecount - outlier_count + index];
}
}
/**********************************************************************
* LMS::plot
*
* Plot the fitted line of a LMS.
**********************************************************************/
#ifndef GRAPHICS_DISABLED
void LMS::plot( //plot fit
WINDOW win, //window
COLOUR colour //colour to draw in
) {
if (fitted && samplecount > 0) { //need a sample to plot
line_color_index(win, colour);
move2d (win, samples[0].x (),
c + samples[0].x () * (m + samples[0].x () * a));
draw2d (win, samples[samplecount - 1].x (),
c + samples[samplecount - 1].x () * (m +
samples[samplecount -
1].x () * a));
}
}
#endif
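
The least-median-of-squares idea behind LMS::fit can be sketched outside the class. This is a minimal illustration, not the code above: Point, Line, median_sq_error and lms_fit_exhaustive are hypothetical names, and the sketch tries every pair of points instead of lms_line_trials random pairs so that its result is deterministic. It uses the same rank, samplecount / 2, that the original passes to choose_nth_item.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

struct Point { float x, y; };         // hypothetical stand-in for FCOORD
struct Line { float m, c, error; };   // error < 0 means "no line found"

// Median (rank size/2) of the squared vertical residuals about y = m*x + c.
static float median_sq_error(const std::vector<Point>& pts, float m, float c) {
  std::vector<float> errs;
  errs.reserve(pts.size());
  for (const Point& p : pts) {
    float e = m * p.x + c - p.y;
    errs.push_back(e * e);
  }
  std::nth_element(errs.begin(), errs.begin() + errs.size() / 2, errs.end());
  return errs[errs.size() / 2];
}

// Least-median-of-squares fit over every pair of points (assumption:
// exhaustive search replaces the random sampling used by LMS::pick_line).
Line lms_fit_exhaustive(const std::vector<Point>& pts) {
  assert(pts.size() >= 3);
  Line best = {0.0f, 0.0f, -1.0f};
  for (std::size_t i = 0; i < pts.size(); ++i) {
    for (std::size_t j = i + 1; j < pts.size(); ++j) {
      float dx = pts[j].x - pts[i].x;
      if (dx == 0.0f) continue;       // vertical pair has no finite gradient
      float m = (pts[j].y - pts[i].y) / dx;
      float c = pts[i].y - m * pts[i].x;
      float err = median_sq_error(pts, m, c);
      if (best.error < 0.0f || err < best.error) {
        best.m = m;
        best.c = c;
        best.error = err;
      }
    }
  }
  return best;
}
```

Because only the median residual is scored, up to about half of the points can be gross outliers without moving the chosen line, which is why a single stray sample does not disturb the fit.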

84
ccstruct/lmedsq.h Normal file

@ -0,0 +1,84 @@
/**********************************************************************
* File: lmedsq.h (Formerly lms.h)
* Description: Code for the LMS class.
* Author: Ray Smith
* Created: Fri Aug 7 09:30:53 BST 1992
*
* (C) Copyright 1992, Hewlett-Packard Ltd.
** Licensed under the Apache License, Version 2.0 (the "License");
** you may not use this file except in compliance with the License.
** You may obtain a copy of the License at
** http://www.apache.org/licenses/LICENSE-2.0
** Unless required by applicable law or agreed to in writing, software
** distributed under the License is distributed on an "AS IS" BASIS,
** WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
** See the License for the specific language governing permissions and
** limitations under the License.
*
**********************************************************************/
#ifndef LMEDSQ_H
#define LMEDSQ_H
#include "points.h"
#include "varable.h"
#include "grphics.h"
#include "notdll.h"
class LMS
{
public:
LMS( //constructor
INT32 size); //no of samples
~LMS (); //destructor
void clear(); //clear samples
void add( //add sample
FCOORD sample); //sample coords
void fit( //generate fit
float &m, //output line
float &c);
void constrained_fit( //fixed gradient
float fixed_m, //forced gradient
float &out_c); //output line
void fit_quadratic( //easy quadratic
float outlier_threshold, //min outlier
double &a, //x squared
float &b, //x
float &c); //constant
void plot( //plot fit
WINDOW win, //window
COLOUR colour); //colour to draw in
float error() { //get error
return fitted ? line_error : -1;
}
private:
void pick_line( //random choice
float &m, //output line
float &c);
void pick_quadratic( //random choice
double &a, //output curve
float &b,
float &c);
void compute_errors( //find errors
float m, //from line
float c);
//find errors
float compute_quadratic_errors(float outlier_threshold, //min outlier
double a, //from curve
float m,
float c);
BOOL8 fitted; //line parts valid
INT32 samplesize; //max samples
INT32 samplecount; //current sample size
FCOORD *samples; //array of samples
float *errors; //error distances
double a; //x squared
float m; //line gradient
float c;
float line_error; //error of fit
};
extern INT_VAR_H (lms_line_trials, 12, "Number of line fits to do");
#endif

100
ccstruct/mod128.cpp Normal file

@ -0,0 +1,100 @@
/**********************************************************************
* File: mod128.c (Formerly dir128.c)
* Description: Code to convert a DIR128 to an ICOORD.
* Author: Ray Smith
* Created: Tue Oct 22 11:56:09 BST 1991
*
* (C) Copyright 1991, Hewlett-Packard Ltd.
** Licensed under the Apache License, Version 2.0 (the "License");
** you may not use this file except in compliance with the License.
** You may obtain a copy of the License at
** http://www.apache.org/licenses/LICENSE-2.0
** Unless required by applicable law or agreed to in writing, software
** distributed under the License is distributed on an "AS IS" BASIS,
** WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
** See the License for the specific language governing permissions and
** limitations under the License.
*
**********************************************************************/
#include "mfcpch.h" //precompiled headers
#include "mod128.h"
static INT16 idirtab[] = {
1000, 0, 998, 49, 995, 98, 989, 146,
980, 195, 970, 242, 956, 290, 941, 336,
923, 382, 903, 427, 881, 471, 857, 514,
831, 555, 803, 595, 773, 634, 740, 671,
707, 707, 671, 740, 634, 773, 595, 803,
555, 831, 514, 857, 471, 881, 427, 903,
382, 923, 336, 941, 290, 956, 242, 970,
195, 980, 146, 989, 98, 995, 49, 998,
0, 1000, -49, 998, -98, 995, -146, 989,
-195, 980, -242, 970, -290, 956, -336, 941,
-382, 923, -427, 903, -471, 881, -514, 857,
-555, 831, -595, 803, -634, 773, -671, 740,
-707, 707, -740, 671, -773, 634, -803, 595,
-831, 555, -857, 514, -881, 471, -903, 427,
-923, 382, -941, 336, -956, 290, -970, 242,
-980, 195, -989, 146, -995, 98, -998, 49,
-1000, 0, -998, -49, -995, -98, -989, -146,
-980, -195, -970, -242, -956, -290, -941, -336,
-923, -382, -903, -427, -881, -471, -857, -514,
-831, -555, -803, -595, -773, -634, -740, -671,
-707, -707, -671, -740, -634, -773, -595, -803,
-555, -831, -514, -857, -471, -881, -427, -903,
-382, -923, -336, -941, -290, -956, -242, -970,
-195, -980, -146, -989, -98, -995, -49, -998,
0, -1000, 49, -998, 98, -995, 146, -989,
195, -980, 242, -970, 290, -956, 336, -941,
382, -923, 427, -903, 471, -881, 514, -857,
555, -831, 595, -803, 634, -773, 671, -740,
707, -707, 740, -671, 773, -634, 803, -595,
831, -555, 857, -514, 881, -471, 903, -427,
923, -382, 941, -336, 956, -290, 970, -242,
980, -195, 989, -146, 995, -98, 998, -49
};
static ICOORD *dirtab = (ICOORD *) idirtab;
/**********************************************************************
* DIR128::DIR128
*
* Quantize the direction of an FCOORD to make a DIR128.
**********************************************************************/
DIR128::DIR128( //from fcoord
const FCOORD fc //vector to quantize
) {
int high, low, current; //binary search
low = 0;
if (fc.y () == 0) {
if (fc.x () >= 0)
dir = 0;
else
dir = MODULUS / 2;
return;
}
high = MODULUS;
do {
current = (high + low) / 2;
if (dirtab[current] * fc >= 0)
low = current;
else
high = current;
}
while (high - low > 1);
dir = low;
}
/**********************************************************************
* DIR128::vector
*
* Convert a direction to a vector.
**********************************************************************/
ICOORD DIR128::vector() const { //convert to vector
return dirtab[dir]; //easy really
}
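
The idirtab entries look like DIRSCALE times the cosine and sine of each of the MODULUS directions, truncated toward zero. The sketch below regenerates a vector under that assumption; IVec, kModulus, kDirScale and dir_to_vector are hypothetical names, and only the handful of entries exercised here were compared against the table.

```cpp
#include <cassert>
#include <cmath>

const int kModulus = 128;     // mirrors MODULUS
const int kDirScale = 1000;   // mirrors DIRSCALE

struct IVec { int x, y; };    // hypothetical stand-in for ICOORD

// Vector for direction dir, scaled by kDirScale and truncated toward zero.
// Assumption: this is how the table was produced; the source does not say.
IVec dir_to_vector(int dir) {
  assert(dir >= 0 && dir < kModulus);
  const double kPi = 3.14159265358979323846;
  double angle = 2.0 * kPi * dir / kModulus;
  IVec v = { static_cast<int>(kDirScale * std::cos(angle)),
             static_cast<int>(kDirScale * std::sin(angle)) };
  return v;
}
```

A precomputed table avoids trigonometric calls at run time, which is presumably why the original stores the values rather than computing them.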

85
ccstruct/mod128.h Normal file

@ -0,0 +1,85 @@
/**********************************************************************
* File: mod128.h (Formerly dir128.h)
* Description: Header for class which implements modulo arithmetic.
* Author: Ray Smith
* Created: Tue Mar 26 17:48:13 GMT 1991
*
* (C) Copyright 1991, Hewlett-Packard Ltd.
** Licensed under the Apache License, Version 2.0 (the "License");
** you may not use this file except in compliance with the License.
** You may obtain a copy of the License at
** http://www.apache.org/licenses/LICENSE-2.0
** Unless required by applicable law or agreed to in writing, software
** distributed under the License is distributed on an "AS IS" BASIS,
** WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
** See the License for the specific language governing permissions and
** limitations under the License.
*
**********************************************************************/
#ifndef MOD128_H
#define MOD128_H
#include "points.h"
#define MODULUS 128 /*range of directions */
#define DIRBITS 7 //no of bits used
#define DIRSCALE 1000 //length of vector
class DLLSYM DIR128
{
public:
DIR128() {
} //empty constructor
DIR128( //constructor
INT16 value) { //value to assign
value %= MODULUS; //modulo arithmetic
if (value < 0)
value += MODULUS; //done properly
dir = (INT8) value;
}
DIR128(const FCOORD fc); //quantize vector
DIR128 & operator= ( //assign of INT16
INT16 value) { //value to assign
value %= MODULUS; //modulo arithmetic
if (value < 0)
value += MODULUS; //done properly
dir = (INT8) value;
return *this;
}
INT8 operator- ( //subtraction
const DIR128 & minus) const//for signed result
{
//result
INT16 result = dir - minus.dir;
if (result > MODULUS / 2)
result -= MODULUS; //get in range
else if (result < -MODULUS / 2)
result += MODULUS;
return (INT8) result;
}
DIR128 operator+ ( //addition
const DIR128 & add) const //of itself
{
DIR128 result; //sum
result = dir + add.dir; //let = do the work
return result;
}
DIR128 & operator+= ( //same as +
const DIR128 & add) {
*this = dir + add.dir; //let = do the work
return *this;
}
INT8 get_dir() const { //access function
return dir;
}
ICOORD vector() const; //turn to vector
private:
INT8 dir; //a direction
};
#endif
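
The wrap-around arithmetic in DIR128 can be shown with plain integers. This is a sketch only: wrap_dir and signed_dir_diff are hypothetical free functions that follow the same steps as the INT16 constructor and operator- above.

```cpp
#include <cassert>

const int kModulus = 128;   // mirrors MODULUS

// Wrap any integer into [0, kModulus), as the INT16 constructor does.
int wrap_dir(int value) {
  value %= kModulus;        // in C++ the remainder keeps the sign of value
  if (value < 0)
    value += kModulus;
  return value;
}

// Signed shortest difference a - b for directions already in [0, kModulus).
int signed_dir_diff(int a, int b) {
  assert(a >= 0 && a < kModulus && b >= 0 && b < kModulus);
  int result = a - b;
  if (result > kModulus / 2)
    result -= kModulus;     // go the short way round
  else if (result < -kModulus / 2)
    result += kModulus;
  return result;
}
```

For example, directions 2 and 126 are four steps apart across the wrap point, so the signed difference is 4 rather than -124.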

176
ccstruct/normalis.cpp Normal file

@ -0,0 +1,176 @@
/**********************************************************************
* File: normalis.cpp (Formerly denorm.c)
* Description: Code for the DENORM class.
* Author: Ray Smith
* Created: Thu Apr 23 09:22:43 BST 1992
*
* (C) Copyright 1992, Hewlett-Packard Ltd.
** Licensed under the Apache License, Version 2.0 (the "License");
** you may not use this file except in compliance with the License.
** You may obtain a copy of the License at
** http://www.apache.org/licenses/LICENSE-2.0
** Unless required by applicable law or agreed to in writing, software
** distributed under the License is distributed on an "AS IS" BASIS,
** WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
** See the License for the specific language governing permissions and
** limitations under the License.
*
**********************************************************************/
#include "mfcpch.h"
#include "werd.h"
#include "normalis.h"
/**********************************************************************
* DENORM::binary_search_segment
*
* Find the segment to use for the given x.
**********************************************************************/
const DENORM_SEG *DENORM::binary_search_segment(float src_x) const {
int bottom, top, middle; //binary search
bottom = 0;
top = segments;
do {
middle = (bottom + top) / 2;
if (segs[middle].xstart > src_x)
top = middle;
else
bottom = middle;
}
while (top - bottom > 1);
return &segs[bottom];
}
/**********************************************************************
* DENORM::scale_at_x
*
* Return scaling at a given (normalized) x coord.
**********************************************************************/
float DENORM::scale_at_x(float src_x) const { // In normalized coords.
if (segments != 0) {
const DENORM_SEG* seg = binary_search_segment(src_x);
if (seg->scale_factor > 0.0)
return seg->scale_factor;
}
return scale_factor;
}
/**********************************************************************
* DENORM::yshift_at_x
*
* Return yshift at a given (normalized) x coord.
**********************************************************************/
float DENORM::yshift_at_x(float src_x) const { // In normalized coords.
if (segments != 0) {
const DENORM_SEG* seg = binary_search_segment(src_x);
if (seg->ycoord == -MAX_INT32) {
if (base_is_row)
return source_row->base_line(x(src_x)/scale_at_x(src_x) + x_centre);
else
return m * x(src_x) + c;
} else {
return seg->ycoord;
}
}
return source_row->base_line (x(src_x)/scale_at_x(src_x) + x_centre);
}
/**********************************************************************
* DENORM::x
*
* Denormalise an x coordinate.
**********************************************************************/
float DENORM::x( //convert x coord
float src_x //coord to convert
) const {
return src_x / scale_at_x(src_x) + x_centre;
}
/**********************************************************************
* DENORM::y
*
* Denormalise a y coordinate.
**********************************************************************/
float DENORM::y( //convert y coord
float src_y, //coord to convert
float src_centre //x location for base
) const {
return (src_y - bln_baseline_offset) / scale_at_x(src_centre)
+ yshift_at_x(src_centre);
}
DENORM::DENORM(float x, //from same pieces
float scaling,
double line_m, //default line
double line_c,
INT16 seg_count, //no of segments
DENORM_SEG *seg_pts, //actual segments
BOOL8 using_row, //as baseline
ROW *src) {
x_centre = x; //just copy
scale_factor = scaling;
source_row = src;
if (seg_count > 0) {
segs = new DENORM_SEG[seg_count];
for (segments = 0; segments < seg_count; segments++) {
// It is possible, if infrequent, for the segments to be out of order.
// Since we look them up with a binary search, insert them in sorted order.
if (segments == 0 || segs[segments - 1].xstart <=
seg_pts[segments].xstart) {
segs[segments] = seg_pts[segments];
} else {
int i;
for (i = 0; i < segments
&& segs[segments - 1 - i].xstart > seg_pts[segments].xstart;
++i) {
segs[segments - i ] = segs[segments - 1 - i];
}
segs[segments - i] = seg_pts[segments];
}
}
}
else {
segments = 0;
segs = NULL;
}
base_is_row = using_row;
m = line_m;
c = line_c;
}
DENORM::DENORM(const DENORM &src) {
segments = 0;
segs = NULL;
*this = src;
}
DENORM & DENORM::operator= (const DENORM & src) {
if (this == &src)
return *this; //self-assign would free segs before copying
x_centre = src.x_centre;
scale_factor = src.scale_factor;
source_row = src.source_row;
if (segments > 0)
delete[]segs;
if (src.segments > 0) {
segs = new DENORM_SEG[src.segments];
for (segments = 0; segments < src.segments; segments++)
segs[segments] = src.segs[segments];
}
else {
segments = 0;
segs = NULL;
}
base_is_row = src.base_is_row;
m = src.m;
c = src.c;
return *this;
}
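
The segment lookup in DENORM::binary_search_segment relies on the segments being sorted by xstart, which is why the constructor inserts them in order. A minimal sketch of the same search follows; Segment is a hypothetical, trimmed version of DENORM_SEG and find_segment returns an index rather than a pointer.

```cpp
#include <cassert>
#include <vector>

struct Segment {            // hypothetical, trimmed version of DENORM_SEG
  int xstart;               // start of segment
  float scale_factor;       // scaling for this segment
};

// Index of the last segment whose xstart <= x, or 0 if x precedes them all.
// segs must be non-empty and sorted by increasing xstart.
int find_segment(const std::vector<Segment>& segs, float x) {
  assert(!segs.empty());
  int bottom = 0;
  int top = static_cast<int>(segs.size());
  do {
    int middle = (bottom + top) / 2;
    if (segs[middle].xstart > x)
      top = middle;         // answer lies strictly below middle
    else
      bottom = middle;      // middle is still a candidate
  } while (top - bottom > 1);
  return bottom;
}
```

Note that an x to the left of every segment still maps to segment 0, matching the original, which never returns a "not found" result.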

108
ccstruct/normalis.h Normal file

@ -0,0 +1,108 @@
/**********************************************************************
* File: normalis.h (Formerly denorm.h)
* Description: Code for the DENORM class.
* Author: Ray Smith
* Created: Thu Apr 23 09:22:43 BST 1992
*
* (C) Copyright 1992, Hewlett-Packard Ltd.
** Licensed under the Apache License, Version 2.0 (the "License");
** you may not use this file except in compliance with the License.
** You may obtain a copy of the License at
** http://www.apache.org/licenses/LICENSE-2.0
** Unless required by applicable law or agreed to in writing, software
** distributed under the License is distributed on an "AS IS" BASIS,
** WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
** See the License for the specific language governing permissions and
** limitations under the License.
*
**********************************************************************/
#ifndef NORMALIS_H
#define NORMALIS_H
#include <stdio.h>
class ROW; //forward decl
class DENORM_SEG
{
public:
DENORM_SEG() {
} //empty
INT32 xstart; //start of segment
INT32 ycoord; //y at segment
float scale_factor; //for this segment
};
class DENORM
{
public:
DENORM() { //constructor
source_row = NULL;
x_centre = 0.0f;
scale_factor = 1.0f;
segments = 0;
segs = NULL;
base_is_row = TRUE;
m = c = 0;
}
DENORM( //constructor
float x, //from same pieces
float scaling,
ROW *src) {
x_centre = x; //just copy
scale_factor = scaling;
source_row = src;
segments = 0;
segs = NULL;
base_is_row = TRUE;
m = c = 0;
}
DENORM( //constructor
float x, //from same pieces
float scaling,
double line_m, //default line
double line_c,
INT16 seg_count, //no of segments
DENORM_SEG *seg_pts, //actual segments
BOOL8 using_row, //as baseline
ROW *src);
DENORM(const DENORM &);
DENORM & operator= (const DENORM &);
~DENORM () {
if (segments > 0)
delete[]segs;
}
float origin() const { //get x centre
return x_centre;
}
float scale() const { //get scale
return scale_factor;
}
ROW *row() const { //get row
return source_row;
}
float x( //convert an xcoord
float src_x) const;
float y( //convert a ycoord
float src_y, //coord to convert
float src_centre) const; //normed x centre
float scale_at_x( // Return scaling at this coord.
float src_x) const;
float yshift_at_x( // Return yshift at this coord.
float src_x) const;
private:
const DENORM_SEG *binary_search_segment(float src_x) const;
BOOL8 base_is_row; //using row baseline?
INT16 segments; //no of segments
double c, m; //baseline
float x_centre; //middle of word
float scale_factor; //scaling
ROW *source_row; //row it came from
DENORM_SEG *segs; //array of segments
};
#endif

368
ccstruct/ocrblock.cpp Normal file

@ -0,0 +1,368 @@
/**********************************************************************
* File: ocrblock.cpp (Formerly block.c)
* Description: BLOCK member functions and iterator functions.
* Author: Ray Smith
* Created: Fri Mar 15 09:41:28 GMT 1991
*
* (C) Copyright 1991, Hewlett-Packard Ltd.
** Licensed under the Apache License, Version 2.0 (the "License");
** you may not use this file except in compliance with the License.
** You may obtain a copy of the License at
** http://www.apache.org/licenses/LICENSE-2.0
** Unless required by applicable law or agreed to in writing, software
** distributed under the License is distributed on an "AS IS" BASIS,
** WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
** See the License for the specific language governing permissions and
** limitations under the License.
*
**********************************************************************/
#include "mfcpch.h"
#include <stdlib.h>
#include "blckerr.h"
#include "ocrblock.h"
#include "tprintf.h"
#define BLOCK_LABEL_HEIGHT 150 //char height of block id
ELISTIZE_S (BLOCK)
/**********************************************************************
* BLOCK::BLOCK
*
* Constructor for a simple rectangular block.
**********************************************************************/
BLOCK::BLOCK ( //rectangular block
const char *name, //filename
BOOL8 prop, //proportional
INT16 kern, //kerning
INT16 space, //spacing
INT16 xmin, //bottom left
INT16 ymin, INT16 xmax, //top right
INT16 ymax):
PDBLK (xmin, ymin, xmax, ymax),
filename(name) { //box(ICOORD(xmin,ymin),ICOORD(xmax,ymax))
//boundaries
ICOORDELT_IT left_it = &leftside;
ICOORDELT_IT right_it = &rightside;
proportional = prop;
kerning = kern;
spacing = space;
font_class = -1; //not assigned
hand_block = NULL;
hand_poly = NULL;
left_it.set_to_list (&leftside);
right_it.set_to_list (&rightside);
//make default box
left_it.add_to_end (new ICOORDELT (xmin, ymin));
left_it.add_to_end (new ICOORDELT (xmin, ymax));
right_it.add_to_end (new ICOORDELT (xmax, ymin));
right_it.add_to_end (new ICOORDELT (xmax, ymax));
}
/**********************************************************************
* BLOCK::set_sides
*
* Sets left and right vertex lists
**********************************************************************/
//void BLOCK::set_sides( //set vertex lists
//ICOORDELT_LIST *left, //left vertices
//ICOORDELT_LIST *right //right vertices
//)
//{
// ICOORDELT_IT left_it= &leftside; //boundaries
// ICOORDELT_IT right_it= &rightside;
// leftside.clear();
// left_it.move_to_first();
// left_it.add_list_before(left);
// rightside.clear();
// right_it.move_to_first();
// right_it.add_list_before(right);
//}
/**********************************************************************
* BLOCK::contains
*
* Return TRUE if the given point is within the block.
**********************************************************************/
//BOOL8 BLOCK::contains( //test containment
//ICOORD pt //point to test
//)
//{
// BLOCK_RECT_IT it=this; //rectangle iterator
// ICOORD bleft,tright; //corners of rectangle
// for (it.start_block();!it.cycled_rects();it.forward())
// {
// it.bounding_box(bleft,tright); //get rectangle
// if (pt.x()>=bleft.x() && pt.x()<=tright.x() //inside rect
// && pt.y()>=bleft.y() && pt.y()<=tright.y())
// return TRUE; //is inside
// }
// return FALSE; //not inside
//}
/**********************************************************************
* BLOCK::move
*
* Reposition block
**********************************************************************/
//void BLOCK::move( // reposition block
//const ICOORD vec // by vector
//)
//{
// ROW_IT row_it( &rows );
// ICOORDELT_IT it( &leftside );
// for( row_it.mark_cycle_pt(); !row_it.cycled_list(); row_it.forward() )
// row_it.data()->move( vec );
// for( it.mark_cycle_pt(); !it.cycled_list(); it.forward() )
// *(it.data()) += vec;
// it.set_to_list( &rightside );
// for( it.mark_cycle_pt(); !it.cycled_list(); it.forward() )
// *(it.data()) += vec;
// box.move( vec );
//}
/**********************************************************************
* decreasing_top_order
*
* Sort Comparator: Return <0 if row1 top > row2 top, so that rows
* with a higher top sort first.
**********************************************************************/
int decreasing_top_order( //
const void *row1,
const void *row2) {
return (*(ROW **) row2)->bounding_box ().top () -
(*(ROW **) row1)->bounding_box ().top ();
}
/**********************************************************************
* BLOCK::sort_rows
*
* Order rows so that they are in order of decreasing Y coordinate
**********************************************************************/
void BLOCK::sort_rows() { // order on "top"
ROW_IT row_it(&rows);
row_it.sort (decreasing_top_order);
}
/**********************************************************************
* BLOCK::compress
*
* Delete space between the rows. (And maybe one day, compress the rows)
* Fill space of block from top down, left aligning rows.
**********************************************************************/
void BLOCK::compress() { // squash it up
#define ROW_SPACING 5
ROW_IT row_it(&rows);
ROW *row;
ICOORD row_spacing (0, ROW_SPACING);
ICOORDELT_IT icoordelt_it;
sort_rows();
box = BOX (box.topleft (), box.topleft ());
box.move_bottom_edge (ROW_SPACING);
for (row_it.mark_cycle_pt (); !row_it.cycled_list (); row_it.forward ()) {
row = row_it.data ();
row->move (box.botleft () - row_spacing -
row->bounding_box ().topleft ());
box += row->bounding_box ();
}
leftside.clear ();
icoordelt_it.set_to_list (&leftside);
icoordelt_it.add_to_end (new ICOORDELT (box.left (), box.bottom ()));
icoordelt_it.add_to_end (new ICOORDELT (box.left (), box.top ()));
rightside.clear ();
icoordelt_it.set_to_list (&rightside);
icoordelt_it.add_to_end (new ICOORDELT (box.right (), box.bottom ()));
icoordelt_it.add_to_end (new ICOORDELT (box.right (), box.top ()));
}
/**********************************************************************
* BLOCK::check_pitch
*
* Check whether the block is fixed or prop, set the flag, and set
* the pitch if it is fixed.
**********************************************************************/
void BLOCK::check_pitch() { // check prop
// tprintf("Missing FFT fixed pitch stuff!\n");
pitch = -1;
}
/**********************************************************************
* BLOCK::compress
*
* Compress and move in a single operation.
**********************************************************************/
void BLOCK::compress( // squash it up
const ICOORD vec // and move
) {
box.move (vec);
compress();
}
/**********************************************************************
* BLOCK::print
*
* Print the info on a block
**********************************************************************/
void BLOCK::print( //print list of sides
FILE *, //file to print on
BOOL8 dump //print full detail
) {
ICOORDELT_IT it = &leftside; //iterator
box.print ();
tprintf ("Proportional= %s\n", proportional ? "TRUE" : "FALSE");
tprintf ("Kerning= %d\n", kerning);
tprintf ("Spacing= %d\n", spacing);
tprintf ("Fixed_pitch=%d\n", pitch);
tprintf ("Filename= %s\n", filename.string ());
if (dump) {
tprintf ("Left side coords are:\n");
for (it.mark_cycle_pt (); !it.cycled_list (); it.forward ())
tprintf ("(%d,%d) ", it.data ()->x (), it.data ()->y ());
tprintf ("\n");
tprintf ("Right side coords are:\n");
it.set_to_list (&rightside);
for (it.mark_cycle_pt (); !it.cycled_list (); it.forward ())
tprintf ("(%d,%d) ", it.data ()->x (), it.data ()->y ());
tprintf ("\n");
}
}
/**********************************************************************
* BLOCK::plot
*
* Plot the outline of a block in the given colour.
**********************************************************************/
//void BLOCK::plot( //draw outline
//WINDOW window, //window to draw in
//INT32 serial, //serial number
//COLOUR colour //colour to draw in
//)
//{
// ICOORD startpt; //start of outline
// ICOORD endpt; //end of outline
// ICOORD prevpt; //previous point
// ICOORDELT_IT it= &leftside; //iterator
// char number[32]; //block id
// line_color_index(window,colour); //set the colour
// text_color_index(window,colour);
// character_height(window,(float)BLOCK_LABEL_HEIGHT);
// text_font_index(window,6);
// if (!leftside.empty())
// {
// startpt= *(it.data()); //bottom left corner
//// fprintf(stderr,"Block %d bottom left is (%d,%d)\n",
//// serial,startpt.x(),startpt.y());
// sprintf(number,"%d",serial);
// text2d(window,startpt.x(),startpt.y(),number,0,FALSE);
// move2d(window,startpt.x(),startpt.y());
// do
// {
// prevpt= *(it.data()); //previous point
// it.forward(); //move to next point
// draw2d(window,prevpt.x(),it.data()->y()); //draw round corner
// draw2d(window,it.data()->x(),it.data()->y());
// }
// while (!it.at_last()); //until end of list
// endpt= *(it.data()); //end point
// move2d(window,startpt.x(),startpt.y()); //other side of boundary
// it.set_to_list(&rightside);
// prevpt=startpt;
// for (it.mark_cycle_pt();!it.cycled_list();it.forward())
// {
// draw2d(window,prevpt.x(),it.data()->y()); //draw round corner
// draw2d(window,it.data()->x(),it.data()->y());
// prevpt= *(it.data()); //previous point
// }
// draw2d(window,endpt.x(),endpt.y()); //close boundary
// if (hand_block!=NULL)
// hand_block->plot(window,colour,serial);
// }
//}
/**********************************************************************
* BLOCK::show
*
* Show the image corresponding to a block as its set of rectangles.
**********************************************************************/
//void BLOCK::show( //show image block
//IMAGE *image, //image to show
//WINDOW window //window to show in
//)
//{
// BLOCK_RECT_IT it=this; //rectangle iterator
// ICOORD bleft,tright; //corners of rectangle
// for (it.start_block();!it.cycled_rects();it.forward())
// {
// it.bounding_box(bleft,tright); //get rectangle
//// fprintf(stderr,"Drawing a block with a bottom left of (%d,%d)\n",
//// bleft.x(),bleft.y());
// show_sub_image(image,bleft.x(),bleft.y(),
// tright.x()-bleft.x(),tright.y()-bleft.y(),
// window,bleft.x(),bleft.y()); //show it
// }
//}
/**********************************************************************
* BLOCK::operator=
*
* Assignment - duplicate the block structure, but with an EMPTY row list.
**********************************************************************/
BLOCK & BLOCK::operator= ( //assignment
const BLOCK & source //from this
) {
this->ELIST_LINK::operator= (source);
this->PDBLK::operator= (source);
proportional = source.proportional;
kerning = source.kerning;
spacing = source.spacing;
filename = source.filename; //STRINGs assign ok
//NOTE: pitch, font_class and xheight are not copied here
if (!rows.empty ())
rows.clear ();
// if ( !leftside.empty() )
// leftside.clear();
// if ( !rightside.empty() )
// rightside.clear();
// leftside.deep_copy( &source.leftside );
// rightside.deep_copy( &source.rightside );
// box=source.box;
return *this;
}
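
The assignment operator above copies the block's descriptive fields but deliberately leaves the destination with an empty row list. A minimal sketch of that pattern follows. `BlockSketch` and its members are hypothetical stand-ins written for illustration, with `std::vector<int>` in place of `ROW_LIST`; none of it is Tesseract code.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical stand-in for BLOCK: a few stats plus an owned list of rows.
struct BlockSketch {
  bool proportional = false;
  int kerning = 0;
  int spacing = 0;
  std::string filename;
  std::vector<int> rows;  // stand-in for ROW_LIST

  // Copy the descriptive fields only; the destination ends with no rows.
  BlockSketch& operator=(const BlockSketch& source) {
    proportional = source.proportional;
    kerning = source.kerning;
    spacing = source.spacing;
    filename = source.filename;
    rows.clear();  // rows are never duplicated, even on self-assignment
    return *this;
  }
};
```

As in the original, the source's rows are untouched, and assigning a block to itself also empties its row list. Callers that need the rows must move or copy them separately.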

228
ccstruct/ocrblock.h Normal file

@ -0,0 +1,228 @@
/**********************************************************************
* File: ocrblock.h (Formerly block.h)
* Description: Page block class definition.
* Author: Ray Smith
* Created: Thu Mar 14 17:32:01 GMT 1991
*
* (C) Copyright 1991, Hewlett-Packard Ltd.
** Licensed under the Apache License, Version 2.0 (the "License");
** you may not use this file except in compliance with the License.
** You may obtain a copy of the License at
** http://www.apache.org/licenses/LICENSE-2.0
** Unless required by applicable law or agreed to in writing, software
** distributed under the License is distributed on an "AS IS" BASIS,
** WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
** See the License for the specific language governing permissions and
** limitations under the License.
*
**********************************************************************/
#ifndef OCRBLOCK_H
#define OCRBLOCK_H
#include "img.h"
#include "ocrrow.h"
#include "pageblk.h"
#include "pdblock.h"
class BLOCK; //forward decl
ELISTIZEH_S (BLOCK)
class BLOCK:public ELIST_LINK, public PDBLK
//page block
{
friend class BLOCK_RECT_IT; //block iterator
//block label
friend void scan_hpd_blocks(const char *name,
PAGE_BLOCK_LIST *page_blocks, //head of full page
INT32 &block_no, //no of blocks
BLOCK_IT *block_it);
friend BOOL8 read_vec_file( //read uscan output
STRING name, //basename of file
INT32 xsize, //page size
INT32 ysize,
BLOCK_LIST *blocks); //output list
friend BOOL8 read_pd_file( //read uscan output
STRING name, //basename of file
INT32 xsize, //page size
INT32 ysize,
BLOCK_LIST *blocks); //output list
public:
BLOCK() { //empty constructor
hand_block = NULL;
hand_poly = NULL;
}
BLOCK( //simple constructor
const char *name, //filename
BOOL8 prop, //proportional
INT16 kern, //kerning
INT16 space, //spacing
INT16 xmin, //bottom left
INT16 ymin,
INT16 xmax, //top right
INT16 ymax);
// void set_sides( //set vertex lists
// ICOORDELT_LIST *left, //list of left vertices
// ICOORDELT_LIST *right); //list of right vertices
~BLOCK () { //destructor
}
void set_stats( //set space size etc.
BOOL8 prop, //proportional
INT16 kern, //inter char size
INT16 space, //inter word size
INT16 ch_pitch) { //pitch if fixed
proportional = prop;
kerning = (INT8) kern;
spacing = space;
pitch = ch_pitch;
}
void set_xheight( //set char size
INT32 height) {
xheight = height;
}
void set_font_class( //set font class
INT16 font) {
font_class = font;
}
// TEXT_REGION* text_region()
// {
// return hand_block;
// }
// POLY_BLOCK* poly_block()
// {
// return hand_poly;
// }
BOOL8 prop() const { //return proportional
return proportional;
}
INT32 fixed_pitch() const { //return pitch
return pitch;
}
INT16 kern() const { //return kerning
return kerning;
}
INT16 font() const { //return font class
return font_class;
}
INT16 space() const { //return spacing
return spacing;
}
const char *name() const { //return filename
return filename.string ();
}
INT32 x_height() const { //return xheight
return xheight;
}
ROW_LIST *row_list() { //get rows
return &rows;
}
C_BLOB_LIST *blob_list() { //get blobs
return &c_blobs;
}
C_BLOB_LIST *reject_blobs() {
return &rej_blobs;
}
// void bounding_box( //get box
// ICOORD& bottom_left, //bottom left
// ICOORD& top_right) const //topright
// {
// bottom_left=box.botleft();
// top_right=box.topright();
// }
// const BOX& bounding_box() const //get real box
// {
// return box;
// }
// BOOL8 contains( //is pt inside block
// ICOORD pt);
// void move( // reposition block
// const ICOORD vec); // by vector
void sort_rows(); //decreasing y order
void compress(); //shrink white space
void check_pitch(); //check proportional
void compress( //shrink white space
const ICOORD vec); //and move by vector
void print( //print summary/table
FILE *fp, //file to print on
BOOL8 dump); //dump whole table
// void plot( //draw outline
// WINDOW window, //window to draw in
// INT32 serial, //serial number
// COLOUR colour); //colour to draw in
// void show( //show image
// IMAGE *image, //image to show
// WINDOW window); //window to show in
void prep_serialise() { //set ptrs to counts
filename.prep_serialise ();
rows.prep_serialise ();
c_blobs.prep_serialise ();
rej_blobs.prep_serialise ();
leftside.prep_serialise ();
rightside.prep_serialise ();
}
void dump( //write external bits
FILE *f) {
filename.dump (f);
rows.dump (f);
c_blobs.dump (f);
rej_blobs.dump (f);
leftside.dump (f);
rightside.dump (f);
if (hand_block != NULL)
hand_block->serialise (f);
}
void de_dump( //read external bits
FILE *f) {
filename.de_dump (f);
rows.de_dump (f);
c_blobs.de_dump (f);
rej_blobs.de_dump (f);
leftside.de_dump (f);
rightside.de_dump (f);
if (hand_block != NULL)
hand_block = TEXT_REGION::de_serialise (f);
}
make_serialise (BLOCK)
//assignment
BLOCK & operator= (
const BLOCK & source); //from this
private:
BOOL8 proportional; //proportional
INT8 kerning; //inter blob gap
INT16 spacing; //inter word gap
INT16 pitch; //pitch of non-props
INT16 font_class; //correct font class
INT32 xheight; //height of chars
STRING filename; //name of block
// TEXT_REGION* hand_block; //if it exists
// POLY_BLOCK* hand_poly; //weird as well
ROW_LIST rows; //rows in block
C_BLOB_LIST c_blobs; //before textord
C_BLOB_LIST rej_blobs; //duff stuff
// ICOORDELT_LIST leftside; //left side vertices
// ICOORDELT_LIST rightside; //right side vertices
// BOX box; //bounding box
};
int decreasing_top_order( //compare rows for sorting
const void *row1,
const void *row2);
#endif
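
The header declares `decreasing_top_order` with a `qsort`-style signature, and `sort_rows()` is documented as producing decreasing y order. The sketch below shows how such a comparator could work. It assumes the comparator receives pointers to row pointers and that rows are ranked by the top edge of their bounding box; `RowSketch`, `decreasing_top_sketch` and `sort_rows_sketch` are hypothetical names, not the functions in this commit.

```cpp
#include <cassert>
#include <cstdlib>

// Hypothetical row: only the top edge of its bounding box matters here.
struct RowSketch {
  int top;
};

// qsort-style comparator over an array of RowSketch pointers.
// Returns a negative value when the first row is higher on the page,
// so rows end up in decreasing order of top.
int decreasing_top_sketch(const void* row1, const void* row2) {
  const RowSketch* a = *static_cast<const RowSketch* const*>(row1);
  const RowSketch* b = *static_cast<const RowSketch* const*>(row2);
  return (b->top > a->top) - (b->top < a->top);
}

// Sort an array of row pointers from the top of the page downwards.
void sort_rows_sketch(RowSketch** rows, std::size_t count) {
  std::qsort(rows, count, sizeof(RowSketch*), decreasing_top_sketch);
}
```

Comparing with `(b > a) - (b < a)` instead of subtracting the two values avoids integer overflow when coordinates are far apart.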

216
ccstruct/ocrrow.cpp Normal file

@ -0,0 +1,216 @@
/**********************************************************************
* File: ocrrow.cpp (Formerly row.c)
* Description: Code for the ROW class.
* Author: Ray Smith
* Created: Tue Oct 08 15:58:04 BST 1991
*
* (C) Copyright 1991, Hewlett-Packard Ltd.
** Licensed under the Apache License, Version 2.0 (the "License");
** you may not use this file except in compliance with the License.
** You may obtain a copy of the License at
** http://www.apache.org/licenses/LICENSE-2.0
** Unless required by applicable law or agreed to in writing, software
** distributed under the License is distributed on an "AS IS" BASIS,
** WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
** See the License for the specific language governing permissions and
** limitations under the License.
*
**********************************************************************/
#include "mfcpch.h"
#include "ocrrow.h"
#include "blobbox.h"
ELISTIZE_S (ROW)
/**********************************************************************
* ROW::ROW
*
* Constructor to build a ROW from explicit baseline spline data.
* Only the statistics are given here; the words are added directly
* to the word list afterwards.
**********************************************************************/
ROW::ROW ( //constructor
INT32 spline_size, //no of segments
INT32 * xstarts, //segment boundaries
double *coeffs, //coefficients
float x_height, //line height
float ascenders, //ascender size
float descenders, //descender drop
INT16 kern, //char gap
INT16 space //word gap
):
baseline(spline_size, xstarts, coeffs) {
kerning = kern; //just store stuff
spacing = space;
xheight = x_height;
ascrise = ascenders;
descdrop = descenders;
}
/**********************************************************************
* ROW::ROW
*
* Constructor to build a ROW from a textord TO_ROW.
* The size statistics and baseline are copied from the TO_ROW;
* the words are added directly to the word list afterwards.
**********************************************************************/
ROW::ROW( //constructor
TO_ROW *to_row, //source row
INT16 kern, //char gap
INT16 space //word gap
) {
kerning = kern; //just store stuff
spacing = space;
xheight = to_row->xheight;
ascrise = to_row->ascrise;
descdrop = to_row->descdrop;
baseline = to_row->baseline;
}
/**********************************************************************
* ROW::recalc_bounding_box
*
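* Sort the words into left-to-right order if they are not already,
* set the W_BOL/W_EOL flags, and extend the bounding box to cover
* every word.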
**********************************************************************/
void ROW::recalc_bounding_box() { //recalculate BB
WERD *word; //current word
WERD_IT it = &words; //words of ROW
INT16 left; //of word
INT16 prev_left; //old left
if (!it.empty ()) {
word = it.data ();
prev_left = word->bounding_box ().left ();
it.forward ();
while (!it.at_first ()) {
word = it.data ();
left = word->bounding_box ().left ();
if (left < prev_left) {
it.move_to_first ();
//words in BB order
it.sort (word_comparator);
break;
}
prev_left = left;
it.forward ();
}
}
for (it.mark_cycle_pt (); !it.cycled_list (); it.forward ()) {
word = it.data ();
if (it.at_first ())
word->set_flag (W_BOL, TRUE);
else
//not start of line
word->set_flag (W_BOL, FALSE);
if (it.at_last ())
word->set_flag (W_EOL, TRUE);
else
//not end of line
word->set_flag (W_EOL, FALSE);
//extend BB as reqd
bound_box += word->bounding_box ();
}
}
/**********************************************************************
* ROW::move
*
* Reposition row by vector
**********************************************************************/
void ROW::move( // reposition row
const ICOORD vec // by vector
) {
WERD_IT it(&words); // word iterator
for (it.mark_cycle_pt (); !it.cycled_list (); it.forward ())
it.data ()->move (vec);
bound_box.move (vec);
baseline.move (vec);
}
/**********************************************************************
* ROW::print
*
* Display members
**********************************************************************/
void ROW::print( //print
FILE *fp //unused: output goes via tprintf
) {
tprintf ("Kerning= %d\n", kerning);
tprintf ("Spacing= %d\n", spacing);
bound_box.print ();
tprintf ("Xheight= %f\n", xheight);
tprintf ("Ascrise= %f\n", ascrise);
tprintf ("Descdrop= %f\n", descdrop);
}
/**********************************************************************
* ROW::plot
*
* Draw the ROW in the given colour.
**********************************************************************/
#ifndef GRAPHICS_DISABLED
void ROW::plot( //draw it
WINDOW window, //window to draw in
COLOUR colour //colour to draw in
) {
WERD *word; //current word
WERD_IT it = &words; //words of ROW
for (it.mark_cycle_pt (); !it.cycled_list (); it.forward ()) {
word = it.data ();
word->plot (window, colour); //all in one colour
}
}
#endif
/**********************************************************************
* ROW::plot
*
* Draw the ROW in rainbow colours.
**********************************************************************/
#ifndef GRAPHICS_DISABLED
void ROW::plot( //draw it
WINDOW window //window to draw in
) {
WERD *word; //current word
WERD_IT it = &words; //words of ROW
for (it.mark_cycle_pt (); !it.cycled_list (); it.forward ()) {
word = it.data ();
word->plot (window); //in rainbow colours
}
}
#endif
/**********************************************************************
* ROW::operator=
*
* Assign rows by duplicating the row structure but NOT the WERDLIST
**********************************************************************/
ROW & ROW::operator= ( //assignment
const ROW & source //from this
) {
this->ELIST_LINK::operator= (source);
kerning = source.kerning;
spacing = source.spacing;
xheight = source.xheight;
ascrise = source.ascrise;
descdrop = source.descdrop;
if (!words.empty ())
words.clear ();
baseline = source.baseline; //QSPLINES must do =
bound_box = source.bound_box;
return *this;
}
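
`ROW::recalc_bounding_box` above does three things in one pass: it sorts the words only if they are out of left-to-right order, marks the first and last word of the line, and grows the bounding box to cover every word. The sketch below shows the same steps on standard containers. `BoxSketch`, `WordSketch` and `recalc_row_sketch` are hypothetical names, and unlike the original, which extends the row's existing box with `+=`, the sketch builds the box from scratch.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

struct BoxSketch {
  int left, bottom, right, top;
};

struct WordSketch {
  BoxSketch box;
  bool bol;  // beginning of line
  bool eol;  // end of line
};

// Orders words by left edge, flags the line ends and returns the union box.
BoxSketch recalc_row_sketch(std::vector<WordSketch>& words) {
  auto by_left = [](const WordSketch& a, const WordSketch& b) {
    return a.box.left < b.box.left;
  };
  // Sort only when some word starts to the left of its predecessor.
  if (!std::is_sorted(words.begin(), words.end(), by_left))
    std::stable_sort(words.begin(), words.end(), by_left);

  BoxSketch bounds{0, 0, 0, 0};
  for (std::size_t i = 0; i < words.size(); ++i) {
    words[i].bol = (i == 0);
    words[i].eol = (i + 1 == words.size());
    const BoxSketch& b = words[i].box;
    if (i == 0) {
      bounds = b;
    } else {
      bounds.left = std::min(bounds.left, b.left);
      bounds.bottom = std::min(bounds.bottom, b.bottom);
      bounds.right = std::max(bounds.right, b.right);
      bounds.top = std::max(bounds.top, b.top);
    }
  }
  return bounds;
}
```

A row containing a single word gets both flags set, which matches the original's independent `at_first` and `at_last` checks.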

133
ccstruct/ocrrow.h Normal file

@ -0,0 +1,133 @@
/**********************************************************************
* File: ocrrow.h (Formerly row.h)
* Description: Definition of the ROW class.
* Author: Ray Smith
* Created: Tue Oct 08 15:58:04 BST 1991
*
* (C) Copyright 1991, Hewlett-Packard Ltd.
** Licensed under the Apache License, Version 2.0 (the "License");
** you may not use this file except in compliance with the License.
** You may obtain a copy of the License at
** http://www.apache.org/licenses/LICENSE-2.0
** Unless required by applicable law or agreed to in writing, software
** distributed under the License is distributed on an "AS IS" BASIS,
** WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
** See the License for the specific language governing permissions and
** limitations under the License.
*
**********************************************************************/
#ifndef OCRROW_H
#define OCRROW_H
#include <stdio.h>
#include "quspline.h"
#include "werd.h"
class TO_ROW;
class ROW:public ELIST_LINK
{
friend void tweak_row_baseline(ROW *);
public:
ROW() {
} //empty constructor
ROW( //constructor
INT32 spline_size, //no of segments
INT32 *xstarts, //segment boundaries
double *coeffs, //coefficients
float x_height, //line height
float ascenders, //ascender size
float descenders, //descender size
INT16 kern, //char gap
INT16 space); //word gap
ROW( //constructor
TO_ROW *row, //textord row
INT16 kern, //char gap
INT16 space); //word gap
WERD_LIST *word_list() { //get words
return &words;
}
float base_line( //compute baseline
float xpos) const { //at the position
//get spline value
return (float) baseline.y (xpos);
}
float x_height() const { //return x height
return xheight;
}
INT32 kern() const { //return kerning
return kerning;
}
INT32 space() const { //return spacing
return spacing;
}
float ascenders() const { //return size
return ascrise;
}
float descenders() const { //return size
return descdrop;
}
BOX bounding_box() const { //return bounding box
return bound_box;
}
void recalc_bounding_box(); //recalculate BB
void move( // reposition row
const ICOORD vec); // by vector
void print( //print
FILE *fp); //file to print on
#ifndef GRAPHICS_DISABLED
void plot( //draw one
WINDOW window, //window to draw in
COLOUR colour); //uniform colour
void plot( //draw one
WINDOW window); //in rainbow colours
void plot_baseline( //draw the baseline
WINDOW window, //window to draw in
COLOUR colour) { //colour to draw
//draw it
baseline.plot (window, colour);
}
#endif
void prep_serialise() { //set ptrs to counts
words.prep_serialise ();
baseline.prep_serialise ();
}
void dump( //write external bits
FILE *f) {
words.dump (f);
baseline.dump (f);
}
void de_dump( //read external bits
FILE *f) {
words.de_dump (f);
baseline.de_dump (f);
}
make_serialise (ROW)
//assignment
ROW & operator= (
const ROW & source); //from this
private:
INT32 kerning; //inter char gap
INT32 spacing; //inter word gap
BOX bound_box; //bounding box
float xheight; //height of line
float ascrise; //size of ascenders
float descdrop; //-size of descenders
WERD_LIST words; //words
QSPLINE baseline; //baseline spline
};
ELISTIZEH_S (ROW)
#endif
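
`ROW::base_line(xpos)` returns the baseline height at a given x by evaluating the row's spline, which the constructor builds from a segment count, segment boundaries (`xstarts`) and coefficients. The sketch below shows one way a piecewise quadratic of that shape can be evaluated. It assumes three coefficients per segment and clamps x values outside the boundaries to the nearest segment; `QuadSketch` and `SplineSketch` are hypothetical and are not the `QSPLINE` class from `quspline.h`.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// One quadratic segment: y = a*x^2 + b*x + c.
struct QuadSketch {
  double a, b, c;
  double y(double x) const { return (a * x + b) * x + c; }
};

// Piecewise quadratic: quads[i] applies from xstarts[i] up to xstarts[i + 1].
struct SplineSketch {
  std::vector<int> xstarts;       // ascending, one more entry than quads
  std::vector<QuadSketch> quads;  // must not be empty

  double y(double x) const {
    std::size_t index = 0;
    // Advance while x lies at or beyond the start of the next segment.
    while (index + 1 < quads.size() && x >= xstarts[index + 1])
      ++index;
    return quads[index].y(x);
  }
};
```

For example, with boundaries `{0, 100, 200}` and segments `{0, 0, 5}` and `{0, 0.5, -45}`, the baseline is flat at 5 up to x = 100 and then rises with slope 0.5, and the two segments meet at the boundary.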

879
ccstruct/pageblk.cpp Normal file

@ -0,0 +1,879 @@
#include "mfcpch.h"
#include "pageblk.h"
#include <stdio.h>
#include <ctype.h>
#include <math.h>
#ifdef __UNIX__
#include <unistd.h>
#else
#include <io.h>
#endif
#include "hpddef.h" //must be last (handpd.dll)
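#define G_START 0 //blabel row: graphics foreground
#define I_START 1 //blabel rows: image colour, quality
#define R_START 3 //blabel rows: rule multiplicity, colour
#define S_START 5 //blabel row: scribble foreground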
extern char blabel[NUM_BLOCK_ATTR][4][MAXLENGTH];
extern char backlabel[NUM_BACKGROUNDS][MAXLENGTH];
ELISTIZE_S (PAGE_BLOCK)
void PAGE_BLOCK::pb_delete() {
switch (pb_type) {
case PB_TEXT:
delete ((TEXT_BLOCK *) this);
break;
case PB_GRAPHICS:
delete ((GRAPHICS_BLOCK *) this);
break;
case PB_IMAGE:
delete ((IMAGE_BLOCK *) this);
break;
case PB_RULES:
delete ((RULE_BLOCK *) this);
break;
case PB_SCRIBBLE:
delete ((SCRIBBLE_BLOCK *) this);
break;
case PB_WEIRD:
delete ((WEIRD_BLOCK *) this);
break;
default:
break;
}
}
#define QUOTE_IT( parm ) #parm
void PAGE_BLOCK::serialise(FILE *f) {
if (fwrite (&pb_type, sizeof (PB_TYPE), 1, f) != 1)
WRITEFAILED.error (QUOTE_IT (PAGE_BLOCK::serialise), ABORT, NULL);
switch (pb_type) {
case PB_TEXT:
((TEXT_BLOCK *) this)->serialise (f);
break;
case PB_GRAPHICS:
((GRAPHICS_BLOCK *) this)->serialise (f);
break;
case PB_RULES:
((RULE_BLOCK *) this)->serialise (f);
break;
case PB_IMAGE:
((IMAGE_BLOCK *) this)->serialise (f);
break;
case PB_SCRIBBLE:
((SCRIBBLE_BLOCK *) this)->serialise (f);
break;
case PB_WEIRD:
((WEIRD_BLOCK *) this)->serialise (f);
break;
default:
break;
}
}
PAGE_BLOCK *PAGE_BLOCK::de_serialise(FILE *f) {
PB_TYPE type;
TEXT_BLOCK *tblock;
GRAPHICS_BLOCK *gblock;
RULE_BLOCK *rblock;
IMAGE_BLOCK *iblock;
SCRIBBLE_BLOCK *sblock;
WEIRD_BLOCK *wblock;
if (fread ((void *) &type, sizeof (PB_TYPE), 1, f) != 1)
READFAILED.error (QUOTE_IT (PAGE_BLOCK::de_serialise), ABORT, NULL);
switch (type) {
case PB_TEXT:
tblock = (TEXT_BLOCK *) alloc_struct (sizeof (TEXT_BLOCK));
return tblock->de_serialise (f);
case PB_GRAPHICS:
gblock = (GRAPHICS_BLOCK *) alloc_struct (sizeof (GRAPHICS_BLOCK));
return gblock->de_serialise (f);
case PB_RULES:
rblock = (RULE_BLOCK *) alloc_struct (sizeof (RULE_BLOCK));
return rblock->de_serialise (f);
case PB_IMAGE:
iblock = (IMAGE_BLOCK *) alloc_struct (sizeof (IMAGE_BLOCK));
return iblock->de_serialise (f);
case PB_SCRIBBLE:
sblock = (SCRIBBLE_BLOCK *) alloc_struct (sizeof (SCRIBBLE_BLOCK));
return sblock->de_serialise (f);
case PB_WEIRD:
wblock = (WEIRD_BLOCK *) alloc_struct (sizeof (WEIRD_BLOCK));
return wblock->de_serialise (f);
default:
return NULL;
}
}
/**********************************************************************
* PAGE_BLOCK::serialise_asc() Convert to ascii file.
*
**********************************************************************/
void PAGE_BLOCK::serialise_asc( //convert to ascii
FILE *f //file to use
) {
serialise_INT32(f, pb_type);
switch (pb_type) {
case PB_TEXT:
((TEXT_BLOCK *) this)->serialise_asc (f);
break;
case PB_GRAPHICS:
((GRAPHICS_BLOCK *) this)->serialise_asc (f);
break;
case PB_RULES:
((RULE_BLOCK *) this)->serialise_asc (f);
break;
case PB_IMAGE:
((IMAGE_BLOCK *) this)->serialise_asc (f);
break;
case PB_SCRIBBLE:
((SCRIBBLE_BLOCK *) this)->serialise_asc (f);
break;
case PB_WEIRD:
((WEIRD_BLOCK *) this)->serialise_asc (f);
break;
default:
break;
}
}
/**********************************************************************
* PAGE_BLOCK::internal_serialise_asc() Convert to ascii file.
*
**********************************************************************/
void PAGE_BLOCK::internal_serialise_asc( //convert to ascii
FILE *f //file to use
) {
((POLY_BLOCK *) this)->serialise_asc (f);
serialise_INT32(f, pb_type);
children.serialise_asc (f);
}
/**********************************************************************
* PAGE_BLOCK::de_serialise_asc() Convert from ascii file.
*
**********************************************************************/
void PAGE_BLOCK::de_serialise_asc( //convert from ascii
FILE *f //file to use
) {
PAGE_BLOCK *page_block; //new block for list
INT32 len; /*length to retrieve */
PAGE_BLOCK_IT it;
((POLY_BLOCK *) this)->de_serialise_asc (f);
pb_type = (PB_TYPE) de_serialise_INT32 (f);
// children.de_serialise_asc(f);
len = de_serialise_INT32 (f);
it.set_to_list (&children);
for (; len > 0; len--) {
page_block = new_de_serialise_asc (f);
it.add_to_end (page_block); /*put on the list */
}
}
/**********************************************************************
* PAGE_BLOCK::new_de_serialise_asc() Convert from ascii file.
*
**********************************************************************/
PAGE_BLOCK *PAGE_BLOCK::new_de_serialise_asc( //convert from ascii
FILE *f //file to use
) {
PB_TYPE type;
TEXT_BLOCK *tblock;
GRAPHICS_BLOCK *gblock;
RULE_BLOCK *rblock;
IMAGE_BLOCK *iblock;
SCRIBBLE_BLOCK *sblock;
WEIRD_BLOCK *wblock;
type = (PB_TYPE) de_serialise_INT32 (f);
switch (type) {
case PB_TEXT:
tblock = new TEXT_BLOCK;
tblock->de_serialise_asc (f);
return tblock;
case PB_GRAPHICS:
gblock = new GRAPHICS_BLOCK;
gblock->de_serialise_asc (f);
return gblock;
case PB_RULES:
rblock = new RULE_BLOCK;
rblock->de_serialise_asc (f);
return rblock;
case PB_IMAGE:
iblock = new IMAGE_BLOCK;
iblock->de_serialise_asc (f);
return iblock;
case PB_SCRIBBLE:
sblock = new SCRIBBLE_BLOCK;
sblock->de_serialise_asc (f);
return sblock;
case PB_WEIRD:
wblock = new WEIRD_BLOCK;
wblock->de_serialise_asc (f);
return wblock;
default:
return NULL;
}
}
void PAGE_BLOCK::show_attrs(DEBUG_WIN *f) {
PAGE_BLOCK_IT it;
switch (pb_type) {
case PB_TEXT:
((TEXT_BLOCK *) this)->show_attrs (f);
break;
case PB_GRAPHICS:
((GRAPHICS_BLOCK *) this)->show_attrs (f);
break;
case PB_RULES:
((RULE_BLOCK *) this)->show_attrs (f);
break;
case PB_IMAGE:
((IMAGE_BLOCK *) this)->show_attrs (f);
break;
case PB_SCRIBBLE:
((SCRIBBLE_BLOCK *) this)->show_attrs (f);
break;
case PB_WEIRD:
((WEIRD_BLOCK *) this)->show_attrs (f);
break;
default:
break;
}
if (!children.empty ()) {
f->dprintf ("containing subblocks\n");
it.set_to_list (&children);
for (it.mark_cycle_pt (); !it.cycled_list (); it.forward ())
it.data ()->show_attrs (f);
f->dprintf ("end of subblocks\n");
}
}
PAGE_BLOCK::PAGE_BLOCK (ICOORDELT_LIST * points,
PB_TYPE type,
PAGE_BLOCK_LIST * child):
POLY_BLOCK (points, POLY_PAGE) {
PAGE_BLOCK_IT c = &children;
pb_type = type;
children.clear ();
c.move_to_first ();
c.add_list_before (child);
}
PAGE_BLOCK::PAGE_BLOCK (ICOORDELT_LIST * points, PB_TYPE type):
POLY_BLOCK (points, POLY_PAGE) {
pb_type = type;
children.clear ();
}
void PAGE_BLOCK::add_a_child(PAGE_BLOCK *newchild) {
PAGE_BLOCK_IT c = &children;
c.move_to_first ();
c.add_to_end (newchild);
}
/**********************************************************************
* PAGE_BLOCK::rotate
*
* Rotate the PAGE_BLOCK and its children
**********************************************************************/
void PAGE_BLOCK::rotate( //cos,sin
FCOORD rotation) {
//sub block iterator
PAGE_BLOCK_IT child_it = &children;
PAGE_BLOCK *child; //child block
for (child_it.mark_cycle_pt (); !child_it.cycled_list ();
child_it.forward ()) {
child = child_it.data ();
child->rotate (rotation);
}
if (pb_type == PB_TEXT)
((TEXT_BLOCK *) this)->rotate (rotation);
else
POLY_BLOCK::rotate(rotation);
}
/**********************************************************************
* PAGE_BLOCK::move
*
* Move the PAGE_BLOCK and its children
**********************************************************************/
void PAGE_BLOCK::move(ICOORD shift //amount to move
) {
//sub block iterator
PAGE_BLOCK_IT child_it = &children;
PAGE_BLOCK *child; //child block
for (child_it.mark_cycle_pt (); !child_it.cycled_list ();
child_it.forward ()) {
child = child_it.data ();
child->move (shift);
}
if (pb_type == PB_TEXT)
((TEXT_BLOCK *) this)->move (shift);
else
POLY_BLOCK::move(shift);
}
#ifndef GRAPHICS_DISABLED
void PAGE_BLOCK::basic_plot(WINDOW window, COLOUR colour) {
PAGE_BLOCK_IT c = &children;
POLY_BLOCK::plot (window, colour, 0);
if (!c.empty ())
for (c.mark_cycle_pt (); !c.cycled_list (); c.forward ())
c.data ()->plot (window, colour);
}
void PAGE_BLOCK::plot(WINDOW window, COLOUR colour) {
TEXT_BLOCK *tblock;
WEIRD_BLOCK *wblock;
switch (pb_type) {
case PB_TEXT:
tblock = (TEXT_BLOCK *) this;
//TEXT_BLOCK::plot calls basic_plot itself
tblock->plot (window, colour, REGION_COLOUR, SUBREGION_COLOUR);
break;
case PB_WEIRD:
wblock = (WEIRD_BLOCK *) this;
wblock->plot (window, colour);
break;
default:
basic_plot(window, colour);
break;
}
}
#endif
void show_all_in(PAGE_BLOCK *pblock, POLY_BLOCK *show_area, DEBUG_WIN *f) {
PAGE_BLOCK_IT c;
INT16 i, pnum;
c.set_to_list (pblock->child ());
pnum = pblock->child ()->length ();
for (i = 0; i < pnum; i++, c.forward ()) {
if (show_area->contains (c.data ()))
c.data ()->show_attrs (f);
else if (show_area->overlap (c.data ()))
show_all_in (c.data (), show_area, f);
}
}
void delete_all_in(PAGE_BLOCK *pblock, POLY_BLOCK *delete_area) {
PAGE_BLOCK_IT c;
INT16 i, pnum;
c.set_to_list (pblock->child ());
pnum = pblock->child ()->length ();
for (i = 0; i < pnum; i++, c.forward ()) {
if (delete_area->contains (c.data ()))
c.extract ()->pb_delete ();
else if (delete_area->overlap (c.data ()))
delete_all_in (c.data (), delete_area);
}
}
PAGE_BLOCK *smallest_containing(PAGE_BLOCK *pblock, POLY_BLOCK *other) {
PAGE_BLOCK_IT c;
c.set_to_list (pblock->child ());
if (c.empty ())
return (pblock);
for (c.mark_cycle_pt (); !c.cycled_list (); c.forward ())
if (c.data ()->contains (other))
return (smallest_containing (c.data (), other));
return (pblock);
}
TEXT_BLOCK::TEXT_BLOCK (ICOORDELT_LIST * points,
BOOL8 backg[NUM_BACKGROUNDS]):
PAGE_BLOCK (points, PB_TEXT) {
int i;
for (i = 0; i < NUM_BACKGROUNDS; i++)
background.set_bit (i, backg[i]);
text_regions.clear ();
}
void TEXT_BLOCK::set_attrs(BOOL8 backg[NUM_BACKGROUNDS]) {
int i;
for (i = 0; i < NUM_BACKGROUNDS; i++)
background.set_bit (i, backg[i]);
}
void TEXT_BLOCK::add_a_region(TEXT_REGION *newchild) {
TEXT_REGION_IT c;
c.set_to_list (&text_regions);
c.move_to_first ();
c.add_to_end (newchild);
}
/**********************************************************************
* TEXT_BLOCK::rotate
*
* Rotate the TEXT_BLOCK and its children
**********************************************************************/
void TEXT_BLOCK::rotate( //cos,sin
FCOORD rotation) {
//sub block iterator
TEXT_REGION_IT child_it = &text_regions;
TEXT_REGION *child; //child block
for (child_it.mark_cycle_pt (); !child_it.cycled_list ();
child_it.forward ()) {
child = child_it.data ();
child->rotate (rotation);
}
POLY_BLOCK::rotate(rotation);
}
/**********************************************************************
* TEXT_BLOCK::move
*
* Move the TEXT_BLOCK and its children
**********************************************************************/
void TEXT_BLOCK::move(ICOORD shift //amount to move
) {
//sub block iterator
TEXT_REGION_IT child_it = &text_regions;
TEXT_REGION *child; //child block
for (child_it.mark_cycle_pt (); !child_it.cycled_list ();
child_it.forward ()) {
child = child_it.data ();
child->move (shift);
}
POLY_BLOCK::move(shift);
}
/**********************************************************************
* TEXT_BLOCK::serialise_asc() Convert to ascii file.
*
**********************************************************************/
void TEXT_BLOCK::serialise_asc( //convert to ascii
FILE *f //file to use
) {
((PAGE_BLOCK *) this)->internal_serialise_asc (f);
serialise_INT32 (f, background.val);
text_regions.serialise_asc (f);
}
/**********************************************************************
* TEXT_BLOCK::de_serialise_asc() Convert from ascii file.
*
**********************************************************************/
void TEXT_BLOCK::de_serialise_asc( //convert from ascii
FILE *f //file to use
) {
((PAGE_BLOCK *) this)->de_serialise_asc (f);
background.val = de_serialise_INT32 (f);
text_regions.de_serialise_asc (f);
}
#ifndef GRAPHICS_DISABLED
void TEXT_BLOCK::plot(WINDOW window,
COLOUR colour,
COLOUR region_colour,
COLOUR subregion_colour) {
TEXT_REGION_IT t = &text_regions, tc;
PAGE_BLOCK::basic_plot(window, colour);
if (!t.empty ())
for (t.mark_cycle_pt (); !t.cycled_list (); t.forward ()) {
t.data ()->plot (window, region_colour, t.data ()->id_no ());
tc.set_to_list (t.data ()->regions ());
if (!tc.empty ())
for (tc.mark_cycle_pt (); !tc.cycled_list (); tc.forward ())
tc.data ()->plot (window, subregion_colour, -1);
}
}
#endif
void TEXT_BLOCK::show_attrs(DEBUG_WIN *f) {
TEXT_REGION_IT it;
f->dprintf ("TEXT BLOCK\n");
print_background(f, background);
if (!text_regions.empty ()) {
f->dprintf ("containing text regions:\n");
it.set_to_list (&text_regions);
for (it.mark_cycle_pt (); !it.cycled_list (); it.forward ())
it.data ()->show_attrs (f);
f->dprintf ("end of regions\n");
}
}
DLLSYM void show_all_tr_in(TEXT_BLOCK *tblock,
POLY_BLOCK *show_area,
DEBUG_WIN *f) {
TEXT_REGION_IT t, tc;
INT16 i, tnum, j, ttnum;
t.set_to_list (tblock->regions ());
tnum = tblock->regions ()->length ();
for (i = 0; i < tnum; i++, t.forward ()) {
if (show_area->contains (t.data ()))
t.data ()->show_attrs (f);
else if (show_area->overlap (t.data ())) {
tc.set_to_list (t.data ()->regions ());
ttnum = t.data ()->regions ()->length ();
for (j = 0; j < ttnum; j++, tc.forward ())
if (show_area->contains (tc.data ()))
tc.data ()->show_attrs (f);
}
}
}
void delete_all_tr_in(TEXT_BLOCK *tblock, POLY_BLOCK *delete_area) {
TEXT_REGION_IT t, tc;
INT16 i, tnum, j, ttnum;
t.set_to_list (tblock->regions ());
tnum = tblock->regions ()->length ();
for (i = 0; i < tnum; i++, t.forward ()) {
if (delete_area->contains (t.data ()))
delete (t.extract ());
else if (delete_area->overlap (t.data ())) {
tc.set_to_list (t.data ()->regions ());
ttnum = t.data ()->regions ()->length ();
for (j = 0; j < ttnum; j++, tc.forward ())
if (delete_area->contains (tc.data ()))
delete (tc.extract ());
}
}
}
RULE_BLOCK::RULE_BLOCK (ICOORDELT_LIST * points, INT8 sing, INT8 colo):
PAGE_BLOCK (points, PB_RULES) {
multiplicity = sing;
colour = colo;
}
void RULE_BLOCK::set_attrs(INT8 sing, INT8 colo) {
multiplicity = sing;
colour = colo;
}
void RULE_BLOCK::show_attrs(DEBUG_WIN *f) {
f->dprintf ("RULE BLOCK with attributes %s, %s\n",
blabel[R_START][multiplicity], blabel[R_START + 1][colour]);
}
/**********************************************************************
* RULE_BLOCK::serialise_asc() Convert to ascii file.
*
**********************************************************************/
void RULE_BLOCK::serialise_asc( //convert to ascii
FILE *f //file to use
) {
((PAGE_BLOCK *) this)->internal_serialise_asc (f);
serialise_INT32(f, multiplicity);
serialise_INT32(f, colour);
}
/**********************************************************************
* RULE_BLOCK::de_serialise_asc() Convert from ascii file.
*
**********************************************************************/
void RULE_BLOCK::de_serialise_asc( //convert from ascii
FILE *f //file to use
) {
((PAGE_BLOCK *) this)->de_serialise_asc (f);
multiplicity = de_serialise_INT32 (f);
colour = de_serialise_INT32 (f);
}
GRAPHICS_BLOCK::GRAPHICS_BLOCK (ICOORDELT_LIST * points,
BOOL8 backg[NUM_BACKGROUNDS],
INT8 foreg):
PAGE_BLOCK (points, PB_GRAPHICS) {
int i;
for (i = 0; i < NUM_BACKGROUNDS; i++)
background.set_bit (i, backg[i]);
foreground = foreg;
}
void GRAPHICS_BLOCK::set_attrs(BOOL8 backg[NUM_BACKGROUNDS], INT8 foreg) {
int i;
for (i = 0; i < NUM_BACKGROUNDS; i++)
background.set_bit (i, backg[i]);
foreground = foreg;
}
void GRAPHICS_BLOCK::show_attrs(DEBUG_WIN *f) {
f->dprintf ("GRAPHICS BLOCK with attribute %s\n",
blabel[G_START][foreground]);
print_background(f, background);
}
/**********************************************************************
* GRAPHICS_BLOCK::serialise_asc() Convert to ascii file.
*
**********************************************************************/
void GRAPHICS_BLOCK::serialise_asc( //convert to ascii
FILE *f //file to use
) {
((PAGE_BLOCK *) this)->internal_serialise_asc (f);
serialise_INT32 (f, background.val);
serialise_INT32(f, foreground);
}
/**********************************************************************
* GRAPHICS_BLOCK::de_serialise_asc() Convert from ascii file.
*
**********************************************************************/
void GRAPHICS_BLOCK::de_serialise_asc( //convert from ascii
FILE *f //file to use
) {
((PAGE_BLOCK *) this)->de_serialise_asc (f);
background.val = de_serialise_INT32 (f);
foreground = de_serialise_INT32 (f);
}
IMAGE_BLOCK::IMAGE_BLOCK (ICOORDELT_LIST * points, INT8 colo, INT8 qual)
:PAGE_BLOCK (points, PB_IMAGE) {
colour = colo;
quality = qual;
}
void IMAGE_BLOCK::set_attrs(INT8 colo, INT8 qual) {
colour = colo;
quality = qual;
}
void IMAGE_BLOCK::show_attrs(DEBUG_WIN *f) {
f->dprintf ("IMAGE BLOCK with attributes %s, %s\n", blabel[I_START][colour],
blabel[I_START + 1][quality]);
}
/**********************************************************************
* IMAGE_BLOCK::serialise_asc() Convert to ascii file.
*
**********************************************************************/
void IMAGE_BLOCK::serialise_asc( //convert to ascii
FILE *f //file to use
) {
((PAGE_BLOCK *) this)->internal_serialise_asc (f);
serialise_INT32(f, colour);
serialise_INT32(f, quality);
}
/**********************************************************************
* IMAGE_BLOCK::de_serialise_asc() Convert from ascii file.
*
**********************************************************************/
void IMAGE_BLOCK::de_serialise_asc( //convert from ascii
FILE *f //file to use
) {
((PAGE_BLOCK *) this)->de_serialise_asc (f);
colour = de_serialise_INT32 (f);
quality = de_serialise_INT32 (f);
}
SCRIBBLE_BLOCK::SCRIBBLE_BLOCK (ICOORDELT_LIST * points, BOOL8 backg[NUM_BACKGROUNDS], INT8 foreg)
:PAGE_BLOCK (points, PB_SCRIBBLE) {
int i;
for (i = 0; i < NUM_BACKGROUNDS; i++)
background.set_bit (i, backg[i]);
foreground = foreg;
}
void
SCRIBBLE_BLOCK::set_attrs (BOOL8 backg[NUM_BACKGROUNDS], INT8 foreg) {
int i;
for (i = 0; i < NUM_BACKGROUNDS; i++)
background.set_bit (i, backg[i]);
foreground = foreg;
}
void SCRIBBLE_BLOCK::show_attrs(DEBUG_WIN *f) {
f->dprintf ("SCRIBBLE BLOCK with attributes %s\n",
blabel[S_START][foreground]);
print_background(f, background);
}
/**********************************************************************
* SCRIBBLE_BLOCK::serialise_asc() Convert to ascii file.
*
**********************************************************************/
void SCRIBBLE_BLOCK::serialise_asc( //convert to ascii
FILE *f //file to use
) {
((PAGE_BLOCK *) this)->internal_serialise_asc (f);
serialise_INT32 (f, background.val);
serialise_INT32(f, foreground);
}
/**********************************************************************
* SCRIBBLE_BLOCK::de_serialise_asc() Convert from ascii file.
*
**********************************************************************/
void SCRIBBLE_BLOCK::de_serialise_asc( //convert from ascii
FILE *f //file to use
) {
((PAGE_BLOCK *) this)->de_serialise_asc (f);
background.val = de_serialise_INT32 (f);
foreground = de_serialise_INT32 (f);
}
WEIRD_BLOCK::WEIRD_BLOCK (ICOORDELT_LIST * points, INT32 id_no)
:PAGE_BLOCK (points, PB_WEIRD) {
id_number = id_no;
}
#ifndef GRAPHICS_DISABLED
void WEIRD_BLOCK::plot(WINDOW window, COLOUR colour) {
PAGE_BLOCK_IT c = this->child ();
POLY_BLOCK::plot(window, colour, id_number);
if (!c.empty ())
for (c.mark_cycle_pt (); !c.cycled_list (); c.forward ())
c.data ()->plot (window, colour);
}
#endif
void WEIRD_BLOCK::set_id(INT32 id_no) {
id_number = id_no;
}
void WEIRD_BLOCK::show_attrs(DEBUG_WIN *f) {
f->dprintf ("WEIRD BLOCK with id number %d\n", id_number);
}
/**********************************************************************
* WEIRD_BLOCK::serialise_asc() Convert to ascii file.
*
**********************************************************************/
void WEIRD_BLOCK::serialise_asc( //convert to ascii
FILE *f //file to use
) {
((PAGE_BLOCK *) this)->internal_serialise_asc (f);
serialise_INT32(f, id_number);
}
/**********************************************************************
* WEIRD_BLOCK::de_serialise_asc() Convert from ascii file.
*
**********************************************************************/
void WEIRD_BLOCK::de_serialise_asc( //convert from ascii
FILE *f //file to use
) {
((PAGE_BLOCK *) this)->de_serialise_asc (f);
id_number = de_serialise_INT32 (f);
}
void print_background(DEBUG_WIN *f, BITS16 background) {
int i;
f->dprintf ("Background is \n");
for (i = 0; i < NUM_BACKGROUNDS; i++) {
if (background.bit (i))
f->dprintf ("%s, ", backlabel[i]);
}
f->dprintf ("\n");
}
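
The serialise_asc / de_serialise_asc pairs above all follow one rule: the reader must consume fields in exactly the order the writer emitted them, and 8-bit attributes travel as 32-bit values. A minimal sketch of that round trip follows. Everything in it is hypothetical: `write_int32`, `read_int32`, `DemoImageAttrs` and `round_trip` are illustrative names, C++ streams stand in for `FILE *`, and the one-decimal-value-per-line format is an assumption, since the real serialise_INT32 format is not shown in this file.

```cpp
#include <cstdint>
#include <istream>
#include <ostream>
#include <sstream>

// Hypothetical stand-ins for serialise_INT32 / de_serialise_INT32.
// Assumed format: one decimal value per line.
static void write_int32(std::ostream &out, int32_t v) {
  out << v << '\n';
}

static int32_t read_int32(std::istream &in) {
  int32_t v = 0;
  in >> v;  // sketch only: real code should check for read failure
  return v;
}

// Hypothetical pair of 8-bit attributes, shaped like IMAGE_BLOCK's fields.
struct DemoImageAttrs {
  int8_t colour;
  int8_t quality;

  void serialise_asc(std::ostream &out) const {
    write_int32(out, colour);   // widened to 32 bits on output
    write_int32(out, quality);
  }

  void de_serialise_asc(std::istream &in) {
    // Read back in exactly the order the fields were written.
    colour = static_cast<int8_t>(read_int32(in));
    quality = static_cast<int8_t>(read_int32(in));
  }
};

// Writes the attributes to an in-memory stream and reads them back.
static DemoImageAttrs round_trip(const DemoImageAttrs &in_attrs) {
  std::stringstream buffer;
  in_attrs.serialise_asc(buffer);
  DemoImageAttrs out_attrs = {0, 0};
  out_attrs.de_serialise_asc(buffer);
  return out_attrs;
}
```

Swapping the two reads in `de_serialise_asc` would still run without error, but it would silently exchange the two attributes, which is why each writer and reader pair is kept side by side.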

318
ccstruct/pageblk.h Normal file

@ -0,0 +1,318 @@
#ifndef PAGEBLK_C
#define PAGEBLK_C
#include "elst.h"
#include "txtregn.h"
#include "bits16.h"
#include "hpddef.h" //must be last (handpd.dll)
enum PB_TYPE
{
PB_TEXT,
PB_RULES,
PB_GRAPHICS,
PB_IMAGE,
PB_SCRIBBLE,
PB_WEIRD
};
class DLLSYM PAGE_BLOCK; //forward decl
class DLLSYM TEXT_BLOCK; //forward decl
class DLLSYM GRAPHICS_BLOCK; //forward decl
class DLLSYM RULE_BLOCK; //forward decl
class DLLSYM IMAGE_BLOCK; //forward decl
class DLLSYM SCRIBBLE_BLOCK; //forward decl
class DLLSYM WEIRD_BLOCK; //forward decl
ELISTIZEH_S (PAGE_BLOCK)
class DLLSYM PAGE_BLOCK:public ELIST_LINK, public POLY_BLOCK
//page block
{
public:
PAGE_BLOCK() {
} //empty constructor
PAGE_BLOCK( //constructor with children
ICOORDELT_LIST *points,
PB_TYPE type,
PAGE_BLOCK_LIST *child);
PAGE_BLOCK( //simple constructor
ICOORDELT_LIST *points,
PB_TYPE type);
~PAGE_BLOCK () { //destructor
}
void add_a_child(PAGE_BLOCK *newchild);
PB_TYPE type() { //get type
return pb_type;
}
PAGE_BLOCK_LIST *child() { //get children
return &children;
}
void rotate( //rotate it
FCOORD rotation);
void move( //move it
ICOORD shift); //vector
void basic_plot(WINDOW window, COLOUR colour);
void plot(WINDOW window, COLOUR colour);
void show_attrs(DEBUG_WIN *debug);
NEWDELETE2 (PAGE_BLOCK)
void pb_delete ();
void serialise(FILE *f);
static PAGE_BLOCK *de_serialise(FILE *f);
void prep_serialise() { //set ptrs to counts
POLY_BLOCK::prep_serialise();
children.prep_serialise ();
}
void dump( //write external bits
FILE *f) {
POLY_BLOCK::dump(f);
children.dump (f);
}
void de_dump( //read external bits
FILE *f) {
POLY_BLOCK::de_dump(f);
children.de_dump (f);
}
//Note: PAGE_BLOCK is a type-switched class, so a PAGE_BLOCK_LIST cannot be
//de-serialised by the normal mechanism, since each element cannot be
//de-serialised in place.
//Use read_poly_blocks, or the code therein, instead.
void serialise_asc( //serialise to ascii
FILE *f);
void internal_serialise_asc( //serialise to ascii
FILE *f);
void de_serialise_asc( //de-serialise from ascii
FILE *f);
//make one from ascii
static PAGE_BLOCK *new_de_serialise_asc(FILE *f);
private:
PB_TYPE pb_type;
PAGE_BLOCK_LIST children;
};
DLLSYM void show_all_in(PAGE_BLOCK *pblock,
POLY_BLOCK *show_area,
DEBUG_WIN *f);
DLLSYM void delete_all_in(PAGE_BLOCK *pblock, POLY_BLOCK *delete_area);
DLLSYM PAGE_BLOCK *smallest_containing(PAGE_BLOCK *pblock, POLY_BLOCK *other);
class DLLSYM TEXT_BLOCK:public PAGE_BLOCK
//text block
{
public:
TEXT_BLOCK() {
} //empty constructor
TEXT_BLOCK(ICOORDELT_LIST *points);
TEXT_BLOCK (ICOORDELT_LIST * points, BOOL8 backg[NUM_BACKGROUNDS]);
//get text regions
TEXT_REGION_LIST *regions() {
return &text_regions;
}
INT32 nregions() {
return text_regions.length ();
}
void add_a_region(TEXT_REGION *newchild);
void rotate( //rotate it
FCOORD rotation);
void move( //move it
ICOORD shift); //vector
void plot(WINDOW window,
COLOUR colour,
COLOUR region_colour,
COLOUR subregion_colour);
void set_attrs (BOOL8 backg[NUM_BACKGROUNDS]);
void show_attrs(DEBUG_WIN *debug);
void prep_serialise() { //set ptrs to counts
PAGE_BLOCK::prep_serialise();
text_regions.prep_serialise ();
}
void dump( //write external bits
FILE *f) {
PAGE_BLOCK::dump(f);
text_regions.dump (f);
}
void de_dump( //read external bits
FILE *f) {
PAGE_BLOCK::de_dump(f);
text_regions.de_dump (f);
}
make_serialise (TEXT_BLOCK)
void serialise_asc( //serialise to ascii
FILE *f);
void de_serialise_asc( //de-serialise from ascii
FILE *f);
private:
BITS16 background;
TEXT_REGION_LIST text_regions;
};
DLLSYM void delete_all_tr_in(TEXT_BLOCK *tblock, POLY_BLOCK *delete_area);
DLLSYM void show_all_tr_in(TEXT_BLOCK *tblock,
POLY_BLOCK *show_area,
DEBUG_WIN *f);
class DLLSYM RULE_BLOCK:public PAGE_BLOCK
//rule block
{
public:
RULE_BLOCK() {
} //empty constructor
RULE_BLOCK(ICOORDELT_LIST *points, INT8 sing, INT8 colo);
void set_attrs(INT8 sing, INT8 colo);
void show_attrs(DEBUG_WIN *debug);
make_serialise (RULE_BLOCK)
void serialise_asc( //serialise to ascii
FILE *f);
void de_serialise_asc( //de-serialise from ascii
FILE *f);
private:
INT8 multiplicity;
INT8 colour;
};
class DLLSYM GRAPHICS_BLOCK:public PAGE_BLOCK
//graphics block
{
public:
GRAPHICS_BLOCK() {
} //empty constructor
GRAPHICS_BLOCK (ICOORDELT_LIST * points,
BOOL8 backg[NUM_BACKGROUNDS], INT8 foreg);
void set_attrs (BOOL8 backg[NUM_BACKGROUNDS], INT8 foreg);
void show_attrs(DEBUG_WIN *debug);
make_serialise (GRAPHICS_BLOCK)
void serialise_asc( //serialise to ascii
FILE *f);
void de_serialise_asc( //de-serialise from ascii
FILE *f);
private:
BITS16 background;
INT8 foreground;
};
class DLLSYM IMAGE_BLOCK:public PAGE_BLOCK
//image block
{
public:
IMAGE_BLOCK() {
} //empty constructor
IMAGE_BLOCK(ICOORDELT_LIST *points, INT8 colo, INT8 qual);
void set_attrs(INT8 colo, INT8 qual);
void show_attrs(DEBUG_WIN *debug);
make_serialise (IMAGE_BLOCK)
void serialise_asc( //serialise to ascii
FILE *f);
void de_serialise_asc( //de-serialise from ascii
FILE *f);
private:
INT8 colour;
INT8 quality;
};
class DLLSYM SCRIBBLE_BLOCK:public PAGE_BLOCK
//scribble block
{
public:
SCRIBBLE_BLOCK() {
} //empty constructor
SCRIBBLE_BLOCK (ICOORDELT_LIST * points,
BOOL8 backg[NUM_BACKGROUNDS], INT8 foreg);
void set_attrs (BOOL8 backg[NUM_BACKGROUNDS], INT8 foreg);
void show_attrs(DEBUG_WIN *debug);
make_serialise (SCRIBBLE_BLOCK)
void serialise_asc( //serialise to ascii
FILE *f);
void de_serialise_asc( //de-serialise from ascii
FILE *f);
private:
BITS16 background;
INT8 foreground;
};
class DLLSYM WEIRD_BLOCK:public PAGE_BLOCK
//weird block
{
public:
WEIRD_BLOCK() {
} //empty constructor
WEIRD_BLOCK(ICOORDELT_LIST *points, INT32 id_no);
void set_id(INT32 id_no);
void show_attrs(DEBUG_WIN *debug);
void set_id_no(INT32 new_id) {
id_number = new_id;
}
void plot(WINDOW window, COLOUR colour);
INT32 id_no() {
return id_number;
}
make_serialise (WEIRD_BLOCK)
void serialise_asc( //serialise to ascii
FILE *f);
void de_serialise_asc( //de-serialise from ascii
FILE *f);
private:
INT32 id_number; //unique id
};
void print_background(DEBUG_WIN *f, BITS16 background);
#endif
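
The note in PAGE_BLOCK says a list of these blocks cannot be de-serialised in place, because the concrete subclass is only known once its type has been read. A common way to handle that is a factory that reads a type tag first and then constructs the matching subclass. The sketch below illustrates that idea only; it is not the implementation of new_de_serialise_asc, which is not shown here. `DemoBase`, `DemoImage`, `DemoWeird`, `make_from_ascii` and `describe_from_text` are hypothetical names, and the whitespace-separated text format is an assumption.

```cpp
#include <istream>
#include <memory>
#include <sstream>
#include <string>

// Hypothetical type tags, playing the role of PB_TYPE.
enum DemoType { DEMO_IMAGE = 0, DEMO_WEIRD = 1 };

struct DemoBase {
  virtual ~DemoBase() {}
  virtual void read_fields(std::istream &in) = 0;  // subclass-specific data
  virtual std::string describe() const = 0;
};

struct DemoImage : DemoBase {
  int colour = 0;
  int quality = 0;
  void read_fields(std::istream &in) override { in >> colour >> quality; }
  std::string describe() const override {
    return "image " + std::to_string(colour) + " " + std::to_string(quality);
  }
};

struct DemoWeird : DemoBase {
  int id_number = 0;
  void read_fields(std::istream &in) override { in >> id_number; }
  std::string describe() const override {
    return "weird " + std::to_string(id_number);
  }
};

// Reads the tag, builds the matching subclass, then lets it read its fields.
// Returns an empty pointer for a missing or unknown tag.
std::unique_ptr<DemoBase> make_from_ascii(std::istream &in) {
  int tag = -1;
  if (!(in >> tag))
    return nullptr;
  std::unique_ptr<DemoBase> block;
  switch (tag) {
    case DEMO_IMAGE: block.reset(new DemoImage); break;
    case DEMO_WEIRD: block.reset(new DemoWeird); break;
    default: return nullptr;
  }
  block->read_fields(in);
  return block;
}

// Convenience wrapper for trying the factory on a string.
std::string describe_from_text(const std::string &text) {
  std::istringstream in(text);
  std::unique_ptr<DemoBase> block = make_from_ascii(in);
  return block ? block->describe() : "unknown";
}
```

The object has to be allocated after the tag is read, which is why a generic list reader that fills pre-allocated elements cannot be used for this hierarchy.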

325
ccstruct/pageres.cpp Normal file

@ -0,0 +1,325 @@
/**********************************************************************
* File: pageres.cpp (Formerly page_res.c)
* Description: Results classes used by control.c
* Author: Phil Cheatle
* Created: Tue Sep 22 08:42:49 BST 1992
*
* (C) Copyright 1992, Hewlett-Packard Ltd.
** Licensed under the Apache License, Version 2.0 (the "License");
** you may not use this file except in compliance with the License.
** You may obtain a copy of the License at
** http://www.apache.org/licenses/LICENSE-2.0
** Unless required by applicable law or agreed to in writing, software
** distributed under the License is distributed on an "AS IS" BASIS,
** WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
** See the License for the specific language governing permissions and
** limitations under the License.
*
**********************************************************************/
#include "mfcpch.h"
#include <stdlib.h>
#ifdef __UNIX__
#include <assert.h>
#endif
#include "pageres.h"
#include "notdll.h"
ELISTIZE (BLOCK_RES)
CLISTIZE (BLOCK_RES)
ELISTIZE (ROW_RES)
ELISTIZE (WERD_RES)
/*************************************************************************
* PAGE_RES::PAGE_RES
*
* Constructor for page results
*************************************************************************/
PAGE_RES::PAGE_RES( //recursive construct
BLOCK_LIST *the_block_list //real page
) {
BLOCK_IT block_it(the_block_list);
BLOCK_RES_IT block_res_it(&block_res_list);
char_count = 0;
rej_count = 0;
rejected = FALSE;
for (block_it.mark_cycle_pt ();
!block_it.cycled_list (); block_it.forward ()) {
block_res_it.add_to_end (new BLOCK_RES (block_it.data ()));
}
}
/*************************************************************************
* BLOCK_RES::BLOCK_RES
*
* Constructor for BLOCK results
*************************************************************************/
BLOCK_RES::BLOCK_RES( //recursive construct
BLOCK *the_block //real BLOCK
) {
ROW_IT row_it (the_block->row_list ());
ROW_RES_IT row_res_it(&row_res_list);
char_count = 0;
rej_count = 0;
font_class = -1; //not assigned
x_height = -1.0;
font_assigned = FALSE;
bold = FALSE;
italic = FALSE;
row_count = 0;
block = the_block;
for (row_it.mark_cycle_pt (); !row_it.cycled_list (); row_it.forward ()) {
row_res_it.add_to_end (new ROW_RES (row_it.data ()));
}
}
/*************************************************************************
* ROW_RES::ROW_RES
*
* Constructor for ROW results
*************************************************************************/
ROW_RES::ROW_RES( //recursive construct
ROW *the_row //real ROW
) {
WERD_IT word_it (the_row->word_list ());
WERD_RES_IT word_res_it(&word_res_list);
WERD_RES *combo = NULL; //current combination of fuzzies
WERD_RES *word_res; //current word
WERD *copy_word;
char_count = 0;
rej_count = 0;
whole_word_rej_count = 0;
font_class = -1;
font_class_score = -1.0;
bold = FALSE;
italic = FALSE;
row = the_row;
for (word_it.mark_cycle_pt (); !word_it.cycled_list (); word_it.forward ()) {
word_res = new WERD_RES (word_it.data ());
if (word_res->word->flag (W_FUZZY_NON)) {
ASSERT_HOST (combo != NULL);
word_res->part_of_combo = TRUE;
combo->copy_on (word_res);
}
if (word_it.data_relative (1)->flag (W_FUZZY_NON)) {
if (combo == NULL) {
copy_word = new WERD;
//deep copy
*copy_word = *(word_it.data ());
combo = new WERD_RES (copy_word);
combo->combination = TRUE;
word_res_it.add_to_end (combo);
}
word_res->part_of_combo = TRUE;
}
else
combo = NULL;
word_res_it.add_to_end (word_res);
}
}
WERD_RES & WERD_RES::operator= ( //assign word_res
const WERD_RES & source //from this
) {
this->ELIST_LINK::operator= (source);
if (source.combination) {
word = new WERD;
*word = *(source.word); //deep copy
}
else
word = source.word; //pt to same word
if (source.outword != NULL) {
outword = new WERD;
*outword = *(source.outword);//deep copy
}
else
outword = NULL;
denorm = source.denorm;
//raw_choice is assumed to be set whenever best_choice is (as in ~WERD_RES)
if (source.best_choice != NULL) {
best_choice = new WERD_CHOICE;
*best_choice = *(source.best_choice);
raw_choice = new WERD_CHOICE;
*raw_choice = *(source.raw_choice);
}
else {
best_choice = NULL;
raw_choice = NULL;
}
if (source.ep_choice != NULL) {
ep_choice = new WERD_CHOICE;
*ep_choice = *(source.ep_choice);
}
else
ep_choice = NULL;
reject_map = source.reject_map;
tess_failed = source.tess_failed;
tess_accepted = source.tess_accepted;
tess_would_adapt = source.tess_would_adapt;
done = source.done;
unlv_crunch_mode = source.unlv_crunch_mode;
italic = source.italic;
bold = source.bold;
font1 = source.font1;
font1_count = source.font1_count;
font2 = source.font2;
font2_count = source.font2_count;
x_height = source.x_height;
caps_height = source.caps_height;
guessed_x_ht = source.guessed_x_ht;
guessed_caps_ht = source.guessed_caps_ht;
combination = source.combination;
part_of_combo = source.part_of_combo;
reject_spaces = source.reject_spaces;
return *this;
}
WERD_RES::~WERD_RES () {
if (combination)
delete word;
if (outword != NULL)
delete outword;
if (best_choice != NULL) {
delete best_choice;
delete raw_choice;
}
if (ep_choice != NULL) {
delete ep_choice;
}
}
/*************************************************************************
* PAGE_RES_IT::restart_page
*
* Set things up at the start of the page
*************************************************************************/
WERD_RES *PAGE_RES_IT::restart_page() {
block_res_it.set_to_list (&page_res->block_res_list);
block_res_it.mark_cycle_pt ();
block_res = NULL;
row_res = NULL;
word_res = NULL;
next_block_res = NULL;
next_row_res = NULL;
next_word_res = NULL;
internal_forward(TRUE);
return internal_forward (FALSE);
}
/*************************************************************************
* PAGE_RES_IT::internal_forward
*
* Find the next word on the page. Empty blocks and rows are skipped.
* The iterator maintains pointers to block, row and word for the previous,
* current and next words. These are correct, regardless of block/row
* boundaries. NULL values denote start and end of the page.
*************************************************************************/
WERD_RES *PAGE_RES_IT::internal_forward(BOOL8 new_block) {
BOOL8 found_next_word = FALSE;
BOOL8 new_row = FALSE;
prev_block_res = block_res;
prev_row_res = row_res;
prev_word_res = word_res;
block_res = next_block_res;
row_res = next_row_res;
word_res = next_word_res;
while (!found_next_word && !block_res_it.cycled_list ()) {
if (new_block) {
new_block = FALSE;
row_res_it.set_to_list (&block_res_it.data ()->row_res_list);
row_res_it.mark_cycle_pt ();
new_row = TRUE;
}
while (!found_next_word && !row_res_it.cycled_list ()) {
if (new_row) {
new_row = FALSE;
word_res_it.set_to_list (&row_res_it.data ()->word_res_list);
word_res_it.mark_cycle_pt ();
}
while (!found_next_word && !word_res_it.cycled_list ()) {
next_block_res = block_res_it.data ();
next_row_res = row_res_it.data ();
next_word_res = word_res_it.data ();
found_next_word = TRUE;
do {
word_res_it.forward ();
}
while (word_res_it.data ()->part_of_combo);
}
if (!found_next_word) { //end of row reached
row_res_it.forward ();
new_row = TRUE;
}
}
if (!found_next_word) { //end of block reached
block_res_it.forward ();
new_block = TRUE;
}
}
if (!found_next_word) { //end of page reached
next_block_res = NULL;
next_row_res = NULL;
next_word_res = NULL;
}
return word_res;
}
/*************************************************************************
* PAGE_RES_IT::forward_block
*
* Move to the first word of the next block
* Can be followed by subsequent calls to forward() BUT at the first word in
* the block, the prev block, row and word are all NULL.
*************************************************************************/
WERD_RES *PAGE_RES_IT::forward_block() {
if (block_res == next_block_res) {
block_res_it.forward ();
block_res = NULL;
row_res = NULL;
word_res = NULL;
next_block_res = NULL;
next_row_res = NULL;
next_word_res = NULL;
internal_forward(TRUE);
}
return internal_forward (FALSE);
}
void PAGE_RES_IT::rej_stat_word() {
INT16 chars_in_word;
INT16 rejects_in_word = 0;
chars_in_word = word_res->reject_map.length ();
page_res->char_count += chars_in_word;
block_res->char_count += chars_in_word;
row_res->char_count += chars_in_word;
rejects_in_word = word_res->reject_map.reject_count ();
page_res->rej_count += rejects_in_word;
block_res->rej_count += rejects_in_word;
row_res->rej_count += rejects_in_word;
if (chars_in_word == rejects_in_word)
row_res->whole_word_rej_count += rejects_in_word;
}
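
PAGE_RES_IT::internal_forward keeps previous, current and next word pointers while walking blocks, rows and words, skipping empty blocks and rows. A minimal sketch of that lookahead pattern over plain nested vectors follows. It is an illustration under stated assumptions, not the class above: `DemoWordIt` and the `DemoWord*` typedefs are hypothetical, NULL is replaced by `nullptr`, and the part_of_combo skipping is not modelled.

```cpp
#include <cstddef>
#include <string>
#include <vector>

typedef std::vector<std::string> DemoWordRow;    // words in a row
typedef std::vector<DemoWordRow> DemoWordBlock;  // rows in a block
typedef std::vector<DemoWordBlock> DemoWordPage; // blocks on a page

// Walks every word on a page with one word of lookahead.
// The page must outlive the iterator.
class DemoWordIt {
 public:
  explicit DemoWordIt(const DemoWordPage &page)
      : page_(page), b_(0), r_(0), w_(0),
        prev_(nullptr), cur_(nullptr), next_(nullptr) {
    forward();  // first call loads the first word into next_
    forward();  // second call moves it into cur_, as restart_page does
  }

  // Shifts next -> current -> previous and returns the new current word.
  const std::string *forward() {
    prev_ = cur_;
    cur_ = next_;
    next_ = find_next();
    return cur_;
  }

  const std::string *prev() const { return prev_; }
  const std::string *word() const { return cur_; }
  const std::string *next() const { return next_; }

 private:
  // Returns the next unvisited word, skipping empty rows and blocks,
  // or nullptr once the page is exhausted.
  const std::string *find_next() {
    while (b_ < page_.size()) {
      const DemoWordBlock &block = page_[b_];
      while (r_ < block.size()) {
        const DemoWordRow &row = block[r_];
        if (w_ < row.size())
          return &row[w_++];
        ++r_;    // end of row reached
        w_ = 0;
      }
      ++b_;      // end of block reached
      r_ = 0;
      w_ = 0;
    }
    return nullptr;  // end of page reached
  }

  const DemoWordPage &page_;
  std::size_t b_, r_, w_;
  const std::string *prev_, *cur_, *next_;
};
```

Because the lookahead is computed one step early, `prev()` and `next()` stay valid across row and block boundaries, and a null pointer marks the start or end of the page.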

Some files were not shown because too many files have changed in this diff