tesseract/doc/lstmtraining.1.asc

LSTMTRAINING(1)
===============
:doctype: manpage

NAME
----
lstmtraining - Training program for LSTM-based networks.

SYNOPSIS
--------
*lstmtraining*
  --continue_from  'train_output_dir/continue_from_lang.lstm'
  --old_traineddata 'bestdata_dir/continue_from_lang.traineddata'
  --traineddata   'train_output_dir/lang/lang.traineddata'
  --max_iterations 'NNN'
  --debug_interval '0|-1'
  --train_listfile 'train_output_dir/lang.training_files.txt'
  --model_output  'train_output_dir/newlstmmodel'

DESCRIPTION
-----------
lstmtraining(1)  trains LSTM-based networks using a list of lstmf files and starter traineddata file as the main input. Training from scratch is not recommended to be done by users. Finetuning (example command shown in synopsis above) or replacing a layer options can be used instead. Different options apply to different types of training.
Read the [training documentation](https://tesseract-ocr.github.io/tessdoc/TrainingTesseract-4.00.html) for details.

OPTIONS
-------

'--debug_interval  '::
  How often to display the alignment.  (type:int default:0)

'--net_mode  '::
  Controls network behavior.  (type:int default:192)

'--perfect_sample_delay  '::
  How many imperfect samples between perfect ones.  (type:int default:0)

'--max_image_MB  '::
  Max memory to use for images.  (type:int default:6000)

'--append_index  '::
  Index in continue_from Network at which to attach the new network defined by net_spec  (type:int default:-1)

'--max_iterations  '::
  If set, exit after this many iterations. A negative value is interpreted as epochs, 0 means infinite iterations.  (type:int default:0)

'--target_error_rate  '::
  Final error rate in percent.  (type:double default:0.01)

'--weight_range  '::
  Range of initial random weights.  (type:double default:0.1)

'--learning_rate  '::
  Weight factor for new deltas.  (type:double default:0.001)

'--momentum  '::
  Decay factor for repeating deltas.  (type:double default:0.5)

'--adam_beta  '::
  Decay factor for repeating deltas.  (type:double default:0.999)

'--stop_training  '::
  Just convert the training model to a runtime model.  (type:bool default:false)

'--convert_to_int  '::
  Convert the recognition model to an integer model.  (type:bool default:false)

'--sequential_training  '::
  Use the training files sequentially instead of round-robin.  (type:bool default:false)

'--debug_network  '::
  Get info on distribution of weight values  (type:bool default:false)

'--randomly_rotate  '::
  Train OSD and randomly turn training samples upside-down  (type:bool default:false)

'--net_spec  '::
  Network specification  (type:string default:)

'--continue_from  '::
  Existing model to extend  (type:string default:)

'--model_output  '::
  Basename for output models  (type:string default:lstmtrain)

'--train_listfile  '::
  File listing training files in lstmf training format.  (type:string default:)

'--eval_listfile  '::
  File listing eval files in lstmf training format.  (type:string default:)

'--traineddata  '::
  Starter traineddata with combined Dawgs/Unicharset/Recoder for language model  (type:string default:)

'--old_traineddata  '::
  When changing the character set, this specifies the traineddata with the old character set that is to be replaced  (type:string default:)

HISTORY
-------
lstmtraining(1) was first made available for tesseract4.00.00alpha.

RESOURCES
---------
Main web site: <https://github.com/tesseract-ocr> +
Information on training tesseract LSTM: <https://tesseract-ocr.github.io/tessdoc/TrainingTesseract-4.00.html>

SEE ALSO
--------
tesseract(1)

COPYING
-------
Copyright \(C) 2012 Google, Inc.
Licensed under the Apache License, Version 2.0

AUTHOR
------
The Tesseract OCR engine was written by Ray Smith and his research groups
at Hewlett Packard (1985-1995) and Google (2006-present).
Manpages (#1378) * Add missing man pages * Update lstmeval.1.asc * Update combine_lang_model.1.asc * Update lstmtraining.1.asc * Update merge_unicharsets.1.asc * Update set_unicharset_properties.1.asc * Update text2image.1.asc * Update text2image.1.asc * Update combine_lang_model.1.asc 2018-03-13 02:08:16 +08:00			`LSTMTRAINING(1)`
			`===============`
			`:doctype: manpage`

			`NAME`
			`----`
			`lstmtraining - Training program for LSTM-based networks.`

			`SYNOPSIS`
			`--------`
Fix whitespace issues * Remove whitespace (blanks, tabs, cr) at line endings Signed-off-by: Stefan Weil <sw@weilnetz.de> 2017-09-17 00:47:04 +08:00			`lstmtraining`
Manpages (#1378) * Add missing man pages * Update lstmeval.1.asc * Update combine_lang_model.1.asc * Update lstmtraining.1.asc * Update merge_unicharsets.1.asc * Update set_unicharset_properties.1.asc * Update text2image.1.asc * Update text2image.1.asc * Update combine_lang_model.1.asc 2018-03-13 02:08:16 +08:00			`--continue_from 'train_output_dir/continue_from_lang.lstm'`
Fix whitespace issues * Remove whitespace (blanks, tabs, cr) at line endings Signed-off-by: Stefan Weil <sw@weilnetz.de> 2017-09-17 00:47:04 +08:00			`--old_traineddata 'bestdata_dir/continue_from_lang.traineddata'`
			`--traineddata 'train_output_dir/lang/lang.traineddata'`
			`--max_iterations 'NNN'`
			`--debug_interval '0\|-1'`
Manpages (#1378) * Add missing man pages * Update lstmeval.1.asc * Update combine_lang_model.1.asc * Update lstmtraining.1.asc * Update merge_unicharsets.1.asc * Update set_unicharset_properties.1.asc * Update text2image.1.asc * Update text2image.1.asc * Update combine_lang_model.1.asc 2018-03-13 02:08:16 +08:00			`--train_listfile 'train_output_dir/lang.training_files.txt'`
			`--model_output 'train_output_dir/newlstmmodel'`

			`DESCRIPTION`
			`-----------`
Replace references to the old wiki by new URLs Signed-off-by: Stefan Weil <sw@weilnetz.de> 2020-02-03 18:37:41 +08:00			`lstmtraining(1) trains LSTM-based networks using a list of lstmf files and starter traineddata file as the main input. Training from scratch is not recommended to be done by users. Finetuning (example command shown in synopsis above) or replacing a layer options can be used instead. Different options apply to different types of training.`
			`Read the [training documentation](https://tesseract-ocr.github.io/tessdoc/TrainingTesseract-4.00.html) for details.`
Fix whitespace issues * Remove whitespace (blanks, tabs, cr) at line endings Signed-off-by: Stefan Weil <sw@weilnetz.de> 2017-09-17 00:47:04 +08:00
Manpages (#1378) * Add missing man pages * Update lstmeval.1.asc * Update combine_lang_model.1.asc * Update lstmtraining.1.asc * Update merge_unicharsets.1.asc * Update set_unicharset_properties.1.asc * Update text2image.1.asc * Update text2image.1.asc * Update combine_lang_model.1.asc 2018-03-13 02:08:16 +08:00			`OPTIONS`
			`-------`

			`'--debug_interval '::`
			`How often to display the alignment. (type:int default:0)`

			`'--net_mode '::`
			`Controls network behavior. (type:int default:192)`

			`'--perfect_sample_delay '::`
			`How many imperfect samples between perfect ones. (type:int default:0)`

			`'--max_image_MB '::`
			`Max memory to use for images. (type:int default:6000)`

			`'--append_index '::`
			`Index in continue_from Network at which to attach the new network defined by net_spec (type:int default:-1)`

			`'--max_iterations '::`
lstmtraining: Interpret negative value for --max_iterations as epochs Signed-off-by: Stefan Weil <sw@weilnetz.de> 2021-01-15 02:50:51 +08:00			`If set, exit after this many iterations. A negative value is interpreted as epochs, 0 means infinite iterations. (type:int default:0)`
Manpages (#1378) * Add missing man pages * Update lstmeval.1.asc * Update combine_lang_model.1.asc * Update lstmtraining.1.asc * Update merge_unicharsets.1.asc * Update set_unicharset_properties.1.asc * Update text2image.1.asc * Update text2image.1.asc * Update combine_lang_model.1.asc 2018-03-13 02:08:16 +08:00
			`'--target_error_rate '::`
			`Final error rate in percent. (type:double default:0.01)`

			`'--weight_range '::`
			`Range of initial random weights. (type:double default:0.1)`

			`'--learning_rate '::`
			`Weight factor for new deltas. (type:double default:0.001)`

			`'--momentum '::`
			`Decay factor for repeating deltas. (type:double default:0.5)`

			`'--adam_beta '::`
			`Decay factor for repeating deltas. (type:double default:0.999)`

			`'--stop_training '::`
			`Just convert the training model to a runtime model. (type:bool default:false)`

			`'--convert_to_int '::`
			`Convert the recognition model to an integer model. (type:bool default:false)`

			`'--sequential_training '::`
			`Use the training files sequentially instead of round-robin. (type:bool default:false)`

			`'--debug_network '::`
			`Get info on distribution of weight values (type:bool default:false)`

			`'--randomly_rotate '::`
			`Train OSD and randomly turn training samples upside-down (type:bool default:false)`

			`'--net_spec '::`
			`Network specification (type:string default:)`

			`'--continue_from '::`
			`Existing model to extend (type:string default:)`

			`'--model_output '::`
			`Basename for output models (type:string default:lstmtrain)`

			`'--train_listfile '::`
			`File listing training files in lstmf training format. (type:string default:)`

			`'--eval_listfile '::`
			`File listing eval files in lstmf training format. (type:string default:)`

			`'--traineddata '::`
			`Starter traineddata with combined Dawgs/Unicharset/Recoder for language model (type:string default:)`

			`'--old_traineddata '::`
			`When changing the character set, this specifies the traineddata with the old character set that is to be replaced (type:string default:)`

			`HISTORY`
			`-------`
Fix whitespace issues * Remove whitespace (blanks, tabs, cr) at line endings Signed-off-by: Stefan Weil <sw@weilnetz.de> 2017-09-17 00:47:04 +08:00			`lstmtraining(1) was first made available for tesseract4.00.00alpha.`
Manpages (#1378) * Add missing man pages * Update lstmeval.1.asc * Update combine_lang_model.1.asc * Update lstmtraining.1.asc * Update merge_unicharsets.1.asc * Update set_unicharset_properties.1.asc * Update text2image.1.asc * Update text2image.1.asc * Update combine_lang_model.1.asc 2018-03-13 02:08:16 +08:00
			`RESOURCES`
			`---------`
			`Main web site: <https://github.com/tesseract-ocr> +`
Replace references to the old wiki by new URLs Signed-off-by: Stefan Weil <sw@weilnetz.de> 2020-02-03 18:37:41 +08:00			`Information on training tesseract LSTM: <https://tesseract-ocr.github.io/tessdoc/TrainingTesseract-4.00.html>`
Fix whitespace issues * Remove whitespace (blanks, tabs, cr) at line endings Signed-off-by: Stefan Weil <sw@weilnetz.de> 2017-09-17 00:47:04 +08:00
Manpages (#1378) * Add missing man pages * Update lstmeval.1.asc * Update combine_lang_model.1.asc * Update lstmtraining.1.asc * Update merge_unicharsets.1.asc * Update set_unicharset_properties.1.asc * Update text2image.1.asc * Update text2image.1.asc * Update combine_lang_model.1.asc 2018-03-13 02:08:16 +08:00			`SEE ALSO`
			`--------`
			`tesseract(1)`

			`COPYING`
			`-------`
			`Copyright \(C) 2012 Google, Inc.`
			`Licensed under the Apache License, Version 2.0`

			`AUTHOR`
			`------`
			`The Tesseract OCR engine was written by Ray Smith and his research groups`
			`at Hewlett Packard (1985-1995) and Google (2006-present).`