Commit 40f4311

Shreeshrii authored and zdenop committed
Add list of scripts to manpage for tesseract (#1347)
1 parent bb89dc3 commit 40f4311

1 file changed: +49 -0 lines changed

doc/tesseract.1.asc

Lines changed: 49 additions & 0 deletions
@@ -244,6 +244,55 @@ To use a non-standard language pack named *foo.traineddata*, set the
 *TESSDATA_PREFIX*/tessdata/*foo*.traineddata and give Tesseract the
 argument '-l foo'.
 
+SCRIPTS
+-------
+
+The traineddata files for the following scripts for tesseract 4.00
+are also in https://github.com/tesseract-ocr/tessdata_fast.
+
+In most cases, each of these contains all the languages that use that script PLUS English.
+So it is possible to recognize a language that has not been specifically trained for
+by using traineddata for the script it is written in.
+
+Arabic,
+Armenian,
+Bengali,
+Canadian Aboriginal,
+Cherokee,
+Cyrillic,
+Devanagari,
+Ethiopic,
+Fraktur,
+Georgian,
+Greek,
+Gujarati,
+Gurmukhi,
+Han - Simplified,
+Han - Simplified (vertical),
+Han - Traditional,
+Han - Traditional (vertical),
+Hangul,
+Hangul (vertical),
+Hebrew,
+Japanese,
+Japanese (vertical),
+Kannada,
+Khmer,
+Lao,
+Latin,
+Malayalam,
+Myanmar,
+Oriya (Odia),
+Sinhala,
+Syriac,
+Tamil,
+Telugu,
+Thaana,
+Thai,
+Tibetan,
+Vietnamese.
+
+
 CONFIG FILES AND AUGMENTING WITH USER DATA
 ------------------------------------------
 

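The added SCRIPTS section says a script-level traineddata file can stand in for a language that has no dedicated model. As a rough illustration of how that might be invoked, the sketch below builds a tesseract command line for a script model. It is not part of this commit. The helper name `build_tesseract_command` is hypothetical, and the `script/<Name>` form passed to `-l` assumes the file sits at `<tessdata>/script/<Name>.traineddata`, which matches the layout of the tessdata_fast repository at the time of writing but should be checked against a local install.

```python
import shlex


def build_tesseract_command(image, output_base, script, tessdata_dir=None):
    """Return an argv list that runs tesseract with a script-level model.

    Hypothetical helper for illustration only. It assumes the script model
    is stored as <tessdata>/script/<script>.traineddata, so the value given
    to -l is "script/<script>".
    """
    lang = f"script/{script}"  # assumed layout, verify locally
    cmd = ["tesseract", image, output_base, "-l", lang]
    if tessdata_dir is not None:
        # --tessdata-dir points tesseract at a specific tessdata directory,
        # for example a checkout of tessdata_fast.
        cmd += ["--tessdata-dir", tessdata_dir]
    return cmd


if __name__ == "__main__":
    # Print the command instead of running it, so the sketch works
    # without tesseract installed.
    cmd = build_tesseract_command("page.png", "page", "Devanagari")
    print(shlex.join(cmd))
```

To actually run the command, the returned list can be handed to `subprocess.run(cmd, check=True)`. Building the arguments as a list rather than a single string avoids shell quoting problems with script names that contain spaces.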