MTL Toolbox

MTL Toolbox (https://learntaiwanese.org/MTLtoolbox/about.html): Modern Taiwanese Language Toolbox. Software and data to help people use written Taiwanese in Modern Literal Taiwanese (MLT) and other Latin-script writing systems.

Features

six Taiwanese dictionaries spanning from the Japanese era to the present day
full-text search engine that accepts written Taiwanese, English, and Harnji
audio from the government-compiled dictionary (DFT)
basic text segmentation (including "unjoining" into syllables) and "bag-of-syllables" search
Seven Tones soundboard: table of all MLT finals with examples

How to search using the segmenter

We describe how to use "Taiwanese–English dictionaries: MLT segmenter & full-text search". This interface is mainly for Taiwanese words written in MLT, which we refer to as "M-style" written Taiwanese. After entering M-input, press "Zhøe" to run the segmenter and search. Results from HTB and DFT are displayed, and results from other dictionaries are summarized and linked to.

Typical usage

Input: Taiwanese word (typically disyllable: two syllables joined by tone sandhi)
- Example: køefcie (copy and paste into this link [1]; you may substitute 0 for ø: k0efcie)
Press return or tap "Zhøe" ("Zhøe" means "search")
- your input is "unjoined" (original syllables found by database lookup)
  - in this example, the original syllables are: køea and cie
- search is done using original syllables (unordered collection of syllables, or "bag-of-syllables" (BOS))
- confirm the results are the same as for the unjoined input (except for HTB, which is not unjoined)
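
The unjoin-then-search flow above can be sketched in a few lines of Python. This is only an illustration, not the toolbox's actual code: the lookup table, the function names, and the rule that an entry matches when it contains every query syllable are all assumptions made for the example. The only data used is the køefcie example from the text.

```python
from collections import Counter

# Hypothetical stand-in for the toolbox's database lookup. The only entry is
# the example from the text: køefcie unjoins to køea + cie.
UNJOIN_TABLE = {"køefcie": ("køea", "cie")}


def normalize(word):
    """Accept 0 as a keyboard-friendly substitute for ø."""
    return word.replace("0", "ø")


def unjoin(word):
    """Return the original syllables, or the word itself if it is not in the table."""
    word = normalize(word)
    return UNJOIN_TABLE.get(word, (word,))


def bos_match(query_syllables, entry_syllables):
    """Assumed rule: the entry must contain every query syllable, in any order."""
    query = Counter(query_syllables)
    entry = Counter(entry_syllables)
    return all(entry[s] >= n for s, n in query.items())


syllables = unjoin("k0efcie")
matched = bos_match(syllables, ["cie", "køea"])  # order of syllables is ignored
```

Using a Counter rather than a set keeps repeated syllables distinct, so a query with the same syllable twice only matches entries that also contain it twice.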

Try more examples:
- chviafmng
- tøsia
- Taioaan

Monosyllable

for a monosyllable, exact matches are displayed by default, for example (PTC):
- goar
- lie
- ee

if the syllable is a DFT monosyllable, a navigation bar displays adjacent DFT monosyllables in alphabetical order
because a single syllable matches a large number of entries, "monosyllable mode" returns only monosyllable results. To see all matching results, click "Khahzøe"

Other fields

The "en" button directs the search to the English field (en). Harnji (hj) can also be entered, which can be useful with DFT. However, we do not attempt Chinese text segmentation, which is non-trivial (see Chinese word-segmented writing).

How to search the dictionary set (without segmenter)

Go to [2], use the checkboxes to select which dictionaries to search, and enter search terms to define your search. Typical inputs include English terms, M-style syllables (original, without tone sandhi), and the number of syllables. Feel free to try any other terms that would help narrow down your search. If you need to restrict a term to a particular column, type the column-name, then a ":" character, then the term. For example, if the term is "too" and you want to match only the English column, type en:too. If you want to match only M-style syllables, type u:too. See #Technical notes for more details.
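
To make the column prefix concrete, here is a minimal sketch using Python's built-in sqlite3 module with an in-memory FTS4 table. The table name, its two-column layout, and both rows are invented for the illustration (they are not real dictionary entries, and the toolbox's actual schema is not shown here); only the column names en and u and the queries en:too and u:too come from the text. It assumes your SQLite build has FTS4 enabled, which is common but not guaranteed.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Hypothetical two-column table; the real toolbox schema may differ.
conn.execute("CREATE VIRTUAL TABLE entries USING fts4(u, en)")
conn.executemany(
    "INSERT INTO entries (u, en) VALUES (?, ?)",
    [
        ("too", "placeholder gloss"),        # "too" appears as an M-style syllable
        ("cie", "this gloss mentions too"),  # "too" appears as an English word
    ],
)


def search(query):
    """Run a full-text query and return the u column of each hit."""
    rows = conn.execute(
        "SELECT u FROM entries WHERE entries MATCH ? ORDER BY rowid", (query,)
    )
    return [row[0] for row in rows]


any_column = search("too")         # matches in either column
english_only = search("en:too")    # only rows whose en column contains "too"
syllable_only = search("u:too")    # only rows whose u column contains "too"
```

Without a prefix the term is matched against every column, which is why the prefixed forms are useful for short terms that are valid in both English and M-style spelling.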

Data

Local copies of:

HTB: Hiexntai-buun Dictionary
DFT: Dictionary of Frequently-Used Taiwanese Taigi (in TL. We added MLT annotations and annotated over 5800 definitions in English for monosyllables)
MK: Maryknoll Taiwanese–English Dictionary (in POJ. We added MLT annotations)
EDUTECH: Liim Keahioong (2001-2003) EDUTECH: Taiwanese-English Dictionary Searched with Concise Atonal Spelling (in MLT with unified spellings (øe))
Embree, Bernard L. M. (1973). A Dictionary of Southern Min: based on current usage in Taiwan and checked against the earlier works of Carstairs Douglas, Thomas Barclay, and Ernest Tipson. Hong Kong: Hong Kong Language Institute. (in POJ. We added MLT annotations)
TDJ: Tai-Nichi Daijiten (original 1931 & 1932, in Taiwanese kana. Lim08 version: definitions translated into Taiwanese (Han-Romanization mixed script - POJ). We added MLT annotations)

The M-fields we present in DFT and MK may be machine-generated ("auto-joined") and may not represent the common or recommended spelling.

We also support searching other websites with conversion to POJ/TL:

Lim (2019): updated version of TDJ-Lim08 above
Taiwanese - Chinese Dictionary (currently not open to public)

Technical notes

Our full-text search is provided by the SQLite FTS4 extension. We currently use the Standard Query Syntax. One of the three basic query types supported by FTS tables is "token or token prefix queries":

Specify a token prefix by appending an asterisk ('*') to the prefix. (Although this resembles the wildcard character in operating systems, general wildcard search is not currently supported by FTS.)
- Example: Taioa*, 臺*
Restrict a token to one column by prefixing it with the column-name and a colon (':')
- Example: hj:頭* (returns entries where the Taiwanese written in Harnji begins with the character for thaau)
Prefix the token with a caret ('^') to require the token to be the first token in its column
- Example: ^thaau
- Example: ^thaau
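
The three query forms above can be tried against a small in-memory FTS4 table with Python's sqlite3 module. This is a sketch under stated assumptions: the table name and layout are hypothetical, the third row is an invented token sequence that exists only to show the effect of the caret, and the SQLite build must have FTS4 enabled.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Hypothetical layout: u holds M-style syllables, hj holds Harnji.
conn.execute("CREATE VIRTUAL TABLE demo USING fts4(u, hj)")
conn.executemany(
    "INSERT INTO demo (u, hj) VALUES (?, ?)",
    [
        ("thaau", "頭"),
        ("Taioaan", "臺灣"),
        ("cie thaau", ""),  # invented row: thaau is the second token here
    ],
)


def search(query):
    """Return the rowids of all rows matching an FTS4 query."""
    rows = conn.execute(
        "SELECT rowid FROM demo WHERE demo MATCH ? ORDER BY rowid", (query,)
    )
    return [row[0] for row in rows]


prefix_latin = search("Taioa*")   # token prefix, ASCII letters are case-folded
prefix_hanji = search("臺*")       # token prefix on a non-ASCII token
column_prefix = search("hj:頭*")   # prefix restricted to the hj column
plain = search("thaau")           # token anywhere in any column
first_only = search("^thaau")     # token must be first in its column
```

Comparing the last two queries shows what the caret adds: the plain query also returns the row where thaau is the second token, while the caret query does not.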

Tokenizer: the default tokenizer ("simple") is used. It only does case folding of ASCII characters, so Ø is not folded to lower case.
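
The effect of ASCII-only case folding can be checked with a short sketch, again using Python's sqlite3 module and an in-memory FTS4 table. The table and the test string "Øe" are arbitrary choices for the illustration, not toolbox data, and the example assumes an SQLite build with FTS4 enabled.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Name the tokenizer explicitly so the example does not depend on the default.
conn.execute("CREATE VIRTUAL TABLE demo USING fts4(u, tokenize=simple)")
conn.executemany(
    "INSERT INTO demo (u) VALUES (?)",
    [("Taioaan",), ("Øe",)],  # "Øe" is an arbitrary string with a capital Ø
)


def count_hits(query):
    """Number of rows matching an FTS4 query."""
    return conn.execute(
        "SELECT count(*) FROM demo WHERE demo MATCH ?", (query,)
    ).fetchone()[0]


ascii_folded = count_hits("taioaan")  # ASCII T is folded, so lower case matches
same_case = count_hits("Øe")          # identical capital Ø matches
lower_oe = count_hits("øe")           # Ø was not folded, so lower-case ø misses
```

One possible workaround, offered only as a suggestion, is to lower-case text in application code (for example with Python's str.lower(), which does map Ø to ø) before both indexing and querying.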

Acknowledgements

The MTL Toolbox uses data from the Maryknoll Taiwanese–English Dictionary, which was generously released to the public under a Creative Commons Attribution-NonCommercial-ShareAlike 3.0 License.