MTL Toolbox

From Taioaan Wiki
Revision as of 10:21, 27 September 2024 by LearnTaiwanese

MTL Toolbox (https://learntaiwanese.org/MTLtoolbox/about.html) is a collection of software and data for working with written Taiwanese in the Modern Taiwanese Language (MTL) writing system and in other Taiwanese romanizations.

Features

  • six Taiwanese dictionaries, spanning the Japanese era to the present day
  • full-text search engine that accepts written Taiwanese, English, and Harnji
  • audio from the government-compiled dictionary DFT
  • word unjoiner to aid learning and searching at the syllable level
  • Seven Tones soundboard: a table of all MLT finals, with examples

How to search

We describe how to use the "Taiwanese–English dictionaries full-text search" interface. It is designed mainly for MLT or MTL input. The "en" button directs the search to the English field. Harnji can also be entered, although we do not attempt Chinese text segmentation.
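As an illustration of what the "en" button does, the toggle could be translated into an SQLite FTS column filter (the `column:token` syntax listed under Technical notes). This is a minimal sketch, assuming a hypothetical column named `en` and whitespace-separated input; the Toolbox's actual query construction is not documented here.

```python
# Hypothetical sketch: map an "en" toggle onto an SQLite FTS column filter.
# The column name "en" is an assumption for illustration only.

def build_match_query(user_input: str, english_only: bool) -> str:
    """Return an FTS MATCH expression for the given search-box input."""
    tokens = user_input.split()
    if english_only:
        # Restrict every token to the (assumed) English column.
        tokens = [f"en:{tok}" for tok in tokens]
    # Without the toggle, tokens are matched against all columns.
    # Space-separated tokens are an implicit AND in FTS queries.
    return " ".join(tokens)


print(build_match_query("head boss", english_only=True))  # en:head en:boss
print(build_match_query("thaau", english_only=False))     # thaau
```

Prefixing each token, rather than the whole string, keeps every term restricted to the English column when several words are entered.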

Typical usage

  • input: a Taiwanese word (often two syllables joined together), for example:
  • the Toolbox "unjoins" the word into syllables (syllable segmentation) by database lookup
  • it then searches using the unordered collection of syllables (bag-of-syllables)
  • the results from historical works should be about the same as for:
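The two steps above (unjoining by lookup, then matching an unordered bag of syllables) can be sketched as follows. This is a minimal sketch, not the Toolbox's code: the syllable inventory and the dictionary entries are hypothetical stand-ins for the database, and the segmentation strategy (try longer syllables first, backtrack on failure) is an assumption.

```python
from __future__ import annotations

from collections import Counter
from functools import lru_cache

# Hypothetical syllable inventory; the real Toolbox looks syllables up in
# its database. These spellings are illustrative only.
SYLLABLES = {"tai", "oaan", "oa", "thaau", "ke"}

# Hypothetical dictionary entries, each stored as a list of syllables.
ENTRIES = [["tai", "oaan"], ["oaan", "tai", "laang"], ["thaau", "ke"]]


def unjoin(word: str) -> list[str] | None:
    """Split a joined word into known syllables, or return None."""

    @lru_cache(maxsize=None)
    def split_from(i: int) -> tuple[str, ...] | None:
        if i == len(word):
            return ()
        # Try longer pieces first, so "oaan" is preferred over "oa" + ...
        for j in range(len(word), i, -1):
            piece = word[i:j]
            if piece in SYLLABLES:
                rest = split_from(j)
                if rest is not None:  # backtrack if the remainder fails
                    return (piece,) + rest
        return None

    result = split_from(0)
    return list(result) if result is not None else None


def bag_match(query: list[str], entry: list[str]) -> bool:
    """True if every query syllable occurs in the entry, in any order."""
    # Counter subtraction drops non-positive counts, so an empty result
    # means the entry contains the whole multiset of query syllables.
    return not (Counter(query) - Counter(entry))


query = unjoin("taioaan")
print(query)  # ['tai', 'oaan']
print([e for e in ENTRIES if bag_match(query, e)])
# [['tai', 'oaan'], ['oaan', 'tai', 'laang']]
```

Because the match ignores order, an entry whose syllables appear in a different sequence (or with extra syllables) is still returned, which is the behaviour a bag-of-syllables search implies.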

Monosyllable

  • "Monosyllable mode" normally returns only monosyllabic entries. To see more entries containing this syllable, click "Khahzøe".
  • if the syllable is a DFT monosyllable, a navigation bar displays the adjacent DFT monosyllables in alphabetical order
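A navigation bar of alphabetically adjacent monosyllables can be built with a binary search over a sorted list. This is a minimal sketch: the monosyllable list is hypothetical, and plain Python string ordering (by code point) is assumed, which may differ from the Toolbox's own alphabetical order for letters such as "ø".

```python
from __future__ import annotations

import bisect

# Hypothetical, alphabetically sorted list of DFT monosyllables.
MONOSYLLABLES = sorted(["ke", "oaan", "tai", "thaau", "zoe"])


def neighbours(syllable: str, k: int = 2) -> tuple[list[str], list[str]] | None:
    """Return up to k monosyllables before and after `syllable`.

    Returns None when the syllable is not in the list, i.e. when no
    navigation bar would be shown.
    """
    i = bisect.bisect_left(MONOSYLLABLES, syllable)
    if i == len(MONOSYLLABLES) or MONOSYLLABLES[i] != syllable:
        return None
    before = MONOSYLLABLES[max(0, i - k):i]
    after = MONOSYLLABLES[i + 1:i + 1 + k]
    return before, after


print(neighbours("tai", 1))  # (['oaan'], ['thaau'])
print(neighbours("xyz"))     # None
```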

Data

Local copies of:

We also support searching other websites with conversion to POJ/TL:

Technical notes

  • SQLite: the FTS4 extension provides full-text search
  • Token prefix queries: append an asterisk ('*') to the end of a token. This resembles the wildcard character in operating systems, but only as a suffix; general wildcard search is not currently supported by FTS
  • Column filters: put a column name followed by a colon (':') before a token
    • Example: thj:頭* (returns entries whose Taiwanese Harnji column, thj, has a token beginning with 頭, the character for thaau)
  • Add a caret (^) before a token to require it to be the very first token in its column
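The query features listed above (prefix `*`, `column:` filters, and the `^` first-token marker) can be tried against a small in-memory FTS4 table using Python's built-in `sqlite3` module. This is a minimal sketch, assuming the local SQLite build includes FTS4 (most do); the table name, the `thj` and `en` columns, and the rows are hypothetical stand-ins, not the Toolbox's actual schema or data.

```python
import sqlite3

# Hypothetical schema: "thj" (Harnji) and "en" (English gloss) columns.
con = sqlite3.connect(":memory:")
con.execute("CREATE VIRTUAL TABLE entries USING fts4(thj, en)")
con.executemany(
    "INSERT INTO entries (thj, en) VALUES (?, ?)",
    [
        ("頭家", "boss; head of household"),
        ("頭殼", "head"),
        ("光頭", "bald head"),
    ],
)


def search(query: str) -> list:
    """Return the thj value of every row matching an FTS query string."""
    rows = con.execute(
        "SELECT thj FROM entries WHERE entries MATCH ? ORDER BY rowid",
        (query,),
    )
    return [thj for (thj,) in rows]


# The default "simple" tokenizer treats non-ASCII characters as token
# characters, so each unsegmented Harnji string is a single token.
print(search("thj:頭*"))    # ['頭家', '頭殼']  prefix + column filter
print(search("en:head"))    # ['頭家', '頭殼', '光頭']
print(search("en:^head"))   # ['頭殼']  "head" must be the first token in en
```

Since no Chinese segmentation is done, `thj:頭*` effectively matches Harnji entries that start with 頭, while `光頭` is excluded because its only token begins with 光.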

See also

Acknowledgements