MTL Toolbox

From Taioaan Wiki

Revision as of 16:25, 25 November 2024

MTL Toolbox (https://learntaiwanese.org/MTLtoolbox/about.html) is a collection of software and data for working with written Taiwanese in the Modern Taiwanese Language (MTL) writing system and other romanizations of Taiwanese.

Features

  • six Taiwanese dictionaries spanning from the Japanese era to the present day
  • a full-text search engine that accepts written Taiwanese as well as English and Harnji
  • audio from the government-compiled dictionary DFT
  • basic text segmentation (including "unjoining" into syllables) and "bag-of-syllables" search
  • Seven Tones soundboard: a table of all MLT finals with examples

How to search

This section describes how to use "Taiwanese–English dictionaries: segmenter & full-text search". The interface is mainly for Taiwanese words written in MLT or MTL, referred to here as "M-style" written Taiwanese. After entering M-style input, press "Zhøe" to run the segmenter and search.

Typical usage

  • Input: a Taiwanese word (typically a disyllable: two syllables joined by tone sandhi)
    • Example: køefcie (copy and paste into this link [1])
  • Press return or tap "Zhøe" (means "search")
    • your input is "unjoined" (original syllables found by database lookup)
      • in this example, the original syllables are: køea and cie
    • search is done using original syllables (unordered collection of syllables, or "bag-of-syllables")
    • confirm that the results are the same as for the input køea cie (except for Htb, which is not unjoined)
  • Try more examples:
    • chviafmng
    • tøsia
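
The unjoin-then-search steps above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the lookup table `UNJOIN`, the helper names, and the rule "an entry matches when it contains every query syllable, in any order" are hypothetical and are not taken from the Toolbox source. Only the køefcie → køea, cie pair comes from the example above.

```python
from collections import Counter

# Hypothetical lookup table: joined (tone-sandhi) form -> original syllables.
# Only the køefcie pair comes from the article; the real Toolbox uses a database.
UNJOIN = {"køefcie": ["køea", "cie"]}

def unjoin(text):
    """Return the original syllables for a joined word.

    Input that is not in the table is assumed to be already unjoined
    and is simply split on whitespace.
    """
    return UNJOIN.get(text, text.split())

def matches(query_syllables, entry_syllables):
    """Bag-of-syllables test: every query syllable occurs in the entry,
    at least as many times as in the query, regardless of order."""
    need = Counter(query_syllables)
    have = Counter(entry_syllables)
    return all(have[syl] >= count for syl, count in need.items())

# The joined and the unjoined input reduce to the same bag of syllables,
# so they select the same entries.
print(unjoin("køefcie") == unjoin("køea cie"))
print(matches(unjoin("køefcie"), ["cie", "køea"]))
```

Because both inputs reduce to the same unordered collection of syllables, a search built this way would return the same entries for køefcie and for køea cie, which is the behaviour the last step asks you to confirm.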

Monosyllable

  • if the syllable is a DFT monosyllable, a navigation bar displays the adjacent DFT monosyllables in alphabetical order
  • because a single syllable matches a large number of entries, "monosyllable mode" returns only monosyllable search results. To see all matching results, click "Khahzøe"

Other fields

  • The "en" button directs the search to the English field (en). Harnji (hj) can also be entered, although no Chinese text segmentation is attempted.

Data

Local copies of:

We also support searching other websites with conversion to POJ/TL:

Technical notes

  • Full-text search uses SQLite FTS4
  • Token prefix queries: append an asterisk ('*') to the token. This works like a trailing wildcard character in an operating system shell (general wildcard search is not currently supported by FTS)
  • Column filter: specify a column name followed by a colon (':') before the token
    • Example: thj:頭* (returns entries where Taiwanese written with Harnji begins with character for thaau)
  • Add a caret ('^') before a token to require that it be the very first token in its column
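
The query forms listed above can be tried with Python's built-in sqlite3 module, provided the underlying SQLite library was compiled with FTS4 (most distributions are). The sketch below is illustrative only: the table name `entries`, its columns `mtl`, `thj` and `en`, and the three toy rows are assumptions made for this example, with approximate spellings and glosses. They are not the Toolbox's actual schema or data.

```python
import sqlite3

# Hypothetical schema and toy rows, for illustration only; the Toolbox's
# real tables, column names and data are not described in this article.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE VIRTUAL TABLE entries USING fts4(mtl, thj, en)")
conn.executemany(
    "INSERT INTO entries (mtl, thj, en) VALUES (?, ?, ?)",
    [
        ("thaau", "頭", "head"),
        ("thaukef", "頭家", "boss, head of a household"),
        ("køefcie", "果子", "fruit"),
    ],
)

def search(query):
    """Return the mtl value of every row matching an FTS4 query, sorted."""
    cur = conn.execute(
        "SELECT mtl FROM entries WHERE entries MATCH ?", (query,)
    )
    return sorted(row[0] for row in cur)

print(search("køe*"))     # token prefix query over all columns
print(search("thj:頭*"))  # prefix query restricted to the thj column
print(search("en:head"))  # token anywhere in the en column
print(search("^head"))    # token must be the first token in its column
```

With the default tokenizer, a run of Harnji characters without spaces is indexed as a single token, which is why the prefix form 頭* is needed to reach a longer entry such as 頭家.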

See also

Acknowledgements