MTL Toolbox
Latest revision as of 08:38, 15 December 2024
MTL Toolbox (https://learntaiwanese.org/MTLtoolbox/about.html): Modern Taiwanese Language Toolbox. Software and data to help people use written Taiwanese in Modern Literal Taiwanese (MLT) and other Latin-script writing systems.
Features
- six Taiwanese dictionaries spanning the Japanese era to the present day
- full-text search engine that accepts written Taiwanese, English, and Harnji
- audio from the government-compiled dictionary DFT
- basic text segmentation (including "unjoining" into syllables) and "bag-of-syllables" search
- Seven Tones soundboard: table of all MLT finals with examples
How to search
We describe how to use "Taiwanese–English dictionaries: MLT segmenter & full-text search". This interface is mainly for Taiwanese words written in MLT, which we refer to as "M-style" written Taiwanese. After entering M-input, press "Zhøe" to run the segmenter and search.
Typical usage
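- Input: a Taiwanese word (typically a disyllable: two syllables joined by tone sandhi)
  - Example: køefcie (you may substitute 0 for ø: k0efcie)
- Press return or tap "Zhøe" (means "search")
  - your input is "unjoined" (the original syllables are found by database lookup)
    - in this example, the original syllables are køea and cie
  - the search is done using the original syllables (an unordered collection of syllables, or "bag-of-syllables")
  - confirm the results are the same as for the unjoined input (except for HTB, which is not unjoined)
- Try more examples:
  - chviafmng
  - tøsia
  - Taioaan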
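The unjoin-then-search flow can be sketched in Python. This is only an illustration under stated assumptions, not the Toolbox's implementation: the lookup table, the function names, and the containment rule used for "bag-of-syllables" matching are all hypothetical. The only data taken from this page is the example køefcie, whose original syllables are køea and cie, and the rule that 0 may stand in for ø.

```python
from collections import Counter

# Hypothetical lookup table. The real Toolbox finds original syllables
# by database lookup; this single entry is the example from this page.
UNJOIN_TABLE = {
    "køefcie": ["køea", "cie"],
}


def normalize(text):
    """Allow the digit 0 as a stand-in for ø, as the search box does."""
    return text.replace("0", "ø")


def unjoin(word):
    """Return the original syllables of a word, or the word itself if unknown."""
    word = normalize(word)
    return UNJOIN_TABLE.get(word, [word])


def bag_match(query_syllables, entry_syllables):
    """Assumed matching rule: every query syllable occurs in the entry,
    with word order ignored (an unordered "bag" comparison)."""
    need = Counter(query_syllables)
    have = Counter(entry_syllables)
    return all(have[syllable] >= count for syllable, count in need.items())


syllables = unjoin("k0efcie")
print(syllables)
print(bag_match(syllables, ["cie", "køea"]))
```

Because the comparison is made on syllable counts rather than on the joined string, an entry stored as separate syllables can still be found from sandhi-joined input.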
Monosyllable
- for a monosyllable, exact matches are displayed by default, for example (PTC):
  - goar
  - lie
  - ee
- if the syllable is a DFT monosyllable, a navigation bar displays adjacent DFT monosyllables in alphabetical order
- because a single syllable matches a high number of entries, "monosyllable mode" returns only monosyllable search results. To see all matching results, click "Khahzøe"
Other fields
- The "en" button is used to direct the search to the English field (en). Harnji (hj) can also be input, although we do not attempt Chinese text segmentation.
Data
Local copies of:
- HTB: Hiexntai-buun Dictionary
- DFT: Dictionary of Frequently-Used Taiwanese Taigi (in TL. We added MLT annotations and annotated over 5800 definitions in English for monosyllables)
- MK: Maryknoll Taiwanese-English Dictionary (in POJ. We added MLT annotations)
- EDUTECH: Liim Keahioong (2001-2003) EDUTECH: Taiwanese-English Dictionary Searched with Concise Atonal Spelling (in MLT with unified spellings (øe))
- Embree, Bernard L. M. (1973). A Dictionary of Southern Min: based on current usage in Taiwan and checked against the earlier works of Carstairs Douglas, Thomas Barclay, and Ernest Tipson. Hong Kong: Hong Kong Language Institute. (in POJ. We added MLT annotations)
- TDJ: Tai-Nichi Daijiten (original 1931 & 1932, in Taiwanese kana. Lim08 version: definitions translated into Taiwanese (Han-Romanization mixed script - POJ). We added MLT annotations)
We also support searching other websites with conversion to POJ/TL:
- Lim (2019): updated version of TDJ-Lim08 above
- Taiwanese - Chinese Dictionary (currently not open to public)
Technical notes
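- SQLite: FTS4 (https://sqlite.org/fts3.html) for full-text search
- Token prefix queries: use the asterisk ('*') at the end. Similar to the wildcard character in operating systems (general wildcard search is not currently supported by FTS)
  - Example: Taioa*, 臺*
- Specify a column-name followed by a colon (':')
  - Example: hj:頭* (returns entries where the Taiwanese written with Harnji begins with the character for thaau)
- Add a caret (^) before a token to require it to be the very first token in its column
  - Example: ^thaau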
- Ø is not folded to lower case by the tokenizer
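The query forms above can be tried against a small in-memory FTS4 table using Python's built-in sqlite3 module. This is a sketch under assumptions: the table name, the mlt column name, and the sample rows are hypothetical (only the hj and en field names appear on this page), and it requires an SQLite build with FTS4 enabled, which is typical but not guaranteed.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Hypothetical schema: "hj" and "en" are field names used on this page;
# the table name "entries" and the column name "mlt" are assumptions.
conn.execute("CREATE VIRTUAL TABLE entries USING fts4(mlt, hj, en)")

# Illustrative sample rows only, not taken from the Toolbox data.
conn.executemany(
    "INSERT INTO entries (mlt, hj, en) VALUES (?, ?, ?)",
    [
        ("Taioaan", "臺灣", "Taiwan"),
        ("thaau", "頭", "head"),
        ("thaukhag", "頭殼", "head"),
        ("goar ee thaau", "我的頭", "my head"),
        ("Øe", "", "sample row starting with capital Ø"),
    ],
)


def search(connection, query):
    """Run an FTS4 MATCH query and return the matching mlt values, sorted."""
    rows = connection.execute(
        "SELECT mlt FROM entries WHERE entries MATCH ?", (query,)
    )
    return sorted(row[0] for row in rows)


print(search(conn, "Taioa*"))   # token prefix query
print(search(conn, "hj:頭*"))   # prefix query restricted to the hj column
print(search(conn, "^thaau"))   # token must be the first token in its column
print(search(conn, "øe"))       # lower-case ø does not match the stored Ø
```

The last query illustrates the tokenizer note: the default "simple" tokenizer folds only ASCII letters to lower case, so a stored Ø is found by querying Øe but not øe.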
See also
- Taiwanese-English dictionaries
Acknowledgements
- The MTL Toolbox uses data from the Maryknoll Taiwanese-English Dictionary, which was generously released to the public under a Creative Commons Attribution-NonCommercial-ShareAlike 3.0 License (https://creativecommons.org/licenses/by-nc-sa/3.0/).