pycantonese.characters_to_jyutping

pycantonese.characters_to_jyutping(chars: str | list[str]) → list[tuple[str, str]]

Convert Cantonese characters into Jyutping romanization.

The conversion model is based on the HKCanCor corpus and rime-cantonese data. For any Cantonese character not seen in this data (and for punctuation marks), the output gives None in place of the Jyutping.
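Because some entries carry None instead of a string, downstream code should check for it before doing string operations on the Jyutping. The following is a minimal sketch that assumes the output shape shown in the example in this section. The list is hard-coded rather than produced by calling PyCantonese, and `split_known_unknown` is a hypothetical helper, not part of the library.

```python
from typing import Optional

# Hard-coded copy of the example output in this section (no pycantonese call).
pairs: list[tuple[str, Optional[str]]] = [
    ("香港人", "hoeng1gong2jan4"),
    ("講", "gong2"),
    ("廣東話", "gwong2dung1waa2"),
    ("。", None),
]


def split_known_unknown(
    pairs: list[tuple[str, Optional[str]]],
) -> tuple[list[tuple[str, str]], list[str]]:
    """Hypothetical helper: separate words with Jyutping from words without it."""
    known: list[tuple[str, str]] = []
    unknown: list[str] = []
    for word, jyutping in pairs:
        if jyutping is None:
            unknown.append(word)  # no romanization available (e.g. punctuation)
        else:
            known.append((word, jyutping))
    return known, unknown


known, unknown = split_known_unknown(pairs)
print(unknown)  # ['。']
```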

Parameters: chars (str or list[str]) – Either a string of Cantonese characters or a list of already segmented words. If a string is given, word segmentation is first run on it (by segment()) to resolve potential ambiguity in mapping characters to Jyutping. If you don’t want word segmentation to be done, provide a list of strings with your desired segmentation instead.
Returns: A list of segmented words, where each word is a 2-tuple of (Cantonese characters, Jyutping romanization). The Jyutping element is None for unseen characters and punctuation marks.
Return type: list[tuple[str, str]]
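To control segmentation yourself, pass the words as a list. The sketch below builds such a list from a string in which word boundaries are marked by a delimiter. The `|` delimiter convention and the `to_word_list` helper are hypothetical, and the actual call to `characters_to_jyutping` appears only as a comment so that the snippet runs without PyCantonese installed.

```python
def to_word_list(marked: str, delimiter: str = "|") -> list[str]:
    """Hypothetical helper: turn '香港|人' into ['香港', '人'], skipping empty pieces."""
    return [piece for piece in marked.split(delimiter) if piece]


words = to_word_list("香港|人|講|廣東話|。")
print(words)  # ['香港', '人', '講', '廣東話', '。']

# With PyCantonese installed, this list would be passed as-is, so your
# segmentation is used instead of the one produced by segment():
# import pycantonese
# pycantonese.characters_to_jyutping(words)
```

Joining the list back together (`"".join(words)`) should reproduce the original text, which is a quick way to check that no characters were lost while marking boundaries.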

Examples

>>> characters_to_jyutping("香港人講廣東話。")  # Hongkongers speak Cantonese.
[('香港人', 'hoeng1gong2jan4'), ('講', 'gong2'), ('廣東話', 'gwong2dung1waa2'), ('。', None)]
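Each word’s Jyutping is returned as one concatenated string. If you need per-character readings, one option is to split that string into syllables and pair them with the characters. This sketch rests on two assumptions: every syllable is lowercase letters ending in a tone digit from 1 to 6, and each character in a word has exactly one syllable. Words that break either assumption, or whose Jyutping is None, are left whole. `align_characters` is a hypothetical helper, not part of PyCantonese.

```python
import re
from typing import Optional

# Assumption: a Jyutping syllable is lowercase letters followed by a tone digit 1-6.
SYLLABLE = re.compile(r"[a-z]+[1-6]")


def align_characters(
    pairs: list[tuple[str, Optional[str]]],
) -> list[tuple[str, Optional[str]]]:
    """Hypothetical helper: map each character to its own syllable where possible."""
    aligned: list[tuple[str, Optional[str]]] = []
    for word, jyutping in pairs:
        if jyutping is not None:
            syllables = SYLLABLE.findall(jyutping)
            # Only align if the syllables cover the whole string, one per character.
            if "".join(syllables) == jyutping and len(syllables) == len(word):
                aligned.extend(zip(word, syllables))
                continue
        # None, unparsable, or character/syllable count mismatch: keep the word whole.
        aligned.append((word, jyutping))
    return aligned


# Hard-coded copy of the example output above.
pairs = [
    ("香港人", "hoeng1gong2jan4"),
    ("講", "gong2"),
    ("廣東話", "gwong2dung1waa2"),
    ("。", None),
]
print(align_characters(pairs))
# [('香', 'hoeng1'), ('港', 'gong2'), ('人', 'jan4'), ('講', 'gong2'),
#  ('廣', 'gwong2'), ('東', 'dung1'), ('話', 'waa2'), ('。', None)]
```

Falling back to the whole word, rather than raising an error, keeps the output the same length as the input text when the one-syllable-per-character assumption does not hold.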