pycantonese.characters_to_jyutping

pycantonese.characters_to_jyutping(chars: str | list[str]) → list[tuple[str, str | None]]

Convert Cantonese characters into Jyutping romanization.

The conversion model is based on the HKCanCor corpus and rime-cantonese data. Any unseen Cantonese character (or punctuation mark, for that matter) is represented by None in the output.

Parameters: chars (str or list[str]) – Either a string of Cantonese characters or a list of strings. If a string is given, word segmentation is also run on it (by segment()) in order to resolve potential ambiguity in mapping characters to Jyutping. If you don’t want word segmentation to be done, provide a list of strings instead, with your desired segmentation.
Returns: A list of segmented words, where each word is a 2-tuple of (Cantonese characters, Jyutping romanization). Within the Jyutping string, syllables are separated by a single space. The Jyutping element is None when no romanization is available for the word.
Return type: list[tuple[str, str | None]]

Examples

>>> characters_to_jyutping("香港人講廣東話。")  # Hongkongers speak Cantonese.
[('香港人', 'hoeng1 gong2 jan4'), ('講', 'gong2'), ('廣東話', 'gwong2 dung1 waa2'), ('。', None)]
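
Because the Jyutping element may be None and multi-syllable words are returned as one space-separated string, downstream code usually needs a small post-processing step. The following is a minimal sketch of that step, not part of pycantonese: the helper names split_syllables and unconverted are hypothetical, and the input is the documented output above copied as a literal so that the sketch runs without the library installed.

```python
# Documented return value from the example above, copied as a literal.
# In real use this would come from pycantonese.characters_to_jyutping(...).
result = [
    ("香港人", "hoeng1 gong2 jan4"),
    ("講", "gong2"),
    ("廣東話", "gwong2 dung1 waa2"),
    ("。", None),
]


def split_syllables(pairs):
    """Hypothetical helper: turn each Jyutping string into a list of syllables.

    Words whose Jyutping is None (unseen characters, punctuation) get an
    empty list instead of raising an AttributeError on None.split().
    """
    out = []
    for word, jyutping in pairs:
        syllables = [] if jyutping is None else jyutping.split(" ")
        out.append((word, syllables))
    return out


def unconverted(pairs):
    """Hypothetical helper: collect the words that have no Jyutping."""
    return [word for word, jyutping in pairs if jyutping is None]


by_syllable = split_syllables(result)
missing = unconverted(result)
```

Checking for None explicitly, rather than testing truthiness, keeps the sketch aligned with the documented return type of str | None.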