pycantonese.segment
- pycantonese.segment(unsegmented: str) → list[str]
Segment the unsegmented input.
The word segmentation model is a Jieba-style DAG+HMM hybrid segmenter trained on HKCanCor, rime-cantonese, Common Voice Cantonese, and the Cantonese-Traditional Chinese Parallel Corpus.
- Parameters:
- unsegmented : str
Unsegmented input.
- Returns:
- list[str]
Examples
>>> from pycantonese import segment
>>> segment("廣東話容唔容易學?")  # "Is Cantonese easy to learn?"
['廣東話', '容', '唔', '容', '易', '學', '?']
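To illustrate what the "DAG" half of a Jieba-style segmenter does, here is a minimal toy sketch. It is not PyCantonese's implementation: `TOY_FREQ`, `build_dag`, and `toy_segment` are hypothetical names, and the word counts are made up for the example. The sketch builds a directed acyclic graph of every dictionary word found in the input, then uses dynamic programming to pick the path with the highest total log-probability. The HMM step, which in Jieba-style segmenters handles runs of out-of-vocabulary characters, is omitted; unknown characters simply become single-character words here.

```python
import math

# Hypothetical toy dictionary with made-up counts (NOT PyCantonese's data).
TOY_FREQ = {
    "廣東話": 50,
    "廣東": 30,
    "容易": 40,
    "學": 20,
    "唔": 60,
    "容": 5,
    "易": 5,
    "廣": 2,
    "東": 2,
    "話": 10,
}


def build_dag(text, freq):
    """Map each start index to the end indices of dictionary words starting there."""
    dag = {}
    for i in range(len(text)):
        ends = [j for j in range(i, len(text)) if text[i : j + 1] in freq]
        # Fall back to a single character so every position has an edge.
        dag[i] = ends or [i]
    return dag


def toy_segment(text, freq=TOY_FREQ):
    """Pick the highest log-probability path through the DAG, right to left."""
    total = sum(freq.values())
    dag = build_dag(text, freq)
    n = len(text)
    # best[i] = (best log-probability of text[i:], end index of the first word)
    best = {n: (0.0, n)}
    for i in range(n - 1, -1, -1):
        best[i] = max(
            # Unknown single characters get a pseudo-count of 1.
            (math.log(freq.get(text[i : j + 1], 1) / total) + best[j + 1][0], j)
            for j in dag[i]
        )
    # Walk the stored choices left to right to recover the words.
    words, i = [], 0
    while i < n:
        j = best[i][1]
        words.append(text[i : j + 1])
        i = j + 1
    return words


print(toy_segment("廣東話容唔容易學"))
```

Because the dictionary and counts are invented, the output of this sketch need not match what `pycantonese.segment` returns for the same input; it only shows how overlapping candidates such as 廣東 and 廣東話 are resolved by comparing path probabilities.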