PyCantonese
  • Quickstart
    • Using PyCantonese in JavaScript
  • Corpus Data
    • CHAT Format
    • Built-in Data
      • HKCanCor
      • CantoMap
    • CHILDES and TalkBank Data
    • Custom Data
  • Corpus Reader Methods
    • Headers
    • Transcriptions and Annotations
      • Jyutping Romanization
      • Chinese Characters
    • Word Ngrams
  • Corpus Search Queries
    • Searching by a Jyutping Element
    • Searching by a Chinese Character
    • Searching by a Part-of-speech Tag
    • Searching by a Word or Utterance Range
    • Searching by Multiple Criteria
    • Output Format of Search Results
    • Complex Searches
  • Parsing Cantonese Text
    • Input 1: A Plain String
    • Input 2: A List of Strings
    • Input 3: A List of Tuples of Strings
    • Customizing Part-of-Speech Tagging
    • Outputting CHAT Data
    • More Customization
  • Grapheme-to-Phoneme Conversion
  • Jyutping Romanization
    • Characters-to-Jyutping Conversion
    • Parsing Jyutping Strings
    • Jyutping-to-IPA Conversion
    • Grapheme-to-Phoneme Conversion
    • Jyutping-to-Yale Conversion
    • Yale-to-Jyutping Conversion
    • Jyutping-to-TIPA Conversion
  • Stop Words
  • Word Segmentation
    • Character Offsets
  • Part-of-Speech Tagging
  • API Reference
    • Corpus Data
      • pycantonese.read_chat
        • read_chat()
      • pycantonese.hkcancor
        • hkcancor()
      • pycantonese.cantomap
        • cantomap()
      • pycantonese.CHAT
        • CHAT
      • pycantonese.CHAT.search
        • CHAT.search()
    • Jyutping Romanization
      • pycantonese.characters_to_jyutping
        • characters_to_jyutping()
      • pycantonese.parse_jyutping
        • parse_jyutping()
      • pycantonese.jyutping_to_ipa
        • jyutping_to_ipa()
      • pycantonese.jyutping_to_yale
        • jyutping_to_yale()
      • pycantonese.stringify_yale
        • stringify_yale()
      • pycantonese.yale_to_jyutping
        • yale_to_jyutping()
      • pycantonese.jyutping_to_tipa
        • jyutping_to_tipa()
    • Grapheme-to-Phoneme Conversion
      • pycantonese.g2p
        • g2p()
    • Natural Language Processing
      • pycantonese.stop_words
        • stop_words()
      • pycantonese.parse_text
        • parse_text()
      • pycantonese.segment
        • segment()
      • pycantonese.pos_tag
        • pos_tag()
      • pycantonese.pos_tagging.hkcancor_to_ud
        • hkcancor_to_ud()
    • CHAT
      • CHAT
        • CHAT.ages()
        • CHAT.append()
        • CHAT.characters()
        • CHAT.extend()
        • CHAT.file_paths
        • CHAT.filter()
        • CHAT.from_dir()
        • CHAT.from_files()
        • CHAT.from_git()
        • CHAT.from_strs()
        • CHAT.from_url()
        • CHAT.from_utterances()
        • CHAT.from_zip()
        • CHAT.head()
        • CHAT.headers()
        • CHAT.info()
        • CHAT.jyutping()
        • CHAT.languages()
        • CHAT.n_files
        • CHAT.participants()
        • CHAT.search()
        • CHAT.tail()
        • CHAT.to_files()
        • CHAT.to_strs()
        • CHAT.tokens()
        • CHAT.utterances()
        • CHAT.word_ngrams()
        • CHAT.words()
    • Token
      • Token
    • Utterance
      • Utterance
    • Jyutping
      • Jyutping
        • Jyutping.onset
        • Jyutping.nucleus
        • Jyutping.coda
        • Jyutping.tone
        • Jyutping.__eq__()
        • Jyutping.__hash__()
        • Jyutping.__init__()
        • Jyutping.__repr__()
        • Jyutping.__str__()
        • Jyutping.final
    • Headers
      • Headers
    • Ngrams
      • Ngrams
  • Archives
    • Tutorials
    • Research Outputs


© Copyright 2014-2026, Jackson L. Lee.
