pycantonese.CHAT.search

CHAT.search(*, onset=None, nucleus=None, coda=None, tone=None, initial=None, final=None, jyutping=None, character=None, pos=None, word_range=(0, 0), utterance_range=(0, 0), by_token=True, by_utterance=False, by_file=False)

Search the data for the given criteria.

Parameters:
  • onset (str, optional) – Onset to search for. A regex is supported.

  • nucleus (str, optional) – Nucleus to search for. A regex is supported.

  • coda (str, optional) – Coda to search for. A regex is supported.

  • tone (str, optional) – Tone to search for. A regex is supported.

  • initial (str, optional) – Initial to search for, i.e. the onset. A regex is supported.

  • final (str, optional) – Final to search for, i.e. the nucleus and coda together.

  • jyutping (str, optional) – Jyutping romanization of one Cantonese character to search for.

  • character (str, optional) – One or more Cantonese characters to search for.

  • pos (str, optional) – A part-of-speech tag to search for. A regex is supported.

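  • word_range (tuple[int, int], optional) – Span of words to the left and right of a match to include in the output. Default is (0, 0), i.e. no surrounding words.

  • utterance_range (tuple[int, int], optional) – Span of utterances before and after a match to include in the output. Default is (0, 0), i.e. no surrounding utterances.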

  • by_token (bool, optional) – If True, return Token objects. Otherwise return word strings.

  • by_utterance (bool, optional) – If True, return full utterances containing matches.

  • by_file (bool, optional) – If True, return data organized by file.
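
The range parameters can be hard to picture from the description alone. The following is a minimal pure-Python sketch of how a span such as word_range could be applied to one matched word. The helper context_window is hypothetical and not part of PyCantonese; it assumes the tuple is read as (left, right) and that the window is clipped at the edges of the utterance, neither of which is confirmed here.

```python
def context_window(words, match_index, word_range=(0, 0)):
    """Return the matched word plus its surrounding words.

    Hypothetical helper, not PyCantonese code. Assumes word_range is
    (left, right) and that the window is clipped at the utterance edges.
    """
    left, right = word_range
    start = max(0, match_index - left)
    end = min(len(words), match_index + right + 1)
    return words[start:end]


utterance = ["我", "今日", "去", "學校", "上堂"]

# Default (0, 0): only the matched word itself.
print(context_window(utterance, 2))

# One word to the left and two to the right of the match.
print(context_window(utterance, 2, (1, 2)))

# A match at the start of the utterance: the left side is clipped.
print(context_window(utterance, 0, (2, 1)))
```

Under the same assumptions, utterance_range would work analogously over a list of utterances rather than a list of words.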

Returns:

list
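
As a rough illustration of the call style, the sketch below wraps a few searches in a function. It relies only on the signature shown above: every argument is keyword-only (note the leading * in the signature), and several criteria may be combined in one call. The function name run_searches and the particular search values are illustrative assumptions, and the exact shape of each returned list is not shown because it depends on the by_token, by_utterance, and by_file flags.

```python
def run_searches(corpus):
    """Run a few example searches on an object that provides CHAT.search().

    Illustrative helper, not part of PyCantonese. All arguments to
    search() are passed by keyword, as the signature requires.
    """
    return {
        # Search by a single Jyutping element.
        "onset_ng": corpus.search(onset="ng"),
        # A regex for the coda, asking for word strings instead of Token objects.
        "coda_ptk": corpus.search(coda="[ptk]", by_token=False),
        # Several criteria at once, returning the full utterances that contain matches.
        "tone2_with_character": corpus.search(
            tone="2", character="機", by_utterance=True
        ),
    }
```

With the library installed, something like run_searches(pycantonese.hkcancor()) would apply these searches to the built-in HKCanCor corpus.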


© Copyright 2014-2026, Jackson L. Lee.
