
pycantonese.CHAT.search

CHAT.search(*, onset=None, nucleus=None, coda=None, tone=None, initial=None, final=None, jyutping=None, character=None, pos=None, word_range=(0, 0), utterance_range=(0, 0), by_token=True, by_utterance=False, by_file=False)

Search the data for the given criteria.

Parameters:
onset : str, optional

Onset to search for. A regex is supported.

nucleus : str, optional

Nucleus to search for. A regex is supported.

coda : str, optional

Coda to search for. A regex is supported.

tone : str, optional

Tone to search for. A regex is supported.

initial : str, optional

Initial to search for. A regex is supported.

final : str, optional

Final to search for.

jyutping : str, optional

Jyutping romanization of one Cantonese character to search for.

character : str, optional

One or more Cantonese characters to search for.

pos : str, optional

A part-of-speech tag to search for. A regex is supported.

word_range : tuple[int, int], optional

Span of words around a match. Default is (0, 0).

utterance_range : tuple[int, int], optional

Span of utterances around a match. Default is (0, 0).

by_token : bool, optional

If True, return Token objects. Otherwise return word strings.

by_utterance : bool, optional

If True, return full utterances containing matches.

by_file : bool, optional

If True, return data organized by file.
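
One way to read the regex-capable criteria above is sketched below. This is an illustration only, not PyCantonese's implementation. It assumes that each pattern must match the whole Jyutping element (`re.fullmatch`) and that several criteria given together are combined with a logical AND. The `Syllable` and `matches` names and the hand-parsed sample syllables are hypothetical.

```python
import re
from typing import NamedTuple


class Syllable(NamedTuple):
    """Hypothetical stand-in for one parsed Jyutping syllable."""
    onset: str
    nucleus: str
    coda: str
    tone: str


def matches(syl, *, onset=None, nucleus=None, coda=None, tone=None):
    """Return True if every supplied pattern matches its element.

    Assumption: a pattern must match the whole element (re.fullmatch),
    and a criterion left as None places no constraint.
    """
    criteria = {"onset": onset, "nucleus": nucleus, "coda": coda, "tone": tone}
    return all(
        pattern is None or re.fullmatch(pattern, getattr(syl, name)) is not None
        for name, pattern in criteria.items()
    )


# Hand-parsed examples: gwong2, dung1, waa2, hok6.
SYLLABLES = [
    Syllable("gw", "o", "ng", "2"),
    Syllable("d", "u", "ng", "1"),
    Syllable("w", "aa", "", "2"),
    Syllable("h", "o", "k", "6"),
]

# Coda is "ng" or "k", AND tone is 1 or 2.
hits = [s for s in SYLLABLES if matches(s, coda="ng|k", tone="[12]")]
print(hits)
```

Under these assumptions only the first two syllables are kept: the third has an empty coda and the fourth has tone 6. With a real corpus, the corresponding call would pass the same keyword arguments to `CHAT.search`, for example `coda="ng|k"` and `tone="[12]"`.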

Returns:
list
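
The `word_range` parameter describes a span of words around a match. A minimal sketch of that idea follows, assuming the first integer counts words to the left of the match and the second counts words to the right, with the span clipped at the utterance boundaries. The `window` helper and the sample utterance are hypothetical and are not part of PyCantonese.

```python
def window(words, index, word_range=(0, 0)):
    """Return the matching word plus its neighbors.

    Assumption: word_range is (left, right), the number of words to
    include before and after the match at position `index`.
    """
    left, right = word_range
    start = max(0, index - left)                  # clip at utterance start
    end = min(len(words), index + right + 1)      # clip at utterance end
    return words[start:end]


# A segmented sample utterance (hypothetical data).
utterance = ["我", "想", "買", "書", "呀"]

only_match = window(utterance, 3)            # default (0, 0): the match alone
with_context = window(utterance, 3, (1, 1))  # one word on each side
clipped = window(utterance, 0, (2, 1))       # left side clipped at the start
print(only_match, with_context, clipped)
```

The same reading would carry over to `utterance_range`, with utterances in place of words.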

© Copyright 2014-2026, Jackson L. Lee.
