pycantonese.CHAT
- class pycantonese.CHAT(chat: Chat | None = None)
A reader for Cantonese CHAT corpus data.
This class wraps a Rust-backed CHAT parser and provides Cantonese-specific functionality such as Jyutping extraction, character-level access, and corpus search.
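To make the Jyutping-based search criteria concrete, the following is a minimal sketch of how a Jyutping syllable can be split into onset, nucleus, coda, and tone, and how syllables can then be filtered on those parts. It is plain Python with a simplified regular expression. It is not pycantonese's parser and does not call the library. The names `split_syllable` and `matches` are hypothetical, and the example syllables are chosen for illustration only.

```python
import re

# Simplified Jyutping syllable pattern. This is an assumption made for
# illustration: it is not pycantonese's own parser and ignores rare edge cases.
_JYUTPING = re.compile(
    r"^(?P<onset>gw|kw|ng|[bpmfdtnlgkhwzcsj])?"
    r"(?P<nucleus>aa|oe|eo|yu|[aeiou])?"
    r"(?P<coda>ng|[ptkmniu])?"
    r"(?P<tone>[1-6])$"
)


def split_syllable(syllable):
    """Split one Jyutping syllable into onset, nucleus, coda, and tone."""
    match = _JYUTPING.match(syllable)
    if match is None:
        raise ValueError(f"not a recognized Jyutping syllable: {syllable!r}")
    parts = {name: value or "" for name, value in match.groupdict().items()}
    if not (parts["onset"] or parts["nucleus"] or parts["coda"]):
        raise ValueError(f"syllable has no segments: {syllable!r}")
    # Syllabic nasals such as "m4" and "ng5" have no vowel, so the nasal
    # is treated as the nucleus rather than the onset.
    if not parts["nucleus"] and not parts["coda"]:
        parts["onset"], parts["nucleus"] = "", parts["onset"]
    return parts


def matches(syllable, **criteria):
    """Return True if every given part (e.g. nucleus="eo") matches exactly."""
    parts = split_syllable(syllable)
    return all(parts[name] == value for name, value in criteria.items())


syllables = ["wai3", "heoi3", "leoi5", "hang4", "m4"]
with_eo = [s for s in syllables if matches(s, nucleus="eo")]
tone_four = [s for s in syllables if matches(s, tone="4")]
```

In this sketch, `with_eo` keeps the syllables whose nucleus is `eo`, and `tone_four` keeps those carrying tone 4. Filtering on several parts at once works by passing more than one keyword argument.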
Methods
__init__([chat])
ages(): Return the ages.
append(other): Append another CHAT object's data.
characters(*[, by_utterance, by_file]): Return the data in individual Chinese characters.
extend(others): Extend with data from multiple CHAT objects.
filter(*[, participants, files]): Filter the data by participants and/or files.
from_dir(path, *[, match, extension, ...]): Read CHAT data from a directory.
from_files(paths, *[, parallel, strict, ...]): Read CHAT data from file paths.
from_git(url, *[, rev, depth, match, ...]): Read CHAT data from a Git repository.
from_strs(strs, *[, ids, parallel, strict, ...]): Read CHAT data from strings.
from_url(url, *[, match, extension, ...]): Read CHAT data from a URL pointing to a ZIP archive.
from_utterances(utterances): Construct a CHAT reader from a list of utterances.
from_zip(path, *[, match, extension, ...]): Read CHAT data from a ZIP file.
head([n]): Return the first n utterances with a formatted display.
headers(): Return the headers.
info([verbose]): Print summary information.
jyutping(*[, by_utterance, by_file]): Return the data in Jyutping romanization.
languages(*[, by_file]): Return the languages.
participants(*[, by_file]): Return the participants.
search(*[, onset, nucleus, coda, tone, ...]): Search the data for the given criteria.
tail([n]): Return the last n utterances with a formatted display.
to_files(dir_path, *[, filenames]): Write CHAT (.cha) files to a directory.
to_strs(): Return the data as CHAT-formatted strings.
tokens(*[, by_utterance, by_file]): Return the tokens.
utterances(*[, by_file]): Return the utterances.
word_ngrams(n): Return word n-grams across all utterances.
words(*[, by_utterance, by_file]): Return the words.
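Several methods above accept the keyword-only flags `by_utterance` and `by_file`. The sketch below illustrates one plausible reading of those flags on a toy data structure: a flat list by default, nesting per utterance, nesting per file, or both. This is an assumption based on the flag names, not a statement of how pycantonese implements them. The `corpus` dictionary and the standalone `words` function are hypothetical stand-ins and do not use the library.

```python
# Toy corpus: file name -> utterances -> words (made-up data for illustration).
corpus = {
    "a.cha": [["你", "好"], ["早晨"]],
    "b.cha": [["多謝", "你"]],
}


def words(corpus, *, by_utterance=False, by_file=False):
    """Return words flat, grouped per utterance, per file, or both."""
    per_file = []
    for utterances in corpus.values():
        if by_utterance:
            # Keep one inner list per utterance.
            per_file.append([list(utterance) for utterance in utterances])
        else:
            # Flatten all utterances of this file into one list.
            per_file.append([w for utterance in utterances for w in utterance])
    if by_file:
        return per_file
    # Without by_file, merge the per-file lists one level.
    return [item for file_items in per_file for item in file_items]


flat = words(corpus)
per_utterance = words(corpus, by_utterance=True)
per_file = words(corpus, by_file=True)
nested = words(corpus, by_utterance=True, by_file=True)
```

Under this reading, `flat` is a single list of words, `per_utterance` has one inner list per utterance, `per_file` has one inner list per file, and `nested` is grouped by file and then by utterance.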
Attributes
The file paths.
The number of files.