pycantonese.CHAT

class pycantonese.CHAT(chat: Chat | None = None)

A reader for Cantonese CHAT corpus data.

This class wraps a Rust-backed CHAT parser and provides Cantonese-specific functionality such as Jyutping extraction, character-level access, and corpus search.
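
The `search` criteria listed under Methods (onset, nucleus, coda, tone) refer to the parts of a Jyutping syllable. As a rough illustration of that decomposition, the following standalone sketch splits a syllable with a regular expression. It does not call pycantonese and is not the library's implementation: the function name `split_jyutping`, the sound inventories, and the decision to store a syllabic nasal (`m`, `ng`) in the nucleus slot are all assumptions made for this example, and the library may represent these cases differently.

```python
import re

# Jyutping sound inventory as commonly described. These lists are an
# assumption of this sketch, not copied from pycantonese's source.
ONSETS = "ng|gw|kw|b|p|m|f|d|t|n|l|g|k|h|w|z|c|s|j"
NUCLEI = "aa|oe|eo|yu|a|e|i|o|u"
CODAS = "ng|p|t|k|m|n|i|u"

# A syllable is: optional onset, then either nucleus + optional coda
# or a syllabic nasal, then a tone digit from 1 to 6.
SYLLABLE = re.compile(
    rf"^(?P<onset>{ONSETS})?"
    rf"(?:(?P<nucleus>{NUCLEI})(?P<coda>{CODAS})?|(?P<syllabic>ng|m))"
    rf"(?P<tone>[1-6])$"
)


def split_jyutping(syllable: str) -> dict:
    """Split one Jyutping syllable into onset, nucleus, coda, and tone.

    Hypothetical helper for illustration only. Missing parts are returned
    as empty strings; a syllabic nasal is placed in the nucleus slot.
    """
    match = SYLLABLE.match(syllable)
    if match is None:
        raise ValueError(f"not a valid Jyutping syllable: {syllable!r}")
    return {
        "onset": match["onset"] or "",
        "nucleus": match["nucleus"] or match["syllabic"] or "",
        "coda": match["coda"] or "",
        "tone": match["tone"],
    }


if __name__ == "__main__":
    for syl in ("gwong2", "jyut6", "m4"):
        print(syl, split_jyutping(syl))
```

With this split in mind, a criterion such as `tone` selects syllables by their final digit, and `onset` or `coda` select by the consonants at either edge of the syllable.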

__init__(chat: Chat | None = None)

Methods

__init__([chat])

ages()

Return the ages.

append(other)

Append another CHAT object's data.

characters(*[, by_utterance, by_file])

Return the data in individual Chinese characters.

extend(others)

Extend with data from multiple CHAT objects.

filter(*[, participants, files])

Filter the data by participants and/or files.

from_dir(path, *[, match, extension, ...])

Read CHAT data from a directory.

from_files(paths, *[, parallel, strict, ...])

Read CHAT data from file paths.

from_git(url, *[, rev, depth, match, ...])

Read CHAT data from a Git repository.

from_strs(strs, *[, ids, parallel, strict, ...])

Read CHAT data from strings.

from_url(url, *[, match, extension, ...])

Read CHAT data from a URL pointing to a ZIP archive.

from_utterances(utterances)

Construct a CHAT reader from a list of utterances.

from_zip(path, *[, match, extension, ...])

Read CHAT data from a ZIP file.

head([n])

Return the first n utterances with a formatted display.

headers()

Return the headers.

info([verbose])

Print summary information.

jyutping(*[, by_utterance, by_file])

Return the data in Jyutping romanization.

languages(*[, by_file])

Return the languages.

participants(*[, by_file])

Return the participants.

search(*[, onset, nucleus, coda, tone, ...])

Search the data for the given criteria.

tail([n])

Return the last n utterances with a formatted display.

to_files(dir_path, *[, filenames])

Write CHAT (.cha) files to a directory.

to_strs()

Return the data as CHAT-formatted strings.

tokens(*[, by_utterance, by_file])

Return the tokens.

utterances(*[, by_file])

Return the utterances.

word_ngrams(n)

Return word n-grams across all utterances.

words(*[, by_utterance, by_file])

Return the words.
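
To make the idea behind `word_ngrams(n)` concrete, here is a minimal standalone sketch of counting word n-grams over a list of utterances. It does not use pycantonese. The helper name `count_word_ngrams`, the `Counter` return type, the sample data, and the choice to keep n-grams from spanning utterance boundaries are all assumptions of this example, not documented behavior of the method.

```python
from collections import Counter
from typing import Iterable, Sequence


def count_word_ngrams(utterances: Iterable[Sequence[str]], n: int) -> Counter:
    """Count word n-grams, one utterance at a time.

    Hypothetical helper for illustration only. Each utterance is a sequence
    of words; n-grams are not formed across utterance boundaries (an
    assumption of this sketch).
    """
    if n < 1:
        raise ValueError("n must be a positive integer")
    counts: Counter = Counter()
    for words in utterances:
        # Slide a window of width n over the utterance; utterances
        # shorter than n contribute nothing.
        for i in range(len(words) - n + 1):
            counts[tuple(words[i:i + n])] += 1
    return counts


if __name__ == "__main__":
    # Made-up sample utterances, already segmented into words.
    sample = [["我", "食", "飯"], ["我", "食", "麵"]]
    for ngram, count in count_word_ngrams(sample, 2).most_common():
        print(ngram, count)
```

In practice the input would come from a method such as `words(by_utterance=True)`, on the assumption that it yields one word sequence per utterance.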

Attributes

file_paths

The file paths.

n_files

The number of files.