Chinese language word embeddings based on the corpus Hanku
Author(s): Radovan Garabík
Subject(s): Language and Literature Studies, Applied Linguistics
Published by: Jazykovedný ústav Ľudovíta Štúra Slovenskej akadémie vied
Keywords: word embeddings; Chinese; Pǔtōnghuà; corpus; NLP
Summary/Abstract: Vector models based on word embeddings are an indispensable part of advanced Natural Language Processing research and language analysis. We describe several Chinese language (Pǔtōnghuà) word embeddings, the differences from "western" language models caused by specific orthographic and linguistic features of the written Chinese language, and introduce a publicly available web interface for querying the vector models, aimed at linguistically or pedagogically oriented users.
Journal: Jazykovedný časopis
- Issue Year: 72/2021
- Issue No: 4
- Page Range: 996-1004
- Page Count: 9
- Language: English