Chinese language word embeddings based on the corpus Hanku

Radovan Garabík

Author(s): Radovan Garabík
Subject(s): Language and Literature Studies, Applied Linguistics
Published by: Jazykovedný ústav Ľudovíta Štúra Slovenskej akadémie vied
Keywords: word embeddings; Chinese; Pǔtōnghuà; corpus; NLP

Summary/Abstract: Vector models based on word embeddings are an indispensable part of advanced Natural Language Processing research and language analysis. We describe several Chinese language (Pǔtōnghuà) word embeddings and the differences from "western" language models caused by specific orthographic and linguistic features of the written Chinese language, and we introduce a publicly available web interface for querying the vector models, aimed at linguistically or pedagogically oriented users.

Journal: Jazykovedný časopis

Issue Year: 72/2021
Issue No: 4
Page Range: 996-1004
Page Count: 9
Language: English
