Chinese language word embeddings based on the corpus Hanku

Author(s): Radovan Garabík
Subject(s): Language and Literature Studies, Applied Linguistics
Published by: Jazykovedný ústav Ľudovíta Štúra Slovenskej akadémie vied
Keywords: word embeddings; Chinese; Pǔtōnghuà; corpus; NLP

Summary/Abstract: Vector models based on word embeddings are an indispensable part of advanced Natural Language Processing research and language analysis. We describe several Chinese language (Pǔtōnghuà) word embeddings, the differences from "western" language models caused by specific orthographic and linguistic features of the written Chinese language, and introduce a publicly available web interface for querying the vector models, aimed at linguistically or pedagogically oriented users.

  • Issue Year: 72/2021
  • Issue No: 4
  • Page Range: 996-1004
  • Page Count: 9
  • Language: English