Author Identification in Short Texts: A Method Proposal for Twitter
Determining the author of a given text, and identifying which texts in a set share the same author, is one of the most important application areas of forensic linguistics. Most studies to date have been conducted on long corpora, such as newspaper articles, which contain relatively more linguistic data. As a result, they do not offer a method for identifying the authors of short texts. Yet the texts examined in forensic linguistic work are often quite short. Moreover, social media posts, including micro-blog entries, frequently become the subject of criminal or legal cases. Accordingly, methods and approaches for author identification in such texts are needed. This study aims to propose a method for identifying authors using the grammatical, punctuation, lexical and contextual features of texts collected from the micro-blog Twitter, with particular attention to its character limit. Features that were found to be, or may be, distinctive are reported, and the methods, most of which need to be adapted to the corpus at hand, are presented under general headings.
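As a rough illustration of what punctuation and lexical features of a short text might look like in practice, the following Python sketch computes a handful of surface measures for a single post. The function name `stylometric_features` and the particular features chosen are assumptions made for this example; they are not the feature set of the study, and the tokenisation is deliberately naive.

```python
import re
import string
from collections import Counter


def stylometric_features(text: str) -> dict:
    """Compute a few illustrative surface features for one short text.

    Hypothetical helper: the feature list is an assumption for
    illustration, not the set of features used in the study.
    """
    # Naive word tokenisation: runs of word characters, lowercased.
    tokens = re.findall(r"\w+", text.lower())
    # Count ASCII punctuation marks only.
    punct = Counter(ch for ch in text if ch in string.punctuation)
    n_chars = len(text)
    n_tokens = len(tokens)
    return {
        "n_chars": n_chars,
        "n_tokens": n_tokens,
        "avg_token_len": sum(map(len, tokens)) / n_tokens if n_tokens else 0.0,
        "type_token_ratio": len(set(tokens)) / n_tokens if n_tokens else 0.0,
        "punct_per_char": sum(punct.values()) / n_chars if n_chars else 0.0,
        "exclamations": punct["!"],
        "questions": punct["?"],
        "uppercase_ratio": (
            sum(ch.isupper() for ch in text) / n_chars if n_chars else 0.0
        ),
    }


if __name__ == "__main__":
    example = "Hello!!! Are you there?"
    for name, value in stylometric_features(example).items():
        print(f"{name}: {value}")
```

Feature vectors of this kind, computed per post, could then be compared across candidate authors. Because a single post is so short, any one measure is noisy, which is one reason a combination of feature types may be preferable to relying on vocabulary alone.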