Sinica BOW and 300 Tang Poems: An overview of a bilingual ontological wordnet and its application to a small ontology of Tang poetry

Chu-Ren Huang (Academia Sinica), Feng-ju Lo (Yuan Ze University), Ru-Yng Chang (Academia Sinica), Sueming Chang (Academia Sinica)

研究院知識詞網與唐詩三百首 ─ 雙語知識本體詞網簡介及唐詩知識本體之初步構建

黃居仁(中央研究院),羅鳳珠(元智大學),張如瑩(中央研究院),張舒茗(中央研究院)

The Academia Sinica Bilingual Ontological Wordnet (Sinica BOW) integrates three resources: WordNet, the English-Chinese Translation Equivalents Database (ECTED), and SUMO (Suggested Upper Merged Ontology). The three resources were originally linked in two pairs, with WordNet as the common member: WordNet 1.6 was manually mapped to SUMO (Niles and Pease 2003), and the English lemmas in WordNet were mapped to their Chinese lexical equivalents in ECTED. ECTED encodes both equivalent pairs and their semantic relations (Huang et al. 2003). By integrating these three resources, Sinica BOW functions both as an English-Chinese bilingual wordnet and as a bilingual lexical interface to SUMO.

Sinica BOW offers versatile access to a combination of lexical, semantic, and ontological information. This versatility comes from its bilingual design and from the lemma-based merging of multiple resources. First, either English or Chinese can be used both for querying and for presenting the content of the resources. Second, the user can easily access the logical structure of both WordNet and the SUMO ontology through either words or conceptual nodes. That is, users can use words to search the ontology, or use ontological nodes to search for their linguistic realizations in both languages. Third, multiple linguistic indexing is built in to allow additional versatility. Fourth, domain information adds another dimension of knowledge manipulation.
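The two-way access described above (from a word in either language to an ontology node, and from a node back to its realizations in both languages) can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration: the `Synset` record, the `toy-…` identifiers, the function names, and the three sample entries are hypothetical and do not reproduce the actual Sinica BOW schema, data, or interface.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Synset:
    """Hypothetical record linking lemmas in two languages to one ontology node."""
    synset_id: str   # invented identifier, not a real WordNet offset
    english: tuple   # English lemmas
    chinese: tuple   # Chinese translation equivalents
    sumo_node: str   # illustrative ontology node name


# Toy records, invented for illustration only.
SYNSETS = [
    Synset("toy-001", ("bird",), ("鳥",), "Bird"),
    Synset("toy-002", ("horse", "steed"), ("馬",), "Mammal"),
    Synset("toy-003", ("fish",), ("魚",), "Fish"),
]


def build_indexes(synsets):
    """Build a word -> nodes index and a node -> words index in one pass."""
    word_to_nodes = defaultdict(set)
    node_to_words = defaultdict(lambda: {"en": set(), "zh": set()})
    for s in synsets:
        # Lemmas of both languages share one index, so either can be a query.
        for lemma in s.english + s.chinese:
            word_to_nodes[lemma].add(s.sumo_node)
        node_to_words[s.sumo_node]["en"].update(s.english)
        node_to_words[s.sumo_node]["zh"].update(s.chinese)
    return word_to_nodes, node_to_words


WORD_TO_NODES, NODE_TO_WORDS = build_indexes(SYNSETS)


def nodes_for_word(word):
    """Return the ontology nodes reachable from an English or Chinese lemma."""
    return sorted(WORD_TO_NODES.get(word, ()))


def words_for_node(node):
    """Return the English and Chinese lemmas attached to an ontology node."""
    entry = NODE_TO_WORDS.get(node)
    if entry is None:
        return {"en": [], "zh": []}
    return {"en": sorted(entry["en"]), "zh": sorted(entry["zh"])}
```

In this sketch `nodes_for_word("馬")` and `nodes_for_word("horse")` lead to the same node, and `words_for_node("Mammal")` returns the lemmas of both languages. Keeping one shared word index is simply the shortest way to show that the query language does not matter; the real system may well be organized differently.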

In addition to serving as the reference and infrastructure for the construction of specific knowledgebases, the Sinica BOW model can also be applied to encode and represent a particular knowledge system, such as that of Tang civilization. This application allows comparative studies of a historical conceptual system and our modern conceptual system. Our pilot study on the 300 Tang Poems is reported here. The segmented and classified lexicon of the 300 Tang Poems (Chang and Luo 1999) serves as the basis of this study. Three domain ontologies are constructed and studied: animals, plants, and artifacts. Each domain is mapped to the SUMO/BOW structure. The resultant ontological representation is taken as a slice of the knowledge structure of Tang civilization. For instance, from the ontology of animals in the 300 Tang Poems (see attached file), we reach some broad generalizations about the familiar fauna of Tang. On further examination, we also find that a Tang fascination with flying is confirmed by the poets' choice of poetic animals.

In sum, we argue that the Sinica BOW model will not only be a useful resource but also a productive model for the construction of a knowledgebase that will greatly facilitate our understanding of Tang civilization.



