lb:hangul:third
lb:hangul:third [2021-11-08 09:45:55] – [On Hangul Supremacy & Exclusivity – An Information Theory Comparison of Hangul and Hanja] ninjasr | lb:hangul:third [2025-01-05 17:49:18] (current) – external edit 127.0.0.1 | ||
---|---|---|---|
Line 1: | Line 1: | ||
+ | ====== On Hangul Supremacy & Exclusivity – An Information Theory Comparison of Hangul and Hanja ====== | ||
+ | [{{ https:// | ||
+ | ===== An Information Theory Comparison of Hangul and Hanja ===== | ||
+ | “//A picture is worth a thousand words.//” This is a well-known English proverb. Most people do not think about why it is true, because its proof seems obvious: to describe a picture in words, one would indeed need a great many words. Why is that so? The answer can be found in information theory, a field of study that has made modern digital communications possible and plays an important role in several other fields, including linguistics.\\ | ||
+ | ==== A Layman’s Short Introduction to Information Theory ==== | ||
+ | Information theory involves the quantification of “information.” // | ||
+ | === Illustrative Examples === | ||
+ | Information thus depends on the distribution of the symbols, or elements, in a given set of symbols. In general, the more elements a set has, the more information each observation carries. For instance, imagine observing a fair coin toss and a fair die roll. The probability of heads or tails of the coin is 1/2 each. The probability of each side of the die is 1/6. Comparatively, | ||
+ | Thus, it is easy to see why the proverb “a picture is worth a thousand words” is true. A picture is effectively a set with an infinite number of symbols, each with an infinitesimal probability of being observed, regardless of how each symbol is defined. Words, on the other hand, are effectively a set with a finite number of symbols. To get an empirical sense, approximately 10,000 words comprise the vocabulary of native speakers with higher education. The word set is thus far smaller than the picture set. Therefore, observing a picture reduces uncertainty much more than observing a word, a coin flip, or a die roll.\\ | ||
+ | === Measuring Information === | ||
+ | Mathematically, | ||
+ | {{ https:// | ||
+ | where //I(m)// is the information of a symbol and //p(m)// is the probability of observing symbol //m//. For a number of reasons, it is a logarithmic measure. Principally, | ||
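For reference, the quantity described here matches the standard definition of self-information. The sketch below assumes a base-2 logarithm, so that the unit is bits, and works through the coin and die from the illustrative examples:

```latex
\begin{align*}
I(m) &= \log_2 \frac{1}{p(m)} = -\log_2 p(m) \\
I(\text{heads}) &= -\log_2 \tfrac{1}{2} = 1 \text{ bit} \\
I(\text{one face of the die}) &= -\log_2 \tfrac{1}{6} = \log_2 6 \approx 2.585 \text{ bits}
\end{align*}
```

The less probable outcome (a particular face of the die) carries more information than the more probable one (a particular side of the coin).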
+ | The average information for a set of symbols is called entropy. Mathematically, | ||
+ | {{ https:// | ||
+ | where //H(M)// is the entropy measured in bits-per-symbol, | ||
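For reference, the standard definition of entropy that this description corresponds to is sketched below. The symbol //N// for the number of symbols in the set is introduced here for illustration, and a base-2 logarithm is assumed:

```latex
\begin{align*}
H(M) &= \sum_{m \in M} p(m)\, I(m) = -\sum_{m \in M} p(m) \log_2 p(m) \\
\text{If } p(m) = \tfrac{1}{N} \text{ for all } m: \quad H(M) &= -\sum_{m \in M} \tfrac{1}{N} \log_2 \tfrac{1}{N} = \log_2 N
\end{align*}
```

For a fixed number of symbols //N//, the uniform case gives the largest possible entropy; any unequal distribution over the same symbols yields a lower value.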
+ | ==== Comparing the Information of Hangul Versus Hanja ==== | ||
+ | According to information theory, Hangul should have a lower amount of average information than Hanja. Hangul is a phonetic alphabet comprising only 24 letters. Hanja, in contrast, is an ideographic script comprising more than 40,000 characters, of which only about 2,000 are considered “common use” in Korea. From the start, it can be readily recognized that there is far more uncertainty in observing a Hanja character than in observing a Hangul letter.\\ | ||
+ | To get a sense of the disparity, assume that each symbol in each respective script occurs with equal probability and is independent. That is, each letter of Hangul occurs 1/24 of the time and each Hanja character occurs 1/2000 of the time. (The actual probability for Hangul letters ranges from 0.122 for ㅇ to 0.002 for ㅋ. This blogger has not yet found a complete listing for Hanja.) Thus, the entropy of Hangul is only about 4.58 bits-per-symbol, | ||
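The equal-probability estimate can be checked with a few lines of Python. This is a minimal sketch under the assumptions stated in the text (24 Hangul letters, 2,000 common-use Hanja characters, all equally likely and independent); the function and variable names are illustrative only.

```python
import math


def entropy_bits(probabilities):
    """Shannon entropy in bits-per-symbol for a list of symbol probabilities."""
    return -sum(p * math.log2(p) for p in probabilities if p > 0)


def uniform_entropy_bits(num_symbols):
    """Entropy when every one of num_symbols symbols is equally likely."""
    return entropy_bits([1.0 / num_symbols] * num_symbols)


# Assumed symbol counts, taken from the text above.
HANGUL_LETTERS = 24
COMMON_USE_HANJA = 2000

hangul_entropy = uniform_entropy_bits(HANGUL_LETTERS)
hanja_entropy = uniform_entropy_bits(COMMON_USE_HANJA)

print(f"Hangul: {hangul_entropy:.2f} bits-per-symbol")
print(f"Hanja:  {hanja_entropy:.2f} bits-per-symbol")
print(f"Ratio:  {hanja_entropy / hangul_entropy:.2f}")
```

Under these assumptions the script reports roughly 4.58 bits-per-symbol for Hangul and 10.97 for Hanja. Passing measured letter frequencies to `entropy_bits` instead of a uniform list would give the lower, real-world figures.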
+ | Of course, the assumption that each symbol in each respective script occurs with equal probability is not entirely correct. Certain symbols do occur more frequently than others, and therefore the actual entropy will be lower. This, however, does not detract from the finding that each character of Hanja conveys more information than each letter of Hangul: there are still far more Hanja characters than Hangul letters. The fact that Hanja conveys more information than Hangul has ramifications for the semantic meaning conveyed by each symbol.\\ | ||
+ | For example, take the Hangul syllable “일.” It comprises three letters: ㅇ, ㅣ, and ㄹ. Even with three letters, the semantic meaning is highly ambiguous. It could mean “one,” “work,” “day,” or even a grammatical particle. Contrast this with seeing just one Hanja character. Since there is a lot more information, | ||
+ | * 車(1) -> 차(2) (“car”) | ||
+ | * 天(1) -> 천(3), 하늘(5) (“sky”) | ||
+ | * 止(1) -> 지(2), 멈추다(7) (“to stop”) | ||
+ | * 褰(1) -> 건(3), 옷을걷어올리다(18) (“to hitch up one’s clothes”) | ||
+ | * 蔭(1) -> 음(3), 조상의 공덕에 의하여 맡은 벼슬 (33) (“A bureaucratic position attained based on merits of an ancestor”) | ||
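The letter counts in parentheses can be approximated programmatically. The sketch below assumes that Unicode canonical decomposition (NFD) is an acceptable stand-in for counting Hangul letters: it splits each syllable block into its conjoining jamo. Note that NFD treats a compound vowel such as ㅢ or a double consonant such as ㄲ as a single jamo, so counts for words containing them may differ from a count based strictly on the 24 basic letters. The function name `count_jamo` is illustrative.

```python
import unicodedata


def count_jamo(text):
    """Count the conjoining Hangul jamo in text after NFD decomposition.

    Spaces, punctuation, and non-Hangul characters are ignored.
    """
    decomposed = unicodedata.normalize("NFD", text)
    # U+1100..U+11FF is the Hangul Jamo block used by NFD decomposition.
    return sum(1 for ch in decomposed if "\u1100" <= ch <= "\u11ff")


for word in ["일", "천", "하늘", "멈추다", "옷을걷어올리다"]:
    print(word, count_jamo(word))
```

Each of these words takes a single Hanja character in the list above, while the Hangul spelling requires between 3 and 18 jamo.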
+ | This finding should not be surprising. In no instance can the representation in Hangul be more compact than the representation in Hanja. Since Hanja characters have a higher amount of information, | ||
+ | This is more apparent with prose text. Compare the original Classical Chinese text of the Pater Noster (天主經, 천주경) versus the Hangul-only Korean translation (both are Catholic translations): | ||
+ | {{ https:// | ||
+ | Notice how few symbols the Classical Chinese text contains compared with the number of Hangul letters in the Hangul-only Korean translation. Both are roughly the same symbolic representations of the underlying semantic meaning. Hangul appears more compact only because of its arrangement into syllable blocks. Other comparisons of Classical Chinese text and mixed script versus Hangul-only representations will show the same result, without fail. Hangul is vastly inferior from an information theory perspective.\\ | ||
+ | ===== Conclusion ===== | ||
+ | This blogger conceived of this argument to introduce much-needed objectivity into the debate between Hangul exclusivity and mixed script. In the end, subjective arguments, such as appeals to nationalism, | ||
+ | __Disclosure__: |