Knowledge Creation from Zero Ground – Information Theory Basics
The figure above humorously depicts how past, present and future may play misleading roles within our brains, and how Dilbert might behave if he really could grasp and ponder the future. See The Arrow of Time paradox below.
This will be a K-Creation experience, initially chaotic: no plan, no guides, no discrimination, even though focused on one main subject: The World Crisis. Authors will come, as much as possible, from a wide spectrum of interests, ages, professions, beliefs and, why not, cultures. See more in About us.
Raw data will be meaningful pieces of knowledge in the form of Web messages or “semantic pills” that, by the authors’ criteria, are relevant enough to be saved for later semantic analysis. Relevance means being somehow related to our main subject in the near future and being “authoritatively” issued. In turn, authoritativeness should rest on message popularity, message authorship, and the relevance of the publishing media.
Making use of some well-known Information Theory analogies and examples, by semantic pills we mean messages that sharply raise our awareness, perhaps shocking us because of all sorts of imagined strong effects on our concerns: in general bizarre, unexpected and infrequent events such as “man bites dog”.
Note: “Man bites dog” has 1,300,000 Google references, while “Dog bites man” has 913,000. Common sense tells us that the first event is extremely rare and that its probability of occurrence is extremely low relative to the second, for instance in the range of 1 to millions. This distortion shown by search engines arises because the phrase has become a frequent concept in journalism, a way to tag something as extremely rare. See Man bites dog in Wikipedia.
“Semantic pill” template
· Suspected Knowledge Subject/Theme/Topic;
· Name of the Article (message);
· Publishing media, Author/s, URL;
· A relevant paragraph;
· Conceptual Tags.
These semantic pills will from time to time be arbitrarily interrelated, with our focus in mind, trying to build larger pieces of knowledge by threading raw data meaningfully. By analyzing the tracks left by each blog author we may also draw inferences about our own self-learning process and about how we influence each other.
What is information?
Source: Wikipedia, Shannon and his famous AI Theseus Labyrinth mouse experiment
Intuitively, information is something like a fluid of elementary semantic particles that strikes our consciousness through our sensory system in large bursts of energy. Non-sentient creatures also receive bursts of energy from their media, but the effects are considered “mechanical reflexes”. The mechanics of the brain are not yet well known, and even less is known about how we humans take in, assume and process information. Claude Elwood Shannon set the first milestones and hypotheses about energy, information and how we perceive this rather elusive form of fluid. Intuitively we may draw the following conjectures about this mechanism:
a) The rarer a message appears to our consciousness, the more information we assign to it;
b) The same message seems to carry different amounts of information for different people;
c) The information carried looks bounded, so, consistent with b), we could establish a metric principle saying that the “information received” ranges from “zero” to a “maximum” depending on the message structure and content;
Note: For example, if we read in a newspaper that a dog bites a man, it merely adds a tiny touch of intensity to our awareness, whereas the reverse, a man bites a dog, adds a strong level of intensity.
d) We may imagine a metric for our ignorance about something as being somehow proportional to the number of possible “answers” or alternatives mentally associated with that something. We somehow weigh answers and alternatives, assigning them “weights” and priorities. By convention we may agree to equate our “range of ignorance” with that number when the answers or alternatives have the same, or nearly the same, probability of occurrence. In this case we may also talk about our “degree of uncertainty”.
e) From conjectures a) to d) it seems meaningful that an information quantum has a maximum potential for triggering changes in awareness, and that its actual effect ranges from that inherent maximum down to zero.
f) The basic unit of analysis of this new fluid interaction could be imagined as follows:
i. An individual receiving a given fluid quantum in the form of a message;
ii. The message itself;
iii. The medium through which the message moves;
iv. The origin: namely a mechanism, point in space, or box, where the message has been generated via energy expenditure;
v. The message sender, personal or impersonal, imagined behind the box.
Shannon had some other brilliant, interconnected ideas besides. Certainty is to know the truth about something; uncertainty, on the contrary, is to be ignorant about something, where this “something” is what we want or need to know. To reduce our “degree of uncertainty” we humans need information, don’t we? And if we think of truth as something unique, why not imagine a stepwise intellectual process within our minds that reduces our range of ignorance as much as possible, going from ex-ante ignorance (the state before receiving the information) to an ideal ex-post truth defined by a range of ignorance equal to ONE!
Note: In our inferences we are perhaps going too fast, audaciously taking for granted that humans may “think digitally”. The stepwise intellectual process mentioned above requires mastering this type of thinking, which is an acquired talent. We have to be careful with this type of reductionism and abstraction, extremely useful for building and managing machines, robots and agents, but dangerous if we dare to extrapolate too far upwards to humans. Shannon's work was a milestone of the Digital to Mind adventure, but it is still far from an understanding of the Mind. The Mind to Digital adventure, on the contrary, has been relatively truncated and postponed over the last five decades. As we are going to suggest in this blog, a step beyond Shannon is needed along the long trip from Digital to Mind, and a step beyond Descartes (just to give an example of a great thinker) is needed along the reverse, equivalent trip from Mind to Digital.
A classic hands-on Information gym
Let's present here the classic “Bob and Alice” dialogs used in Information and Communication Theory classes. Bob and Alice are, respectively, the origin and the end of an elementary two-way communication system. They have agreed on a “code” to “cipher” and “decipher” messages and, to make things simpler, they use a binary code, where for example 1 stands for a given action and 0 for its exact complement: left-right, up-down, white-black, etc. So the possible messages of this basic communication system may look a little boring and monotonous: [0 1 1 0 1], or anything from [0 0 0 0 0] to [1 1 1 1 1] through all possible combinations, in this case $2^5 = 32$. Bob and Alice usually appear as secret agents, and here they must exchange messages about which platoon the enemy is going to mobilize next. Reading the binary code in order, for example from left to right, the message [0 1 1 0 1] sent by Bob warns Alice that the next mobilized platoon will be the 14th (because [0 0 0 0 0] matches platoon 1, [0 0 0 0 1] platoon 2, and so on up to [1 1 1 1 1], which matches platoon 32). Let's imagine a possible stepwise mechanism in Alice’s brain while she receives this message:
1. Bob sends 0 => Alice may infer that the platoon number is in the lower half, from 1 to 16;
2. Bob sends 1 => Alice may infer that the platoon number is in the upper half of the remainder, from 9 to 16;
3. Bob sends 1 => Alice may infer that the platoon number is in the upper half of the remainder, from 13 to 16;
4. Bob sends 0 => Alice may infer that the platoon number is in the lower half of the remainder, from 13 to 14;
5. Bob sends 1 => Alice may infer that the platoon number is in the upper half of the remainder, which is now the TRUTH.
TRUTH ⇔ UNCERTAINTY = 1, the answer is the 14th platoon.
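The five steps above can be sketched in a few lines of Python. This is only an illustration of the halving idea, not a claim about how Alice's brain works; the function name `decode_by_halving` and its parameters are hypothetical, introduced here for the example.

```python
def decode_by_halving(bits, low=1, high=32):
    """Narrow the candidate range [low, high] by one half per received bit."""
    steps = []
    for bit in bits:
        mid = (low + high) // 2   # last platoon of the lower half
        if bit == 0:
            high = mid            # 0 keeps the lower half
        else:
            low = mid + 1         # 1 keeps the upper half
        steps.append((bit, low, high))
    return steps


# Bob's message from the text: [0 1 1 0 1]
steps = decode_by_halving([0, 1, 1, 0, 1])
for bit, low, high in steps:
    print(f"bit {bit}: platoon in {low}..{high} ({high - low + 1} candidates)")
```

Each bit halves the number of candidates (32, 16, 8, 4, 2, 1), so after five bits the range of ignorance is down to one platoon.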
Intuitive inference of Shannon Entropy
We are going to imagine how Shannon may possibly have arrived at this expression, which measures the maximum amount of information a digital message can carry. Let's take the following message:
[0 1 1 1 0 1 0 1 0 0]
There are many ways to interpret it, arbitrarily and as symmetrically as possible, as packets of h bits (h from 1 to 5) grouped from right to left, where # marks an empty bit position in an incomplete leftmost packet:
· As a 10-bit message: [0 1 1 1 0 1 0 1 0 0] => a sequence of 10 bits
· As a message of two-bit packets: [(0 1) (1 1) (0 1) (0 1) (0 0)] => a sequence of 5 “two-bit” symbols
· As a message of three-bit packets: [(# # 0) (1 1 1) (0 1 0) (1 0 0)] => a sequence of 3.33 “three-bit” symbols
· As a message of four-bit packets: [(# # 0 1) (1 1 0 1) (0 1 0 0)] => a sequence of 2.5 “four-bit” symbols
· As a message of five-bit packets: [(0 1 1 1 0) (1 0 1 0 0)] => a sequence of 2 “five-bit” symbols
Under any code agreed between Bob and Alice, the power of the information to take us from uncertainty to certainty is the same. In all cases the “ex-ante” degree of uncertainty of the receiving side (Alice) is measured by 1024 possibilities for a random variable X: for example, the expected time of attack of the enemy platoon, coded in this example from 0000 to 1023 as two-minute lapses of time beginning at noon of a D-day. The spy (Bob) arranges the message as the sequence of 10 bits shown above which, transformed to decimal by arbitrarily weighting the bits from right to left, becomes
$0 \times 1 + 0 \times 2 + 1 \times 4 + 0 \times 8 + 1 \times 16 + 0 \times 32 + 1 \times 64 + 1 \times 128 + 1 \times 256 + 0 \times 512 = 468$
This means that the expected attack time is the 469th coded time, that is, the 469th two-minute lapse, which ends 938 minutes after noon, at 03:38 AM of the next day. Now let’s create our own code symbols for each interpretation:
· One-bit symbols: 0 and 1
· Two-bit symbols: 00 ⇔ 0; 01 ⇔ 1; 10 ⇔ 2; 11 ⇔ 3
· Three-bit symbols: 000 ⇔ 0; 001 ⇔ 1; 010 ⇔ 2; 011 ⇔ 3; 100 ⇔ 4; 101 ⇔ 5; 110 ⇔ 6; 111 ⇔ 7
· Four-bit symbols: from 0000 to 1111 ⇔ 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, A, B, C, D, E, F
· Five-bit symbols: from 00000 to 11111 ⇔ A, B, C, D, E, F, G, H, I, J, K, L, M, N, O, P, Q, R, S, T, U, V, W, X, Y, Z, #, $, %, =, (, )
That Alice interprets as follows:
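· One-bit symbol sequence: $[0\ 1\ 1\ 1\ 0\ 1\ 0\ 1\ 0\ 0]_2 \Leftrightarrow 468_{10}$
· Two-bit symbol sequence: $[1\ 3\ 1\ 1\ 0]_4 \Leftrightarrow 1 \times 256 + 3 \times 64 + 1 \times 16 + 1 \times 4 + 0 \times 1 \Leftrightarrow 468_{10}$
· Three-bit symbol sequence: $[0\ 7\ 2\ 4]_8 \Leftrightarrow 0 \times 512 + 7 \times 64 + 2 \times 8 + 4 \times 1 \Leftrightarrow 468_{10}$
· Four-bit symbol sequence: $[1\ D\ 4]_{16} \Leftrightarrow 1 \times 256 + 13 \times 16 + 4 \times 1 \Leftrightarrow 468_{10}$
· Five-bit symbol sequence: $[O\ U]_{32} \Leftrightarrow 14 \times 32 + 20 \times 1 \Leftrightarrow 468_{10}$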
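The five readings above can be checked with a short Python sketch. It assumes that the empty positions marked # in the incomplete leftmost packet count as zeros, which the text does not state explicitly; the helper names `to_symbols` and `to_decimal` are hypothetical.

```python
MESSAGE = [0, 1, 1, 1, 0, 1, 0, 1, 0, 0]


def to_symbols(bits, h):
    """Group bits into h-bit symbols from right to left.

    Assumption: missing leftmost positions ('#' in the text) are zeros.
    """
    pad = (-len(bits)) % h
    padded = [0] * pad + list(bits)
    symbols = []
    for i in range(0, len(padded), h):
        value = 0
        for b in padded[i:i + h]:
            value = value * 2 + b   # binary value of this packet
        symbols.append(value)
    return symbols


def to_decimal(symbols, base):
    """Positional value of a symbol sequence in the given base."""
    value = 0
    for s in symbols:
        value = value * base + s
    return value


results = {}
for h in range(1, 6):
    symbols = to_symbols(MESSAGE, h)
    results[h] = (symbols, to_decimal(symbols, 2 ** h))
    print(f"{h}-bit symbols {symbols} -> {results[h][1]}")
```

Whatever the packet size, the decoded value is the same, which is the point of the comparison: the grouping changes the alphabet, not the content of the message.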
And we may now arrive at the simplest version of the Shannon formula, applied to messages made of a sequence of equally probable h-bit code symbols:
$H = n \log_2 N$
· N stands for the number of different symbols in a given alphabet, all with the same probability of occurrence;
· n stands for the number of symbols in the sequence, each one worth $\log_2 N$ bits;
· H is the “Shannon Entropy”, namely the maximum amount of information transmitted;
· $\log_2$ stands for “logarithm base 2”. So in our example, where N = 32, $\log_2 32 = 5$ and each 5-bit packet may carry 5 units of information. As H is always the same for all five interpretations of our example, we have
$10 = 10 \log_2 2 = 5 \log_2 4 = 3.33 \log_2 8 = 2.5 \log_2 16 = 2 \log_2 32$
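As a quick numerical check of this chain of equalities, the sketch below evaluates n times the base-2 logarithm of N for the five packet sizes of the 10-bit example. The function name `max_information` is hypothetical, and n is allowed to be fractional (3.33, 2.5) exactly as in the text.

```python
import math

TOTAL_BITS = 10  # length of the example message


def max_information(n, N):
    """H = n * log2(N): n equiprobable symbols drawn from an alphabet of size N."""
    return n * math.log2(N)


entropies = {}
for h in range(1, 6):
    N = 2 ** h            # alphabet size for h-bit symbols
    n = TOTAL_BITS / h    # number of symbols (may be fractional)
    entropies[h] = max_information(n, N)
    print(f"h={h}: n={n:.2f}, N={N}, H={entropies[h]:.2f}")
```

All five rows give H = 10 up to floating-point rounding, since n = 10/h and the logarithm of N is h.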
Up to now we have seen that the formula $H = n \log_2 N$ is coherent with our intuitive conjectures. Let's now try to relate it to the probabilities of symbol occurrence, stepping a little away from the idealized Bob and Alice experience. In the Bob and Alice analogy we may infer a “default” probability p = 1/N, so that $\log_2 N = -\log_2 p$, which transforms the above expression into
$H = -n \log_2 p$
where H remains positive because logarithms of numbers lower than 1 are negative. In our example with N = 32 and p = 1/32, H = 10 for n = 2, a chain of two five-bit symbols. Therefore, for the contribution of a single symbol, we arrive at the expression:
$H = -\log_2 p$
This expression tells us that the more “bizarre” a symbol is, the more information it may carry. For example, if we were using a Chinese-type ideogram system of 65,536 ideograms and considered them equally probable, the appearance of each one would unveil at the receiving side 1 among 65,536 possible values of a given random variable. As we may easily check, the same result could be obtained with a message chain of 16 one-bit symbols, since $\log_2 65536 = 16$.
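The ideogram comparison can be verified numerically. In this minimal sketch, `self_information` is a hypothetical helper name for the single-symbol expression above, and both alphabets are assumed to be equiprobable, as in the text.

```python
import math


def self_information(p):
    """Information, in bits, carried by one symbol of probability p."""
    return -math.log2(p)


ideogram_bits = self_information(1 / 65536)    # one ideogram out of 65,536
bit_chain_bits = 16 * self_information(1 / 2)  # sixteen equiprobable one-bit symbols

print(ideogram_bits, bit_chain_bits)
```

Both quantities come out as 16 bits: one very rare symbol carries as much as a chain of sixteen common ones.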
How could we estimate the corresponding H for a sequence of n independent symbols belonging to one or more alphabets? Each symbol contributes $-\log_2 p$, or generically $-\log_2 p_i$ for symbol i. But if the occurrence of each symbol within the message chain is ruled by its probability $p_i$, each contribution has to be weighted by that probability, and the H of the chain, per symbol, is expressed by
$H = -\sum_i p_i \log_2 p_i$
This is the formal expression found by Shannon for his “Information Entropy”, a useful but rather controversial concept. We have to take into account that p = p(X) stands for the probability of appearance of symbol X in an information burst. Among Shannon’s simplest and brightest ideas is the concept of the “surprise factor” or “self-information” of a message m, defined by applying the same versatile expression
$I(m) = -\log_2 p(m)$
Under these definitions H can be interpreted as the “average self-information of a message”.
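To make the “average self-information” reading concrete, here is a small Python sketch. The two probability distributions are illustrative assumptions, not data from the text, and the function names are hypothetical.

```python
import math


def self_information(p):
    """Surprise factor I = -log2(p), in bits."""
    return -math.log2(p)


def entropy(probabilities):
    """H = -sum(p_i * log2(p_i)): the probability-weighted average surprise."""
    return sum(p * self_information(p) for p in probabilities if p > 0)


# 32 equally probable symbols, as in the five-bit alphabet above
uniform = [1 / 32] * 32
# A made-up skewed alphabet of four symbols (illustrative only)
skewed = [0.5, 0.25, 0.125, 0.125]

h_uniform = entropy(uniform)
h_skewed = entropy(skewed)
print(h_uniform, h_skewed)
```

With equal probabilities the result is 5 bits per symbol, the maximum for a 32-symbol alphabet. The skewed alphabet averages 1.75 bits per symbol, below the 2 bits that four equiprobable symbols would give: the frequent symbols surprise us less, so they pull the average down.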
How energy is somehow intelligently involved
In this ideal case Alice uses all the potential information contained in Bob’s message. Finally, as messages must be created by spending some amount of energy, Shannon was concerned with how to introduce a metric that makes a meaningful equivalence between these two fluids, energy and information. As per the Shannon paradigm, conscious creatures are continuously receiving information in order to reduce their uncertainty, as the basic process of learning and acquiring knowledge. He imagined an analogy with thermodynamic systems, which continuously receive energy to keep striving for a living until they finally die.
However, science still does not know how humans use information, nor the overall efficiency of the process. Humans need on average a given amount of calories and a menu of basic nutrients to continue living. In this model we humans behave as a thermal machine, delivering useful work until we die. In the case of information considered as a vital nutrient, our ignorance is so immense that it can only be partially satisfied during our lives. The only thing we know, as per the Shannon paradigm, is that messages have a maximum nutrient power and that we live surrounded and stimulated by billions of them. We do not even know how efficient the voluntary process of learning is. Globally, the only thing we know is that some people acquire knowledge more easily than others, which could be for many combined reasons. Let's try to imagine how global human society behaves concerning this elusive fluid: information.
What then is knowledge?
One thing seems true: knowledge derives from information, and we dare to say that knowledge is somehow, within limits, proportional to the amount of information digested.
As brains remain physically pretty much the same throughout life, what then adds knowledge to them? What is in the brain of a genius? We may dare to say “neural complexity”, and perhaps more fluidity and overall efficiency in the response to information stimuli. In fact a brain is an ordered collection of molecules working as a system, and as a machine that generates messages as well. From a thermodynamic point of view we may infer that information as a nutrient has transformed the biochemical structure of the brain, its ordered collection of molecules, making it more efficient as a thinking machine: more and better judgments and pieces of truth as outcome. It looks like an internal “negentropic” process: the mind taking energy from outside to increase its inherent order!
Note: In information theory, entropy is a measure of the uncertainty associated with a random variable. The concept was introduced by Claude E. Shannon in his 1948 paper “A Mathematical Theory of Communication”.
The Arrow of Time paradox
We humans especially are creatures with the capability of continuously improving our knowledge. Of course we, as a system, are ruled by the Second Law of Thermodynamics: the Arrow of Time, in the “macro domain” where we live, evolves asymmetrically towards states of higher entropy, towards greater disorder. As an ingredient of the paradox we could say that even Einstein’s mind, just to mention one archetype of a brilliant mind, was globally entropic even though internally negentropic. In common terms, we could say that Einstein's brain as a system, within a medium (his body and its surroundings), consumed more energy than it gained in the form of knowledge. Living creatures have to import negentropy in order to improve their thinking machines, and in the end we all follow the destiny of the Arrow of Time. For people who believe in God, He is the master, creator and provider of negentropy, and all of God's creation is sooner or later pervaded by decomposition, basically subject to entropic processes. However, for both believers and agnostics this destiny is still considered an unknowable eternal mystery.
Once a man like Einstein dies, his marvelous brain goes to chaos, becoming a disordered collection of molecules dispersed all over space. But what happens to his knowledge? Is there a place for knowledge in the Big Bang model and similar ones? It seems that there is, at least, no physical place. However, evolution gives us a reasonable answer: complexity can be transmitted via messages. DNA is in fact encapsulated complexity, transmitted and waiting for its opportunity to restart a vital cycle. As for humans, knowledge is preserved as encapsulated messages as well: documents, databases, books, e-books, plain messages. If we consider that information is a form of first-level encapsulated energy, we have to conclude that knowledge is perhaps a second level of information, denser, with an extended coding capacity. However, knowledge can also be destroyed unless it is transformed or transported to an incorruptible domain, perhaps something like the ancient utopia of Gnosis.
Could knowledge be mapped? Packed in a nutshell?
Defining knowledge as a “hierarchically ordered collection of meanings”, we may devise a sort of “industrial procedure” to create a first approximation to it by semantically mapping the largest existing collection of documents: the Web. We first have to ask ourselves: what is in a “document”? Is it a message? At first sight, and intuitively, we are prone to say NO. Within it we distinguish the common words and expressions needed to write meaningfully, and “concepts”, meanings embedded within the text corpus. To detect, understand and order these “concepts” we need access to more documents, and these documents probably lead us to others, and those to others, and so on. These mappings may represent a reasonably good approximation to a collective mind machine that provides answers to satisfy individual ignorance via very specific messages known as “queries”.
Tags: common words and expressions, concepts, semantic maps, semantic mapping, Shannon, entropy, information entropy, darwin ontology, arrow of time, human knowledge map, uncertainty, degree of uncertainty, second thermodynamic law, negentropy, negentropic, self information, surprise factor, neural complexity, Theory of Information, Bob and Alice, man bites dog, dog bites man, semantic pills, theseus labyrinth, world crisis, k-creation, Dilbert, time reversal