Understanding Knowledge and Information

Dr. Keith Devlin, CSLI, Stanford University

Dr. Keith Devlin is Executive Director of the Center for the Study of Language and Information (CSLI) at Stanford University. He visited STRL in October as part of a joint research agreement between Stanford University and STRL. During his stay he gave a presentation titled "Understanding Knowledge and Information." The following article is an outline of that presentation.


What's the oldest information in the universe? It's the information about the 'big bang'. Arno Penzias, one of its discoverers, thinks of information as a substance, something you can transport. Knowledge is quite different. Unlike information, knowledge is not something that you can move from place to place. Knowledge is inside people: in their heads, or in their experiences, or in the collective activities of an organization. Can we provide a scientific, mathematical-type understanding of information? Can we provide a clear explanation of the difference between data, information and knowledge, and then, can we use that understanding to design better information and communication technologies?
The intuitive idea that we developed in the early 1980s at CSLI was that information is what you get when you add meaning to data, and knowledge is what you get when that information moves inside your head - when you internalize it, when it becomes part of your experience and it becomes available for use. Information rides on the back of the data when we add meaning to it.
Knowledge definitely involves people, and to study it we must use methods of the social and cognitive sciences. But what about information? Can we develop a mathematical theory of information? Such a theory would involve - almost certainly - mathematical theories of language. That's why, from the very beginning, CSLI had lots of people who did semantics, because it was understood that you had to understand how meaning is carried by language. I eventually realized that I was trying to find something that was not there, because I came to believe that even for information, you have to use methods of the social and cognitive sciences.
Knowledge is transferred from head to head. Knowledge doesn't easily travel through information. Knowledge resides in people, and hence knowledge management is people management.
But what about information? Where does it exist? It's not locked in physical objects, but it's not exactly in our minds either. It's somewhere between our minds and the physical world. How does it arise? It's represented by things, like books, films, CDs - all sorts of things, and they encode information by means of some mechanism - we'll call it a 'constraint'. A constraint could be a natural law, it could be a rule, some kind of regularity. It could be the legal system. So the picture that we have of information is not a simple one. It involves the world, the mind, and the mind interacting with the world.

Imagine I make the statement that there are infinitely many prime numbers; I'm a mathematician, so that's the kind of thing I might say. What information does my statement carry? It's a true mathematical statement, and true mathematical statements have no information content in the sense of Shannon and Weaver's classical Information Theory, because they have to be true: a message that is certain in advance carries zero Shannon information. So in terms of Information Theory, the most obvious answer is perhaps the least important one. Another piece of information you'd get from my statement is that I'm alive. That has greater information content than the last piece, because I don't have to be alive. My point is, the question "What is the information in a signal?" doesn't have a unique answer. It depends on these things I'm calling constraints. What Shannon and Weaver called "information theory" actually isn't a theory of information; it's a theory of signal or channel capacities.
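The Shannon measure behind this remark can be made concrete. A minimal sketch in Python (the function name surprisal is my own label for the standard quantity, not something from the talk): an event of probability p carries log2(1/p) bits, so a statement that must be true carries zero.

```python
import math

def surprisal(p: float) -> float:
    """Shannon information content, in bits, of an event of probability p."""
    if not 0.0 < p <= 1.0:
        raise ValueError("probability must be in (0, 1]")
    return math.log2(1.0 / p)

# A statement that cannot fail to be true (p = 1) carries zero bits,
# e.g. "there are infinitely many primes":
print(surprisal(1.0))   # 0.0
# The outcome of a fair coin flip carries one bit:
print(surprisal(0.5))   # 1.0
```

This is why, on the classical account, the truth of the prime-number statement is the least informative thing about it: its probability is 1, so its Shannon content is 0.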
In order to develop a true mathematical theory of information, you have to develop a framework to analyze the way signals encode or represent information. Somehow, information flows. A newspaper article can give me information in Japan today about events that took place in Florida yesterday. The question is, in a deep sense, how does it?
Fred Dretske gave the really key insight: "A signal 'S' carries the information 'X' by virtue of 'S' being of a certain type 'T'." This means we need agents who can classify the world according to type before we get information. One possible signal is (for example) the sky. The type could be one where there are lots of black clouds. If we look up and see black clouds, we get the information that rain is likely. So we've got the representation, the type and the information. The point is, things themselves don't carry information.
In the 1980s Jon Barwise and John Perry, who founded CSLI, introduced a new mathematical theory called 'situation theory'. Here's a situation. [see figure] In order for the situation s to give information about the situation r, s has to be of a certain type S. In order to have information flow, the situation type S has to be linked to some other type R by virtue of a constraint (C).
Constraints are relations that link types, and types are what cognitive agents use to classify the world. When cognitive agents classify the world in terms of types, they can link types together cognitively. Seeing smoke, for example, carries the information that there is a fire: you don't need to see the fire, because the flow through the constraint (C) gives you the information.
Now we have mathematical objects: types and relationships. So we have moved from the real world into the world of mathematics. And that was the goal.
One of the conclusions you get from this research is that information arises and flows because of the interaction of minds and things in the world. The things in the world are situations, objects, configurations, various kinds of systems that we build and design. The things that make those entities informational are things in the mind.
What I've tried to make clear is that information, unavoidably and crucially, involves two things: systems and minds. Good design of an information and communication technology should therefore not merely involve both the computing sciences and the human sciences, but should involve them at every stage of the process in an integrated fashion. Situation theory is a mathematical theory that allows you to do that.

(From presentation at STRL on October 19, 2001)