Information is the "difference that makes a difference." Gregory Bateson. 
This catchy phrase contains a truth about information: it is measured and used (or processed).

A wholly new set of convergences occurred around the term entropy when Claude Shannon adopted it, on the advice of John von Neumann, in the context of information. Shannon was concerned with transmitting signals down wires. He brilliantly thought of the minimal signal as a "yes" or "no" answer, hence representable as a binary 1 or 0, now called a "bit." He treated the entropy of a source as a measure over the set of possible messages that might be sent, with each message weighted by its probability of actually being sent, and he used the same mathematics as Boltzmann.
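As a rough illustration of the weighting described above, the entropy of a source can be computed from the probabilities of its possible messages. This is only a sketch: the function name shannon_entropy and the example probabilities are assumptions chosen for illustration, and the logarithm is taken to base 2 so that the result comes out in bits.

```python
import math

def shannon_entropy(probabilities):
    """Entropy in bits: each possible message contributes -p * log2(p)."""
    # Messages with zero probability contribute nothing, so they are skipped.
    return -sum(p * math.log2(p) for p in probabilities if p > 0)

# Illustrative sources (the probabilities are made up for the example).
fair_coin = shannon_entropy([0.5, 0.5])      # one yes/no decision
biased_coin = shannon_entropy([0.9, 0.1])    # a mostly predictable source
eight_equal = shannon_entropy([1 / 8] * 8)   # eight equally likely messages

print(fair_coin, biased_coin, eight_equal)
```

Under these assumptions the fair coin comes out at exactly one bit, the eight-message source at three bits, and the biased coin at less than one bit, since its outcome is largely expected in advance.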

Shannon conceived of receiving a message as reducing the entropy, or uncertainty, about which message was actually sent, given the initial set of possible messages. Shannon defined information as the negative of the quantity formally identified with thermodynamic entropy.
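The idea of a message reducing uncertainty can be sketched by comparing the entropy of the set of possible messages before and after something is received. The names entropy_bits and information_gained, and the eight-message source, are hypothetical and serve only to make the arithmetic concrete.

```python
import math

def entropy_bits(probabilities):
    """Uncertainty, in bits, about which message was sent."""
    return -sum(p * math.log2(p) for p in probabilities if p > 0)

def information_gained(prior, posterior):
    """Reduction in uncertainty: entropy before minus entropy after."""
    return entropy_bits(prior) - entropy_bits(posterior)

# Assumed example: eight equally likely messages to begin with.
prior = [1 / 8] * 8
# A partial clue rules out all but two of them.
after_clue = [0.5, 0.5, 0, 0, 0, 0, 0, 0]
# The full message leaves no doubt at all.
after_message = [1.0, 0, 0, 0, 0, 0, 0, 0]

gain_from_clue = information_gained(prior, after_clue)
gain_from_message = information_gained(prior, after_message)
print(gain_from_clue, gain_from_message)
```

In this toy case the clue is worth two bits and the full message three, which is all the uncertainty there was to remove.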

But information can only be received where there is doubt, and doubt implies the existence of alternatives. The unexpected seems to contain more information than the expected. Learning the next term in the series 2, 4, 6, 8 gives you no information once you know the rule. (William Bateson found that a loss of information is accompanied by an increase in symmetry.) Learning the next term in the series 11, 37, 191, 48 would give you much more information.
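One common way to put a number on "the unexpected contains more information" is the surprisal of an outcome, the negative base-2 logarithm of its probability. The sketch below assumes probabilities for the two series purely for illustration: certainty for the rule-governed series, and one chance in 1024 for the irregular one. Neither figure comes from the text above.

```python
import math

def surprisal_bits(probability):
    """Information, in bits, carried by an outcome of the given probability."""
    if not 0 < probability <= 1:
        raise ValueError("probability must be in (0, 1]")
    return -math.log2(probability)

# Assumed: once the rule is known, the next term of 2, 4, 6, 8 is certain.
expected_term = surprisal_bits(1.0)
# Assumed: the next term of 11, 37, 191, 48 is one of 1024 equally likely values.
unexpected_term = surprisal_bits(1 / 1024)

print(expected_term, unexpected_term)
```

With these assumed probabilities the expected term carries zero bits and the unexpected one carries ten.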

But the extreme case of this unexpected information is a set of random numbers, where it is impossible to build up expectations. (A definition of randomness is that there are no shortcuts.) (Is this also the definition of noise?) Thus information requires redundancy as well.
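The "no shortcuts" idea can be loosely illustrated with a general-purpose compressor: a rule-governed series has a short description, while a random one does not. This is only a proxy, and the choice of zlib, the series length, and the helper name compressed_fraction are all assumptions made for the sketch.

```python
import random
import zlib

def compressed_fraction(data: bytes) -> float:
    """Size after zlib compression divided by the original size."""
    return len(zlib.compress(data, 9)) / len(data)

n = 10_000
# A rule-governed series: 0, 2, 4, 6, ... (wrapped to fit in a byte).
regular = bytes((2 * i) % 256 for i in range(n))
# A seeded pseudo-random series, so the run is repeatable.
rng = random.Random(0)
noisy = bytes(rng.getrandbits(8) for _ in range(n))

regular_fraction = compressed_fraction(regular)
noisy_fraction = compressed_fraction(noisy)
print(regular_fraction, noisy_fraction)
```

The regular series should shrink to a small fraction of its size, while the pseudo-random one should barely shrink at all, which is one way of saying that no shortcut to it was found.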

Entropy and information are linked in that both are statistical theories. The unit of information is the amount transmitted by a single decision between equally probable alternatives. This unlikely association of maximum information with randomness makes it possible to conceive of chaos as information. Sensitivity to initial conditions is an indication of error propagation. It can also be understood as the generation of information: there were two distinct starting points even if we couldn't distinguish between them, and running the system has generated that information.
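The point about sensitivity to initial conditions can be sketched with a standard chaotic system. The logistic map x -> 4x(1 - x) is assumed here as the example (it is not named in the text), along with a starting gap of 1e-10 and a "distinguishable" threshold of 0.1; the function names are hypothetical.

```python
def logistic(x):
    """One step of the logistic map at parameter 4, a standard chaotic example."""
    return 4.0 * x * (1.0 - x)

def first_distinguishable_step(x0, gap, threshold, max_steps=200):
    """Steps needed before two nearby starting points differ by more than threshold.

    Returns None if they never separate that far within max_steps.
    """
    a, b = x0, x0 + gap
    for step in range(max_steps + 1):
        if abs(a - b) > threshold:
            return step
        a, b = logistic(a), logistic(b)
    return None

# Two starting points that no coarse measurement could tell apart.
step = first_distinguishable_step(0.3, 1e-10, 0.1)
print(step)
```

The two trajectories begin indistinguishable at any ordinary precision and become plainly different after a few dozen iterations. In that sense, running the system reveals a difference in the starting points that was there all along but could not be read off at the start.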

As Jacques Lacan sourly put it, "The Bell Telephone Company needed to economize, that is to say, to pass the greatest possible number of communications down one wire. In a country as vast as the US, it is very important to save on a few wires, and to get the inanities which generally travel by this kind of transmission apparatus to pass down the smallest possible number of wires. That is where the quantification of information started... It had nothing to do with knowing whether what people tell each other makes sense." Information is also not necessarily tied to language. In fact, a statistical conversion of language needs to take place before information content can be measured.

"Commerce is the ocean that information swims in." (Mondo 2000: A User's Guide to the New Edge)

In The Informational City, Manuel Castells describes the rise of the informational paradigm from the discoveries of the transistor (1947), the integrated circuit (1957), the planar process (?) (1959), and the microprocessor (1971). As a result of these discoveries, computers were able to revolutionize information processing, telecommunications became the basis for forming informational networks, and information became both the raw material and the product of a set of new technologies, including biotechnologies. The new technical paradigm transformed the processes of production, much as previous industrial revolutions had been organized around the steam engine and, later, electricity.