https://www.youtube.com/watch?v=_cAEfQQcELA&ab_channel=ToposInstitute


<aside> 💡 The goal is to show that Shannon entropy, a measure of information, is inextricably linked to abstract algebra and topology via a version of calculus' product/Leibniz rule.

</aside>

<aside> 🔑 Information Theory: Shannon and his Entropy

</aside>

<aside> ☎️ Abstract Algebra and Compositionality

</aside>

<aside> 📌 Topology: Sameness and Closeness

</aside>

<aside> 📢 When in doubt, count! Counting is the first thing we do when solving most discrete mathematical problems.

</aside>

<aside> 💡 The metaphor of frogs, wading in the muddy pond, and birds, flying high above the landscape.

</aside>

<aside> 💡 The bird's eye view unifies our thinking. It allows us to discover interesting connections between disparate concepts.

</aside>

<aside> 💡 The frogs, on the other hand, delight in the details of particular objects. They rigorously solve one problem at a time.

</aside>

<aside> 💡 The broad vision of the birds enables us to make discoveries of unexpected connections. However, in order to make those connections precise and rigorous, we require the frogs' attention to detail.

</aside>


Information Theory: Shannon and his Entropy

<aside> 🛠 Information and probability are inversely related. An event with high probability carries little information, while an event with low probability carries a lot of information.

</aside>

To formalize this intuition, we proceed as follows:

The amount of information conveyed by a single event with probability p is the number $-\log(p)$.
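
A minimal sketch of this definition in Python (my own code, not from the talk; the name `surprisal` is just a label for $-\log(p)$):

```python
import math

def surprisal(p: float) -> float:
    """Information (in nats) conveyed by a single event of probability p."""
    if not 0 < p <= 1:
        raise ValueError("p must lie in (0, 1]")
    return -math.log(p)

# A likely event carries little information; a rare event carries a lot.
print(surprisal(0.99))  # ~0.01 nats
print(surprisal(0.01))  # ~4.61 nats
```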

<aside> 🛠 In practice, we rarely encounter single probabilities, hanging out on their own. We usually encounter them in the company of other probabilities, representing the probabilities for the range of possible outcomes. Such a grouping of probabilities is what is known as a Probability Distribution.

</aside>

Shannon Entropy refers to the average amount of information contained in a particular probability distribution.

<aside> 🛠 Entropy is a number associated to a list of probabilities, and we interpret that number as a measure of information.

</aside>
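
As a quick worked example (my own numbers, not from the talk): take three outcomes and average the information of each one, weighted by how likely it is to occur.

$$ p = \left(\tfrac{1}{2}, \tfrac{1}{4}, \tfrac{1}{4}\right), \qquad H(p) = \tfrac{1}{2}\log 2 + \tfrac{1}{4}\log 4 + \tfrac{1}{4}\log 4 = \tfrac{3}{2}\log 2 \approx 1.04 $$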


Abstract Algebra: Reunion of Broken Parts

<aside> 🛠 Algebra refers to the basic sense in which things come together to create something new. Closely related to this is the notion of Compositionality: small things assemble to build a larger construction, and knowledge of this larger construction comes through understanding the individual parts along with the rules for combining them.

</aside>

Formally, an algebra is defined as a vector space equipped with a way to multiply vectors (a bilinear product).
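
A standard illustration (my own, not taken from the talk): the 2×2 real matrices $M_2(\mathbb{R})$ form an algebra. They are a vector space under entrywise addition and scaling, and matrix multiplication supplies a product that is bilinear over those vector space operations:

$$ M_2(\mathbb{R}): \quad (A, B) \mapsto AB, \qquad A(\lambda B + \mu C) = \lambda \, AB + \mu \, AC $$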

Probabilities exhibit both algebraic and topological structure, and entropy interacts "very nicely" with both. Furthermore, the way entropy interacts with algebra and topology is its defining characteristic.


Topology: Sameness, Closeness, and Continuity

<aside> 🛠 Topology is a branch of mathematics that involves the study of shapes.

</aside>

Any set can be equipped with an extra structure known as a topology. A topology on a set X declares which elements in X are close to each other.

The pair of a set X along with its topology is referred to as a topological space: Topological Space = ⟨X, topology(X)⟩

<aside> 🛠 Recall other words with a similar root such as topography. Intuitively, a topology on a set X places each element in X on some abstract map. Once on the map, we can look and see which objects are closer to each other.

</aside>

Two topological spaces are said to be the "same" if they can be transformed into each other while preserving their respective versions of closeness.
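
A concrete instance (my own illustration, not from the talk): the open interval (0, 1) and the whole real line $\mathbb{R}$ count as "the same" topological space, because the map below is a continuous bijection whose inverse is also continuous, so it matches up which points are close to which:

$$ f : (0,1) \to \mathbb{R}, \qquad f(x) = \tan\!\left(\pi x - \tfrac{\pi}{2}\right) $$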


A New Perspective: Entropy is a Number

<aside> 🛠 Each probability distribution is parameterized by a single natural number $n$. This whole number tells us how many possible outcomes a given distribution represents (which also tells us the length of the distribution).

</aside>

<aside> 🛠 Every probability distribution p has a number associated to it called entropy. This number, denoted by H(p), tells us the average amount of information conveyed by a given probability distribution.

</aside>

<aside> 🛠 Entropy aggregates information from all possible events. It is a weighted average of the information associated with each event. In this sense, entropy is akin to temperature.

</aside>

<aside> 🛠 The natural logarithm is negative whenever its input is between 0 and 1. Since every probability lies between 0 and 1, -log(p) is always non-negative, which in turn means that H(p) is always non-negative too.

</aside>

<aside> 🛠 Zero entropy corresponds to zero uncertainty. Maximal uncertainty corresponds to the uniform distribution, whose entropy is log(n).

</aside>

$$ n \in \mathbb{N} \\ p = \left(p_1, p_2, \ldots, p_n \right), \quad \sum_{i=1}^{n} p_i = 1 $$

$$ H_n(p) = -p_1\log(p_1) - p_2\log(p_2) - \cdots - p_n\log(p_n) \\
\boxed{H_n(p) = -\sum_{i = 1}^{n} p_i \log (p_i)} $$
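
A minimal sketch of the boxed formula in Python (my own code, not from the talk), using the usual convention that outcomes with $p_i = 0$ contribute nothing:

```python
import math

def entropy(p: list[float]) -> float:
    """Shannon entropy H_n(p) = -sum_i p_i * log(p_i), in nats."""
    if not math.isclose(sum(p), 1.0):
        raise ValueError("probabilities must sum to 1")
    # Skip zero-probability outcomes: by convention, 0 * log(0) = 0.
    return -sum(pi * math.log(pi) for pi in p if pi > 0)

print(entropy([0.5, 0.25, 0.25]))  # ~1.04 nats, i.e. (3/2) * log(2)
```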

<aside> 🛠 Whenever there are n outcomes each with equal probability 1/n, the entropy of the resulting probability distribution $p = \left(\frac{1}{n}, \frac{1}{n}, \ldots, \frac{1}{n} \right)$ will always be log(n).

</aside>

$$ 0 \leq H_n(p) \leq \log(n) $$
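
These bounds are easy to sanity-check numerically. The sketch below (again my own, reusing the `entropy` function from the previous snippet) confirms that the uniform distribution attains log(n) and that randomly generated distributions stay inside the bounds:

```python
import math
import random

n = 5

# Uniform distribution: maximal uncertainty, entropy log(n).
uniform = [1 / n] * n
assert math.isclose(entropy(uniform), math.log(n))

# Arbitrary distributions over n outcomes stay within [0, log(n)].
for _ in range(1_000):
    weights = [random.random() for _ in range(n)]
    p = [w / sum(weights) for w in weights]
    assert 0.0 <= entropy(p) <= math.log(n) + 1e-12
```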