https://www.youtube.com/watch?v=_cAEfQQcELA&ab_channel=ToposInstitute


<aside> 💡 The goal is to show that Shannon entropy, a measure of information, is inextricably linked to abstract algebra and topology via a version of calculus' product/Leibniz rule.

</aside>

<aside> 🔑 Information Theory: Shannon and his Entropy

</aside>

<aside> ☎️ Abstract Algebra and Compositionality

</aside>

<aside> 📌 Topology: Sameness and Closeness

</aside>

<aside> 📢 When in doubt, count! Counting is the first thing we do when solving most discrete mathematical problems.

</aside>

<aside> 💡 The metaphor of frogs, wading in the muddy pond, and birds, flying high above the landscape.

</aside>

<aside> 💡 The bird's eye view unifies our thinking. It allows us to discover interesting connections between disparate concepts.

</aside>

<aside> 💡 The frogs, on the other hand, delight in the details of particular objects. They rigorously solve one problem at a time.

</aside>

<aside> 💡 The broad vision of the birds enables us to make discoveries of unexpected connections. However, in order to make those connections precise and rigorous, we require the frogs' attention to detail.

</aside>


Information Theory: Shannon and his Entropy

<aside> 🛠 Information and probability are inversely related. An event with high probability carries little information, while an event with low probability carries a lot of information.

</aside>

To formalize this intuition, we proceed as follows:

The amount of information conveyed by a single event with probability p is the number $-\log(p)$.
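
A minimal sketch of this definition in Python (my own code, not from the talk; the name `surprisal` is just a label for $-\log(p)$):

```python
import math

def surprisal(p: float) -> float:
    """Information (in nats) conveyed by a single event of probability p."""
    if not 0 < p <= 1:
        raise ValueError("p must lie in (0, 1]")
    return -math.log(p)

# A likely event carries little information; a rare event carries a lot.
print(surprisal(0.99))  # ~0.01 nats
print(surprisal(0.01))  # ~4.61 nats
```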

<aside> 🛠 In practice, we rarely encounter single probabilities, hanging out on their own. We usually encounter them in the company of other probabilities, representing the probabilities for the range of possible outcomes. Such a grouping of probabilities is what is known as a Probability Distribution.

</aside>

Shannon Entropy refers to the average amount of information contained in a particular probability distribution.

<aside> 🛠 Entropy is a number associated to a list of probabilities, and we interpret that number as a measure of information.

</aside>
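
As a quick worked example (my own numbers, not from the talk): take three outcomes and average the information of each one, weighted by how likely it is to occur.

$$ p = \left(\tfrac{1}{2}, \tfrac{1}{4}, \tfrac{1}{4}\right), \qquad H(p) = \tfrac{1}{2}\log 2 + \tfrac{1}{4}\log 4 + \tfrac{1}{4}\log 4 = \tfrac{3}{2}\log 2 \approx 1.04 $$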


Abstract Algebra: Reunion of Broken Parts

<aside> 🛠 Algebra refers to the basic sense in which things come together to create something new. Closely related to this is the notion of Compositionality: small things assemble to build a larger construction, and knowledge of this larger construction comes through understanding the individual parts along with the rules for combining them.

</aside>

Formally, an algebra is defined as a vector space equipped with a way to multiply vectors (a bilinear product).
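
A standard illustration (my own, not taken from the talk): the 2×2 real matrices $M_2(\mathbb{R})$ form an algebra. They are a vector space under entrywise addition and scaling, and matrix multiplication supplies a product that is bilinear over those vector space operations:

$$ M_2(\mathbb{R}): \quad (A, B) \mapsto AB, \qquad A(\lambda B + \mu C) = \lambda \, AB + \mu \, AC $$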

Probabilities exhibit both algebraic and topological structure, and entropy interacts "very nicely" with both. Furthermore, the way entropy interacts with algebra and topology is its defining characteristic.


Topology: Sameness, Closeness, and Continuity

<aside> 🛠 Topology is a branch of mathematics that involves the study of shapes.

</aside>

Any set can be equipped with an extra structure known as a topology. A topology on a set X declares which elements in X are close to each other.

The pair of a set X along with its topology is referred to as a topological space: Topological Space = ⟨X, topology(X)⟩

<aside> 🛠 Recall other words with a similar root such as topography. Intuitively, a topology on a set X places each element in X on some abstract map. Once on the map, we can look and see which objects are closer to each other.

</aside>

Two topological spaces are said to be the "same" if they can be transformed into each other while preserving their respective versions of closeness.
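
A concrete instance (my own illustration, not from the talk): the open interval (0, 1) and the whole real line $\mathbb{R}$ count as "the same" topological space, because the map below is a continuous bijection whose inverse is also continuous, so it matches up which points are close to which:

$$ f : (0,1) \to \mathbb{R}, \qquad f(x) = \tan\!\left(\pi x - \tfrac{\pi}{2}\right) $$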


A New Perspective: Entropy is a Number

<aside> 🛠 Each probability distribution is parameterized by a single natural number $n$. This whole number tells us how many possible outcomes a given distribution represents (which also tells us the length of the distribution).

</aside>

<aside> 🛠 Every probability distribution p has a number associated to it called entropy. This number, denoted by H(p), tells us the average amount of information conveyed by a given probability distribution.

</aside>

<aside> 🛠 Entropy aggregates information from all possible events. It is a weighted average of the information associated with each event. In this sense, entropy is akin to temperature.

</aside>

<aside> 🛠 The natural logarithm is negative whenever its input is between 0 and 1. Since every probability lies between 0 and 1, -log(p) is always non-negative, which in turn means that H(p) is always non-negative too.

</aside>

<aside> 🛠 Zero entropy corresponds to zero uncertainty. Maximal uncertainty corresponds to the uniform distribution, whose entropy is log(n).

</aside>

$$ n \in \mathbb{N} \\ p = \left(p_1, p_2, \ldots, p_n \right), \quad \sum_{i=1}^{n} p_i = 1 $$

$$ H_n(p) = -p_1\log(p_1) - p_2\log(p_2) - \cdots - p_n\log(p_n) \\
\boxed{H_n(p) = -\sum_{i = 1}^{n} p_i \log (p_i)} $$
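
A minimal sketch of the boxed formula in Python (my own code, not from the talk), using the usual convention that outcomes with $p_i = 0$ contribute nothing:

```python
import math

def entropy(p: list[float]) -> float:
    """Shannon entropy H_n(p) = -sum_i p_i * log(p_i), in nats."""
    if not math.isclose(sum(p), 1.0):
        raise ValueError("probabilities must sum to 1")
    # Skip zero-probability outcomes: by convention, 0 * log(0) = 0.
    return -sum(pi * math.log(pi) for pi in p if pi > 0)

print(entropy([0.5, 0.25, 0.25]))  # ~1.04 nats, i.e. (3/2) * log(2)
```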

<aside> 🛠 Whenever there are n outcomes each with equal probability 1/n, the entropy of the resulting probability distribution $p = \left(\frac{1}{n}, \frac{1}{n}, \ldots, \frac{1}{n} \right)$ will always be log(n).

</aside>

$$ 0 \leq H_n(p) \leq \log(n) $$
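
These bounds are easy to sanity-check numerically. The sketch below (again my own, reusing the `entropy` function from the previous snippet) confirms that the uniform distribution attains log(n) and that randomly generated distributions stay inside the bounds:

```python
import math
import random

n = 5

# Uniform distribution: maximal uncertainty, entropy log(n).
uniform = [1 / n] * n
assert math.isclose(entropy(uniform), math.log(n))

# Arbitrary distributions over n outcomes stay within [0, log(n)].
for _ in range(1_000):
    weights = [random.random() for _ in range(n)]
    p = [w / sum(weights) for w in weights]
    assert 0.0 <= entropy(p) <= math.log(n) + 1e-12
```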