https://www.youtube.com/watch?v=_cAEfQQcELA&ab_channel=ToposInstitute
<aside> Goal is to show that Shannon entropy, a measure of information, is inextricably linked to abstract algebra and topology via a version of calculus' product/Leibniz rule.
</aside>
<aside> Information theory: Shannon and his Entropy
</aside>
<aside> Abstract Algebra and Compositionality
</aside>
<aside> Topology: Sameness and Closeness
</aside>
<aside> When in doubt, count! Counting is the first thing we do when solving most discrete mathematical problems.
</aside>
<aside> The metaphor of frogs, wading in the muddy pond, and birds, flying high above the landscape.
</aside>
<aside> The bird's eye view unifies our thinking. It allows us to discover interesting connections between disparate concepts.
</aside>
<aside> The frogs, on the other hand, delight in the details of particular objects. They rigorously solve one problem at a time.
</aside>
<aside> The broad vision of the birds enables us to make discoveries of unexpected connections. However, in order to make those connections precise and rigorous, we require the frogs' attention to detail.
</aside>
<aside> Information and probability are inversely related. An event with high probability carries little information, while an event with low probability carries a lot of information.
</aside>
To formalize this intuition, we introduce a quantity $I$ measuring the information of a single event with probability $p$: when $p$ is small, that is, when the event is unlikely, $I$ should be large, and vice versa. This leads us to define $I = \log \left(\frac{1}{p}\right) = \log 1 - \log p = -\log p$. The amount of information conveyed by a single event with probability $p$ is therefore the number $-\log(p)$.
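For a concrete illustration (my own numbers, taking logarithms base 2 so that information is measured in bits):

$$ I\left(\tfrac{1}{2}\right) = -\log_2\left(\tfrac{1}{2}\right) = 1 \text{ bit}, \qquad I\left(\tfrac{1}{1024}\right) = -\log_2\left(\tfrac{1}{1024}\right) = 10 \text{ bits} $$

The rarer the event, the more information its occurrence conveys.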
<aside> In practice, we rarely encounter single probabilities, hanging out on their own. We usually encounter them in the company of other probabilities, representing the probabilities for the range of possible outcomes. Such a grouping of probabilities is what is known as a Probability Distribution.
</aside>
Shannon Entropy refers to the average amount of information contained in a particular probability distribution.
<aside> Entropy is a number associated to a list of probabilities, and we interpret that number as a measure of information.
</aside>
<aside> Algebra refers to the basic sense in which things come together to create something new. Closely related to this is the notion of Compositionality: small things assemble to build a larger construction, and knowledge of this larger construction comes through understanding the individual parts along with the rules for combining them.
</aside>
Formally, an algebra is defined as a vector space equipped with a way to multiply vectors.
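As a concrete example (mine, not from the talk): the set $M_2(\mathbb{R})$ of $2 \times 2$ real matrices is a vector space under entrywise addition and scalar multiplication, and matrix multiplication gives a way to multiply its vectors, so it forms an algebra:

$$ A, B \in M_2(\mathbb{R}) \implies A + B \in M_2(\mathbb{R}), \quad \lambda A \in M_2(\mathbb{R}), \quad AB \in M_2(\mathbb{R}) $$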
Probabilities exhibit both algebraic and topological structure, and entropy interacts "very nicely" with both. Furthermore, the way entropy interacts with algebra and topology is its defining characteristic.
<aside> Topology is a branch of mathematics that involves the study of shapes.
</aside>
Any Set can be equipped with an extra structure known as a topology. A topology on a set $X$ declares which elements in $X$ are close to each other. The pair of a set $X$ along with its topology is referred to as a topological space:

$$ \text{Topological Space} = \langle X, \text{topology}(X) \rangle $$
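As a small illustration (my own example, using the standard open-set definition rather than the informal "closeness" description above): on the three-element set $X = \{a, b, c\}$, one possible topology is the collection of subsets declared open,

$$ \tau = \{\, \varnothing,\ \{a\},\ \{a, b\},\ X \,\} $$

Different choices of $\tau$ on the same set $X$ give different topological spaces, i.e. different notions of which points count as close together.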
<aside>
Recall other words with a similar root, such as topography. Intuitively, a topology on a set $X$ places each element in $X$ on some abstract map. Once on the map, we can look and see which objects are closer to each other.
</aside>
Two topological spaces are said to be the "same" if they can be transformed into each other while preserving their respective versions of closeness.
<aside> Each probability distribution is parameterized by a single natural number $n$. This whole number tells us how many possible outcomes a given distribution represents (which also tells us the length of the distribution).
</aside>
<aside>
Every probability distribution $p$ has a number associated to it called entropy. This number, denoted by $H(p)$, tells us the average amount of information conveyed by a given probability distribution.
</aside>
<aside> Entropy aggregates information from all possible events. It is a weighted average of the information associated with each event. In this sense, entropy is akin to temperature.
</aside>
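Written out, this weighted average is the expected value of the per-event information $-\log(p_i)$, which is exactly the formula for $H_n(p)$ given below:

$$ H(p) = \sum_{i=1}^{n} p_i \cdot \left(-\log p_i\right) = \mathbb{E}\left[-\log p\right] $$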
<aside>
The natural logarithm is negative whenever its input is strictly between 0 and 1. This means that $-\log(p)$ is always non-negative, which further means that $H(p)$ is always non-negative too.
</aside>
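Spelled out as a chain of implications (for probabilities $0 < p_i \leq 1$, with the convention that a term with $p_i = 0$ contributes nothing):

$$ 0 < p_i \leq 1 \;\Longrightarrow\; \log p_i \leq 0 \;\Longrightarrow\; -p_i \log p_i \geq 0 \;\Longrightarrow\; H_n(p) = -\sum_{i=1}^{n} p_i \log p_i \geq 0 $$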
<aside>
Zero entropy corresponds to zero uncertainty. Maximal uncertainty corresponds to a uniform distribution, which corresponds to an entropy of $\log(n)$.
</aside>
$$ n \in \mathbb{N} \\ p = \left(p_1, p_2, \ldots, p_n \right), \quad \sum_{i=1}^{n} p_i = 1 $$
$$ H_n(p) = -p_1\log(p_1) - p_2\log(p_2) - \cdots - p_n\log(p_n) \\
\boxed {H_n(p) = -\sum_{i = 1}^{n} p_i \log (p_i)} $$
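A minimal sketch of this formula in code (my own illustration; `shannon_entropy` is a hypothetical helper, using the natural logarithm and treating $0 \log 0$ as $0$):

```python
import math

def shannon_entropy(p):
    """H_n(p) = -sum_i p_i * log(p_i), with a 0 * log(0) term counted as 0."""
    return -sum(p_i * math.log(p_i) for p_i in p if p_i > 0)

print(shannon_entropy([0.5, 0.5]))   # fair coin: log(2) ~ 0.693, the maximum for n = 2
print(shannon_entropy([1.0, 0.0]))   # certain outcome: 0.0, zero uncertainty
print(shannon_entropy([0.25] * 4))   # uniform over 4 outcomes: log(4) ~ 1.386
```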
<aside>
Whenever there are $n$ outcomes, each with equal probability $\frac{1}{n}$, the entropy of the resulting probability distribution $p = \left(\frac{1}{n}, \frac{1}{n}, \ldots, \frac{1}{n} \right)$ will always be $\log(n)$.
</aside>
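To see why, substitute the uniform distribution into the entropy formula:

$$ H_n\left(\tfrac{1}{n}, \ldots, \tfrac{1}{n}\right) = -\sum_{i=1}^{n} \frac{1}{n} \log\left(\frac{1}{n}\right) = -n \cdot \frac{1}{n} \cdot \left(-\log n\right) = \log(n) $$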
$$ 0 \leq H_n(p) \leq \log(n) $$