As you may know, the Pokémon type system consists of (approximately) seventeen different types, each with different comparative advantages over the others like a more elaborate version of rock-paper-scissors. In the Pokémon video games, the types and their advantages are hardcoded as laws. But if you adopt the view of someone who actually lives in the Pokémon universe, those advantages would have to be scientifically measured and discovered.

How would you empirically determine the types and their advantages? What information would you need to be able to measure, and how many measurements would you need to get a reasonably accurate picture? What would it take to be reasonably sure you had discovered all existing types and interactions? Questions like these comprise a field of study I call empirical Pokémon typing, and this article contains a few preliminary results.

# Problem #1: Relating attacking and defending types

In our first result, I'll point out a feature of the type-effectiveness system that is surprisingly hard to discover empirically. We are told in the game that attacks have types, and Pokémon species have types, and that these types come from the same list: there are Fire-type attacks and Fire-type Pokémon. If you weren't told this relationship in advance, how could you discover it empirically?

There's one particularly good way to do it: same-type attack bonus (STAB) directly shows the link between an attack type and a Pokémon type, because the Pokémon using that attack gets an additional bonus. You could set up an experiment to identify STAB and therefore associate attack types with Pokémon types. I discuss this method in the next section.

You could also use an indirect method, such as assuming that a Pokémon shares a type with most of the attacks it learns. Then you could compute the appropriate learnset statistics to guess which attack types are associated with which Pokémon. This, however, is less decisive than STAB.

It turns out, actually, that STAB is the only direct method: without STAB, there is no way to empirically associate Pokémon types with attack types. Details follow.

Here's the basic structure of an empirical Pokémon typing experiment: You start with a group of Pokémon subjects, each of which knows some attacks. You have each Pokémon use each attack against each other Pokémon, and somehow measure the resulting susceptibility (type effectiveness).

Tabulating the results, you could fill out a table of empirically-determined type effectiveness values. The rows of the table will be attacks, and the columns of the table will be Pokémon species. Duplicate rows or duplicate columns suggest that two attacks / species are of the same type.
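The duplicate-row and duplicate-column check can be sketched in a few lines of Python. This is only an illustration under the simplifying assumptions listed below (single types, perfect measurements, no STAB). The `table` data and the helper names `group_rows` and `group_columns` are hypothetical, and the four species are chosen because they are single-typed.

```python
from collections import defaultdict

# Hypothetical measurements: table[attack][species] = effectiveness multiplier.
table = {
    "Ember":        {"Tangela": 2.0, "Chikorita": 2.0, "Squirtle": 0.5, "Charmander": 0.5},
    "Flamethrower": {"Tangela": 2.0, "Chikorita": 2.0, "Squirtle": 0.5, "Charmander": 0.5},
    "Water Gun":    {"Tangela": 0.5, "Chikorita": 0.5, "Squirtle": 0.5, "Charmander": 2.0},
    "Vine Whip":    {"Tangela": 0.5, "Chikorita": 0.5, "Squirtle": 2.0, "Charmander": 0.5},
}

def group_rows(table):
    """Group attacks whose rows are identical (candidate shared attack type)."""
    groups = defaultdict(list)
    for attack, row in table.items():
        key = tuple(row[s] for s in sorted(row))  # row values in a fixed species order
        groups[key].append(attack)
    return sorted(sorted(g) for g in groups.values())

def group_columns(table):
    """Group species whose columns are identical (candidate shared species type)."""
    species = sorted({s for row in table.values() for s in row})
    groups = defaultdict(list)
    for s in species:
        key = tuple(table[a][s] for a in sorted(table))  # column values in a fixed attack order
        groups[key].append(s)
    return sorted(sorted(g) for g in groups.values())

print(group_rows(table))
print(group_columns(table))
```

Note that the two groupings come out as separate lists: nothing in the output says which attack group belongs with which species group.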

This effectiveness table is the source of our data. Now we must make a theory of how many types exist and what their effectivenesses are. To start, let's first approach the problem in a setting with the following simplifying assumptions:

1. No dual types. Each Pokémon has exactly one type. This way, we don't need to disentangle whether we are dealing with a dual type or not.
2. Our measurements are perfect. Our data has no errors because we can perfectly measure the susceptibility of each Pokémon to each attack.
3. We can perfectly distinguish different species and types. Assume we are never confused about whether two species of Pokémon are the same, or whether two attacks are the same. (No crocodile-versus-alligator ambiguity.)
4. No STAB. Type effectiveness is completely determined by the attack type and defending Pokémon type. In particular, there are no other factors like same-type attack bonus ("STAB"). (Same-type attack bonus amplifies the type effectiveness of an attack based on the type of attacking Pokémon.)

Under even these simplifying conditions, we have a surprising negative result:

Without same-type attack bonus (STAB), there is no principled way to identify types of attack with types of Pokémon based on susceptibility measurements alone.

In other words, using the effectiveness table, you could divide the Pokémon species into groups that seem to share the same type. And you could divide the attacks into groups that seem to share the same type. But there would be no principled way to match up the Pokémon species types with the corresponding attack types.

This means that if you weren't told that the attacks we call Fire-type (attacks that are super-effective against Grass defenders and ineffective against Water defenders) should be identified with the type of Pokémon we call Fire-type (Pokémon that are vulnerable to Ground-type attacks and resistant to Ice-type attacks), you would have no way to infer that information from the susceptibility type chart alone.

Or put another way: if I reorder the rows and columns of the ground-truth type susceptibility table, then erase the type labels, there is no principled way for you to figure out which attacking types match which defending types just by looking at the chart.
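As a rough illustration of this relabeling argument, the sketch below takes a three-type fragment of the chart (Fire, Water, Grass only) and builds a hypothetical alternative universe, `ALT`, in which the attack labels are rotated. The helper `unlabeled` is an assumption of this sketch: it erases labels by brute-force minimizing over all independent row and column orderings, which is only practical for tiny tables.

```python
from itertools import permutations

TYPES = ["Fire", "Water", "Grass"]

# CHART[attack][defender] = effectiveness multiplier (three-type fragment).
CHART = {
    "Fire":  {"Fire": 0.5, "Water": 0.5, "Grass": 2.0},
    "Water": {"Fire": 2.0, "Water": 0.5, "Grass": 0.5},
    "Grass": {"Fire": 0.5, "Water": 2.0, "Grass": 0.5},
}

def unlabeled(chart, attacks, defenders):
    """Canonical label-free form: the smallest matrix over all
    independent orderings of the rows and of the columns."""
    return min(
        tuple(tuple(chart[a][d] for d in cols) for a in rows)
        for rows in permutations(attacks)
        for cols in permutations(defenders)
    )

# Hypothetical alternative universe: same numbers, attack labels rotated,
# so the attack now called "Water" behaves like our Fire-type attack, etc.
rename = {"Fire": "Water", "Water": "Grass", "Grass": "Fire"}
ALT = {rename[a]: dict(row) for a, row in CHART.items()}

print(ALT != CHART)                                                # the labelings differ
print(unlabeled(CHART, TYPES, TYPES) == unlabeled(ALT, TYPES, TYPES))  # the erased tables agree
```

The two universes disagree about which attack type matches which defending type, yet they yield the same table once labels are erased, so the table alone cannot tell them apart.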

## Proof: Attack types and species types are unrelated.

This negative result can actually be viewed as a consequence of a theorem from linear algebra, which states that there is no canonical isomorphism between a vector space and its dual. Put more colloquially, if you cannot measure the similarity of one type to another, then there's no relationship between attacking and defending types.

You can show that this result applies by showing that Pokémon type effectiveness has the structure of a linear (vector) space, as follows.

Attack effectiveness in practice is a multiplicative factor. Possible effectiveness values consist of 2x, 1x, 0.5x, and 0x. In order to apply linear algebra, which is additive instead of multiplicative, we will not deal with effectiveness values directly, but instead with (base 2) logarithms of effectiveness values; I call these susceptibility values.

If there are $$n$$ Pokémon types, then we can form an $$n \times n$$ matrix $$\mathbf{S}$$ of susceptibility values, with one row per attacking type and one column per defending type. Using $$\mathbf{S}$$, you can compute how effective an attack will be against a particular Pokémon through straightforward matrix multiplication: The defending Pokémon's type(s) are encoded in a length-$$n$$ column vector $$\vec{d}$$. Each entry is 1 if the Pokémon has that type, or 0 if the Pokémon does not. The product $$\mathbf{S}\cdot\vec{d}$$ is then a column vector neatly listing the Pokémon's susceptibility to each attacking type.

If $$\vec{a}$$ is a length-n row vector describing the type of the attack, then $$\vec{a}\cdot \mathbf{S}\cdot \vec{d}$$ is a single number indicating the susceptibility of the specific Pokémon with type $$\vec{d}$$ to the attack of type $$\vec{a}$$.

Next, the collection of all theoretically possible Pokémon type combinations forms an n-dimensional vector space. It's the collection of all possible $$\vec{d}$$ vectors. There's one dimension for each type, and the value of each component tells you how many copies of that type a Pokémon has. Because we'll need to distinguish attacking and defending types, we might call this the defending type space.

An attacking type is a map assigning a susceptibility value to each of the n defending types. Susceptibility values add, so that a Pokémon with multiple types has the sum of the susceptibility values of the individual types. This means that an attacking type is a linear map assigning a susceptibility value to each of the n defending types. Hence the space of all theoretically possible attacking types is a space of linear functionals on defending types; it's the dual of the space of defending types.
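The matrix products $$\mathbf{S}\cdot\vec{d}$$ and $$\vec{a}\cdot\mathbf{S}\cdot\vec{d}$$ can be checked with a small pure-Python sketch. It assumes a three-type fragment of the chart (Fire, Water, Grass) with no immunities, because `math.log2(0)` raises an error rather than returning $$-\infty$$. The helper names `one_hot`, `mat_vec`, and `susceptibility` are illustrative, not part of any library.

```python
import math

TYPES = ["Fire", "Water", "Grass"]

# EFFECTIVENESS[i][j]: multiplier when attack type i hits defending type j.
EFFECTIVENESS = [
    [0.5, 0.5, 2.0],  # Fire attacking Fire, Water, Grass
    [2.0, 0.5, 0.5],  # Water attacking ...
    [0.5, 2.0, 0.5],  # Grass attacking ...
]

# Susceptibility matrix S: base-2 logarithms of the multipliers.
S = [[math.log2(x) for x in row] for row in EFFECTIVENESS]

def one_hot(*names):
    """Indicator vector: 1 for each named type, 0 otherwise."""
    return [1 if t in names else 0 for t in TYPES]

def mat_vec(S, d):
    """S . d: the defender's susceptibility to each attacking type."""
    return [sum(s * x for s, x in zip(row, d)) for row in S]

def susceptibility(a, S, d):
    """a . S . d: a single susceptibility value."""
    return sum(ai * v for ai, v in zip(a, mat_vec(S, d)))

d = one_hot("Water", "Grass")      # a dual-type Water/Grass defender
print(mat_vec(S, d))               # susceptibility to Fire, Water, Grass attacks
s = susceptibility(one_hot("Water"), S, d)
print(s, 2 ** s)                   # log-scale value and the multiplier it encodes
```

For the Water/Grass defender, the Water attack gets susceptibility $$-1 + -1 = -2$$, which corresponds to the familiar $$0.5 \times 0.5 = 0.25$$x multiplier: addition in the log scale reproduces multiplication of effectiveness values.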

But there is no canonical way to associate the basis of a mere vector space (such as the single types in defending type space) with a basis in the dual space (such as the n different attacking types.) It follows that, without additional structure, we have no principled way to identify defending types (a vector space) with attacking types (functionals on that space).

# Irrational factors and same-type attack bonus

As I've pointed out, same-type attack bonus (STAB) is the only direct way to determine which attacking and defending types should be identified with each other.

You can empirically determine STAB using a straightforward experimental setup like this: make an empirical susceptibility table with attacking species+move along one dimension and defending species along the other. Then look for two rows that are identical except that when Pokémon A performs attack B against Pokémon C, there is a boost to susceptibility that does not occur when Pokémon A' performs that same attack against Pokémon C. This is conclusive evidence of STAB, because the difference is not in the type of attack or the type of defending Pokémon, but the type of attacking Pokémon.
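A minimal sketch of this row comparison follows. It works with raw effectiveness multipliers rather than logarithms, so the "boost" shows up as a constant ratio between two rows. The labels A, A', B, C, D, E, the measured values, and the function `find_stab` are all hypothetical placeholders.

```python
# Hypothetical measurements: table[(attacker, move)][defender] = multiplier.
table = {
    ("A",  "B"): {"C": 3.0, "D": 0.75},   # A gets a boost when using move B
    ("A'", "B"): {"C": 2.0, "D": 0.5},    # A' uses the same move without a boost
    ("A",  "E"): {"C": 1.0, "D": 1.0},    # move E: no difference between attackers
    ("A'", "E"): {"C": 1.0, "D": 1.0},
}

def find_stab(table, tol=1e-9):
    """Return (move, boosted_attacker, plain_attacker, ratio) for every pair of
    rows that share a move but differ by one constant factor greater than 1."""
    hits = []
    rows = sorted(table)
    for a1, m1 in rows:
        for a2, m2 in rows:
            if m1 != m2 or a1 == a2:
                continue
            r1, r2 = table[(a1, m1)], table[(a2, m2)]
            # Skip defenders that take zero damage to avoid dividing by zero.
            ratios = [r1[d] / r2[d] for d in r1 if d in r2 and r2[d] != 0]
            if (ratios and ratios[0] > 1 + tol
                    and all(abs(r - ratios[0]) <= tol for r in ratios)):
                hits.append((m1, a1, a2, ratios[0]))
    return hits

print(find_stab(table))
```

With real, noisy measurements the tolerance `tol` would need to be much looser; the exact comparison here relies on the perfect-measurement assumption.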

But if we can make some assumptions about the allowed susceptibility values, there is an even easier analytic route to identifying STAB, one that requires only a single attacking Pokémon. If STAB confers a multiplicative factor of 1.5, as it does in the games, then its logarithmic susceptibility value is $$\gamma \equiv \log_2(1.5)\approx 0.585$$. This is an irrational number, so our susceptibility table will contain a row of irrational values when, and only when, STAB occurs, which makes STAB immediately identifiable. Details follow.

The possible monotype effectiveness values are [2x 1x 0.5x 0x], which correspond to susceptibility values of [1 0 -1 -∞]. We could consider alternative possible values for STAB, but in the games it confers 1.5x effectiveness, or a susceptibility of $$\gamma \equiv \log_2(1.5)\approx 0.585$$. This susceptibility value is irrational: if $$\gamma = p/q$$ were rational, with positive integers $$p$$ and $$q$$, then $$2^{p/q} = 1.5$$, so $$2^{(p/q)+1} = 3$$, so $$2^{p+q} = 3^{q}$$, contradicting the unique factorization of integers. But natural susceptibility values (those unaffected by STAB) are all integers (or infinite). Therefore, if we are told that natural susceptibility values are all integers (or infinite) and are given precise empirical susceptibility measurements, we can always identify STAB because it will produce irrational susceptibility values wherever it occurs.

In fact, if we know that the STAB component is irrational, even if we don't know its exact value, we can compute its value uniquely from the table. If the susceptibility is an integer combination of STAB and non-STAB susceptibilities ($$m + n\gamma$$), note that these coefficients are unique: if $$m + n\gamma = m^\prime + n^\prime\gamma$$ then $$(m-m^\prime) = (n^\prime-n)\gamma$$. If $$n\neq n^\prime$$, then $$(m-m^\prime)/(n^\prime-n) = \gamma$$. But the left side is a rational number and the right side is irrational, a contradiction. Therefore $$n=n^\prime$$, so $$m=m^\prime$$, so these coefficients are unique.
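The unique decomposition $$m + n\gamma$$ can be sketched numerically, with one important caveat: floating-point numbers cannot truly distinguish rational from irrational values, so this sketch substitutes a tolerance test ("is the remainder within `tol` of an integer?") for the exact argument above. It also assumes $$\gamma = \log_2(1.5)$$ is already known and that STAB applies at most once (`max_n=1`). The function name `decompose` is hypothetical.

```python
import math

GAMMA = math.log2(1.5)  # susceptibility contributed by STAB, as in the games

def decompose(s, max_n=1, tol=1e-9):
    """Write a finite susceptibility s as m + n*GAMMA with integer m and
    0 <= n <= max_n. Returns (m, n), or None if no such pair fits within tol."""
    for n in range(max_n + 1):
        m = s - n * GAMMA
        if abs(m - round(m)) <= tol:
            return round(m), n
    return None

print(decompose(-1.0))             # a resisted attack without STAB
print(decompose(math.log2(3.0)))   # 2x effectiveness combined with 1.5x STAB
print(decompose(math.log2(0.75)))  # 0.5x effectiveness combined with 1.5x STAB
```

A measurement that returns `n = 1` is flagged as a STAB attack, and `m` is the underlying integer susceptibility of the type matchup.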