I love the language of Category theory. It allows us to talk about many different contexts using the same language, and can be helpful in discovering what seemingly unrelated phenomena have in common, by pointing out that they are all manifestations of the same categorical concept, e.g. functoriality, adjointness, or some universal property. Furthermore, it allows us to speak of Things (groups, rings, modules, topological spaces) without speaking of the smaller things that make up the Things, which sometimes shows a forest previously hidden by the trees. The concept of tensor product of modules, which I’ll be going over, is one of those: its “rough”, “pointwise” definition looks random and weird, but the universal property that defines it is simple and reasonable.
However, this great generality and disregard for nitty gritty details is also the cause for some of my problems with Algebra as of late: it’s very easy to use the language of category theory to make intuitive properties look (to me) uninteresting or undistinguished. To say that so-and-so is left-exact, right-exact, natural… It all looks neat and tidy in diagrams, but I find it very easy to lose track of what functors do what, and I feel like these properties give me no insight. Tools for proof, yes, but then we begin to enter abstract nonsense territory, in which a lot of words are said but little meaning is grasped.
In this post, we will look at the exactness properties of $\Hom$ and of the tensor product, and attempt to grasp what it means for a module to be flat. It’s not perfect (anyone know what’s so “flat” about flat modules? I don’t. If you know please tell me), but it’s a lot better than nothing, I think.
This post assumes that the reader is already more or less familiar with the notion of module and tensor product. Of course, they need not have mastered it – what would be the point of a post explaining $X$ to someone who already knows it? – but I’m not going to define what a ring is, and you won’t catch me doing routine verifications.
Exactness Properties of $\Hom$
Let us start where an algebra book would start: with the rather simple (bi)functor which, given two $R$-modules, $M$ and $N$, returns the set ($R$-module, in fact) of homomorphisms $M \to N$. This is actually a slightly more complicated object than a functor, because it takes two arguments rather than one, so we actually work with two “particular cases”. Given an $R$-module $N$, we define the two functors (covariant and contravariant, respectively)
\[\Hom(N, \blank) \text{, and } \Hom(\blank, N).\]
This notation makes clear what the functor means when applied to an $R$-module, but I personally dislike the notation $\Hom(N, f)$, where $f$ is a homomorphism. I prefer to use the following notation, which is inspired by point-free programming in Haskell: if $f \colon A \to B$ is an $R$-module homomorphism, we consider the functions
\[\Hom(N,A) \xrightarrow{f \circ} \Hom(N,B), \quad \Hom(B,N) \xrightarrow{\circ f} \Hom(A,N),\]
which do what they say on the tin: $(f \circ)(g) = f \circ g$ and $(\circ f)(g) = g \circ f$. While I find this notation to be very descriptive, it does require some care, especially when expressions such as $g \circ (\circ f)$ and $(h \circ (f \circ))(g)$ start to crop up.
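For readers who think in code, here is a minimal sketch of the two sections $(f \circ)$ and $(\circ f)$ in Python. The names `post` and `pre` are made up for this illustration, and the sample functions are plain integer functions rather than module homomorphisms; the only point is the order of composition.

```python
def post(f):
    """The section (f ∘): sends g to f ∘ g."""
    return lambda g: (lambda x: f(g(x)))


def pre(f):
    """The section (∘ f): sends g to g ∘ f."""
    return lambda g: (lambda x: g(f(x)))


# Hypothetical sample functions, chosen only so that the two orders differ.
double = lambda x: 2 * x
succ = lambda x: x + 1

after = post(double)(succ)   # double ∘ succ
before = pre(double)(succ)   # succ ∘ double

print(after(3), before(3))   # prints: 8 7
```

Note that `pre` reverses the order of composition, which is the code-level shadow of $\Hom(\blank, N)$ being contravariant.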
Contravariant Left-Exactness
Anyhow, exactness properties of $\Hom$, right? Instead of starting there, I’ll start with something the reader should be familiar with: the universal property of a quotient. If $A$ and $B$ are modules, with $A$ a submodule of $B$, we can define the module $B / A$, with associated projection $\pi$. The fundamental property of this quotient is that, if we define a function $g \colon B \to N$ which is null on all of $A$, we induce a function $\tilde g \colon B/A \to N$. The assignment $g \mapsto \tilde g$ is a bijection, in fact, as from any such $\tilde g$, a $g$ can be recovered as $g = \tilde g \circ \pi$. We leave it to the reader to do the details (there aren’t many) and conclude that there is a one-to-one correspondence:
\[\Hom(B/A, N) \leftrightarrow \{\, g \colon B \to N \text{ such that } g(A) = 0\,\}.\]
A one-to-one correspondence is a bijection. While we’re at it, we can grasp additional structure: since this correspondence behaves well with regard to sum and product by scalars, we get an isomorphism of $R$-modules.
Can we describe the module on the right a little bit more concisely? Sure we can. To say that $g \colon B \to N$ makes $g(A) = 0$ is the same as to say that $g |_A$ is the zero function. In other words, if $i : A \to B$ is the inclusion function, $g \circ i = 0$, and so we can write the above correspondence as
\[\Hom(B/A, N) \cong\ker(\circ i)\]
and this condition is usually expressed by saying that the following diagram is exact:
\[0 \rightarrow \Hom(B/A, N) \xrightarrow{\circ \pi} \Hom(B,N) \xrightarrow{\circ i} \Hom(A, N).\]
So we have just concluded a (weaker) version of the exactness property of $\Hom(\blank, N)$: given an exact sequence
\begin{equation}\label{eq1}
0 \rightarrow A \overset i {\hookrightarrow} B \xrightarrow \pi C \rightarrow 0,
\end{equation}
the following sequence is exact
\begin{equation}\label{eq2}
0 \rightarrow \Hom(C, N) \xrightarrow{\circ \pi} \Hom(B,N) \xrightarrow{\circ i} \Hom(A, N).
\end{equation}
This can be understood in natural language as: the set of functions leaving the quotient is “kind of” the set of functions leaving $B$ which, when restricted to $A$, equal zero. In other words, the kernel of composition with $i$.
Of course, now one could dig a little bit deeper. It can be shown without much difficulty that the assumption that $A$ is a subset of $B$ is unnecessary, so $i$ could be any homomorphism, and in fact it is even unnecessary that it is injective ($C$ becomes simply (isomorphic to) $B/\image i$), and so we can replace the hypothesis \eqref{eq1} by the weaker version
\[A \xrightarrow i B \xrightarrow \pi C \rightarrow 0.\]
Also, one might ask why we don’t also have right-exactness in \eqref{eq2}. To say that would be to say that any function $A \to N$ is the restriction of a function $B \to N$, or in other words that any function $A \to N$ can be extended to a function $B \to N$. If this happens for every inclusion $A \subseteq B$, we say that $N$ is an injective module, but in general it does not happen.
To find a counterexample, we want to imagine a function $f \colon A \to N$ which is well-defined and consistent, but would stop “working” if one added a “wrong element” of $B$. For example, it might be the case that for some $a$, $f(a) \neq 0$, and this causes no problem in $A$. But then in $B$ there exists some $b$ such that $r b = a$, and so taking $f$ on both sides (assuming it is extendible to $B$) one gets $r f(b) = f(a) \neq 0$. But then, if we arrange things in order to make $r f(b) = 0$ regardless of the choice of $f(b)$, we reach a contradiction.
We can now find a specific example of such a scenario. The idea that $r f(b) = 0$ no matter the choice of $f(b)$ is easy enough to arrange: simply let $N = R/\!\braket r$, for example. For definiteness, let us set $R = \Z$ (the most vanilla ring out there, when it’s not being too nice) and $N = \Z/2$ (because $2$ is the smallest nontrivial number). As for our choice of $A$ and $B$, might as well set $A = R = \Z$, with $f$ as the projection $\Z \to \Z/2$, and the most natural setting for us to put $\Z$ “inside of” that allows us to divide by two is $B = \Q$. And so we conclude our example:
Let $f : \Z \to \Z/2$ be the projection. We wish to show that $f$ cannot be extended to a homomorphism $p : \Q \to \Z/2$. Indeed, if this were possible, we would get $1 = f(1) = p(1) = 2 p(1/2) = 0$ (modulo $2$). This proves that, in general, we cannot expect \eqref{eq2} to be right-exact.
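The same failure can be checked by brute force in a finite setting. The sketch below is not the $\Q$ example; it swaps in the exact sequence $0 \to \Z/2 \to \Z/4 \to \Z/2 \to 0$ (with $i(x) = 2x$) and $N = \Z/2$, purely so that everything can be enumerated. The helper names `homs`, `compose` and `check` are made up for the illustration. Every homomorphism is stored as a tuple of values, and the code tests injectivity of $\circ\pi$, exactness in the middle, and surjectivity of $\circ i$.

```python
def homs(m, n):
    """All homomorphisms Z/m -> Z/n, each as a tuple h with h[x] the image of x.

    A homomorphism is determined by t = h(1), which must satisfy m*t = 0 in Z/n.
    """
    return [tuple((t * x) % n for x in range(m))
            for t in range(n) if (m * t) % n == 0]


def compose(g, f):
    """The composite g ∘ f of maps stored as tuples of values."""
    return tuple(g[y] for y in f)


# Assumed example sequence: 0 -> Z/2 --I--> Z/4 --PI--> Z/2 -> 0.
I = (0, 2)          # inclusion x -> 2x
PI = (0, 1, 0, 1)   # projection x -> x mod 2


def check(n):
    """Apply Hom(-, Z/n) to the sequence and report three properties."""
    hom_c, hom_b, hom_a = homs(2, n), homs(4, n), homs(2, n)
    image_pre_pi = {compose(g, PI) for g in hom_c}
    kernel_pre_i = {g for g in hom_b if compose(g, I) == (0, 0)}
    image_pre_i = {compose(g, I) for g in hom_b}
    return (len(image_pre_pi) == len(hom_c),   # (∘π) is injective
            image_pre_pi == kernel_pre_i,      # exact at Hom(B, N)
            image_pre_i == set(hom_a))         # (∘i) is surjective


print(check(2))   # prints: (True, True, False)
```

With $N = \Z/2$ the last entry is `False`: the identity $\Z/2 \to \Z/2$ is not the restriction of any map $\Z/4 \to \Z/2$, for the same "divide by two" reason as in the $\Q$ example.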
Covariant Left-Exactness
For our next magic trick, we will look at a dual notion of what we have done so far. What we have done was simply use the fact: a function leaving the quotient $B/A$ is the same as a function leaving $B$ which is null all over $A$. In order to reach a similar exactness property for the covariant $\Hom$ functor, we need to have a similar statement, but with functions going to some space.
Since there is a very strong correlation between exact sequences and quotients, we can try again to look at a pair of modules $A \subseteq B$. Now, we wish to relate functions going from $N$ to $A$, $N$ to $B$, and $N$ to $B/A$. Before, we answered the question: when can a function $B \to N$ be seen as a function $B/A \to N$? Now, instead, we ask: when can a function $N \to B$ be seen as a function… $N \to B/A$? Or would $N \to A$ be a better choice? Surprisingly enough, either choice leads to the same conclusion.
$N \to B/A$
It is trivial to turn a function $N \to B$ into a function $N \to B/A$: simply compose with the projection. In other words, we get the arrow
\[\Hom(N, B) \xrightarrow{\pi \circ} \Hom(N, B/A).\]
Problem: this is neither injective nor surjective. I don’t know about you, but I can’t see any way to easily investigate surjectivity. Injectivity, however, (or the lack thereof) is easy: the kernel of $\pi \circ$ is precisely those functions $f \colon N \to B$ such that $f(N) \subseteq A$. Why, this is precisely the set we call $\Hom(N, A)$! Therefore, the following sequence is exact
\[0 \rightarrow \Hom(N,A) \hookrightarrow \Hom(N,B) \xrightarrow{\pi\circ} \Hom(N, B/A).\]
$N \to A$
Any function from $N$ to $A$ can be seen as a function from $N$ to $B$. In other words, $\Hom(N,A) \hookrightarrow \Hom(N, B)$. Is there an easy characterization of which functions $N \to B$ are actually $N \to A$? Since we’re on the subject of short exact sequences, we might try to express it in terms of $\pi : B \to B/A$, and easily conclude that $f \in \Hom(N,B)$ is in $\Hom(N,A)$ if and only if $\pi(f(n)) = 0$ for all $n$, or $\pi \circ f = 0$. Or, more suggestively, if $f \in \ker(\pi \circ)$. Therefore, we again conclude the exactness of
\[0 \rightarrow \Hom(N,A) \hookrightarrow \Hom(N,B) \xrightarrow{\pi\circ} \Hom(N, B/A).\]
You might object that these two “different paths” are actually the same, just worded slightly differently. Yeah. They are. But the point I’m trying to make is the inevitability of the idea. As long as I throw the words “relate these $\Hom$s”, “projection” and “kernel” into the same cooking pot, I’ll end up with the same theorem.
As in the contravariant case, it is unreasonable to expect right-exactness. In this case, the property that $N$ makes $\Hom(N,\blank)$ right-exact is called projectiveness, and the easiest way to find a counterexample is to find modules where functions don’t exist. For example, the only ($\Z$-module) homomorphism from $\Z/2$ to $\Z$ is the null homomorphism, so $\Z/2$ is a good candidate for $N$ and $\Z$ a good candidate for $B$. Now, in order to make sure that $\Hom(N,B/A)$ has more than one element, we need only make $A = 2\Z$. This shows that $\Z/2$ is not projective as a $\Z$-module.
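A finite analogue of this counterexample can again be checked by machine. The sketch below replaces $\Z \to \Z/2$ by the projection $\Z/4 \to \Z/2$ (an assumption made so that everything is finite), keeps $N = \Z/2$, and uses made-up helper names. It applies $\Hom(N, \blank)$ to $0 \to \Z/2 \to \Z/4 \to \Z/2 \to 0$ and reports injectivity of $i \circ$, exactness in the middle, and surjectivity of $\pi \circ$.

```python
def homs(m, n):
    """All homomorphisms Z/m -> Z/n, each as a tuple h with h[x] the image of x.

    A homomorphism is determined by t = h(1), which must satisfy m*t = 0 in Z/n.
    """
    return [tuple((t * x) % n for x in range(m))
            for t in range(n) if (m * t) % n == 0]


def compose(g, f):
    """The composite g ∘ f of maps stored as tuples of values."""
    return tuple(g[y] for y in f)


# Assumed example sequence: 0 -> Z/2 --I--> Z/4 --PI--> Z/2 -> 0.
I = (0, 2)          # inclusion x -> 2x
PI = (0, 1, 0, 1)   # projection x -> x mod 2


def check(n):
    """Apply Hom(Z/n, -) to the sequence and report three properties."""
    hom_a, hom_b, hom_c = homs(n, 2), homs(n, 4), homs(n, 2)
    zero = (0,) * n
    image_post_i = {compose(I, f) for f in hom_a}
    kernel_post_pi = {f for f in hom_b if compose(PI, f) == zero}
    image_post_pi = {compose(PI, f) for f in hom_b}
    return (len(image_post_i) == len(hom_a),   # (i∘) is injective
            image_post_i == kernel_post_pi,    # exact at Hom(N, B)
            image_post_pi == set(hom_c))       # (π∘) is surjective


print(check(2))   # prints: (True, True, False)
```

The final `False` says that the identity $\Z/2 \to \Z/2$ does not lift to a map $\Z/2 \to \Z/4$, which is the finite shadow of $\Hom(\Z/2, \Z)$ being zero.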
A Look at the Tensor Product
The Tensor Product is an object that took me a while to grasp, and I still feel like I don’t understand it completely. For modules there are, in my view, three big definitions/points of view, each with advantages and disadvantages:
- The Universal Property
- Generators and Relations
- “Bilinear Function Data”
The Universal Property
This is, for most intents and purposes, The Right Definition. What it tells us is that the tensor product turns questions of bilinearity into questions of linearity. In other words, the Tensor Product $A \otimes B$ of two modules $A$ and $B$ is the unique module (up to isomorphism) such that there exists a nice one-to-one correspondence between bilinear functions leaving $A \times B$ and linear functions leaving $A \otimes B$. The reader should already be familiar with this correspondence: given a function $f \colon A \otimes B \to N$, one turns it into a bilinear function as $\tilde f(a,b) := f(a \otimes b)$.
This definition is generally the cleanest and most useful to prove things. It can provide a lot of insight in some proofs, but mostly for proofs “around” the tensor product: it gives notoriously little information as to what kind of “stuff” lives in the tensor product. When is $a \otimes b$ equal or not to $a' \otimes b'$? Beats me.
Generators and Relations
The main problem of the definition via the universal property is that it is nonconstructive: a priori there is no reason to believe that there exists a “linear object which encodes bilinearity”. As we know from logic, the most straightforward way to show that there exists an object satisfying so-and-so axioms is to construct it.
The idea is that we’ll make sure that there exists a function $\otimes : A \times B \to A \otimes B$, and we will ensure that it is bilinear… And that’s basically it. The resulting module is the tensor product we have been looking for.
More specifically, we will use the method of generators and relations. We consider a module with generators of the form $a \otimes b$, and we add just enough relations to ensure that the function $\otimes$ is bilinear, i.e.
\[A \otimes B = \left\langle a \otimes b, a \in A, b \in B \,\middle|\, \begin{aligned}
&(a+a')\otimes b = a \otimes b + a' \otimes b, &&a \otimes (b + b') = a \otimes b + a \otimes b',\\
&(r a) \otimes b = r (a \otimes b), &&a \otimes (r b) = r(a \otimes b),\\
&\text{ for all $a, a' \in A$, $b, b' \in B$ and $r \in R$.}
\end{aligned} \right\rangle\]
This is a lot of work to write, a lot of work to read, and doesn’t really add that much information, so I’ll just start writing the word $\mathrm{bilinearity}$ to mean that whole set of relations.
If you are not familiar with defining modules by generators and relations, here’s how it goes: you have a bunch of generators, which we usually collect in a set we call $G$. The free module over these generators, written as $\braket G$ or sometimes something like $\bigoplus_{g \in G} R$ or $R^{\oplus G}$, is simply the set of (formal) finite $R$-linear combinations of elements of $G$. Then, we can add relations of the form, say, $r_1 g_1 + \dots + r_n g_n = s_1 g'_1 + \dots + s_m g'_m$. To make the formalism easier to handle in theoretical terms, we pretend that we’re actually writing relations of the form $r_1 g_1 + \dots + r_n g_n = 0$ (just subtract the right-hand side from both sides), and if we write $\rho$ to mean an expression of the form $r_1 g_1 + \dots + r_n g_n$, we can consider a module which we write as
\[\braket{G \mid \rho = 0, \rho = 0, \dots},\]
where the different $\rho$s represent different linear combinations of the generators. What we mean when we write this expression is actually the free module $\braket G$, quotiented by the submodule generated by all the things we wish to identify with zero.
This is the rigorous definition of the whole ordeal. The way to work with it is as follows: when meddling with elements of $\braket{G \mid A}$, where $A$ is a set of relations, we work with linear combinations of elements of $G$, with the understanding that two linear combinations are equal if and only if we can show that they are equal in a finite number of steps, using only the laws of $R$-modules and the relations in $A$.
The good news: this method gives us easy ways to show that two elements of the tensor product are equal (get from one to the other using the bilinearity relations), and more crucially, it gives us a way to use the hypothesis that two elements of the tensor product are equal. For example, if we know that a sum $\sum a_i \otimes b_i$ is null, we know that there exists a finite-time process, using only the laws of bilinearity at each step, that reduces this expression to zero. We will do an example later.
The bad news: Using the assumption that two things are equal in this manner is very ugly. It’s ugly to write, and it lends itself to ugly arguments. Even if it’s potentially a useful tool to have in your toolbelt, any proof you write using it is going to come out ugly.
As further bad news, even though it is easy to show that two things are equal in the generators-and-relations point of view, to show that two things are different is horrible at best. Indeed, the situation is so ugly that there exists a theorem which shows that, for some (finite!) presentations of groups, there exists no algorithm which is guaranteed to distinguish different expressions in finite time.
As a last remark on this point of view, if you’re not familiar with it, here is how to show that the module $A \otimes B$ represented above satisfies the universal property of the tensor product. It is easy to show (using the universal properties of the sum of modules and the quotient) that to define a module homomorphism $f$ leaving something of the form $\braket{G \mid A}$ is the same as to define the image $f(g)$ of all $g \in G$, as long as their images satisfy the relations in $A$. In other words, for all equations of the form $\sum r_i g_i = \sum s_i g'_i$ in $A$, we need to ensure that $\sum r_i f(g_i) = \sum s_i f(g'_i)$. However, as long as these equations are all ensured, we get a well-defined and unique $f$.
In the case of our tensor product, the generators are (effectively) pairs $(a,b)$, which we write as $a \otimes b$. So a function $f$ leaving $A \otimes B$ is the same as a function $g$ defined on $A \times B$, which satisfies certain relations… Which are precisely the axioms stating that $g$ is bilinear. In other words, defining a function $f$ leaving $A \otimes B$ is precisely the same as defining a bilinear function $g$ leaving $A \times B$, and the relation between these two is that $f(a \otimes b) = g(a,b)$. This is exactly the universal property of the tensor product.
Bilinear Function Data
This approach (which I’ve never seen anywhere else, though I can’t possibly claim originality since it is a rather simple idea) is somewhat of an intermediate between the previous two approaches, combining the specificity of the generators and relations methods, but with some of the “True Meaning of Tensor Product” in the mix as in the universal property approach.
The idea is to consider the tensor product of $A$ and $B$ as “the set of data that can be plugged into bilinear functions”.
As an example, suppose that $a \in A$ and $b \in B$. Then, if $f$ is a bilinear function leaving $A \times B$, we could certainly feed it the pair $(a,b)$. So $a \otimes b$ is now considered an element of the tensor product.
If $a'$ is another element of $A$ and $b'$ another of $B$, we can now consider feeding $f$ the “formal sum” $(a,b) + (a',b')$. This isn’t a legitimate thing to give $f$, as it accepts pairs, not formal sums of pairs. But it is data that can be fed into $f$, just not directly: we have to specify that we are evaluating $f$ on each pair separately, and adding up the results afterwards.
This can be formalized by considering a function $\tilde f$ defined on $\braket{A \times B}$, defined by $\tilde f(a,b) = f(a,b)$. Now, we can talk about all formal sums and the result of “feeding them to $f$”.
The problem with this construction is that it contains a lot of redundant elements, which could be simplified. For example, even though $(a,b) + (a', b)$ and $(a+a',b)$ are “different data”, they yield the same result when fed to any bilinear function. Therefore, we may wish to consider them the same data.
There are now two ways to look at the situation. One of them is what we’ve already seen: identify “bilinear data” using the bilinear relations. Another equivalent but interesting way is as follows: we identify two sets of bilinear data if they yield the same result for all bilinear functions.
A more mathematical way to express this definition of tensor product is as follows: we define $A \otimes B$ as the quotient of $\braket{A \times B}$ by the relation that $s \sim t$ iff $\tilde f(s) = \tilde f(t)$ for all bilinear functions $f$ leaving $A \times B$.
It is necessary to do a bit of busywork to show that this is a well-defined module. You need to show that this is an equivalence relation and that the module operations are well-defined on the equivalence classes. Alternatively, you show that the set $Z = \{\,s \in \braket{A \times B} \mid \tilde f(s) = 0 \text{, for all $f$}\,\}$ is a submodule of $\braket{A \times B}$, and define $A \otimes B$ as $\braket{A \times B} / Z$. However, once these verifications are done, you have a cute criterion for checking if two elements of the tensor product are equal. To show that they are, prove that they yield the same result when fed to any bilinear function. But more useful yet, to show that they are different, find a bilinear function that gives different results on the two. This is great, because this is something you can actually do! We will give examples soon.
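As a first small illustration of this criterion, here is a sketch in Python. The choice of modules ($\Z/4$ and $\Z/6$ over $R = \Z$), the bilinear function $f(a,b) = ab \bmod \gcd(4,6)$, and all function names are assumptions made for the example. The code checks bilinearity by brute force and then feeds $f$ a few formal sums, each written as a list of triples `(coefficient, a, b)`.

```python
from math import gcd


def make_bilinear(m, n):
    """The map f(a, b) = a*b mod g from Z/m x Z/n to Z/g, where g = gcd(m, n)."""
    g = gcd(m, n)
    return (lambda a, b: (a * b) % g), g


def is_bilinear(f, m, n, g):
    """Brute-force check that f is additive in each argument.

    For Z-modules, additivity in each argument already implies bilinearity.
    """
    left = all(f((a + a2) % m, b) == (f(a, b) + f(a2, b)) % g
               for a in range(m) for a2 in range(m) for b in range(n))
    right = all(f(a, (b + b2) % n) == (f(a, b) + f(a, b2)) % g
                for a in range(m) for b in range(n) for b2 in range(n))
    return left and right


def feed(f, formal_sum, g):
    """Evaluate f on a formal sum, given as a list of (coefficient, a, b)."""
    return sum(c * f(a, b) for c, a, b in formal_sum) % g


f, g = make_bilinear(4, 6)
print(is_bilinear(f, 4, 6, g))                  # prints: True
print(feed(f, [(1, 1, 1)], g))                  # prints: 1
print(feed(f, [(1, 1, 3), (1, 2, 3)], g),
      feed(f, [(1, 3, 3)], g))                  # prints: 1 1
```

Since $f(1,1) = 1 \neq 0$, the element $1 \otimes 1$ of $\Z/4 \otimes \Z/6$ is nonzero. The last line shows the redundant data $(1,3) + (2,3)$ and $(3,3)$ giving the same answer, as they must for any bilinear function.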
Two remarks are in order. First of all, we need to check that this notion coincides with the usual definition of tensor product. Second, it is worth mentioning that this way to prove that two elements of the tensor product differ is already imbued in the usual definition of tensor product, and you might already have used it or be familiar with it.
So, why is this definition equivalent to the other two? We will present two different proofs.
Using the Universal Property
Let $f$ be a bilinear function $A \times B \to N$. Then, it induces a function $\tilde f : \braket{A \times B}\to N$, which in turn induces a function in the quotient $\hat f \colon \braket{A \times B}/Z \to N$ (why? Recall the definition of $Z$). This correspondence is one-to-one, as $f$ can be recovered by $f(a,b) = \hat f(a \otimes b)$, where $a \otimes b$ is defined as the equivalence class of $(a,b) \in \braket{A \times B}$.
Using Relations
We’ve already defined the tensor product as a quotient of $\braket{A \times B}$, except instead of taking the quotient by $Z$ we took the quotient by “the bilinear kernel”, i.e. the submodule generated by $(a,b) + (a',b) - (a+a', b)$, etc. Therefore, a way to show that these two notions coincide is to show that these two submodules of $\braket{A \times B}$ coincide.
It is trivial to check that every element of the bilinear kernel is in $Z$; it is the very definition of bilinear function. The tricky part is that $Z$ contains no more than the bilinear kernel. The only way I know to do this is kind of a cheat. You see, as we’ve said before (in slightly different words), an effective way to show that an element $s$ of $\braket{A \times B}$ is not in $Z$ is to find a bilinear function $A \times B \to N$ which does not vanish on $s$. And so, the function we pick is precisely the tensor $\otimes : A \times B \to A \otimes B$, where the tensor product $A \otimes B$ is considered in the sense of $\braket{A \times B}/\{\text{bilinear kernel}\}$. With this function, we prove that any $s$ not in the bilinear kernel is also not in $Z$, which shows that they coincide.
Why This Point of View Adds Nothing New (in theory)
What we mean by this title is that this method of proving that two elements of the tensor product are distinct was already present in the characterization by the universal property, just hidden. Indeed, if we wish to show that an expression of the form $s = \sum_i a_i \otimes b_i$ is different from zero (for simplicity), the universal property could always have been used for this purpose, for if we find a bilinear function $f \colon A \times B \to N$ such that $\sum f(a_i, b_i) \neq 0$, then the universal property induces $\tilde f \colon A \otimes B \to N$ such that $\tilde f(s) = \sum_i \tilde f(a_i \otimes b_i) \neq 0$, and so $s$ must have been nonzero to begin with.
However, even though the new point of view has not created a new proof method, it has put an important idea in the center of attention, and I think that’s a valuable thing. Even though it was always available, this method of proof is not an obvious a priori consequence of the universal property to someone who has never thought of it in this manner.
Exactness of Tensor
We are now prepared to examine the exactness of the tensor product, as we have for the $\Hom$ functor. As in the $\Hom$ case, the tensor product is not actually a functor, but a bifunctor, as it receives two arguments. Thus, like with $\Hom$, we could look at the two “particular cases” of $\blank \otimes N$ and $N \otimes \blank$. However, unlike in the case of $\Hom$, these two are effectively the same thing; in the language of category theory, we say that these two functors are naturally isomorphic (for each $N$) by the isomorphism given by the swap function $A \otimes N \to N \otimes A$. Consequently, when inspecting the functor given by tensor product, we consider it only as the functor “tensor by $N$”, and either decide on one specific side or leave it ambiguous, as it does not matter. In what follows, I will consider the functor $\blank \otimes N$, or sometimes just $\otimes N$.
What the functor $\otimes N$ does on objects (modules) is obvious to the eye: given an $R$-module $A$, it returns the $R$-module $A \otimes N$. So what does it do on functions?
Given a function $f \colon A \to B$, there is a way that jumps to mind to make a function $\tilde f \colon A \otimes N \to B \otimes N$: just “apply $f$ to the first coordinate and leave the second unchanged”. Some denote this as $f \otimes N$, but I will denote it by $f \otimes \id$, as it is a particular case of a general phenomenon: Given $f \colon A \to B$ and $g \colon C \to D$, there exists a function
\[\begin{aligned}
f \otimes g \colon A \otimes C &\to B \otimes D\\
a \otimes c & \mapsto f(a) \otimes g(c).
\end{aligned}\]
Let us then begin to investigate the exactness of the tensor product. We begin with a simple statement: if $g \colon B \to C$ is surjective, so is $g \otimes \id \colon B\otimes N \to C \otimes N$. This is a triviality: we know that $C \otimes N$ is generated by elements of the form $c \otimes n$, so to show that $g \otimes \id$ is surjective it suffices to show that all such elements are in its image. To do so, write $c = g(b)$ for some $b$ and clearly $c \otimes n = (g \otimes \id)(b \otimes n)$. So we have shown that the tensor product preserves surjectivity.
Let us now try to investigate if the tensor product preserves injectivity. This is a much trickier subject, because the identifications that go on in the tensor product are complicated.
To investigate the injectivity of a map $f \colon B \otimes N \to C \otimes N$ is to ask the question: if $f(\sum b_i \otimes n_i) = 0$, can I guarantee that $\sum b_i \otimes n_i = 0$? In our particular case, $f = g \otimes \id$, so that we know
\[\sum g(b_i) \otimes n_i = 0.\]
Let’s see what our three points of view on the tensor product have to say about this:
- Universal Property: As remarked before, the universal property of the tensor product seldom has anything to say about specific elements of the tensor product. It is possible to use it to prove exactness properties of the tensor, and we will do so later, but inspecting specific elements is not the way to do it.
- Generators and Relations: This alternative gives us a way to handle the hypothesis that $\sum g(b_i) \otimes n_i = 0$, in the sense that there exists a process to reduce this expression to zero using only the bilinear relations. It also gives us a way to prove that $\sum b_i \otimes n_i = 0$: likewise, reduce it to zero using only the bilinear relations. The obvious proof idea is to use the process that shows $\sum g(b_i) \otimes n_i = 0$ and “translate it backwards” somehow, in order to create a process that shows $\sum b_i \otimes n_i = 0$.
- Bilinear Data: This alternative also gives us a way forward. The hypothesis corresponds to assuming that, given any bilinear function $h \colon C \times N \to M$, the sum $\sum h(g(b_i), n_i)$ is null. There are two candidate ways forward. We could try to show a similar condition for things of the form $\sum j(b_i, n_i)$, for $j \colon B \times N \to M$, but that would require “pushing $j$ forward somehow”, writing it as something of the form $j(b,n) = \tilde\jmath(g(b),n)$ for some $\tilde\jmath$. An easier, more direct way, would be to simply arrange a bilinear $h$ that makes $h(g(b), n) = b \otimes n$, which immediately shows that $\sum b_i \otimes n_i = \sum h(g(b_i), n_i) = 0$.
Let us take a look at the Generators and Relations method, which will show that surjectivity of $g$ turns out to be crucial.
To write the details in full generality is difficult and obfuscating, so we will do a kind of specific example. Suppose that you manage to show that $g(b_1) \otimes n_1 + g(b_2) \otimes n_2 = 0$ through a computation of the following sort:
\[\begin{aligned}
g(b_1) \otimes n_1 + g(b_2) \otimes n_2 &= c_1 \otimes n_1 + c_2 \otimes n_2\\
&= (r c_3) \otimes n_1 + (s c_3 + t c_4) \otimes n_2\\
&= c_3 \otimes (r n_1) + (s c_3) \otimes n_2 + (t c_4) \otimes n_2\\
&= c_3 \otimes (r n_1) + c_3 \otimes (s n_2) + c_4 \otimes (t n_2)\\
&= c_3 \otimes \cancel{(r n_1 + s n_2)} + c_4 \otimes (u n_3)\\
&= 0 + \cancel{(u c_4)}\otimes n_3 = 0.
\end{aligned}\]
In this computation, some steps correspond to hypotheses we’re making for the sake of computation, e.g. that $c_2 = s c_3 + t c_4$, or that $u c_4 = 0$.
The most important property of this kind of computation is that sometimes auxiliary variables are necessary. For example, to show that $1 \otimes 1 \in \Q \otimes \Z/2$ is null, it is necessary to write $1 = 2 \cdot \frac12$, and then “pass 2 to the other side of the tensor”, so that $1 \otimes 1 = \frac12 \otimes 2 = \frac12 \otimes 0 = 0$. It is these auxiliary variables that force the hypothesis that $g$ is surjective on us, so that we can write, for example, $c_3 = g(b_3)$ and $c_4 = g(b_4)$, and try to reproduce the same computation, with $b$’s instead of $c$’s. In other words, we want to write the following computation (it’s the same but with $b_i$ instead of $c_i$)
\[\begin{aligned}
b_1 \otimes n_1 + b_2 \otimes n_2 &= (r b_3) \otimes n_1 + (s b_3 + t b_4) \otimes n_2\\
&= b_3 \otimes (r n_1) + (s b_3) \otimes n_2 + (t b_4) \otimes n_2\\
&= b_3 \otimes (r n_1) + b_3 \otimes (s n_2) + b_4 \otimes (t n_2)\\
&= b_3 \otimes \cancel{(r n_1 + s n_2)} + b_4 \otimes (u n_3)\\
&= 0 + \cancel{(u b_4)}\otimes n_3 = 0.
\end{aligned}\]
Making this work is where injectivity comes in. For example, from $c_2 = s c_3 + t c_4$ we obtain $g(b_2) = g(s b_3 + t b_4)$, and by injectivity $b_2 = s b_3 + t b_4$, and so all the steps follow through as they should.
The unfortunate fact is that we have just proven a triviality. We have shown that, if $g$ is injective and surjective, then $g \otimes \id$ is injective… But this was always obvious, because $g$ is an isomorphism, and therefore (check) so is $g \otimes \id$.
However, the argument is not beyond repair. Surjectivity is essential if we are to have any hope of “passing a computation from $C \otimes N$ backwards to $B \otimes N$”. However, instead of considering $g$ to be injective, we could instead examine the relation between the kernel of $g \otimes \id$ and the kernel of $g$. In other words, measure how much tensoring with the identity increases the failure to be injective.
The previous argument, of “pulling back the computation”, works almost flawlessly… Except for the part where we required that $g$ be injective. Now, instead of $c_2 = s c_3 + t c_4$ implying that $b_2 = s b_3 + t b_4$, we need to add a kernel term, i.e. $b_2 = s b_3 + t b_4 + K$, where $K \in \ker g$. We will henceforth use the letter $K$ as a catchall term for elements of $\ker g$.
In conclusion, if $g(b_1) \otimes n_1 + g(b_2) \otimes n_2 = 0$, we can pull back the computation that shows this in order to obtain something of the sort
\[\begin{aligned}
b_1 \otimes n_1 + b_2 \otimes n_2 &= (r b_3) \otimes n_1 + K \otimes n_1 + (s b_3 + t b_4) \otimes n_2 + K \otimes n_2\\
&= b_3 \otimes (r n_1) + (s b_3) \otimes n_2 + (t b_4) \otimes n_2 + K \otimes n_1 + K \otimes n_2\\
&= b_3 \otimes (r n_1) + b_3 \otimes (s n_2) + b_4 \otimes (t n_2) + K \otimes n_1 + K \otimes n_2\\
&= b_3 \otimes \cancel{(r n_1 + s n_2)} + b_4 \otimes (u n_3) + K \otimes n_1 + K \otimes n_2\\
&= 0 + \cancel{(u b_4)}\otimes n_3 + K \otimes n_1 + K \otimes n_2\\
&= K \otimes n_1 + K \otimes n_2.
\end{aligned}\]
We could not show that $b_1 \otimes n_1 + b_2 \otimes n_2$ was null (as is to be expected; if $g$ isn’t injective there is no reason to expect that $g \otimes \id$ would be), but we did manage to show that it is of the form $k_1 \otimes n_1 + k_2 \otimes n_2$, for $k_1, k_2 \in \ker g$. This suggests that any element in the kernel of $g \otimes \id$ can be written as a sum $\sum k_i \otimes n_i$. Since every element of this form is in the kernel of $g \otimes \id$, our hopeful conclusion is the following relation between the kernel of $g$ and the kernel of $g \otimes \id$:
\[\ker (g \otimes \id) = \text{“$(\ker g) \otimes N$”} \subseteq B \otimes N.\]
It is necessary to be careful with notation here, and distinguish the module $(\ker g) \otimes N$ from the submodule of $B \otimes N$ generated by elements of the form $K \otimes n$, $K \in \ker g$, $n \in N$, as the latter might have more identifications, which come from “passages through $B$”. In other words, if $i \colon \ker g \to B$ is the inclusion, the map $i \otimes \id \colon \ker g \otimes N \to B \otimes N$ might not be injective.
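To make this failure of injectivity concrete, here is one standard illustration. It assumes we work over the ring $\mathbb{Z}$, with $B = \mathbb{Z}$, $C = N = \mathbb{Z}/2$ and $g$ the reduction mod $2$, so that $\ker g = 2\mathbb{Z}$; these choices are made only for the sake of the example. Since $2\mathbb{Z} \cong \mathbb{Z}$ as a module, $2\mathbb{Z} \otimes \mathbb{Z}/2 \cong \mathbb{Z}/2$, and the element $2 \otimes 1$ corresponds to the generator, so it is nonzero. Inside $\mathbb{Z} \otimes \mathbb{Z}/2$, however, the factor of $2$ can be “passed through $B$”:
\[(i \otimes \id)(2 \otimes 1) = 2 \otimes 1 = 1 \otimes 2 = 1 \otimes 0 = 0.\]
The middle step is not available in $2\mathbb{Z} \otimes \mathbb{Z}/2$, because $1 \notin 2\mathbb{Z}$. So in this example $i \otimes \id$ is the zero map on a nonzero module.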
With these remarks in mind, the exactness property of the tensor product we seek to prove is the following. If $g$ is a surjective map, then so is $g \otimes \id$, and its kernel in $B \otimes N$ is given by (the image under inclusion of) $\ker g \otimes N$. In other words, the following sequence is exact
\[\ker g \otimes N \xrightarrow{i \otimes \id} B \otimes N \xrightarrow{g \otimes \id} C \otimes N \rightarrow 0.\]
We will now present two different ways to prove exactness of this sequence, starting from the exactness of
\[0 \rightarrow \ker g \xrightarrow{i} B \xrightarrow{g} C \rightarrow 0.\]
We could also prove this in a third way, which is to “generalize the above example”, using generators and relations. But I wouldn’t recommend it.
Bilinear Data
Now that we have a working conjecture, it is not too difficult to verify it using the method of bilinear data. This is a proof that lies somewhere between the harsh but visible “concreteness” of the method of generators and relations, and the abstract and clean proof using only the universal property.
We intend to show that if $\sum g(b_i) \otimes n_i = 0$ then $\sum b_i \otimes n_i \in i(\ker g \otimes N)$, where $i(\ker g \otimes N)$ is shorthand for the image of $i \otimes \id$. Our hypothesis tells us that, for any bilinear function $h \colon C \times N \to M$, $\sum h(g(b_i), n_i) = 0$. We will simply construct an appropriate $h$, from $C \times N$ to $(B \otimes N) / i(\ker g \otimes N)$. In this quotient, an equivalence class is zero precisely when the element it represents lies in $i(\ker g \otimes N)$.
Another way to motivate the codomain of this $h$ is as follows. We wish to set $h(g(b), n) = b \otimes n \in B \otimes N$. This is a reasonable attempt at a definition, as every element of $C$ can be written as $g(b)$ for some $b$. The problem is that this $b$ is not unique, so the expression $b \otimes n$ could take several different values, and the function is not well-defined. The easy solution is to take a quotient that forces it to be well-defined. Since any two choices of $b$ satisfying $g(b) = c$ differ by an element of $\ker g$, it makes sense to consider
\[\begin{aligned}
h \colon C \times N & \to (B \otimes N)/i(\ker g \otimes N)\\
(g(b),n) &\mapsto [b \otimes n].
\end{aligned}\]
We leave it to the reader to verify that this is a well-defined bilinear map, and with it we conclude $\sum h(g(b_i), n_i) = 0$, and so $\sum [b_i \otimes n_i] = 0$, whence $\sum b_i \otimes n_i \in i(\ker g \otimes N)$.
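For readers who would like to see how that verification might go, here is a sketch. The symbols $b'$, $n'$ and the scalar $r$ are introduced only for this check. For well-definedness, suppose $g(b) = g(b')$:
\[\begin{aligned}
g(b) = g(b') &\implies b - b' \in \ker g\\
&\implies b \otimes n - b' \otimes n = (b - b') \otimes n \in i(\ker g \otimes N)\\
&\implies [b \otimes n] = [b' \otimes n].
\end{aligned}\]
Linearity in each variable then follows from the linearity of $g$ and the bilinearity of $\otimes$:
\[\begin{aligned}
h(g(b) + r g(b'), n) &= h(g(b + r b'), n) = [(b + r b') \otimes n] = [b \otimes n] + r [b' \otimes n],\\
h(g(b), n + r n') &= [b \otimes (n + r n')] = [b \otimes n] + r [b \otimes n'].
\end{aligned}\]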
The Universal Property
This is the standard proof of right-exactness of the tensor product, which is found in algebra books. It is equivalent to the proof by Bilinear Data, albeit in slightly different clothes.
Let $A \xrightarrow{f} B \xrightarrow{g} C \to 0$ be exact. We will show that the following sequence is exact
\[A \otimes N \xrightarrow{f \otimes \id} B \otimes N \xrightarrow{g \otimes \id} C \otimes N \to 0.\]
To this effect, consider $D = (B \otimes N)/\image(f \otimes \id)$, also known as the cokernel of $f \otimes \id$. It is easy to check that $g \otimes \id$ induces a function $h \colon D \to C \otimes N$, such that $g \otimes \id = h \circ \pi$, where $\pi$ is the projection on $D$. If we show that $h$ is an isomorphism, exactness of the above sequence is a trivial consequence.
To show that $h$ is an isomorphism, we construct an inverse. The construction is done as in the Bilinear Data proof.
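Spelled out, the construction might look as follows. This is only a sketch, and the names $\beta$ and $h'$ are mine. Define
\[\beta \colon C \times N \to D, \qquad (g(b), n) \mapsto \pi(b \otimes n).\]
This is well-defined because if $g(b) = g(b')$ then $b - b' \in \ker g = \image f$, say $b - b' = f(a)$, and so $(b - b') \otimes n = (f \otimes \id)(a \otimes n)$ is killed by $\pi$. Bilinearity is checked as in the Bilinear Data proof, so the universal property gives a linear map $h' \colon C \otimes N \to D$ with $h'(g(b) \otimes n) = \pi(b \otimes n)$. On generators,
\[\begin{aligned}
h(h'(g(b) \otimes n)) &= h(\pi(b \otimes n)) = (g \otimes \id)(b \otimes n) = g(b) \otimes n,\\
h'(h(\pi(b \otimes n))) &= h'(g(b) \otimes n) = \pi(b \otimes n).
\end{aligned}\]
Since $g$ is surjective, the elements $g(b) \otimes n$ generate $C \otimes N$, and the elements $\pi(b \otimes n)$ generate $D$, so $h$ and $h'$ are mutually inverse.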
An observation about this proof is that it shifts the focus of the proposition. The way I prefer to think of the right-exactness of the tensor (subject to change) is in terms of preservation of the kernel. That is, the kernel of $g \otimes \id$ is given by $(\ker g) \otimes N$, or at least the “identified copy of this inside $B \otimes N$”. The universal property view of the proof shifts the focus to cokernels: it says that the tensor commutes with cokernels, in that $\coker(f \otimes \id) = \coker f \otimes N$, where this is an actual tensor product, and not identified as in the case with the kernel. This is certainly more elegant to state, but my non-algebraist brain cannot think in cokernels.
A particular case of cokernels that I can think about, and which offers a nice alternative perspective, is the statement that tensors preserve quotients. In other words, if $A$ is a submodule of $B$ then $(B/A) \otimes N$ is nicely isomorphic to $(B \otimes N)/i(A \otimes N)$, where $i$ is the “inclusion” of $A \otimes N$ in $B \otimes N$. The isomorphism is given in the obvious way, identifying $[b] \otimes n$ with $[b \otimes n]$.
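As a small worked instance, assume the base ring $R$ is commutative and take $B = R$ and $A = I$ an ideal (the symbols $R$ and $I$ are introduced only for this example). Under the identification $R \otimes N \cong N$, $r \otimes n \mapsto rn$, the image of $I \otimes N$ is the submodule $IN$, so preservation of quotients reads
\[(R/I) \otimes N \cong (R \otimes N)/i(I \otimes N) \cong N/IN.\]
For example, over $\mathbb{Z}$ this gives
\[\mathbb{Z}/2 \otimes \mathbb{Z}/3 \cong (\mathbb{Z}/3)/2(\mathbb{Z}/3) = 0,\]
since $2$ is invertible modulo $3$ and hence $2(\mathbb{Z}/3) = \mathbb{Z}/3$.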
Summary of Tensor Product Exactness
The main idea I want to get across is that the right-exactness of the tensor product can be seen in terms of “pulling back the tensor relations”. That is, if we get from $\sum g(b_i) \otimes n_i$ to zero, we can write this process down in a finite number of steps using bilinearity relations, and then we can “pull this deduction back” by replacing $g(b_i)$ terms with just $b_i$. This can be done because $g$ is surjective. This hypothesis is a necessity, because the computation might involve auxiliary elements of $C$ that we otherwise couldn’t pull back to $B$. In general we will not be able to reduce $\sum b_i \otimes n_i$ to zero, because in pulling back these computations we may need to add terms of the form $K \otimes n$, for $K \in \ker g$. The consequence is that $\sum b_i \otimes n_i$ can be written as $0 + \sum k_j \otimes n_j$, for $k_j \in \ker g$.
