[RUME 2016] Borrowing from linguistics: Metonymy and linear transformations

Here’s part 2 of my “borrowing from linguistics” series.

So, like I said in yesterday’s post, Rina Zazkis’s talk grabbed my attention because of some previous work I’ve done. (Some day I will bother to make my own personal website and link to things there, but that day is not today.) My coauthors on that paper, Chris Rasmussen and Michelle Zandieh, had conducted some interesting interviews on students’ construal of similarity between linear transformations and functions like those from high-school algebra. These interviews were wide-ranging, covering such topics as injectivity, surjectivity, invertibility, and composition. I focused on the bits about invertibility and composing a function / linear transformation with its inverse.

One of the tasks from those interviews was to predict what you’d get when you compose a function with its inverse, and then to predict what you’d get when you compose a linear transformation with its inverse. When I started digging into this data, I noticed something unexpected: all ten students initially said you should get 1 in the function case, i.e., that f(f^{-1}(x)) should equal 1. This seemed weird, and it drove the investigation that turned into this paper: why did everyone make this incorrect prediction?

One of the things that ended up being salient was that many of these students predicted that a linear transformation composed with its inverse should be the identity matrix (i.e., that \mathbf{T}(\mathbf{T}^{-1}(\mathbf{x})) = \mathbf{I}). This seems less weird and more understandable. The place where this comes from, we ended up arguing in the paper, is a metonymy.

Let’s digress a bit and recall what metonymy is. For our purposes, metonymy is a literary device whereby a thing is called not by its name, but by the name of one of its parts*, or by the name of something that is associated with it. For instance, you can use “Hollywood” to refer to the American movie industry, or “the press” (i.e., the printing press) to refer broadly to the journalistic endeavor. (These examples are gratuitously stolen from the Wikipedia page).

What’s the metonymy that’s going on here? It’s actually very mathematically sophisticated: Every linear transformation from \mathbf{R}^n to \mathbf{R}^m can be represented as a particular m \times n matrix. (What’s more, every matrix represents a linear transformation, so that \mathrm{Hom}(\mathbf{R}^n, \mathbf{R}^m) is in fact isomorphic (as a vector space) to \mathbf{M}_{m\times n}! This isomorphism is canonical as long as you’re working in the standard bases for \mathbf{R}^n and \mathbf{R}^m.) We make use of this fact all the time: we usually write \mathbf{T}(\mathbf{x}) = \mathbf{A}\cdot\mathbf{x} whenever we’re discussing some transformation.

Now here’s the metonymy: we tend to speak of the matrix \mathbf{A}, which represents the transformation, as the transformation — for instance, it’s not unusual to say that \mathbf{A} = \begin{pmatrix} \cos\theta & -\sin\theta \\ \sin\theta & \cos\theta \end{pmatrix} is a CCW rotation by an angle \theta. However, it would be more accurate to say that \mathbf{A} represents such a transformation. For mathematically sophisticated people, this little bit of playing fast and loose is not really problematic, and in fact it enables fluency that is a mark of mathematical sophistication. (If I need to calculate the composition of two transformations, I can quickly and fluently pass into thinking about multiplying the two matrices, and then quickly and fluently pass back into thinking about the resulting matrix as representing the resulting transformation.) However, for our students, it can cause trouble, as I’ll detail below.
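To make the matrix-vs-transformation distinction concrete, here’s a minimal NumPy sketch (the variable names are mine, purely for illustration): the matrix \mathbf{A} is just an array of numbers, while the transformation is the function that multiplies by it.

```python
import numpy as np

# CCW rotation by theta = 90 degrees, *represented* as a matrix.
theta = np.pi / 2
A = np.array([[np.cos(theta), -np.sin(theta)],
              [np.sin(theta),  np.cos(theta)]])

def T(x):
    """The transformation that A represents (in the standard basis)."""
    return A @ x

# Rotating the unit vector along the x-axis lands it on the y-axis.
e1 = np.array([1.0, 0.0])
assert np.allclose(T(e1), [0.0, 1.0])
```

Even in code the metonymy is tempting: we want to call `A` “the rotation,” but the rotation is really `T`; `A` is just its representation in the standard basis.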

Let’s think about composing a transformation with its inverse: \mathbf{T}(\mathbf{T}^{-1}(\mathbf{x})) = \mathbf{A}\cdot(\mathbf{A}^{-1}\cdot\mathbf{x}) = \mathbf{I}\cdot\mathbf{x} = \mathbf{x}. (Note that this calculation sort of implicitly invoked the isomorphism discussed above.) Not so bad, right?
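Here’s a quick numeric check of that calculation, with a hypothetical invertible matrix standing in for \mathbf{A}:

```python
import numpy as np

A = np.array([[2.0, 1.0],
              [1.0, 1.0]])      # any invertible matrix will do
A_inv = np.linalg.inv(A)
x = np.array([5.0, -3.0])

# T(T^{-1}(x)) = A @ (A_inv @ x): the result is the *vector* x,
# not the identity matrix.
assert np.allclose(A @ (A_inv @ x), x)
```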

But now let’s really buy into this metonymy and try that again:

\mathbf{T}(\mathbf{T}^{-1}(\mathbf{x})) \leadsto \mathbf{A} \cdot \mathbf{A}^{-1} = \mathbf{I}.

Hm. That’s slightly wrong. And, as I’m sure any of you who teach linear algebra will immediately recognize, lots of students do this. It’s not a big deal as long as you remember to “un-metonymize” and stick that \mathbf{x} back on the end, and mathematically sophisticated people do this fluently. However, if you’re not super aware that this is a metonymy, and you just think that the matrix is the transformation, then this could lead to problems.

In particular, it could lead you to say something like, “if I compose a linear transformation with its inverse, I ought to get the identity matrix,” which is indeed something that a lot of people in our interviews said. And again, this is wrong, but maybe only slightly wrong. If you compose a linear transformation with its inverse, you get a transformation that is represented by the identity matrix, i.e., the identity transformation \mathbf{T}(\mathbf{x}) = \mathbf{x}.
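One way to see the “slightly wrong” part in code (again a sketch, with names of my own invention): the composition is a *function* that happens to be represented by the identity matrix, and we can recover that matrix by checking where the standard basis vectors go.

```python
import numpy as np

A = np.array([[0.0, -1.0],
              [1.0,  0.0]])          # a 90-degree rotation
A_inv = np.linalg.inv(A)

T = lambda x: A @ x
T_inv = lambda x: A_inv @ x
composition = lambda x: T(T_inv(x))  # the identity *transformation*

# It sends every vector to itself...
x = np.array([3.0, -4.0])
assert np.allclose(composition(x), x)

# ...and the matrix that *represents* it (columns = images of the
# standard basis vectors) is the identity matrix.
matrix_of = np.column_stack([composition(e) for e in np.eye(2)])
assert np.allclose(matrix_of, np.eye(2))
```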

So I think this is another example of a productive borrowing from linguistics: the word “metonymy” is probably a good one to have in our conceptual toolboxes when we’re teaching linear algebra. I’m not going to say that telling students about this whole metonymy thing is going to 100% keep them from making this mistake, but I think it might help. At the very least, it gives us a conceptual label for this tricky subject that doesn’t require an understanding of isomorphisms of vector spaces.

Anyway, with this train of thought in mind, it’s sorta unsurprising that people say that they should get 1 if they compose a function with its inverse. 1 is kinda like \mathbf{I} (in that they are both the multiplicative identity in their respective rings), so maybe this new “fact” that comes from thinking metonymically in linear algebra is influencing an old thing they used to know, which leads us to backward transfer, the subject of a future blog post!

—————

* Some sticklers will probably insist here that a part-whole relationship is a synecdoche rather than a metonymy. I prefer to consider synecdoche a particular kind of metonymy. If this bothers you, then feel free to mentally alter my terms; I don’t think it will affect the argument.

[RUME 2016] Borrowing from linguistics: Rina Zazkis and the superscript -1

So, to begin with, let me introduce this sequence of blog posts tagged [RUME 2016]. I was recently awarded a travel grant to the ICME conference coming up in Germany. Part of the requirements for this award is that I’m supposed to help disseminate stuff I learn at that conference, so I decided to practice / build readership of this blog by blogging about RUME talks I found particularly fun and interesting. I’ve got basically three blog post topics (which may span multiple posts) in my brain, with a fourth that may come later.

The first talk I’d like to blog about is Rina Zazkis’s talk “On symbols, reciprocals, and inverse functions.” (Igor’ Kontorovich is a coauthor on this talk but could not make it because Auckland is v. far away.) This talk immediately grabbed my attention because I wrote a paper some time ago about students’ construal of similarity between functions and linear transformations, and in particular, the ways they think about the inverses of each of these things. More on this later (in a forthcoming post or two).

Dr Zazkis gave a scripting task, in which she asked students (pre-service secondary teachers) to extend this imaginary (but completely plausible!) interaction:

T: So today we will continue our exploration of how to find an inverse function for a given function.

S: So you said yesterday that f^{-1} stands for an inverse function.

T: This is correct.

S: But we learned that this power (-1) means 1 over, that is, 5^{-1} = \frac{1}{5}, right?

T: Right.

S: So is this the same symbol, or what?

T: …

(I think this is such a brilliant task.)

So here are three ways you could sensibly answer this question:

  1. The group theory approach: In every group, every element g has an inverse g^{-1} such that g * g^{-1} = e, the identity element. So these are totally the same thing; the only difference is what’s the group and what’s the operation (the nonzero rational numbers under multiplication vs. the invertible functions under composition). Unfortunately, it’s probably not the best idea to drop some group theory on high-school students, so we should probably explore other approaches.
  2. The context-dependent approach: The common symbol \Box^{-1} means different things depending on what’s in the box (e.g., a function vs. a number). This smacks of rule-based thinking and obscures the legitimate connection between inverses, so I don’t like it twice over.
  3. The middle-ground approach: The common symbol \Box^{-1} means slightly different things depending on what’s in the box, but there is a relation between these slightly different meanings. I’m calling this the middle-ground approach because it seems to bring out this relationship without invoking all the machinery of group theory. This is probably how I would choose to answer the question, should it come up; I’d probably talk in some amount of detail about how we could consider both things instances of some more generic idea of inverses. I think we’d all pretty much agree that this is a better way of explaining the relationship, even though it may be difficult right now to articulate why.
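The middle-ground view can be illustrated with a hedged little sketch (the names and the particular function are mine, purely for illustration): both kinds of inverse satisfy the same abstract pattern — a thing combined with its inverse yields the identity for that operation — just with different operations and different identities.

```python
from fractions import Fraction

# Numbers under multiplication: the identity is 1.
five = Fraction(5)
five_inv = Fraction(1, 5)          # 5^{-1} = 1/5
assert five * five_inv == 1

# Functions under composition: the identity is the map x -> x.
f = lambda x: 2 * x + 3            # an invertible function
f_inv = lambda y: (y - 3) / 2      # f^{-1}
assert all(f(f_inv(x)) == x for x in [0, 1, 7, -4])
```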

But Spencer, you’re thinking, the title of this blog post is something about linguistics, and this is just a bunch of math. You’re right; now it’s time to borrow some linguistics words to give better descriptions to ways #2 and #3 above.

The words Dr Zazkis chose to borrow were homonymy and polysemy. These are good words that do exactly the work we’d like them to do. Here are some definitions I synthesized from various Google results:

Homonymy: the relation between words with identical forms and sounds but different and unrelated meanings. Example: “river bank” vs. “savings bank” vs. “bank shot” vs. “bank of interview questions.” (Yikes!)

Polysemy: the relation between words with identical forms and sounds but multiple, related meanings. These meanings emanate from a central origin, and they form a network such that understanding any one meaning contributes to understanding any other meaning. (The wiki page on polysemy is v interesting.)

It’s probably clear where this is going now: way #2 above is understanding the different \Box^{-1}s as homonymous, and way #3 is understanding them as polysemous. This fits so super well: we could certainly call 5^{-1} “five-inverse” just like we call f^{-1} “f-inverse,” and there is absolutely a central origin for all these words (i.e., the group-theoretic construct of inverses).

What I really like about this, pedagogically speaking, is the network-y bits of the definition of polysemy I gave above: understanding one kind of inverse will help you understand another. Calling both 5^{-1} and f^{-1} “inverses” helps us recognize and talk about both the similarities (in both cases, the one thing “undoes” the other thing) and the differences (the operations are different) between the two cases. What’s more, I think this is precisely the way in which way #3 feels better than way #2. Look how much mileage we got by borrowing some ideas from linguistics!

I’ve got two more post ideas lined up that build on this idea of borrowing interesting things from linguistics. The first borrows the idea of metonymy to talk about inverses of linear transformations; the second borrows the idea of backward transfer (from people who study second-language learning).

I’ll close this post with this lovely quote Rina Zazkis presented, from Henri Poincaré:

Mathematics is the art of giving the same name to different things.