For the Love of Maths

Reflections from Teaching: When Are Learners Novices No More?

“Novice learners may benefit most from well-guided low-paced instructional procedures, while more knowledgeable learners may benefit more from minimally guided forms of instruction.” -Slava Kalyuga

The Example that Led to Reflection

I never cease to be amazed at the level of knowledge that my teachers keep bringing to the table in my class. Last week we were discussing probability trees, and one student was leading the activity with the following tree (probabilities of drawing a yellow, green or black ball without replacement):

After the student was finished answering a couple of questions we had about the tree, I posed the challenge “Create a question where the final answer is 2/5.” I asked this question because I wanted them to get more comfortable with conditional probability. For example, the probability that we will draw a black ball, given that the first ball is yellow is 2/5, so P(B|Y) = 2/5.

Much to my surprise, the first answer given was “Determine the probability of drawing a black or green ball, given that the first ball drawn was black.” I had to sit back and try to figure out where this answer was coming from since I had not anticipated it (this is both the joy and challenge of allowing students to lead the discussion)!

Since the events of drawing a black ball and drawing a green ball are mutually exclusive, we can calculate

P(B or G | B) = P(B|B) + P(G|B) = 1/5 + 1/5 = 2/5.
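The tree itself is not reproduced in the text, so the following is only a sketch under an assumption: a bag of 3 yellow, 1 green and 2 black balls, which is the composition that appears to be consistent with the three conditional probabilities quoted above for a two-draw tree. The names `BAG` and `second_draw_given_first` are hypothetical and used purely for illustration.

```python
from fractions import Fraction

# Assumed bag contents (not stated in the post; inferred from the
# quoted values P(B|Y) = 2/5, P(B|B) = 1/5 and P(G|B) = 1/5).
BAG = {"Y": 3, "G": 1, "B": 2}


def second_draw_given_first(first, second, bag=BAG):
    """P(second colour on draw 2 | first colour on draw 1), without replacement."""
    remaining = dict(bag)
    remaining[first] -= 1              # the first ball is not put back
    total = sum(remaining.values())    # balls left for the second draw
    return Fraction(remaining[second], total)


# The question I had in mind: P(B | Y)
p_b_given_y = second_draw_given_first("Y", "B")

# The student's question: P(B or G | B), using mutual exclusivity
p_b_or_g_given_b = (
    second_draw_given_first("B", "B") + second_draw_given_first("B", "G")
)

print(p_b_given_y, p_b_or_g_given_b)
```

Under that assumed bag, both questions land on the same fraction, which is why the student's answer was a perfectly valid response to the challenge.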

Can you determine the branches used to create this question? After doing some of what Michael Jacobs calls “Maths C.S.I.” I had successfully determined how the student was thinking.

Are All of Our Students Really Novices?

Over the weekend I began pondering about how there is a lot of talk that mathematics students need to be treated like novices, especially in elementary school. For example, in Anna Stokke’s C.D. Howe Report, she states

To be effective, instructional techniques must cater to the limitations of a person’s working memory, which can hold only a limited amount of new information. This is particularly important for novice learners who have difficulty focusing on new concepts when their working memory is overwhelmed.

I don’t necessarily disagree with the statement above, which is taken from Kirschner, Sweller & Clark and is heavily founded in Cognitive Load Theory (CLT). It is important for us as teachers to understand when learners may have limitations, and how to effectively combat these limitations. I do, however, think it is important for us to also reflect on how often we treat our students as novice learners, and realize their potential as non-novice learners. Those who argue in favour of CLT often view their learners as novices, effectively by-passing the expertise-reversal effect. Stated briefly, the expertise-reversal effect says that methods that typically work well to elicit learning in novice learners are not necessarily the best methods to elicit learning in non-novice learners. For example, as one progresses in their knowledge of mathematics, worked examples become less conducive to learning.

In light of this thought, I pose some questions:

1) Are all of our students actually novice learners? Is it possible that our students are sometimes non-novices?

2) If we agree that at least some of our students are non-novices, what methods should we utilize to elicit learning in these individuals? Must it still be direct instruction and worked examples?

3) If we believe that our students are novice learners, will we ever see them as non-novice learners? Does this belief we hold affect their learning?

Overworked, Underpaid & At-Risk

Working during the summer in the service industry as a tasting room manager at a local winery prompted me to remind our visitors this year to be kind, proactive and understanding. The local paper covered my story (below), and the original article that I emailed to them can be found below as well. As always, I hope you are all safe during these trying times; please reach out to share your stories as we plan our educational reopening this fall.


Original article sent to several papers:

Phase two and phase three reopening has been both a blessing and a curse. While my staff and I are grateful to get back to work, we face a unique problem in the service industry: fewer people want to work, yet the demand in the service industry is higher than ever. For some, the risk of exposure to COVID is significant enough that they are uncomfortable coming back to work. For others, it is easier to remain on CERB than come back to work. Whatever the reason, the bottom line is that the service industry is severely understaffed with regards to how high the demand is. 

Since there are fewer staff, those that have chosen to come back to work are overworked and stressed out. Are those servers choosing to return to work getting an increase in wage? Likely not, as food and liquor establishments are scrambling to recoup lost revenue. And with physical distancing making the number of tables in the average restaurant lower than normal, this means that it is harder to supplement a server’s wage with tips, making it even more appealing to remain on CERB for some. 

Not only are those in the industry overworked and underpaid, they also represent a unique group at risk of contracting the virus. Since travel in Canada is being promoted, people are itching to get out of their local area, and airlines are now packing patrons onto flights closer than canned sardines; folks are flooding into BC expecting a normalized summer vacation – and I fear this is only going to get worse throughout August. On average, I come into contact with about 200 people per day, and this is not an unlikely number of people to see during a given shift if you are in the industry.

So for those of you fortunate enough to be able to travel this summer, here are a few words of advice:

  1. Be kind. Your servers are likely overworked, underpaid and stressed out – if we are short with you, it’s likely because we’re exhausted trying to get everyone to adhere to our safety protocols. 
  2. Book ahead and have patience. Your servers will do their best to accommodate you, but they can’t always make that happen in a timely manner due to the high demand – understand that sometimes we have to turn you away. 
  3. Keep your close contact group small. Your servers represent a unique at-risk group due to the sheer number of guests seen during a given shift – so keep us safe by keeping yourself safe and keeping your bubble small. 

From all of us in the service industry we thank you whole-heartedly for your support and understanding.

Intertwining Cognitive Load & Perceptual Load

Cognitive Load Theory discusses intrinsic and extraneous load, while Load Theory of Attention discusses perceptual load – how similar and how different are these two theories?

At researchED Vancouver in February, I had the opportunity to discuss Nilli Lavie’s Load Theory of Attention and how it relates to Cognitive Load Theory and AD/HD. If you didn’t get a chance to attend, you can find a YouTube version of my talk here. This blog post is a summary of how I think Cognitive Load Theory and Load Theory of Attention appear to complement each other.

Sweller’s Cognitive Load Theory discusses the concepts of intrinsic and extraneous cognitive load through the use of the two major players in cognitive architecture: the working memory and the long-term memory store. Currently, it is theorized that the long-term memory store is infinite in capacity; yet the working memory, or processing space, has limited capacity. Not only can facts/information enter the working memory from the environment, but it also has some ability to “search out” stored facts/information from our long-term memory. When the capacity of the working memory is filled, and processing power becomes limited, we say an individual is experiencing cognitive overload.

Information can enter the working memory from either the environment or the long-term memory store. Here, the white circles represent the limited capacity of the working memory. Each black X represents information coming from the environment; whereas, each red X represents information from long-term storage. Ideally, we want learners to have some working memory capacity open for processing (blank white circle), so as not to experience cognitive overload.

One could imagine that if a lot of information, or cognitive demand, is coming from the environment, this may quickly take up working memory space and lead to cognitive overload. With respect to teaching, the load imposed by the way we set up, explain, and execute our lessons is referred to as extraneous cognitive load. In other words, Cognitive Load Theory states that the way we present information and tasks to the learner is important. For instance, inquiry-based learning has a high extraneous load for novice learners compared to worked-out examples; yet worked-out examples tend to have a high extraneous load for expert learners compared to inquiry-based learning (an oddity known as the expertise-reversal effect).

If a task is cognitively demanding due to the way it is presented, such as inquiry-based learning for novices, extraneous cognitive load is high. If too much working memory capacity is taken up, we experience cognitive overload, and our ability to process new information decreases.

The same is true in the other direction. That is, one could imagine that if a question is intrinsically hard, then our working memory might try to “search out” a bunch of past knowledge from our long-term memory. With respect to teaching, Cognitive Load Theory states that we need to be mindful of how much mental effort is required of our learners to perform a particular task – this is known as intrinsic cognitive load. This type of cognitive load is often measured using element interactivity. For example, solving a related rates problem in calculus has high intrinsic load, as learners are required to keep track of many interacting elements (the picture or model, variable representing changing quantities, implicit derivatives, the original word problem); yet finding the derivative of y = x^2 is likely to have low intrinsic load.

If a task is cognitively demanding due to the task having many interacting concepts, intrinsic cognitive load is high. If too much working memory capacity is taken up, we experience cognitive overload, and our ability to process new information decreases.

Keeping Sweller’s Cognitive Load Theory in mind, let’s turn our focus to Lavie’s Load Theory of Attention. We will keep the two major players from before (working memory and long-term memory), and add in one new player: the sensory memory. It is believed that the sensory memory holds stimuli coming from the environment just long enough to be transferred to our working memory. Lavie’s Load Theory of Attention (or Perceptual Load Theory) discusses attentional capacities through the use of task-relevant stimuli and distractors. She theorizes that our attentional resources are of finite capacity, and that the perceptual load of task-relevant stimuli determines whether or not distractors get processed.

Before entering the working memory, stimuli have to pass through our sensory memory. It is here that our brain selects which stimuli to pay attention to. Our attentional resources are of limited capacity, similar to the working memory, and are shown with white circles.

One could imagine that, for a given task, there will be task-relevant and task-irrelevant information (or stimuli). Let’s call task-irrelevant stimuli distractors; these are the items we would like to keep away from our processing space. Suppose that the task-relevant stimuli don’t demand all of our attentional capacity in the sensory memory. In these cases, under low perceptual load, the leftover capacity is taken up by any number of distractors, and both the task-relevant stimuli and distractors are sent to the working memory space. Now distractors begin competing for processing space. For instance, consider a lecturer using PowerPoint. Assuming that you know the material very well, and it is easy to read the slides, you are in a scenario of low perceptual load. The free attentional capacity allows distractors to get through to your working memory space. You might begin to think about what is for dinner, the last song that played on the radio, why the person in front of you is wearing socks and sandals, or what new notification you received on your cell phone.

Task-relevant stimuli are indicated with a black X; whereas distractors are represented with a black D. Under the case of low perceptual load, task-relevant stimuli do not take up all of our attentional capacity, so distractors fill up the remaining space. Both distractors and task-relevant stimuli now have the ability to enter working memory and compete for processing space.

In contrast to the above scenario, imagine that the task-relevant stimuli do demand all of our attentional capacity in the sensory memory. In this case, under high perceptual load, the deficit in available attentional capacity results in distractors not taking up any of this space. Now only task-relevant stimuli are sent to the working memory for processing. For instance, consider the case where we are listening to a lecturer again. This time, we don’t know the material as well, and she is writing at the board in a handwriting style that is slightly messy. In this case of high perceptual load, our attentional capacity is maxed out trying to decode the handwriting and language used to explain the concepts. It is more challenging to think about distractors, as they aren’t vying for processing space in your working memory.

Under the case of high perceptual load, task-relevant stimuli take up all of our attentional capacity. Only task-relevant stimuli have the ability to enter working memory. Distractors are not processed.

As you might be able to see, the two theories developed by Sweller and Lavie seem highly complementary. Of interest to me is that it does not seem as though either theory has acknowledged the other as of yet. However, it does seem as though Perceptual Load Theory and Cognitive Load Theory might offer insights into each other’s realm:

  • How much of the extraneous load of a task comes from processing distractors?
  • Do distractors affect intrinsic cognitive load?
  • If we decrease extraneous cognitive load, does this always lead to less processing of distractors?
  • How can we create lessons in such a way to ensure that perceptual load is “high enough”? And how high is “high enough”?

I’m sure there are other concepts that could be intertwined as well, but these are some of the first questions that come to my mind. As always, I welcome your thoughts and questions on this reflection.

Discovering a Complex Relationship

A discussion of the complex roots of a parabola leads to a connection between the modulus of the solution and the vertex of the parabola.

Consider the following problem: Find all roots of the function y = (x-3)^2 + 4.

Now, most of us will know this as the equation of a parabola in the xy-plane; one whose vertex is at the point (3,4).


And most of us would be happy noting that this equation does not have any roots over the real numbers. For those of you who want to travel down the rabbit hole of complex numbers though, let’s take a walk.

Assume that we now have the equation y = (z-3)^2 +4, where z = a + bi is permitted to be a complex number. We can do a bit of algebra to find the two complex roots of this equation:
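The original working is not reproduced in the text, so here is a reconstruction of the algebra from the equation as stated (setting y = 0 and solving for z); treat it as a sketch of one route rather than the exact steps used in class:

```latex
\begin{aligned}
(z-3)^2 + 4 &= 0 \\
(z-3)^2 &= -4 \\
z - 3 &= \pm 2i \\
z &= 3 \pm 2i
\end{aligned}
```

Both roots have the same modulus, |z| = sqrt(3^2 + 2^2) = sqrt(13), which is the value used in the discussion that follows.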


Now, one of my clever calculus students was trying to make the connection between distance and the modulus of a complex number. He was trying to connect the modulus of these particular solutions to the distance from the origin (0,0) to the vertex of the parabola (3,4). He wrote down |z| < 5? and |z| > 3?

Notice that he was thinking about the right triangle formed below, where the 5 is the hypotenuse and the 3 and 4 are the legs:

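[Figure: Desmos graph of the parabola with the right triangle joining the origin (0,0) to the vertex (3,4); legs 3 and 4, hypotenuse 5]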

I believe he was on this track because I had mentioned that the modulus formula for z = a + bi is given by |z| = sqrt(a^2 + b^2). So it makes sense that he was thinking about the distance to the origin here (just not on the correct plane). After a bit more discussion, he was still adamant about 3 < |z| < 5, which is certainly true for this particular example, since |z| = sqrt(3^2 + 2^2) = sqrt(13).

Then I stopped and thought about this. I found it weird that the modulus of the solution was greater than the length of the smaller leg, yet smaller than the length of the hypotenuse. I dove in to see if it would work in general.

Assume y = (x – a)^2 + b is a parabola in the xy-plane that lies above the x-axis with b > 1. Extend this parabola naturally over the complex numbers and find its roots:

(z – a)^2 + b = 0, so (z – a)^2 = -b and z = a ± i sqrt(b)

Consider the modulus of these complex roots:

|z| = |a ± i sqrt(b)| = sqrt(a^2 + b)

Now, since b > 1, we can see that

a^2 < a^2 + b < a^2 + b^2, so |a| < sqrt(a^2 + b) < sqrt(a^2 + b^2),

and this is fascinating to me because it tells me that our complex roots will lie in some kind of ring (an annulus) in the complex plane! In fact, we know |a| < |z| < c, where a is the x-coordinate of the vertex of the parabola and c = sqrt(a^2 + b^2) is the distance from the vertex to the origin (see the blue triangle and parabola given above). In the Re/Im plane we would get a region that looks like this:

[Figure: Desmos graph of the ring-shaped region in the Re/Im plane]
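For anyone who wants to poke at the ring numerically, here is a minimal sketch that checks the bounds for the example and for a small grid of vertices. The helper names `complex_roots` and `modulus_bounds_hold` are hypothetical, and the grid of test values is an arbitrary choice made for illustration, not something from the class discussion.

```python
import cmath
import math


def complex_roots(a, b):
    """Roots of (z - a)^2 + b = 0, i.e. z = a +/- i*sqrt(b), assuming b > 0."""
    offset = cmath.sqrt(-b)  # purely imaginary when b > 0
    return a + offset, a - offset


def modulus_bounds_hold(a, b):
    """Check |a| < |z| < sqrt(a^2 + b^2) for both roots."""
    inner = abs(a)              # horizontal distance of the vertex from the y-axis
    outer = math.hypot(a, b)    # distance from the vertex (a, b) to the origin
    return all(inner < abs(z) < outer for z in complex_roots(a, b))


# The example from the post: vertex (3, 4), roots 3 +/- 2i
example_roots = complex_roots(3, 4)
example_ok = modulus_bounds_hold(3, 4)

# A small, arbitrary grid of vertices with b > 1
grid_ok = all(
    modulus_bounds_hold(a, b)
    for a in range(-5, 6)
    for b in (1.5, 2, 4, 10)
)

# With 0 < b < 1 the upper bound can fail, since then b > b^2
small_b_ok = modulus_bounds_hold(3, 0.25)

print(example_roots, example_ok, grid_ok, small_b_ok)
```

The last check is there to show why the assumption b > 1 matters: the lower bound only needs b > 0, but the upper bound relies on b < b^2.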

All in all we didn’t get too far discussing the complex modulus, but it was definitely still a bull’s-eye in my books.

A Glimpse into the Mind of a Student with ADHD

“ADHD makes it sound like I have a lack of focus, but I think of it more like a mismanagement of focus.” -student with ADHD

Here are the first few pages of a recent calculus midterm of one of my students who has been diagnosed with ADHD. I’ll let you take a peek at what you see before I give my reflection.


Page 1: The beginning of question one, which asked students to calculate derivatives of some complicated functions.


Page 2: More derivative calculations.


Page 3: The beginning of question two, which asked students to calculate a few limits.


Page 4: More limit calculations.

Now, I want you to go back and take a look at the first page, where question #1 ii required the knowledge of the derivative of log_{3}(x). You can see that the student set up the equation log_{3}(x) = b in order to help him determine the derivative using the quotient rule. But the giant “?” beside b’ caught my interest. (Of course, if there is anything else of interest to you please leave a comment!)

Now, he begins playing around at the top of the page, recalling rules for how to deal with logarithms. There is a y = 3^x and a y = log_{3}(x) indicating to me that he was thinking about potentially finding the derivative using inverse functions or implicit differentiation. However, not much happens here, so we will catch up in a few pages.

The next page is nothing special, in that he tackles the next couple of derivative questions without giving any more thought to the log base three problem he is having. But check out the top of the third page. Here, he correctly gets the relationship between exponentials and logarithms: 3^x = b means log_{3}(b) = x (or vice versa). Then there is a little bit of play at the bottom of the page trying to re-write this relationship in various ways to potentially get a nice equation to differentiate. Aside from now having the inverse relationship solidified, not much headway is gained on the initial problem.

Finally, on the last page, we see one last attempt to think about 3^x = u, perhaps a nod to the variables I use when doing the chain rule (dy/dx = dy/du * du/dx). This is the final attempt to determine the solution to the log base three problem, and the rest of the test continues in a normal fashion.
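For reference, the derivative he was hunting for can be reached from exactly the inverse relationship he wrote down. The sketch below uses implicit differentiation and assumes the rule d/dx(3^x) = 3^x ln(3) is already known; it is one possible route, not necessarily the one expected on the test.

```latex
\begin{aligned}
y = \log_3(x) \;&\Longleftrightarrow\; 3^{y} = x \\
\frac{d}{dx}\left(3^{y}\right) &= \frac{d}{dx}(x) \\
3^{y}\ln(3)\,\frac{dy}{dx} &= 1 \\
\frac{dy}{dx} &= \frac{1}{3^{y}\ln(3)} = \frac{1}{x\ln(3)}
\end{aligned}
```

The last step substitutes 3^y = x back in, which is the same inverse relationship that appears at the top of his third page.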

The most interesting thing from my perspective is embedding what I see in a cognitive load theory setting. We know that the working memory has limited capacity to hold and synthesize information. This information can come from either environmental stimuli, or as schema entering from long-term memory. I was always under the impression that trying to cut back environmental stimuli for students with ADHD was a must, as this allows the working memory to focus more on the task at hand. However, seeing this test had me thinking a bit deeper.

At the college level, we are typically good at minimizing outside distractions; doors are closed, rooms are quiet and I cross my fingers that maintenance has fixed any lights that are in strobe-mode. However, as I do not have ADHD myself, I cannot comment on what outside stimuli might still be entering the working memory. Perhaps a song that was heard earlier that morning? Whether or not he forgot his lunch at home? What plans are for after school? So let’s assume that some working memory space has been allocated to this.

Now it’s test time. Since this particular student is quite adept at mathematics, most schema enter the working memory quite effortlessly. We can see this demonstrated on page 2, where some complex derivatives are handled. From my perspective, it is actually the snag of not knowing the derivative of log_{3}(x) that pushes the working memory over its capacity. Look at how often he returns to the problem – at the top of page 1, the top & bottom of page 3, and at the top of page 4.

Just how taxing is it on the working memory to be subconsciously processing this log base three problem over the course of four test questions? How debilitating would this be if there were not well-developed schema to draw from when writing this test? How much more success would there have been if he were able to dislocate this log base three problem from his working memory, instead of it continually returning to occupy his focus? I find these questions super interesting, and I have thoughts, but no particular answers. If there are any readers who have studied cognitive load theory from the perspective of individuals with ADHD, I’d love to read a bit more on this topic.

Understanding “Understanding” Part III

A post where we explore how to define understanding in a cognitive load theory framework.

In my last two blog posts, I discussed the concepts of element interactivity, as well as intrinsic and extraneous cognitive load. We say information has high element interactivity if there are many elements of the information that must be processed together at the same time. High element interactivity generally implies high intrinsic cognitive load. Here, intrinsic cognitive load refers to a working memory load caused by the intrinsic nature of information that we are trying to process. Finally, extraneous cognitive load refers to a working memory load imposed by the pedagogical nature of the information being taught.

Defining Understanding

Now that we know about element interactivity, we can use this concept to define understanding. In a cognitive load setting, understanding is the ability to process all interacting elements in working memory at one time. Since the focus is on interacting elements, it does not make sense to define understanding for individual elements, such as learning one French vocabulary word (cat = chat).

Let’s analyze our previous examples. Consider the math fact 3 + 5 = 8. According to our definition, if a learner is able to answer 3 + 5 = ? correctly, without having to process each of the interacting elements individually, we would say that she has demonstrated understanding of the question at hand. I would argue then, that using a strategy such as tallying up three and five on her fingers would display a lack of understanding. Even beginning with three fingers and counting up to eight, whilst being a more effective strategy, still displays a lack of understanding as she is processing some or all of the elements individually. Of course, I am not arguing that students shouldn’t be permitted to use these counting strategies. It is likely that these are crucial stepping stones in the learning trajectory, and the instructor needs to be mindful of when the student seems ready to move beyond these strategies.

Understanding and Incorrectness

One aspect of the definition that I am curious about is when the learner makes a mistake in the process. Consider solving for x in 3x – 10 = 5. Is it possible for the student to understand, yet be incorrect? Are these mutually exclusive events? Let’s say the student solution is

3x – 10 = 5
3x = -15
x = -5

This is incorrect, but it still shows us that they understand the process of solving for x, and that they can process all of this information in working memory at once. Does understanding come down to a judgement call on the side of the instructor in these cases?
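For comparison, a correct version of the same three-line process can be written out as a worked equation. The two versions share the same structure and part ways on the second line, where the constant is moved to the right-hand side:

```latex
\begin{aligned}
3x - 10 &= 5 \\
3x &= 5 + 10 = 15 \\
x &= 5
\end{aligned}
```

Substituting back confirms the difference: 3(5) – 10 = 5, whereas the student's value gives 3(-5) – 10 = -25.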

Instructional Implications – A Case for Quick Math Fact Recall

Let’s try to deconstruct our current pedagogy in light of this definition of understanding. Consider all of the multiplication facts that our students must recall. There is element interactivity within one individual fact (3 x 4 = 12), as well as interaction amongst all of the multiplication facts for three, and interaction amongst all facts up to 9 x 9 = 81! Working memory might get overwhelmed, as intrinsic load is high due to the number of facts that must be remembered.

Think also about what our current curriculum states: students should be comfortable with knowing other concepts, such as knowing 3 x 4 = 4 + 4 + 4 = 12, building array models, or knowing about the commutative property. All of this increases extraneous cognitive load, thus requiring more time and effort for the students to move the facts to long-term memory. I would argue that this is why we have seen a shift to moving recall of the multiplication facts to later grade levels. In British Columbia, students aren’t expected to recall facts for 3s or 4s until Grade 5; and there is no mention of the harder facts like 7s, 8s or 9s.

To compare, I had my multiplication facts memorized by the end of Grade 3 in the 80s in Ontario. Some might argue that we were taught without understanding (this alternate definition is a bit fuzzy, but typically is interpreted as knowing how to complete a question utilizing a model). This is false, as I have many documents showing that we indeed used models. But the key difference here is that the focus of instruction was on automatization of facts, and that models were used to introduce concepts and as help when students weren’t understanding. Models were used to decrease intrinsic load, not to increase extraneous load!


Addition and subtraction models were used to introduce the concepts in Grade 1.

For such a large task, such as learning the multiplication facts, why not have students learn the individual facts first? Using techniques such as interleaved and spaced practice, and introducing new fact families after long exposure to previous ones, would be beneficial for learning. After students are comfortable with recall of the facts, then we can focus our teaching on developing understanding (the fuzzier definition) of how multiplication is connected to other concepts. Of course, once students can recall the multiplication facts, they have displayed understanding in the cognitive load sense, as they can process all of the elements together at once. So why would we want to learn our facts first, before connecting to other concepts? Once the facts are remembered well, then they can be retrieved quickly and efficiently, leading to lowered intrinsic load, and more working memory capacity to work on the current problem of connecting the fact to another concept.

In conclusion, I am not saying that we shouldn’t explain why certain facts are the way they are! This can certainly be done as motivation to the problem, and mixed throughout as needed; however, this should not be the focus of the learning because this increases extraneous load and not all students will successfully move the facts into the long-term memory store this way.