<?xml version="1.0" encoding="UTF-8"?>
<?xml-model href="https://wwp-test.northeastern.edu/outreach/seminars/_utils/schema/yaps.rnc" type="application/relax-ng-compact-syntax"?>
<?xml-model href="https://wwp-test.northeastern.edu/outreach/seminars/_utils/schema/yaps.isosch" type="application/xml" schematypens="http://purl.oclc.org/dsdl/schematron"?>
<?xml-stylesheet type="text/css" href="https://wwp-test.northeastern.edu/outreach/seminars/_utils/stylesheets/yaps-tei.css"?>
<!-- $Id: word_vectors_overview.xml 51374 2026-04-01 16:35:16Z aclark $ -->
<TEI xmlns="http://www.wwp.northeastern.edu/ns/yaps" version="5.0">
  <teiHeader>
    <fileDesc>
      <titleStmt>
        <title>Word Vectors Institute: Introductions and Overview</title>
        <author>Julia Flanders</author>
      </titleStmt>
      <editionStmt>
        <edition>Word Vectors for the Thoughtful Humanist</edition>
      </editionStmt>
      <publicationStmt>
        <distributor>Women Writers Project (via website)</distributor>
        <address>
          <addrLine>url:mailto:wwp@neu.edu</addrLine>
        </address>
        <date when="2019-04-01"/>
        <availability status="restricted">
          <p>Copyright 2019 Syd Bauman, Julia Flanders, Sarah Connell, and the Women Writers
            Project</p>
          <p>This TEI-encoded XML file is available under the terms of the <ref
              target="http://creativecommons.org/licenses/by-sa/3.0/">Creative Commons
              Attribution-ShareAlike 3.0 (Unported)</ref> license.</p>
        </availability>
        <pubPlace>Boston, MA USA</pubPlace>
      </publicationStmt>
      <sourceDesc>
        <p>Opening session for the institute, including an overview of what will be covered.</p>
      </sourceDesc>
    </fileDesc>
    <revisionDesc>
      <change when="2021-06-24" who="jflanders.lfw">Updates for institute 3</change>
      <change when="2021-05-17" who="jflanders.lfw">Updates for institute 2</change>
      <change when="2019-07-12" who="jflanders.lfw">Initial draft</change>
    </revisionDesc>
  </teiHeader>
  <text>
    <presentation>
      <abstract>
        <p>This tutorial introduces the WWP curriculum on word embedding models.</p>
      </abstract>


      <section>
        <head>Overview</head>
        <slide>

          <p>This institute series: four institutes in all, generously funded by the National
            Endowment for the Humanities: <list>
              <item>July 2019: Introductory, research focused</item>
              <item>May 2021 (rescheduled from 2020): Introductory, teaching focused</item>
              <item>July 2021 (rescheduled from 2020): Intensive, research focused</item>
              <item><emph>May 2022</emph> (rescheduled from 2021): Intensive, teaching
                focused</item>
            </list>
          </p>
          <p>Our focus for this event: <list>
              <item>What are word embedding models? What's different about them? What are they good
                for?</item>
              <item>What do all the specialized terms mean?</item>
              <item>How do word embedding models work?</item>
              <item>How can we build fascinating and effective curricular materials that make use of
                word embedding models? How do we explain word embedding models in our teaching, at
                different levels and for different pedagogical contexts?</item>
              <!--<item>How do we explain and contextualize our results, and use them persuasively in
                research?</item>-->
            </list>
          </p>
        </slide>
        <lectureNote>

          <p>To situate this event a bit: <list>
              <item>this is the third of a series of four institutes, in which we're trying to
                approach the general topic of word embedding models from both a teaching and a
                research perspective, and also for audiences with different levels of comfort with
                programming</item>
              <item>so this third event focuses on <emph>research</emph> usage from an
                  <emph>intensive</emph> standpoint</item>
            </list></p>
          <p>We’re not expecting any prior knowledge of text analysis and certainly none of word
            embedding models (that’s why you’re here!) but we hope everyone will come away feeling
            comfortable with several things: <list>
              <item>What word embedding models are and how they differ from other text
                analysis/machine learning approaches</item>
              <item> The vocabulary and specialized terminology used to talk about word embedding
                models</item>
              <item> How word embedding models work: what is actually happening under the hood and
                how that affects the kinds of research and interpretive work we can do with this
                technique</item>
              <item> How to explain and contextualize these approaches, particularly in the context
                of our research and scholarship </item>
              <item>How to <emph>read and modify</emph> the R code used to train and query the
                models (but not write new R code from scratch)</item>
            </list></p>

          <p> A note on tools and scope: <list>
              <!--<item>How to use word embedding models using command-line tools like RStudio (although we are offering an optional session on days 2 and 3 to walk you through the process of setting that up)</item>
              <item>We have done the heavy lifting and advance work behind the scenes (although we
                will walk through the concepts carefully), so the web interface allows you to
                explore the resulting models. Our “intensive” institutes later in the grant will
                cover RStudio and the process of training models by hand in more detail. </item>
              
              -->
              <item>We have developed an easy-to-use web interface, which lets you query existing
                trained models; we'll use it a bit, and you may find it very useful for
                teaching</item>
              <item>But for this intensive workshop, we will be getting into the actual process of
                training and querying models on the command line</item>
              <item>We will be using RStudio, an integrated development environment (IDE) for
                writing and running R code</item>
            </list></p>

        </lectureNote>

        <tutorial>
          <p>To situate this walkthrough a bit: <list>
              <item>This walkthrough was developed using materials from the third of a series of
                four institutes, in which we approached the general topic of word embedding models
                from both a teaching and a research perspective. We designed this walkthrough for audiences with
                different levels of comfort with programming, so no advanced programming experience is required to follow along.</item>
              <item>The primary focus of this walkthrough is on <emph>research</emph> usage from an
                  <emph>intensive</emph> standpoint</item>
            </list></p>
          <p>We’re not expecting any prior knowledge of text analysis and certainly none of word
            embedding models (that’s why you’re here!) but we hope you will come away from this walkthrough feeling
            comfortable with several things: <list>
              <item>What word embedding models are and how they differ from other text
                analysis/machine learning approaches</item>
              <item> The vocabulary and specialized terminology used to talk about word embedding
                models</item>
              <item> How word embedding models work: what is actually happening under the hood and
                how that affects the kinds of research and interpretive work we can do with this
                technique</item>
              <item> How to explain and contextualize these approaches, particularly in the context
                of our research and scholarship </item>
              <item>How to <emph>read and modify</emph> the R code used to train and query the
                models (but not write new R code from scratch)</item>
            </list></p>

          <p> A note on tools and scope: <list>
              <!--<item>How to use word embedding models using command-line tools like RStudio (although we are offering an optional session on days 2 and 3 to walk you through the process of setting that up)</item>
              <item>We have done the heavy lifting and advance work behind the scenes (although we
                will walk through the concepts carefully), so the web interface allows you to
                explore the resulting models. Our “intensive” institutes later in the grant will
                cover RStudio and the process of training models by hand in more detail. </item>
              
              -->
              <item>We have developed an easy-to-use web interface, which lets you query existing
                trained models; we'll use it a bit, and you may find it very useful for
                teaching</item>
              <item>In this intensive guide, we will get into the actual process of training and
                querying models from the command line, adding more detail as we go</item>
              <item>We will be using RStudio, an integrated development environment (IDE) for
                writing and running R code</item>
            <item>Although we will use the terminology that practitioners use when referring to the tools and methods in this guide,
            we do not assume any prior knowledge; terms will be defined as they arise.</item>
            </list></p>

        </tutorial>


      </section>
      <section>
        <head>Finding the right level</head>
        <slide>
          <figure>
            <graphic height="600px" url="../../../_utils/gfx/w2v_range_of_complexity.png"/>
          </figure>
        </slide>
        <lectureNote>
          <p>This is also a sort of meta-workshop: <list>
              <item>Part of the goal of this grant is to explore ways of making word embedding
                models approachable and useful and persuasive to many different audiences without
                dumbing them down;</item>
              <item> we’re trying to develop appropriate explanatory narratives that are somewhere
                in between “word vectors are a fun tool! See the clusters!” and technical language
                that assumes deep expertise</item>
              <item>So we are going to be interested in thinking with you about that boundary: about
                what parts of this topic are especially challenging, and how we can best understand
                them and explain them to others: for instance, colleagues, students, and readers of
                articles where you draw on these techniques</item>
              <item>Your current unfamiliarity with the topic is a brief and precious resource for
                you as teachers: this is your moment to reflect on what is hardest to understand, so
                that you can anticipate the things others may find confusing or worth unpacking, and
                explain them in terms that are legible and appropriately pitched</item>
              <!--
                <item>Your current unfamiliarity with the topic is a brief and precious resource for you as teachers: this is your moment to reflect on what is hardest to understand, so that you can give your students the pacing and explanations they need</item>
                <item>So we are going to be interested in thinking with you about what parts of this
                topic are especially challenging, and how we can best understand them and explain
                them to others (for instance, colleagues, readers, or students) </item>-->
            </list></p>
        </lectureNote>
        <tutorial>
          <p>This walkthrough approaches word embedding models from a meta-perspective: <list>
            <item>Part of the goal is to explore ways of making word embedding
              models approachable, useful, and persuasive to many different audiences without
              oversimplifying them;</item>
            <item>The guide uses explanatory narratives that are somewhere
              in between “word vectors are a fun tool! See the clusters!” and technical language
              that assumes deep expertise. This way, word embedding models are demystified as much as possible while still
            familiarizing you with the terms that practitioners use to talk about the same processes.</item>
            <item>So we will be exploring that boundary a bit and asking you to consider the following:
              what parts of this topic are especially challenging, and how we can best understand
              those challenges and explain them to others: for instance, colleagues, students, and readers of
              articles where you draw on these techniques</item>
            <item>Your current unfamiliarity with the topic is a brief and precious resource for
              you as a researcher or a teacher (or both!): this is your moment to reflect on what is hardest to understand, so
              that you can anticipate the things others may find confusing or worth unpacking, and
              explain them in terms that are legible and appropriately pitched</item>
            <!--
                <item>Your current unfamiliarity with the topic is a brief and precious resource for you as teachers: this is your moment to reflect on what is hardest to understand, so that you can give your students the pacing and explanations they need</item>
                <item>So we are going to be interested in thinking with you about what parts of this
                topic are especially challenging, and how we can best understand them and explain
                them to others (for instance, colleagues, readers, or students) </item>-->
          </list></p>
        </tutorial>
      </section>
      <section>
        <head>A quick look at the schedule...</head>
        <slide>
          <p>Monday and Tuesday: <list>
              <item>An initial orientation and a showcase of some pedagogical projects</item>
              <item>A deeper explanation of terminology and concepts</item>
              <item>A walkthrough of some commented code samples and some hands-on
                experimentation</item>
            </list>
          </p>
          <p>Wednesday and Thursday: <list>
              <item>A close look at corpus and data preparation</item>
              <item>A close look at the model training process</item>
              <item>More hands-on experimentation using commented code walkthroughs</item>
            </list>
          </p>
          <p>Friday: <list>
              <item>More hands-on practice, and a walkthrough of the code to visualize word
                embedding models</item>
              <item>Wrapping up and next steps</item>
            </list>
          </p>
          <!--<p>Today: <list>
              <item>An initial orientation and some sample research questions</item>
              <item>Lunch (and optional lunchtime look under the hood, part 1)</item>
              <item>A deeper explanation of terminology and concepts</item>
              <item>Hands-on exploration of the word vector toolkit</item>
            </list>
          </p>
          <p>Tomorrow: <list>
              <item>The process of data preparation and model training</item>
              <item>Hands-on exploration and discussion</item>
              <item>Lunch (and optional lunchtime look under the hood, part 2)</item>
              <item>Small and large group discussion of research questions</item>
            </list>
          </p>
          <p>Friday: <list>
              <item>Hands-on work on your own projects</item>
              <item>Short project presentations</item>
              <item>Lunch</item>
              <item>More presentations</item>
              <item>Closing discussion, questions, next steps</item>
            </list>
          </p>-->
        </slide>
        <lectureNote>
          <p> Quick look at the schedule: <list>
              <item>Our basic strategy here is to examine and explain word embedding models several
                times, at increasing levels of detail, so that you have a chance to internalize one
                level of knowledge before we dive into the next deeper level. </item>
              <item> We’ll be working intensively with commented code walkthroughs: these are R
                programs with detailed comments and some specific places where you can make
                modifications and specify parameters; these are designed so that you don't have to
                actually <emph>write</emph> any R code, but can become familiar with how it works
                and how to adapt it</item> <item>We'll also spend time doing hands-on work in small
                groups so that you have a chance to practice and explore on your own</item>
              <item>During the workshop, we will be using a version of RStudio that is installed on
                a shared server, so that you (and we) don't have to deal with the complexities of
                getting RStudio running on everyone's individual computers. However, before the
                workshop on Wednesday and Thursday, for those who are interested, we will also do a
                walkthrough of how to download and install RStudio on your own computer; no
                obligation but if you're interested all are welcome.</item> <item>On the final day,
                we'll do a bit of experimentation with code to visualize word embedding models, and
                then we’ll wrap up with a discussion of next steps (including what would be involved
                in tackling RStudio and the command line if you’re so inclined).</item>
            </list>
          </p>
        </lectureNote>
        <tutorial>
          <p> A quick roadmap for this walkthrough: <list>
            <item>The strategy of this walkthrough is to examine and explain word embedding models several
              times, at increasing levels of detail, so that you have a chance to internalize one
              level of knowledge before we dive into the next deeper level. </item>
            <item> We’ll be working intensively with commented code walkthroughs: these are R
              programs with detailed comments and some specific places where you can make
              modifications and specify parameters; these are designed so that you don't have to
              actually <emph>write</emph> any R code, but can become familiar with how it works
              and how to adapt it</item> <item>We also include several hands-on exercises for you to try,
              so that you have the opportunity to put key concepts into practice.</item>
            <item>The code in this walkthrough is written in the programming language R and is run in
              RStudio, an integrated development environment (IDE) for R. We won't spend much time
              explaining how to download and install RStudio, so we recommend installing it on your
              own computer in advance, or connecting to a shared server where RStudio is already available.
            </item> 
            <item>Finally,
                we'll do a bit of experimentation with code to visualize word embedding models, and
                then we’ll wrap up with a discussion of possible next steps.</item>
          </list>
          </p>
        </tutorial>
      </section>
      <section>
        <head>Making notes</head>
        <slide>
          <p>What did you try?</p>
          <p>What settings did you use? (Which corpora, what query terms...etc.?)</p>
          <p>What result did you get?</p>
          <p>What didn't make sense?</p>
          <p>What do you want to remember to try later?</p>
        </slide>
        <lectureNote>
          <p>We've provided a fair amount of time for individual and small-group experimentation,
            and time for you to think about your own research projects</p>
          <p>However, this workshop will really just be a start, a chance to get comfortable with
            fundamental concepts</p>
          <p>I want to talk for a moment about some suggestions for how to take this work with you
            and continue it in your own time after you get home: <list>
              <item>For all of the sessions, we have a shared notes document [share link] for
                anything you want to write down that might be useful to the group, and we'll also
                ask you to make notes there during some of the small-group hands-on work</item>
              <item>we'd also like to suggest that you keep something like a lab notebook: an
                informal, personal (but somewhat detailed) record of what you tried, what worked,
                what questions you have, what you want to follow up on later</item>
              <item>More specifically, it's helpful to remember details like what words you queried,
                what corpora you were comparing, what settings you used </item>
              <item>We have created some samples and templates as inspiration, which are in our
                shared Google space</item>
              <item>Later on, these kinds of notes can also be useful in documenting your results,
                for purposes of writing about them in your research; very similar to documenting
                your bibliographic sources for a research article</item>
              <item>Screen shots can also be a convenient way to keep a record of a notable result. </item>
              <item>Questions?</item>
            </list>
          </p>
        </lectureNote>
        <tutorial>
          <p>We've provided a fair amount of space in the walkthrough for you to think about your own research projects, and 
          about how word embedding models might play a role in answering your research questions</p>
          <p>However, this walkthrough is a soft introduction: a chance to get comfortable with
            fundamental concepts, even though word embedding models are useful for much more</p>
          <p>Since we can't possibly cover every application of word embedding models for humanities researchers,
            we want to offer some suggestions for how to take this work with you
            and continue it on your own time after you have worked through the content provided here: <list>
              <item>We encourage you to have a dedicated space for notes that you can use to record
                anything that might be useful for a later exploration. At key points, we will also
                ask you to use your notetaking space to jot down some thoughts, particularly during some of the hands-on exercises</item>
              <item>We also suggest that, in addition to more traditional notetaking, you keep something like a lab notebook: an
                informal, personal (but somewhat detailed) record of what experiments you tried, what worked,
                what questions you have, and what you want to follow up on later</item>
              <item>More specifically, it's helpful to remember details like what words you queried,
                what corpora you were comparing, what settings you used </item>
              <item>We have created some samples and templates as inspiration, which you may use as a model</item>
              <item>Later on, these kinds of notes can also be useful in documenting your results,
                for purposes of writing about them in your research; very similar to documenting
                your bibliographic sources for a research article</item>
              <item>Screen shots can also be a convenient way to keep a record of a notable result. </item>
            </list>
          </p>
        </tutorial>
      </section>

      <section>
        <head>
          <soCalled>Model?</soCalled>
        </head>
        <slide>
          <figure>
            <graphic height="600px" url="../../../_utils/gfx/w2v_model2.png"/>
          </figure>

        </slide>
        <lectureNote>
          <p>So, with those preliminaries out of the way, let’s get into our first explanation of
            word embedding models. For this first explanatory pass through, we won't dwell in detail
            on the terminology or the mathematics: we'll keep to a sort of metaphorical level of
            explanation to get a feel for things.</p>
          <p>And the first term I want to talk about is the word <term>model</term>
            <list>
              <item><term>model</term> is a potent concept in digital humanities, because so much of
                what we do depends on models of one kind or another: creating digital
                representations of real-world objects and ideas, and using them to study those
                things</item>
              <item>in some of the earlier domains of DH we're used to thinking of models
                representationally: as static proxies for research objects (like texts or artifacts)
                that capture what is salient to us about those artifacts: a model as a TEI-encoded
                text</item>
              <item>in more recent domains such as machine learning, a <soCalled>model</soCalled> is
                more of a predictive or generative tool: something we can use to model the behavior
                of a system and not only learn more about it, but also produce new things that
                follow the rules and probabilities of the system: the kind of model that is
                represented by a schema</item></list></p>
          <p>Word-embedding models have properties of both, but in important respects are more like
            this latter type: <list>
              <item>they model the language of a corpus in a way that focuses on questions like "if
                I'm reading or writing this sentence, what's the most likely next word?" or "based
                on the words I'm seeing in this little region, what is the most likely word at the
                center of that region"?</item>
              <item>in other words, word-embedding models are interested in a probabilistic model of
                language that represents the interconnections between words as likelihoods based on
                proximity</item>
            </list></p>
          <p>The practical applications of this kind of modeling are familiar: predictive text on
            your phone! But in digital humanities, models of this kind are also valuable because
            they let us understand language better and help us do research on specific topics and
            historical formations. So where the machine-learning research in industry is focused on
            getting the most accurate predictions of what word I'm trying to type, through a
            somewhat abstract, de-historicized understanding of language, in digital humanities we
            need to pay close attention to language as represented in our specific corpora
            (representing a time period, a genre, a set of authors, etc.) and also to the
            assumptions we're making about language when we train our models.</p>
        </lectureNote>
        <tutorial>
          <p>So, with those preliminaries out of the way, let’s get into our first explanation of
            word embedding models. For this first explanatory pass through, we won't dwell in detail
            on the terminology or the mathematics: we'll keep to a sort of metaphorical level of
            explanation to get a feel for things.</p>
          <p>And the first key term we will cover is the word <term>model</term>
            <list>
              <item>The term <term>model</term> is a potent concept in digital humanities, because so much of
                what we do depends on models of one kind or another: creating digital
                representations of real-world objects and ideas, and using them to study those
                things</item>
              <item>in some of the earlier domains of DH, models tended to be thought of
                representationally: as static proxies for research objects (like texts or artifacts)
                that capture what is salient to us about those artifacts: a model as a TEI-encoded
                text</item>
              <item>in more recent domains such as machine learning, a <soCalled>model</soCalled> is
                regarded as more of a predictive or generative tool: something we can use to model the behavior
                of a system and not only learn more about it, but also produce new things (objects, items, or even new models) that
                follow the rules and probabilities of that system: this is the kind of model that can be
                represented by a schema</item></list></p>
          <p>Word-embedding models have properties of both of these perspectives on models, but in important respects are more like
            the latter type: <list>
              <item>word embedding models represent the language of a corpus in a way that focuses on questions like "if
                I'm reading or writing this sentence, what's the most likely next word?" or "based
                on the words I'm seeing in this little region, what is the most likely word at the
                center of that region"?</item>
              <item>in other words, word-embedding models are interested in a probabilistic model of
                language that represents the interconnections between words as likelihoods based on
                proximity</item>
            </list></p>
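          <p>To make this probabilistic intuition concrete, here is a minimal sketch (in Python,
            using a tiny invented sentence as the corpus) of counting which words fall inside a
            small window around a target word; real word2vec training learns dense vectors from
            exactly this kind of windowed context rather than from raw counts:</p>

```python
# A toy sketch of the "most likely words near X" intuition: count which
# words appear within a small window of a target word. (Real word2vec
# training learns dense vectors from these contexts; the counting here
# illustrates only the underlying idea. The sentence is invented.)
from collections import Counter

corpus = "she read the letter and she wrote the letter again".split()
window = 2          # how many words on each side count as "context"
target = "letter"

neighbors = Counter()
for i, word in enumerate(corpus):
    if word == target:
        start = max(0, i - window)
        context = corpus[start:i] + corpus[i + 1:i + 1 + window]
        neighbors.update(context)

# "the" falls in the window of both occurrences of "letter"
print(neighbors.most_common(3))
```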
          <p>The practical applications of this kind of modeling are likely already familiar: for example, predictive text on
            your phone! But in digital humanities, models of this kind are also valuable because
            they let us understand language better and help us do research on specific topics and
            historical formations. So where machine-learning research in industry is often focused on
            getting the most accurate predictions of what word a user is trying to type, through a
            somewhat abstract, de-historicized understanding of language, in digital humanities we
            need to pay close attention to language as represented in our specific corpora
            (representing a time period, a genre, a set of authors, etc.) and also to the
            assumptions we're making about language when we train our models. In other words, we are
            interested in keeping humanistic questions front and center as we train and use these models.</p>
        </tutorial>
      </section>

      <section>
        <head>A first look at word vectors</head>
        <slide>
          <figure>
            <graphic height="600px" url="../../../_utils/gfx/w2v_clusters.png"/>
          </figure>
        </slide>
        <lectureNote>

          <p>At the simplest level, a word embedding model is a model of a text corpus that
            represents word usage in the corpus by locating each word in
            <soCalled>space</soCalled></p>
          <p>Metaphorically, we can imagine that those spatial locations show us
              <soCalled>neighborhoods</soCalled> of words that tend to occur in the same
            contexts</p>
          <p>Another way to think about these <soCalled>neighborhoods</soCalled> is that they are
            answers to the question: <q>what are the words most likely to appear near word X?</q> or
              <q>what word X is most likely to appear in this context?</q></p>
          <p>So the clusters we see are groups of words that might be predicted by the same kinds of
            contexts. What can we imagine those contexts to be, based on the clusters we're seeing
            here? <list>
              <item>Start with cluster 5 (accompanying <soCalled>mad lib</soCalled>): words relating
                to expressions of risk and despair, unhappy futurity</item>
              <item>we can see how these words could plausibly fit into very similar contexts</item>
              <item>How about clusters 6 and 7? (righteous war; early modern female virtue?)</item>
              <item>Cluster 8 is a little different: not really a <soCalled>thematic</soCalled>
                cluster: what is the predictive <soCalled>context</soCalled> here?</item>
              <item>How about clusters 9 and 4?</item>
            </list>
          </p>
        </lectureNote>
        <tutorial>
          <p>At the simplest level, a word embedding model is a model of a text corpus that
            represents word usage in the corpus by locating each word in
            <soCalled>space</soCalled></p>
          <p>Metaphorically, we can imagine that those spatial locations show us
            <soCalled>neighborhoods</soCalled> of words that tend to occur in the same
            contexts</p>
          <p>Another way to think about these <soCalled>neighborhoods</soCalled> is that they are
answers to the question: <q>what are the words most likely to appear near word X?</q> or
            <q>what word is most likely to appear in this context?</q></p>
          <p>So the clusters we see are groups of words that might be predicted by the same kinds of
            contexts. What can we imagine those contexts to be, based on the clusters we're seeing
            here? <list>
              <item>Start with cluster 5 (accompanying <soCalled>mad lib</soCalled>): words relating
                to expressions of risk and despair, unhappy futurity</item>
              <item>we can see how these words could plausibly fit into very similar contexts</item>
              <item>How about clusters 6 and 7? (righteous war; early modern female virtue?)</item>
              <item>Cluster 8 is a little different: not really a <soCalled>thematic</soCalled>
                cluster: what is the predictive <soCalled>context</soCalled> here?</item>
              <item>How about clusters 9 and 4?</item>
            </list>
          </p>
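<p>To make this concrete, here is a minimal Python sketch of the question <q>what are the words most likely to appear near word X?</q> This is not the training algorithm itself, just a hand-rolled co-occurrence count, and the sample sentence is invented for illustration:</p>

```python
from collections import Counter

def neighbors(tokens, target, window=2):
    """Count every word that occurs within `window` positions of `target`."""
    counts = Counter()
    for i, tok in enumerate(tokens):
        if tok == target:
            lo, hi = max(0, i - window), min(len(tokens), i + window + 1)
            counts.update(t for j, t in enumerate(tokens[lo:hi], start=lo) if j != i)
    return counts

# An invented miniature "corpus" for illustration
text = ("the danger was imminent and the danger was approaching "
        "we apprehend the danger").split()
print(neighbors(text, "danger"))
```

<p>A real model refines millions of observations like these into vector positions, but its raw material is exactly this kind of windowed co-occurrence.</p>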
        </tutorial>
      </section>
      <section>
        <head>Thinking with vectors</head>
        <slide>
          <figure>
            <graphic height="600px" url="../../../_utils/gfx/w2v_vector_space2.png"/>
          </figure>
        </slide>
        <lectureNote>
          <p>So this is interesting in itself: <list>
              <item>these clusters of words tell us something about how our corpus uses
                language</item>
              <item>it shows semantic connections between words </item></list></p>
          <p>It's also interesting because we can do further analysis: <list>
              <item>These <soCalled>neighborhoods</soCalled> aren’t just clusters of words that are
                impressionistically near one another: they are positioned in a spatial relationship
                to the rest of the model </item>
              <item>that spatial relationship can be described mathematically</item>
              <item>the position of each word is represented by a vector (essentially, vectors are
                lines that aim out at different angles and distances) </item>
              <item>this means that we can actually compare the position of one word mathematically
                with the position of another word, and we can represent the difference in their
                positions as: <emph>another vector!</emph>
              </item>
              <item>We don’t want to examine that math just yet, but we can take advantage of it.
              </item>
            </list>
          </p>
<p>If you had a chance to read Ryan Heuser's analysis of <q>riches</q> and <q>virtue</q>,
            or Ben Schmidt's analysis of the Rate My Professor data, where he considers
              <quote>breaking down the gender binary</quote>, you saw both writers taking advantage
            of this same idea: <list>
              <item>that we can use these vectors, these spatialized relationships between words, as
                an analytical tool</item>
              <item>and that although in a sense <soCalled>space</soCalled> is a metaphor here (or
                at least a purely mathematical kind of reality), nonetheless it has a level of
                internal consistency and truth-value that means we can do meaningful analyses based
                on it.</item>
            </list>
          </p>
        </lectureNote>
        <tutorial>
          <p>Here is an interesting thought: <list>
            <item>these clusters of words tell us something about how our corpus uses
              language</item>
<item>these clusters show semantic connections between words: they show us how words are
            related to one another within the world of the corpus. To be precise, though, the model
            only knows the language it has access to, so the relationships these clusters represent
            are the relationships words have within this specific corpus, not in language more
            broadly</item></list></p>
          <p>Let's analyze this idea a little further: <list>
<item>These <soCalled>neighborhoods</soCalled> aren’t just clusters of words that are
              impressionistically near one another: they are positioned in a spatial relationship
              to the rest of the model. Words don't just have relationships with other words in a
              shared cluster; clusters also have relationships with other clusters, represented by
              their distance or proximity in what is called vector space</item>
            <item>because word embedding models represent words numerically, that spatial relationship can be described mathematically</item>
            <item>the position of each word is represented by what is called a vector (essentially, vectors are
              lines that aim out at different angles and distances; they allow us to figure out where exactly in vector space a word is located) </item>
            <item>since vectors allow us to locate words in this space, this means that we can actually compare the position of one word mathematically
              with the position of another word, and we can represent the difference in their
              positions as: <emph>another vector</emph>. This means that we can use addition and subtraction to analyze language!
            </item>
            <item>We don’t want to examine that math just yet, but we can take advantage of it.
            </item>
          </list>
          </p>
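<p>The idea that the difference between two word positions is itself a vector we can reason with can be sketched in a few lines of Python. The two-dimensional coordinates below are invented for illustration (a trained model learns tens or hundreds of dimensions automatically), but the arithmetic is the same:</p>

```python
from math import sqrt

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (sqrt(sum(a * a for a in u)) * sqrt(sum(b * b for b in v)))

# Invented 2-D positions; a real model would learn these from a corpus
vecs = {
    "king":  (0.9, 0.8),
    "queen": (0.9, 0.1),
    "man":   (0.1, 0.8),
    "woman": (0.1, 0.1),
}

# The difference of two positions is itself a vector we can add to a third
target = tuple(k - m + w for k, m, w in zip(vecs["king"], vecs["man"], vecs["woman"]))

# Which remaining word lies closest to king - man + woman?
best = max((w for w in vecs if w != "king"), key=lambda w: cosine(target, vecs[w]))
print(best)  # queen
```

<p>With a trained model, a library such as gensim lets you ask the same question with a call like <mentioned>most_similar(positive=…, negative=…)</mentioned>.</p>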
          <p>If you get a chance to read Ryan Heuser's analysis of <q>riches</q> and <q>virtue</q>,
            or Ben Schmidt's analysis of the Rate My Professor data, where he considers
            <quote>breaking down the gender binary</quote>, both writers are taking advantage of this same
            idea: <list>
              <item>that we can use these vectors, these spatialized relationships between words, as
                an analytical tool</item>
              <item>and that although in a sense <soCalled>space</soCalled> is a metaphor here (or
                at least a purely mathematical kind of reality), nonetheless it has a level of
                internal consistency and truth-value that means we can do meaningful analyses based
                on it.</item>
            </list>
          </p>
        </tutorial>
      </section>

      <section>
        <head>Locating words in vector space</head>
        <slide>
          <figure>
            <graphic height="600px" url="../../../_utils/gfx/w2v_word_proximities.png"/>
          </figure>

        </slide>
        <lectureNote>
          <p>So how do those words get located in this <soCalled>space</soCalled>? What does the
            spatial metaphor really mean?</p>

          <p>We will go into the details much more fully, very soon. But for this initial
            orientation: <list>
              <item>This model of our corpus, in which each word is represented by a vector, is
                created through a <soCalled>training</soCalled> process, in which a software program
                works its way through the text, over and over, making observations about what words
                appear near one another</item>
              <item>essentially, building a model of the corpus that addresses the question <q>If I
                  have word X, what words are most likely to appear nearby?</q></item>
              <item>at each observation, it adjusts the position of the words</item>
              <item>by the end of the training process, the model contains very detailed information
                about where each word is positioned relative to all or most of the others: this
                information is more detailed the more thoroughly we do the training</item>
              <item>this training process can be varied depending on what actual task or insight or
                research we're trying to support: if we are Google and we're trying to develop text
                prediction systems, the most interesting words will be the single word right after
                word X. On the other hand, if we're digital humanists and we're trying to understand
                discourse more generally, the words surrounding word X might all be equally
                interesting. And in fact different researchers might be interested in the words very
                close to word X (words that suggest how syntax behaves) or in the words more loosely
                associated (which might suggest conceptual connections)</item>
            </list>
          </p>
          <p>This slide shows some actual quotations from WWO where the word
              <mentioned>danger</mentioned> occurs: <list>
              <item>if we imagine the training process working its way through the text and making
                observations, we can see that when it encounters the word
                  <mentioned>danger</mentioned> it repeatedly sees words nearby like
                  <mentioned>approaching</mentioned>, <mentioned>imminent</mentioned>,
                  <mentioned>apprehend</mentioned>: terms that convey futurity, threat, warning,
                causality, states of knowledge: these establish a semantic context</item>
<item>there are also function-words that appear nearby that don't carry semantic
                associations, but do establish that <mentioned>danger</mentioned> is a noun that can
                be the object of prepositions like <mentioned>to</mentioned> and can itself govern
                prepositions like <mentioned>of</mentioned> (as in <mentioned>danger
                of</mentioned>), which would assist in the Google-word-prediction kinds of
                tasks.</item>
            </list>
          </p>
        </lectureNote>
        <tutorial>
          <p>So how do those words get located in this <soCalled>space</soCalled>? What does the
            spatial metaphor really mean?</p>
          
          <p>We will go into the details much more fully, very soon. But for this initial
            orientation: <list>
              <item>This model of our corpus, in which each word is represented by a vector, is
                created through a <soCalled>training</soCalled> process, in which a software program
                works its way through the text, over and over, making observations about what words
                appear near one another</item>
              <item>essentially, the program is working towards building a model of the corpus that addresses the question <q>If I
                have word X, what words are most likely to appear nearby?</q></item>
              <item>at each observation, it adjusts the position of the words to reflect its increased understanding of the corpus</item>
              <item>by the end of the training process, the model contains very detailed information
                about where each word is positioned relative to all or most of the others within the corpus's vocabulary: this
                information is more detailed the more thoroughly we do the training</item>
              <item>this training process can be varied depending on what actual task or insight or
                research we're trying to support: if we are Google and we're trying to develop text
                prediction systems, the most interesting words will be the single word right after
                word X. On the other hand, if we're digital humanists and we're trying to understand
                discourse more generally, the words surrounding word X might all be equally
                interesting. And in fact different researchers might be interested in the words very
                close to word X (words that suggest how syntax behaves) or in the words more loosely
                associated (which might suggest conceptual connections)</item>
            </list>
          </p>
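<p>The <soCalled>observations</soCalled> made during training can be sketched as (target, context) pairs drawn from a sliding window. This toy Python function is an illustration, not the real implementation, but it shows how a window setting controls which words count as context:</p>

```python
def training_pairs(tokens, window=2):
    """Enumerate the (target, context) observations one training pass
    would make over a single sentence."""
    pairs = []
    for i, target in enumerate(tokens):
        for j in range(max(0, i - window), min(len(tokens), i + window + 1)):
            if j != i:
                pairs.append((target, tokens[j]))
    return pairs

sentence = "danger was approaching the city".split()
pairs = training_pairs(sentence, window=1)
print(pairs)
```

<p>With <mentioned>window=1</mentioned> only immediate neighbors are observed, favoring syntactic information; widening the window pulls in the looser associations that suggest conceptual connections.</p>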
          <p>This slide shows some actual quotations from WWO where the word
            <mentioned>danger</mentioned> occurs: <list>
              <item>if we imagine the training process working its way through the text and making
                observations, we can see that when it encounters the word
                <mentioned>danger</mentioned> it repeatedly sees words nearby like
                <mentioned>approaching</mentioned>, <mentioned>imminent</mentioned>,
                <mentioned>apprehend</mentioned>: terms that convey futurity, threat, warning,
                causality, states of knowledge: these establish a semantic context</item>
<item>there are also function-words that appear nearby that don't carry semantic
                associations, but do establish that <mentioned>danger</mentioned> is a noun that can
                be the object of prepositions like <mentioned>to</mentioned> and can itself govern
                prepositions like <mentioned>of</mentioned> (as in <mentioned>danger
                of</mentioned>), which would assist in the Google-word-prediction kinds of
                tasks.</item>
            </list>
          </p>
        </tutorial>
      </section>
      <section>
        <head>Multidimensionality</head>
        <slide>
          <figure>
            <graphic height="600px" url="../../../_utils/gfx/w2v_word_relationships_coordinates.png"/>
            <!--<graphic height="600px" url="../../../_utils/gfx/w2v_word_relationships.png"/>-->
          </figure>

        </slide>
        <lectureNote>
          <p>You may be thinking, as I did, <q>words have many different associations: if
                <soCalled>location in space</soCalled> is representing the semantic affiliations of
              each word, how can a word be in multiple places at one time?</q>
            <list>
              <item>In three-dimensional space, this would indeed be very difficult</item>
              <item>but in our word vector model, there are enormous numbers of dimensions; very
                difficult to picture</item></list></p>
          <p>In this diagram, on the left, the word <mentioned>bank</mentioned> has two
            associations: <list>
              <item>with the semantic space of money, and with the semantic space of rivers</item>
<item>in this very simple view, each of those relationships is expressed as a single
                dimension (the <mentioned>river</mentioned> association is on the y axis and the
                  <mentioned>money</mentioned> association is on the x axis)</item>
              <item>each line only has dimensionality/distance on that one axis, and the location of
                  <mentioned>bank</mentioned> is thus defined by two dimensions (easy to draw on a
                slide)</item>
            </list>
          </p>
          <p>On the right, we have a more complicated situation: the word <mentioned>set</mentioned>
            has many more associations. We can't draw an equivalent diagram, but we can still
            imagine: <list>
              <item>each relationship is on a single, distinct dimension</item>
              <item>there are just way more than two or three of these dimensions (we have to
                imagine them all sprouting off in five-dimensional space)</item>
              <item>and the position of <mentioned>set</mentioned> is defined by five
                dimensions</item>
              <item>so it's not that the word is in five different places at a time, but rather that
                its unique location within this cloud of vectors is based on information about those
                five relationships</item>
            </list>
          </p>
          <p>If this feels baffling right now, don't worry--in my experience this idea takes a
            little time to sink in. Let it sit in your mind as a metaphor for now: a big cloud of
            words, with neighborhoods of related words; closer words are more closely related.</p>
          <p>Questions at this stage?</p>
        </lectureNote>
        <tutorial>
          <p>You may be thinking, <q>words have many different associations: if
            <soCalled>location in space</soCalled> is representing the semantic affiliations of
            each word, how can a word be in multiple places at one time?</q>
            <list>
              <item>In three-dimensional space, this would indeed be very difficult</item>
              <item>but in our word vector model, there are enormous numbers of dimensions; it's very
                difficult to picture</item></list></p>
          <p>In this diagram, on the left, the word <mentioned>bank</mentioned> has two
            associations: <list>
              <item>with the semantic space of money, and with the semantic space of rivers</item>
<item>in this very simple view, each of those relationships is expressed as a single
                dimension (the <mentioned>river</mentioned> association is on the y axis and the
                <mentioned>money</mentioned> association is on the x axis)</item>
              <item>each line only has dimensionality/distance on that one axis, and the location of
                <mentioned>bank</mentioned> is thus defined by two dimensions (easy to draw on a
                slide)</item>
            </list>
          </p>
          <p>On the right, we have a more complicated situation: the word <mentioned>set</mentioned>
            has many more associations. We can't draw an equivalent diagram, but we can still
            imagine: <list>
              <item>each relationship is on a single, distinct dimension</item>
              <item>there are just way more than two or three of these dimensions (we have to
                imagine them all sprouting off in five-dimensional space)</item>
              <item>and the position of <mentioned>set</mentioned> is defined by five
                dimensions</item>
              <item>so it's not that the word is in five different places at a time, but rather that
                its unique location within this cloud of vectors is based on information about those
                five relationships</item>
            </list>
          </p>
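<p>The left-hand diagram can also be sketched numerically. The coordinates below are invented, with the x axis standing for the <mentioned>money</mentioned> association and the y axis for the <mentioned>river</mentioned> association; the distance formula that locates <mentioned>bank</mentioned> relative to its neighbors works the same way in two dimensions as in hundreds:</p>

```python
from math import dist  # Euclidean distance (Python 3.8+)

# Invented coordinates: x = "money" association, y = "river" association
coords = {
    "bank":   (0.7, 0.6),
    "loan":   (0.9, 0.0),
    "shore":  (0.0, 0.9),
    "teacup": (0.1, 0.1),
}

# Distance from "bank" to each of the other words
for word in ("loan", "shore", "teacup"):
    print(word, round(dist(coords["bank"], coords[word]), 2))
```

<p>In these made-up coordinates <mentioned>loan</mentioned> and <mentioned>shore</mentioned> both sit nearer to <mentioned>bank</mentioned> than the unrelated <mentioned>teacup</mentioned> does, which is the spatial intuition the slide is drawing on.</p>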
          <p>If this feels baffling right now, don't worry--this idea takes a
            little time to sink in. Let it sit in your mind as a metaphor for now: a big cloud of
            words, with neighborhoods of related words; words that are closer are more closely related semantically.</p>
        </tutorial>
      </section>

      <section>
        <head>Factors that affect the behavior of the model</head>
        <slide>
<p><emph>Size</emph> of the corpus: <list><item>a larger corpus supports more precise
                positioning of uncommon words</item></list></p>
          <p><emph>Content</emph> of the corpus: <list>
              <item>genre? </item>
              <item>uniformity of language?</item>
            </list></p>
          <p><emph>Preparation</emph> of the corpus: <list>
              <item>correction of errors (e.g. from OCR)</item>
              <item>elimination of <soCalled>noise</soCalled></item>
            </list>
          </p>
          <p>The <emph>training process</emph>: <list>
              <item>parameters!! (coming up soon)</item>
            </list>
          </p>
        </slide>
        <lectureNote>
          <p>I mentioned earlier that we need to be attentive and critical about how this model is
            created; there are a number of things that affect how a word embedding model will
            perform for us.</p>
          <p>The <emph>size</emph> of the corpus matters a lot (and you'll remember that we
            specified that you had to have at least a million words): <list>
              <item>this is because the training process, where we actually create the model, starts
                from zero information: everything the model knows about where words are located, it
                learns from that training process, which goes through the text and observes what
                words are near what other words</item>
              <item>for common words, the training process gets a lot of data very quickly, but for
                uncommon words there's less information available</item>
              <item>it takes a certain minimum size corpus to provide enough information about each
                word (from repeated usage) to make the model reasonably accurate in its
                representation of less common words</item>
              <item>what other factors might be in play here? When might we be able to get away with
                a smaller corpus?</item>
            </list>
          </p>
          <p>The content of the corpus also matters a lot: <list>
              <item>what if you have a corpus where there are no common words? (what would be an
                example of such a corpus?)</item>
              <item>what about a corpus in multiple languages?</item>
              <item>some genres are much more vocabulary-dense than others: for instance, poetry has
                more uncommon words, less filler; novels use more commonplace words; a corpus of
                technical documents might have a very large proportion of uncommon words (how might
                that affect our model?)</item>
            </list>
          </p>
          <p>The data preparation also matters a <emph>lot</emph> (and we're going to spend two
            whole sessions on this later on): <list>
              <item>remember that a <soCalled>word</soCalled> here is any token, any string of
                characters with space around it, so if the text has lots of typographical errors,
                each incorrect word will still count as a unique word; how might that affect our
                model?</item>
              <item>similarly, our corpus might contain things like page numbers, stage directions,
                running headers: would those be useful? inconvenient?</item> </list>
          </p>
          <p>And finally, the training process matters: <list>
              <item>during the training process, we can control various settings that affect what
                observations are made about the texts, and how that information is used</item>
              <item>we will also explore this at greater length over the next few days</item>
            </list>
          </p>
        </lectureNote>
        <tutorial>
          <p>As mentioned earlier, we need to be attentive and critical about how this model is
            created; there are a number of things that affect how a word embedding model will
            perform for us.</p>
          <p>The <emph>size</emph> of the corpus matters a lot (and you'll remember that we
            specified that you typically have to have at least a million words): <list>
              <item>this is because the training process, where we actually create the model, starts
                from zero information: everything the model knows about where words are located, it
                learns from that training process, which goes through the text and observes what
                words are near what other words</item>
              <item>for common words, the training process gets a lot of data very quickly, but for
                uncommon words there's less information available</item>
              <item>it takes a certain minimum size corpus to provide enough information about each
                word (from repeated usage) to make the model reasonably accurate in its
                representation of less common words</item>
              <item>what other factors might be in play here? When might we be able to get away with
                a smaller corpus?</item>
            </list>
          </p>
          <p>The content of the corpus also matters a lot: <list>
            <item>what if you have a corpus where there are no common words? (what would be an
              example of such a corpus?)</item>
            <item>what about a corpus in multiple languages?</item>
            <item>some genres are much more vocabulary-dense than others: for instance, poetry has
              more uncommon words, less filler; novels use more commonplace words; a corpus of
              technical documents might have a very large proportion of uncommon words (how might
              that affect our model?)</item>
          </list>
          </p>
          <p>The data preparation also matters a <emph>lot</emph> (and we're going to spend two
            whole sessions on this later on): <list>
              <item>remember that a <soCalled>word</soCalled> here is any token, any string of
                characters with space around it, so if the text has lots of typographical errors,
                each incorrect word will still count as a unique word; how might that affect our
                model?</item>
              <item>similarly, our corpus might contain things like page numbers, stage directions,
                running headers: would those be useful? inconvenient?</item> </list>
          </p>
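<p>A sense of why preparation matters comes through even in a trivial tokenizer. This sketch is one hypothetical choice among many (real preparation pipelines are more careful): it lowercases the text and keeps only alphabetic strings:</p>

```python
import re

def tokenize(text):
    """Lowercase and keep only alphabetic strings, so that 'Danger!' and
    'danger,' both count as the token 'danger'."""
    return re.findall(r"[a-z]+", text.lower())

raw = "Danger! The danger, p. 42, was imminent."
tokens = tokenize(raw)
print(tokens)
```

<p>Notice that the stray <mentioned>p</mentioned> from the page reference survives as a <soCalled>word</soCalled> even though the number is dropped: exactly the kind of noise that, multiplied across a corpus, distorts a model.</p>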
          <p>And finally, the training process matters: <list>
            <item>during the training process, we can control various settings that affect what
              observations are made about the texts, and how that information is used</item>
            <item>we will also explore this at greater length over the next few days</item>
          </list>
          </p>
        </tutorial>
      </section>
      <section>
        <head>Comparison with other forms of text analysis</head>
        <slide>
          <p>Other forms you might have heard of: <list>
              <item>Word frequency analysis and concordancing (for instance, Voyant Tools)</item>
              <item>Topic models</item>
            </list>
          </p>
        </slide>
        <lectureNote>
          <p>As part of our orientation, it may also be helpful to situate word embedding in
            relation to some other kinds of digital analysis we may already be familiar with; all of
            these are ways to get an understanding of <emph>texts at scale</emph></p>
          <p>Has anyone here already experimented with word frequency, for instance with Voyant
            tools? <list>
              <item>Just what it sounds like: computing the frequency of different words in the
                corpus, possibly comparing frequency of words between different texts</item>
              <item>including their relative frequency (that is, frequency that has been normalized,
                such as frequency per thousand words)</item>
              <item>useful as a way to get a sense of the vocabulary of a text</item>
              <item>can be used even on small collections and individual texts</item>
            </list>
          </p>
          <p>How about topic models: has anyone used those? For instance, tools like Mallet? <list>
              <item>Topic models are closer to word embedding models</item>
              <item>They are trained models: that is, we go through a training process that examines
                a text corpus and generates a model based on it</item>
              <item>A topic model assigns words to topics based on their occurrence within the same
                document: it gives you a view of the document collection that represents the
                  <soCalled>topics</soCalled> or patterns of word collocation that appear in
                them</item>
              <item>but it doesn't pay attention to where they occur within that document: it treats
                the whole document as a single <soCalled>bag of words</soCalled></item>
              <item>A topic model can be generated from a small text collection</item>
            </list>
          </p>
          <p>What's distinctive about word embedding models: <list>
              <item>they give you a view of semantic relationships and spaces within the model (i.e.
                the corpus) as a whole</item>
              <item>they pay much closer attention to word proximity than topic models do: they use
                information about the immediate context of a word</item>
              <item>they don't pay attention to individual documents during the training process
                (and there's no way to get back to the individual documents once the model is
                trained)</item>
              <item>they require a much larger corpus to get meaningful results</item>
              <item>they give us much more information about the semantics of individual words,
                whereas topic models mostly give us a view of the <soCalled>topics</soCalled> rather
                than the individual words in the topic</item>
            </list>
          </p>
          <p>The larger question of what word embedding models are distinctively good for is one
            that we will explore as a group in the rest of the institute!</p>
        </lectureNote>
        <tutorial>
          <p>As part of our orientation, it may also be helpful to situate word embedding in
            relation to some other kinds of digital analysis we may already be familiar with; all of
            these are ways to get an understanding of <emph>texts at scale</emph></p>
          <p>Have you ever experimented with word frequency, for instance with Voyant
            tools? <list>
              <item>This type of analysis is just what it sounds like: computing the frequency of different words in the
                corpus, possibly comparing frequency of words between different texts</item>
              <item>including their relative frequency (that is, frequency that has been normalized,
                such as frequency per thousand words)</item>
              <item>this method is useful as a way to get a sense of the vocabulary of a text</item>
              <item>it can be used even on small collections and individual texts</item>
            </list>
          </p>
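<p>Word frequency analysis is simple enough to sketch directly. This Python fragment counts tokens in an invented sample and normalizes to frequency per thousand words, the kind of relative frequency a tool like Voyant reports:</p>

```python
from collections import Counter

tokens = "the danger was great and the fear of danger was greater".split()
counts = Counter(tokens)
total = len(tokens)

# Relative frequency: raw count normalized per 1,000 words
for word, n in counts.most_common(3):
    print(word, n, round(n / total * 1000, 1))
```

<p>Because it is just counting, this method works on a single short text; no training and no minimum corpus size are needed.</p>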
          <p>How about topic models: is this methodology familiar to you? For instance, tools like Mallet? <list>
<item>Topic models are much more similar to word embedding models than word frequency
              counts are</item>
            <item>Topic models are trained models: that is, we go through a training process that examines
              a text corpus and generates a model based on it</item>
            <item>A topic model assigns words to topics based on their occurrence within the same
              document: it gives you a view of the document collection that represents the
              <soCalled>topics</soCalled> or patterns of word collocation that appear in
              them</item>
            <item>but it doesn't pay attention to where they occur within that document: it treats
              the whole document as a single <soCalled>bag of words</soCalled></item>
            <item>A topic model can be generated from a small text collection</item>
          </list>
          </p>
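The "bag of words" idea can be shown directly: a small Python sketch (the two sentences are invented) demonstrating that documents containing the same words in different orders are indistinguishable once word order is discarded:

```python
from collections import Counter

# Two invented "documents" containing the same words in a different order
doc_a = "love conquers all things".split()
doc_b = "all things love conquers".split()

# A bag-of-words representation keeps only the word counts,
# discarding word order entirely
bag_a = Counter(doc_a)
bag_b = Counter(doc_b)

print(bag_a == bag_b)   # True: word order is invisible to the model
```

This is exactly the information a topic model sees for each document, and exactly the information that word embedding models recover by attending to immediate word context.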
          <p>What's distinctive about word embedding models: <list>
            <item>they give you a view of semantic relationships and spaces within the model (i.e.
              the corpus) as a whole</item>
            <item>they pay much closer attention to word proximity than topic models do: they use
              information about the immediate context of a word</item>
            <item>they don't pay attention to individual documents during the training process
              (and there's no way to get back to the individual documents once the model is
              trained)</item>
            <item>they require a much larger corpus to get meaningful results</item>
            <item>they give us much more information about the semantics of individual words,
              whereas topic models mostly give us a view of the <soCalled>topics</soCalled> rather
              than the individual words in the topic</item>
          </list>
          </p>
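To give a feel for what a "view of semantic relationships and spaces" means in practice, here is a minimal sketch with invented three-dimensional vectors (a trained model would supply hundreds of dimensions learned from a large corpus) showing how cosine similarity measures closeness in the vector space:

```python
import math

# Invented toy word vectors, purely for illustration;
# real embeddings are learned, not hand-assigned
vectors = {
    "queen":  [0.9, 0.8, 0.1],
    "king":   [0.8, 0.9, 0.1],
    "carrot": [0.1, 0.1, 0.9],
}

def cosine(u, v):
    """Cosine of the angle between two vectors: 1.0 means same direction."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm

sim_royal = cosine(vectors["queen"], vectors["king"])
sim_odd = cosine(vectors["queen"], vectors["carrot"])
print(sim_royal > sim_odd)   # True: "queen" sits nearer "king" than "carrot"
```

Queries against a trained model (nearest neighbors, analogies, and so on) are all built on comparisons like this one.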
          <p>The larger question of what word embedding models are distinctively good for is one
            that we will explore later on in this walkthrough!</p>
        </tutorial>
      </section>

      <section>
        <head>Disclaimers! Questions?</head>
        <slide>
          <p>Questions?</p>
        </slide>
        <lectureNote>
          <p>I should note here: we have been working hard to understand word embedding models and
            develop this curriculum; however, the underlying math is undeniably challenging. At some
            points in the next few days, I anticipate that you're going to have questions that we
            actually can't answer, because we haven't yet fully mastered that deeper layer. We're
            going to treat these as learning and teaching moments! After all, these are also
            questions that our students and colleagues will be asking us. So part of what we're
            exploring here is how to understand the boundaries of what we know, and how to respond
            effectively based on that knowledge, whatever level we may be at.</p>
          <p>Questions at this stage?</p>
        </lectureNote>
        <tutorial>
          <p>As a quick disclaimer: we have been working hard to understand word embedding models and
            develop this curriculum; however, the underlying math is undeniably challenging. At some
            points we anticipate that you're going to have questions that will not be easily answered
            or covered in detail in this walkthrough because we haven't yet fully mastered that deeper layer. Let's 
            treat these as learning and teaching moments! After all, these are also
            questions that our students and colleagues will be asking us. So part of what we're
            exploring here is how to understand the boundaries of what we know, and how to respond
            effectively based on that knowledge, whatever level we may be at.</p>
          
        </tutorial>
      </section>

    </presentation>
  </text>
</TEI>
