Introduction to Word Vectors
Julia Flanders
2019-04-01
A Road Map
As we’ve already seen, word vectors are complicated...
The next few sessions are intended to offer an overview, from several different angles:
- a walk through the special concepts and terminology so that we’re all comfortable
with them
- a walk through the actual process of training and querying a model
- an exploration of the mathematical side of word embedding models: what do we mean
by vector space?
- a review of the tool set we use with word embedding models: what are the actual technologies
we use and what role do they play?
Hopefully by the end, we’ll have gone over the same material from enough different
perspectives that it will all make perfect sense!
And at various points, we’ll take a step back and think about the explanatory process
itself: what kinds of explanation might work best for different audiences (especially
readers of our scholarship, project collaborators, colleagues, grant reviewers, and
also potentially our students)?
Corpus and model
We’re going to hear the terms corpus and model a lot this week: let’s look more closely at those terms
Corpus:
- In the simplest sense, the corpus is the body of textual material we are analysing
- A set of documents in some machine-readable form, ready for the Word2Vec program
to ingest
- Our corpus might be derived from a larger research collection (or several different
collections), maybe in another format (like TEI/XML) that contains extra information
that we take advantage of when we generate the corpus that will be fed into Word2Vec
- So to get from the research collection to the corpus we might need to do some data conversion: from XML (or some other format) to plain
text (which is what the Word2Vec tool requires)
- And we might also need to do some cleaning and regularization, to tame the irregularities
of the original research collection. A little later on, we’ll think about data formats
and cleaning processes in more detail.
- So when we talk about the corpus here, we’re talking about the plain-text corpus that is ready to be fed into Word2Vec
Model:
- As we’ve already noted, the term model is an important one in digital humanities: in general terms, a model is a representation
of something we are interested in, that captures some features of importance, in a
way that makes it easier for us to examine and learn about that object of interest.
So for instance a TEI-encoded text is a representation of a text that makes the structure and content of that text easier for us to see and work with.
A word-embedding model of a corpus is a representation of a corpus of texts, in a way that makes the semantic relationships between words easier for us to see
and work with.
- Practically speaking, the model we will be dealing with is a processed version of the corpus, produced by the Word2Vec
tool, which represents each word’s position in the model as a vector
- So for now the key point is: the corpus is a collection of documents, while the model
is a processed, computed representation of the textual data contained in those documents
The data preparation process is how you get from the research collection to the corpus
The training process is how you get from the corpus to the model.
Parameters
Remember that we said different researchers might want to use the model for different
things, which would result in training/generating the model somewhat differently.
The way we control that training process is by adjusting a set of parameters.
You can think of the training process (where we take a corpus and create a model of
it) as being sort of like an industrial operation:
- you take some raw materials and feed them into a big machine, and on the other end
you get out some product
- and this hypothetical machine has a whole bunch of knobs and levers on it that you
use to control the settings
- in our Word2Vec model training, the parameters are those knobs and levers that control
the training process
- depending on how you adjust them, you get differently trained models with different
behaviours
We’ll take a quick look now at two of these parameters, so that you can get a sense
of how they affect the training process; they also have an important impact on how
we interpret the results of the model. Later in the week, we’ll look at these parameters
in more detail and think about the effect these specific settings have on our models.
Window
The first parameter for us to consider is the concept of the window
And here we come to a fundamental assumption for a lot of text analysis: that words
that are used together have something to do with one another
What does it mean for words to be used together?
- right next to one another? all or nothing?
- more relevant the closer they are? sort of a gradient?
- contained within the same semantic construct, like a sentence or a paragraph? (problem:
we’re working with plain text so we don’t have access to semantic constructs)
In Word2Vec, instead of these, we use a window:
- a span of text of a specified length, like a viewing port that we move over the text
that allows us to see X words at a time
- we can control the size of the window (it is one of the parameters we just talked about)
- the Word2Vec algorithm is like a bookworm reading its way through the text, bite by
bite
- each taste is localized by the window: each bite gives the processor a set of words
that are considered used together
- and the size of the bite affects how many words are considered together in this way
- a bigger window lets us treat larger groups of words as related
- [pause and discuss for a moment:] what might be the results for our analysis of a
larger or smaller window? (imagine a window that is thousands of words, as big as
an entire chapter; imagine a window that is only two words wide)
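To make the window concrete, here is a minimal sketch of how a symmetric window turns running text into target/context pairs. (The sketch is in Python for compactness, though the workshop itself works in R; the sentence and window size are invented purely for illustration.)

```python
# A minimal sketch of a symmetric context window, as used in Word2Vec-style
# training: for each target word, collect the words within `window` positions
# on either side. Illustrative only; real tools do this internally.

def context_pairs(tokens, window=2):
    pairs = []
    for i, target in enumerate(tokens):
        lo = max(0, i - window)
        hi = min(len(tokens), i + window + 1)
        for j in range(lo, hi):
            if j != i:
                pairs.append((target, tokens[j]))
    return pairs

text = "the bank raised its interest rate".split()
for target, context in context_pairs(text, window=2)[:4]:
    print(target, "->", context)
```

With window=2, the word bank gets paired with the, raised, and its, but not with rate; a wider window would pull in more distant companions.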
Remember that this is a machine learning process and moreover it is an unsupervised machine learning process: one that starts from a state of complete ignorance and has to bootstrap itself.
- So another way to imagine the approach being taken in the training process: picture
that you have a big bag containing all the words in the corpus. You shake the bag
and then dump it out on the floor. Now you start reading the corpus (i.e. the actual
texts with their actual word order).
- Each time you read a word, you make observations about the words around it.
- Remember that this one observation doesn’t give you any kind of absolute Truth about
those words: it’s just one little observed fact. Probabilistically, it contributes
a tiny bit to our belief about the whole corpus.
- So based on those observations about word X, we move each of the context words a tiny
bit closer to word X. Now we look at the next word X and its companions, and we move those words a little bit.
- note that the window is giving us two pieces of information: what’s in the window, and what’s not in the
window. We’ll come back to this in more detail later, but for now we can say that
in addition to moving the words we do see, we also update the position of some of the words we don’t see as we read the text.
Iterations
We’ve talked about the creation of a model as a training process, and we’ve just imagined it as a bookworm eating its way through the text,
repeatedly. The trained model is the representation of the probability that words
appear within the same window.
- As we just noted, the model begins in a state of complete randomness: words dumped
on the floor. But after one read through the corpus, the words on the floor have moved
around a bit. The machine is learning! Now, if we repeat the process, we can move
them a bit further--it might seem as if we’re getting the same information as we got
before, but because the words on the floor are now in different (better) positions
already, what we’re doing is refining that information further.
- each pass through the corpus provides another set of adjustments, making the model
more accurate
- each of these passes is called an iteration, and the more iterations the training process does, the more accurate the model (but
of course the more time the process takes)
- you can control the number of iterations: it is another of the parameters we mentioned a moment ago
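The words-dumped-on-the-floor picture and the effect of iterations can be sketched as a toy loop. (Python for compactness; this is emphatically not the real Word2Vec update rule, which trains a small neural network. It only shows the intuition: random starting positions, a tiny nudge per observation, and repeated passes over the corpus. The corpus, learning rate, and two-dimensional space are all invented.)

```python
import random

# Toy illustration of iterative training: words start at random positions,
# and each pass ("iteration") over the corpus nudges each observed context
# word a small step toward its target word.

random.seed(1)
corpus = ["the cat sat", "the cat purred", "the dog barked"]
vocab = sorted({w for line in corpus for w in line.split()})

# words dumped randomly "on the floor": one 2-D position per word
pos = {w: [random.uniform(-1, 1), random.uniform(-1, 1)] for w in vocab}

def train(iterations, lr=0.1, window=1):
    for _ in range(iterations):
        for line in corpus:
            toks = line.split()
            for i, target in enumerate(toks):
                for j in range(max(0, i - window), min(len(toks), i + window + 1)):
                    if j == i:
                        continue
                    # move the context word a tiny bit toward the target
                    for d in range(2):
                        pos[toks[j]][d] += lr * (pos[target][d] - pos[toks[j]][d])

def dist(a, b):
    return ((pos[a][0] - pos[b][0]) ** 2 + (pos[a][1] - pos[b][1]) ** 2) ** 0.5

before = dist("cat", "the")
train(iterations=10)
print(before, "->", dist("cat", "the"))
```

Each additional pass shrinks the distance between words that keep appearing together; try iterations=1 versus iterations=10 to watch the effect.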
Vectors: a first look
Let’s look next at some terms that may seem most distant from our humanistic expertise:
the ones that refer to the mathematical aspects of word embedding models. The word
vector has come up already: what is a vector and how is it relevant in this case? We’ll
start with a simple explanation first, and then circle back a bit later for more detail.
A vector is basically a line that has both a specific length and a specific direction
or orientation in space:
- we can describe that line using coordinates: one coordinate for each axis of information
we have about the line
- in this example, the vector is the thick black line that starts at the origin (the
point where all three axes are at zero) and extends out to the point in space where
the x axis (the blue number) is at 3, the y axis (the red number) is at 2, and the
z axis (the green number) is at zero
- its direction and length are defined by those three dimensions
- any questions? This may be new for some and probably not current knowledge for most
of us!
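If it helps to see the slide’s example as numbers: assuming the vector (3, 2, 0) from the diagram, its length follows from the three-dimensional version of the Pythagorean theorem. A quick Python check:

```python
import math

# The vector from the slide: x = 3, y = 2, z = 0.
# Its direction is given by the coordinates themselves; its length
# (magnitude) comes from the 3-D Pythagorean theorem.
v = (3, 2, 0)
length = math.sqrt(v[0] ** 2 + v[1] ** 2 + v[2] ** 2)
print(round(length, 3))  # sqrt(13) ≈ 3.606
```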
In a word-embedding model, the model represents a text corpus almost like a dandelion:
as if each word were at the end of one of the little dandelion threads:
- each thread projects at a slightly different angle
- each word is located at a slightly different point in this cloud of words
- and words that are nearer to one another in meaning are also nearer to one another
in vector space.
Cosine Similarity: What is a cosine anyway?
So what does it mean to be near something in vector space? How do we measure this kind of proximity or association?
If we understand these vectors as lines whose directionality and length reflect word
associations in the corpus, then the more closely aligned two vectors are (the more
nearly they point in the same direction), the nearer they are for our purposes.
We can measure that alignment by using a mathematical expression called a cosine. What is a cosine?
- If we have two vectors (two lines extending out in different directions), what we
really have is a triangle (the third leg would be the line connecting the ends of
those two vectors)
- Within a right triangle, the cosine of an angle is the ratio between the side adjacent
to that angle and the hypotenuse
- the exact formula (for right triangles) is shown here on the slide; for triangles
without a right angle, the formula is a little more complex
- what matters for us is that the cosine depends on the angle between the two vectors:
the smaller that angle gets (the more nearly the two lines point in the same direction),
the closer the cosine gets to 1
Cosine Similarity
So now we can come back to our question of how to measure nearness. In word embedding models the measure of nearness that we use is something called
cosine similarity.
- Roughly speaking, this is a measure of the similarity of two vectors, based on the
cosine of the angle between them
- As we’ve seen, the more nearly two vectors point in the same direction, the closer
this value gets to 1. Strictly speaking, cosine similarity ranges between -1 and 1:
two identical vectors have a cosine similarity of 1; two unrelated (perpendicular)
vectors have a cosine similarity of zero; and two vectors pointing in opposite directions
have a cosine similarity of -1 (though strongly negative values are rare in practice)
- So the smaller the cosine similarity, the less similar the words are, and the farther
apart they are in vector space
- We’ll talk a bit later on about what level of similarity really counts as similar, and you’ll get a feel for it
- In general, anything above .5 starts to feel meaningful
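For those who like to see the formula in action, here is a minimal Python version of the calculation: the cosine similarity of two vectors is their dot product divided by the product of their lengths. (The three-dimensional vectors are invented; real word vectors have hundreds of dimensions, but the formula is identical.)

```python
import math

# Cosine similarity: dot(a, b) / (|a| * |b|).
# ~1 means the vectors point the same way; 0 means they are unrelated
# (at right angles); -1 means they point in opposite directions.

def cosine_similarity(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

print(round(cosine_similarity((3, 2, 0), (3, 2, 0)), 3))  # identical -> 1.0
print(round(cosine_similarity((1, 0, 0), (0, 1, 0)), 3))  # right angle -> 0.0
print(round(cosine_similarity((3, 2, 0), (2, 3, 1)), 3))  # similar-ish direction
```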
So in this example (a real-world example from the WWP corpus), if we take the word
sacred as our starting point, the words holy and consecrated are fairly close in meaning (and have high cosine similarity); the word shrine is more distant but still related enough to be interesting
So far so good? Questions?
Querying
So what can we do with this information? We’ve created a model of our corpus (a representation
that helps us see some aspect of that information more clearly and easily): how do
we use it?
The first thing we might try is just querying the model about the neighborhood of
a word we’re interested in: essentially, asking it questions about where specific
words are located and what is around them:
- this slide shows a simple example using the WWP’s Women Writers Vector Toolkit, but
in this workshop we will be working in the RStudio environment that we looked at in
the last session, so we can design much more complex queries
- we can enter a search term, and get back a list of the words that are closest to it
in vector space: that is, words that are probably semantically related to it, based
on the way those words appear together in the corpus
- as we can see from this list, these aren’t necessarily synonyms: there are many different
ways words can be related as we will discover
- but they are words that tend to appear in the same contexts as our query term (in
this case, discussions of families and familial roles and relationships)
Clustering
Another way we can interact with the model is to ask it more generally, where are your semantically dense zones? Or please show me some clusters of related words!
This process is somewhat similar to topic modeling:
- it says What if we divided up the corpus into X different zones? where are the centers of
those zones, and what is nearest to those centers?
- just as in topic modeling where we say, in effect, if our corpus has X number of topics, what would they be?
- or if we were looking at a map of a region, we might say if we were going to build ten new Home Depot stores, where should they go so that
most people have the shortest drive? and who lives in those regions?
To generate these clusters (as part of the initial model training process):
- the modeling program runs a clustering algorithm that randomly chooses a number of
locations within the vector space—in this case, three (like throwing a set of three
darts at the map)
- then, it makes a series of adjustments to those locations to move them closer to actual
population centers, places where words are close to one another within the vector space
- if we kept up the adjustment process long enough, the locations would settle into
stable positions: a good approximation of the three most significant semantic zones
within the model (strictly speaking, a local optimum rather than a guaranteed perfect result)
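The dart-throwing-and-adjusting procedure described above is essentially the k-means algorithm; here is a toy Python sketch on invented two-dimensional points (a real model’s space has a hundred or more dimensions, and real runs make many more adjustments).

```python
import random

# Toy version of the clustering idea: throw k random "darts" (centers),
# then repeatedly move each dart to the average position of the points
# nearest to it.

random.seed(42)
points = [(0.1, 0.2), (0.2, 0.1), (0.15, 0.25),   # one dense zone
          (0.9, 0.8), (0.8, 0.9), (0.85, 0.95)]   # another dense zone

def kmeans(points, k=2, steps=10):
    centers = random.sample(points, k)
    for _ in range(steps):
        groups = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k),
                          key=lambda i: (p[0] - centers[i][0]) ** 2
                                      + (p[1] - centers[i][1]) ** 2)
            groups[nearest].append(p)
        # move each dart to the mean of its assigned points
        centers = [(sum(x for x, _ in g) / len(g), sum(y for _, y in g) / len(g))
                   if g else centers[i] for i, g in enumerate(groups)]
    return centers

print(kmeans(points))
```

After a few adjustment steps, the two darts land at the middles of the two dense zones.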
Clusters: an example
So what we get at the end of the process is clusters of words that are like neighborhoods within the vector space: densely populated areas where words are grouped together
around a concept or a textual phenomenon.
- in principle, the number of clusters is up to us
- in our own model training for this workshop, since we are working directly with the
R code, we can choose how many clusters we want
- the WWVT doesn’t give you this option but the WWP chose 150 as a reasonable number
- for the Toolkit we stop the process after 40 adjustments, so the clusters will come
out a bit different every time you reset them, but when running the code yourself
in RStudio you can control that more precisely.
- in this example, for instance (which shows three of the 150 clusters), there’s a cluster
that’s roughly associated with religiously-oriented death ceremony, and another one
that is old-fashioned cavalry-oriented warfare, but the one in the middle is harder
to describe as a concept: it’s more like the space of dialogue and spoken language markers
Check the time and consider stopping here!
Vector Math 1
One more thing we can do to explore the word information in our vector space model:
we can examine the relationships between words, taking advantage of the fact that
each word is represented as a vector, which is essentially a list of numbers we can
do math with
To understand how this works, we need to envision a little more clearly how words
are positioned in this vector space model:
- during the training process, the Word2Vec algorithm is examining the text (looking
through its little window at successive groupings of words)
- and with each observation, it adjusts the position of the words in the model to take
into account the word associations it observes
- so for a word like bank, it might observe some instances where that word is associated with words like funds and revenue, and so it moves bank closer to those words: it adds information that makes an association between these
words
- then maybe it observes some other instances where bank is associated with words like river and lake and Hudson, and it moves the word bank a little closer to those words
- so by the end, each word is positioned in vector space in a way that reflects its
associations (some weaker and some stronger) with many of the other words in the corpus
- we can think of each association as being like a rubber band that pulls a pair of
words together; each word is being pulled in multiple different directions, with different
strengths of association, and its net position is the result of all of those pulls
Vector Math 2
We can use this information to tease out more specific semantic spaces for individual
words:
- For instance, we might imagine that the word grace has some rubber band pulling it towards a word like beauty. What if we cut that rubber band? What part of the semantic field might pull grace more into their orbit if beauty were out of the running? We can find this out by subtracting the vector for beauty from the vector for grace: the result is a set of associations that are specifically religious
- Similarly, instead of cutting that rubber band, we might intensify its strength and
allow it to pull grace towards it more strongly (putting grace into a zone where its aesthetic associations are most powerful). We can do this by
adding the vector for beauty to the vector for grace.
Note that words here are just proxies or symptoms (imperfect ones) for the concepts
we might be interested in:
- As we think about what words to add or subtract, it’s important to think about how those words are related to the concept we’re
trying to examine (and it’s worth trying different words)
- Also, the semantic associations of words are very corpus-specific: in a corpus of
financial documents, the term grace might be exclusively associated with the grace period for bill payment
- So knowing our corpus is really crucial
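The rubber-band cutting can be written out as literal vector subtraction. (A Python sketch with invented three-dimensional vectors; imagine the dimensions as loosely tracking religious, aesthetic, and financial associations, though in a real model the dimensions are learned and unlabeled.)

```python
import math

# Toy illustration of vector math on words: all vectors here are invented
# 3-D stand-ins for real learned vectors.
model = {
    "grace":  (0.7, 0.7, 0.1),
    "beauty": (0.1, 0.9, 0.1),
    "mercy":  (0.9, 0.2, 0.1),
    "period": (0.1, 0.1, 0.9),
}

def sub(a, b):
    return tuple(x - y for x, y in zip(a, b))

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) *
                  math.sqrt(sum(y * y for y in b)))

# "cut the rubber band": remove the beauty-ward pull from grace
query = sub(model["grace"], model["beauty"])
best = max((w for w in model if w not in ("grace", "beauty")),
           key=lambda w: cosine(query, model[w]))
print(best)
```

Subtracting beauty from grace leaves a direction that, among these toy words, points closest to mercy; adding instead of subtracting would intensify the aesthetic pull.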
Validation
As we use our model in these various ways, we’re going to get some results (hopefully)
that look very predictable, and some others that look provocative and fascinating,
and maybe some others that look bizarre and unexpected. How can we tell the difference
between an interpretive breakthrough and a glitch resulting from some terrible flaw
in our training process?
Once we’ve generated a model, there are ways we can and should test it to see whether
it is actually a useful representation that will give research results we can use.
That testing process is called validation. To validate a model, we can ask questions like these:
Are your results consistent across models?
- When you train a series of models on the same corpus using the same parameters, do
you get consistent cosine similarities for the same sets of words?
- (Note: because training a model is a probabilistic process, you won’t get identical
results from model to model, even if they’re trained on exactly the same corpus, but
the results should be comparable.)
Do you get plausible word groupings?
- When you generate groups of similar terms (either by generating clusters, or by querying a specific word), do you get
plausibly related groups of words for common and moderately common query terms? (Common
within your corpus, that is!)
- If you don’t get plausible groupings for moderately common words, this would be a
sign to proceed with caution; if you don’t get plausible groupings for even common
words, this would be a sign that the model may not be very useful (this might be because
of small corpus size, or some other factor).
Does vector math work as you would expect?
- When you do the various forms of vector math (addition, subtraction) do your results
continue to seem plausible?
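The consistency check can be sketched concretely: train two models the same way, then compare cosine similarities for a handful of probe pairs. (Python sketch; the two "models" below are invented stand-ins for two real training runs, and the 0.1 tolerance is an arbitrary illustrative threshold.)

```python
import math

# Sketch of one validation check: given two models trained on the same
# corpus with the same parameters, compare cosine similarities for a set
# of probe word pairs. In practice you would load two actually-trained
# models rather than these invented 2-D vectors.

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) *
                  math.sqrt(sum(y * y for y in b)))

model_a = {"sacred": (0.9, 0.1), "holy": (0.8, 0.2), "shrine": (0.6, 0.5)}
model_b = {"sacred": (0.88, 0.13), "holy": (0.82, 0.18), "shrine": (0.58, 0.52)}

probes = [("sacred", "holy"), ("sacred", "shrine")]
for w1, w2 in probes:
    sim_a = cosine(model_a[w1], model_a[w2])
    sim_b = cosine(model_b[w1], model_b[w2])
    # not identical (training is probabilistic), but should be comparable
    status = "ok" if abs(sim_a - sim_b) < 0.1 else "CHECK"
    print(w1, w2, round(sim_a, 3), round(sim_b, 3), status)
```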
If we didn’t stop before, consider stopping now!
Circling back: another look at vectors
Now that we’ve worked through the basic concepts, let’s circle back and consider the
whole picture of word vectors or word embedding models, and introduce a few additional complexities.
[if starting the day here, check in and see if people want to recap anything]
A quick review: we’ve already noted that a vector is basically a line that has both
a specific length and a specific direction or orientation in space:
- so here again in this example, the vector is a line that starts at the origin (the
point where all three axes are at zero) and extends out to the point in space where
the x axis is at 3, the y axis is at 2, and the z axis is at zero
- we can think of those three axes as representing three pieces of information: together,
they constitute a unique vector in three-dimensional space.
- I’m going to pause here for a moment and let the diagram sink in a bit more, because
at this stage in the explanation, it helps to have a sense of what the diagram is
telling us. Does anyone want to test out their understanding of how those three axes
(the x, y, and z) are contributing information to the direction and distance of that
vector? Does everyone see how the blue number 3 comes from the blue x axis? etc.?
Words as vectors
The example we were just looking at shows a vector defined by three dimensions: three
different numbers representing three different axes of meaning. However, when we’re
working with word embedding models, we are working with vectors that are defined by
many more dimensions. So in order to understand that scenario, we need to get a little
more comfortable with two ideas:
- a vector is just an assemblage of dimensions
- each dimension represents an association that has been observed
So let’s take the first example on this slide (the idea may look familiar if you read
Jay Alammar’s illustrated introduction to word embeddings):
- Our little chart here shows three people (Jo, Lee, Robin) and for each person it shows
an assemblage of dimensions
- each dimension represents an association
- in this case, that association has to do with the person’s affinity for specific animals:
perhaps through observations of how many pets of each type they have, or their response
when they encounter the animal in the wild
- So each person in this chart is represented by a vector with five dimensions: a line
in five-dimensional space
- if we want to compare two people and find out whether they tend to like the same animals,
impressionistically we can say that people with high or low affinity for the same animals are similar: the color coding is highlighting this
pattern.
- But if we want a quantitative way to talk about that similarity, we can use the measure
called cosine similarity, which is a way of measuring the angle between two lines.
- Here, we’re doing exactly the same thing, except that our lines are defined by five dimensions instead of three.
- the calculation isn’t hard (you can find an Excel version on the web!) and what it
shows us is that the cosine similarity between our two mammal-lovers is very high,
whereas the similarity between the mammal-lovers and the person who prefers birds/lizards/beetles
is quite low.
Pause for questions and reflection!
So, taking this a step farther, let’s look at the chart on the right:
- Here, instead of looking at people and their association with animals, we’re looking
at words and their association with other words
- those other words have been observed in proximity (in the window) with our target word, to a greater or lesser extent
- we’re not giving numbers here, but imagine that the green boxes are the ones that
were observed more often, and the orange boxes are the words that were observed less
often, and maybe the greenish-yellow boxes were somewhere in the middle
So what do we see when we look at the righthand chart?
- What kind of cosine similarity would we expect to find between danger and peril? A high or low similarity?
- How about between danger and horses? Horses and goats?
A few interesting things to note:
- all of the words are contributing information to each of the vectors, even when the
actual association observed is low (I’ll come back to this in a minute)
- and in fact the chart goes way off the edge of the screen to the right: there could
in principle be hundreds of words contributing to the distinctive vector that is danger
Negative Sampling
So let’s now add another concept. Cast our minds back to the little bookworm eating
through the corpus, making observations about the words that are near the target word, and adjusting the position of the words within the model. The information
about those words that it observes is being fed into our little chart here. But how
about the words that aren’t being observed?
We mentioned earlier that these are also significant. When the bookworm takes a bite,
there are a huge number of words that are not in that sample, and the model training
process could (in principle) use that information to adjust all of the words in the
corpus, moving them away from the target word. In practice, it doesn’t adjust all of the words (since that would be too much work) but it adjusts
some of the words: a random sample. This is called negative sampling, and it is one of the parameters we can adjust: we can say how many of these non-appearing
words should have their positions updated with each observation. If we have a large
negative sampling value, the model training will be more precise, but the training
process will take a lot longer.
Looking again at our chart: If time and computing power were no object, we could imagine
the chart extending off to the right so that every word in the corpus is listed, and
we could imagine the position of every word in the model being adjusted with each
observation, so that both the positive and negative sampling information would be
fully reflected in the model. We could think of this situation as a kind of perfect model:
- showing all words exerting some probabilistic influence on each other
- in terms of text prediction, all words have some probability of being the next word even if that probability is very, very low
- in this perfect model, the vector for each word has as many dimensions as there are words in the
corpus
Let’s test this idea a little further:
- imagine that the window is the size of the corpus: now all words are related to all
other words equally! Let that sink in for a moment: our understanding of the relatedness of words is strongly determined by our observational parameters: it isn’t intrinsic,
it’s something we control.
- And in fact in some forms of unsupervised modeling, like a topic model, which operates
on the whole document, the window size is in effect the entire document: the model
training process says which words appear in the same document?
- But in word embedding models, our concept of relatedness is a bit more precise than
this: we are interested in things that are happening more at the sentence or phrase
level, where the association between words reflects the way writers are actually articulating
specific ideas
One more look at our perfect model:
- note that it contains a lot of empty space: places where we are noticing that in fact
the word toothbrush is not related to the words danger, horses, etc.
- without getting too far into the weeds, it turns out that this empty space is a problem:
largely because it makes the data set very, very large.
So what do we do about that?
Embedding!
To make the model more compact, and hence easier to process while you wait, clever
people developed a technique called embedding which flattens the model: reducing it from a very large number of dimensions (like,
thousands) to a somewhat smaller number of dimensions (like, hundreds).
For those of you who may have read Edwin Abbott’s Flatland, you might remember how when a sphere visits Flatland, the two-dimensional creatures
there see it as a circle: a three-dimensional entity flattened or projected onto two dimensions. Something similar sometimes happens to Wile E. Coyote.
We are not going to cover the mathematics of it, but we will look at a few effects/results.
In simple terms:
- in our perfect model, remember that the position of each word in the model is a vector: essentially a long
list of numbers
- each dimension of that vector corresponds to another word in the corpus, one number for each word (even the
unrelated words)
- in the flattened version, that is no longer true: the position of each word is still a vector, but
that vector’s dimensions are no longer individual words, and the number of dimensions
is no longer the total vocabulary of the corpus.
- instead, we choose the number of dimensions, as one of the parameters for the training
process
- the embedding process then compresses the model down to that number of dimensions, and reduces
the empty space of the unrelated words.
So by specifying the number of dimensions, we are in effect specifying how many other
words each word’s position takes into account:
- if we choose a very low number of dimensions, the model will have very little information
about the word relationships within our corpus
- if we choose a very high number of dimensions, the model will have a lot of information
about the word relationships in the corpus
- however, the sweet spot is also going to depend on the total size and total vocabulary of the corpus: for
a corpus with a tiny vocabulary, a large number of dimensions may not be very useful.
I’m afraid there’s a little ’magic happens here’ at this stage--the mathematical details
are a little out of scope for this institute, but there are some good sources in the
readings for those who want to understand this more fully.
The word vector process: Data preparation
So another way to put this all together is to walk through the entire process in order,
step by step. There are basically three major acts in this drama, very much like a
classic comedy
In the first act, we set up the problem and introduce the main characters:
- We analyse our problem and establish a set of research questions we want to focus
on
- We gather a corpus of documents that are relevant to this research; at this stage
they may be a motley bunch cobbled together from various sources, with differing quality,
accuracy, transcription conventions, etc.
- And we might do some data cleanup on the corpus to improve consistency or make the
data better suited to our research: for instance, filtering out unnecessary information
like page numbers, or regularizing/modernizing spelling
As part of this process, we might discover things that cause us to reassess or expand
our research question: so it’s helpful to keep an open mind and be prepared to treat
this as an iterative process.
The word vector process: Training the model
In the second act, we get the real meat of the plot: in this case, the process where
we train our model and create a vector space representation of our corpus:
- First, we set the parameters for the training process: we choose the window size (which
does what?); we set the number of iterations (which does what?); we set the number
of dimensions (which does what?) and we set the negative sampling (which does what?)
- Second, we actually run the training process: our little caterpillar eats its way
through the corpus, taking bigger or smaller bites (depending on window size), the
number of times through depends on the number of iterations we set.
- And third, we validate the model: we test it for plausibility
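The four parenthetical questions above can be answered compactly in a hypothetical settings sketch. (The parameter names here are illustrative; the actual names differ between tools such as the R wordVectors package and Python’s gensim.)

```python
# Hypothetical training settings, annotated to answer the "which does what?"
# questions. The specific values are illustrative, not recommendations.
params = {
    "window": 6,        # how many words on either side count as "used together"
    "iterations": 10,   # how many passes the bookworm makes through the corpus
    "dimensions": 100,  # how many dimensions the embedding compresses down to
    "negative": 15,     # how many unseen words get nudged away per observation
}
print(params)
```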
The word vector process: Iteration and refinement
As before, this is an iterative process!
- When you’re first training a model, it’s a good idea to try different parameter settings
just to find out what difference they make
- And when you validate the model, you might see something that prompts you to go back
and change a parameter and try again: for instance, with a very small corpus, you
might need to do extra iterations (because with a small corpus, there isn’t as much
information being generated about word relationships during each iteration, so you
need to run the process more times to get the same level of accuracy)
- And the model training process might in turn send you back to the corpus: you might
discover that your corpus is just too small and you need to go back and add some more
materials. Or you might find that your corpus is too heterogeneous: maybe you’d like
to try splitting it into two and treating them separately.
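One low-tech way to build intuition about the small-corpus/extra-iterations trade-off before any real training run is to count roughly how many target–context pairs a corpus yields under different settings. Everything here (the `pair_count` helper, the token counts) is an illustrative assumption, not a formula from Word2Vec itself:

```python
def pair_count(n_tokens, window, iterations):
    """Rough count of training pairs: each of the n tokens is paired
    with up to 2*window neighbors (fewer at the edges), and the whole
    pass repeats once per iteration."""
    per_pass = sum(min(i, window) + min(n_tokens - 1 - i, window)
                   for i in range(n_tokens))
    return per_pass * iterations

# A tiny 1,000-token "corpus": raising iterations multiplies how much
# evidence about word relationships the training sees overall
for window in (2, 5, 10):
    print(window, pair_count(1000, window, iterations=5))
```

The point of the back-of-envelope count is the shape of the trade-off: a small corpus generates few pairs per pass, so extra iterations are one way to give the training more exposure to the same evidence.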
The word vector process: Querying and research
In the final act, as with a proper comedy, we reach resolution and answers: this is
where we can start querying our model and doing our research (although as we’ve seen,
the corpus-building and model-training processes are also definitely integral to the
research process)
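Once a model exists, "querying" mostly means comparing vectors. Here is a minimal sketch of the core operation — cosine similarity and a nearest-neighbor lookup — over a toy hand-made vector table; the three-dimensional vectors are invented for illustration, not taken from any trained model:

```python
import math

def cosine(u, v):
    """Cosine similarity: the angle-based closeness measure that
    word-vector queries rely on."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = (math.sqrt(sum(a * a for a in u))
            * math.sqrt(sum(b * b for b in v)))
    return dot / norm

# Toy 3-dimensional "model" (real models use 100+ dimensions)
vectors = {
    "queen":    [0.9, 0.8, 0.1],
    "king":     [0.9, 0.7, 0.2],
    "carriage": [0.1, 0.2, 0.9],
}

def nearest(word, vectors):
    """Return the other words ranked by similarity to `word`."""
    return sorted((w for w in vectors if w != word),
                  key=lambda w: cosine(vectors[word], vectors[w]),
                  reverse=True)

print(nearest("queen", vectors))
# ['king', 'carriage']
```

Packages like wordVectors and Gensim wrap exactly this kind of operation in convenience functions, but the underlying query is always a comparison of positions in the vector space.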
Tools for word embedding models
To wrap up this session, let’s take a quick look at the tools we use for working with
word embedding models
We can arrange them in order of abstraction:
- the most foundational tools in this set are the word embedding algorithms themselves. These are the mathematical
processes that generate a word embedding: a representation of a corpus as a vector space that has been squashed or flattened in useful ways. The two main word embedding algorithms in common use are Word2Vec
(developed by Tomas Mikolov's team at Google) and GloVe (developed by a research group at
Stanford). For this workshop, we are using Word2Vec.
- When we want to actually run those algorithms on our data, we need to have a computer
program that will do things like read in the corpus, run the algorithm on it, allow
us to set parameters, etc. We could write one ourselves if we were clever that way
but there already exist specific software packages we can use: specific implementations
of the word embedding algorithms. Two in common use are the wordVectors package (written
in R by Ben Schmidt) and the Gensim package (written in Python by a Czech researcher,
Radim Řehůřek). For this workshop, we are using the wordVectors R package.
- In order to run these programs on your computer, you need to have an environment within
which the programming language (R, Python) can operate: something that understands
the R or Python language and can run it within the operating system on your computer.
These software environments are sort of like sandboxes or life support systems for
specific languages. Examples include RStudio, which is an environment for working
in the R programming language and running R code, and Jupyter Notebooks, which are
an environment for working in the Python programming language and running Python code.
Within these environments, we can train models and we can also query and interact
with them.
- An added option (which we’re only touching on briefly in this workshop) is the Women
Writers Vector Toolkit, which is a set of programs that create a web interface for
Word2Vec, and allow you to query the trained models without having to use RStudio
or interact directly with any of the underlying layers
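One way to see these layers concretely: whichever implementation does the training, the result is often saved in word2vec's plain-text vector format (a header line giving the vocabulary size and number of dimensions, then one word per line followed by its coordinates), which any layer above can read back. A minimal, hedged reader sketch (the `load_vectors` name and the sample data are illustrative):

```python
def load_vectors(text):
    """Parse word2vec's plain-text vector format: a header line
    ("<vocab_size> <dimensions>"), then one word per line followed
    by its coordinates."""
    lines = text.strip().splitlines()
    n_words, n_dims = map(int, lines[0].split())
    vectors = {}
    for line in lines[1:]:
        parts = line.split()
        vectors[parts[0]] = [float(x) for x in parts[1:]]
    # Sanity-check the file against its own header
    assert len(vectors) == n_words
    assert all(len(v) == n_dims for v in vectors.values())
    return vectors

sample = """2 3
beauty 0.1 0.4 0.2
grace 0.2 0.3 0.1"""
print(load_vectors(sample)["grace"])
# [0.2, 0.3, 0.1]
```

This interchange format is part of why the layers can vary independently: a model trained in R can be queried from Python, or behind a web interface like the Women Writers Vector Toolkit.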
Those layers are all sitting underneath us and they each have effects on the outcomes
of our work:
- the environment we’re working in is the result of a number of layers of decisions
that could have been made differently
- and even if you don’t want to make a different decision, in a teaching context you
might want your students to understand the effects of a different set of choices
- so over time, you may want to revisit them as you gain more familiarity and comfort
with these tools
- the important note to end on here is that this workshop is intended to be a starting
point
- the things we observe about word vectors and how they work are not universal, but
local and situational; however, we can learn a lot from these experiments
Discussion and questions
So now let’s take a step back, with this more detailed perspective:
- Are there further questions or things that need more explanation?
- Any new perspectives on the examples we saw earlier?
- Any reflections on the explanatory process? What worked and what didn’t?