Code Walkthroughs

The following notebooks provide code and instructions for training and querying models using the wordVectors R package developed by Ben Schmidt and Jian Li. These versions have been formatted for the web to make them easier to read. To download and run the RMarkdown versions, see the latest release on GitHub. To see the full set of code notebooks, along with additional context and instructions, see the GitHub repository. Some of the notebooks are designed for RStudio Server, others for RStudio Desktop, and some can be used with either.

  • Introduction to R and RStudio gives an introduction to the basic concepts of the R programming language and the RStudio programming environment. It can be used either in the RStudio Server environment or on your own computer.
  • Word Vectors Starter Queries provides a framework for querying a model that has already been trained (it does not include the code for the model training process). It assumes that you are working in the RStudio Server environment, which already provides the external code packages, so it does not include code for loading them.
  • Word Vectors Training, Querying, and Validation provides a full framework for the entire process of training, querying, and validating a model. Like the Starter Queries walkthrough, it assumes you are working in the RStudio Server environment, so it does not include code for loading external packages. It does, however, include instructions for loading your files into the RStudio Server environment and for getting your output files out of RStudio Server and onto your own computer.
  • Word Vectors Installation, Training, Querying, and Validation covers the same functionality as the walkthrough above, but it assumes you are running the walkthrough on your own computer rather than in RStudio Server.
  • Word Vectors Visualization provides more detailed code for visualizing an existing trained model, working through a set of example plots with the Women Writers Online collection. It can be used either in the RStudio Server environment or on your own computer.

Additional Resources

We’ve also published supplemental worksheets on the Women Writers Project “Resources” page.

Python Resources

In addition to the R walkthroughs, we also offer a set of Python notebooks that provide substantially the same content, also available on GitHub:

  • Introduction to Python provides an overview of the fundamental concepts in the Python programming language that are needed for the subsequent notebooks. The notebook assumes that users understand basic programming concepts but may not have Python-specific knowledge.
  • Introduction to Word Vectors in Python provides an introductory framework for importing data, cleaning data, training a Word2Vec model, querying that model, and finally evaluating it. It uses a sample dataset of nineteenth-century American recipes that is included in the directory, and this sample dataset can be modified.
  • Exploratory Visualization With Word2Vec provides a framework for exploratory visualization techniques using Word2Vec. The notebook uses a sample model provided in the directory, but this sample model can be swapped out for another.
  • Further Explorations of Word Vectors in Python builds on the Word2Vec notebooks above, suggesting possibilities for further analysis and discussing the broader field of machine learning to which Word2Vec belongs.
  • Evaluation Guide for Word Embedding Models provides a process for evaluating word embedding models and outlines some of the considerations involved in developing model testing routines.

Read more about these files in the repository, or download the full set in this release.