Hello! Right now I’m back to working on neural networks with sketches of faces.
current goal: cluster the faces somehow
As a starting point, I thought it’d be fun to, instead of generating faces, get the neural network to do some unsupervised clustering of the faces! The idea is:
- Get the Machine Learning to cluster the faces into groups
- See if I like the faces in some groups more than others
- If I do, then maybe just train a model on the cluster of faces that I like
how do you actually do clustering of a sequence of vectors though?
The usual way to do clustering is with k-means or something, but these drawings of faces aren’t a single vector, they’re a sequence of vectors! So k-means wouldn’t make any sense.
I Googled “rnn unsupervised clustering” a little bit and learned about a way to do this: autoencoders!
It seems like the way an autoencoder works at a high level is:
- Create an “encoder” RNN that translates the input into a lower-dimensional vector (like 4 dimensions or something)
- Create a “decoder” RNN that translates the 4-dimensional vector back into the original input
- Train both of them together, with the objective function being something like:
```python
loss = F.cross_entropy(decoder(encoder(input)), input)
```

where we try to get `decoder(encoder(x))` as close to `x` as possible.
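Here's my guess at what that might look like in PyTorch, as a tiny sketch. All the names and shapes here are made up, and since the stroke points are continuous vectors I've used MSE for the reconstruction loss instead of cross-entropy (that's an assumption on my part, not something from the tutorial):

```python
import torch
import torch.nn as nn

class SeqAutoencoder(nn.Module):
    # hypothetical sketch: the encoder GRU squishes a whole sequence of
    # stroke vectors into its final hidden state (the "encoding"), and the
    # decoder GRU tries to rebuild the original sequence from that
    def __init__(self, input_dim=3, code_dim=4):
        super().__init__()
        self.encoder = nn.GRU(input_dim, code_dim, batch_first=True)
        self.decoder = nn.GRU(code_dim, input_dim, batch_first=True)

    def forward(self, x):
        # x: (batch, seq_len, input_dim)
        _, hidden = self.encoder(x)        # hidden: (1, batch, code_dim)
        code = hidden.transpose(0, 1)      # (batch, 1, code_dim)
        # feed the code in at every timestep so the decoder always
        # "sees" what it's supposed to reconstruct
        repeated = code.expand(-1, x.size(1), -1)
        reconstruction, _ = self.decoder(repeated)
        return reconstruction

model = SeqAutoencoder()
x = torch.randn(8, 20, 3)                  # 8 fake "drawings", 20 points each
loss = nn.functional.mse_loss(model(x), x) # reconstruction loss
loss.backward()
```

Then after training, `hidden` would hopefully be a 4-dimensional summary of each face that I could cluster with k-means.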
I found a tutorial on the PyTorch website talking about how to use this encoder / decoder pattern to do translation from French to English.
questions I still have about autoencoders
I’m still pretty confused about how this encoder / decoder pattern actually works, and I didn’t get very far on this today. So here are some questions in the hopes that I can answer them tomorrow!
- when training, do I need to embed my original input vector into a higher dimensional space (with an `nn.Embedding`)? (I don’t think so, because they’re vectors and not integer labels, but I’m not sure)
- the `Encoder` class in the translation tutorial outputs 2 vectors, an output and a hidden vector. Which one is the encoding, the output or the hidden vector? (or both???)
- Should my hidden vector have a lot of dimensions (like 50), or should it have the same number of dimensions as I want classes to categorize my faces into (like 5?)
- Both of the examples I’m looking at use a `relu` function as part of their neural networks. What does `relu` actually do, and why is it there?
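For the embedding question, I suspect a quick experiment would settle it. My current understanding (which could be wrong!) is that `nn.Embedding` is just a lookup table from integer IDs to vectors, so it only makes sense for things like word indices, while continuous vectors can go straight into an RNN. A sketch with made-up shapes:

```python
import torch
import torch.nn as nn

# nn.Embedding maps integer token IDs to vectors, so it only accepts
# tensors of integer indices (like the word IDs in the translation
# tutorial), not continuous inputs
embedding = nn.Embedding(num_embeddings=10, embedding_dim=4)
word_ids = torch.tensor([[1, 5, 2]])    # (batch=1, seq_len=3) of integer IDs
embedded = embedding(word_ids)          # (1, 3, 4)

# continuous vectors (like pen strokes) can be fed directly into an RNN,
# as long as input_size matches the vector dimension -- no embedding needed
gru = nn.GRU(input_size=3, hidden_size=4, batch_first=True)
strokes = torch.randn(1, 20, 3)         # one fake drawing: 20 points, 3 dims each
output, hidden = gru(strokes)           # output: (1, 20, 4), hidden: (1, 1, 4)
```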
tomorrow: maybe write a toy autoencoder!
Maybe tomorrow I’ll try to do a simpler autoencoder example with some toy data, I think that might clarify things for me! As always, trying to use a technique I don’t understand at all on a complicated dataset is really confusing and demoralizing; I think if I simplify the dataset a LOT it should go better.