Hello! Right now I’m back to working on neural networks with sketches of faces.

### current goal: cluster the faces somehow

As a starting point, I thought it’d be fun to, instead of generating faces, get the neural network to do some unsupervised clustering of the faces! The idea is:

- Get the Machine Learning to cluster the faces into groups
- See if I like the faces in some groups more than others
- If I do, then maybe just train a model on the cluster of faces that I like

### how do you actually do clustering of a sequence of vectors though?

The usual way to do clustering is with k-means or something, but these drawings of faces aren’t a single vector, they’re a sequence of vectors! So k-means wouldn’t make any sense.

I Googled “rnn unsupervised clustering” a little bit and learned about a way to do this: autoencoders!

It seems like the way an autoencoder works at a high level is:

- Create “encoder” RNN that translates the input into a lower-dimensional vector (like 4 dimensions or something)
- Create a “decoder” RNN that translates the 4-dimensional

Train both of them together, with the objective function being something like:

```
loss = F.cross_entropy(decoder(encoder(input)), input)
```

where we try to get `decoder(encoder(x))`

as close to `x`

as possible.

I found a tutorial on the PyTorch wiki talking about how to use this encoder / decoder pattern to do translation from French to English.

### questions I still have about autoencoders

I’m still pretty confused about how this encoder / decoder pattern actually works, and I didn’t get very far on this today. So here are some questions in the hopes that I can answer them tomorrow!

- when training, do I need to embed my original input vector into a higher
dimensional space (with a
`nn.Embedding`

)? (I don’t think so, because they’re vectors and not integer labels, but I’m not sure) - The
`Encoder`

class in the translation tutorial outputs 2 vectors, an output and a hidden vector. Which one is the encoding, the output or the hidden vector? (or both???) - Should my hidden vector have a lot of dimensions (like 50), or should it have the same number of dimensions as I want classes to categorize my faces into (like 5?)
- Both of the examples I’m looking at use a
`relu`

function as part of their neural networks. What does`relu`

mean?

### tomorrow: maybe write a toy autoencoder!

Maybe tomorrow I’ll try to do a simpler autoencoder example with some toy data, I think that might clarify things for me! As always trying to use a technique I don’t understand at all with a complicated dataset is really confusing and demoralizing, I think if I simplify the dataset a LOT it should go better.