What is One-Hot encoding? And why do you want to use Embeddings instead of one-hot encoding?

Whenever you have categorical data (labels) that have no particular order and cannot be meaningfully ranked (you cannot assign greater/smaller values to them), but you need them to be numerical, you can derive a set of new features: for each possible category value you create a binary feature that is 1 when the original feature takes that value and 0 otherwise. As a result, you end up with as many binary features as there are distinct category values, and each example is represented by a vector containing exactly one 1.

A good intro is available here: https://www.educative.io/blog/one-hot-encoding
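For a concrete illustration, here is a minimal sketch of building those binary columns by hand with NumPy (the `colors` feature and its values are made up for the example; in practice a library encoder such as scikit-learn's OneHotEncoder does the same job):

```python
import numpy as np

colors = ["red", "green", "blue", "green"]            # categorical data with no natural order
categories = sorted(set(colors))                       # distinct values: ["blue", "green", "red"]
index = {cat: i for i, cat in enumerate(categories)}   # map each category to its own column

one_hot = np.zeros((len(colors), len(categories)), dtype=int)
for row, value in enumerate(colors):
    one_hot[row, index[value]] = 1                     # exactly one 1 per example

print(one_hot)
# [[0 0 1]
#  [0 1 0]
#  [1 0 0]
#  [0 1 0]]
```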

Because such encodings can make your attribute space explode (imagine getting a new feature for every word and every tense of it), a technique called “embeddings” is used instead. In essence, this technique assigns concepts to the axes of an n-dimensional space and then places words in that space, with the end goal that similar words end up close to each other. Note that I used the notions of concepts and similar words; these can differ, but the end goal is always the same: similar words are always close by. Different encodings can be used along the axes.

An informative source is https://en.wikipedia.org/wiki/Word_embedding, while one of the better known approaches is word2vec.
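As a sketch of that end goal (similar words sitting close together), the snippet below uses a toy vocabulary and hand-picked 3-dimensional vectors; a real model such as word2vec would learn the embedding matrix from text, but the dense lookup, which replaces a huge one-hot vector, works the same way:

```python
import numpy as np

# Toy vocabulary and hand-picked 3-dimensional vectors (assumed for illustration only).
vocab = {"king": 0, "queen": 1, "apple": 2, "orange": 3}
embedding_matrix = np.array([
    [0.90, 0.80, 0.10],   # king
    [0.85, 0.82, 0.05],   # queen  -- placed near "king"
    [0.10, 0.20, 0.90],   # apple
    [0.12, 0.15, 0.88],   # orange -- placed near "apple"
])

def vector(word):
    """Look up the dense vector for a word (a single row instead of a one-hot column)."""
    return embedding_matrix[vocab[word]]

def cosine(a, b):
    """Cosine similarity: close to 1 for similar words, lower for unrelated ones."""
    return a @ b / (np.linalg.norm(a) * np.linalg.norm(b))

print(cosine(vector("king"), vector("queen")))   # high similarity (~0.998)
print(cosine(vector("king"), vector("apple")))   # low similarity (~0.30)
```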