Whenever you have some sort of categorical data (labels) that has no particular order, or cannot be ordered (you don’t know the order and cannot assign greater/smaller values), but you need it to be numerical, you can derive a set of new features: for each possible category value you create a binary feature that is 1 when the original feature takes that value and 0 otherwise. This means you end up with as many binary features as there are distinct category values.
A good intro is available here: https://www.educative.io/blog/one-hot-encoding
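A minimal sketch of the idea in plain Python (the "colour" feature and its values are hypothetical, just for illustration):

```python
# One-hot encoding sketch: one binary feature per distinct category value.
categories = ["red", "green", "blue"]               # distinct values of the feature
index = {value: i for i, value in enumerate(categories)}

def one_hot(value):
    """Return a binary vector with a single 1 at the position of `value`."""
    vector = [0] * len(categories)
    vector[index[value]] = 1
    return vector

samples = ["green", "red", "green", "blue"]
encoded = [one_hot(s) for s in samples]
print(encoded)   # [[0, 1, 0], [1, 0, 0], [0, 1, 0], [0, 0, 1]]
```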
Because such encodings can blow up your attribute space (imagine getting a new feature/attribute for every word and every tense), a commonly used technique is “embeddings”. In essence, this technique takes concepts and places them on the axes of an n-dimensional space; words are then positioned in this space with the end goal that similar words end up close together. Note that I used the notions of concept and similar words – these can differ, but the end goal is always the same: similar words are always close by. One can use different encodings on the axes.
An informative source is https://en.wikipedia.org/wiki/Word_embedding, while one of the better known approaches is word2vec.
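To make the “similar words are close by” idea concrete, here is a toy sketch with hand-picked (not trained) 3-dimensional vectors and cosine similarity; in a real system the vectors would come from a model such as word2vec:

```python
import numpy as np

# Hypothetical embedding vectors, chosen by hand so that related words point
# in similar directions in the 3-dimensional space.
embeddings = {
    "cat":   np.array([0.9, 0.1, 0.0]),
    "dog":   np.array([0.8, 0.2, 0.1]),
    "car":   np.array([0.1, 0.9, 0.3]),
    "truck": np.array([0.0, 0.8, 0.4]),
}

def cosine(a, b):
    """Cosine similarity: close to 1 when two vectors point the same way."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

print(cosine(embeddings["cat"], embeddings["dog"]))    # high: similar words
print(cosine(embeddings["cat"], embeddings["truck"]))  # lower: unrelated words
```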