Category Archives: AI

What is One-Hot encoding? And why do you want to use Embeddings instead of one-hot encoding?

Whenever you have categorical data (labels) that has no natural order (you cannot say that one value is greater or smaller than another), but you need it to be numerical, you can devise a set of new features. Basically, you create one binary feature per category: it is one only when the original feature has that category’s value, otherwise it is zero. This means that you end up with as many binary features as there are distinct categories.
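To make this concrete, here is a minimal sketch in plain Python. The function name one_hot and the colour labels are made up for illustration; in practice you would more likely reach for a library encoder.

```python
def one_hot(labels):
    """Return (categories, vectors) for a list of categorical labels."""
    # Sorting only makes the column order reproducible; it implies no ranking.
    categories = sorted(set(labels))
    index = {category: i for i, category in enumerate(categories)}
    vectors = []
    for label in labels:
        row = [0] * len(categories)
        row[index[label]] = 1  # exactly one position is "hot"
        vectors.append(row)
    return categories, vectors


colors = ["red", "green", "blue", "green"]  # made-up example data
categories, vectors = one_hot(colors)
print(categories)
for label, row in zip(colors, vectors):
    print(label, row)
```

Three distinct colours give three binary features, and every row contains exactly one 1.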

A good intro is available here: https://www.educative.io/blog/one-hot-encoding

As these encodings may make your attribute space explode (imagine getting a new feature/attribute for each word and for each tense), a commonly used technique is “embeddings”. In essence, this technique treats concepts as the axes of an n-dimensional space. Words are then placed in this space as vectors, with the end goal that similar words end up close together. Note that I used the notions of concepts and similar words. What they mean can differ between approaches, but the end goal is always the same: similar words should be close by. Different approaches can also encode different things on the axes.
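A toy sketch of the "similar words are close by" idea follows. The three vectors are hand-made for illustration, not trained embeddings, and cosine similarity is just one common way to measure closeness.

```python
import math

# Hand-made 3-dimensional vectors, invented for illustration only.
toy_embeddings = {
    "cat": [0.9, 0.8, 0.1],
    "dog": [0.8, 0.9, 0.2],
    "car": [0.1, 0.2, 0.9],
}


def cosine_similarity(a, b):
    """Cosine of the angle between two vectors (1.0 means same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def most_similar(word, embeddings):
    """Return the other word whose vector is closest to the given word."""
    others = (w for w in embeddings if w != word)
    return max(others, key=lambda w: cosine_similarity(embeddings[word], embeddings[w]))


print(most_similar("cat", toy_embeddings))
```

Note that three words need only three numbers each here, and adding more words would not add more dimensions, which is the contrast with one-hot encoding.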

An informative source is https://en.wikipedia.org/wiki/Word_embedding, while one of the better known approaches is word2vec.

Always forgetting Precision vs. Recall

As a constant reminder, because I keep forgetting them, I revisited this nice blog here: https://shiffdag.medium.com/what-is-accuracy-precision-and-recall-and-why-are-they-important-ebfcb5a10df2

Basically, these sentences are something that should be kept in mind:

Precision: What proportion of positive identifications was actually correct? => Precision = TP/(TP+FP)

Recall: What proportion of actual positives was identified correctly? => Recall = TP/(TP+FN)

Accuracy: What proportion of all identifications was correct? => Accuracy = (TP+TN)/(TP+TN+FP+FN)

Basically, always construct a confusion matrix, if you can.
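As a reminder for future me, here is a small sketch that builds the four confusion-matrix counts from labels and applies the formulas above. The label lists are made up, and returning 0.0 when a denominator is zero is only a convention I chose for this sketch.

```python
def confusion_counts(y_true, y_pred, positive=1):
    """Count TP, FP, FN, TN for a binary problem."""
    tp = fp = fn = tn = 0
    for truth, pred in zip(y_true, y_pred):
        if pred == positive and truth == positive:
            tp += 1
        elif pred == positive:
            fp += 1  # predicted positive, actually negative
        elif truth == positive:
            fn += 1  # predicted negative, actually positive
        else:
            tn += 1
    return tp, fp, fn, tn


def safe_ratio(numerator, denominator):
    # Convention for this sketch: an undefined ratio is reported as 0.0.
    return numerator / denominator if denominator else 0.0


y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]  # made-up ground truth
y_pred = [1, 1, 1, 0, 1, 0, 0, 0, 0, 0]  # made-up predictions

tp, fp, fn, tn = confusion_counts(y_true, y_pred)
precision = safe_ratio(tp, tp + fp)
recall = safe_ratio(tp, tp + fn)
accuracy = safe_ratio(tp + tn, tp + tn + fp + fn)
print(f"TP={tp} FP={fp} FN={fn} TN={tn}")
print(f"precision={precision} recall={recall} accuracy={accuracy}")
```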

Working with Jupyter notebooks in VSCode

Well, there are some nuances when you choose this way of working. First things first: www.jupyter.org is the main website for these notebooks. Don’t forget to read what they enable and how they operate. In a nutshell: well-documented code that can be nicely presented.

The prerequisites are:

  • Installed VSCode
  • Installed Conda environment with which you run your notebook

First, you will have to install the VSCode extensions that let you view and run Jupyter notebooks. Don’t worry, VSCode will offer you some; select the ones that are trusted, i.e., from Microsoft.

So, you are able to load the notebook. The next step is to select the right conda environment to run it in. You can select it on the right (next figure); I selected dlwp. You can also use the old trick with Cmd+Shift+P.

If something is amiss, you will be notified when you run a cell. In my case, I had to install ipykernel in the dlwp environment.

conda install -n dlwp ipykernel --update-deps --force-reinstall

conda install -c conda-forge ipywidgets
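To check that the notebook really runs in the environment you selected, a cell like the following can help. It is a sketch: the function name is my own, and CONDA_DEFAULT_ENV is only set when the environment was activated through conda, so it may be missing depending on how the kernel was started.

```python
import importlib.util
import os
import sys


def notebook_environment_report(packages=("ipykernel", "ipywidgets")):
    """Collect the interpreter path, conda env name, and package availability."""
    report = {
        "python": sys.executable,  # path of the interpreter running this cell
        "conda_env": os.environ.get("CONDA_DEFAULT_ENV"),  # may be None
    }
    for name in packages:
        # find_spec returns None when the top-level package cannot be imported.
        report[name] = importlib.util.find_spec(name) is not None
    return report


for key, value in notebook_environment_report().items():
    print(f"{key}: {value}")
```

If the interpreter path does not point inside the dlwp environment, the wrong kernel is selected.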

How to install huggingface libraries

When you want to play with Natural Language Processing (NLP), you want Transformers, a library developed by Hugging Face. Currently, it is state of the art (SOTA) and is also used outside the NLP field (with great success).

So, you have created your conda environment and installed the PyTorch libraries in it. Now it is time to install the transformers library.

First, the terminal install is described here. For brevity:

conda install -c huggingface transformers

Side note: -c means channel, so this command installs transformers from the huggingface channel.

And when (not IF, but WHEN) you get into trouble (for example, the datasets library is missing):

pip install datasets
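A quick way to see what actually got installed is to ask Python for the package versions. This is a sketch using the standard library only; the helper name installed_version is my own.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(package):
    """Return the installed version string, or None if the package is missing."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


for package in ("transformers", "datasets"):
    found = installed_version(package)
    print(f"{package}: {found if found else 'not installed'}")
```

Run it with the interpreter of the environment you installed into, otherwise it reports on a different set of libraries.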

Conda environment

You may think of Conda as the counterpart of venv in Python: coarsely speaking, it offers the same functionality. It is useful for creating virtual environments for Python, each with its own set of libraries.

You can get it here: https://docs.conda.io/en/latest/miniconda.html

How to initialize Conda

Assuming you have downloaded Conda and you are in a fresh terminal, you can create your first environment. I will call it dlwp, as this is the environment I am working with.

conda create -n dlwp

conda activate dlwp

conda deactivate

How to make Conda play nice with VS Code

In VSCode (assuming all Python extensions are installed), install Python Environment Manager, an extension that lets you view and organise virtual environments for Python. How do you access it? Press Cmd+Shift+P, then select the environment you want. You can also set it as the default for your workspace.

Simple 😀

And of course, how to list all the environments

conda env list
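If you want the same list from inside Python, conda can print it as JSON. The sketch below assumes conda is on your PATH; the function names are my own, and the JSON in the usage note is a made-up sample.

```python
import json
import shutil
import subprocess


def env_paths(conda_json):
    """Extract environment paths from the output of `conda env list --json`."""
    return json.loads(conda_json)["envs"]


def list_conda_envs():
    """Return the environment paths, or None when conda is not on PATH."""
    conda = shutil.which("conda")
    if conda is None:
        return None
    result = subprocess.run(
        [conda, "env", "list", "--json"],
        capture_output=True,
        text=True,
        check=True,  # raise if conda exits with an error
    )
    return env_paths(result.stdout)


print(list_conda_envs())
```

For example, env_paths('{"envs": ["/opt/miniconda3", "/opt/miniconda3/envs/dlwp"]}') returns the two paths as a list; the last path component is the environment name.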