Skip to content

Frameworks

Neural Network Frameworks

As we have learned already, to be able to train neural networks efficiently we need to do two things:

  • To operate on tensors, eg. to multiply, add, and compute some functions such as sigmoid or softmax
  • To compute gradients of all expressions, in order to perform gradient descent optimization

Pre-lecture quiz

While the numpy library can do the first part, we need some mechanism to compute gradients. In our framework that we have developed in the previous section we had to manually program all derivative functions inside the backward method, which does backpropagation. Ideally, a framework should give us the opportunity to compute gradients of any expression that we can define.

Another important thing is to be able to perform computations on GPU, or any other specialized compute units, such as TPU. Deep neural network training requires a lot of computations, and to be able to parallelize those computations on GPUs is very important.

✅ The term 'parallelize' means to distribute the computations over multiple devices.

Currently, the two most popular neural frameworks are: TensorFlow and PyTorch. Both provide a low-level API to operate with tensors on both CPU and GPU. On top of the low-level API, there is also higher-level API, called Keras and PyTorch Lightning correspondingly.

Low-Level API TensorFlow PyTorch
High-level API Keras PyTorch Lightning

Low-level APIs in both frameworks allow you to build so-called computational graphs. This graph defines how to compute the output (usually the loss function) with given input parameters, and can be pushed for computation on GPU, if it is available. There are functions to differentiate this computational graph and compute gradients, which can then be used for optimizing model parameters.

High-level APIs pretty much consider neural networks as a sequence of layers, and make constructing most of the neural networks much easier. Training the model usually requires preparing the data and then calling a fit function to do the job.

The high-level API allows you to construct typical neural networks very quickly without worrying about lots of details. At the same time, low-level API offer much more control over the training process, and thus they are used a lot in research, when you are dealing with new neural network architectures.

It is also important to understand that you can use both APIs together, eg. you can develop your own network layer architecture using low-level API, and then use it inside the larger network constructed and trained with the high-level API. Or you can define a network using the high-level API as a sequence of layers, and then use your own low-level training loop to perform optimization. Both APIs use the same basic underlying concepts, and they are designed to work well together.

Learning

In this course, we offer most of the content both for PyTorch and TensorFlow. You can choose your preferred framework and only go through the corresponding notebooks. If you are not sure which framework to choose, read some discussions on the internet regarding PyTorch vs. TensorFlow. You can also have a look at both frameworks to get better understanding.

Where possible, we will use High-Level APIs for simplicity. However, we believe it is important to understand how neural networks work from the ground up, thus in the beginning we start by working with low-level API and tensors. However, if you want to get going fast and do not want to spend a lot of time on learning these details, you can skip those and go straight into high-level API notebooks.

✍️ Exercises: Frameworks

Continue your learning in the following notebooks:

Low-Level API TensorFlow+Keras Notebook PyTorch
High-level API Keras PyTorch Lightning

After mastering the frameworks, let's recap the notion of overfitting.

Overfitting

Overfitting is an extremely important concept in machine learning, and it is very important to get it right!

Consider the following problem of approximating 5 dots (represented by x on the graphs below):

linear overfit
Linear model, 2 parameters Non-linear model, 7 parameters
Training error = 5.3 Training error = 0
Validation error = 5.1 Validation error = 20
  • On the left, we see a good straight line approximation. Because the number of parameters is adequate, the model gets the idea behind point distribution right.
  • On the right, the model is too powerful. Because we only have 5 points and the model has 7 parameters, it can adjust in such a way as to pass through all points, making training the error to be 0. However, this prevents the model from understanding the correct pattern behind data, thus the validation error is very high.

It is very important to strike a correct balance between the richness of the model (number of parameters) and the number of training samples.

Why overfitting occurs

  • Not enough training data
  • Too powerful model
  • Too much noise in input data

How to detect overfitting

As you can see from the graph above, overfitting can be detected by a very low training error, and a high validation error. Normally during training we will see both training and validation errors starting to decrease, and then at some point validation error might stop decreasing and start rising. This will be a sign of overfitting, and the indicator that we should probably stop training at this point (or at least make a snapshot of the model).

overfitting

How to prevent overfitting

If you can see that overfitting occurs, you can do one of the following:

  • Increase the amount of training data
  • Decrease the complexity of the model
  • Use some regularization technique, such as Dropout, which we will consider later.

Overfitting and Bias-Variance Tradeoff

Overfitting is actually a case of a more generic problem in statistics called Bias-Variance Tradeoff. If we consider the possible sources of error in our model, we can see two types of errors:

  • Bias errors are caused by our algorithm not being able to capture the relationship between training data correctly. It can result from the fact that our model is not powerful enough (underfitting).
  • Variance errors, which are caused by the model approximating noise in the input data instead of meaningful relationship (overfitting).

During training, bias error decreases (as our model learns to approximate the data), and variance error increases. It is important to stop training - either manually (when we detect overfitting) or automatically (by introducing regularization) - to prevent overfitting.

Conclusion

In this lesson, you learned about the differences between the various APIs for the two most popular AI frameworks, TensorFlow and PyTorch. In addition, you learned about a very important topic, overfitting.

🚀 Challenge

In the accompanying notebooks, you will find 'tasks' at the bottom; work through the notebooks and complete the tasks.

Post-lecture quiz

Review & Self Study

Do some research on the following topics:

  • TensorFlow
  • PyTorch
  • Overfitting

Ask yourself the following questions:

  • What is the difference between TensorFlow and PyTorch?
  • What is the difference between overfitting and underfitting?

Assignment

In this lab, you are asked to solve two classification problems using single- and multi-layered fully-connected networks using PyTorch or TensorFlow.