Neural Network Frameworks¶
As we have learned already, to be able to train neural networks efficiently we need to do two things:
- To operate on tensors, e.g. to multiply, add, and compute functions such as sigmoid or softmax
- To compute gradients of all expressions, in order to perform gradient descent optimization
Pre-lecture quiz¶
While the `numpy` library can do the first part, we need some mechanism to compute gradients. In the framework that we developed in the previous section, we had to manually program all derivative functions inside the `backward` method, which does backpropagation. Ideally, a framework should let us compute gradients of any expression that we can define.
Another important capability is performing computations on a GPU or on other specialized compute units, such as a TPU. Training deep neural networks requires a lot of computation, so being able to parallelize that computation on GPUs is very important.
✅ The term 'parallelize' means to distribute the computations so that many of them run at the same time, for example across the many cores of a GPU or over multiple devices.
Currently, the two most popular neural network frameworks are TensorFlow and PyTorch. Both provide a low-level API to operate on tensors on both CPU and GPU. On top of the low-level API, each also has a higher-level API: Keras and PyTorch Lightning, respectively.
API level | TensorFlow | PyTorch |
---|---|---|
Low-level API | TensorFlow | PyTorch |
High-level API | Keras | PyTorch Lightning |
Low-level APIs in both frameworks allow you to build so-called computational graphs. Such a graph defines how to compute the output (usually the loss function) for given input parameters, and it can be pushed to the GPU for computation, if one is available. There are functions to differentiate this computational graph and compute gradients, which can then be used for optimizing model parameters.
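To make the idea of a differentiable computational graph more concrete, here is a minimal pure-Python sketch of reverse-mode differentiation on scalars. The `Value` class and its methods are hypothetical helpers written for this illustration; they are not part of TensorFlow or PyTorch, which work on tensors and are far more general.

```python
import math


class Value:
    """A scalar node in a computational graph (hypothetical helper, illustration only)."""

    def __init__(self, data, parents=(), local_grads=()):
        self.data = data                  # result of the forward computation
        self.grad = 0.0                   # d(output)/d(this node), filled in by backward()
        self._parents = parents           # nodes this value was computed from
        self._local_grads = local_grads   # d(this node)/d(parent), one entry per parent

    def __add__(self, other):
        return Value(self.data + other.data, (self, other), (1.0, 1.0))

    def __mul__(self, other):
        return Value(self.data * other.data, (self, other), (other.data, self.data))

    def sigmoid(self):
        s = 1.0 / (1.0 + math.exp(-self.data))
        return Value(s, (self,), (s * (1.0 - s),))

    def backward(self):
        # Order the graph so that every node comes after the nodes it depends on.
        order, seen = [], set()

        def visit(node):
            if id(node) not in seen:
                seen.add(id(node))
                for parent in node._parents:
                    visit(parent)
                order.append(node)

        visit(self)
        self.grad = 1.0
        # Walk from the output back to the inputs, applying the chain rule.
        for node in reversed(order):
            for parent, local in zip(node._parents, node._local_grads):
                parent.grad += local * node.grad


w, x, b = Value(2.0), Value(3.0), Value(-5.0)
y = (w * x + b).sigmoid()   # the forward pass builds the graph
y.backward()                # the backward pass fills in .grad on every node
print(y.data, w.grad, x.grad, b.grad)
```

The point of the sketch is that derivatives are attached to each primitive operation once, so the gradient of any expression built from those operations comes for free.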
High-level APIs treat neural networks as a sequence of layers, which makes constructing most neural networks much easier. Training the model usually requires preparing the data and then calling a `fit` function to do the job.
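As a rough picture of what a `fit` call hides, the following pure-Python sketch wraps a gradient descent loop inside a method of that name. `TinyLinearModel`, `epochs`, and `lr` are hypothetical names chosen for this example; this is not the Keras or PyTorch Lightning API, only an assumption-laden illustration of the pattern on a one-feature linear model.

```python
class TinyLinearModel:
    """Hypothetical one-feature model y = w * x + b with a fit() method."""

    def __init__(self):
        self.w, self.b = 0.0, 0.0

    def predict(self, xs):
        return [self.w * x + self.b for x in xs]

    def fit(self, xs, ys, epochs=1000, lr=0.05):
        n = len(xs)
        history = []
        for _ in range(epochs):
            errors = [p - y for p, y in zip(self.predict(xs), ys)]
            loss = sum(e * e for e in errors) / n
            # Gradients of the mean squared error with respect to w and b.
            grad_w = 2.0 * sum(e * x for e, x in zip(errors, xs)) / n
            grad_b = 2.0 * sum(errors) / n
            # One gradient descent step.
            self.w -= lr * grad_w
            self.b -= lr * grad_b
            history.append(loss)
        return history   # per-epoch training loss


# Made-up data generated by y = 2x + 1.
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [1.0, 3.0, 5.0, 7.0, 9.0]

model = TinyLinearModel()
history = model.fit(xs, ys)
print(model.w, model.b, history[-1])
```

The caller only prepares the data and calls `fit`; the loop, the gradients, and the parameter updates stay out of sight, which is the convenience the high-level APIs provide.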
The high-level API allows you to construct typical neural networks very quickly without worrying about lots of details. At the same time, low-level APIs offer much more control over the training process, so they are used a lot in research, when you are dealing with new neural network architectures.
It is also important to understand that you can use both APIs together, e.g. you can develop your own network layer architecture using the low-level API, and then use it inside a larger network constructed and trained with the high-level API. Or you can define a network using the high-level API as a sequence of layers, and then use your own low-level training loop to perform optimization. Both APIs use the same basic underlying concepts, and they are designed to work well together.
Learning¶
In this course, we offer most of the content for both PyTorch and TensorFlow. You can choose your preferred framework and only go through the corresponding notebooks. If you are not sure which framework to choose, read some discussions on the internet regarding PyTorch vs. TensorFlow. You can also have a look at both frameworks to get a better understanding.
Where possible, we will use high-level APIs for simplicity. However, we believe it is important to understand how neural networks work from the ground up, so in the beginning we work with the low-level API and tensors. If you want to get going fast and do not want to spend a lot of time on these details, you can skip them and go straight to the high-level API notebooks.
✍️ Exercises: Frameworks¶
Continue your learning in the following notebooks:
API level | TensorFlow | PyTorch |
---|---|---|
Low-level API | TensorFlow+Keras Notebook | PyTorch |
High-level API | Keras | PyTorch Lightning |
After mastering the frameworks, let's recap the notion of overfitting.
Overfitting¶
Overfitting is an extremely important concept in machine learning, and it is very important to get it right!
Consider the following problem of approximating 5 dots (represented by `x` on the graphs below):
Linear model, 2 parameters | Non-linear model, 7 parameters |
---|---|
Training error = 5.3 | Training error = 0 |
Validation error = 5.1 | Validation error = 20 |
- On the left, we see a good straight-line approximation. Because the number of parameters is adequate, the model gets the idea behind the point distribution right.
- On the right, the model is too powerful. Because we only have 5 points and the model has 7 parameters, it can adjust in such a way as to pass through all points, making the training error 0. However, this prevents the model from understanding the correct pattern behind the data, so the validation error is very high.
It is very important to strike a correct balance between the richness of the model (number of parameters) and the number of training samples.
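The effect described above can be reproduced with a small pure-Python experiment. This is a sketch under stated assumptions: the data is made up (an underlying trend y = x with noisy training targets), and the over-powerful model is a degree-4 polynomial with 5 parameters that passes exactly through the 5 training points, rather than the 7-parameter model from the figure. All function names are hypothetical.

```python
def fit_line(xs, ys):
    """Least-squares straight line; returns a prediction function (2 parameters)."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    slope = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / sum(
        (x - mean_x) ** 2 for x in xs
    )
    intercept = mean_y - slope * mean_x
    return lambda x: slope * x + intercept


def fit_interpolating_polynomial(xs, ys):
    """Polynomial through all n points, evaluated in Lagrange form (n parameters)."""

    def predict(x):
        total = 0.0
        for i, (xi, yi) in enumerate(zip(xs, ys)):
            term = yi
            for j, xj in enumerate(xs):
                if j != i:
                    term *= (x - xj) / (xi - xj)
            total += term
        return total

    return predict


def mean_squared_error(model, xs, ys):
    return sum((model(x) - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


# Made-up data: the underlying trend is y = x, and the training targets are noisy.
train_x = [0.0, 1.0, 2.0, 3.0, 4.0]
train_y = [0.5, 0.2, 2.9, 2.3, 4.6]
val_x = [0.5, 1.5, 2.5, 3.5]
val_y = [0.5, 1.5, 2.5, 3.5]

line = fit_line(train_x, train_y)
poly = fit_interpolating_polynomial(train_x, train_y)

for name, model in [("line", line), ("polynomial", poly)]:
    train_err = mean_squared_error(model, train_x, train_y)
    val_err = mean_squared_error(model, val_x, val_y)
    print(f"{name}: training error = {train_err:.3f}, validation error = {val_err:.3f}")
```

On this data the polynomial reaches zero training error by bending through every noisy point, and it pays for that with a much larger validation error than the straight line.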
Why overfitting occurs¶
- Not enough training data
- A model that is too powerful
- Too much noise in input data
How to detect overfitting¶
As you can see from the graph above, overfitting can be detected by a very low training error, and a high validation error. Normally during training we will see both training and validation errors starting to decrease, and then at some point validation error might stop decreasing and start rising. This will be a sign of overfitting, and the indicator that we should probably stop training at this point (or at least make a snapshot of the model).
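The stopping rule described here is often automated. Below is a minimal sketch, assuming we already have one validation error per epoch; the function name `best_epoch_with_patience` and the `patience` parameter are hypothetical, and the error values are made up for illustration.

```python
def best_epoch_with_patience(val_errors, patience=2):
    """Return (stop_epoch, best_epoch) for a list of per-epoch validation errors.

    Training is considered finished once the validation error has failed to
    improve for `patience` consecutive epochs.
    """
    best_epoch, best_error = 0, float("inf")
    for epoch, error in enumerate(val_errors):
        if error < best_error:
            # New best result: this is where a model snapshot would be saved.
            best_epoch, best_error = epoch, error
        elif epoch - best_epoch >= patience:
            return epoch, best_epoch
    return len(val_errors) - 1, best_epoch


# Made-up validation errors: they fall at first, then start to rise.
val_errors = [0.90, 0.60, 0.45, 0.40, 0.42, 0.47, 0.55]
stop_epoch, best_epoch = best_epoch_with_patience(val_errors)
print(stop_epoch, best_epoch)
```

With these numbers, training would stop at epoch 5, and the snapshot taken at epoch 3 is the one to keep. Waiting for a few epochs before stopping avoids reacting to a single noisy measurement.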
How to prevent overfitting¶
If you can see that overfitting occurs, you can do one of the following:
- Increase the amount of training data
- Decrease the complexity of the model
- Use some regularization technique, such as Dropout, which we will consider later.
Overfitting and Bias-Variance Tradeoff¶
Overfitting is actually a case of a more general problem in statistics called the Bias-Variance Tradeoff. If we consider the possible sources of error in our model, we can see two types of errors:
- Bias errors are caused by our algorithm not being able to capture the relationships in the training data correctly. This can result from our model not being powerful enough (underfitting).
- Variance errors are caused by the model approximating noise in the input data instead of a meaningful relationship (overfitting).
During training, bias error decreases (as our model learns to approximate the data), and variance error increases. It is important to stop training - either manually (when we detect overfitting) or automatically (by introducing regularization) - to prevent overfitting.
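For reference, the two error types above correspond to terms in the standard decomposition of the expected squared error. The notation is an assumption introduced here, not taken from the lesson: the data follows $y = f(x) + \varepsilon$ with zero-mean noise of variance $\sigma^2$, and $\hat{f}$ is the model obtained from a randomly drawn training set $D$.

$$
\mathbb{E}_{D,\varepsilon}\left[\left(y - \hat{f}(x)\right)^2\right]
= \underbrace{\left(\mathbb{E}_D\left[\hat{f}(x)\right] - f(x)\right)^2}_{\text{bias}^2}
+ \underbrace{\mathbb{E}_D\left[\left(\hat{f}(x) - \mathbb{E}_D\left[\hat{f}(x)\right]\right)^2\right]}_{\text{variance}}
+ \underbrace{\sigma^2}_{\text{noise}}
$$

The noise term cannot be reduced by any model, so the practical choice is between lowering bias (a more powerful model) and lowering variance (a simpler model, more data, or regularization).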
Conclusion¶
In this lesson, you learned about the differences between the various APIs for the two most popular AI frameworks, TensorFlow and PyTorch. In addition, you learned about a very important topic, overfitting.
🚀 Challenge¶
In the accompanying notebooks, you will find 'tasks' at the bottom; work through the notebooks and complete the tasks.
Post-lecture quiz¶
Review & Self Study¶
Do some research on the following topics:
- TensorFlow
- PyTorch
- Overfitting
Ask yourself the following questions:
- What is the difference between TensorFlow and PyTorch?
- What is the difference between overfitting and underfitting?
Assignment¶
In this lab, you are asked to solve two classification problems with single- and multi-layered fully-connected networks, using PyTorch or TensorFlow.