#+TITLE: Deep Learning for Coders (with fastai & pytorch)
#+AUTHOR: Jeremy Howard, Sylvain Gugger
#+CATEGORY: book
#+FILETAGS: deeplearning:fastai
#+STATUS: Skimmed
Skimming notes
Read backwards next time; the last few chapters are the most interesting
Look for "*" highlights to find interesting, relevant papers that I should read
Debugging tools, including CAM and the different activation visualizations, are particularly worth revisiting
I know the terms more intuitively now, but still can't visualize them / code them up from scratch
Deep learning is fairly massive, but what's valuable depends on what I'm working on
Notes
Metrics are how humans interpret a model's performance; loss is how the SGD algorithm interprets it.
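A minimal sketch of the difference in plain PyTorch (the tensors are made up for illustration): cross-entropy is a smooth loss that SGD can follow, while accuracy is the number a human actually reads.

#+BEGIN_SRC python
import torch
import torch.nn.functional as F

# Made-up model outputs (logits) and true labels.
logits = torch.tensor([[2.0, 0.5], [0.3, 1.8], [1.2, 1.1]])
labels = torch.tensor([0, 1, 1])

# Loss: smooth and differentiable, so SGD can follow its gradient.
loss = F.cross_entropy(logits, labels)

# Metric: what a human reads off; argmax makes it non-differentiable.
accuracy = (logits.argmax(dim=1) == labels).float().mean()

print(loss.item(), accuracy.item())
#+END_SRC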
The head is the new layer of randomized weights added to a pre-trained model to customize it for transfer learning.
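A sketch of what adding a head looks like in raw torchvision (fastai does this automatically; the 2-class output size is an assumption for illustration, and this uses the newer torchvision weights API):

#+BEGIN_SRC python
import torch.nn as nn
from torchvision import models

# Body: a model pre-trained on ImageNet.
model = models.resnet18(weights=models.ResNet18_Weights.DEFAULT)

# Freeze the pre-trained layers so only the new head trains at first.
for param in model.parameters():
    param.requires_grad = False

# Head: replace the final layer with fresh random weights
# sized for our own task (here, 2 classes).
model.fc = nn.Linear(model.fc.in_features, 2)
#+END_SRC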
Deep learning also uses "gradient" to refer to the value of the derivative at a specific point, instead of just the derivative function itself.
Automatic differentiation is in the autograd package, see Pytorch documentation: https://pytorch.org/docs/stable/autograd.html#function
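For example, the gradient of f(x) = x² evaluated at x = 3:

#+BEGIN_SRC python
import torch

x = torch.tensor(3.0, requires_grad=True)
y = x ** 2     # f(x) = x^2, so df/dx = 2x
y.backward()   # autograd fills in x.grad with df/dx at x = 3
print(x.grad)  # tensor(6.)
#+END_SRC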
Exercises
#+BEGIN_SRC python
print(1 + 1)
#+END_SRC

#+RESULTS:
: 2

8. (Chapter workbook)
9. I know Jupyter well enough.
10. It's hard for normal programs to identify images because it's hard to define the steps we personally apply to identify an image.
11. Weight assignments are values that define how a program will operate.
12. We call weights the model parameters.
13. (Input + Weights) -> Model -> Results -> Measure of performance (cycle back)
14. We can't truly understand or describe the impacts of the different parameters, or why they're set to that particular value in the first place.
15. Universal approximation theorem.
16. Training a model: run it on several training cases that have labels, to allow learning and iterating towards a correct solution.
17. Feedback loops will reinforce biases, because the new training data will be more biased.
18. 224 used to be a standard size, but is not strictly required; size now trades resource consumption against accuracy.
19. Classification is choosing between distinct classes of objects; regression is determining a best continuous numerical value.
20. The validation set is for evaluating the behavior of the model on data it hasn't been trained with; ultimately you end up tuning your hyper-parameters against it. The test set is the data that a fixed set of hyper-parameters and weights are evaluated against, so that there's no chance of training against it. Ideally, kept as a black box.
Defaults to 20% (a valid_pct of 0.2), pulled out randomly.
Random sampling might not work for time-series predictions; there, a better validation set would come from predicting the future from past data.
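A sketch of both splits over made-up, time-ordered indices (plain Python, just to show the idea):

#+BEGIN_SRC python
import random

n = 1000
indices = list(range(n))  # pretend these are samples ordered by time

# Random split: fine when samples are independent.
shuffled = random.sample(indices, n)
valid_random = shuffled[: int(n * 0.2)]

# Time-based split: train on the past, validate on the most recent 20%,
# so the model is evaluated on genuinely unseen "future" data.
cutoff = int(n * 0.8)
train_time, valid_time = indices[:cutoff], indices[cutoff:]
#+END_SRC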
Overfitting is when the model tunes itself too perfectly to the training data, so much that it can't generalize to data it hasn't seen before. For example, it might memorize all the inputs given enough training time.
A metric is a human-oriented measure of the performance of a model; loss is for the stochastic gradient descent algorithm.
Pre-trained models mean you need much less data and much less training time; the initial layers already capture most of the general concepts in the data, so we don't need to re-train them.
Head is the additional layer added to customize a pre-trained model.
Early layers of the CNN find things like edges, graphical concepts; later layers start finding things like eyes, etc.
Image models can be applied to other concepts by converting the input into an image of some sort.
Architecture = structure of the model: layers, number of neurons, etc. The template of the model that we're trying to fit; the actual mathematical function.
Segmentation: a model that can recognize the content of every pixel of an image.
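Roughly the chapter 1 CAMVID example (from memory, so treat the exact calls as approximate):

#+BEGIN_SRC python
from fastai.vision.all import *

path = untar_data(URLs.CAMVID_TINY)
dls = SegmentationDataLoaders.from_label_func(
    path, bs=8, fnames=get_image_files(path/"images"),
    label_func=lambda o: path/"labels"/f"{o.stem}_P{o.suffix}",
    codes=np.loadtxt(path/"codes.txt", dtype=str))

# A U-Net predicts a class for every pixel instead of one per image.
learn = unet_learner(dls, resnet34)
learn.fine_tune(8)
#+END_SRC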
y_range: tells the model to predict a continuous value within a given range, not a specific classification.
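E.g. the book's collaborative-filtering example constrains predicted movie ratings to a continuous interval (again roughly as it appears in chapter 1):

#+BEGIN_SRC python
from fastai.tabular.all import *
from fastai.collab import *

path = untar_data(URLs.ML_SAMPLE)
dls = CollabDataLoaders.from_csv(path/'ratings.csv')

# y_range makes the head output a continuous value in [0.5, 5.5]
# (a regression) instead of a discrete class.
learn = collab_learner(dls, y_range=(0.5, 5.5))
learn.fine_tune(10)
#+END_SRC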
Hyperparameters: choices regarding network architecture, learning rates, data augmentation strategies, etc. Parameters about parameters.
Maintain a test set that engineers/consultants can't train on, to evaluate the model.
GPUs allow for many more highly parallel computations and have their own VRAM, giving much higher memory bandwidth. Transferring memory from the CPU to the GPU can be slow.
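A quick sketch of the transfer cost in PyTorch (sizes are arbitrary):

#+BEGIN_SRC python
import torch

device = "cuda" if torch.cuda.is_available() else "cpu"

x = torch.randn(4096, 4096)   # created in CPU RAM
x = x.to(device)              # host-to-device copy: relatively slow
w = torch.randn(4096, 4096, device=device)  # allocated directly in VRAM

# The matmul itself runs massively in parallel on the GPU; keeping
# tensors resident there avoids paying the transfer cost repeatedly.
y = x @ w
#+END_SRC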
Feedback loops: any time training data is derived only from the results of running the previous model, e.g. filter bubbles in social networks, biased policing, etc.
Depends on the format, but some encoding of the intensity, hue, and saturation of pixels -- either RGB or grayscale -- compressed lossily or otherwise.
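Concretely (the filename is hypothetical):

#+BEGIN_SRC python
from PIL import Image
import numpy as np

img = np.array(Image.open("some_image.png"))  # hypothetical file
print(img.shape, img.dtype)  # e.g. (height, width, 3) uint8 for RGB

# Grayscale drops the channel axis: (height, width), each value
# a single intensity from 0 to 255.
#+END_SRC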
Validation files are already broken out separately: /valid, /train, and labels.csv.
— Kunal