diff --git a/beginner_source/knowledge_distillation_tutorial.py b/beginner_source/knowledge_distillation_tutorial.py
index 19d1553e7a..8892a81ec6 100644
--- a/beginner_source/knowledge_distillation_tutorial.py
+++ b/beginner_source/knowledge_distillation_tutorial.py
@@ -56,9 +56,9 @@
 # A common practice in neural networks is to normalize the input, which is done for multiple reasons,
 # including avoiding saturation in commonly used activation functions and increasing numerical stability.
 # Our normalization process consists of subtracting the mean and dividing by the standard deviation along each channel.
-# The tensors "mean=[0.485, 0.456, 0.406]" and "std=[0.229, 0.224, 0.225]" were already computed,
-# and they represent the mean and standard deviation of each channel in the
-# predefined subset of CIFAR-10 intended to be the training set.
+# The tensors "mean=[0.485, 0.456, 0.406]" and "std=[0.229, 0.224, 0.225]" are commonly used
+# normalization values. Note that these values correspond to ImageNet statistics rather than
+# statistics computed directly from the CIFAR-10 training set.
 # Notice how we use these values for the test set as well, without recomputing the mean and standard deviation from scratch.
 # This is because the network was trained on features produced by subtracting and dividing the numbers above, and we want to maintain consistency.
 # Furthermore, in real life, we would not be able to compute the mean and standard deviation of the test set since,
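To make the distinction in this patch concrete, here is a minimal NumPy sketch of per-channel normalization. It is illustrative only and is not the tutorial's own code. Channel statistics are computed once over a training batch shaped `(N, C, H, W)` and then reused unchanged for the test batch. The random arrays are hypothetical stand-ins for CIFAR-10 images in `[0, 1]`, and the helper names `channel_stats` and `normalize` are assumptions made for this example. Running `channel_stats` over the real CIFAR-10 training split would give dataset-specific values, whereas the ImageNet constants below are fixed numbers that are not derived from the data being normalized.

```python
import numpy as np


def channel_stats(images):
    """Return per-channel mean and std for a batch shaped (N, C, H, W)."""
    # Reduce over batch, height and width so one value per channel remains.
    mean = images.mean(axis=(0, 2, 3))
    std = images.std(axis=(0, 2, 3))  # population std (ddof=0)
    return mean, std


def normalize(images, mean, std):
    """Subtract the mean and divide by the std along the channel axis."""
    # Reshape to (1, C, 1, 1) so the statistics broadcast over N, H and W.
    mean = np.asarray(mean).reshape(1, -1, 1, 1)
    std = np.asarray(std).reshape(1, -1, 1, 1)
    return (images - mean) / std


rng = np.random.default_rng(0)
train = rng.random((8, 3, 4, 4))  # hypothetical stand-in for training images
test = rng.random((2, 3, 4, 4))   # hypothetical stand-in for test images

# Statistics come from the training data only.
mean, std = channel_stats(train)
train_norm = normalize(train, mean, std)
# The test set reuses the training statistics; nothing is recomputed.
test_norm = normalize(test, mean, std)

# Fixed ImageNet values quoted in the patch, applied the same way.
IMAGENET_MEAN = [0.485, 0.456, 0.406]
IMAGENET_STD = [0.229, 0.224, 0.225]
test_imagenet = normalize(test, IMAGENET_MEAN, IMAGENET_STD)
```

After normalizing with its own statistics, each channel of `train_norm` has mean close to 0 and standard deviation close to 1. `test_norm` only approximates this, which is expected because its statistics were never used.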