Paper Reading List (Updated Regularly)

Metric Learning

SoftTriple Loss: Deep Metric Learning Without Triplet Sampling
- Proposes a single loss that performs classification and metric learning together. The paper shows that the softmax cross-entropy loss is equivalent to a smoothed triplet loss with one center per class, so the two are essentially the same.
Visual Explanation for Deep Metric Learning
- Visualization of metric learning models
Embedding Expansion: Augmentation in Embedding Space for Deep Metric Learning
- Applies several simple rule-based optimizations to mining and reports accuracy improvements across every distance-based loss function tested.
Moving in the Right Direction: A Regularization for Deep Metric Learning
- Compares regularization techniques for deep metric learning and discusses the risks of triplet loss.
Deep Metric Learning via Adaptive Learnable Assessment
- Replaces mining rules with a learning-based approach and adopts an episode-based training scheme.
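
To make the SoftTriple note above concrete, here is a minimal sketch of the loss as I understand the paper's formulation. The function names, the argument names (`lam`, `gamma`, `delta`), and the default values are assumptions for illustration, not taken from these notes. Embeddings and centers are assumed to be unit-normalized, so dot products are cosine similarities.

```python
import math


def dot(a, b):
    """Dot product of two equal-length lists of floats."""
    return sum(p * q for p, q in zip(a, b))


def log_softmax_at(logits, index):
    """Log-softmax of `logits` evaluated at `index`, computed stably."""
    m = max(logits)
    log_z = m + math.log(sum(math.exp(v - m) for v in logits))
    return logits[index] - log_z


def soft_triple_loss(x, centers, label, lam=20.0, gamma=0.1, delta=0.01):
    """SoftTriple-style loss for one sample (a sketch, not reference code).

    x:       unit-norm embedding, list of floats.
    centers: centers[c][k] is the k-th unit-norm center of class c.
    label:   index of the true class.
    """
    logits = []
    for c, class_centers in enumerate(centers):
        sims = [dot(x, w) for w in class_centers]
        # Soft assignment over this class's centers (temperature gamma).
        m = max(sims)
        weights = [math.exp((s - m) / gamma) for s in sims]
        relaxed = sum(wt * s for wt, s in zip(weights, sims)) / sum(weights)
        # The margin delta is applied to the true class only.
        margin = delta if c == label else 0.0
        logits.append(lam * (relaxed - margin))
    return -log_softmax_at(logits, label)


def cross_entropy(x, class_weights, label, scale=20.0):
    """Scaled softmax cross-entropy with one weight vector per class."""
    logits = [scale * dot(x, w) for w in class_weights]
    return -log_softmax_at(logits, label)


# Toy data (hypothetical): three classes in a 2-D embedding space.
x = [0.6, 0.8]
class_weights = [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0]]
single_center = [[w] for w in class_weights]  # one center per class
```

With one center per class and `delta=0.0`, `soft_triple_loss(x, single_center, 1, delta=0.0)` matches `cross_entropy(x, class_weights, 1)`, which is the equivalence the note refers to. Using several centers per class is what separates SoftTriple from plain softmax.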

Spatiotemporal Contrastive Video Representation Learning
- Applies SimCLR to video classification tasks; would like to incorporate this.
Predicting Video with VQVAE
- Achieves 65% on Kinetics-600; a teacher-forcing-like approach is possible.
Is Space-Time Attention All You Need for Video Understanding?
- TimeSformer, a Transformer-based video classifier built on self-attention over space and time; it has many novel aspects.
VideoMix: Rethinking Data Augmentation for Video Classification
- Proposes VideoMix, a new data augmentation for video action recognition. The T-VideoMix method seems applicable.
TSM: Temporal Shift Module for Efficient Video Understanding
- Addresses the heavy cost of 3D CNNs by inserting the Temporal Shift Module (TSM) into a 2D CNN as a substitute. TSM has zero parameters, so the model's complexity remains that of a 2D CNN.
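
The temporal shift in the TSM note above can be sketched in a few lines. This is a simplified illustration, not the authors' code: the function name `temporal_shift`, the `fold_div` argument, and the shift directions are assumptions, and each channel holds a single scalar per frame rather than a full feature map. The function has no learnable weights, which is why the module adds no parameters.

```python
def temporal_shift(frames, fold_div=8):
    """Shift a fraction of channels along the time axis (a sketch).

    frames[t][c] is the feature value of channel c at time step t.
    The first 1/fold_div of the channels take their value from the next
    frame, the second 1/fold_div from the previous frame, and the rest
    stay in place. Positions with no source frame are zero-filled.
    """
    num_steps = len(frames)
    num_channels = len(frames[0])
    fold = num_channels // fold_div
    shifted = [[0.0] * num_channels for _ in range(num_steps)]
    for t in range(num_steps):
        for c in range(num_channels):
            if c < fold:
                src = t + 1  # read from the next frame
            elif c < 2 * fold:
                src = t - 1  # read from the previous frame
            else:
                src = t      # unshifted channel
            if 0 <= src < num_steps:
                shifted[t][c] = frames[src][c]
    return shifted


# Toy input (hypothetical): 3 frames with 4 channels each.
frames = [
    [1.0, 2.0, 3.0, 4.0],
    [5.0, 6.0, 7.0, 8.0],
    [9.0, 10.0, 11.0, 12.0],
]
shifted = temporal_shift(frames, fold_div=4)
```

After the shift, each frame's features mix information from its neighbors, so the 2D convolution that follows sees temporal context without any 3D kernels.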

Improved Conditional VRNNs for Video Prediction
- Predicts unseen future video frames using a variational recurrent neural network (VRNN). A typical, very simple recurrent autoencoder (RAE); the go-to approach for generation.
Video Prediction via Example Guidance
- Not finished reading; the first multimodal model for video future prediction.
Predictive Learning: Using Future Representation Learning Variational Autoencoder for Human Action Prediction
- Two-stream approach using RGB and optical flow.

Revisiting ResNets: Improved Training and Scaling Strategies
- Training and scaling strategies for ResNet.
An annotation-free whole-slide training approach to pathological classification of lung cancer types using deep learning
- Avoids image downscaling by using a Unified Memory (UM) mechanism and several GPU memory optimization techniques.
Prototypical Contrastive Learning of Unsupervised Representations
- EM-based clustering that modifies the distance function so that clusters converge less readily, thereby suppressing overfitting.