Paper Reading List (Updated Regularly)
Metric Learning
- SoftTriple Loss: Deep Metric Learning Without Triplet Sampling
- Proposes a single loss that performs classification and metric learning together, with no triplet sampling. The paper shows that softmax cross-entropy is equivalent to a smoothed triplet loss with one center per class, then extends this to multiple centers per class.
- Visual Explanation for Deep Metric Learning
- Visualization of metric learning models
- Embedding Expansion: Augmentation in Embedding Space for Deep Metric Learning
- Adds a few simple rule-based steps to mining (synthetic points interpolated between same-class embeddings, followed by hard negative pair mining). Accuracy improvements are reported across the distance-based losses tested.
- Moving in the Right Direction: A Regularization for Deep Metric Learning
- Compares regularization techniques for deep metric learning and discusses the risks of triplet loss.
- Deep Metric Learning via Adaptive Learnable Assessment
- Replaces mining rules with a learning-based approach and adopts an episode-based training scheme.
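
The SoftTriple note above says cross-entropy and a smoothed triplet loss are essentially the same. A minimal numerical sketch of that relation follows, assuming one center per class and zero margin. The function names and the toy `scores` are hypothetical, and this is not the paper's multi-center loss. It only checks that cross-entropy equals a log-sum-exp (a smooth max) over the triplet violations `s_j - s_y`.

```python
import math

def cross_entropy(scores, target):
    """Softmax cross-entropy for one sample, given per-class similarity scores."""
    m = max(scores)
    log_z = m + math.log(sum(math.exp(s - m) for s in scores))
    return log_z - scores[target]

def smoothed_triplet(scores, target):
    """Log-sum-exp over the violations (s_j - s_y): a smooth version of the max."""
    violations = [s - scores[target] for s in scores]
    m = max(violations)
    return m + math.log(sum(math.exp(v - m) for v in violations))

def hard_triplet(scores, target):
    """Hardest violation max_j (s_j - s_y) with zero margin; j == y contributes 0."""
    return max(s - scores[target] for s in scores)

# Hypothetical similarities between one embedding and three class centers.
scores = [2.0, 0.5, -1.0]
target = 0

ce = cross_entropy(scores, target)
st = smoothed_triplet(scores, target)
ht = hard_triplet(scores, target)
print(ce, st, ht)
```

The two smooth quantities coincide, and both lie between the hard triplet value and that value plus `log(num_classes)`, which is the usual bound for log-sum-exp as a soft max.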
Video Tasks
- Spatiotemporal Contrastive Video Representation Learning
- Applies SimCLR-style contrastive learning to video classification; would like to incorporate this.
- Predicting Video with VQVAE
- Achieves 65% on Kinetics-600; a teacher-forcing-like approach is possible.
- Is Space-Time Attention All You Need for Video Understanding?
- A Transformer-based video classifier with many novel aspects.
- VideoMix: Rethinking Data Augmentation for Video Classification
- Proposes VideoMix, a new data augmentation for video action recognition. The T-VideoMix method seems applicable.
- TSM: Temporal Shift Module for Efficient Video Understanding
- Addresses the high cost of 3D CNNs by inserting the Temporal Shift Module (TSM) into a 2D CNN instead. TSM shifts part of the channels along the time axis and has zero parameters, so the complexity remains that of a 2D CNN.
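
The temporal shift described in the TSM entry can be sketched in a few lines. This is a simplified illustration in plain Python, not the authors' implementation. It assumes each frame is a flat list of channel values (spatial dimensions omitted), a bidirectional shift with zero padding at the clip boundaries, and a hypothetical `fold_div` parameter that sets the fraction of channels shifted in each direction.

```python
def temporal_shift(frames, fold_div=8):
    """Shift a fraction of channels along time; frames is a list of T lists of C values."""
    t, c = len(frames), len(frames[0])
    fold = c // fold_div  # number of channels shifted in each direction
    out = [[0.0] * c for _ in range(t)]
    for i in range(t):
        for ch in range(c):
            if ch < fold:
                src = i + 1   # first group: take the value from the next frame
            elif ch < 2 * fold:
                src = i - 1   # second group: take the value from the previous frame
            else:
                src = i       # remaining channels stay in place
            if 0 <= src < t:  # out-of-range sources are left as zero padding
                out[i][ch] = frames[src][ch]
    return out

# Toy clip: 3 frames, 8 channels, value = 10 * frame_index + channel_index.
clip = [[float(10 * i + ch) for ch in range(8)] for i in range(3)]
shifted = temporal_shift(clip, fold_div=8)
print(shifted)
```

The operation only moves existing values, so it introduces no learnable parameters; the 2D convolutions that follow then mix information from neighboring frames.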
Future Prediction
- Improved Conditional VRNNs for Video Prediction
- Predicts future video frames with a variational recurrent network (VRNN). A typical, very simple recurrent autoencoder (RAE); the go-to approach for generation.
- Video Prediction via Example Guidance
- Not finished reading; described as the first multimodal model for video future prediction.
- Predictive Learning: Using Future Representation Learning Variational Autoencoder for Human Action Prediction
- Two-stream approach using RGB and Optical Flow.
Training Methods
- Invariant Information Clustering for Unsupervised Image Classification and Segmentation
- An unsupervised training method in which the network directly outputs cluster (class) predictions; robust to noise.
- Supervised Contrastive Learning
- Extends SimCLR-style contrastive learning to the supervised setting, using labels to define positives.
- Unsupervised Learning of Visual Features by Contrasting Cluster Assignments
- SwAV
- A Simple Framework for Contrastive Learning of Visual Representations
- SimCLR
- AutoAugment: Learning Augmentation Policies from Data
- Learns the data augmentation policy from data rather than designing it by hand.
- What Makes Training Multi-modal Classification Networks Hard?
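
Several entries in this section build on the SimCLR contrastive objective (NT-Xent). The sketch below shows one way to compute it in plain Python. The pairing convention (`z[2k]` and `z[2k+1]` are two views of the same sample), the default temperature, and the toy embeddings are assumptions for illustration only.

```python
import math

def cosine(u, v):
    """Cosine similarity between two vectors given as lists."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def nt_xent(z, temperature=0.5):
    """NT-Xent loss over 2N embeddings; z[2k] and z[2k+1] are assumed to be a positive pair."""
    n = len(z)
    total = 0.0
    for i in range(n):
        j = i ^ 1  # index of the positive partner under the assumed pairing
        # Similarities to every other embedding (the anchor itself is excluded).
        logits = [cosine(z[i], z[k]) / temperature for k in range(n) if k != i]
        pos = cosine(z[i], z[j]) / temperature
        m = max(logits)
        log_denom = m + math.log(sum(math.exp(l - m) for l in logits))
        total += log_denom - pos  # -log softmax probability of the positive
    return total / n

# Toy embeddings: positives aligned vs. positives mismatched.
aligned = [[1.0, 0.0], [1.0, 0.0], [0.0, 1.0], [0.0, 1.0]]
mismatched = [[1.0, 0.0], [0.0, 1.0], [1.0, 0.0], [0.0, 1.0]]
loss_aligned = nt_xent(aligned)
loss_mismatched = nt_xent(mismatched)
print(loss_aligned, loss_mismatched)
```

The loss is lower when the two views of a sample land close together and away from the other samples. Supervised Contrastive Learning keeps the same form but treats every same-label sample in the batch as a positive.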
Others
- Revisiting ResNets: Improved Training and Scaling Strategies
- Training and scaling strategies for ResNet.
- An annotation-free whole-slide training approach to pathological classification of lung cancer types using deep learning
- Avoids image downscaling by using a Unified Memory (UM) mechanism and several GPU memory optimization techniques.
- Prototypical Contrastive Learning of Unsupervised Representations
- EM-based clustering that modifies the distance function so that clusters converge less readily, which suppresses overfitting.