Paper Reading List (Updated Regularly)

Metric Learning

  1. SoftTriple Loss: Deep Metric Learning Without Triplet Sampling
    • Proposes a method that performs classification and metric learning with a single loss. The paper shows that softmax cross-entropy is equivalent to a smoothed triplet loss with one center per class, so the two are essentially the same.
  2. Visual Explanation for Deep Metric Learning
    • Visualization of metric learning models
  3. Embedding Expansion: Augmentation in Embedding Space for Deep Metric Learning
    • Adds a few simple rule-based steps around mining (synthetic points interpolated between same-class embeddings, followed by hard negative pair mining). Accuracy improved for every distance-based loss function it was combined with.
  4. Moving in the Right Direction: A Regularization for Deep Metric Learning
    • Compares regularization techniques for deep metric learning and discusses the risks of triplet loss.
  5. Deep Metric Learning via Adaptive Learnable Assessment
    • Replaces mining rules with a learning-based approach and adopts an episode-based training scheme.
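
The relation noted under SoftTriple Loss can be checked numerically. The sketch below is not the paper's implementation: it assumes a single center per class (SoftTriple itself uses several), and the similarity scores and function names are made up for illustration. It compares softmax cross-entropy with a log-sum-exp over the triplet terms s_j - s_y, then sharpens the smoothing to recover a zero-margin hinge.

```python
import math

def cross_entropy(scores, label):
    """Softmax cross-entropy over similarity scores to each class center."""
    m = max(scores)
    log_norm = m + math.log(sum(math.exp(s - m) for s in scores))
    return log_norm - scores[label]

def smoothed_triplet(scores, label, scale=1.0):
    """Log-sum-exp over the triplet terms s_j - s_y (j = y contributes 0)."""
    diffs = [scale * (s - scores[label]) for s in scores]
    m = max(diffs)
    return m + math.log(sum(math.exp(d - m) for d in diffs))

def hard_triplet(scores, label):
    """Zero-margin hinge against the most violating other class center."""
    worst = max(s - scores[label] for j, s in enumerate(scores) if j != label)
    return max(0.0, worst)

# Hypothetical similarities of one embedding to three class centers.
scores = [0.2, 0.9, 0.5]
label = 2

ce = cross_entropy(scores, label)
st = smoothed_triplet(scores, label)
hinge = hard_triplet(scores, label)
# A large scale makes log-sum-exp approach the plain max, i.e. the hinge.
sharp = smoothed_triplet(scores, label, scale=200.0) / 200.0

print(ce, st, hinge, sharp)
```

With scale 1 the two losses agree up to floating-point error; as the scale grows, the rescaled smoothed loss approaches the hard triplet hinge.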

Video Tasks

  1. Spatiotemporal Contrastive Video Representation Learning
    • Applies SimCLR to video classification tasks; would like to incorporate this.
  2. Predicting Video with VQVAE
    • Achieves 65% on Kinetics-600; a teacher-forcing-like approach is possible.
  3. Is Space-Time Attention All You Need for Video Understanding?
    • A Transformer-based video classifier with many novel aspects.
  4. VideoMix: Rethinking Data Augmentation for Video Classification
    • Proposes VideoMix, a new data augmentation for video action recognition. The T-VideoMix method seems applicable.
  5. TSM: Temporal Shift Module for Efficient Video Understanding
    • Addresses the problem of 3D CNNs being too heavy by inserting the Temporal Shift Module (TSM) into a 2D CNN as a substitute. TSM shifts part of the channels along the time axis and adds zero parameters, so the complexity remains that of a 2D CNN.
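
The shift operation behind TSM is small enough to sketch directly. This is a minimal illustration under simplifying assumptions, not the authors' code: spatial dimensions are dropped, each frame is a flat list of channel values, and the names `temporal_shift` and `fold_div` are chosen here for illustration. One slice of channels takes its values from the next frame, another from the previous frame, and the rest stay in place; no learned parameters are involved.

```python
def temporal_shift(frames, fold_div=4):
    """Shift a fraction of channels along time; frames is a list of T lists of C values."""
    t_len = len(frames)
    channels = len(frames[0])
    fold = channels // fold_div  # number of channels shifted in each direction
    out = [[0.0] * channels for _ in range(t_len)]
    for t in range(t_len):
        for c in range(channels):
            if c < fold:
                src = t + 1      # first slice: read from the next frame
            elif c < 2 * fold:
                src = t - 1      # second slice: read from the previous frame
            else:
                src = t          # remaining channels are left untouched
            if 0 <= src < t_len:
                out[t][c] = frames[src][c]  # out-of-range sources stay zero-padded
    return out

# Three frames, four channels each (values encode frame and channel).
frames = [
    [10.0, 11.0, 12.0, 13.0],
    [20.0, 21.0, 22.0, 23.0],
    [30.0, 31.0, 32.0, 33.0],
]
shifted = temporal_shift(frames, fold_div=4)
for row in shifted:
    print(row)
```

After the shift, a per-frame 2D convolution sees channels from neighboring frames, which is how temporal information gets mixed without extra weights.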

Future Prediction

  1. Improved Conditional VRNNs for Video Prediction
    • Predicts future video frames using a variational recurrent neural network (VRNN). A typical, very simple recurrent autoencoder, and the go-to approach for generation.
  2. Video Prediction via Example Guidance
    • Not finished reading; the first multimodal model for video future prediction.
  3. Predictive Learning: Using Future Representation Learning Variational Autoencoder for Human Action Prediction
    • Two-stream approach using RGB and optical flow.

Training Methods

  1. Invariant Information Clustering for Unsupervised Image Classification and Segmentation
    • An unsupervised training method that can directly output predictions; robust to noise.
  2. Supervised Contrastive Learning
    • Extends SimCLR's contrastive loss to the supervised setting: samples that share a label are treated as positives.
  3. Unsupervised Learning of Visual Features by Contrasting Cluster Assignments
    • SwAV
  4. A Simple Framework for Contrastive Learning of Visual Representations
    • SimCLR
  5. AutoAugment: Learning Augmentation Policies from Data
    • Performs data augmentation using a learning-based approach.
  6. What Makes Training Multi-modal Classification Networks Hard?
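
As a rough sketch of how the supervised contrastive loss treats same-label samples as positives, the code below computes the loss on a toy batch. It is an illustration under assumptions, not the reference implementation: the embeddings, labels, temperature, and the name `sup_con_loss` are made up here, and anchors without any positive are simply skipped. When every anchor has exactly one positive (its augmented view), the same formula matches the SimCLR-style contrastive loss.

```python
import math

def normalize(v):
    """Scale a vector to unit L2 norm."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]

def sup_con_loss(embeddings, labels, temperature=0.1):
    """Average over anchors of the mean negative log-probability of each positive."""
    z = [normalize(v) for v in embeddings]
    n = len(z)
    total, counted = 0.0, 0
    for i in range(n):
        positives = [p for p in range(n) if p != i and labels[p] == labels[i]]
        if not positives:
            continue  # assumption: anchors with no positive are skipped
        # Temperature-scaled cosine similarity to every other sample.
        logits = {a: sum(x * y for x, y in zip(z[i], z[a])) / temperature
                  for a in range(n) if a != i}
        m = max(logits.values())
        log_denom = m + math.log(sum(math.exp(v - m) for v in logits.values()))
        total += -sum(logits[p] - log_denom for p in positives) / len(positives)
        counted += 1
    return total / counted if counted else 0.0

# Toy batch: two tight clusters in 2D.
embeddings = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
loss_matching = sup_con_loss(embeddings, [0, 0, 1, 1])    # labels follow the clusters
loss_mismatched = sup_con_loss(embeddings, [0, 1, 0, 1])  # labels cut across clusters

print(loss_matching, loss_mismatched)
```

The loss is low when same-label embeddings are already close and high when the labels cut across the clusters, which is the signal that pulls same-class samples together during training.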

Others

  1. Revisiting ResNets: Improved Training and Scaling Strategies
    • Training and scaling strategies for ResNet.
  2. An annotation-free whole-slide training approach to pathological classification of lung cancer types using deep learning
    • Avoids image downscaling by using a Unified Memory (UM) mechanism and several GPU memory optimization techniques.
  3. Prototypical Contrastive Learning of Unsupervised Representations
    • EM algorithm-based clustering that modifies the distance function to make clusters harder to converge, thereby suppressing overfitting.