A Survey of Semantic Segmentation Papers

Prerequisites: Common Theory and Techniques

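Since then, semantic segmentation research has mainly developed along four lines: capturing effective context information, maintaining high-resolution feature representations, introducing extra information (such as boundaries), and segmenting ultra-high-resolution images.

A currently popular direction is using transformers for semantic segmentation.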

Spatial context
- Pyramid structure: Pyramid Scene Parsing Network (PSPNet)
- DeepLab series, built on the ASPP structure and atrous (dilated) convolution, including V1, V2, V3, and V3+
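
To make the atrous convolution idea concrete, here is a minimal pure-Python sketch in 1D. It assumes an odd-length kernel and zero padding. The helper names `dilated_conv1d` and `aspp_like` are hypothetical, and averaging the branches is a simplification chosen for this sketch (DeepLab's ASPP concatenates the parallel branches and fuses them with a learned convolution).

```python
def dilated_conv1d(signal, kernel, dilation=1):
    """Same-size 1D correlation with zero padding; kernel taps are `dilation` apart."""
    half = len(kernel) // 2
    n = len(signal)
    out = []
    for i in range(n):
        acc = 0.0
        for k, w in enumerate(kernel):
            j = i + (k - half) * dilation  # input position read by tap k
            if 0 <= j < n:                 # positions outside the signal count as zero
                acc += w * signal[j]
        out.append(acc)
    return out


def aspp_like(signal, kernel, rates=(1, 2, 4)):
    """Run the same kernel at several dilation rates and average the branches.

    Assumption: averaging stands in for the concatenate-and-fuse step of ASPP.
    """
    branches = [dilated_conv1d(signal, kernel, r) for r in rates]
    return [sum(vals) / len(branches) for vals in zip(*branches)]
```

With a 3-tap kernel, dilation 1 covers 3 neighboring positions while dilation 2 covers a span of 5, so the receptive field grows without adding weights or reducing resolution.
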
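Relational context
- Dual Attention Network for Scene Segmentation (DANet). Self-attention mechanism: position attention + channel attention.
- Object-Contextual Representations for Semantic Segmentation (OCRNet). Enhances each pixel's representation with the features of the object class the pixel belongs to.
- CCNet: Criss-Cross Attention for Semantic Segmentation (CCNet). Reduces the computational cost of attention.
- Context Encoding for Semantic Segmentation (EncNet). Models context with residual encoding, a technique from classical computer vision.
- Disentangled Non-Local Neural Networks. Decouples attention into a pairwise term and a unary term, which are learned separately.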
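
The attention-based methods above share one core operation: each pixel aggregates features from all other pixels, weighted by similarity. The sketch below illustrates that operation in pure Python. It is not any paper's exact module: it assumes no learned query/key/value projections, and the function name `position_attention` is hypothetical.

```python
import math


def softmax(xs):
    """Numerically stable softmax over a list of scores."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def position_attention(features):
    """features: list of N pixel vectors, each of length C.

    Returns N vectors: each input plus a similarity-weighted sum of all pixels.
    Assumption: raw dot products replace the learned projections of real models.
    """
    out = []
    for q in features:
        scores = [sum(a * b for a, b in zip(q, k)) for k in features]
        weights = softmax(scores)  # one weight per pixel, summing to 1
        context = [
            sum(w * v[c] for w, v in zip(weights, features))
            for c in range(len(q))
        ]
        out.append([x + ctx for x, ctx in zip(q, context)])  # residual connection
    return out
```

The score matrix has N x N entries for N pixels, which is the cost that methods such as CCNet aim to reduce.
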
Spatial + relational context
- Adaptive Pyramid Context Network for Semantic Segmentation. Combines a pyramid structure with attention to change the positions that attention samples.

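Deep High-Resolution Representation Learning for Visual Recognition (HRNet). Maintains a high-resolution branch throughout the network.
Encoder-decoder structures that recover the original resolution
- SegNet: A Deep Convolutional Encoder-Decoder Architecture for Image Segmentation. Records the pooling indices and uses them directly in the decoder to restore resolution.
- Semantic Flow for Fast and Accurate Scene Parsing (SFNet). Uses an optical-flow-style flow field to fuse features from different stages.
- GFF: Gated Fully Fusion for Semantic Segmentation. Uses a soft gate mechanism to control the fusion of features from different stages.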
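
The pooling-index idea described for SegNet can be sketched in 1D as follows. This is a minimal illustration under simple assumptions (non-overlapping windows, zero fill), and the function names are hypothetical rather than taken from any library.

```python
def max_pool_with_indices(x, size=2):
    """Non-overlapping 1D max pooling that also records where each max came from."""
    pooled, indices = [], []
    for start in range(0, len(x) - size + 1, size):
        window = x[start:start + size]
        offset = max(range(size), key=lambda k: window[k])  # first max wins on ties
        pooled.append(window[offset])
        indices.append(start + offset)
    return pooled, indices


def max_unpool(pooled, indices, length):
    """Place each pooled value back at its recorded position; fill the rest with 0."""
    out = [0] * length
    for value, idx in zip(pooled, indices):
        out[idx] = value
    return out
```

For example, pooling `[1, 5, 2, 0, 7, 3]` keeps the values at positions 1, 2, and 4, and unpooling writes them back to exactly those positions. The decoder therefore recovers where the strong activations were without learning an upsampling step.
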

Gated-SCNN: Gated Shape CNNs for Semantic Segmentation. Introduces an extra branch supervised with semantic edges.
SegFix: Model-Agnostic Boundary Refinement for Segmentation. A general-purpose post-processing module for boundary regions.
Hard-pixel-aware:
- Loss Max-Pooling for Semantic Image Segmentation
- Not All Pixels Are Equal: Difficulty-Aware Semantic Segmentation via Deep Layer Cascade
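
As a rough illustration of what "hard-pixel-aware" means, the sketch below averages the cross-entropy loss over only the hardest pixels. This is a generic top-k simplification and not the method of either paper listed above. The function names and the `keep_ratio` parameter are assumptions made for this example.

```python
import math


def pixel_losses(probs, labels):
    """Per-pixel cross-entropy; probs[i] holds the class probabilities of pixel i."""
    return [-math.log(p[y]) for p, y in zip(probs, labels)]


def hard_pixel_loss(probs, labels, keep_ratio=0.5):
    """Average the loss over the `keep_ratio` fraction of pixels with highest loss.

    Assumption: a plain top-k selection, used here only to show the idea of
    focusing training on difficult pixels.
    """
    losses = sorted(pixel_losses(probs, labels), reverse=True)
    k = max(1, int(len(losses) * keep_ratio))  # always keep at least one pixel
    return sum(losses[:k]) / k
```

Because easy pixels are dropped from the average, the result is never smaller than the plain mean loss over all pixels.
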

Collaborative Global-Local Networks for Memory-Efficient Segmentation of Ultra-High Resolution Images. Feeds both a downsampled version of the original image and crops of it into the network.
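
The two inputs described above, a downsampled global view and full-resolution local crops, can be prepared as in the sketch below. It assumes an image stored as a nested list and uses strided subsampling in place of proper interpolation. The function names are hypothetical.

```python
def downsample(image, factor):
    """Global view: keep every `factor`-th row and column (nearest-neighbor style)."""
    return [row[::factor] for row in image[::factor]]


def crop_patches(image, patch):
    """Local views: non-overlapping patch x patch crops at full resolution."""
    height, width = len(image), len(image[0])
    patches = []
    for top in range(0, height, patch):
        for left in range(0, width, patch):
            patches.append(
                [row[left:left + patch] for row in image[top:top + patch]]
            )
    return patches
```

The global view keeps scene-level context at low memory cost, while each crop keeps fine detail. Neither input alone requires holding full-resolution feature maps of the whole image in memory.
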

Rethinking Semantic Segmentation from a Sequence-to-Sequence Perspective with Transformers (SETR). Applies the ViT architecture to semantic segmentation.
SegFormer: Simple and Efficient Design for Semantic Segmentation with Transformers. Introduces multi-scale features into the transformer.
Per-Pixel Classification is Not All You Need for Semantic Segmentation (MaskFormer). Uses a clustering idea within a transformer for semantic segmentation.
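
The clustering view in the last entry can be illustrated as mask classification: a set of segments each predicts one class distribution and one soft mask, and pixels receive the class with the highest combined score. The sketch below shows this combination step only, under the assumption that class and mask probabilities are already given. The function name `semantic_from_masks` is hypothetical and this is not the paper's code.

```python
def semantic_from_masks(class_probs, mask_probs):
    """Combine per-segment predictions into a per-pixel label map.

    class_probs[i][c]: probability that segment i has class c.
    mask_probs[i][p]:  probability that pixel p belongs to segment i.
    """
    num_classes = len(class_probs[0])
    num_pixels = len(mask_probs[0])
    labels = []
    for p in range(num_pixels):
        # Score of class c at pixel p: sum over segments of class prob * mask prob.
        scores = [
            sum(cp[c] * mp[p] for cp, mp in zip(class_probs, mask_probs))
            for c in range(num_classes)
        ]
        labels.append(max(range(num_classes), key=lambda c: scores[c]))
    return labels
```

In contrast to per-pixel classification, the class decision is made once per segment, and pixels are assigned to segments through the masks.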