CoralSRT: Revisiting Coral Reef Semantic Segmentation by Feature Rectification via Self-supervised Guidance

Ziqiang Zheng1 Yuk-Kwan Wong1 Binh-Son Hua2 Jianbo Shi3 Sai-Kit Yeung1

1The Hong Kong University of Science and Technology 2Trinity College Dublin 3University of Pennsylvania

International Conference on Computer Vision, ICCV 2025

CoralSRT teaser image
Fig. 1 Corals grow in diverse shapes, textures, and regions, leading to high physical and appearance stochasticity. Acquiring visually consistent knowledge for segmenting corals is challenging, in contrast to segmenting objects (e.g., fish). We measure the distribution of features extracted by foundation models (FMs) from 400 masked fish and coral images, and find that the average pairwise distance among coral samples is higher than that among fish samples. We propose CoralSRT, an add-on self-supervised feature rectification module, to reduce the stochasticity of coral features. Our method requires no human annotations, no retraining/fine-tuning of FMs, and not even domain-specific data.
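The stochasticity measurement in Fig. 1 can be reproduced in spirit with a few lines: pool one FM feature vector per masked image, then compare the average pairwise distance of the coral set against the fish set. A minimal sketch (our own illustration, not the paper's released code; `extract_fm_feature` is a hypothetical pooling helper):

```python
import numpy as np

def mean_pairwise_cosine_distance(feats: np.ndarray) -> float:
    """feats: (N, D) matrix of per-image feature vectors."""
    f = feats / (np.linalg.norm(feats, axis=1, keepdims=True) + 1e-8)  # L2-normalize rows
    sim = f @ f.T                                   # (N, N) cosine similarities
    iu = np.triu_indices(len(f), k=1)               # unique unordered pairs
    return float(np.mean(1.0 - sim[iu]))

# Hypothetical usage: a higher value indicates more stochastic features.
# coral_feats = np.stack([extract_fm_feature(img) for img in coral_crops])
# fish_feats  = np.stack([extract_fm_feature(img) for img in fish_crops])
# print(mean_pairwise_cosine_distance(coral_feats),
#       mean_pairwise_cosine_distance(fish_feats))
```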
Abstract

We investigate coral reef semantic segmentation, where multifaceted factors, such as genetics, environmental changes, and internal interactions, can lead to highly unpredictable growth patterns. Existing segmentation approaches in both the computer vision and coral reef communities have failed to incorporate the intrinsic properties of corals, specifically their self-repeated, asymmetric, and amorphous distribution of elements, into model design. We propose CoralSRT, a feature rectification module driven by self-supervised guidance, to reduce the stochasticity of coral features extracted by pretrained foundation models (FMs), as demonstrated in Fig. 1. Our insight is that while different corals are highly dissimilar, individual corals within the same growth exhibit strong self-affinity. Using a superset of features from FMs learned through various pretext tasks, we extract a pattern related to the intrinsic properties of each coral and strengthen within-segment affinity, aligning features with their centrality. Features from FMs optimized on significantly large-scale unlabeled or labeled data already contain rich information for modeling both within-segment and cross-segment affinities, enabling the adaptation of FMs to coral segmentation. CoralSRT rectifies FM features into features that are more effective for label propagation, leading to further significant semantic segmentation gains, all without additional human supervision, retraining or fine-tuning of FMs, or even domain-specific data. These advantages reduce human effort and the need for domain expertise in data collection and labeling. Our method is easy to implement, and task- and model-agnostic. CoralSRT bridges self-supervised pretraining and supervised training in the feature space, and offers insights for segmenting other elements/stuff (e.g., grass, plants, cells, and biofouling).

Framework Overview

Fig. 2 Framework overview of the proposed CoralSRT, which rectifies features of frozen FMs based on model-generated mask guidance or human annotations. We force the features within each semantic-agnostic segment to approach that segment's centrality, reducing the stochasticity of coral features and yielding features that are more effective for label propagation in the feature space. On the right-hand side, we show that Rec(·) learns high-dimensional features inside each segment via the centrality (e.g., the median value), which remains stable across different sub-segments due to the intrinsic self-repeated and amorphous properties of corals.
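The rectification objective in Fig. 2 can be sketched as follows: given dense FM features and a semantic-agnostic segment map (from SAM/CoralSCOP masks or annotations), each within-segment feature is pulled toward the segment's centrality, here the channel-wise median. This is our illustrative reading of the figure, not the authors' released implementation; the lightweight rectifier Rec(·) sits on top of the frozen FM and is the only trained component.

```python
import torch

def centrality_loss(feat: torch.Tensor, seg: torch.Tensor) -> torch.Tensor:
    """feat: (D, H, W) dense FM features; seg: (H, W) integer segment ids."""
    d = feat.shape[0]
    f = feat.reshape(d, -1)                 # (D, H*W)
    s = seg.reshape(-1)                     # (H*W,)
    ids = s.unique()
    loss = feat.new_zeros(())
    for sid in ids:
        seg_feat = f[:, s == sid]                        # features inside one segment
        center = seg_feat.median(dim=1).values.detach()  # channel-wise median as centrality
        loss = loss + ((seg_feat - center[:, None]) ** 2).mean()
    return loss / len(ids)
```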

Motivation and Reformulation

Coral reef semantic segmentation can be categorized as stuff segmentation. COCO-Stuff (Caesar et al., 2018) made the first attempt at stuff segmentation and summarized five key properties that distinguish "instances/things" from "stuff": shape, size, parts, instances, and texture. Inspired by this work, we summarize the challenges of coral segmentation below.

Failure of Promptable Segmentation

Promptable segmentation models (e.g., SAM and CoralSCOP) produce under-inclusive and over-inclusive outputs. The mask with the red edge is shown for illustration and is not model-generated.

Key Differences Between Instances and Stuff

The key difference between segmenting fish and corals: a fish has a visually consistent structural unit, while corals do not. No matter which part of a fish is occluded, humans can still infer its boundary and shape. For corals, however, we cannot infer a consistent output from two inputs with different regions occluded.

Our Formulation

Our simple and fundamental problem formulation for coral reef semantic segmentation: the segment serves as the basic unit for modeling within-segment and cross-segment affinities.
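This formulation can be made concrete: given dense features and a segment map, within-segment affinity should be high and cross-segment affinity low. A small sketch under those assumptions (names are ours, not the paper's; subsample pixels for large images, since the similarity matrix is quadratic in pixel count):

```python
import torch

def segment_affinities(feat: torch.Tensor, seg: torch.Tensor):
    """feat: (D, H, W) dense features; seg: (H, W) integer segment ids.
    Returns mean within-segment and cross-segment cosine affinity."""
    f = torch.nn.functional.normalize(feat.reshape(feat.shape[0], -1), dim=0)
    sim = f.T @ f                                    # (HW, HW) pairwise cosine similarity
    same = seg.reshape(-1, 1) == seg.reshape(1, -1)  # True where two pixels share a segment
    return sim[same].mean(), sim[~same].mean()
```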

Results

PCA visualization

PCA comparison (first three components rendered as RGB) of features from different algorithms and foundation models.
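Visualizations of this kind are typically produced by projecting per-pixel features onto their first three principal components and mapping them to RGB channels. A minimal sketch of that standard procedure (our illustration, assuming (H, W, D) dense features):

```python
import numpy as np

def pca_to_rgb(feat: np.ndarray) -> np.ndarray:
    """feat: (H, W, D) dense features -> (H, W, 3) pseudo-color image."""
    h, w, d = feat.shape
    x = feat.reshape(-1, d)
    x = x - x.mean(axis=0)                         # center features
    _, _, vt = np.linalg.svd(x, full_matrices=False)
    proj = x @ vt[:3].T                            # project onto top-3 principal components
    lo, hi = proj.min(axis=0), proj.max(axis=0)
    proj = (proj - lo) / (hi - lo + 1e-8)          # normalize each channel to [0, 1]
    return proj.reshape(h, w, 3)
```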

Sparse-to-dense conversion

Sparse-to-dense conversion based on features from different algorithms and foundation models.
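Sparse-to-dense conversion here means propagating sparse point labels (the standard annotation format in coral reef surveys) to every pixel by matching in feature space; rectified features make this propagation more reliable. A minimal nearest-anchor sketch under those assumptions (not the paper's exact propagation scheme):

```python
import numpy as np

def sparse_to_dense(feat: np.ndarray, points: list) -> np.ndarray:
    """feat: (H, W, D) features; points: list of (row, col, label) tuples."""
    h, w, d = feat.shape
    f = feat.reshape(-1, d)
    f = f / (np.linalg.norm(f, axis=1, keepdims=True) + 1e-8)
    anchors = np.stack([f[r * w + c] for r, c, _ in points])  # (P, D) labeled features
    labels = np.array([lbl for _, _, lbl in points])
    sim = f @ anchors.T                                       # (HW, P) cosine similarity
    return labels[sim.argmax(axis=1)].reshape(h, w)           # nearest labeled point wins
```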

Zero-shot ability

Zero-shot sparse-to-dense conversion results on the Seaview dataset.

Model-agnostic

CoralSRT improves features extracted by various foundation models.

© 2025 Ziqiang Zheng, Yuk-Kwan Wong, Binh-Son Hua, Jianbo Shi, Sai-Kit Yeung. All rights reserved.