Ziqiang Zheng1 Yuk-Kwan Wong1 Binh-Son Hua2 Jianbo Shi3 Sai-Kit Yeung1
1The Hong Kong University of Science and Technology 2Trinity College Dublin 3University of Pennsylvania
International Conference on Computer Vision, ICCV 2025
DINO, DINOv2, DINOv3, iBOT, SAM, SAM2, and CoralSCOP.
Coral reef semantic segmentation can be categorized as stuff segmentation. COCO-Stuff (Caesar et al., 2018) made the first attempt at stuff segmentation and summarized five key properties that distinguish "things/instances" from "stuff": shape, size, parts, instances, and texture. Inspired by this work, we summarize the challenges of coral segmentation:
Promptable segmentation models (e.g., SAM and CoralSCOP) produce under-inclusive and over-inclusive outputs. The mask with the red edge is for illustration and is not model-generated.
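As a minimal sketch of the promptable setting above (the checkpoint path, image, and click location are placeholders), a single foreground point fed to SAM returns several candidate masks, and for amorphous "stuff" like coral each candidate tends to cover either too little or too much:

```python
# Prompting SAM with one foreground point on a reef image.
import cv2
import numpy as np
from segment_anything import SamPredictor, sam_model_registry

sam = sam_model_registry["vit_h"](checkpoint="sam_vit_h.pth")  # assumed local checkpoint
predictor = SamPredictor(sam)

image = cv2.cvtColor(cv2.imread("reef.jpg"), cv2.COLOR_BGR2RGB)
predictor.set_image(image)

point = np.array([[512, 384]])   # (x, y) click on a coral colony
label = np.array([1])            # 1 = foreground point
masks, scores, _ = predictor.predict(
    point_coords=point, point_labels=label, multimask_output=True
)
# Each of the returned masks is one hypothesis for "the object under the
# click"; with corals, these hypotheses are often under- or over-inclusive.
```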
The key difference between segmenting a fish and segmenting corals: a fish has a visually consistent structural unit, while corals do not. No matter which part of the fish is occluded, we humans can still imagine its boundary and shape. For corals, however, we cannot infer a consistent output from two inputs in which different regions are occluded.
Our simple and fundamental problem formulation for coral reef semantic segmentation: segments as the basis for modeling within-segment and cross-segment affinities.
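As one way to make this formulation concrete (assuming per-pixel features and an integer segment map; the exact affinity definitions used in the paper are not given on this page), within-segment affinity can be measured against each segment's mean feature, and cross-segment affinity as the similarity between segment centroids:

```python
# Hedged sketch: segments as the unit, affinities via cosine similarity.
# `features` is (C, H, W); `segments` is (H, W) with integer segment ids.
import torch
import torch.nn.functional as F

def segment_affinities(features: torch.Tensor, segments: torch.Tensor):
    C, H, W = features.shape
    feats = F.normalize(features.reshape(C, -1), dim=0)   # unit-norm per pixel
    seg = segments.reshape(-1)
    ids = seg.unique()

    # Within-segment affinity: mean cosine similarity of a segment's pixels
    # to that segment's mean feature.
    centroids = torch.stack([feats[:, seg == i].mean(dim=1) for i in ids], dim=1)
    centroids = F.normalize(centroids, dim=0)             # (C, N)
    within = torch.stack([
        (feats[:, seg == i] * centroids[:, k:k + 1]).sum(0).mean()
        for k, i in enumerate(ids)
    ])

    # Cross-segment affinity: cosine similarity between segment centroids.
    cross = centroids.T @ centroids                       # (N, N)
    return within, cross
```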
PCA visualization
PCA comparison (first three principal components as RGB) of features from different algorithms and foundation models.
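A typical recipe for producing this kind of figure (not necessarily the authors' exact script): project per-patch features onto their first three principal components and display them as RGB. DINOv2 via torch.hub is one backbone option; the input here is a placeholder tensor.

```python
# PCA-to-RGB visualization of per-patch foundation-model features.
import torch
import matplotlib.pyplot as plt

model = torch.hub.load("facebookresearch/dinov2", "dinov2_vits14")
model.eval()

img = torch.rand(1, 3, 448, 448)  # placeholder; use a real reef image in practice
with torch.no_grad():
    tokens = model.forward_features(img)["x_norm_patchtokens"]  # (1, N, C)

x = tokens[0]                                      # (N, C)
x = x - x.mean(dim=0)                              # centre the features
_, _, v = torch.pca_lowrank(x, q=3, center=False)  # top 3 components
rgb = x @ v                                        # (N, 3)
rgb = (rgb - rgb.min(0).values) / (rgb.max(0).values - rgb.min(0).values)

side = int(rgb.shape[0] ** 0.5)                    # 448 / 14 = 32 patches per side
plt.imshow(rgb.reshape(side, side, 3))
plt.axis("off")
plt.show()
```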
Sparse-to-dense conversion
Sparse-to-dense conversion based on features from different algorithms and foundation models.
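An illustrative baseline for sparse-to-dense conversion (the paper's own conversion module may be more involved): given sparse labeled points, as in standard coral survey annotations, assign every pixel the label of its most similar labeled point in feature space.

```python
# Nearest-neighbour sparse-to-dense label propagation in feature space.
import torch
import torch.nn.functional as F

def sparse_to_dense(features: torch.Tensor,
                    coords: torch.Tensor,
                    labels: torch.Tensor) -> torch.Tensor:
    """features: (C, H, W); coords: (K, 2) as (row, col); labels: (K,)."""
    C, H, W = features.shape
    feats = F.normalize(features.reshape(C, -1), dim=0)    # (C, HW)
    anchors = feats[:, coords[:, 0] * W + coords[:, 1]]    # (C, K) at the labeled points
    sim = anchors.T @ feats                                # (K, HW) cosine similarities
    nearest = sim.argmax(dim=0)                            # most similar labeled point
    return labels[nearest].reshape(H, W)

# Example: five labeled survey points densified over a 64x64 feature map.
features = torch.rand(384, 64, 64)
coords = torch.randint(0, 64, (5, 2))
labels = torch.arange(5)
dense = sparse_to_dense(features, coords, labels)          # (64, 64) label map
```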
Zero-shot ability
Zero-shot sparse-to-dense conversion results on the Seaview dataset.
Model-agnostic
Our model can improve the features from various foundation models.
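The refinement model itself is not detailed on this page; the sketch below only illustrates the model-agnostic interface implied above: any backbone that yields (C, H, W) features can be routed through one shared refiner. The module and channel widths are hypothetical.

```python
# Hypothetical shared refiner over features from interchangeable backbones.
import torch
import torch.nn as nn

class FeatureRefiner(nn.Module):
    """Projects backbone features of any width into a shared refined space."""
    def __init__(self, in_dim: int, out_dim: int = 256):
        super().__init__()
        self.proj = nn.Sequential(
            nn.Conv2d(in_dim, out_dim, 1),
            nn.GELU(),
            nn.Conv2d(out_dim, out_dim, 3, padding=1),
        )

    def forward(self, feats: torch.Tensor) -> torch.Tensor:
        return self.proj(feats)

# Backbones differ mainly in channel width (placeholder values below);
# the refiner absorbs the difference behind a common interface.
for name, dim in [("DINOv2-S", 384), ("SAM ViT-H", 1280), ("CoralSCOP", 256)]:
    refiner = FeatureRefiner(in_dim=dim)
    out = refiner(torch.rand(1, dim, 32, 32))
    print(name, tuple(out.shape))
```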