Benjamin Planche - Researcher and Author - Computer Vision and Machine Learning

Openings

We have several openings (internship & full-time). More details and application here.

News

(Sep. 2024) Our 3DGS-based solution for digitally reconstructed radiographs (DRRs) got accepted at NeurIPS 2024!
(Jul. 2024) Our paper on body part modeling got accepted at ECCV 2024!
(May. 2024) Our paper on few-shot segmentation received an early-accept into MICCAI 2024!
(Feb. 2024) Our DareNeRF solution for 4D scenes modeling got accepted at CVPR 2024.
(Jan. 2024) Our paper on part-body association got accepted at ICLR 2024.
(Dec. 2023) Two papers accepted at AAAI 2024: Disguise and MODIF.
(Jan. 2023) Our paper on multi-view human mesh regression got accepted at AAAI 2023 for an oral presentation!
(Oct. 2022) Looking forward to presenting our 3 accepted papers—PREF (oral), PseudoClick, and HMR-CRA—at ECCV 2022!
(Sep. 2022) See you virtually at the poster session #6 for our HMR-MMAF paper, this week at MICCAI 2022.
(Jun. 2022) Had a great time presenting our SMPL-A paper in-person, at CVPR 2022.
(Jan. 2022) Honored to receive the Best Dissertation Award from the Franco-German University! Article about the award here.
(Nov. 2021) Honored to receive the Outstanding Dissertation Award from the University of Passau!

Research

I am interested in computer vision, domain adaptation, image rendering, inverse problems, and photography. Much of my research is about training robust recognition systems on scarce data and bridging the gap between real and synthetic modalities.

Paper Teaser - High-level comparison to SOTA NeRF modeling

DDGS-CT: Direction-Disentangled Gaussian Splatting for Realistic Volume Rendering

Zhongpai Gao^EC, Benjamin Planche^EC, Meng Zheng, Xiao Chen, Terrence Chen, Ziyan Wu
Annual Conference on Neural Information Processing Systems (NeurIPS), 2024 (EC equal contribution)

Digitally reconstructed radiographs (DRRs) are 2D X-ray images simulated from 3D CT volumes, commonly used preoperatively but limited intraoperatively due to computational challenges, especially with physics-based Monte Carlo methods. While analytical DRRs are more efficient, they miss anisotropic X-ray image formation phenomena. We introduce a novel approach combining realistic physics-based X-ray simulation with efficient, differentiable DRR generation using 3D Gaussian splatting. Our direction-disentangled 3DGS separates isotropic and direction-dependent radiosity, approximating anisotropic effects without complex runtime simulations.

Qualitative Results - Body part predictions

Divide and Fuse: Body Part Mesh Recovery from Partially Visible Human Images

Tianyu Luan, Zhongpai Gao, Luyuan Xie, Abhishek Sharma, Hao Ding, Benjamin Planche, Meng Zheng, Ange Lou, Terrence Chen, Junsong Yuan, Ziyan Wu
European Conference on Computer Vision (ECCV), 2024

We introduce a novel bottom-up approach for human body mesh reconstruction, specifically designed to address the challenges posed by partial visibility and occlusion in input images. Our method reconstructs human body parts independently before fusing them, thereby ensuring robustness against occlusions. We design Human Part Parametric Models that independently reconstruct the mesh from a few shape and global-location parameters, without inter-part dependency. A specially designed fusion module then seamlessly integrates the reconstructed parts, even when only a few are visible.

Qualitative Results - Segmentation on medical scans

Few-Shot 3D Volumetric Segmentation with Multi-Surrogate Fusion

Meng Zheng, Benjamin Planche, Zhongpai Gao, Terrence Chen, Richard J. Radke, Ziyan Wu
Medical Image Computing and Computer Assisted Intervention (MICCAI), 2024 [early accept]

We present MSFSeg, a novel few-shot 3D segmentation framework with a lightweight multi-surrogate fusion (MSF). MSFSeg is able to automatically segment unseen 3D objects/organs (during training) provided with one or a few annotated 2D slices or 3D sequence segments, via learning dense query-support organ/lesion anatomy correlations across patient populations. Our proposed MSF module mines comprehensive and diversified morphology correlations between unlabeled and the few labeled slices/sequences through multiple designated surrogates, making it able to generate accurate cross-domain 3D segmentation masks given annotated slices or sequences.

DaReNeRF: Direction-aware Representation for Dynamic Scenes

Ange Lou, Benjamin Planche, Zhongpai Gao, Yamin Li, Tianyu Luan, Hao Ding, Terrence Chen, Jack Noble Ziyan Wu
IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2024

NeRF models using plane-based explicit representations struggle to achieve high-fidelity rendering for scenes with complex motions. In response we present a novel direction-aware representation (DaRe) approach, where the learned representation undergoes an inverse dual-tree complex wavelet transformation (DTCWT) to recover plane-based information. DaReNeRF computes features for each space-time point by fusing vectors from these recovered planes, yielding more accurate NVS results. Furthermore, we introduce a trainable masking approach mitigating storage issues w.r.t. DTCWT parameters, without significant performance decline.

PBADet: A One-Stage Anchor-Free Approach for Part-Body Associations

Zhongpai Gao, Huayi Zhou, Meng Zheng, Benjamin Planche, Terrence Chen, Ziyan Wu
International Conference on Learning Representations (ICLR), 2024

PBADet is a novel one-stage, anchor-free approach for part-body association detection. Building upon the anchor-free object representation across multi-scale feature maps, we introduce a singular part-to-body center offset that effectively encapsulates the relationship between parts and their parent bodies. Our design is inherently versatile and capable of managing multiple parts-to-body associations without compromising on detection accuracy or robustness.

Result - Application to missing organ extrapolation

Implicit Modeling of Non-rigid Objects with Cross-Category Signals

Yuchun Liu, Benjamin Planche, Meng Zheng, Zhongpai Gao, Pierre Sibut-Bourde, Fan Yang, Terrence Chen, M Salman Asif, Ziyan Wu
AAAI Conference on Artificial Intelligence (AAAI), 2024

We propose MODIF, a multi-object deep implicit function that jointly learns the deformation fields and instance-specific latent codes for multiple objects at once. Our emphasis is on non-rigid, non-interpenetrating entities such as organs. To effectively capture the interrelation between these entities and ensure precise, collision-free representations, our approach facilitates signaling between category-specific fields to adequately rectify shapes. We also introduce novel inter-object supervision: an attraction-repulsion loss is formulated to refine contact regions between objects.

Paper Teaser - High-level justification for utility-preserving de-identification

Disguise without Disruption: Utility-Preserving Face De-identification

Zikui Cai, Zhongpai Gao, Benjamin Planche, Meng Zheng, Terrence Chen, M Salman Asif, Ziyan Wu
AAAI Conference on Artificial Intelligence (AAAI), 2024

Solutions to de-identify sensitive images of individuals often compromise non-identifying facial attributes relevant to downstream tasks. We introduce Disguise, an algorithm to seamlessly de-identify facial images while ensuring the usability of the modified data. Our solution is firmly grounded in the domains of differential privacy and ensemble-learning research. It involves extracting and substituting depicted identities with synthetic ones, generated using variational mechanisms to maximize obfuscation and non-invertibility. Additionally, we leverage supervision from a mixture-of-experts to disentangle and preserve other utility attributes.

Paper Teaser - Result from our human mesh regression method

Progressive Multi-view Human Mesh Recovery with Self-Supervision

Xuan Gong, Liangchen Song, Meng Zheng, Benjamin Planche, Terrence Chen, Junsong Yuan, David Doermann, Ziyan Wu
AAAI Conference on Artificial Intelligence (AAAI), 2023 [oral]

We propose a novel simulation-based training pipeline for multi-view human mesh recovery, which (a) relies on intermediate 2D representations which are more robust to synthetic-to-real domain gap; (b) leverages learnable calibration and triangulation to adapt to more diversified camera setups; and (c) progressively aggregates multi-view information in a canonical 3D space to remove ambiguities in 2D representations.

Paper Teaser - Results of our sky coverage prediction module

Vision on the Bog: Cranberry Crop Risk Evaluation with Deep Learning

Peri Akiva, Benjamin Planche, Aditi Roy, Peter Oudemans, Kristin Dana
Computers and Electronics in Agriculture, 2022

Our goal is to develop state-of-the-art computer vision algorithms for image-based crop evaluation and weather-related risk assessment to support real-time decision-making for growers. Our cranberry bog monitoring system maps cranberry density (based on fruit instance segmentation) and predicts short-term cranberry internal temperatures (predicting solar irradiation and fruit temperature in an end-to-end differentiable network).

Federated Learning with Privacy-Preserving Ensemble Attention Distillation

Xuan Gong, Liangchen Song, Rishi Vedula, Abhishek Sharma, Meng Zheng, Benjamin Planche, Arun Innanje, Terrence Chen, Junsong Yuan, David Doermann, Ziyan Wu
IEEE Transactions on Medical Imaging (TMI), 2022

We propose a privacy-preserving FL framework leveraging unlabeled public data for one-way offline knowledge distillation. The central model is learned from local knowledge via ensemble attention distillation. Our technique uses decentralized and heterogeneous local data like existing FL approaches, but more importantly, it significantly reduces the risk of privacy leakage. We demonstrate that our method achieves very competitive performance with more robust privacy preservation based on extensive experiments on image classification, segmentation, and reconstruction tasks.

PREF: Predictability Regularized Neural Motion Fields

Liangchen Song, Xuan Gong, Benjamin Planche, Meng Zheng, David Doermann, Junsong Yuan, Terrence Chen, Ziyan Wu
European Conference on Computer Vision (ECCV), 2022 [oral]

We leverage a neural motion field for estimating the motion of all points in a multiview setting. Modeling the motion from a dynamic scene with multiview data is challenging due to the ambiguities in points of similar color and points with time-varying color. We propose to regularize the estimated motion to be predictable. If the motion from previous frames is known, then the motion in the near future should be predictable. Therefore, we introduce a predictability regularization by first conditioning the estimated motion on latent embeddings, then by adopting a predictor network to enforce predictability on the embeddings. [project webpage]

Self-supervised Human Mesh Recovery with Cross-Representation Alignment

Xuan Gong, Meng Zheng, Benjamin Planche, Srikrishna Karanam, Terrence Chen, David Doermann, Ziyan Wu
European Conference on Computer Vision (ECCV), 2022

We propose cross-representation alignment utilizing the complementary information from the robust but sparse representation (2D keypoints). Specifically, the alignment errors between initial mesh estimation and both 2D representations are forwarded into regressor and dynamically corrected in the following mesh regression. This adaptive cross-representation alignment explicitly learns from the deviations and captures complementary information: robustness from sparse representation and richness from dense representation.

PseudoClick: Interactive Image Segmentation with Click Imitation

Qin Liu, Meng Zheng, Benjamin Planche, Srikrishna Karanam, Terrence Chen, Marc Niethammer, Ziyan Wu
European Conference on Computer Vision (ECCV), 2022

We ask the question: can our model directly predict where to click, so as to further reduce the user interaction cost? To this end, we propose PseudoClick, a generic framework that enables existing segmentation networks to propose candidate next clicks. These automatically generated clicks, termed pseudo clicks in this work, serve as an imitation of human clicks to refine the segmentation mask. We build PseudoClick on existing segmentation backbones and show how our click prediction mechanism leads to improved performance.

Self-supervised 3D Patient Modeling with Multi-modal Attentive Fusion

Meng Zheng, Xuan Gong, Benjamin Planche, Fan Yang, Ziyan Wu
Medical Image Computing and Computer Assisted Intervention (MICCAI), 2022 [early accept]

We propose a generic modularized 3D patient modeling method consists of (a) a multi-modal keypoint detection module with attentive fusion for 2D patient joint localization, to learn complementary cross-modality patient body information, leading to improved keypoint localization robustness and generalizability in a wide variety of imaging and clinical scenarios; and (b) a self-supervised 3D mesh regression module which does not require expensive 3D mesh parameter annotations to train, bringing immediate cost benefits for clinical deployment.

SMPL-A: Modeling Person-Specific Deformable Anatomy

Hengtao Guo, Benjamin Planche, Meng Zheng, Srikrishna Karanam, Terrence Chen, Ziyan Wu
IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2022

We present the first learning-based approach to estimate the patient's internal organ deformation for arbitrary human poses in order to assist with radiotherapy and similar medical protocols. The underlying method first leverages medical scans to learn a patient-specific representation that potentially encodes the organ's shape and elastic properties. During inference, given the patient's current body pose information and the organ's representation extracted from previous medical scans, our method can estimate their current organ deformation to offer guidance to clinicians.

Physics-based Differentiable Depth Sensor Simulation

Benjamin Planche, Rajat Vikram Singh
IEEE/CVF International Conference on Computer Vision (ICCV), 2021

We introduce DDS, a novel end-to-end differentiable simulation pipeline for the generation of realistic depth scans, built on physics-based 3D rendering and custom block-matching algorithms. Each module can be differentiated w.r.t sensor and scene parameters; e.g., to automatically tune the simulation for new devices over some provided scans or to leverage the pipeline as a 3D-to-2.5D transformer within larger computer-vision applications. [full version, with sup-mat]

AI on the Bog: Monitoring and Evaluating Cranberry Crop Risk

Peri Akiva^EC, Benjamin Planche^EC, Aditi Roy, Kristin Dana, Peter Oudemans, Michael Mars
IEEE/CVF Winter Conference on Applications of Computer Vision (WACV), 2021 (EC equal contribution)

We propose an end-to-end cranberry health monitoring system to enable and support real time cranberry over-heating assessment and facilitate informed decisions that may sustain the economic viability of farms. Our system performs: 1) cranberry fruit segmentation to delineate fruit regions that are exposed to sun, 2) prediction of cloud coverage and sun irradiance to estimate the inner temperature of exposed cranberries.

Incremental Scene Synthesis

Benjamin Planche, Xuejian Rong, Ziyan Wu, Srikrishna Karanam, Harald Kosch, YingLi Tian, Jan Ernst, Andreas Hutter
Annual Conference on Neural Information Processing Systems (NeurIPS), 2019

We present a method to incrementally generate complete 2D or 3D scenes. Our framework can register observations from a non-localized agent in a global representation, which can be used to synthesize new views as well as fill in gaps in the representation while observing global consistency.

Seeing Beyond Appearance - Mapping Real Images into Geometrical Domains for Unsupervised CAD-based Recognition

Benjamin Planche^EC, Sergey Zakharov^EC, Ziyan Wu, Harald Kosch, Andreas Hutter, Slobodan Ilic
IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), 2019 (EC equal contribution)

Tackling real/synthetic domain adaptation from a different angle, we introduce a pipeline to map unseen target samples into the synthetic domain used to train task-specific methods. Denoising the data and retaining only the features these recognition algorithms are familiar with, our solution greatly improves their performance.

Keep it Unreal: Bridging the Realism Gap for 2.5D Recognition with Geometry Priors Only

Sergey Zakharov^EC, Benjamin Planche^EC, Ziyan Wu, Harald Kosch, Andreas Hutter, Slobodan Ilic
International Conference on 3D Vision (3DV), 2018 [oral] (EC equal contribution)

We propose a novel approach leveraging only CAD models to bridge the realism gap for depth images. Purely trained on synthetic data, playing against an extensive augmentation pipeline in an unsupervised manner, our GAN learns to effectively segment depth images and recover the clean synthetic-looking depth information even from partial occlusions.

DepthSynth: Real-Time Realistic Synthetic Data Generation from CAD Models for 2.5D Recognition

Benjamin Planche, Ziyan Wu, Kai Ma, Shanhui Sun, Stefan Kluckner, Oliver Lehmann, Terrence Chen, Andreas Hutter, Sergey Zakharov, Harald Kosch, Jan Ernst
International Conference on 3D Vision (3DV), 2017 [oral]

We present an end-to-end framework which simulates the whole mechanism of depth sensors, generating realistic depth data from 3D models by comprehensively modeling vital factors, e.g., sensor noise, material reflectance, surface geometry. Our solution covers a wider range of devices and achieves more realistic results than previous methods.

3D Object Instance Recognition and Pose Estimation Using Triplet Loss with Dynamic Margin

Sergey Zakharov, Wadim Kehl, Benjamin Planche, Andreas Hutter, Slobodan Ilic
IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), 2017

Inspired by the descriptor learning approach of Wohlhart et al. [link], we propose a method that introduces the dynamic margin in the manifold learning triplet loss function. Introducing the dynamic margin allows for faster training times and better accuracy of the resulting low dimensional manifolds.

The Brightnest Web-Based Home Automation System

Benjamin Planche^EC, Bryan Isaac Malyn^EC, Daniel Buldon Blanco^EC, Manuel Cerrillo Bermejo^EC
International Conference on Ubiquitous Computing and Ambient Intelligence (UCAmI), 2014 [oral] (EC equal contribution)

Brightnest is a generic and user-friendly web-based Home Automation System. Its interface provides users with information on the whole system or with control over the devices and their rules. The modular architecture is based on "JS Drivers", their REST API imitating the way a computer usually handles new devices.

Book

A few years ago, I got the opportunity to co-author a book, teaching how to leverage deep learning to create powerful image processing apps with TensorFlow 2.0 and Keras. While some technical examples in the book are now a bit outdated outdated (with regard to the TensorFlow API), the book also covers the foundations of deep learning, illustrated with publicly-available code examples (see GitHub link below).

Hands-On Computer Vision With TensorFlow 2

Benjamin Planche and Eliot Andres
Packt Publishing, 2019

Computer vision solutions are becoming increasingly common, making their way in fields such as health, automobile, social media, and robotics. With the release of TensorFlow 2, the brand new version of Google's open source framework for machine learning, it is the perfect time to jump on board and start leveraging deep learning for your visual applications!
This book is a practical guide to building high performance systems for object detection, segmentation, video processing, smartphone applications, and more. By its end, you will have both the theoretical understanding and practical skills to solve advanced computer vision problems with TensorFlow 2.0. [Amazon | Packt | GitHub]