Apple has officially announced its participation in the 2025 International Conference on Computer Vision (ICCV), scheduled from October 19 to 23 in Honolulu. This biennial event, alternating with the European Conference on Computer Vision (ECCV), serves as a pivotal platform for discussing significant advancements in computer vision.
Keynote Presentation by Dr. C. Thomas
A highlight of Apple’s involvement is the keynote address by Dr. C. Thomas, the company’s Applied Research Manager for Machine Learning. Dr. Thomas is slated to speak at the 3rd Workshop on Vision-based Industrial Inspection (VISION) on Sunday, October 19, at 9:15 a.m. The specific topic of the presentation will be disclosed in the coming days.
Eight Research Papers to Be Presented
Apple will present eight research papers, reflecting its commitment to advancing the field of computer vision:
1. ETVA: Evaluation of Text-to-Video Alignment via Fine-grained Question Generation and Answering
This study introduces a method for assessing how well a generated video matches its text prompt by generating and answering fine-grained questions about the video, providing a more precise way to evaluate text-to-video models.
2. MM-Spatial: Exploring 3D Spatial Understanding in Multimodal LLMs
This research examines how well multimodal large language models (LLMs) understand three-dimensional space, with the aim of improving applications such as augmented reality and autonomous navigation.
3. Scaling Laws for Native Multimodal Models
The paper examines how the performance of native multimodal models, which are trained from the ground up on multiple types of data, changes as model size and training data grow, providing insights into how to scale them efficiently.
4. Stable Diffusion Models are Secretly Good at Visual In-Context Learning
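This work shows that Stable Diffusion models, primarily known for generating images, can also perform visual in-context learning, that is, carry out a vision task on a new image by following a few example input-output pairs. The finding could lead to more sophisticated image editing tools.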
5. STIV: Scalable Text and Image Conditioned Video Generation
STIV introduces a framework for generating videos conditioned on both text and images, offering potential advancements in content creation and multimedia applications.
6. UINavBench: A Framework for Comprehensive Evaluation of Interactive Digital Agents
This study presents a benchmarking framework to assess the performance of interactive digital agents, focusing on their ability to navigate and interact within user interfaces.
7. Unified Open-World Segmentation with Multi-Modal Prompts
The research proposes a unified approach to segmenting objects in images using prompts from multiple modalities, enhancing the flexibility and accuracy of image segmentation tasks.
8. UniVG: A Generalist Diffusion Model for Unified Image Generation and Editing
UniVG introduces a versatile diffusion model capable of both generating and editing images, streamlining workflows in digital content creation.
Participation in the Women in Computer Vision Workshop
In addition to these presentations, Apple is supporting diversity in the field. Researchers Patricia Vitoria Carrera and Tanya Glozman will serve as mentors at the dinner following the Women in Computer Vision Workshop, which begins at 1 p.m. on Sunday, October 19.
Apple’s Ongoing Commitment to Computer Vision
Apple’s participation in ICCV 2025 spans a workshop keynote, eight research papers, and mentorship at the Women in Computer Vision Workshop. By presenting this research and engaging with the global academic community, Apple continues to contribute to machine learning and artificial intelligence research.