Table of Contents
- OpenCV Google Summer of Code 2024
- OpenCV Accepted Projects:
- OpenCV Project Ideas List:
- Ideas:
- IDEA: Neural 3D Capture and Rendering
- IDEA: 3D Object Capture with SplaTAM
- IDEA: 3D Radiance Toolbox
- IDEA: Multi-camera calibration part 3
- IDEA: Multi-camera calibration test
- IDEA: Multi-camera calibration toolbox
- IDEA: Quantized models for OpenCV Model Zoo
- IDEA: RISC-V Optimizations
- IDEA: Dynamic CUDA support in DNN
- IDEA: OpenGL support with GTK 3 and GTK 4
- IDEA: Animated images and other functionality for imgcodecs
- IDEA: Synchronized multi camera video recorder
- IDEA: libcamera back end for VideoCapture
- IDEA: Add support for HEIF/HEIC image format
- IDEA: Cylindrical surface detection in point clouds
- IDEA: Better ArUco functions in OpenCV
- IDEA: Implementing SMPLer with opencv
- IDEA: 3D reconstruction using Gaussian splatter
- IDEA: Implementing and optimizing any Stable Diffusion algorithm using opencv
- IDEA: Better LLMs support in OpenCV (1)
- IDEA: Better LLMs support in OpenCV (2)
- Idea Template:
- All Ideas Above
- Contributors
- Mentors:
OpenCV Google Summer of Code 2024
Jump to Ideas List
Example use of computer vision: (https://learnopencv.com/)
OpenCV Accepted Projects:
- 🚧 TBD: spreadsheet of projects link
Contributor | Title | Mentors | Passed |
---|---|---|---|
Important dates:
Date (2024) | Description | Comment |
---|---|---|
Feb 6 | Organization Applications Closed | 👍 |
Feb 21 | Org Acceptance | 👍 |
Mar 18 | Proposals Open | 👍 |
Apr 24 | Proposal Ranking | 👍 |
April | Slots Allocated | 👍 |
May 27 | Coding Begins | 👍 |
July 12 | Midterm Evals | 👍 |
Aug 19-26 | Get final code in | 👍 |
Aug 26-Sep 2 | Final Mentor Evals | 👍 |
Sep 3 | Results | 🏃 |
Nov 4 | Extended Session Code in | 🥚 |
Nov 11 | Extended Mentor Evals | 🦅 |
Resources:
- GSoC Home Page
- GSoC OpenCV Project Page
- Project Discussion List, Mentor List
- OpenCV Home Site, OpenCV Wiki, OpenCV Forum Q&A, Developer meeting notes
- How to do a pull request/How to Contribute Code
- Source Code: GitHub/opencv and GitHub/opencv_contrib
- IRC Channel: #opencv on Libera.chat, Slack: https://open-cv.slack.com
- OpenCV GSoC Dashboard
OpenCV Project Ideas List:
Index to Ideas Below
- Neural 3D Capture and Rendering
- 3D Object Capture with SplaTAM
- 3D Radiance Toolbox
- Multi-camera calibration part 3
- Multi-camera calibration test
- Multi-camera calibration toolbox
- Quantized models for OpenCV Model Zoo
- RISC-V Optimizations
- Dynamic CUDA support in DNN
- OpenGL support with GTK 3 and GTK 4
- Animated images and other functionality for imgcodecs
- Synchronized multi camera video recorder
- libcamera back-end for VideoCapture
- Add support for HEIF/HEIC image format
- Cylindrical surface detection in point clouds
- Better ArUco functions in OpenCV
- Implementing SMPLer with opencv
- 3D reconstruction using Gaussian splatter
- Implementing and optimizing any Stable Diffusion algorithm using opencv
- Better LLMs support in OpenCV (1)
- Better LLMs support in OpenCV (2)
All work is in C++ unless otherwise noted.
Ideas:
-
IDEA: Neural 3D Capture and Rendering
- Description: Refine SplaTAM or Droid-SLAM 3D capture plus Gaussian Splatting (GS) code (we cannot use the Inria GS code due to licensing issues; NerfSlam is a better fit) so that users can easily capture neural 3D models of scenes. Possibly allow accuracy boosts using Charuco boards placed in the scene (a minimal pose sketch follows this idea). Be able to localize arbitrary camera poses in the captured scenes.
- Expected Outcomes:
- Droid SLAM + Gaussian Splatting code and application, well documented for ease of user's capturing 3D models
- Captures an indoor (e.g. a room) or outdoor scene (e.g. side of yard)
- Working code pull request on Github
- Documentation and YouTube videos showing how to use it.
- Resources: OpenCV, Example NeRF Code, Droid-SLAM paper, Droid-SLAM Code, Gaussian Splatting Description, Gaussian Splatting Code, or alternatively SplaTAM. Another possibility is to use Gaussian Splatting SLAM.
- Skills Required: Good software development skills, good knowledge of deep methods of 3D visual capture and use of Gaussian Splatting models.
- Possible Mentors: Doug Lee, Gary Bradski
- Difficulty: Medium to Hard
- Duration: 350 hours
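The Charuco-based accuracy boost mentioned above could build on OpenCV's existing ArUco module. Below is a minimal sketch, assuming the OpenCV >= 4.8 ArUco API; the image and intrinsics file names are placeholders, not part of any existing tool.

```python
import cv2
import numpy as np

# Hypothetical inputs: one capture frame and its (previously estimated) intrinsics.
frame = cv2.imread("frame_0001.png")            # placeholder path
K = np.load("camera_matrix.npy")                # 3x3 intrinsics (placeholder)
dist = np.load("dist_coeffs.npy")               # distortion coefficients (placeholder)

# Describe the physical Charuco board placed in the scene:
# 5x7 squares, 4 cm squares, 2 cm markers, DICT_5X5_100 dictionary (all assumed).
dictionary = cv2.aruco.getPredefinedDictionary(cv2.aruco.DICT_5X5_100)
board = cv2.aruco.CharucoBoard((5, 7), 0.04, 0.02, dictionary)
detector = cv2.aruco.CharucoDetector(board)

# Detect the interpolated Charuco corners in the frame.
charuco_corners, charuco_ids, marker_corners, marker_ids = detector.detectBoard(frame)

if charuco_ids is not None and len(charuco_ids) >= 4:
    # Match detected corners to their known 3D positions on the board...
    obj_pts, img_pts = board.matchImagePoints(charuco_corners, charuco_ids)
    # ...and solve for the camera pose relative to the board's coordinate frame.
    ok, rvec, tvec = cv2.solvePnP(obj_pts, img_pts, K, dist)
    if ok:
        print("camera pose w.r.t. board:", rvec.ravel(), tvec.ravel())
```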
-
IDEA: 3D Object Capture with SplaTAM
- Description: Use the above or SplaTAM (possibly allowing accuracy boosts using Charuco boards) to capture and log 3D objects to collect a database. We might also use GaussianObject if their code is published and has a compatible license. We then want to use the resulting Neural Render to find/segment the object in a scene.
- Expected Outcomes:
- Easy capture of 3D objects
- Working code pull request on Github
- An app that allows camera capture from a phone, with data processing on a server or in the cloud to process/render and display the object on a computer, including an open license to use the captured objects
- Use object splats to segment objects in a scene.
- Documentation and YouTube videos showing how to use it.
- Resources: OpenCV, NerfStudio, [Gaussian Splatting Description](https://towardsdatascience.com/a-comprehensive-overview-of-gaussian-splatting-e7d570081362), SplaTAM and code. A new paper that doesn't yet have code (and whose license we'll need to check) is [GaussianObject](https://gaussianobject.github.io/).
- Skills Required: Good software development skills, good knowledge of deep methods of 3D visual capture, and use of Gaussian Splatting models.
- Possible Mentors: Gary Bradski
- Difficulty: Medium to Hard
- Duration: 350 hours
-
IDEA: 3D Radiance Toolbox
- Description: Create a toolbox for easy capture of radiance scenes. Use NerfStudio or Droid SLAM (Simultaneous Localization and Mapping) plus Gaussian Splatting as the default technique; the main goal is ease of capture from an iPhone or Android phone (with fixed focal lengths) so a user can capture a scene and send it to a cloud or server. To initialize and calibrate, images of a Charuco board can be used to begin/end a capture. Data goes to the cloud or a server, where subsequent views of the scene can be accurately localized in 6DOF (degrees of freedom: X, Y, Z and orientation).
- Expected Outcomes:
- Easy capture of 3D scenes
- Working code pull request on Github
- An app that allows camera capture from a phone, with data processing on a server or in the cloud to process/render and display the scene on a computer, including an open license to use the captured scenes. New views from the same camera or from security-type/web cameras can be localized in the scene.
- The app may use an initial Charuco board to set a coordinate system and to calibrate the capture camera.
- This app is intended to be modular and later be able to wrap the above two projects.
- Documentation and YouTube videos showing how to use it.
- Resources: OpenCV, Example NeRF Code, Droid-SLAM paper, Droid-SLAM Code, Gaussian Splatting Description, Gaussian Splatting Code
- Skills Required: Good software development skills, computer vision/deep net experience. iPhone and/or Android experience
- Possible Mentors: Gary Bradski
- Difficulty: Medium to Hard
- Duration: 175 hours
-
IDEA: Multi-camera calibration part 3
- Description: During GSoC 2023 a new multi-camera calibration algorithm was introduced: https://github.com/opencv/opencv/pull/24052. This year we would like to finish this work with more test cases, tune the accuracy, and build a higher-level user-friendly tool (based on the script from the tutorial) to perform multi-camera calibration (a minimal sketch of the underlying calls follows this idea). If this is completed before the internship is up, then we'll move on to leveraging the IMU or to marker-free calibration.
- Expected Outcomes:
- A series of patches with more unit tests and bug fixes for the multi-camera calibration algorithm
- New/improved documentation on how to calibrate cameras
- A short YouTube video showing off how to use the calibration routines
- Skills Required: Mastery of C++ and Python, mathematical knowledge of camera calibration, ability to code up mathematical models
- Difficulty: Medium-Difficult
- Possible Mentors: Maksym Ivashechkin, Alexander Smorkalov
- Duration: 175 hours
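For reference, here is a minimal sketch of the pairwise calls such a higher-level tool would wrap: per-camera intrinsics followed by pairwise extrinsics. The `detections.npz` file and its layout are assumptions for illustration, not an existing format.

```python
import cv2
import numpy as np

# Assumed placeholder file holding, for each synchronized view, the board's 3D
# points and the 2D detections seen by cameras A and B:
#   board_points: (views, N, 3), camera_a / camera_b: (views, N, 2)
data = np.load("detections.npz")
objpoints   = [v.astype(np.float32) for v in data["board_points"]]
imgpoints_a = [v.astype(np.float32) for v in data["camera_a"]]
imgpoints_b = [v.astype(np.float32) for v in data["camera_b"]]
image_size = (1920, 1080)                       # placeholder resolution

# Intrinsics first, per camera.
_, K_a, d_a, _, _ = cv2.calibrateCamera(objpoints, imgpoints_a, image_size, None, None)
_, K_b, d_b, _, _ = cv2.calibrateCamera(objpoints, imgpoints_b, image_size, None, None)

# Then the extrinsic transform R, T that maps camera A coordinates into camera B.
rms, _, _, _, _, R, T, E, F = cv2.stereoCalibrate(
    objpoints, imgpoints_a, imgpoints_b,
    K_a, d_a, K_b, d_b, image_size, flags=cv2.CALIB_FIX_INTRINSIC)
print(f"pairwise RMS reprojection error: {rms:.3f} px")
```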
-
IDEA: Multi-camera calibration test
- Description: We are looking for a student to curate best-of-class calibration data, collect calibration data with various OpenCV fiducials, and graphically produce calibration board and camera model data (via scripts). Simultaneously, begin to write comprehensive test scripts for all the existing calibration functions (a synthetic round-trip test sketch follows this idea). While doing this, improve the calibration documentation where necessary. From this, derive the expected accuracy of each fiducial type for various camera types.
- Expected Outcomes:
- Curate camera calibration data from public datasets.
- Collect calibration data for various fiducials and camera types.
- Graphically create camera calibration data with ready-to-go scripts
- Write test functions for the OpenCV Calibration pipeline
- New/improved documentation on how to calibrate cameras as needed.
- Statistical analysis of the performance (accuracy and variance) of OpenCV fiducials, algorithms and camera types.
- A YouTube video describing and demonstrating the OpenCV calibration tests.
- Resources: OpenCV Fiducial Markers, OpenCV Calibration Functions, OpenCV Camera Calibration Tutorial 1, OpenCV Camera Calibration Tutorial 2
- Skills Required: Mastery of C++ and Python, mathematical knowledge of camera calibration, ability to code up mathematical models
- Difficulty: Medium
- Possible Mentors: Jean-Yves Bouguet, Alexander Smorkalov
- Duration: 175 hours
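One pattern the new test scripts could follow is a synthetic round trip: project a known board through known intrinsics, add pixel noise, recalibrate, and compare against ground truth. A minimal sketch follows; the tolerances are illustrative, not OpenCV's official test thresholds.

```python
import cv2
import numpy as np

def test_calibrate_camera_recovers_ground_truth():
    """Synthesize views of a planar grid with known intrinsics, add pixel noise,
    and check that cv2.calibrateCamera recovers the ground truth."""
    rng = np.random.default_rng(42)
    K_gt = np.array([[800.0, 0, 640], [0, 800.0, 360], [0, 0, 1]])
    dist_gt = np.zeros(5)
    image_size = (1280, 720)

    # 9x6 planar grid with 25 mm pitch (same geometry as a chessboard target).
    grid = np.zeros((9 * 6, 3), np.float32)
    grid[:, :2] = np.mgrid[0:9, 0:6].T.reshape(-1, 2) * 0.025

    objpoints, imgpoints = [], []
    for _ in range(12):
        rvec = rng.uniform(-0.5, 0.5, 3)                      # random board tilt
        tvec = np.array([rng.uniform(-0.1, 0.1),
                         rng.uniform(-0.1, 0.1),
                         rng.uniform(0.5, 0.9)])              # board 0.5-0.9 m away
        proj, _ = cv2.projectPoints(grid, rvec, tvec, K_gt, dist_gt)
        proj = proj + rng.normal(scale=0.2, size=proj.shape)  # 0.2 px detection noise
        objpoints.append(grid)
        imgpoints.append(proj.astype(np.float32))

    rms, K, dist, _, _ = cv2.calibrateCamera(objpoints, imgpoints, image_size, None, None)
    assert rms < 1.0                                          # reprojection error in px
    assert abs(K[0, 0] - K_gt[0, 0]) < 10.0                   # focal length recovered

test_calibrate_camera_recovers_ground_truth()
```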
-
IDEA: Multi-camera calibration toolbox
- Description: Build a higher-level user-friendly tool (based on the script from the calibration tutorial) to perform multi-camera calibration. This should allow easy multi-camera calibration with multiple Charuco patterns and possibly other calibration fiducial patterns. The results will use Monte-Carlo sampling to determine parameter stability (see the sketch after this idea), allow easy switching of camera models, and output the camera calibration parameters and the fiducial patterns' poses in space, as well as the extrinsic location of each camera relative to the others.
- Expected Outcomes:
- A tool with a convenient API that is more or less comparable to and compatible with the Kalibr tool (https://github.com/ethz-asl/kalibr)
- New/improved documentation on how to calibrate cameras
- A YouTube video demonstrating how to use the toolbox
- Skills Required: Python, mathematical knowledge of camera calibration, ability to code up mathematical models
- Difficulty: Medium-Difficult
- Possible Mentors: Jean-Yves Bouguet, Gary Bradski
- Duration: 175 hours
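A minimal sketch of the Monte-Carlo stability idea: bootstrap-resample the captured views, recalibrate each time, and report the spread of the recovered intrinsics. The function below is a hypothetical helper, not an existing OpenCV API; it expects the same per-view lists you would pass to cv2.calibrateCamera.

```python
import cv2
import numpy as np

def calibration_stability(objpoints, imgpoints, image_size, n_trials=100, seed=0):
    """Recalibrate on bootstrap resamples of the captured views and report the
    spread of the recovered intrinsic parameters."""
    rng = np.random.default_rng(seed)
    n_views = len(objpoints)
    fx, fy, cx, cy = [], [], [], []
    for _ in range(n_trials):
        pick = rng.integers(0, n_views, n_views)     # resample views with replacement
        obj = [objpoints[i] for i in pick]
        img = [imgpoints[i] for i in pick]
        _, K, dist, _, _ = cv2.calibrateCamera(obj, img, image_size, None, None)
        fx.append(K[0, 0]); fy.append(K[1, 1]); cx.append(K[0, 2]); cy.append(K[1, 2])
    for name, vals in (("fx", fx), ("fy", fy), ("cx", cx), ("cy", cy)):
        print(f"{name}: mean={np.mean(vals):.2f}  std={np.std(vals):.2f}")
```

Parameters with a large spread indicate an under-constrained capture (for example, too few views or too little board tilt).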
-
IDEA: Quantized models for OpenCV Model Zoo
- Description: Many modern CPUs, GPUs and specialized NPUs include special instructions and hardware blocks for accelerated inference, especially for INT8 inference. Not only do the models become ~4x smaller than the original FP32 models, the inference speed also increases significantly (by 2x-4x or more). The number of quantized models steadily increases; however, beyond image classification there are not many 8-bit computer vision models with proven high-quality results. We would be interested in adding to our model zoo (https://github.com/opencv/opencv_zoo) 8-bit models for object detection, optical flow, pose estimation, text detection and recognition, etc. (a minimal quantization sketch follows this idea).
- Expected Outcomes:
- A series of patches to the OpenCV Zoo and possibly to OpenCV DNN (where OpenCV DNN lacks 8-bit flavors of certain operations) to add the corresponding models.
- If quantization is performed by the student during the project, we will request the corresponding scripts to reproduce the quantization
- Benchmark results to prove the quality of the quantized models along with the corresponding scripts so that we can reproduce it.
- Skills Required: very good ML engineering skills, good Python programming skills, familiarity with model quantization algorithms and model quality assessment approaches
- Possible Mentors: Feng Yuantao, Zhong Wanli, Vadim Pisarevsky
- Difficulty: Medium
- Duration: 90 to 175 hours, depending on the particular model.
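As an illustration of the expected workflow, here is a minimal post-training quantization sketch using ONNX Runtime. File names are placeholders; static, calibrated quantization would usually be preferred for the convolution-heavy models in the zoo.

```python
import cv2
from onnxruntime.quantization import quantize_dynamic, QuantType

# Quantize the FP32 model to INT8 weights (dynamic quantization is the shortest
# illustration of the workflow; a real zoo submission would likely use static,
# calibrated quantization and report benchmark numbers).
quantize_dynamic(model_input="model_fp32.onnx",     # placeholder: FP32 model from the zoo
                 model_output="model_int8.onnx",    # quantized result to benchmark
                 weight_type=QuantType.QInt8)

# Whether OpenCV DNN can run the result depends on its 8-bit operator coverage,
# which is exactly what this project would extend.
net = cv2.dnn.readNetFromONNX("model_int8.onnx")
```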
-
IDEA: RISC-V Optimizations
- Description: RISC-V is one of the main target platforms for OpenCV. Over the past several years we brought in some RISC-V optimizations based on the RISC-V Vector extension by adding another backend to OpenCV scalable universal intrinsics. We refactored a lot of code in OpenCV to make the vectorized loops compatible with the RISC-V backend and more or less efficient. Still, we see a lot of gaps, and the performance of certain functions can be further improved. For some critical functions, like convolution in deep learning, it perhaps makes sense to implement custom loops using native RVV intrinsics instead of OpenCV scalable universal intrinsics. This is what we invite you to do.
- Expected Outcomes:
- A series of patches for core, imgproc, video and dnn modules to bring improved loops that use OpenCV scalable universal intrinsics or native RVV intrinsics to improve the performance. In the first case the optimizations should not degrade performance on other major platforms like x86-64 or ARMv8 with NEON.
- Resources:
- Skills Required: mastery plus experience coding in C++; good skills in optimizing code using SIMD.
- Possible Mentors: Mingjie Xing, Maxim Shabunin
- Difficulty: Hard
- Duration: 350 hours
-
IDEA: Dynamic CUDA support in DNN
- Description: The OpenCV DNN module includes several backends for efficient inference on various platforms. Some of the backends are heavy and bring in a lot of dependencies, so it makes sense to make them dynamic. Recently, we did this with the OpenVINO backend: https://github.com/opencv/opencv/pull/21745. The goal of this project is to make the CUDA backend of OpenCV DNN dynamic as well. Once it's implemented, we can ship a single set of OpenCV binaries and then add the necessary plugin (also in binary form) to accelerate inference on NVidia GPUs without recompiling OpenCV (the user-facing backend-selection API, sketched after this idea, should stay unchanged).
- Expected Outcomes:
- A series of patches for the dnn and maybe core modules to build an OpenCV DNN CUDA plugin as a separate binary that can be used by OpenCV DNN. In this case OpenCV itself should not have any dependency on the CUDA SDK or runtime - the plugin should encapsulate it. It is fine if the user-supplied tensors (cv::Mat) are automatically uploaded to GPU memory by the engine (cv::dnn::Net) before inference and the output tensors are downloaded from GPU memory after inference.
- Resources:
- Skills Required: mastery plus experience coding in C++; good practical experience in CUDA. Acquaintance with deep learning is desirable but not necessary, since the project is mostly about software engineering, not about ML algorithms or their optimization.
- Possible Mentors: Alexander Smorkalov
- Difficulty: Hard
- Duration: 350 hours
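For context, this is how the CUDA backend is selected today on a CUDA-enabled build; the plugin work should leave this user-facing flow unchanged. The model path and input are placeholders.

```python
import cv2
import numpy as np

# Load any ONNX model (placeholder path) and ask for the CUDA backend/target;
# today this requires OpenCV built with CUDA, which is what the dynamic plugin
# would relax.
net = cv2.dnn.readNetFromONNX("model.onnx")
net.setPreferableBackend(cv2.dnn.DNN_BACKEND_CUDA)
net.setPreferableTarget(cv2.dnn.DNN_TARGET_CUDA)       # or DNN_TARGET_CUDA_FP16

# The engine handles host<->device transfers: plain cv::Mat in, plain cv::Mat out.
blob = cv2.dnn.blobFromImage(np.zeros((224, 224, 3), np.uint8),
                             scalefactor=1.0 / 255, size=(224, 224))
net.setInput(blob)
out = net.forward()
```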
-
IDEA: OpenGL support with GTK 3 and GTK 4
- Description: OpenCV 4.x supports OpenGL integration with GTK 2, but GTK 2 is out of date for most modern Linux distributions. GTK 3 and later provide a new API for OpenGL integration that needs to be supported in OpenCV 4.x+.
- Expected Outcomes:
- Highgui GTK 3+ backend for OpenGL
- Skills Required: C++, practice with Linux.
- Possible Mentors: TBD
- Difficulty: Medium
- Duration: 175
-
IDEA: Animated images and other functionality for imgcodecs
- Description: It's not a single big project but rather a series of tasks under the "image codecs" umbrella. The goal is to make a series of improvements in the existing opencv_imgcodecs module, including, but not limited to, the following items (please, feel free to propose your ideas):
- support animation encoding/decoding.
- zlib-ng integration to accelerate png codec
- Expected Outcomes:
- imgcodecs new API
- Skills Required: C++, practice with Linux.
- Possible Mentors: Vincent Rabaud
- Difficulty: Medium
- Duration: 90 or 175, depending on the exact scope
- Assigned to: Chao ZHU for GIF #25691, Suleyman TURKMEN for WebP #25608.
-
IDEA: Synchronized multi camera video recorder
- Description: Multi-camera calibration and multi-view scenarios require synchronous recording with multiple cameras. We need to tune cv::VideoCapture and/or cv::VideoWriter and implement a sample for video recording with several cameras, with per-frame timestamps (a minimal grab/retrieve sketch follows this idea).
- Expected Outcomes:
- Sync video recording sample for several cameras: V4L2, RTSP(?)
- Resources: Overview
- Skills Required: C++
- Possible Mentors: Alexander S.
- Difficulty: Easy-Medium
- Duration: 175
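A minimal sketch of the intended sample, assuming two V4L2 cameras (device indices, resolution, codec and file names are placeholders): grab() latches a frame on every device first, then retrieve() does the slower decode, and the driver timestamp is logged per frame.

```python
import cv2

WIDTH, HEIGHT, FPS = 1280, 720, 30.0
caps = [cv2.VideoCapture(i, cv2.CAP_V4L2) for i in (0, 1)]
for cap in caps:
    cap.set(cv2.CAP_PROP_FRAME_WIDTH, WIDTH)
    cap.set(cv2.CAP_PROP_FRAME_HEIGHT, HEIGHT)

fourcc = cv2.VideoWriter_fourcc(*"MJPG")
writers = [cv2.VideoWriter(f"cam{i}.avi", fourcc, FPS, (WIDTH, HEIGHT)) for i in (0, 1)]

with open("timestamps.csv", "w") as log:
    for frame_idx in range(300):                        # record ~10 s at 30 fps
        if not all(cap.grab() for cap in caps):         # latch all sensors first
            break
        stamps = [cap.get(cv2.CAP_PROP_POS_MSEC) for cap in caps]
        for cap, writer in zip(caps, writers):
            ok, frame = cap.retrieve()                  # slower decode after grabbing
            if ok:
                writer.write(frame)
        log.write(f"{frame_idx},{stamps[0]:.1f},{stamps[1]:.1f}\n")

for obj in caps + writers:
    obj.release()
```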
-
IDEA: libcamera back end for VideoCapture
- Description: Discussion: #21653
- Expected Outcomes:
- MIPI camera support on Raspberry Pi
- Resources:
- Skills Required: C++, Linux
- Possible Mentors: TBD
- Difficulty: Medium
- Duration: 175
IDEA: Add support for HEIF/HEIC image format
- Description: The High-Efficiency Image Format (HEIF) is becoming popular due to its superior compression compared to JPEG and its support for multiple images in one file, animations, and transparency. The goal is to implement support for loading and saving HEIF format files within OpenCV, including still images and animated sequences. This includes decoding HEIF images, including HEIC (Apple's variant), and encoding them while ensuring efficient memory and processing performance (a stopgap decoding sketch using a third-party library follows the resources below).
- Expected Outcomes:
- Resources:
- Discussion on https://github.com/opencv/opencv/issues/14534
- libheif (https://github.com/strukturag/libheif/)
- HEIF solution from Nokia (https://github.com/nokiatech/heif).
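Until native support lands in imgcodecs, a third-party decoder such as pillow-heif can illustrate the target behaviour by handing OpenCV a plain BGR array. The file names are placeholders; this is a sketch of the intended user experience, not the proposed OpenCV API.

```python
import numpy as np
import cv2
from PIL import Image
import pillow_heif

pillow_heif.register_heif_opener()            # lets PIL open .heic/.heif files

# Decode with PIL + pillow-heif, then continue with the usual OpenCV pipeline.
pil_img = Image.open("photo.heic").convert("RGB")
bgr = cv2.cvtColor(np.asarray(pil_img), cv2.COLOR_RGB2BGR)
cv2.imwrite("photo.png", bgr)
```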
IDEA: Cylindrical surface detection in point clouds
- Description: OpenCV already provides RANSAC-based plane and sphere detection and methods for estimating point cloud normals. The goal is to extend the current capabilities by adding robust cylindrical surface detection for point clouds, leveraging the existing Universal Sample Consensus (USAC) framework (a pure-NumPy prototype sketch follows this idea). This new functionality can be generalized to support additional shape primitives, such as cones and tori, while maintaining efficient computation and accuracy.
- Expected Outcomes:
- Resources:
- Mentor: Wanli Zhong
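A pure-NumPy prototype of the minimal-sample cylinder model (two oriented points give the axis direction, radius and a point on the axis). This is a sketch under those assumptions, not an existing OpenCV API; a production version would live in C++ inside the USAC framework and refine the model with least squares.

```python
import numpy as np

def _plane_basis(axis):
    """Two unit vectors spanning the plane orthogonal to `axis`."""
    helper = np.array([1.0, 0, 0]) if abs(axis[0]) < 0.9 else np.array([0, 1.0, 0])
    u = np.cross(axis, helper); u /= np.linalg.norm(u)
    v = np.cross(axis, u)
    return u, v

def fit_cylinder_ransac(points, normals, iters=500, dist_thresh=0.005, seed=0):
    """RANSAC sketch for a cylinder from an oriented point cloud.
    points: (N,3) float array, normals: (N,3) unit outward normals.
    Returns ((axis_dir, point_on_axis, radius), inlier_count) or (None, 0)."""
    rng = np.random.default_rng(seed)
    best, n = (None, 0), len(points)
    for _ in range(iters):
        i, j = rng.choice(n, 2, replace=False)
        axis = np.cross(normals[i], normals[j])        # axis is perpendicular to both normals
        norm = np.linalg.norm(axis)
        if norm < 1e-6:                                # nearly parallel normals, skip sample
            continue
        axis /= norm
        u, v = _plane_basis(axis)
        P = points @ np.column_stack((u, v))           # project points into the plane
        M = normals @ np.column_stack((u, v))          # project normals into the plane
        m1 = M[i] / np.linalg.norm(M[i]); m2 = M[j] / np.linalg.norm(M[j])
        d = m1 - m2
        if np.linalg.norm(d) < 1e-6:
            continue
        # p1 - c = r*m1 and p2 - c = r*m2  =>  p1 - p2 = r*(m1 - m2); solve for r.
        r = float((P[i] - P[j]) @ d) / float(d @ d)
        if r <= 0:                                     # assumes outward-pointing normals
            continue
        c = P[i] - r * m1                              # 2D circle center in the plane
        residual = np.abs(np.linalg.norm(P - c, axis=1) - r)
        inliers = int((residual < dist_thresh).sum())
        if inliers > best[1]:
            point_on_axis = c[0] * u + c[1] * v        # lift the center back to 3D
            best = ((axis, point_on_axis, r), inliers)
    return best
```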
IDEA: Better ArUco functions in OpenCV
- Description: ArUco, commonly used for camera pose estimation and augmented reality applications, was first integrated into the opencv_contrib repository by one of its authors, S. Garrido-Jurado, in 2015. In 2019 the ArUco library's license was changed to GPLv3, and the version in OpenCV and that of the ArUco research group have diverged since. The ArUco code in OpenCV was later improved and moved to the main repository in 2022 through the efforts of the OpenCV community and the core team. The ArUco group has continued to develop ArUco and made significant advancements. It would be beneficial if these improvements could be incorporated into OpenCV (the current main-repository detection API is sketched after this idea).
- Expected Outcomes:
- Resources:
- Mentor: Shiqi Yu
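For reference, the detection entry point in the main repository that upstream improvements would slot into (OpenCV >= 4.7 API; the image path and dictionary are placeholders).

```python
import cv2

img = cv2.imread("markers.jpg")                                     # placeholder image
dictionary = cv2.aruco.getPredefinedDictionary(cv2.aruco.DICT_4X4_50)
params = cv2.aruco.DetectorParameters()
detector = cv2.aruco.ArucoDetector(dictionary, params)

# Detect markers and visualize the result.
corners, ids, rejected = detector.detectMarkers(img)
cv2.aruco.drawDetectedMarkers(img, corners, ids)
cv2.imwrite("markers_detected.jpg", img)
```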
IDEA: Implementing SMPLer with opencv
- Description:
- Expected Outcomes:
- Resources:
- Mentor: Zihao Mu
IDEA: 3D reconstruction using Gaussian splatter
- Description:
- Expected Outcomes:
- Resources:
IDEA: Implementing and optimizing any Stable Diffusion algorithm using opencv
- Description:
- Expected Outcomes:
- Resources:
IDEA: Better LLMs support in OpenCV (1)
- Description: Large Language Models (LLMs) are AI models designed to understand, generate and manipulate human language. One of the barriers to better LLM support is tokenizer support. A tokenizer breaks raw text down into smaller pieces (tokens), which are easier for models to understand and process. This project aims to integrate a tokenizer into the dnn module (a short round-trip example follows this idea).
- Expected Outcomes: A patch that integrates tokenizer in dnn module
- Resources:
- Let's Build the GPT Tokenizer https://youtu.be/zduSFxRajkE?si=JF725Ipnzc4R5Nnc
- OpenAI's tokenizer https://github.com/openai/tiktoken
- Hugging Face's AutoTokenizer https://github.com/huggingface/transformers/blob/main/src/transformers/models/auto/tokenization_auto.py
- Mentor: Yuantao Feng
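A short round trip with tiktoken (one of the resources above) showing what a dnn-module tokenizer would need to provide: text in, integer token ids out, and back again.

```python
import tiktoken

# GPT-2's BPE tokenizer: encode text to ids, then decode back to the original text.
enc = tiktoken.get_encoding("gpt2")
ids = enc.encode("OpenCV GSoC 2024")
print(ids)                      # a short list of integer token ids
print(enc.decode(ids))          # "OpenCV GSoC 2024"
```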
IDEA: Better LLMs support in OpenCV (2)
- Description: LLM support in OpenCV is currently unclear. One of the key features of the current dnn engine is fixed memory allocation after model import, which speeds up inference for CNNs but becomes a limitation when running LLMs, which have variable-shaped inputs. This project aims to try as many LLMs as possible with OpenCV, fix the dnn engine to support more LLMs, and write demos showing LLM inference with OpenCV.
- Expected Outcomes:
- Several patches that fix dnn engine to support more LLMs.
- Several patches that add demos to show LLMs inference with OpenCV.
- Resources:
- GPT2 inference with OpenCV https://github.com/opencv/opencv/blob/5.x/samples/dnn/gpt2_inference.py
- LLMs that llama.cpp supports https://github.com/ggerganov/llama.cpp?tab=readme-ov-file#description
- Mentor: Yuantao Feng
Idea Template:
1. #### _IDEA:_ <Descriptive Title>
* ***Description:*** 3-7 sentences describing the task
* ***Expected Outcomes:***
* < Short bullet list describing what is to be accomplished >
* <i.e. create a new module called "bla bla">
* < Has method to accomplish X >
* <...>
* ***Resources:***
* [For example a paper citation](https://arxiv.org/pdf/1802.08091.pdf)
* [For example an existing feature request](https://github.com/opencv/opencv/issues/11013)
* [Possibly an existing related module](https://github.com/opencv/opencv_contrib/tree/master/modules/optflow) that includes some new optical flow algorithms.
* ***Skills Required:*** < for example mastery plus experience coding in C++, college course work in vision that covers optical flow, python. Best if you have also worked with deep neural networks. >
* ***Possible Mentors:*** < your name goes here >
* ***Difficulty:*** <Easy, Medium, Hard>
* ***Duration:*** <8 weeks is 320 hard, Medium 240, Easy 200>
-
All Ideas Above
-
Have these Additional Expected Outcomes:
- Use the OpenCV How to Contribute and see Aruco/ChAruco Markers as a high quality finished example.
- Add unit tests described here, see also the Aruco test example
- Add a tutorial, and sample code
- see the Aruco tutorials and how they look on the web.
- See the Aruco samples
- Make a short video showing off your algorithm and post it to Youtube. Here's an Example.
-
Contributors
How to Apply
The process is described at GSoC home page
How contributors will be evaluated once working:
- Contributors will be paid only if:
- Phase 1:
- You must generate a pull request
- That builds
- Has at least stubbed-out functionality (placeholder functions, such as just displaying an image)
- With OpenCV appropriate Doxygen documentation (example tutorial)
- Includes what the function or net is and what it is used for
- Has at least stubbed out unit test
- Has a stubbed out example/tutorial of use that builds
- See the contribution guide
- and the coding style guide
- the line_descriptor module is a good example of a contribution
- Phase 2:
- You must generate a pull request
- That builds
- Has all or most of the planned functionality (but still usable without those missing parts)
- With OpenCV appropriate Doxygen documentation
- Includes what the function or net is and what it is used for
- Has some unit tests
- Has a tutorial/sample of how to use the function or net and why you'd want to use it.
- Optionally, but highly desirable: create a (short! 30sec-1min) Movie (preferably on Youtube, but any movie) that demonstrates your project. We will use it to create the final video:
- Extended period:
- TBD
Mentors:
- Contact us, preferably in February or early March, on the opencv-gsoc googlegroups mailing list above and ask to be a mentor (or we will ask you in some known cases)
- If we accept you, we will post a request from the Google Summer of Code OpenCV project site asking you to join.
- You must accept the request and you are a mentor!
- You will also need to get on:
- You then:
- Look through the ideas above, choose one you'd like to mentor or create your own and post it for discussion on the mentor list.
- Go to the opencv-gsoc googlegroups mailing list above and look through the project proposals and discussions. Discuss the ideas you've chosen.
- Find likely contributors, ask them to apply to your project(s)
- You will get a list of contributors who have applied to your project. Go through them and select a contributor; rejecting them all if none suits, joining another project to co-mentor, or sitting this year out are all acceptable outcomes.
- Make sure your contributors officially apply through the Google Summer of Code site prior to the deadline, as indicated by the Contributor Application Period in the timeline
- Then, when we get a slot allocation from Google, the administrators "spend" the slots in order of priority influenced by whether there's a capable mentor or not for each topic.
- Contributors must finally actually accept to do that project (some sign up for multiple organizations and then choose)
- Get to work!
If you are accepted as a mentor and you find a suitable contributor and we give you a slot and the contributor signs up for it, then you are an actual mentor! Otherwise you are not a mentor and have no other obligations.
- Thank you for trying.
- You may contact other mentors and co-mentor a project.
You get paid a modest stipend over the summer to mentor, typically $500 minus an org fee of 6%.
Several mentors donate their salary, earning ever better positions in heaven when that comes.
Potential Mentors List:
Ankit Sachan
Anatoliy Talamanov
Clément Pinard
Davis King
Dmitry Kurtaev
Dmitry Matveev
Edgar Riba
Gholamreza Amayeh
Grace Vesom
Jiri Hörner
João Cartucho
Justin Shenk
Michael Tetelman
Ningxin Hu
Rostislav Vasilikhin
Satya Mallick
Stefano Fabri
Steven Puttemans
Sunita Nayak
Vikas Gupta
Vincent Rabaud
Vitaly Tuzov
Vladimir Tyan
Yida Wang
Jia Wu
Yuantao Feng
Zihao Mu
Admins
Gary Bradski
Vadim Pisarevsky
Shiqi Yu
GSoC Org Application Answers