Global-scale Observations of the Limb and Disk (GOLD)

GOLD will fly an ultraviolet (UV) imaging spectrograph on a geostationary satellite to measure densities and temperatures in Earth’s thermosphere and ionosphere. The goal of the investigation is to address an overarching question in heliophysics science: What is the global-scale response of the thermosphere and ionosphere to forcing in the integrated Sun-Earth system? Measurements from GOLD will be used, in conjunction with sophisticated models of the terrestrial thermosphere and ionosphere, to revolutionize our understanding of the space environment. More information can be found on the GOLD mission website.

GPS Coordinates Estimation using Shadow Trajectories

Using only shadow trajectories of stationary objects in a scene, we demonstrate that a set of six or more photographs is sufficient to accurately calibrate the camera. Moreover, we present a novel application where, using only three points from the shadow trajectory of the objects, one can accurately determine the geo-location of the camera, up to a longitude ambiguity, and also the date of image acquisition, without using GPS or other special instruments. We refer to this as "geo-temporal localization". We consider possible cases where the ambiguities can be removed if additional information is available. Our method does not require any knowledge of the date or the time when the pictures were taken; the geo-temporal information is recovered directly from the images. We demonstrate the accuracy of our technique for both steps, calibration and geo-temporal localization, using synthetic and real data.
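The geometry underlying geo-temporal localization can be illustrated with the standard solar-position equations: given a latitude, day of year, and solar hour angle, the sun's elevation, and hence the direction and length of a cast shadow, follows directly. A minimal sketch in Python, using the common low-accuracy approximation for solar declination (this is illustrative astronomy, not the paper's actual estimation algorithm):

```python
import math

def solar_elevation(lat_deg, day_of_year, hour_angle_deg):
    """Approximate solar elevation angle (degrees) for a given
    latitude, day of year, and solar hour angle (0 = local solar noon)."""
    # Low-accuracy solar declination approximation (degrees).
    decl = -23.44 * math.cos(math.radians(360.0 / 365.0 * (day_of_year + 10)))
    lat, dec, h = map(math.radians, (lat_deg, decl, hour_angle_deg))
    sin_elev = (math.sin(lat) * math.sin(dec)
                + math.cos(lat) * math.cos(dec) * math.cos(h))
    return math.degrees(math.asin(sin_elev))

def shadow_length(object_height, elevation_deg):
    """Length of the shadow cast on flat ground by a vertical object."""
    return object_height / math.tan(math.radians(elevation_deg))
```

Because elevation depends jointly on latitude and date, observing how a shadow evolves over a day constrains both at once; longitude only shifts the local clock, not the shadow geometry, which is why it remains ambiguous without extra information.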

View-invariant Recognition of Human Actions and Body Poses

We developed a novel framework for view-invariant recognition of human actions and human body poses. Unlike previous works that regard an action as a whole object, or as a sequence of individual poses, we represent an action as a set of pose transitions defined by all possible triplets of body points, i.e., we further break down each pose into a set of point triplets and find invariants for the motion of these triplets across frames. We proposed two distinct methods for recognizing pose transitions independent of the camera calibration matrix and viewpoint, and applied them to the problems of action recognition and pose recognition. The UCF-CIL Action Dataset was used in this work.

Euclidean Path Modeling for Video Surveillance

We address the issue of Euclidean path modeling in a single camera for activity monitoring in a multi-camera video surveillance system. The method consists of a path building training phase and a testing phase. During the unsupervised training phase, after auto-calibrating a camera and thereafter metric rectifying the input trajectories, a weighted graph is constructed with trajectories represented by the nodes, and weights determined by a similarity measure. Normalized-cuts are recursively used to partition the graph into prototype paths. Each path, consisting of a partitioned group of trajectories, is represented by a path envelope and an average trajectory. For every prototype path, features such as spatial proximity, motion characteristics, curvature, and absolute world velocity are then recovered directly in the rectified images or by registering to aerial views. During the testing phase, using our simple yet efficient similarity measures for these features, we seek a relation between the trajectories of an incoming sequence and the prototype path models to identify anomalous and unusual behaviors. Real-world pedestrian sequences are used to evaluate the steps, and demonstrate the practicality of the proposed approach.
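The training phase clusters metric-rectified trajectories via a weighted similarity graph. The paper's own similarity measure is not reproduced here; a common hypothetical stand-in is the symmetric Hausdorff distance between trajectories, converted to Gaussian affinities suitable for normalized-cuts partitioning:

```python
import numpy as np

def hausdorff(traj_a, traj_b):
    """Symmetric Hausdorff distance between two trajectories,
    each given as an (N, 2) array of image points."""
    d = np.linalg.norm(traj_a[:, None, :] - traj_b[None, :, :], axis=2)
    return max(d.min(axis=1).max(), d.min(axis=0).max())

def similarity_graph(trajectories, sigma=10.0):
    """Dense affinity matrix W for normalized-cuts clustering:
    W[i, j] = exp(-d(i, j)^2 / sigma^2), with d the trajectory distance."""
    n = len(trajectories)
    w = np.zeros((n, n))
    for i in range(n):
        for j in range(i, n):
            d = hausdorff(trajectories[i], trajectories[j])
            w[i, j] = w[j, i] = np.exp(-d * d / sigma ** 2)
    return w
```

Recursive bipartitioning of this graph with normalized cuts then yields the prototype paths, each summarized by a path envelope and an average trajectory.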


Diminished Reality

In this work, we present a novel technique for realistically removing static or moving objects in a video sequence obtained by a perspective camera. Different from previous efforts, which are typically based on processing in the 3D data volume, we slice the volume along the motion manifold of the moving object, and therefore reduce the search space from 3D to 2D, while still preserving the spatial and temporal coherence. In addition to the computational efficiency, based on geometric video analysis, the proposed approach is also able to handle real videos under perspective distortion, as well as common camera motions, such as panning, tilting, and zooming. The experimental results demonstrate that our algorithm performs comparably to 3D search based methods, but extends the current state-of-the-art techniques to videos with projective effects, as well as illumination changes.


Calibration and Light Source Orientation from Solar Shadows

In this work, we have developed a method for recovering camera parameters from perspective views of daylight shadows in a scene, given only minimal geometric information determined from the images. This minimal information consists of two 3D stationary points and their cast shadows on the ground plane. We show that this information captured in two views is sufficient to determine the focal length, the aspect ratio, and the principal point of a pinhole camera with fixed intrinsic parameters. In addition, we are also able to compute the orientation of the light source. We also demonstrate the application to an image-based rendering problem.


Self-Calibration for Turn-table Sequences under Variable Zoom

In this work, we propose a novel method for self-calibration, using constant inter-frame motion from an image sequence of an object rotating around a single axis with varying camera internal parameters. Our approach makes use of the facts that in many commercial systems rotation angles are often controlled by an electromechanical system, and that the inter-frame essential matrices are invariant if the rotation angles are constant but not necessarily known. Using a bundle adjustment method tailored to this special case, i.e., static camera and constant object rotation, the 3D structure of the object is recovered and the camera parameters are refined simultaneously.


Analysis and Processing of Dental CT Scans

In this project, we present an approach to accurately align the CT scans of a patient to a stone-cast model of his/her mandible or maxilla, and use the registration result to clean the patient's scans of artifacts and defects. The proposed approach assumes that the maxillofacial features are roughly symmetric with respect to a 3D plane. 3D volumetric models of both the patient's anatomy and the stone cast are first reconstructed from the input data using a marching cubes algorithm. The planes of symmetry are then extracted using an improved Extended Gaussian Image method. After an initial alignment of the two volumes guided by their planes of symmetry, we minimize a global cost function based on the sum of squared differences (SSD) between the patient data and the stone-cast model to finally recover the rigid transformation between the two scans.
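The full method minimizes the SSD cost over a 6-DOF rigid transformation; as a toy illustration of the cost itself, here is a brute-force search over integer translations along a single axis (hypothetical simplification, not the project's actual optimizer):

```python
import numpy as np

def ssd(volume_a, volume_b):
    """Sum of squared differences between two equally sized volumes."""
    return float(np.sum((volume_a.astype(float) - volume_b.astype(float)) ** 2))

def best_shift(patient, cast, max_shift=2):
    """Brute-force search over integer translations along one axis,
    returning the shift of `cast` that minimizes the SSD cost."""
    costs = {}
    for s in range(-max_shift, max_shift + 1):
        shifted = np.roll(cast, s, axis=0)
        costs[s] = ssd(patient, shifted)
    return min(costs, key=costs.get)
```

In the actual registration the search space is continuous and six-dimensional (rotation plus translation), so a gradient-based or multi-resolution optimizer replaces this exhaustive scan.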


3D Object Transfer

Given two video sequences of different scenes acquired with moving cameras, it is interesting to seamlessly transfer a 3D object from one sequence to the other. In this paper, we present a video-based augmented reality approach for transferring realistic 3D objects, in a geometrically correct manner, between two or more non-overlapping, moving video sequences. Our framework builds upon techniques in camera pose estimation, 3D spatiotemporal video alignment, depth recovery, key-frame editing, natural video matting, and image-based rendering.


2D Object Transfer and Video Compositing

In this project, we have developed a unified framework for realistically compositing scenes from multiple videos or images directly in the 2D image domain. Our approach is flexible in that it allows for compositing of video streams or images without any prior knowledge of the geometry or the lighting in either the source or target scenes. This flexibility also allows for inserting virtual objects in a realistic fashion. We combine this work with our shadow and reflection synthesis techniques to provide image-based direct lighting in the composition process, generating realistic renderings including shadows and reflections, with the color characteristics learnt from the target scene. Various direct lighting scenarios for daylight and night time are demonstrated.


Shadow and Reflection Synthesis

In this project, we developed an approach for creating realistic shadows and reflections of objects composited into a novel background. We observe that scenes containing a ground plane and some upright objects, e.g. walls, crowds, desks, street lamps, trees, etc., are common in natural environments, and show that two vertical lines and their cast shadows are enough to recover the primary light source, which makes it possible to construct shadows of inserted planar objects in a single target image. We also demonstrate a technique to calibrate the camera when two views are available, which enables us to create shadows of general 3D objects in a target scene. For reflections, we exploit the fact that an object and its reflection in a planar surface define a cross-ratio in the image plane, which is invariant to the camera's perspective transformation. This allows us to realistically generate object reflections directly in the image domain.
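The reflection construction rests on the projective invariance of the cross-ratio, which a small numeric check makes concrete. The 1D projective map below is an arbitrary stand-in for the camera's action on a line, not taken from the paper:

```python
from fractions import Fraction

def cross_ratio(a, b, c, d):
    """Cross-ratio of four collinear points given by scalar coordinates:
    CR(a, b; c, d) = ((a - c)(b - d)) / ((b - c)(a - d))."""
    return ((a - c) * (b - d)) / ((b - c) * (a - d))

def homography_1d(x, p=2, q=1, r=1, s=3):
    """An arbitrary 1D projective map x -> (p x + q) / (r x + s),
    standing in for the camera's perspective action on a line."""
    return (p * x + q) / (r * x + s)
```

Mapping four collinear points through any such projective transformation leaves their cross-ratio unchanged, which is exactly what lets the reflection be synthesized in the image without 3D reconstruction.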


Real-time 3D Fire Simulation

In this project, we present a method for real-time simulation of 3-dimensional fire inspired by an old stage trick known as the "silk torch", which produces a compelling illusion of fire to the human eye. We combine this visual effect of a computer-generated silk torch with additional physical properties of the combustion process, such as the speed of the vaporized fuel, wind, gravity, and other internal/external forces, in order to realistically simulate the dynamics of various types of flame under different conditions.
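The physical side of such a simulation amounts to integrating simple forces on each flame element. A minimal sketch of one update step, assuming forward-Euler integration, hypothetical parameter values, and buoyant lift standing in for the combined effect of gravity on the hot vaporized fuel (not the project's actual solver):

```python
def step_particle(pos, vel, dt=0.016,
                  buoyancy=(0.0, 2.5, 0.0),
                  wind=(0.5, 0.0, 0.0),
                  drag=0.8):
    """One forward-Euler step for a flame particle: buoyant lift from
    the hot vaporized fuel, a wind force, and a simple drag term."""
    accel = tuple(b + w - drag * v for b, w, v in zip(buoyancy, wind, vel))
    vel = tuple(v + a * dt for v, a in zip(vel, accel))
    pos = tuple(p + v * dt for p, v in zip(pos, vel))
    return pos, vel
```

Repeating this step per frame for many particles, with the silk-torch texture advected along the particle field, yields the real-time flame animation described above.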


Camera Calibration Using Mirror Symmetry (ICPR Piero Zamperoni Best Paper Award)

This project addresses the problem of calibrating a pinhole camera from images of an isosceles trapezoid. Assuming a unit aspect ratio and zero skew, we introduce a novel and simple camera calibration approach. The key features of the proposed technique are its simplicity and the lack of need for 3D coordinate information about the calibrating object, i.e. the isosceles trapezoid. By utilizing the symmetry of such a trapezoid, we show that one can obtain both the internal and the external camera parameters. To demonstrate the effectiveness of the algorithm, we present the processing results on synthetic and real images, and compare our results to Zhang’s flexible calibration method.
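One way the trapezoid's symmetry can pay off: its parallel sides and its axis of symmetry give vanishing points of two orthogonal directions, and for a camera with unit aspect ratio, zero skew, and known principal point p, orthogonality yields the focal length via (v1 - p)·(v2 - p) + f² = 0. A hedged sketch of that step (the paper itself may derive the parameters differently):

```python
import math

def focal_from_orthogonal_vps(v1, v2, principal_point=(0.0, 0.0)):
    """Focal length of a pinhole camera (unit aspect ratio, zero skew)
    from the image vanishing points of two orthogonal 3D directions."""
    px, py = principal_point
    dot = (v1[0] - px) * (v2[0] - px) + (v1[1] - py) * (v2[1] - py)
    if dot >= 0:
        raise ValueError("vanishing points inconsistent with orthogonality")
    return math.sqrt(-dot)
```

For instance, the orthogonal 3D directions (1, 0, 1) and (-1, 0, 1) seen by a camera with focal length 800 and principal point at the origin project to vanishing points (800, 0) and (-800, 0), from which the formula recovers f = 800.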


Expression Morphing

In this project, we propose an image-based approach to photo-realistic view synthesis by integrating field morphing and view morphing in a single framework. We thus provide a unified technique for synthesizing new images that include both viewpoint changes and object deformations. For view morphing, we relax the requirement of monotonicity along epipolar lines to piecewise monotonicity, by incorporating a segmentation stage prior to interpolation. This allows for dealing with occlusions and visibility issues, and hence alleviates the “ghosting effects” that typically occur when morphing is performed between distant viewpoints. We have applied our approach in particular to the synthesis of human facial expressions, while allowing for a wide range of viewing positions and directions.


Image-Based Metrology

In this project, we describe how 3D Euclidean measurements can be made in a pair of perspective images, when only minimal geometric information is available in the image planes. This minimal information consists of one line on a reference plane and one vanishing point for a direction perpendicular to the plane. Given this information, we show that the length ratio of two objects perpendicular to the reference plane can be expressed as a function of the camera principal point. Assuming that the camera intrinsic parameters remain invariant between the two views, we recover the principal point and the camera focal length by minimizing the symmetric transfer error of geometric distances. Euclidean metric measurements can then be made directly from the images. To demonstrate the effectiveness of the approach, we present the processing results for synthetic and natural images, including measurements along both parallel and non-parallel lines.
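The kind of measurement involved can be illustrated on a single line: if v is the image of the line's point at infinity (its vanishing point), the world-space ratio of segments on the line is recovered from an image cross-ratio alone. A small self-contained check, with a 1D projective map standing in for the camera rather than the paper's two-view formulation:

```python
from fractions import Fraction

def cross_ratio(p1, p2, p3, p4):
    """CR(p1, p2; p3, p4) = ((p1 - p3)(p2 - p4)) / ((p2 - p3)(p1 - p4))."""
    return ((p1 - p3) * (p2 - p4)) / ((p2 - p3) * (p1 - p4))

def world_ratio_from_image(a_img, b_img, c_img, v_img):
    """World-space ratio (a - c)/(b - c) of collinear points a, b, c,
    computed purely from their images and the line's vanishing point v.
    This holds because the world point at infinity maps to v, and the
    cross-ratio is preserved under the projection."""
    return cross_ratio(a_img, b_img, c_img, v_img)
```

With the vanishing point identified, affine ratios on the line become measurable directly in the image; upgrading to full Euclidean measurements is what the recovered intrinsics provide.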


Self-calibrated Stereo Modeling

The traditional stereo reconstruction techniques based on point correspondences and the estimation of the cameras from the fundamental matrix introduce a four-fold ambiguity. Moreover, there is a projective ambiguity inherent in the fundamental matrix. We show that a symmetric object can be modeled, even under partial occlusion, with a pair of uncalibrated stereo images. This implies that, unlike traditional stereo algorithms, we can extract 3D information from two arbitrary viewpoints, even when there are no left-to-right point correspondences. To demonstrate the effectiveness of the method, we present experimental results on both synthetic and real images.


Urban Simulation

In this project, we have introduced a new method for rectifying stereo pairs that does not require any calibration information or any knowledge of the epipolar geometry. We then describe a Bayesian stereo technique that fuses information from both monocular and binocular vision in order to overcome the complexity of data in cluttered/dense urban areas. Finally, a model for reducing perspective distortions is given, which could otherwise introduce severe errors into the resulting 3D models. We have also developed a multi-sensor platform for acquisition and construction of 3D city models at close range, which we plan to fuse with far-range stereo models.