This research proposes a method to track a known runway image in order to land an unmanned aerial vehicle (UAV) automatically, by finding a perspective transform between the known image and an input image in real time. The method also improves the real-time efficiency of feature detectors, making them more tolerant of perspective transformation and reducing processing time. A UAV is an aircraft that is controlled without a human pilot on board. UAVs fly with various degrees of autonomy, either autonomously using on-board computers with limited computational power or under remote control by a human operator. UAVs were originally used for missions where human access was not readily available or where it was dangerous for humans to go. Today, the most important problem for autopilot monitoring is that conventional systems relying only on GPS sensors provide inaccurate geographical positioning. Therefore, controlling a UAV during take-off from or landing on a runway requires professional input, which is a scarce resource. The characteristics of the newly developed method proposed in this paper are: (1) the use of a lightweight feature detector, such as SIFT or SURF, and (2) the use of perspective transformation to reduce the effect of affine transformation, which makes the feature detector more tolerant of perspective transformation. In addition, the method can roughly localize the same template in consecutive frames, which limits the area that feature matching needs to process.
It is generally accepted that landing an aerial vehicle is a delicate process to control. Whether it is a commercial flight or a fixed-wing unmanned aerial vehicle (UAV), the pilot needs to be trained, and even after training, accidents still happen in certain situations. To avoid human error, commercial flights must have an instrument landing system, which is very expensive. However, a small vehicle usually has only a global positioning system (GPS) to specify its coordinates, which may be erroneous, especially when landing on a narrow runway, or if the signal is jammed, either intentionally or unintentionally. Thus, to land a small UAV automatically, methods much better than GPS alone are needed, such as a runway detection scheme.
Generally, a UAV requires a pilot to control the landing. However, pilots can be trained only for a limited set of situations; there is a 50% chance that an accident could occur, with 70% of these accidents happening during landing (human factors). Consequently, the UAV landing process should be automated to reduce the accident rate and the pilot's workload. The main problem is that GPS accuracy is usually insufficient to locate the runway, especially with a moderate-quality GPS on a narrow runway (3 m in width).
Below is a short review of various funded (military) research projects on “automatic landing assist systems”, before we review other work related to the present paper:
There are long stripes along the sides of the runway; similar to many automated driving systems for land vehicles, these (two) lines are used for landing. A video of the system developed by Ruchanurucks and his team is available (footnote 2).
There are four or more distinct markers with known positions. Ruchanurucks et al. extensively studied perspective-n-point (PnP) algorithms for this case and showed the best kinematic chain between on-board equipment and the ground. The work not only derived camera-to-ground kinematic information but also showed how to properly calibrate a camera against an inertial measurement unit (IMU), which is usually aligned with the plane.
There are unknown planar markers on the ground (they are known to be planar but have no known coordinates). For this case, Sereewattana et al. proposed a concept paper for landing on “any planar area”. At that time, the camera-IMU calibration was not as accurate as in the work mentioned above. Nevertheless, the concept is interesting for emergencies that require landing anywhere that is planar and has some features.
While we were working on the aforementioned projects, we were again contacted by personnel from the Defense Technology Institute, who asked us to find the positional relationship between an airplane and a runway (i.e., its planar image). The difference between the last of the works mentioned above and the present work is that the present work deals extensively with a known runway, under the following assumptions:
Aviation organizations often have bird’s eye pictures of their runway already.
If the stripes are not present on the runway for any reason, can we still land the UAV automatically?
At first glance, we thought this would be easier than our other work, as it seemed like a simpler PnP problem. However, this paper deals with tracking the input runway image in real time with respect to a planar template, to find a homogeneous transformation between the UAV, which can tilt considerably, and the runway. This makes the problem substantially more difficult than off-line feature extraction and matching.
Generally, in image feature matching, there are multiple sources of error, such as changes in lighting, orientation, or perspective. In this paper, the key challenge is viewpoint change. Changing the capture angle of the same object alters many image descriptors, which is undesirable because it degrades matching accuracy. We therefore focus on improving the matching rate for features in images of planar objects.
The crux of this research is matching images of a planar template in real time using key points under viewpoint changes. In other words, we propose a method to enhance the accuracy of many existing feature-matching algorithms in the presence of a large perspective transform between scenes, based on geometrical analysis that forecasts the orientation of the runway. Speeded-up robust features (SURF) and the scale-invariant feature transform (SIFT) were chosen as representative methods, as both are widely accepted as very good algorithms. The proposed scheme applies to other feature detectors as well.
Second, time consumption was tackled using feedback loops. The proposed method uses homography, a well-known machine vision technique, as its PnP algorithm to forecast the position of the runway; this makes the process much faster, as will be shown in the Experimental Results section. Other PnP algorithms, such as those in Ruchanurucks et al., are applicable as well, depending on their time consumption, which could be benchmarked against our experimental conditions or results.
The rest of the article is organized as follows. Next, a literature review is provided. Then, the overall method is briefly explained. Details of the method are explained next. To avoid confusion, the details are then converted to algorithms. Experimental results are verified using real landing videos of a small UAV. Finally, we conclude that the proposed scheme is essential for landing on a known runway based on feature matching.
This literature review covers two aspects. First, we present vision-based aerial vehicle control using image processing. Second, we discuss PnP solutions for rotation (R) and translation (t).
Regarding the first aspect, we focus on vision-based aerial vehicle control using features on the ground or on another vehicle, rather than describing well-documented feature descriptors, because the former provides information on real-world problems and thus led to our findings.
For the problem outlined in the Introduction, one way to find the coordinates of an airplane is to set up multiple cameras and a high-performance estimation system on the ground; the data can then be sent back to the airplane to control landing. However, this setup is cumbersome.
On the other hand, there are also schemes that use a known simple marker on the ground and a camera on the plane to obtain clear points of interest around the landing area. However, their comprehensive control scheme was designed only for miniature air vehicles.
Similar research proposes using a single large symbol (an airbag) that can be clearly seen at the landing point. However, this is applicable only if the wings of the UAV are not damaged by contact with the symbol.
To avoid contact between the plane and the airbag, some works use a symbol on the ground [8,9]. For all of the above methods, the objects must be large enough and have a shape different from the nearby environment, and it helps if there is a planar area large enough to serve as a runway. Nevertheless, such symbols are certainly useful.
For convenience, some studies suggest that no equipment or symbol should be set up at the airport. Instead, they propose using existing points of interest on the runway, which must be robust to changes in lighting and tilt angle, using feature detectors such as that of Bay et al. Our method aims to be more comprehensive than these existing methods.
There are even more advanced matching methods than the local feature methods mentioned earlier. Global methods, which generate a matching map before searching, can be affected by changes in the surrounding area, e.g., a new unknown object in the scene, and such changes hinder the matching process. Thus, we believe local feature methods are more suitable for matching in dynamic areas such as runways.
Examples of methods that rely on local feature matching are Goncalves et al. and Ding et al. The former relies heavily on homography to assist landing. The latter safely finds an unknown landing point by analyzing the relationship between feature points in two pictures taken by different cameras at different times, identifying an area that is smooth and planar enough for a UAV to land without accident. To provide correct data for landing, they need many points of interest to average out the error in the system.
Regarding PnP algorithms, POSIT (POS with ITerations) by DeMenthon and Davis is among the oldest; however, it is well known among pose estimation algorithms, probably because the authors state clearly that it requires only 25 lines of code, and it was fast for its time. The underlying strengths of this work include: (1) using scaled orthographic projection along with general perspective projection; and (2) an iterative algorithm that further improves convergence to a better minimum. However, the authors admit that their algorithm does not use the fact that the rotation matrix is orthonormal. In this sense, this work produces sub-optimal outputs.
Efficient perspective-n-point (EPnP) deals with pose estimation using a non-iterative O(n) solution, where n is the number of feature points; many state-of-the-art procedures at that time were slower. Furthermore, its authors claimed that the method is more accurate than many other non-iterative solutions existing at that time. However, its accuracy was lower than that of iterative solutions, although they considered its time consumption to be better. Specifically, they stated that their approach was still less accurate than Lu's. Nevertheless, Lepetit et al. claimed that if their method was used to initialize Gauss–Newton, it could achieve accuracy as high as that of Lu.
Lepetit et al. compared their method (EPnP) with Ansar and Daniilidis, clamped DLT (direct linear transform, another least-squares algorithm similar to homography), EPnP followed by Lu, and EPnP followed by Gauss–Newton. Generally, EPnP on its own is inferior to Lu; however, when used in combination with Lu or Gauss–Newton, it is almost as accurate as Lu alone, with less time consumption. For example, when EPnP is used to initialize Lu, they claim that the overall solution is still O(n); in other words, it is faster than Lu alone. This is no surprise, as EPnP already provides a good starting point for Lu. The key idea that makes this algorithm strong is its representation of the coordinates of n 3D points as a weighted sum of four virtual control points. Thus, the optimization is performed over just four coefficients.
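To make the control-point idea concrete, the following pure-Python sketch computes the four weights that express a 3D point as an affine combination of four non-coplanar control points (the weights sum to one). It covers only this representation step: choosing the control points (EPnP places the first at the centroid of the reference points and the others along their principal directions) and solving for the pose are omitted, and the function names are hypothetical.

```python
def solve_linear(a, b):
    """Solve the square system a x = b by Gaussian elimination with partial pivoting."""
    n = len(a)
    m = [[float(v) for v in row] + [float(rhs)] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("singular system (control points may be coplanar)")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def control_point_weights(point, controls):
    """Return alpha with point = sum_j alpha[j] * controls[j] and sum_j alpha[j] = 1."""
    rows = [[c[k] for c in controls] for k in range(3)]  # x, y and z equations
    rows.append([1.0, 1.0, 1.0, 1.0])                   # weights sum to one
    return solve_linear(rows, list(point) + [1.0])


def from_weights(alphas, controls):
    """Rebuild a 3D point as the weighted sum of the control points."""
    return tuple(sum(a * c[k] for a, c in zip(alphas, controls)) for k in range(3))


controls = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 1.0, 0.0), (0.0, 0.0, 1.0)]
alphas = control_point_weights((0.2, 0.3, 0.4), controls)  # approximately [0.1, 0.2, 0.3, 0.4]
```

Because rigid motions preserve affine combinations, the same weights describe the points in the camera frame, which is why EPnP can solve for the four control points instead of every individual point.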
Li et al. is a state-of-the-art approach for various kinds of point clouds, both planar and non-planar. They compared their work with six well-known/state-of-the-art algorithms [14,17] followed by Gauss–Newton [15,19,20]. In most cases, their approach yielded better results in the presence of image noise. The key ideas that make their algorithm very strong include: (1) never linearizing anything unnecessarily, so they solve a perspective-3-point problem without the conventional linearization; (2) using the two points farthest apart to represent an axis, which reduces noise; and (3) calculating the remaining two axes, along with the translation vector, using least squares.
However, Li et al. mention that their result in the planar case is still inferior to that of Schweighofer and Pinz, which targets only the planar case. Schweighofer and Pinz is a planar-case extension of Lu, which targets both planar and non-planar cases. The extension assumes that there are at most two minima for a pose, regardless of the algorithm used (although this is not extensively proven). The iterative solution of Lu is selected in their paper because its criteria always lead to convergence (though perhaps not to the true pose [15,20]). Hence, we discuss Lu in more detail before Schweighofer and Pinz. Its strong points include: (1) it is an iterative algorithm that is proven to always converge; (2) the optimization is tailored for pose estimation, in contrast to works that rely only on general methods like Gauss–Newton or its extension, Levenberg–Marquardt; and (3) in each iteration, R and t are updated and finally converge to a minimum.
In summary, many landing assist methods that rely on existing features on a runway, regardless of the PnP algorithm used, suffer from reduced feature detection efficiency when the input images are tilted. We aim to address this problem in this paper, using the widely used homography as our PnP algorithm. Differences among PnP methods for landing assist, without feature warping, were compared in Ruchanurucks et al.
Our method is based on detecting existing points of interest on the runway. An overview of the system is shown in Fig. 1. We improve on earlier works that rely on existing features on a runway [11,12] by estimating, in real time, the orientation of a template image that best matches the runway, which we achieve by performing a perspective transform on template (feature) points. A region of interest (ROI) is also generated to guide image processing on where to look for the runway in the next frame.
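A minimal sketch of ROI generation, assuming the ROI is the axis-aligned bounding box of the template corners projected by the current homography into full-image coordinates, padded by a fixed margin and clipped to the image. The margin value, the end-exclusive box convention, and the function names are illustrative assumptions, not the parameters used in this work.

```python
import math


def project(h, pt):
    """Map (x, y) through the 3x3 homography h, given as nested lists."""
    x, y = pt
    w = h[2][0] * x + h[2][1] * y + h[2][2]
    return ((h[0][0] * x + h[0][1] * y + h[0][2]) / w,
            (h[1][0] * x + h[1][1] * y + h[1][2]) / w)


def roi_from_homography(h, template_size, image_size, margin=20):
    """Return an end-exclusive box (x0, y0, x1, y1) around the projected template, or None.

    `margin` (pixels) is a hypothetical allowance for motion between consecutive frames.
    """
    tw, th = template_size
    iw, ih = image_size
    corners = [(0, 0), (tw - 1, 0), (tw - 1, th - 1), (0, th - 1)]
    pts = [project(h, c) for c in corners]
    xs = [x for x, _ in pts]
    ys = [y for _, y in pts]
    x0 = max(0, math.floor(min(xs)) - margin)
    y0 = max(0, math.floor(min(ys)) - margin)
    x1 = min(iw, math.ceil(max(xs)) + margin + 1)
    y1 = min(ih, math.ceil(max(ys)) + margin + 1)
    if x0 >= x1 or y0 >= y1:
        return None  # projection lies outside the image: fall back to a full-frame search
    return x0, y0, x1, y1


# Template 200 x 100 px seen shifted by (100, 50) in a 640 x 480 frame.
roi = roi_from_homography([[1, 0, 100], [0, 1, 50], [0, 0, 1]], (200, 100), (640, 480))
# roi == (80, 30, 320, 170)
```

Returning None when the box is empty lets the caller revert to searching the whole frame, which mirrors the first frames in which no runway has yet been located.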
Importantly, the proposed method relies heavily on feature detection to acquire points of interest on the runway. Detection and matching are based on SIFT or SURF. Using the SIFT or SURF features of the present frame, we can estimate the orientation of the template image, both to address the limited robustness of SIFT and SURF to affine transformation and to define the ROI (estimating the position of the airport in the image and cutting out the surrounding environment).
Region of Interest Generation.
However, Fig. 3, where the ROI was used for searching, shows that the area of features (dark rectangle) is not well aligned with the runway, even after performing RANSAC (RANdom SAmple Consensus), because the runway itself contains multiple identical-looking features. Consequently, these outliers lead to inferior localization of the runway. The reason behind this inferior matching and perspective transformation is the perspective difference between the input image and the template (Fig. 4).
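For reference, the following pure-Python sketch shows the basic RANSAC loop for a homography: repeatedly fit H to four randomly chosen matches and keep the hypothesis that agrees with the most matches within a pixel threshold. The iteration count, threshold, and function names are illustrative assumptions; library implementations such as OpenCV's cv2.findHomography with the RANSAC flag also refine H on the final inlier set, which this sketch omits.

```python
import random


def solve_linear(a, b):
    """Solve the square system a x = b by Gaussian elimination with partial pivoting."""
    n = len(a)
    m = [[float(v) for v in row] + [float(rhs)] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("degenerate sample")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def homography_from_4(src, dst):
    """Homography H (with H[2][2] = 1) mapping four src points onto four dst points."""
    a, b = [], []
    for (x, y), (u, v) in zip(src, dst):
        a.append([x, y, 1, 0, 0, 0, -u * x, -u * y])
        b.append(u)
        a.append([0, 0, 0, x, y, 1, -v * x, -v * y])
        b.append(v)
    h = solve_linear(a, b)
    return [h[0:3], h[3:6], h[6:8] + [1.0]]


def apply_homography(h, pt):
    x, y = pt
    w = h[2][0] * x + h[2][1] * y + h[2][2]
    return ((h[0][0] * x + h[0][1] * y + h[0][2]) / w,
            (h[1][0] * x + h[1][1] * y + h[1][2]) / w)


def ransac_homography(src, dst, iters=200, thresh=1.0, seed=0):
    """Return (H, inlier indices) for the hypothesis supported by the most matches."""
    rng = random.Random(seed)
    best_h, best_inliers = None, []
    for _ in range(iters):
        idx = rng.sample(range(len(src)), 4)
        try:
            h = homography_from_4([src[i] for i in idx], [dst[i] for i in idx])
        except ValueError:
            continue  # degenerate sample, e.g. three collinear points
        inliers = []
        for i, (p, q) in enumerate(zip(src, dst)):
            try:
                u, v = apply_homography(h, p)
            except ZeroDivisionError:
                continue  # point mapped to infinity: not an inlier
            if (u - q[0]) ** 2 + (v - q[1]) ** 2 <= thresh ** 2:
                inliers.append(i)
        if len(inliers) > len(best_inliers):
            best_h, best_inliers = h, inliers
    return best_h, best_inliers
```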
Template’s Perspective Transform.
Importance of Translation Cancel Matrix.
The translation cancel matrix A is needed because, after each newly warped template is generated, the computer sees not only the object area but also the black surplus pixels, like those on the left side of Fig. 5, as part of the warped template. Over time, this excess space grows larger and pushes the template out of the frame. Eventually, too few features remain in the template for accurate matching.
Figure 5 shows an example of a near-failed RANSAC after applying Eqs. (5) and (6) multiple times (here, 25). The warped template is barely usable for matching because of the scaling and shifting of the template image: the runway template has drifted toward the lower right corner of the upper left area (the area corresponding to the warped template image). In other words, because warping does not guarantee that the warped template stays within the image coordinates, the template can be pushed partially out of the image (occluded), leading to lost tracking in consecutive frames.
Hence, we introduce the notion of a frame to generate the matrix A. Because RANSAC does not provide results with 100% accuracy, the frame is used to shift the warped template and resolve the problem described above. The frame consists of the four corners of the template, which we call frame points (Fi). The frame points are warped (F′i) in the same way as other points in the template; however, they are used differently. First, these four frame points are generated as follows.
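Eqs. (7) and (8), which define the warped frame points F′i and the matrix A, are not reproduced in this section. The sketch below therefore assumes that A is a pure translation that moves the top-left corner of the bounding box of F′i back to the image origin, which is one way to keep the warped template inside the image; the helper names are hypothetical.

```python
def perspective_transform(pts, h):
    """Apply a 3x3 homography (nested lists) to (x, y) points, like PerspectiveTransform in the listings."""
    out = []
    for x, y in pts:
        w = h[2][0] * x + h[2][1] * y + h[2][2]
        out.append(((h[0][0] * x + h[0][1] * y + h[0][2]) / w,
                    (h[1][0] * x + h[1][1] * y + h[1][2]) / w))
    return out


def frame_points(width, height):
    """The four template corners F_i in pixel coordinates, clockwise from the top-left."""
    return [(0.0, 0.0), (width - 1.0, 0.0), (width - 1.0, height - 1.0), (0.0, height - 1.0)]


def translation_cancel(warped_frame):
    """Assumed form of A: translate so the bounding box of the warped frame starts at (0, 0)."""
    min_x = min(x for x, _ in warped_frame)
    min_y = min(y for _, y in warped_frame)
    return [[1.0, 0.0, -min_x],
            [0.0, 1.0, -min_y],
            [0.0, 0.0, 1.0]]


# Example: a warp that has drifted 30 px right and 20 px up.
h_t = [[1.0, 0.0, 30.0], [0.0, 1.0, -20.0], [0.0, 0.0, 1.0]]
warped = perspective_transform(frame_points(100, 50), h_t)  # F'_i
a_t = translation_cancel(warped)
recentred = perspective_transform(warped, a_t)  # top-left corner back at (0.0, 0.0)
```

Using only the four frame points keeps this step cheap: the correction is computed from four transformed corners rather than from the warped template image itself.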
Generating Four Points to Calculate Homography.
Finally, to reduce computation time during homography (using all matching pairs for the homography calculation takes a lot of time), we create four virtual template points for homography, so that many detected points need not be input to the homography, resulting in a time reduction. Homography requires at least four pairs of points. (A pair here means a template point and a real-time input point. The number of required points differs from one PnP solution to another; for example, the direct linear transform requires at least six points.)
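The four-pair case can be sketched in pure Python: fixing the bottom-right entry of H to 1 leaves eight unknowns, and each correspondence contributes two linear equations, giving an 8×8 system. The function names are hypothetical; in practice, OpenCV's cv2.getPerspectiveTransform solves the same four-point problem.

```python
def solve_linear(a, b):
    """Solve the square system a x = b by Gaussian elimination with partial pivoting."""
    n = len(a)
    m = [[float(v) for v in row] + [float(rhs)] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("degenerate point configuration")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def homography_from_4(src, dst):
    """Homography H (with H[2][2] = 1) mapping four src points onto four dst points."""
    a, b = [], []
    for (x, y), (u, v) in zip(src, dst):
        # u = (h0 x + h1 y + h2) / (h6 x + h7 y + 1), rearranged to be linear in h0..h7
        a.append([x, y, 1, 0, 0, 0, -u * x, -u * y])
        b.append(u)
        # v = (h3 x + h4 y + h5) / (h6 x + h7 y + 1), likewise
        a.append([0, 0, 0, x, y, 1, -v * x, -v * y])
        b.append(v)
    h = solve_linear(a, b)
    return [h[0:3], h[3:6], h[6:8] + [1.0]]


def apply_homography(h, pt):
    x, y = pt
    w = h[2][0] * x + h[2][1] * y + h[2][2]
    return ((h[0][0] * x + h[0][1] * y + h[0][2]) / w,
            (h[1][0] * x + h[1][1] * y + h[1][2]) / w)


src = [(0, 0), (100, 0), (100, 50), (0, 50)]     # e.g. four virtual points P_i in the template
dst = [(10, 20), (120, 15), (115, 80), (5, 70)]  # hypothetical matched points in the input image
h = homography_from_4(src, dst)
```

With exactly four pairs the system is square, so no least-squares step is needed, which is the source of the time saving over feeding every matched pair to the estimator.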
In the example in Fig. 7, the points (Pi) are chosen at the corners of the zebra lines. Other locations can be chosen, as these are merely four virtual points defined relative to the other features.
This subsection summarizes all the aforementioned sub-procedures in a block diagram, shown in Fig. 8, which incorporates the earlier diagrams. This diagram is explained by the algorithms in the Algorithm section.
This section connects the algorithms to the preceding description. To explain Fig. 8, we subdivide the method into the following procedures, given in pseudocode:
Initialize the matrix C0 = I, the identity matrix.
Detect and match using SIFT or SURF and RANSAC. (Use the ROI to restrict the area of interest to the part within the frame.) The result of this step is the homography Ht, as shown in the earlier block diagram, which portrays the runway detection layout.
Algorithm 1
Require: UndisROI_Image, Warp_Template
Descriptors_Temp ⇐ Descrip (Warp_Template, Keypoints_Temp)
Descriptors_Image ⇐ Descrip (UndisROI_Image, Keypoints_Image)
(Descriptors_Template, Descriptors_Image) ⇐ Match (Descriptors_Temp, Descriptors_Image)
Ht ⇐ GetHomography (Descriptors_Template, Descriptors_Image)
Obtain the four frame points (Fi) from the template (Q).
Algorithm 2
Fi ⇐ GetFramePoint(Q)
Calculate the matrix At following Eq. (8), using the warped frame points F′i from Eq. (7), to prevent accumulated translational error; this is the translation cancel sub-function in Algorithm 3.
Algorithm 3
Require: Fi, Ht, Ct
FiCt ⇐ PerspectiveTransform (Fi, Ct)
HtCtFi ⇐ PerspectiveTransform (FiCt, Ht)
Calculate the virtual template points (P′i) using Eq. (9) and the block diagram in Fig. 6. The virtual points Pi are points of interest in the template image (Q); after warping, as explained in the template generation step, P′i is the corresponding real-world point calculated in the video stream.
Algorithm 4
Require: Ht, Ct, Pi, ROI
CtPi⇐PerspectiveTransform (Pi, Ct)
P′m⇐PerspectiveTransform (CtPi, Ht)
P′i⇐(P′m + ROI)
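Algorithm 4 can be sketched in pure Python as follows, assuming homographies are 3×3 nested lists and the ROI is represented by its top-left corner (x, y) in the full image, so that adding it converts ROI coordinates to full-image coordinates. The helper names mirror the listing but are hypothetical implementations. Because P′i is computed from H rather than detected directly, it is defined even when a point falls outside the image, which is the behavior exploited in the occlusion experiment.

```python
def perspective_transform(pts, h):
    """Apply a 3x3 homography (nested lists) to a list of (x, y) points."""
    out = []
    for x, y in pts:
        w = h[2][0] * x + h[2][1] * y + h[2][2]
        out.append(((h[0][0] * x + h[0][1] * y + h[0][2]) / w,
                    (h[1][0] * x + h[1][1] * y + h[1][2]) / w))
    return out


def virtual_points_in_image(p, c_t, h_t, roi_origin):
    """P'_i = H_t(C_t P_i) + ROI offset, following the steps of Algorithm 4."""
    ct_p = perspective_transform(p, c_t)    # template -> warped template
    p_m = perspective_transform(ct_p, h_t)  # warped template -> ROI image
    ox, oy = roi_origin
    return [(x + ox, y + oy) for x, y in p_m]  # ROI image -> full image


identity = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
scale_2 = [[2.0, 0.0, 0.0], [0.0, 2.0, 0.0], [0.0, 0.0, 1.0]]
p = [(10.0, 10.0), (20.0, 10.0), (20.0, 30.0), (10.0, 30.0)]  # hypothetical virtual points P_i
result = virtual_points_in_image(p, identity, scale_2, (5.0, 7.0))
# result == [(25.0, 27.0), (45.0, 27.0), (45.0, 67.0), (25.0, 67.0)]
```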
Calculate the matrix Ct, which determines the template angle for the next frame.
Algorithm 5
Require: Ct−1, At−1, Ht−1
Ct ⇐ Ct−1 * At−1 * Ht−1
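Algorithm 5 chains three 3×3 matrices, and because matrix products do not commute, the multiplication order fixes the order in which the warps act on a point. The sketch below assumes column-vector homogeneous coordinates (the listing does not state its convention); under that assumption, the product X·Y applies Y first and then X. The helper names are hypothetical.

```python
def matmul3(a, b):
    """3x3 matrix product a @ b (nested lists)."""
    return [[sum(a[i][k] * b[k][j] for k in range(3)) for j in range(3)] for i in range(3)]


def apply(h, pt):
    """Map a point (x, y) through h, treating it as the column vector (x, y, 1)."""
    x, y = pt
    w = h[2][0] * x + h[2][1] * y + h[2][2]
    return ((h[0][0] * x + h[0][1] * y + h[0][2]) / w,
            (h[1][0] * x + h[1][1] * y + h[1][2]) / w)


def update_warp(*mats):
    """Multiply the given matrices left to right, e.g. update_warp(C, A, H) = C @ A @ H."""
    result = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
    for m in mats:
        result = matmul3(result, m)
    return result


shift = [[1.0, 0.0, 10.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]  # translate x by 10
scale = [[2.0, 0.0, 0.0], [0.0, 2.0, 0.0], [0.0, 0.0, 1.0]]   # scale by 2
scale_then_shift = apply(update_warp(shift, scale), (1.0, 1.0))  # (12.0, 2.0)
shift_then_scale = apply(update_warp(scale, shift), (1.0, 1.0))  # (22.0, 2.0)
```

Under this convention, update_warp(C, A, H) reproduces the listing's product literally; whether it matches the intended order of application depends on how Eqs. (5) to (8) define Ct, At, and Ht.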
This experiment consists of four parts: (1) a comparison of computation time; (2) a test of the accuracy of detecting an airport with and without angle adjustment (warped template); (3) a test of robustness to noise, by adding Gaussian noise to the images; and (4) verification of the detection of an occluded runway.
In the first experiment, to compare computation time, we ran the program with and without ROI cropping. Figure 9 shows the result of the experiment.
The experimental results show that the algorithm matches the points of interest on the runway correctly. At the beginning, the computation time is the same with or without the ROI. After the runway has been located in consecutive frames, using the ROI greatly reduces the computation time (Fig. 10). We discussed this with military researchers who have worked with many types of hardware, and they believed the reduced time is suitable for real applications.
The second experiment compared the accuracy of runway detection between the original template (Q) and a more suitable warped template (Q′). After matching the image with SIFT or SURF and RANSAC, the matching results are shown in Fig. 11, which compares using ROI only with using ROI + warping.
In detail, Fig. 11 compares the accuracy of runway detection for human expertise, ROI only, and ROI + warping. P′i is the result of matching multiple images in a video, plotted for 15 frames. Comparing human expertise (o-red), ROI + non-warping (x-green), and ROI + warping (x-blue), the results show that ROI + non-warping (x-green) produces more error. In contrast, ROI + warping (x-blue) is almost as accurate as human expertise (o-red), with a deviation from the correct position of approximately ±6 pixels (Fig. 12).
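The ±6-pixel figure compares estimated points with human-annotated ones. A minimal sketch of such a comparison is shown below; the two metrics (largest per-axis deviation and mean Euclidean distance) and the function name are assumptions for illustration, as this section does not specify how the deviation was computed.

```python
import math


def deviation_stats(estimated, reference):
    """Compare estimated points with reference (e.g. human-annotated) points, in pixels.

    Returns (largest per-axis absolute deviation, mean Euclidean distance).
    """
    if not estimated or len(estimated) != len(reference):
        raise ValueError("need two non-empty point lists of equal length")
    per_axis = [max(abs(ex - rx), abs(ey - ry))
                for (ex, ey), (rx, ry) in zip(estimated, reference)]
    dists = [math.hypot(ex - rx, ey - ry)
             for (ex, ey), (rx, ry) in zip(estimated, reference)]
    return max(per_axis), sum(dists) / len(dists)


worst, mean_err = deviation_stats([(10, 10), (20, 25)], [(12, 9), (20, 20)])
# worst == 5; mean_err is (sqrt(5) + 5) / 2
```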
The third experiment considered the robustness with respect to image noise by adding mixed Gaussian noise into the input images, starting from σ = 0 and increasing until the system could not detect the airport.
The results indicated that the maximum noise level that could be handled was σ = 80, as shown in Fig. 13 for runway detection with Gaussian noise values σ of 0, 30, 60, and 80. The algorithm is thus robust to noise up to σ = 80; above that level, the runway was not detected.
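A minimal sketch of the noise injection, assuming additive zero-mean Gaussian noise with standard deviation σ on 8-bit grayscale intensities, rounded and clipped to [0, 255]. Whether this matches the paper's "mixed Gaussian noise" exactly is an assumption, and the function name is hypothetical.

```python
import random


def add_gaussian_noise(image, sigma, seed=None):
    """Add zero-mean Gaussian noise with standard deviation `sigma` to an 8-bit grayscale image.

    `image` is a list of rows of ints in [0, 255]; results are rounded and clipped to that range.
    """
    rng = random.Random(seed)  # a fixed seed makes each noise level reproducible
    return [[min(255, max(0, round(p + rng.gauss(0.0, sigma)))) for p in row]
            for row in image]


image = [[0, 128, 255], [10, 200, 90]]
noisy = add_gaussian_noise(image, 80, seed=0)  # highest level the method handled
```

Sweeping sigma upward (for example 0, 30, 60, 80, as in Fig. 13) with a fixed seed keeps runs comparable across detector settings.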
In the fourth experiment, we ran the algorithm on a runway with some parts occluded; in essence, we tested the method while the airplane was moving past the end of the runway. As Fig. 14 shows, when the points of interest P3 and P4 are not in the images, the system can still detect the runway, because enough features remain for SIFT or SURF and RANSAC to calculate the matrix H; the system can therefore still calculate P3 and P4 even though they are outside the images.
Even though we did not directly test earlier works that also relied on multiple planar markers, our ROI scheme and (template) occlusion avoidance scheme proved effective in terms of time efficiency and robustness, respectively (see also Fig. 14).
This paper presents a method to track a runway with four or more planar features in order to land a UAV automatically on a runway whose bird's-eye image is known, without having to measure the distances between all the markers beforehand. The system is superior to existing methods in that it relies comprehensively on multiple parameter feedback loops to enhance the similarity between prepared features (in template images) and real-time features. These feedback loops are what make the system robust.
Furthermore, using the ROI decreases feature detection area and time; ROI localization is also part of the feedback scheme. Finally, four virtual template points are selected and warped along with the template to reduce PnP computation time; this is better than selecting any four detected points, as the four virtual points represent all the detected features. The overall paradigm is comprehensive and fast enough to tackle real-world problems, for example, when GPS is jammed.
The method is comparable in complexity to many SLAM (Simultaneous Localization And Mapping) methods. However, in contrast to ground-vehicle SLAM, many interconnected blocks are used to tackle 3D localization and matching.
Conflict of Interest
There are no conflicts of interest. This article does not include research in which human participants were involved. Informed consent is not applicable. This article does not include any research in which animal participants were involved.
Data Availability Statement
The authors attest that all data for this study are included in the paper.