

Overview
This project focuses on colorizing historical black and white images by aligning three separate color channels (red, green, and blue) that were captured using different filters. The main challenge is that these channels are often misaligned due to camera movement between exposures, requiring precise alignment algorithms to reconstruct the original color image. I implemented a multi-scale pyramid alignment algorithm using normalized cross-correlation (NCC) as the similarity metric.
Approach
The approach has four parts:
- Image Preprocessing: Crop the border (10% from each edge) to eliminate artifacts that could interfere with alignment
- Multi-scale Alignment: Build image pyramids to handle large displacements efficiently
- Normalized Cross-Correlation: Use NCC to find the optimal alignment offsets between color channels
- Edge-based Alignment: Apply Sobel edge detection, then run NCC on the edge maps to handle the Emir case
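The preprocessing step can be sketched roughly as follows. This is a minimal illustration, not the project's actual code: it assumes the scan stacks the blue, green, and red exposures vertically in that order, and the names `split_plate` and `crop_border` are hypothetical.

```python
import numpy as np

def split_plate(plate):
    """Split a vertically stacked scan into three equal-height parts (b, g, r).

    Assumption: the exposures are stacked top to bottom as blue, green, red.
    """
    h = plate.shape[0] // 3  # height of one channel; leftover rows are dropped
    return plate[:h], plate[h:2 * h], plate[2 * h:3 * h]

def crop_border(channel, frac=0.10):
    """Remove `frac` of the height and width from each edge of a 2-D channel."""
    h, w = channel.shape
    dy, dx = int(h * frac), int(w * frac)
    return channel[dy:h - dy, dx:w - dx]

# Toy "plate": 30 rows x 10 columns, so each channel is 10 x 10.
plate = np.arange(30 * 10, dtype=float).reshape(30, 10)
b, g, r = split_plate(plate)
print(b.shape, crop_border(b).shape)  # (10, 10) (8, 8)
```

The crop is applied only for scoring the alignment, so the final color image can still be assembled from the uncropped channels.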
Part 1: Start with the simple example
I start with the two smaller examples, monastry.jpg and cathedral.jpg. After splitting each image into three separate parts (b, g, r), each part has a size of around (341, 390). Although the assignment suggests simply aligning the three channels to each other, a naive alignment leads to this:


The naive implementation is not good enough: the building itself is not aligned. Why does this happen? I align the channels with normalized cross-correlation, a measure of the similarity between two images: the higher the score, the more similar the two images are. To find the cause, I displayed each channel separately. Every channel has borders along its edges, and these borders contribute to a high normalized cross-correlation score even when the image content itself is misaligned.
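To make the metric concrete, here is a minimal sketch of NCC and a brute-force search over shifts. The function names, the `(dy, dx)` shift convention, and the use of `np.roll` (which wraps pixels around the image edge) are assumptions made for illustration; the default radius of 15 is only inferred from the shift bound visible in the results table.

```python
import numpy as np

def ncc(a, b):
    """Normalized cross-correlation of two equally sized images (higher = more similar)."""
    a = a - a.mean()
    b = b - b.mean()
    denom = np.linalg.norm(a) * np.linalg.norm(b)
    return float((a * b).sum() / denom) if denom > 0 else 0.0

def align_exhaustive(moving, reference, radius=15, center=(0, 0)):
    """Try every (dy, dx) within `radius` of `center` and keep the best NCC score."""
    best_score, best_shift = -np.inf, center
    for dy in range(center[0] - radius, center[0] + radius + 1):
        for dx in range(center[1] - radius, center[1] + radius + 1):
            # np.roll shifts circularly: rows move down by dy, columns right by dx.
            shifted = np.roll(moving, (dy, dx), axis=(0, 1))
            score = ncc(shifted, reference)
            if score > best_score:
                best_score, best_shift = score, (dy, dx)
    return best_shift

# Synthetic check: displace a random image, then recover the shift that undoes it.
rng = np.random.default_rng(0)
ref = rng.random((40, 40))
moving = np.roll(ref, (-3, 2), axis=(0, 1))
print(align_exhaustive(moving, ref, radius=5))  # (3, -2)
```

Because every pixel contributes to the score, strong structures such as scan borders can dominate it, which is consistent with the misalignment described above.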



Therefore, I crop the image borders (10% of the width and height from each side) before scoring, which gives the following result:


Part 2: Play with Image Pyramid
To cope with the larger files, I implement an image pyramid that repeatedly downsamples the image. This is needed because the cost of the simple exhaustive search grows with both the image size and the size of the search window.
With the downsampled image, we can first run a coarse alignment at low resolution, scale the result back up to the original size, and then run a finer alignment with a smaller search range. This avoids the expensive search over the full-resolution image.
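The coarse-to-fine idea can be sketched as below. This is an illustrative sketch under stated assumptions, not the project's implementation: downsampling is done by 2×2 block averaging, the coarse shift is doubled at each finer level instead of resampling the image, and the names and parameters (`min_size`, `coarse_radius`, `fine_radius`) are hypothetical.

```python
import numpy as np

def ncc(a, b):
    """Normalized cross-correlation of two equally sized images."""
    a = a - a.mean()
    b = b - b.mean()
    denom = np.linalg.norm(a) * np.linalg.norm(b)
    return float((a * b).sum() / denom) if denom > 0 else 0.0

def search(moving, reference, center, radius):
    """Exhaustive NCC search over shifts within `radius` of `center`."""
    best_score, best_shift = -np.inf, center
    for dy in range(center[0] - radius, center[0] + radius + 1):
        for dx in range(center[1] - radius, center[1] + radius + 1):
            score = ncc(np.roll(moving, (dy, dx), axis=(0, 1)), reference)
            if score > best_score:
                best_score, best_shift = score, (dy, dx)
    return best_shift

def downsample(img):
    """Halve the resolution by averaging 2x2 blocks (a simple stand-in for blur + subsample)."""
    h, w = img.shape[0] // 2 * 2, img.shape[1] // 2 * 2
    return img[:h, :w].reshape(h // 2, 2, w // 2, 2).mean(axis=(1, 3))

def align_pyramid(moving, reference, min_size=32, coarse_radius=4, fine_radius=2):
    """Align at the coarsest level first, then refine around twice that shift at each finer level."""
    if min(moving.shape) <= min_size:
        return search(moving, reference, (0, 0), coarse_radius)
    dy, dx = align_pyramid(downsample(moving), downsample(reference),
                           min_size, coarse_radius, fine_radius)
    return search(moving, reference, (2 * dy, 2 * dx), fine_radius)

# Synthetic check. The displacement is a multiple of 8, so it is represented
# exactly at every level of this three-step toy pyramid (256 -> 128 -> 64 -> 32).
rng = np.random.default_rng(1)
ref = rng.random((256, 256))
moving = np.roll(ref, (-24, 16), axis=(0, 1))
print(align_pyramid(moving, ref))  # (24, -16)
```

In this toy run, each level evaluates at most 9 × 9 = 81 candidate shifts, yet the recovered shift of 24 rows lies well outside a single ±15 window at full resolution.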
Image | Shape | Regular Time (s) | Pyramid Time (s) | Speedup | Green Shift (Reg → Pyr) | Red Shift (Reg → Pyr) |
---|---|---|---|---|---|---|
emir.tif | 2570×2963 | 126.24 | 14.13 | 8.94× | (15, 15) → (49, 24) | (15, -4) → (26, -829) |
italil.tif | 2586×2978 | 133.96 | 10.31 | 12.99× | (15, 15) → (38, 21) | (15, 15) → (76, 35) |
church.tif | 2563×2909 | 131.11 | 21.51 | 6.09× | (15, 4) → (25, 4) | (15, -13) → (58, -4) |
three_generations.tif | 2570×2973 | 114.83 | 8.26 | 13.90× | (15, 12) → (53, 14) | (15, 8) → (112, 11) |
lugano.tif | 2597×3026 | 110.24 | 9.02 | 12.23× | (15, -15) → (41, -16) | (15, -15) → (92, -29) |
melons.tif | 2594×3017 | 103.43 | 7.81 | 13.25× | (15, 3) → (81, 10) | (15, 10) → (178, 13) |
lastochikino.tif | 2594×2961 | 99.84 | 7.50 | 13.30× | (-2, -2) → (-2, -2) | (15, 0) → (75, -8) |
icon.tif | 2597×2994 | 96.98 | 7.65 | 12.68× | (15, 15) → (41, 17) | (0, 15) → (89, 23) |
siren.tif | 2601×3056 | 121.68 | 9.88 | 12.32× | (15, -7) → (49, -6) | (15, -15) → (95, -25) |
self_portrait.tif | 2602×3049 | 127.43 | 10.11 | 12.60× | (15, 15) → (78, 29) | (15, 15) → (176, 37) |
harvesters.tif | 2577×2948 | 118.20 | 10.04 | 11.77× | (15, 15) → (59, 16) | (15, 6) → (124, 13) |
With the naive implementation, most of the images hit the bound of the search range (a shift of 15 in the table above), so the true offset is never reached. With the image pyramid, the channels can be matched over a much larger effective search range, which leads to a much better result. The pyramid results are shown here:











Interestingly, one of the images produced by the pyramid still has a relatively large shift error: emir.tif. As mentioned in the project description:
"Note that in the case like the Emir of Bukhara (show on right), the images to be matched do not actually have the same brightness values (they are different color channels), so you might have to use a cleverer metric, or different features than the raw pixels."
Therefore, I implement a separate metric for this image. I use the Sobel kernel to extract the edges of each channel and then run normalized cross-correlation on the edge maps instead of the raw pixels. Here you can see that the edges are much better aligned.
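A small sketch of why edge maps help when brightness differs between channels. The Sobel filter is written with plain NumPy slicing to keep the example self-contained; the function names and the synthetic test images are assumptions for illustration only.

```python
import numpy as np

def ncc(a, b):
    """Normalized cross-correlation of two equally sized images."""
    a = a - a.mean()
    b = b - b.mean()
    denom = np.linalg.norm(a) * np.linalg.norm(b)
    return float((a * b).sum() / denom) if denom > 0 else 0.0

def sobel_magnitude(img):
    """Gradient magnitude from the 3x3 Sobel kernels, using edge-replicated padding."""
    p = np.pad(img.astype(float), 1, mode="edge")
    # Horizontal gradient: right column minus left column, rows weighted 1-2-1.
    gx = (p[:-2, 2:] + 2 * p[1:-1, 2:] + p[2:, 2:]) \
       - (p[:-2, :-2] + 2 * p[1:-1, :-2] + p[2:, :-2])
    # Vertical gradient: bottom row minus top row, columns weighted 1-2-1.
    gy = (p[2:, :-2] + 2 * p[2:, 1:-1] + p[2:, 2:]) \
       - (p[:-2, :-2] + 2 * p[:-2, 1:-1] + p[:-2, 2:])
    return np.hypot(gx, gy)

# Two synthetic "channels" of the same scene with inverted, rescaled brightness.
a = np.zeros((20, 20))
a[5:15, 5:15] = 1.0          # bright square on a dark background
b = 1.0 - 0.5 * a            # same square, but dark on a bright background

raw_score = ncc(a, b)                                     # close to -1
edge_score = ncc(sobel_magnitude(a), sobel_magnitude(b))  # close to +1
print(raw_score, edge_score)
```

On raw pixels the two channels look anti-correlated even though they are perfectly aligned, while their edge maps agree, so an alignment search scored on edge maps is less sensitive to per-channel brightness differences.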




Part 3: Self-selected Example
I selected two more examples from the gallery, which I call the house and the lake. Here are the results:

