Monday, January 9, 2017

How I back up my family photos and videos

I have accumulated terabytes of family photos and videos and I will continue to add more. Storage is cheap and getting cheaper. However, it turns out that managing terabytes of data isn't trivial. At least one of my hard disks has failed each year.

How do I store the data? How do I back it up? How do I minimize the chance of loss at a low cost?

Well, here's my latest "simple" solution.

Hardware:

  • One external Thunderbolt case for four hard disks, containing:
    • One hard disk for all my data (DATA)
    • One hard disk for cloning (DATA-CLONE)
    • One hard disk for Time Machine (TM)
    • One hard disk for Crashplan (CP)
  • One external hard disk double dock.
  • Several bare hard disks for offline clones of DATA. (All named DATA-OFFLINE)
  • A few bare hard disks for offline clones of CP. (All named CP-OFFLINE)
Automated processes:
  • Time Machine backup of DATA to TM.
  • Crashplan backup of DATA to CP.
  • Four times a day clone of DATA to DATA-CLONE using Carbon Copy Cloner (rsync wrapper).
  • Clone DATA to DATA-OFFLINE if DATA-OFFLINE is plugged in.
  • Clone CP to CP-OFFLINE if CP-OFFLINE is plugged in.
Manual processes:
  • Regularly, I'll plug a DATA-OFFLINE disk or a CP-OFFLINE disk into the hard drive dock.
  • Occasionally, I'll move a DATA-OFFLINE or CP-OFFLINE disk to an offsite location.
  • Occasionally, I'll checksum all the files.

That's it.
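
The occasional checksum pass can be scripted. Below is a minimal Go sketch of one way to do it, not a description of any particular tool: it hashes every regular file under two directories with SHA-256 and lists the paths that are missing or differ. The names hashFile, hashTree and diffSums, and the two-directory command line, are made up for illustration.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
	"io"
	"io/fs"
	"os"
	"path/filepath"
	"sort"
)

// hashFile returns the hex-encoded SHA-256 digest of one file.
func hashFile(path string) (string, error) {
	f, err := os.Open(path)
	if err != nil {
		return "", err
	}
	defer f.Close()
	h := sha256.New()
	if _, err := io.Copy(h, f); err != nil {
		return "", err
	}
	return hex.EncodeToString(h.Sum(nil)), nil
}

// hashTree walks root and maps each regular file's relative path to its digest.
func hashTree(root string) (map[string]string, error) {
	sums := make(map[string]string)
	err := filepath.WalkDir(root, func(path string, d fs.DirEntry, err error) error {
		if err != nil {
			return err
		}
		if !d.Type().IsRegular() {
			return nil // skip directories, symlinks and other special files
		}
		rel, err := filepath.Rel(root, path)
		if err != nil {
			return err
		}
		sum, err := hashFile(path)
		if err != nil {
			return err
		}
		sums[rel] = sum
		return nil
	})
	return sums, err
}

// diffSums returns the sorted relative paths that exist on only one side
// or whose digests differ.
func diffSums(a, b map[string]string) []string {
	var out []string
	for p, sum := range a {
		if other, ok := b[p]; !ok || other != sum {
			out = append(out, p)
		}
	}
	for p := range b {
		if _, ok := a[p]; !ok {
			out = append(out, p)
		}
	}
	sort.Strings(out)
	return out
}

func main() {
	if len(os.Args) != 3 {
		fmt.Println("usage: checksum DIR_A DIR_B")
		return
	}
	a, err := hashTree(os.Args[1])
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	b, err := hashTree(os.Args[2])
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	for _, p := range diffSums(a, b) {
		fmt.Println("MISMATCH", p)
	}
}
```

A mismatch does not say which side is wrong. It may simply mean the data disk changed after the last clone, so the list is a starting point for a closer look.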

This solution satisfies my requirements:

  • Not cloud based.
  • Survive the simultaneous loss of two or more hard disks.
  • Fast recovery.
  • Cause no further loss during recovery.
  • Survive single site loss.
  • Survive accidental or malicious data corruption or loss.
  • Survive software bugs in a backup application.
  • Manageable without a Computer Science degree.

Here is some stuff I've learnt along the way:

  • It takes more than 10 hours to copy 2TB of data using USB3 or Thunderbolt. Good luck if you are still stuck with USB2.
  • Get a much larger hard disk than you need. If you have 2TB of data, get 4TB hard disks.
  • rsync is your friend. It can incrementally clone your data. It is a time-tested workhorse. CCC is just a nice wrapper around rsync.
  • Time Machine provides versioning. But it uses hard links and doesn't efficiently handle renaming of files or folders.
  • Don't try to clone a Time Machine disk. Time Machine now supports multiple backup disks. Do that instead.
  • Crashplan also provides versioning. However, it is harder to recover data from Crashplan.
  • If you have an up-to-date clone, recovery is easy. Just rename the clone as the new data disk. Then update another clone. Incremental updates should be fast.
  • If you swap in a clone for the data disk, you have to tell Time Machine to ignore the GUID mismatch. Otherwise, Time Machine will copy everything again.
  • Do not store your data on your internal hard disk or SSD.
  • Do not use RAID.
    • RAID is for availability, not backup.
    • Using RAID correctly is hard.
    • Reconstructing a RAID array can trigger further failures.
    • RAID reconstruction can fail due to latent disk errors.
    • RAID reconstruction is slow.
    • The hardware RAID card/device is a single point of failure.
    • Hardware RAID devices may decide to break your RAID setup for unknown reasons.
    • You may not be able to buy a replacement RAID card/device.
    • You need RAID compatible hard disks.
    • Software RAID can also break due to temporary slow drive problems or cable problems.
  • Do not use network attached storage.
    • Lower priced NAS have slow processors and limited memory.
    • Gigabit ethernet is limited to about 100MB/s. A modern hard disk easily achieves 170MB/s.
    • Most NAS use proprietary RAID implementations. If the NAS box fails, good luck.
    • If not using RAID, why use a NAS?
    • It is yet another firmware to update.
    • My NAS developed a memory problem and randomly corrupted my pictures. There were no hardware checks.
  • Do not use USB2/3 external hard disks.
    • USB cables become loose or unplugged easily.
    • Each external hard disk also needs a power supply.
    • USB hubs do not work with external hard disks.
    • Many USB external hard disks do not have cooling fans. They run very hot and they die. The same models run well in a proper case with fans.
    • You can't read SMART status and hard disk temperature through the USB interface.
    • Both Seagate and WD use USB-to-SATA adapters that "change" the geometry of the hard disk. If you remove the hard disk from the case and use it with a dock or plain SATA cable, you won't be able to read the data. The partition table is probably shifted.
  • Do not use external dual hard disks like WD My Book Duo. I've run into firmware bugs in the RAID adapter that caused total data loss.
  • Use bare 3.5" hard disks. You get to select the brand and model of the hard disks.
  • Use a multi-disk external casing with proper cooling.
  • Use Thunderbolt. You can read the SMART status and temperature.
  • Switch to SSD when prices come down.
  • 2.5" disks are slower but run cooler and more quietly.
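
The transfer figures in the list above are simple arithmetic. Here is a small Go sketch, assuming decimal units (1TB = 1,000,000MB) and a constant sustained rate, which real copies rarely manage. The function names copyTime and rateFor are made up for illustration. The 100MB/s and 170MB/s rates are the ones quoted above.

```go
package main

import (
	"fmt"
	"time"
)

// copyTime estimates how long it takes to copy sizeTB terabytes at a
// sustained rate of mbPerSec megabytes per second.
func copyTime(sizeTB, mbPerSec float64) time.Duration {
	seconds := sizeTB * 1e6 / mbPerSec
	return time.Duration(seconds * float64(time.Second))
}

// rateFor returns the sustained rate in MB/s implied by copying sizeTB
// terabytes in the given number of hours.
func rateFor(sizeTB, hours float64) float64 {
	return sizeTB * 1e6 / (hours * 3600)
}

func main() {
	fmt.Println("2TB at 100MB/s (gigabit ethernet):", copyTime(2, 100).Round(time.Minute))
	fmt.Println("2TB at 170MB/s (modern hard disk):", copyTime(2, 170).Round(time.Minute))
	fmt.Printf("2TB in 10 hours implies about %.0fMB/s sustained\n", rateFor(2, 10))
}
```

So a 10 hour copy of 2TB corresponds to an average well below what the disk or the link can do on paper, which is what lots of small files and incremental scans tend to produce.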

I have intentionally omitted detailed recovery processes. I hope this helps somebody. Your mileage may vary.

Friday, November 6, 2015

Hyperfocal distance with cropped image cheatsheet

TLDR: Hyperfocal distance is proportional to enlargement.

What happens to the hyperfocal distance if you intend to print a crop at the same size for the same viewing distance?


Back to basic assumptions:
  • Visual acuity of 5lp/mm at 25cm viewing distance. This is equivalent to a final print circle of confusion of 0.2mm.
  • Print size is 8x12. Enlargement = 12"/36mm = 8.47. Circle of confusion = 0.2mm/8.47 = 0.0236mm.
  • If you crop and print at the same size, you increase enlargement and decrease the circle of confusion.

Hyperfocal distance is inversely proportional to circle of confusion and is proportional to enlargement.

Example:

Full frame camera with a fixed 35mm lens
f8.0 aperture
8x12 print size

Hyperfocal distance = 35 * 35 / 8 / 0.0236 = 6.5m.
With 1.5x crop (simulated 52mm), hyperfocal distance = 9.7m.
With 2x crop (simulated 70mm), hyperfocal distance = 13m.
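
The same calculation as a minimal Go sketch, using only the assumptions listed above (0.2mm circle of confusion on the print, a 12 inch long edge, a 36mm sensor long edge) and the simplified formula H = f * f / N / c. The function name hyperfocalM and its crop parameter are made up for illustration.

```go
package main

import "fmt"

const (
	printCoCmm   = 0.2   // circle of confusion on the print (5lp/mm at 25cm)
	printLongMM  = 304.8 // long edge of an 8x12 print: 12 inches
	sensorLongMM = 36.0  // long edge of a full frame sensor
)

// hyperfocalM returns the hyperfocal distance in metres for a lens of
// focalMM at aperture fNumber when the image is cropped by the factor crop
// (1 means no crop) and still printed at 8x12.
func hyperfocalM(focalMM, fNumber, crop float64) float64 {
	// Cropping uses a smaller part of the sensor, so the enlargement grows
	// and the allowed circle of confusion on the sensor shrinks.
	enlargement := printLongMM / (sensorLongMM / crop)
	coc := printCoCmm / enlargement
	return focalMM * focalMM / fNumber / coc / 1000
}

func main() {
	for _, crop := range []float64{1, 1.5, 2} {
		fmt.Printf("35mm at f/8, %.1fx crop: %.1fm\n", crop, hyperfocalM(35, 8, crop))
	}
}
```

Because the crop factor only enters through the enlargement, doubling the crop doubles the hyperfocal distance, which is the TLDR.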

Caveat:

Viewing distance is traditionally supposed to depend on the focal length (to get the same perspective), but these days, everything is viewed at phone distance :)

Thursday, November 5, 2015

Fix Whatsapp picture capture time

Whatsapp strips all EXIF info from any pictures sent through it. Without any EXIF info, Lightroom uses the file modification time as the capture time. If, for some reason, the file modification time is wrong, Lightroom gets confused. Luckily, Whatsapp puts the date in the filename. But Lightroom doesn't know about it.

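So, here's a little Go program that changes the file modification time of files to the date in the filenames. Since the exact time is unknown, it is set to 03:03:03 local time. By default it only prints what it would change. Run it with -n=false to actually modify the files.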

Public domain.

UPDATE 2017/01/10: In addition to updating the file timestamp, also runs "exiftool" to update the EXIF dates in the jpg file. Overwrites the original jpeg file.

Use at your own risk!

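package main

import (
        "flag"
        "fmt"
        "os"
        "os/exec"
        "path/filepath"
        "regexp"
        "strconv"
        "time"
)

// trial defaults to true, so files are only modified when run with -n=false.
var trial = flag.Bool("n", true, "trial run: print the changes without modifying files")

func main() {
        flag.Parse()
        for _, f := range flag.Args() {
                fix(f)
        }
}

// re matches Whatsapp file names such as IMG-20150101-WA0001.jpg and
// captures the year, month and day.
var re = regexp.MustCompile(`^([^-]*-)([0-9][0-9][0-9][0-9])([0-9][0-9])([0-9][0-9])(-WA.*)$`)

// fix updates f if its base name carries a Whatsapp date. Other files are skipped.
func fix(f string) {
        // Match on the base name so that a "-" in a directory name does not
        // prevent the match.
        match := re.FindStringSubmatch(filepath.Base(f))
        if len(match) > 0 {
                // The regexp guarantees digits, so Atoi cannot fail here.
                y, _ := strconv.Atoi(match[2])
                m, _ := strconv.Atoi(match[3])
                d, _ := strconv.Atoi(match[4])
                update(f, y, m, d)
        }
}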

func update(f string, y int, m int, d int) {
        s, err := os.Stat(f)
        if err != nil {
                fmt.Printf("Unable to stat file %v\n", f)
                return
        }
        oldy, oldm, oldd := s.ModTime().Date()
        oldhh, oldmm, oldss := s.ModTime().Clock()
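        // The exact time is unknown, so use 03:03:03 local time. A file that
        // already has this time on the right date is left alone.
        hh, mm, ss := 3, 3, 3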
        if y != oldy || time.Month(m) != oldm || d != oldd ||
                oldhh != hh || oldmm != mm || oldss != ss {
                t := time.Date(y, time.Month(m), d, hh, mm, ss, 0, time.Local)
                fmt.Printf("%v: %v -> %v\n", f, s.ModTime(), t)
                if !*trial {
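                        // Set the modification time first so that exiftool can
                        // copy FileModifyDate into the EXIF dates.
                        err := os.Chtimes(f, time.Now(), t)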
                        if err != nil {
                                fmt.Printf("Failed to change time on %v\n", f)
                                return
                        }
                        err = exec.Command("exiftool", "-overwrite_original_in_place", "-FileModifyDate>AllDates", f).Run()
                        if err != nil {
                                fmt.Printf("Failed to update exif of %v\n", f)
                                return
                        }
                        // exiftool rewrote the file, so set the modification time again.
                        err = os.Chtimes(f, time.Now(), t)
                        if err != nil {
                                fmt.Printf("Failed to change time on %v\n", f)
                                return
                        }
                }
        }
}

Wednesday, September 12, 2012

Hyperfocal Distance Cheat


This post was from before the big reset. I keep coming back to this post to recompute the magic number for various lenses. I hope somebody finds this useful.

The magic number for the Canon 5D Mk II with the 40mm 2.8 STM is 53, which is twice that of the X100 with its 23mm lens.

------

Once upon a time, I stumbled upon this concept called hyperfocal distance. For each given focal length, aperture and size of the circle of confusion, you can calculate a special number called the hyperfocal distance. When you focus the lens at the hyperfocal distance, everything from half the distance to infinity will be acceptably sharp. This maximizes the depth of field. This is useful for landscapes or groups of people.

There is only one problem. I have to memorize a table of values for each focal length and aperture. Yucks! Sure, I can carry a little pre-calculated table of values. Better but still yucks! Or use a phone app that will calculate it on the fly. That's too slow. Still yucks.

The X100 only has one focal length, 23mm. That's easy. I only need to memorize one chart:
f/2       13.2m
f/2.8     9.37m
f/4       6.64m
f/5.6     4.7m
f/8       3.33m
(computed online using DOFmaster)

Easier, but I'm no Tiger mom trained memorizing machine.

Oh, wait. If I multiply the aperture number by the hyperfocal distance, I get a number slightly above 26. All I need to remember is the number 26! That I can do!

It can't be a coincidence. So, I plugged in numbers into DOFmaster for 5D Mk II at 50mm. The product is about 83. That's when I finally went for the formula. Yes, I should have done it years ago. Stupid me. Wikipedia says:
H = f * f / N / c
(ignoring the irrelevant +f)
where
H is the hyperfocal distance
f is the focal length
N is the aperture number
c is the size of the circle of confusion
Using 0.02mm for circle of confusion, 23mm for focal length, I reproduced the table for X100. That checked out. Since f and c are constant, the formula reduces to:
H = C / N
where
C = f * f / c
For the X100, C = 23mm * 23mm / 0.02mm = 26450mm = 26.45m!

So, to find the hyperfocal distance for any aperture, just divide 26.45m by the aperture number! Piece of cake!

Also notice that if you double the focal length, the hyperfocal distance goes up by a factor of 4.

For 5D Mk II with a 24mm wide angle lens, the magic number is 19.2m. For 50mm normal, you won't be too far off if you guessed it is about 80m (50 * 50 / 0.03 = 83.33m).
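
Here is the magic number trick as a minimal Go sketch, using the simplified formula above (without the +f term) and the circles of confusion used in this post: 0.02mm for the X100 and 0.03mm for the 5D Mk II. The function names magicNumberM and hyperfocalM are made up for illustration.

```go
package main

import "fmt"

// magicNumberM returns C = f * f / c in metres for a focal length and a
// circle of confusion given in millimetres.
func magicNumberM(focalMM, cocMM float64) float64 {
	return focalMM * focalMM / cocMM / 1000
}

// hyperfocalM returns H = C / N in metres.
func hyperfocalM(magicM, fNumber float64) float64 {
	return magicM / fNumber
}

func main() {
	fmt.Printf("X100 23mm:         C = %.2fm\n", magicNumberM(23, 0.02))
	fmt.Printf("5D Mk II 24mm:     C = %.2fm\n", magicNumberM(24, 0.03))
	fmt.Printf("5D Mk II 50mm:     C = %.2fm\n", magicNumberM(50, 0.03))

	// Hyperfocal distance and near limit (half of it) for the X100.
	c := magicNumberM(23, 0.02)
	for _, n := range []float64{2, 2.8, 4, 5.6, 8} {
		h := hyperfocalM(c, n)
		fmt.Printf("f/%-4v focus at %5.2fm, sharp from %5.2fm to infinity\n", n, h, h/2)
	}
}
```

The distances come out close to the DOFmaster chart but not identical, since this sketch drops the +f term and divides by the nominal aperture numbers.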

Interestingly, if I'm shooting at f/5.6 on the X100, everything from 2.3m will be in focus if I focus at 4.7m. I can just pre-focus at 4.7m, switch to manual focus to lock it and forget about focusing altogether! This will also be great for videos. The X100's movie mode does not track faces and it tends to shift the focus unnecessarily especially if the background has some high-contrast items like lights.

Hello world!

Hello, world
Hello, 世界