Link Wednesday (9.20.17)

DIY Git Repository Server
I’ve used local git repositories to keep track of all my various Python projects, but until recently I did not use any remotes. Lately, however, I’ve realized the advantage of keeping more separation between my local/experimental branches and master. It really only took me 15 minutes or so to set up a git server on one of my Pis. In doing so I looked at several tutorials, but this was by FAR the best and most comprehensive.
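For the impatient: the heart of it is nothing more than a bare repository on the Pi that you reach over SSH. Something like the following sketch (the hostname and paths are placeholders; the tutorial covers the real details like a dedicated git user and permissions):

# On the Pi: create a bare repository to act as the remote
ssh pi@mypi.local
mkdir -p /home/pi/git/myproject.git
cd /home/pi/git/myproject.git
git init --bare

# Back on the workstation: point an existing local repository at it
git remote add origin pi@mypi.local:/home/pi/git/myproject.git
git push -u origin master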

Crontab Guru
Crontab isn’t hard to use. Or at least it shouldn’t be. But I find that every time I set up a new task I am wracked with doubts about the exact time it will be triggered. Why? I don’t know… the syntax is clear and I’ve used it for years. To make me feel better, however, some kind person made crontab.guru, a website which will explain (in entirely unambiguous terms) what a particular crontab string (e.g. */5 * * * *) will do. It may be a silly safety blanket, but it has made me far happier while crontabbing.
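For example, here is roughly what crontab.guru will tell you about that string (the script path below is just a placeholder):

# minute  hour  day-of-month  month  day-of-week   command
# */5     *     *             *      *             = "every 5th minute"
*/5 * * * * /home/user/bin/some_task.sh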


Link Wednesday (9.13.17)

I’ve been keeping an archive of useful how-to articles and have decided to share them as a more-or-less weekly feature… not to mention as an easy way of preserving them for my own future use.

Americanize Raspberry Pi
http://rohankapoor.com/2012/04/americanizing-the-raspberry-pi/

I don’t know why it took me literally years of working with Raspbian before I realized that the default configuration had several quirks from across the pond. Running through the steps in this article fully “Americanizes” your Pi, i.e. it sets the locales and download locations appropriately. Now it is part of my basic New Pi Procedure.
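The article is the full walkthrough, but as a rough sketch, the locale / keyboard / timezone portion boils down to a few standard Debian reconfigure commands (pick the en_US.UTF-8 and US options when prompted); the download-location piece is a separate edit to /etc/apt/sources.list.

sudo dpkg-reconfigure locales
sudo dpkg-reconfigure keyboard-configuration
sudo dpkg-reconfigure tzdata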

Change Network Interface Name
http://ask.xmodulo.com/change-network-interface-names-permanently-linux.html

Has anyone else noticed that new systems tend to give their network interfaces weird names (like eno63214A… that sort of thing)? If anyone knows why, I’d love to find out, but until then I will just vaguely blame it on IPv6. Not to resist the future, but the new style of name makes using wildcards with system monitors like Grafana highly annoying. Plus, if eth0 was good enough for my father and a generation of other Linux nerds, it’s good enough for me, dag nabbit! Unfortunately, I could not figure out how to change the interface names on my own. Thankfully Google came to the rescue!
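For reference, the two usual fixes (the linked article explains them properly, and the MAC address below is obviously a placeholder) are a udev rule that pins the name to the hardware address, or disabling the new naming scheme entirely with a kernel parameter:

# Option 1: /etc/udev/rules.d/70-persistent-net.rules
SUBSYSTEM=="net", ACTION=="add", ATTR{address}=="aa:bb:cc:dd:ee:ff", NAME="eth0"

# Option 2: add net.ifnames=0 to the kernel command line
# (e.g. GRUB_CMDLINE_LINUX in /etc/default/grub, then update-grub and reboot)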

Raspberry Pi Undervolt Warning
https://www.raspberrypi.org/forums/viewtopic.php?f=29&t=82373&start=75#p739517

After a long search I discovered that there is really no way to read a Raspberry Pi’s current input voltage or power usage in software without adding extra hardware. However, there is a tricky way to make sure that your Pi isn’t “browning out” by monitoring the LED that indicates a low-power state. It’s a super clever hack and one that I plan on integrating with my larger monitoring system at some point. (Note: this may only work on the RPi 1 & 2.)
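The forum hack itself works by watching that LED, so I won’t butcher it here, but as a related software-side check: newer firmware exposes under-voltage flags through vcgencmd. I believe bit 0 means “under-voltage right now” and bit 16 “under-voltage has occurred since boot”, but treat those bit positions as an assumption and double-check before relying on them.

# Report the firmware's throttling / under-voltage flags
vcgencmd get_throttled
# e.g. throttled=0x50005

# Quick-and-dirty check of the "under-voltage now" bit
if (( $(vcgencmd get_throttled | cut -d= -f2) & 0x1 )); then
  echo "Pi is currently under-voltage!"
fi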

Incremental Backup Script Explained

In an earlier post I laid out why I wanted yet another layer of backups for my important documents, and I provided the script without much explanation. As an exercise for myself, I decided to post a step-by-step explanation of why I wrote the script the way I did. This will be educational for me, and hopefully interesting for you.

Alright, let’s start at the beginning.

#!/bin/bash
# Incremental backup: archive files changed since the last successful run

set -ue


I am highly influenced by David Pashley’s superb article “Writing Robust Bash Shell Scripts.” Pashley is not only a great programmer, he is also a clear writer who explains his reasoning in a way that I find appealing.

Since I want this script to be robust, adding “set -ue” near the top is a must. The “u” makes the script exit if it ever references an unset variable. This is critical, as a simple typo in a variable name can easily turn a script into a vicious beast that will consume your data. The “e” makes the script exit as soon as any command fails, so one broken step can’t silently cascade into the next. Unless there is a good reason not to, always “set -ue”.

For reference, here is the explainshell entry for the command.
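To see why the “u” in particular matters, here is a contrived (and deliberately broken) example: with set -u the script dies at the typo; without it, the variable silently expands to nothing and the rm runs against the root of the filesystem.

#!/bin/bash
set -ue
backupdir="/tmp/old-backups"
rm -rf "$backupdri"/*   # typo! with -u this is a fatal "unbound variable" error;
                        # without -u it effectively becomes rm -rf /*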

Here is the first function:

function build_vars
{
# Current Date / Time
epoch=$(date +%s)
month=$(date +%m)
year=$(date +%Y)
day=$(date +%d)

# Building Environment: Directories
configdir="$config_dir"
yeardir="$backup_dir"/"$year"
monthdir="$yeardir"/"$month"
daydir="$monthdir"/"$day"
backupdir="$daydir"

# Building Environment: File Names
backupfile="$backupdir"/"$prefix"."$epoch".tar.bz2
logfile="$configdir"/"$prefix".log
lastepochfile="$configdir"/"$prefix".last
}


Basically this function sets up all of the variables which will be used in the rest of the script. Putting them in a single function improves readability and makes maintenance easy.

Here we also see the first crumbs of how the backup system works. By grabbing the current Unix time (seconds since the epoch), the script gets a timestamp in a timezone-independent way. It can then use this as a reference point and as a means of ultimately building unique file names.

In addition, this function builds all of the directory names that will be used throughout the remainder of the script.

I could probably make this more efficient by only running “date” once, but all of the target systems are multi-GHz, so I can endure a couple of wasted cycles.
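If I ever do tidy that up, one way would be a single call to date split apart with read; an untested sketch:

read -r epoch year month day <<< "$(date '+%s %Y %m %d')"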

Here is the second function:

function check_epoch
{
#Checking Time Since Last Run
if [ -n "$(find "$lastepochfile" -mtime -1)" ]; then 
  echo "Less than 24 hours since last archive"
  exit 0
fi
}


This function uses “find” to check whether the file “$lastepochfile” was modified within the last day (that is what “-mtime -1” matches). As “$lastepochfile” is the last thing created in the script, this effectively checks whether the script completed successfully within the last 24 hours, and exits early if it did. Since I ended up setting a cron job that runs once an hour, this ensures that even if a system is only powered on for short stretches during the day the job will still get done, but not OVER done, which would waste cycles and disk space.

For reference, here is the explainshell entry for the find command used.
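One thing worth flagging: on a brand-new system “$lastepochfile” does not exist yet, so this check (and the find and chmod calls later on) have nothing to point at. One way to bootstrap it before the very first run is to create it by hand with an ancient timestamp, so that the first archive scoops up everything; for example, using the paths from the configuration section below:

mkdir -p /home/user/ImportantArchive/.kb
touch -d "1970-01-01" /home/user/ImportantArchive/.kb/important.last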

Here is the third function:

function check_env
{
#Checking Configuration Directory
if [ ! -d "$configdir" ]; then
  mkdir "$configdir"
  echo "$epoch" Making "$configdir" >> "$logfile"
fi

#Checking Year Directory
if [ ! -d "$yeardir" ]; then
  mkdir "$yeardir"
  echo "$epoch" Making "$yeardir" >> "$logfile"
fi

#Checking Month Directory
if [ ! -d "$monthdir" ]; then
  mkdir "$monthdir"
  echo "$epoch" Making "$monthdir" >> "$logfile"
fi

#Checking Day Directory
if [ ! -d "$daydir" ]; then
  mkdir "$daydir"
  echo "$epoch" Making "$daydir" >> "$logfile"
fi
}


While this covers a lot of lines, it is really doing something very simple: it checks whether the year / month / day directory structure exists for the backup snapshot that will ultimately be produced. If it does not exist, it makes it.

Effectively, this means that each day’s snapshot will sit in its own directory, which makes it easy to browse to and find.
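So after a run, the archive tree looks roughly like this (the epoch number is just illustrative):

ImportantArchive/
├── .kb/
│   ├── important.last
│   └── important.log
└── 2017/
    └── 09/
        └── 20/
            └── important.1505912345.tar.bz2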

Here is the fourth function:

function write_file
{
#Incremental Tar & Compression
echo "$epoch" Writing "$backupfile" >> "$logfile"
find "$target_dir" -type f ! -name ".*" -newer "$lastepochfile" -print0 | \
tar cjvf "$backupfile" --null -T -

#Log Success
echo "$epoch" Success "$backupfile" >> "$logfile"
}


This is the payload of the entire script. Everything else has just been leading up to this. First “find” is used to search “$target_dir” for all files that are newer than “$lastepochfile” and do not match the pattern “.*” (i.e. hidden files). The reasoning for this should be pretty obvious, but I should note that I added the exclusion (with the !) so that I wouldn’t needlessly copy lockfiles and the like.

To ensure the file names stream safely to tar (even if they contain spaces), I used -print0, which separates each name with a null character instead of a newline.

It is important to note that this only searches out changed files, so the archive contains just those files and the directories leading to them. My first approach copied the entire directory structure, resulting in mostly empty directories… and in overly large archives, as all those empty directories eat up space.

The tar line is pretty standard; note that I use --null so that it properly handles the null-delimited input, and -T - tells tar to read the list of file names from standard input.

For reference, here are the explainshell entries for the find and tar commands used.

Success is then logged.
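For completeness, restoring is just tar run in the other direction. A rough sketch (the restore directory is a placeholder) is to unpack the snapshots oldest-to-newest, so that later versions of a file overwrite earlier ones; tar strips the leading “/” from member names, so everything lands under the restore directory:

mkdir -p /tmp/restore
find /home/user/ImportantArchive -name 'important.*.tar.bz2' | sort | \
while read -r archive; do
  tar xjf "$archive" -C /tmp/restore
done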

Here is the fifth function:

function write_epoch
{
# Unprotect Last Epoch File
chmod 600 "$lastepochfile"

# Write Last Epoch File
echo "$epoch" > "$lastepochfile"

# Protect Last Epoch File
chmod 400 "$lastepochfile"
}


In a way this is the most important part. First it unprotects “$lastepochfile”, then it writes the epoch to it, and finally the script protects “$lastepochfile” again. As the entire system depends on the accuracy of “$lastepochfile”, the protection should prevent casual deletion. If worst comes to worst, the epoch of the last run is saved in the log file and a new “$lastepochfile” could be created with that date.

Writing the epoch to the file isn’t strictly necessary, but I think it provides a nice backup record and isn’t really that different from simply touching the file.

There is likely a way of doing this that does not require an external file (referencing the last created archive instead), but I like that this approach is easy to debug and trivial to understand.
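For the curious, that alternative would look something like the untested sketch below inside write_file: find the most recently written archive and use it as the -newer reference instead of “$lastepochfile”.

# Hypothetical replacement for the reference file (relies on GNU find)
lastarchive=$(find "$backup_dir" -name "$prefix.*.tar.bz2" -printf '%T@ %p\n' \
  | sort -n | tail -n 1 | cut -d' ' -f2-)
find "$target_dir" -type f ! -name ".*" -newer "$lastarchive" -print0 | \
tar cjvf "$backupfile" --null -T -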

Finally, the actual program:

# Configuration
prefix=important
target_dir=/home/user/Important
backup_dir=/home/user/ImportantArchive
config_dir=/home/user/ImportantArchive/.kb

#Program
build_vars
check_epoch
check_env
write_file
write_epoch

exit 0


As you can see, I’m really just using functions as a way of organizing the script. I prefer to think in terms of modules that I can replace as needed, and this provides that. I also like that “forcing” a backup would be as simple as commenting out “check_epoch”.

So there you have it. It is no masterwork of programming, but this script has now been running comfortably and automatically (via cron) on two different systems for four months without a single glitch. I can’t really imagine a better testament to its stability than that!

So what could I improve about it? I’m no expert at scripting, so I’d love any feedback that anyone has.

Incremental Backup Script

I am Paranoid! (When it comes to Backups)

There, I said it. I am a paranoid person, but thankfully most of my paranoia is restricted to backups of important files. When you think about it, an academic truly is their files… my main document directory represents thousands of hours of time and years of my life. Scary thought, right?

So I pursue a multi-layered approach to backing up my critical files. My current setup uses manual backups, p2p backup/synchronization among my systems, and backups to Google Drive. However, even that wasn’t enough for me to be comfortable. What if I overwrote a file by accident? Only the p2p network (which uses Syncthing and includes rudimentary versioning) could save me. But what if the conflict-resolution system made an incorrect decision and preserved the wrong copy of the file? What then?

Like I said, I’m paranoid. So I decided that each major work system I use should make incremental backups on a daily basis. Essentially, with the command line you can easily search for changed files and then tar them. This produces a snapshot-style backup which could, in theory, be used to regenerate your files as they were at any given time. Write a quick cron task and now it is scheduled and automated.

However, just to make things more complicated, I had three more requirements:

  1. While I want no more than one incremental, snapshot-style backup per day, I cannot guarantee that each system will be on at a given time.
  2. I want the snapshots to be easy to navigate.
  3. As this is a script that I intend to run automatically for years, it has to have robust error handling and logging.

So this is now more complex than a single line of Bash; time for some scripting!

I am attaching the solution I came up with, but I’ll explain my reasoning for each piece in more detail in another post.

Here is the script I came up with. There is nothing innovative about it, but it has been running once an hour (scheduled with cron) for four months without a single hiccup or fault, which is my definition of success! If it is useful to you, feel free to use/modify/abuse it… consider it under the MIT License.
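For reference, the hourly cron entry is just the following (the script location is wherever you keep it; the path here is a placeholder):

# m h dom mon dow  command
0 * * * * /home/user/bin/incremental_backup.sh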

Thanks for reading, and I’d love to hear any thoughts people have on how to improve it.

#!/bin/bash
# Incremental backup: archive files changed since the last successful run

set -ue

function build_vars
{
# Current Date / Time
epoch=$(date +%s)
month=$(date +%m)
year=$(date +%Y)
day=$(date +%d)

# Building Environment: Directories
configdir="$config_dir"
yeardir="$backup_dir"/"$year"
monthdir="$yeardir"/"$month"
daydir="$monthdir"/"$day"
backupdir="$daydir"

# Building Environment: File Names
backupfile="$backupdir"/"$prefix"."$epoch".tar.bz2
logfile="$configdir"/"$prefix".log
lastepochfile="$configdir"/"$prefix".last
}

function check_epoch
{
#Checking Time Since Last Run
if [ -n "$(find "$lastepochfile" -mtime -1)" ]; then 
  echo "Less than 24 hours since last archive"
  exit 0
fi
}

function check_env
{
#Checking Configuration Directory
if [ ! -d "$configdir" ]; then
  mkdir "$configdir"
  echo "$epoch" Making "$configdir" >> "$logfile"
fi

#Checking Year Directory
if [ ! -d "$yeardir" ]; then
  mkdir "$yeardir"
  echo "$epoch" Making "$yeardir" >> "$logfile"
fi

#Checking Month Directory
if [ ! -d "$monthdir" ]; then
  mkdir "$monthdir"
  echo "$epoch" Making "$monthdir" >> "$logfile"
fi

#Checking Day Directory
if [ ! -d "$daydir" ]; then
  mkdir "$daydir"
  echo "$epoch" Making "$daydir" >> "$logfile"
fi
}

function write_file
{
#Incremental Tar & Compression
echo "$epoch" Writing "$backupfile" >> "$logfile"
find "$target_dir" -type f ! -name ".*" -newer "$lastepochfile" -print0 | tar cjvf "$backupfile" --null -T -

#Log Success
echo "$epoch" Success "$backupfile" >> "$logfile"
}

function write_epoch
{
# Unprotect Last Epoch File
chmod 600 "$lastepochfile"

# Write Last Epoch File
echo "$epoch" > "$lastepochfile"

# Protect Last Epoch File
chmod 400 "$lastepochfile"
}

# Configuration
prefix=important
target_dir=/home/user/Important
backup_dir=/home/user/ImportantArchive
config_dir=/home/user/ImportantArchive/.kb

#Program
build_vars
check_epoch
check_env
write_file
write_epoch

exit 0