Friday, December 12, 2014

How to set up automatic GitHub uploads

Back to basics

For years, I have been playing with US Weather Data.

A couple of years ago, I came up with a lookup server for US weather locations and, of course, had all kinds of plans for expanding it to more services and more countries.

So, of course, it went nowhere.

This week, I took a look at the issue again, and came up with a simple set of lookup tables (yes, a tiny database) to replace the server. It reduced the amount of code by about 70%, and it's finally in a maintainable, expandable form. It's simple, and ready for a few drive-by patches.

Which leads to the next question...how to share it with the world?

Well, in this decade, sharing software and data means putting it on GitHub.


Creating the GitHub repositories

We have a Python 3 script that creates the database, a Python 3 example script that consumes it, and the three CSV files that make up the database tables. Most consumers will probably just want the data for their own purposes. They don't need the scripts, and it would be quite rude for millions of users to hit the NOAA servers each week to produce their own copies of the database.

So I created two repositories on GitHub, one for the data and another for the optional scripts.

If you haven't used GitHub before, first create an account and then the repositories (each with an initial README file) in a web browser. Then:

cd /tmp
git clone https://github.com/ian-weisser/data   # Create a clone of the online repository
cd data
cp /home/somewhere/else/data/* ./               # Add existing data files into the local clone
git add ./*                                     # Tell git to track the new data files
git commit -m "Add initial data"
git push origin master                          # Push the changes back to GitHub
  Username for 'https://github.com':
  Password for 'https://me@github.com': 
cd /tmp 
rm -rf /tmp/data                                # Clean up
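
To rehearse the cycle above without touching GitHub, a local bare repository can stand in for the remote. This is only a sketch: the temporary paths, the sample.csv file, and the demo identity are made up for the example, and the branch name is read from git rather than assumed to be master.

```shell
#!/bin/sh
# Rehearse clone / add / commit / push against a local bare repository,
# so nothing is sent to GitHub. Every path here is a throwaway.
set -e

work=$(mktemp -d)                              # scratch area (hypothetical layout)
git init --quiet --bare "$work/remote.git"     # stands in for the GitHub repository

git clone --quiet "$work/remote.git" "$work/data"
cd "$work/data"
git config user.email "demo@example.com"       # placeholder identity for the test commit
git config user.name "Demo User"

echo "id,lat,lon" > sample.csv                 # stand-in for a real data file
git add sample.csv
git commit --quiet -m "Add initial data"
branch=$(git rev-parse --abbrev-ref HEAD)      # master or main, depending on git defaults
git push --quiet origin "$branch"

# Count the commits that actually arrived in the stand-in remote.
pushed=$(git --git-dir="$work/remote.git" rev-list --count "$branch")
echo "commits in remote: $pushed"
```

It should print "commits in remote: 1". Swapping the bare repository path for the GitHub URL gives the real workflow.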

Overall, it's pretty simple. For the next iteration:
  • We must automate the upload credentials so it doesn't prompt for a username and password each time.
  • We must add a cron job that creates the data for upload.
  • Let's move this project from /tmp on my laptop to the server where it will live permanently. This also means that the server should pull the latest Python script...which is also hosted on GitHub.

Let's set up the server:


sudo apt-get install git    # On this server, about 3 MB of download
sudo apt-get install python3-httplib2  # Script dependency
mkdir uploads
cd uploads
git clone https://github.com/ian-weisser/data          # Set up the local 'data' repository
git clone https://github.com/ian-weisser/weather       # Set up the local 'software' repository
weather/nws_database_creator.py                        # Use the script to update CSVs in ~/uploads/data
cd data
git config --global user.email "my_email@example.com"  # First-time setup
git config --global user.name "My Name"                # First-time setup
git config credential.helper store                     # First-time setup for this repo: save the login on disk
git add -A
git commit -m "Test update"
git push origin master
  Username for 'https://github.com': my-github-login-name   # Asked only on the first push from this repo
  Password for 'https://my-github-login-name@github.com':   # Asked only on the first push from this repo
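
One caveat with credential.helper store: it keeps the login in plain text, by default in ~/.git-credentials, so it is worth checking that nobody else on the server can read that file. Here is a minimal sketch of such a check, assuming the default file location. check_credentials_file is a hypothetical helper name, not part of git.

```shell
#!/bin/sh
# check_credentials_file FILE
# Prints "missing", "private" (owner-only), or "exposed" (group or others
# have some access). Hypothetical helper for illustration only.
check_credentials_file() {
    file=$1
    if [ ! -f "$file" ]; then
        echo "missing"
        return 0
    fi
    # Characters 5-10 of the ls mode string are the group and other bits.
    shared=$(ls -l "$file" | cut -c5-10)
    if [ "$shared" = "------" ]; then
        echo "private"
    else
        echo "exposed"
    fi
}

# Assumes the default location used by the store helper.
check_credentials_file "$HOME/.git-credentials"
```

If it reports "exposed", running chmod 600 on the file restricts it to the owner again.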


Automating the process

With the first-time setup out of the way, each subsequent update reduces to this script:

#!/bin/sh

dir=/home/ian/uploads
hub="https://github.com/ian-weisser"
now=$(/bin/date +%F)

cd "$dir/weather" || exit 1             # Stop rather than run git in the wrong place
/usr/bin/git pull "$hub/weather"        # Update the database_creator script

cd "$dir/data" || exit 1
/usr/bin/git pull "$hub/data"           # Need to pull to be able to push later

"$dir/weather/nws_database_creator.py"  # Do the update
/usr/bin/git add -A                     # Record any changes
/usr/bin/git commit -m "Scheduled update $now"
/usr/bin/git push origin master
exit 0
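
One thing to watch in a script like this: git commit exits with a non-zero status when nothing has changed, which is harmless here but noisy in the log. A possible refinement, sketched against a throwaway repository, is to commit only when the staged tree differs from HEAD. commit_if_changed is a hypothetical helper, not part of the script above, and the table.csv file and demo identity are placeholders.

```shell
#!/bin/sh
# commit_if_changed MESSAGE
# Stage everything in the current repository, then commit only if the
# index differs from HEAD. Hypothetical helper for illustration.
commit_if_changed() {
    git add -A
    if git diff --cached --quiet; then
        echo "no changes"
    else
        git commit --quiet -m "$1"
        echo "committed"
    fi
}

# Demonstration in a throwaway repository.
set -e
repo=$(mktemp -d)
cd "$repo"
git init --quiet
git config user.email "demo@example.com"   # placeholder identity
git config user.name "Demo User"

echo "a,b" > table.csv                     # stand-in for a regenerated CSV
first=$(commit_if_changed "Scheduled update $(date +%F)")
second=$(commit_if_changed "Scheduled update $(date +%F)")
echo "$first / $second"
```

The first call commits the new file and the second finds nothing staged, so the last line prints "committed / no changes".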


And now we can run the shell script weekly, at 06:05 every Sunday, from a user cron job:

5 6 * * 7 /home/ian/uploads/cron/weekly_weather > /home/ian/uploads/weather.log 2>&1


A couple of tests, and it works.

It's completely automatic.

Once a week, my server downloads the latest make-a-database script, builds the database, and uploads the database to GitHub for the world to use.


The data


If you want to see the final product (the lat/lon lookup tables for US weather), you can get them at:
https://raw.githubusercontent.com/ian-weisser/data/master/metar.csv
https://raw.githubusercontent.com/ian-weisser/data/master/radar.csv
https://raw.githubusercontent.com/ian-weisser/data/master/zone.csv

They will, of course, be automatically updated weekly.
