Tuesday, April 1, 2014

tkinter clock

I did some playing around with tkinter after seeing this post.

Here is my solution: How to easily display a running clock (text, not graphical) using tkinter.

The Clock Class

A running clock requires two functions: One to create the tkinter widget, and one to continuously update the widget with the current time.

If we house these in a class, then creating a clock (or multiple clocks) gets much easier. All the complexity is hidden from the mainloop and the rest of your script.

A newly created Clock widget will *automatically* update itself. Both functions call tick(). Yeah, tick() calls itself upon completion. Cool little trick tkinter offers for precisely uses like this.

Here's an example class:

import tkinter
import time

class Clock():
    """ Class that contains the clock widget and clock refresh """

    def __init__(self, parent):
        Create the clock widget
        It's an ordinary Label element
        self.time = time.strftime('%H:%M:%S')
        self.widget = tkinter.Label(parent, text=self.time)
        self.widget.after(200, self.tick)      # Wait 200 ms, then run tick()

    def tick(self):
        """ Update the display clock """
        new_time = time.strftime('%H:%M:%S')
        if new_time != self.time:
            self.time = new_time
        self.widget.after(200, self.tick)      # 200 =  millisecond delay
                                               #        before running tick() again 

And you implement it rather like this:

if __name__ == "__main__":
    Create a tkinter window and populate it with elements
    One of those elements merely happens to include the clock.

    # Generic window elements

    window = tkinter.Tk()
    frame  = tkinter.Frame(window, width=400, height=400 )

    # Add the frame elements, including the clock

    label1 = tkinter.Label(frame, text="Ordinary label")
    clock2  = Clock(frame)             # Create the clock widget
    clock2.widget.pack()               # Add the clock widget to frame
    label3 = tkinter.Label(frame, text="Ordinary label")


We created and placed the clock2 widget almost like any other widget. We simply used the Clock class instead of the tkinter.Label class. There's no need to start() or stop() the clock widget - once created, it automatically updates itself.

We can place multiple clock widgets within the same frame without problems.

Advanced Placement

The Clock class works will all geometry managers, just like any widget:


If we don't need to retain the 'clock' variable, or customize it's appearance, then we can simply place it like any other widget:

    tkinter.Label(frame, text="Ordinary label").pack()


One easy way to customize the clock's appearance is to use .configure
All Label configure options will work. The clock is simply a label.

    clock2 = Clock(frame)         # Create the clock widget
    clock2.widget.pack()          # Add the clock widget to frame


There is one more simplification we can make, detailed as part of this tkinter example. We can change the class to inherit Label properties instead of creating a .widget attribute.If you're new to Python classes, the concept of inheriting from a class may take a few tries to wrap your brain around.


class Clock():
    def __init__(self, parent):
        self.widget = tkinter.Label(parent, text=self.time)

if __name__ == "__main__":


class Clock3(tkinter.Label):
    def __init__(self, parent=None):
        tkinter.Label.__init__(self, parent)

if __name__ == "__main__":

Final result:

The final clock class looks like:

class Clock3(tkinter.Label):
    """ Class that contains the clock widget and clock refresh """

    def __init__(self, parent=None):
        tkinter.Label.__init__(self, parent)
        Create and place the clock widget into the parent element
        It's an ordinary Label element.
        self.time = time.strftime('%H:%M:%S')
        self.configure(text=self.time, bg='yellow')
        self.after(200, self.tick)

    def tick(self):
        """ Update the display clock every 200 milliseconds """
        new_time = time.strftime('%H:%M:%S')
        if new_time != self.time:
            self.time = new_time
        self.after(200, self.tick)

And you implement it rather like this:

if __name__ == "__main__":
    Create a tkinter window and populate it with elements
    One of those elements merely happens to include the clock.

    # Generic window elements

    window = tkinter.Tk()
    frame  = tkinter.Frame(window, width=400, height=400 )

    # Add the frame elements, including the clock

    label1 = tkinter.Label(frame, text="Ordinary label")
    clock2 = Clock(frame)             # Create the clock widget
    clock2.configure(bg='yellow')     # Customize the widget
    clock2.pack()                     # Add the clock widget to frame
    label3 = tkinter.Label(frame, text="Ordinary label")


Wednesday, March 12, 2014

Importing GTFS files into SQLite

A great comment in my previous post about demystifying GTFS transit schedule data pointed out that the various files in a GTFS file are simply database tables. Each file can be imported into a relational database as a separate table, and queried using SQL instead of the custom scripts I used.

In fact, I found SQL to be faster and easier to maintain than the Python script.
So thanks, Stefan, for the tip!

Here's a little more detail about exactly how to do it.

We will use the very simple, fast application SQLite for this, since our tables and queries will be rather simple and straightforward. Other possible databases include MongoDB and CouchDB. Indeed, for the very simple queries we used before, a series of good-old gdbm key-value databases could work.

Setup and importing GTFS tables into SQLite

In Ubuntu, installing SQLite3 is very simple:

sudo apt-get install sqlite3

Next, let's manually download the GTFS file for the Milwaukee County Transit System, uncompress it, create a new database, add a table to the database for the stops file, import stops file into the database, and save the database.

$ mkdir /home/me/GTFS                               # Create a working directory 
$ wget -O /home/me/GTFS/mcts.gtfs http://kamino.mcts.org/gtfs/google_transit.zip
                                                    # Download the GTFS file
$ unzip -d /home/me/GTFS /home/me/GTFS/mcts.gtfs    # Unzip the GTFS file
$ sqlite3

sqlite> attach /home/me/GTFS/mcts.db as mcts        # Create a new database
sqlite> create table stops(stop_id TEXT,stop_code TEXT,stop_name TEXT,
                           stop_desc TEXT,stop_lat REAL,stop_lon REAL,
                           zone_id NUMERIC,stop_url TEXT,timepoint NUMERIC);
sqlite> .separator ","                              # Tell SQLite that it's a CSV file
sqlite> .import /home/me/GTFS/stops.txt stops       # Import the file into a db table
sqlite> .dump                                       # Test the import
sqlite> delete from main.stops where stop_id like 'stop_id';  # Delete the header line
sqlite> select * from mcts.stops where stop_id == 5505;       # Test the import
sqlite> .backup mcts /home/me/GTFS/mcts.db          # Save the database
sqlite> .quit

Scripting imports

We can also script it. Here's a more robust script that creates multiple tables. The column names are explained on the first line of each table (which is why we must delete that line). The data types -TEXT, REAL, and NUMERIC-, and conversions from various Java, Python, C, and other datatypes are clearly explained in the SQLite documentation. The explanation of the field names, and expected datatype, is explained in the GTFS documentation. Each provider's GTFS file can include many optional fields, and may use different optional fields over time, so you are *likely* to need to tweak this script a bit to get it to work:

create table agency(agency_id TEXT,agency_name TEXT,agency_url TEXT,
                    agency_timezone TEXT,agency_lang TEXT, agency_phone TEXT);
create table calendar_dates(service_id TEXT,date NUMERIC,exception_type NUMERIC);
create table routes(route_id TEXT,agency_id TEXT,route_short_name TEXT,
                    route_long_name TEXT,route_desc TEXT,route_type NUMERIC,
                    route_url TEXT,route_color TEXT,route_text_color TEXT);
create table shapes(shape_id TEXT,shape_pt_lat REAL,shape_pt_lon REAL,
                    shape_pt_sequence NUMERIC);
create table stops(stop_id TEXT,stop_code TEXT,stop_name TEXT,
                   stop_desc TEXT,stop_lat REAL,stop_lon REAL,
                   zone_id NUMERIC,stop_url TEXT,timepoint NUMERIC);
create table stop_times(trip_id TEXT,arrival_time TEXT,departure_time TEXT,
                        stop_id TEXT,stop_sequence NUMERIC,stop_headsign TEXT,
                        pickup_type NUMERIC,drop_off_type NUMERIC);
create table trips(route_id TEXT,service_id TEXT,trip_id TEXT,
                   trip_headsign TEXT,direction_id NUMERIC,
                   block_id TEXT,shape_id TEXT);
.separator ','
.import /home/me/GTFS/agency.txt agency
.import /home/me/GTFS/calendar_dates.txt calendar_dates
.import /home/me/GTFS/routes.txt routes
.import /home/me/GTFS/shapes.txt shapes
.import /home/me/GTFS/stops.txt stops
.import /home/me/GTFS/stop_times.txt stop_times
.import /home/me/GTFS/trips.txt trips
delete from agency where agency_id like 'agency_id';
delete from calendar_dates where service_id like 'service_id';
delete from routes where route_id like 'route_id';
delete from shapes where shape_id like 'shape_id';
delete from stops where stop_id like 'stop_id';
delete from stop_times where trip_id like 'trip_id';
delete from trips where route_id like 'route_id';
select * from stops where stop_id == 5505;

And run that script using:

$ sqlite3 mcts.db < mcts_creator_script

Reading GTFS data from SQLite

Now, like in the previous GTFS post, let's find the next buses at the intersection of Howell and Oklahoma. There are four stops at that location: 658, 709, 5068, and 5152.

First, let's find the appropriate service codes for today's date:

# Query: The list of all service_ids for one date.
sqlite> SELECT service_id FROM calendar_dates WHERE date == 20140310;
[...long list...]

Scripting reads

These queries can also be scripted. Here's an example script that looks up the four stops we care about for a two-hour window:

-- Usage:  $ sqlite3 GTFS/mcts.db < GTFS/mcts_lookup.sh
-- Usage:  sqlite> .read GTFS/mcts_lookup.sh 

-- List of the valid service_id codes for the current date
CREATE VIEW valid_service_ids AS
   SELECT service_id 
   FROM calendar_dates 
   WHERE date == strftime('%Y%m%d', 'now', 'localtime')

SELECT stop_times.arrival_time, trips.route_id, trips.trip_headsign
   FROM trips, stop_times

   -- Match the trip_id field between the two tables
   WHERE stop_times.trip_id == trips.trip_id

   -- Limit selection to the stops we care about 
   AND stop_times.stop_id IN (658,709,5068,5152)

   -- Limit selection to service_ids for the correct day
   AND trips.service_id IN valid_service_ids

   -- Limit selection to the next hour from now
   AND stop_times.arrival_time > strftime(
                                 '%H:%M:%S', 'now', 'localtime', '-5 minutes')
   AND stop_times.arrival_time < strftime(
                                 '%H:%M:%S', 'now', 'localtime', '+1 hour')
   ORDER BY stop_times.arrival_time

-- Clean Up
DROP VIEW valid_service_ids;

And there are two ways to run the script:

sqlite&gt> .read lookup_script         # Within sqlite
$ sqlite3 mcts.db < lookup_script      # Shell script


I found that importing GTFS files into SQLite requires a lot of memory and CPU...but is faster than Python, and the scripts are smaller and easier to maintain. SQLite is a good, fast processor or pre-processor.

File sizes:

  • mcts.gtfs: 5.6M
  • mcts.db: 79M
  • mtcs.gtfs unzipped files: 86M
I think that preprocessing or shrinking those files will be important for low-power or low-bandwidth applications.

Query times:

Here are the query times for buses within a two-hour window at Howell and Oklahoma:
  • Python: 13 sec
  • SQLite: 2.5 sec

Code size:

To do those queries,
The Python 3 script: 206 lines.
SQLite script to import the GTFS file: 33 lines.
SQLite script to lookup the stops: 32 lines.

Friday, January 24, 2014

Transit schedule data demystified - using GTFS

General Transit Feed Specification (GTFS) is the Google-originated standard format for transit route, stop, trip, schedule, map, and fare data. Everything except realtime.

It's called a feed because it (usually) includes an RSS update for changes.
There are lists of feeds on the Google wiki, and on the separate GTFS data website.

Each organization's GTFS file includes all their services, so some agency files can get pretty big, and get updated often. Any schedule change or route adjustment means a new release of the entire GTFS file. The file itself is merely a big zipfile, containing several csv files that are strangely required to be mislabelled as .txt.

Here's the contents of Milwaukee County Transit System's GTFS file:

$ unzip -l mcts.zip 
Archive:  mcts.zip
  Length      Date    Time    Name
---------  ---------- -----   ----
      169  2014-01-10 05:01   agency.txt
    40136  2014-01-10 05:00   calendar_dates.txt
     5746  2014-01-10 05:01   routes.txt
   307300  2014-01-10 05:00   stops.txt
 35198135  2014-01-10 05:00   stop_times.txt
   650622  2014-01-10 05:01   trips.txt
  8369736  2014-01-10 05:01   shapes.txt
     3490  2014-01-10 05:01   terms_of_use.txt
---------                     -------
 44575334                     8 files

Yeah, 44MB unzipped.
But only 5MB zipped. Still not something you want to download every day to your phone.

Let's find a stop at Mitchell International Airport:

$ cat stops.txt | grep AIRPORT
7168,7168,AIRPORT,,  42.9460473, -87.9037345,,,1
7162,7162,AIRPORT & ARRIVALS TERMINAL,,  42.9469597, -87.9030569,,,0

It's right, there are two stops at the airport. Each stop has a latitude and longitude, a unique ID number, and a descriptive name. The final field designates a timepoint (1=Timepoint, 0=Not).

Let's try an intersection where two routes cross:

$ cat stops.txt | grep "HOWELL & OKLAHOMA"
709,709,HOWELL & OKLAHOMA,,  42.9882051, -87.9043319,,,1
658,658,HOWELL & OKLAHOMA,,  42.9885464, -87.9045333,,,1
$ cat stops.txt | grep "OKLAHOMA & HOWELL"
5152,5152,OKLAHOMA & HOWELL,,  42.9881561, -87.9046550,,,1
5068,5068,OKLAHOMA & HOWELL,,  42.9883466, -87.9041176,,,1

Here's a problem that will require some logic to solve. I consider the intersection to be one place (not a GTFS term). Many trips and routes can use the same stop. Multiple stops (GTFS terms) can exist at the same place. In this case, northbound, southbound, eastbound, and westbound buses each have a different stop at the same place.

This might make your job easier...or harder.

GTFS cares about trips and stops. It doesn't care that Stops #709 and #5152 are twenty meters apart, and serve different routes - that it's a transfer point. Nothing in GTFS explicitly links the two stops. Generally, you must figure out the logic to do that - you have the lat/lon and the name to work with.

GTFS does have an optional transfers.txt file, that fills in the preferred transfer locations for you. But that's for a more advanced exercise.

Let's see what stops at #709:

$ grep -m 5 ,709, stop_times.txt 
4819177_1560,06:21:00,06:21:00,709,         14,,0,0
4819179_1562,06:49:00,06:49:00,709,         14,,0,0
4819180_1563,07:02:00,07:02:00,709,         14,,0,0
4819181_1564,07:15:00,07:15:00,709,         14,,0,0
4819182_1565,07:28:00,07:28:00,709,         14,,0,0

These fields are trip_id, arrival_time, departure_time, and stop-sequence (14th).

Let's see the entire run of trip 4819177_1560:

$ grep 4819177_1560 stop_times.txt 
4819177_1560,06:09:00,06:09:00,7162,          2,,0,0  # Hey, look - stops out of sequence in the file
4819177_1560,06:09:00,06:09:00,7168,          1,,0,0  # Begin Trip
4819177_1560,06:11:00,06:11:00,7178,          3,,0,0
4819177_1560,06:20:00,06:20:00,8517,         13,,0,0
4819177_1560,06:21:00,06:21:00,709,         14,,0,0   # Howell & Oklahoma
4819177_1560,06:22:00,06:22:00,711,         15,,0,0
4819177_1560,07:17:00,07:17:00,1371,         66,,0,0
4819177_1560,07:19:00,07:19:00,6173,         67,,0,0
4819177_1560,07:20:00,07:20:00,7754,         68,,0,0  # End of trip 

We can also look up more information about trip 4819177_1560:

$ grep 4819177_1560 trips.txt 

This needs a little more explanation
  • route_id: Green Line (bus)
  • service_id (weekday/days-of-service): 13-DEC_WK
  • direction_id (binary, 0 or 1): 0
  • block_id (useful only if the same bus changes routes): 515111
  • shape_id (useful for route maps): 13-DEC_GRE_0_12

Let's look up the route_id:

$ grep GRE routes.txt
  GRE,MCTS,  GRE,MetroEXpress GreenLine,,3,http://www.ridemcts.com/Routes-Schedules/Routes/GRE/,,

The full route name is MetroEXpress GreenLine, it's a bus (type-3 = bus) route, and we have the operator website for it.

Let's look up the service_id:

$ grep -m 10 13-DEC_WK calendar_dates.txt 

Ah, this specific trip is a weekday (Monday-Friday) only trip.

Let's look up the route map shapefile for the trip:

$ grep 13-DEC_GRE_0_12 shapes.txt 
13-DEC_GRE_0_12,  42.946054, -87.903810,10001
13-DEC_GRE_0_12,  42.946828, -87.903659,10002
13-DEC_GRE_0_12,  42.946824, -87.903588,10003
13-DEC_GRE_0_12,  42.946830, -87.903472,10004
13-DEC_GRE_0_12,  43.123137, -87.915431,670004
13-DEC_GRE_0_12,  43.123359, -87.915228,670005
13-DEC_GRE_0_12,  43.124016, -87.914535,670006
13-DEC_GRE_0_12,  43.124117, -87.914440,670007

The line for this trip has 520 points. That's pretty detailed.

So what do we know?

We know that Stop #709 is served by the GreenLine route, it's the 14th stop in direction 0, it's a bus line, we have all the times the stop is served, and we have the route website. We know the route map and all the other stops of any trip serving that stop.

How can we find the next scheduled bus at stop #709?

One way is to start with all trips that stop at #709 from stop_times.txt.

Since we probably know what time it is, we can filter out all the past times, and most of the future times. This leaves us with a nice, small list of, say, 10 possibles that include trips that don't run today at all (we must delve deeper to determine).

We can look up each of those trips in trips.txt, and get the route.

Each trip also includes a service_id code. The calendar_dates.txt file tells us which dates each service_id code is valid.

Right, we need to do three lookups.

The shell code gets a bit complex with three lookups, so I shifted to Python and wrote a basic next-vehicle-at-stop-lookup in about 160 lines. Python lists are handy, since it can handle all the stops at a location just as easily as a single stop. Python's zip module is also handy, so I can read data directly from the zipfile. But at 13 seconds, Python is probably too slow for this kind of application:

$ time ./next_bus.py 

Next departures from Howell & Okahoma
16:28    51 TO 124TH ST. - VIA OKLAHOMA
16:43    51 TO 124TH ST. - VIA OKLAHOMA
16:45    51 TO NEW YORK

real 0m13.171s   # Ugh. If I had started 13 seconds sooner, I wouldn't be bored now.
user 0m10.740s
sys 0m0.260s

All that time crunching the GTFS file has not gone unnoticed.

Trip planners (like Google) pre-process the data, mapping out and caching link-node and transfer relationships, limiting the trip data to the next hour or two (as appropriate), and using rather fancy algorithms to prune the link-node map to a likely set of  possibilities before looking at trips along those links.

That's one reason Google Transit is much faster than 13 seconds.

But that's all advanced stuff.

Also advanced is how to integrate real-time data, which uses one of several different formats. Next time...

Sunday, January 5, 2014

Upstart Jobs at login

Login is not the same is startup. Let's just get that out of the way first.
  • Startup is the time between boot and the login screen. It's the habitat of system jobs.
  • Login is the time after you enter your password. It's the habitat of user jobs.

The easy way to run a task at login is to run a script from your .bashrc.
And the (deceptively not-) easy way to run a task at logout is to run a script from your .bash_logout

But today we're not doing it the easy way. Today we're going to use dbus and Upstart.

Emitting Upstart Signals from your .bashrc

It's terribly easy.

1) Emit a user-level Upstart signal by adding a line to .bashrc:

# Upstart signal that .bashrc is running
initctl emit "I_AM_LOGGING_IN"

2) Add a user-level Upstart job to ~.config/upstart/ for one user, or to /usr/share/upstart/sessions/ for all users:

# /home/$USER/.config/upstart/login_test.conf
description "login test"
start on I_AM_LOGGING_IN            # Start criteria
exec /bin/date > /tmp/login_test    # Do something

3) Open a new terminal window (to load the new .bashrc). When you open the window, the Upstart job creates the tempfile at /tmp/login_test.

Clean up: Restore your bashrc, and delete the sample Upstart job.

Can I emit system-level Upstart signals from .bashrc?

Not directly. The script runs as a user, not as root.

You can use a secondary method of triggering system-level Upstart signals, like sending a Dbus signal, or manipulating a file, or connecting to a socket.

Can I emit Upstart signals from .bash_logout?


Using initctl emit in .bash_logout will merely result in an error. The user-level Upstart daemon seems to be terminated before .bash_logout is run. The command will return a cryptic "Rejected send message"error from PID 1 (system Upstart). Since .bash_logout is not running as root, it cannot emit system-level signals.

Also, GUI terminal programs do not not run .bash_logout, unless you specify compatibility (with a flag) when you start.

That easy way of doing login actions is still too hard

Boy, are you difficult to please.

Okay, there is an even easier way, but it's more complicated to explain: Instead of .bashrc emitting an Upstart event, let Upstart listen for a dbus signal.

Here is an example of the dbus message that occurs when I login via SSH to a new session. This signal is emitted by systend-logind every time a new TTY, ssh, or X-based GUI login occurs.

The signal is not emitted when you are in a GUI environment and simply open a terminal window - that's not a login, that's a spawn of your already-existing GUI environment:

signal sender=:1.3 -> dest=(null destination) serial=497 
  path=/org/freedesktop/login1; interface=org.freedesktop.login1.Manager;
    string "4"
    object path "/org/freedesktop/login1/session/_34"

The important elements are the source, the "SessionNew" signal, and the path of the new session.

Aside, let's query systemd-logind to find if the login is to a TTY, X session, or SSH. logind has lots of useful information about each session:

$ dbus-send --system                              \ 
            --dest=org.freedesktop.login1         \
            --print-reply                         \
            --type=method_call                    \
            /org/freedesktop/login1/session/_34   \ # Path from the signal
            org.freedesktop.DBus.Properties.Get   \
            string:org.freedesktop.login1.Session \
method return sender=:1.3 -> dest=:1.211 reply_serial=2
   variant       string "sshd"

It's right. I did connect using ssh.

Now let's construct an Upstart job that runs when I login via a TTY, X Session, or SSH. We will use Upstart's built-in dbus listener.

# /home/$USER/.config/upstart/login_test.conf
description "login test"
start on dbus SIGNAL=SessionNew     # Listen for the dbus Signal
exec /bin/date > /tmp/login_test    # Do something

  • Now, whenever you login to a TTY, X session, or SSH session, the job will run.
  • If your job needs to tell the difference between those sessions, you know how to find out using dbus.
  • If *everybody* needs the job, place it in /usr/share/upstart/sessions/ instead of each user's .config/upstart/

What about super-easy logout jobs?

Logout jobs are harder, and generally not recommended. Not super-easy. They are hard because you can't guarantee they will run. Maybe the user will hold down the power button. Or use the "shutdown -h now" command. Or the power supply sent a message that the battery only has 60 seconds of life left. Or the user absolutely cannot miss that bus....

Here's the dbus signal that systemd-logind emits when a TTY, X, or SSH user session ends:

signal sender=:1.21 -> dest=(null destination) serial=286 
  interface=org.freedesktop.Accounts.User; member=Changed

All this tells me is that User1000 now has a different number of sessions running. Maybe it's a login (yes, it emits the same signal upon login). Maybe it's a logout.

Sure, we can do a login-and-logout Upstart job...

# /home/$USER/.config/upstart/login_test.conf
description "login and logout test"
start on dbus SIGNAL=Changed INTERFACE=org.freedesktop.Accounts.User
exec /bin/date > /tmp/login_test

...but then you need logic to figure out who logged in or logged out, and whether it's an event you care about. Certainly doable, but probably not worthwhile for most users.

In other words, if you want to backup-at-logout, you need to structure it as a backup-then-logout sequence. Logout is not an appropriate trigger to start the sequence...from the system's point of view.

But I really want to do a job a logout!

Okay, here's how to do a job when you log out of the GUI environment. Logging out is the trigger. This won't work for SSH or TTY sessions.

The Upstart jobs /etc/init/lightdm.conf and/etc/init/gdm.conf emit a system-level "desktop-shutdown" signal when the X server is stopped. You can use that job as your start criteria.

# /etc/init/logoff_test.conf
description "logout test"
setuid some_username        # Your script probably doesn't need to run as root
start on stopping lightdm   # Run *before* it is stopped
exec /bin/date > /tmp/logoff_test

Friday, January 3, 2014

Searching for the right Upstart signal or job

If you want to use Upstart to start/stop a job on any of the not-obvious triggers (like "startup"), then you need to do some digging to find the right trigger.

initctl show-config

Be careful, there are TWO sets of Upstart jobs: System jobs and user jobs. Use sudo to distinguish between them.

$ sudo initctl show-config dbus   # Use sudo for system-level jobs
  start on local-filesystems
  stop on deconfiguring-networking

$ initctl show-config dbus        # Omit sudo for user-level jobs
  start on starting xsession-init 

Searching for a job or a signal using grep

The initctl show-config command without any job name prints all the jobs. That means you can use grep on the full list. Here is an example of using grep to look for all root jobs that care about the system "startup" signal:

$ sudo initctl show-config | grep -B8 startup
  start on (starting mountall or (runlevel [016] and ((desktop-shutdown or stopped xdm) or stopped uxlaunch)))
  start on mounted MOUNTPOINT=/run
  stop on runlevel [06]
  start on runlevel [2345]
  stop on runlevel [!2345]
  start on (startup and (((graphics-device-added PRIMARY_DEVICE_FOR_DISPLAY=1 or drm-device-added PRIMARY_DEVICE_FOR_DISPLAY=1) or stopped udevtrigger) or container))
  emits virtual-filesystems
  emits local-filesystems
  emits remote-filesystems
  emits all-swaps
  emits filesystem
  emits mounting
  emits mounted
  start on startup
  start on runlevel [2345]
  stop on runlevel [!2345]
  start on mounted MOUNTPOINT=/
  start on mounted MOUNTPOINT=/
  start on (startup and started udev)
  start on runlevel S
  stop on runlevel [!S]
  stop on (started $WAIT_FOR or stopped $WAIT_FOR)
  start on filesystem
  emits recovery
  emits startup
  start on runlevel [2345]
  stop on runlevel [!2345]
  start on socket PROTO=inet PORT=34567 ADDR=
  start on (runlevel [23] and ((not-container or container CONTAINER=lxc) or container CONTAINER=lxc-libvirt))
  stop on runlevel [!23]
  start on ((startup and started udev) and not-container)
  emits not-container
  start on mounted MOUNTPOINT=/run
  start on mounted MOUNTPOINT=/dev
  start on (runlevel [23] and ((not-container or container CONTAINER=lxc) or container CONTAINER=lxc-libvirt))
  stop on runlevel [!23]
  start on ((((startup and filesystem) and started udev) and stopped udevtrigger) and stopped udevmonitor)
  start on runlevel [2345]
  start on block-device-added ID_FS_USAGE=crypto
  start on startup
  emits net-device-up
  emits net-device-down
  emits static-network-up
  start on net-device-added
  stop on net-device-removed INTERFACE=$INTERFACE
  emits plymouth-ready
  start on (startup or started plymouth-splash)
  start on (started plymouth and ((graphics-device-added PRIMARY_DEVICE_FOR_DISPLAY=1 or drm-device-added PRIMARY_DEVICE_FOR_DISPLAY=1) or stopped udev-fallback-graphics))
  start on (started dbus or runlevel [06])
  stop on stopping plymouth
  start on (stopped rc RUNLEVEL=[2345] and ((not-container or container CONTAINER=lxc) or container CONTAINER=lxc-libvirt))
  stop on runlevel [!2345]
  start on (startup and starting udevtrigger)

We found one job that emits startup (friendly-recovery).
We found seven jobs that listen for it: udev-fallback-graphics, mountall, kmod, udevtrigger, hostname, plymouth-ready, and udevmonitor

Searching for a signal using upstart-monitor

The upstart-monitor application is a handy GUI and command-line tool to listen to all the signal chatter in Upstart. The application is provided by the upstart-monitor package in the Ubuntu repositories. A bug in 13.10 prevents it from running on a non-GUI system like Ubuntu Server, but it's also easy to fix the bug yourself...

Here are the signals emitted by Upstart when I switch over to a TTY, login, wait ten seconds, and then logout. This isn't an example of monitoring logins (do that using consolekit or logind) - this is an example of monitoring the Upstart signals emitted by a change in tty2.

$ upstart-monitor --no-gui --destination=system-bus
# Upstart Event Monitor (console mode)
# Connected to D-Bus system bus
# Columns: time, event and environment

2014-01-03 23:23:43.013436 stopping JOB='tty2' INSTANCE='' RESULT='ok'
2014-01-03 23:23:43.020309 starting JOB='tty2' INSTANCE=''
2014-01-03 23:23:43.031193 starting JOB='startpar-bridge' INSTANCE='tty2--started'
2014-01-03 23:23:43.033055 started JOB='startpar-bridge' INSTANCE='tty2--started'
2014-01-03 23:23:43.040671 stopping JOB='startpar-bridge' INSTANCE='tty2--started' RESULT='ok'
2014-01-03 23:23:43.042496 stopped JOB='startpar-bridge' INSTANCE='tty2--started' RESULT='ok'
2014-01-03 23:23:43.044271 started JOB='tty2' INSTANCE=''

You can see the progression of signals: starting, started, stopping, stopped.
You can also see the nesting of jobs. startpar-bridge starts on starting tty2, and runs it's entire task of starting-started-stopping-stopped for tty2 to transition from starting to started.

If you want to trigger a job when tty2 is starting or started, you now know the signals that get emitted. Your job can listen for those signals.

Drawing out relationships using dotfiles

Dot diagram of Upstart user jobs
The initctl2dot application creates dotfile graphics of xdot application. initctl2dot is included with the upstart package, part of all Ubuntu installations (even ubuntu-minimal). xdot is a separate package available in the Ubuntu repositories (Software Center).

As the name implies, initctl2dot's input is initctl's output.You can manually trim an initctl show-show-config output, and input that to initctl2dot if you really want a specific diagram.

You can easily diagram and display the entire system job tree...though it's perhaps less useful than you may expect:

$ initctl2dot --system --outfile /tmp/upstart_root_tree.dot
$ xdot /tmp/upstart_root_tree.dot

You can also diagram the user job tree:

$ initctl2dot --user --outfile /tmp/upstart_user_tree.dot
$ xdot /tmp/upstart_user_tree.dot

Limiting the dotfile size

The initctl2dot manpage includes options for showing/hiding various relationship types (emit, start on, stop on, etc) for clarity.

Another handy option is the --restrict-to-jobs flag, to draw much smaller charts.

For example, let's diagram the system "startup" signal relationships we already discovered using grep:

$ initctl2dot --system --outfile /tmp/upstart_startup_tree.dot \
$ xdot /tmp/upstart_startup_tree.dot 

And there you have it. How to search system jobs and user jobs for useful signals, and how to easily diagram out the relationships among signals and jobs.

Friday, December 27, 2013

Amtrak real-time train data

Recently the US passenger rail operator Amtrak announced a nationwide real-time map of it's services.

That's great...until you try it.

The United States is big, even though the number of trains is fairly small. The number of Google map tiles is big. The polygons that show the routes are enormous. The nationwide train data is big.

That's a lot of data to load --and so slow to display-- if all I really want is realtime data on a single train (is my train late?).
After looking at the GETs emitted by all the (slow, unnecessary) javascript on that site, I figured out how to bypass it and pull just the table of current train status. It's 400kb, but it's a lot less than the over 1MB to go through the website and map....


The only fields required are 'version' and 'key' (which is different from the site cookie):


So we can easily download the current status table using wget:

$ wget -O train_status https://www.googleapis.com/mapsengine/v1/tables/

Since parsing 400kb of JSON using shell script will be awkward and annoying, let's change over to Python3:

import httplib2
url   = \

h = httplib2.Http(None)
resp, content = h.request(url, "GET")

Let's tell Python that the content is a JSON string, and iterate through the list of trains looking for train number 7. That's a handy train because each run takes over 24 hours - it will always have a status.

import json
all_trains = json.loads(content.decode("UTF-8"))
for train in all_trains["features"]:
   if train["properties"]["TrainNum"] == "7":
       do something

After locating the train we care about, here is data about it:

latitude     = train["geometry"]["coordinates"][0]
longitude    = train["geometry"]["coordinates"][1]
speed        = train["properties"]["Velocity"]
report_time  = train["properties"]["LastValTS"]
next_station = train["properties"]["EventCode"]

Let's find a station (FAR) along the route. This is a little tougher, because Python's JSON module reads each Station as a string instead of a dict. We need to identify Station lines, and feed those through the JSON parser (again).

for prop in train["properties"].keys():
          if "Station" in prop:
              sta = json.loads(train["properties"][prop])
              if sta["code"] == "FAR":
                  # Schedule data
                  station_time_zone        = sta['tz']
                  scheduled_arrival_time   = sta['scharr'] # Only if a long stop
                  scheduled_departure_time = sta['schdep'] # All stations

                  # Past stations
                  actual_arrival_time      = sta['postarr']
                  actual_departure_time    = sta['postdep']
                  past_station_status      = sta['postcmnt']

                  # Future stations
                  estimated_arrival_time   = sta['estarr']
                  future_station_status    = sta['estarrcmnt']

The final Python script for Train 7 at FAR is:


import httplib2
import json

url = "https://www.googleapis.com/mapsengine/v1/tables/01382379791355219452-08584582962951999356/features?version=published&key=AIzaSyCVFeFQrtk-ywrUE0pEcvlwgCqS6TJcOW4&jsonpCallback=retrieveTrainsData"
h = httplib2.Http(None)
resp, content = h.request(url, "GET")

all_trains = json.loads(content.decode("UTF-8"))
for train in all_trains["features"]:
   if train["properties"]["TrainNum"] == "7":
      for prop in train["properties"].keys():
          if "Station" in prop:
              sta = json.loads(train["properties"][prop])
              if sta["code"] == "FAR":
                  print('Train number {}'.format(train["properties"]["TrainNum"]))
                  if 'schdep' in sta.keys():
                      print('Scheduled: {}'.format(sta['schdep']))
                  if 'postcmnt' in sta.keys():
                      print("Status: {}".format(sta['postcmnt']))
                  if 'estarrcmnt' in sta.keys():
                      print("Status: {}".format(sta['estarrcmnt']))

And the result looks rather like:

$ python3 Current\ train\ status.py 

Train number 7
Scheduled: 12/26/2013 03:35:00
Status: 1 HR 4 MI LATE

Train number 7
Scheduled: 12/27/2013 03:35:00

Two more items.

First, the Python script can be tweaked to show *all* trains for a station. Just remove the train number filter. To be useful, you might add some time comprehension, since some scheduled times can be a day in the future or past.

Second, the station information table has a similar static page: https://www.googleapis.com/mapsengine/v1/tables/01382379791355219452-17620014524089761219/features?&version=published&maxResults=1000&key=AIzaSyCVFeFQrtk-ywrUE0pEcvlwgCqS6TJcOW4&callback=jQuery19106722136430546803_1388112103417&dataType=jsonp&jsonpCallback=retrieveStationData&contentType=application%2Fjson&_=13881121034

As with the train status JSON application, many of the fields are dummies. This shorter URL works, too: https://www.googleapis.com/mapsengine/v1/tables/01382379791355219452-17620014524089761219/features?&version=published&key=AIzaSyCVFeFQrtk-ywrUE0pEcvlwgCqS6TJcOW4&callback=jQuery19106722136430546803_1388112103417

Sunday, December 22, 2013

Python http.server and upstart-socket-bridge

In a previous post, I showed how to make the Python 3 socketserver.TCPServer class compatible with upstart-socket-bridge by overriding the server_bind() and server_activate() methods.

Python's http.server module builds on socketserver. Let's see if we can similarly make http.server compatible with upstart-socket-bridge.

About http.server

The http.server module is intended to be a simple way to create webservers. Most of the module is devoted to classes that handle incoming and outgoing data. Only one class, HTTPServer, handles the networking stuff.

Example 1

Here is the first example http.server.SimpleHTTPRequestHandler in the Python documentation:

import http.server
import socketserver

PORT = 8000

Handler = http.server.SimpleHTTPRequestHandler

httpd = socketserver.TCPServer(("", PORT), Handler)

print("serving at port", PORT)

When you run this code, and point a web browser to port 8000, the code serves the current working directory, with links to files and subdirectories.

Convert Example 1 to work with upstart-socket-bridge

The example server is only seven lines.

Line 3, the PORT, is no longer needed. Upstart will define the port.

Line 5, the socketserver.TCPServer line, will be the biggest change. We need to define a new class based on TCPServer, and override two methods. The is exactly what we did in the previous post.

Line 6, the print statement, can be deleted. When the job is started by Upstart, there is no display - nowhere to print to. An important side effect of starting the job using Upstart is that the Present Working Directory is / (root), unless you specify otherwise in the /etc/init config file that starts the job.

Because of the change to Line 5, the final result is 6 more lines...not bad, and now it works with upstart-socket-bridge.

import http.server
import socketserver

class MyServer(socketserver.TCPServer):
    def server_bind(self):
        """ Replace the socket FD with the Upstart-provided FD"""
        fd = int(os.environ["UPSTART_FDS"])
        self.socket = socket.fromfd(fd, socket.AF_INET, socket.SOCK_STREAM)

    def server_activate(self):
        pass            # Upstart takes care of listen()

Handler = http.server.SimpleHTTPRequestHandler
server = MyServer(None, Handler)

As always, the script is triggered by an appropriate Upstart job.

Test the script, in this case, by pointing a web browser at the port specified in the Upstart job.

Example 2

This example is also from the Python http.server documentation:

def run(server_class=HTTPServer, handler_class=BaseHTTPRequestHandler):
    server_address = ('', 8000)
    httpd = server_class(server_address, handler_class)

It's not a standalone example, but instead an abstract example of how to use the module in a larger application.

Let's reformat it into a working example:

import http.server
server_address = ('', 8000)
httpd = http.server.HTTPServer(server_address, handler_class)

Now the server runs...sort of. It gives us a 501 error message. Let's add a class to the handler that reads the path and gives a real, valid response. (Source)

import http.server

class MyHandler(http.server.BaseHTTPRequestHandler):
    def do_HEAD(self):
        self.send_header("Content-type", "text/html")
    def do_GET(self):
        self.send_header("Content-type", "text/html")
        content = ["<html><head><title>The Title</title></head>",
                   "<body>This is a test.<br />",
                   "You accessed path: ", self.path, "<br />",

server_address = ('', 8000)
httpd = http.server.HTTPServer(server_address, handler_class)

Okay, now this is a good working example of a working http.server. It really shows how http.server hides all the complexity of networking on one simple .server line and focuses all your effort on content in the handler. It's clear how you would parse the URL input using self.path. It's clear how you can create and send content using self.wfile.

Convert Example #2 to work with upstart-socket-bridge

There are two classes at work: http.server.BaseHTTPRequestHandler handles content doesn't care about the networking. http.server.HTTPServer handles netwokring and doesn't care about content.

Good news: HTTPServer is based on socketserver.TCPServer, which we already know how to patch!

The changes I made:

1) I want it to launch at each connection, exchange data once, and then terminate. Each connection will launch a separate instance. We no longer want the service to serve_forever().

httpd.serve_forever()    # Old
httpd.handle_request()   # New

2) Let's make it compatible with all three inits: sysvinit (deamon), Upstart, and systemd.

import os

if __name__ == "__main__":
    if "UPSTART_FDS" in os.environ.keys() \   # Upstart
    or "LISTEN_FDS" in os.environ.keys():     # systemd
        httpd = MyServer(None, MyHandler)     # Need a custom Server class
        httpd.handle_request()                # Run once
    else:                                     # sysvinit
        server_address = ('', 8000)
        httpd = http.server.HTTPServer(server_address, MyHandler)
        httpd.serve_forever()                 # Run forever

3) Add a custom handler that overrides server_bind() and server_activate() in both the socketserver and http.server.HTTPServer modules. This is the secret sauce that makes Upstart compatibility work:

import http.server, socketserver, socket, os

class MyServer(http.server.HTTPServer):
    def server_bind(self):
        # Get the File Descriptor from an Upstart-created environment variable
        if "UPSTART_FDS" in os.environ:
            fd = int(os.environ["UPSTART_FDS"])      # Upstart
            fd = int(os.environ["LISTEN_FDS"])       # Systemd
        self.socket = socket.fromfd(fd, socket.AF_INET, socket.SOCK_STREAM)

        # Only http.server.CGIHTTPRequestHandler uses these
        host, port = self.socket.getsockname()[:2]
        self.server_name = socket.getfqdn(host)
        self.server_port = port

4) Add the Upstart job to monitor the port:

# /etc/init/socket-test.py
description "upstart-socket-bridge test"
start on socket PROTO=inet PORT=8000 ADDR=
setuid your_username               # Does not need to run as root
exec /usr/bin/python3 /tmp/socket-server.py

And the final product looks like:


import http.server, socketserver, socket, os

class MyServer(http.server.HTTPServer):
    This class overrides two methods in the socketserver module:
        socketserver __init__ uses both server_bind() and server_activate()
    This class overrides one method in the http.server.HTTPServer class:
        HTTPServer uses both socketserver __init__ and it's own custom
    These overrides makes it compatible with Upstart and systemd socket-bridges
    Warning: It won't bind() or listen() to a socket anymore
    def server_bind(self):
        Get the File Descriptor from an Upstart-created environment variable
        instead of binding or listening to a socket.
        if "UPSTART_FDS" in os.environ:
            fd = int(os.environ["UPSTART_FDS"])
            fd = int(os.environ["LISTEN_FDS"])
        self.socket = socket.fromfd(fd, socket.AF_INET, socket.SOCK_STREAM)

        # From http.server:
        # http.server.CGIHTTPRequestHandler uses these.
        # Other handler classes don't use these.
        host, port = self.socket.getsockname()[:2]
        self.server_name = socket.getfqdn(host)
        self.server_port = port

    def server_activate(self):
        This socketserver method sends listen(), so it needs to be overridden

class MyHandler(http.server.BaseHTTPRequestHandler):
    A very simple custom handler.
    It merely reads the URL and responds with the path.
    This shows how you read data from a GET, and send a response. 
    def do_HEAD(self):
        self.send_header("Content-type", "text/html")
    def do_GET(self):
        self.send_header("Content-type", "text/html")
        content = ["The Title",
                   "This is a test.
                   "You accessed path: ", self.path, "

if __name__ == "__main__":
    if "UPSTART_FDS" in os.environ.keys() \     # Upstart
    or "LISTEN_FDS" in os.environ.keys():       # systemd
        httpd = MyServer(None, MyHandler)       # Use fd to get connection
        httpd.handle_request()                  # Handle once, then terminate
        server_address = ('', 8000)             # sysvinit, classic bind()
        httpd = http.server.HTTPServer(server_address, MyHandler)

Test the service:

  • The Python 3 script and Upstart job both use Port 8000.
  • Save the Python 3 script. Make it executable.
  • Save the Upstart job. Change the Upstart job to Port 8001. Make sure it points to the Python 3 script.

Webserver daemon using sysvinit - run forever:
  • Run the Python 3 script.
  • Start a web browser, and point it to http://localhost:8000/test/string
  • The browser should show a response
  • Kill the Python 3 script

Web service using upstart - run once:

  • Don't start the Python 3 script. Upstart will do it for you.
  • Start a web browser, and point it to http://localhost:8001/test/string
  • The browser should show a response.