That's great...until you try it.
The United States is big, even though the number of trains is fairly small. The number of Google map tiles is big. The polygons that show the routes are enormous. The nationwide train data is big.
That's a lot of data to load, and slow to display, if all I really want is realtime data on a single train (is my train late?).
After looking at the GETs emitted by all the (slow, unnecessary) javascript on that site, I figured out how to bypass it and pull just the table of current train status. It's 400kb, but that's far less than the 1MB-plus it takes to go through the website and map.
https://www.googleapis.com/mapsengine/v1/tables/01382379791355219452-08584582962951999356/features?version=published&key=AIzaSyCVFeFQrtk-ywrUE0pEcvlwgCqS6TJcOW4&maxResults=250&callback=jQuery19106722136430546803_1388112103417&dataType=jsonp&jsonpCallback=retrieveTrainsData&contentType=application%2Fjson&_=1388112103419
The only fields required are 'version' and 'key' (which is different from the site cookie):
https://www.googleapis.com/mapsengine/v1/tables/01382379791355219452-08584582962951999356/features?version=published&key=AIzaSyCVFeFQrtk-ywrUE0pEcvlwgCqS6TJcOW4
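Rather than pasting the long URL around, the minimal form can be assembled from its parts. This is only a sketch: the table ID and key are copied from the URLs above, while the helper name `build_features_url` and the constant names are made up for illustration.

```python
from urllib.parse import urlencode

# Values copied from the URLs above; the names themselves are illustrative.
BASE = "https://www.googleapis.com/mapsengine/v1/tables/{table_id}/features"
TRAIN_TABLE = "01382379791355219452-08584582962951999356"
API_KEY = "AIzaSyCVFeFQrtk-ywrUE0pEcvlwgCqS6TJcOW4"


def build_features_url(table_id, key, **extra):
    """Build a features URL with the two required fields plus any extras."""
    params = {"version": "published", "key": key}
    params.update(extra)  # optional fields such as maxResults
    return BASE.format(table_id=table_id) + "?" + urlencode(params)


train_url = build_features_url(TRAIN_TABLE, API_KEY)
print(train_url)
```

Optional fields can be passed as keyword arguments, for example `build_features_url(TRAIN_TABLE, API_KEY, maxResults=250)`.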
So we can easily download the current status table using wget:
$ wget -O train_status https://www.googleapis.com/mapsengine/v1/tables/01382379791355219452-08584582962951999356/features?version=published\&key=AIzaSyCVFeFQrtk-ywrUE0pEcvlwgCqS6TJcOW4\&jsonpCallback=retrieveTrainsData
Since parsing 400kb of JSON using shell script will be awkward and annoying, let's change over to Python3:
#!/usr/bin/python3
import httplib2

# Adjacent string literals are joined, so the URL contains no line breaks
url = ("https://www.googleapis.com/mapsengine/v1/tables/"
       "01382379791355219452-08584582962951999356/features?version=published&"
       "key=AIzaSyCVFeFQrtk-ywrUE0pEcvlwgCqS6TJcOW4&jsonpCallback=retrieveTrainsData")
h = httplib2.Http(None)
resp, content = h.request(url, "GET")
Let's tell Python that the content is a JSON string, and iterate through the list of trains looking for train number 7. That's a handy train because each run takes over 24 hours - it will always have a status.
import json

all_trains = json.loads(content.decode("UTF-8"))
for train in all_trains["features"]:
    if train["properties"]["TrainNum"] == "7":
        pass  # do something with this train
Once we've located the train we care about, we can pull data about it:
# GeoJSON orders coordinates as [longitude, latitude]
longitude = train["geometry"]["coordinates"][0]
latitude = train["geometry"]["coordinates"][1]
speed = train["properties"]["Velocity"]
report_time = train["properties"]["LastValTS"]
next_station = train["properties"]["EventCode"]
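The lookup and the field extraction can be wrapped in two small helpers. This is a sketch, not the feed's documented schema: the field names are the ones used above, the helper names are made up, and the sample feature holds invented placeholder values so the snippet runs without a network request.

```python
def find_trains(all_trains, train_num):
    """Return every feature whose TrainNum matches; several runs may be active."""
    return [
        feature
        for feature in all_trains.get("features", [])
        if feature.get("properties", {}).get("TrainNum") == train_num
    ]


def summarize(train):
    """Pick out the handful of fields discussed above; missing ones become None."""
    props = train["properties"]
    return {
        "coordinates": train["geometry"]["coordinates"],
        "speed": props.get("Velocity"),
        "report_time": props.get("LastValTS"),
        "next_station": props.get("EventCode"),
    }


# Invented sample data, shaped like the feed described above.
sample = {
    "features": [
        {"geometry": {"coordinates": [0.0, 0.0]},
         "properties": {"TrainNum": "7", "Velocity": "50", "EventCode": "FAR"}},
        {"geometry": {"coordinates": [1.0, 1.0]},
         "properties": {"TrainNum": "8"}},
    ]
}

for train in find_trains(sample, "7"):
    print(summarize(train))
```

Returning a list instead of the first match matters for long-haul trains, since more than one run with the same number can be on the road at once.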
Let's find a station (FAR) along the route. This is a little tougher, because each Station property arrives as a JSON-encoded string, so Python's json module returns it as a string instead of a dict. We need to identify the Station properties and feed them through the JSON parser (again).
for prop in train["properties"].keys():
    if "Station" in prop:
        sta = json.loads(train["properties"][prop])
        if sta["code"] == "FAR":
            # .get() returns None when a station lacks the field
            # Schedule data
            station_time_zone = sta.get('tz')
            scheduled_arrival_time = sta.get('scharr')    # Only if a long stop
            scheduled_departure_time = sta.get('schdep')  # All stations
            # Past stations
            actual_arrival_time = sta.get('postarr')
            actual_departure_time = sta.get('postdep')
            past_station_status = sta.get('postcmnt')
            # Future stations
            estimated_arrival_time = sta.get('estarr')
            future_station_status = sta.get('estarrcmnt')
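The double-parsing step is easy to isolate in a helper that skips anything that is not a JSON-encoded station. This is a sketch under assumptions: the property names `Station1`, `Station2`, `Station3` and the sample values are invented, and only the "key contains Station" rule and the `code` and `schdep` fields come from the text above.

```python
import json


def iter_stations(properties):
    """Yield each Station property decoded into a dict, skipping anything else."""
    for name, value in properties.items():
        if "Station" not in name or not isinstance(value, str):
            continue
        try:
            station = json.loads(value)
        except ValueError:  # json.JSONDecodeError is a subclass of ValueError
            continue
        if isinstance(station, dict):
            yield station


def find_station(properties, code):
    """Return the decoded station with the given code, or None if absent."""
    for station in iter_stations(properties):
        if station.get("code") == code:
            return station
    return None


# Invented sample properties; the real feed's key names may differ.
props = {
    "TrainNum": "7",
    "Station1": json.dumps({"code": "MSP", "schdep": "12/25/2013 23:10:00"}),
    "Station2": json.dumps({"code": "FAR", "schdep": "12/26/2013 03:35:00"}),
    "Station3": None,
}

far = find_station(props, "FAR")
print(far["schdep"] if far else "station not on this route")
```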
The final Python script for Train 7 at FAR is:
#!/usr/bin/python3
import httplib2
import json

url = "https://www.googleapis.com/mapsengine/v1/tables/01382379791355219452-08584582962951999356/features?version=published&key=AIzaSyCVFeFQrtk-ywrUE0pEcvlwgCqS6TJcOW4&jsonpCallback=retrieveTrainsData"
h = httplib2.Http(None)
resp, content = h.request(url, "GET")
all_trains = json.loads(content.decode("UTF-8"))
for train in all_trains["features"]:
    if train["properties"]["TrainNum"] == "7":
        for prop in train["properties"].keys():
            if "Station" in prop:
                # Each Station property is itself a JSON-encoded string
                sta = json.loads(train["properties"][prop])
                if sta["code"] == "FAR":
                    print('Train number {}'.format(train["properties"]["TrainNum"]))
                    if 'schdep' in sta.keys():
                        print('Scheduled: {}'.format(sta['schdep']))
                    if 'postcmnt' in sta.keys():
                        print("Status: {}".format(sta['postcmnt']))
                    if 'estarrcmnt' in sta.keys():
                        print("Status: {}".format(sta['estarrcmnt']))
And the result looks rather like:
$ python3 Current\ train\ status.py
Train number 7
Scheduled: 12/26/2013 03:35:00
Status: 1 HR 4 MI LATE
Train number 7
Scheduled: 12/27/2013 03:35:00
Two more items.
First, the Python script can be tweaked to show *all* trains for a station. Just remove the train number filter. To be useful, you might add some date handling, since some scheduled times can be a day in the future or past.
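One way to handle the dates is to parse the scheduled time and keep only entries near the present. This is a rough sketch: the format string is inferred from the sample output above ("12/26/2013 03:35:00"), the 12-hour window and the helper names are arbitrary choices, and time zones are ignored even though the feed carries a `tz` field per station.

```python
from datetime import datetime, timedelta

# Inferred from the "Scheduled:" lines in the sample output; an assumption.
TIME_FORMAT = "%m/%d/%Y %H:%M:%S"


def parse_time(text):
    """Parse a feed timestamp into a naive datetime (no time zone applied)."""
    return datetime.strptime(text, TIME_FORMAT)


def within_window(scheduled_text, now, hours=12):
    """True if the scheduled time falls within +/- `hours` of `now`."""
    return abs(parse_time(scheduled_text) - now) <= timedelta(hours=hours)


# A fixed "now" keeps the example reproducible; use datetime.now() in practice.
now = datetime(2013, 12, 26, 4, 0, 0)
times = ["12/26/2013 03:35:00", "12/27/2013 03:35:00"]
nearby = [t for t in times if within_window(t, now)]
print(nearby)
```

With the two scheduled times from the sample output, only the 12/26 run survives the filter, which is the one a person standing on the platform cares about.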
Second, the station information table has a similar static page: https://www.googleapis.com/mapsengine/v1/tables/01382379791355219452-17620014524089761219/features?&version=published&maxResults=1000&key=AIzaSyCVFeFQrtk-ywrUE0pEcvlwgCqS6TJcOW4&callback=jQuery19106722136430546803_1388112103417&dataType=jsonp&jsonpCallback=retrieveStationData&contentType=application%2Fjson&_=13881121034
As with the train status URL, many of the fields are dummies. This shorter URL works, too: https://www.googleapis.com/mapsengine/v1/tables/01382379791355219452-17620014524089761219/features?&version=published&key=AIzaSyCVFeFQrtk-ywrUE0pEcvlwgCqS6TJcOW4&callback=jQuery19106722136430546803_1388112103417
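A caveat when scripting against URLs that keep a `callback` field: the response may come back wrapped in a JavaScript function call (JSONP) rather than as bare JSON, in which case `json.loads` will choke on it. Whether this endpoint does so has not been checked here, so the following is a defensive sketch with a made-up helper name, not a description of the API's behavior.

```python
import json


def loads_maybe_jsonp(text):
    """Parse plain JSON, or JSON wrapped in a single callback(...) call."""
    text = text.strip()
    try:
        return json.loads(text)
    except ValueError:
        pass  # not bare JSON; fall through and try to unwrap a callback
    start = text.find("(")
    end = text.rfind(")")
    if start == -1 or end <= start:
        raise ValueError("response is neither JSON nor JSONP")
    return json.loads(text[start + 1:end])


# Invented payloads for illustration.
print(loads_maybe_jsonp('{"features": []}'))
print(loads_maybe_jsonp('retrieveStationData({"features": []});'))
```

Dropping the `callback` field from the URL, where the server allows it, avoids the question entirely.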
Thanks, this is awesome.
I did a GET on the URL you mention and got back a "login required" message:
curl 'https://www.googleapis.com/mapsengine/v1/tables/
01382379791355219452-08584582962951999356/features?version=published&
key=AIzaSyCVFeFQrtk-ywrUE0pEcvlwgCqS6TJcOW4&jsonpCallback=retrieveTrainsData'
{
"error": {
"errors": [
{
"domain": "global",
"reason": "required",
"message": "Login Required",
"locationType": "header",
"location": "Authorization"
}
],
"code": 401,
"message": "Login Required"
}
}
Well, I can't seem to duplicate your issue. Both curl and wget return lots of great data.
However, I *did* try the Amtrak website in a browser first to see if perhaps the URL had changed (it hadn't).
It's possible that Amtrak is tracking IPs and sessions, and blocking requests from IPs that lack an associated session.
Do play around with it a bit and let us know what you figure out!