[nolan@nprescott.com] $>  cat blog archive feed

Extricating Calendaring From Outlook

2017-03-12

Lately I've been working on simplifying some of the more annoying tools for mundane tasks at work, usually administrative. The most annoying to extricate myself from, by far, has been Outlook's web-mail and, to a lesser extent, Thunderbird. Here I detail my current solution to calendaring.

The Goal

My goal in approaching this problem was quite simple because my use was so limited to begin with: a quick look reveals I haven't needed to set up a meeting myself in at least 6 months. With that in mind, I set out to get a read-only view of my calendar within the tools I currently use most, namely org-mode [1].

A REST API

There purportedly exists a REST API for exactly this kind of thing. The problem is that I was entirely flummoxed by the requirements around authentication and authorization. The OAuth requirements are onerous enough to seemingly preclude the kind of scripted access I'm looking to implement: I would need to create and register an application to generate an app secret, then use that to request an access token. All for a calendar I've published openly!

The Published .ics File

Option two, since the calendar is openly available, is to parse the .ics file. This is nearly workable using ical2org.awk; the only downside is in parsing recurring meetings successfully. While that is workable in AWK, I took a look at re-implementing some of the functionality in Python using the icalendar package:

from icalendar import Calendar

with open('calendar.ics') as f:
    cal = Calendar.from_ical(f.read())

events = [(component.get('summary'),
           component.get('dtstart'),
           component.get('dtend'))
          for component in cal.walk() if component.name == 'VEVENT']

Which turned out to be a breeze to use, but it necessitates installing the package on my work machine in the system environment, or fiddling with packaging. I thought there must be a middle ground available through the published calendar and, eventually, I found one.
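
To make the recurring-meeting problem concrete, here is a minimal standard-library sketch (not what ical2org.awk or icalendar actually do) that pulls the raw properties out of each VEVENT. The sample calendar text and the names unfold and parse_events are made up for illustration. A single VEVENT carries an RRULE property, and turning that one rule into individual dated occurrences is the part this sketch leaves undone.

```python
# Invented sample data; a real published calendar will contain far more.
SAMPLE = """BEGIN:VCALENDAR
BEGIN:VEVENT
SUMMARY:Standup
DTSTART;TZID=Eastern Standard Time:20170227T113000
DTEND;TZID=Eastern Standard Time:20170227T114500
RRULE:FREQ=WEEKLY;BYDAY=MO,TU,WE,TH,FR
END:VEVENT
END:VCALENDAR
"""


def unfold(text):
    """Join folded lines: a continuation line starts with a space or tab."""
    lines = []
    for raw in text.splitlines():
        if raw[:1] in (' ', '\t') and lines:
            lines[-1] += raw[1:]
        else:
            lines.append(raw)
    return lines


def parse_events(text):
    """Return one dict of raw property values per VEVENT.

    Simplifications (assumptions of this sketch): property parameters
    such as ;TZID=... are discarded, quoted parameter values containing
    a colon are not handled, and nested components (e.g. VALARM) are
    skipped entirely.
    """
    events, current, depth = [], None, 0
    for line in unfold(text):
        if line == 'BEGIN:VEVENT':
            current, depth = {}, 0
        elif current is None:
            continue
        elif line.startswith('BEGIN:'):
            depth += 1
        elif line.startswith('END:') and depth:
            depth -= 1
        elif line == 'END:VEVENT':
            events.append(current)
            current = None
        elif depth == 0 and ':' in line:
            name, _, value = line.partition(':')
            current[name.split(';', 1)[0]] = value
    return events


events = parse_events(SAMPLE)
# The RRULE is returned as an opaque string; it is not expanded into dates.
print(events[0]['SUMMARY'], events[0]['RRULE'])
```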

Super-Hack to the Rescue

Viewing the publicly available calendar through the HTML/web interface is a painful thing: it can take over 50 seconds to load a single month-view of events for one person (me). It is actually confusing how poorly the interface works, but it reaffirms my desire to distance myself from some of these irritations. Using Firefox's developer tools it's easy to see the calendar view is a thin HTML page loaded with JavaScript, which parses well-formatted data received through an XHR. A POST is made to a URL like the following:

https://outlook.office365.com/owa/calendar/{public calendar id}/service.svc?action=FindItem&ID=-1&AC=1

With a JSON request body such as:

    {
        "__type": "FindItemJsonRequest:#Exchange",
        "Header": {
            "__type": "JsonRequestHeaders:#Exchange",
            "RequestServerVersion": "Exchange2013",
            "TimeZoneContext": {
                "__type": "TimeZoneContext:#Exchange",
                "TimeZoneDefinition": {
                    "__type": "TimeZoneDefinitionType:#Exchange",
                    "Id": "Eastern Standard Time"
                }}
        },
        "Body": {
            "__type": "FindItemRequest:#Exchange",
            "ItemShape": {
                "__type": "ItemResponseShape:#Exchange",
                "BaseShape": "IdOnly",
                "AdditionalProperties": [
                    {
                        "__type": "PropertyUri:#Exchange",
                        "FieldURI": "Start"
                    },
                    {
                        "__type": "PropertyUri:#Exchange",
                        "FieldURI": "End"
                    },
                    {
                        "__type": "PropertyUri:#Exchange",
                        "FieldURI": "Subject"
                    }]
            },
            "ParentFolderIds": [
                {
                    "__type": "FolderId:#Exchange",
                    "Id": "...",
                    "ChangeKey": "..."
                }],
            "Traversal": "Shallow",
            "Paging": {
                "__type": "CalendarPageView:#Exchange",
                "StartDate": "2017-02-26T00:00:00.001",
                "EndDate": "2017-04-02T00:00:00.000"
           ...

It turns out there are several interesting things about this approach. The first thing to notice is how the AdditionalProperties array seems to act like a kind of GraphQL field selection, so that the response carries only the requested data. I've omitted any information about "all-day events" or "charms" because frankly I don't care. The second thing to note is that StartDate and EndDate are configurable to request a specific range.
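
The StartDate and EndDate in the captured request look like the edges of a Sunday-first month grid for March 2017. Assuming that is how the web view picks its window (a guess from a single captured request), a sketch of computing the same range for any month might look like this. month_grid_range and paging_window are hypothetical helper names, and the .001 and .000 suffixes simply mimic the captured values.

```python
import calendar
from datetime import date, datetime, time, timedelta


def month_grid_range(year, month):
    """Return (start, end) dates of a Sunday-first grid covering the month.

    start is the Sunday on or before the 1st; end is the first Sunday
    after the last day of the month, treated as an exclusive bound.
    """
    first = date(year, month, 1)
    last = date(year, month, calendar.monthrange(year, month)[1])
    # date.weekday() has Monday == 0, so shift to make Sunday == 0.
    start = first - timedelta(days=(first.weekday() + 1) % 7)
    end = last + timedelta(days=7 - (last.weekday() + 1) % 7)
    return start, end


def paging_window(year, month):
    """Build StartDate/EndDate strings shaped like the captured request."""
    start, end = month_grid_range(year, month)
    fmt = '%Y-%m-%dT%H:%M:%S'
    return {
        'StartDate': datetime.combine(start, time.min).strftime(fmt) + '.001',
        'EndDate': datetime.combine(end, time.min).strftime(fmt) + '.000',
    }


print(paging_window(2017, 3))
```

For March 2017 this reproduces the two values shown in the request body above; whether the service accepts arbitrary windows of other sizes is not something the captured request confirms.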

Armed with a unique, publicly available calendar ID, the Action:FindItem header, and the appropriate POST body, the resulting data looks like:

{ ...
"Body": {
  "ResponseMessages": {
    "Items": [
      {
        "__type": "FindItemResponseMessage:#Exchange",
        "ResponseCode": "NoError",
        "ResponseClass": "Success",
        "HighlightTerms": null,
        "RootFolder": {
          "IncludesLastItemInRange": true,
          "TotalItemsInView": 44,
          "Groups": null,
          "Items": [
            {
              "__type": "CalendarItem:#Exchange",
              "ItemId": {
                "ChangeKey": "...",
                "Id": "..."
              },
              "ParentFolderId": {
                "Id": "...",
                "ChangeKey": "..."
              },
              "Subject": "Standup",
              "Sensitivity": "Normal",
              "Start": "2017-02-27T11:30:00-05:00",
              "End": "2017-02-27T11:45:00-05:00",
              "IsAllDayEvent": null,
              "FreeBusyType": "Busy",
              "CalendarItemType": "Exception",
              "Location": {
                "DisplayName": "Room A",
                ...
                }
              },
            ...
]}}]}}}

This has a nice benefit over the ICS file: each event is "de-normalized" in the sense that there is no need to interpret the semantics of recurring events. Each item in the Items array is a complete listing of the event. Though not particularly relevant to my use case, the above response body lacks one feature compared to the ICS file: the summary of the event taken from the invitation e-mail.
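
As a sketch of what a "complete listing" buys you, the following walks a trimmed-down copy of the response above and turns each item into a tuple with real datetime objects. The iter_events name is hypothetical, and the handling of a missing Location or a non-Success message is a guess about how the service might behave, not something the captured response confirms. It assumes Python 3.7 or later for datetime.fromisoformat.

```python
from datetime import datetime

# Trimmed copy of the response shown above (elided fields left out).
SAMPLE = {
    'Body': {'ResponseMessages': {'Items': [{
        'ResponseCode': 'NoError',
        'ResponseClass': 'Success',
        'RootFolder': {'Items': [{
            'Subject': 'Standup',
            'Start': '2017-02-27T11:30:00-05:00',
            'End': '2017-02-27T11:45:00-05:00',
            'CalendarItemType': 'Exception',
            'Location': {'DisplayName': 'Room A'},
        }]},
    }]}}
}


def iter_events(response):
    """Yield (subject, start, end, location) for every calendar item.

    Each occurrence of a recurring meeting already arrives as its own
    item, so no recurrence rule has to be interpreted here.
    """
    for message in response['Body']['ResponseMessages']['Items']:
        if message.get('ResponseClass') != 'Success':
            continue  # assumption: skip messages that report a failure
        for item in message['RootFolder']['Items']:
            # assumption: Location may be absent or null for some items
            location = (item.get('Location') or {}).get('DisplayName', '')
            yield (item['Subject'],
                   datetime.fromisoformat(item['Start']),
                   datetime.fromisoformat(item['End']),
                   location)


for subject, start, end, location in iter_events(SAMPLE):
    print(subject, start, end - start, location)
```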

A Simple Python Wrapper

Because I eschewed the icalendar package as being too much overhead to maintain, I thought it only fitting to use only the standard library to fetch and parse the calendar data.

And Org Agenda-izing It...

The final step to re-implement the most pressing functionality of ical2org.awk is a simple text re-formatting into a stand-alone org-mode file. That means a leading * for each event, followed by a lightly formatted timestamp. I accomplished the whole wrapper and formatter with the following script:

import re
import json
from datetime import datetime, timedelta
from urllib.error import HTTPError
from urllib.request import Request, urlopen

url = 'https://outlook.office365.com/owa/calendar/{calendar id}/service.svc?action=FindItem&ID=-1&AC=1'

headers = {'Content-Type': 'application/json',
           'Action': 'FindItem'}

def format_post_data(json_file='request.json'):
    with open(json_file) as f:
        request_json = json.load(f)

    today = datetime.now()
    date_range = timedelta(days=7)
    format = lambda t: datetime.strftime(t, "%Y-%m-%dT%H:%M:%S.%fZ")
    request_json['Body']['Paging']['StartDate'] = format(today - date_range)
    request_json['Body']['Paging']['EndDate'] = format(today + date_range)

    return json.dumps(request_json)

def fetch_calendar(url, headers, request_data):
    '''
    request_data - bytes
    headers - dict
    '''
    try:
        req = Request(url, data=request_data, headers=headers)
        response = urlopen(req).read()
    except HTTPError as e:
        # gross, but a quick way to debug API errors, dump (potentially HTML)
        # responses into the traceback
        e.msg += e.read().decode()
        raise

    raw_data = json.loads(response)

    events = ((_['Subject'], _['Location']['DisplayName'], _['Start'], _['End'])
              for _ in
              raw_data['Body']['ResponseMessages']['Items'][0]['RootFolder']['Items'])

    return events

def format_entry(subject, location, start, end):
    tidyup = lambda ts: (re.sub(r'(:\d{2}-\d{2}:\d{2}$)', '', ts)
                         .replace('T', ' '))
    start = tidyup(start)
    end = tidyup(end)
    return ('* {}\n'
            '  <{}>--<{}>\n'
            '  {}\n').format(subject, start, end, location)

def format_agenda(events):
    return('\n'.join(
        (format_entry(subject, location, start, end)
         for (subject, location, start, end) in events)))

request_data = format_post_data()
events = fetch_calendar(url, headers, request_data.encode())
print(format_agenda(events))
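
One caveat with the tidyup regular expression above: it only strips the seconds and UTC offset when the offset is negative (such as -05:00). A hedged alternative, assuming Python 3.7 or later and that the service always returns ISO 8601 timestamps with a numeric offset, is to let datetime do the parsing. org_timestamp is a hypothetical helper name, and the wall-clock time is kept exactly as the server reported it rather than converted to another zone.

```python
from datetime import datetime


def org_timestamp(iso_string):
    """Render an ISO 8601 timestamp as an org-mode active timestamp.

    Works for positive and negative UTC offsets alike. A trailing 'Z'
    is only accepted by fromisoformat from Python 3.11 onwards.
    """
    return datetime.fromisoformat(iso_string).strftime('<%Y-%m-%d %H:%M>')


def format_entry(subject, location, start, end):
    """Same output shape as the script above, without the regex."""
    return ('* {}\n'
            '  {}--{}\n'
            '  {}\n').format(subject,
                             org_timestamp(start),
                             org_timestamp(end),
                             location)


print(format_entry('Standup', 'Room A',
                   '2017-02-27T11:30:00-05:00',
                   '2017-02-27T11:45:00-05:00'))
```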

End Result

The end result is almost as uninteresting as you might imagine. I write it out to an agenda file that I don't typically access directly, instead invoking org-mode's "Week-agenda" to keep track of meetings to which I've been invited. This lets me track my own appointments locally, as I already do, and removes the need for Thunderbird or Outlook to track the rest.

    Week-agenda (W10):
    Monday      6 March 2017 W10
    Tuesday     7 March 2017
      agenda:     11:30...... Daily standup
    Wednesday   8 March 2017
      agenda:     11:30...... Daily standup
      agenda:     13:00...... 1:1
    Thursday    9 March 2017
      agenda:     11:00...... Open Enrollment Meeting
      agenda:     11:30...... Daily standup
      agenda:     12:00...... Tech Talk
      agenda:     14:00...... Open Enrollment Meeting
    Friday     10 March 2017
    Saturday   11 March 2017
    Sunday     12 March 2017

Addendum

Of course, there are downsides to relying on an internal API such as this, as became apparent the first day I tried actually using the above script. It seems the API is comically unreliable, throwing 500s on the majority of the requests made. I've no idea how Outlook manages to function when things are so unreliable, but I've got to wonder if it's not what contributes to the glacial load times. It looks like now might be as good a time as any to look into alternative approaches [2].
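
If one wanted to keep using the endpoint despite the 500s, one possible stopgap is to retry with exponential backoff. This is only a sketch: fetch_with_retry is a hypothetical helper, the attempt count and delays are arbitrary, and nothing here suggests retries would actually succeed against this service.

```python
import time
from urllib.error import HTTPError


def fetch_with_retry(fetch, attempts=5, base_delay=1.0, sleep=time.sleep):
    """Call fetch() until it succeeds or attempts run out.

    Only HTTP 5xx responses are retried; client errors (4xx) and the
    final failure are re-raised unchanged. The delay doubles after each
    failed attempt: base_delay, 2 * base_delay, 4 * base_delay, ...
    The sleep function is injectable so the logic can be exercised
    without actually waiting.
    """
    for attempt in range(attempts):
        try:
            return fetch()
        except HTTPError as e:
            if e.code < 500 or attempt == attempts - 1:
                raise
            sleep(base_delay * 2 ** attempt)
```

With the script above, this could wrap the existing call as fetch_with_retry(lambda: fetch_calendar(url, headers, request_data.encode())).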


  1. It occurred to me, after doing the above work, that I might instead extend Excorporate to get an entire week's calendar and save it locally. An idea for another time perhaps.
  2. But not the REST API; there is nothing appealing about the thought of using that particular interface.