Friday, August 7, 2009

Simple webhooks with Clojure and Ring

When your data moves into cloud applications and collaboration is the rule, webhooks provide a way to extend what your cloud provider gives you.

Webhooks have been touted as the basis for what Anil Dash calls the "Pushbutton Web," which enables large-scale real-time collaboration between applications and humans. The concept behind webhooks is simple (as explained in detail by Jeff Lindsay in this video) and boils down to this: it's easy to build a lightweight HTTP server nowadays, so why not let cloud-based applications interact through simple RESTful HTTP requests that carry a payload of information of interest?

In this article, I will show a simple webhook processor written in Clojure using Mark McGranaghan's Ring, a lightweight HTTP server framework based on the ideas in Rack. This webhook was written as part of the system for maintaining the clojure-contrib documentation website.

Clojure-contrib is a collaborative project in which contributors check in new code on their own schedule. I built the "autodoc robot" to build a corresponding documentation site that reflects the latest state of clojure-contrib. The result of that process is seen here.

Using Webhooks to react to Commits on GitHub

GitHub provides commit hooks which allow me to be notified via HTTP when someone updates clojure-contrib. So I wrote a simple processor in Clojure to receive these notifications and to update the documentation web site in response.

The issues:
  1. Not all calls to our webhook are legitimate. Whenever you open an HTTP port, bad guys will come sniffing around. We need to filter those out.
  2. In some cases, we want to ignore even legitimate requests. For example, the documentation for clojure-contrib is in the same project (on a different branch) as the source code. If we built documentation in response to all updates, we'd find ourselves in an infinite loop.
  3. Processing must be serialized. It takes about 3 minutes to build the documentation on my server. In that time, more updates can come in. If we processed them immediately, the two processes would stomp on each other since they work on the same file space.

To address these issues (especially issue 3), I used an architecture that separates the HTTP request/response handling from the actual hook processing. In between, there is a queue that enforces single threading: the request handler puts each payload on the queue, and a single consumer takes payloads off and processes them one at a time.
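
As a rough sketch of this arrangement (not the code used in this article, which relies on fill-queue), the same separation can be written directly against java.util.concurrent. The names work-queue, enqueue! and start-worker! are hypothetical, and the sketch assumes that the handler function does not throw, since an exception would end the worker thread.

```clojure
(import '(java.util.concurrent LinkedBlockingQueue))

;; All names here are hypothetical; this is not the article's code.
(def work-queue (LinkedBlockingQueue. 10))   ; bounded: at most 10 pending payloads

(defn enqueue!
  "Producer side, called from a request thread. Blocks only while the queue is full."
  [payload]
  (.put work-queue payload))

(defn start-worker!
  "Consumer side: one daemon thread that passes payloads to handle, one at a
  time and in arrival order."
  [handle]
  (doto (Thread. (fn []
                   (loop []
                     (handle (.take work-queue))   ; blocks until a payload arrives
                     (recur))))
    (.setDaemon true)
    (.start)))
```

Because only one thread ever calls the handler, a slow job such as a multi-minute documentation build simply delays the jobs queued behind it instead of overlapping with them.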



We can combine Ring with this queuing model using the fill-queue function from clojure.contrib.seq-utils. fill-queue allows us to take this kind of multi-threaded data passing behavior and recast it as operations on a Clojure seq, greatly simplifying the look of our code.

The main loop of our application looks like this:

(defn hook-server
  "Build a simple webhook server on the specified port. Invokes ring to fill a blocking queue,
  whose elements are processed by handle-payload."
  [port]
  (doseq [payload (fill-queue (fn [fill]
                                (ring.jetty/run {:port port} (partial app fill)))
                              :queue-size 10)]
    (handle-payload payload)))

So the main loop of our program is simply a doseq over the sequence generated by incoming requests. fill-queue will create a new thread on which to run the producer function. This is important to prevent the request handler from blocking while we are doing processing.

Unlike a normal doseq, this doseq will block whenever the queue is empty, that is, while the Jetty server is waiting for a new request.

The first argument to the fill-queue function is the function that will produce the data. It, in turn, is supplied with another function to call when it has data (the argument fill to the lambda). We'll discuss that more below.
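
To make the producer/fill relationship concrete, here is a simplified stand-in for the idea, built only on java.util.concurrent. The name queue-seq is hypothetical and this is not the library's implementation: the resulting seq never ends, errors in the producer are not propagated, and nil values cannot be queued.

```clojure
(import '(java.util.concurrent LinkedBlockingQueue))

(defn queue-seq
  "Hypothetical, simplified stand-in for the fill-queue idea: runs
  (producer fill) on a new thread and returns a lazy seq of the values
  that the producer passes to fill."
  [producer queue-size]
  (let [q    (LinkedBlockingQueue. (int queue-size))
        fill (fn [x] (.put q x))                       ; blocks the producer when full
        step (fn step [] (lazy-seq (cons (.take q) (step))))]
    (doto (Thread. #(producer fill))
      (.setDaemon true)
      (.start))
    (step)))

;; The producer calls fill five times; the consumer reads the first three.
(take 3 (queue-seq (fn [fill] (doseq [i (range 5)] (fill i))) 10))
;; => (0 1 2)
```

Each step of the lazy seq blocks in .take until the producer supplies a value, which is why a doseq over such a seq waits for incoming data rather than finishing.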

As requests come in, they are passed to the handle-payload function for processing.

Handling incoming requests


Ring wants to pass all requests to an app function that handles the request and returns the appropriate HTTP response. All the data is formatted as Clojure maps and vectors at this point.

(defn app
  "The function invoked by ring to process a single request, req. It does a check to make
  sure that it's really a webhook request (post to the right address) and, if so, calls fill
  with the parsed JSON payload (this will queue up the request for later processing).
  Then it returns the appropriate status and header info to be sent back to the client."
  [fill req]
  ;; Debug logging of the raw request.
  (print-date)
  (pprint req myout)
  (cl-format myout "~%")
  (if (and (= (:scheme req) :http)
           (= (:request-method req) :post)
           (nil? (:query-string req))
           (= (:content-type req) "application/x-www-form-urlencoded")
           (= (:uri req) "/github-post"))
    ;; TODO: respond correctly to the client when an exception is thrown
    (do (fill (json/decode-from-str (:payload (parse-params (slurp* (:body req)) #"&"))))
        {:status 200
         :headers {"Content-Type" "text/html"}})
    {:status 404
     :headers {"Content-Type" "text/html"}}))

Here we see that after some debug logging, we simply check that various fields of the request have the "correct" values to be a real request for our webhook. In practice, this check (especially the "secret" URI) filters out all the attack attempts.

A GitHub webhook call is sent as an HTTP POST message with a parameter called "payload" which contains a JSON-encoded description of the update that occurred (the project, branch, files, etc.).

Ring passes the body of the request wrapped in an InputStream. We process that with slurp* from clojure.contrib.duck-streams to get all the data, then parse the parameters to get the payload parameter. Finally, we use Dan Larkin's clojure-json JSON decoder to decode the JSON into Clojure data structures.

Ring lacks much of what would be standard in a web framework designed for building full web sites. Mostly, this is a good thing for applications like this one. One thing that I need here that Ring doesn't have is the ability to parse out the POST parameters. For this, I stole a parse-params function from the Compojure framework.
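
For readers who want to see what this parsing step involves, here is a minimal sketch of decoding a form-encoded body. It is not Compojure's parse-params; the name parse-form-params is hypothetical, and the sketch assumes UTF-8 encoding and keeps only the last value when a key is repeated.

```clojure
(require '[clojure.string :as str])
(import '(java.net URLDecoder))

(defn parse-form-params
  "Hypothetical helper, not Compojure's parse-params: decode a body such as
  \"a=1&b=two+words\" into a map with keyword keys."
  [body]
  (into {}
        (for [pair (str/split body #"&")
              :when (seq pair)                          ; skip empty segments
              :let [[k v] (str/split pair #"=" 2)]]     ; split on the first = only
          [(keyword (URLDecoder/decode k "UTF-8"))
           (URLDecoder/decode (or v "") "UTF-8")])))

(parse-form-params "payload=%7B%22ref%22%3A%22refs%2Fheads%2Fmaster%22%7D")
;; => {:payload "{\"ref\":\"refs/heads/master\"}"}
```

The value under :payload is still a JSON string at this point; decoding it into Clojure data is a separate step.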

Once I have the payload, I pass it to the fill function I was given by fill-queue. Note how I can use partial to slide that parameter through Ring and into the app function without Ring having to know about it at all. This can be a powerful technique for easily combining unrelated libraries without doing handstands.
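
The partial trick can be shown in isolation. In this small sketch, demo-app, seen and handler are hypothetical names, and the callback simply records what it is given instead of queuing it.

```clojure
(defn demo-app
  "Stand-in handler: pass the request's :uri to fill and return a response map."
  [fill req]
  (fill (:uri req))
  {:status 200})

(def seen (atom []))

;; partial fixes the first argument, leaving a one-argument handler that a
;; library can call without knowing that fill exists.
(def handler (partial demo-app #(swap! seen conj %)))

(handler {:uri "/github-post"})   ; returns {:status 200}
@seen                             ; now ["/github-post"]
```

The library only ever sees a function of one request argument, so nothing about the queue leaks into its interface.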

Processing the payloads

Incoming payloads are processed sequentially as they get to the head of the queue created by fill-queue. The processing function, handle-payload, is quite simple:

(defn handle-payload
  "Called when a request is dequeued with the parsed json payload. Sees if the
  request matches anything in the action-table and, if so, executes the associated shell
  command."
  [payload]
  (pprint payload myout)
  (when-let [params (match-table payload)]
    (cl-format myout "~a~%" (apply sh (concat (:cmd params) [:dir (:dir params)])))))

We look at the payload using match-table and, if match-table returns information, we use that information to execute a shell command. In the case of clojure-contrib, this command runs the documentation builder that will launch the ant job that builds the documentation and pushes it back to GitHub. The code that does this is online at contrib-autodoc.

The model for matching the payload structure is very simple:

(def action-table
  [[[:repository :url] "http://github.com/richhickey/clojure-contrib"
    [:ref] "refs/heads/master"
    {:cmd ["ant"] :dir "/home/tom/src/clj/contrib-autodoc"}]
   [[:repository :url] "http://github.com/tomfaulhaber/hook-test"
    [:ref] "refs/heads/master"
    {:cmd ["echo" "got here"] :dir "/home/tom/src/clj/contrib-autodoc"}]])


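(defn match-elem
  "Determine whether a given request, m, matches the action table element, elem.
  Returns the element's trailing params map on a match, otherwise nil."
  [m elem]
  (loop [elem elem]
    (let [ks (first elem)
          rem (next elem)]
      (if (nil? rem)
        ks   ; only the params map is left, so every pair matched
        ;; get-in takes the key path as a single vector argument
        (when (= (get-in m ks) (first rem))
          (recur (next rem)))))))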

(defn match-table
  "Match a request, m, against the action-table"
  [m]
  (some #(match-elem m %) action-table))

Each payload is parsed from JSON into hierarchical maps. Each entry in the action table is a sequence of alternating key vectors and expected values, ending with a map of parameters. Each key vector is used as a hierarchical key into the payload using get-in, and the value retrieved is compared with the expected value that follows it. If the two are equal, that pair matches. If all the pairs in an entry match, then the entry matches and its parameter map is returned. This is a simplistic type of matching, but it is sufficient for our purposes.
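
The pairwise comparison can also be tried on its own. The sketch below uses partition and every? rather than the loop shown above, so it is an alternative formulation of the same check, not the article's code. The names matches? and sample-payload are hypothetical, the payload is heavily trimmed, and the gh-pages branch name is only an assumed example of a non-matching ref.

```clojure
(defn matches?
  "True when every [key-path expected] pair in conditions holds for payload."
  [payload conditions]
  (every? (fn [[ks expected]] (= (get-in payload ks) expected))
          (partition 2 conditions)))

(def sample-payload   ; hypothetical and heavily trimmed
  {:ref "refs/heads/master"
   :repository {:url "http://github.com/tomfaulhaber/hook-test"}})

(matches? sample-payload
          [[:repository :url] "http://github.com/tomfaulhaber/hook-test"
           [:ref] "refs/heads/master"])
;; => true

;; A push to another branch (assumed name) fails the [:ref] condition.
(matches? (assoc sample-payload :ref "refs/heads/gh-pages")
          [[:ref] "refs/heads/master"])
;; => false
```

Matching on [:ref] is what lets a hook ignore pushes to branches it does not care about, such as a branch that holds generated documentation.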

Conclusion

Webhooks provide a powerful way to interconnect and extend cloud-based applications. Building webhook-based services in Clojure can be done with a simple and elegant composition of work that already exists.

One important thing to remember when using webhooks (from any language) is that the webhook mechanism provides no real security. Therefore the webhook request should be used as a hint about what work should be done rather than a reliable source of information. Hopefully, future work on Web infrastructure will address this shortcoming in an otherwise powerful model.

The source code presented here lives on GitHub at github-hook.

Correction: Ring is based on Rack, not Sinatra. I have corrected the text above.