ScalaSyd: Episode 9

We had a great meeting at ScalaSyd last Wednesday night, probably one of the best I’ve been to. These are my notes from the evening, which are in no way comprehensive – they are just the points I found most interesting.

Jed (@jedws) warned us at the start that there were two “pointy” talks with some soft stuff in the middle, but I found the two pointy talkers did an excellent job of conveying their pointy subjects to neophytes. Not an easy task, so well done, guys.

One-hole contexts or: how I learned to stop worrying and love the Zipper

Declan Conlon (@possiblywrong) kicked off by introducing us to Zipper data structures.

Early in his talk he said “ignore math at your peril”, suggesting that software engineering is about to be transformed by a wave of mainstream-permeating maths concepts.

A simple diagram showing a comparison between the internals of a List, an Iterator and a Zipper

My attempt at drawing a Zipper, in contrast to a List and an Iterator, using a very fat finger on an iPad at the back of a bus

Zippers are often described as “a data structure with a hole”.

They solve the problem of updating immutable data structures (e.g. lists and trees) efficiently. (Wikipedia seems to suggest that the efficiency is achieved when the updates are close to each other within the structure of the underlying data type)

He further described zippers as “a reification of a walk” and (thankfully for me) defined reification as pulling an idea or a process out of function/procedure land and into a type. So a zipper is a type representing your walk along/around a data structure. (I think a slightly more accurate description would be to say it represents one waypoint in your walk through a data structure.) It’s immutable, so as you take steps through the data structure you’ll get a new zipper instance with each step that represents the new waypoint.

The key benefit of the zipper is the ability, once you’ve navigated to a certain point, to update the value at the focus in O(1) time – it just efficiently produces a new zipper with a different focus. You can then “zip” the zipper up to produce a new, updated data structure.

The three parts of data that form a zipper are: path taken, focus, and remaining data. The path is a list of the steps taken to navigate to the current focus. In a zipper of a list, this may be as simple as a list of the items already iterated over. In the case of something more complex like a tree it will need to carry more information than just the items, such as which direction (left or right) was taken down the tree to arrive at this focus. You need this information in order to zip the zipper back up after the updates.

Declan then started talking about recent mathematical research on zippers and how they are essentially “product types” and as such you can take the derivative of them which gives you, in the case of a BTree, just two trees (the path and the remaining data) which leads to the name “data structure with a hole” and is somehow interesting if you have a better grasp of this area of maths than I do, but I would be lying to say I was following along at this point!

Some discussion ensued around how useful Product Types are. (But I don’t know what they are, so that bit was totally lost on me. That’s alright, though; part of the reason I go to tech Meetups is to find out about all the stuff that I should probably know about but don’t.)

I asked for a practical example of a problem that is well-solved by zippers and someone said if you represented some JSON as an immutable tree then a zipper could make modifying individual values within the data very efficient.

Some discussion on lenses followed. I think the consensus was that zippers are similar to lenses, but zippers are more general.

Additional resources:
Declan’s slides and code

Deployment of play/scala/akka apps or: what we have recently been doing at cloudbees to make it funnerer

Michael Neal (@michaelneale) gave a very short talk about some of the stuff they’re doing at CloudBees.

I didn’t note anything down from his talk (it was mostly screenshots and I think I’ve heard him speak about it before) but I finally got the name of this funky HTML presentation app that everyone is using: RevealJS

After Michael’s talk, we ended up discussing the slowness and memory demands of the Scalac compiler when Eric Torreborre (@etorreborre) said he tried CloudBees for building his Specs2 library and it failed to compile due to lack of memory.

James Roper from TypeSafe shared with us that there are people at TypeSafe who are working exclusively on improving the compiler performance.

Somebody said they replaced their HD with a large SSD and this sped up their compile time by more than 50%. (Perhaps TypeSafe should just buy every Scala developer a SSD? It could be cheaper than employing genius engineers to fix the compiler!?)

Jed said that his team had fiercely modularised their code base and that had sped up compilation a lot.

Reactive streams – Play’s Iteratee API

James Roper (@jroper), who works for TypeSafe on the Play framework, gave a talk about Iteratees, stating that he’s only just learnt about them himself and has only a year of Scala experience, so he hoped the presentation would translate to most people regardless of their functional programming exposure. (And it did!)

He said Iteratees is a concept with many implementations. (I’m going to go out on a limb and call it a pattern, even though its not in the GOF patterns book!)

He said it’s a functional solution to handling streams reactively. (By functional I assume he meant purely functional, i.e. without mutating state, not just the idea of passing functions around as values.)

He defined reactive as being the pattern of reacting to an event as it arrives, rather than asking for an event. (I consider this a micro-application of Inversion of Control, e.g. you don’t call the stream, the stream calls you.)

He went on to say that the reactivity in Iteratees flows in both directions: the Iteratee reacts to events from the stream, but the stream also reacts to the result of event handling returned from the Iteratee as it processes the stream’s events. (I think the main benefit of this was that the Iteratee has the ability to say “don’t call me anymore!” to the input driver.)

There was a lot of code shown and most of it was clear and easy to follow, but I was too busy parsing it in my head to write notes about it. I think you can see all the code in his original blog post, referenced below.

A bunch of tangled cables patching ports on a sound desk

Iteratees seem like a good idea, but this is my current mental picture of how Streams, Iteratees and Enumeratees are combined

James introduced Enumeratees at some point and I didn’t quite catch what their role is or how they work.

He showed an example of building up a CSV parser using simple Iteratees for things like separator, unquoted value, quoted value, etc. and then basically concatenating them using Scala’s for comprehension to make a full file parser. I got the impression from this example that the role of the Enumeratee was as somewhat of a collator. (I found this blog post about Iteratees later that described Enumeratees as Producers and Iteratees as Consumers, but I’m pretty sure from James’ talk that the stream had the “producer” role and the Enumeratee was doing something at the other end.)

Asked for an example of how Iteratees are used in Play, James said if you wanted to do your own parsing of multipart request data you would be required to provide an Iteratee, and also mentioned something about calling external services.

During discussion, someone asked what the benefits of Iteratees were and the answer provided was that, as well as the possibility of asynchronous execution, they’re good for when you need to control the flow of the stream, e.g. you want to stop or maybe even pause halfway.

Several people shared the view that Iteratees are known to be an excellent solution for controlling audio streaming and processing.

There was some discussion about how Play’s Iteratees and Futures play together, especially with regard to how errors bubble up, but I didn’t catch the upshot.

James’ example ended with the code below that demonstrates Iteratee and Enumeratee operators provided by Scalaz, including “the fish operator”. In my opinion, this kind of code conveys no meaning about what’s actually occurring and makes a convincing argument for being extremely conservative in operator definition.

val processCsvFile = csv ><> toInts ><> toThreeTuple &>> sumThreeTuple()

Additional resources:
Iteratees for Imperative Programmers (James’ blog post that the talk was based on)
Understanding Play2 Iteratees for Normal Humans
Some in-depth research into the benefits and use of Iteratees

Other Things I Learnt (unrelated to the main topics):

Declan used what looked like an unapply call in the LHS of an assignment of a case class, something I’ve never seen before except with tuples. So, I knew this was possible:

scala> val tuple = ("one", "two", "three")
tuple: (java.lang.String, java.lang.String, java.lang.String) = (one,two,three)

scala> val (a, b, c) = tuple
a: java.lang.String = one
b: java.lang.String = two
c: java.lang.String = three

but I never realised you could do this with anything that defines an Extractor (i.e. an unapply() method) like this:

scala> val some = Some("Graham")
some: Some[[java.lang.String]] = Some(Graham)

scala> val Some(name) = some
name: java.lang.String = Graham

and I saw you can even use wildcards if you only care about some of it!

scala> val names = List("Graham", "Lea")
names: List[[java.lang.String]] = List(Graham, Lea)

scala> val List(_, surname) = names
surname: java.lang.String = Lea

I also hadn’t seen this syntax below that James used – case variableName @ … in a match expression – which creates a name for the whole matched value that has the (more-specific) type of the match case rather than just that of the input value (note that I can’t pass ‘fruitOption’ to printASome(), but I can pass ‘some’):

scala> def printASome[A](some: Some[A]): Unit = println(some)
printASome: [A](some: Some[A])Unit

scala> val fruitOption: Option[String] = Some("banana")
fruitOption: Option[String] = Some(banana)

scala> fruitOption match {
     | case some @ Some(word) => println(word); printASome(some);
     | case _ =>
     | }
banana
Some(banana)

The Future/Promise implementation coming in Scala 2.10 has Success and Failure result possibilities (in contrast to Java’s Futures, which either succeed or throw an ExecutionException.)

Things I Need to Research

What’s the difference between a Function and a PartialFunction?

What are Product Types?

What are Lenses? (A type of anamorphism)

Image Credits:
Galapogas Tortoise at Chester Zoo by Adam Foster
Tangle of Cables by Charles McCartney

3 thoughts on “ScalaSyd Wrap-Up: November 2012”

Charles O'Farrell on November 18, 2012 at 9:37 am said:

Hi Graham,

Great write up! Some extra links for you:

Jed mentioned Iteratees and audio; he was referring to this great article:

http://blog.greweb.fr/2012/08/zound-a-playframework-2-audio-streaming-experiment-using-iteratees/

The article from Josh is also very good. I think you’ll find that he called “Enumerators” Producers and “Enumeratees” StreamConversion, which is also how James was using them. It is unfortunate that they sound/look almost the same.

If you’re interested in lenses I can highly recommend this talk from Edward Kmett. I’m vaguely tempted to cover it at ScalaSyd at some point (if we get desperate).

http://www.youtube.com/watch?v=efv0SQNde5Q

Charles

Reply ↓
- Graham Lea on November 18, 2012 at 12:43 pm said:
  
  Thanks for the extra links, Charles. Definitely worth a look.
  G.
  
  Reply ↓
Jed Wesley-Smith on November 19, 2012 at 8:38 am said:

Great post mate, thanks for taking the time to write this up.

Following up on product types, they simply refer to the fact that any type that is a combination of two types A and B (say Tuple2 for instance) encodes that fact that the possible set of values that an instance may have is the cartesian product of all values of A times all values of B, thus it is a product type! An Either of A, B on the other hand is a sum type – it can be all values of A plus all values of B.

This algebraic understanding of data-types makes for interesting ways to generalise thinking about your types and what they represent.

Here’s a link that talks about these ideas in a lot more detail:
http://blog.lab49.com/archives/3011

Reply ↓

Evolvable Me

Graham Lea

ScalaSyd Wrap-Up: November 2012