Visualize your data structures!

This page contains the materials for my talk “Visualize your data structures!”. Here are some past and future presentations:

You can use this page in two ways:

  • as a reference/refresher on the material covered in the talk;
  • as an interactive playground where you can try the same commands I presented.

Here is an overview:

Throughout this page we will assume the following declarations (each section might add its own):

import reftree.core._
import reftree.diagram._
import reftree.render._
import reftree.geometry._
import reftree.svg.animation.Frame
import reftree.svg.XmlSvgApi
import reftree.svg.XmlSvgApi.svgUnzip
import reftree.contrib.XmlInstances._
import reftree.contrib.OpticInstances._
import reftree.contrib.ZipperInstances._
import reftree.contrib.ShapelessInstances._
import reftree.util.Optics
import reftree.demo.Data
import reftree.demo.Shortcuts
import scala.collection.immutable._
import java.nio.file.Paths
import Diagram.{sourceCodeCaption ⇒ diagram}

To start an interactive session, just run

$ sbt demo
@ render(List(1, 2, 3))

and open diagram.png in your favorite image viewer (hopefully one that reloads images automatically on file change). You will also need to have GraphViz installed. The interactive session already has all the necessary imports in scope.

Introducing reftree

// extra declarations for this section
val renderer = Renderer(
  renderingOptions = RenderingOptions(density = 75),
  directory = Paths.get(ImagePath, "visualize", "intro")
)
import renderer._

reftree is a library for visualizing Scala data structures.

Let’s look at a quick usage example:

scala> case class Person(firstName: String, age: Int)
defined class Person

scala> val bob = Person("Bob", 42)
bob: Person = Person(Bob,42)

scala> diagram(bob).render("bob")

bob

That’s it! You can configure the visualization as you like:

scala> // render strings as a single box
     | import reftree.contrib.SimplifiedInstances.string
import reftree.contrib.SimplifiedInstances.string

scala> // rename the firstName field (pun not intended)
     | implicit val personConfig = (ToRefTree.DerivationConfig[Person]
     |   .tweakField("firstName", _.withName("name")))
personConfig: reftree.core.ToRefTree.DerivationConfig[Person] = DerivationConfig(None,Set(),Map(firstName -> <function1>))

scala> diagram(bob).render("bob-simplified")

bob-simplified

There are various ways you can use reftree:

  • improving the documentation of your projects;
  • live-coding demos;
  • exploring how things work;
  • anywhere you need diagrams of your Scala data structures.

(Incidentally, this talk is an example of all of the above.)

My previous reftree-powered talk focused on immutable data and various ways it can be manipulated (I do recommend it).

Today I would like to take you on a journey deep inside reftree itself, so that we can see how some of these techniques and concepts can be applied... to produce visualizations of themselves — using one of my favorite reftree features: animations.

Animation
  .startWith(Queue(1, 2, 3))
  .repeat(3)(_.iterate(2)(q ⇒ q :+ (q.max + 1)).iterate(2)(_.tail))
  .build(Diagram.toStringCaption(_).withAnchor("queue"))
  .render("queue")

queue

Inside reftree

// extra declarations for this section
import reftree.contrib.SimplifiedInstances.{option, seq, list}

val renderer = Renderer(
  renderingOptions = RenderingOptions(density = 75),
  directory = Paths.get(ImagePath, "visualize", "inside")
)
import renderer._

First, we need to grasp the basics of reftree.

To visualize a value of some type A, reftree converts it into a data structure called RefTree (surprise!), using a typeclass ToRefTree[A].

For case classes this is done automagically, using shapeless. (If you are curious about the magic, take a look at this file.) Given our friend bob, shapeless would provide a generic representation, which includes the field names (at the type level!) and the values (as a heterogeneous list):

scala> Shortcuts.generic(bob)
res7: shapeless.::[String with shapeless.labelled.KeyTag[Symbol with shapeless.tag.Tagged[String("firstName")],String],shapeless.::[Int with shapeless.labelled.KeyTag[Symbol with shapeless.tag.Tagged[String("age")],Int],shapeless.HNil]] = Bob :: 42 :: HNil

scala> diagram(Shortcuts.generic(bob)).render("generic")

generic

This information is enough to auto-generate a RefTree. Now, what does it look like? The best way to find out is to visualize a RefTree of a RefTree!

scala> diagram(Shortcuts.refTree(bob)).render("reftree")

reftree

As you can see, it contains values (Val) and references (Ref).

How do we get from RefTree to an image though? This is where GraphViz comes in. From a RefTree we can obtain a graph definition that can be rendered by GraphViz:

scala> Shortcuts.graph(bob).encode
res10: String =
digraph "Diagram" {
  graph [ ranksep=0.8 bgcolor="#ffffff00" ]
  node [ shape="plaintext" fontname="Source Code Pro" fontcolor="#000000ff" ]
  edge [ arrowsize=0.7 color="#000000ff" ]
  "-Person599628537" [ id="-Person599628537" label=<<table cellspacing="0" cellpadding="6" cellborder="0" columns="*" bgcolor="#ffffff00" style="rounded"><tr><td port="n" rowspan="2">Person</td><td bgcolor="#ffffff00"><i>name</i></td><td bgcolor="#ffffff00"><i>age</i></td></tr><hr/><tr><td port="0" bgcolor="#ffffff00">&middot;</td><td bgcolor="#ffffff00">42</td></tr></table>> ] [ color="#104e8bff" fontcolor="#104e8bff" ]
  "-java.lang.String1108513508" [ id="-java.lang.String1108513508" label=<<table cellspacing="0" cellpadding="6" cellborder="0" columns="*" bgcolor="#ffffff00" style="roun...

Going even further, we can ask GraphViz for an SVG output:

scala> Shortcuts.svg(bob)
res11: scala.xml.Node = <svg viewBox="0.00 0.00 171.00 168.00" height="168pt" width="171pt" xmlns:xlink="http://www.w3.org/1999/xlink" xmlns="http://www.w3.org/2000/svg"><g class="graph" id="graph0"><title>Diagram</title><polygon points="-4,4 -4,-164 167,-164 167,4 -4,4" stroke="none" fill="none"/><g class="node" id="-Person599628537"><title>-Person599628537</title><path d="M20,-100C20,-100 143,-100 143,-100 149,-100 155,-106 155,-112 155,-112 155,-144 155,-144 155,-150 149,-156 143,-156 143,-156 20,-156 20,-156 14,-156 8,-150 8,-144 8,-144 8,-112 8,-112 8,-106 14,-100 20,-100" stroke="none" fill="none"/><text fill="#104e8b" font-size="14.00" font-family="Source Code Pro" y="-124.3" x="15.5" text-anchor="start">Person</text><polygon points="71.5,-128 71.5,-155 117.5,-155 117.5,-128 71.5...

At this point you might be guessing how we can use this as a basis for our animation approach. Every state of a data structure will be a separate frame in the SVG format. However, an animation consisting of these frames alone would be too jumpy. We need to add intermediate frames to smoothly “morph” one frame into another. With SVG being a vector format, this sounds simple. We just have to individually morph different aspects of the image:

  • graph node positions;
  • graph edges and their shapes;
  • colors;
  • stroke thickness;
  • transparency.

Ouch! A sane functional approach would definitely help here :)

Functional animation

// extra declarations for this section
val renderer = Renderer(
  renderingOptions = RenderingOptions(density = 75),
  directory = Paths.get(ImagePath, "visualize", "animation")
)
import renderer._

Let’s start by introducing an abstraction for morphing, or, in other words, interpolating things of type A:

trait Interpolation[A] {
  def apply(left: A, right: A, time: Double): A
  def sample(left: A, right: A, n: Int, inclusive: Boolean = true): Seq[A]
}

(If you are curious, here is the actual implementation.)

Once we have an instance of Interpolation[xml.Node], we can generate as many intermediate frames as we want! But how do we construct this instance?

Consider a lowly floating point number (it can represent an x coordinate of some element in our SVG, for example). There is an obvious way to implement Interpolation[Double], which reftree already defines as Interpolation.double:

scala> val numbers = Interpolation.double.sample(0, 10, 5).toList
numbers: List[Double] = List(0.0, 2.5, 5.0, 7.5, 10.0)

scala> diagram(numbers).render("numbers")

numbers

Now if you think about a point in 2D space, it’s just two numbers joined together:

scala> val point = Point(0, 10)
point: reftree.geometry.Point = 0.0 10.0

scala> diagram(point).render("point")

point

Can we use the number interpolation to interpolate these two numbers? To answer this question, let’s introduce more abstraction (in a great tradition of functional programming).

A lens Lens[A, B] is something that can “focus” on a piece of data of type B inside a data structure of type A and provide read-write access to it. We will use the excellent Monocle library to create lenses and other optics along the way:

scala> import monocle.macros.GenLens
import monocle.macros.GenLens

scala> val x = GenLens[Point](_.x)
x: monocle.Lens[reftree.geometry.Point,Double] = $anon$1@3cadee74

scala> val y = GenLens[Point](_.y)
y: monocle.Lens[reftree.geometry.Point,Double] = $anon$1@5633c31c

scala> (diagram(OpticFocus(x, point)).toNamespace("x") +
     |   diagram(OpticFocus(y, point)).toNamespace("y")).render("x+y")

x+y

Lenses provide several methods to manipulate data:

scala> x.get(point)
res16: Double = 0.0

scala> y.set(20)(point)
res17: reftree.geometry.Point = 0.0 20.0

scala> y.modify(_ + 20)(point)
res18: reftree.geometry.Point = 0.0 30.0

If we can read and write each coordinate field, we can interpolate them separately and update the point field by field. We do this by piping Interpolation.double through x and y lenses and combining the resulting interpolations:

scala> val pointInterpolation = (
     |   x.interpolateWith(Interpolation.double) +
     |   y.interpolateWith(Interpolation.double))
pointInterpolation: reftree.geometry.Interpolation[reftree.geometry.Point] = reftree.geometry.Interpolation$$anon$2@6e540a37

scala> val points = pointInterpolation.sample(Point(0, 0), Point(10, 20), 5).toList
points: List[reftree.geometry.Point] = List(0.0 0.0, 2.5 5.0, 5.0 10.0, 7.5 15.0, 10.0 20.0)

scala> diagram(points).render("points")

points

Of course, reftree already defines this as Point.interpolation.

Using the same approach, we can build a polyline interpolator (assuming the polylines being interpolated consist of equal number of points):

scala> Data.polyline1
res20: reftree.geometry.Polyline = 0.0 10.0,10.0 20.0

scala> Data.polyline2
res21: reftree.geometry.Polyline = 20.0 30.0,40.0 50.0

scala> val polylineInterpolation = (GenLens[Polyline](_.points)
     |   .interpolateEachWith(Point.interpolation))
polylineInterpolation: reftree.geometry.Interpolation[reftree.geometry.Polyline] = reftree.geometry.Interpolation$$anon$2@b392277

scala> val polylines = polylineInterpolation.sample(Data.polyline1, Data.polyline2, 3).toList
polylines: List[reftree.geometry.Polyline] = List(0.0 10.0,10.0 20.0, 10.0 20.0,25.0 35.0, 20.0 30.0,40.0 50.0)

scala> diagram(polylines).render("polylines")

polylines

We are finally ready to implement our first substantial interpolator: one that morphs graph edges. The following approach is inspired by Mike Bostock’s path tween, however reftree puts more emphasis on types and even includes its own SVG path parser and simplification algorithm.

The resulting animation should look like this:

edges-100

An edge is drawn with an SVG path, which consists of several commands, e.g. “move to”, “line to”, “bezier curve to”. Here is a minimized SVG snippet for an actual edge:

scala> Data.edge1
res23: scala.xml.Node = <svg viewBox="50 -200 130 70" height="70pt" width="130pt" shape-rendering="geometricPrecision" xmlns="http://www.w3.org/2000/svg" xmlns:xlink="http://www.w3.org/1999/xlink"><g class="edge"><path d="M84.5,-195C84.5,-165.869 62.5907,-160.925 58.9962,-135.762" stroke="#104e8b" fill="none"/></g></svg>

scala> diagram(Data.edge1).render("edge")

edge

As you can see, the commands themselves are given in the d attribute inside the path element in a rather obscure format. Luckily, we have lenses and other optics at our disposal to plumb through this mess.

First, let’s get to the path element. reftree implements a few things that will help us:

  • XmlSvgApi, an implementation of several useful SVG operations for scala-xml. In particular, if offers a CSS selector-like method for matching elements of certain type and/or class.
  • An optic that focuses on an element deep inside XML or any other recursive data structure: Optics.collectFirst. It is actually an Optional, not a Lens, since the element might be missing.
scala> val edgePathElement = Optics.collectFirst(XmlSvgApi.select("path"))
edgePathElement: monocle.Optional[scala.xml.Node,scala.xml.Node] = monocle.Optional$$anon$6@ed4aed8

scala> diagram(OpticFocus(edgePathElement, Data.edge1)).render("edgePathElement")

edgePathElement

Next, we need to “descend” to the d attribute. Here is where optics really shine: we can compose Optional[A, B] with Optional[B, C] to get an Optional[A, C]:

scala> val d = XmlSvgApi.attr("d")
d: monocle.Optional[scala.xml.Node,String] = monocle.POptional$$anon$1@5c973321

scala> val edgePathString = edgePathElement composeOptional d
edgePathString: monocle.POptional[scala.xml.Node,scala.xml.Node,String,String] = monocle.POptional$$anon$1@2d637ef1

scala> diagram(OpticFocus(edgePathString, Data.edge1)).render("edgePathString")

edgePathString

Next, we will use an isomorphism, another kind of optic, to view the string as a nice case class:

scala> Path.stringIso
res27: monocle.Iso[String,reftree.geometry.Path] = monocle.PIso$$anon$10@123706aa

scala> val edgePath = edgePathString composeIso Path.stringIso
edgePath: monocle.POptional[scala.xml.Node,scala.xml.Node,reftree.geometry.Path,reftree.geometry.Path] = monocle.POptional$$anon$1@606f998f

scala> diagram(edgePath.getOption(Data.edge1)).render("edgePath")

edgePath

And finally, another isomorphism takes us from a Path to its sampled representation as a Polyline. (Purists will say that this is not really an isomorphism because it’s not reversible, but with a lot of points you can get pretty close ;))

scala> Path.polylineIso(points = 4)
res29: monocle.Iso[reftree.geometry.Path,reftree.geometry.Polyline] = monocle.PIso$$anon$10@6c863ee1

scala> def edgePolyline(points: Int) = edgePath composeIso Path.polylineIso(points)
edgePolyline: (points: Int)monocle.POptional[scala.xml.Node,scala.xml.Node,reftree.geometry.Polyline,reftree.geometry.Polyline]

scala> diagram(edgePolyline(4).getOption(Data.edge1)).render("edgePolyline")

edgePolyline

Let’s interpolate!

scala> def edgeInterpolation(points: Int) = edgePolyline(points).interpolateWith(Polyline.interpolation)
edgeInterpolation: (points: Int)reftree.geometry.Interpolation[scala.xml.Node]

scala> def edges(points: Int, frames: Int) = (Data.edge1 +:
     |   edgeInterpolation(points).sample(Data.edge1, Data.edge2, frames, inclusive = false) :+
     |   Data.edge2)
edges: (points: Int, frames: Int)scala.collection.immutable.Stream[scala.xml.Node]

scala> AnimatedGifRenderer.renderFrames(
     |   edges(4, 4).map(Frame(_)),
     |   Paths.get(ImagePath, "visualize", "animation", "edges-4.gif"),
     |   RenderingOptions(density = 200),
     |   AnimationOptions(framesPerSecond = 1)
     | )

scala> AnimatedGifRenderer.renderFrames(
     |   edges(100, 32).map(Frame(_)),
     |   Paths.get(ImagePath, "visualize", "animation", "edges-100.gif"),
     |   RenderingOptions(density = 200),
     |   AnimationOptions(framesPerSecond = 8)
     | )

With 4 points and 4 frames:

edges-4

With 100 points and 32 frames:

edges-100

Interpolating the entire image is left as an exercise for the reader, although the impatient will find the complete implementation here.

Notice that we never touched XML directly. In fact, equipped with the same set of optics for another format or representation, we would be able to operate on it without changing the code too much. Case in point: reftree supports both scala-xml and scala-js-dom (for Scala.js), with only 50 lines of implementation-specific code for each backend. This goes to show the flexibility and usefulness of optics.

Zipping it up

// extra declarations for this section
val renderer = Renderer(
  renderingOptions = RenderingOptions(density = 75),
  directory = Paths.get(ImagePath, "visualize", "zippers")
)
import renderer._

In the previous section we saw Optics.collectFirst — an optic that is able to perform modifications deep inside SVG. How do we go about implementing something like this, or, more generally, how do we edit recursive data structures such as XML?

This solution is called a “Zipper”, and was introduced by Gérard Huet in 1997. It consists of a “cursor” pointing to a location anywhere in a tree — “current focus”. The cursor can be moved freely with operations like moveDownLeft, moveRight, moveUp, etc. Current focus can be updated, deleted, or new nodes can be inserted to its left or right. Zippers are immutable, and every operation returns a new Zipper. All the changes made to the tree can be committed, yielding a new modified version of the original tree.

My zipper library provides a few useful movements and operations. Just like optics, it’s rather generic and flexible. The zipper can operate on any type, as long as an instance of the Unzip typeclass is available, which can be automatically derived in many cases. (Note that the derivation of Unzip for SVG can be found here.)

Consider a simple XML tree:

scala> Data.simpleXml
res34: scala.xml.Node = <tree value="1"><leaf value="2"/><leaf value="3"/><leaf value="4"/><tree value="5"><leaf value="6"/><leaf value="7"/></tree></tree>

scala> diagram(Data.simpleXml).render("simpleXml")

simpleXml

When we wrap a Zipper around this tree, it does not look very interesting yet:

scala> import zipper.Zipper
import zipper.Zipper

scala> val zipper1 = Zipper(Data.simpleXml)
zipper1: zipper.Zipper[scala.xml.Node] = Zipper(List(),<tree value="1"><leaf value="2"/><leaf value="3"/><leaf value="4"/><tree value="5"><leaf value="6"/><leaf value="7"/></tree></tree>,List(),None)

scala> (diagram(Data.simpleXml) + diagram(zipper1)).render("zipper1")

zipper1

We can see that it just points to the original tree. In this case the focus is the root of the tree, which has no siblings, and the parent zipper does not exist, since we are at the top level.

To move down the tree, we “unzip” it, separating the child nodes into the focused node and its left and right siblings:

scala> val zipper2 = zipper1.moveDownLeft
zipper2: zipper.Zipper[scala.xml.Node] = Zipper(List(),<leaf value="2"/>,List(<leaf value="3"/>, <leaf value="4"/>, <tree value="5"><leaf value="6"/><leaf value="7"/></tree>),Some(Zipper(List(),<tree value="1"><leaf value="2"/><leaf value="3"/><leaf value="4"/><tree value="5"><leaf value="6"/><leaf value="7"/></tree></tree>,List(),None)))

scala> (diagram(zipper1) + diagram(zipper2)).render("zipper1+2")

zipper1+2

The new Zipper links to the old one, which will allow us to return to the root of the tree when we are done applying changes. This link however prevents us from seeing the picture clearly. Let’s look at the second zipper alone:

scala> diagram(zipper2).render("zipper2b")

zipper2b

Great! We have 2 in focus and 3, 4, 5 as right siblings. What happens if we move right a bit?

scala> val zipper3 = zipper2.moveRightBy(2)
zipper3: zipper.Zipper[scala.xml.Node] = Zipper(List(<leaf value="3"/>, <leaf value="2"/>),<leaf value="4"/>,List(<tree value="5"><leaf value="6"/><leaf value="7"/></tree>),Some(Zipper(List(),<tree value="1"><leaf value="2"/><leaf value="3"/><leaf value="4"/><tree value="5"><leaf value="6"/><leaf value="7"/></tree></tree>,List(),None)))

scala> diagram(zipper3).render("zipper3")

zipper3

This is interesting! Notice that the left siblings are “inverted”. This allows to move left and right in constant time, because the sibling adjacent to the focus is always at the head of the list.

This also allows us to insert new siblings easily:

scala> val zipper4 = zipper3.insertLeft(<fruit/>)
zipper4: zipper.Zipper[scala.xml.Node] = Zipper(List(<fruit/>, <leaf value="3"/>, <leaf value="2"/>),<leaf value="4"/>,List(<tree value="5"><leaf value="6"/><leaf value="7"/></tree>),Some(Zipper(List(),<tree value="1"><leaf value="2"/><leaf value="3"/><leaf value="4"/><tree value="5"><leaf value="6"/><leaf value="7"/></tree></tree>,List(),None)))

scala> diagram(zipper4).render("zipper4")

zipper4

And, as you might know, we can delete nodes and update the focus:

scala> val zipper5 = zipper4.deleteAndMoveRight.set(<worm/>)
zipper5: zipper.Zipper[scala.xml.Node] = Zipper(List(<fruit/>, <leaf value="3"/>, <leaf value="2"/>),<worm/>,List(),Some(Zipper(List(),<tree value="1"><leaf value="2"/><leaf value="3"/><leaf value="4"/><tree value="5"><leaf value="6"/><leaf value="7"/></tree></tree>,List(),None)))

scala> diagram(zipper5).render("zipper5")

zipper5

Finally, when we move up, the siblings at the current level are “zipped” together and their parent node is updated:

scala> val zipper6 = zipper5.moveUp
zipper6: zipper.Zipper[scala.xml.Node] = Zipper(List(),<tree value="1"><leaf value="2"/><leaf value="3"/><fruit/><worm/></tree>,List(),None)

scala> diagram(zipper6).render("zipper6")

zipper6

When we are done editing, the .commit shorthand can be used for going all the way up (applying all the changes) and returning the focus. Notice how all the unchanged nodes are shared between the old and the new XML.

scala> val notSoSimpleXml = zipper6.commit
notSoSimpleXml: scala.xml.Node = <tree value="1"><leaf value="2"/><leaf value="3"/><fruit/><worm/></tree>

scala> (diagram(Data.simpleXml) + diagram(notSoSimpleXml)).render("notSoSimpleXml")

notSoSimpleXml

Using an XML zipper, a determined reader can easily implement advanced lenses, such as Optics.collectFirst, Optics.collectLeftByKey, etc, all found here.

To conclude, here is an animation of a zipper and the tree it operates on (from my previous talk), produced (as we know now) not without zippers’ help:

tree+zipper

That’s all! Thank you for reading this far. I hope you are leaving this page with some great reftree use-cases in mind :)

results matching ""

    No results matching ""