Zehao Xu

PhD, University of Waterloo

“Good diagrams clarify. Very good diagrams force the ideas upon the viewer. The best diagrams compellingly embody the ideas themselves.”

… Wayne Oldford (2003)


“The aim of interactive graphics is not to improve and polish a particular display till it conveys its message in an effective manner, but to use sets of displays to explore data sets and discover the information in them.”

… Unwin (1999)

Great northern diver

Zehao’s GitHub

Biography

I am Zehao Xu, a PhD student in Statistics at the University of Waterloo, supervised by Wayne Oldford. My research interests include data visualization, data analysis, interactive graphics, machine learning, and package development.

Currently, I am dedicated to the loon project, which was built by Prof. Oldford and Dr. Waddell.


Welcome to Loon Wonderland

In developing the loon package, I am mainly in charge of building new functions to better serve data scientists (e.g. loonGrob, facets, …) and fixing small bugs.

I am also interested in building loon derivatives:

  • loon.ggplot, an R package that turns ggplot graphic data structures into interactive loon plots.
  • loon.shiny, which displays loon widgets in a shiny app.
  • loon.tourr, which provides tour mechanisms (e.g. grand tour, guided tour) in loon.

Rasterly

I also enjoy visualizing large data. A common question is: how large is large? Is a thousand large? A million? A billion? A trillion is large! In R, most graphical systems can handle fewer than 10 thousand data points (by "handle", I mean that the graphics can be rendered in a reasonable time). Beyond that, the rendering time increases dramatically (don't believe it? Try plot(rnorm(1e6))). Once the number of observations reaches 1 million, the R session may even be terminated. The rasterly package is built to visualize large data (even billions of points) in seconds.
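The general idea behind raster-based approaches can be illustrated with a minimal sketch. The Python snippet below is not rasterly's implementation. It only assumes the common technique of aggregating points into a fixed grid of counts, so that the amount of data left to draw depends on the grid size rather than on the number of observations. The function name `rasterize` and the 64 × 64 grid are hypothetical choices made for illustration.

```python
import random


def rasterize(xs, ys, width=64, height=64):
    """Count how many points fall into each cell of a width x height grid."""
    x_min, x_max = min(xs), max(xs)
    y_min, y_max = min(ys), max(ys)
    # Avoid division by zero when all values on an axis are identical.
    x_span = (x_max - x_min) or 1.0
    y_span = (y_max - y_min) or 1.0

    grid = [[0] * width for _ in range(height)]
    for x, y in zip(xs, ys):
        # Map each coordinate to a cell index; the maximum value is
        # clamped into the last cell instead of falling outside the grid.
        col = min(int((x - x_min) / x_span * width), width - 1)
        row = min(int((y - y_min) / y_span * height), height - 1)
        grid[row][col] += 1
    return grid


random.seed(1)
n = 100_000
xs = [random.gauss(0, 1) for _ in range(n)]
ys = [random.gauss(0, 1) for _ in range(n)]

grid = rasterize(xs, ys)
# However large n is, only width * height cells remain to be drawn.
print(len(grid), len(grid[0]), sum(sum(row) for row in grid))
```

Each cell count can then be mapped to a colour, so the final image costs the same to draw whether it summarizes a thousand points or a billion.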


Ggmulti

Everyone loves ggplot. The ggmulti package provides materials (i.e. serialaxes objects) to visualize high-dimensional data sets in ggplot:

  • Serialaxes coordinates (i.e., parallel or radial axis systems)

  • General glyphs (e.g., polygons, images) to appear in a scatterplot.

  • “More general” geom_histogram and geom_density to allow them to appear on serial axes.

Interests

  • Data Visualization
  • Data Analysis
  • Interactive Graphics
  • Machine Learning
  • Package Development

Education

  • PhD in Statistics, 2017

    University of Waterloo

  • Master in Computational Science, 2016

    University of Waterloo

  • BSc in Statistics and Risk Management, 2012

    Southwestern University of Finance and Economics

Hobbies

Gym

MMA

Basketball

Experience


Internship

Plotly

May 2019 – Sep 2019, Montreal
Responsibilities include:

  • Package development (rasterly)
  • Code review

PhD

University of Waterloo

Sep 2017 – Present, Waterloo

Projects

NLP with Disaster Tweets in R

Use natural language processing to explore whether a tweet announces a disaster.

Ggmulti

Provides materials (i.e. serialaxes objects) to visualize high-dimensional data in ggplot.

Loon

Exploratory interactive data visualization.

loon.ggplot

An add-on package to loon that converts ggplot2 plots to interactive loon plots, and vice versa.

loon.shiny

Interactive loon widgets in a Shiny web app.

Rasterly

Easily and Rapidly Generate Raster Image Data with Support for Plotly.js

loon.tourr

Implements tour algorithms in the interactive graphical system loon.
