Here's how John Resig uses Node.js to research Japanese art prints

By creating a data set of known information, you can discover patterns and missing pieces of logic in almost anything, even obscure Japanese art.

John Resig came out to Etsy Labs on Thursday to talk about his long-running side project, Ukiyo-e: collecting scans of Japanese prints from the web and using programming to learn as much as he can about them.

The talk, titled “Analyzing Japanese Art with Node.js and Computer Vision,” was part of the Code As Craft speaker series. What began as a personal project has become a real boon for academics, museums and art collectors.

The project started with a personal interest in Japanese woodblock prints and the realization that truly understanding the art form would take more than he could undertake (buying lots of expensive books, learning to read 17th-century Japanese, etc.). So he began to address the problem with code, and ended up solving problems for art collectors that are unlikely to have been solved any other way.

Here are a few highlights from what was one of the most approachable and engaging pure dev talks we’ve heard. One thing to keep in mind about the following is that the study here is of prints, so many museums have the same or nearly the same piece:

Resig’s project has collected almost a quarter million images from libraries and museums around the world. The site where you can view the work and search it in all sorts of ways is called Ukiyo-e.
Some of the technology he used to build it: Nginx server, data in MongoDB, site crawling using PhantomJS and hosting data copied across servers worldwide using CBM.
He gathered his data by scraping it from all kinds of institutional sites that posted artwork images. He referred to this as the “museum deep web” at one point because much of the data is hidden behind queries or search screens that Google doesn’t reach. So he had to write scripts that would go into each site and pull out images and metadata.
The metadata was actually harder to scrape than the images, and he then had to do a lot of cleanup work to make vague data specific enough for his Elasticsearch-based search system.
Many sites were challengingly decrepit. For example, he found one site so old that opening it showed the last search run by the previous visitor, because the site supported only a single session.
He tried a lot of computer vision systems to match similar images, and ended up settling on TinEye.
Once he started matching up images across institutions, he made interesting discoveries, such as prints that had been changed slightly (by cutting a piece out of a block and replacing it) or prints that seemed to be nearly exactly the same with different attributions.
He was also able to find instances where two images that didn’t match were actually parts of a larger piece, though the institutions that held the parts were not aware of it.
Code also helped reconcile differing transliterations of Japanese names, and helped fill in records at institutions that had incomplete information about prints for which other museums had complete information.
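
To make the transliteration point concrete, here is a minimal sketch of how differently romanized artist names might be reduced to one comparison key. It is not Resig's code. The function names `nameKey` and `groupByArtist`, the record shape `{ source, artist }` and the sample names are all assumptions made for illustration.

```javascript
// Hypothetical helper: reduce a romanized name to a key that ignores
// diacritics, letter case, punctuation and the order of name parts.
function nameKey(name) {
  return name
    .normalize("NFD")                 // split letters from diacritics
    .replace(/[\u0300-\u036f]/g, "")  // drop the combining marks
    .toLowerCase()
    .replace(/[^a-z\s]/g, " ")        // treat commas, hyphens, etc. as spaces
    .split(/\s+/)
    .filter(Boolean)
    .sort()                           // ignore family-name / given-name order
    .join(" ");
}

// Hypothetical helper: group records (assumed shape: { source, artist })
// so that records sharing a normalized artist name end up together.
function groupByArtist(records) {
  const groups = new Map();
  for (const record of records) {
    const key = nameKey(record.artist);
    if (!groups.has(key)) groups.set(key, []);
    groups.get(key).push(record);
  }
  return groups;
}

// Illustrative records only; the sources are placeholders, not real data.
const sampleRecords = [
  { source: "museum-a", artist: "Andō Hiroshige" },
  { source: "museum-b", artist: "Hiroshige, Ando" },
  { source: "museum-c", artist: "Katsushika Hokusai" },
];
const sampleGroups = groupByArtist(sampleRecords);
```

A key like this only handles simple cases. It would not, for instance, connect "Andou" with "Andō", or an artist's different art names, so a real system would likely need a curated list of aliases on top of it.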

He opened the talk by showing a set of prints posted for sale on an auction site. Using his system, Resig was able to identify the prints, though the auction house couldn’t. At the end, you learn that he ended up buying a lot of 20 prints at what market research would suggest was a very good deal. He showed the prints to interested attendees after the talk ended.
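
For a rough sense of how near-duplicate images can be matched by code, here is a small sketch of a difference hash, a simple and well-known perceptual hashing technique. This is a swapped-in illustration, not TinEye's algorithm and not Resig's implementation. It assumes each image has already been decoded, shrunk and converted to a small grid of grayscale values, and the names `differenceHash` and `hammingDistance` are hypothetical.

```javascript
// Difference hash: for each row, record whether a pixel is brighter than
// its right-hand neighbor. Input is a 2D array of grayscale values
// (assumed to be already resized to a small, fixed grid).
function differenceHash(gray) {
  const bits = [];
  for (const row of gray) {
    for (let x = 0; x + 1 < row.length; x++) {
      bits.push(row[x] > row[x + 1] ? 1 : 0);
    }
  }
  return bits;
}

// Number of positions at which two equal-length hashes differ.
function hammingDistance(a, b) {
  if (a.length !== b.length) {
    throw new Error("hashes must have the same length");
  }
  let distance = 0;
  for (let i = 0; i < a.length; i++) {
    if (a[i] !== b[i]) distance++;
  }
  return distance;
}

// Made-up 3x4 grids: a "scan", a uniformly brighter scan of the same
// print, and a version where one small area of the block was changed.
const scan = [
  [10, 50, 40, 200],
  [200, 180, 90, 20],
  [30, 30, 120, 60],
];
const brighterScan = scan.map((row) => row.map((v) => v + 15));
const alteredScan = [scan[0], scan[1], [30, 30, 40, 60]];

const sameDistance = hammingDistance(differenceHash(scan), differenceHash(brighterScan));
const alteredDistance = hammingDistance(differenceHash(scan), differenceHash(alteredScan));
```

Because the hash depends only on whether each pixel is brighter than its neighbor, a uniformly brighter scan produces an identical hash, while a local change flips only a few bits. A small but nonzero distance is the kind of signal that could flag two scans as the same print with a slight alteration.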

Resig closed with a productivity point. He said that while he really enjoyed working on the project, he also felt guilty about not spending enough time on it.

He found that if he committed to writing a bit of code every day, he got more done, stayed more engaged with the project and the weight came off.

His rule was that he had to write code every day, put it on GitHub and make it open source. As of this writing, his current streak is 122 days.

See all his work on GitHub here, and see research papers he’s done on the project here.

See the whole presentation below:

Watch live streaming video from etsycodeascraft at livestream.com

Resig is best known for creating the jQuery JavaScript library. He currently works as Dean of Computer Science at Khan Academy and lives in Brooklyn.
