
How to Attach Original Metadata to Every Photo, Meme and GIF on the Internet

A woman walks across a record-breaking photo mosaic in Birmingham, England, on August 23, 2008. Organizations like Mediachain Labs and the International Image Interoperability Framework aim to add metadata and trace the authorship of images online, but with 1.8 billion photos uploaded each day, it is hardly a simple task. Darren Staples/Reuters

On January 8, 2015, in celebration of David Bowie’s 68th birthday, British illustrator Helen Green posted a GIF of the musician on Tumblr. A little over a year later, Bowie died, and Green’s animated portraits of him changing his appearance over his career spread across the internet. As of this writing, the reverse image search engine TinEye yields 28 pages of results for the GIF. Many versions of the file have been shrunk, some have been stretched, and others have been cropped to remove Green’s signature from the lower-right corner. Fans must search out the artist to credit her when sharing the GIF, and Green must vigilantly track down people using it for commercial purposes if she’s to get paid or even acknowledged.

Mediachain Labs cites Green when discussing the need for accurate attribution online. “An image goes viral, and millions of people see it, but the big disconnect is that information about what you’re looking at is lost,” says co-founder and engineer Denis Nazarov. Launched in December 2014 by Nazarov and Jesse Walden, Mediachain Labs is building an eponymous protocol that connects an image with the information relevant to it. TinEye or Google Images tells you the webpages where an image can be found; Mediachain tells you who created that image, what it’s titled, when it was produced and more. Since it’s an open protocol, Mediachain also allows for the development of other applications, like a service letting artists track their work across the web.

“The creator is all of a sudden able to connect with their audience, regardless of where their media is distributed,” says Walden.

Origin details, such as creator, title and year of production, are routinely appended to images through notes called metadata. The problem is that metadata is often lost as images travel around the web. Uploading Green’s GIF of Bowie to Facebook, for example, will strip the file of its metadata. To preserve that connection, the image and its metadata can be stored together in a database. With a TinEye-like tool, the image or its derivatives can then be looked up in the database, which returns the relevant metadata. Thus, you can learn that Green created that Bowie GIF, even if her signature is cropped out.
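The lookup described above can be sketched in a few lines of Python. This is a toy illustration, not the method TinEye or Mediachain actually uses: it assumes images are small grayscale grids, fingerprints them with a simple "average hash," and treats two fingerprints as the same picture when they differ in only a few bits. The function names, the distance threshold and the metadata fields are all hypothetical.

```python
# Toy sketch: images are grayscale grids (lists of rows of 0-255 values).
# The fingerprint, the threshold and the record fields are illustrative assumptions.

def average_hash(pixels, size=4):
    """Average the image over a size x size grid of blocks, then flag
    each block that is brighter than the mean of all blocks."""
    height, width = len(pixels), len(pixels[0])
    blocks = []
    for by in range(size):
        for bx in range(size):
            rows = range(by * height // size, (by + 1) * height // size)
            cols = range(bx * width // size, (bx + 1) * width // size)
            values = [pixels[y][x] for y in rows for x in cols]
            blocks.append(sum(values) / len(values))
    mean = sum(blocks) / len(blocks)
    return tuple(1 if block > mean else 0 for block in blocks)


def hamming(a, b):
    """Count the positions at which two fingerprints differ."""
    return sum(x != y for x, y in zip(a, b))


def register(database, pixels, metadata):
    """Store an image's fingerprint alongside its metadata."""
    database.append((average_hash(pixels), metadata))


def find_metadata(database, pixels, max_distance=2):
    """Return the metadata of the closest stored fingerprint, or None
    if nothing is within max_distance bits of the query."""
    query = average_hash(pixels)
    best = min(database, key=lambda entry: hamming(entry[0], query), default=None)
    if best is not None and hamming(best[0], query) <= max_distance:
        return best[1]
    return None


def shrink_by_half(pixels):
    """Make a derivative image by averaging each 2 x 2 group of pixels."""
    return [
        [(pixels[y][x] + pixels[y][x + 1]
          + pixels[y + 1][x] + pixels[y + 1][x + 1]) / 4
         for x in range(0, len(pixels[0]), 2)]
        for y in range(0, len(pixels), 2)
    ]


# An 8 x 8 stand-in for the original artwork, plus an unrelated (inverted) image.
original = [[30 * x + 5 * y for x in range(8)] for y in range(8)]
unrelated = [[255 - value for value in row] for row in original]

database = []
register(database, original, {"creator": "Helen Green", "year": 2015})

shrunk = shrink_by_half(original)
print(find_metadata(database, shrunk))
print(find_metadata(database, unrelated))
```

In this sketch the shrunken copy still resolves to the stored record, while the unrelated image returns nothing. Real reverse image search has to survive cropping, recompression and overlays, which takes far more robust fingerprints than this one.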

That sounds like a great solution, but it would require Mediachain Labs to create and maintain a centralized database of images and metadata. While this approach works for Shazam, which identifies songs by searching for them in its proprietary library of music, building such a database for images would be laborious, costly and probably futile, given the overwhelming number of pictures online. Shazam works with 11 million songs; 1.8 billion photos, more than 160 times as many, are uploaded to the internet every day.

The Mediachain protocol sidesteps that problem with a decentralized database. Instead of the company acting as a hub, participants provide access to their own images and metadata. Thus, Mediachain can deliver information about the archives of the Museum of Modern Art, one of its partners, while all of that information remains in MoMA’s control. The database is also open, meaning anyone, from Green to MoMA’s director, can add to it, submitting GIFs or providing the provenance of historic photographs. Thanks to decentralization and openness, the protocol transforms the seemingly impossible feat of collecting information about every image on the internet into a collaborative, and therefore feasible, effort. Mediachain Labs has already amassed metadata on 2 million images, which it used to launch its first test network in July.
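To make the idea of a hub-less database concrete, here is a minimal Python sketch. It is not Mediachain's actual protocol or data model: the `Participant` class, the `resolve` function and the record fields are hypothetical, and images are matched by an exact SHA-256 digest of their bytes, which would miss resized or cropped copies.

```python
import hashlib


class Participant:
    """One contributor (an artist, a museum) holding its own records."""

    def __init__(self, name):
        self.name = name
        self._records = {}  # image digest -> metadata, kept by this participant

    def publish(self, image_bytes, metadata):
        """Record metadata for an image under the digest of its bytes."""
        key = hashlib.sha256(image_bytes).hexdigest()
        self._records[key] = dict(metadata)
        return key

    def lookup(self, key):
        """Return this participant's record for a digest, if it has one."""
        return self._records.get(key)


def resolve(image_bytes, participants):
    """Ask every participant what it knows about an image.

    No central store exists; each answer comes from its owner's records."""
    key = hashlib.sha256(image_bytes).hexdigest()
    found = []
    for participant in participants:
        record = participant.lookup(key)
        if record is not None:
            found.append((participant.name, record))
    return found


# Hypothetical participants and placeholder image bytes.
artist = Participant("artist")
museum = Participant("museum")

gif_bytes = b"placeholder bytes standing in for a GIF file"
photo_bytes = b"placeholder bytes standing in for a scanned photograph"

artist.publish(gif_bytes, {"creator": "Helen Green", "year": 2015})
museum.publish(photo_bytes, {"provenance": "example provenance note"})

print(resolve(gif_bytes, [artist, museum]))
```

Each participant answers only from the records it holds, so adding a new contributor means adding one more entry to the list that `resolve` consults rather than migrating data into a shared store.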

While that is a start to working through the very, very long backlog of pictures already posted, Mediachain is also tackling the deluge of new images uploaded daily. Nazarov and Walden hope developers will take advantage of their open protocol to create, for instance, a tool that allows users to easily add their artwork to the database. Nazarov points to authorship as one powerful incentive. “In a perfect world, if humanity had started with Mediachain,” he says with a laugh, “if the only camera was on your phone, where you were logged in, and every photo you took was time-stamped automatically, there would be no way it was possible for someone to claim something before you.”

Although the task before it remains gargantuan, Mediachain Labs has won over some impressive supporters. In June, the company announced that it had raised $1.5 million from Union Square Ventures (an early investor in Twitter, Foursquare and Kickstarter), as well as from the venture capital company Andreessen Horowitz, which funded Skype, Facebook and Airbnb. In addition to MoMA, Mediachain Labs counts Getty Images, Europeana and the Digital Public Library of America (DPLA) among its partners.

Taking a different approach to the question of how internet users can learn more about digital images is the International Image Interoperability Framework. IIIF, pronounced “triple I F,” is a consortium of museums, libraries and universities committed to sharing their resources, or “interoperating.” Many such institutions maintain their digital archives in diverse formats, with images and metadata delivered through various, and often incompatible, means. This creates difficulties for researchers or developers hoping to draw material from multiple sources in a uniform way. IIIF began to address this problem in 2010.

The most straightforward approach to the challenge of interoperating would be for every institution to settle on a standard format for image and metadata delivery, but that isn’t going to happen. “Nobody is going to replace their digital image infrastructure just to interoperate on some medieval manuscripts or whatever,” says Jon Stroop, an IIIF editor and applications development manager at the Princeton University Library, a member of the consortium. (IIIF started as a way for medieval manuscript researchers to better access materials.)

Rather than asking every institution to redesign the channels through which its images and metadata flow, IIIF requests an additional, uniformly formatted stream. Several organizations have already adopted IIIF, making it attractive for developers to also work according to its specifications. The IIIF community is made up of more than 60 cultural heritage institutions. Besides Princeton and the Getty, these include Harvard University, Wikipedia and the DPLA. Examples of applications that have been made according to IIIF specifications are Mirador, a gallery viewer that includes metadata display, and OpenSeadragon, which allows for the impossibly detailed viewing of large images.
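What a "uniformly formatted stream" means in practice can be illustrated with the URL pattern of the IIIF Image API, in which a request names an image identifier followed by region, size, rotation, quality and format. The Python sketch below builds such URLs; the server address and identifier are hypothetical, the helper names are my own, and the default size keyword `max` follows version 3 of the specification as I understand it (version 2 used `full`).

```python
from urllib.parse import quote


def iiif_image_url(base, identifier, region="full", size="max",
                   rotation="0", quality="default", fmt="jpg"):
    """Build an image request in the IIIF Image API pattern:
    {base}/{identifier}/{region}/{size}/{rotation}/{quality}.{format}"""
    ident = quote(identifier, safe="")  # identifiers must be URL-encoded
    return f"{base.rstrip('/')}/{ident}/{region}/{size}/{rotation}/{quality}.{fmt}"


def iiif_info_url(base, identifier):
    """Build the URL of the JSON document describing an image."""
    ident = quote(identifier, safe="")
    return f"{base.rstrip('/')}/{ident}/info.json"


# Hypothetical server and identifier, for illustration only.
base = "https://example.org/iiif"

# The whole image at the largest size the server offers.
print(iiif_image_url(base, "manuscript-1"))

# The central quarter of the image, scaled to 800 pixels wide.
print(iiif_image_url(base, "manuscript-1", region="pct:25,25,50,50", size="800,"))

# Technical description of the image (dimensions, supported features).
print(iiif_info_url(base, "manuscript-1"))
```

Because every participating institution answers requests in the same shape, a viewer written once can, in principle, pull images from any of them without knowing how each archive stores its files internally.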

As a partner of both Mediachain Labs and IIIF, the DPLA is in a unique position to weigh the benefits and limitations of each organization’s approach. The DPLA, which brings together the online resources of U.S. cultural heritage institutions and makes them accessible to internet users, has given Mediachain Labs access to its images and metadata and is advocating for adoption of IIIF among its “hubs,” the libraries, museums and archives whose materials it aggregates.

Nearly 20 percent of the DPLA’s primary hubs have some version of IIIF. “It’s astonished me how rapidly IIIF has grown,” says DPLA Director of Technology Mark Matienzo. He also hints at a possible difficulty, saying, “I’m really interested to see IIIF gain more traction outside the space of cultural heritage.” IIIF’s community roster includes universities, libraries and museums but few commercial entities.

Matienzo believes Mediachain Labs faces a similar challenge regarding adoption. Although it’s too soon for the company to have produced results from its collaboration with the DPLA, he is hopeful. “Mediachain is the most promising thing out there,” he says. “But I’m also curious to see what kind of partnerships they’re going to put into place.” His concerns center on the decentralized database, which he sees as bleeding-edge enough to deter widespread use, raising the question of how the protocol would integrate with the rest of the web.

Mediachain Labs has several answers for that. Besides the two previously mentioned applications (one letting artists track their work across the web and the other registering their artwork to the decentralized database), Nazarov describes how the protocol could combine the various online Creative Commons libraries, and Walden imagines it integrating with a blogging platform to give writers access to use-granted images—all with automated attribution, of course. Their reliance on third-party developers may seem hopeful to the charitable and delusional to the cynical, but Mediachain Labs already has more than 200 members in its open-source community, following or contributing work to the protocol, and the company sees no other way forward. Nazarov and Walden struggled with developing more fully formed applications before realizing that, first, it was necessary to build an open protocol as a foundation.

“We might want to think of ourselves as creative geniuses, but what’s more important than us coming up with this great experience is building a platform for anyone to come up with an experience that they think is valuable,” says Walden. And, he might want to add, for them to get credit for it.