
This Drexel researcher can identify you based on how you write code

Aylin Caliskan-Islam spent the summer at the U.S. Army Research Lab developing a method of code “de-anonymization.” She calls her early results a breakthrough.

A Drexel researcher says code syntax contains identifying details. (Photo by Flickr user Yuri Samoilov, used under a Creative Commons license)

Could a developer contribute to a software project anonymously, wipe her fingerprints off the code and leave no trace?
Drexel researcher Aylin Caliskan-Islam is one step closer to creating a way to do so.


Aylin Caliskan-Islam. (Courtesy photo)


Anonymization is “a serious concern for people who want to contribute to open source projects anonymously,” Caliskan-Islam said, pointing to how researchers have attempted to unmask the creator of Bitcoin and how developers work on large-scale privacy-focused open source projects like Tor.
The first step in covering a developer’s tracks, though, is figuring out whether someone could identify a developer by analyzing their code. Caliskan-Islam, a native of Turkey and a Ph.D. student in the Drexel lab that has developed software to anonymize authors, spent the summer at the U.S. Army Research Lab in Washington, D.C., developing a method to do just that. (She’s the first international Ph.D. student the Army hired as a summer intern for its Open Campus research initiative, she said.)
Out of 250 examples of source code pulled from the international Google Code Jam competition, she was able to identify authors with 95 percent accuracy, as detailed in a recent academic paper. Given how small each piece of source code was (an average of 70 lines), she called it a breakthrough.
Her approach, which uses machine learning, essentially involves a close read of the source code. She looks at things like the words used, the spacing and bracketing and, most importantly, the structure or syntax (see graphic below for a breakdown of that kind of analysis). Together, those things make up a developer’s coding style.

Here’s how Aylin Caliskan-Islam parses code to figure out who wrote it. (Courtesy image)
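
To make the idea concrete, here is a minimal sketch of style-based author matching. It is not Caliskan-Islam's method: it assumes Python source parsed with the standard `ast` module rather than Code Jam submissions, and it swaps her machine learning model for a simple cosine-similarity comparison. The function names (`style_features`, `cosine`, `guess_author`), the feature labels and the sample snippets are all hypothetical, chosen only to mirror the three kinds of clues the article mentions: words, spacing and syntax.

```python
import ast
import math
from collections import Counter


def style_features(source: str) -> Counter:
    """Count a few layout, syntactic and lexical traits of Python source."""
    features = Counter()

    # Layout: blank lines and how indented lines begin (tabs vs. spaces).
    for line in source.splitlines():
        if not line.strip():
            features["layout:blank_line"] += 1
        elif line.startswith("\t"):
            features["layout:tab_indent"] += 1
        elif line.startswith(" "):
            features["layout:space_indent"] += 1

    for node in ast.walk(ast.parse(source)):
        # Syntax: how often each kind of syntax-tree node appears.
        features["syntax:" + type(node).__name__] += 1
        # Lexical: a crude look at variable naming habits.
        if isinstance(node, ast.Name):
            if "_" in node.id:
                features["lexical:name_snake"] += 1
            elif node.id != node.id.lower():
                features["lexical:name_mixed_case"] += 1
            else:
                features["lexical:name_plain"] += 1
    return features


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse feature-count vectors."""
    dot = sum(a[key] * b[key] for key in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def guess_author(unknown: str, known: dict) -> str:
    """Return the known author whose sample is most similar in style."""
    target = style_features(unknown)
    return max(known, key=lambda name: cosine(target, style_features(known[name])))


# Hypothetical samples: "alice" writes explicit loops with tab indentation,
# "bob" prefers comprehensions and space indentation.
SAMPLE_ALICE = "def addAll(xs):\n\ttotal = 0\n\tfor x in xs:\n\t\ttotal = total + x\n\treturn total\n"
SAMPLE_BOB = "def add_all(values):\n    return sum([v for v in values])\n"
UNKNOWN = "def mulAll(ys):\n\tprod = 1\n\tfor y in ys:\n\t\tprod = prod * y\n\treturn prod\n"

if __name__ == "__main__":
    print(guess_author(UNKNOWN, {"alice": SAMPLE_ALICE, "bob": SAMPLE_BOB}))
```

With only one short sample per author this is a toy. The point is that even crude counts of indentation, naming and syntax-tree nodes start to separate one programmer's habits from another's.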


Beyond leading to the development of an anonymization tool, possible applications include identifying cyber criminals and verifying claims of plagiarism. Caliskan-Islam said she’s not sure how the Army, which funded the project, will put her work to use.
Next up, Caliskan-Islam wants to focus on how to identify developers who have contributed to a project with many authors, such as an open source software project.

Companies: Drexel University
