Could a developer contribute to a software project anonymously, wipe her fingerprints off the code and leave no trace?
Drexel researcher Aylin Caliskan-Islam is one step closer to creating a way to do so.
Anonymization is “a serious concern for people who want to contribute to open source projects anonymously,” Caliskan-Islam said, pointing to researchers’ attempts to unmask the creator of Bitcoin and to the developers who work on large-scale, privacy-focused open source projects such as Tor.
The first step in covering a developer’s tracks, though, is figuring out whether someone could identify a developer by analyzing their code. Caliskan-Islam, a native of Turkey and a Ph.D. student in the Drexel lab that has developed software to anonymize authors, spent the summer at the U.S. Army Research Lab in Washington, D.C., developing a method to do just that. (She’s the first international Ph.D. student the Army has hired as a summer intern for its Open Campus research initiative, she said.)
Working with 250 examples of source code pulled from the international Google Code Jam competition, she was able to identify their authors with 95 percent accuracy, as detailed in a recent academic paper. Given how short each sample was (70 lines on average), she called the result a breakthrough.
Her approach, which uses machine learning, amounts to a close reading of the source code. She looks at features such as the words used, the spacing and bracketing, and, most importantly, the structure or syntax (see graphic below for a breakdown of that kind of analysis). Together, these features make up a developer’s coding style.
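To make those three kinds of features concrete, here is a minimal sketch. It is not the researchers’ pipeline: it assumes Python snippets (chosen only because Python’s standard `ast` module exposes a syntax tree) and swaps in a toy nearest-centroid classifier for whatever machine-learning model the study actually used. All function names and the sample data are hypothetical. Each snippet becomes a vector of lexical features (word frequencies), layout features (tabs, spaces, blank lines) and syntactic features (syntax tree node frequencies), and an unknown snippet is attributed to the author whose average vector is most similar by cosine similarity.

```python
import ast
import math
import re
from collections import Counter

# Illustrative sketch only: hypothetical helpers, toy data, toy classifier.


def layout_features(src):
    """Layout: indentation, blank-line and spacing habits, as per-line rates."""
    lines = src.splitlines()
    n = max(len(lines), 1)
    return {
        "layout:tab_indent": sum(line.startswith("\t") for line in lines) / n,
        "layout:space_indent": sum(line.startswith(" ") for line in lines) / n,
        "layout:blank_lines": sum(not line.strip() for line in lines) / n,
        "layout:spaced_equals": src.count(" = ") / n,
    }


def lexical_features(src):
    """Lexical: relative frequency of each identifier or keyword."""
    words = Counter(re.findall(r"[A-Za-z_]\w*", src))
    total = max(sum(words.values()), 1)
    return {f"lex:{w}": c / total for w, c in words.items()}


def syntactic_features(src):
    """Syntactic: relative frequency of each syntax tree node type."""
    nodes = Counter(type(node).__name__ for node in ast.walk(ast.parse(src)))
    total = sum(nodes.values())
    return {f"syn:{name}": c / total for name, c in nodes.items()}


def style_vector(src):
    """Merge all three feature families into one sparse vector."""
    return {**layout_features(src), **lexical_features(src), **syntactic_features(src)}


def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(x * b.get(k, 0.0) for k, x in a.items())
    norm_a = math.sqrt(sum(x * x for x in a.values()))
    norm_b = math.sqrt(sum(x * x for x in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def centroid(vectors):
    """Average several sparse vectors, treating missing keys as zero."""
    total = {}
    for vec in vectors:
        for k, x in vec.items():
            total[k] = total.get(k, 0.0) + x
    return {k: x / len(vectors) for k, x in total.items()}


def train(samples):
    """samples maps each author to a list of source strings they wrote."""
    return {author: centroid([style_vector(s) for s in srcs])
            for author, srcs in samples.items()}


def predict(model, src):
    """Attribute src to the author whose average style it most resembles."""
    vec = style_vector(src)
    return max(model, key=lambda author: cosine(model[author], vec))


# Toy data: "alice" indents with tabs and writes compactly;
# "bob" indents with spaces, pads "=" and leaves blank lines.
samples = {
    "alice": [
        "def f(a):\n\tif a:\n\t\treturn a*2\n\treturn 0\n",
        "def g(xs):\n\tt=0\n\tfor x in xs:\n\t\tt+=x\n\treturn t\n",
    ],
    "bob": [
        "def total(values):\n    result = 0\n\n    for value in values:\n"
        "        result = result + value\n\n    return result\n",
        "def double(number):\n    answer = number * 2\n\n    return answer\n",
    ],
}
model = train(samples)
unknown = "def h(n):\n\tif n:\n\t\treturn n+1\n\treturn 1\n"
print(predict(model, unknown))  # prints alice
```

In this toy example, the tab indentation and the if/return shape of the syntax tree are what tie the unknown snippet to alice; a real attribution system would need many more samples per author and a stronger classifier.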
Beyond leading to the development of an anonymization tool, the work could help identify cybercriminals and verify claims of plagiarism. Caliskan-Islam said she’s not sure how the Army, which funded the project, will put her work to use.
Next up, Caliskan-Islam wants to focus on identifying developers who have contributed to a project with many authors, such as an open source software project.