Jonkman Microblog
  • Login
Show Navigation
  • Public

    • Public
    • Network
    • Groups
    • Popular
    • People

Notices by clacke (clacke@social.heldscal.la), page 12

  1. clacke (clacke@social.heldscal.la)'s status on Wednesday, 30-May-2018 14:31:51 EDT clacke clacke
    in reply to
    • clacke
    Normal magnet link is 1 MB/s though.
    In conversation Wednesday, 30-May-2018 14:31:51 EDT from social.heldscal.la permalink
  2. clacke (clacke@social.heldscal.la)'s status on Wednesday, 30-May-2018 14:31:17 EDT clacke clacke
    in reply to
    • Oneesan succubus
    @lain The Queen of Sweden!

    How very German. :-D
    In conversation Wednesday, 30-May-2018 14:31:17 EDT from social.heldscal.la permalink
  3. clacke (clacke@social.heldscal.la)'s status on Wednesday, 30-May-2018 14:27:57 EDT clacke clacke
    • Oneesan succubus
    It's very meme-compatible that @lain radio has instant.io links, but I tried one and nobody was seeding or it just didn't work. I tried with my wide-open Chromium too, so it wasn't just that I was blocking JS or something.

    https://lainradio.club/
    In conversation Wednesday, 30-May-2018 14:27:57 EDT from social.heldscal.la permalink

    Attachments

    1. File without filename could not get a thumbnail source.
      Lain Radio Club
  4. clacke (clacke@social.heldscal.la)'s status on Wednesday, 30-May-2018 14:20:32 EDT clacke clacke
    • Mayutan☕
    • SukinoSenze
    @sukinosenze @mayuutann Skype is a hog, takes forever to do anything, at least on my Samsung S5 and S5 Mini. But Skype and Telegram have the feature in common that you can edit messages. That really helps improve conversation a lot, removing typos that confuse, instead of having to say "oh hey I meant blah".
    In conversation Wednesday, 30-May-2018 14:20:32 EDT from social.heldscal.la permalink
  5. clacke (clacke@social.heldscal.la)'s status on Wednesday, 30-May-2018 14:15:58 EDT clacke clacke
    • Mayutan☕
    • SukinoSenze
    @sukinosenze @mayuutann WhatsApp has been pretty popular in Hong Kong, but a lot of people are switching over to WeChat these days. I guess because friends and family on the other side of the border use WeChat.
    In conversation Wednesday, 30-May-2018 14:15:58 EDT from social.heldscal.la permalink
  6. Federated Republic of Sean (freakazoid@retro.social)'s status on Wednesday, 30-May-2018 12:09:18 EDT Federated Republic of Sean Federated Republic of Sean

    You want cyberpunk dystopia? Because that's how you get cyberpunk dystopia.

    This is massive corporate welfare for the giant (mostly American) corporations who can easily afford to do the filtering that lets them keep their safe harbor under this law (and are already doing it). It's also corporate welfare for the American content cartel.

    And here I thought Europeans didn't like Google, Facebook, and Hollywood.

    In conversation Wednesday, 30-May-2018 12:09:18 EDT from retro.social permalink Repeated by clacke
  7. h (h@social.coop)'s status on Wednesday, 30-May-2018 14:09:58 EDT h h
    • jjg

    @jjg Jeri Ellsworth: First to fabricate logic gates in a garage, developing her own process from scratch.
    https://hackaday.com/2010/03/10/jeri-makes-integrated-circuits/

    Zeloof: First to make integrated circuit in a garage, following Jeri's lead.
    http://sam.zeloof.xyz/

    In conversation Wednesday, 30-May-2018 14:09:58 EDT from social.coop permalink Repeated by clacke
  8. Elly, nasty enby :heart_pan: (elizafox@mst3k.interlinked.me)'s status on Wednesday, 30-May-2018 13:50:31 EDT Elly, nasty enby :heart_pan: Elly, nasty enby :heart_pan:

    Admin PSA: please ensure your IPv6 outbound and inbound work on your instance. With Mastodon 2.2.0 or so (which a lot of instances still run) people can't federate with you because it will try IPv6 first and then give up. Disabling AAAA records isn't good enough if you run 2.2.0 because it won't retry outgoing either.

    You'll wind up with assloads of missing posts.

    In conversation Wednesday, 30-May-2018 13:50:31 EDT from mst3k.interlinked.me permalink Repeated by clacke
  9. Bob Mottram 🔧 ☕ ✅ (bob@soc.freedombone.net)'s status on Wednesday, 30-May-2018 13:53:13 EDT Bob Mottram 🔧 ☕ ✅ Bob Mottram 🔧 ☕ ✅
    Updated CryptPad to the latest version on #freedombone. It now has a new kanban app which is a bit flaky but works. So now you can collaboratively kanban on the mesh.
    In conversation Wednesday, 30-May-2018 13:53:13 EDT from soc.freedombone.net permalink Repeated by clacke
  10. rem (rem@bsd.network)'s status on Wednesday, 30-May-2018 13:59:14 EDT rem rem
    • Shawn Webb

    @lattera When coding style survives compilation https://blog.acolyer.org/2018/03/16/when-coding-style-survives-compilation-de-anonymizing-programmers-from-executable-binaries/

    In conversation Wednesday, 30-May-2018 13:59:14 EDT from bsd.network permalink Repeated by clacke

    Attachments

    1. Invalid filename.
      When coding style survives compilation: de-anonymizing programmers from executable binaries
      By adriancolyer from the morning paper

      When coding style survives compilation: de-anonymizing programmers from executable binaries Caliskan et al., NDSS’18

      As a programmer you have a unique style, and stylometry techniques can be used to fingerprint your style and determine with high probability whether or not a piece of code was written by you. That makes a degree of intuitive sense when considering source code. But suppose we don’t have source code? Suppose all we have is an executable binary? Caliskan et al., show us that it’s possible to de-anonymise programmers even under these conditions. Amazingly, their technique still works even when debugging symbols are removed, aggressive compiler optimisations are enabled, and traditional binary obfuscation techniques are applied! Anonymous authorship of binaries is consequently hard to achieve.

      One of the findings along the way that I found particularly interesting is that more skilled/experienced programmers are more fingerprintable. It makes sense that over time programmers acquire their own unique way of doing things, yet at the same time these results seem to suggest that experienced programmers do not converge on a strong set of stylistic conventions. That suggests to me a strong creative element in program authorship, just as experienced authors of written works develop their own unique writing styles.

      If we encounter an executable binary sample in the wild, what can we learn from it? In this work, we show that the programmer’s stylistic fingerprint, or coding style, is preserved in the compilation process and can be extracted from the executable binary. This means that it may be possible to infer the programmer’s identity if we have a set of known potential candidate programmers, along with executable binary samples (or source code) known to be authored by these candidates.

      Out of a pool of 100 candidate programmers, Caliskan et al. are able to attributed authorship with accuracy of up to 96%, and with a pool of 600 candidate programmers, they reach accuracy of 83%. These results assume that the compiler and optimisation level used for compilation of the binary are known. Fortunately, previous work has shown that toolchain provenance, including the compiler family, version, optimisation level, and source language, can be identified using a linear Conditional Random Field (CRF) with accuracy of up to 99% for language, compiler family, and optimisation level, and 92% for compiler version.

      One of the potential uses for the technology is identifying authors of malware.

      Finding fingerprint features in executable binaries

      So how is this seemingly impossible feat pulled off? The process for training the classifier given a corpus of works by authors in a candidate pool has four main steps, as illustrated below:

      1. Disassembly: first the program is disassembled to obtain features based on machine code instructions, referenced strings, symbol information, and control flow graphs.
      2. Decompilation: the program is translated into C-like pseudo-code via decompilation, and this pseudo-code is passed to a fuzzy C parser to generate an AST. Syntactical features and n-grams are extracted from the AST.
      3. Dimensionality reduction: standard feature selection techniques are used to select the candidate features from amongst those produced in steps 1 and 2.
      4. Classification: a random forest classifier is trained on the corresponding feature vectors to yield a program that can be used for automatic executable binary authorship attribution.

      You can download the code at https://github.com/calaylin/bda.

      Disassembly

      The disassembly step runs the binary through two different disassemblers: the netwide disassembler (ndisasm), which does simple instruction decoding, and the radare2 state-of-the-art open source disassembler, which also understands the executable binary format. Using radare2 it is possible to extract symbols, strings, functions, and control flow graphs.

      Information provided by the two disassemblers is combined to obtain our disassembly feature set as follows: we tokenize the instruction traces of both disassemblers and extract token uni-grams, bi-grams, and tri-grams within a single line of assembly, and 6-grams, which span two consecutive lines of assembly… In addition, we extract single basic blocks of radare2’s control flow graphs, as well as pairs of basic blocks connected by control flow.

      Decompilation

      Decompilation is done using the Hex-Rays commercial state-of-the-art decompiler, which produces human readable C-like pseudo-code. This code may be much longer than the original source code (e.g. decompiling a program that was originally 70 lines long may produce on average 900 lines of decompiled code).

      From the decompiled result, both lexical and syntactical features are extracted. Lexical features are word unigrams capturing integer types, library function names, and internal function names (when symbol table information is available). Syntactical features are obtained by passing the code to the joern fuzzy parser and deriving features from the resulting AST.

      Dimensionality reduction

      Following steps one and two, a large number of features can be generated (e.g., 705,000 features from 900 executable binary samples taken across 100 different programmers). A first level of dimensionality reduction is applied using WEKA’s information gain attribute selection criteria, and then a second level of reduction is applied using correlation based feature selection. The end result for the 900 binary samples is a set of 53 predictive features.

      Classification

      Classification is done using random forests with 500 trees. Data is stratified by author analysed using k-fold cross-validation, where k is equal to the number of available code samples per author.

      Evaluation results

      The main evaluation is performed using submission to the annual Google Code Jam competition, in which thousands of programmers take part each year. “We focus our analysis on compiled C++ code, the most popular programming language used in the competition. We collect the solutions from the years 2008 to 2014 along with author names and problem identifiers.”

      Datasets are created using gcc and g++, using each of O1, O2, and O3 optimisation flags (so six datasets in all). The resulting datasets contain 900 executable binary samples from 100 different authors. As we saw before, the authors are able to reduce the feature set down to 53 predictive features.

      To examine the potential for overfitting, we consider the ability of this feature set to generalize to a different set of programmers, and show that it does so, further supporting our belief that these features effectively capture programming style. Features that are highly predictive of authorial fingerprints include file and stream operations along with the formats and initializations of variables from the domain of ASTs, whereas arithmetic, logic, and stack operations are the most distinguishing ones among the assembly instructions.

      Without optimisation enabled, the random forest is able to correctly classify 900 test instances with 95% accuracy. Furthermore, given just a single sample of code (for training) from a given author, the author can be identified out of a pool of 100 candidates with 65% accuracy.

      The classifier also reaches a point of dramatically diminishing returns with as few as three training samples, and obtains a stable accuracy by training on 6 samples. Given the complexity of the task, this combination of high accuracy with extremely low requirement on training data is remarkable, and suggests the robustness of our features and method.

      The technique continues to work well as the candidate pool size grows:

      Turning up the difficulty level

      Programming style is preserved to a great extent even under the most aggressive level 3 optimisations:

      …programmers of optimized executable binaries can be de-anonymized, and optimization is not a highly effective code anonymization method.

      Fully stripping symbol information reduces classification accuracy by 24%, so even removing symbols is not an effective form of anonymisation.

      For the  pièce de résistance the authors use Obfuscator-LLVM and apply all three of its obfuscation techniques (instruction substitution, introducing bogus control flow, flattening control flow graphs). And the result? “Using the same features as before, we obtain an accuracy of 88% in correctly classifying authors.”

      … while we show that our method is capable of dealing with simple binary obfuscation techniques, we do not consider binaries that are heavily obfuscated to hinder reverse engineering.

      So you want to stay anonymous?

      If you really do want to remain anonymous, you’d better plan for that from the very beginning of your programming career, and even then it doesn’t look easy! Here are the conditions recommended by the authors:

      • Do not have any public repositories
      • Don’t release multiple programs using the same online identity
      • Try to have a different coding style (!) in each piece of software you write, and try to code in different programming languages.
      • Use different optimisations and obfuscations to avoid deterministic patterns

      Another suggestion that comes to mind is to use an obfuscater deliberately designed to prevent reverse engineering. Although since these weren’t tested, we don’t actually know how effective that will be.

      A programmer who accomplishes randomness across all potential identifying factors would be very difficult to deanonymize. Nevertheless, even the most privacy savvy developer might be willing to contribute to open source software or build a reputation for her identity based on her set of products, which would be a challenge for maintaining anonymity.

  11. clacke (clacke@social.heldscal.la)'s status on Wednesday, 30-May-2018 14:01:39 EDT clacke clacke
    • notklaatu
    • vaeringjar-highone
    @vaeringjar @klaatu Something may be happening. I saw people update the build instructions on the wiki, so I went to the mailing list for the darwinbuild project (a part of PureDarwin) and:

    https://lists.macosforge.org/pipermail/darwinbuild-dev/2017-April/000710.html
    In conversation Wednesday, 30-May-2018 14:01:39 EDT from social.heldscal.la permalink
  12. clacke (clacke@social.heldscal.la)'s status on Wednesday, 30-May-2018 11:28:44 EDT clacke clacke
    in reply to
    • clacke
    • Vale@ganbara.nai
    @valerauko I want more giggle girl CEOs.
    In conversation Wednesday, 30-May-2018 11:28:44 EDT from social.heldscal.la permalink
  13. clacke (clacke@social.heldscal.la)'s status on Wednesday, 30-May-2018 11:28:31 EDT clacke clacke
    • Vale@ganbara.nai
    @valerauko Neither do It. And I think that's great.
    In conversation Wednesday, 30-May-2018 11:28:31 EDT from social.heldscal.la permalink
  14. clacke (clacke@social.heldscal.la)'s status on Wednesday, 30-May-2018 11:03:52 EDT clacke clacke
    • notklaatu
    • vaeringjar-highone
    @vaeringjar @klaatu I don't know if NixOS/Darwin is interesting for anything other than build servers. But it is very interesting for that.
    In conversation Wednesday, 30-May-2018 11:03:52 EDT from social.heldscal.la permalink
  15. clacke (clacke@social.heldscal.la)'s status on Wednesday, 30-May-2018 10:57:14 EDT clacke clacke
    I love to live in a world where the CEO of a significant company like Travis CI can have a twitter handle like https://twitter.com/gigglegirl4e .
    In conversation Wednesday, 30-May-2018 10:57:14 EDT from social.heldscal.la permalink
  • After
  • Before
  • Help
  • About
  • FAQ
  • TOS
  • Privacy
  • Source
  • Version
  • Contact

Jonkman Microblog is a social network, courtesy of SOBAC Microcomputer Services. It runs on GNU social, version 1.2.0-beta5, available under the GNU Affero General Public License.

Creative Commons Attribution 3.0 All Jonkman Microblog content and data are available under the Creative Commons Attribution 3.0 license.

Switch to desktop site layout.