Dans Mon Catalogue

Jacob T. reviewed Flipping Bits: Your Credentials Are Certainly Mine by STÖK

Flipping Bits: Your Credentials Are Certainly Mine (2024, SEC-T 2024)

Did you know that if you change a single bit from 1 to 0 (or …

I loved that they put in the work and it paid off!

5 stars

While there are crazier theoretical works out there, this talk showed how the authors put in the work and how it paid off on something that is not theoretically new. Basically they built a bit-squatting system that would handle DNS, SSL registration, and HTTP/IMAP/SMTP for a domain 1 bit off of the target (e.g., coogle.com instead of google.com). This technique has been around for years, but the tooling has been very crufty and mostly built just to give a talk. These folks invested a lot of time in the tooling, and they showed how quickly it paid off: 1000s of OAuth creds for F500 companies, 15k emails with scanned documents, etc.

They assumed that they'd see more hits during the solar storm, but didn't see anything, which they found correlated with a paper that seems to show that cosmic rays are not the cause of in-memory bit-flips. They also spent a bit of time discussing defenses, from AWS' approach of buying every flipped domain to cert pinning and other app-based defenses.
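To make the "1 bit off" idea concrete, here is a minimal sketch of how candidate bit-squat domains could be enumerated. It is not the speakers' tooling; the function name `bitsquat_candidates` is hypothetical, and it assumes flips only in the first label of the domain (the suffix is left alone).

```python
import string

# Characters allowed in a DNS hostname label (letters, digits, hyphen).
VALID = set(string.ascii_lowercase + string.digits + "-")


def bitsquat_candidates(domain: str) -> list[str]:
    """Return valid domains whose first label differs from `domain` by one flipped bit."""
    label, _, suffix = domain.lower().partition(".")
    found = set()
    for i, ch in enumerate(label):
        for bit in range(8):
            # Flip exactly one bit of this character.
            flipped = chr(ord(ch) ^ (1 << bit))
            if flipped not in VALID:
                continue
            candidate = label[:i] + flipped + label[i + 1:]
            # Labels may not start or end with a hyphen.
            if candidate.startswith("-") or candidate.endswith("-"):
                continue
            found.add(f"{candidate}.{suffix}")
    return sorted(found)


if __name__ == "__main__":
    for name in bitsquat_candidates("google.com"):
        print(name)
```

The review's own example falls out of this: "g" is 0x67 and "c" is 0x63, which differ only in bit 2.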

Jacob T. reviewed [Keynote] Is "AI" useful for fuzzing? by Brendan Dolan-Gavitt

Discussion of AI and its applications to security seems unavoidable nowadays, and, alas, this keynote …

A nice summary of the space

4 stars

As someone who sees a lot of LLM & security research, this keynote is a nice summary of where LLMs will likely add (or have already added) value, and where they will never help, regardless of LLM ability.

In short, using LLMs to generate inputs is orders of magnitude too slow to outpace the sheer speed of random/semi-random mutation. Using LLMs to generate fuzzing harnesses, and to build generator logic that produces inputs, will pay off: LLMs can ingest specs and code, and revise their output to get around coverage blocks.

Jacob T. finished reading [Keynote] Reasons for the Unreasonable Success of Fuzzing by Thomas Dullien

The hacker culture of my youth (90s) was a very typical male-centric teenage subculture, with …

A nice talk that blends the personal story of the speaker with a prediction about where investments in fuzzing will go to maximize their ROI.

Jacob T. reviewed TCP Spoofing: Reliable Payload Transmission Past the Spoofed TCP Handshake by Yepeng Pan

TCP spoofing—the attack to establish an IP-spoofed TCP connection by bruteforcing a 32-bit server-chosen initial …

Eye opening and clever

4 stars

Going into this read, I figured that IP spoofing was of niche availability and applicability, especially in our TLS-dominated world. However, federated services such as SMTP, or database replication commonly use IP addresses for validation.

There are two core new discoveries here: a TCP stack weakness that results in dramatically smaller search spaces to brute-force the correct ISN to continue a TCP session (as few as four guesses!), and a few techniques for determining the ISN outright. Of these, the application-specific ones are cute and reliable. SMTP is the easiest to explain: if you host your own DNS server for an attacker-controlled domain name, you can spoof a handshake that includes a "HELO .attacker.com". Once you get a hit on that DNS server, you have the correct ISN and can continue the session. Coupled with SPF records, which specify which IPs/domains can send email on behalf of a domain, it's a powerful phishing/spam primitive.

The brute-forcing and generic TCP stack techniques were clever, but were a bit more difficult to understand from the paper, and likely less robust in real-world network scenarios. One required maintaining a server's connect queue at close to a buffer limit, which may be difficult with other legitimate traffic. Still a good read!

Jacob T. reviewed Poisoning Web-Scale Training Datasets is Practical by Nicholas Carlini

Deep learning models are often trained on distributed, web-scale datasets crawled from the internet. In …

Empirical evidence of LLM attacker economics

5 stars

With the race to collect and train on ever more data (and re-train on the latest data more quickly), the ability for LLM creators to perform even cursory checks against training set corruption is almost nil. This paper shows two ways an attacker can corrupt 0.01-1% of an LLM training dataset for a reasonable sum. Existing works have shown that for a specific desired error state, a 0.01% training data poisoning attack can yield a 60-90% chance of tampering with model performance.

There are two core primitives presented in this paper: 1. The corpora release a metadata archive of URLs, and then the fetched content. There are enough expired domains in the metadata to allow an attacker to corrupt a percentage of the URLs being scraped. 2. Wikipedia is converted into a timestamped dump (e.g., a ZIM file) in a predictable order, and on a predictable schedule. If an attacker changes Wikipedia articles just before archival, the changes will persist in the dump even if attentive editors revert them. The authors estimate that they could alter ~6.5% of Wiki articles during this process.

Jacob T. reviewed Sponge Examples: Energy-Latency Attacks on Neural Networks by Ilia Shumailov

The high energy costs of neural network training and inference led to the use of …

A devious optimization goal

5 stars

This paper explores input to DNN models that cause an asymptotic increase in power usage or timing. By using genetic algorithms in a white-box setting, the researchers could find image and text inputs that would drive up inference effort.

The results were impressive, causing a 6000x slowdown on a hosted Azure translation model.

Jacob T. reviewed Password-Stealing without Hacking: Wi-Fi Enabled Practical Keystroke Eavesdropping by Jingyang Hu

The contact-free sensing nature of Wi-Fi has been leveraged to achieve privacy breaches, yet existing …

Pretty amazing accuracy for an eavesdroppable side-channel

5 stars

This paper explores recovering victim key-presses through a Wi-Fi data channel known as Beam-forming Feedback Information. BFI is used to help wireless APs adjust their beam-forming TX to improve performance, but BFI contains data correlated with changes in device orientation, and with the attenuation from nearby movement (e.g., fingers on a keyboard). By training a NN, the researchers were able to recover numeric key-presses (from a numeric keyboard) with ~88% accuracy across a variety of devices.

Pretty impressive, and shows how difficult it is to account for side-channels across all the layers of the stack when it's relatively easy to train a very sensitive ML model to extract a tiny signal from the noise.

Jacob T. finished reading On the use of compression algorithms for network anomaly detection by Christian Callegari

Short easy read comparing three different compression algorithms for their performance in detecting suspicious log data from the DARPA '99 dataset.
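As a rough illustration of the general idea (not the paper's method, and the review does not name the three algorithms compared), one way to use a compressor as an anomaly detector is to ask how many extra compressed bytes a new sample costs once the compressor has already seen normal traffic. The sketch below assumes `zlib` as a stand-in compressor, and `anomaly_score` is a hypothetical name.

```python
import zlib


def compressed_size(data: bytes) -> int:
    """Size in bytes of `data` after zlib compression at the highest level."""
    return len(zlib.compress(data, 9))


def anomaly_score(baseline: bytes, sample: bytes) -> float:
    """Extra compressed bytes per sample byte when `sample` follows `baseline`.

    Samples that resemble the baseline reuse its patterns and score low;
    unfamiliar samples compress poorly and score high.
    """
    extra = compressed_size(baseline + sample) - compressed_size(baseline)
    return extra / max(len(sample), 1)


if __name__ == "__main__":
    line = b"GET /index.html HTTP/1.1 200\n"
    baseline = line * 200
    familiar = line * 5
    # Deterministic byte sequence with little in common with the baseline.
    unfamiliar = bytes((i * 73 + 41) % 256 for i in range(len(familiar)))
    print("familiar:  ", anomaly_score(baseline, familiar))
    print("unfamiliar:", anomaly_score(baseline, unfamiliar))
```

The baseline has to fit inside the compressor's window (32 KB for zlib) for this trick to work, which is one reason the choice of algorithm matters.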

Jacob T. reviewed Sparks by Ian Johnson

Sparks (2023, Oxford University Press, Incorporated)

Powerful

5 stars

A frank look into the long-term, and ongoing rewriting of history by the CCP, as well as the brave few who manage to continue to document and catalog the past as a time capsule for future generations.

Amazing how one of the largest man-made mass deaths is little known, and almost never studied in Western history. This book opens more questions than it answers, but helpfully comes with a guide for where to go next to learn more.

Not particularly cheery, but there is a glimmer of hope that shines throughout.

Jacob T. finished reading Skunk Works by Ben R. Rich and Leo Janos

Skunk Works (Audiobook, 2015, Hachette Audio and Blackstone Audio)

A good story, obviously a rosy view, but nice to see how things were done so long ago. Thanks to @casey for the recommendation, and a nice reminder about how regulatory scar-tissue will be the death of any DoD-connected innovation centers.

Jacob T. started reading Skunk Works by Ben R. Rich and Leo Janos

Another @casey recommendation

Jacob T. reviewed Language Modeling Is Compression by Grégoire Delétang

It has long been established that predictive models can be transformed into lossless compressors and …

Really nice way to formalize a collective intuition

4 stars

This paper formally equates (lossless) compression algorithms with LLM/learning. While the Hutter Prize has postulated the connection, this paper shows how an LLM can act as a better compressor for multi-modality data than the domain specific standards of today. The authors also use the gzip compression algorithm as a generative model, with rather poor success, but build a mathematical framework to build on.

The paper also covers tokenization as compression, which is something that's been lacking in a lot of other scientific discourse on this subject. Overall a nice read, 4* only because it ends abruptly without fully exploring the space of compressors as generative models.
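The link between prediction and compression can be sketched in a few lines: a model that assigns probability p to the next symbol lets an arithmetic coder spend about -log2(p) bits on it, so better predictions mean a shorter file. The example below is only an illustration under stated assumptions: it uses a toy adaptive character-frequency model with Laplace smoothing rather than an LLM, assumes a 256-symbol alphabet, and `ideal_code_length_bits` is a hypothetical name.

```python
import math
from collections import Counter


def ideal_code_length_bits(text: str, alphabet_size: int = 256) -> float:
    """Sum of -log2 p(symbol | counts so far) under an adaptive order-0 model.

    An arithmetic coder driven by the same model would emit roughly
    this many bits.
    """
    counts = Counter()
    seen = 0
    bits = 0.0
    for ch in text:
        # Laplace-smoothed probability of this symbol given what came before.
        p = (counts[ch] + 1) / (seen + alphabet_size)
        bits += -math.log2(p)
        # Update the model only after "encoding", so a decoder could mirror it.
        counts[ch] += 1
        seen += 1
    return bits


if __name__ == "__main__":
    for sample in ("a" * 400, "abcdefgh" * 50):
        bits = ideal_code_length_bits(sample)
        print(f"{sample[:8]!r}...: {bits:.0f} bits vs {8 * len(sample)} raw")
```

Swapping the frequency counts for an LLM's next-token distribution gives the setup the paper studies; the tokenizer then decides what counts as a "symbol".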

Jacob T. reviewed IPvSeeYou: Exploiting Leaked Identifiers in IPv6 for Street-Level Geolocation by Erik Rye

We present IPvSeeYou, a privacy attack that permits a remote and unprivileged adversary to physically …

Simple concept, powerful results

4 stars

Basically this research combines a legacy IPv6 addressing scheme where the MAC address is put into the address with crowd-sourced WiFi network scanning geo-location databases. The trick is figuring out the delta in MACs between the WAN and WLAN adaptors, but they are usually close, so with some clustering, they were able to get 39m accuracy for ~12M routers in over 100 countries.

Crazy to think putting a MAC address in a globally routable IP address was ever considered a good idea, but with networking gear's long life cycle, it will be a long-lasting mistake!
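The leak itself is easy to demonstrate. With the legacy EUI-64 scheme, the lower 64 bits of the address are the MAC with `ff:fe` inserted in the middle and the universal/local bit flipped, so recovering it is a few byte operations. A minimal sketch (the function name `mac_from_eui64` is hypothetical, and the example address uses the documentation prefix 2001:db8::/32):

```python
import ipaddress
from typing import Optional


def mac_from_eui64(addr: str) -> Optional[str]:
    """Return the MAC embedded in an EUI-64 IPv6 address, or None if absent."""
    # The interface identifier is the low 8 bytes of the 16-byte address.
    iid = ipaddress.IPv6Address(addr).packed[8:]
    # EUI-64 identifiers carry the marker bytes ff:fe in the middle.
    if iid[3:5] != b"\xff\xfe":
        return None
    # Undo the universal/local bit flip on the first byte, drop the marker.
    mac = bytes([iid[0] ^ 0x02]) + iid[1:3] + iid[5:8]
    return ":".join(f"{b:02x}" for b in mac)


if __name__ == "__main__":
    print(mac_from_eui64("2001:db8::211:22ff:fe33:4455"))
    print(mac_from_eui64("2001:db8::1"))
```

This only recovers the WAN-side MAC; matching it to the Wi-Fi MAC that appears in geolocation databases is the offset-guessing step the review mentions.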

Jacob T. reviewed zk-creds: Flexible Anonymous Credentials from zkSNARKs and Existing Identity Infrastructure by Christina Garman

N/A

Really exciting research that could transform digital spaces

5 stars

zk-creds shows (and builds a Rust proof-of-concept for) how to use zero-knowledge proofs as a flexible and privacy-preserving identity framework. The core concepts allow for ZK linking of related proofs, and blinding those, while allowing for dynamic attestations. Practically, this allows someone to scan the NFC data on their passport, signed by a trusted entity, and use that to, e.g., prove that they are above a certain age, or are a person of X citizenship, all without having to get the trusted entity to onboard into the identity system.

The ability to set the gadgets that compute on identity attributes after the fact allows for changes to age verification policies, or other checks that all operate without revealing any other aspects of the identity (e.g. DOB or name). This work could form the basis for anonymous, but only-human social networks or other systems that use identity as a proxy for bot defense.

A great read, and comes with a proof-of-concept for a practical system based on scanning passports and ZK proving that the signature on it is valid.

Jacob T. reviewed tlock: Practical timelock encryption based on threshold BLS by Nicolas Gailly

We present a practical method to achieve timelock encryption, where a ciphertext is guaranteed to …

A breakthrough if it withstands scrutiny

5 stars

This paper (and the associated code/service: timevault.drand.love/) may be one of the most valuable contributions, if not the only one, to come from the entire web3 ecosystem. The ability to commit to a future decryption time is a powerful primitive, useful in auctions, coordinated disclosure, and other "dead man's switch" scenarios.

I look forward to this work being critiqued and built upon for a whole host of interesting offerings.