Reviews and Comments

Jacob T.

jacob@knowledgehub.social

Joined 1 year, 4 months ago

Did you know that if you change a single bit from 1 to 0 (or …

I loved that they put in the work and it paid off!

5 stars

While there are crazier theoretical works out there, this talk showed how they did the work and how it paid off on something that is not theoretically new. Basically, they built a bit-squatting system that would handle DNS, SSL reg, and HTTP/IMAP/SMTP for a domain 1 bit off of the target (e.g., coogle.com instead of google.com). This technique has been around for years, but it's been very crufty, and mostly done just to give a talk. These folks invested a lot of time in the tooling, and they showed how quickly it paid off: 1000s of OAuth creds for F500 companies, 15k emails with scanned documents, etc.
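To make "1-bit off" concrete, here is a minimal sketch of how candidate bit-squat domains could be enumerated. It is not the speakers' tooling: the function name `bitsquats` is hypothetical, and the sketch assumes that only the first label of the domain is mutated and that only lowercase letters, digits, and hyphens count as valid hostname characters.

```python
import string
from typing import List

# Characters allowed in a hostname label (DNS names are case-insensitive,
# so only lowercase is considered here).
VALID = set(string.ascii_lowercase + string.digits + "-")


def bitsquats(domain: str) -> List[str]:
    """Return domains whose first label differs from `domain` by one flipped bit.

    Hypothetical helper for illustration; only the first label is mutated.
    """
    label, _, rest = domain.lower().partition(".")
    found = set()
    for i, ch in enumerate(label):
        for bit in range(8):
            flipped = chr(ord(ch) ^ (1 << bit))
            # Skip flips that land outside the valid hostname alphabet
            # (this also drops pure case changes such as 'g' -> 'G').
            if flipped not in VALID:
                continue
            candidate = label[:i] + flipped + label[i + 1:]
            # A label may not start or end with a hyphen.
            if candidate.startswith("-") or candidate.endswith("-"):
                continue
            found.add(f"{candidate}.{rest}")
    return sorted(found)


if __name__ == "__main__":
    for name in bitsquats("google.com"):
        print(name)
```

The example from the talk falls out directly: 'g' is 0x67 and 'c' is 0x63, so flipping a single bit in the first character of google.com yields coogle.com.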

They assumed that they'd see more hits during the solar storm, but didn't see any, which they found correlated with a paper that seems to say that cosmic rays are not the cause of in-memory bit-flips. They also spend a bit of time discussing …

Discussion of AI and its applications to security seems unavoidable nowadays, and, alas, this keynote …

A nice summary of the space

4 stars

As someone who sees a lot of LLM & security research, this keynote is a nice summary of where LLMs will likely add (or have already added) value, and where they will never help, regardless of LLM ability.

In short, using LLMs to generate inputs is orders of magnitude too slow to outpace the sheer speed of random/semi-random mutation. Using LLMs to generate fuzzing harnesses, and to build generator logic that generates inputs, will pay off: LLMs can ingest specs and code, and revise their output to get around coverage blocks.

TCP spoofing—the attack to establish an IP-spoofed TCP connection by bruteforcing a 32-bit server-chosen initial …

Eye opening *and* clever

4 stars

Going into this read, I figured that IP spoofing was of niche availability and applicability, especially in our TLS-dominated world. However, federated services such as SMTP or database replication commonly use IP addresses for validation.

There are two core new discoveries here: a TCP stack weakness that dramatically shrinks the search space for brute-forcing the correct ISN to continue a TCP session (as few as four guesses!), and a few techniques for determining the ISN outright. Of these, the application-specific ones are cute and reliable. SMTP is the easiest to explain: if you host your own DNS server for an attacker-controlled domain name, you can spoof a handshake that includes a "HELO .attacker.com". Once you get a hit on that DNS server, you have the correct ISN and can continue the session. Coupled with SPF records which specify which IPs/domains can send email on behalf of a domain, …

Deep learning models are often trained on distributed, web-scale datasets crawled from the internet. In …

Empirical evidence of LLM attacker economics

5 stars

With the race to collect and train on ever more data (and re-train on the latest data more quickly), the ability for LLM creators to perform even cursory checks against training set corruption is almost nil. This paper shows two ways an attacker can corrupt 0.01-1% of an LLM training dataset for a reasonable sum. Existing works have shown that for a specific desired error state, a 0.01% training data poisoning attack can yield a 60-90% chance of tampering with model performance.

There are two core primitives presented in this paper: 1. The corpora release a metadata archive of URLs, and then the fetched content. There are enough expired domains in the metadata to allow an attacker to corrupt a percentage of the URLs being scraped. 2. Wikipedia is converted into a timestamped dump (e.g., a ZIM file) in a predictable order, and on a predictable schedule. By changing …

The high energy costs of neural network training and inference led to the use of …

A devious optimization goal

5 stars

This paper explores inputs to DNN models that cause an asymptotic increase in power usage or timing. By using genetic algorithms in a white-box setting, the researchers could find image and text inputs that would drive up inference effort.

The results were impressive, causing a 6000x slowdown on a hosted Azure translation model.

The contact-free sensing nature of Wi-Fi has been leveraged to achieve privacy breaches, yet existing …

Pretty amazing accuracy for an eavesdroppable side-channel

5 stars

This paper explores recovering victim key-presses through a Wi-Fi data channel known as Beam-forming Feedback Information. BFI is used to help wireless APs adjust their beam-forming TX to improve performance, but BFI contains data correlated with changes in device orientation and with the attenuation from nearby movement (e.g., fingers on a keyboard). By training a NN, the researchers were able to recover numeric key-presses (from a numeric keyboard) with ~88% accuracy across a variety of devices.

Pretty impressive, and shows how difficult it is to account for side-channels across all the layers of the stack when it's relatively easy to train a very sensitive ML model to extract a tiny signal from the noise.

Sparks (2023, Oxford University Press, Incorporated) 5 stars

Powerful

5 stars

A frank look into the long-term and ongoing rewriting of history by the CCP, as well as the brave few who manage to continue to document and catalog the past as a time capsule for future generations.

Amazing how one of the largest man-made mass deaths is little known, and almost never studied in Western history. This book opens more questions than it answers, but helpfully comes with a guide for where to go next to learn more.

Not particularly cheery, but there is a glimmer of hope that shines throughout.

It has long been established that predictive models can be transformed into lossless compressors and …

Really nice way to formalize a collective intuition

4 stars

This paper formally equates (lossless) compression algorithms with LLM/learning. While the Hutter Prize has postulated the connection, this paper shows how an LLM can act as a better compressor for multi-modality data than the domain-specific standards of today. The authors also use the gzip compression algorithm as a generative model, with rather poor success, but they provide a mathematical framework to build on.
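The prediction/compression link can be shown in a few lines: a model that assigns probability p to the next symbol lets an arithmetic coder spend about -log2(p) bits on it, so better prediction means a smaller file. The sketch below is only an illustration under stated assumptions. It uses a toy adaptive order-0 byte model with add-one smoothing as a stand-in for the LLM, and the function name `ideal_code_length_bits` is hypothetical, not something from the paper.

```python
import math


def ideal_code_length_bits(data: bytes) -> float:
    """Sum of -log2 p(byte) under an adaptive order-0 model.

    This is the size (up to a small constant overhead) that an arithmetic
    coder driven by the same model would produce. The model here is a toy
    stand-in for a learned predictor such as an LLM.
    """
    counts = [1] * 256  # add-one prior: every byte value starts with count 1
    total = 256
    bits = 0.0
    for byte in data:
        p = counts[byte] / total   # model's prediction before seeing the byte
        bits += -math.log2(p)      # cost of encoding the byte under the model
        counts[byte] += 1          # update the model after "seeing" the byte
        total += 1
    return bits


if __name__ == "__main__":
    sample = b"abracadabra " * 50
    print(f"raw size:    {8 * len(sample)} bits")
    print(f"model size:  {ideal_code_length_bits(sample):.0f} bits")
```

Swapping the byte-count model for a stronger predictor changes only the line that computes `p`, which is the sense in which a better language model is a better compressor.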

The paper also covers tokenization as compression, which is something that's been lacking in a lot of other scientific discourse on this subject. Overall a nice read, 4* only because it ends abruptly without fully exploring the space of compressors as generative models.

We present IPvSeeYou, a privacy attack that permits a remote and unprivileged adversary to physically …

Simple concept, powerful results

4 stars

Basically this research combines a legacy IPv6 addressing scheme where the MAC address is put into the address with crowd-sourced WiFi network scanning geo-location databases. The trick is figuring out the delta in MACs between the WAN and WLAN adaptors, but they are usually close, so with some clustering, they were able to get 39m accuracy for ~12M routers in over 100 countries.
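The legacy scheme in question is the EUI-64 interface identifier used by IPv6 stateless autoconfiguration: the 48-bit MAC is split in half, the bytes ff:fe are inserted in the middle, and the universal/local bit of the first byte is flipped. Below is a minimal sketch of reversing that, not the authors' code; the function name `mac_from_eui64` is hypothetical, and the address in the usage example is from the documentation prefix 2001:db8::/32.

```python
import ipaddress
from typing import Optional


def mac_from_eui64(addr: str) -> Optional[str]:
    """Recover the embedded MAC from an EUI-64 style IPv6 address.

    Returns None when the interface identifier does not carry the
    ff:fe marker, i.e. the address was not built from a MAC this way.
    """
    # The interface identifier is the low 64 bits of the address.
    iid = int(ipaddress.IPv6Address(addr)) & ((1 << 64) - 1)
    b = iid.to_bytes(8, "big")
    if b[3:5] != b"\xff\xfe":
        return None
    # Undo the universal/local bit flip on the first byte, drop ff:fe.
    mac = bytes([b[0] ^ 0x02]) + b[1:3] + b[5:8]
    return ":".join(f"{octet:02x}" for octet in mac)


if __name__ == "__main__":
    print(mac_from_eui64("2001:db8::211:22ff:fe33:4455"))
    print(mac_from_eui64("2001:db8::1"))
```

The recovered MAC belongs to the WAN interface, which is why the offset to the Wi-Fi interface's MAC still has to be inferred before a lookup in a geo-location database is possible.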

Crazy to think that putting a MAC address in a globally routable IP address was ever considered a good idea, but with networking gear's long life cycle, it will be a long-lasting mistake!

Really exciting research that could transform digital spaces

5 stars

zk-creds shows (and builds a Rust proof-of-concept for) how to use zero-knowledge proofs as a flexible and privacy-preserving identity framework. The core concept allows for ZK linking of related proofs, and blinding those, while allowing for dynamic attestations. Practically, this allows someone to scan the NFC data on their passport, signed by a trusted entity, and use that to, e.g., prove that they are above a certain age, or are a person of X citizenship, all without having to get the trusted entity to onboard into the identity system.

The ability to set the gadgets that compute on identity attributes after the fact allows for changes to age verification policies, or other checks that all operate without revealing any other aspects of the identity (e.g. DOB or name). This work could form the basis for anonymous, but only-human social networks or other systems that use identity as a proxy …

We present a practical method to achieve timelock encryption, where a ciphertext is guaranteed to …

A breakthrough if it withstands scrutiny

5 stars

This paper (and the associated code/service: timevault.drand.love/) may be one of the most/only valuable contributions to come from the entire web3 ecosystem. The ability to commit to a future decryption time is a powerful primitive for auctions, coordinated disclosure, and other "dead man's switch" scenarios.

I look forward to this work being critiqued and built upon for a whole host of interesting offerings.