The deriver field in Nix has always been a misfeature. It was intended to provide traceability back to the Nix expression used to create the derivation, but it doesn't actually do that (since that wasn't really possible in the pre-flakes world, without hermetic evaluation). So instead it just causes a lot of confusion when the deriver recorded in the binary cache doesn't match the local evaluation result, due to fixed-output derivations changing.
In the future, Nix will hopefully gain proper provenance tracking that will tell you exactly where a store path came from: https://github.com/NixOS/nix/pull/11749
The biggest problem of all is that derivers are not unique! A separate "build trace" map will solve this.
Presumably this would support a big improvement to both SBOM generation as well as various UX features and workflow improvements.
Is that the 'build-trace' feature I saw John write about? (I want to explore that more.)
I think Eelco has in mind a separate thing that would still be a store object field. But IMO we should not do that, since derivers are not unique; we should use the "build trace" instead, which properly handles that.
As Martin Schwaighofer has discussed, it is fine and in fact good for build trace entries to have arbitrary metadata, so the "claims" being cryptographically signed are more precise. (This is good for auditing and, if something looks suspicious, for having full accountability.)
So on those grounds, if Eelco would like to include some "this came from this flake" information as informal metadata (formally, the key must still be the resolved derivation), that is fine with me.
---
As I linked in my other reply, see my fast-growing https://github.com/NixOS/nix/pull/14408 docs PR where I try to formally nail all this stuff down for the first time.
I mentioned another alternative to adding flake-specific metadata to data structures that are transferred over the network, as part of the signed traces or otherwise, in a comment on that PR Eelco linked.
It's keeping flake-specific data locally, to guarantee that it matches how the user ended up with the data, not how the builder produced it. I think otherwise from the user POV such data could again look misleading.
Good point. It is misleading if different flakes end up producing the same derivation, and we don't want to re-sign our build trace entry to account for that (which would amplify reads into writes). Separate indirection for this eval->store layer accounting sounds good.
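To make the "separate indirection" idea concrete, here is a minimal sketch, not anything from Nix itself. The names `build_trace`, `local_provenance`, `record_build` and `record_eval` are hypothetical, and HMAC with a demo key is only a stand-in for whatever signature scheme a real build trace would use. The point it illustrates is that a second flake evaluating to the same resolved derivation only touches the local map, so the signed entry never has to be rewritten.

```python
import hashlib
import hmac
import json

SIGNING_KEY = b"demo-key"  # hypothetical key, for illustration only

# Shareable and signed: resolved derivation -> (output, signature).
build_trace = {}
# Local only, never sent over the network: flake reference -> resolved derivation.
local_provenance = {}


def sign_entry(resolved_drv, output):
    """Sign the claim 'this resolved derivation produced this output'."""
    payload = json.dumps({"drv": resolved_drv, "out": output}, sort_keys=True).encode()
    return hmac.new(SIGNING_KEY, payload, hashlib.sha256).hexdigest()


def record_build(resolved_drv, output):
    build_trace[resolved_drv] = (output, sign_entry(resolved_drv, output))


def record_eval(flake_ref, resolved_drv):
    # Only the local indirection changes; the signed entry is left alone.
    local_provenance[flake_ref] = resolved_drv


record_build("resolved-ruby.drv", "out-ruby")
signature_before = build_trace["resolved-ruby.drv"][1]
record_eval("flake-a#ruby", "resolved-ruby.drv")
record_eval("flake-b#ruby", "resolved-ruby.drv")
signature_after = build_trace["resolved-ruby.drv"][1]
```

Two flakes now point at one build trace entry, and looking up "where did I get this from" is a local read rather than a write to signed data.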
+1 to Farid, great write-up! What you’re seeing is the long-standing “deriver” mismatch: fixed-output derivations can change their .drv without changing the output path. Eelco is calling it out as well in the comment below. I believe the idea behind the path forward is there but happy to hear more!
Also, check out Farid's other posts.
If I understand this correctly, the upcoming CA derivations will fix this by making these situations expected, properly handled cases rather than a weird bug? https://nixos.wiki/wiki/Ca-derivations
Yes, a hope of mine is that we can stop using "hash derivation modulo" entirely.
I've recently started some fancy formal spec-level documentation here https://github.com/NixOS/nix/pull/14408 The "resolution" equivalence class is both simpler and better than the "hash derivation modulo ..." one.
(The fact that it is a mouthful to say what the derivations are modulo kinda gives the game away! I put "hash quotient derivation" in the docs to side-step the issue.)
To be clear, there is no bug here: derivers are simply not uniquely determined in the presence of fixed-output derivations, which is by design. That's even more true with CA derivations.
CA derivations also introduce the opposite situation, namely that the same derivation can produce different output paths for different users (if the build is not bitwise reproducible).
pick your poison: 1:N or N:1 ;P
It's both, multiple derivations can produce the same (content-addressed) store object, and the derivations may not be reproducible and produce different (content-addressed) store objects each time.
The reality of executing arbitrary programs on non-deterministic computers is, unfortunately, N:M!
(Cue deterministic WASM derivations or something.)
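A tiny sketch of what N:M means for the data structure, assuming a hypothetical `BuildTrace` class (not Nix's actual build trace format): a single "deriver" field can hold one derivation per output, while a two-way multimap can hold both directions of the relation.

```python
from collections import defaultdict


class BuildTrace:
    """Toy many-to-many relation between derivations and store objects."""

    def __init__(self):
        self._outputs_by_drv = defaultdict(set)
        self._drvs_by_output = defaultdict(set)

    def record(self, drv, output):
        # One observed build: this derivation produced this store object.
        self._outputs_by_drv[drv].add(output)
        self._drvs_by_output[output].add(drv)

    def outputs_of(self, drv):
        return set(self._outputs_by_drv.get(drv, ()))

    def derivers_of(self, output):
        return set(self._drvs_by_output.get(output, ()))


trace = BuildTrace()
trace.record("drv-a", "out-1")  # two derivations, same output (N:1)
trace.record("drv-b", "out-1")
trace.record("drv-a", "out-2")  # non-reproducible rebuild (1:N)
```

Here `trace.derivers_of("out-1")` has two elements and `trace.outputs_of("drv-a")` has two elements, which no single-valued field could represent.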
> (Cue deterministic WASM derivations)
"Rah Rah, this is why we need deterministic wasm derivations!" - Me
(There you go Ericson) Relevant links: https://github.com/WebAssembly/design/blob/main/Nondetermini...
CA derivations are, from what I understand, fixed-output derivations but more general.
The point of the article to me (author) was that I found it odd that Nix replaces the derivations when calculating the output path but not the derivation path. (Talking about "paths" in Nix is so hard!)
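For anyone following along, here is a toy model of that asymmetry. It is a deliberately simplified sketch and not Nix's real hashing scheme: the `Drv` class, the string formats and the truncated SHA-256 are all made up for illustration. The only thing it preserves is the shape of the rule: the derivation path hashes the inputs' derivation paths as-is, while the output path hashes the inputs "modulo" fixed-output derivations, which are replaced by their declared content hash.

```python
import hashlib
from dataclasses import dataclass
from typing import Optional, Tuple


def short_hash(text: str) -> str:
    return hashlib.sha256(text.encode()).hexdigest()[:12]


@dataclass(frozen=True)
class Drv:
    name: str
    builder: str
    inputs: Tuple["Drv", ...] = ()
    fixed_output_hash: Optional[str] = None  # set only for fixed-output derivations


def drv_path(d: Drv) -> str:
    # Hashes everything, including how each input is fetched or built.
    text = f"{d.name}|{d.builder}|{d.fixed_output_hash}|"
    text += ",".join(drv_path(i) for i in d.inputs)
    return f"{short_hash(text)}-{d.name}.drv"


def hash_modulo(d: Drv) -> str:
    # A fixed-output derivation is replaced by its declared content hash.
    if d.fixed_output_hash is not None:
        return short_hash("fixed:" + d.fixed_output_hash)
    text = f"{d.name}|{d.builder}|" + ",".join(hash_modulo(i) for i in d.inputs)
    return short_hash(text)


def output_path(d: Drv) -> str:
    return f"{short_hash('out:' + hash_modulo(d))}-{d.name}"


# Same tarball hash, fetched from two different mirrors.
src_a = Drv("src", "fetch from mirror-a", fixed_output_hash="abc")
src_b = Drv("src", "fetch from mirror-b", fixed_output_hash="abc")
ruby_a = Drv("ruby", "make", inputs=(src_a,))
ruby_b = Drv("ruby", "make", inputs=(src_b,))
```

In this model `ruby_a` and `ruby_b` get the same output path but different derivation paths, which is exactly how a cached output can carry a deriver that your local evaluation never produced.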
That makes sense, thanks for clarifying. Great writeup.
> The road to Nix enlightenment is no joke and full of dragons.
Nix was a great research project. Now is the time to rewrite it from the ground up.
The core store layer is quite small, and I am trying to thoroughly document it, with all 3 of:
- a more "academic" spec of what it does
- nuts-and-bolts JSON schema for many data types
- JSON golden tests instead of C++ literals in the unit tests as often as possible.
I hope this will make additional store layer implementations easy to churn out.
(The fiddly "hash derivation modulo" described in this blog post can be dropped in a world where we no longer have input addressing, and just have content-addressing. Or, in a world where we have a new, simpler type of input-addressing instead.)
Well, there's Guix as an alternative if you want a similar concept but different implementation philosophy. For me the major disadvantage of Guix is lack of package availability compared to Nix.
AFAIK Guix uses parts of Nix as a backend.
Isn't there a way to transpile the scripts from Nix to Guix?
It's not too hard to translate manually, but since the dependency tree is massive it doesn't seem feasible to do wholesale.
I feel the same about HCL in Terraform. The tool is perfect, the language is bollocks.
Eh. This can be applied to so many technologies that run the world..
It has been rewritten a few times already. The "fixed output hash" is a dirty optimisation hack borne out of real-world needs and not a research idea.
As a mere mortal I find none of this surprising, mostly because I never understood any of it in the first place ... :)
> nix/store/24v9wpp393ib1gllip7ic13aycbi704g-ruby-3.3.9.drv
A different type of madness, but why are such ugly names so common? Why not start with ruby-3.3.9 so any list of files is semantically sorted and readable?
The package name is "secondary" information in this context. The hash is the primary one because it's stable unless the input changes.
The semantic is "what did this configuration generate", not "what's this package's version".
It's primary for every human involved. Also, the way you check whether it's changed is by automatically comparing the full hash, not its leading characters, so you don't care where in the full string it's positioned.
> The semantic is "what did this configuration generate", not "what's this package's version".
Then why have the name/version at all like in those nameless cache dirs?
It made sense to me when I looked at it, at mount points, at when it changed vs when it didn't, etc, so IDK what to tell you.
FWIW, I'm also pretty sure I'm human.
Edit: also, I'm pretty sure that I wouldn't find it any more or less complicated if the package name came first.
> at when it changed vs when it didn't
You still have this information! Just in a way where it becomes easier to track the difference or see how many rubies you have etc
> FWIW, I'm also pretty sure I'm human.
So you do read the "ruby" name/version, not just the hash?
I don't care how many rubies I have, except for disk space, which I clean up regularly, so it's a bit moot.
I actually don't look at the package names either as much as I look at the number of hashes, which I find easy to eyeball.
Quite frankly, I don't really look at the paths anyway (on any kind of regular basis). I just know that when I've looked at them, the hash vs package name thing made sense to me because of the configuration -> result relationship. :)
Edit: oh, when I said I'm pretty sure I'm human, I meant "I'm human too but I don't seem to be seeing things the same way you do".
> I don't care how many rubies I have, except for disk space, which I clean up regularly, so it's a bit moot.
So you do care about how many rubies you have (one of the nix issues is indeed its size), especially if it's not a ruby but some bigger dependency. Your solution is doing regular cleanup, another option would be to casually notice while browsing in a file manager or even clicking the "size" column, in which case reading left to right from the name would help noticing the dupes and maybe doing something about it.
> Quite frankly, I don't really look at the paths anyway
So you were just arguing for the fun of it based on a superficial theory?
> I'm human too but I don't seem to be seeing things the same way you do
Yeah you do, you read left to right and there is no way you read "sadlfkjasdlfwroiupdfoser" as well as you read "ruby-1.2.3". Though since you don't actually read that you don't care about it, that's also human, though not the level of human that matters for this argument
> So you do care about how many rubies you have
No, I care about how many leftover rebuilds I have that I no longer use (typically all of them). Couldn't care less about any individual packages because I leave it to Nix to know what should be installed and what shouldn't.
I don't casually browse through the stores because I have no reason to.
> So you were just arguing for the fun of it based on a superficial theory?
Arguing? That's not what I'm doing, but maybe it's how you feel. Your initial post was a question. I replied to it. I guess your question was rhetorical, based on your responses to my comments.
I was giving you my perspective.
My various dealings with the paths come from various adventures of debugging why my configs didn't produce what I thought (e.g. things not in PATH). It's also probably why I see the relationship as starting with config and ending with path on disk.
I have never gone on fishing expeditions around store paths. When I go out of my homedir and "root" fs, I know what hash I want from looking at a symlink, or some log output.
> Edit: also, I'm pretty sure that I wouldn't find it any more or less complicated if the package name came first.
rkomorn.skills.tty.tab_completion -= 1;
Yeah, okay. Super cool HN comment quality.
In Nix, packages (derivations) are so lightweight that your store has tens of thousands of them, many with the same name, or with no meaningful name at all. On the rare occasions that you need to look in the store for a package, you’re much more likely to be looking for a particular hash than a particular name. That, and having the hash as a prefix looks nicer in tabular output.
If I had my way
1. store paths would have no names at all
2. listing the contents of the store directory would not be allowed
3. store paths would have more bits of information
Then store paths are halfway decent (but non-revocable) capabilities.
> 2. listing the contents of the store directory would not be allowed
Wow, that's awful, that's what Windows AppStore does, so it's even hard to see how much of the preinstalled garbage there is or even whether you might have a huge game you forgot to uninstall but might want to remove to free up some space.
What's the cool benefit that could justify this limitation?
Nothing should rely on how store paths are named, ever. Like, there is actually no reason to know that hash 1234abc is a certain output of derivation xyz-12.1.0. The contents of the store can be garbage-collected at any point. So you actively do not want things outside the Nix store (or managed by NixOS tools, or Nix-aware tools) referencing paths in /nix/store.
If you do something like write a config file that references /nix/store/1234abc-xyz-12.1.0/bin/xyz, that config file will break the next time you update the derivation that produces that path. Again, this makes knowing what things are in the store completely pointless unless you are writing Nix-aware tooling or debugging, in which case there are tools to show you what path your derivation produced. But you should never need to do the opposite, which is to resolve which derivation produced a path in /nix/store/.
The Windows Store problem is completely orthogonal; paths in /nix/store are not "installed" on your system, they are derivations or outputs of Nix derivations. NixOS "installs" things by adding some of these to your PATH in a shell script that is also a derivation output in /nix/store.
Very well said, thank you!
I'm glad other people also understand that the onus of motivation is on granting some privilege, not rescinding it :)
What actually happens if you remove read permissions on the /nix/store directory? Do things still work? I suppose I'll need to try!
https://github.com/NixOS/rfcs/blob/master/rfcs/0097-no-read-... is relevant.
Oh hmm did we never implement this? We should. Both because it is a good idea, and because accepted RFCs should be implemented.
I'm not aware of it being done yet. But since the RFC is accepted it should be pretty straightforward.
How could one debug if we couldn't view contents of the store directory?
You can still read individual store objects in their entirety. You just need to know the store path for the object that you want to read.
You can still use root or something to list all the store paths. (But ideally nothing else would be running as root / with that power.)
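The mechanism this relies on is ordinary POSIX directory permissions: execute (search) permission without read permission lets you open an entry whose name you already know, but not enumerate the directory. Here is a small sketch on a throwaway temp directory, assuming a POSIX system; the function name `demo_unlistable_store` and the object name `abc123-hello` are made up, and nothing here touches a real /nix/store.

```python
import os
import shutil
import tempfile


def demo_unlistable_store():
    """Return (content, listable) for a directory with search but no read bit."""
    root = tempfile.mkdtemp()
    store = os.path.join(root, "store")
    os.mkdir(store)
    obj = os.path.join(store, "abc123-hello")
    with open(obj, "w") as f:
        f.write("hello")
    os.chmod(store, 0o311)  # owner: write+search, others: search only, nobody: read
    try:
        # Opening a path we already know only needs search permission.
        with open(obj) as f:
            content = f.read()
        # Enumerating names needs read permission on the directory.
        try:
            os.listdir(store)
            listable = True  # expected when running as root
        except PermissionError:
            listable = False
    finally:
        os.chmod(store, 0o700)  # restore permissions so cleanup can proceed
        shutil.rmtree(root)
    return content, listable


content, listable = demo_unlistable_store()
```

As an unprivileged user `listable` should come back `False` while `content` is still readable; as root the listing goes through, which matches the point about root keeping that power.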
It really doesn't matter. As a normal user, you don't use `drv` files directly, and everything you configure yourself will use attribute paths in nixpkgs. E.g. `pkgs.ruby` or `pkgs.ruby_3_3`.
It's done that way on purpose. Precisely so you don't try to use the paths semantically. The names literally mean nothing in this context.
That contradicts the simple fact that the name includes "ruby" and isn't just a hash
That name is only there for debugging purposes. It doesn't actually mean anything and you only ever need to look at it to debug some hoary failing build.
The reason it's like this is that the only way to reliably grab it is to cut the string at the first hyphen; then the rest can be almost free text.
If you do it the other way, it's harder. You can try this with nix commands: /nix/store/<hash>-x is a valid way to refer to something in the store most of the time.
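A minimal sketch of that "cut at the first hyphen" rule, assuming the default `/nix/store` prefix; `split_store_path` is a hypothetical helper, not part of any Nix tooling. Because the hash part contains no hyphen, the first hyphen is unambiguous and the name is free to contain as many as it likes.

```python
def split_store_path(path, store_dir="/nix/store"):
    """Split a store path into (hash, name) at the first hyphen of its base name."""
    prefix = store_dir.rstrip("/") + "/"
    if not path.startswith(prefix):
        raise ValueError(f"not under {store_dir}: {path}")
    # Keep only the first component below the store directory.
    base = path[len(prefix):].split("/", 1)[0]
    digest, sep, name = base.partition("-")
    if not sep or not digest or not name:
        raise ValueError(f"not of the form <hash>-<name>: {base}")
    return digest, name


parts = split_store_path("/nix/store/24v9wpp393ib1gllip7ic13aycbi704g-ruby-3.3.9.drv")
```

With the name first, a parser would have to guess which of the hyphens in `ruby-3.3.9` separates the name from the hash.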