2 changes: 1 addition & 1 deletion doc/manual/redirects.json
@@ -337,7 +337,7 @@
"string-literal": "string-literals.html"
},
"language/derivations.html": {
"builder-execution": "../store/building.html#builder-execution"
"builder-execution": "../store/building.html"
},
"installation/installing-binary.html": {
"linux": "uninstall.html#linux",
3 changes: 2 additions & 1 deletion doc/manual/source/SUMMARY.md.in
@@ -19,9 +19,10 @@
- [Nix Store](store/index.md)
- [File System Object](store/file-system-object.md)
- [Content-Addressing File System Objects](store/file-system-object/content-address.md)
- [Exposing in OS File Systems](store/file-system-object/os-file-system.md)
- [Store Object](store/store-object.md)
- [Content-Addressing Store Objects](store/store-object/content-address.md)
- [Store Path](store/store-path.md)
- [Store Path and Store Directory](store/store-path.md)
- [Store Derivation and Deriving Path](store/derivation/index.md)
- [Derivation Outputs and Types of Derivations](store/derivation/outputs/index.md)
- [Content-addressing derivation outputs](store/derivation/outputs/content-address.md)
2 changes: 1 addition & 1 deletion doc/manual/source/glossary.md
@@ -35,7 +35,7 @@

A derivation can be thought of as a [pure function](https://en.wikipedia.org/wiki/Pure_function) that produces new [store objects][store object] from existing store objects.

Derivations are implemented as [operating system processes that run in a sandbox](@docroot@/store/building.md#builder-execution).
Derivations are implemented as [operating system processes that run in a sandbox](@docroot@/store/building.md).
This sandbox by default only allows reading from store objects specified as inputs, and only allows writing to designated [outputs][output] to be [captured as store objects](@docroot@/store/building.md#processing-outputs).

A derivation is typically specified as a [derivation expression] in the [Nix language], and [instantiated][instantiate] to a [store derivation].
@@ -107,7 +107,7 @@ $defs:
type: string
title: Store Directory
description: |
The [store directory](@docroot@/store/store-path.md#store-directory) this store object belongs to (e.g. `/nix/store`).
The [path to the store directory](@docroot@/store/store-path.md#store-directory-path) this store object belongs within (e.g. `/nix/store`).
additionalProperties: false

impure:
2 changes: 1 addition & 1 deletion doc/manual/source/protocols/store-path.md
@@ -18,7 +18,7 @@ where

- `name` = the name of the store object.

- `store-dir` = the [store directory](@docroot@/store/store-path.md#store-directory)
- `store-dir` = the [path of the store directory](@docroot@/store/store-path.md#store-directory-path)

- `digest` = base-32 representation of the compressed to 160 bits [SHA-256] hash of `fingerprint`

210 changes: 148 additions & 62 deletions doc/manual/source/store/building.md
@@ -1,101 +1,187 @@
# Building

## Normalizing derivation inputs
As discussed in the [main page on derivations](./derivation/index.md):

- Each input must be [realised] prior to building the derivation in question.
> A derivation is a specification for running an executable on precisely defined input to produce one or more [store objects][store object].
[realised]: @docroot@/glossary.md#gloss-realise
This page describes *building* a derivation, which is to say following the instructions in the derivation to actually run the executable.
In some cases the derivation is self-explanatory.
Member

Suggested change
In some cases the derivation is self-explanatory.
Some elements of derivations are self-explanatory.

For example, the arguments specified in the derivation really are the arguments passed to the executable.
In other cases, however, there is additional procedure true for all derivations, which is therefore *not* specified in the derivation.
Member

I don't really know what "procedure" refers to.

This page specifies this invariant procedure that is true for all derivations, too.
Member

Don't understand this sentence either. What's the "invariant procedure"?


The chief design consideration for the building process is *determinism*.
Conventional operating systems are typically not designed with determinism in mind.
But determinism is needed to make Nix's caching a transparent abstraction.
Member

Suggested change
But determinism is needed to make Nix's caching a transparent abstraction.
But determinism is needed to make Nix's build caching a transparent abstraction.

or maybe "substitution"


> **Explanation**
>
> For example, no one wants to slightly modify a derivation, and then find that it no longer builds for an unrelated reason, because the original derivation *also* doesn't build anymore, but the cache hit on the original derivation was hiding this.
> We want builds that once succeed to continue succeeding, to encourage fearless modification of old build recipes.
Member

Suggested change
> We want builds that once succeed to continue succeeding, to encourage fearless modification of old build recipes.
> We want builds that succeed once to continue succeeding, to encourage fearless modification of old build recipes.

Member

More importantly, it ensures that something that builds on one machine also builds on another machine, e.g. two developers will get the same result.

> Determinism is what enables things that once worked to keep working.
The life cycle of a build can be broken down into 3 parts:
Member

There is also step 0: build/substitute the dependencies.


1. Spawn the builder process with the proper environment, including the correct process arguments, environment variables, and file system state.

2. Wait for the standard output and error of the process to be closed and/or the process to exit.
(If the standard streams are closed but the process hasn't exited, Nix will kill the process.)

Nix also logs the standard output and error of the process, but this is just for human convenience and does not influence the behavior of the system.
(Builder processes have no idea what the consumer of their standard output and error does with those streams, only that they are indeed consumed so buffers do not fill up and writes to them will continue to succeed.)

3. Processing the outputs after the builder has exited.
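
The three life-cycle steps above can be sketched in a few lines of Python. This is only an illustration of the described behaviour, not Nix's implementation: the function name `run_builder` and the `grace_seconds` parameter are hypothetical, and the short grace period before killing a lingering process is an assumption made to avoid racing a builder that is exiting normally.

```python
import subprocess
import sys
import tempfile


def run_builder(builder, args, env, grace_seconds=1.0):
    """Spawn a builder, collect its log, and report whether it succeeded."""
    with tempfile.TemporaryDirectory() as build_dir:
        # Step 1: spawn the process with explicit arguments, environment,
        # and a fresh, initially empty working directory.
        proc = subprocess.Popen(
            [builder, *args],
            env=env,                    # replaces the inherited environment
            cwd=build_dir,
            stdin=subprocess.DEVNULL,
            stdout=subprocess.PIPE,
            stderr=subprocess.STDOUT,   # log standard output and error together
        )
        # Step 2: consume the streams until the builder's end is closed.
        # The log is for humans only; the builder cannot observe its use.
        log = proc.stdout.read()
        try:
            exit_code = proc.wait(timeout=grace_seconds)
        except subprocess.TimeoutExpired:
            # Streams are closed but the process has not exited: kill it.
            proc.kill()
            exit_code = proc.wait()
    # Step 3 (processing the outputs) would start here, on success only.
    return exit_code == 0, log


if __name__ == "__main__":
    ok, log = run_builder(sys.executable, ["-c", "print('built')"],
                          {"PATH": "/path-not-set"})
    print(ok, log)
```

Reading the pipe to end-of-file before waiting mirrors the "streams closed and/or process exited" condition in step 2; a real implementation would also stream the log incrementally rather than buffer it.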

- Once this is done, the derivation is *normalized*, replacing each input deriving path with its store path, which we now know from realising the input.
The builder process on exit should have left beyond files for each output the derivation is supposed to produce.
Member

Suggested change
The builder process on exit should have left beyond files for each output the derivation is supposed to produce.
The builder process on exit should have left behind files for each output the derivation is supposed to produce.

Also, "files" is vague, why not say "it should have created each store path the derivation is supposed to produce".

The files must be processed to turn them into bona fide store objects.
If the processing suceeds, those store objects are associated with the derivation as (the results of) a successful build.
Member

Suggested change
If the processing suceeds, those store objects are associated with the derivation as (the results of) a successful build.
If the processing succeeds, those store objects are associated with the derivation as (the results of) a successful build.

We don't really associate them though. (There's the deriver field in the database but it's kind of useless.) This should probably say something like "the store paths are registered as valid in the Nix database".


## Builder Execution {#builder-execution}
Step (3) happens externally, with just inert data since the process has exited or been killed by then.
Member

"externally" is vague. External to what?

Step (1) however is best described not from Nix's perspective, but from the process's perspective.
Member

"Process" is also vague. The "build process" in general, or is that "process" in the OS sense?


> **Explanation**
>
> Ultimately, what matters for determinism is the behavior of IO operations that the process attempts (whether these are successes or failures), because of how they affect the output files, and how they affect the further execution of the builder process.
> From Nix (and the operating system)'s perspective, there are many, many different ways --- different implementation strategies --- of effecting the same I/O behavior,
Member

Suggested change
> From Nix (and the operating system)'s perspective, there are many, many different ways --- different implementation strategies --- of effecting the same I/O behavior,
> From Nix (and the operating system)'s perspective, there are many, many different ways different implementation strategies of effecting the same I/O behavior.

Not sure if markdown --- produces an actual em-dash.

> But from the process's perspective, there is only one correct behavior.
Member

TBH I didn't understand this paragraph.

## What derivations can be built

Actually only some derivations are ready to be built.
In particular, only [*resolved*](./resolution.md) derivations can be built.
Comment on lines +50 to +51
Member

Should probably read "Before derivations can be built, they need to be resolved".

That is to say, a derivation that depends on other derivations is not ready yet to be built, because those other derivations might not be built.
If the other derivations are indeed built, we can witness this fact by resolving the derivation, and converting all the derivation's input references into plain store paths.
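
As a rough illustration of what "resolving" means here, the following Python sketch replaces each input reference to another derivation's output with the plain store path that output was built to. The `Derivation` class, its field names, and the `built_outputs` mapping are hypothetical and much simpler than Nix's actual data model.

```python
from dataclasses import dataclass, field


@dataclass
class Derivation:
    name: str
    # Inputs that are already plain store paths.
    input_srcs: set = field(default_factory=set)
    # Inputs that are outputs of other derivations:
    # derivation path -> set of output names.
    input_drvs: dict = field(default_factory=dict)


def resolve(drv, built_outputs):
    """Return a resolved copy of `drv`, or None if an input is not built yet.

    `built_outputs` maps (derivation path, output name) to the store path
    that output was realised to.
    """
    srcs = set(drv.input_srcs)
    for drv_path, output_names in drv.input_drvs.items():
        for output_name in output_names:
            key = (drv_path, output_name)
            if key not in built_outputs:
                return None  # a dependency is missing: not ready to build
            srcs.add(built_outputs[key])
    # A resolved derivation depends only on plain store paths.
    return Derivation(name=drv.name, input_srcs=srcs, input_drvs={})
```

Returning `None` for an unbuilt input reflects the point made above: successful resolution is the witness that every dependency has in fact been built.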

> **Note**
>
> Note that [input-addressing](derivation/outputs/input-address.md) derivations are improperly resolved.
> As discussed on the linked page, the current input-addressing algorithm does not respect resolution-equivalence of derivations (\\(\\sim_\mathrm{Drv}\\)).
> That means that if Nix properly resolved an input-addressed derivation, the resolved derivation would have different input addresses, violating expectations.
> Nix therefore improperly resolves the derivation, keeping its original input address output paths, creating an invalid derivation that is both resolved and instructed to create the outputs at the originally expected paths.
Comment on lines +57 to +60
Member

"Improper" and "violating expectations" is a questionable choice of words, since this is how Nix has always behaved. By definition, how Nix does it, is the proper way of doing it.

Also I don't really know what "resolution-equivalence of derivations" means or what "an invalid derivation that is both resolved and instructed to create the outputs at the originally expected paths" refers to. (Also, "invalid" in the context of store paths has a specific meaning, i.e. not registered as a valid path in the Nix database. Is that what it refers to?)

## Environment of the builder process

The [`builder`](./derivation/index.md#builder) is executed as follows:

- A temporary directory is created where the build will take place. The
current directory is changed to this directory.
### File system

The builder should have access to a limited file system where only certain objects are available.
Member

Suggested change
The builder should have access to a limited file system where only certain objects are available.
The builder may have access to a limited file system where only certain store objects are available.

or more specifically, "only the store paths that it depends on".

"may" because sandboxing is not required.

The most important exposed files are the inputs (other store objects) of the (resolved) derivation.
Additionally, some other files are exposed.

#### Store inputs

The builder will be run against a file system in which the [closure] of the inputs is mounted inside the [store directory][store directory path].
Member

The word "mounted" should be avoided because that's just one possible implementation. E.g. we don't do that on macOS or when sandboxing is disabled on Linux.

In particular, consider a store that just contains this closure.
That store may be exposed to the file system according to the rules specified in the [Exposing Store Objects in OS File Systems](./store-path.md#exposing) documentation.
This precisely defines the file system layout of the store that should be visible to the builder process.

> **Note**
>
> Historically, Nix exposed *at least* the following store contents to the builder, but also arbitrarily other store objects, due to limitations around operating systems' file system virtualization capabilities, and wanting to avoid copying or moving files.
Member

We shouldn't use the word "Historically" if it's still the case today. We're not documenting platonic ideal Nix :-)

> It still can do this in so-called *unsandboxed* builds.
>
> Such builds should be considered an unsafe extension, but one that works less badly against non-malicious derivations than might be expected.
Member

"Unsafe" is not the right word here. It's just worse for determinism/reproducibility, but then we can never guarantee those entirely anyway.

> This is because store paths are relatively unpredictable, so a well-behaved program is unlikely to stumble upon a store object it wasn't supposed to know about.
>
> As operating systems developed better file system primitives, the need for disabling sandboxing has lessened greatly over the years, and this trend should continue into the future.
[realised]: @docroot@/glossary.md#gloss-realise
[closure]: @docroot@/glossary.md#gloss-closure
[store directory path]: ./store-path.md#store-directory-path

### Other file system state

- The current working directory of the builder process will be a fresh temporary directory that is initially empty.
Member

... except for a couple of files.


See the per-store [`build-dir`](@docroot@/store/types/local-store.md#store-local-store-build-dir) setting for more information.

- The environment is cleared and set to the derivation attributes, as
specified above.
- Basic device nodes for essential operations (null device, random number generation, standard streams as a pseudo terminal)

(A pseudo terminal would not be strictly necessary since the standard streams are passively logging, not there to facilitate interaction.
But it is still useful to entice programs to do nicer logging with e.g. colors etc.)

- On Linux: Process information via `/proc`

- Minimal user and group identity information

- In addition, the following variables are set:
- A loopback-only network configuration with hostname set to `localhost`

- `NIX_BUILD_TOP` contains the path of the temporary directory for
this build.
> **Note**
>
> Fixed-output derivations have access to additional operating system state to facilitate communication with the outside world, such as network name resolution and TLS certificate verification.
> This is necessary because these derivations are allowed to access the network, unlike regular derivations which are fully sandboxed.
- Also, `TMPDIR`, `TEMPDIR`, `TMP`, `TEMP` are set to point to the
temporary directory. This is to prevent the builder from
accidentally writing temporary files anywhere else. Doing so
might cause interference by other processes.
### Environment variables {#env-vars}

- `PATH` is set to `/path-not-set` to prevent shells from
initialising it to their built-in default value.
The environment is cleared and set to the derivation attributes, as
specified above.

- `HOME` is set to `/homeless-shelter` to prevent programs from
using `/etc/passwd` or the like to find the user's home
directory, which could cause impurity. Usually, when `HOME` is
set, it is used as the location of the home directory, even if
it points to a non-existent path.
For most derivations types this must contain at least:

- `NIX_STORE` is set to the path of the top-level Nix store
directory (typically, `/nix/store`).
- For each output declared in `outputs`, the corresponding environment variable is set to point to the intended path in the Nix store for that output.
Each output path is a concatenation of the cryptographic hash of all build inputs, the `name` attribute and the output name.
Member

It's probably better to say that each output path has the form `<storedir>/<hash>-<drv-name>(-<output-name>)`, where hash is some opaque value. E.g. it could be a random value for CA builds or when rebuilding an existing derivation with --rebuild.

(The output name is omitted if it’s `out`.)
Member

Suggested change
(The output name is omitted if its `out`.)
(The output name is omitted if it's `out`.)


- `NIX_ATTRS_JSON_FILE` & `NIX_ATTRS_SH_FILE` if `__structuredAttrs`
is set to `true` for the derivation. A detailed explanation of this
behavior can be found in the
[section about structured attrs](@docroot@/language/advanced-attributes.md#adv-attr-structuredAttrs).
In addition, the following variables are set:

- For each output declared in `outputs`, the corresponding
environment variable is set to point to the intended path in the
Nix store for that output. Each output path is a concatenation
of the cryptographic hash of all build inputs, the `name`
attribute and the output name. (The output name is omitted if
it’s `out`.)
- `NIX_BUILD_TOP` contains the path of the temporary directory for this build.

- If an output path already exists, it is removed. Also, locks are
acquired to prevent multiple [Nix instances][Nix instance] from performing the same
build at the same time.
- Also, `TMPDIR`, `TEMPDIR`, `TMP`, `TEMP` are set to point to the temporary directory.
This is to prevent the builder from accidentally writing temporary files anywhere else.
Doing so might cause interference by other processes.

- A log of the combined standard output and error is written to
`/nix/var/log/nix`.
- `PATH` is set to `/path-not-set` to prevent shells from initialising it to their built-in default value.

- The builder is executed with the arguments specified by the
attribute `args`. If it exits with exit code 0, it is considered to
have succeeded.
- `HOME` is set to `/homeless-shelter`.
(Without sandboxing, this serves as "soft sandboxing" --- it discourages programs from using `/etc/passwd` or the like to find the user's home directory, which could cause impurity.)
Usually, when `HOME` is set, it is used as the location of the home directory, even if it points to a non-existent path.

- The temporary directory is removed (unless the `-K` option was
specified).
- `NIX_STORE` is set to the path of the top-level Nix [store directory path] (typically, `/nix/store`).

- `NIX_ATTRS_JSON_FILE` & `NIX_ATTRS_SH_FILE` if `__structuredAttrs` is set to `true` for the derivation.
A detailed explanation of this behavior can be found in the [section about structured attrs](@docroot@/language/advanced-attributes.md#adv-attr-structuredAttrs).
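
The variables listed above can be assembled as in the following Python sketch. The function name `builder_environment` and its parameters are hypothetical. The sketch lets derivation attributes win when a name clashes with one of the defaults, which is an assumption rather than something stated above, and it leaves out the structured-attributes variables.

```python
def builder_environment(drv_env, outputs, build_top, store_dir="/nix/store"):
    """Build the environment for a builder process from scratch.

    drv_env:   derivation attributes as a str -> str mapping
    outputs:   output name -> intended store path for that output
    build_top: path of the temporary build directory
    """
    env = {
        "NIX_BUILD_TOP": build_top,
        # Keep temporary files inside the build directory.
        "TMPDIR": build_top,
        "TEMPDIR": build_top,
        "TMP": build_top,
        "TEMP": build_top,
        # Stop shells from falling back to a built-in default PATH.
        "PATH": "/path-not-set",
        # Discourage lookups of the real home directory.
        "HOME": "/homeless-shelter",
        "NIX_STORE": store_dir,
    }
    # Assumption of this sketch: derivation attributes override defaults.
    env.update(drv_env)
    # One variable per declared output, pointing at its intended path.
    env.update(outputs)
    return env
```

Starting from an empty dictionary, instead of copying the caller's environment, is what "the environment is cleared" amounts to in practice.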

## Builder Execution

- If an output path already exists, it is removed.
Also, locks are acquired to prevent multiple [Nix instances][Nix instance] from performing the same build at the same time.

- A log of the combined standard output and error is written to `/nix/var/log/nix`.

- The builder is executed with the arguments specified by the attribute `args`.
If it exits with exit code 0, it is considered to have succeeded.

- The temporary directory is removed (unless the [`--keep-failed`](@docroot@/command-ref/opt-common.md#opt-keep-failed) option was specified).
Member

If the build succeeds it's always removed, regardless of --keep-failed.


## Processing outputs

If the builder exited successfully, the following steps happen in order to turn the output directories left behind by the builder into proper store objects:

- **Normalize the file permissions**

Nix sets the last-modified timestamp on all files
in the build result to 1 (00:00:01 1/1/1970 UTC), sets the group to
the default group, and sets the mode of the file to 0444 or 0555
(i.e., read-only, with execute permission enabled if the file was
originally executable). Any possible `setuid` and `setgid`
bits are cleared.

> **Note**
>
> Setuid and setgid programs are not currently supported by Nix.
> This is because the Nix archives used in deployment have no concept of ownership information,
> and because it makes the build result dependent on the user performing the build.
The files must conform to the model described in the [Exposing in OS file systems](./file-system-object/os-file-system.md) section.
For example, timestamps and permissions must be forced to sentinel values.
Member

Not sure if "sentinel value" is the right term. More like "canonical value".


- **Calculate the references**

Nix scans each output path for
references to input paths by looking for the hash parts of the input
paths. Since these are potential runtime dependencies, Nix registers
them as dependencies of the output paths.
Nix scans each output path for references to input store objects by looking for the store path digests of each input.
Member

I think "hash part" is more common in our docs than "store path digest". At least I don't remember ever seeing that terminology.

(The name part is ignored when scanning; an input's hash part that is not followed by a `-` and the correct name part still scans as a reference.
Likewise, a digest not preceded by the [store directory path] also still scans as a reference.)
Since these are potential runtime dependencies, Nix will register them as references of the output store object they occur in.

Nix also scans for references to other outputs' paths in the same way, because outputs are allowed to refer to each other.
Nix also scans for references from one output to another in the same way, because outputs are allowed to refer to each other.
If the outputs' references to each other form a cycle, this is an error, because the references of store objects must be acyclic.

In the case of derivations with fixed in advance output paths (i.e. [input-addressing] derivations, or [fixed content-addressing] derivations), the actual final store path to each output is used during the build.
Member

Suggested change
In the case of derivations with fixed in advance output paths (i.e. [input-addressing] derivations, or [fixed content-addressing] derivations), the actual final store path to each output is used during the build.
In the case of derivations with output paths that are fixed in advance (i.e. [input-addressing] derivations, or [fixed content-addressing] derivations), the actual final store path to each output is used during the build.

For [floating content-addressing] derivations, however, the final store path is not known in advance by definition.
Scratch store paths must therefore be used instead.
Scanning will use those scratch paths, but then any output-to-be that contains such a scanned scratch path must be rewritten to instead use the final (content-addressed) path of the output in question.
Member

Better to avoid the passive voice here, i.e. "Nix rewrites the outputs..." instead of "... must be rewritten".


At this point, the file system data is in the proper form, and the valid acyclic reference data for each output is also calculated, so the outputs can be registered as proper store objects, and associated with the derivation in the [build trace] in the record for a successful build.
Member

Suggested change
At this point, the file system data is in the proper form, and the valid acyclic reference data for each output is also calculated, so the outputs can be registered as proper store objects, and associated with the derivation in the [build trace] in the record for a successful build.
At this point, the file system data is in the proper form. Nix then computes the references for each output, so the outputs can be registered as proper store objects. References must form an acyclic graph.

(dropping the "build trace" since I don't know what that is)
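
The reference scan described under "Calculate the references" can be sketched as a plain substring search for each candidate's hash part. The sketch below is a simplification under stated assumptions: the function names are hypothetical, the hash part is taken to be the first 32 characters of the store path's base name (160 bits in base-32), and only file contents and symlink targets are searched.

```python
import os

HASH_PART_LEN = 32  # 160 bits rendered in base-32


def hash_part(store_path):
    """Return the hash part of a path like <store-dir>/<hash>-<name>."""
    return os.path.basename(store_path)[:HASH_PART_LEN]


def scan_references(output_root, candidates):
    """Return the candidate store paths whose hash part occurs in the output.

    Only the hash part is searched for, so a match does not need the store
    directory in front of it or the name part after it.
    """
    wanted = {hash_part(path).encode(): path for path in candidates}
    found = set()

    def scan_path(path):
        if os.path.islink(path):
            data = os.fsencode(os.readlink(path))  # symlink target text
        elif os.path.isfile(path):
            with open(path, "rb") as handle:
                data = handle.read()  # whole file in memory: fine for a sketch
        else:
            return
        for digest, store_path in wanted.items():
            if digest in data:
                found.add(store_path)

    scan_path(output_root)
    for dirpath, dirnames, filenames in os.walk(output_root):
        for name in dirnames + filenames:
            scan_path(os.path.join(dirpath, name))
    return found
```

Passing the derivation's inputs together with its own output paths as `candidates` covers both cases in the text: references to inputs and references from one output to another.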


[Nix instance]: @docroot@/glossary.md#gloss-nix-instance
[input-addressing]: ./derivation/outputs/input-address.md
[fixed content-addressing]: ./derivation/outputs/content-address.md#fixed
[floating content-addressing]: ./derivation/outputs/content-address.md#floating
[build trace]: ./build-trace.md