
Jwink3101/notefile


Notefile

notefile is a tool to quickly and easily manage sidecar metadata files ("notefiles") that live alongside the file itself. Each notefile is a YAML file with the extension .notes.yaml.

It is not a perfect solution, but it does address many of the main needs, as well as concerns I have with alternative tools.

Notefile is designed to help keep notes associated with files and to perform the most basic operations on them. However, it is not designed to do every possible thing. Notes are plain YAML, so they can be modified as needed with other tools, including those provided here.

It is also worth noting that while notefile can be used as a Python module, it is really designed to be primarily a CLI.

Design & Goals

When a note or tag is added, a notefile is created in the same location with the same name plus .notes.yaml. The design is a compromise among the competing trade-offs of the alternatives.

For example, extended attributes are great, but they are easily broken and are not always compatible across operating systems. Other metadata, like ID3 or EXIF, changes the file itself and is domain-specific.

Similarly, single-database solutions (like TMSU) are cleaner, but they risk damage and are a single point of failure (for both corruption and recoverability). It is also less explicit that they are being used on a given file.

YAML notefiles clearly indicate that a file has a note (or tag) of interest, and they are cross-platform. Furthermore, as text-based YAML files, they are not easily corrupted, and they are easy for humans to read and write.

The format is YAML and should not be changed. However, this code does not assume any given fields except:

  • filesize
  • sha256 (optional)
  • tags
  • notes

Any other data can be added and will be preserved across all actions.
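To make the field handling concrete, here is a minimal sketch in Python of refreshing the metadata fields while leaving everything else alone. It assumes the following:

  • the note is loaded as a plain dict
  • tags is a list and notes is a string
  • the helper name (refresh_metadata) is hypothetical

It is not notefile's actual code.

```python
import hashlib
import os


def refresh_metadata(data, path, compute_hash=True):
    """Return a copy of note data with filesize (and sha256) refreshed.

    Hypothetical helper: only filesize, sha256, tags, and notes are
    touched; every other key is copied through unchanged.
    """
    new = dict(data)  # shallow copy keeps any user-defined fields
    new["filesize"] = os.path.getsize(path)
    if compute_hash:
        h = hashlib.sha256()
        with open(path, "rb") as f:
            # Read in 1 MiB chunks so large files are not loaded at once.
            for chunk in iter(lambda: f.read(1 << 20), b""):
                h.update(chunk)
        new["sha256"] = h.hexdigest()
    else:
        # Assumption of this sketch: drop a possibly stale hash.
        new.pop("sha256", None)
    # Make sure the two user-facing fields always exist.
    new.setdefault("tags", [])
    new.setdefault("notes", "")
    return new
```

Dropping a stale sha256 when hashing is skipped is a choice made for this sketch. notefile may behave differently.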

Notefile primarily grew around files, but it can also attach notes to directories. Directory notes follow the same sidecar model, but the sidecar is placed in the parent directory rather than inside the target directory. For example, a note for somepath/subdir/ would live at somepath/subdir.notes.yaml (or the corresponding hidden/subdir-mode variant).

Directory support is newer and less refined than file support. It is intended to be useful, but the coupling between a directory and its note is weaker than the file case and the repair heuristics are correspondingly less exact.

JSON vs YAML

Notefile can write the notes as nicely formatted YAML or as JSON (which is technically still YAML, since YAML is a superset of JSON). The advantage of JSON is that it is much faster to read than YAML, but it comes at the cost of being harder to edit manually.

The extension will always be .yaml, since any YAML parser should be able to read JSON.
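As a minimal sketch of why the .yaml extension still works for JSON notes, the hypothetical helpers below write and read a note using only the standard-library json module. Any YAML parser (for example ruamel.yaml) should be able to read the same file. This is not how notefile itself serializes notes.

```python
import json
from pathlib import Path


def write_json_note(note_path, data):
    """Write note data as JSON into a .notes.yaml file (hypothetical helper).

    JSON is valid YAML, so the file keeps its .yaml extension.
    """
    text = json.dumps(data, ensure_ascii=False, sort_keys=True)
    Path(note_path).write_text(text + "\n", encoding="utf-8")


def read_json_note(note_path):
    """Read a note that was written as JSON; the stdlib json module suffices."""
    return json.loads(Path(note_path).read_text(encoding="utf-8"))
```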

Install and Usage

Install

Install right from GitHub:

$ python -m pip install git+https://github.com/Jwink3101/notefile.git

Optional PyYAML backend (LibYAML speedup when available):

$ python -m pip install "git+https://github.com/Jwink3101/notefile.git#egg=notefile[pyyaml]"

Requirements

The only real requirement is ruamel.yaml. However, if you have pyyaml installed, notefile will use it as a faster read-only parser (writes still use ruamel.yaml). Even better, if you also have LibYAML, reads will be about 25x faster.

Note: We avoid writing with PyYAML due to known issues. See PyYAML issue #121.
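The read-side fallback described above can be sketched roughly as follows. This is an illustration under assumptions, not notefile's code:

  • the function name is hypothetical
  • the final json fallback is an addition of this sketch and only works for notes written as JSON

PyYAML only provides CSafeLoader when it was built against LibYAML, which is why the sketch checks for it with getattr.

```python
import json


def load_note_text(text):
    """Parse note text with the fastest available backend (a sketch).

    Assumed order: PyYAML + LibYAML, plain PyYAML, ruamel.yaml, and
    finally stdlib json. Returns (data, backend_name).
    """
    try:
        import yaml  # PyYAML (read-only use here)

        # CSafeLoader exists only when PyYAML was built with LibYAML.
        loader = getattr(yaml, "CSafeLoader", None)
        if loader is not None:
            return yaml.load(text, Loader=loader), "pyyaml+libyaml"
        return yaml.load(text, Loader=yaml.SafeLoader), "pyyaml"
    except ImportError:
        pass
    try:
        from ruamel.yaml import YAML

        return YAML(typ="safe").load(text), "ruamel.yaml"
    except ImportError:
        pass
    # Hypothetical last resort: only valid for notes written as JSON.
    return json.loads(text), "json"
```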

To install LibYAML (based on these instructions):

Download the source package: http://pyyaml.org/download/libyaml/yaml-0.2.5.tar.gz.

To build and install LibYAML, run

$ ./configure
$ make
# make install

Then to install pyyaml,

$ python -m pip install pyyaml

Or via the optional extra:

$ python -m pip install "notefile[pyyaml]"

In my (limited) experience, pyyaml comes with Anaconda but not with Miniconda.

Usage

Every command is documented. For example, run

$ notefile -h

to see a list of commands and universal options and then

$ notefile <command> -h

for specific options.

The most basic command is

$ notefile edit file.ext

which will launch $EDITOR (or try other environment variables) to edit the notes. You can also use

$ notefile mod -t mytag file.ext

to add tags.
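Beyond $EDITOR, the exact variables notefile checks are not listed here. A minimal sketch of the common pattern follows. It assumes (hypothetically) that $EDITOR is checked first, then $VISUAL, then a fallback to vi.

```python
import os
import shlex


def pick_editor(environ=None):
    """Return the editor command as a list (hypothetical helper and order).

    Checks $EDITOR, then $VISUAL, then falls back to "vi". This is not
    necessarily the order or fallback that notefile uses.
    """
    env = os.environ if environ is None else environ
    for var in ("EDITOR", "VISUAL"):
        value = env.get(var, "").strip()
        if value:
            # Split so values with arguments work, e.g. "code --wait".
            return shlex.split(value)
    return ["vi"]
```

The result is a command list, so a caller could run something like subprocess.call(pick_editor() + [path_to_note]).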

Repairs

It is possible for the sidecar notefiles to get out of sync with the basefile. The two possible issues are:

  • metadata: The basefile has been modified thereby changing its size, sha256, and mtime
  • orphaned: The basefile has been renamed thereby orphaning the notefile

The repair function can repair either type of issue (but not both). To repair metadata, the notefile is simply updated with the new file's metadata.

To repair an orphaned notefile, notefile searches in and below the current directory for the file. It first compares file sizes and then compares sha256 values. If more than one candidate could be the original, it will not repair the note and will instead issue a warning.

Directory notes work similarly, but not identically. Directory metadata and orphan repair are intentionally shallow:

  • directory notes do not hash file contents
  • directory notes do not recurse through the full tree
  • directory notes track only the immediate children returned by os.listdir()
  • orphan repair for directories uses the count of immediate subdirectories, the count of immediate non-note files, and a shallow hash of the sorted immediate child names

This makes directory notes much cheaper to track, but also means their repair matching is less exact than for files. File notes remain the more robust and more mature case.
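A minimal sketch of the shallow directory signature described above follows. The function name is hypothetical, and so are these details:

  • it hashes the newline-joined sorted names with SHA256
  • it includes every child name (notefiles included) in that hash, which notefile may or may not do

```python
import hashlib
import os


def directory_signature(path):
    """Shallow identity for a directory (hypothetical, not notefile's code).

    Uses only the immediate children from os.listdir(): the number of
    subdirectories, the number of non-note files, and a SHA256 over the
    sorted child names joined by newlines.
    """
    names = sorted(os.listdir(path))
    n_dirs = 0
    n_files = 0
    for name in names:
        if os.path.isdir(os.path.join(path, name)):
            n_dirs += 1
        elif not name.endswith(".notes.yaml"):
            n_files += 1  # notefiles are excluded from the file count
    digest = hashlib.sha256("\n".join(names).encode("utf-8")).hexdigest()
    return n_dirs, n_files, digest
```

No file contents are read and nothing below the first level is visited, which is what keeps directory tracking cheap (and matching less exact).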

File Hashes

By default, the SHA256 hash is computed. It is highly recommended to leave hashing enabled, since it greatly increases the integrity of the link between the basefile and the notefile sidecar. However, --no-hash can be passed to many of the commands to disable hashing.

Note that when using --no-hash, the file may still be rehashed in subsequent runs without --no-hash, depending on the operation.
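To illustrate how an optional hash interacts with a metadata check, here is a minimal sketch. It uses a hypothetical helper, not notefile's API, and it:

  • always compares filesize
  • compares sha256 only when one is stored and hashing is enabled
  • ignores mtime, which a real implementation might also check

```python
import hashlib
import os


def metadata_is_stale(note_data, path, check_hash=True):
    """Return True if the basefile no longer matches the stored metadata.

    Hypothetical helper. check_hash=False plays the role of --no-hash.
    """
    if os.path.getsize(path) != note_data.get("filesize"):
        return True  # size changed, so there is no need to hash at all
    stored = note_data.get("sha256")
    if not check_hash or stored is None:
        return False  # nothing (or not allowed) to compare beyond size
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            h.update(chunk)
    return h.hexdigest() != stored
```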

When repairing an orphaned notefile, candidate files are first compared by filesize and then by SHA256. While not foolproof, this greatly reduces the number of SHA256 computations that must be performed, especially for larger files, where it is increasingly unlikely that two different files have exactly the same size.

Directory notes use a different kind of hash. Instead of hashing file contents, they use a shallow hash of the sorted immediate child names in the directory. This is used only as a lightweight directory identity signal and should not be thought of as equivalent to a file content hash.

Hidden and Subdir Notefiles

Notes can be hidden and/or in a subdirectory. Consider file.txt. When a note is created with the following flags, the location of the note is as follows:

Flags                   Note Destination                  Comment
--visible --no-subdir   file.txt.notes.yaml               default
--visible --subdir      _notefiles/file.txt.notes.yaml
--hidden --no-subdir    .file.txt.notes.yaml
--hidden --subdir       .notefiles/file.txt.notes.yaml

For a directory target, the same rule applies except the note is stored alongside the directory in its parent directory. For example, the note for somepath/subdir/ would be one of:

Flags                   Note Destination
--visible --no-subdir   somepath/subdir.notes.yaml
--visible --subdir      somepath/_notefiles/subdir.notes.yaml
--hidden --no-subdir    somepath/.subdir.notes.yaml
--hidden --subdir       somepath/.notefiles/subdir.notes.yaml

The default is --visible and --no-subdir, but both can be controlled with environment variables:

$ export NOTEFILE_HIDDEN=true
$ export NOTEFILE_SUBDIR=true

Note that the flags only apply when creating a new note. For example, if a visible note already exists, it will always be used, even if -H is set.
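The location rules, the environment variables, and the existing-note precedence can be combined into one small sketch. The helper names are hypothetical. Treating "true", "1", or "yes" as enabled is an assumption about how the environment variables are parsed.

```python
import os


def _candidates(target):
    """All four possible sidecar paths for a file or directory target."""
    target = os.path.normpath(target)  # drops a trailing slash on dirs
    parent, name = os.path.split(target)
    return {  # keyed by (hidden, subdir)
        (False, False): os.path.join(parent, name + ".notes.yaml"),
        (False, True): os.path.join(parent, "_notefiles", name + ".notes.yaml"),
        (True, False): os.path.join(parent, "." + name + ".notes.yaml"),
        (True, True): os.path.join(parent, ".notefiles", name + ".notes.yaml"),
    }


def _env_flag(name, environ):
    # Assumption: "true", "1", or "yes" (any case) mean enabled.
    return environ.get(name, "").strip().lower() in {"true", "1", "yes"}


def note_path(target, hidden=None, subdir=None, environ=None):
    """Pick the notefile path for target (hypothetical helper).

    An existing note always wins. Otherwise explicit flags are used,
    falling back to NOTEFILE_HIDDEN / NOTEFILE_SUBDIR, and then to the
    default of visible + no-subdir.
    """
    env = os.environ if environ is None else environ
    cands = _candidates(target)
    for path in cands.values():
        if os.path.exists(path):
            return path
    if hidden is None:
        hidden = _env_flag("NOTEFILE_HIDDEN", env)
    if subdir is None:
        subdir = _env_flag("NOTEFILE_SUBDIR", env)
    return cands[(hidden, subdir)]
```

Because the directory case only strips the trailing slash and then reuses the file rules, a directory note naturally lands in the parent directory.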

To hide or unhide a note, use notefile vis hide or notefile vis show on either file(s) or dir(s). These will also use the subdir setting.

Changing the visibility of a symlinked referent will cause the symlinked note to be broken. However, by design it will still properly read the note and will be fixed when editing or repairing metadata.

Hidden notefiles are more easily orphaned, since it is harder to remember to move both files. However, not having a directory fill up with notefiles can be helpful.

Tips

Scripts

Included are some scripts that may prove useful. As noted before, the goal of notefile is to be capable, but it doesn't have to do everything!

In those scripts (and the tests), actions are often performed by calling cli(). While this is less efficient, notefile is really designed with the CLI in mind, so some of the other functions are less robust.

Tracking History

notefile does not track the history of notes and instead suggests doing so with git. Notes can be tracked either in an existing git repo or in their own.

If using a dedicated repo, you can tell git to track only notefiles with the following in your .gitignore:

# Ignore Everything except directories, so we can recurse into them
*
!*/

# Allow these
!*.notes.yaml
!.gitignore

Alternatively, the export command can be used.

Known Issues

These will likely be addressed (roughly in order of priority):

  • Behavior with hidden files themselves is not consistent; a warning will be thrown
  • Directory support is newer and less refined than file support, especially around repair heuristics and edge cases

Additional Workflows

This tool includes a lot of features but does not include everything. More can be done in Python directly.

For example, to search all notes and perform a test on each:

import notefile

for note in notefile.find(return_note=True):
    # note.data is read automatically; "todo" is just an example tag to test for
    if "todo" in note.data.get("tags", []):
        print(note)

Additional fields can be added (or removed) from data and will be saved when write is called.

Note that notefile also supports setting alternative note fields (but not tags), which may be useful from the CLI.

Changelog

See Changelog

AI/LLM/Coding Agent Disclosure

Almost all of the original code was developed by hand by the author. Around version 0.9.0 (which is also when the project switched to numeric versioning), OpenAI Codex was used to improve flow, catch bugs, and patch the code.

The major features of safe queries (0.9.0) and directory notes (0.10.0) were developed heavily with Codex, with human review and confirmation of test cases.

About

Tool to create sidecar YAML notefiles
