Annotation tracks #54

Merged
d-laub merged 15 commits into main from dlaub/annot-tracks
May 16, 2025
@d-laub
Collaborator

d-laub commented Apr 6, 2025

In the context of GVL, we define annotation tracks as sample-agnostic annotations aligned to the reference genome. Across bioinformatics, this kind of data is often encountered as BED or GTF/GFF files but could also be formatted as any BED derivative (e.g. narrowPeak), bigBed, bigWig, or bedGraph.

  • Add API for reading and writing annotation tracks (currently limited to using BED files or in-memory DataFrames as source data)
  • Tests @bschilder
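
For readers unfamiliar with the format, here is a minimal sketch of what a sample-agnostic, BED-style annotation track looks like once parsed. It uses only the standard library and is not the GVL API; `Interval`, `read_bed`, and the two example records are hypothetical names and data made up for illustration.

```python
import csv
import io
from typing import NamedTuple


class Interval(NamedTuple):
    chrom: str
    start: int  # 0-based, inclusive (BED convention)
    end: int    # exclusive (BED convention)
    name: str


def read_bed(handle) -> list[Interval]:
    """Parse the first four BED columns, skipping header and comment lines."""
    intervals = []
    for row in csv.reader(handle, delimiter="\t"):
        if not row or row[0].startswith(("#", "track", "browser")):
            continue
        # The name column is optional in BED; fall back to "." when absent.
        name = row[3] if len(row) > 3 else "."
        intervals.append(Interval(row[0], int(row[1]), int(row[2]), name))
    return intervals


# Hypothetical two-record track; the same rows apply to every sample.
bed_text = "chr22\t100\t200\tpeak1\nchr22\t500\t650\tpeak2\n"
track = read_bed(io.StringIO(bed_text))
```

Because the intervals are tied to reference coordinates rather than to any individual, a single table like `track` describes the annotation for all samples.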

…ncorporating them along the track dimension.
d-laub self-assigned this Apr 6, 2025
d-laub added the "type: enhancement" label Apr 6, 2025
@bschilder
Collaborator

bschilder commented Apr 11, 2025

Just a note, reading in GTF files would also be helpful for getting exon coordinates provided by Ensembl.
This is the file that ProHap uses for this purpose:
https://ftp.ensembl.org/pub/release-113/gtf/homo_sapiens/

Specifically, ProHap uses gffutils to read in GTF files:
https://github.com/ProGenNo/ProHap/blob/6a247e0e7cdd4f6f8193f7e483c3c8384f0cf58a/docs/configBuilder.js#L5

https://github.com/search?q=repo%3AProGenNo%2FProHap%20gffutils&type=code

@bschilder
Collaborator

gtfparse is another option. Instead of making a local SQL database, it reads in the GTF as a pandas dataframe.
We could of course write our own GTF parser, but even so this might provide some helpful guidance:
https://github.com/openvax/gtfparse

@d-laub
Collaborator Author

d-laub commented Apr 14, 2025

gtfparse looks the best, though I'd rather just copy and attribute their source code for gtf -> polars since it's maybe 100 lines and frees us from adding a dependency, waiting for them if we need any bugfixes, etc
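
To give a sense of scale for a vendored parser, here is a rough standard-library sketch of GTF parsing. It is not gtfparse's code and not what GVL ships; `GTF_COLUMNS`, `ATTR_RE`, `parse_gtf_line`, and `read_gtf` are hypothetical names, and it assumes the usual nine tab-separated GTF fields with `key "value";` attributes.

```python
import re

# The eight fixed GTF columns; the ninth holds free-form attributes.
GTF_COLUMNS = ["seqname", "source", "feature", "start", "end", "score", "strand", "frame"]

# Matches `key "quoted value";` as well as unquoted values such as `exon_number 2;`.
ATTR_RE = re.compile(r'\s*(\S+)\s+(?:"([^"]*)"|([^;\s]+))\s*(?:;|$)')


def parse_gtf_line(line: str) -> dict:
    """Parse one GTF record into a flat dict of fixed columns plus attributes."""
    fields = line.rstrip("\n").split("\t")
    if len(fields) != 9:
        raise ValueError(f"expected 9 tab-separated fields, got {len(fields)}")
    record = dict(zip(GTF_COLUMNS, fields[:8]))
    # GTF coordinates are 1-based and inclusive; subtract 1 from start for BED.
    record["start"] = int(record["start"])
    record["end"] = int(record["end"])
    for match in ATTR_RE.finditer(fields[8]):
        key = match.group(1)
        value = match.group(2) if match.group(2) is not None else match.group(3)
        record[key] = value  # repeated keys (e.g. `tag`) keep only the last value
    return record


def read_gtf(lines) -> list[dict]:
    """Parse an iterable of lines, skipping blank lines and `#` comments."""
    return [parse_gtf_line(l) for l in lines if l.strip() and not l.startswith("#")]
```

A list of dicts like this can be handed to a DataFrame constructor afterwards; a real implementation would also need to decide how to handle repeated attribute keys and missing values.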

@bschilder
Collaborator

> gtfparse looks the best, though I'd rather just copy and attribute their source code for gtf -> polars since it's maybe 100 lines and frees us from adding a dependency, waiting for them if we need any bugfixes, etc

Agreed, less dependencies == less headaches

@bschilder
Collaborator

Reassigning the tests to myself

@bschilder
Collaborator

@d-laub can you point me to the notebook with the annotation track examples?

@d-laub
Collaborator Author

d-laub commented May 9, 2025

https://gist.github.com/d-laub/4dc305158015f4f0d662f605243517f6

@bschilder
Collaborator

> https://gist.github.com/d-laub/4dc305158015f4f0d662f605243517f6

Following this tutorial:

temp = TemporaryDirectory(suffix=".gvl")
gvl.write(
    path=temp.name,
    bed=bed[:2],
    variants=gvl.Variants.from_file(variants),
    overwrite=True,
)
---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
Cell In[5], line 5
      1 temp = TemporaryDirectory(suffix=".gvl")
      2 gvl.write(
      3     path=temp.name,
      4     bed=bed[:2],
----> 5     variants=gvl.Variants.from_file(variants),
      6     overwrite=True,
      7 )

AttributeError: module 'genvarloader' has no attribute 'Variants'

Think this is just from the old API. So switched to supplying the variants file name directly:

temp = TemporaryDirectory(suffix=".gvl")
gvl.write(
    path=temp.name,
    bed=bed[:2],
    variants=variants,
    overwrite=True,
)

But this time I get the error:

---------------------------------------------------------------------------
IndexError                                Traceback (most recent call last)
Cell In[6], line 2
      1 temp = TemporaryDirectory(suffix=".gvl")
----> 2 gvl.write(
      3     path=temp.name,
      4     bed=bed[:2],
      5     variants=variants,
      6     overwrite=True,
      7 )

File ~/projects/GenVarLoader/python/genvarloader/_dataset/_write.py:160, in write(path, bed, variants, bigwigs, samples, max_jitter, overwrite, max_mem)
    158 elif isinstance(variants, PGEN):
    159     variants.set_samples(samples)
--> 160     gvl_bed = _write_from_pgen(path, gvl_bed, variants, max_mem)
    161 elif isinstance(variants, SparseVar):
    162     gvl_bed = _write_from_svar(path, gvl_bed, variants, samples)

File ~/projects/GenVarLoader/python/genvarloader/_dataset/_write.py:464, in _write_from_pgen(path, bed, pgen, max_mem)
    462 for _, is_last, (genos, chunk_end, chunk_idxs) in mark_ends(range_):
    463     chunk_idxs = chunk_idxs.astype(V_IDX_TYPE)
--> 464     sp_genos = SparseGenotypes.from_dense(genos.astype(np.int8), chunk_idxs)
    465     ls_sparse.append(sp_genos)
    467     if is_last:

File ~/.conda/envs/genome-loader/lib/python3.12/site-packages/genoray/_svar.py:67, in SparseGenotypes.from_dense(cls, genos, var_idxs)
     65 # (s p v)
     66 keep = genos == 1
---> 67 data = var_idxs[keep.nonzero()[-1]]
     68 lengths = keep.sum(-1)
     69 return cls.from_lengths(data, lengths)

IndexError: index 40 is out of bounds for axis 0 with size 20
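
For context on the failing line, here is a small NumPy sketch of the general condition under which `var_idxs[keep.nonzero()[-1]]` raises this kind of error: the last axis of `genos` is longer than `var_idxs`. The shapes and values are made up for illustration; this is not a diagnosis of what actually happened inside `_write_from_pgen`.

```python
import numpy as np

# Hypothetical shapes: (samples, ploidy, variants), following the `# (s p v)` comment.
n_samples, ploidy = 2, 2
genos = np.zeros((n_samples, ploidy, 41), dtype=np.int8)
genos[0, 0, 40] = 1  # a single ALT call at position 40 on the variant axis

# Assumption for illustration: fewer variant indices than the variant axis is long.
var_idxs = np.arange(20)

keep = genos == 1
try:
    # nonzero()[-1] holds positions along the last (variant) axis, here [40].
    data = var_idxs[keep.nonzero()[-1]]
except IndexError as err:
    message = str(err)
```

If the variant axis and `var_idxs` have the same length, the same indexing succeeds, so comparing `genos.shape[-1]` with `len(chunk_idxs)` at the call site may be a useful first check.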

Conda env

Details ``` # packages in environment at /grid/koo/home/schilder/.conda/envs/genome-loader: # # Name Version Build Channel _libgcc_mutex 0.1 conda_forge conda-forge _openmp_mutex 4.5 2_gnu conda-forge _sysroot_linux-64_curr_repodata_hack 3 haa98f57_10 aiohappyeyeballs 2.4.4 pyhd8ed1ab_1 conda-forge aiohttp 3.10.5 py312h41a817b_0 conda-forge aiohttp-cors 0.7.0 pyhd8ed1ab_1 conda-forge aiosignal 1.3.2 pyhd8ed1ab_0 conda-forge annotated-types 0.7.0 pyhd8ed1ab_1 conda-forge anyio 4.8.0 pyhd8ed1ab_0 conda-forge aom 3.6.1 h59595ed_0 conda-forge appdirs 1.4.4 pyhd3eb1b0_0 argcomplete 3.6.2 pypi_0 pypi argh 0.31.3 pypi_0 pypi argon2-cffi 23.1.0 pyhd8ed1ab_1 conda-forge argon2-cffi-bindings 21.2.0 py312h66e93f0_5 conda-forge arrow 1.3.0 pyhd8ed1ab_1 conda-forge asciitree 0.3.3 pypi_0 pypi astroid 3.2.4 py312h06a4308_0 asttokens 3.0.0 pyhd8ed1ab_1 conda-forge async-lru 2.0.4 pyhd8ed1ab_1 conda-forge async-timeout 5.0.1 pyhd8ed1ab_1 conda-forge attrs 25.3.0 pypi_0 pypi awkward 2.8.2 pypi_0 pypi awkward-cpp 45 pypi_0 pypi aws-c-auth 0.8.0 h56a2c13_4 conda-forge aws-c-cal 0.8.0 hd3f4568_0 conda-forge aws-c-common 0.9.31 hb9d3cd8_0 conda-forge aws-c-compression 0.3.0 hf20e7d7_0 conda-forge aws-c-event-stream 0.5.0 h68c3b0c_2 conda-forge aws-c-http 0.9.0 hfad4ed3_3 conda-forge aws-c-io 0.15.0 h17eb868_2 conda-forge aws-c-mqtt 0.11.0 h407ecb8_2 conda-forge aws-c-s3 0.7.0 hadeddc1_5 conda-forge aws-c-sdkutils 0.2.0 hf20e7d7_0 conda-forge aws-checksums 0.2.0 hf20e7d7_0 conda-forge aws-crt-cpp 0.29.0 h73f0fd4_6 conda-forge aws-sdk-cpp 1.11.407 h6a6dca0_6 conda-forge azure-core-cpp 1.14.0 h5cfcd09_0 conda-forge azure-identity-cpp 1.10.0 h113e628_0 conda-forge azure-storage-blobs-cpp 12.13.0 h3cf044e_1 conda-forge azure-storage-common-cpp 12.8.0 h736e048_1 conda-forge azure-storage-files-datalake-cpp 12.12.0 ha633028_1 conda-forge babel 2.17.0 pyhd8ed1ab_0 conda-forge beartype 0.20.2 pypi_0 pypi beautifulsoup4 4.13.3 pyha770c72_0 conda-forge binutils_impl_linux-64 2.43 h4bf12b8_2 
conda-forge biopython 1.78 py312h5eee18b_0 blas 1.0 mkl bleach 6.2.0 pyh29332c3_4 conda-forge bleach-with-css 6.2.0 h82add2a_4 conda-forge bokeh 3.6.2 pyhd8ed1ab_1 conda-forge brotli 1.1.0 hb9d3cd8_2 conda-forge brotli-bin 1.1.0 hb9d3cd8_2 conda-forge brotli-python 1.1.0 py312h2ec8cdc_2 conda-forge bzip2 1.0.8 h4bc722e_7 conda-forge c-ares 1.34.4 hb9d3cd8_0 conda-forge ca-certificates 2025.2.25 h06a4308_0 cachecontrol 0.14.1 pyha770c72_1 conda-forge cachecontrol-with-filecache 0.14.1 pyhd8ed1ab_1 conda-forge cached-property 1.5.2 hd8ed1ab_1 conda-forge cached_property 1.5.2 pyha770c72_1 conda-forge cachetools 5.5.0 pyhd8ed1ab_1 conda-forge certifi 2025.4.26 pypi_0 pypi cffi 1.17.1 py312h06ac9bb_0 conda-forge charset-normalizer 3.4.2 pypi_0 pypi cleo 2.2.1 pyhd8ed1ab_0 conda-forge click 8.1.8 pypi_0 pypi cloudpickle 3.1.0 pyhd8ed1ab_1 conda-forge colorama 0.4.6 pyhd8ed1ab_1 conda-forge coloredlogs 15.0.1 pypi_0 pypi colorful 0.5.6 pyhd8ed1ab_0 conda-forge comm 0.2.2 pyhd8ed1ab_1 conda-forge contourpy 1.3.1 py312h68727a3_0 conda-forge crashtest 0.4.1 pyhd8ed1ab_1 conda-forge cryptography 44.0.0 py312hda17c39_0 conda-forge cslug 1.0.0 pypi_0 pypi cuda-cudart 12.4.127 0 nvidia cuda-cupti 12.4.127 0 nvidia cuda-libraries 12.4.1 0 nvidia cuda-nvrtc 12.4.127 0 nvidia cuda-nvtx 12.4.127 0 nvidia cuda-opencl 12.6.77 0 nvidia cuda-runtime 12.4.1 0 nvidia cuda-version 12.6 3 nvidia cycler 0.12.1 pyhd8ed1ab_1 conda-forge cyclopts 3.16.0 pypi_0 pypi cytoolz 1.0.1 py312h66e93f0_0 conda-forge cyvcf2 0.31.1 pypi_0 pypi dask 2024.12.0 pyhd8ed1ab_1 conda-forge dask-core 2024.12.0 pyhd8ed1ab_1 conda-forge dask-expr 1.1.20 pyhd8ed1ab_1 conda-forge dask-glm 0.3.2 pypi_0 pypi dask-ml 2024.4.4 pypi_0 pypi datacache 1.1.5 py_0 conda-forge dbus 1.13.6 h5008d03_3 conda-forge debugpy 1.8.11 py312h2ec8cdc_0 conda-forge decorator 5.1.1 pyhd8ed1ab_1 conda-forge defusedxml 0.7.1 pyhd8ed1ab_0 conda-forge dill 0.3.8 py312h06a4308_0 distlib 0.3.9 pyhd8ed1ab_1 conda-forge distributed 2024.12.0 
pyhd8ed1ab_1 conda-forge docstring-parser 0.16 pypi_0 pypi docutils 0.21.2 pypi_0 pypi dulwich 0.21.7 py312h66e93f0_1 conda-forge einops 0.8.1 pypi_0 pypi elfutils 0.192 h7f4e02f_1 conda-forge ensembl-rest 0.3.4 pypi_0 pypi entrypoints 0.4 pyhd8ed1ab_1 conda-forge exceptiongroup 1.2.2 pyhd8ed1ab_1 conda-forge executing 2.1.0 pyhd8ed1ab_1 conda-forge expat 2.6.4 h5888daf_0 conda-forge fasteners 0.19 pypi_0 pypi ffmpeg 4.4.2 gpl_hdf48244_113 conda-forge filelock 3.16.1 pyhd8ed1ab_1 conda-forge font-ttf-dejavu-sans-mono 2.37 hd3eb1b0_0 font-ttf-inconsolata 2.001 hcb22688_0 font-ttf-source-code-pro 2.030 hd3eb1b0_0 font-ttf-ubuntu 0.83 h8b1ccd4_0 fontconfig 2.15.0 h7e30c49_1 conda-forge fonts-anaconda 1 h8fa9717_0 fonts-conda-ecosystem 1 hd3eb1b0_0 fonttools 4.56.0 py312h178313f_0 conda-forge fqdn 1.5.1 pyhd8ed1ab_1 conda-forge freetype 2.12.1 h267a509_2 conda-forge frozenlist 1.5.0 py312h66e93f0_0 conda-forge fsspec 2025.3.2 pypi_0 pypi gcc_impl_linux-64 11.2.0 h1234567_1 genoray 0.10.2 pypi_0 pypi genvarloader 0.14.2 pypi_0 pypi gettext 0.22.5 he02047a_3 conda-forge gettext-tools 0.22.5 he02047a_3 conda-forge gffutils 0.13 pypi_0 pypi gflags 2.2.2 h5888daf_1005 conda-forge giflib 5.2.2 h5eee18b_0 glog 0.7.1 hbabe93e_0 conda-forge gmp 6.2.1 h295c915_3 gnutls 3.7.9 hb077bed_0 conda-forge google-api-core 2.24.0 pyhd8ed1ab_0 conda-forge google-auth 2.37.0 pyhd8ed1ab_0 conda-forge googleapis-common-protos 1.66.0 pyhff2d567_0 conda-forge grpcio 1.65.5 py312h374181b_0 conda-forge gtfparse 2.5.0 pypi_0 pypi h11 0.14.0 pyhd8ed1ab_1 conda-forge h2 4.1.0 pyhd8ed1ab_1 conda-forge haptools 0.5.0 pyhdfd78af_0 bioconda hirola 0.3.0 pypi_0 pypi hpack 4.0.0 pyhd8ed1ab_1 conda-forge htslib 1.21 h566b1c6_1 bioconda httpcore 1.0.7 pyh29332c3_1 conda-forge httpx 0.28.1 pyhd8ed1ab_0 conda-forge humanfriendly 10.0 pypi_0 pypi hyperframe 6.0.1 pyhd8ed1ab_1 conda-forge idna 3.10 pyhd8ed1ab_1 conda-forge importlib-metadata 8.5.0 pyha770c72_1 conda-forge importlib_resources 6.4.5 pyhd8ed1ab_1 
conda-forge intel-openmp 2022.0.1 h06a4308_3633 intervaltree 3.1.0 pypi_0 pypi iprogress 0.4 pypi_0 pypi ipykernel 6.29.5 pyh3099207_0 conda-forge ipython 8.30.0 pyh707e725_0 conda-forge isoduration 20.11.0 pyhd8ed1ab_1 conda-forge isort 5.13.2 py312h06a4308_0 jaraco.classes 3.4.0 pyhd8ed1ab_2 conda-forge jedi 0.19.2 pyhd8ed1ab_1 conda-forge jeepney 0.8.0 pyhd8ed1ab_0 conda-forge jinja2 3.1.4 pyhd8ed1ab_1 conda-forge joblib 1.5.0 pypi_0 pypi json5 0.10.0 pyhd8ed1ab_1 conda-forge jsonpointer 3.0.0 py312h7900ff3_1 conda-forge jsonschema 4.23.0 pyhd8ed1ab_1 conda-forge jsonschema-specifications 2024.10.1 pyhd8ed1ab_1 conda-forge jsonschema-with-format-nongpl 4.23.0 hd8ed1ab_1 conda-forge jupyter-client 8.6.3 pypi_0 pypi jupyter-lsp 2.2.5 pyhd8ed1ab_1 conda-forge jupyter_client 7.4.9 pyhd8ed1ab_0 conda-forge jupyter_core 5.7.2 pyh31011fe_1 conda-forge jupyter_events 0.12.0 pyh29332c3_0 conda-forge jupyter_server 2.15.0 pyhd8ed1ab_0 conda-forge jupyter_server_terminals 0.5.3 pyhd8ed1ab_1 conda-forge jupyterlab 4.3.5 pyhd8ed1ab_0 conda-forge jupyterlab_pygments 0.3.0 pyhd8ed1ab_2 conda-forge jupyterlab_server 2.27.3 pyhd8ed1ab_1 conda-forge kernel-headers_linux-64 3.10.0 h57e8cba_10 keyring 24.3.1 pyha804496_1 conda-forge keyutils 1.6.1 h166bdaf_0 conda-forge kiwisolver 1.4.8 py312h84d6215_0 conda-forge krb5 1.21.3 h659f571_0 conda-forge lame 3.100 h7b6447c_0 lcms2 2.16 hb7c19ff_0 conda-forge ld_impl_linux-64 2.43 h712a8e2_2 conda-forge lerc 4.0.0 h27087fc_0 conda-forge levenshtein 0.27.1 pypi_0 pypi libabseil 20240722.0 cxx17_h5888daf_1 conda-forge libarchive 3.7.7 hadbb8c3_0 conda-forge libarrow 18.0.0 ha5db6c2_0_cpu conda-forge libarrow-acero 18.0.0 h5888daf_0_cpu conda-forge libarrow-dataset 18.0.0 h5888daf_0_cpu conda-forge libarrow-substrait 18.0.0 he882d9a_0_cpu conda-forge libasprintf 0.22.5 he8f35ee_3 conda-forge libasprintf-devel 0.22.5 he8f35ee_3 conda-forge libblas 3.9.0 16_linux64_mkl conda-forge libbrotlicommon 1.1.0 hb9d3cd8_2 conda-forge libbrotlidec 
1.1.0 hb9d3cd8_2 conda-forge libbrotlienc 1.1.0 hb9d3cd8_2 conda-forge libcblas 3.9.0 16_linux64_mkl conda-forge libcrc32c 1.1.2 h9c3ff4c_0 conda-forge libcublas 12.4.5.8 0 nvidia libcufft 11.2.1.3 0 nvidia libcufile 1.11.1.6 0 nvidia libcurand 10.3.7.77 0 nvidia libcurl 8.11.1 h332b0f4_0 conda-forge libcusolver 11.6.1.9 0 nvidia libcusparse 12.3.1.170 0 nvidia libdeflate 1.22 hb9d3cd8_0 conda-forge libdrm 2.4.124 hb9d3cd8_0 conda-forge libedit 3.1.20191231 he28a2e2_2 conda-forge libegl 1.7.0 ha4b6fd6_2 conda-forge libev 4.33 hd590300_2 conda-forge libevent 2.1.12 hf998b51_1 conda-forge libexpat 2.6.4 h5888daf_0 conda-forge libffi 3.4.2 h7f98852_5 conda-forge libgcc 14.2.0 h77fa898_1 conda-forge libgcc-devel_linux-64 11.2.0 h1234567_1 libgcc-ng 14.2.0 h69a702a_1 conda-forge libgettextpo 0.22.5 he02047a_3 conda-forge libgettextpo-devel 0.22.5 he02047a_3 conda-forge libgfortran 14.2.0 h69a702a_1 conda-forge libgfortran5 14.2.0 hd5240d6_1 conda-forge libgl 1.7.0 ha4b6fd6_2 conda-forge libglib 2.82.2 h2ff4ddf_0 conda-forge libglvnd 1.7.0 ha4b6fd6_2 conda-forge libglx 1.7.0 ha4b6fd6_2 conda-forge libgomp 14.2.0 h77fa898_1 conda-forge libgoogle-cloud 2.30.0 h438788a_0 conda-forge libgoogle-cloud-storage 2.30.0 h0121fbd_0 conda-forge libgrpc 1.65.5 hf5c653b_0 conda-forge libiconv 1.17 hd590300_2 conda-forge libidn2 2.3.7 hd590300_0 conda-forge libjpeg-turbo 3.0.0 hd590300_1 conda-forge liblapack 3.9.0 16_linux64_mkl conda-forge libllvm14 14.0.6 hcd5def8_4 conda-forge liblzma 5.6.3 hb9d3cd8_1 conda-forge liblzma-devel 5.6.3 hb9d3cd8_1 conda-forge libmicrohttpd 1.0.1 h97afed2_0 conda-forge libnghttp2 1.64.0 h161d5f1_0 conda-forge libnpp 12.2.5.30 0 nvidia libnsl 2.0.1 hd590300_0 conda-forge libnvfatbin 12.6.77 0 nvidia libnvjitlink 12.4.127 0 nvidia libnvjpeg 12.3.1.117 0 nvidia libopenblas 0.3.28 pthreads_h94d23a6_1 conda-forge libparquet 18.0.0 h6bd9018_0_cpu conda-forge libpciaccess 0.18 hd590300_0 conda-forge libpng 1.6.44 hadc24fc_0 conda-forge libprotobuf 5.27.5 
h5b01275_2 conda-forge libre2-11 2024.07.02 hbbce691_1 conda-forge libsodium 1.0.20 h4ab18f5_0 conda-forge libsqlite 3.47.2 hee588c1_0 conda-forge libssh2 1.11.1 hf672d98_0 conda-forge libstdcxx 14.2.0 hc0a3c3a_1 conda-forge libstdcxx-ng 14.2.0 h4852527_1 conda-forge libtasn1 4.19.0 h166bdaf_0 conda-forge libthrift 0.21.0 h0e7cc3e_0 conda-forge libtiff 4.7.0 hc4654cb_2 conda-forge libunistring 0.9.10 h7f98852_0 conda-forge libunwind 1.6.2 h9c3ff4c_0 conda-forge libutf8proc 2.8.0 hf23e847_1 conda-forge libuuid 2.38.1 h0b41bf4_0 conda-forge libva 2.22.0 h8a09558_1 conda-forge libvpx 1.13.1 h6a678d5_0 libwebp 1.4.0 h2c329e2_0 conda-forge libwebp-base 1.4.0 hd590300_0 conda-forge libxcb 1.17.0 h8a09558_0 conda-forge libxcrypt 4.4.36 hd590300_1 conda-forge libxml2 2.13.5 h0d44e9d_1 conda-forge libzlib 1.3.1 hb9d3cd8_2 conda-forge linkify-it-py 2.0.3 pyhd8ed1ab_1 conda-forge llvm-openmp 15.0.7 h0cdce71_0 conda-forge llvmlite 0.44.0 pypi_0 pypi locket 1.0.0 pyhd8ed1ab_0 conda-forge loguru 0.7.3 pypi_0 pypi lz4 4.3.3 py312hb3f7f12_1 conda-forge lz4-c 1.9.4 hcb278e6_0 conda-forge lzo 2.10 hd590300_1001 conda-forge markdown-it-py 3.0.0 pyhd8ed1ab_1 conda-forge markupsafe 3.0.2 py312h178313f_1 conda-forge matplotlib-base 3.10.1 py312hd3ec401_0 conda-forge matplotlib-inline 0.1.7 pyhd8ed1ab_1 conda-forge mccabe 0.7.0 pyhd3eb1b0_0 mdit-py-plugins 0.4.2 pyhd8ed1ab_1 conda-forge mdurl 0.1.2 pyhd8ed1ab_1 conda-forge memoized-property 1.0.3 pyhd8ed1ab_1 conda-forge memray 1.15.0 py312h9f2de7a_0 conda-forge mistune 3.1.2 pyhd8ed1ab_0 conda-forge mkl 2022.1.0 hc2b9512_224 ml-dtypes 0.5.0 pypi_0 pypi mock 4.0.3 pyhd3eb1b0_0 more-itertools 10.7.0 pypi_0 pypi mpmath 1.3.0 py312h06a4308_0 msgpack-python 1.1.0 py312h68727a3_0 conda-forge multidict 6.1.0 py312h178313f_2 conda-forge multimethod 1.9.1 pyhd8ed1ab_0 conda-forge multipledispatch 1.0.0 pypi_0 pypi munkres 1.1.4 pyh9f0ad1d_0 conda-forge mypy-extensions 1.1.0 pypi_0 pypi natsort 8.4.0 pyh29332c3_1 conda-forge nbclient 0.10.2 
pyhd8ed1ab_0 conda-forge nbconvert-core 7.16.6 pyh29332c3_0 conda-forge nbformat 5.10.4 pyhd8ed1ab_1 conda-forge ncls 0.0.68 pypi_0 pypi ncurses 6.5 he02047a_1 conda-forge nest-asyncio 1.6.0 pyhd8ed1ab_1 conda-forge nettle 3.9.1 h7ab15ed_0 conda-forge networkx 3.3 py312h06a4308_0 notebook 7.3.2 pyhd8ed1ab_0 conda-forge notebook-shim 0.2.4 pyhd8ed1ab_1 conda-forge numba 0.61.2 pypi_0 pypi numcodecs 0.13.1 pypi_0 pypi numerary 0.4.4 pypi_0 pypi numpy 2.2.5 pypi_0 pypi opencensus 0.11.3 pyhd8ed1ab_0 conda-forge opencensus-context 0.1.3 py312h7900ff3_3 conda-forge openh264 2.3.1 hcb278e6_2 conda-forge openjpeg 2.5.3 h5fbd93e_0 conda-forge openssl 3.5.0 h7b32b05_0 conda-forge orc 2.0.2 h690cf93_1 conda-forge overrides 7.7.0 pyhd8ed1ab_1 conda-forge p11-kit 0.24.1 hc5aa10d_0 conda-forge packaging 25.0 pypi_0 pypi pandas 2.2.3 py312hf9745cd_1 conda-forge pandera 0.23.1 pypi_0 pypi pandocfilters 1.5.0 pyhd8ed1ab_0 conda-forge parso 0.8.4 pyhd8ed1ab_1 conda-forge partd 1.4.2 pyhd8ed1ab_0 conda-forge patsy 1.0.1 py312h06a4308_0 pcre2 10.44 hba22ea6_2 conda-forge pexpect 4.9.0 pyhd8ed1ab_1 conda-forge pgenlib 0.92.0 pypi_0 pypi phantom-types 3.0.2 pypi_0 pypi pickleshare 0.7.5 pyhd8ed1ab_1004 conda-forge pillow 11.0.0 py312h7b63e92_0 conda-forge pip 25.0.1 pyh8b19718_0 conda-forge pkginfo 1.12.0 pyhd8ed1ab_1 conda-forge pkgutil-resolve-name 1.3.10 pyhd8ed1ab_2 conda-forge platformdirs 4.3.8 pypi_0 pypi plink 1.90b7.7 h18e278d_1 bioconda plink2 2.0.0a.6.9 h9948957_0 bioconda poetry 1.8.5 pyha804496_0 conda-forge poetry-core 1.9.1 pyhd8ed1ab_1 conda-forge poetry-plugin-export 1.8.0 pyhd8ed1ab_1 conda-forge polars 1.29.0 pypi_0 pypi pooch 1.8.2 pypi_0 pypi progressbar33 2.4 pyhd8ed1ab_1 conda-forge prometheus_client 0.21.1 pyhd8ed1ab_0 conda-forge prompt-toolkit 3.0.48 pyha770c72_1 conda-forge propcache 0.2.1 py312h66e93f0_0 conda-forge proto-plus 1.25.0 pyhd8ed1ab_1 conda-forge protobuf 5.27.5 py312h2ec8cdc_0 conda-forge psutil 6.1.0 py312h66e93f0_0 conda-forge pthread-stubs 
0.4 hb9d3cd8_1002 conda-forge ptyprocess 0.7.0 pyhd8ed1ab_1 conda-forge pure_eval 0.2.3 pyhd8ed1ab_1 conda-forge py-spy 0.4.0 h4c5a871_1 conda-forge pyarrow 20.0.0 pypi_0 pypi pyasn1 0.6.1 pyhd8ed1ab_2 conda-forge pyasn1-modules 0.4.1 pyhd8ed1ab_1 conda-forge pybigwig 0.3.24 pypi_0 pypi pybind11 2.13.6 pypi_0 pypi pycparser 2.22 pyh29332c3_1 conda-forge pydantic 2.11.4 pypi_0 pypi pydantic-core 2.33.2 pypi_0 pypi pyensembl 2.3.13 pyh7cba7a3_0 bioconda pyfaidx 0.8.1.3 pypi_0 pypi pygments 2.19.1 pypi_0 pypi pylint 3.2.7 py312h06a4308_0 pyopenssl 24.3.0 pyhd8ed1ab_0 conda-forge pyparsing 3.2.2 pyhd8ed1ab_0 conda-forge pyproject_hooks 1.2.0 pyhd8ed1ab_1 conda-forge pyranges 0.1.4 pypi_0 pypi pysam 0.23.0 pypi_0 pypi pysocks 1.7.1 pyha55dd90_7 conda-forge python 3.12.8 h9e4cc4f_1_cpython conda-forge python-build 1.2.2.post1 pyhff2d567_1 conda-forge python-dateutil 2.9.0.post0 pyhff2d567_1 conda-forge python-fastjsonschema 2.21.1 pyhd8ed1ab_0 conda-forge python-installer 0.7.0 pyhff2d567_1 conda-forge python-json-logger 2.0.7 pyhd8ed1ab_0 conda-forge python_abi 3.12 5_cp312 conda-forge pytorch 2.5.1 py3.12_cuda12.4_cudnn9.1.0_0 pytorch pytorch-cuda 12.4 hc786d27_7 pytorch pytorch-mutex 1.0 cuda pytorch pytz 2025.2 pypi_0 pypi pyu2f 0.1.5 pyhd8ed1ab_1 conda-forge pyyaml 6.0.2 py312h66e93f0_1 conda-forge pyzmq 24.0.1 py312h5eee18b_0 qhull 2020.2 h434a139_5 conda-forge rapidfuzz 3.13.0 pypi_0 pypi ray-core 2.40.0 py312h6630fa3_0 conda-forge ray-default 2.40.0 py312h6dd12e9_0 conda-forge re2 2024.07.02 h77b4e00_1 conda-forge readline 8.2 h8228510_1 conda-forge referencing 0.35.1 pyhd8ed1ab_1 conda-forge requests 2.32.3 pyhd8ed1ab_1 conda-forge requests-toolbelt 1.0.0 pyhd8ed1ab_1 conda-forge rfc3339-validator 0.1.4 pyhd8ed1ab_1 conda-forge rfc3986-validator 0.1.1 pyh9f0ad1d_0 conda-forge rich 14.0.0 pypi_0 pypi rich-rst 1.3.1 pypi_0 pypi rpds-py 0.22.3 py312h12e396e_0 conda-forge rsa 4.9 pyhd8ed1ab_1 conda-forge rust 1.86.0 h1a8d7c4_0 conda-forge 
rust-std-x86_64-unknown-linux-gnu 1.86.0 h2c6d0dc_0 conda-forge s2n 1.5.6 h0e56266_0 conda-forge scikit-learn 1.6.1 pypi_0 pypi scipy 1.15.2 pypi_0 pypi seaborn 0.13.2 hd8ed1ab_3 conda-forge seaborn-base 0.13.2 pyhd8ed1ab_3 conda-forge secretstorage 3.3.3 py312h7900ff3_3 conda-forge send2trash 1.8.3 pyh0d859eb_1 conda-forge seqpro 0.3.2 pypi_0 pypi serializable 0.2.1 pyhd8ed1ab_0 conda-forge setproctitle 1.3.4 py312h66e93f0_0 conda-forge setuptools 80.3.1 pypi_0 pypi sgkit 0.9.0 pypi_0 pypi shellingham 1.5.4 pyhd8ed1ab_1 conda-forge simplejson 3.19.2 py312h5eee18b_0 six 1.17.0 pyhd8ed1ab_0 conda-forge smart_open 7.0.5 pyhd8ed1ab_1 conda-forge snappy 1.2.1 h8bd8927_1 conda-forge sniffio 1.3.1 pyhd8ed1ab_1 conda-forge sorted-nearest 0.0.39 pypi_0 pypi sortedcontainers 2.4.0 pyhd8ed1ab_0 conda-forge soupsieve 2.5 pyhd8ed1ab_1 conda-forge sparse 0.15.5 pypi_0 pypi stack_data 0.6.3 pyhd8ed1ab_1 conda-forge statsmodels 0.14.4 py312h5eee18b_0 svt-av1 1.4.1 hcb278e6_0 conda-forge sympy 1.13.2 py312h06a4308_0 sysroot_linux-64 2.17 h57e8cba_10 tabix 1.11 hdfd78af_0 bioconda tabulate 0.9.0 pypi_0 pypi tbb 2022.1.0 pypi_0 pypi tblib 3.0.0 pyhd8ed1ab_1 conda-forge tcmlib 1.3.0 pypi_0 pypi tensorstore 0.1.71 pypi_0 pypi terminado 0.18.1 pyh0d859eb_0 conda-forge textual 1.0.0 pyhd8ed1ab_0 conda-forge threadpoolctl 3.6.0 pypi_0 pypi tinycss2 1.4.0 pyhd8ed1ab_0 conda-forge tinytimer 0.0.0 py_0 conda-forge tk 8.6.13 noxft_h4845f30_101 conda-forge tomli 2.2.1 pyhd8ed1ab_1 conda-forge tomlkit 0.13.2 pyha770c72_1 conda-forge toolz 1.0.0 pyhd8ed1ab_1 conda-forge torchaudio 2.5.1 py312_cu124 pytorch torchtriton 3.1.0 py312 pytorch torchvision 0.20.1 py312_cu124 pytorch tornado 6.4.2 py312h66e93f0_0 conda-forge tqdm 4.67.1 pypi_0 pypi traitlets 5.14.3 pyhd8ed1ab_1 conda-forge trove-classifiers 2024.10.21.16 pyhd8ed1ab_1 conda-forge typechecks 0.1.0 pyhd8ed1ab_1 conda-forge typeguard 4.4.2 pypi_0 pypi typer 0.15.2 pypi_0 pypi types-python-dateutil 2.9.0.20241206 pyhd8ed1ab_0 conda-forge 
typing-extensions 4.13.2 pypi_0 pypi typing-inspection 0.4.0 pypi_0 pypi typing_inspect 0.9.0 pyhd8ed1ab_1 conda-forge typing_utils 0.1.0 pyhd8ed1ab_1 conda-forge tzdata 2025.2 pypi_0 pypi uc-micro-py 1.0.3 pyhd8ed1ab_1 conda-forge unicodedata2 16.0.0 py312h66e93f0_0 conda-forge uri-template 1.3.0 pyhd8ed1ab_1 conda-forge urllib3 2.4.0 pypi_0 pypi virtualenv 20.28.0 pyhd8ed1ab_0 conda-forge wayland 1.23.1 h3e06ad9_0 conda-forge wayland-protocols 1.37 hd8ed1ab_0 conda-forge wcwidth 0.2.13 pyhd8ed1ab_1 conda-forge webcolors 24.11.1 pyhd8ed1ab_0 conda-forge webencodings 0.5.1 pyhd8ed1ab_3 conda-forge websocket-client 1.8.0 pyhd8ed1ab_1 conda-forge wheel 0.45.1 pyhd8ed1ab_1 conda-forge wrapt 1.17.0 py312h66e93f0_0 conda-forge x264 1!164.3095 h166bdaf_2 conda-forge x265 3.5 h924138e_3 conda-forge xarray 2024.11.0 pyhd8ed1ab_0 conda-forge xorg-libx11 1.8.10 h4f16b4b_1 conda-forge xorg-libxau 1.0.12 hb9d3cd8_0 conda-forge xorg-libxdmcp 1.1.5 hb9d3cd8_0 conda-forge xorg-libxext 1.3.6 hb9d3cd8_0 conda-forge xorg-libxfixes 6.0.1 hb9d3cd8_0 conda-forge xyzservices 2024.9.0 pyhd8ed1ab_1 conda-forge xz 5.6.3 hbcc6ac9_1 conda-forge xz-gpl-tools 5.6.3 hbcc6ac9_1 conda-forge xz-tools 5.6.3 hb9d3cd8_1 conda-forge yaml 0.2.5 h7f98852_2 conda-forge yarl 1.18.3 py312h66e93f0_0 conda-forge zarr 2.18.4 pypi_0 pypi zeromq 4.3.5 h3b0a872_7 conda-forge zict 3.0.0 pyhd8ed1ab_1 conda-forge zipp 3.21.0 pyhd8ed1ab_1 conda-forge zlib 1.3.1 hb9d3cd8_2 conda-forge zstandard 0.23.0 py312hef9b889_1 conda-forge zstd 1.5.6 ha6fb4c9_0 conda-forge ```

@bschilder
Collaborator

Also tried passing the bed path in directly, but got a different error:

temp = TemporaryDirectory(suffix=".gvl")
gvl.write(
    path=temp.name,
    bed=bed_path,
    variants=variants,
    overwrite=True,
)
2025-05-09 15:31:57.689 | INFO     | genvarloader._dataset._write:write:75 - Writing dataset to /tmp/tmppsiq446n.gvl
2025-05-09 15:31:57.690 | INFO     | genvarloader._dataset._write:write:82 - Found existing GVL store, overwriting.
2025-05-09 15:31:57.713 | INFO     | genoray._pgen:_read_index:1045 - Loading genoray index.
2025-05-09 15:31:58.225 | INFO     | genvarloader._dataset._write:write:148 - Using 451 samples.
2025-05-09 15:31:58.226 | INFO     | genvarloader._dataset._write:write:154 - Writing genotypes.
2025-05-09 15:31:58.648 | WARNING  | genvarloader._dataset._write:_write_from_pgen:437 - A region has no variants for any sample. This could be expected depending on the region lengths and source of variants. However, this can also be caused by a mismatch between the reference genome used for the BED file coordinates and the one used for the variants.
Processing genotypes for 165 regions on contig chr22:   1%|          | 1/165 [00:00<01:04,  2.55 region/s]

InvalidOperationError                     Traceback (most recent call last)
Cell In[11], [line 2](vscode-notebook-cell:?execution_count=11&line=2)
      [1](vscode-notebook-cell:?execution_count=11&line=1) temp = TemporaryDirectory(suffix=".gvl")
----> [2](vscode-notebook-cell:?execution_count=11&line=2) gvl.write(
      [3](vscode-notebook-cell:?execution_count=11&line=3)     path=temp.name,
      [4](vscode-notebook-cell:?execution_count=11&line=4)     bed=bed_path,
      [5](vscode-notebook-cell:?execution_count=11&line=5)     variants=variants,
      [6](vscode-notebook-cell:?execution_count=11&line=6)     overwrite=True,
      [7](vscode-notebook-cell:?execution_count=11&line=7) )

File ~/projects/GenVarLoader/python/genvarloader/_dataset/_write.py:160, in write(path, bed, variants, bigwigs, samples, max_jitter, overwrite, max_mem)
    158 elif isinstance(variants, PGEN):
    159     variants.set_samples(samples)
--> 160     gvl_bed = _write_from_pgen(path, gvl_bed, variants, max_mem)
    161 elif isinstance(variants, SparseVar):
    162     gvl_bed = _write_from_svar(path, gvl_bed, variants, samples)

File ~/projects/GenVarLoader/python/genvarloader/_dataset/_write.py:533, in _write_from_pgen(path, bed, pgen, max_mem)
    530 out[-1] = last_offset
    531 out.flush()
--> 533 bed = bed.with_columns(chromEnd=pl.Series(max_ends))
    534 return bed

File ~/.conda/envs/genome-loader/lib/python3.12/site-packages/polars/dataframe/frame.py:9830, in DataFrame.with_columns(self, *exprs, **named_exprs)
   9684 def with_columns(
   9685     self,
   9686     *exprs: IntoExpr | Iterable[IntoExpr],
   9687     **named_exprs: IntoExpr,
   9688 ) -> DataFrame:
   9689     """
   9690     Add columns to this DataFrame.
   9691 
   (...)
   9828     └─────┴──────┴─────────────┘
   9829     """
-> 9830     return self.lazy().with_columns(*exprs, **named_exprs).collect(_eager=True)

File ~/.conda/envs/genome-loader/lib/python3.12/site-packages/polars/_utils/deprecation.py:93, in deprecate_streaming_parameter.<locals>.decorate.<locals>.wrapper(*args, **kwargs)
     89         kwargs["engine"] = "in-memory"
     91     del kwargs["streaming"]
---> 93 return function(*args, **kwargs)

File ~/.conda/envs/genome-loader/lib/python3.12/site-packages/polars/lazyframe/frame.py:2224, in LazyFrame.collect(self, type_coercion, _type_check, predicate_pushdown, projection_pushdown, simplify_expression, slice_pushdown, comm_subplan_elim, comm_subexpr_elim, cluster_with_columns, collapse_joins, no_optimization, engine, background, _check_order, _eager, **_kwargs)
   2222 # Only for testing purposes
   2223 callback = _kwargs.get("post_opt_callback", callback)
-> 2224 return wrap_df(ldf.collect(engine, callback))

InvalidOperationError: Series chromEnd, length 1 doesn't match the DataFrame height of 165

If you want expression: Series to be broadcasted, ensure it is a scalar (for instance by adding '.first()').
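
The error above reports a single `chromEnd` value where one per region (165 here) was expected. A length check before the column assignment would surface the same mismatch with a more direct message. Below is a minimal sketch in plain Python; `check_one_end_per_region` is a hypothetical helper for illustration and is not part of GenVarLoader or polars.

```python
def check_one_end_per_region(max_ends, n_regions):
    """Return max_ends as a list if it has exactly one entry per region.

    Hypothetical helper: raises ValueError with both lengths otherwise.
    """
    n_ends = len(max_ends)
    if n_ends != n_regions:
        raise ValueError(
            f"Expected {n_regions} end coordinates (one per region), got {n_ends}."
        )
    return list(max_ends)
```

Calling it with a length-1 sequence and `n_regions=165` raises a `ValueError` that names both lengths, which points at the upstream read rather than at the DataFrame operation.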

@d-laub
Collaborator Author

d-laub commented May 9, 2025

OK, thanks for taking a look! I can get to this tomorrow.

@d-laub
Collaborator Author

d-laub commented May 10, 2025

This was an upstream bug in reading PGEN data that should be fixed now. Don't forget to bump your local genoray installation to 0.10.3.
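
One way to confirm that the local environment has picked up the fix is to compare the installed genoray version against 0.10.3. The sketch below uses only the standard library; the helper names `parse_release` and `genoray_is_new_enough` are made up for illustration, and the parsing is deliberately simple (it ignores pre-release and post-release suffixes).

```python
from importlib.metadata import PackageNotFoundError, version

MIN_GENORAY = (0, 10, 3)  # version mentioned in the comment above


def parse_release(ver: str) -> tuple:
    """Turn '0.10.3' into (0, 10, 3), stopping at the first non-numeric part."""
    parts = []
    for piece in ver.split("."):
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break
            digits += ch
        if not digits:
            break
        parts.append(int(digits))
        if digits != piece:  # suffix such as 'rc1' follows the number; stop here
            break
    return tuple(parts)


def genoray_is_new_enough(minimum=MIN_GENORAY) -> bool:
    """Return True if genoray is installed at `minimum` or later (hypothetical helper)."""
    try:
        installed = version("genoray")
    except PackageNotFoundError:
        return False  # not installed in this environment
    return parse_release(installed) >= minimum
```

If `genoray_is_new_enough()` returns `False`, upgrading the package (for example with `pip install -U genoray`, assuming a pip-managed environment) and restarting the notebook kernel is the likely next step.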

@d-laub d-laub merged commit 48f0cac into main May 16, 2025
3 checks passed
@d-laub d-laub deleted the dlaub/annot-tracks branch May 16, 2025 01:20