[Bug]: KeyError: 'text_formatter' #965

@LRY1994

Before Reporting

  • I have asked the Data-Juicer Q&A Copilot (available on Doc Site, DingTalk, or Discord), but the problem still persists.

  • I have pulled the latest code of the main branch and run it again, and the bug still exists.

  • I have read the README carefully and no error occurred during the installation process. (Otherwise, we recommend asking a question using the Question template.)

Search before reporting

  • I have searched the Data-Juicer issues and found no similar bugs.

OS

ubuntu

Installation Method

source

Data-Juicer Version

latest

Python Version

3.10

Describe the bug

KeyError: 'text_formatter'
Traceback (most recent call last):
File "/opt/app-data/workspace/data-juicer-latest/tools/process_data.py", line 45, in <module>
main()
File "/home/ray/anaconda3/lib/python3.12/site-packages/loguru/_logger.py", line 1297, in catch_wrapper
return function(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^
File "/opt/app-data/workspace/data-juicer-latest/tools/process_data.py", line 41, in main
executor.run()
File "/opt/app-data/workspace/data-juicer-latest/data_juicer/core/executor/default_executor.py", line 166, in run
ops = load_ops(self.cfg.process)
^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/opt/app-data/workspace/data-juicer-latest/data_juicer/ops/load.py", line 19, in load_ops
ops.append(OPERATORS.modules[op_name](**args))

KeyError: 'text_formatter'

### To Reproduce

python data-juicer-latest/tools/process_data.py --config extract.txt_config.yaml

### Configs

dataset_path: '/opt/app-data/skill-nlp-test/test.txt'
export_path: '/opt/app-data/skill-nlp-test/export/extract.txt.jsonl'
process:
 - text_formatter:
     suffixes:  '.txt'
 - text_chunk_mapper:
     split_pattern: '\n\n'

### Logs

KeyError: 'text_formatter'
Traceback (most recent call last):
File "/opt/app-data/workspace/data-juicer-latest/tools/process_data.py", line 45, in <module>
main()
File "/home/ray/anaconda3/lib/python3.12/site-packages/loguru/_logger.py", line 1297, in catch_wrapper
return function(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^
File "/opt/app-data/workspace/data-juicer-latest/tools/process_data.py", line 41, in main
executor.run()
File "/opt/app-data/workspace/data-juicer-latest/data_juicer/core/executor/default_executor.py", line 166, in run
ops = load_ops(self.cfg.process)
^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/opt/app-data/workspace/data-juicer-latest/data_juicer/ops/load.py", line 19, in load_ops
ops.append(OPERATORS.modules[op_name](**args))
~~~~~~~~~~~~~~~~~^^^^^^^^^
KeyError: 'text_formatter'
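
The last frame of the traceback is a plain dictionary lookup, `OPERATORS.modules[op_name]`, so the error means that the name `text_formatter` is not a key in that mapping when `load_ops` runs. The sketch below reproduces this failure mode in isolation. It is not Data-Juicer code: the `Registry` class, the `register` decorator, the `TextChunkMapper` stand-in, and the `unknown_ops` helper are all hypothetical and exist only to illustrate the lookup; the real registry contents are not reproduced here.

```python
# Minimal, self-contained sketch of a name-to-class operator registry.
# Everything here is a hypothetical stand-in, not Data-Juicer's actual code.


class Registry:
    """Maps operator names to classes, like the `modules` dict in the traceback."""

    def __init__(self):
        self.modules = {}

    def register(self, name):
        def decorator(cls):
            self.modules[name] = cls
            return cls

        return decorator


OPERATORS = Registry()


@OPERATORS.register("text_chunk_mapper")
class TextChunkMapper:
    """Stand-in operator; only stores its argument."""

    def __init__(self, split_pattern=None):
        self.split_pattern = split_pattern


def load_ops(process_list):
    """Instantiate one operator per entry; raises KeyError on an unknown name."""
    ops = []
    for entry in process_list:
        op_name, args = next(iter(entry.items()))
        ops.append(OPERATORS.modules[op_name](**(args or {})))
    return ops


def unknown_ops(process_list):
    """Hypothetical pre-check: names in the process list missing from the registry."""
    return [
        name
        for entry in process_list
        for name in entry
        if name not in OPERATORS.modules
    ]


# Same shape as the `process` list in the reported config.
process = [
    {"text_formatter": {"suffixes": ".txt"}},
    {"text_chunk_mapper": {"split_pattern": r"\n\n"}},
]

print(unknown_ops(process))  # ['text_formatter']
try:
    load_ops(process)
except KeyError as err:
    print("KeyError:", err)  # KeyError: 'text_formatter'
```

Running a check like `unknown_ops` against the real `OPERATORS.modules` keys would show whether `text_formatter` is registered as an operator at all in the installed version.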

### Screenshots

_No response_

### Expected Behavior

_No response_

### Additional Context

_No response_

Metadata

Assignees: none

Labels: bug
