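### Before Reporting
- I have asked the Data-Juicer Q&A Copilot (available on Doc Site, DingTalk, or Discord), but the problem still persists.
- I have pulled the latest code of main branch to run again and the bug still existed.
- I have read the README carefully and no error occurred during the installation process. (Otherwise, we recommend that you can ask a question using the Question template)
### Search before reporting
### OS
ubuntu
### Installation Method
source
### Data-Juicer Version
latest
### Python Version
3.10
### Describe the bug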
KeyError: 'text_formatter'
Traceback (most recent call last):
  File "/opt/app-data/workspace/data-juicer-latest/tools/process_data.py", line 45, in <module>
    main()
  File "/home/ray/anaconda3/lib/python3.12/site-packages/loguru/_logger.py", line 1297, in catch_wrapper
    return function(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/app-data/workspace/data-juicer-latest/tools/process_data.py", line 41, in main
    executor.run()
  File "/opt/app-data/workspace/data-juicer-latest/data_juicer/core/executor/default_executor.py", line 166, in run
    ops = load_ops(self.cfg.process)
          ^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/app-data/workspace/data-juicer-latest/data_juicer/ops/load.py", line 19, in load_ops
    ops.append(OPERATORS.modules[op_name](**args))
KeyError: 'text_formatter'
### To Reproduce
python data-juicer-latest/tools/process_data.py --config extract.txt_config.yaml
### Configs
dataset_path: '/opt/app-data/skill-nlp-test/test.txt'
export_path: '/opt/app-data/skill-nlp-test/export/extract.txt.jsonl'
process:
  - text_formatter:
      suffixes: '.txt'
  - text_chunk_mapper:
      split_pattern: '\n\n'
### Logs
KeyError: 'text_formatter'
Traceback (most recent call last):
  File "/opt/app-data/workspace/data-juicer-latest/tools/process_data.py", line 45, in <module>
    main()
  File "/home/ray/anaconda3/lib/python3.12/site-packages/loguru/_logger.py", line 1297, in catch_wrapper
    return function(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/app-data/workspace/data-juicer-latest/tools/process_data.py", line 41, in main
    executor.run()
  File "/opt/app-data/workspace/data-juicer-latest/data_juicer/core/executor/default_executor.py", line 166, in run
    ops = load_ops(self.cfg.process)
          ^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/app-data/workspace/data-juicer-latest/data_juicer/ops/load.py", line 19, in load_ops
    ops.append(OPERATORS.modules[op_name](**args))
               ~~~~~~~~~~~~~~~~~^^^^^^^^^
KeyError: 'text_formatter'
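
The failing line indexes a registry by the names listed under `process`, so the error suggests that `text_formatter` is simply not a key in that registry, while `text_chunk_mapper` may well be. Below is a minimal sketch of that lookup pattern, assuming a plain dict-backed registry. `Registry`, `TextChunkMapper`, and `unknown_op_names` are hypothetical stand-ins written for illustration; they are not Data-Juicer's actual code.

```python
# Stand-in for the registry lookup seen in the traceback.
# All classes and helpers here are hypothetical, not Data-Juicer's code.
class Registry:
    def __init__(self, name):
        self.name = name
        self.modules = {}  # maps a registered name to its class

    def register_module(self, module_name):
        def decorator(cls):
            self.modules[module_name] = cls
            return cls
        return decorator


OPERATORS = Registry("Operators")


@OPERATORS.register_module("text_chunk_mapper")
class TextChunkMapper:
    def __init__(self, split_pattern="\n\n"):
        self.split_pattern = split_pattern


def load_ops(process_list):
    """Instantiate each entry by indexing the registry with its name."""
    ops = []
    for process in process_list:
        op_name, args = next(iter(process.items()))
        # Same shape as the failing line: an unregistered name raises KeyError.
        ops.append(OPERATORS.modules[op_name](**(args or {})))
    return ops


def unknown_op_names(process_list):
    """Return the names in the process list that are not registered."""
    names = [next(iter(p)) for p in process_list]
    return [n for n in names if n not in OPERATORS.modules]


# The `process` section of the config above, written as Python data.
process = [
    {"text_formatter": {"suffixes": ".txt"}},
    {"text_chunk_mapper": {"split_pattern": "\n\n"}},
]

print(unknown_op_names(process))

try:
    load_ops(process)
except KeyError as err:
    print(f"KeyError: {err}")
```

Checking the names before instantiation, as `unknown_op_names` does, would make it possible to report every unregistered entry at once instead of stopping at the first bare `KeyError`.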
### Screenshots
_No response_
### Expected Behavior
_No response_
### Additional Context
_No response_