Discussion: Using WFGY 16-problem map as an evaluation pack for future LLM + FinRL integrations #358

Hi, and thanks for **FinRL-Library**. It has been very influential for people experimenting with deep reinforcement learning in finance.

I maintain an MIT-licensed open-source project called **WFGY** (~1.5k GitHub stars).  
One of its main components is a **16-problem “ProblemMap” for RAG and LLM pipelines**, which catalogues common failure modes in LLM-based systems, especially when they are used for decision support.

ProblemMap overview:  
https://github.com/onestardao/WFGY/blob/main/ProblemMap/README.md

The map is already referenced by several external projects, including Harvard MIMS Lab’s ToolUniverse and QCRI’s Multimodal RAG Survey, and by curated lists such as Awesome AI in Finance.

---

### Why this might be relevant for FinRL

FinRL itself focuses on **deep reinforcement learning for trading and portfolio management**, not on LLMs. However, the broader AI4Finance ecosystem now includes the FinGPT and FinRobot projects, which bring LLM-based analysis and agents into the same space.

In many realistic setups, people will combine:

- FinRL strategies or environments for execution and backtesting  
- FinGPT and FinRobot for research, explanations and decision support  
- custom glue code and RAG pipelines around all of the above  

In those hybrid stacks, it becomes important to evaluate not only the RL policies but also the **LLM-based reasoning and retrieval layers** that sit on top of financial data and strategy outputs.

WFGY’s 16-problem map is designed exactly for that kind of LLM pipeline evaluation and could serve as an optional stress-test pack when FinRL is wrapped by agents and LLMs.

---

### Concrete idea

I am not proposing any change to the core FinRL library. Instead, I would like to ask:

- If you plan to add official examples or docs that combine **FinRL with FinGPT or FinRobot** in the future, would you be open to referencing WFGY as an **external evaluation and stress-test pack** for the LLM layer?

If this sounds reasonable, I could later prepare a short example or note such as:

- “When you wrap FinRL with LLM agents, here is a checklist of typical failure modes in the LLM pipeline and how to tag them using WFGY ProblemMap.”
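To make the idea concrete, here is a minimal sketch of what such tagging could look like, assuming a hypothetical per-decision trace that records which tickers the retrieval step returned, which tickers the LLM answer mentioned, the action the FinRL policy actually took, and the action the LLM explanation claims. Everything here (`FailureTag`, `tag_trace`, the trace keys, and the two checks) is illustrative and not part of FinRL or WFGY. The problem numbers are placeholders only; the real mapping should come from the ProblemMap README.

```python
from dataclasses import dataclass
from typing import Callable, Optional

# The WFGY ProblemMap has 16 numbered entries.
NUM_PROBLEMS = 16

# HYPOTHETICAL placeholder numbers, not WFGY's official mapping.
# Replace them with the matching entries from the ProblemMap README.
PLACEHOLDER_RETRIEVAL_PROBLEM = 1
PLACEHOLDER_REASONING_PROBLEM = 2


@dataclass(frozen=True)
class FailureTag:
    """One tagged failure in the LLM layer of a FinRL-based stack (hypothetical type)."""
    problem_no: int  # index into the 16-problem map, 1..16
    layer: str       # which part of the hybrid stack failed
    note: str

    def __post_init__(self) -> None:
        if not 1 <= self.problem_no <= NUM_PROBLEMS:
            raise ValueError(f"problem_no must be in 1..{NUM_PROBLEMS}, got {self.problem_no}")


def check_ungrounded_tickers(trace: dict) -> Optional[FailureTag]:
    """Flag tickers the LLM mentions that never appeared in the retrieved context."""
    missing = sorted(set(trace["answer_tickers"]) - set(trace["retrieved_tickers"]))
    if missing:
        return FailureTag(PLACEHOLDER_RETRIEVAL_PROBLEM, "retrieval",
                          f"ungrounded tickers: {missing}")
    return None


def check_action_mismatch(trace: dict) -> Optional[FailureTag]:
    """Flag explanations whose stated action disagrees with the FinRL policy's action."""
    if trace["explained_action"] != trace["policy_action"]:
        return FailureTag(PLACEHOLDER_REASONING_PROBLEM, "reasoning",
                          f"policy did {trace['policy_action']!r}, "
                          f"explanation says {trace['explained_action']!r}")
    return None


# The checklist: one plain function per failure mode to test.
CHECKS: list[Callable[[dict], Optional[FailureTag]]] = [
    check_ungrounded_tickers,
    check_action_mismatch,
]


def tag_trace(trace: dict) -> list[FailureTag]:
    """Run every check on a single decision trace and collect the resulting tags."""
    return [tag for check in CHECKS if (tag := check(trace)) is not None]


if __name__ == "__main__":
    trace = {
        "retrieved_tickers": ["AAPL", "MSFT"],
        "answer_tickers": ["AAPL", "TSLA"],
        "policy_action": "buy",
        "explained_action": "sell",
    }
    for tag in tag_trace(trace):
        print(tag)
```

Keeping each check as a small function that returns an optional tag makes the checklist easy to extend one failure mode at a time. It also keeps the evaluation layer fully outside FinRL itself, which matches the "external pack" framing above.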

I am happy to keep this as a light documentation contribution and follow your preference on where, if anywhere, such a reference would fit.
