This repository contains the tool for generating proof-of-concept exploits for vulnerable npm packages. It also contains the evaluation results, the LLM prompts and responses, and the datasets used for the evaluation.
- Clone the repository:
git clone https://github.com/sola-st/PoCGen
- Install dependencies:
cd PoCGen
npm install
- Build the docker images:
docker build -t patched_node -f patched_node.Dockerfile .
docker build -t gen-poc_mnt .
The repository contains a wrapper script to run the tool in a docker container.
The script requires an .env file in the current directory with the following content:
OPENAI_API_KEY=sk-proj-xxx # required for LLM calls
GITHUB_API_KEY=github_pat_xxx # required for fetching GHSA-IDs
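Before starting a run, it can help to confirm that the .env file defines both keys. The following is a minimal sketch, not part of PoCGen: the helper name `missingEnvKeys` is hypothetical, and the parsing (one `KEY=value` per line, with an optional trailing ` # comment`) is an assumption based on the example above, not on how the wrapper script actually reads the file.

```javascript
const fs = require("node:fs");

// Keys named in the .env example above.
const REQUIRED_KEYS = ["OPENAI_API_KEY", "GITHUB_API_KEY"];

// Hypothetical helper: returns the required keys that are absent or empty
// in the given .env text. Assumes simple KEY=value lines.
function missingEnvKeys(envText) {
  const defined = new Set();
  for (const rawLine of envText.split(/\r?\n/)) {
    const line = rawLine.trim();
    if (line === "" || line.startsWith("#")) continue; // skip blanks and comments
    const eq = line.indexOf("=");
    if (eq <= 0) continue; // not a KEY=value line
    const key = line.slice(0, eq).trim();
    // Drop a trailing inline comment such as " # required for LLM calls".
    const value = line.slice(eq + 1).split(" #")[0].trim();
    if (value !== "") defined.add(key);
  }
  return REQUIRED_KEYS.filter((key) => !defined.has(key));
}

// Only checks the file if one exists in the current directory.
if (fs.existsSync(".env")) {
  const missing = missingEnvKeys(fs.readFileSync(".env", "utf8"));
  console.log(missing.length === 0 ? "All keys set" : `Missing: ${missing.join(", ")}`);
}
```

Run it from the directory that holds the .env file. It only inspects the file and never prints the key values.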
The only required argument is the vulnerability ID, which should be a GitHub Advisory ID or a Snyk ID. The tool automatically fetches the vulnerability report from the corresponding API or scrapes it from the website.
Run this script from the repository root:
./run-mnt.sh output node index.js create -v GHSA-m7p2-ghfh-pjvx
This will create a test for GHSA-m7p2-ghfh-pjvx in ./output/GHSA-m7p2-ghfh-pjvx/test.js.
For most vulnerabilities, it is recommended to run the test using the provided docker image:
./run-mnt.sh output node --test /output/<advisoryId>/test.js
For ReDoS vulnerabilities, the test should be run with the following flags:
./run-mnt.sh output node --test --enable-experimental-regexp-engine-on-excessive-backtracks --regexp-backtracks-before-fallback=30000 output/<advisoryId>/test.js
For vulnerabilities that involve long-running tasks (e.g. web servers), run the test with the following flags:
./run-mnt.sh output node --test --test-force-exit /output/<advisoryId>/test.js
Note that the following commands run PoCGen or the baseline on large datasets (more than 100 vulnerabilities), which takes multiple hours and incurs costs for API calls to an LLM. We have included the interactions with the LLM and all the logs and metadata of the runs in the eval_results directory.
First, follow the installation instructions above.
To run PoCGen on the SecBench.js dataset, use the following command:
./run-mnt.sh output node index.js pipeline -v dataset/SecBench.js/*\.all
This creates one subdirectory under output for each vulnerability, named after its ID.
Each subdirectory contains the vulnerable package, an execution log file named output_*.log (showing the steps and execution outputs), an LLM interaction log file named prompt.json (showing the LLM interactions with all the metadata), a JSON file named RunnerResult_*.json containing all the information about the attempt, and the proof-of-concept exploit as a test file named test.js.
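The layout described above can be checked with a small script. This is a sketch only: the function name `classifyArtifacts` and the sample file names are hypothetical, while the patterns simply mirror the artifact names listed in the paragraph above.

```javascript
// Patterns taken from the artifact names described above.
const ARTIFACT_PATTERNS = {
  executionLog: /^output_.*\.log$/,
  llmLog: /^prompt\.json$/,
  runnerResult: /^RunnerResult_.*\.json$/,
  exploitTest: /^test\.js$/,
};

// Hypothetical helper: groups the file names of one per-vulnerability
// directory by artifact kind. Names that match no pattern are ignored.
function classifyArtifacts(fileNames) {
  const found = {};
  for (const [kind, pattern] of Object.entries(ARTIFACT_PATTERNS)) {
    found[kind] = fileNames.filter((name) => pattern.test(name));
  }
  return found;
}

// Sample input with made-up file names, for illustration only.
const example = classifyArtifacts([
  "output_1.log",
  "prompt.json",
  "RunnerResult_1.json",
  "test.js",
]);
console.log(example);
```

To apply it to a real run, the file names could come from `fs.readdirSync` on a directory such as output/<advisoryId>. An empty `exploitTest` list would then indicate that no test.js was written for that vulnerability.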
To run Mini-SWE-agent on the SecBench.js dataset, use the following command:
./run-mnt.sh output node index.js pipeline --runner RunnerMiniSWEAgent -v dataset/SecBench.js/*\.all
This creates the same directory structure, with the difference that it creates a mini_swe_workspace subdirectory for each vulnerability and stores the PoC exploit in it as poc.js.
A refiner can be specified using --refiner <refiner>. For example:
./run-mnt.sh output node index.js pipeline -v dataset/SecBench.js/*\.all --refiner C0Refiner
The following values were used in the evaluation:
- noTaint for noTaint
- C7Refiner for noUsageSnippets
- C6Refiner for noFewShot
- C3Refiner for noDebugger
- C2Refiner for noErrorRefiner
For each vulnerability, the token counts are stored in the RunnerResult_*.json file under the model.totalPromptTokens (request tokens) and model.totalCompletionTokens (response tokens) fields.
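The token fields can be added up across runs with a few lines of Node.js. This is a minimal sketch under stated assumptions: the helper names `sumTokens` and `loadRunnerResults` are hypothetical, and simply adding the counters of several RunnerResult_*.json files assumes that each file records a separate attempt.

```javascript
const fs = require("node:fs");
const path = require("node:path");

// Adds up the two token counters across already-parsed RunnerResult objects.
// Objects without the fields contribute zero.
function sumTokens(results) {
  let prompt = 0;
  let completion = 0;
  for (const result of results) {
    prompt += result?.model?.totalPromptTokens ?? 0;
    completion += result?.model?.totalCompletionTokens ?? 0;
  }
  return { prompt, completion };
}

// Hypothetical loader: parses every RunnerResult_*.json file in one
// per-vulnerability directory.
function loadRunnerResults(dir) {
  return fs
    .readdirSync(dir)
    .filter((name) => /^RunnerResult_.*\.json$/.test(name))
    .map((name) => JSON.parse(fs.readFileSync(path.join(dir, name), "utf8")));
}

// Example call (directory name taken from the create example above):
// console.log(sumTokens(loadRunnerResults("output/GHSA-m7p2-ghfh-pjvx")));
```

The result holds raw token counts. Converting them to a monetary cost requires the per-token prices of the model that was used, which are not part of these files as described here.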
To run the pipeline on dataset/ghsa_2025-2026.txt, use the following command:
./run-mnt.sh output node index.js pipeline -v dataset/ghsa_2025-2026.txt