Commit 5e0195f

Add more models
2 parents 5793fc8 + a300548 commit 5e0195f

10 files changed

+720
-559
lines changed

.env.example

Lines changed: 3 additions & 0 deletions
Original file line number | Diff line number | Diff line change
@@ -4,6 +4,9 @@ ANTHROPIC_API_KEY=""
44
GOOGLE_GEMINI_API_KEY=""
55
GROK_API_KEY=""
66
CEREBRAS_API_KEY=""
7+
TOGETHER_API_KEY=""
8+
ANYSCALE_API_KEY=""
9+
FIREWORKS_API_KEY=""
710
DISABLE_LLM="False"
811

912
# AWS credentials

README.md

Lines changed: 106 additions & 36 deletions
@@ -35,31 +35,40 @@ As opposed to RL models, which blindly take actions based on the reward function
3535

3636
# Results
3737

38-
Our experimentations (342 fights so far) led to the following leaderboard.
38+
Our experiments (546 fights so far) led to the following leaderboard.
3939
Each LLM has an ELO score based on its results
4040

4141
## Ranking
4242

4343
### ELO ranking
4444

45-
| Model | Rating |
46-
| ------------------------------ | ------: |
47-
| 🥇openai:gpt-3.5-turbo-0125 | 1776.11 |
48-
| 🥈mistral:mistral-small-latest | 1586.16 |
49-
| 🥉openai:gpt-4-1106-preview | 1584.78 |
50-
| openai:gpt-4 | 1517.2 |
51-
| openai:gpt-4-turbo-preview | 1509.28 |
52-
| openai:gpt-4-0125-preview | 1438.92 |
53-
| mistral:mistral-medium-latest | 1356.19 |
54-
| mistral:mistral-large-latest | 1231.36 |
45+
| Rank | Model | Rating |
46+
| ---: | :----------------------------------------------------------------- | ------: |
47+
| 1 | 🥇openai:gpt-4o:text | 1912.5 |
48+
| 2 | 🥈**openai:gpt-4o-mini:vision** | 1835.27 |
49+
| 3 | 🥉openai:gpt-4o-mini:text | 1670.89 |
50+
| 4 | **openai:gpt-4o:vision** | 1656.93 |
51+
| 5 | **mistral:pixtral-large-latest:vision** | 1654.61 |
52+
| 6 | **mistral:pixtral-12b-2409:vision** | 1590.77 |
53+
| 7 | mistral:pixtral-12b-2409:text | 1569.03 |
54+
| 8 | together:meta-llama/Llama-3.2-90B-Vision-Instruct-Turbo:text | 1441.45 |
55+
| 9 | **anthropic:claude-3-haiku-20240307:vision** | 1364.87 |
56+
| 10 | mistral:pixtral-large-latest:text | 1356.32 |
57+
| 11 | anthropic:claude-3-haiku-20240307:text | 1333.6 |
58+
| 12 | **anthropic:claude-3-sonnet-20240229:vision** | 1314.61 |
59+
| 13 | **together:meta-llama/Llama-3.2-90B-Vision-Instruct-Turbo:vision** | 1269.84 |
60+
| 14 | anthropic:claude-3-sonnet-20240229:text | 1029.31 |
5561

5662
### Win rate matrix
5763

58-
![Win rate matrix](notebooks/win_rate_matrix.png)
64+
![Win rate matrix](notebooks/result_matrix.png)
5965

6066
# Explanation
6167

62-
Each player is controlled by an LLM.
68+
Each player can be controlled by a multimodal model or a text-generating model.
69+
70+
### TextRobot
71+
6372
We send the LLM a text description of the screen. The LLM decides on the next moves its character will make. The next moves depend on its previous moves, its opponent's moves, and the power and health bars.
6473

6574
- Agent based
@@ -68,6 +77,10 @@ We send to the LLM a text description of the screen. The LLM decide on the next
6877

6978
![fight3 drawio](https://github.com/OpenGenerativeAI/llm-colosseum/assets/78322686/3a212601-f54c-490d-aeb9-6f7c2401ebe6)
7079

80+
### VisionRobot
81+
82+
We send the LLM a screenshot of the current state of the game, specifying which character it controls. Its decision is based only on this visual information.
83+
7184
# Installation
7285

7386
- Follow instructions in https://docs.diambra.ai/#installation
@@ -142,43 +155,52 @@ By default, it runs mistral against mistral. To use other models, you need to ch
142155
from eval.game import Game, Player1, Player2
143156

144157
def main():
158+
# Environment Settings
159+
145160
game = Game(
146161
render=True,
147162
save_game=True,
148163
player_1=Player1(
149164
nickname="Baby",
150-
model="ollama:mistral", # change this
165+
model="ollama:mistral",
166+
robot_type="text", # vision or text
167+
temperature=0.7,
151168
),
152169
player_2=Player2(
153170
nickname="Daddy",
154-
model="ollama:mistral", # change this
171+
model="ollama:mistral",
172+
robot_type="text",
173+
temperature=0.7,
155174
),
156175
)
176+
157177
game.run()
158178
return 0
179+
180+
181+
if __name__ == "__main__":
182+
main()
159183
```
160184

161185
The convention we use is `model_provider:model_name`. If you want to use another local model than Mistral, you can do `ollama:some_other_model`
162186

163187
## How to make my own LLM model play? Can I improve the prompts?
164188

165-
The LLM is called in `Robot.call_llm()` method of the `agent/robot.py` file.
189+
The LLM is called in the `call_llm()` method of `TextRobot` or `VisionRobot`, in the `agent/robot.py` file.
190+
191+
#### TextRobot method:
166192

167193
```python
168194
def call_llm(
169195
self,
170-
temperature: float = 0.7,
171196
max_tokens: int = 50,
172197
top_p: float = 1.0,
173-
) -> str:
198+
) -> Generator[ChatResponse, None, None]:
174199
"""
175200
Make an API call to the language model.
176201
177202
Edit this method to change the behavior of the robot!
178203
"""
179-
# self.model is a slug like mistral:mistral-small-latest or ollama:mistral
180-
provider_name, model_name = get_provider_and_model(self.model)
181-
client = get_sync_client(provider_name) # OpenAI client
182204

183205
# Generate the prompts
184206
move_list = "- " + "\n - ".join([move for move in META_INSTRUCTIONS])
@@ -197,28 +219,76 @@ Example if the opponent is far:
197219
- Fireball
198220
- Move closer"""
199221

200-
# Call the LLM
201-
completion = client.chat.completions.create(
202-
model=model_name,
203-
messages=[
204-
{"role": "system", "content": system_prompt},
205-
{"role": "user", "content": "Your next moves are:"},
206-
],
207-
temperature=temperature,
208-
max_tokens=max_tokens,
209-
top_p=top_p,
222+
start_time = time.time()
223+
224+
client = get_client(self.model, temperature=self.temperature)
225+
226+
messages = [
227+
ChatMessage(role="system", content=system_prompt),
228+
ChatMessage(role="user", content="Your next moves are:"),
229+
]
230+
resp = client.stream_chat(messages)
231+
232+
logger.debug(f"LLM call to {self.model}: {system_prompt}")
233+
logger.debug(f"LLM call to {self.model}: {time.time() - start_time}s")
234+
235+
return resp
236+
```
237+
238+
#### VisionRobot method:
239+
240+
```python
241+
def call_llm(
242+
self,
243+
max_tokens: int = 50,
244+
top_p: float = 1.0,
245+
) -> Generator[CompletionResponse, None, None]:
246+
"""
247+
Make an API call to the language model.
248+
249+
Edit this method to change the behavior of the robot!
250+
"""
251+
252+
# Generate the prompts
253+
move_list = "- " + "\n - ".join([move for move in META_INSTRUCTIONS])
254+
system_prompt = f"""You are the best and most aggressive Street Fighter III 3rd strike player in the world.
255+
Your character is {self.character}. Your goal is to beat the other opponent. You respond with a bullet point list of moves.
256+
257+
The current state of the game is given in the following image.
258+
259+
The moves you can use are:
260+
{move_list}
261+
----
262+
Reply with a bullet point list of 3 moves. The format should be: `- <name of the move>` separated by a new line.
263+
Example if the opponent is close:
264+
- Move closer
265+
- Medium Punch
266+
267+
Example if the opponent is far:
268+
- Fireball
269+
- Move closer"""
270+
271+
start_time = time.time()
272+
273+
client = get_client_multimodal(
274+
self.model, temperature=self.temperature
275+
) # MultiModalLLM
276+
277+
resp = client.stream_complete(
278+
prompt=system_prompt, image_documents=[self.last_image_to_image_node()]
210279
)
211280

212-
# Return the string to be parsed with regex
213-
llm_response = completion.choices[0].message.content.strip()
214-
return llm_response
281+
logger.debug(f"LLM call to {self.model}: {system_prompt}")
282+
logger.debug(f"LLM call to {self.model}: {time.time() - start_time}s")
283+
284+
return resp
215285
```
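
Both prompts above ask the model for a bullet list in the form `- <name of the move>`. As a rough sketch of how such a reply could be turned into a list of moves, the snippet below uses a regular expression. `parse_moves` and `VALID_MOVES` are hypothetical names introduced for illustration (the three moves are the ones shown in the prompt examples; the real list comes from `META_INSTRUCTIONS`), and the repository's own parsing may differ.

```python
import re

# Hypothetical subset of moves, taken from the prompt examples above.
# In the repository, the allowed moves come from META_INSTRUCTIONS.
VALID_MOVES = {"Move closer", "Fireball", "Medium Punch"}

# One bullet per line: optional indentation, a dash, then the move name.
BULLET_RE = re.compile(r"^\s*-\s*(.+?)\s*$", re.MULTILINE)


def parse_moves(reply: str, valid_moves: set = VALID_MOVES) -> list:
    """Extract '- <move>' bullets from a reply, keeping only known moves."""
    # Case-insensitive lookup back to the canonical spelling.
    lookup = {move.lower(): move for move in valid_moves}
    moves = []
    for raw in BULLET_RE.findall(reply):
        move = lookup.get(raw.lower())
        if move is not None:
            moves.append(move)
    return moves
```

Dropping unknown moves rather than raising keeps a fight going when the model invents a move; whether that is the right trade-off depends on how strict you want the robot to be.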
216286

217-
To use another model or other prompts, make a call to another client in this function, change the system prompt, or make any fancy stuff.
287+
You can customize the prompt in these functions.
218288

219289
### Submit your model
220290

221-
Create a new class herited from `Robot` that has the changes you want to make and open a PR.
291+
Create a new class that inherits from `Robot`, make the changes you want, and open a PR.
222292

223293
We'll do our best to add it to the ranking!
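
When picking the `model` string for a submission, the README's convention is `model_provider:model_name`. A minimal sketch of splitting such a slug is shown below. `split_model_slug` is a hypothetical helper, not a function from the repository; it splits on the first colon only, so model names that contain slashes or further colons stay intact.

```python
def split_model_slug(slug: str) -> tuple:
    """Split 'provider:model_name' on the first colon only."""
    provider, sep, model_name = slug.partition(":")
    if not sep or not provider or not model_name:
        raise ValueError(f"Expected 'provider:model_name', got {slug!r}")
    return provider, model_name


provider, model_name = split_model_slug("ollama:mistral")
```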
224294

agent/llm.py

Lines changed: 61 additions & 0 deletions
@@ -1,5 +1,6 @@
11
from llama_index.core.llms.function_calling import FunctionCallingLLM
22
from llama_index.core.multi_modal_llms.base import MultiModalLLM
3+
import os
34

45

56
def get_client(model_str: str, temperature: float = 0.7) -> FunctionCallingLLM:
@@ -50,6 +51,36 @@ def get_client(model_str: str, temperature: float = 0.7) -> FunctionCallingLLM:
5051

5152
return Gemini(model=model_name, temperature=temperature)
5253

54+
elif provider == "anyscale":
55+
from llama_index.llms.openai import OpenAI
56+
57+
return OpenAI(
58+
model=model_name,
59+
temperature=temperature,
60+
api_key=os.environ.get("ANYSCALE_API_KEY"),
61+
api_base="https://api.endpoints.anyscale.com/v1/",
62+
)
63+
64+
elif provider == "fireworks":
65+
from llama_index.llms.openai import OpenAI
66+
67+
return OpenAI(
68+
model=model_name,
69+
temperature=temperature,
70+
api_key=os.environ.get("FIREWORKS_API_KEY"),
71+
api_base="https://api.fireworks.ai/inference/v1/",
72+
)
73+
74+
elif provider == "together":
75+
from llama_index.llms.openai import OpenAI
76+
77+
return OpenAI(
78+
model=model_name,
79+
temperature=temperature,
80+
api_key=os.environ.get("TOGETHER_API_KEY"),
81+
api_base="https://api.together.xyz/v1/",
82+
)
83+
5384
raise ValueError(f"Provider {provider} not found in models")
5485

5586

@@ -92,4 +123,34 @@ def get_client_multimodal(model_str: str, temperature: float = 0.7) -> MultiModa
92123

93124
return AnthropicMultiModal(model=model_name, temperature=temperature)
94125

126+
elif provider == "anyscale":
127+
from llama_index.multi_modal_llms.openai import OpenAIMultiModal
128+
129+
return OpenAIMultiModal(
130+
model=model_name,
131+
temperature=temperature,
132+
api_key=os.environ.get("ANYSCALE_API_KEY"),
133+
api_base="https://api.endpoints.anyscale.com/v1/",
134+
)
135+
136+
elif provider == "fireworks":
137+
from llama_index.multi_modal_llms.openai import OpenAIMultiModal
138+
139+
return OpenAIMultiModal(
140+
model=model_name,
141+
temperature=temperature,
142+
api_key=os.environ.get("FIREWORKS_API_KEY"),
143+
api_base="https://api.fireworks.ai/inference/v1/",
144+
)
145+
146+
elif provider == "together":
147+
from llama_index.multi_modal_llms.openai import OpenAIMultiModal
148+
149+
return OpenAIMultiModal(
150+
model=model_name,
151+
temperature=temperature,
152+
api_key=os.environ.get("TOGETHER_API_KEY"),
153+
api_base="https://api.together.xyz/v1/",
154+
)
155+
95156
raise ValueError(f"Provider {provider} not found in multimodal models")
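
The three new branches (`anyscale`, `fireworks`, `together`) differ only in the environment variable and the base URL, in both `get_client` and `get_client_multimodal`. One possible way to reduce the repetition is a lookup table, sketched below. `OPENAI_COMPATIBLE_PROVIDERS` and `openai_compatible_kwargs` are hypothetical names, not part of this commit; the environment variables and URLs are copied from the diff above.

```python
import os

# Values copied from the branches added in this commit.
OPENAI_COMPATIBLE_PROVIDERS = {
    "anyscale": ("ANYSCALE_API_KEY", "https://api.endpoints.anyscale.com/v1/"),
    "fireworks": ("FIREWORKS_API_KEY", "https://api.fireworks.ai/inference/v1/"),
    "together": ("TOGETHER_API_KEY", "https://api.together.xyz/v1/"),
}


def openai_compatible_kwargs(provider: str, model_name: str, temperature: float = 0.7) -> dict:
    """Build the keyword arguments shared by the text and multimodal clients."""
    try:
        env_var, api_base = OPENAI_COMPATIBLE_PROVIDERS[provider]
    except KeyError:
        raise ValueError(f"Provider {provider} is not in the OpenAI-compatible table") from None
    return {
        "model": model_name,
        "temperature": temperature,
        # None if the key is not set, same as os.environ.get in the diff.
        "api_key": os.environ.get(env_var),
        "api_base": api_base,
    }
```

Under this sketch, each branch would reduce to something like `OpenAI(**openai_compatible_kwargs(provider, model_name, temperature))`, with `OpenAIMultiModal` taking the same keyword arguments in the multimodal function.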

local.py

Lines changed: 4 additions & 2 deletions
@@ -19,12 +19,14 @@ def main():
1919
player_1=Player1(
2020
nickname="Baby",
2121
model="ollama:mistral",
22-
# model="ollama:mistral",
22+
robot_type="text", # vision or text
23+
temperature=0.7,
2324
),
2425
player_2=Player2(
2526
nickname="Daddy",
2627
model="ollama:mistral",
27-
# model="ollama:mistral",
28+
robot_type="text",
29+
temperature=0.7,
2830
),
2931
)
3032

notebooks/elo.md

Lines changed: 16 additions & 7 deletions
@@ -1,7 +1,16 @@
1-
| | Model | Rating |
2-
| --: | :-------------------------------- | ------: |
3-
| 3 | openai:gpt-4o-mini | 1603.73 |
4-
| 2 | mistral:pixtral-12b-2409 | 1568.49 |
5-
| 1 | anthropic:claude-3-haiku-20240307 | 1524.71 |
6-
| 0 | openai:gpt-4o | 1524.58 |
7-
| 4 | mistral:pixtral-large-latest | 1278.49 |
1+
| Rank | Model | Rating |
2+
| ---: | :----------------------------------------------------------------- | ------: |
3+
| 1 | openai:gpt-4o:text | 1912.5 |
4+
| 2 | **openai:gpt-4o-mini:vision** | 1835.27 |
5+
| 3 | openai:gpt-4o-mini:text | 1670.89 |
6+
| 4 | **openai:gpt-4o:vision** | 1656.93 |
7+
| 5 | **mistral:pixtral-large-latest:vision** | 1654.61 |
8+
| 6 | **mistral:pixtral-12b-2409:vision** | 1590.77 |
9+
| 7 | mistral:pixtral-12b-2409:text | 1569.03 |
10+
| 8 | together:meta-llama/Llama-3.2-90B-Vision-Instruct-Turbo:text | 1441.45 |
11+
| 9 | **anthropic:claude-3-haiku-20240307:vision** | 1364.87 |
12+
| 10 | mistral:pixtral-large-latest:text | 1356.32 |
13+
| 11 | anthropic:claude-3-haiku-20240307:text | 1333.6 |
14+
| 12 | **anthropic:claude-3-sonnet-20240229:vision** | 1314.61 |
15+
| 13 | **together:meta-llama/Llama-3.2-90B-Vision-Instruct-Turbo:vision** | 1269.84 |
16+
| 14 | anthropic:claude-3-sonnet-20240229:text | 1029.31 |
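
For readers unfamiliar with how ratings like those in the table move after a fight, here is a small sketch of the standard Elo update. The K-factor of 32 and the function names are assumptions made for illustration; the values and code actually used to produce this leaderboard are not shown in this commit.

```python
def expected_score(rating_a: float, rating_b: float) -> float:
    """Probability that A beats B under the standard Elo model."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400.0))


def update_elo(rating_a: float, rating_b: float, score_a: float, k: float = 32.0) -> tuple:
    """Return both new ratings; score_a is 1.0 for a win, 0.5 draw, 0.0 loss.

    k=32 is an assumed K-factor, not a value taken from the repository.
    """
    expected_a = expected_score(rating_a, rating_b)
    new_a = rating_a + k * (score_a - expected_a)
    # B's score and expectation are the complements of A's.
    new_b = rating_b + k * ((1.0 - score_a) - (1.0 - expected_a))
    return new_a, new_b
```

With equal ratings the expected score is 0.5, so a win moves the winner up by half of K and the loser down by the same amount.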
