Commit f21da2f

Update readme on where to find BPE rank file download link
1 parent 44cc0d6 commit f21da2f

2 files changed: 6 additions, 3 deletions

README.md

Lines changed: 3 additions & 1 deletion

@@ -8,7 +8,7 @@ The TokenizerLib is built in .NET Standard 2.0, which can be consumed in project
 
 You can download and install the nuget package of TokenizerLib [here](https://www.nuget.org/packages/Microsoft.DeepDev.TokenizerLib/).
 
-Example C# code to use TokenizerLib in your code. In production setting, you should pre-download the BPE rank file and call `TokenizerBuilder.CreateTokenizer` API to avoid downloading the BPE rank file on the fly.
+Example C# code to use TokenizerLib in your code:
 ```csharp
 using System.Collections.Generic;
 using Microsoft.DeepDev;
@@ -29,6 +29,8 @@ Console.WriteLine(encoded.Count);
 var decoded = tokenizer.Decode(encoded.ToArray());
 Console.WriteLine(decoded);
 ```
+In production setting, you should pre-download the BPE rank file and call `TokenizerBuilder.CreateTokenizer` API to avoid downloading the BPE rank file on the fly.
+You can find the model to encoder and encoder to BPE rank file link mapping in: [TokenizerBuilder.cs](https://github.com/microsoft/Tokenizer/blob/44cc0d603b22483abcc71310e25b8b3746f32cd9/Tokenizer_C%23/TokenizerLib/TokenizerBuilder.cs#L107).
 
 ## C# performance benchmark
 
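The lines added above point readers to a two-level lookup: model name to encoder name, then encoder name to the download URL of that encoder's BPE rank file. A minimal sketch of that lookup follows, written in TypeScript to match the other file in this commit. The table entries are an illustrative subset assumed to mirror OpenAI's public tiktoken mapping, and `bpeRankFileUrlForModel` is a hypothetical helper; the linked `TokenizerBuilder.cs` (and `tokenizerBuilder.ts`) remain the authoritative source for the real tables.

```typescript
// Illustrative subset only (assumption): entries mirror OpenAI's public
// tiktoken mapping; check the linked TokenizerBuilder source for the real tables.
const MODEL_TO_ENCODER: Partial<Record<string, string>> = {
  "gpt-4": "cl100k_base",
  "gpt-3.5-turbo": "cl100k_base",
  "text-davinci-003": "p50k_base",
};

const ENCODER_TO_BPE_URL: Partial<Record<string, string>> = {
  cl100k_base: "https://openaipublic.blob.core.windows.net/encodings/cl100k_base.tiktoken",
  p50k_base: "https://openaipublic.blob.core.windows.net/encodings/p50k_base.tiktoken",
};

// Hypothetical helper: resolve which BPE rank file to pre-download for a model.
function bpeRankFileUrlForModel(modelName: string): string {
  const encoder = MODEL_TO_ENCODER[modelName];
  if (encoder === undefined) {
    throw new Error(`Unknown model: ${modelName}`);
  }
  const url = ENCODER_TO_BPE_URL[encoder];
  if (url === undefined) {
    throw new Error(`No BPE rank file URL for encoder: ${encoder}`);
  }
  return url;
}

// prints https://openaipublic.blob.core.windows.net/encodings/cl100k_base.tiktoken
console.log(bpeRankFileUrlForModel("gpt-4"));
```

Resolving the URL once, ahead of deployment, is what makes it possible to fetch the file a single time instead of at every process start.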
tokenizer_ts/README.md

Lines changed: 3 additions & 2 deletions

@@ -18,8 +18,7 @@ Install the npm package in your project:
 npm install @microsoft/tiktokenizer
 ```
 
-Example Typescript code to use @microsoft/tiktokenizer in your code. In production setting, you should pre-download the BPE rank file and call `createTokenizer` API to avoid downloading the BPE rank file on the fly.
-
+Example Typescript code to use @microsoft/tiktokenizer in your code:
 ```typescript
 import {
 createByModelName
@@ -48,6 +47,8 @@ const createTokenizer = async () => {
 createTokenizer();
 
 ```
+In production setting, you should pre-download the BPE rank file and call `createTokenizer` API to avoid downloading the BPE rank file on the fly.
+You can find the model to encoder and encoder to BPE rank file link mapping in: [tokenizerBuilder.ts](https://github.com/microsoft/Tokenizer/blob/44cc0d603b22483abcc71310e25b8b3746f32cd9/tokenizer_ts/src/tokenizerBuilder.ts#L201).
 
 # Contributing
 
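The production advice in this commit (pre-download the BPE rank file rather than fetching it on the fly) comes down to reading a local copy at startup. Below is a minimal sketch of parsing such a file, assuming the tiktoken text format in which each line holds a base64-encoded token, a single space, and its integer rank. `parseBpeRanks` is a hypothetical helper, not part of @microsoft/tiktokenizer, and for simplicity it keeps tokens as base64 strings instead of decoding them to bytes.

```typescript
// Hypothetical helper: parse the contents of a locally stored BPE rank file.
// Assumed format (tiktoken style): "<base64 token> <rank>" on each line.
function parseBpeRanks(contents: string): Map<string, number> {
  const ranks = new Map<string, number>();
  for (const rawLine of contents.split("\n")) {
    const line = rawLine.trim(); // also strips a trailing "\r" from CRLF files
    if (line === "") continue; // skip blank lines, e.g. after the final newline
    const parts = line.split(" ");
    // Reject anything that is not exactly "<token> <non-negative integer>".
    if (parts.length !== 2 || !/^\d+$/.test(parts[1])) {
      throw new Error(`Malformed BPE rank line: ${line}`);
    }
    ranks.set(parts[0], Number(parts[1]));
  }
  return ranks;
}

// Two sample lines; "IQ==" and "Ig==" are base64 for "!" and '"'.
const ranks = parseBpeRanks("IQ== 0\nIg== 1\n");
console.log(ranks.size); // 2
console.log(ranks.get("Ig==")); // 1
```

In Node, the contents would typically come from `fs.readFileSync(localPath, "utf8")` on a file downloaded once at build or deploy time. How that file is then handed to `createTokenizer` depends on the function's actual parameters, which are defined in the linked `tokenizerBuilder.ts`.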