Commit cf8b16d

Merge branch 'master' into patch-1
2 parents 369d839 + 183d78f commit cf8b16d

12 files changed: 37 additions and 2 deletions

.github/workflows/python-package.yml

Lines changed: 2 additions & 1 deletion
@@ -40,11 +40,12 @@ jobs:
         pip install pytest-cov
         pytest --cov=pysbd tests/ --color yes --cov-report=xml --cov-report=html
     - name: Upload coverage to Codecov
-      uses: codecov/codecov-action@v1.0.7
+      uses: codecov/codecov-action@v1
       with:
         token: ${{ secrets.CODECOV_TOKEN }}
         file: ./coverage.xml
         flags: unittests
         env_vars: OS,PYTHON
         name: codecov-umbrella
         fail_ci_if_error: true
+
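
The hunk header above (`@@ -40,11 +40,12 @@`) encodes where the change sits in the old and new file. As a minimal sketch for checking it against the "2 additions & 1 deletion" summary, the Python below parses a unified-diff hunk header. The names `HunkRange` and `parse_hunk_header` are hypothetical helpers introduced here for illustration; they are not part of this commit or of pySBD.

```python
import re
from typing import NamedTuple


class HunkRange(NamedTuple):
    """Line ranges described by a unified-diff hunk header (hypothetical helper)."""
    old_start: int
    old_count: int
    new_start: int
    new_count: int


# Matches "@@ -<start>[,<count>] +<start>[,<count>] @@"; anything after the
# second "@@" (the section heading, e.g. "jobs:") is ignored.
_HUNK_RE = re.compile(r"^@@ -(\d+)(?:,(\d+))? \+(\d+)(?:,(\d+))? @@")


def parse_hunk_header(line: str) -> HunkRange:
    """Parse a hunk header; an omitted count defaults to 1 in the unified format."""
    match = _HUNK_RE.match(line)
    if match is None:
        raise ValueError(f"not a unified-diff hunk header: {line!r}")
    old_start, old_count, new_start, new_count = match.groups()
    return HunkRange(
        int(old_start),
        int(old_count) if old_count is not None else 1,
        int(new_start),
        int(new_count) if new_count is not None else 1,
    )


hunk = parse_hunk_header("@@ -40,11 +40,12 @@ jobs:")
# Net growth of the hunk: additions minus deletions.
print(hunk.new_count - hunk.old_count)
```

For this hunk the net growth is one line, which agrees with two added lines and one removed line.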

CONTRIBUTING.md

Lines changed: 1 addition & 1 deletion
@@ -52,7 +52,7 @@ the system author uses to tag our issues and pull requests.
 
 ## Contributing to the code base
 
-Happy to see you contibute to pySBD codebase. To help you get started and understand internals of pySBD, a good place to start is to refer to the implementation section of pySBD research paper (link to be added soon). Another great place for reference is to look at [merged pull requests](https://github.com/nipunsadvilkar/pySBD/pulls?q=is%3Apr+sort%3Aupdated-desc+is%3Amerged). Depending on the type of your contribution, refer to the assigned labels.
+Happy to see you contibute to pySBD codebase. To help you get started and understand internals of pySBD, a good place to start is to refer to the implementation section of [pySBD research paper](https://arxiv.org/abs/2010.09657). Another great place for reference is to look at [merged pull requests](https://github.com/nipunsadvilkar/pySBD/pulls?q=is%3Apr+sort%3Aupdated-desc+is%3Amerged). Depending on the type of your contribution, refer to the assigned labels.
 
 ### Getting started
 To make changes to pySBD's code base, you need to fork then clone the GitHub repository to your local machine. You'll need to make sure that you have a development environment consisting of a Python distribution including python 3+, pip and git installed.

README.md

Lines changed: 34 additions & 0 deletions
@@ -1,3 +1,4 @@
+![PySBD logo](artifacts/pysbd_logo.png?raw=true "pysbd logo")
 # pySBD: Python Sentence Boundary Disambiguation (SBD)
 
 ![Python package](https://github.com/nipunsadvilkar/pySBD/workflows/Python%20package/badge.svg) [![codecov](https://codecov.io/gh/nipunsadvilkar/pySBD/branch/master/graph/badge.svg)](https://codecov.io/gh/nipunsadvilkar/pySBD) [![License](https://img.shields.io/badge/license-MIT-brightgreen.svg?style=flat)](https://github.com/nipunsadvilkar/pySBD/blob/master/LICENSE) [![PyPi](https://img.shields.io/pypi/v/pysbd?color=blue&logo=pypi&logoColor=white)](https://pypi.python.org/pypi/pysbd) [![GitHub](https://img.shields.io/github/v/release/nipunsadvilkar/pySBD.svg?include_prereleases&logo=github&style=flat)](https://github.com/nipunsadvilkar/pySBD)
@@ -8,6 +9,21 @@ This project is a direct port of ruby gem - [Pragmatic Segmenter](https://github
 
 ![pysbd_code](artifacts/pysbd_code.png?raw=true "pysbd_code")
 
+## Highlights
+**'PySBD: Pragmatic Sentence Boundary Disambiguation'** a short research paper got accepted into 2nd Workshop for Natural Language Processing Open Source Software (NLP-OSS) at EMNLP 2020. </br>
+
+**Research Paper:**</br>
+
+https://arxiv.org/abs/2010.09657</br>
+
+**[Recorded Talk:](https://slideslive.com/38939754)**</br>
+
+[![pysbd_talk](artifacts/pysbd_talk.png)](https://slideslive.com/38939754)</br>
+
+**Poster:**</br>
+
+[![name](artifacts/pysbd_poster.png)](artifacts/pysbd_poster.png)
+
 ## Install
 
 **Python**
@@ -59,6 +75,24 @@ If you want to contribute new feature/language support or found a text that is i
 4. Push to the branch (`git push origin my-new-feature`)
 5. Create a new Pull Request
 
+## Citation
+If you use `pysbd` package in your projects or research, please cite [PySBD: Pragmatic Sentence Boundary Disambiguation](https://www.aclweb.org/anthology/2020.nlposs-1.15).
+```
+@inproceedings{sadvilkar-neumann-2020-pysbd,
+    title = "{P}y{SBD}: Pragmatic Sentence Boundary Disambiguation",
+    author = "Sadvilkar, Nipun and
+      Neumann, Mark",
+    booktitle = "Proceedings of Second Workshop for NLP Open Source Software (NLP-OSS)",
+    month = nov,
+    year = "2020",
+    address = "Online",
+    publisher = "Association for Computational Linguistics",
+    url = "https://www.aclweb.org/anthology/2020.nlposs-1.15",
+    pages = "110--114",
+    abstract = "We present a rule-based sentence boundary disambiguation Python package that works out-of-the-box for 22 languages. We aim to provide a realistic segmenter which can provide logical sentences even when the format and domain of the input text is unknown. In our work, we adapt the Golden Rules Set (a language specific set of sentence boundary exemplars) originally implemented as a ruby gem pragmatic segmenter which we ported to Python with additional improvements and functionality. PySBD passes 97.92{\%} of the Golden Rule Set examplars for English, an improvement of 25{\%} over the next best open source Python tool.",
+}
+```
+
 ## Credit
 
 This project wouldn't be possible without the great work done by [Pragmatic Segmenter](https://github.com/diasks2/pragmatic_segmenter) team.

Binary file not shown.
1.13 MB

Binary file not shown.
157 KB

artifacts/pysbd_code2.png

52.2 KB

artifacts/pysbd_code3.png

82.9 KB

artifacts/pysbd_code_example.png

83.3 KB

artifacts/pysbd_logo.png

53 KB
