Error when trying to use nlp.pipe with n_process > 1 #179
Open
Labels
bug, enhancement, good first issue, help wanted
Description
Intro
I am getting TypeError: can not serialize 'BaseTextRank' object when trying to use spaCy's multiprocessing (n_process > 1) in nlp.pipe with a textrank pipeline component.
Sorry if this is a known or expected limitation. I couldn't find anything by searching the repo. I generally find spaCy's multiprocessing a bit temperamental anyway, but this seems to just not work.
PS. thanks for all the great work on the package!
Environment
Ubuntu 18.X (AWS DL AMI), Python 3.8 (via conda/mamba), pytextrank installed via pip within the conda environment. Do let me know if you need more info.
Reproducible example (hopefully)
import spacy
import pytextrank
import en_core_web_sm
nlp = en_core_web_sm.load()
nlp.add_pipe("textrank", last=True)
txt = """
The Old Testament of the King James Bible
The First Book of Moses: Called Genesis
1:1 In the beginning God created the heaven and the earth.
1:2 And the earth was without form, and void; and darkness was upon
the face of the deep. And the Spirit of God moved upon the face of the
waters.
1:3 And God said, Let there be light: and there was light.
1:4 And God saw the light, that it was good: and God divided the light
from the darkness.
1:5 And God called the light Day, and the darkness he called Night.
And the evening and the morning were the first day.
...
"""
data = []
for i in range(50):
    data.append((txt, {"doc_id": i}))
keys = []
# NOTE: with n_process=-1 this raises the error and then hangs; it works with n_process=1
for doc, context in nlp.pipe(data, as_tuples=True, n_process=-1):
    out = {"doc_id": context["doc_id"], "keyphrases": [(phr.text, phr.rank) for phr in doc._.phrases]}
    keys.append(out)
# pd.DataFrame(keys).head()
keys
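The error message in the example above suggests that some object cannot be serialized when results are passed between processes. One way to narrow that down might be to test candidate values one at a time. The following is only a minimal diagnostic sketch and assumes nothing about how spaCy or pytextrank serialize data internally: `find_unserializable`, `Unpicklable`, and the `sample` dict are hypothetical names made up for illustration, and `pickle` is used purely as a stand-in serializer.

```python
import pickle
from typing import Any, Callable, Dict, Mapping


def find_unserializable(
    values: Mapping[str, Any],
    dumps: Callable[[Any], bytes] = pickle.dumps,
) -> Dict[str, str]:
    """Return {key: error message} for every value that `dumps` rejects."""
    failures: Dict[str, str] = {}
    for key, value in values.items():
        try:
            dumps(value)
        except Exception as exc:  # serializers raise different exception types
            failures[key] = f"{type(exc).__name__}: {exc}"
    return failures


class Unpicklable:
    """Hypothetical stand-in for a custom object attached to a document."""

    def __reduce__(self):
        # Simulate an object that refuses to be serialized.
        raise TypeError("can not serialize 'Unpicklable' object")


# Hypothetical sample data: two plain values and one problematic object.
sample = {"doc_id": 7, "phrases": [("the light", 0.12)], "custom": Unpicklable()}
report = find_unserializable(sample)
print(report)
```

Passing a different `dumps` callable would let the same helper be tried against whichever serializer is actually involved, which this sketch does not attempt to identify.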