Official repository of the paper "Answering User Questions about Machine Learning Models through Standardized Model Cards"
We used Python 3.11 for this project. Install the dependencies with:

```shell
pip install -r requirements.txt
```
To collect the list of models and their discussions from the Hugging Face Hub, run the following command from the `data_collector` directory:

```shell
python main.py
```
- The list of models will be saved in the `data/all_models.csv` file.
- Discussions along with pull requests will be saved inside the `data/discussions` directory. The directory structure is as follows:

```
├── data: all the data generated after running the scripts are saved in this directory
│   ├── discussions: directory to save all discussions and pull requests
│   │   ├── <model_id>: model repository to save discussions and pull requests. The `/` in the `model_id` is replaced with `@`. An empty directory means the repository has no discussions or pull requests.
│   │   │   ├── discussion_<discussion_number>.yaml: a discussion file containing the discussion details
│   │   │   ├── pull_request_<pull_request_number>.yaml: a pull request file containing the pull request details
```

- A list of the downloaded discussions will be saved in the `data/all_discussions.csv` file.
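The `/`-to-`@` replacement in `model_id` described above can be sketched as follows; the helper name is hypothetical, and the repository's actual implementation may differ:

```python
def sanitize_model_id(model_id: str) -> str:
    """Map a Hugging Face model id (e.g. "org/model") to a directory name.

    Replaces the "/" separator with "@" so the id can serve as a single
    directory name. Hypothetical helper, not the repository's exact code.
    """
    return model_id.replace("/", "@")
```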
To select sample data for manual analysis, run the following command from the `data_analyzer` directory:

```shell
python random_discussion_selector.py
```

- A list of 378 randomly selected discussions will be saved in the `data/all_random_discussions.csv` file.
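The sampling step can be sketched as below; the seed, function name, and use of the standard library are assumptions rather than the repository's actual choices:

```python
import random

def select_random_discussions(rows, n=378, seed=42):
    """Return a reproducible random sample of n discussion rows.

    Minimal sketch: a fixed seed makes the 378-item sample repeatable.
    The repository's sampler may use different column handling or a seed
    of its own.
    """
    rng = random.Random(seed)
    return rng.sample(rows, n)
```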
To filter the random discussions in the `data/all_random_discussions.csv` file, run the following command from the `data_cleaner` directory:

```shell
python random_discussion_cleaner.py
```

- The filtered list of random discussions will be saved in the `data/cleaned_random_discussions.csv` file.
To filter the models and all the discussions, run the following command from the `data_cleaner` directory:

```shell
python main.py
```

- The filtered list of models will be saved in `data/quality_models.csv`.
- The discussion list of the filtered models will be saved in `data/quality_models_discussions.csv`.
- The filtered list of discussions will be saved in the `data/cleaned_discussions.csv` file.
To classify the filtered random discussion posts using gpt-3.5-turbo-0125, run the following command from the `discussion_classifier` directory:

```shell
python random_discussion_classifier.py
```

Please note that you need an OpenAI API key to run the classification. The key should be saved in the `OPENAI_API_KEY` variable of the `util/constants.py` file.

- Classification will run 3 times, saving the results in the `data/random_discussion_classification` directory. The result generated by GPT for each discussion will be saved in an `md` file named `<index>_<model_id>_<discussion_number>_result_gpt-3-5.md`. The results of the 3 runs will be saved in the `run_1`, `run_2`, and `run_3` directories.
- Classification results will also be saved in the `contains_question_run_<run_number>` columns of the `data/cleaned_random_discussions.csv` file.
- The final decision about the class will be saved in the `contains_question_final_class` column of the `data/cleaned_random_discussions.csv` file.
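A minimal sketch of deriving the final class from the three runs, assuming a simple majority vote; the repository's actual aggregation rule may differ:

```python
from collections import Counter

def final_class(run_labels):
    """Majority vote over the three classification runs.

    Assumption: the final class is the label a majority of the 3 runs
    agree on. With 3 runs and 2 labels, a strict majority always exists.
    """
    label, _count = Counter(run_labels).most_common(1)[0]
    return label
```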
Two authors individually and manually identified whether the sample discussions contain questions. The ground truth is available in the `data/gpt_sample_discussion_classification.xlsx` file. `1st_author_classes` and `2nd_author_classes` contain the classes of the two authors, and `agreed_classes` contains their agreed classes. Their agreement is calculated using Cohen's Kappa and saved in the `cohens_kappa` sheet. The disagreement resolution is saved in the `disagreement_resolution` sheet.

The performance evaluation of GPT in classifying the sample discussion posts as question-containing posts is available in the `gpt_classification_evaluation` sheet of the `data/gpt_sample_discussion_classification.xlsx` file.
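Cohen's Kappa, as used above for inter-rater agreement, can be computed with a small standard-library sketch; the repository's spreadsheet or scripts may use a library such as scikit-learn instead:

```python
def cohens_kappa(labels_a, labels_b):
    """Cohen's kappa for two raters labelling the same items.

    kappa = (p_o - p_e) / (1 - p_e), where p_o is the observed agreement
    and p_e is the agreement expected by chance from each rater's
    marginal label frequencies.
    """
    assert len(labels_a) == len(labels_b)
    n = len(labels_a)
    # Observed agreement: fraction of items where the raters match
    p_o = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Chance agreement from the marginal frequency of each category
    categories = set(labels_a) | set(labels_b)
    p_e = sum(
        (labels_a.count(c) / n) * (labels_b.count(c) / n)
        for c in categories
    )
    return (p_o - p_e) / (1 - p_e)
```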
To classify all the filtered discussion posts using gpt-3.5-turbo-0125, run the following command from the `discussion_classifier` directory:

```shell
python all_discussion_classifier.py
```

Please note that you need an OpenAI API key to run the classification. The key should be saved in the `OPENAI_API_KEY` variable of the `util/constants.py` file.

- Classification will run 3 times, saving the results in the `data/all_discussion_classification` directory. The result generated by GPT for each discussion will be saved in an `md` file named `<index>_<model_id>_<discussion_number>_result_gpt-3-5.md`. The results of the 3 runs will be saved in the `run_1`, `run_2`, and `tie_breakers` directories.
- Classification results will also be saved in the `contains_question_run_1`, `contains_question_run_2`, and `contains_question_tie_breaker` columns of the `data/cleaned_discussions.csv` file accordingly.
- The final decision about the class will be saved in the `contains_question_final_class` column of the `data/cleaned_discussions.csv` file.
- The list of question-containing discussions will be saved in the `data/all_questions.csv` file.
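The two-runs-plus-tie-breaker resolution implied above can be sketched as follows, assuming the tie-breaker run only decides when the first two runs disagree (an assumption, not the repository's confirmed rule):

```python
def final_class_with_tie_breaker(run_1, run_2, tie_breaker=None):
    """Resolve the final class from two runs and an optional tie-breaker.

    Assumed rule: if the first two runs agree, their shared label is
    final and no tie-breaker run is needed; otherwise the tie-breaker
    run's label decides.
    """
    if run_1 == run_2:
        return run_1
    return tie_breaker
```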
To generate all the plots, run the following command from the `plot_generator` directory:

```shell
python main.py
```

- Plots will be generated in the `data/plots` directory in `pdf` and `png` formats.
To train a BERTopic model on the discussion posts, first run the following command from the repository root:

```shell
python -m spacy download en_core_web_sm
```

Then run the following command from the `discussion_topic_modeller` directory:

```shell
python bertopic_topic_modeller.py
```

- The trained BERTopic model file `model_min_cluster_size_60` will be saved in `data/bertopic_model`.
- Our trained model is available here.
To save the representative topics and keywords for each topic, run the following command from the `discussion_topic_modeller` directory:

```shell
python topic_analyzer.py
```

- Representative documents and keywords of each topic will be saved in a `data/bertopic_model/topics/<topic_id>.md` file.
To visualize the topics, run the `discussion_topic_modeller/bertopic_topic_visualizer.ipynb` notebook.
To visualize the clusters of the topics, run the following command from the `discussion_topic_modeller` directory:

```shell
python topic_cluster_visualizer.py
```

- Topic ids of the same clusters will be printed in the console.
- The cluster visualization will be saved in the `data/bertopic_model/model_min_cluster_size_60_hierarchy_plot.pdf` file. The GPT-generated labels for the topics are used in this visualization.
- The cluster visualization with our own labels will be generated in the `data/bertopic_model/custom_label_hierarchy_plot.pdf` file. The labels are available in the `data/bertopic_model/topic_custom_label.csv` file.
Two authors individually and manually mapped the questions to the model cards. The mapping result is available in the `data/manual_question_mapping.xlsx` file. The 1st and 2nd authors' mapping results are saved in the `author1_labels` and `author2_labels` sheets respectively. The disagreement resolution is saved in the `resolution` column of the `author1_labels` sheet. To calculate the inter-rater agreement, run the following command from the `data_analyzer` directory:

```shell
python irr_calculator.py
```

- The Kappa scores of the 2 rounds of mapping will be printed in the console.