“Silent Is Not Actually Silent” : An Investigation of Toxicity on Bug
Report Discussion
Mia Mohammad Imran*
Missouri University of Science and Technology
Rolla, Missouri, USA
imranm@mst.edu
Jaydeb Sarker*
University of Nebraska Omaha
Omaha, Nebraska, USA
jsarker@unomaha.edu
Abstract
Toxicity in bug report discussions poses significant challenges to
the collaborative dynamics of open-source software development.
Bug reports are crucial for identifying and resolving defects, yet
their inherently problem-focused nature and emotionally charged
context make them susceptible to toxic interactions. This study ex-
plores toxicity in GitHub bug reports through a qualitative analysis
of 203 bug threads, including 81 toxic ones. Our findings reveal
that toxicity frequently arises from misaligned perceptions of bug
severity and priority, unresolved frustrations with tools, and lapses
in professional communication. These toxic interactions not only
derail productive discussions but also reduce the likelihood of ac-
tionable outcomes, such as linking issues with pull requests. Our
preliminary findings offer actionable recommendations to improve
bug resolution by mitigating toxicity.
CCS Concepts
• Software and its engineering → Open source teams.
Keywords
toxicity, bug report, pull request discussions
ACM Reference Format:
Mia Mohammad Imran* and Jaydeb Sarker*. 2025. “Silent Is Not Actually
Silent” : An Investigation of Toxicity on Bug Report Discussion. In Compan-
ion Proceedings of the 33rd ACM Symposium on the Foundations of Software
Engineering (FSE ’25), June 23–27, 2025, Trondheim, Norway. ACM, New
York, NY, USA, Article FSE029, 5 pages. https://doi.org/10.1145/nnnnnnn.nnnnnnn
1 Introduction
Bug reports serve as a means of communication between users and
developers [ 29,30]. They enable the identification and resolution
of software defects, driving improvements in functionality and
reliability [ 49]. Beyond their technical utility, bug reports help
foster collaboration and build trust among community members [ 19,
34]. They provide a structured medium for discussing software
issues, proposing solutions, assessing the severity of problems, and
prioritizing tasks [8, 18, 43, 45, 49].
* These authors contributed equally to this work.
This work is licensed under a Creative Commons Attribution 4.0 International License.
FSE 2025, June 2025, Trondheim, Norway
©2025 Copyright held by the owner/author(s).
ACM ISBN 978-x-xxxx-xxxx-x/YY/MM
https://doi.org/10.1145/nnnnnnn.nnnnnnn
User One filed a bug report: “[...] I am reporting that the existing feature --silent is broken [...] --silent is supposed to be silent by name and documented to suppress all UI. It's not and it doesn't. [...]”
User Two (project contributor) dismissed it as a trivial change: “I do appreciate the time taken to be explicit with the desired behavior. I don't see much value in making a trivial change to the description for the install command at Microsoft Learn unless making that change would make folks happy. [...]”
User Three made an entitled reply: “This is a bug... it does NOT do what it should... I tried adding every switch I could find to silence the darn thing but couldn't... I'm confused with the number of times this IS reported but flagged as a duplicate or ignored. So not sure WHICH one I need to +1 on... but it IS annoying as it IS basic functionality”
User One then accused the project maintainers of lying: “[....] Ironically, this emphasizes even more how much of a bug this is [...] At the very least stop lying to users and correct the documentation [...]”
Conversation continues [...]
Figure 1: Toxic Interaction in Bug Report Discussion (blue marked text shows toxic phrase)
Bug reports require precise and professional communication, as they often contain technical descriptions and demand clear, constructive feedback to facilitate issue resolution. However, bug reports are not without challenges. Zimmermann et al. noted that “there is a mismatch between what developers consider most helpful and what users provide” in bug reports [49]. Prior research has revealed that bug reports are frequently incomplete and ambiguous, and that their quality varies [9, 11, 47], which often leads to bug reports not being addressed. Ko et al. reported that users became dissatisfied when their bug reports were not addressed [29].
These frustrations can manifest in negative ways, including
toxic interactions and other antisocial behaviors. Users may ex-
press impatience, dissatisfaction, or hostility when their concerns
are dismissed or unresolved. Figure 1 illustrates a real-world bug
report where the project maintainer dismissed a bug as a trivial
change, sparking toxic comments. Toxicity in bug reports disrupts
communication, hampers collaboration, and may discourage users
from participating in or continuing to use the project. For example,
in this bug report [ 4], the problem was not addressed, and the user
was mocked by the owner. Subsequently, the user commented about
switching to a different technology. In another thread [2], a user referenced market competition when a bug was not addressed (“I really don’t mean to be total a*s about it but that’s not how you win market share.”). These incidents show how toxicity in bug reports can escalate, disrupt productive discourse, and ultimately derail the collaborative efforts necessary for resolving software issues.
arXiv:2503.10072v1 [cs.SE] 13 Mar 2025
Table 1: Data Sampling
Issue Thread | Count | ToxiCR (≥0.5) | LLaMA (≥0.7)
Total Thread | 8,723 | 861 | 789
Selected Thread | 203 | 186 | 165
Labeled as Toxic | 81 | 80 | 59
While prior research has investigated toxicity in broader Software Engineering contexts, including GitHub pull requests (PRs) and issue threads [15, 26, 35, 36, 39–42], bug reports remain an underexplored yet critical subset. Unlike other interactions, bug reports frequently highlight deficiencies in software, which can evoke emotions such as frustration [23, 49] and lead to toxicity [7].
To address this gap, we conducted a qualitative study of how toxicity manifests in bug reports and how it affects bug resolution. We manually investigated 203 bug threads, including 81 with toxic interactions, which were selected using a stratified sampling strategy. Our qualitative exploration produced several notable findings:
• Toxicity arose when users perceived a disconnect between bug severity and maintainers’ prioritization decisions.
• Toxic interactions hindered collaboration, with unresolved threads less likely to yield pull requests or fixes.
• Toxicity extended beyond interpersonal interactions and included negativity toward technology.
2 Methodology
In the following subsections, we have provided a brief description
of our research methodology.
2.1 Dataset Creation and Labeling
2.1.1 Data Mining. We selected GitHub-based repositories for our study because GitHub is the most popular open-source software project hosting platform [10, 14]. We selected repositories based on three criteria: i) at least 1000 stars, to ensure sufficient community interest; ii) activity in 2024 (up to September 5, 2024); and iii) active moderation, which we defined as evidence of maintenance and oversight, such as locked issues or a Code of Conduct included in the repository. After applying these criteria, we found 257 GitHub projects.
Further, we sampled threads from the chosen repositories based on the following criteria: i) all comments within the thread are in English, to ensure consistency and accessibility; ii) the thread was created in 2024; iii) the issue/PR thread has at least 2 comments; and iv) the thread is associated with a GitHub label related to bugs, such as bug, defect, crash, triage, or fix [24, 28]. Finally, we ended up with a total of 100 repositories and 8,723 threads with 91,929 comments.
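The thread-level criteria can be read as a simple filter. The following is a minimal sketch rather than the mining script used in the study: the `Thread` record, its field names, and the `is_english` flag are hypothetical, and the label set contains only the example labels mentioned in the text.

```python
from dataclasses import dataclass, field

# Example bug-related labels; the full label list used in the study may differ (assumption).
BUG_LABELS = {"bug", "defect", "crash", "triage", "fix"}


@dataclass
class Thread:
    """Hypothetical record for one issue/PR thread."""
    created_year: int
    labels: list[str]
    comments: list[str] = field(default_factory=list)
    is_english: bool = True  # assumed to come from an upstream language check


def is_candidate(thread: Thread) -> bool:
    """Apply the four thread-level criteria: English, created in 2024, >= 2 comments, bug label."""
    has_bug_label = any(label.lower() in BUG_LABELS for label in thread.labels)
    return (
        thread.is_english
        and thread.created_year == 2024
        and len(thread.comments) >= 2
        and has_bug_label
    )
```

Matching labels case-insensitively is a design choice of this sketch, since repositories capitalize labels inconsistently.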
2.1.2 Toxicity Classification. Since toxic conversations are rare in SE texts [35, 40], we employed automated tools to detect toxicity in the comments of our dataset. To improve the accuracy and reliability of this process, we used two toxicity detection methods. First, we selected ToxiCR [42], a state-of-the-art SE-specific toxicity detector. Second, we applied the LLaMA model to toxicity detection on our dataset. We describe both below:
ToxiCR: ToxiCR [42] is a state-of-the-art tool specifically designed to detect toxic comments in code reviews, and it has also received attention from SE researchers [37]. Since this tool is publicly available and reports 95.8% accuracy and 88.9% recall, we chose it for our analysis.
LLaMA Model: Since its inception, LLaMA [44] has become a popular LLM for various affective analysis tasks, including toxicity detection [27, 31]. We chose LLaMA because it is an open-source model, unlike proprietary models such as GPT-4. We used the LLaMA-3.1-70B version. Given a thread, we asked the model to predict the likelihood, on a scale of 0 to 1, that the conversation is toxic. We used the definitions of toxicity provided by Miller et al. [35] and the incivility characteristics defined by Ferreira et al. [15] and Ehsani et al. [13]. The detailed prompt is included in the replication package [25].
Dataset Sampling: According to the authors of ToxiCR [42], a text is marked as toxic if it scores ≥ 0.5. Hence, we set the ToxiCR threshold to ≥ 0.5 and the LLaMA threshold to ≥ 0.7. Using these thresholds, we found 148 threads in which both ToxiCR and LLaMA detected at least one toxic conversation. To expand our dataset, we further sampled 55 threads in which either ToxiCR or LLaMA produced a toxicity score ≥ 0.99. Thus, we ended up with 203 threads, which form the basis of our toxicity analysis in bug reports. Table 1 shows the dataset selection for our analysis.
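The two-stage selection can be sketched as follows. This is an illustrative reading of the description, not the authors' code: the function name, the input layout, and the assumption that each tool yields a single thread-level score are all hypothetical.

```python
TOXICR_THRESHOLD = 0.5   # threshold recommended by the ToxiCR authors
LLAMA_THRESHOLD = 0.7    # LLaMA threshold used in this study
HIGH_CONFIDENCE = 0.99   # cut-off for the second sampling stage


def select_threads(scores: dict[str, tuple[float, float]]) -> tuple[list[str], list[str]]:
    """Split threads into (both tools agree, exactly one tool is highly confident).

    `scores` maps a thread id to (toxicr_score, llama_score); each score is
    assumed to be a thread-level value for that tool.
    """
    agreed, high_confidence = [], []
    for thread_id, (toxicr, llama) in scores.items():
        if toxicr >= TOXICR_THRESHOLD and llama >= LLAMA_THRESHOLD:
            agreed.append(thread_id)
        elif toxicr >= HIGH_CONFIDENCE or llama >= HIGH_CONFIDENCE:
            high_confidence.append(thread_id)
    return agreed, high_confidence


# Toy scores for illustration only, not data from the study.
toy = {"t1": (0.80, 0.90), "t2": (0.995, 0.10), "t3": (0.40, 0.60), "t4": (0.30, 0.99)}
agreed, extra = select_threads(toy)
```

In the study, the first stage yielded 148 threads and the second stage 55, giving the 203 threads that were manually labeled.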
Manual Labeling: Toxicity is a complex phenomenon whose perception varies across demographic factors [32]. Moreover, the perception of toxicity in the SE domain differs from that on social media platforms [35]. Therefore, two raters (authors of this paper) carefully reviewed prior literature on toxicity and other antisocial behaviors in SE [16, 35, 40] to reduce bias during manual labeling. They held five rounds of discussion sessions and collaboratively labeled the 203 threads [38]. They reviewed each thread, carefully read the comments, and marked the thread as toxic if at least one comment was labeled as toxic by both raters. At the end of this process, we found 81 toxic threads.
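The thread-level decision rule can be stated compactly. The sketch below assumes each rater produces one boolean label per comment, which is a simplification because the labeling in the study was done collaboratively; the function name is hypothetical.

```python
def thread_is_toxic(rater_a: list[bool], rater_b: list[bool]) -> bool:
    """Return True if at least one comment was labeled toxic by both raters.

    Each list holds one label per comment, in thread order.
    """
    if len(rater_a) != len(rater_b):
        raise ValueError("both raters must label every comment")
    return any(a and b for a, b in zip(rater_a, rater_b))
```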
2.2 Qualitative Analysis Of Bug Threads
Inspired by recent work on toxicity and incivility in the SE domain [15, 35], we conducted a qualitative analysis of the selected toxic threads. We manually analyzed which types of toxicity are more likely to occur in bug report discussions and found a total of 11 subcategories of toxicity in our dataset. Table 2 shows these subcategories and our findings for GitHub bug discussions. The second column of Table 2 maps each subcategory to prior work; two of them, ‘Toxicity Towards Tools’ and ‘Profane Technology Naming’, have not been reported in previous SE studies. Developers sometimes become frustrated with existing tools and direct toxic comments at that technology, which we refer to as ‘Toxicity Towards Tools’. Furthermore, we found that profane naming, including in variable and method names, is an unprofessional practice, and we defined that category as ‘Profane Technology Naming’. Further, the two raters manually went through all the toxic threads and examined the following factors:
• Bug Resolved: Whether the bug was resolved after toxic comments appeared in the thread.
• Authors of Toxicity: We examined the authorship characteristics of toxic comments based on the authors’ membership within the project. GitHub provides labels for authors [17]. Authors were classified as internal participants if they were contributors, collaborators, members, or owners. Otherwise, they were categorized as external participants.
• Toxic Reply: We considered a comment a toxic reply when a developer responded with toxic text after receiving a toxic comment from peers within the same thread.
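Two of these factors lend themselves to a short sketch. It assumes that the author labels correspond to the `author_association` values that GitHub's API attaches to comments, and it interprets a toxic reply as a toxic comment from a different author than the first toxic one; both are assumptions of this example, and the function names are hypothetical.

```python
# Association values treated as internal, following the grouping in the text
# (contributors, collaborators, members, owners). Anything else is external.
INTERNAL_ASSOCIATIONS = {"CONTRIBUTOR", "COLLABORATOR", "MEMBER", "OWNER"}


def participant_type(author_association: str) -> str:
    """Map an author association string to 'internal' or 'external'."""
    return "internal" if author_association.upper() in INTERNAL_ASSOCIATIONS else "external"


def has_toxic_reply(comments: list[tuple[str, bool]]) -> bool:
    """Return True if a toxic comment is later answered by a toxic comment from another author.

    `comments` is a list of (author, is_toxic) pairs in thread order.
    """
    first_toxic_author = None
    for author, is_toxic in comments:
        if not is_toxic:
            continue
        if first_toxic_author is None:
            first_toxic_author = author
        elif author != first_toxic_author:
            return True
    return False
```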
3 Early Findings and Observations
In this section, we present the preliminary findings from our qualitative analysis of 81 toxic issue threads. Several noteworthy observations emerged, some of which are summarized in Table 2; we discuss them in detail below.
I. Subcategories of Toxicity: Table 2 shows the 11 toxicity subcategories that we found in our sample. Similar to prior studies on open source projects, we found that vulgarity, insult, unprofessional, and entitlement are the most common categories, encompassing 80% of the samples. In 13.58% of cases, we observed toxicity directed toward tools, often characterized by negative or derogatory descriptions. While these comments were not aimed at individuals, they contribute to an unprofessional atmosphere and reveal deeper frustrations or dissatisfaction with the tools in question. For example, a user remarked on missing documentation in NextAuth: “Some examples would be awesome, docs or otherwise. The amount of hours wasted on nextauth is mindboggling. [...]” [5]. We also observed that 2 of our threads involved a tool whose name contains profanity (e.g., thef*ck). While such names may not be intended to be toxic, they are inappropriate in a professional environment.
II. Bug Report Authors and External Participants Use More Toxicity: We observed that 41/81 (50.61%) toxic comments were authored by the same individual who created the bug report thread. This suggests that bug report authors are more likely to initiate toxic comments than other discussion participants. This behavior could stem from their heightened emotional investment in the issue, frustration over perceived neglect, or dissatisfaction with the responses they receive. Additionally, 56/81 (69.31%) first toxic comment authors were external participants. Although we worked with bug report threads, our results support the findings of a prior study [35]. External participants may have many expectations and are not required to follow the Code of Conduct, which may lead them to use more toxic comments than internal participants. However, internal participants authored 30.69% of toxic comments, which is non-trivial and indicates that active contributors can also be significant contributors to toxic discourse.
III. Bug Severity and Bug Priority: In our observation, toxicity emerged when users perceived a bug as critical or a “showstopper,” while maintainers either dismissed it or failed to address it promptly. This disconnect between users’ perception of severity and maintainers’ prioritization sparked frustration and led to toxic interactions. We found 17 threads where this happened. For example, in one bug report [3], a user commented: “this is such an annoying bug, this has existed for nearly half a decade; but the people working on this are more obsessed with useless things like server components.” Users occasionally questioned why certain bugs were not prioritized. For instance, one user [6] commented: “yes its frickin annoying. my whole project is on hold now.”
IV. Bug Resolution Rate: Unsurprisingly, our analysis suggests that the presence of toxic comments negatively impacts the bug resolution rate. In our sample, a total of 36 bug threads (45%) were resolved. However, in threads containing vulgarity, insults, entitlement, and expressions of frustration, the bug resolution rate drops to approximately one-third, highlighting the detrimental effect of toxicity on resolution outcomes. This finding could help project managers target specific toxicity subcategories for mitigation in project discussions.
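Per-subcategory resolution rates of the kind reported in Table 2 can be computed along the following lines. This is a sketch with a hypothetical input layout and toy records; because a thread can carry several subcategories, it counts once toward each of them.

```python
from collections import defaultdict


def resolution_rate_by_category(threads: list[tuple[set[str], bool]]) -> dict[str, float]:
    """Compute the share of resolved threads for each toxicity subcategory.

    `threads` holds (subcategories, resolved) pairs, one per toxic thread.
    """
    totals: dict[str, int] = defaultdict(int)
    resolved_counts: dict[str, int] = defaultdict(int)
    for categories, resolved in threads:
        for category in categories:
            totals[category] += 1
            resolved_counts[category] += int(resolved)
    return {category: resolved_counts[category] / totals[category] for category in totals}


# Toy input for illustration only, not data from the study.
toy_threads = [
    ({"vulgarity", "insult"}, False),
    ({"vulgarity"}, True),
    ({"unprofessional"}, True),
    ({"insult"}, False),
]
rates = resolution_rate_by_category(toy_threads)
```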
V. Absence of PRs from Toxic Threads: We observed that only 2 of the toxic bug threads were PRs. This suggests that when a pull request addresses a bug, the thread is less likely to become toxic. One possible explanation is that pull requests are often tied to tangible progress and positive outcomes, such as bug fixes, which tend to elicit satisfaction and collaborative behavior rather than conflict [46, 48].
VI. Issue Bug Threads Linkage with PRs: GitHub bug reports that require fixes are often associated with pull requests [12, 33]. Such a link indicates that the bug report is being addressed. We found that 23/79 (29.11%) toxic bug report issues were linked with a PR, which is lower than the percentages reported by Li et al. [33] (40.4%) and de Souza et al. [12] (35.7%). This discrepancy suggests that toxic bug threads are less likely to result in actionable fixes through PRs, indicating that they may be deprioritized or avoided altogether by the maintainers. Consequently, toxic threads are less likely to be addressed and more likely to remain unresolved, perpetuating frustration and dissatisfaction among users. For example, in one of the threads that was not linked to a PR, a collaborator provided a workaround, and it was promptly rejected by a user: “No. It’s not. It’s sloppiness and incompetence. And it’s even less acceptable when it’s actively used as an excuse not to revert a broken commit. [...]” [1].
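The text does not state how issue-to-PR links were identified. One common heuristic, shown here purely as an assumption, is to look for GitHub's closing keywords (such as "fixes #123") in PR descriptions. The sketch only covers same-repository references and would miss links created manually through the GitHub interface; the function names are hypothetical.

```python
import re

# Closing keywords followed by an issue reference such as "#123".
CLOSING_REF = re.compile(
    r"\b(?:close[sd]?|fix(?:e[sd])?|resolve[sd]?)\b:?\s+#(\d+)",
    re.IGNORECASE,
)


def linked_issue_numbers(pr_body: str) -> set[int]:
    """Return the issue numbers that a PR description claims to close."""
    return {int(number) for number in CLOSING_REF.findall(pr_body)}


def linkage_rate(issue_numbers: list[int], pr_bodies: list[str]) -> float:
    """Fraction of the given issues referenced by at least one PR description."""
    if not issue_numbers:
        return 0.0
    linked: set[int] = set()
    for body in pr_bodies:
        linked |= linked_issue_numbers(body)
    return sum(number in linked for number in issue_numbers) / len(issue_numbers)
```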
VII. Subsequent Toxic Comments: In 24/81 (29.63%) cases, we observed subsequent toxic comments following the initial instance of toxicity. In only 6 of these 24 cases was the actual problem raised in the issue resolved. This phenomenon was particularly evident in discussions where the initial comments contained insults or entitlement. Such interactions often created a cascading effect, escalating the negativity and derailing the conversation. For example, as illustrated in Figure 1, a dismissive or aggressive remark led to further toxic exchanges, making it difficult to refocus on the original issue.
VIII. Code of Conduct Enforcement : In 3 cases, the project
maintainers enforced the Code of Conduct after toxicity occurred.
4 Implications and Future Directions
We have several insights and recommendations based on our study.
Transparent and Automated Bug Severity and Priority Management System: A robust automated system for managing bug severity and priority should ensure transparency and inclusivity by providing clear triage messages and consistent, visible labels such as “high-priority,” “severity-high,” or “needs clarification.” Automated explanations for deprioritized or closed bugs, coupled with features like user voting on severity and real-time updates, can align prioritization decisions with community needs while mitigating frustration. Recent research underscores the value of integrating social features with technical analysis to enhance bug-related processes [20–22]. A comprehensive system would leverage machine learning to predict severity, utilize sentiment analysis to assess urgency, and establish feedback loops for maintainers to explain decisions. Visual dashboards tracking bug resolution progress further promote transparency. These measures would foster trust, improve collaboration, and create a constructive environment for bug reporting while reducing the potential for toxicity.
Table 2: Subcategories of Toxicity and Associated Factors (E = External participants, I = Internal participants)
Type | Mapping | Example | Count‡ | Resolved | Toxic Reply
Vulgarity | [16, 35, 40] | Why is it so hard to install this thing? It has been 3 f**ing hours already!!! | 19 (E: 16, I: 3) | 31.6% | 3
Insult | [35, 40] | It’s really very annoying to debug [...] Is anybody alive here? Do you need reproduction? Or what? | 18 (E: 14, I: 4) | 38.9% | 9
Unprofessional | [35, 40] | Darn it, now I can’t reproduce after a restart of the window manager. | 17 (E: 10, I: 7) | 70.5% | 0
Entitlement | [35] | IMO this is a disappointing outcome. No reliable guideline for anything but dbus/now. | 14 (E: 9, I: 5) | 35.5% | 9
Toxicity Towards Tools | New | This means that Telegram Desktop [...] 5.0.1 and 5.0.2 contains shameful destructive bug | 11 (E: 8, I: 3) | 54.5% | 1
Mocking | [16] | This isn’t Windows compatible at all, are you joking? | 4 (E: 4) | 50% | 2
Frustration | [16] | why the hell is no error message being reported, that is some real bug within podman. I still need to debug this. | 3 (E: 1, I: 2) | 33.33% | 1
Profane Technology Naming | New | plugins=(git.... thefuck.....) | 2 (E: 2) | 0% | 0
Identity Attack | [40] | And then we don’t have to pay Elon to message you... | 1 (E: 1) | 100% |
Threat | [16, 40] | I think you’re just trolling, so enjoy the ban and learn some manners. | 1 (I: 1) | 100% | 1
Arrogant | [35] | Whatever this function is supposed to achieve, it’s very weird and messed up. Wouldn’t it make more sense to fix this function like I suggested.. | 2 (I: 2) | 100% | 0
‡ Since a text can fall into multiple sub-categories, the total count across all categories exceeds the sample size.
Code of Conduct (CoC) for External Users: External participants tend to use more toxic language than project members. For that reason, project maintainers should apply the CoC to external participants when they create issues.
Existing Tool Improvement and New Tool Development: While leveraging state-of-the-art technologies to identify toxic GitHub threads, we observed false positives, often caused by unique characteristics of bug reports such as stack traces. Stack traces, crucial for diagnosing root causes, frequently include software engineering terminology that automated tools misinterpret as toxic. For instance, terms like ‘.ass’ or ‘nerd-font’ in stack traces triggered false toxicity detections. Future tools should integrate domain-specific knowledge and context awareness to differentiate technical jargon from toxic language, ensuring more accurate detection without hindering productive discussions [22].
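One simple form of such context awareness is to remove code blocks and stack-trace lines before a comment reaches the classifier. The sketch below is a heuristic illustration only: the patterns and the function name are hypothetical, they cover just a few common trace formats, and they do not describe the preprocessing of ToxiCR or any other existing tool.

```python
import re

FENCE = "`" * 3  # Markdown code fence marker
FENCED_BLOCK = re.compile(re.escape(FENCE) + r".*?" + re.escape(FENCE), re.DOTALL)
# Heuristic patterns for stack-trace lines (JavaScript/Java frames, Python tracebacks).
TRACE_LINE = re.compile(
    r'^[ \t]*(?:at[ \t]+.+\(.*:\d+(?::\d+)?\)'
    r'|File ".*", line \d+.*'
    r'|Traceback \(most recent call last\):)[ \t]*$',
    re.MULTILINE,
)


def strip_technical_noise(comment: str) -> str:
    """Remove fenced code blocks and stack-trace-like lines from a comment."""
    without_blocks = FENCED_BLOCK.sub(" ", comment)
    without_traces = TRACE_LINE.sub("", without_blocks)
    return "\n".join(line for line in without_traces.splitlines() if line.strip())
```

The cleaned text would then be passed to the toxicity detector in place of the raw comment, so that only the prose written by the participant is scored.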
A large-scale empirical study: A potential research direction is a large-scale study, which is necessary to provide more concrete and actionable recommendations for mitigating toxicity in bug report discussions. Such a study would enable the software engineering community to better understand the impact and associations of toxicity during the bug resolution process.
Limitation: This study is limited to the analysis of 100 GitHub projects. We mitigated this threat to external validity by applying repository-level and thread-level selection criteria. We used ToxiCR [42] and LLaMA (LLaMA-3.1:70B) [44] to select the toxic comments, which poses a threat to construct validity. To mitigate this, we manually analyzed whether the comments were toxic. However, we may have missed toxic comments that were not detected by either of the tools. Since we chose GitHub-based projects for analyzing bug report toxicity, we may have missed other software development platforms where toxicity manifests differently. However, GitHub is the most popular and diverse platform for OSS, and thus GitHub-based projects capture a wide range of toxic behaviors and diverse community interactions.
5 Conclusion and Future Work
Toxicity in bug reports poses significant challenges to collaboration
and negatively impacts bug resolution rates. Through a qualitative
analysis of 81 bug threads, we observed that external members are
more prone to toxic comments, often driven by frustration over
perceived mismatches between bug severity and maintainers’ priori-
ties. These insights emphasize the importance of addressing toxicity
in project discussions to improve communication and efficiency.
Our findings serve as a foundation for future work on developing
automated systems to detect and manage toxicity in bug reports.
Expanding this research through large-scale empirical studies and
refining toxicity detection techniques remain essential goals for
enhancing collaborative environments in software development.
References
[1] 2024. containers/podman Issue 22190. https://github.com/containers/podman/issues/22190
[2] 2024. containers/podman Issue 22363. https://github.com/containers/podman/issues/22363
[3] 2024. facebook/react Issue 28584. https://github.com/facebook/react/issues/28584
[4] 2024. kovidgoyal/kitty Issue 7453. https://github.com/kovidgoyal/kitty/issues/7453
[5] 2024. nextauthjs/next-auth Issue 10633. https://github.com/nextauthjs/next-auth/issues/10633
[6] 2024. strapi/strapi Issue 19339. https://github.com/strapi/strapi/issues/19339
[7]Magdaléna Balážiková. 2019. Real-life frustration from virtual worlds: The
motivational potential of frustration. Acta Ludologica 2, 1 (2019), 56–68.
[8]Oscar Chaparro, Jing Lu, Fiorella Zampetti, Laura Moreno, Massimiliano Di Penta,
Andrian Marcus, Gabriele Bavota, and Vincent Ng. 2017. Detecting Missing
Information in Bug Descriptions. In Proceedings of the 2017 11th Joint Meeting on
Foundations of Software Engineering . ACM, USA, 396–407.
[9]Vedant Chauhan, Chetan Arora, Hourieh Khalajzadeh, and John Grundy. 2024.
How do software practitioners perceive human-centric defects? Information and
Software Technology 176 (2024), 107549. https://doi.org/10.1016/j.infsof.2024.
107549
[10] Kyle Daigle. 2023. Octoverse: The state of open source and rise of AI in 2023.
https://github.blog/2023-11-08-the-state-of-open-source-and-ai/.
[11] Steven Davies and Marc Roper. 2014. What’s in a bug report?. In Proceedings of
the 8th ACM/IEEE International Symposium on Empirical Software Engineering
and Measurement . 1–10.
[12] Cleidson RB de Souza, Emilie Ma, Jesse Wong, Dongwook Yoon, and Ivan
Beschastnikh. 2024. Revealing Software Development Work Patterns with PR-
Issue Graph Topologies. Proceedings of the ACM on Software Engineering 1, FSE
(2024), 2402–2423.
[13] Ramtin Ehsani, Rezvaneh Rezapour, and Preetha Chatterjee. 2023. Exploring
Moral Principles Exhibited in OSS: A Case Study on GitHub Heated Issues. In
Proceedings of the 31st ACM Joint European Software Engineering Conference and
Symposium on the Foundations of Software Engineering . 2092–2096.
[14] Hongbo Fang, Hemank Lamba, James Herbsleb, and Bogdan Vasilescu. 2022. “This
Is Damn Slick!” Estimating the Impact of Tweets on Open Source Project Popu-
larity and New Contributors. In International Conference on Software Engineering
(ICSE) . ACM. https://doi.org/10.1145/3510003.3510121
[15] Isabella Ferreira, Bram Adams, and Jinghui Cheng. 2022. How heated is it?
Understanding GitHub locked issues. In Proceedings of the 19th International
Conference on Mining Software Repositories . 309–320.
[16] Isabella Ferreira, Jinghui Cheng, and Bram Adams. 2021. The "shut the f**k up" phenomenon: Characterizing incivility in open source code review discussions. Proceedings of the ACM on Human-Computer Interaction 5, CSCW2 (2021), 1–35.
[17] GitHub. 2025. Roles in an organization. https://docs.github.com/en/organizations/managing-peoples-access-to-your-organization-with-roles/roles-in-an-organization
[18] Luiz Alberto Ferreira Gomes, Ricardo da Silva Torres, and Mario Lúcio Côrtes.
2019. Bug report severity level prediction in open source software: A survey and
research opportunities. Information and software technology 115 (2019), 58–78.
[19] Mariam Guizani, Aileen Abril Castro-Guzman, Anita Sarma, and Igor Steinmacher.
2023. Rules of Engagement: Why and How Companies Participate in OSS. In
2023 IEEE/ACM 45th ICSE . IEEE, 2617–2629.
[20] Yisi Han, Zhendong Wang, Yang Feng, Zhihong Zhao, and Yi Wang. 2024. Char-
acterizing Developers’ Linguistic Behaviors in Open Source Development across
Their Social Statuses. Proceedings of the ACM on Human-Computer Interaction 8,
CSCW1 (2024), 1–33.
[21] Zijie Huang, Zhiqing Shao, Guisheng Fan, Huiqun Yu, Kang Yang, and Ziyi Zhou.
2024. Bug report priority prediction using social and technical features. Journal
of Software: Evolution and Process 36, 6 (2024), e2616.
[22] Mia Mohammad Imran, Preetha Chatterjee, and Kostadin Damevski. 2024. Shed-
ding Light on Software Engineering-specific Metaphors and Idioms. In Proceed-
ings of the IEEE/ACM 46th International Conference on Software Engineering . 1–13.
[23] Mia Mohammad Imran, Preetha Chatterjee, and Kostadin Damevski. 2024. Un-
covering the causes of emotions in software developer communication using
zero-shot llms. In Proceedings of the IEEE/ACM 46th ICSE . 1–13.
[24] Mia Mohammad Imran, Agnieszka Ciborowska, and Kostadin Damevski. 2021.
Automatically selecting follow-up questions for deficient bug reports. In 2021
IEEE/ACM 18th International Conference on Mining Software Repositories (MSR) .
[25] Mia Mohammad Imran and Jaydeb Sarker. 2025. Replication Package of Toxicity
in Bug Reports. https://doi.org/10.5281/zenodo.15015619
[26] Mia Mohammad Imran, Robert Zita, Rebekah Copeland, Preetha Chatterjee,
Rahat Rizvi Rahman, and Kostadin Damevski. 2025. Understanding and Predicting
Derailment in Toxic Conversations on GitHub. arXiv preprint arXiv:2503.02191
(2025).
[27] Hakan Inan, Kartikeya Upasani, Jianfeng Chi, Rashi Rungta, Krithika Iyer, Yuning Mao, Michael Tontchev, Qing Hu, Brian Fuller, Davide Testuggine, et al. 2023. Llama guard: Llm-based input-output safeguard for human-ai conversations. arXiv preprint arXiv:2312.06674 (2023).
[28] Rafael Kallis, Andrea Di Sorbo, Gerardo Canfora, and Sebastiano Panichella. 2021.
Predicting issue types on GitHub. Science of Computer Programming 205 (2021),
102598.
[29] Amy J Ko and Parmit K Chilana. 2010. How power users help and hinder open
bug reporting. In Proceedings of the SIGCHI Conference on Human Factors in
Computing Systems . 1665–1674.
[30] Amy J Ko and Parmit K Chilana. 2011. Design, discussion, and dissent in open
bug reports. In Proceedings of the 2011 iConference . 106–113.
[31] Hyukhun Koh, Dohyung Kim, Minwoo Lee, and Kyomin Jung. 2024. Can llms
recognize toxicity? a structured investigation framework and toxicity metric. In
Findings of the Association for Computational Linguistics: EMNLP 2024 . 6092–6114.
[32] Deepak Kumar, Patrick Gage Kelley, Sunny Consolvo, Joshua Mason, Elie
Bursztein, Zakir Durumeric, Kurt Thomas, and Michael Bailey. 2021. Design-
ing toxic content classification for a diversity of perspectives. In Seventeenth
Symposium on Usable Privacy and Security (SOUPS 2021) . 299–318.
[33] Lisha Li, Zhilei Ren, Xiaochen Li, Weiqin Zou, and He Jiang. 2018. How are issue
units linked? empirical study on the linking behavior in github. In 2018 25th
Asia-Pacific Software Engineering Conference (APSEC) . IEEE, 386–395.
[34] Jenny T Liang, Thomas Zimmermann, and Denae Ford. 2022. Understanding
skills for OSS communities on GitHub. In Proceedings of the 30th ACM Joint
European Software Engineering Conference and Symposium on the Foundations of
Software Engineering . 170–182.
[35] Courtney Miller, Sophie Cohen, Daniel Klug, Bogdan Vasilescu, and Christian Kästner. 2022. "Did you miss my comment or what?" understanding toxicity in open source discussions. In Proceedings of the 44th International Conference on Software Engineering. 710–722.
[36] Shyamal Mishra and Preetha Chatterjee. 2024. Exploring chatgpt for toxicity
detection in github. In Proceedings of the 2024 ACM/IEEE 44th International Con-
ference on Software Engineering: New Ideas and Emerging Results . 6–10.
[37] Md Shamimur Rahman, Zadia Codabux, and Chanchal K Roy. 2024. Do Words
Have Power? Understanding and Fostering Civility in Code Review Discussion.
Proceedings of the ACM on Software Engineering 1, FSE (2024), 1632–1655.
[38] Johnny Saldaña. 2021. The coding manual for qualitative researchers. (2021).
[39] Jaydeb Sarker, Sayma Sultana, Steven R Wilson, and Amiangshu Bosu. 2023.
ToxiSpanSE: An explainable toxicity detection in code review comments. In
2023 ACM/IEEE International Symposium on Empirical Software Engineering and
Measurement (ESEM) . IEEE, 1–12.
[40] Jaydeb Sarker, Asif Kamal Turzo, and Amiangshu Bosu. 2020. A benchmark study
of the contemporary toxicity detectors on software engineering interactions. In
2020 27th Asia-Pacific Software Engineering Conference (APSEC) . IEEE, 218–227.
[41] Jaydeb Sarker, Asif Kamal Turzo, and Amiangshu Bosu. 2025. The Landscape of
Toxicity: An Empirical Investigation of Toxicity on GitHub. Proceedings of the
ACM on Software Engineering FSE (2025), TBD.
[42] Jaydeb Sarker, Asif Kamal Turzo, Ming Dong, and Amiangshu Bosu. 2023. Au-
tomated identification of toxic code reviews using toxicr. ACM Transactions on
Software Engineering and Methodology 32, 5 (2023), 1–32.
[43] Yang Song, Junayed Mahmud, Ying Zhou, Oscar Chaparro, Kevin Moran, Andrian
Marcus, and Denys Poshyvanyk. 2022. Toward interactive bug reporting for
(android app) end-users. In Proceedings of the 30th ACM joint european software
engineering conference and symposium on the foundations of software engineering .
344–356.
[44] Hugo Touvron, Thibaut Lavril, Gautier Izacard, Xavier Martinet, Marie-Anne Lachaux, Timothée Lacroix, Baptiste Rozière, Naman Goyal, Eric Hambro, Faisal Azhar, et al. 2023. Llama: Open and efficient foundation language models. arXiv preprint arXiv:2302.13971 (2023).
[45] Wen-Yao Wang, Chen-Hao Wu, and Jie He. 2023. CLeBPI: Contrastive Learning
for Bug Priority Inference. Information and Software Technology 164 (2023),
107302.
[46] Emily Winter, David Bowes, Steve Counsell, Tracy Hall, Sæmundur Haraldsson,
Vesna Nowack, and John Woodward. 2022. How do developers really feel about
bug fixing? directions for automatic program repair. IEEE Transactions on Software
Engineering (2022).
[47] Nor Shahida Mohamad Yusop, John Grundy, and Rajesh Vasa. 2016. Reporting
usability defects: do reporters report what software developers need?. In Proceed-
ings of the 20th international conference on evaluation and assessment in software
engineering . 1–10.
[48] Xunhui Zhang, Yue Yu, Georgios Gousios, and Ayushi Rastogi. 2022. Pull re-
quest decisions explained: An empirical overview. IEEE Transactions on Software
Engineering 49, 2 (2022), 849–871.
[49] Thomas Zimmermann, Rahul Premraj, Nicolas Bettenburg, Sascha Just, Adrian
Schroter, and Cathrin Weiss. 2010. What makes a good bug report? IEEE Trans-
actions on Software Engineering 36, 5 (2010), 618–643.
Received 2025-01-16; accepted 2025-03-12