User:Biasbot AS
This user is a bot.

| Biasbot AS (talk · contribs) | |
|---|---|
| Status | Semi-active |
| Operator | asoundd |
| Approved? | Semi (not approved for direct editing) |
| Flagged? | No |
| Task(s) | Neutralizing bias and stigma |
| Automatic or manual? | Semi-automatic |
| Programming language(s) | C++, PHP, Python |
| Exclusion compliant? | No |
Introduction
Biasbot AS is a bot that attempts to semi-automate the enforcement of Wikipedia's neutral point of view policy. It scans articles for sentences that contain explicit or implicit forms of bias and stigma, then offers neutralized alternatives.
Detection Algorithm
Model
Biasbot AS uses deep learning and unsupervised learning techniques to identify sentences that carry stigma. The performance of any natural language processing model depends heavily on the size of its corpus, and the scarcity of publicly available labeled datasets on stigma and bias motivates the search for a model that maintains accuracy even with a small corpus.
Biasbot AS is an instance of the Bidirectional Encoder Representations from Transformers (BERT) model. BERT is pre-trained on millions of words from Wikipedia and BooksCorpus, making it well suited to this task; only an additional outer fine-tuning layer is necessary. The model was pre-trained through two tasks: Masked Language Model (MLM), in which word embeddings are learned by predicting "masked" words, and Next Sentence Prediction (NSP), in which the model predicts whether one sentence follows another in order to capture longer-term dependencies across sentences. Both the cased and uncased versions of BERT-Base are used.
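As a minimal sketch of this setup (assuming the Hugging Face Transformers library, which this page does not name), the pre-trained BERT-Base weights can be loaded with a two-label classification head attached for fine-tuning:

```python
# Sketch only: the page does not specify a framework; Hugging Face
# Transformers with PyTorch is assumed here for illustration.
from transformers import BertTokenizer, BertForSequenceClassification

# BERT-Base (uncased variant shown) plus a 2-way classification head:
# label 0 = "no stigma", label 1 = "stigma".
tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
model = BertForSequenceClassification.from_pretrained(
    "bert-base-uncased", num_labels=2
)

# Encode a sentence and obtain classifier scores (logits) for it.
inputs = tokenizer("Example sentence to score.", return_tensors="pt")
logits = model(**inputs).logits  # shape (1, 2)
```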
Mechanism
Because lightly moderated layer tuning can produce false positives, Biasbot AS only detects bias in articles and offers suggestions; the recommendations are reviewed by human editors, so the bot does not yet make direct edits. The bot is thus semi-automated.
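A minimal sketch of that suggestion workflow (all names below are hypothetical, not part of the actual bot):

```python
# Hypothetical sketch of the semi-automated flow: flagged sentences are
# queued for human review rather than edited directly.
def collect_suggestions(sentences, classify, threshold=0.5):
    suggestions = []
    for sentence in sentences:
        p_stigma = classify(sentence)  # probability from the BERT classifier
        if p_stigma >= threshold:
            # An editor later approves or rejects the suggested rewrite.
            suggestions.append({"sentence": sentence, "score": p_stigma})
    return suggestions
```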
Dataset
Due to the vast scope of bias (i.e., which variants of stigma to tackle) and a lack of available data, Biasbot AS uses a hand-labeled dataset that focuses solely on mental health stigma. The bot therefore only tracks articles in Category:Mental health and its subcategories.
The dataset is relatively simple and consists entirely of sentences labeled either "stigma" or "no stigma."
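To illustrate the two-label format (the sentences below are invented examples, not rows from the actual dataset):

```python
# Invented examples showing the labeled-sentence format of the dataset.
dataset = [
    ("He is a schizophrenic.", "stigma"),                        # defines the person by diagnosis
    ("He has been diagnosed with schizophrenia.", "no stigma"),  # person-first phrasing
]
```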
Artificial Neural Network
As with any other NLP model, Biasbot AS and its BERT-Base model make use of an artificial neural network. During back-propagation in fine-tuning, the loss is computed with the sparse categorical cross-entropy function, which the optimizer then minimizes.
During fine-tuning, dropout regularization with a dropout probability of 0.1 was applied to prevent overfitting.
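A minimal sketch of that training configuration, assuming TensorFlow/Keras (the page does not name the framework used):

```python
# Sketch assuming TensorFlow/Keras; the layer sizes are illustrative only.
import tensorflow as tf

classifier_head = tf.keras.Sequential([
    tf.keras.layers.Dropout(0.1),  # dropout probability of 0.1, as described
    tf.keras.layers.Dense(2),      # logits for "stigma" / "no stigma"
])

# Integer labels (0 or 1) are compared against the 2-way logits, and the
# optimizer minimizes this loss during back-propagation.
loss_fn = tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True)
```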
Threshold Calculation
Biasbot AS makes use of several activation functions and decides whether a sentence contains stigma by comparing the model's output against a numerical threshold. The model uses the GELU function as the activation function within the classifier layer; a softmax output layer then maps the resulting scores to probabilities that sum to 1. The GELU function can be approximated as:
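$$\operatorname{GELU}(x) \approx \frac{1}{2}\,x\left(1 + \tanh\!\left[\sqrt{\frac{2}{\pi}}\left(x + 0.044715\,x^{3}\right)\right]\right)$$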