User:Biasbot AS
This user is a bot.

| Biasbot AS (talk · contribs) | |
|---|---|
| Status | Semi-active |
| Operator | asoundd |
| Approved? | Semi (not approved for direct editing) |
| Flagged? | No |
| Task(s) | Neutralizing bias and stigma |
| Automatic or manual? | Semi-automatic |
| Programming language(s) | C++, PHP, Python |
| Exclusion compliant? | No |
Introduction
Biasbot AS is a bot that attempts to semi-automate the enforcement of Wikipedia's neutral point of view policy. It scans articles for sentences that contain explicit or implicit forms of bias and stigma, then offers neutralized alternatives.
Detection Algorithm
Model
Biasbot AS uses deep learning and unsupervised learning techniques to identify sentences that carry stigma. The performance of any natural language processing model depends heavily on the size of its corpus, and the scarcity of publicly available labeled datasets on stigma and bias motivates the search for a model that maintains accuracy even with a small corpus.
Biasbot AS is an instance of the Bidirectional Encoder Representations from Transformers (BERT) model. BERT is pre-trained on millions of words from Wikipedia and BooksCorpus, making it well suited to this task; only an additional outer fine-tuning layer is necessary. The model was pre-trained through two tasks: Masked Language Model (MLM), in which word embeddings are learned by predicting "masked" words, and Next Sentence Prediction (NSP), in which the model predicts whether one sentence follows another in order to capture longer-term dependencies across sentences. Both the cased and uncased versions of BERT-Base are used.
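As a minimal sketch of this setup (assuming the Hugging Face Transformers library, which this page does not name), the pre-trained BERT-Base weights can be loaded with a two-label classification head attached for fine-tuning:

```python
# Sketch only: the page does not specify a framework; Hugging Face
# Transformers with PyTorch is assumed here for illustration.
from transformers import BertTokenizer, BertForSequenceClassification

# BERT-Base (uncased variant shown) plus a 2-way classification head:
# label 0 = "no stigma", label 1 = "stigma".
tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
model = BertForSequenceClassification.from_pretrained(
    "bert-base-uncased", num_labels=2
)

# Encode a sentence and obtain classifier scores (logits) for it.
inputs = tokenizer("Example sentence to score.", return_tensors="pt")
logits = model(**inputs).logits  # shape (1, 2)
```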
Mechanism
Because lightly moderated layer tuning can produce false positives, Biasbot AS only detects bias in articles and offers suggestions; the recommendations are reviewed by human editors, so the bot does not yet make direct edits. The bot is thus semi-automated.
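A minimal sketch of that suggestion workflow (all names below are hypothetical, not part of the actual bot):

```python
# Hypothetical sketch of the semi-automated flow: flagged sentences are
# queued for human review rather than edited directly.
def collect_suggestions(sentences, classify, threshold=0.5):
    suggestions = []
    for sentence in sentences:
        p_stigma = classify(sentence)  # probability from the BERT classifier
        if p_stigma >= threshold:
            # An editor later approves or rejects the suggested rewrite.
            suggestions.append({"sentence": sentence, "score": p_stigma})
    return suggestions
```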
Dataset
Due to the vast scope of bias (i.e., which variants of stigma to tackle) and a lack of available data, Biasbot AS uses a hand-labeled dataset that focuses solely on mental health stigma. The bot therefore only tracks articles in Category:Mental health and its subcategories.
The dataset is relatively simple and consists entirely of sentences labeled either "stigma" or "no stigma."
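To illustrate the two-label format (the sentences below are invented examples, not rows from the actual dataset):

```python
# Invented examples showing the labeled-sentence format of the dataset.
dataset = [
    ("He is a schizophrenic.", "stigma"),                        # defines the person by diagnosis
    ("He has been diagnosed with schizophrenia.", "no stigma"),  # person-first phrasing
]
```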
Artificial Neural Network
As with any other NLP model, Biasbot AS and its BERT-Base model make use of an artificial neural network. During back-propagation in fine-tuning, the loss is computed with the sparse categorical cross-entropy function, which the optimizer then minimizes.
During fine-tuning, dropout regularization with a dropout probability of 0.1 was applied to prevent overfitting.
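A minimal sketch of that training configuration, assuming TensorFlow/Keras (the page does not name the framework used):

```python
# Sketch assuming TensorFlow/Keras; the layer sizes are illustrative only.
import tensorflow as tf

classifier_head = tf.keras.Sequential([
    tf.keras.layers.Dropout(0.1),  # dropout probability of 0.1, as described
    tf.keras.layers.Dense(2),      # logits for "stigma" / "no stigma"
])

# Integer labels (0 or 1) are compared against the 2-way logits, and the
# optimizer minimizes this loss during back-propagation.
loss_fn = tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True)
```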
Threshold Calculation
Biasbot AS makes use of several activation functions and decides whether a sentence contains stigma by comparing the model's output against a numerical threshold. The model uses the GELU function as the activation function within the classifier layer; a softmax output layer then maps the resulting scores to probabilities that sum to 1. The GELU function can be approximated as:
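$$\operatorname{GELU}(x) \approx \frac{1}{2}\,x\left(1 + \tanh\!\left[\sqrt{\frac{2}{\pi}}\left(x + 0.044715\,x^{3}\right)\right]\right)$$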