This project is a stylised, synthetic agent-based modelling (ABM) exercise exploring how simple text-derived narrative features can be translated into agent parameters and social influence dynamics. It focuses on a fictional example of an organisation aiming to shift institutional narratives towards a more anti-racist stance.
This model is intentionally simple and exploratory. It is not intended to be predictive or representative of real institutions.
All data used in the corpus are synthetic, created by me in order to learn how textual features can be translated into behavioural parameters and explored over time.
The text corpus is synthetic and was written to represent common organisational discourse positions described in organisational culture change and EDI literature. It is included solely to demonstrate how narrative categories can be operationalised in a toy model, not to describe any specific institution or group.
Five personas were created to function as convenient category labels for clusters of narrative types that may share similar views, perspectives, speech, and/or ways of thinking. They were not designed to represent categories of individual people. I used these personas to guide the writing of 20 made-up statements per persona. I make the assumption that each statement was made by a different individual but that certain statements have thematic overlap. For example, if an organisation gathered perspectives from a sample of its staff and then categorised them into one of five categories, the result would be similar to the synthetic corpus I have produced.
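The personas were: status-quo defender, procedural pragmatist, learning-orientated participant, accountability advocate, and disengaged cynic.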
None of these personas or statements are designed to reflect real-world individuals, organisations, quotes, or ideological positions.
# Packages used throughout (these match the attached packages in the session info)
library(dplyr)
library(tidyr)
library(tidytext)
library(scales)
library(igraph)
library(ggplot2)
# Load corpus from GitHub (URL below for reproducibility or download from GitHub)
corpus <- read.csv("docs/corpus.csv", stringsAsFactors = FALSE)
corpus <- mutate(corpus, persona = factor(persona, levels = c(
  "status_quo_defender",
  "procedural_pragmatist",
  "learning_orientated_participant",
  "accountability_advocate",
  "disengaged_cynic")))
str(corpus) # persona should be a factor with 5 levels
## 'data.frame': 100 obs. of 3 variables:
## $ id : int 1 2 3 4 5 6 7 8 9 10 ...
## $ persona: Factor w/ 5 levels "status_quo_defender",..: 1 1 1 1 1 1 1 1 1 1 ...
## $ text : chr "I’m not convinced we have a serious problem here." "From my perspective, people are generally respectful and professional." "I haven’t personally seen discriminatory behaviour in my area." "We already have diversity in parts of the organisation, so I’m unsure what needs changing." ...
## status_quo_defender procedural_pragmatist
## 20 20
## learning_orientated_participant accountability_advocate
## 20 20
## disengaged_cynic
## 20
Statements were analysed to explore how each narrative was framed in terms of sentiment (positive/negative) and emotion (anger, fear, trust, etc). This helped me identify the most prominent narrative framing of statements by persona.
I deliberately chose to limit myself to analysing sentiment and emotion for feasibility, as this project is designed as a learning experience. I focused on these two measures because sentiment can help me capture an overall positive/negative stance for the personas’ starting point, and emotion can help me understand factors that could be influencing perception and behaviour captured in the statements.
Because this is a small synthetic corpus and a learning exercise, I used lexicon-based methods: they are transparent and easy to explain, largely because every score can be traced back to the specific words used. However, lexicon-based methods are limited because they cannot account for context, negation, or sarcasm, and different lexicons may produce different profiles. Although suitable for a learning exercise, these methods may not be appropriate for a comparable real-world scenario.
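To make the negation limitation concrete, here is a minimal base R sketch. The four-word lexicon and the function `score_statement` are hypothetical and written only for this illustration; they are not the Bing lexicon or part of the analysis.

```r
# Toy lexicon (hypothetical words and labels, NOT the Bing lexicon itself)
toy_lexicon <- c(good = "positive", respectful = "positive",
                 problem = "negative", bad = "negative")

# Score a statement as (positive word matches) minus (negative word matches)
score_statement <- function(text, lexicon = toy_lexicon) {
  words <- tolower(unlist(strsplit(text, "[^[:alpha:]]+")))
  hits <- lexicon[words[words %in% names(lexicon)]]
  sum(hits == "positive") - sum(hits == "negative")
}

score_plain   <- score_statement("The culture here is good")
score_negated <- score_statement("The culture here is not good")

# "not" is not in the lexicon, so both statements receive the same score
c(plain = score_plain, negated = score_negated)
```

Because each word is scored in isolation, the negated statement is treated as just as positive as the plain one, which is the kind of error a word-level lexicon cannot avoid.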
Sentiment analysis followed these steps:
Sentiment was measured using a binary lexicon (Bing) rather than a weighted lexicon (e.g. AFINN) due to the small, synthetic nature of the corpus and also for simplicity. It allowed me to explore directional sentiment (positive/negative) rather than intensity of sentiment (very positive/very negative), resulting in a relatively simple parameter for defining my ABM.
In this synthetic corpus, the lexicon-based sentiment scores suggest that:
Overall, sentiment alone distinguishes defensive and disengaged personas from more engaged and action-orientated personas, but does not capture differences in emotional framing.
# Tokenise statements, keep words found in the Bing lexicon, and count by persona
sentiment_by_persona_long <- corpus %>%
  unnest_tokens(word, text) %>%
  inner_join(get_sentiments("bing"), by = "word") %>%
  count(persona, sentiment, name = "n_words")
# One row per persona, with net and normalised net sentiment
sentiment_by_persona <- sentiment_by_persona_long %>%
  pivot_wider(
    names_from = sentiment,
    values_from = n_words,
    values_fill = 0) %>%
  mutate(
    total_words = positive + negative,
    sentiment_net = positive - negative,
    sentiment_net_norm = ifelse(total_words > 0, sentiment_net / total_words, NA_real_))
sentiment_by_persona
## # A tibble: 5 × 6
## persona negative positive total_words sentiment_net sentiment_net_norm
## <fct> <int> <int> <int> <int> <dbl>
## 1 status_quo_def… 14 11 25 -3 -0.12
## 2 procedural_pra… 3 14 17 11 0.647
## 3 learning_orien… 6 18 24 12 0.5
## 4 accountability… 2 8 10 6 0.6
## 5 disengaged_cyn… 9 11 20 2 0.1
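As a quick check on the arithmetic, the normalised net sentiment can be recomputed in base R from the counts printed above. This is only a sketch: the data frame `counts` is typed in by hand from the table rather than derived from the corpus.

```r
# Counts copied by hand from the printed sentiment_by_persona table
counts <- data.frame(
  persona  = c("status_quo_defender", "procedural_pragmatist",
               "learning_orientated_participant", "accountability_advocate",
               "disengaged_cynic"),
  negative = c(14, 3, 6, 2, 9),
  positive = c(11, 14, 18, 8, 11)
)

counts$total_words   <- counts$positive + counts$negative
counts$sentiment_net <- counts$positive - counts$negative
# Net sentiment as a share of matched words, so the score lies between -1 and 1
counts$sentiment_net_norm <- ifelse(counts$total_words > 0,
                                    counts$sentiment_net / counts$total_words,
                                    NA_real_)

counts[, c("persona", "total_words", "sentiment_net_norm")]
```

Note how few lexicon matches sit behind each score: the accountability advocate value rests on only 10 matched words, so a single extra positive or negative word would move it noticeably.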
Analysis to extract emotion used the following steps:
In this synthetic corpus, the lexicon-based emotion scores suggest that:
These emotional profiles suggest that personas differ not only in sentiment but in the emotional mechanisms through which narratives are framed. As the synthetic corpus was constructed with these personas in mind, these results are to be expected.
# Tokenise statements, match words to NRC emotions, and drop the positive/negative labels
emotion_by_persona_raw <- corpus %>%
  unnest_tokens(word, text) %>%
  inner_join(get_sentiments("nrc"), by = "word", relationship = "many-to-many") %>%
  count(persona, sentiment, name = "n_words") %>%
  filter(sentiment != "positive", sentiment != "negative")
# Convert counts to each emotion's share of a persona's emotion-word matches
emotion_by_persona_long <- emotion_by_persona_raw %>%
  group_by(persona) %>%
  mutate(
    total_emotion_words = sum(n_words),
    emotion_prob = n_words / total_emotion_words) %>%
  ungroup()
# One row per persona, one column per emotion
emotion_by_persona <- emotion_by_persona_long %>%
  select(persona, sentiment, emotion_prob) %>%
  pivot_wider(names_from = sentiment, values_from = emotion_prob, values_fill = 0)
emotion_by_persona
## # A tibble: 5 × 9
## persona anger anticipation disgust fear joy sadness surprise trust
## <fct> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 status_quo_d… 0.122 0.0732 0.0732 0.171 0.0732 0.268 0.0244 0.195
## 2 procedural_p… 0 0.333 0 0.0370 0.185 0 0.0370 0.407
## 3 learning_ori… 0.0588 0.147 0.118 0.0882 0.0882 0.147 0.0588 0.294
## 4 accountabili… 0.0278 0.25 0.0278 0.139 0.167 0.0556 0.0556 0.278
## 5 disengaged_c… 0.0312 0.188 0.0312 0.219 0.0938 0.0625 0.0625 0.312
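The emotion columns are proportions, so each row sums to 1. The sketch below shows the calculation for one persona in base R. The counts are illustrative: they were chosen because they reproduce the printed procedural_pragmatist row, but the underlying counts are not shown in this document, so treat them as an assumption.

```r
# Illustrative emotion-word counts (an assumption; chosen to be consistent
# with the printed procedural_pragmatist proportions)
emotion_counts <- c(anticipation = 9, fear = 1, joy = 5, surprise = 1, trust = 11)

# Each emotion's share of all emotion-word matches for this persona
emotion_prob <- emotion_counts / sum(emotion_counts)

round(emotion_prob, 3)
sum(emotion_prob)
```

Because the NRC join is many-to-many, a single word can count towards more than one emotion, so these are shares of emotion matches rather than shares of words or statements.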
As a learning exercise for ABM, this project was designed to help me build familiarity with the concept by creating a simple, opinion-dynamics style social-influence model on a network, where agents change their stance via repeated interactions with neighbours. This simplified ABM includes the following basic components commonly found in ABMs:
Unlike more complex models, this ABM does not contain the following properties, which would have over-complicated this learning exercise:
In equivalent, real-world scenarios, ABM would be a particularly useful technique for exploring how different agents behave, and how narratives change over time, in response to a range of interventions or changes. For example, the ABM below models how much narratives for the different personas converge over time, as well as exploring how the personas converge under the influence of an intervention like a leadership comms message. In this way, ABM can help us identify the minimum intervention needed to trigger a meaningful change, which interventions prove most useful, and where agents change most rapidly or most slowly.
The ABM produced below involved the following steps:
This model has a number of limitations including:
Initialising the agent population involved first creating a combined dataset for sentiment and emotion analysis. I then computed three different behavioural tendencies that could impact an agent’s stance:
These behavioural tendencies are not true psychological reflections, but they do function as a way of mapping emotional profiles and exploring different factors that can cause one agent to influence their neighbours. For example, an agent who has high openness but low voice strength may be easier to influence than one who has high values for both.
These values, as well as the net normalised sentiment, were all rescaled to a 0-1 range (bounded to 0.025-0.975 to avoid extremes and prevent agents being frozen into immovable stances) so that they all operate on the same scale and can be used for further calculations.
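The rescaling step can be sketched in base R as a min-max transformation onto the bounded interval. The helper `rescale_bounded` is a hypothetical stand-in written for this illustration; it uses the same linear formula as `scales::rescale()` with its default `from` range. The raw openness values are computed by hand from the rounded emotion table, so the results should land close to, but not necessarily exactly on, the scaled values reported in the model.

```r
# Hypothetical helper: linear min-max rescaling onto [lower, upper]
rescale_bounded <- function(x, lower = 0.025, upper = 0.975) {
  lower + (x - min(x)) / (max(x) - min(x)) * (upper - lower)
}

# Raw openness = (trust + anticipation) - (fear + sadness),
# typed in from the rounded emotion proportions
openness <- c(
  status_quo_defender             = (0.195 + 0.0732) - (0.171 + 0.268),
  procedural_pragmatist           = (0.407 + 0.333) - (0.0370 + 0),
  learning_orientated_participant = (0.294 + 0.147) - (0.0882 + 0.147),
  accountability_advocate         = (0.278 + 0.25) - (0.139 + 0.0556),
  disengaged_cynic                = (0.312 + 0.188) - (0.219 + 0.0625)
)

openness_scaled <- rescale_bounded(openness)
round(openness_scaled, 3)
```

One consequence of rescaling across only five personas is that the lowest and highest persona on each measure are always pinned to 0.025 and 0.975, however small the raw difference between them.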
# clean combined dataset
lower <- 0.025
upper <- 0.975
combined_data <- sentiment_by_persona %>%
  select(persona, sentiment_net_norm) %>%
  left_join(emotion_by_persona, by = "persona")
combined_data
## # A tibble: 5 × 10
## persona sentiment_net_norm anger anticipation disgust fear joy sadness
## <fct> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 status_q… -0.12 0.122 0.0732 0.0732 0.171 0.0732 0.268
## 2 procedur… 0.647 0 0.333 0 0.0370 0.185 0
## 3 learning… 0.5 0.0588 0.147 0.118 0.0882 0.0882 0.147
## 4 accounta… 0.6 0.0278 0.25 0.0278 0.139 0.167 0.0556
## 5 disengag… 0.1 0.0312 0.188 0.0312 0.219 0.0938 0.0625
## # ℹ 2 more variables: surprise <dbl>, trust <dbl>
# dataset containing parameters for personas
persona_params <- combined_data %>%
  mutate(
    openness = (trust + anticipation) - (fear + sadness),
    rigidity = (fear + sadness + disgust),
    voice_strength = (anger + trust),
    openness_scaled = scales::rescale(openness, to = c(lower, upper)),
    voice_scaled = scales::rescale(voice_strength, to = c(lower, upper)),
    rigidity_scaled = scales::rescale(rigidity, to = c(lower, upper)),
    sentiment_scaled = scales::rescale(sentiment_net_norm, to = c(lower, upper))) %>%
  select(persona, sentiment_scaled, openness_scaled, rigidity_scaled, voice_scaled)
persona_params
## # A tibble: 5 × 5
## persona sentiment_scaled openness_scaled rigidity_scaled voice_scaled
## <fct> <dbl> <dbl> <dbl> <dbl>
## 1 status_quo_defe… 0.025 0.025 0.975 0.132
## 2 procedural_prag… 0.975 0.975 0.025 0.975
## 3 learning_orient… 0.793 0.434 0.657 0.467
## 4 accountability_… 0.917 0.573 0.395 0.025
## 5 disengaged_cynic 0.297 0.448 0.576 0.381
Synthetic agents were then created based on these behavioural tendencies. This included:
- A stance score derived directly from sentiment_scaled to reflect an agent’s starting position. Closer to 0 reflected a more negative, anti-change stance; closer to 1 a more positive, pro-change stance. This was bounded to 0.025-0.975 to avoid extreme values and make calculations easier in the later ABM.
set.seed(123)
n_agents <- 200
# Assign each agent a persona at random, attach the persona parameters,
# then add small Gaussian noise to the starting stance and clamp it to the bounds
agents <- tibble(
  id = 1:n_agents,
  persona = sample(levels(corpus$persona), size = n_agents, replace = TRUE)) %>%
  left_join(persona_params, by = "persona") %>%
  mutate(
    stance = sentiment_scaled,
    stance = pmin(pmax(stance + rnorm(n(), 0, 0.08), lower), upper))
head(agents, 10)
## # A tibble: 10 × 7
## id persona sentiment_scaled openness_scaled rigidity_scaled voice_scaled
## <int> <chr> <dbl> <dbl> <dbl> <dbl>
## 1 1 learning… 0.793 0.434 0.657 0.467
## 2 2 learning… 0.793 0.434 0.657 0.467
## 3 3 procedur… 0.975 0.975 0.025 0.975
## 4 4 procedur… 0.975 0.975 0.025 0.975
## 5 5 learning… 0.793 0.434 0.657 0.467
## 6 6 disengag… 0.297 0.448 0.576 0.381
## 7 7 accounta… 0.917 0.573 0.395 0.025
## 8 8 status_q… 0.025 0.025 0.975 0.132
## 9 9 procedur… 0.975 0.975 0.025 0.975
## 10 10 learning… 0.793 0.434 0.657 0.467
## # ℹ 1 more variable: stance <dbl>
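The agent-creation code adds Gaussian noise (standard deviation 0.08) to each starting stance and then clamps the result to the bounds. A side effect worth seeing is what happens to a persona whose starting stance already sits on a bound. The sketch below is a base R illustration with an arbitrary seed and a hypothetical `clamp` helper; it is not the original agent population.

```r
set.seed(42)  # arbitrary seed, used for this illustration only
lower <- 0.025
upper <- 0.975

# Hypothetical helper equivalent to the pmin(pmax(...)) pattern in the model
clamp <- function(x, lo, hi) pmin(pmax(x, lo), hi)

# 1000 illustrative agents starting exactly on the upper bound
base_stance <- rep(upper, 1000)
noisy <- clamp(base_stance + rnorm(1000, mean = 0, sd = 0.08), lower, upper)

share_at_bound <- mean(noisy == upper)  # upward noise is clipped back to the bound
mean_after     <- mean(noisy)           # only downward noise survives
c(share_at_bound = share_at_bound, mean_after = mean_after)
```

Roughly half of these agents stay pinned at the bound while the rest move inwards, so the group mean for a persona at 0.975 (or 0.025) starts slightly inside the bound rather than on it.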
After defining my agents, I created an interaction network. It is extremely simplified and serves only to represent who tends to interact with whom.
I used a small-world network to generate a stylised representation of an organisational social structure. This approach is intended to approximate an organisation in which people tend to interact with a small, familiar group (e.g. a team or department) but have a few long-range connections (e.g. cross-team meetings or management structures), allowing information to spread across the whole organisation. This network was structured around the following:
- nei was set as 5, giving around 10 neighbours per agent. This is based on the assumption that in an organisation of 200 people, I could reasonably expect a team/department to consist of around 10 people, which would imply approximately 20 teams/departments. This also keeps the number of neighbours reasonably small, and maintains simplicity for the network structure and subsequent ABM.
I did some simple tests to check that this network would function as expected in the ABM:
This network is not intended to reproduce a specific organisational structure, but instead provides a simple and interpretable baseline that allows for change over time.
set.seed(1234)
network <- sample_smallworld(dim = 1, size = n_agents, nei = 5, p = 0.05)
components(network)$no
## [1] 1
## [1] 10
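To see why nei = 5 gives around 10 neighbours per agent, the starting point of a small-world network can be sketched in base R as a ring lattice in which each agent is linked to the 5 nearest agents on either side. This sketch deliberately leaves out the random rewiring step (p = 0.05), so it is a simplification and not a reimplementation of `sample_smallworld()`.

```r
n_agents <- 200
nei <- 5

# Ring lattice: link each agent to the nei nearest agents on each side
adj <- matrix(FALSE, n_agents, n_agents)
for (i in seq_len(n_agents)) {
  for (k in seq_len(nei)) {
    j <- ((i - 1 + k) %% n_agents) + 1  # wrap around the ring
    adj[i, j] <- TRUE
    adj[j, i] <- TRUE
  }
}

degree  <- rowSums(adj)   # neighbours per agent
n_edges <- sum(adj) / 2   # each undirected edge appears twice in the matrix
c(min_degree = min(degree), max_degree = max(degree), n_edges = n_edges)
```

Every agent starts with exactly 2 × nei = 10 neighbours. Rewiring moves some of these links to random agents, which changes individual degrees but is consistent with a mean degree of 10.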
Taking the simple agent population and network defined above, this ABM works on the assumption that:
The intervention is designed to mimic a primarily comms-based intervention, designed to push people towards a particular goal. This could encompass:
It is designed to create a clean way to model how a certain intervention can create a shock to the system and create a noticeable change in stance. The intervention is designed to affect open agents more strongly, and to have lower impact on agents already near the target. It is important to note that this intervention does not take into account who is creating the intervention and their relative level of influence. For example, a comms campaign run by an internal comms team may have excellent reach but little influence, whereas a manager introducing anti-racism to their team may have excellent influence on an immediate team, but limited reach beyond their direct reports.
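The intervention rule used in run_abm can be looked at in isolation. The sketch below wraps that rule in a hypothetical helper, `apply_comms`, and applies it to three made-up agents; the stance and openness values are illustrative, not taken from the agent population.

```r
# Hypothetical helper wrapping the intervention rule from run_abm:
# move stance towards comms_target, scaled by comms_strength and openness
apply_comms <- function(stance, openness, comms_strength, comms_target,
                        lower = 0.025, upper = 0.975) {
  pmin(pmax(stance + comms_strength * openness * (comms_target - stance), lower), upper)
}

# Three illustrative agents: open and far from target, closed and far from
# target, open and already near the target
stance   <- c(0.20, 0.20, 0.75)
openness <- c(0.90, 0.10, 0.90)

after <- apply_comms(stance, openness, comms_strength = 0.30, comms_target = 0.8)
shift <- after - stance
round(shift, 4)
```

The open agent far from the target moves most, the closed agent barely moves, and the agent already near the target moves least of all. The same rule also pulls agents who start above the target down towards it, since the sign of the shift follows comms_target minus stance.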
I created a function (run_abm) to run the model, which updates stance at every time step under the influence of neighbours and (if applicable) the comms intervention:
- For each agent i, the function picks one neighbour j at random. If the agent has no neighbours, the model moves to the next agent.
- It calculates the influenceability of agent i using openness and rigidity scores. For example, if openness is high and rigidity is low, then influenceability is high. This value is also bounded to 0.025-0.975 to avoid extremes.
- It calculates how strongly neighbour j pushes based on voice strength.
- The stance of agent i is updated slightly in the direction of neighbour j's stance. For instance, if the neighbour is more pro-change (i.e. stance_j - stance_i is a positive number), the agent moves upwards in stance slightly. If the neighbour is more anti-change (i.e. stance_j - stance_i is a negative number), the agent moves down slightly. This update is scaled based on the influenceability of agent i and voice strength of neighbour j, meaning that updates for an influenceable agent and vocal neighbour would be larger than an update for a less influenceable agent and less vocal neighbour. The updated stance score is capped between 0.025 and 0.975 to align with the original scores and prevent extremes.
- At the intervention step (if applicable), stance is also nudged depending on how open agent i is and how close agent i is to the target stance of the intervention (agents further from the target move more).
The function was set with the following default parameters:
- n_steps was set to 52 on the assumption that each time step represents one week, with 52 weeks in a full year.
- step_size was set below 0.2 to keep updates to stance gradual. This would reflect cultural inertia, particularly in larger organisations where things are slower to change. I also chose this to avoid too many extremes in the model.
- intervention_step was set at 10 and represents the time step at which the intervention takes place, i.e. week 10 of the year. This was set arbitrarily and can be changed.
- comms_strength represents the overall power of the comms intervention. For example, an intervention that is persuasive (e.g. “we’ve committed to becoming anti-racist as an organisation and I want you to join in on this vision”) would have a lower strength than one that is prescriptive (e.g. “every team member must complete mandatory training by next quarter”).
- comms_target represents the target stance the intervention aims to push agents towards. For example, the intervention may be aiming to have agents acknowledge that there is a problem (more passive intervention, likely lower target stance as it’s not asking agents to make a change now) whereas one that aims for agents to take on a professional objective or KPI to institute change may need a higher target stance (more active intervention, need action-orientated change behaviour rather than passive acknowledgement).
This is a very simple way to model the impact of influence, particularly because it assumes that an agent’s stance will always change slightly under the impact of a neighbour. In more complex models (and in real-world situations), agents may not change their stance in relation to their neighbour’s at all, and/or agent influenceability may be controlled by a number of factors including environments (home, work, social) and directionality (e.g. manager may have more influence than colleague).
run_abm <- function(agents, network, n_steps = 52, step_size = 0.15,
                    intervention_step = 10, comms_strength = 0.05,
                    comms_target = 0.8, lower = 0.025, upper = 0.975) {
  history <- vector("list", n_steps)
  agents_now <- agents
  for (t in 1:n_steps) {
    # One-off comms intervention: nudge every agent towards comms_target,
    # scaled by openness (intervention_step = 0 switches it off)
    if (intervention_step > 0 && t == intervention_step) {
      agents_now <- agents_now %>%
        mutate(
          stance = pmin(
            pmax(stance + comms_strength * openness_scaled * (comms_target - stance), lower), upper))
    }
    # Neighbour influence: each agent in turn is pulled towards one random neighbour
    for (i in agents_now$id) {
      neigh <- neighbors(network, i)
      if (length(neigh) == 0) next
      # Index into the neighbour vector so that a single neighbour is handled
      # correctly (sample(x, 1) on a length-one x would sample from 1:x)
      j <- as.integer(neigh)[sample.int(length(neigh), 1)]
      stance_i <- agents_now$stance[agents_now$id == i]
      stance_j <- agents_now$stance[agents_now$id == j]
      openness_i <- agents_now$openness_scaled[agents_now$id == i]
      rigidity_i <- agents_now$rigidity_scaled[agents_now$id == i]
      influenceability <- pmin(pmax(openness_i * (1 - rigidity_i), lower), upper)
      voice_j <- agents_now$voice_scaled[agents_now$id == j]
      delta <- step_size * influenceability * voice_j * (stance_j - stance_i)
      agents_now$stance[agents_now$id == i] <- pmin(pmax(stance_i + delta, lower), upper)
    }
    # Record the mean and spread of stance by persona at this step
    history[[t]] <- agents_now %>%
      group_by(persona) %>%
      summarise(
        mean_stance = mean(stance),
        sd_stance = sd(stance),
        .groups = "drop") %>%
      mutate(step = t)
  }
  list(final_agents = agents_now, history = bind_rows(history))
}
Using the run_abm function defined above, I modelled how the different personas would change in stance over time. This baseline simulates repeated local influence under the chosen parameters. Depending on the balance of openness, rigidity, and voice, the system may converge, stabilise, or remain separated by persona.
This simple model serves as a baseline against which to test the impact of different interventions.
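One way to put a number on whether personas converge or remain separated is to track the gap between the highest and lowest persona mean at each step. The helper `persona_spread` below is hypothetical and not part of the original analysis; it assumes a data frame with the same columns as the history output of run_abm (persona, mean_stance, step). The toy history is invented purely to show the calculation.

```r
# Hypothetical helper: gap between the highest and lowest persona mean per step
persona_spread <- function(history) {
  spread <- tapply(history$mean_stance, history$step,
                   function(x) max(x) - min(x))
  data.frame(step = as.integer(names(spread)), spread = as.numeric(spread))
}

# Toy history with invented values, in the same shape as run_abm()$history
toy_history <- data.frame(
  persona     = rep(c("a", "b"), times = 3),
  step        = rep(1:3, each = 2),
  mean_stance = c(0.2, 0.8, 0.3, 0.7, 0.4, 0.6)
)

spread <- persona_spread(toy_history)
spread
```

A spread that shrinks over time suggests convergence, a flat spread suggests the personas remain separated, and comparing the spread between baseline and intervention runs gives a single summary of how much an intervention changes the picture.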
# Baseline run: intervention_step = 0 switches the comms intervention off
baseline <- run_abm(agents, network, n_steps = 52, step_size = 0.15, intervention_step = 0, comms_strength = 0, comms_target = 0)
baseline_plot <- baseline$history %>%
  ggplot(aes(x = step, y = mean_stance, colour = persona)) +
  geom_line(linewidth = 1.1) +
  labs(
    title = "Change in mean stance by persona under baseline (no intervention) conditions",
    x = "Time step",
    y = "Mean stance",
    colour = "Persona") +
  scale_y_continuous(limits = c(0, 1)) +
  scale_colour_discrete(labels = c(
    status_quo_defender = "Status-quo defender",
    procedural_pragmatist = "Procedural pragmatist",
    learning_orientated_participant = "Learning-orientated participant",
    accountability_advocate = "Accountability advocate",
    disengaged_cynic = "Disengaged cynic")) +
  theme_minimal()
baseline_plot
Following baseline modelling, I explored the impact of a comms intervention, modelling two different theoretical scenarios.
Scenario 1 is based on the idea of senior leadership sending out a message to all staff encouraging participation in anti-racism initiatives and education. For this scenario, I set the following parameters:
- intervention_step = 10 based on the assumption that the intervention takes place early in the year.
- comms_strength = 0.10 based on the assumption that the comms are persuasive (e.g. “we’ve committed to becoming anti-racist as an organisation and I want you to take part”) rather than asking for specific action to be taken.
- comms_target = 0.8 based on a target stance of 0.8, where agents are moved closer towards a stance of 0.8. This is a purely arbitrary choice representing a relatively high, but not extremely high, stance.
intervention <- run_abm(agents, network, n_steps = 52, step_size = 0.15, intervention_step = 10, comms_strength = 0.10, comms_target = 0.8)
intervention_plot <- ggplot(intervention$history,
    aes(x = step, y = mean_stance, colour = persona, group = persona)) +
  geom_line(linewidth = 1.1) +
  labs(
    title = "Change in mean stance by persona under early, lower-strength leadership intervention",
    x = "Time step",
    y = "Mean stance",
    colour = "Persona") +
  scale_y_continuous(limits = c(0, 1)) +
  scale_colour_discrete(labels = c(
    status_quo_defender = "Status-quo defender",
    procedural_pragmatist = "Procedural pragmatist",
    learning_orientated_participant = "Learning-orientated participant",
    accountability_advocate = "Accountability advocate",
    disengaged_cynic = "Disengaged cynic")) +
  theme_minimal()
intervention_plot
Scenario 2 is based on the idea of senior leadership sending out a message to all staff requiring completion of an anti-racism training module by the end of the year, perhaps as a firmer stance after more persuasive messaging has not had the desired effect on agents’ stance. For this scenario, I set the following parameters:
- intervention_step = 26 based on the assumption that the intervention takes place halfway through the year, once other methods have been tested and have not had noticeable effects.
- comms_strength = 0.30 based on the assumption that the comms are prescriptive and ask employees to take a specific action by a certain date.
- comms_target = 0.8 based on a target stance of 0.8, where agents are moved closer towards a stance of 0.8. This is a purely arbitrary choice representing a relatively high, but not extremely high, stance.
intervention_2 <- run_abm(agents, network, n_steps = 52, step_size = 0.15, intervention_step = 26, comms_strength = 0.30, comms_target = 0.8)
intervention_plot_2 <- ggplot(intervention_2$history,
    aes(x = step, y = mean_stance, colour = persona, group = persona)) +
  geom_line(linewidth = 1.1) +
  labs(
    title = "Change in mean stance by persona under mid-year, higher-strength leadership intervention",
    x = "Time step",
    y = "Mean stance",
    colour = "Persona") +
  scale_y_continuous(limits = c(0, 1)) +
  scale_colour_discrete(labels = c(
    status_quo_defender = "Status-quo defender",
    procedural_pragmatist = "Procedural pragmatist",
    learning_orientated_participant = "Learning-orientated participant",
    accountability_advocate = "Accountability advocate",
    disengaged_cynic = "Disengaged cynic")) +
  theme_minimal()
intervention_plot_2
I used a number of different sources to learn how to build this ABM. These include:
I used several different sources to construct the personas, with the following being key sources of information:
## R version 4.5.1 (2025-06-13 ucrt)
## Platform: x86_64-w64-mingw32/x64
## Running under: Windows 10 x64 (build 19045)
##
## Matrix products: default
## LAPACK version 3.12.1
##
## locale:
## [1] LC_COLLATE=English_United Kingdom.utf8
## [2] LC_CTYPE=English_United Kingdom.utf8
## [3] LC_MONETARY=English_United Kingdom.utf8
## [4] LC_NUMERIC=C
## [5] LC_TIME=English_United Kingdom.utf8
##
## time zone: Europe/London
## tzcode source: internal
##
## attached base packages:
## [1] stats graphics grDevices utils datasets methods base
##
## other attached packages:
## [1] ggplot2_4.0.1 igraph_2.2.1 scales_1.4.0 tidytext_0.4.3 tidyr_1.3.1
## [6] dplyr_1.1.4
##
## loaded via a namespace (and not attached):
## [1] janeaustenr_1.0.0 rappdirs_0.3.3 sass_0.4.10 utf8_1.2.6
## [5] generics_0.1.4 stringi_1.8.7 lattice_0.22-7 hms_1.1.3
## [9] digest_0.6.37 magrittr_2.0.3 evaluate_1.0.4 grid_4.5.1
## [13] RColorBrewer_1.1-3 bookdown_0.46 fastmap_1.2.0 jsonlite_2.0.0
## [17] Matrix_1.7-3 purrr_1.0.4 jquerylib_0.1.4 cli_3.6.5
## [21] rlang_1.1.6 tokenizers_0.3.0 withr_3.0.2 cachem_1.1.0
## [25] yaml_2.3.10 tools_4.5.1 tzdb_0.5.0 vctrs_0.6.5
## [29] R6_2.6.1 lifecycle_1.0.4 stringr_1.5.1 fs_1.6.6
## [33] pkgconfig_2.0.3 pillar_1.10.2 bslib_0.9.0 gtable_0.3.6
## [37] glue_1.8.0 textdata_0.4.5 rmdformats_1.0.4 Rcpp_1.1.0
## [41] xfun_0.52 tibble_3.3.0 tidyselect_1.2.1 rstudioapi_0.17.1
## [45] knitr_1.50 farver_2.1.2 htmltools_0.5.8.1 SnowballC_0.7.1
## [49] labeling_0.4.3 rmarkdown_2.29 readr_2.1.5 compiler_4.5.1
## [53] S7_0.2.1