Information science used to concern itself with how to make meaning from scarce data. In intelligence, that meant paying close attention to the provenance of data and how to process it accurately. In the era of AI and large language models, information science is instead about how to deal with an abundance of information. The field is also challenged by the speed at which information is collected, and by how to use and trust machine assessments without deskilling analysts: the analyst's role is shifting towards identifying where the machine is making errors. The field has yet to arrive at a mature understanding of how to avoid being ‘gamed’, and so even the most sophisticated systems remain highly vulnerable to manipulation by adversaries.
LLMs are, however, powerful pattern matchers. Effective LLMs are good at identifying outliers, which has obvious intelligence applications. They can shape research questions and engage in feedback-loop dialogues with human analysts to refine assessments. This sort of dialogue can then also shape an LLM’s future assessments. The way that an LLM builds and layers understanding is, therefore, a discipline in its own right.
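To make the outlier idea concrete, the following minimal sketch flags reports whose embeddings sit unusually far from the rest of a corpus. The data here is synthetic and the thresholds are illustrative assumptions; in practice the embeddings would come from whatever model an agency has approved, and flagged items would go to an analyst for review rather than being treated as findings.

```python
# Minimal sketch: flag reports whose embedding sits unusually far from the
# corpus centroid. Embeddings here are synthetic; in practice they would come
# from an approved embedding model run over the reporting corpus.
import numpy as np

def flag_outliers(embeddings: np.ndarray, z_threshold: float = 2.0) -> list[int]:
    """Return indices of embeddings anomalously distant from the corpus centroid."""
    unit = embeddings / np.linalg.norm(embeddings, axis=1, keepdims=True)
    centroid = unit.mean(axis=0)
    centroid /= np.linalg.norm(centroid)
    distances = 1.0 - unit @ centroid                 # cosine distance to centroid
    z_scores = (distances - distances.mean()) / distances.std()
    return [i for i, z in enumerate(z_scores) if z > z_threshold]

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    corpus = rng.normal(loc=1.0, scale=0.1, size=(200, 64))   # 'routine' reports
    corpus[42] = rng.normal(loc=-1.0, scale=0.1, size=64)     # one anomalous report
    print(flag_outliers(corpus))                               # expected to include 42
```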
Public and open LLMs are good at drawing together open-source intelligence (OSINT). Closed LLMs obviously need to be fed curated data to do the same work. Used effectively, either can radically enhance a horizon-scanning function by focusing on where, for example, terminology is layered and where it shifts over time and geographical space. The further development of multimodal models has extended this layering to imagery, multiple languages, video and audio feeds, making an LLM a highly sophisticated assessor of imagery intelligence, audio intercepts and the written word in a joined-up configuration. At their core, LLMs make probabilistic assessments, and the analyst therefore needs to express an explicit measure of confidence in the underlying intelligence, the prompt engineering, the model and its output.
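One simple component of such horizon scanning can be illustrated without any model at all: tracking how sharply a term’s frequency shifts between time windows of a document feed. The sketch below is deliberately crude and hypothetical; a real system would work with embeddings or LLM judgements across languages and modalities rather than raw token counts.

```python
# Minimal sketch of one horizon-scanning signal: terms whose frequency jumps
# sharply between two time windows of a (hypothetical) document feed.
from collections import Counter

def term_frequencies(documents: list[str]) -> Counter:
    """Lower-cased token counts for one time window of documents."""
    counts = Counter()
    for doc in documents:
        counts.update(doc.lower().split())
    return counts

def shifted_terms(earlier: list[str], later: list[str],
                  min_ratio: float = 3.0, min_count: int = 5) -> list[str]:
    """Terms whose usage grows by at least min_ratio between the two windows."""
    before, after = term_frequencies(earlier), term_frequencies(later)
    return [term for term, count in after.items()
            if count >= min_count and count / (before[term] + 1) >= min_ratio]
```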
LLMs are currently most usefully deployed in intelligence as a means of enhancing and augmenting the productivity of intelligence analysts, rather than replacing them. In human intelligence (HUMINT), the work of LLMs lies in examining transcripts, finding falsehoods and linking statements to wider patterns. They do not yet replace the art of handling, which remains a uniquely human-to-human relationship.
Could I, for example, train an LLM to think in a Finnish way?
What if I tried to emulate Finnish culture by training my LLM on the Finnish language, literature, idioms, the Finnish education system and other local particularities, and got the LLM to ‘think’ and respond as a Finn? In this way, I might be able to test and forecast how various narratives would be received by the Finnish population. In doing so, I might be able to speculate about Finnish-specific deception weaknesses, or to create realistic Finnish red teams in electronic desktop exercises.
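As a rough illustration of what ‘training my LLM’ might involve, the sketch below continues pre-training an open causal language model on a hypothetical Finnish corpus using the Hugging Face libraries. The model name, data file and hyperparameters are placeholders, and genuine cultural emulation would demand far more than this single step.

```python
# Highly simplified sketch: continued pre-training of an open causal language
# model on a (hypothetical) Finnish corpus. Illustrative only; real cultural
# emulation would need curated corpora across regions, ages and registers,
# instruction tuning, and calibration against real-world evidence.
from datasets import load_dataset
from transformers import (AutoModelForCausalLM, AutoTokenizer,
                          DataCollatorForLanguageModeling, Trainer,
                          TrainingArguments)

model_name = "gpt2"                                   # stand-in open model
tokenizer = AutoTokenizer.from_pretrained(model_name)
tokenizer.pad_token = tokenizer.eos_token
model = AutoModelForCausalLM.from_pretrained(model_name)

# Hypothetical corpus of Finnish literature, news and idioms, one text per line.
corpus = load_dataset("text", data_files={"train": "finnish_corpus.txt"})

def tokenize(batch):
    return tokenizer(batch["text"], truncation=True, max_length=512)

tokenized = corpus["train"].map(tokenize, batched=True, remove_columns=["text"])

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="finnish-emulator", num_train_epochs=1,
                           per_device_train_batch_size=4),
    train_dataset=tokenized,
    data_collator=DataCollatorForLanguageModeling(tokenizer=tokenizer, mlm=False),
)
trainer.train()
```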
But is it possible to boil down the essence of what it is to be Finnish in this way? There are significant dangers of stereotyping, of over- or under-reading what we believe to be essential texts or cultural artefacts, and of missing the myriad sub-cultures within a country. To get close to doing something useful, we would need a multitude of Finnish models across ages, levels of educational attainment and regions, and we would need to calibrate these through real-world evidence collection. Even then, capturing enough complexity and nuance would be incredibly difficult.
What a Finnish cultural-emulation LLM might be able to achieve is an increased degree of empathy in the analyst. In turn, this would reduce the mirror-imaging biases we see in intelligence assessment cadres. Such an emulator LLM should only ever be seen as a simulator to help develop and work through hypotheses (the human and machine working together), rather than as a replacement for the all-source intelligence mix.
So, how do we ensure LLMs are used effectively in intelligence? The answers are not going to be the ones promised by AI companies. Using LLMs in intelligence will require large inputs of human labour and careful standard operating procedures. Some have argued that trust in LLMs requires ‘provenance by design’, a reworking of privacy or security by design. Each phase of an assessment has to be attached to a testable action log, and assessments need to be stress-tested through counter-poisoning techniques and enhanced triangulation. Rather than LLMs being a black box into which prompts are entered and from which outputs emerge, there must be transparency over how the LLM weighs its evidence and how its responses change under different prompting. It is through a quite classical epistemological approach, examining falsifiability, that analysts can spot the gaps and suggest responses to them through their chain of command.
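What a ‘testable action log’ could look like in practice is sketched below: each phase of an assessment (evidence ingestion, prompt, model identity, output) is appended to a hash-chained record, so that any later alteration breaks verification. The structure and field names are illustrative assumptions, not an established standard.

```python
# Minimal sketch of 'provenance by design': a hash-chained action log in which
# every phase of an assessment is recorded and later tampering is detectable.
import hashlib
import json
from dataclasses import dataclass, field
from datetime import datetime, timezone

@dataclass
class ActionLog:
    entries: list[dict] = field(default_factory=list)

    def record(self, phase: str, detail: dict) -> None:
        """Append one phase (e.g. 'evidence', 'prompt', 'model', 'output')."""
        prev_hash = self.entries[-1]["hash"] if self.entries else "genesis"
        body = {
            "phase": phase,
            "detail": detail,                      # evidence IDs, prompt text, model version...
            "timestamp": datetime.now(timezone.utc).isoformat(),
            "prev_hash": prev_hash,
        }
        body["hash"] = hashlib.sha256(
            json.dumps(body, sort_keys=True).encode()).hexdigest()
        self.entries.append(body)

    def verify(self) -> bool:
        """Recompute the chain; any edited or reordered entry breaks verification."""
        prev = "genesis"
        for entry in self.entries:
            body = {k: v for k, v in entry.items() if k != "hash"}
            digest = hashlib.sha256(
                json.dumps(body, sort_keys=True).encode()).hexdigest()
            if entry["prev_hash"] != prev or digest != entry["hash"]:
                return False
            prev = entry["hash"]
        return True
```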
To take advantage of LLMs without compromising intelligence tradecraft, agencies must focus strongly on provenance at all stages of a model’s use. Far from degrading the intellectual capability of analysts, effective use of LLMs will require a greater level of skill in method and discrimination in evidence capture and usage. But labour-saving, it will not be. Not in the short to medium term.
Robert Dover
Professor, Dean of the Hull University Business School
University of Hull
United Kingdom
r.m.dover@hull.ac.uk
