Developing a way to help young people use generative AI thoughtfully (the tech bit)

We’re co-creating a way of helping 15 – 17 year olds to understand, examine and use generative AI in positive ways. The National Citizen Service (NCS) and Centre for the Acceleration of Social Technology (CAST) asked dxw family-member Neontribe to carry out the project to answer the question:

“How might we support young people to become thoughtful makers and critical users of this technology?”

We began the project in November with a discovery phase, then started to define and iterate our response to the question. This post explains how we developed the solution we’d defined.

To most people, the process of using or developing software can feel complex and inaccessible. With that in mind, I wanted to break down our approach.  

Our goal

After extensive research and feedback sessions with young people, along with several prototypes and proofs of concept, we had refined one of our technical goals as:

“How might we get generative AI to assess and critique its own responses to user-generated questions?”

The constraints

There were several key constraints that shaped the design and development process.

First, we had a specific audience: young people (aged approximately 15-17) in known organisations in England. This narrowed our requirements around traffic load and reduced the need for extended localisation efforts.

However, the project depends on a third-party provider to supply much of the response content displayed, based on user input. This introduced a known risk: for the concept to function effectively, it had to be managed carefully with robust safeguards. Any system interacting with external services must be designed to handle unpredictable behaviour and ensure an appropriate user experience. It was also critical to protect users’ privacy and shield them from exposure to less regulated parts of the global digital ecosystem.

Other limitations included a tight budget for both development and operations, with no expectation of additional funding. This meant we prioritised using existing skills, mature frameworks, and tools that worked across both front- and back-end systems to minimise unnecessary overhead.

From an operational standpoint, the service needed to run on demand, with costs strictly capped and tied to actual usage.

To remain responsive and iterative, we aimed for a basic CI/CD integration, allowing us to deliver progress to our youth advisory group without a complex deployment process or the time that would take.

We also planned for the project’s limited lifespan. While nothing lasts forever, the sponsors wanted the code we produced to remain useful beyond the project’s end. This meant we had to make the codebase publicly available under an open-source licence and ensure the product could be responsibly decommissioned or transferred.

The technical bit

Taking those constraints into account, our technical approach looked like this:

Next.js, hosted on Vercel in a GDPR-compliant region (the UK/EU), supporting a stateless application. This consists of a client-side component communicating with server-side API functions that modify, fine-tune, and proxy requests to the GPT provider. The code, licensed under CC-BY-SA 4.0, is hosted on GitHub, with Vercel monitoring commits for previews and auto-deploying releases.
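
To make that a bit more concrete, here’s a rough sketch of what one of those server-side proxy functions could look like. It’s illustrative only – the route path, model name and prompt are placeholders rather than our actual code – and it assumes the OpenAI Node SDK:

```typescript
// app/api/ask/route.ts – illustrative only, not the project’s actual route
import { NextResponse } from "next/server";
import OpenAI from "openai";

const openai = new OpenAI({ apiKey: process.env.OPENAI_API_KEY });

// Placeholder system prompt; the personalities we actually used are described below
const SYSTEM_PROMPT =
  "You are a positive, plain-English guide to generative AI for 15 to 17 year olds.";

export async function POST(request: Request) {
  const { question } = await request.json();

  // Validate on the server before anything reaches the third-party provider
  if (typeof question !== "string" || question.length === 0 || question.length > 500) {
    return NextResponse.json({ error: "Invalid question" }, { status: 400 });
  }

  // Proxy the request: the browser never talks to the GPT provider directly,
  // and the API key never leaves the server
  const completion = await openai.chat.completions.create({
    model: "gpt-4o-mini", // placeholder model name
    temperature: 0.2, // favour clarity over creativity
    messages: [
      { role: "system", content: SYSTEM_PROMPT },
      { role: "user", content: question },
    ],
  });

  return NextResponse.json({
    answer: completion.choices[0]?.message?.content ?? "",
  });
}
```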

How did we do that?

We evaluated several GPT providers offering a range of models and SDKs compatible with our stack. OpenAI emerged as the best fit at the time, particularly as its more recent models have been trained with better request/response filters and without the taint of dubious training material. Ask it to show a picture of “all the money”, for example, and it will refuse, presumably to avoid depictions of currency. It helps that a couple of other providers are developing compatible APIs, creating future opportunities for interoperability as a de facto standard emerges, but OpenAI met our needs at that point.

As developers, we’re used to working with third-party services that use precise, technical language with predictable outputs. However, working with a large language model (LLM) meant entering the less predictable domain of prompt engineering.

Our young users clearly wanted the tool to avoid patronising language, slang, and overly technical jargon. To meet this, we defined a system personality designed to communicate at an accessible reading level, with a positive, informative tone. We provided a small dictionary of terms to avoid or substitute, and specified some topics to handle with extra care. Positive reinforcement and template-based formatting helped ensure consistent, structured responses that could be processed and displayed reliably.
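
A personality like that might be expressed along these lines. The wording, word list and topics below are invented for this post rather than our real prompts:

```typescript
// Illustrative system personality – the wording, substitutions and topics
// here are examples, not the prompts used in the real service
const BANNED_TERMS: Record<string, string> = {
  utilise: "use",
  lit: "exciting", // avoid slang as well as jargon
  hallucination: "made-up answer",
};

const SENSITIVE_TOPICS = ["mental health", "personal data", "exam results"];

const PRIMARY_PERSONALITY = `
You are a positive, informative guide for readers aged 15 to 17.
Write at an accessible reading level. Do not patronise, use slang or use technical jargon.
Avoid these words, using the suggested alternatives instead:
${Object.entries(BANNED_TERMS)
  .map(([word, alt]) => `- "${word}" -> "${alt}"`)
  .join("\n")}
Take extra care when a question touches on: ${SENSITIVE_TOPICS.join(", ")}.
Always answer using this template:
SUMMARY: <one sentence>
DETAIL: <two or three short paragraphs>
THINGS TO CHECK: <a bullet list of claims the reader should verify for themselves>
`;
```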

We also developed a second system personality tasked with reviewing and critiquing the primary AI’s responses, offering learning points and constructive feedback. Both personalities were deliberately configured to prioritise clarity over creativity.
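
Put together, the flow looks something like the sketch below, reusing the client and primary personality from the earlier examples. Again, the names, model and prompts are illustrative rather than production code:

```typescript
// Hypothetical two-step flow: the primary personality answers,
// then a second "reviewer" personality critiques that answer
const REVIEWER_PERSONALITY = `
You review answers written for 15 to 17 year olds about generative AI.
Point out anything that may be inaccurate, one-sided or unclear, and suggest
what the reader could check for themselves. Be brief and constructive.
`;

async function answerWithCritique(question: string) {
  // First pass: the primary personality answers the user's question
  const answer = await openai.chat.completions.create({
    model: "gpt-4o-mini", // placeholder
    temperature: 0.2, // clarity over creativity for both personalities
    messages: [
      { role: "system", content: PRIMARY_PERSONALITY },
      { role: "user", content: question },
    ],
  });
  const answerText = answer.choices[0]?.message?.content ?? "";

  // Second pass: the reviewer personality critiques the first response
  const critique = await openai.chat.completions.create({
    model: "gpt-4o-mini", // placeholder
    temperature: 0.2,
    messages: [
      { role: "system", content: REVIEWER_PERSONALITY },
      {
        role: "user",
        content: `Question: ${question}\n\nAnswer to review:\n${answerText}`,
      },
    ],
  });

  return {
    answer: answerText,
    critique: critique.choices[0]?.message?.content ?? "",
  };
}
```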

In addition, every response is surrounded by static content designed to remind users of the issues surrounding generative AI. The educational intent of the system itself mitigates the risk of users internalising inaccurate, misleading or otherwise harmful content.

Of course, no system is entirely immune to misuse. While the system’s settings offer strong guardrails, a determined user could attempt to push it beyond its intended scope, potentially triggering errors in the post-processing layer. To address this, we built additional code to handle edge cases and maintain system stability.
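
For example, the post-processing layer can check whether a response actually follows the expected template before displaying it, and fall back to a safe static message when it doesn’t. The sketch below shows the idea rather than our actual implementation:

```typescript
// Defensive post-processing: if a response ignores the expected template
// (for example, after a user pushes the model off-topic), fall back to a
// safe static message instead of rendering it raw
interface ParsedAnswer {
  summary: string;
  detail: string;
  thingsToCheck: string;
}

function parseTemplatedAnswer(raw: string): ParsedAnswer | null {
  const match = raw.match(
    /SUMMARY:\s*([\s\S]*?)\nDETAIL:\s*([\s\S]*?)\nTHINGS TO CHECK:\s*([\s\S]*)/
  );
  if (!match) return null; // the model ignored the template – treat it as an edge case
  return {
    summary: match[1].trim(),
    detail: match[2].trim(),
    thingsToCheck: match[3].trim(),
  };
}

const FALLBACK_MESSAGE =
  "Sorry - we couldn't come up with a useful answer to that. Try asking in a different way.";

function toDisplayableAnswer(raw: string): ParsedAnswer {
  return (
    parseTemplatedAnswer(raw) ?? {
      summary: FALLBACK_MESSAGE,
      detail: "",
      thingsToCheck: "",
    }
  );
}
```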

So, what did we learn?

In essence, this project is about providing a user-friendly interface to manage interactions with a third-party AI service – one designed to offer different responses each time, and then explain why those responses should be questioned.

It’s important to remember that LLMs are not search engines or sources of truth; they are pattern generators. As developers, we need to be mindful of their limitations and apply appropriate controls to maintain trust in the system’s output.

One of the privileges of working in software development is having the ability to observe, understand, and sometimes replicate a system’s effects. This technical insight builds confidence in a product – something that, for non-technical stakeholders, can feel more like a matter of trust or belief.

Some personal reflections

There have been rare moments in my career where I’ve had to explain a system or process and admit that its inner workings were largely opaque. Historically, this tended to happen with particularly closed or under-documented systems.

Even today, with access to SDKs and documentation, working with LLMs can sometimes feel like navigating unfamiliar terrain – challenging the level of technical assurance and functional consistency we typically expect as developers.

While various levers can be pulled to try and control the output of a service using generative AI, we should not, at present, expect the same input to yield the same results. As a developer, I find this deeply unnerving. I found myself reminded of the issues with LLMs just as much as we hope users of RealchatAI will be.