Insights from the KCL LLM Knowledge Prompting Hack Event 2023
Following on from our news post about the King's College London knowledge graph and LLM hack event, here are our initial "take-homes" and early insights from the event.
Hack Event Summary
Quoted from the King's College London event GitHub page:
The hackathon is designed as a collaborative, interdisciplinary sprint-style research activity, in which participants will work in teams to prototype new ideas, methods, tools, and evaluation frameworks around the use of PLMs to produce, access, and share knowledge that people can trust. The gathering is meant to kick-start an interdisciplinary community of interest and practice in deploying advanced AI capabilities to support people in engineering better knowledge graphs for trustworthy, human-centric information services, from search and question answering to recommendations and fact checking.
The technologies involved
We worked with a range of technologies in order to make progress over the 3 days of hacking (day 4 was purely a playback event).
- A wide range of language models featured in attendees' experiments: ChatGPT, GPT-4, LLaMA, OpenLLaMA, PaLM, Bard and Claude.
- Ontologies are often built on semantic web technologies, so inference and reasoning tools such as HermiT and Protégé featured in several researchers' hacks.
- Language model frameworks, such as LangChain, were used to rapidly build complex applications that interact with language models.
The Hack team we took part in
- Team name: "Group A"
- Challenge: To what extent can LLMs support rich knowledge extraction?
What we did
In our project, we prototyped approaches that use Pre-trained Language Models (PLMs), like GPT, to construct knowledge bases. Given the potential of PLMs for extracting and representing information, our objective was to build disambiguated knowledge bases for designated subjects and relations. The core task: given a subject-entity (s) and a relation (r), accurately predict the associated object-entities ({o1, o2, ..., ok}) through PLM prompting.
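To make the task concrete, here is a minimal sketch of the kind of prompting loop this involves. It uses the OpenAI chat completions API (0.x SDK) as an example backend; the prompt wording, the JSON-list output format, and the example triple are illustrative assumptions, not the exact prompts we used at the event.

```python
# Minimal sketch: predict object-entities {o1, ..., ok} for a
# (subject, relation) pair by prompting a chat model.
import json
import openai  # pip install openai (0.x-style SDK assumed here)

# Illustrative prompt template, not the exact wording used at the event.
PROMPT_TEMPLATE = (
    "Given the subject entity and the relation, list every object entity "
    "that completes the triple. Answer with a JSON list of strings only.\n"
    "Subject: {subject}\nRelation: {relation}\nObjects:"
)

def predict_objects(subject: str, relation: str) -> list:
    response = openai.ChatCompletion.create(
        model="gpt-3.5-turbo",
        temperature=0,  # deterministic output helps reproducible evaluation
        messages=[{
            "role": "user",
            "content": PROMPT_TEMPLATE.format(subject=subject, relation=relation),
        }],
    )
    answer = response["choices"][0]["message"]["content"]
    try:
        return json.loads(answer)  # e.g. ["Austria", "Belgium", ...]
    except json.JSONDecodeError:
        return []  # model did not return valid JSON; treat as no answer

print(predict_objects("Germany", "CountryBordersWithCountry"))
```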
Take-home learnings and early insights
Overall, the Group A team gave us insights into a range of potential business capabilities involving both knowledge graphs and language models. We will build on these following further discussions with the King's College London Informatics team and hack participants.
- Useful datasets: The LM-KBC competition series gives us the ability to evaluate our LLM-driven products against a real dataset that we can take to clients. This has the potential to help us address questions around hallucination and the precision of the models we use.
- LLM / PLM prompting isn't easy: There are several elements to consider when designing a prompt to extract the data you need, and there is no "one size fits all". Everything from the task you define within the prompt to the structure of the entire prompt influences the results returned by the model.
- Few-shot learning: An important technique to consider when writing your prompt is few-shot prompting, or in-context learning. This means giving the model a few examples, similar in form to the answer you wish to receive (illustrated in the sketch after this list).
- Chain-of-thought: An interesting technique suggested by another team was chain-of-thought prompting. Within our team we provided all of our few-shot samples within a single prompt. Some work demonstrates that prompting multiple times, with one sample of your task in each prompt, lets the model process each sample individually before answering, essentially giving the model time to "think" before returning an answer to the task you have defined.
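As an illustration of the two techniques above, the sketch below builds a few-shot prompt from worked examples and also shows the one-sample-per-prompt variant. The example triples, the prompt wording, and the `complete(prompt)` helper (a stand-in for whichever LLM you are calling) are illustrative assumptions, not code from the event.

```python
# Sketch of few-shot prompting vs. one-sample-per-prompt, assuming a
# generic `complete(prompt) -> str` helper around your chosen LLM.
# The example triples below are illustrative, not event data.

FEW_SHOT_EXAMPLES = [
    ("France", "CountryBordersWithCountry", ["Spain", "Italy", "Belgium"]),
    ("Japan", "CountryBordersWithCountry", []),  # empty answers matter too
]

def build_few_shot_prompt(subject: str, relation: str) -> str:
    """All worked examples in a single prompt: the approach our team used."""
    lines = ["List the object entities for each subject and relation."]
    for s, r, objects in FEW_SHOT_EXAMPLES:
        lines.append(f"Subject: {s}\nRelation: {r}\nObjects: {objects}")
    lines.append(f"Subject: {subject}\nRelation: {relation}\nObjects:")
    return "\n\n".join(lines)

def predict_per_sample(subjects, relation, complete):
    """One subject per prompt: the model handles each sample individually,
    the 'give the model time to think' variant described above."""
    return {s: complete(build_few_shot_prompt(s, relation)) for s in subjects}
```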
Teams using Data Graphs
A few of the teams used Data Graphs (our knowledge graph platform) so that they could make rapid progress setting up a knowledge graph, complete with visual domain model creation and graph visualization.
- Ontology alignment: Some teams used Data Graphs to explore ontology alignment. This sounds academic, but a problem we frequently solve here at Data Language is data silos, and LLMs appear to have the potential to spot conflicting data and to sort and disambiguate silos of data (see the sketch after this list).
- Visual ontology modelling tool: Teams found that after building an initial ontology with their LLMs, they could view it in our domain model builder tool, making it much easier to get stuck into curating the datasets and concepts associated with their newly generated schemas.
- Graph visualization: The Data Graphs graph visualization tool was a hit with all the teams during demonstrations. For teams working on information extraction, it proved very useful for exploring connections within the dataset, which helped spark ideas and inform decisions when designing prompts for the task.
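To give a flavour of the ontology-alignment idea above, here is a hedged sketch of one way an LLM can help reconcile data silos: asking the model whether two schema terms denote the same concept. The prompt wording and the `complete(prompt)` helper are illustrative assumptions, not part of Data Graphs or the teams' code.

```python
# Illustrative sketch of LLM-assisted ontology alignment: ask the model
# whether two terms from different data silos denote the same concept.
# `complete(prompt) -> str` is an assumed wrapper around your chosen LLM.

ALIGN_PROMPT = (
    "Do these two schema terms refer to the same real-world concept?\n"
    "Term A: {a} (from silo 1)\nTerm B: {b} (from silo 2)\n"
    "Answer 'yes' or 'no', followed by a one-sentence justification."
)

def terms_align(a: str, b: str, complete) -> bool:
    answer = complete(ALIGN_PROMPT.format(a=a, b=b))
    return answer.strip().lower().startswith("yes")

# e.g. terms_align("homeAddress", "residential_address", complete) -> True
```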
Find out more about the Data Graphs Knowledge Graph Workbench here, and try it yourself for free.
What we plan to do next in this area
Building on our initial results with knowledge base construction using Pre-trained Language Models (PLMs), the team's next steps involve refining and expanding the methodologies. We aim to incorporate more diverse data sources to ensure comprehensive and unbiased knowledge extraction. Further, we'll delve deeper into advanced fine-tuning techniques, enhancing our model's capability to discern nuanced relations and handle ambiguities.
What did the King's College London Informatics team think?
This has been one of the most inspiring research events I've attended in many years! We'll publish a report soon, so stay tuned. Congratulations to all participants for some amazing projects on #llms and #knowledgegraphs. - Elena Simperl, Professor of Computer Science at King's College London (link on Twitter)