List of questions
The foundation of the AIMday concept is to match questions from industry with relevant research competence. These questions become the subject of discussion in small, multi-disciplinary groups, with one hour devoted to each question.
You can hear from previous AIMday participants here and read about our recent AIMday event here. If you have any questions, please email the UK AIMday Hub at EPSRCIAA@ei.ed.ac.uk.
- 5922
When we are trying to baseline data for Natural Capital projects – what is ‘good enough’, and can more novel computing, such as satellite data with AI and machine learning, help reduce the cost of reliable natural capital baselining?
There is a lot of focus on “additionality” for biodiversity and carbon projects, which requires us to invest in baselining data at a very early stage (sometimes pre-investment); that can be a major challenge. What data is cheaply and easily available, and how can we then build on that data as contract discussions and projects develop?
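Freely available optical satellite imagery is one candidate for a cheap first-pass baseline. As a minimal illustration only (the band values below are made-up toy arrays, and Sentinel-2 is just one possible source), a vegetation index such as NDVI can be computed per pixel and tracked between survey dates:

```python
import numpy as np

def ndvi(nir: np.ndarray, red: np.ndarray) -> np.ndarray:
    """Normalised Difference Vegetation Index from near-infrared and red reflectance."""
    nir = np.asarray(nir, dtype=float)
    red = np.asarray(red, dtype=float)
    return (nir - red) / (nir + red + 1e-9)   # small epsilon avoids division by zero

# Toy 2x2 reflectance rasters; real inputs could be e.g. Sentinel-2 bands B8 (NIR) and B4 (red)
nir_band = np.array([[0.45, 0.50], [0.30, 0.42]])
red_band = np.array([[0.10, 0.12], [0.20, 0.11]])
print(ndvi(nir_band, red_band))   # values close to 1 indicate dense vegetation
```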
- 5871
Is the lack of technology for transmitting high volumes of data through free space, at rates achievable with fibre optic systems, a barrier to deploying digital technology in the field?
Defence, security and civil protection applications need to be deployed in areas where communication by fibre optic networks will not be available. Mobile communication networks may also have limited availability/capacity. Free space communication systems forming a local network would be needed. Is this technology emerging? Is it emerging fast enough?
- 5874
A barrier to deploying complex digital technology in the field is the availability of power compared to the power demands of the system. What enabling technologies for reducing power consumption will emerge in the next 5 years or so?
Anything which is deployed in the field, e.g. on an aircraft or road vehicle, or set up in a remote location, has a power budget limited by what is available from the engines, batteries or generator on hand. Many digital applications developed in the lab/office use power-hungry high-performance computing. To be taken into more challenging environments, low-power/high-performance systems are needed.
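To make the constraint concrete, a back-of-envelope power budget is often the starting point. All figures in this sketch are hypothetical placeholders chosen only to show the calculation:

```python
# Hypothetical field power budget: runtime is set by battery capacity over total draw.
battery_capacity_wh = 500.0      # portable battery pack
compute_load_w = 65.0            # embedded accelerator running inference
sensors_and_comms_w = 15.0       # radios, sensors, housekeeping

total_draw_w = compute_load_w + sensors_and_comms_w
print(f"Runtime: {battery_capacity_wh / total_draw_w:.1f} h")            # roughly 6 h here

# Halving compute draw (e.g. quantised models on a low-power accelerator) buys field time
low_power_draw_w = compute_load_w / 2 + sensors_and_comms_w
print(f"Runtime at half compute draw: {battery_capacity_wh / low_power_draw_w:.1f} h")
```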
- 5910
What is stopping end users controlling more model parameters when interacting with consumer AI?
Many products use machine learning, but user interfaces have rarely changed. Recently, Microsoft Bing added a feature that allows users to specify whether responses should be “precise” or “creative”. Yet much machine learning simply returns results from a black box, for example the YouTube recommendation algorithm. This is frustrating for end users who want to bias results in certain directions. Moonsift is building a user interface for an “AI-powered product discovery engine”, with a focus on fashion and homeware. We would be keen to hear from ML experts what they think limits the parameters end users are able to control in order to influence how these models run.
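One concrete example of such a parameter is sampling temperature, which is broadly what settings like Bing's “precise”/“creative” toggle expose to the user. A minimal sketch (the scores are hypothetical, and this is not Moonsift's or YouTube's actual system) of how a single user-facing dial reshapes which results are surfaced:

```python
import numpy as np

_rng = np.random.default_rng()

def sample_with_temperature(logits, temperature: float) -> int:
    """Sample an item index from model scores; low temperature ~ 'precise', high ~ 'creative'."""
    scaled = np.asarray(logits, dtype=float) / max(temperature, 1e-6)
    probs = np.exp(scaled - scaled.max())        # numerically stable softmax
    probs /= probs.sum()
    return int(_rng.choice(len(probs), p=probs))

scores = np.array([2.0, 1.0, 0.5, -1.0])         # hypothetical relevance scores for four products
print(sample_with_temperature(scores, temperature=0.2))  # nearly deterministic: top item dominates
print(sample_with_temperature(scores, temperature=2.0))  # flatter distribution: more exploration
```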
- 5864
What is the best way to prioritise and present relevant data to frontline police officers, and to evaluate the effectiveness of any new approach?
Police Officers routinely attend locations and incidents where data is held about the persons or places involved. This data can include previous offending history, repeat caller or victim information, vulnerability data, for example whether the individual has gone missing previously or has mental health problems, or data about the location such as whether children live at the address. Other datasets held by partner agencies, for instance relating to addiction, may also be accessible under certain circumstances.
Police Officers now carry mobile devices connected to the police network, meaning they can access (or will in the future be able to access) the above data at the scene or before arriving. However, the amount of data can be overwhelming and not necessarily relevant. Police forces around the world that have attempted to prioritise in these instances have often used a weighting of some kind to produce a score that determines which data is displayed and used. For example, this could be via a Crime Harm Index (scoring offences for severity based on sentencing data) or a Recency/Frequency/Gravity score.
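To make the weighting idea concrete, a toy Recency/Frequency/Gravity-style score might look like the sketch below; the fields, decay curve and weights are entirely illustrative, not any force's actual scheme:

```python
from dataclasses import dataclass

@dataclass
class OffenceRecord:
    days_since: int        # recency
    count_last_year: int   # frequency
    harm_weight: float     # gravity, e.g. a sentencing-based Crime Harm Index weight

def priority_score(records: list[OffenceRecord]) -> float:
    """Illustrative Recency/Frequency/Gravity score; weights are arbitrary placeholders."""
    score = 0.0
    for r in records:
        recency = 1.0 / (1.0 + r.days_since / 30.0)   # decays over months
        score += recency * r.count_last_year * r.harm_weight
    return score

history = [OffenceRecord(days_since=14, count_last_year=3, harm_weight=5.0),
           OffenceRecord(days_since=400, count_last_year=1, harm_weight=20.0)]
print(round(priority_score(history), 2))   # higher scores surface first on the officer's device
```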
We would like to examine this problem from multiple angles – what is possible, what is desirable, what the potential unintended consequences are, and how we would measure success. Becoming data-driven is difficult, and democratising data doesn’t automatically translate into better decisions. We hope this discussion can lead to some tangible approaches we might take to look at this issue.
- 5867
How to analyse and categorise voice files on an ongoing basis?
As a police force, we receive a huge number of calls each year. A large number of these calls do not result in an incident being created, and as such we do not currently have a way of categorising the demand this places on our service advisors (there’s no data being produced off the back of them). Calls that do not result in an incident may include members of the public looking to speak with a specific Officer or leave a message, asking for advice, enquiring about a device being held, returning a call, or many other reasons. Because there is no category data about these calls, it is difficult at present to know if this demand could be best served in other ways.
We would like to use the voice recordings we have of these calls in order to 1) create categories, 2) categorise historic demand of this type and 3) categorise these calls on a dynamic, ongoing basis. Our idea is that we could convert these recordings to text, create a labelled dataset, and then use that to train a dynamic model. We are hoping that this session could tease out any issues with that approach and suggest alternatives if necessary.
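A minimal sketch of the classification step in that approach, assuming transcripts already exist from a speech-to-text service and a small hand-labelled sample is available (the categories and phrases below are invented):

```python
# Sketch only: speech-to-text itself would come from a separate service or model.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Hypothetical labelled transcripts built from a manually reviewed sample of calls
transcripts = [
    "could I leave a message for the officer dealing with my case",
    "I want to ask for advice about a neighbour dispute",
    "I'm returning a missed call from this number",
    "when can I collect the phone you are holding",
]
labels = ["leave_message", "advice", "callback", "property_enquiry"]

model = make_pipeline(TfidfVectorizer(ngram_range=(1, 2)), LogisticRegression(max_iter=1000))
model.fit(transcripts, labels)

# Classify a new, unseen transcript into one of the learned categories
print(model.predict(["can you pass a message to PC Smith please"]))
```

In practice the labelled sample would need to be large and representative enough to cover seasonal and regional variation in call types before the model is used on an ongoing basis.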
- 5904
How can we develop data and technology supported processes which enable better community-led engagement and research?
We’re interested in how we can better facilitate community-led activities (for example, we help fund and support over 40 community research projects across the Highlands and Islands). This could cover anything from better-quality data protection in which communities own their own data in a supported way, to digital ways of making consent much more engaging, to making things like payment and expense claims more accessible and immediate (while still respecting due diligence), since we work with communities who are often marginalised and cannot necessarily pay in advance or wait long to be reimbursed for travel expenses. There are lots of themes that could be explored if this is relevant, and we are happy to expand on this. You can see examples of the type of work we support here: https://www.scienceceilidh.com/hicommunityresearchnetwork
- 5877
What is the optimal set of environmental parameters required to produce SEPA’s annual water quality classification?
SEPA collects a large set of environmental data that contributes to an overall water quality classification scheme. As a way of optimising this scheme from a cost and manpower angle, we are interested in how we can develop the most effective suite of parameters.
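One way to frame "the most effective suite of parameters" is as a feature selection problem: rank candidate determinands by how much signal they contribute to the final classification, then weigh that against sampling cost. A minimal sketch with invented column names and values (not real SEPA data):

```python
# Rank candidate parameters by how much they inform the classification (toy data).
import pandas as pd
from sklearn.ensemble import RandomForestClassifier

df = pd.DataFrame({
    "dissolved_oxygen":   [9.1, 7.8, 5.2, 8.9, 6.0, 9.4],
    "ammonia":            [0.02, 0.10, 0.45, 0.03, 0.30, 0.01],
    "phosphorus":         [0.01, 0.05, 0.12, 0.02, 0.09, 0.01],
    "invertebrate_score": [0.9, 0.7, 0.3, 0.85, 0.4, 0.95],
})
classification = ["High", "Good", "Poor", "High", "Moderate", "High"]

model = RandomForestClassifier(n_estimators=200, random_state=0).fit(df, classification)
importances = pd.Series(model.feature_importances_, index=df.columns).sort_values(ascending=False)
print(importances)   # a starting point for deciding which parameters justify their monitoring cost
```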
Here is the SEPA interactive water quality classification tool to give you a flavour.
https://www.sepa.org.uk/data-visualisation/water-classification-hub
- 5880
Can SEPA predict river LEVEL and FLOW using AI/ML techniques on SEPA hydrologic and geospatial datasets? What is the optimal set of parameters required?
SEPA has a network of approximately 500 river gauging stations across the country. These are located to represent the hydrology of major river catchments. Can we use other environmental datasets to generate a spatially distributed, higher-resolution prediction of levels and flows?
https://www2.sepa.org.uk/waterlevels/default.aspx?sm=t
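A minimal sketch of how such a prediction could be framed as supervised regression; the features and numbers below are invented placeholders, and real inputs might include rainfall radar, catchment descriptors and upstream gauge readings:

```python
# Toy spatially distributed flow estimate posed as a regression problem.
import numpy as np
from sklearn.ensemble import RandomForestRegressor

X = np.array([
    # rainfall_mm_24h, catchment_km2, mean_slope, upstream_gauge_m3s
    [12.0, 150.0, 0.05, 3.2],
    [30.0, 150.0, 0.05, 8.1],
    [5.0,  600.0, 0.02, 10.5],
    [22.0, 600.0, 0.02, 18.0],
    [40.0,  80.0, 0.10, 6.7],
])
flow_m3s = np.array([4.0, 9.5, 12.0, 21.0, 8.3])   # observed flow at gauged sites

model = RandomForestRegressor(n_estimators=300, random_state=0).fit(X, flow_m3s)
print(model.predict([[18.0, 300.0, 0.03, 7.0]]))    # estimate at an ungauged location
```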
- 5912
Data linkage – how might we automatically find suitable entities on which to link disparate organisations’ datasets?
For example, where two public sector organisations are allowed by law and ethics to combine their data for better public outcomes, how might we reduce the complexity and cost of linking that data to mutually inform their services?
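A toy sketch of one linkage approach: normalise a shared quasi-identifier, score candidate pairs, and pass likely matches to clerical review. The fields, weights and threshold are illustrative assumptions only; real deployments would add blocking, more fields and a calibrated threshold:

```python
# Toy probabilistic record linkage on name + date of birth using only the standard library.
from difflib import SequenceMatcher

def normalise(name: str) -> str:
    cleaned = name.lower().replace("-", " ").replace(",", " ")
    return " ".join(sorted(cleaned.split()))   # token sort makes "Surname, Forename" comparable

def match_score(rec_a: dict, rec_b: dict) -> float:
    name_sim = SequenceMatcher(None, normalise(rec_a["name"]), normalise(rec_b["name"])).ratio()
    dob_match = 1.0 if rec_a["dob"] == rec_b["dob"] else 0.0
    return 0.7 * name_sim + 0.3 * dob_match    # weights are arbitrary placeholders

org1 = {"name": "McDonald, Anne-Marie", "dob": "1984-03-02"}
org2 = {"name": "anne marie mcdonald",  "dob": "1984-03-02"}
print(match_score(org1, org2) > 0.8)       # True here: candidate link for clerical review
```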
- 5915
Identity – without compromising privacy in an (ideally decentralised) identity scheme, how might networks of private identities vouching for one another provide the same reassurance as a credential from a known organisation?
For example, how might we get the same levels of protection against bots and fraud when dealing with unbanked citizens or refugees, as when dealing with someone who has a fixed address and bank account?
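As a purely illustrative sketch of the vouching idea (not a proposed scheme), trust could be seeded by a small number of anchor credentials and propagated, with damping, along vouches; all structure and numbers below are invented:

```python
# Toy trust propagation over a vouching graph.
vouches = {            # voucher -> people they vouch for
    "alice": ["dana"],
    "bob": ["dana", "erin"],
    "dana": ["erin"],
}
trust = {"alice": 1.0, "bob": 1.0, "dana": 0.0, "erin": 0.0}   # alice and bob hold anchor credentials

DAMPING = 0.5
for _ in range(5):                                 # iterate until scores settle
    for voucher, vouched in vouches.items():
        for person in vouched:
            trust[person] = max(trust[person], DAMPING * trust[voucher])

print(trust)   # dana and erin each reach 0.5 via a trusted voucher
```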
- 5918
Dynamic ethics – how might Open Government mechanisms (like those in the SG Participation Framework) use data and technology to maintain visibility of a changing public opinion, or even to foster informed and inclusive public deliberation at scale with widely shared conclusions that register in the public consciousness?
For example, how might we keep track of and encourage neutrally-informed public perception of what is ethical for data sharing, AI and digital identity?
- 5886
How can robotics, drones and AI be used to accelerate controlled environment agriculture (e.g. vertical farming) to help increase the UK’s local food production?
Vertical farming is an innovative approach to agriculture that involves growing crops in vertically stacked layers, using advanced technologies such as artificial lighting, controlled environment agriculture, and aeroponics. Adopting automation technologies such as robotics and AI has also revolutionised vertical farming. Robotic systems can monitor and adjust environmental conditions, plant nutrients, and lighting in real time while identifying and removing diseased plants, optimising plant growth and minimising labour costs. This is one of the fields where our AI and data science expertise is highly relevant and needed, and where it supports environmental sustainability.
However, vertical farming faces several challenges, including high initial investment costs. Establishing vertical farms requires a significant upfront investment in infrastructure, technology, and equipment. Furthermore, vertical farming requires much energy to power the lighting and other systems, which can be expensive and environmentally unsustainable.
How can we measure the effectiveness and return on investment of using robotics and AI in building vertical farms, and in controlled environment agriculture in general? How can the break-even point be determined? Optimised models that improve the main KPIs could make vertical farming a more attractive and affordable investment, which in turn could increase the UK’s local food production. This would reduce dependency on imports, contributing to environmental goals such as net-zero carbon by cutting transportation and limiting food waste.
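As a starting point for the break-even question, a back-of-envelope calculation might look like the sketch below; every figure is a hypothetical placeholder rather than a market estimate:

```python
# Toy break-even and ROI calculation for an automation investment in a vertical farm.
capex = 250_000.0                 # robotics/AI system, installation, integration
annual_saving = 60_000.0          # reduced labour, crop loss avoided, energy optimisation
annual_extra_opex = 12_000.0      # maintenance, software licences, added electricity

net_annual_benefit = annual_saving - annual_extra_opex
break_even_years = capex / net_annual_benefit
roi_5yr = (5 * net_annual_benefit - capex) / capex

print(f"Break-even after {break_even_years:.1f} years")   # a little over 5 years on these assumptions
print(f"5-year ROI: {roi_5yr:.0%}")                        # slightly negative on these assumptions
```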
- 5889
What are the best practices to ensure that the multivariate relationships present in real-world data are preserved as part of the synthetic data generation process?
Synthetic data generation is becoming key for many industrial fields where innovative concepts for self-driven data science need to be tested but real data is unavailable due to sensitivity, confidentiality, regulations or volume.
There are several popular approaches to generating synthetic data, such as applying pre-defined distributions or neural network techniques (such as GANs and VAEs). Common challenges include the need to replicate outliers manually, the need for subject-domain expertise, and quality control.
Our main interest is to explore the best techniques and practices to ensure multivariate relationships are replicated. A simple illustration can be found in the financial industry, where tabular data is used for scorecard modelling, for instance. Logically, a customer who has large and increasing unsecured borrowing, and declining or unchanged income, will likely have a poor credit score. However, without intervention, these multivariate relationships could easily be lost during the synthetic data generation stages. Thus, a key question is: how can we automate the identification and preservation of all key relationships?
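A minimal sketch of one such check: compare the pairwise correlation structure of the real and synthetic tables and flag relationships that drift. The data below is invented, and the "synthetic" set is deliberately naive (independent column sampling) so that a lost relationship is visibly caught:

```python
# Check that pairwise (Spearman) relationships survive synthetic generation (toy data).
# Real pipelines would also test conditional and higher-order structure.
import numpy as np
import pandas as pd

rng = np.random.default_rng(42)
income = rng.normal(30_000, 8_000, 1_000)
unsecured_debt = 0.4 * (60_000 - income) + rng.normal(0, 3_000, 1_000)   # debt rises as income falls
real = pd.DataFrame({"income": income, "unsecured_debt": unsecured_debt})

# Stand-in "synthetic" set built by sampling each column independently, losing the relationship
synthetic = real.apply(lambda col: rng.permutation(col.values))

drift = (real.corr(method="spearman") - synthetic.corr(method="spearman")).abs()
print(drift)   # large off-diagonal entries flag relationships the generator failed to preserve
```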
- 5892
How can we ensure data privacy for individual clients when delivering AI-focused projects (such as those using Large Language Models), without necessitating individual client-hosted instances for inferencing?
As more and more projects look to implement AI, data privacy has become a paramount concern for both businesses and their clients. When utilising AI to generate insights and make data-driven decisions, ensuring the privacy of our individual clients’ data presents a significant challenge. Currently, one approach to address this concern is to have each client host their own instances needed for inferencing, which, while effective in ensuring data privacy, raises issues related to feasibility, cost, and computational resources. As such, we are looking for innovative solutions that can allow us to maintain the benefits of AI, such as Large Language Models (LLMs), while ensuring data privacy, without resorting to client-hosted instances.
This question is particularly relevant given the growing emphasis on data protection regulations and the ever-increasing scale and capabilities of LLMs and AI in general. Potential discussions could include, but are not limited to, techniques for anonymisation, federated learning, or privacy-preserving AI methodologies.
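As one illustration of the federated direction, a toy federated-averaging loop is sketched below: each client fits a trivial model on its own data and only parameters leave the client. This is a conceptual sketch under those assumptions, not a production privacy guarantee (and LLM-scale systems would need far more machinery):

```python
# Toy federated averaging: clients share parameters, never raw records.
import numpy as np

def local_fit(x: np.ndarray, y: np.ndarray) -> np.ndarray:
    """Least-squares slope and intercept trained on one client's private data."""
    X = np.column_stack([x, np.ones_like(x)])
    coeffs, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coeffs

rng = np.random.default_rng(0)
client_models = []
for _ in range(3):                              # three clients whose data never leaves them
    x = rng.uniform(0, 10, 50)
    y = 2.0 * x + 1.0 + rng.normal(0, 0.5, 50)
    client_models.append(local_fit(x, y))

global_model = np.mean(client_models, axis=0)   # the server only ever sees parameters
print(global_model)                             # approximately [2.0, 1.0]
```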
- 5907
How can we make the best use of data in the R&D domain for early adoption of machine learning and artificial intelligence techniques? [R&D data is generally characterised by limited and strongly imbalanced data sets, which makes traditional ML training and development techniques challenging.] What techniques could be applicable, and how can we best access and deploy them?
STMicroelectronics designs and develops technologies for semiconductor devices. The Imaging Division specifically works on design and technology development for imaging and photonics sensors. We are interested in, and investigating, how to use AI and ML to help augment, streamline or support our R&D processes.
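A minimal sketch of two common mitigations for small, imbalanced datasets, class weighting and stratified cross-validation, on invented data (not ST measurements):

```python
# Toy example: rare "defect" class handled with class weights and stratified folds.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, cross_val_score

rng = np.random.default_rng(1)
X = rng.normal(size=(120, 5))                      # e.g. wafer-level measurement features
y = np.array([1] * 12 + [0] * 108)                 # only 10% positives

model = LogisticRegression(class_weight="balanced", max_iter=1000)
scores = cross_val_score(model, X, y, cv=StratifiedKFold(n_splits=5), scoring="balanced_accuracy")
print(scores.mean())   # balanced accuracy is less flattering than raw accuracy on skewed data
```

Other options worth discussing include oversampling or augmentation of the minority class, transfer learning from related processes, and Bayesian or simulation-based approaches that cope better with very small samples.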
- 5895
AI is only as good as the input code or data. How do we enable best practice when it comes to data quality, data evaluation, data architecture, and data input?
The Data Shed supports companies in transforming their business, using data to drive value and insight. The company wants to instigate a discussion around trustworthy, explainable AI built on better and more meaningful data input. Follow-on questions include:
a) Who should be accountable for data accuracy for an AI platform or product?
b) Should the responsibility for evaluating decisions from the AI model lie with the creator of the AI tool (and trained datasets) or the end user of the AI product (with input of further new trained datasets)?
c) How do we support customers or AI product developers to understand their responsibilities around explainable AI?
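As one concrete flavour of "best practice" around data quality and evaluation, lightweight automated quality gates can run before data reaches any model. The checks, thresholds and column names below are illustrative placeholders:

```python
# Toy data-quality gates evaluated before model training or inference.
import pandas as pd

df = pd.DataFrame({
    "customer_id": [1, 2, 2, 4],
    "age": [34, -1, 51, 47],
    "signup_date": ["2023-01-05", "2023-02-10", None, "2023-03-01"],
})

checks = {
    "no_duplicate_ids": df["customer_id"].is_unique,
    "age_in_plausible_range": df["age"].between(0, 120).all(),
    "signup_date_complete": df["signup_date"].notna().mean() >= 0.99,
}

failures = [name for name, passed in checks.items() if not passed]
print(failures or "all checks passed")   # on this toy data all three checks fail
```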
- 5885
Is there an approach for calculating an organisation’s waste-related Scope 3 GHG footprint that uses conversion factors which incorporate emissions from treatment in an appropriate way?
Standard industry-approved GHG reporting approaches (e.g. GRI, CDP) refer to the use of the GHG Protocol for calculating corporate Scope 3 GHG emissions. These protocols highlight the use of the UK government emissions factors within these calculations.
In line with GHG Protocol Guidelines, these factors consider transport to an energy recovery or materials reclamation facility only (for combustion and recycling) and collection, transportation and landfill emissions (‘gate to grave’) for landfill. The emissions from the waste management process (e.g. energy recovery, recycling, composting, anaerobic digestion) are attributed to the user of the recycled materials, rather than the producer of the waste.
Whilst we accept that this is the industry standard approach, we note that there is a disconnect with good sustainability practice, specifically with the application of circular economy principles and ultimately with the waste hierarchy. The result is that there is no visible change in carbon emissions for companies based on their choice of end-of-life route: e.g. an emissions factor of 21.280 kg CO2e per tonne of any waste metal, waste paper, waste electrical equipment or waste plastic is applied regardless of whether the waste is recycled or burned. This appears to be both conceptually and scientifically incorrect.
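To illustrate the disconnect numerically, the sketch below applies the quoted factor of 21.280 kg CO2e per tonne to hypothetical tonnages; the result is identical whatever end-of-life route is chosen:

```python
# Toy waste-related Scope 3 calculation in the current style: one factor, any fate.
FACTOR_KG_CO2E_PER_TONNE = 21.280      # factor quoted above; tonnages below are hypothetical

waste_streams_tonnes = {
    "metal_recycled": 120.0,
    "plastic_to_energy_recovery": 45.0,
    "paper_recycled": 200.0,
}

scope3_kg = sum(t * FACTOR_KG_CO2E_PER_TONNE for t in waste_streams_tonnes.values())
print(f"Waste-related Scope 3: {scope3_kg / 1000:.2f} tCO2e")
# Switching any stream between recycling and incineration leaves this figure unchanged,
# which is the disconnect described above.
```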
- 5901
How can we use better data science techniques or digital technologies to accurately measure and model the reactive surface area for enhanced rock weathering?
Enhanced rock weathering accelerates the naturally occurring but very gradual weathering process (driven, for example, by rain) to remove carbon dioxide from the atmosphere. When it comes to modelling weathering rates, reactive surface area plays a critical role. However, the industry standard of using BET measurements (multi-point measurement of an analyte’s specific surface area (m²/g) through gas adsorption analysis) is far from accurate for quantifying large stockpiles of powdered rock. The work requires ‘digital’ data capture using scanning electron microscopy and computationally intensive three-dimensional modelling to predict CO2 removal rates. Can a new data science approach or digital technology be developed to optimise this measurement? Further information can be found here: https://www.bbc.com/news/science-environment-65648361
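One cheap, scalable point of comparison sometimes considered is a geometric surface area estimate derived from a measured particle-size distribution, assuming smooth spheres; this is a deliberate simplification (BET values are typically higher because of roughness and porosity, which is part of the gap to close), and the numbers below are illustrative only:

```python
# Geometric specific surface area from a particle-size distribution, assuming smooth spheres:
# SSA = 6 / (density * diameter) for each size bin, mass-weighted across the distribution.
import numpy as np

density_kg_m3 = 3000.0                                     # typical basalt density
diameters_m = np.array([5e-6, 20e-6, 50e-6, 100e-6])       # size bins, e.g. from laser diffraction
mass_fraction = np.array([0.1, 0.3, 0.4, 0.2])             # hypothetical distribution (sums to 1)

ssa_per_bin_m2_per_kg = 6.0 / (density_kg_m3 * diameters_m)
ssa_m2_per_kg = float(np.sum(mass_fraction * ssa_per_bin_m2_per_kg))
print(f"Geometric SSA: {ssa_m2_per_kg / 1000:.3f} m^2/g")  # compare against BET measurements
```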