List of questions
Top mathematical challenges in blockchain technology?
Trends in anomaly detection?
Semi-supervised Anomaly Detection with Human-in-the-Loop
What algorithms are there for incorporating expert human feedback into anomaly detection, especially with auto-encoders, and what are their limitations when scaling to terabytes of data?
● Can one incorporate expert human feedback with anomaly detection for continuous time series data of large networks (e.g. network logs data such as netflow logs)?
● How do you avoid overfitting to known types of anomalies that make up only a small fraction of all events?
● How can you allow for new (yet unknown) anomalies to be discovered by the model, i.e. account for new types of anomalies over time?
● Can Ladder Networks, which were specially developed for semi-supervised learning, be adapted for generic anomaly detection (beyond standard datasets)?
● Can a loss function be specified for an auto-encoder with additional classifier node(s) for rare anomalous events of several types via interaction with the domain expert?
● Are there natural parametric families of loss functions for tuning hyper-parameters, where the loss functions can account for the budgeting costs of distinct sets of humans with different hourly costs and tagging capabilities within a generic human-in-the-loop model for anomaly detection?
Some ideas to start brainstorming:
● For example, the loss function in the last question above could perhaps be justified using notions such as query-efficiency, in the sense of involving only a small amount of interaction with the teacher/domain-expert (Supervised Clustering, NIPS Proceedings, 2010).
● When dealing with time series of large networks that are tall and skinny, perhaps do an SVD of the network data and look at the distances between the dominant singular vectors?
See Questions for AIMDay https://lamastex.github.io/scalable-data-science/sds/research/densityEstimation/saaad/
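One way to make the loss-function question above concrete is a two-term objective: an unsupervised reconstruction term over all events plus a supervised cross-entropy term over the (rare) expert-labeled events only. A minimal numpy sketch; the function name, the weighting scheme, and the single-logit classifier head are all illustrative assumptions, not a proposed implementation:

```python
import numpy as np

def semi_supervised_ae_loss(x, x_hat, y_logits, y_true, labeled_mask, lam=1.0):
    """Combined loss for an auto-encoder with an extra classifier node.

    x, x_hat     : (n, d) inputs and reconstructions (all events)
    y_logits     : (n,) anomaly logits from the classifier node
    y_true       : (n,) expert labels (0 = normal, 1 = anomalous); only
                   entries where labeled_mask is True are trusted
    labeled_mask : (n,) boolean mask of events tagged by the human expert
    lam          : weight trading reconstruction against expert feedback
    """
    # Unsupervised term: mean squared reconstruction error over every event.
    recon = np.mean((x - x_hat) ** 2)
    # Supervised term: cross-entropy only on the expert-labeled subset.
    if labeled_mask.any():
        p = 1.0 / (1.0 + np.exp(-y_logits[labeled_mask]))
        t = y_true[labeled_mask]
        ce = -np.mean(t * np.log(p + 1e-12) + (1 - t) * np.log(1 - p + 1e-12))
    else:
        ce = 0.0
    return recon + lam * ce
```

With no labels the objective reduces to a plain auto-encoder loss, so expert feedback can be folded in incrementally as tags arrive; lam (or a parametric family over it) is where budgeting costs for different taggers could enter.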
Interactive Visualisation for the Human-in-the-Loop
Given the crucial requirement for rich visual interaction between the algorithm and the human-in-the-loop, what are natural open-source frameworks for programmatically enriching this human-algorithm interaction via visual inspection and interrogation (such as SVDs of activations of rare anomalous events, for instance)? For example, how can open-source tools be integrated into Active Learning and other human-in-the-loop anomaly detectors? Some such tools include:
● facets from https://ai.google/pair
● http://projector.tensorflow.org
● https://distill.pub/2016/misread-tsne/
● https://github.com/vegas-viz/Vegas
● ...
Beyond visualising the ML algorithms, the human-in-the-loop often needs to see the details of the raw event that triggered the anomaly. Typically this event needs to be seen in the context of other related and relevant events, including its anomaly score along with historical comparisons of similar events from a no-SQL query. What are some natural frameworks for being able to click on an event of interest (say, one alerted by the algorithm) and visualise the raw event details (usually a JSON record or a row of a CSV file) in order to make an informed decision? Some such frameworks include:
● https://d3js.org/
● https://vega.github.io/vega/
● https://processing.org/
● https://gephi.org/
● http://dygraphs.com/
● https://github.com/vegas-viz/Vegas
See Questions for AIMDay https://lamastex.github.io/scalable-data-science/sds/research/densityEstimation/saaad/
Perturbation theory for the solution of systems of equations
When calibrating curves, there are many systems of equations to solve, and many of them differ only in small constants. Applying root-finding algorithms such as Newton's method to every system from scratch is inefficient and may affect application performance. Instead, the solutions of the remaining systems may be derived from the solution of the first system by applying the inverse function theorem to the Jacobian matrix. Our question concerns whether and under what conditions this can be applied.
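The idea can be sketched numerically: solve the first system with Newton's method, then obtain first-order solutions of nearby systems from the implicit function theorem, dx = -(∂f/∂x)⁻¹ (∂f/∂c) dc. A minimal numpy sketch on a toy family f(x; c) = x² − c (all names hypothetical); the key condition is that ∂f/∂x is invertible at the solution:

```python
import numpy as np

def newton(f, jac, x0, tol=1e-12, max_iter=50):
    """Plain Newton iteration for f(x) = 0."""
    x = np.asarray(x0, dtype=float)
    for _ in range(max_iter):
        step = np.linalg.solve(jac(x), f(x))
        x = x - step
        if np.linalg.norm(step) < tol:
            break
    return x

# Family of systems f(x; c) = 0 differing only in the constant c:
# here f(x; c) = x**2 - c component-wise, so the solution is x*(c) = sqrt(c).
def f(x, c):  return x**2 - c
def jac_x(x): return np.diag(2 * x)       # df/dx
def jac_c(x): return -np.eye(len(x))      # df/dc

c0 = np.array([4.0, 9.0])
x0 = newton(lambda x: f(x, c0), jac_x, np.array([1.0, 1.0]))

# Implicit function theorem: dx/dc = -(df/dx)^{-1} (df/dc), so for a
# nearby constant c1 the solution is, to first order in (c1 - c0):
c1 = c0 + np.array([0.01, -0.02])
x1_approx = x0 - np.linalg.solve(jac_x(x0), jac_c(x0) @ (c1 - c0))
```

The approximation error is second order in the perturbation, so for constants that "differ only in small constants" one Newton solve plus one linear solve per nearby system replaces a full iteration (or serves as a very good warm start).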
Cross-dimensional error terms
Analysis of the error introduced by numerical methods tends to focus on asymptotic behaviour and to ignore higher-order terms. For one-dimensional problems, these results readily allow one to confirm, for finite numbers of nodes, that higher-order terms do not significantly impair precision. For several-dimensional problems, however, correct procedures are less obvious. How do we efficiently investigate any cross-dimensional (potentially higher-order) error terms and their effects for finite numbers of nodes?
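One empirical probe for finite node counts is the observed convergence order from successive grid refinements: if the leading error behaves like C·hᵖ, then log₂ of the ratio of successive differences recovers p, and deviations from the nominal order hint at cross-dimensional or higher-order contributions becoming visible. A sketch (names and integrand are illustrative) using a 2-D trapezoidal rule on an integrand whose cross term couples the dimensions:

```python
import numpy as np

def trap2d(f, n):
    """Composite 2-D trapezoidal rule on [0,1]^2 with n intervals per axis."""
    x = np.linspace(0.0, 1.0, n + 1)
    w = np.ones(n + 1)
    w[0] = w[-1] = 0.5                   # endpoint weights
    X, Y = np.meshgrid(x, x, indexing="ij")
    return (w[:, None] * w[None, :] * f(X, Y)).sum() / n**2

f = lambda x, y: np.exp(x * y)           # cross term couples the dimensions
I = [trap2d(f, n) for n in (8, 16, 32)]

# Observed convergence order from three successive refinements:
# I_h - I_{h/2} ~ (1 - 1/2^p) C h^p, so the ratio of differences is ~ 2^p.
p = np.log2((I[0] - I[1]) / (I[1] - I[2]))
```

If p stays near the nominal order (2 here) over the node counts actually used, higher-order and cross terms are not yet significant at that resolution; a drifting p flags where they start to matter.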
Mapping desired error to numerical solver settings
In making numerical solvers available to users removed from the mathematical problem, who are also subject to real-time constraints, the trade-off between computational performance and accuracy is highly relevant. Exposing arbitrary solver settings as part of the interface is an easy way to surface this trade-off, but it is neither transparent nor dynamic. Are there efficient methods of establishing a priori bounds on the error that could be used to provide a more intuitive interface to users? General approaches are of interest, as well as specific ones for finite difference methods, lattice methods and Monte Carlo methods.
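For Monte Carlo methods specifically, one classical a priori mapping from a desired error to a solver setting is the CLT-based sample size n ≈ (zσ/ε)², where σ is a (possibly estimated) standard deviation of the payoff and ε the desired confidence-interval half-width. A sketch; the function name and quantile table are illustrative, and the bound is probabilistic rather than hard:

```python
import math

def mc_samples_for_error(sigma, eps, confidence=0.95):
    """Monte Carlo sample size so that the standard CLT confidence interval
    has half-width at most eps, given an estimate sigma of the payoff's
    standard deviation.  An a priori guideline, not a hard guarantee."""
    # two-sided standard normal quantiles for a few common confidence levels
    z = {0.90: 1.644854, 0.95: 1.959964, 0.99: 2.575829}[confidence]
    return math.ceil((z * sigma / eps) ** 2)
```

A user-facing interface could then expose "target error" directly and derive the sample count (and hence runtime) from it, instead of exposing a raw iteration-count knob; analogous error-to-step-size maps exist for finite difference and lattice methods via their truncation-error expansions.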
Mathematical methods to validate data
Financial data can come in many shapes and sizes – from tick data on a stock exchange to legal contracts and televised news broadcasts. A common need for treating any type of data is a good toolbox of mathematical methods. We would like to discuss the potential of using predictive algorithms in this area.
Combination of discrete event simulation and optimization
Numerous problems formulated within the area of wood procurement are traditionally solved using deterministic methods. Wood procurement involves questions spanning from the optimal bucking pattern of trees, according to customer demand and the price of the log, to tactical transport planning from forest to mill, often with an intermediate storage point at a landing or train terminal. Thus, we want to deliver a specific log with the right wood quality from the forest to the designated mill, in time, at the optimal price and suitable for the intended product. However, stochastic factors, ranging from weather conditions to market and price fluctuations, frequently disturb the plan, and the optimal delivery plan is suddenly no longer optimal. The method we are looking for should combine the stochastic behaviour of processes, captured by discrete event simulation, with ordinary deterministic optimization. Such a method benefits from the advantages of both optimization and simulation, i.e. optimal solutions derived under uncertainty. We think the approach will simplify the construction and evaluation of stochastic optimization problems, which arise frequently in questions regarding wood flows. The project goal is to evaluate the developed method against a deterministic approach and to quantify how large an effect stochastic events have on an optimized procurement plan of round wood to industry, in volume, quality and time, i.e. the effects of stochastic events on delivery precision and delivery time. Several test cases and real data over many years are available for evaluation from forest companies, as well as climatic and spatial data from the geographic areas, since the forest sector, and the forest companies in particular, have traditionally collected and stored such data over the years.
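The combination asked for is often approached via sample average approximation (SAA): draw scenarios from the simulation model and optimize the average cost over them, rather than optimizing against a single deterministic forecast. A toy single-period sketch, with all numbers, costs and names made up for illustration, comparing an SAA plan against a plan optimized for mean demand only:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical single-period delivery problem: choose a planned volume q;
# undersupply costs `short` per unit (missed deliveries), oversupply costs
# `over` per unit (storage at landing/terminal).  Demand is stochastic.
short, over = 4.0, 1.0
demand = rng.normal(100.0, 20.0, size=5000)      # simulated scenarios

def expected_cost(q, scenarios):
    """Average cost of plan q over the simulated demand scenarios."""
    return np.mean(short * np.maximum(scenarios - q, 0)
                   + over * np.maximum(q - scenarios, 0))

# Sample-average approximation: optimize against the simulated scenarios.
grid = np.linspace(50.0, 150.0, 1001)
q_saa = grid[np.argmin([expected_cost(q, demand) for q in grid])]

# Deterministic plan: optimize against the mean demand only.
q_det = demand.mean()
```

Because shortage is costlier than storage here, the SAA plan hedges above the mean; comparing expected_cost(q_saa) with expected_cost(q_det) on the same scenario set quantifies exactly the kind of gap between stochastic and deterministic planning the project aims to measure, with the grid search standing in for a real optimization model.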
How can mathematical techniques improve the quality of production-related data?
The scope is based on process data originating from paper and/or pulp production units. The data sources contain time-based observations coming from sensors, equipment status, lab tests etc. (time series data). The set of tags (device measurements) can scale up to 100,000 in a single production unit, with a time frequency in the range of sub-seconds.
The problem: The quality of the data is a key factor in taking advantage of advanced process analytics, including prediction models for preventive maintenance (prediction of breaks), condition monitoring, and prediction of product quality properties. Several factors affect the quality of data, some of them inherent to asset age, instrument maintenance procedures, calibration etc. Excluding such external factors and concentrating purely on mathematical techniques, consider the following questions.
- Is it possible to spot an intermittent data failure in an instrument based solely on another instrument of the same characteristics (testimony)? Take into consideration that the same instrument/device could be installed in a different place and have a different function.
- How can white noise be reliably spotted in time series data originating from such devices? How would the presence of white noise affect prediction models?
- Outliers and short-time data loss can be frequent in some areas or with some types of devices. Would it be feasible and reliable to use a Kalman filter to replace/correct observations? How would that affect prediction models in comparison to smoothing or simply removing outliers?
- Is there a general technique to measure the reliability of the collected data in order to establish quality-level thresholds for further data processing?
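The Kalman-filter question can be prototyped with a simple one-dimensional random-walk filter that gates observations by their innovation: a reading more than a few standard deviations from the prediction is replaced by the prediction instead of being assimilated. A minimal sketch; the random-walk model, noise variances and gate threshold are illustrative assumptions that would need tuning per tag:

```python
import numpy as np

def kalman_clean(z, q=1e-3, r=1e-1, gate=3.0):
    """1-D random-walk Kalman filter that replaces gated outliers.

    z    : raw sensor readings
    q, r : assumed process and measurement noise variances
    gate : reject a reading whose innovation exceeds `gate` standard
           deviations; the model prediction is used in its place
    """
    x, p = float(z[0]), 1.0
    out = np.empty(len(z), dtype=float)
    for i, zi in enumerate(z):
        # predict (random walk: state unchanged, uncertainty grows by q)
        p = p + q
        innov = zi - x
        s = p + r                        # innovation variance
        if innov**2 > (gate**2) * s:     # gated outlier: skip the update
            out[i] = x
            continue
        k = p / s                        # Kalman gain
        x = x + k * innov
        p = (1.0 - k) * p
        out[i] = x
    return out
```

Running the same prediction models on raw, smoothed, outlier-removed and Kalman-corrected versions of a few noisy tags would directly answer the comparison question posed above.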
How does one best parallelise a medium-sized least-squares problem on a cloud cluster, under the condition that each node works as independently (asynchronously) as possible from the other nodes?
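One decomposition that lets nodes work almost independently is to split the rows of the design matrix across nodes, have each node form the small Gram summaries AᵢᵀAᵢ and Aᵢᵀb locally, and sum these contributions in any order (so stragglers only delay the final reduction) before one small solve of the normal equations. A numpy sketch simulating the node-local work in a single process, with the cluster and communication layer omitted:

```python
import numpy as np

rng = np.random.default_rng(1)
A = rng.normal(size=(10_000, 20))                  # medium-sized, tall-skinny
b = A @ rng.normal(size=20) + 0.01 * rng.normal(size=10_000)

# Each "node" holds a horizontal block (A_i, b_i) and independently computes
# the small summaries A_i^T A_i (20x20) and A_i^T b_i (20,).  Only these are
# communicated, and the sums can be accumulated asynchronously in any order.
blocks = np.array_split(np.arange(len(b)), 8)      # 8 simulated nodes
G = sum(A[i].T @ A[i] for i in blocks)             # sum of A_i^T A_i
c = sum(A[i].T @ b[i] for i in blocks)             # sum of A_i^T b_i

x = np.linalg.solve(G, c)                          # normal equations
```

The caveat is conditioning: the normal equations square the condition number of A, so for ill-conditioned problems a distributed tall-skinny QR (TSQR) gives the same communication pattern with better numerical behaviour.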
Building a football Artificial Intelligence
We are working on a project to build an ‘Artificial Intelligence’ that can make sensible comments about football live during a match. We have a feed of everything that happens on the ball during a football match. We can evaluate the quality of passes, shots and defensive actions using a method based on Markov chains. What we would like to do is use this input data to output sentences in English which describe the performance of players and the team. The approach is a combination of machine learning, where patterns from previous matches are learnt, and a human built system based on ‘if…then’ like structures. The question is how we can best model the data and give the AI a human feel.
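The Markov-chain valuation mentioned above can be illustrated with a tiny absorbing-chain model: transient states are pitch zones, absorbing states are "goal" and "loss of possession", and each zone's value is its probability of eventually producing a goal, computed via the fundamental matrix. All transition probabilities below are made up for illustration and not taken from the real feed:

```python
import numpy as np

# Hypothetical 3-zone pitch model (defence, midfield, attack) plus two
# absorbing states.  Q holds transient-to-transient transition
# probabilities; R holds transient-to-absorbing probabilities.
Q = np.array([[0.5, 0.3, 0.0],
              [0.2, 0.4, 0.3],
              [0.0, 0.3, 0.4]])
R = np.array([[0.00, 0.20],    # from defence:  P(goal), P(loss)
              [0.01, 0.09],    # from midfield
              [0.15, 0.15]])   # from attack

# Fundamental matrix N = (I - Q)^{-1}; scoring probability per zone is the
# probability of absorption in the "goal" state (first column of R).
N = np.linalg.inv(np.eye(3) - Q)
v = N @ R[:, 0]

# A pass is valued by the change in scoring probability it produces,
# e.g. a completed pass from midfield into the attacking zone:
pass_value = v[2] - v[1]
```

Zone values of this kind give the numeric input for the commentary layer; the 'if…then' system can then map thresholds on pass_value to natural-language templates ("a dangerous ball into the box" vs. "a safe sideways pass").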