List of questions
 1497
Top mathematical challenges in blockchain technology?
None.
 1498
Trends in anomaly detection?
None.
 1543
Semi-supervised Anomaly Detection with Human-in-the-Loop
● What algorithms are there for incorporating expert human feedback into anomaly detection, especially with autoencoders, and what are their limitations when scaling to terabytes of data?
● Can one incorporate expert human feedback with anomaly detection for continuous time series data of large networks (e.g. network log data such as NetFlow logs)?
● How do you avoid overfitting to known types of anomalies that make up only a small fraction of all events?
● How can you allow for new (as yet unknown) anomalies to be discovered by the model, i.e. account for new types of anomalies over time?
● Can Ladder Networks, which were specially developed for semi-supervised learning, be adapted for generic anomaly detection (beyond standard datasets)?
● Can a loss function be specified for an autoencoder with additional classifier node(s) for rare anomalous events of several types via interaction with the domain expert?
● Are there natural parametric families of loss functions for tuning hyperparameters, where the loss functions can account for the budgeting costs of distinct sets of humans with different hourly costs and tagging capabilities within a generic human-in-the-loop model for anomaly detection?
Some ideas to start brainstorming:
● For example, the loss function in the last question above could perhaps be justified using notions such as query-efficiency, in the sense of involving only a small amount of interaction with the teacher/domain expert (Supervised Clustering, NIPS Proceedings, 2010).
● Do an SVD of the network data when dealing with time series of large networks that are tall and skinny, and look at the distances between the dominant singular vectors, perhaps?
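The SVD idea in the last bullet could be sketched roughly as follows. This is only an illustration on synthetic data: the window shapes, the choice of k = 2 dominant vectors, and the use of the largest principal angle as the "distance" are all assumptions, not something the question prescribes.

```python
import numpy as np

def dominant_subspace(window, k=2):
    """Return the k dominant right singular vectors of a tall-skinny window.

    Rows are events/time steps, columns are features (assumed layout).
    """
    centred = window - window.mean(axis=0)
    _, _, vt = np.linalg.svd(centred, full_matrices=False)
    return vt[:k].T  # shape (n_features, k), orthonormal columns

def subspace_distance(u, v):
    """Largest principal angle (radians) between the column spaces of u and v."""
    cosines = np.linalg.svd(u.T @ v, compute_uv=False)
    return float(np.arccos(np.clip(cosines.min(), -1.0, 1.0)))

rng = np.random.default_rng(0)
n_rows, n_features = 5000, 6

def make_window(active_features):
    """Synthetic window whose variance is concentrated on two chosen features."""
    basis = np.eye(n_features)[:, active_features]
    signal = rng.normal(size=(n_rows, 2)) @ (5.0 * basis.T)
    noise = 0.1 * rng.normal(size=(n_rows, n_features))
    return signal + noise

normal_1 = dominant_subspace(make_window([0, 1]))
normal_2 = dominant_subspace(make_window([0, 1]))
shifted = dominant_subspace(make_window([2, 3]))  # behaviour has changed

d_same = subspace_distance(normal_1, normal_2)   # small: same regime
d_shift = subspace_distance(normal_1, shifted)   # large: regime change
print(f"same regime: {d_same:.3f} rad, shifted regime: {d_shift:.3f} rad")
```

Using principal angles rather than a plain vector difference sidesteps the sign ambiguity of singular vectors; whether such a distance is a useful anomaly signal on real netflow data remains the open question.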
See Questions for AIMDay https://lamastex.github.io/scalabledatascience/sds/research/densityEstimation/saaad/
 1544
Interactive Visualisation for the Human-in-the-Loop
Given the crucial requirement for rich visual interactions between the algorithm and the human-in-the-loop, what are natural open-source frameworks for programmatically enriching this human-algorithm interaction via visual inspection and interrogation (such as SVDs of activations of rare anomalous events, for instance)?
For example, how can open-source tools be integrated into Active Learning and other human-in-the-loop anomaly detectors? Some such tools include:
● facets from https://ai.google/pair
● http://projector.tensorflow.org
● https://distill.pub/2016/misreadtsne/
● https://github.com/vegasviz/Vegas
● ...
Beyond visualising the ML algorithms, the human-in-the-loop often needs to see the details of the raw event that triggered the anomaly. Typically this event needs to be seen in the context of other related and relevant events, including its anomaly score, with some historical comparisons of similar events from a NoSQL query. What are some natural frameworks for being able to click the event of interest (say, one alerted by the algorithm) and visualise the raw event details (usually a JSON record or a row of a CSV file) in order to make an informed decision? Some such frameworks include:
● https://d3js.org/
● https://vega.github.io/vega/
● https://processing.org/
● https://gephi.org/
● http://dygraphs.com/
● https://github.com/vegasviz/Vegas
See Questions for AIMDay https://lamastex.github.io/scalabledatascience/sds/research/densityEstimation/saaad/
 1502
Perturbation theory for the solution of systems of equations
When calibrating curves, there are many systems of equations to solve, and many of them differ only in small constants. Applying root-finding algorithms such as Newton’s method to all these systems is inefficient and may have an impact on application performance. Instead, the solutions of the systems may be derived from the solution of the first system by applying the inverse function theorem to the Jacobian matrix. Our question concerns whether and under what conditions this can be applied.
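One possible reading of this idea, sketched under assumptions: write each system as F(x, c) = 0 with unknowns x and constants c. If the Jacobian of F with respect to x is nonsingular at a known solution, the implicit function theorem (a close relative of the inverse function theorem) gives the first-order update dx = -J_x^{-1} J_c dc. The 2x2 system below is hypothetical and only serves to compare that update with a full Newton solve.

```python
import numpy as np

def residual(x, c):
    """F(x, c) for a hypothetical 2x2 calibration-style system."""
    return np.array([x[0] ** 2 + x[1] - c[0],
                     x[0] + x[1] ** 2 - c[1]])

def jacobian_x(x):
    """Jacobian of F with respect to the unknowns x."""
    return np.array([[2.0 * x[0], 1.0],
                     [1.0, 2.0 * x[1]]])

def jacobian_c():
    """Jacobian of F with respect to the constants c (here simply -I)."""
    return -np.eye(2)

def newton(c, x_start, tol=1e-12, max_iter=50):
    """Plain Newton iteration; returns the root and the iteration count."""
    x = np.array(x_start, dtype=float)
    for iteration in range(1, max_iter + 1):
        step = np.linalg.solve(jacobian_x(x), residual(x, c))
        x -= step
        if np.linalg.norm(step) < tol:
            return x, iteration
    raise RuntimeError("Newton did not converge")

# Solve the first system once with Newton's method.
c_base = np.array([3.0, 5.0])
x_base, _ = newton(c_base, x_start=[1.0, 1.0])

# Neighbouring system: constants differ slightly.
c_new = c_base + np.array([0.01, -0.02])

# First-order update from the implicit function theorem:
# dx = -J_x^{-1} J_c dc, valid only while J_x stays nonsingular.
dx = -np.linalg.solve(jacobian_x(x_base), jacobian_c() @ (c_new - c_base))
x_first_order = x_base + dx

# Reference: full Newton solve of the perturbed system.
x_exact, _ = newton(c_new, x_start=x_base)
error = np.linalg.norm(x_first_order - x_exact)
print("first-order error:", error)
```

The error of the linearised update grows roughly quadratically with the size of the change in the constants, so in practice it might serve either as the answer itself for small perturbations or as a warm start for one or two Newton steps.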
 1518
Cross-dimensional error terms
Analysis of the error introduced by numerical methods tends to focus on asymptotic behavior and ignore higher-order terms. For one-dimensional problems, these results readily allow one to confirm, for finite numbers of nodes, that higher-order terms do not significantly impair precision. For multi-dimensional problems, however, correct procedures are less obvious. How do we efficiently investigate any cross-dimensional (potentially higher-order) error terms, and their effects for finite numbers of nodes?
 1519
Mapping desired error to numerical solver settings
In making numerical solvers available to users who are removed from the mathematical problem and who are also subject to real-time constraints, the trade-off between computational performance and accuracy is highly relevant. Exposing arbitrary solver settings is an easy way to make this trade-off part of the interface. It suffers, however, from being neither transparent nor dynamic. Are there efficient methods of establishing a priori bounds on the error that could be used to provide a more intuitive interface to users? General approaches are interesting, as well as specific ones for finite difference methods, lattice methods and Monte Carlo methods.
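For the Monte Carlo case, one simple way to turn a requested error into a solver setting is sketched below. It is an assumption-laden illustration: it relies on the central limit theorem, on a pilot run to estimate the standard deviation, and on a hypothetical integrand, and it yields a probabilistic confidence half-width rather than a hard a priori bound.

```python
import math
import random
import statistics

def samples_for_tolerance(pilot_values, tolerance, confidence=0.95):
    """Number of samples so that the CLT confidence half-width is <= tolerance.

    half-width = z * sigma / sqrt(n)  =>  n >= (z * sigma / tolerance)^2
    """
    sigma = statistics.stdev(pilot_values)
    z = statistics.NormalDist().inv_cdf(0.5 + confidence / 2.0)
    return math.ceil((z * sigma / tolerance) ** 2)

def payoff(rng):
    """Hypothetical integrand: U^2 with U uniform on [0, 1], exact mean 1/3."""
    return rng.random() ** 2

rng = random.Random(7)

# Cheap pilot run to estimate the variance of the integrand.
pilot = [payoff(rng) for _ in range(1000)]

# The user asks for an absolute error of 0.01; the interface derives n.
n_required = samples_for_tolerance(pilot, tolerance=0.01)
estimate = statistics.fmean([payoff(rng) for _ in range(n_required)])
print(n_required, estimate)
```

The user-facing parameter here is the tolerance, not the sample count. Analogous mappings for finite difference or lattice methods would need known convergence orders and constants, which is where the question remains open.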
 1523
Mathematical methods to validate data
Financial data can come in many shapes and sizes – from tick data on a stock exchange to legal contracts and televised news broadcasts. A common need for treating any type of data is a good toolbox of mathematical methods. We would like to discuss the potential of using predictive algorithms in this area.
 1524
COMBINATION OF DISCRETE EVENT SIMULATION AND OPTIMIZATION
Many problems in wood procurement are traditionally solved using deterministic methods. Wood procurement can involve questions ranging from the optimal bucking pattern of trees according to customer demand and log price, to tactical transport planning from forest to mill, often with an intermediate storage point at a landing or train terminal. Thus, we want to deliver a specific log with the right wood quality from the forest to the designated mill in time, at the optimal price, and suitable for the intended product. However, stochastic factors, ranging from weather conditions to market and price fluctuations, most often affect the optimal plan, and the optimal delivery plan is suddenly no longer optimal.
The method we are looking for should combine the stochastic behavior of processes from discrete event simulations with ordinary deterministic optimization. Such a method would draw on the advantages of both optimization and simulation, i.e. optimal solutions derived under uncertainty. We think the approach will simplify the construction and evaluation of stochastic optimization problems, which are quite frequent when addressing questions regarding wood flows.
The project goal is to evaluate the developed method against a deterministic approach and to determine how large an effect stochastic events have on an optimized procurement plan of round wood to industry in terms of volume, quality and time, i.e. the effects of stochastic events on delivery precision and delivery time.
Several test cases and real data spanning many years, as mentioned above, are available for evaluation from forest companies, as well as climatic and spatial data from the geographic areas, since the forest sector, and the forest companies in particular, have traditionally handled and stored data over many years.
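As a toy illustration of coupling simulation with optimization, the sketch below evaluates each candidate plan by simulating many days with random delays and picks the plan with the lowest average cost (a simple sample-average approach by enumeration). All numbers, names and the one-line "simulation" are hypothetical stand-ins for a real discrete event model.

```python
import random

TRUCK_COST = 1.0       # cost per truck booked (hypothetical)
SHORTFALL_COST = 5.0   # penalty per load missing at the mill (hypothetical)
DEMAND = 10            # loads the mill needs per day (hypothetical)
P_DELAY = 0.2          # chance a truck is delayed, e.g. by weather (hypothetical)

def simulate_day(trucks, rng):
    """Cost of one simulated day for a plan that books `trucks` trucks."""
    delivered = sum(1 for _ in range(trucks) if rng.random() > P_DELAY)
    shortfall = max(0, DEMAND - delivered)
    return TRUCK_COST * trucks + SHORTFALL_COST * shortfall

def expected_cost(trucks, n_days=2000, seed=42):
    """Average simulated cost; a fixed seed keeps the comparison reproducible."""
    rng = random.Random(seed)
    return sum(simulate_day(trucks, rng) for _ in range(n_days)) / n_days

# Deterministic plan: ignore delays and book exactly the demand.
deterministic_plan = DEMAND

# Simulation-based plan: enumerate candidates and keep the cheapest on average.
candidates = range(DEMAND, 2 * DEMAND + 1)
stochastic_plan = min(candidates, key=expected_cost)

print("deterministic:", deterministic_plan, expected_cost(deterministic_plan))
print("stochastic:   ", stochastic_plan, expected_cost(stochastic_plan))
```

In this toy setting the simulation-based plan books more trucks than the deterministic one and has a lower average cost. A realistic study would replace `simulate_day` with a proper discrete event simulation and the enumeration with a mathematical programming solver.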
 1558
How can mathematical techniques improve the quality of production-related data?
The scope is process data originating from paper and/or pulp production units.
The data sources contain time-based observations coming from sensors, equipment status, lab tests, etc. (time series data). A set of tags (device measurements) can scale up to 100,000 in a single production unit, with sampling intervals in the sub-second range.
The problem
The quality of the data is a key factor in taking advantage of advanced process analytics, including prediction models for preventive maintenance (prediction of breaks), condition monitoring, and prediction of product quality properties. Several factors affect the quality of data, some of them inherent to asset age, instrument maintenance procedures, calibration, etc. Excluding such external factors and concentrating purely on mathematical techniques, consider the following questions.
● Is it possible to spot an intermittent data failure in an instrument based solely on another instrument with the same characteristics (testimony)? Take into consideration that the same instrument/device could be installed in a different place with a different function.
● How can white noise be reliably spotted in time series data originating from such devices? How would the presence of white noise affect prediction models?
● Outliers and short-term data loss can be frequent in some areas or with some types of devices. Would it be feasible and reliable to use a Kalman filter to replace/correct observations? How would that affect prediction models in comparison to smoothing or simply removing outliers?
● Is there a general technique to measure the reliability of the collected data in order to establish quality-level thresholds for further data processing?
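The Kalman filter question above could be explored with something like the following sketch. It assumes a scalar random-walk (local level) model, hand-picked noise variances, and a 3-sigma innovation gate; none of these choices come from the question, and real process tags would need a model fitted to the device.

```python
import math
import random

def kalman_repair(observations, process_var, meas_var, gate=3.0):
    """Replace missing values and gated outliers with Kalman predictions.

    Assumed model: x_t = x_{t-1} + w_t,  z_t = x_t + v_t (scalar local level).
    Returns the repaired series and a flag per sample (True = replaced).
    """
    first = next(z for z in observations if z is not None)
    x, p = first, meas_var
    repaired, flags = [], []
    for z in observations:
        p += process_var                      # predict step (random walk)
        innovation_var = p + meas_var
        rejected = z is None or abs(z - x) > gate * math.sqrt(innovation_var)
        if rejected:
            repaired.append(x)                # use the prediction instead
        else:
            gain = p / innovation_var         # update step
            x += gain * (z - x)
            p *= 1.0 - gain
            repaired.append(z)                # keep the accepted observation
        flags.append(rejected)
    return repaired, flags

# Synthetic sensor: slow drift plus measurement noise (hypothetical values).
rng = random.Random(1)
truth = [10.0 + 0.01 * t for t in range(100)]
series = [value + rng.gauss(0.0, 0.1) for value in truth]
series[50] = 25.0    # injected outlier
series[70] = None    # injected short data loss

repaired, flags = kalman_repair(series, process_var=1e-3, meas_var=0.01)
print(repaired[50], repaired[70])
```

Because rejected samples leave the state untouched while the predicted variance keeps growing, the gate widens during long gaps. How such replacement compares with smoothing or plain removal for downstream prediction models is exactly what the question asks.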
 1559
How does one best parallelize a medium-sized least-squares problem on a cloud cluster, under the condition that each node should work as independently (asynchronously) as possible from the other nodes?
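One standard baseline, sketched here with threads standing in for cluster nodes (an assumption made purely for illustration), is to split the rows across nodes, let each node form its local normal-equation summary independently, and only sum the small summaries at the end. The problem sizes and data are synthetic.

```python
import numpy as np
from concurrent.futures import ThreadPoolExecutor

def local_summary(block):
    """Work done on one node: needs only that node's rows, no communication."""
    a_i, b_i = block
    return a_i.T @ a_i, a_i.T @ b_i

rng = np.random.default_rng(0)
n_rows, n_cols, n_nodes = 12000, 8, 4
a = rng.normal(size=(n_rows, n_cols))
x_true = np.arange(1.0, n_cols + 1.0)
b = a @ x_true + 0.01 * rng.normal(size=n_rows)

# Partition the rows; each (A_i, b_i) pair would live on a separate node.
blocks = list(zip(np.array_split(a, n_nodes), np.array_split(b, n_nodes)))

with ThreadPoolExecutor(max_workers=n_nodes) as pool:
    summaries = list(pool.map(local_summary, blocks))

# Reduction: only n_cols x n_cols matrices and n_cols vectors are exchanged,
# and the sums can be accumulated in any order as results arrive.
gram = sum(g for g, _ in summaries)
rhs = sum(r for _, r in summaries)
x_parallel = np.linalg.solve(gram, rhs)

x_reference, *_ = np.linalg.lstsq(a, b, rcond=None)
print(np.max(np.abs(x_parallel - x_reference)))
```

Forming the normal equations squares the condition number, so for ill-conditioned problems a communication-avoiding QR factorisation or an asynchronous iterative scheme may be preferable. Which of these works best on a cloud cluster is the substance of the question.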

 1583
Building a football Artificial Intelligence
We are working on a project to build an ‘Artificial Intelligence’ that can make sensible comments about football live during a match. We have a feed of everything that happens on the ball during a football match. We can evaluate the quality of passes, shots and defensive actions using a method based on Markov chains. What we would like to do is use this input data to output sentences in English that describe the performance of the players and the team. The approach is a combination of machine learning, where patterns from previous matches are learnt, and a human-built system based on ‘if…then’-like structures. The question is how we can best model the data and give the AI a human feel.