List of questions

Ericsson Research

714
How can one create a big data visualization that brings forth insights?

Any good data scientist usually starts exploring the data by plotting and visualizing what they have. But in a big data scenario, how can one produce an adequate plot of the data, and how can one create visualization techniques that not only present statistics from the data but also turn the data into valuable insights that can be acted upon?
715
How can human mobility analytics be used to minimise the spread of diseases?

How can one make use of the data passing through the networks, data that in one way or another always has to pass through them, to minimise the spread of diseases in the world without risking the privacy of the individual?
716
With the advent of the third industrial revolution, driven by IoT, the Internet and 3D printers, distributed systems will increase in scale and quantity. What will be the impact on networks and communications in such a society? How could one manage and organize communications between all these devices and people?

http://www.thethirdindustrialrevolution.com
717
Given a data set and a binary classification problem, what is the minimum sample we must take in order to determine the best shape or form of the separating/kernel function (linear, quadratic, etc.) for minimizing some error metric?

This is a common problem that does not seem to have one simple solution. What are the best methods for attacking it?
718
When is it feasible to redistribute big data in order to optimize the latency of future queries? When it is, what assumptions do you have to make?

When big data becomes really big, you need to have a strategy in place before loading data into analytics systems. Without one, a lot of problems can arise.
719
How can real time analytics allow us to answer questions without examining a full data set?

What limits us when analysing data in real time (e.g. complexity, latency, etc.)? How can we process data only once, combining batch execution with real-time results?
720
How much does it cost to query big data in terms of resources?

Is big data sustainable from a resource perspective, and has this been measured? Data centers today use enormous amounts of electricity. How much analysis can you perform before the cost outweighs the benefits?
721
How can data consistency be achieved in Big Data solutions?

How does BASE stack up to CAP?
Are there methods to deal with uncertainty in consistency, and would these vary by query?
How can ACID-compliant transactions take place on a BASE-compliant system?

Försvarsmakten

632
What kinds of software or applications are available to support the analysis?

Big data uses such large amounts of data that, as communicated to the public, personal integrity at the individual level is not at stake. However, if a possible threat or a benign anomaly is exposed to an organization, the organization has an interest in finding the specific person or event, and thus in de-aggregating the data into comprehensible information.

Kungliga Biblioteket

728
KB's web archive, kulturarw3, currently contains about 4.5 billion objects. Of these, about 60 percent are text (HTML or plain). The archive is growing exponentially, this year by an estimated 1.5-2 billion objects. The number of bytes is not that large, however: currently about 250 TB. This is because text objects are quite small, some tens of KB each.
How can source criticism be carried out? A claim can easily spread in many versions on the web, so relying on what the majority says can be dangerous. What about traceability? How can one know who is behind a claim?
Is there a way to structure the archive to make it accessible? One possibility might be automatic classification of websites, e.g. "this website (or parts of it) is about mathematics". Are there other ways to give the archive structure?
How can a more advanced link structure be built? A web archive makes it possible to make linking more advanced, e.g. bidirectional, i.e. a given website or web page carries references to the websites or pages that refer to it. How can this be made feasible in practice?

Discussion will be held in Swedish.

729
KB's web archive, kulturarw3, currently contains about 4.5 billion objects. Of these, about 60 percent are text (HTML or plain). The archive is growing exponentially, this year by an estimated 1.5-2 billion objects. The number of bytes is not that large, however: currently about 250 TB. This is because text objects are quite small, some tens of KB each.
How can hits and hit lists from searches in large unstructured data sets be better visualized and made accessible? Going forward, we want to work more with data analysis from many different perspectives. Among other things, we have a subset of data from a project that we imagine could serve as a starting point. We have newspaper data, mostly articles (unannotated full texts), and data from geoplaces and Wikipedia. In one experiment we worked with this data with the aim of bringing out "new angles on an event", and built a prototype of how one can navigate the material visually by highlighting the time aspect and the originator.
Which method or methods would be useful for full-text analysis to produce statistics, trends, or connections between words or concepts, or for Named Entity Recognition in full text, perhaps in combination with DBpedia data or similar? How should we find out what to use the data for (which statistics are useful to us, which connections could enrich our services)?

Discussion will be held in Swedish.

Naturhistoriska Riksmuseet

722
Is there an open source tool chain that can be used to answer big data questions like those within the scope of the AIM day (i.e. including real-time decision support and visualization)? If so, which tools do you recommend? Are there any gaps that need to be filled?

Some organizations rely only on open source tools when working with big data analytics, due to licensing policies or cost issues, and may need to support an Open Science approach with fully transparent and reproducible findings all the way from the raw data to the final analysis and presentation of results and recommendations.

Scania CV AB

695
Physical vs. data-driven models. What are the benefits and risks of the two, and how can they be combined to improve accuracy?

Traditionally, physical models of trucks have been used to model their characteristics. How can these be combined with data-driven models to improve accuracy, performance, etc.?
696
What are the possibilities and risks of using data mining in a decision support system? What are the organizational challenges for companies that introduce data mining as an important part of product development? Implementing data mining in manufacturing companies is a big challenge. How can it be managed? In particular, how should a decision support system be designed in order for the human decision maker to trust and appreciate it?
697
Data mining and autonomous vehicles. What are the possibilities?

An open question that could lead discussions and stimulate creativity.
699
What would you do if you were granted access to all the data that Scania possesses regarding our vehicles on the roads, and why? Which variables are most interesting, and why?

An open question that could lead discussions and stimulate creativity.