List of questions

  • Navet, BMC, Uppsala
  • Venue 2
A
AstraZeneca
  • 1831

    How can ML be used to design the next drug?


    Only 2 of 10 marketed drugs return revenues that match or exceed R&D costs, and the average development time is 10-15 years. Less than 12% of all drugs that enter clinical trials result in an approved medicine.

    Overall, there are two inverted pyramids in the drug development pipeline: a very high number of candidate substances in the early phases, but only limited (or no) experimental data on each substance. Ideally, ML & AI approaches should be used to bring insights back from candidates that reach late stages to inform the selection process in earlier stages.

    Between the main phases of compound formulation, in vitro tests, in vivo tests and clinical trials, all available information should ideally be merged.
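
    As a very rough illustration of the "bring insights back" idea, the sketch below trains a model on candidates with known late-stage outcomes and uses it to rank early-stage candidates. It assumes each substance is already summarised as a fixed-length numerical descriptor vector; the data, feature dimension and model choice are all hypothetical and not AstraZeneca's method.

    import numpy as np
    from sklearn.ensemble import GradientBoostingClassifier
    from sklearn.model_selection import cross_val_score

    rng = np.random.default_rng(0)

    # Hypothetical historical candidates with known late-stage outcomes
    # (1 = progressed, 0 = failed); random descriptors stand in for real features.
    X_late = rng.normal(size=(500, 32))
    y_late = rng.integers(0, 2, size=500)

    model = GradientBoostingClassifier()
    print("CV AUC:", cross_val_score(model, X_late, y_late, scoring="roc_auc", cv=5).mean())
    model.fit(X_late, y_late)

    # Rank current early-stage candidates by predicted probability of late-stage success.
    X_early = rng.normal(size=(10000, 32))
    scores = model.predict_proba(X_early)[:, 1]
    top_candidates = np.argsort(scores)[::-1][:100]  # indices of the 100 highest-scoring candidates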

    Also see the attached presentations.

  • 1832

    How can ML be used to assess the quality of experiments and generated data?


    In the different stages of the drug development pipeline, a multitude of experimental techniques and data collection approaches are used. Since expected results or outcomes are rarely known, it can be a challenge to understand whether a recorded result reflects the true nature of the substance tested, or is due to some kind of experimental or trial error. This can range from a problem with substance purity, to instrument malfunction, to incorrect manual recording of clinical outcomes.

    What ways are there to automate the assessment of generated data in these settings, given that there is often a ready supply of similar data for completely different substances?

    See the attached presentation for some examples of the techniques involved.
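
    One possible framing, sketched below, is to treat quality assessment as anomaly detection: use the large pool of readouts from other substances to learn what a "normal" assay output looks like, and flag new results that deviate strongly. The features and data are hypothetical, and this is an illustration rather than a validated QC procedure.

    import numpy as np
    from sklearn.ensemble import IsolationForest

    rng = np.random.default_rng(1)

    # Readouts from many previous experiments on unrelated substances
    # (e.g. curve-fit parameters, signal-to-noise ratios, plate statistics).
    reference_readouts = rng.normal(size=(5000, 8))

    detector = IsolationForest(contamination=0.02, random_state=1)
    detector.fit(reference_readouts)

    # Score a new batch of results: label -1 marks readouts that look more like
    # experimental error than like any previously seen substance behaviour.
    new_batch = rng.normal(size=(96, 8))
    labels = detector.predict(new_batch)            # -1 = flagged as anomalous, 1 = looks normal
    anomaly_scores = detector.score_samples(new_batch)
    flagged_wells = np.where(labels == -1)[0]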


Atlas Antibodies
  • 1824

    How can we approach the problem of automatically annotating ICC-IF digital images?


    Atlas Antibodies is a spin-off company from the Human Protein Atlas research project.

    Atlas Antibodies manufactures and sells polyclonal antibodies.
    When performing quality control of production, we run a range of experiments in different assays.

    The experiment results are often expressed as a digital image.

    The assays we are primarily interested in annotating automatically are immunohistochemistry (IHC) and immunocytochemistry (ICC).

    The Human Protein Atlas research project has performed hundreds of thousands of these experiments and has had assay experts annotate them, so we believe there is a lot of data available for learning.

    Our in-house assay experts spend a lot of time annotating our own experiments, and we aim to automate some or all of their work.

    Annotation for IHC typically involves identifying different types of cells in different tissues and the intensity of antibody staining in them.

    Annotation for ICC typically involves identifying the location of antibody staining inside a cell.
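
    One way to frame the ICC task, sketched below, is multi-label classification of staining locations per image with a pretrained CNN backbone. The label set, image size and choice of a ResNet backbone are assumptions for illustration, not a description of an existing Atlas Antibodies pipeline.

    import torch
    import torch.nn as nn
    from torchvision import models

    # Hypothetical set of subcellular locations to predict per image.
    LOCATIONS = ["nucleus", "nucleoli", "cytosol", "mitochondria", "plasma_membrane"]

    backbone = models.resnet18(weights=models.ResNet18_Weights.DEFAULT)
    backbone.fc = nn.Linear(backbone.fc.in_features, len(LOCATIONS))  # replace the ImageNet head

    criterion = nn.BCEWithLogitsLoss()   # multi-label: each location is an independent yes/no
    optimizer = torch.optim.Adam(backbone.parameters(), lr=1e-4)

    def training_step(images, targets):
        """images: (B, 3, 224, 224) float tensor; targets: (B, len(LOCATIONS)) 0/1 tensor."""
        optimizer.zero_grad()
        logits = backbone(images)
        loss = criterion(logits, targets.float())
        loss.backward()
        optimizer.step()
        return loss.item()

    Annotated Human Protein Atlas images would supply the training pairs; the cell- and tissue-level IHC annotations would likely need a separate model (for example a detection or segmentation approach rather than whole-image classification).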

    Also, would it be best for us to start doing ML ourselves, or should we try to identify a commercial or academic collaborator to help us?

  • 1827

    How can we apply machine learning to our production and quality control processes?


    Atlas Antibodies manufactures and sells polyclonal antibody products that target human proteins. A sketch of the production and quality control processes is attached.

    We produce a so-called PrestLot as a result of an Antigen Production process. PrEST is the molecule that we use to immunise host animals. The Antigen Quality Control process involves running wet lab experiments to evaluate the PrestLot according to our criteria. In the Immunisation process, we send a certain volume of a PrestLot to an immunisation farm. The process ends when we receive serum as a so-called SerumLot. During Antibody Production, we purify polyclonal antibodies from the serum using so-called PrEST columns, to produce a so-called Antibody Lot. During Serum Quality Control, we run wet lab experiments (ICC, IHC and/or WB) to evaluate the antibody lot, and thereby the serum lot, according to our criteria.

    The following measurement points are saved in our system:

    User in every step of every process
    PrestLot concentration
    PrestLot manufacture date
    PrestLot volume per immunisation process
    Immunisation sent date
    Immunisation farm
    PrEST column
    Serum lot volume in
    Antibody lot concentration
    Antibody lot volume out
    Application analysis (ICC, IHC, WB)
    Antibody lot PARC decision
    Antibody lot CONC decision
    Serum lot decision

    The following data is available outside of our system:
    Wet lab experiments in Antigen Quality Control
    PrestLot packaging type (with / without ice)
    Immunisation farm process
    PrEST column production process, including source PrestLot
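
    A minimal modelling sketch, assuming these measurement points can be exported as one row per serum lot: the column names below mirror the lists above but are hypothetical, and the choice of a random forest predicting the serum lot decision is an illustration, not a recommendation.

    import pandas as pd
    from sklearn.compose import ColumnTransformer
    from sklearn.ensemble import RandomForestClassifier
    from sklearn.model_selection import cross_val_score
    from sklearn.pipeline import Pipeline
    from sklearn.preprocessing import OneHotEncoder

    df = pd.read_csv("serum_lots.csv")   # hypothetical export of the measurement points

    categorical = ["immunisation_farm", "prest_column", "packaging_type"]
    numeric = ["prestlot_concentration", "prestlot_volume", "serum_lot_volume_in",
               "antibody_lot_concentration", "antibody_lot_volume_out"]
    target = "serum_lot_decision"        # the final approve/reject decision

    preprocess = ColumnTransformer([
        ("cat", OneHotEncoder(handle_unknown="ignore"), categorical),
        ("num", "passthrough", numeric),
    ])
    model = Pipeline([("prep", preprocess),
                      ("clf", RandomForestClassifier(n_estimators=300, random_state=0))])

    # Cross-validated accuracy gives a first indication of whether the saved
    # measurement points carry signal about the final decision.
    print(cross_val_score(model, df[categorical + numeric], df[target], cv=5).mean())

    If such a model shows predictive signal, its feature importances could point at which process steps (farm, column, packaging) are worth investigating further.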


E
Elekta
  • 1839

    How should federated learning be adapted to work well on medical imaging problems?


    About a year ago, researchers from Google published an article describing a decentralized learning approach [1], referred to as federated learning, which led to a surge of interest in this area. Federated learning enables systems to collaboratively learn a shared prediction model while keeping all the training data local, decoupling the ability to do machine learning from the need to store the data centrally.
    Modern radiotherapy systems generate a wealth of data that is currently not used. Machine learning could potentially capture a lot of value from this data; however, the data is often privacy-sensitive, large in quantity, or both, which may preclude centralizing it in the manner required by conventional machine learning approaches.
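
    For reference, the core federated averaging (FedAvg) loop commonly associated with this line of work can be sketched as below; the toy least-squares "model" stands in for an imaging model, and adapting the scheme to large medical-imaging networks is exactly the open question.

    import numpy as np

    def local_update(global_weights, local_data, lr=0.1, epochs=1):
        """Hypothetical client step: each site refines the global weights on its own data."""
        w = global_weights.copy()
        X, y = local_data
        for _ in range(epochs):
            grad = X.T @ (X @ w - y) / len(y)   # gradient of a least-squares toy objective
            w -= lr * grad
        return w, len(y)

    def federated_round(global_weights, clients):
        """One FedAvg round: clients train locally, the server averages weighted by data size."""
        updates = [local_update(global_weights, data) for data in clients]
        total = sum(n for _, n in updates)
        return sum(w * (n / total) for w, n in updates)

    rng = np.random.default_rng(0)
    # Four hypothetical sites; their (X, y) data never leaves this list entry.
    clients = [(rng.normal(size=(50, 5)), rng.normal(size=50)) for _ in range(4)]
    weights = np.zeros(5)
    for _ in range(20):
        weights = federated_round(weights, clients)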

  • 1840

    How could quality assurance of the shared prediction model as well as the results be addressed in a federated learning context?


    About a year ago, researchers from Google published an article describing a decentralized learning approach [1], referred to as federated learning, which led to a surge of interest in this area. Federated learning enables systems to collaboratively learn a shared prediction model while keeping all the training data local, decoupling the ability to do machine learning from the need to store the data centrally.
    Modern radiotherapy systems generate a wealth of data that is currently not used. Machine learning could potentially capture a lot of value from this data; however, the data is often privacy-sensitive, large in quantity, or both, which may preclude centralizing it in the manner required by conventional machine learning approaches.
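
    One possible quality-assurance pattern, sketched below as an assumption rather than an established procedure: before a newly aggregated global model is accepted, every site evaluates it on a local held-out set, and only aggregate metrics (never the data) leave the site.

    import numpy as np

    def local_metric(model_fn, holdout):
        """Hypothetical per-site evaluation; only the scalar error and the sample count are shared."""
        X, y = holdout
        predictions = model_fn(X)
        return float(np.mean((predictions - y) ** 2)), len(y)

    def accept_new_model(candidate_fn, current_fn, site_holdouts, tolerance=0.0):
        """Accept the candidate only if the size-weighted error does not regress."""
        def weighted_error(fn):
            results = [local_metric(fn, h) for h in site_holdouts]
            total = sum(n for _, n in results)
            return sum(err * n for err, n in results) / total
        return weighted_error(candidate_fn) <= weighted_error(current_fn) + tolerance

    rng = np.random.default_rng(0)
    holdouts = [(rng.normal(size=(30, 3)), rng.normal(size=30)) for _ in range(3)]
    print(accept_new_model(lambda X: X.sum(axis=1), lambda X: np.zeros(len(X)), holdouts))

    Comparing per-site metrics against each other could additionally help spot sites whose data or labelling drifts away from the rest.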

  • 1841

    Are any additional considerations necessary to ensure patient privacy when ML models are trained on patient data in a federated manner? Which?


    About a year ago, researchers from Google published an article describing a decentralized learning approach [1], referred to as federated learning, which led to a surge of interest in this area. Federated learning enables systems to collaboratively learn a shared prediction model while keeping all the training data local, decoupling the ability to do machine learning from the need to store the data centrally.
    Modern radiotherapy systems generate a wealth of data that is currently not used. Machine learning could potentially capture a lot of value from this data; however, the data is often privacy-sensitive, large in quantity, or both, which may preclude centralizing it in the manner required by conventional machine learning approaches.
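
    One safeguard often discussed on top of federated learning, sketched below as an assumption rather than a complete answer: clip each site's model update and add Gaussian noise before it leaves the site, in the spirit of differential privacy. Calibrating the noise to a formal (epsilon, delta) budget, and defending against gradient-inversion attacks, remain part of the open question.

    import numpy as np

    def privatize_update(update, clip_norm=1.0, noise_std=0.5, rng=None):
        """Clip the update's L2 norm and add Gaussian noise before sharing it with the server."""
        rng = rng or np.random.default_rng()
        norm = np.linalg.norm(update)
        clipped = update * min(1.0, clip_norm / (norm + 1e-12))
        return clipped + rng.normal(scale=noise_std, size=update.shape)

    # Each site would apply this to (local_weights - global_weights) before upload;
    # the server then averages the noisy, clipped updates as in plain federated averaging.
    example_update = np.array([0.8, -2.3, 0.1])
    print(privatize_update(example_update, rng=np.random.default_rng(0)))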


G
Galenometrics AB
  • 1834

    What is the current state of the art within bioinformatics and the tools used?


    How can machine learning contribute to better bioinformatics models? How is machine learning implemented in academia, public organizations and industry?


V
Vidilab AB
  • 1819

    How to design a CNN for image recognition of parasitic eggs using transfer learning? What other approaches would make sense?


    Currently, human expertise is used to identify the species of parasitic eggs from microscope pictures. I have acquired ~1000 reference images of eggs of one species to train on, and ~1000 non-egg pictures.

    Initial attempts were made by retraining the top layers of ImageNet-pretrained CNNs such as Inception_V3, VGG16 and ResNet. However, the results were unexpectedly bad, with strange misclassifications. The images contain both plant debris and air bubbles in addition to eggs. Images of the misclassifications can be provided later.

    In more general terms, what are the important steps to take, and common pitfalls, when retraining a neural network for a classification task like this? What other approaches are there?
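
    A minimal transfer-learning sketch for the egg / non-egg task is given below as an illustration, not a diagnosis of the earlier attempts. Two common pitfalls it tries to address: the preprocessing must match what the pretrained backbone expects (ImageNet normalisation and input size), and debris and air bubbles should be well represented in the non-egg class so the model cannot separate the classes on background artefacts alone. The folder layout and hyperparameters are assumptions.

    import torch
    import torch.nn as nn
    from torchvision import datasets, models, transforms

    train_tf = transforms.Compose([
        transforms.RandomResizedCrop(224, scale=(0.7, 1.0)),
        transforms.RandomHorizontalFlip(),
        transforms.RandomVerticalFlip(),           # microscope images have no natural "up"
        transforms.ColorJitter(brightness=0.2, contrast=0.2),
        transforms.ToTensor(),
        transforms.Normalize([0.485, 0.456, 0.406], [0.229, 0.224, 0.225]),  # ImageNet statistics
    ])

    # Assumed layout: data/train/egg/*.png and data/train/non_egg/*.png (hypothetical paths).
    train_set = datasets.ImageFolder("data/train", transform=train_tf)
    loader = torch.utils.data.DataLoader(train_set, batch_size=32, shuffle=True)

    model = models.resnet18(weights=models.ResNet18_Weights.DEFAULT)
    for p in model.parameters():
        p.requires_grad = False                    # stage 1: freeze the pretrained backbone
    model.fc = nn.Linear(model.fc.in_features, 2)  # new two-class head (egg / non-egg)

    optimizer = torch.optim.Adam(model.fc.parameters(), lr=1e-3)
    criterion = nn.CrossEntropyLoss()
    for images, labels in loader:
        optimizer.zero_grad()
        loss = criterion(model(images), labels)
        loss.backward()
        optimizer.step()
    # A second stage would unfreeze the last residual block and fine-tune with a
    # smaller learning rate, monitoring a held-out validation set for overfitting.

    With only ~1000 images per class, other approaches worth comparing include classical shape and texture features with a gradient-boosted classifier, and object detection rather than whole-image classification, so that debris and bubbles do not dominate the prediction.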