Minutes

This page contains the minutes for our weekly meetings.

2022-02-23 - Planning

Location: C0355, Time: Monday 1pm-2pm, Attendees: Jesse Wood, Mengjie Zhang

Notes:
  • Faculty of Graduate Research (FGR) - office on Kelburn Parade.

  • Forms and information for enrollment are available at the FGR website https://www.wgtn.ac.nz/fgr

  • Booked a room for study in MARU101 - Desk 33

  • See Duncan in ECS for an account.

  • Can work up to 12 hours per week.

  • Let supervisors and faculty know about trips out of Wellington.

  • Start as provisionally registered; become a candidate by writing the proposal; become fully registered once the proposal is accepted.

  • Two required meetings: FASLIP (Thursday 2pm-3pm) and ECRG (Friday 3pm-5pm).

2022-02-28 - FGR

Location: Zoom, Time: Monday 2pm-3pm, Attendees: Matthew Vink, Jesse Wood

Notes:
  • This meeting covers enrollment: confirming details, forms, and contacts.

  • PhD Supervisors: Bing Xue, Mengjie Zhang.

  • Documents:
    1. Confirmation of study - AIML 692 code.

    2. Fees assessment - two weeks to pay levies.

  • Information sheet:
    1. Community for needs bank details.

    2. Tony McGloughin - School Administrator.

    3. Confirmation of Proposal Registration Form (CoPR).

    4. Matthew Vink - helped me enroll today.

    5. Student levies - due in 2 weeks.

2022-02-28 - FASLIP

Location: Zoom, Time: Wednesday 2pm-3pm, Attendees: Matthew Vink, Jesse Wood

Neil Dodgson - Cambridge lecture
Notes:
  • Similar to ENGR401 material:
    • Don't need slides.

    • Triple-check technologies.

    • Face audience.

    • Relevant stuff only.

    • No animations.

  • Research Talks
    • Don’t type the script.

    • Planning, a lot of time before writing slides.

    • Audience, can change how you deliver a presentation.

    • Highlight key points on the last slide.

2022-03-07 - DL

Location: Zoom, Time: Monday 3pm-4pm, Attendees: Bastiaan Kleijn, Jesse Wood, et al.

Papers:
  1. Conditional Diffusion Probabilistic Model for Speech Enhancement. https://arxiv.org/abs/2202.05256

  2. A Study on Speech Enhancement Based on Diffusion Probabilistic Model. https://arxiv.org/abs/2107.11876

Notes:

2022-03-08 - Induction

Location: AM101, Time: Monday 2pm-4pm, Attendees: Neil Dodgson, Jesse Wood, et al.

Notes:
  • Bastiaan slide example for meetings.

  • Neil Dodgson - Faculty of Graduate Research Dean.

  • Faculty of Graduate Research (FGR).

  • Workshops, writing events, professional development.

  • Website https://www.wgtn.ac.nz/fgr

  • Workshops are practical and hands-on.

  • Thesis bootcamp - 20 writing hours.
    • Aimed at final year students.

    • June / November.

  • Research room
    • Review, tips, stories, events, resources.

    • Updates monthly.

  • Candidate progress form (CPF)
    • Report on progress in a 6-monthly report.

    • May / November.

    • Required, not academic, supporting evidence.

  • 4 weeks annual leave, no formal process.

  • Suspensions, for illness, bereavement, work.

  • Forms for aforementioned available online.

  • Proposal: first major milestone.
    • 12 month deadline.

    • No extensions available.

  • Automatic re-registration for first 2 years.

  • Constructive relationship with supervisor.

  • PhD certificate: competent to do individual research.

  • World expert in our PhD research topic.

  • Regular meeting times.

  • Student/supervisor - same page for expectations.

  • Bring agenda to meeting.

  • Project management techniques - scrum, agile.

  • 2π rule for time estimation - multiply your initial estimate by 2π.

  • Secondary supervisor - (usually) hands off role.

  • “The only way through it, is to do it.”

  • Books, publications, thesis - different expectations for each course.

2022-03-10 - Weekly

Location: Zoom, Time: Monday 2pm-4pm, Attendees: Bing Xue, Mengjie Zhang, Bach Hoai Nguyen, Jesse Wood

Notes:
  • Let Bing/Meng know about any financial difficulties.

  • Topic ideas:
    1. Multi-objective

    2. Evolutionary computation

    3. Domain expertise.

  • First two-weeks - extensive background reading.

  • ECRG - meeting tomorrow from 3pm - 5pm.

  • CoPR - fill out by the end of March.

  • Individual induction - copy Bach in email for meeting.

  • Add Bach to GitLab/GitHub for the paper LaTeX file.

2022-03-10 - FASLIP

Location: Zoom, Time: Monday 2pm-4pm, Attendees: Bing Xue, Mengjie Zhang, Bach Hoai Nguyen, Jesse Wood et al.

Jeff Hawkins - Thousand Brains Theory: https://www.youtube.com/watch?v=O4geanMOsyM

Notes:
  • Voting, similar to dropout, bagged ensemble.

  • Many models (sub-networks) for the same thing.

  • Sparse networks, efficient -> noise tolerant.

  • Only updates in one area, without needing back-propagation; doesn't require full retraining for each new instance.

  • Builds a full world model, not a model for each task.

  • Thousand brain theory - solution to No Free Lunch.

2022-03-11 - ECRG

Location: Zoom, Time: Friday 3pm-5pm, Attendees: Bing Xue, Mengjie Zhang, Hui Ma, Bach Hoai Nguyen, Jesse Wood et al.

Hui Ma gave presentation on Evolutionary Computation Approaches to Web Service Composition - https://link.springer.com/article/10.1007/s10732-017-9330-4

Notes:
  • Meng will discuss how to write a proposal.

  • EuroGP conference - ask my supervisor to register.

  • Introduced myself to the group
    • paper - finish writing my Summer Research paper.

    • enrolled - lots of paperwork.

    • Finish writing the paper properly.

  • Abdullah (lab neighbour) first week in group.

  • Evolutionary Computation Approaches to Web Service Composition.

  • Over 40 publications in the area.

  • Holiday booking service used as an example.

  • Organize services into re-usable modules.

  • Service composition is an NP-hard problem.

  • A global search is not possible; a heuristic-based local search is required.

  • Evolutionary principles and techniques - crossover, mutation.

  • Automatically create hybrid services through composition.

  • Don’t reinvent the wheel, use existing libraries instead.

  • Scheduling, routing, resource allocation, service composition - all possible for EC.

2022-03-17 - Weekly

Location: Zoom, Time: Friday 12pm-1pm, Attendees: Bing Xue, Mengjie Zhang, Bach Hoai Nguyen, Jesse Wood.

Notes:
  • Daniel can provide domain expertise for writing the chemistry sections for the paper.

  • Multi-objective - classify chemical compounds and their percentage.

  • Multi-label - one instance can belong to multiple classes.

  • Copy Bing and Bach into the induction email from Georgia.

  • pymoo - multi-objective Python library (see the sketch after this list).

  • Read/write summaries for papers as I go - write content for second chapter iteratively.

  • Send Daniel conclusions / contributions of paper in email, then organize a follow up meeting.
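
A minimal pymoo sketch (my addition, not from the meeting), assuming the modern module layout (pymoo >= 0.6; older releases exposed these through pymoo.factory):

    from pymoo.algorithms.moo.nsga2 import NSGA2
    from pymoo.optimize import minimize
    from pymoo.problems import get_problem

    # ZDT1 is a standard bi-objective benchmark; NSGA-II evolves a population
    # whose non-dominated front approximates the Pareto front.
    problem = get_problem("zdt1")
    algorithm = NSGA2(pop_size=100)
    result = minimize(problem, algorithm, ("n_gen", 200), seed=1, verbose=False)
    print(result.F[:5])  # objective values of some non-dominated solutions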

2022-03-17 - FASLIP

Location: Zoom, Time: Monday 2pm-3pm, Attendees: Ying, Bing Xue, Mengjie Zhang, Bach Hoai Nguyen, Jesse Wood et al.

Ying suggested a talk on Multi-objective Evolutionary Federated Learning https://vimeo.com/552900291

Notes:
  • Yaochu Jin - University of Surrey.

  • Multi-objective machine learning.

  • Centralized and federated learning.

  • Evolutionary multi-objective federated learning.

  • Evolutionary federated neural architecture search.

  • Multi-objective - gives a solution set, as there are trade-offs between objectives.

  • Dominance: X dominates Y if X is no worse than Y in every objective and strictly better in at least one (see the check sketched after this list).

  • Pareto front (see tegmark2020ai) - the set of optimal solutions.
    • accuracy, diversity.

    • Inverse generational distance (IGD).

    • Hypervolume - measured relative to the nadir point.

  • Optimizing for minimal complexity implies interpretability.

  • Centralized learning - one database.

  • Localized learning - everyone trains their own model.

  • Privacy techniques:
    • Secure multi-party computation.

    • Differential privacy.

    • Homomorphic encryption.

  • Federated learning
    • train a high-quality centralized model with training data distributed over a large number of clients.

    • Each with unreliable and relatively slow network connections.

    • horizontal - all attributes, batches of data.

    • vertical - trained on a subset of attributes (e.g. for security reasons).

  • Federated learning objectives
    1. Maximise learning performance.

    2. Minimize communication cost.

  • Their work efficiently reduces the number of connections while maintaining similar performance.

  • Neural architecture search (TODO - watch the rest and take notes!!!)
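
A tiny illustration of the dominance definition above (my own sketch; minimisation assumed for every objective):

    def dominates(x, y):
        # True if x dominates y: no worse in every objective and
        # strictly better in at least one (minimisation).
        return (all(a <= b for a, b in zip(x, y))
                and any(a < b for a, b in zip(x, y)))

    print(dominates((0.2, 0.5), (0.3, 0.5)))  # True: equal on one objective, better on the other
    print(dominates((0.2, 0.9), (0.4, 0.1)))  # False: a trade-off, so both are non-dominated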

2022-03-18 - ECRG

Location: Zoom, Time: Friday 3pm-5pm, Attendees: Bing Xue, Mengjie Zhang, Bach Hoai Nguyen, Jesse Wood et al.

A talk on Geometric Semantic Genetic Programming by Qi Chen https://link.springer.com/chapter/10.1007/978-3-642-32937-1_3

Notes:
  • We have published heaps of papers that are highly cited and "hot papers" according to the https://www.webofscience.com/wos/woscc/basic-search tool that the university has access to.

  • Top 1% of papers cited per discipline for computer science journals.

  • Evolving neural networks with evolutionary computation.

  • Me: reading psychology papers on how the brain works with memory - hunting for relatively simple neuroscience ideas to apply to machine learning.

  • Geometric semantic genetic programming (Moraglio 2012, moraglio2012geometric) https://link.springer.com/chapter/10.1007/978-3-642-32937-1_3

  • Semantic genetic programming methods.

  • Traditional GP ignores program semantics.

  • Consequence - a rugged genotype-phenotype mapping.

  • Is it possible to make GP aware of the effects of program execution?

  • Semantics:
    • Semantics differs from syntax.

    • Semantics related to the problem domain.

    • Semantics inform program design (Tegmark 2020, tegmark2020ai).

  • Measure the semantic distance between the current program and the target output (a multi-dimensional loss function; see the sketch after this list).

  • Genetic operators:
    • Semantic aware cross-over (SAC)

    • Semantic similarity-based cross-over (SSC)

    • Semantic similarity-based mutation (SSM)

    • Semantic tournament selection.
      • t-test for statistical significance when assessing selection.

  • Search directly in the semantic space of the program.

  • Semantics of offspring must sit on the line segment between its two parents in semantic space.

  • Therefore each offspring minimizes its distance to the target semantics.

  • Each generation gets closer to the target semantics, or at least closer than the furthest parent.

  • Independent of data, with a good effect on improving generalization, although no actual claim is made in the paper.

  • Geometric semantic programming leads to a unimodal fitness landscape - a cone where the apex is the target semantics.
    • Manhattan distance - square-based pyramid.

    • Euclidean distance - cone.

  • Efficient implementation - only store changes to program tree, similar to git version control - except for GSGP.

  • GSGP-Red (reduce) - simplify programs by expanding and recomputing.

  • Locally geometric semantic crossover (LGSX).
    • Makes offspring more similar to each other than to their parents.

  • Random desired operator (RDO) - exploits the invertibility of instructions: + can be reversed with -, * can be reversed with division; + and * are commutative.

  • Semantic backpropagation - decomposability of the process is important for BP.

  • Angle-aware metrics - a larger angle is more likely to generate offspring closer to the target semantics.

  • Permutation GSX and Random Segment Mutation.

  • Semantic distance (Euclidean) is the same as the loss, just looked at from a different point of view.

  • Can geometric semantic programming work in an unsupervised or combinatorial problem? (Possibly not a unimodal semantic space.)
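
A toy sketch of that semantics/loss equivalence (mine, assuming a symbolic-regression setting where a program maps inputs to outputs):

    import numpy as np

    def semantics(program, X):
        # A program's semantics is its output vector over the training inputs.
        return np.array([program(x) for x in X])

    def semantic_distance(program, target, X):
        # Euclidean distance from the program's semantics to the target
        # semantics: monotonically related to the usual regression loss,
        # but viewed as a point-to-point distance in semantic space.
        return np.linalg.norm(semantics(program, X) - target)

    X = np.linspace(0.0, 1.0, 5)
    target = X ** 2
    print(semantic_distance(lambda x: x * x, target, X))  # 0.0: exact match
    print(semantic_distance(lambda x: x, target, X))      # > 0: further away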

2022-03-21 - DL

Location: Zoom, Time: Monday 3pm-4pm, Attendees: Hayden Dyne, Bastiaan Kleijn, Jesse Wood, et al.

Talk by Hayden on two papers:
Notes:
  • Model-free reinforcement learning - does not rely on human understanding of world and design controllers.

  • The human driver's trajectory is the goal; uses a professional driver playing the game with a steering wheel.

  • Model performs on different track difficulties.

  • Reward function is scaled by velocity, so faster lap times are rewarded.

  • Works for 4 different kinds of vehicles, although the truck struggles to achieve same performance as lighter ones.

  • Second paper - end-to-end (e2e) driving.

  • Far easier to use real-world data on driving that has already been collected than generate simulation data.

  • Data augmentation used to help network generalize to new scenarios and edge cases not in the training data.

2022-03-24 - Faculty Induction

Location: Zoom, Time: Monday 10am-11am, Attendees: Georgia Dix, Jesse Wood, Bach Hoai Nguyen.

Induction to my PhD studies with supervisor and faculty.

Notes:
  • Expectations
    • Supervisor
      • Uni life

      • Framework

      • Networking

      • Assessment

    • Me
      • action plan

      • identify problems

      • administration

      • CDP (6 monthly report)

  • Marking a thesis can take up to 6 months - can work during this time.

2022-03-17 - Weekly

Location: Zoom, Time: Friday 12pm-1pm, Attendees: Bing Xue, Mengjie Zhang, Bach Hoai Nguyen, Jesse Wood.

Notes:
  • Daniel can draft the chemistry parts for the paper.

  • Draft the full paper with Bach, then send to Daniel.

  • Read “From evolutionary computation to the evolution of things” - Nature

  • Can start coding now - explore ideas for ENGR489 and EC on existing data.

  • Transformers, LSTM, GAN - yet to be applied to GC-MS data in literature.

  • CNNs for GC, likely due to libraries, hype, understanding, Diffusion of innovation.

  • Scuba diver experiment for context-dependent memory is a good analogy for noise in ML models.

  • Came up with evolutionary ideas, like sexual selection, but (Miller 1994) did it quite some time ago.

  • Idea for EC, a dynamic environment where complexity increases, classes or features are added - similar to evolution IRL.

2022-03-24 - FASLIP

Location: Zoom, Time: Monday 2pm-3pm, Attendees: Ying, Bing Xue, Mengjie Zhang, Bach Hoai Nguyen, Jesse Wood et al.

Ruwang Jiao suggested a video on Bayesian Optimization from Marc Deisenroth https://www.youtube.com/watch?v=_SC5_2vkgbA

Notes:
  • Recommended background reading on this topic:
    1. A Tutorial on Bayesian Optimization of Expensive Cost Functions (Brochu 2010, brochu2010tutorial) https://arxiv.org/pdf/1012.2599.pdf

    2. Taking the Human Out of the Loop: A Review of Bayesian Optimization (Shahriari 2015, shahriari2015taking) https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=7352306

  • Latent Structural Support Vector Machine (Miller 2012) - TODO find this paper/project.

  • Deep learning often involves a lot of hyper-parameter tuning; this is usually done by hand by the practitioner.

  • Alternative approaches:
    • Manual tuning

    • Grid search

    • Random search

    • Black magic (e.g. "lr = 1e-3 is good")

  • Computationally expensive to search for global maximum in hyper-parameter search space.

  • Globally optimize a black-box function that is expensive to evaluate (e.g. cross-validation error for a massive neural network).

  • Use a probabilistic model to approximate the black-box model for the hyper-parameter search (see the sketch after this list).
    • Create a proxy model - this learns an approximation of the space - with less computational cost to query that space.

    • Referred to as proxy / approximate / surrogate.

  • The standard probabilistic model for Bayesian optimization is a Gaussian process.

  • Evaluate the proxy function instead of the expensive objective - this saves computation.

  • A Gaussian process minimizes the uncertainty of the proxy function.

  • It samples the feature space at the minimum value of the shaded area (== uncertainty).

  • It repeats this often, until the proxy function is close enough to the true objective.

  • Exploration - sample areas with high uncertainty.

  • Exploitation - sample places with low mean.
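
A compact sketch of the surrogate loop described above (my own, assuming scikit-learn's Gaussian process as the proxy and an expected-improvement rule for picking the next sample; the talk is more general):

    import numpy as np
    from scipy.stats import norm
    from sklearn.gaussian_process import GaussianProcessRegressor
    from sklearn.gaussian_process.kernels import Matern

    def objective(x):
        # Stand-in for the expensive black box, e.g. cross-validation error.
        return np.sin(3 * x) + 0.1 * x ** 2

    rng = np.random.default_rng(0)
    X = rng.uniform(-3, 3, (3, 1))  # a few initial (expensive) evaluations
    y = objective(X).ravel()
    grid = np.linspace(-3, 3, 500).reshape(-1, 1)

    gp = GaussianProcessRegressor(kernel=Matern(nu=2.5), normalize_y=True)
    for _ in range(10):
        gp.fit(X, y)  # fit the cheap proxy to everything seen so far
        mu, sigma = gp.predict(grid, return_std=True)
        # Expected improvement balances exploitation (low mean)
        # against exploration (high uncertainty).
        imp = y.min() - mu
        z = imp / np.maximum(sigma, 1e-9)
        ei = imp * norm.cdf(z) + sigma * norm.pdf(z)
        x_next = grid[np.argmax(ei)].reshape(1, 1)
        X = np.vstack([X, x_next])  # one more expensive evaluation
        y = np.append(y, objective(x_next).ravel())

    print(X[np.argmin(y)], y.min())  # best hyper-parameter value found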

2022-03-25 - ECRG

Location: Zoom, Time: Friday 3pm-5pm, Attendees: Andrew Lensen, Bing Xue, Mengjie Zhang, Bach Hoai Nguyen, Jesse Wood et al.

Andrew gave a talk on Genetic Programming, Explainability and Interdisciplinary AI.

Notes:
  • Heaps of students successfully submitted papers to the GECCO conference.

  • Possible to publish in conference at different levels; paper, poster, etc.

  • If a paper is declined, revise with reviewer comments, and resubmit as poster.

  • Qurrat Ul Ain - cancer research in AI.

  • Swiss roll manifold problem
    • Reduce a manifold to a 2D visual representation.

    • 2D path is representation of non-linear dimensionality reduction (NLDR).

  • Geodesic - the shortest path from A to B, not the shortest Euclidean distance.

  • Lower dimensional space is referred to as an embedding.

  • We can use AI to learn or approximate this embedding (if the problem is intractable).

  • Ways to estimate the intrinsic dimensionality of the dataset - statistical techniques.

  • Kaka - count the distinct number of birds at Zealandia.

  • GoPro for data collection combined with crack for Kaka.

  • Law - predicting sentencing lengths with PLSR on judge summaries.

  • Names with high/low probabilities are often historic cases referred to as ‘guidance judgements’.

  • Combine data analysis and domain expertise to infer knowledge about sentencing lengths.

  • Home detention or community service are associated with shorter sentences.

  • Future work, take humans out of the loop, and make sentencing deterministic.

  • ^ This can't be done, because there are extenuating circumstances that require a judge's opinion.

  • Also, if all sentences are automated, there would no longer be guidance judgements being set historically.

  • Law is a dynamic and decentralized system, unique and specialized for each country, case, individual, etc…

  • Research is more productive on letting judges analyse their blind spots and identify bias.

2022-03-28 - DL

Location: Zoom, Time: Monday 3pm-4pm, Attendees: Daniel Braithwaite, Bastiaan Kleijn, Jesse Wood, et al.

Daniel Braithwaite talked about two papers related to machine learning for audio wave construction:
  1. Deep Audio Priors Emerge from Harmonic Convolutional Networks https://openreview.net/pdf?id=rygjHxrYDB

  2. Harmonic WaveGAN https://www.isca-speech.org/archive/pdfs/interspeech_2021/mizuta21_interspeech.pdf

Notes:
  • The idea is to look at harmonic convolutions, think convolution layer but designed for audio.

  • WaveGAN and Harmonic WaveGAN use deep learning on audio signals.

  • Harmonic convolution is better suited to audio signals than the waveform alone.

  • Harmonic convolution considers local connections / adjacency better.

  • TODO Read these papers and add to notes.

2022-03-31 - Weekly

Location: Zoom, Time: Friday 12pm-1pm, Attendees: Bach Hoai Nguyen, Jesse Wood.

Bing and Meng were both unwell this week. Important to send minutes for this meeting to them.

Notes:
  • Augmentation - boost performance on the fish part dataset.
    • Time-shift, shift data backwards and forward, to get time-invariant generalized model (may not work well).

    • Impute data, combine existing samples, add noise, etc…

  • Worked on CNN from ENGR489 for classification task.
    • Issues with keras and sklearn libraries: stratified k-fold cross-validation and one-hot encoding don't play nice together.

    • CNNs: we use 1D convolution and pooling layers on time-series data (see the sketch after this list).

    • Existing ML + GC literature also use CNN for classification and regression tasks.

    • These models are powerful for extracting features in spaces with local connectivity.

    • Aim to use EC to perform neural architecture search for CNN hyperparameters - these differ for each dataset.

  • Both EC and Bayesian Optimization approaches work for neural architecture search.
    • However, EC has more interpretable results, e.g. a genetic algorithm produces an explainable tree.

    • Neural networks are black-box and esoteric; we understand how they work (i.e. back-prop, SGD), but not why.

    • EC produces simpler representations, that can be prodded with domain expertise.

  • Important to read heaps for the first few months of a PhD.
    • Take original notes that can contribute toward a background chapter of my proposal.

    • Get an idea of what has been done, and what I want to do.

    • Still reading psychology textbook on memory and the brain to establish conceptual framework for learning.
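
A minimal Keras sketch of the 1D CNN idea above (mine; the 4800-point chromatogram length, 4 classes, and layer sizes are hypothetical, not the actual ENGR489 model):

    import tensorflow as tf
    from tensorflow.keras import layers

    n_timesteps, n_classes = 4800, 4  # hypothetical GC trace length and label count

    model = tf.keras.Sequential([
        layers.Input(shape=(n_timesteps, 1)),  # one channel: intensity over retention time
        layers.Conv1D(32, kernel_size=7, activation="relu"),  # learns local peak shapes
        layers.MaxPooling1D(4),  # gives some tolerance to small time shifts
        layers.Conv1D(64, kernel_size=7, activation="relu"),
        layers.GlobalMaxPooling1D(),
        layers.Dense(n_classes, activation="softmax"),
    ])
    # Sparse categorical cross-entropy takes integer labels, which sidesteps the
    # StratifiedKFold/one-hot friction noted above (StratifiedKFold wants 1-D labels).
    model.compile(optimizer="adam", loss="sparse_categorical_crossentropy",
                  metrics=["accuracy"])
    model.summary()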

2022-03-31 - FASLIP

Location: Zoom, Time: Monday 2pm-3pm, Attendees: Fangfang Zhang, Bing Xue, Mengjie Zhang, Bach Hoai Nguyen, Jesse Wood et al.

Fangfang suggested a video called the Big Reset 2.0 https://www.youtube.com/watch?v=-ePZ7OdY-Dw

Notes:
  • Reinforcement learning for robotic arms.

  • Deep blue beat Kasparov, but no AI can set up the chess board, a 6 year old can do that.

  • Hugh Herr designed his own AI legs https://www.youtube.com/watch?v=CDsNZJTWw0w
    • AI prosthesis is cost prohibitive for the masses, but may work with diffusion of innovation in future.

    • Prostheses can be upgraded over time, biological body parts cannot - hardware/software updates for legs.

  • Fake news - Jon Stewart said MSM has more trackers than ANY other media (adult entertainment, torrent sites, social media included).

  • Chomsky: MSM's job is to sell the educated, privileged, wealthy elite as an audience to the corporations advertising.

  • AI algorithms - social media, fake news, incentives.

  • AI autonomous warfare proliferation - we need to ban slaughter bots https://www.youtube.com/watch?v=pOv_9DNoDRY

  • AI used for traffic management, screen-time punishment - pick up phone at cafe and pay the bill.

  • RoboMaster - robot warfare, mechatronics, AI - physical robot warfare as a game/competition.

  • Cosmo - Boris Sofman https://www.youtube.com/watch?v=U_AREIyd0Fc

  • Narrow-AI and no free lunch problem - AI is good at solving very specific tasks, but not general intelligence.

  • I have an industry project that has real-world applications in a factory setting - i.e. reduce bycatch and maximize the efficiency of food processing for fish.

2022-04-01 - ECRG

Location: Zoom, Time: Friday 3pm-5pm, Attendees: Andrew Lensen, Bing Xue, Mengjie Zhang, Bach Hoai Nguyen, Jesse Wood et al.

Meng and Bing were unwell, so Yi chaired the research group meeting.

Notes:
  • Bach's (my supervisor's) first day lecturing for COMP102.

  • Me: I got 98% accuracy on the fish species dataset using a 1D CNN.

  • Shorter meeting, workshop cancelled, due to Meng being unwell.

2022-04-04 - DL

Location: Zoom, Time: Monday 3pm-4pm, Attendees: Ciaran King, Bastiaan Kleijn, Jesse Wood, et al.

Ciaran King gave a talk on "Experiences using GitHub Copilot".
  • Understands the context of code, can make abstractions for helper methods.

  • Can write documentation for codebases.

  • Not software “correct” code, but (likely) the code we were going to write.

  • Can write tests for codebases with very little leading.

Daniel Braithwaite on "Fixed Neural Network Steganography"
  • Hide messages in an adversarial neural network.

  • A pre-trained steganography network results in non-zero error; we need perfect reconstruction for encryption.

  • Face anonymization - post a person's face online, then regenerate the face, but encrypt the private face.

  • This lets friends anonymously share images with their face online, without revealing their identity.

Bastiaan - contractivity of neural networks.
  • Signal processing worries about getting unstable linear filters for signals.

Jesse Wood
  • Evaluating Large Language Models Trained on Code https://arxiv.org/abs/2107.03374
    • 70% accuracy for basic DSA problems.

    • Can’t solve more difficult problems - doesn’t optimize solutions for performance.

    • CoPilot outperforms other state-of-the-art NLP code generation models.

    • Requires “fine-tuning”, supervised human intervention to hint towards correct answer.

  • Asleep at the Keyboard? Assessing the Security of GitHub Copilot’s Code Contributions https://arxiv.org/abs/2108.09293
    • 40% of code written with CoPilot has cybersecurity vulnerabilities.

    • CodeQL and other static analysis tools used to define the security of the code.

    • Security is a shifting landscape, WannaCry, Log4J - zero days kept secret by intelligence agencies.

    • This is true of all code, the training data was written by humans.

    • Potential vulnerability for future attacks if hackers know open-source repos are training data.

    • Don’t treat copilot as a “glass cannon”, it doesn’t deserve a false sense of security.

2022-04-07 - Weekly

Location: Zoom, Time: Friday 12pm-1pm, Attendees: Bing Xue, Jesse Wood.

Notes:
  • Use an existing neural network architecture search algorithm - application analysis.

  • Callaghan may have extra data to work with - arrange a meeting with Daniel.

  • Pre-training, transfer learning - NIST dataset for GC refraction index.

  • Look at existing proposals, get an idea for mine - possible to submit proposal early.

  • State-of-the-art, is 50-50 whether it works or is a bust - good to have a backup based in existing literature.

  • Pareto front with tradeoff between complexity and accuracy.

  • Proposal does not lock me into using a particular method (i.e. SVM, EC, PSO).

  • Idea: make sure students have a decent grasp of the field before conducting their own research, if not then read more.

  • Later, try out the ideas in the proposal and see if they work. If they don't, change tack.

2022-04-07 - FASLIP

Location: Zoom, Time: Monday 2pm-3pm, Attendees: Qi Chen, Bing Xue, Mengjie Zhang, Bach Hoai Nguyen, Jesse Wood et al.

Qi Chen showed a talk “Artificial Intelligence: A Guide for Thinking Humans” by Melanie Mitchell from https://www.youtube.com/watch?v=NMUqvhuDZtQ

Notes:
  • Shannon, Simon, Minsky - all thought AGI was 15 years off in their own time.

  • Andrew Ng - AI is the new electricity.

  • Elon Musk - nobody would listen - https://www.youtube.com/watch?v=4RMKLyaqh_8

  • Deep learning brought back the hype for AGI.

  • “An Anarchy of Methods” - Joel Lehman 2014.

  • AI, Machine Learning, Deep Learning - ontologies of fields, and their popularity changes over time.

  • Deep learning looks at AI as an artificial brain - enter the Artificial Neural Network (ANN) - the connectionists.

  • CNN based on the limited understanding of the human brain in 1950s neuroscience.

  • Facebook used CNN AI for facial recognition when a user uploads a photo.

  • ImageNet is a famous supervised classification task that was generated through crowdsourced internet "slave" labour.

  • The famous 94% result for image classification has a sample size of one PhD student (Andrej Karpathy) - the fake news embellished the story.

  • Self-driving, stopped fire truck on the highway, the long tail of AI, edge cases.

  • Adversarial attacks on neural networks, crack networks to make wrong predictions based off of their flaws.
  • Trick self-driving cars into driving through stop signs with stickers, that make it think it is a speed limit sign.

  • "I wonder whether or not AI will ever crash the barrier of meaning." - Gian-Carlo Rota, 1988.

  • "Common-sense" machine learning - the Winograd NLP problem.

  • DARPA - competition to design a machine with the common-sense of an 18 month old baby.

2022-04-08 - ECRG

Location: Zoom, Time: Friday 3pm-5pm, Attendees: Harith Al-Sahaf, Bing Xue, Mengjie Zhang, Bach Hoai Nguyen, Jesse Wood et al.

General:

Harith Al-Sahaf gave a talk on Malware Analysis https://www.al-sahaf.com/harith/

Notes:
  • Malware analysis determines the functionality, origin and potential impact of a given malware.

  • Applying EC techniques to malware analysis.

  • Harith and the university have a lot of publications in this area https://www.al-sahaf.com/harith/publications.html

  • Siamese neural networks used to compare an unknown instance to known malware for similarity.

  • Yann LeCun invented the idea for Siamese neural networks in the early 1990s.

2022-04-11 - Deep Learning

Location: Zoom, Time: Monday 3pm-5pm, Attendees: Madhurjya Dev Choudhury, Bastiaan Kleijn, Jesse Wood et al.

Madhurjya gave a talk on “Time Series analysis for Machine Health and Diagnosis”.

Notes:
  • "Image-to-Image Translation with Conditional Adversarial Networks".

  • pix2pix

  • A nuisance parameter is any parameter which is not of immediate interest, but which must be accounted for in the analysis of those parameters which are of interest.

2022-04-18 - EvoStar #1

Location: Madrid, Time: Monday 7pm-6am, Attendees: Jesse Wood, Bing Xue, Mengjie Zhang, Bach Hoai Nguyen, et al.

More information can be found on the full conference program - https://easychair.org/smart-program/Evo2022/

Gabriela Ochoa gave a keynote address on "Illuminating Computational Search Spaces" https://www.youtube.com/watch?v=EyynDbXnwic

Notes:
  • Visualisation, explanation, informed configuration.

  • Complex networks, Local Optima Networks (LON), Search Trajectory Networks (STNs).

  • Graphs have existed since the 1700s, with Euler's famous bridge problem.

  • Networks coming from different systems share topological structure.

  • Force-directed graph layout algorithm, borrows ideas from physics (i.e. simulated annealing).

  • Difference between Euclidean distance and traversal path distance between nodes (see Andrew's manifold talk from 2022-03-25 - ECRG)

  • Fitness landscapes f(S, N, F): search space (S), neighbourhood (N), fitness (F).

  • Funnel - local optima in a coarse-grained structure that can minimize energy.

  • Local Optima Network: nodes - local optima from a hill-climbing heuristic; edges - transitions between local optima.

  • Number partitioning: given a random set of numbers, find a partition such that the two disjoint sets have equal sums.

  • Map fitness landscape as a graph, compress the representation, to get an explainable visualisation for a funnel.

  • Travelling Salesman Problem (TSP) - apply perturbations to existing solutions to find new solutions in the fitness landscape.

  • CMLON, compress a Local Optima Network into a smaller representation that is easier to understand.

  • Genetic improvement to CMLON.

  • STNs allow for representative solutions.

Christian Raymond gave a talk on “Multi-objective Genetic Programming for Symbolic Regression with the Adaptive Splines Representation”

Notes:
  • Overfitting is a difficult problem for GP, because of the flexibility of the representation.

  • Difficult to regularize overfitting in GP.

  • Parsimony pressure, population distribution control, order of non-linearity.

  • Limitations: structural complexity has minimal correlation with generalization.

  • Estimate complexity of a model by estimating a model’s behaviour over a subset of training space.

  • Limitations: difficult to reliably estimate, complexity vs error is a trade-off.

  • Semi structured representation - a solution to both limitations above.

  • Spline: defined by multiple low-degree (cubic) polynomials joined smoothly.

  • GG-AWS-PP - Adaptive Weight Splines with Parsimony Pressure.

  • Apply multi-objective optimisation, with both loss and complexity objectives considered.

  • Training Pareto fronts - a 2D representation of the fitness landscape between two objectives.

  • Creates interpretable representations which are easier to understand than genetic algorithms.

  • Works well on low to medium dimension feature sets.

Another talk on "Morphological development of Robots [...]"

Notes:
  • Development of agents, aging, is beneficial for generating complex agents.

  • Voxel based soft robots (VBS).

  • Aggregates of soft cubes (voxels).

  • Neural networks for voxel controllers, a net for each voxel.

  • New voxels are added to the model over time.

  • This is a scheduling function that must allocate the correct time to increase the complexity of the model.

  • <illegible> based morphology representation.

  • Different development schedules. Early development, uniform development, no development.

  • Early development seems more beneficial than uniform development, the artificial mimics the biological.

  • No development, shows large deviation in the results, fuzzy accuracy.

"A new evolutionary algorithm based home monitoring device for Parkinson's Dyskinesia"

Notes:
  • AUC based fitness.

  • Adaptive size fitness <illegible> - allocate fitness based on a subset of training data.

  • Apply different representation width lengths using GP to see [...] (coffee break for me)

Zhixing gave a best paper nominee talk “An investigation of Multi-task Linear Genetic Programming for Dynamic Job Shop Scheduling”

Notes:
  • Job shop scheduling problem.

  • Complicated dynamic NP-hard combinatorial problem.

  • Make decisions based on imperfect information; instead, we use heuristics to decide the schedule.

  • We can’t rely on a single heuristic alone, dynamic environment means we must change heuristics in real-time.

  • Hyper-heuristic - a search mechanism to find a heuristic-selecting model.

  • GP has good interpretability, a tree can easily be understood by humans.

  • Seasonal variance - more demand for ice-cream in summer for an ice cream factory.

  • Multi-task, conflicting or tradeoff between goals of different stakeholders.

  • Multi-task model - one ring to rule them all - one model that can balance multiple tasks at once.

  • Linear genetic programming - register-based instructions; creates a directed acyclic graph (DAG).

  • DAG > tree; can use different topological structures to perform crossover for a DAG - the tree representation is limiting.

  • Operators: linear crossover, macro mutation, micro mutation.

  • Multi-population based genetic programming (GP) - sub populations that develop in isolation with crossover (migration) allowed.

  • Homogeneous/heterogeneous - diff-same / same-diff - utilization/objective functions.

  • LGP methods do not perform the same as M^2GP; the (tree-based) state-of-the-art M^2GP does better.

  • Likely because of: (1) too large a variation step size, (2) an ineffective initialization strategy.

  • Single- and multi-population methods have similar performance.

  • Graph-based crossover is a <illegible> genetic operator.

  • It was not used in this work, but is a research direction in ECRG.

David Wittenberg gave a talk on "Using Denoising Autoencoder Genetic Programming to Control Exploration and Exploitation in Search"

Notes:
  • Capture relevant properties of parent population in a latent representation.

  • Model (auto-encoder) is trained to reconstruct the input.

  • Problem: don’t want to learn the identity function.

  • Solution: denoise (slightly mutate) the parent to avoid overfitting and force the auto-encoder to generalize.

  • Level of corruption can be used to control the exploration and exploitation of genetic algorithm.

  • Paper explores this idea, we want a latent representation that is a lower resolution to let the model generalize better.

  • Subtree mutation: can’t control corruption.

  • They propose Levenshtein edit, a genetic operator for mutation on the representation string.

  • Convert the tree into an infix string representation - then perform mutation operations on that string.

  • Levenshtein edit: insert, delete, mutate - with an edit percentage that determines the level of corruption (see the sketch after this list).

  • The stronger the corruption, the stronger the exploration.
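
A toy version of that string-edit mutation (my illustration, not the paper's operator; repairing syntactically invalid offspring is omitted):

    import random

    def levenshtein_mutate(expr, symbols, edit_rate=0.1):
        # Walk the infix string; at each position, with probability edit_rate,
        # apply a random insert/delete/substitute. A higher edit_rate means
        # stronger corruption, and therefore stronger exploration.
        out = []
        for ch in expr:
            if random.random() < edit_rate:
                op = random.choice(["insert", "delete", "substitute"])
                if op == "insert":
                    out.extend([ch, random.choice(symbols)])
                elif op == "substitute":
                    out.append(random.choice(symbols))
                # "delete": drop the character entirely
            else:
                out.append(ch)
        return "".join(out)

    random.seed(0)
    print(levenshtein_mutate("(x+1)*(x-2)", symbols="x0123456789+-*/()"))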

Nicholas Fontbonne gave a talk on "Cooperative Co-Evolution and Adaptive Team Composition for a Multi-Rover Resource Allocation Problem"

Notes:
  • Multi-agent - agents act independently.

  • Competitive, mixed, co-operative sum games from game theory.

  • Zero-sum, shared-sum, shared-fitness agents.

  • In the cooperative case, all information about the individual is lost (collectivist ideologies, "the greater good").

  • A shared fitness that promotes social welfare is not a good learning signal.

  • Marginal contribution - contribution of an individual agent to the team.

  • Evolutionary algorithm for multi-agent problems.

  • issues: stuck in local optima, high computation cost.

  • solution: grouping mechanism.

  • Resource selection problem: each resource has a satisfaction score.

  • k is the group size for the grouping mechanism.

  • Co-operative co-evolutionary algorithm for ad hoc autonomous agents.

  • Efficient anytime learning without a priori knowledge of the problem.

2022-04-21 - Weekly

Location: Zoom, Time: Wednesday 12pm-1pm, Attendees: Bing Xue, Bach Hoai Nguyen, Mengjie Zhang, Jesse Wood.

Notes:
  • EvoStar conference yesterday; not much work during COVID isolation week.

  • Demelza is giving a talk tonight for EvoStar.

  • Likely to attend EuroGP in-person next year.

  • EvoCNN - encodes basic CNN components.

  • Apply GP to classifier problem directly.

  • Try several techniques (initially).

  • Also ask later what new tasks Daniel may want.

2022-04-21 - EvoStar #2

Location: Madrid, Time: Tuesday 7pm-6am, Attendees: Jesse Wood, Bing Xue, Mengjie Zhang, Bach Hoai Nguyen, et al.

More information can be found on the full conference program - https://easychair.org/smart-program/Evo2022/

Demelza gave a talk on “Genetic Algorithm for Automated Spectral Pre-processing in Nutrient Assessment”

Notes:
  • Rapid collection, non-destructive, construct a model.

  • Partial Least Squares Regression (PLSR).

  • Spectral data is easy to collect, but the pre-processing is a bottleneck.

  • PLSR translates the feature space into latent variables.

  • Spectral pre-processing removes noise and redundant intensity values.

  • Manually pre-processing spectra is laborious and lacks standardization.

  • Automate pre-processing for choosing appropriate techniques and their order of application.

  • Representation: a two-chromosome structure that encodes the objectives.

  • Fitness function: combination of MSE and number of features.

  • GA-PLSR-PPS performs better in most cases and has a smaller standard deviation for IR and Raman.

Zakaria Dehi gave a talk on “A machine learning based approach for Economics-Tailored Applications: The Spanish Case Study”.

Notes:
  • Use AI for dynamic budget allocations in governmental bodies.

    (1) ETL, (2) profiling, (3) predicting.

  • ETL - gathered 30,000 economic features, and identified 50 types of related economic metrics.

  • Pre-processing required to make the building blocks for the ETL model.

  • Different metrics can have the same profiles for all cities (i.e. unemployment/retirement).

  • Eliminate the redundant metrics (duplicate class profiles) - for a concise metric space.

  • DCGA-kMeans - unsupervised clustering algorithm to determine the profiles (k).

  • Profiling deals with identifying groups of cities based on an economic profile derived from the metrics.

  • Prediction: Long Short-Term Memory (LSTM) - a recurrent neural network.

  • This unlocks the time-series component of economic data; persistence of time is observed by the model.

  • The ideal number of profiles for most metrics was 3.

  • Massive feature and metric reduction - creates meaningful data through feature construction.

  • Can decrease the number of cases where officials must make complex policy decisions based on near-infinite combinatorial search spaces.

Daniel Lopez gave a talk on “EvoDesigner: Towards Aiding Graphic Design”

Notes:
  • Readability, balance, innovation, style - all measured by Mean Squared Error (MSE).

  • Adobe plugin to produce variations of a design based on rough input using evolutionary computation.

  • Initial input is a rough sketch that paints an idea of a possible layout.

  • Evolutionary engine creates a good design graphic based on that initial input.

  • Then the engine generates variations of that good design using evolutionary computation.

  • Still needs work, but offered a good tool for aiding ideation in graphic design.

Julia Reuter gave a talk on "Genetic Programming-Based Inverse Kinematics for Robotic Manipulators... <illegible>".

Notes:
  • Develop prototypical solution for Inverse Kinematics (IK) problems.

  • We want closed-form solutions, apply to non-standard SCARA, with explainable representations (equations).

  • KUKA youBot - the brand of SCARA they ran their simulations on.

  • Different objective functions + co-creation / co-evolutionary approaches.

  • IK-CCGP was compared to an ANN. It did better than the ANN.

  • Co-evolution for the two joints performed better than for 3 (a more complex task).

  • The equations were tested on simulated - not real-world - robotic arms.

  • But these equations can be verified later, by applying them to real-world systems and testing for collisions/singularities.

Patrick Indri gave a talk on "One-shot Learning of Ensembles of Temporal Logic Formulas for Anomaly Detection in Cyber-Physical Systems".

Notes:
  • Monitor behaviour of CPS, e.g. AV, power plant, medical monitoring, security systems, smart house.

  • CPS are dynamic - need to quantify the system in terms of time.

  • Signal Temporal Logic (STL) is an expression grammar for time-based operations.

  • GP operators construct STL formulas with optimized performance at controlling a CPS.

  • One-shot algorithm, GP is population based, we can use the population to build ensembles.

  • Water treatment CPS was used as dataset for training/test.

  • Their method performs well when compared to other state-of-the-art methods.

  • One-shot G3P achieves more complex formulas than the standard G3P.

  • The one-shot approach captures temporal operators better than standard G3P.

  • It can learn more complex models that include "time-based" operators (i.e. STL).

Gloria Pietropoli gave a talk on “Combining Geometric Semantic GP with Gradient-descent Optimisation”

Notes:
  • Geometric semantic genetic programming (GSGP) is a well-known variant of genetic programming (GP).

  • GSGP uses recombination and mutation operators that have a clear semantic effect.

  • Combine GSGP with the Adam optimizer.

  • In GP, we can represent an individual as a point in a real n-dimensional semantic space.

  • Geometric smenatic operators:
    • Geometric semantic crossover (GSC).

    • Geometric semantic mutation (GSM).

  • Adaptive Moment Estimation (Adam, kingma2014adam) is a first-order gradient-based optimizer for stochastic functions.

  • GSGP makes a good jump in the solution space, then Adam can refine the candidate solution.

  • Two approaches: (1) one step of GSGP then one step of Adam (HYB-GSGP), or (2) full GSGP then full Adam (HCH-GSGP).

  • HYB-GSGP does better than HCH-GSGP.

Dominik Sobania gave a talk on "Program Synthesis with Genetic Programming: The Influence of Batch Size"

Notes:
  • Program synthesis with genetic programming.

  • Analyse the performance and generalization ability of programs generated by GP.

  • Batch size affects program synthesis.

2022-04-22 - ECRG

Location: Zoom, Time: Friday 3pm-5pm, Attendees: Bing Xue, Mengjie Zhang, Hui Ma, Bach Hoai Nguyen, Jesse Wood et al.

Notes:

2022-04-22 - Proposal Workshop

Location: Zoom, Time: Friday 3pm-5pm, Attendees: Mengjie Zhang, Jesse Wood et al.

We had a proposal workshop hosted by my former supervisor Mengjie Zhang.

Notes:
  • First (of many) proposal workshops about preparing this document.

  • Target audience for the workshops is first-year PhD students (me).

  • PhD is a great way to go for a very good job later in life.

  • Don’t have to do theoretical work, but focus on application, key point is to make a major contribution to the field.

  • First year of PhD, we are a student not a candidate. A candidate has passed their proposal.

  • Bad PhDs can be converted into a masters degree - this happens if the proposal is very bad (try to avoid).

  • Read 50-100 papers in order to get a good grounding for a PhD proposal.

  • Some people do not have the capability to do a PhD; they can propose, but must demonstrate their capability with preliminary work.

  • Structure:
    1. Introduction

    2. Literature survey (read) +

    3. preliminary work (code) +

    4. contributions (major) / milestones / thesis outline / resources.

  • Overall goal: a single over-arching scientific/engineering argument to unify my PhD studies as one body of work.

  • 100,000 words (is a lot of words) is the expectation for a PhD thesis.

  • Minimize dependencies - encourage modularity for research objectives.

  • Coherence is very important for choosing research objectives.

  • Literature review should cover most recent work and domain specific (biology/chemistry) papers.

  • Honours work can't be counted as preliminary work (but can be referenced as a citation).

  • Not required to publish - but encouraged to do so (prevents concurrent thinking issues later down the line).

2022-04-21 - EvoStar #3

Location: Madrid, Time: Tuesday 9.30pm-11.30pm, Attendees: Jesse Wood, Bing Xue, Mengjie Zhang, Bach Hoai Nguyen, et al.

The final day of the conference, involved a plenary talk and prize giving ceremonies. More information can be found on the full conference program - https://easychair.org/smart-program/Evo2022/

Pablo Gervás gave an EvoMUSART talk on "Evolutionary Construction of Stories that Combine Several Story Lines"

Notes:
  • Star Wars used as an example during the talk for storylines (kudos :D).

  • Use evolutionary algorithms to evolve <illegible> <illegible> plot lines for a story.

  • Made up of 20 popular plot lines from a textbook, assign a plot line to each character.

  • Tricky to come up with a genetic representation for a story plotline.

  • Genetic representation must be manipulatable with genetic operators.

  • Difficult to design an objective/fitness function.

  • Fitness: validity - continuity of life and death, each character falls in love once (simplification).

  • The fitness function averages over all validity metrics to evaluate fitness.

  • The evolutionary approach is good for creating multi-plotline stories with semantically valid discourse.

  • The speaker gave a similar application, e.g. a robot that generates plot-line scripts for a League of Legends game.

  • https://nil.fdi.ucm.es/

Pedro Larrañaga gave the plenary talk on "Estimation of Distribution Algorithms in Machine Learning"

Notes:
  • Machine learning is a large focus of artificial intelligence nowadays.

  • Construct a model from data, perform (non)-parametric optimization.

  • Estimation of Distribution Algorithm (EDA).

  • Bayesian Networks (DAG + CPT), Directed Acyclic Graph (DAG), Conditional Probability Table (CPT).

  • Feature subset selection, Filter - only consider features, Wrapper - evaluate performance at machine learning task.

  • Classification - a greedy algorithm.

  • Artificial Neural Network (ANN) - (Baluja 1995).

  • Logistic regression (Roles et al. 2008) - led to interpretable results for complex models.

  • AdaBoost (Cagnini et al. 2018) - aggregated voting system between classifiers.

  • Hierarchical clustering (Fan 2019).

  • k-Means (Forgy 1965) - the most popular clustering algorithm; centroids shifted with a hill-climbing strategy.

  • Reinforcement learning (Honda and Nishive 2008) - relies on conditional Markov fields.

  • EDAs have not yet been applied to Support Vector Machines (SVM) - possible future work here (for me).

2022-04-26 - Thesis Workshop

Location: Zoom, Time: Friday 12pm-1pm, Attendees: Kirsten Reid, Jesse Wood et al.

Kirsten Reid, from learning support, hosted a Thesis Workshop for the Faculty of Graduate Research (FGR).

Notes:
  • Shape/format appropriate for the field - i.e. Engineering / Artificial Intelligence.

  • Linking ideas between chapters - see About Face 3 for great example.

  • (Sub)headings are used effectively - maximum three levels of indentation for understandability by reader.

  • Topic sentence for each new paragraph.

  • If needed - glossary, acronyms, abbreviations - up front.

  • Reminder of key concepts when needed, jog the readers memory, guide the reader along (a thesis is long).

  • “Reader-friendly”, avoid jargon, use plain text.

  • Sentence length should vary to avoid a monotonous tone (see https://bit.ly/38mTmAE)

  • Avoid too many nominalisations - e.g. "we decided" rather than "we came to the decision" - focus on action verbs over their nouns.

  • Don't front-end sentences; cut the wheat from the chaff - usually 1/3 of writing can be removed for brevity.

  • Make appointment with Student Learning - limited to 50 minutes.

  • Sentence cohesion workshop coming up soon.

  • Kirsten happy to reply to Emails for further advice on this workshop.

  • Three voices: research, data, researcher.

  • From FGR: 4 workshops coming up for PhD students in May.

2022-04-28 - Weekly

Location: CO 352, Cotton Building, Time: Monday 12pm-1pm, Attendees: Jesse Wood, Bing Xue, Bach Hoai Nguyen, et al.

Notes:
  • Can claim GC dataset research as preliminary work for the REIMS data in my proposal.

  • Transfer models from GC to REIMS data (likely) - this supports the preliminary claim.

  • Pre-training on NIST GC refraction index data - needs a parametric (neural network) model for this technique.

TODO:
  • [x] Apply Genetic Programming (GP) data to GC data.

  • [-] Apply EvoCNN to GC data.

2022-04-28 - FASLIP

Location: Zoom, Time: Monday 2pm-3pm, Attendees: Jesse Wood, Bing Xue, Bach Hoai Nguyen, et al.

Jesse Wood (me) showed a talk, "Tesla AI Day (supercut)".

Notes:
  • Disclaimer: I submitted this talk request on the 21st of April (7 days ago) - this is not political, just a coincidence.

  • “Iron Man in Real life” - a comment from the Zoom chat.

  • If you are interested in another high level summary:

  • Lex Fridman AI Day (summary by MIT researcher) - https://www.youtube.com/watch?v=ABbDB6xri8o

  • Here is the full video, it includes references to academic papers:

  • Full AI Day (3 hours long) - https://www.youtube.com/watch?v=j0z4FweCy4M

  • Supervisor liked my running commentary and links to further watching.

2022-04-22 - ECRG

Location: Zoom, Time: Friday 3pm-5pm, Attendees: Bing Xue, Mengjie Zhang, Hui Ma, Bach Hoai Nguyen, Jesse Wood et al.

General Notes:
  • Zhixing won best paper for EuroGP.

  • Shaolin had a TEVC paper revision.

  • Vincent and Tao had a paper accepted by CEC.

Talk from Xiaoying Sharon Gao on "Text Representation"

Notes:
  • Inverse document frequency - used to eliminate words that are too common, e.g. "the" (see the sketch after this list).

  • Word embeddings - analyse similarity (relatedness) with PCA - reduce to a lower-dimensional space (2D/3D).

  • CNN, Transformers, Attention, BERT, Pre-training.

  • Pre-training of deep bidirectional transformers for language understanding (BERT).

  • Word masking, pre-training, next sentence prediction (NSP).
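
A small sketch of the IDF idea (mine, using scikit-learn's TfidfVectorizer; get_feature_names_out assumes scikit-learn >= 1.0):

    from sklearn.feature_extraction.text import TfidfVectorizer

    docs = ["the cat sat on the mat",
            "the dog sat on the log",
            "cats chase dogs"]

    # IDF down-weights terms that occur in many documents ("the", "sat", "on"),
    # so the distinctive words dominate the representation.
    vec = TfidfVectorizer()
    vec.fit(docs)
    for term, idf in sorted(zip(vec.get_feature_names_out(), vec.idf_),
                            key=lambda pair: pair[1]):
        print(f"{term:6s} idf={idf:.2f}")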

2022-04-02 - Research Workshop

Location: Zoom, Time: Monday 11pm-1pm, Attendees: Jesse Wood

FGR hosted a workshop “How do I publish and disseminate my research”.

Notes:
  • https://scopus.com - see citation statistics for papers/journals - useful metrics to judge quality.

  • Can see the stats for a journal, and compare these to other journals, to assess credibility.

  • These tools are useful for making/measuring goals as a researcher.

  • http://login.webofknowledge.com/ - useful tool for mainly STEM disciplines.

  • Elsevier Journal Finder - provides acceptance rates and turnaround times - useful for finding which journal is appropriate.

  • www2.cabells.com/journalytics - see a journal's publishing frequency; it may be annual, which is a long wait.

  • Considerations:
    • Who reads it?

    • Does it publish articles like yours?

    • Does your style match?

    • Would you need to change to submit?

    • Peer-reviewed?

    • Time to publish?

    • Substantial paper.

    • Tolerable rejection rate.

    • Preferred type of journal.

    • Solid reputation/metrics.

    • Many articles a year.

  • Keywords are good for SEO, they ensure the discoverability of work, and extend your audience.

  • Rejection happens, may provide feedback (may not).

  • Query letter - send an abstract to a journal to test the waters - see if a paper is appropriate.

2022-04-11 - Deep Learning

Location: Zoom, Time: Monday 3pm-5pm, Attendees: Matthew O'Connor, Bastiaan Kleijn, Jesse Wood et al.

Matt O'Connor gave a talk "Upsampling Artifacts in Neural Audio Synthesis" https://ieeexplore.ieee.org/abstract/document/9414913

Notes:
  • Improve the quality of audio using a neural network approach.

  • CNN is for images, but we need an architecture for audio specifically - this must handle audio specific problems (e.g. time, harmonics).

  • Convolution (collapse), transposed (expand).

  • Transposed convolutions are widely used.

  • Upsampling methods (see the sketch after this list):
    • Stretch (insert zeros)

    • NN (nearest neighbours)

    • Linear

  • The upsampling algorithm leaves artifacts in the output sample.

  • Spectral replicas emerge when sampling/discretizing the signal.

  • All upsampling methods sound very similar - in the human audible range.

  • NN, no artifacts, but frequency filter at zero frequency.

  • Out-of-distribution test shows major artifacts, since it wasn't in training.
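
A toy numpy comparison of the three upsampling methods named above (my illustration):

    import numpy as np

    x = np.array([1.0, 4.0, 2.0, 3.0])
    L = 2  # upsampling factor

    stretch = np.zeros(len(x) * L)
    stretch[::L] = x  # insert zeros: cheap, but creates spectral images
    nearest = np.repeat(x, L)  # hold each sample: piecewise constant
    linear = np.interp(np.arange(len(x) * L) / L, np.arange(len(x)), x)

    print(stretch)  # [1.  0.  4.  0.  2.  0.  3.  0. ]
    print(nearest)  # [1.  1.  4.  4.  2.  2.  3.  3. ]
    print(linear)   # [1.  2.5 4.  3.  2.  2.5 3.  3. ]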

2022-05-05 - Weekly

Location: CO 352, Cotton Building, Time: Monday 12pm-1pm, Attendees: Jesse Wood, Bing Xue, Bach Hoai Nguyen, et al.

Notes:
  • Reading this week - Evolutionary ML Survey (60% done).

  • Apply GP to the GC fish dataset.

  • Later consider the time domain: terminal set, symbolic regression.

  • GP Hello World! Try to find a tutorial for this.

  • DEAP does not have "Elitism" built in - I will need to implement my own (see the sketch after this list).

  • Elitism:
    • Keep the top solutions between generations.

    • Ensures performance can’t decrease.

    • Does not guarantee performance will increase.

  • Future work - transfer learning can be:
    • Parameters

    • Model

    • Feature (selected/constructed)

  • Terminology; domain adaptation, domain generalisation.

  • N.B. I should record my tutor meetings on Zoom (even if they are in person) to make use of Panopto's free transcription software while I am a staff member.
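
A minimal sketch of hand-rolled elitism for DEAP (mine, assuming a standard toolbox with select, mate, and mutate registered):

    from deap import algorithms, tools

    def next_generation(population, toolbox, elite_size, cxpb, mutpb):
        # Copy the best elite_size individuals through unchanged, so the best
        # fitness can never decrease (though nothing forces it to increase).
        elite = list(map(toolbox.clone, tools.selBest(population, elite_size)))
        offspring = toolbox.select(population, len(population) - elite_size)
        offspring = algorithms.varAnd(offspring, toolbox, cxpb, mutpb)
        return elite + offspring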

2022-05-05 - FASLIP

Location: Zoom, Time: Monday 2pm-3pm, Attendees: Peng Wang, Jesse Wood, Bing Xue, Bach Hoai Nguyen, et al.

Peng Wang proposed a talk from Prof. Zhihua Zhou on “From Pure Learning to Learning+Reasoning” https://www.youtube.com/watch?v=LAvRDCcXCMc

Notes:
  • Most machine learning techniques rely on large database of labelled training data.

  • We can leverage unlabelled data to improve performance on labelled data.

  • Self-learning: use a classifier to apply pseudo classes to unlabelled data.

  • SETRED 2005 - data editing, cleans up self learning.

  • Active learning (AL) uses an oracle to query (label) unlabelled data; relies on minimizing queries to the oracle (this requires human supervision).

  • Representative AL approach - informative/representative.

  • AL requires human-in-the-loop.

  • Semi-supervised learning (SSL) - see Lex podcast for more details https://www.youtube.com/watch?v=FUS6ceIvUnI

  • Independent and Identically Distributed (IID).

  • Semi-supervised SVMs (S3VMs) (Zemmal 2016, zemmal2016adaptative).

  • Using unlabelled data to ensure the decision boundaries are drawn through low-density areas.

  • Tri-training approach: three learners that can teach each other and perform ensemble learning.

  • Ensemble learning - uses multiple models and combines them to make a prediction.

  • Holy grail: machine learning + logical reasoning.

  • Probabilistic Logic Programming (PLP) - heavy-reasoning light-learning.

  • Statistical Relation Learning (SRL) - light-reasoning heavy-learning.

  • Proposal: abductive learning:
    • Deductive

    • Inductive

    • Abductive - inversely embed deductive reasoning into inductive reasoning.

  • Knowledge Base (KB), a series of first-order logic predicates.

  • Instance -> pseudo labels -> pseudo groundings -> KB.

  • Optimize to minimize inconsistency in the system.

  • ABL does not rely on ground-truth labels.

  • SSL for court sentences in China, similar to Andrew's work (see "2022-03-25 - ECRG" above).

2022-05-06 - ECRG

Location: Zoom, Time: Friday 3pm-5pm, Attendees: Bing Xue, Mengjie Zhang, Hui Ma, Bach Hoai Nguyen, Jesse Wood et al.

Notes:
  • Celebration for the AI group since Bastiaan received his RSNZ Fellowship - Wednesday 18th May from 12pm-1pm in CO3250.

  • Hayden has a camera-ready (?) paper for a GECCO (?) workshop.

  • Candidate Development Plan (CDP) is due by the end of May - this is due every 6 months (see email https://mail.google.com/mail/u/0/#search/CDP/FMfcgzGpFgvPXpqBwmBtBvqBZPxxQFtB).

  • University policy for thesis students does not require us to publish any papers.

  • STEM has a publishing-forward culture when compared to other fields (e.g. law or humanities).

  • Evolutionary Algorithm:
    • Initialisation

    • Cycle
      1. Evaluation.

      2. Selection.

      3. Reproduction.

      4. Repeat 1-3 until the termination condition is met (a minimal example follows these notes).

  • Meng has not seen anyone miss genetic operators, but a fair number of research papers omit the “evaluation” and “selection” sections.

  • DON’T OMIT THESE SECTIONS! (Unless you have a very good reason.)

  • Ideally, have a nice flow diagram in a paper, to explain the training process for the model (this figure is a good use of space!!!).
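
A minimal, self-contained toy instance of the initialisation/evaluation/selection/reproduction cycle above: a GA maximising the number of 1s in a bitstring (OneMax). All parameter choices are illustrative.

    import random

    POP, LEN, GENS = 20, 30, 50

    def evaluate(ind):                      # 1. evaluation
        return sum(ind)

    def select(pop, k=2):                   # 2. selection (size-2 tournament)
        return max(random.sample(pop, k), key=evaluate)

    def reproduce(a, b):                    # 3. reproduction (crossover + mutation)
        cut = random.randrange(LEN)
        child = a[:cut] + b[cut:]
        return [bit ^ (random.random() < 0.01) for bit in child]

    pop = [[random.randint(0, 1) for _ in range(LEN)] for _ in range(POP)]
    for _ in range(GENS):                   # 4. repeat until termination
        pop = [reproduce(select(pop), select(pop)) for _ in range(POP)]
    print(max(map(evaluate, pop)))          # best fitness found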

2022-05-09 - Deep Learning

Location: Zoom, Time: Monday 3pm-4pm , Attendees: Ciaran King, Jesse Wood, Bastiaan Kleign, et al.

Ciaran King gave a talk on “Learning from Protein Structure with Geometric Vector Perceptrons” (Jing 2020, jing2020learning) https://openreview.net/forum?id=1YLJDvSx6J4

Notes:
  • Perverse incentives for academics to oversell their work, and a major reproducibility crisis in deep learning.

  • Graph Neural Networks can be used for protein folding.

  • Equivariance to rotations - if the network thinks the same instance, rotated, is a completely different structure, this is very inefficient.

  • Instead we want rotation-invariant representations for things like proteins (like we want time-invariant representations for gas chromatography).

  • Voxels are 3D pixels; these can be used to make a 3D representation of an instance, to which a 3D Convolutional Neural Network can then be applied (see the sketch after these notes).

  • We think that (1) message passing and (2) spatial convolution are each well suited to different types of reasoning.

  • In protein folding, there are chemical properties of proteins that simplify the combinatorial search space for the graph neural network.

  • This is similar to how the AI Feynman (Tegmark 2020, tegmark2020ai) used properties of physics equations to simplify symbolic regression.

  • I would like to apply simplification using domain expertise in chemistry to my gas-chromatography and mass spectrometry data.
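
A minimal PyTorch sketch of the voxel idea above: a single 3D convolution applied to a voxel grid. All shapes are illustrative (batch, channels, depth, height, width).

    import torch
    import torch.nn as nn

    voxels = torch.randn(1, 1, 32, 32, 32)   # one single-channel 32^3 voxel grid
    conv = nn.Conv3d(in_channels=1, out_channels=8, kernel_size=3, padding=1)
    features = conv(voxels)
    print(features.shape)  # torch.Size([1, 8, 32, 32, 32])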

2022-05-12 - Weekly

Location: CO352, Time: Thursday 12pm-1pm , Attendees: Bach Hoai Nguyen, Bing Xue, Mengjie Zhang, Jesse Wood.

Notes:
  • Audio-only private recordings of weekly meetings for study purposes are OK.

  • Plant & Food are collecting information on students (like myself) who have been working for them.

  • Conferences:
    • GECCO - Genetic Evolution Computation Conference.

    • EvoStar (attended in 2022).

    • IEEE CEC - IEEE Congress on Evolutionary Computation.

  • Camera-ready? This is ready for print: a final version of the paper that has responded to feedback and been formatted for the journal.

  • CDP - Candidate Development Plan, is due this month. Scope is for 6 months only. This is my first, so I have no goals from a previous CDP.

TODO:

2022-05-12 - FASLIP

Location: Zoom, Time: Thursday 2pm-3pm , Attendees: Cuie Yang, Jesse Wood, Bing Xue, Bach Hoai Nguyen, et al.

Cuie suggested the talk “You are what you read” https://www.youtube.com/watch?v=Du7qLsToW-o

Notes:
  • Light reading, figure fishing - 80% comprehension.

  • Skip to pictures like a kid, we are all kids at heart.

  • Deep read, creative/critical reading.

  • Shallow reading is important; a paper has four parts:
    1. Title

    2. Abstract

    3. Introduction

    4. Rest

  • Quick reading and shallow reading are most important for the majority of readers.

  • Reviewers are busy people (bathroom reviews); a lot of reviewing is based on superficial details.

  • Notes - a format for notes from skim reads of a paper:
    1. About

    2. Problem

    3. Interesting

    4. New

    5. Neat

  • Two word titles: “Snappy sampling”.

  • Find colleagues to collaborate with and share knowledge.
    • Give and take, time is a resource.

    • What NOT to read?

  • Future work:
    1. Good ideas

    2. Improvements

    3. Applications/Extensions

    4. Your opinion

  • Come up with your own:
    • Other domains

    • Moon-shots

    • Related ideas

TODO:

2022-05-13 - ECRG

Location: Zoom, Time: Friday 3pm-5pm , Attendees: Bing Xue, Mengjie Zhang, Hui Ma, Bach Hoai Nguyen, Jesse Wood et al.

General notes:
  • Possible to have future meetings in person soon - waiting on the university to update their COVID policy.

  • Bastiaan’s celebration is next week.

  • Meng asks who would like to go first at the start of each meeting. BE READY! FIFO

  • Me: GA tutorial, Evolutionary ML Survey, 3x papers to read.

  • Journals are encouraging people to publish their work in interactive ways. See Observable for visualization techniques.

Junhong Zhao from another department gave a talk on “AI Effects (AIX) in Computer Graphics (CG)”.

Notes:
  • Reconstructing reflection maps using a stacked CNN for Mixed Reality Rendering.

  • Automates/improves rendering practices with good estimation of reflection maps.

  • Gist: “Get accurate reflection maps on artificial objects in Augmented Reality Environments”.

  • Challenges: wide range of sensors/lenses - people have different phones with different quality cameras.

  • A robust neural network is needed to handle out-of-distribution and real-world data.

  • DLNet - “Adaptive Light Estimation using Dynamic Filtering Terms”.

TODO:
  • [x] CDP is due this Monday, for part I. (finished!)

2022-05-16 - Deep Learning

Location: Zoom, Time: Monday 3pm-4pm , Attendees: Daniel Braithwaithe, Jesse Wood, Bastiaan Kleign, et al.

Daniel Braithwaithe gave a talk on “Estimating KL Divergence with Kernel Estimators”.

Notes:
  • MINE - Mutual Information Neural Estimation.

  • GANs suffer from mode collapse, due to a lack of diversity in GAN generators.

  • The MINE model and loss are not typically convex, so convergence is not guaranteed.

  • Non-trivial to implement a MINE model.

  • Maximizing entropy is an intractable problem. Instead, estimate it for a GAN.

  • Instead use KKLE (KL Divergence using Kernel Estimators); KKLE is convex. (The divergence both methods estimate is given below.)
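
For reference, the quantity these estimators target (standard definitions, not from the talk): the KL divergence is \(D_{\mathrm{KL}}(P \,\|\, Q) = \mathbb{E}_{x \sim P}\left[\log \frac{p(x)}{q(x)}\right]\), and MINE estimates the mutual information \(I(X;Y) = D_{\mathrm{KL}}(P_{XY} \,\|\, P_X \otimes P_Y)\) by maximising a neural lower bound on it.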

Maxwell Clarke gave a talk on “Why Deep Learning Works”.

Notes:
  • Zero Eigenvalue, parameters can be shifted without affecting the loss.

  • Compress the area of the search space for efficiency.

  • Networks which use generalisable representations are “simpler” than networks that don’t.

  • Occam’s razor, or ‘as simple as possible, but no simpler’ - Einstein.

2022-05-26 - Weekly

Location: CO352, Time: Thursday 12pm-1pm , Attendees: Bach Hoai Nguyen, Bing Xue, Mengjie Zhang, Jesse Wood.

Notes:
  • Add proposal to the CDP goals.

  • End of October - proposal due.

  • Set 18th June, the Australasian AI submission date, as a soft deadline for my first paper.

  • IP is 1/3 creators, 1/3 university, 1/3 commercial (note: still far better than previous industry experience).

  • Code is not IP, but it is copyrightable.

2022-05-26 - FASLIP

Location: Zoom, Time: Thursday 2pm-3pm , Attendees: Cuie Yang, Jesse Wood, Bing Xue, Bach Hoai Nguyen, et al.

Bach (my supervisor) suggested the talk “The Master Algorithm” from Pedro Domingos. https://www.youtube.com/watch?v=B8J4uefCQMc

Notes:
  • Knowledge: evolution, experience, culture, computers.

  • Each knowledge acquisition is an order of magnitude faster than the previous.

  • 5 tribes of AI: Symbolists, Connectionists, Evolutionaries, Bayesians, Analogizers.

  • Each tribe is related to a different field of study outside of machine learning.

  • Each of the 5 tribes has a master algorithm: a general method that, with enough data, can learn anything.

  • Robotic machine that can perform biological experiments in drug discovery (see Lee Cronin Chemputer https://www.youtube.com/watch?v=ZecQ64l-gKM)

  • Google’s famous “cat” network was an ANN trained on YouTube videos; not surprisingly, it became very good at recognizing images of cats.

  • Koza took the Evolutionaries one step further by inventing genetic programming, representing a candidate solution in program semantics.

  • Bayesians are the most fanatical of the tribes: strict adherence to statistical inference through Bayes’ Theorem (stated after these notes).

  • As we see more evidence, the probabilities of certain hypotheses become more likely.

  • A weakness of this AI is that it cannot predict events that are not explicitly given in training; things that have never happened have \(P(A) = 0\).

  • Analogizers: Douglas Hofstadter, author of Gödel, Escher, Bach.

  • Vladimir Vapnik is the creator of Support Vector Machines (SVM).

  • Kernel machines are the master algorithm for Analogizers.

  • A master algorithm, for general intelligence, would include all 5 tribes of AI.

  • Pedro proposed the Markov Logic Network, which is independent of the objective function.

  • All AI fits the evolutionary paradigm: evaluation, selection, reproduction, fitness.
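
For reference, the theorem in question (standard statement, not from the talk): \(P(H \mid E) = \frac{P(E \mid H)\,P(H)}{P(E)}\). As evidence \(E\) accumulates, posterior mass concentrates on the better-supported hypotheses; and a hypothesis or event with prior probability 0 can never gain posterior mass, which is exactly the weakness noted above.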

2022-05-30 - Deep Learning

Location: Zoom, Time: Monday 3pm-4pm , Attendees: Daniel Braithwaithe, Jesse Wood, Bastiaan Kleign, et al.

Bastiaan gave a talk on “Distributed Processing”

Notes:
  • Knowledge required: graph theory, probability, convex optimisation, domain optimization.

  • Parallel vs Distributed Processing.

  • Dual Gradient Ascent.

  • ADMM, PPMM.

  • Parallel processing - as fast as the slowest node; central coordination.

  • Distributed processing - no central coordination, any connected topology, gossip consensus algorithm.

  • Gossip - take the mean of each group of nodes and communicate that between groups of nodes.

  • Dual Gradient Ascent - requires strong duality (e.g. Slater’s condition).

  • Dual Gradient Ascent is useful for distributed processing:
    • We assume \(f(x)\) is separable, get dis<unlegible>.
    • Instead of global optimization, we optimize locally.

  • Alternating Direction Method of Multipliers (ADMM) - its standard updates are given after these notes.

  • Lasso (L1) regularization is the most common predecessor to ADMM.

  • ADMM converges much faster because it is quadratic.

  • There are faster algorithms, but ADMM is fast enough for signal processing.

  • ADMM is a distributed MSE.

  • Primal-Dual Method of Multipliers (PDMM).

  • Lifting - add a new variable, then constrain it; this allows us to associate variables with models only.

  • PDMM was developed for solving decomposable optimisation problems in a distributed fashion.

  • Note: traditional derivations are somewhat ad hoc; a nicer alternative is <unlegible> based.
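
For reference, the standard scaled-form ADMM iterations for \(\min_{x,z} f(x) + g(z)\) subject to \(Ax + Bz = c\) (textbook form, e.g. Boyd et al. 2011; not from the talk):

  • \(x^{k+1} = \arg\min_x \; f(x) + \frac{\rho}{2}\|Ax + Bz^k - c + u^k\|_2^2\)

  • \(z^{k+1} = \arg\min_z \; g(z) + \frac{\rho}{2}\|Ax^{k+1} + Bz - c + u^k\|_2^2\)

  • \(u^{k+1} = u^k + Ax^{k+1} + Bz^{k+1} - c\)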

2022-06-02 - FASLIP

Location: Zoom, Time: Thursday 2pm-3pm , Attendees: Christian Raymond, Jesse Wood, Bing Xue, Bach Hoai Nguyen, et al.

(Misc) Heitor Gomes is a new lecturer in the department. Christian Raymond proposed a talk “Artificial Intelligence 1980s-2021 and Beyond” by Professor Jürgen Schmidhuber https://www.youtube.com/watch?v=pGftUCTqaGg&t=1s

Notes:
  • Long-short-term memory (LSTM)

  • Recurrent Neural Network (RNN)

  • What is predictable is compossable [sic]

  • RNN requires unsupervised pre-training.

  • LSTM does not require unsupervised pre-training - an improvement on the RNN.

  • DanNet - the fast, deep CNN behind the image processing revolution in 2011.

  • Highway nets (May 2015) - the first neural networks with over 100 layers (cf. ResNet).

  • DL networks suffer from the problem of vanishing and exploding gradients.

  • LSTM used for Google speech recognition, Amazon Alexa (and Samsung Bixby xD).

  • Compressed network search was the first RL to learn policies from video for a controller.

  • Reinforcement learning LSTM, 2007-2010.

  • 2019, DeepMind beat a professional StarCraft player.

  • 2018, OpenAI Five beat competitive players in Dota 2.

  • World models + RL + controller - e.g. RoboCup AI soccer.

  • Motivate controller to design experiments to improve the world model.

  • 1990: Generative Adversarial Networks (GANs).

  • nnaisense - the dawn of AI - his company.

  • AI in 3D printing - additive manufacturing.

2022-06-03 - ECRG

Location: Zoom, Time: Friday 3pm-5pm , Attendees: Bing Xue, Mengjie Zhang, Hui Ma, Bach Hoai Nguyen, Jesse Wood et al.

General Notes:
  • Mass exodus of academics from university (Nature article).

  • New PhD students have to be in New Zealand to study at VUW.

  • Christian Raymond had a very intuitive video about AI history yesterday at FASLIP.

  • Make sure the similarity score on Turnitin is less than 25%, bearing in mind we can plagiarize ourselves.

  • Me: reading (Lehman 2020, lehman2020surprising), writing the first draft of my paper for Australasian AI.

Fangfang Zhang gave a presentation on “Responding to Reviewers”

Notes:
  • Peer-review process - some papers are accepted with major/minor revisions.

  • As long as a paper is accepted, this is a success in the peer-review process.

  • R1, R2, R3, … -> Revision 1, Revision 2, Revision 3, …

  • It is common to have up to 4-5 revisions when submitting a paper to a journal.

  • 7 is the maximum number of revisions within the ECRG group.

  • Important to take the revision process very seriously, when publishing as an academic.

  • When first receiving feedback, keep calm and carry on, read all feedback before responding.

  • Respond to each point when replying to a reviewer’s comments.

  • Can disagree with reviewers’ comments, but be professional and respectful about it.

  • Make it easy for your reviewers!
    • Write a cover letter.

    • Copy text directly for short changes.

    • Reference for larger sections.

    • Use color to highlight changes.

  • A good revision means the reviewer does not have to re-read your original paper to accept the changes.

  • Summary:
    • Take a break.

    • Point-by-point response.

    • Well reasoned arguments.

    • Pay attention to details.

    • Appreciate the reviewers’ work.

  • Papers need to make a contribution to science/journal.

2022-06-09 - Weekly

Location: CO352, Time: Thursday 12pm-1pm , Attendees: Bach Hoai Nguyen, Bing Xue, Mengjie Zhang, Jesse Wood.

Notes:
  • Print a bigger version of the paper next time - use the fit-to-page setting.

  • Meng uses emacs, Pondy used to use emacs for everything (i.e. email).

  • GP paper on GC-fish data.

  • GP - set max tree depth to 8.

  • Protected division - don’t divide by 0, which would produce NaN (see the sketch after these notes).

  • Later, use more complex GP variations.

  • Australasian AI papers need novelty.
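
A minimal sketch of protected division as commonly used in GP function sets; returning 1.0 on division by zero is a common convention rather than the only choice, and the commented DEAP registration is an assumed setup following DEAP’s GP tutorial.

    def protected_div(a, b):
        # Return a safe fallback instead of raising or producing NaN/inf.
        try:
            return a / b
        except ZeroDivisionError:
            return 1.0

    # Assumed DEAP registration (illustrative):
    # from deap import gp
    # pset = gp.PrimitiveSet("MAIN", arity=2)
    # pset.addPrimitive(protected_div, 2)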

2022-06-09 - FASLIP

Location: Zoom, Time: Thursday 2pm-3pm , Attendees: Quinglan Fan, Jesse Wood, Bing Xue, Bach Hoai Nguyen, et al.

Quinglan Fan shared a talk on “The Extremes of Interpretability” https://tads.research.iastate.edu/dr-cynthia-rudin-extremes-interpretability-machine-learning-sparse-decision-trees-scoring-systems

Notes:
  • Interpretable machine learning models obey a domain-specific set of constraints.

  • Needed for high-stakes decisions or troubleshooting; important to have interpretable models for real-world problems.

  • CA wildfires: Google Air Quality / BreezoMeter ML model failed.

  • Glenn Rodriguez, COMPAS score - an AI that was broken left a man in prison.

  • Florida COMPAS data on recidivism (similar to Andrew Lensen’s talk from 2022-03-25 - ECRG).

  • Blackbox model that is used in the criminal justice system.

  • Correctional Offender Management Profiling for Alternative Sanctions (COMPAS).

  • A very basic decision tree, CORELS, was able to outperform the COMPAS model (a shallow-tree sketch follows these notes).

  • C4.5 and CART are greedy top-down algorithms that often overfit the training data.

  • 1990s: non-greedy algorithms, and statisticians improved splitting criteria.

  • Genetic Programming has been tried for creating fully optimal decision trees.

  • GOSDT 2020, fastest algorithm by 3 orders of magnitude.

  • Dynamic programming / branch and bound:
    • Eliminate duplication.

    • no need to solve pure leaves.

    • reduce search space by theorems:
      • minimum support bound.

      • one-step lookahead.

  • Generalized and Scalable Optimal Sparse Decision Trees (GOSDT).

  • Improved representation:
    • store only the leaves.

    • use bitvectors -> crazy fast.

    • Extended computation (avoid repetition) - caching.

    • Consolidation of repeated sub-problems.

  • Results:
    • Similar classification range.

    • For custom losses, much better than greedy algorithms.

    • Sparser than all heuristic models.

    • orders of magnitude faster than next best model.

  • Explaining deep NNs with saliency maps does not work.

  • E.g. given an image of a dog, the saliency maps for a husky and a musical instrument both examine the same areas.

  • “This Looks Like That” forces a black-box NN to be interpretable with a prototype layer.

  • CUB-200, a staple computer vision dataset on birds.

  • Even for black-box NN, we can have an interpretable model.

  • Interpretable AI Algorithm for Breast Lesions (IAIA-BL).

  • An AI that uses the domain-specific approach of radiology and presents the radiologist with an interpretable model.

  • They can understand the reasons behind decisions, be skeptical, and disagree or agree based on reason-based interpretable models.
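
The talk is about provably optimal sparse trees (CORELS/GOSDT); as a stand-in, this sketch fits a deliberately shallow (hence readable) greedy CART tree with scikit-learn on synthetic data. It illustrates the sparsity/interpretability idea only, not the GOSDT algorithm itself.

    from sklearn.datasets import make_classification
    from sklearn.model_selection import train_test_split
    from sklearn.tree import DecisionTreeClassifier, export_text

    X, y = make_classification(n_samples=500, n_features=10, random_state=0)
    X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

    # max_depth=3 keeps the whole model readable as a handful of rules.
    tree = DecisionTreeClassifier(max_depth=3, random_state=0).fit(X_tr, y_tr)
    print(tree.score(X_te, y_te))
    print(export_text(tree))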

2022-06-10 - ECRG

Location: Zoom, Time: Friday 3pm-5pm , Attendees: Qurret Ul Ain, Bing Xue, Mengjie Zhang, Hui Ma, Bach Hoai Nguyen, Jesse Wood et al.

Qurret Ul Ain gave a talk on “Artificial Intelligence to Diagnose Skin Cancer”

Notes:
  • Early detection of cancer is fundamental in saving lives.

  • Moles are an example of Skin Cancer.

  • Combine knowledge of dermatology + computer vision.

  • A multi-disciplinary research project (like mine).

  • Skin cancer can spread to the body, in later stages, it can reach the blood vessels, and lead to tumours.

  • Skin cancer is a disproportionately high cause of mortality in New Zealand - 4,000 diagnoses, 300 deaths per year.

  • Causes: radiation, moles, high altitude, immune system, hereditary.

  • Diagnosis:
    • A - Asymmetry

    • B - Border

    • C - Colour

    • D - Diameter

    • E - Evolving

  • Tool: punch biopsy (skin biopsy) - manually check a mole (very painful procedure).

  • Computer diagnostics let a dermatologist determine if a punch biopsy is needed.

  • Punch biopsy is a (painful) barrier of entry to diagnosis.

  • This motivates computer vision techniques for detection.

  • Multi-stage classification systems require significant expertise: segmentation, hair removal, ambient lighting, rotation.

  • Existing methods rely on grayscale computer vision, but colour is important too.

  • Lens (camera) calibration varies due to different camera resolutions.

  • Limitations of NNs: reducing all images to 256x256 pixels distorts the aspect ratio.

  • Hair removal - is this because of the training data? Because a dermatologist shaves a patient’s hair before scanning.

  • Multi-tree GP has achieved better performance than single tree.

  • Multi-tree is similar to multi-cellular organism from biology.

  • Local Binary Patterns (LBP) is a dense image descriptor.

  • Uses a sliding window of fixed radius; it computes the value of the central pixel based on the intensities of its neighbouring pixels.

  • Can concatenate LBP from multiple channels (i.e. RGB) to construct a new feature vector (see the sketch after these notes).

  • Classifications:
    • Benign (not harmful)

    • Malignant (harmful)

  • Can extract tabular features with domain-specific knowledge; interpretability: “we know these work”.

  • Same-index crossover/mutation - group together features for one method from the constructed feature vector - i.e. blue-channel LBP.

  • J48, NB, SVM, KNN.

  • Analyze feature appearance as feature frequency, then relate it to domain expertise/original features, to get interpretable models.

  • Wavelet: different-scale information (mean/std summary statistics).

  • “Wavelet” features have performed well in the existing literature; ensemble classifiers (i.e. AdaBoost) perform very well.

  • Ensembles balance the approaches of several classification algorithms (voting/bagging) for best performance.

  • Wrapper: based on the performance accuracy on a subset of the training data labels (y).

  • Filter: based only on the features (X).

  • Important to use balanced accuracy for imbalanced datasets, otherwise results are biased towards the majority class.

  • GP has the opportunity to generate new knowledge that can be verified and incorporated by domain experts into their field.
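
A minimal sketch of the per-channel LBP feature construction described above, using scikit-image; the radius, number of points, and bin count are illustrative choices.

    import numpy as np
    from skimage.feature import local_binary_pattern

    def lbp_feature_vector(rgb_image, points=8, radius=1):
        features = []
        for channel in range(3):  # R, G, B
            lbp = local_binary_pattern(rgb_image[:, :, channel], points,
                                       radius, method="uniform")
            # "uniform" LBP yields points + 2 distinct codes; histogram them.
            hist, _ = np.histogram(lbp, bins=points + 2, density=True)
            features.append(hist)
        return np.concatenate(features)  # one feature vector per image

    # Usage: X = np.stack([lbp_feature_vector(img) for img in images])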

TODO:
  • [-] READ: GP for Multi-FC in skin cancer image classification.

  • [-] READ: Two-stage GP for automated diagnosing of Skin Cancer from Multi-Modality Images

2022-06-17 - ECRG

Location: Zoom, Time: Friday 3pm-5pm , Attendees: Cui Yang, Bing Xue, Mengjie Zhang, Hui Ma, Bach Hoai Nguyen, Jesse Wood et al.

General notes:
  • No meeting next week due to the (first) celebration of Matariki as a public holiday.

  • Writing is an important part of research: we refine our ideas, correct our incorrect assumptions, and organize our thoughts.

  • For example, when re-writing my draft paper, I realized my understanding of treated fish biomass waste was incorrect.

  • Something I would have never realized if I did not explicitly have to re-write my introduction section.

Cui Yang gave a talk on “Domain Adaptation”.

Notes:
  • Machine learning learns to predict data from independent and identically distributed (iid) probability distributions.

  • Transfer learning: use knowledge from one problem and apply it to another problem.

  • Motivation: training ML models can be expensive, requires hundreds of GPU/TPU hours.

  • Goal: improve performance of a model by training it on data from a source domain, to improve performance in the target domain.

  • Source: domain to extract valuable information from.

  • Target: area to improve performance in.

  • Types:
    • Instance

    • Feature

    • Parameter

    • Rational-knowledge

  • Instance-based adaptation: KMM, AdaBoost.

  • Feature-based adaptation: CORAL, TCA, class-conditional (a CORAL sketch follows these notes).

  • Deep domain adaptation: MK-MMD, CORAL, GRL, HDA.

  • Domain-invariant and domain-specific parts in representations.

  • Partial domain adaptation: classes in target domain are a subset of that in the source domain.

  • Transfer learning in dynamic environments; concept-drift tolerance, hybrid ensemble approach.
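
A minimal NumPy sketch of the CORAL idea mentioned above (align the second-order statistics of source features to the target domain). Cholesky factors stand in here for the paper’s matrix square roots, and the mean shift is a common variant, so treat this as illustrative rather than the reference implementation.

    import numpy as np

    def coral(Xs, Xt, eps=1e-5):
        """Re-colour source features so their covariance matches the target."""
        Cs = np.cov(Xs, rowvar=False) + eps * np.eye(Xs.shape[1])
        Ct = np.cov(Xt, rowvar=False) + eps * np.eye(Xt.shape[1])
        whiten = np.linalg.cholesky(np.linalg.inv(Cs))  # removes source correlations
        colour = np.linalg.cholesky(Ct)                 # applies target correlations
        return (Xs - Xs.mean(axis=0)) @ whiten @ colour.T + Xt.mean(axis=0)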

2022-06-20 - Deep Learning

Location: Zoom, Time: Monday 3pm-4pm , Attendees: Jesse Wood, Bastiaan Kleign, et al.

Jesse Wood (me) gave a talk on the paper “Hierarchical Text-Conditional Image Generation with CLIP Latents” (ramesh 2022, ramesh2022hierarchical) https://arxiv.org/abs/2204.06125

Notes:
  • CLIP trains encoders to minimize the distance between image and text embeddings in the latent space (a similarity sketch follows these notes).

  • Text embeddings are fed to an autoregressive or diffusion prior to generate image embeddings.

  • Then this embedding is used to condition a diffusion decoder which produces an image.

  • The model is trained on 250 Million images, and has 3.5 billion parameters.

  • We can use CLIP to interpolate between two images in the latent space.

  • As we increase the dimensionality of the latent space we can represent more complex hierarchical structures.

  • CLIP fails at producing text, and reconstruction can mix up objects and their attributes.
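
A minimal sketch of CLIP’s shared image-text embedding space using Hugging Face transformers (model name and calls as in that library’s documentation); “fish.jpg” is a placeholder image. This illustrates the CLIP component only, not the unCLIP/DALL-E 2 generation pipeline from the paper.

    from PIL import Image
    from transformers import CLIPModel, CLIPProcessor

    model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
    processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

    image = Image.open("fish.jpg")
    texts = ["a photo of a fish", "a photo of a dog"]
    inputs = processor(text=texts, images=image, return_tensors="pt", padding=True)
    probs = model(**inputs).logits_per_image.softmax(dim=-1)
    print(dict(zip(texts, probs[0].tolist())))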

2022-06-23 - Weekly

Location: CO352, Time: Thursday 12pm-1pm , Attendees: Bach Hoai Nguyen, Bing Xue, Mengjie Zhang, Jesse Wood.

Notes:
  • After finishing the Australasian AI paper, start writing the proposal.

  • Write the motivation and research objectives.

  • Good to finish proposal early - before Christmas!

  • All important conference deadlines are in Jan - Feb.

  • Eye contact on Zoom is not possible with cameras and a remote audience.

2022-06-23 - FASLIP

Location: Zoom, Time: Thursday 2pm-3pm , Attendees: Jiablin Lin, Jesse Wood, Bing Xue, Bach Hoai Nguyen, et al.

Jiablin Lin showed a talk on “The Three Magic Ingredients of Amazing Presentations” https://www.youtube.com/watch?v=yoD8RMq2OkU

Notes:
  • Three things to consider:
    1. The Audience

    2. The Speaker

    3. Transformation

  • Call to action (CTA) - what can the audience do?

  • Then we re-arrange the roadmap and can get the presentation order.

  • Can use this technique for speeches, emails, proposals.

  • Common mistake to put too much information into a presentation/paper; sometimes less is more - it is more important to change the mind of the audience.

Jiablin Lin suggested “The magical science of storytelling”. https://www.youtube.com/watch?v=Nj-hdQMa3uA

Notes:
  • A 99 cent horse head was sold for $62.00.

  • He bought random items of little value, then attached a story written by authors to each object; this increased their value a thousand-fold.

  • The more emotionally invested in something, the less critical you are of that thing.

  • 13 months after the “honeymoon” period, our neocortex and objective reasoning come back.

  • The same thing happens during an advertisement that aims for brand experience - i.e. Coca-Cola.

  • All storytelling generates dopamine, because we are waiting for something, i.e. a cliffhanger.

  • Oxytocin makes people bond with a person; it makes them feel human. We can do this by telling a story that makes the audience empathetic.

  • Endorphins: laughing makes the audience relaxed, comfortable, and more open.

  • Devil’s cocktail: high levels of cortisol and adrenaline.

  • Functional storytelling:
    1. Everybody is a good story-teller from birth.

    2. Write down your stories! 3-4x more than you think.

    3. Pick the story to induce the emotions you want.

2022-06-27 - Deep Learning

Daniel Braithwaite gave a presentation on “Audio Source Separation - Using GAN/flow priors”

Notes:
  • We have two unknown source signals, mixed together into a single signal.

  • Minimize the distance between the original source and generated source; create a latent space that accurately encodes the mixed audio signals and their individual source signals.

  • The proposed approach is low audio quality, but it can separate two audio signals.

  • Problems with the GAN approach are mode collapse (not iid data) and low-quality audio.

  • Flow-based generators: train an AI model to generate instruments from an audio track.

  • These flow-based models use the inverse STFT.

  • Easy to verify the accuracy of an audio separation model by listening to its output.

  • Problems with likelihood maximisation: highly discontinuous; “deep generative models don’t know what they don’t know”.

Bastiaan Kleign briefly discussed a paper “Estimating the Optimal Covariance with Imperfect Mean in Diffusion Probabilistic Models” (bao 2022, bao2022estimating) https://arxiv.org/abs/2206.07309

Notes:
  • Diffusion Probabilistic Models (DPM) are special Markov Models with Gaussian Transitions.

  • Paper shows how to go from noisy-to-clean with a deterministic process.

  • A new approach to diffusion based models.

2022-06-30 - FASLIP

Location: Zoom, Time: Thursday 2pm-3pm , Attendees: Junhong Huang, Jesse Wood, Bing Xue, Bach Hoai Nguyen, et al.

General notes:
  • FASLIP was started 5-6 years ago in September 2016.

  • Ying Bi’s last day in New Zealand - she was in FASLIP from the beginning.

Junhong Huang shared a lecture from Andrew Ng “Lecture 8 - Career Advice / Reading Research Papers” https://www.youtube.com/watch?v=733m6qBH-jI

Notes:
  • Two topics: (1) Reading research papers, and (2) career advice.

  • Many PhD students learn to read papers by osmosis, from those around them, picking up tips and tricks.

  • Andrew Ng outlines his method for reading papers.

  • Start with a complete list of papers (including blogposts / Github).

  • Skip around the list.

  • How many? 5-10 papers is good for implementation. 50-100 is needed for thorough understanding and research.

  • Everywhere he goes, he has a stack of papers that he takes around with him.

  • He reads 6 papers a week, and presents 2 to a research group.

  • Take multiple passes over a paper:
    1. Title, abstract, figures.

    2. Intro, conclusion, figures + skip rest.

    3. Read paper + skip math.

    4. Whole thing + skip parts that don’t make sense.

  • Questions (after readings):
    • What did authors try to accomplish?

    • What were the key elements?

    • What can you use yourself?

    • What other references do you want to follow?

  • Math:
    • Rederive from scratch.

    • Art gallery: we see art students sitting on the floor copying the work of the greats.

    • Do the same for Machine Learning: copy the greats, learn to derive (and even invent) algorithms.

  • Code:
    • Run the open-source code.

    • Re-implement from scratch.

  • Steady reading, not short bursts! Spaced repetition works better than cramming.

  • Goal for most PhD students is a job (either big company or startup).

  • Either way, we intend to do important work! Leave the world a better place than we found it!

  • Recruiters are looking for skills (ML + coding), and meaningful work.

  • The ideal candidate is a “T”-shaped individual: someone who has a breadth of knowledge in AIML, and depth of knowledge in one or two areas of expertise.

  • Coursework is an efficient way to gain a breadth of knowledge in the AI field. Also: +reading papers, +relevant projects.

  • Saturday morning problem: (1) read paper, (2) open-source contribution, (3) project, (4) TV?

2022-07-01 - Weekly

Location: CO352, Time: Friday 12.30pm-1pm , Attendees: Bing Xue, Mengjie Zhang, Jesse Wood.

Notes:
  • SIGEVO membership gives free virtual attendance to the GECCO conference.

  • The conference is July 9th-13th 2022, Boston, Massachusetts.

  • Focus on tutorials; the conference has different tracks - i.e. GP, Neuroevolution, GA, Swarm Intelligence.

  • Big names in evolutionary computation are coming to Wellington - a good opportunity to meet them.

  • Math is an important part of machine learning; different areas of math are required for different tribes of AI.

  • Bach is on holiday…

  • Writing the paper is taking longer than I thought it would. But making slow progress…

  • The deadline for Australasian AI is July 17th (just over two weeks away).

TODO:
  • [x] Email Yi to become a member.

  • [x] Register for GECCO 2022.

2022-07-01 - ECRG

Location: Zoom, Time: Friday 3pm-5pm , Attendees: Shaolin Wang, Bing Xue, Mengjie Zhang, Hui Ma, Bach Hoai Nguyen, Jesse Wood et al.

General notes:
  • Any time after 1pm is actually 10 minutes past that time, so 1pm is 1.10pm, to give students a 20-minute break for lunch.

  • Impact factor for 2021 was released. Many top journals in AI increased their impact factor - i.e. IEEE TPAMI (>24 now).

  • Register as a SIGEVO member ($10 USD); Yi will buy the membership; free virtual attendance to GECCO.

  • Wolfgang Banzhaf https://www.cse.msu.edu/~banzhafw/, a foundational figure in Genetic Programming, is visiting the university soon.

  • AI undergraduate major has been approved by the university, I am involved in advertising for this major.

Shaolin Wang gave a talk on “LRE for GP-evolved Policies for UCARP”.

Notes:
  • Uncertain Capacitated Arc Routing Problem (UCARP).

  • Use genetic programming to provide hyper-heuristics.

  • Benefits: flexible representation, less domain knowledge, potential interpretability.

  • Existing work to increase the interpretability of routing policy.

  • Hard to describe the full mapping learned by a model.

  • Local explanations - easier to give explanations for less complex solution subspaces.

  • The Local Ranking Explanation (LRE) method is used to explain the behaviour of a routing policy in a decision situation.

  • Dataset - 6 representative UCARP problems.

  • Results - by error, correlation and consistency.

  • Table shows a linear model can represent the LRE for a routing problem.

  • Proposed a new LRE method for UCARP problems.

  • This presentation received a best paper nomination for GECCO 2022.

2022-07-06 - Deep Learning

Bastiaan Kleign discussed a paper “Denoising Diffusion Implicit Models” https://arxiv.org/abs/2010.02502

Notes:
  • Other approaches: flow-based, VAE, auto-regressive, WaveNet+.

  • Diffusion: invert a forward process where we gradually add noise.

  • Problems: over-denoise, extremely computationally complex, many steps - same number backwards as forwards.

  • Can be derived from the Langevin diffusion equation from physics.

  • Consider a forward Markov process that gradually replaces the signal with Gaussian noise:

    \(q(x_t \mid x_{t-1}) = \mathcal{N}(x_t;\ \sqrt{1-\beta_t}\,x_{t-1},\ \beta_t I)\), i.e. \(x_t = \sqrt{\alpha_t}\,x_{t-1} + \sqrt{\beta_t}\,\epsilon_t\) with \(\alpha_t = 1 - \beta_t\)

  • \(\beta_1,...,\beta_t\) is the noise schedule (fixed by designer), but some new papers automate schedule selection.

  • We assume we can approximate the inverse process with a reverse Markov Process.

  • Naturally the objective function is the cross-entropy - negative log likelihood –> KL, which can be made tractable with Jensen’s inequality.

  • The symbol \(\epsilon\) is the noise, the objective function can be reformulated to predict the noise \(\epsilon\).

  • This diffusion method can generate images of people that do not exist, but this takes 100,000 steps, so diffusion models are very slow.

  • Langevin’s equation: \(m\frac{d^2x}{dt^2} = -\gamma \frac{dx}{dt} - \nabla V(x) + \eta(t)\), where \(x\) is the particle location, \(V\) is a stationary potential, and \(\eta(t)\) is a random force.

  • From physics, requires stochastic calculus.

  • Choosing \(V(x) = -\log \pi(x)\), with \(\pi(x)\) an equilibrium density, and no acceleration term, results in random-walk sampling from \(\pi(x)\). \(\nabla V\) is then the score.

  • A more generalized diffusion approach satisfies the equations, but is no longer a Markov process.

  • The forward process is no longer Markovian, but the backward process is.

  • If it’s not Markovian, we do not need to add noise at each step; we only add noise at the end.

  • The actual forward steps can be derived from Bayes’ rule, but we don’t need them; we only need Eq. (9) for training.

  • When \(\sigma_t = 0\) we end up with a deterministic relationship between the noise \(\epsilon\) and the output (the DDIM update is given after these notes):
    • Backward process is deterministic.

    • Map straight from noise to output.

  • We can interpolate between two noise inputs and get meaningful output, due to the deterministic nature of the noise-output mapping.

  • In DDIM the initial state is the only place where stochasticity occurs –> meaningful interpolation.

  • There is no requirement for the number of steps forward and backward to be the same; we can choose fewer steps (they usually choose 100) for the backward process.

  • Neural Ordinary Differential Equations (NODE).
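
For reference, the DDIM sampling update (Song et al. 2020) in the usual \(\bar\alpha\) notation: \(x_{t-1} = \sqrt{\bar\alpha_{t-1}}\left(\frac{x_t - \sqrt{1-\bar\alpha_t}\,\epsilon_\theta(x_t)}{\sqrt{\bar\alpha_t}}\right) + \sqrt{1-\bar\alpha_{t-1}-\sigma_t^2}\;\epsilon_\theta(x_t) + \sigma_t z_t\). Setting \(\sigma_t = 0\) removes the noise term, giving the deterministic noise-to-output mapping discussed above.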

2022-07-08 - ECRG

Location: Zoom, Time: Friday 3pm-5pm , Attendees: Shaolin Wang, Bing Xue, Mengjie Zhang, Hui Ma, Bach Hoai Nguyen, Jesse Wood et al.

General Notes:
  • Due to time difference for GECCO, rest during the day between sessions for wellbeing.

  • Be there for the best paper nominations for the GECCO conference - it will be early in the morning.

  • Yi has registered those who asked as SIGEVO members; we get free virtual attendance to GECCO.

  • A lot of students are getting sick - important to take care, so we can recover and get back at it.

  • Attend Shaolin’s GECCO session on Tuesday, so we can vote for his best paper nomination.

Ruwang Jiao gave a presentation on “Solving Multi-objective Feature Selection Problems in Classification via Problem Reformulation and Duplication Handling”

Notes:
  • “Data and features determine the upper limit of machine learning, and models and algorithms just approach that limit.”

  • High-dimensional issues:
    1. Curse of dimensionality.

    2. Not all features are useful.

  • To address (1) there are techniques:
    • Feature selection - pick a subset of the relevant features.

    • Feature extraction - create new, smaller set of features.

    • Feature construction - produce a new high-level feature using some original features.

  • FS motivation
    • Improve classification performance.

    • Reduce the dimensionality.

    • Simplify learnt model.

  • Multi-objective motivation
    • A set of non-dominated feature subsets can be found that meet real-world applications.

  • FS categories:
    • filter - only consider data.

    • wrapper - compare to classification performance.

    • embedding - ?

  • The proposed model is evaluated against many state-of-the-art methods and on many real-world datasets.

  • Combination of duplication/constraint handling gives better results.

  • Sequential search:
    • Forward

    • Backward

    • Bi-directional

  • Issues with sequential search - greedy, easy to fall into a local optimum.

  • Stochastic search - using evolutionary algorithm.

  • Pros:
    • global search,

    • does not require domain knowledge.

  • Issues with multi-objective feature selection (MOFS): duplicated solutions frequently appear in the objective space, which leads to poor diversity and premature convergence in the population.

  • Goals: develop a novel approach to balancing the preferences between classification accuracy, number of features, and reduction of duplication.

  • Genetic operators:
    • Crossover - single point crossover

    • Mutation - bit-flip mutation

  • Novelty: duplication handling - only duplicate solutions in the objective space that are not dominated by other solutions are retained in the population.

  • Non-dominated solution - in a multi-objective problem, a candidate solution is non-dominated if no other solution improves one (or more) objectives without reducing the quality of another objective. These candidates are Pareto optimal - we can explore the Pareto front to get the set of Pareto-optimal solutions. All Pareto-optimal solutions are considered equally good (Khanmohammadi 2021). (A small dominance-check sketch follows these notes.)

  • Constraint handling: select solutions that satisfy the constraint preferences between the different objectives.
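
A small self-contained sketch of the Pareto-dominance test behind the definition above, for minimisation objectives; the example objective pairs (error, feature count) are made up for illustration.

    def dominates(a, b):
        """a dominates b: no worse on every objective, strictly better on one."""
        return (all(x <= y for x, y in zip(a, b))
                and any(x < y for x, y in zip(a, b)))

    def non_dominated(solutions):
        return [s for s in solutions
                if not any(dominates(o, s) for o in solutions if o is not s)]

    # Objectives: (classification error, number of features), both minimised.
    front = non_dominated([(0.10, 12), (0.12, 5), (0.15, 4), (0.12, 9)])
    print(front)  # [(0.1, 12), (0.12, 5), (0.15, 4)]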

2022-07-09 - GECCO #1

Location: Boston, Massachusetts Time: Saturday 12:30am - 2:00am, Attendees: Frank Neumann, Jesse Wood et al.

Frank Neumann gave a presentation on “Evolutionary Diversity Optimization”.

Notes:
  • Diversity plays a crucial role in EC.
    • prevent premature convergence.

    • allows for a Pareto-optimal set of solutions.

  • Goals:
    • A good set of solutions that differ in terms of interesting features/attributes.

  • TSP - Travelling Salesman Problem: given a set of nodes, determine the path for a round trip.

  • Use evolutionary algorithms to evolve instances, because constructing instances is a very difficult problem in general.

  • Genetic operators: explosion and implosion operators - mutation operators designed to have a much higher impact on their points.

  • Produce diverse image sets using evolutionary computation methods.

  • Use the \((\mu + \lambda)\) EA for evolving image instances.

  • Select individuals based on contribution to diversity of the image.

  • Discrepancy-based evolutionary diversity optimization.

  • Adaptive random-walk mutation operator.

  • Use of the star-discrepancy measure for diversity optimization of images.

  • Indicator-based multi-objective optimization.

  • Diversity optimization makes a multi-objective, indicator-based search for diversity for a single-objective problem.

2022-07-10 - GECCO #2

Location: Boston, Massachusetts Time: Saturday 12:30am - 2:00am, Attendees: Dario Izzo, Jesse Wood et al.

Dario Izzo gave a keynote address on “Optimization Challenges at the European Space Agency (ESA)”.

Notes:
  • Cassini (NASA), controlling the velocity of a space probe to navigate our solar system.

  • Space probes slingshot around planets, a solution to the three-body problem, to get a free rotation (of the velocity vector) from the mass of a stellar object.

  • Deep Space Maneuver (DSM).

  • Messenger (NASA) had high-thrust ion engines to adjust its orbit around the sun to allow its trajectory to reach another planet, Venus.

  • Hayabusa (JAXA): ion propulsion is an electric engine; DSMs are more difficult to design trajectories for, but we can pull them off with less mass, and without depending on planetary orbits (more frequent, less time-constrained flights).

  • Dawn (NASA), BepiColombo (ESA).

Interplanetary trajectories, GTOC problems - an America’s Cup of rocket science: design near-impossible trajectory problems.
  • Near-to-impossible interplanetary trajectory problems: complexity ensures a clear competition winner.
  • Global Trajectory Optimization Competition (GTOC).
  • GTOC 1: save-the-Earth problem, won by NASA Jet Propulsion Laboratory (JPL).

  • An asteroid/meteor is heading to Earth, and we must optimize a trajectory to reach it.

  • GTOC 2: multiple asteroid rendezvous, visit 4 asteroids.

  • Trajectory optimization is a multi-objective problem: optimize mass and time.

  • GTOC X: colonise the galaxy as uniformly as possible (submission by JPL).

  • ESA Libraries:
    • pykep - trajectory problem building blocks.

    • pygmo2 - massively parallel evolution.

    • dcgpy - differentiable genetic programming.

    • GTOC portal - very difficult trajectory problems.

    • optimize.esa.int - a gym from ESA with problems.

  • Building blocks:

  • Lagrange propagation - predicts the time evolution of an orbit from starting conditions.

  • Lambert’s problem - going from one planet to another in a fixed time.

  • Minovitch slingshot - speed is conserved in the slingshot; design the amount of rotation.

  • Optimization problems:
    • Multiple Gravity Assist Interplanetary (MGA).

    • MGA-1DSM, only one DSM is allowed between each <unlegible>.

    • MGA-LT, only low thrust maneuvers.

    • Tours and multiple visits.

  • Problems vary largely as we increase the number of spatial dimensions.

  • Designing an adequate representation to capture the problem was the most important part - needed to get results that make any sense.

  • Took 8+ hours to perform the evolutionary search.

2022-07-13 - Deep Learning

Location: Zoom, Time: Wednesday 3pm-4pm , Attendees: Jesse Wood, Bastiaan Kleign, et al.

Bastiaan Kleign gave a discussion of the paper: “Bayesian Deep Ensembles via the Neural Tangent Kernel” https://proceedings.neurips.cc/paper/2020/hash/0b1ec366924b26fc98fa7b71a9c249cf-Abstract.html

Notes:
  • Neural Tangent Kernel (NTK) is a kernel of deep learning ANNS, it describes the evolution of the ANN during training by gradient descent. https://en.wikipedia.org/wiki/Neural_tangent_kernel

  • Scalar output case:
    • An ANN with scalar output consists of a family of functions \(f(\cdot\,;\theta): \mathbb{R}^{n_{in}} \rightarrow \mathbb{R}\),

    • parameterized by a vector of parameters \(\theta \in \mathbb{R}^P\).

    • The NTK is the kernel \(\Theta(x,y;\theta) = \sum_{p=1}^{P} \partial_{\theta_p} f(x;\theta)\, \partial_{\theta_p} f(y;\theta)\) (a small empirical sketch follows these notes).

  • (Jacot 2018, jacot2018neural) showed that in the infinite-width limit, ANNs have a Gaussian distribution described by the kernel of a Gaussian process.

  • (Lee 2019, lee2019wide) Wide Neural Networks of Any Depth Evolve as Linear Models Under Gradient Descent.

  • (He 2020, he2020bayesian) “We explore the link between deep ensembles and Gaussian Processes (GPs) through the lens of Neural Tangent Kernels.”

  • See (Rajat 2019) blog post at https://rajatd.github.io/NTK/ for a great explanation of NTKs.
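
A minimal PyTorch sketch computing the empirical NTK of a tiny scalar-output MLP at two inputs, straight from the definition above; the architecture and sizes are illustrative.

    import torch

    net = torch.nn.Sequential(torch.nn.Linear(3, 16), torch.nn.Tanh(),
                              torch.nn.Linear(16, 1))

    def grad_vector(x):
        # Gradient of the scalar output w.r.t. all parameters, flattened.
        net.zero_grad()
        net(x).sum().backward()
        return torch.cat([p.grad.flatten() for p in net.parameters()])

    x, y = torch.randn(1, 3), torch.randn(1, 3)
    ntk_xy = grad_vector(x) @ grad_vector(y)  # Theta(x, y; theta)
    print(ntk_xy.item())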

2022-07-13 - Weekly

Location: CO352, Time: Friday 12.30pm-1pm , Attendees: Bach Nguyen, Bing Xue, Mengjie Zhang, Jesse Wood.

Notes:
  • Abstract deadline - only need paper title and an abstract - deadline: Monday 18th of July.

  • Submission deadline - submit the finished paper for acceptance/review - deadline: Monday 25th of July.

  • Shaolin got the best paper award at GECCO 2022 with his LRE for GP-based UCARP.

  • Next draft - for Bach’s feedback - deadline: Friday 15th of July.

  • Standard PSO with single-objective function. Combines error and selection ratio. Can cite (Kennedy 1995) in paper.

TODO:
  • [x] Next Draft - Friday 15th of July.

  • [x] Abstract Registration - Monday 18th of July.

  • [x] Revision - Tuesday 18th of July.

  • [x] Submission deadline - Monday 25th of July.

2022-07-14 - FASLIP

Location: Zoom, Time: Thursday 2pm-3pm , Attendees: Ramya, Jesse Wood, Bing Xue, Bach Hoai Nguyen, et al.

Ramya suggested a talk from Yoshua Bengio called “Yoshua Bengio: Deep Learning Cognition | Full Keynote - AI in 2020 & Beyond “ https://www.youtube.com/watch?v=GibjI5FoZsE

Notes:
  • He talks about deep learning and future perspectives. He makes three main points:
    1. We must build a model which learns meta-learning causal effects in an abstract space of causal variables. This requires the ability to quickly adapt to change and generalize out-of-distribution by sparsely recombining modules.

    2. The necessity to acquire knowledge and encourage exploratory behaviour.

    3. The need to bridge the gap between the aforementioned (1) and (2) ways of thinking, with old neural networks and conscious reasoning taken into account.

  • Issues: sample complexity, human-in-the-loop, autonomy.
    • Sample complexity - we need a large number of samples to get performant deep neural networks.

    • Human-in-the-loop - domain expertise is required to verify the efficacy of a model or create labelled training sets.

    • Babies can learn on their own, without supervision they learn a physics model of the world.

    • Generalize to out-of-distribution data.

  • Humans are able to learn out-of-distribution data. E.g. tell a human a science-fiction story: despite never visiting space or seeing aliens, we have a complex conceptual framework in which we can recombine aspects from previous narratives to synthesise an understanding of a novel scenario.

  • When we humans learn a new concept we update a small part of our existing world model and adapt our knowledge efficiently. Most machine learning models require full re-training to learn to handle new instances.

  • Models need to learn high-level abstractions of knowledge; good to disentangle representations (Higgins 2018, Ramesh 2022).

  • Affordance - we conceptualize the world, not as a set of objects, but as doing things, with a purpose or use, that can be manipulated by humans.

  • Richard Dawkins, an evolutionary biologist, proposed “The Evolutionary Imagination: Animals as models of their world” (Dawkins 1995).

  • Attention (Vaswani 2017, vaswani2017attention) is focusing on one thing at a time - analogous to consciousness: what do we consider important?

  • Humans store episodic memories in their hippocampus. Auto-encoders are an approximation of concepts from associative memories.

  • The problem of credit assignment through time is inefficient due to backpropagation. It can be solved with “Sparsely Attentive Backtracking” (Ke 2018, ke2018sparse). Credit assignment is like when you drive and hear a loud pop sound; later you find a flat tire, then you remember the loud sound - and deduce the tire must have popped then.

  • System 1 & 2 thinking (Chen 2019, chen2019deep)
    • System 1: intuitive, fast approach (intuition).

    • System 2: slower, analytical, reason (reasoning).

    • When a human plays chess, they incorporate both system 1 and 2 level thinking - this is true intelligence, a combination of intuition and reason.

  • (Bengio 2017, bengio2017consciousness) A new prior is proposed for learning representations of high-level concepts of the kind we manipulate through language.

  • (Chevalier 2018, chevalier2018babyai) Grounded reasoning about language in the real world. A combination of language models and vision. BabyAI allows humans to interactively train artificial agents to understand language instructions.

2022-07-15 - ECRG

Location: Zoom, Time: Friday 3pm-5pm , Attendees: Shaolin Wang, Bing Xue, Mengjie Zhang, Hui Ma, Bach Hoai Nguyen, Jesse Wood et al.

Mengjie Zhang hosted a Q&A on Genetic Programming and CNNs literature.

General Notes:
  • Shaolin Wang won the best paper award at the GECCO 2022 conference in Boston, Massachusetts.

  • Abstract registration for AJCAI 2022 is due 18th July; the submission deadline is 25th of July.

  • GECCO 2023-2024 will continue to run in dual conference mode - allowing for virtual and in-person attendance.

  • I plan to get my paper draft finished for Friday evening (today); hopefully I can get two more revisions before the deadline on the 25th of July - Monday week.

Notes:
  • GP paradigms:
    • TGP - tree genetic programming.

    • LGP - linear genetic programming.

    • CGP - cartesian genetic programming.

    • \(G^3P\) - grammar guided GP.

  • CNNs (LeCun 1998, lecun1998gradient)
    • OG: Feature Extraction + Classification network.

    • Convolution operation exploits local connectivity between pixels.

    • Shared Weights Neural Networks (SWNN).

    • Important to use consistent terminology in AI research.

2022-07-20 - Deep Learning

Location: Zoom, Time: Wednesday 3pm-4pm , Attendees: Hayden Dyne, Jesse Wood, Bastiaan Kleign, et al.

Hayden Dyne gave a talk about the paper “Gaussian Process-based Stochastic Model Predictive Control for Overtaking in Autonomous Racing” (Brudigam 2021, brudigam2021gaussian) https://arxiv.org/pdf/2105.12236.pdf

Notes:
  • A fundamental aspect of racing is overtaking other race cars. Previous work focuses on lap-time optimization.

  • Proposes a method to plan overtaking procedures in autonomous racing.

  • A Gaussian Process (Rasmussen 2004) is used to learn the behaviour of the leading vehicle.

  • Based on the output of the GP, a stochastic Model Predictive Control algorithm plans optimized trajectories such that the controlled autonomous race car is able to overtake the leading vehicle.

  • They generate safety constraints, similar to (Brudigam 2018). The safety rectangle ensures the vehicle safely passes the leading vehicle.

  • The GP identifies weaknesses in the driving behaviour of the leading vehicle while the controlled vehicle is trying to overtake.

  • Future work - extend the current model to simulate a whole lap and include multiple cars to overtake.

2022-07-20 - IEEE Writing Seminar

Location: Zoom, Time: Wednesday 3pm-4pm , Attendees: Bach Nguyen, Mengjie Zhang, Peter Andreae, Jesse Wood, et al.

Notes:
  • Pondy was one of Bach’s supervisors when he was studying for his PhD. Pondy was my COMP100 level lecturer for many papers.

  • Audience for the workshop is particularly thesis students, Masters/PhD.

  • Audience - if you don’t know who your audience is, you won’t know what to say or how to say it.

  • Assumed knowledge - general knowledge in the field that we assume the audience will know.

  • How much to explain? Write at the level of other PhD students, not at an undergraduate level.

  • Thesis: real audience is examiners. Intended audience is general experts.

  • What - what should the paper or thesis say? Consider what can and should not be included.

  • Writing should be scoped to the correct level of detail.

  • Introduction - critical for setting the reader’s expectations. Many people won’t read a paper at ALL if the introduction is not clear. (Sign post, mental model, prepare.)

  • Planning - top-down and bottom-up approaches to planning writing.
    • Top down - has a skeleton for an essay that is later expanded.

    • Bottom up - is a brain dump that is iteratively refined.

  • Hook - set up a context/framework for the reader to understand the next bit. “Tell ’em what you’re gonna say, say it, and tell ’em what you said” (Goldberg 1999).

  • Grammarly - use Grammarly on all academic work, but don’t assume that it is always right. Sometimes grammar relies on context-sensitive information the application does not understand.

  • SPAG - always fix basic spelling errors, otherwise the examiner will think you are just being lazy.

  • Pronouns - use plurals and try to avoid having to use gendered language or the singular they; avoid the confusion entirely.

  • I - avoid using I. Academics use the passive far too much: it is harder to read, hides important information, and is ambiguous. “I was told you were driving 60 km/hr in Kelburn yesterday”.

  • Paper - in a paper we can use “we”, referring to the author(s), but thesis is individual work, so avoid “we”, to make it clear who takes credit for work.

  • Active entity - make the paper an active entity, “this paper shows that”, “section 2.5 showed that”, “the proposed method uses”.

  • Name - you have a system, give it a name. Then we can refer to it by name henceforth. These names can be invented, e.g. “TADPOLE” and “HOPPER”, or as acronyms, e.g. “XGG”, “RTF”.

  • Parallel ideas in parallel form. Use the same sentence structure; don’t restructure to make it interesting.

  • Use bulleted lists frequently if you have a sequence of similar/related statements.

  • Aphorisms:
    • Omit needless words.

    • Write simply - flowery language is hard to read.

    • Activate your sentences; avoid the passive.

    • State important ideas first, reader may skip.

    • Give it a name, easy to describe things with names.

    • Avoid negatives where possible.

2022-07-21 - Weekly

Location: CO355, Time: Friday 12.00pm-12:30pm , Attendees: Bach Nguyen, Mengjie Zhang, Jesse Wood.

Notes:
  • Bing is at WCCI 2022 in Italy so she could not attend the meeting.

  • Bach is finishing up lecturing this week, so he will have a draft revision ready a bit later. Daniel will need to sign off on the paper from PFR side.

  • Focus on my thesis work: need to write my proposal, and the proposal needs preliminary work. I can do some GP on the GC data.

  • The AJCAI deadlines have been pushed back. The abstract registration is 29th of July; the submission deadline is 5th of August.

TODO:
  • [x] Submission deadline - 5th of August.

  • [x] GP on the GC dataset.

  • [x] Start writing proposal.

  • [x] New dataset from Daniel

  • [x] Reading on GP.

2022-07-21 - FASLIP

Location: Zoom, Time: Thursday 2pm-3pm , Attendees: Qi Chen, Jesse Wood, Bing Xue, Bach Hoai Nguyen, et al.

Bing suggested a series of talks from foundational figures in Genetic Programming.

Bing suggested “Wolfgang Banzhaf - Full Interview” https://www.youtube.com/watch?v=tj5-H6ECxyM

Notes:
  • AI effect - as soon as we solve an issue it is no longer considered intelligent.

  • Active devices force solutions to adapt representations with changing complexities.

  • Real-world application - given the correct constrained/discretized representation, we can creatively apply randomness to many applications.

  • Evolution takes shortcuts the programmers were not expecting; we have to gradually evolve our fitness functions to get the desired behaviour - similar to (Lehman 2022).

  • Hardware - technologies for parallel and distributed computing should allow for hardware growth in Evolutionary Computation.

  • Proofs - proofs are very difficult to formulate for real-world problems. Engineers can write tests for safety critical systems. But it is difficult to find proofs for complex evolutionary computations.

  • Exploitation vs. exploration - we want to allow diversity, but we have constraints for a desired outcome.

Bing suggested “Risto Miikkulainen - Full Interview” https://www.youtube.com/watch?v=6H9jzq0Oj0s

Notes:
  • def. Artificial Intelligence - something more complex emerged than what was programmed.

  • Evolutionary computation - a family of methods inspired by biology. Almost all EC is population-based.

  • EML - we can apply evolutionary computation to optimizing the architecture and weights for neural networks.

  • Neuroevolution is very computationally expensive, but with improving technology we have the compute, and increasingly it is being employed for this.

  • Domain insights - evolution does not have the same preconceived notions of what the solution should be. It has the potential to discover new knowledge, exploring areas outside of the domain expertise.

  • EC is similar to DL, as the ideas and principles have been around for a while, but only now are we approaching the compute necessary for efficiency.

  • Diffusion - diffusion of innovation; EC needs to be taught to AIML practitioners and be part of their toolbox for solving problems. Democratizing evolutionary computation.

Bing suggested “Ken Stanley - Full Interview” https://www.youtube.com/watch?v=XWUsl24zYOU

Notes:
  • Accessibility - AI is not really accessible to the non-technical general public. It is a few years off, but one day an everyday person should be able to make an AI to automate a mundane task of their choosing.

  • Games - create a game where users can train robots to fight in a robot war. A game is a way for non-technical users to become AIML practitioners without barriers.

  • Incentives - a picture-breeding website allowed for organic human-in-the-loop exploration of image generation. We can incentivise humans to join the loop and offer supervision in areas where AI struggles.

  • Bottlenecks - similar to DL, there was a large foundation of theoretical work, but hardware was lacking. The DL revolution was mostly a hardware advance. The same is true for EC, where population-based algorithms can be optimized with parallelisation.

  • Compute - with compute, it becomes tractable to search high-dimensional spaces.

2022-07-22 - ECRG

Location: Zoom, Time: Friday 3pm-5pm , Attendees: Bing Xue, Mengjie Zhang, Hui Ma, Bach Hoai Nguyen, Jesse Wood et al.

Notes:
  • Bing (my supervisor) got an IEEE early career award. This is a very difficult award to get. She was also given an "outstanding editor" award from IEEE Transactions on Evolutionary Computation.

  • Bing is also giving a plenary talk at WCCI 2022 in Italy.

  • This Wednesday we had a writing seminar organized by the IEEE Young Professionals, with a talk by Pondy, organized by Bach, Fangfang and Mengjie (see minutes 2022-07-20 - IEEE Writing Seminar).

  • Progress: writing my paper for AJCAI; the deadline was extended to August 5th, so there is room for more revisions and Plant and Food Research (PFR) sign-off.

  • Daniel and Kevin are flying up from Nelson to talk about the domain expertise required for REIMS data.

  • Starting preliminary work on GP for the GC data, and writing my proposal for my PhD. It is important to clearly state and refine my research intentions.

  • Meng has sent out a good list of recommended readings on CNNs (LeCun 1998, lecun1998gradient), foundational work from Geoffrey Hinton, a good read for Deep Learning background.

2022-07-27 - Deep Learning

Location: Zoom, Time: Wednesday 3pm-4pm, Attendees: Maxwell Clarke, Bastiaan Kleign, Jesse Wood, et al.

Maxwell Clarke gave a talk on Neural Tangent Kernels (NTKs):

Notes:
  • A Gaussian process is a stochastic process (a collection of random variables indexed over time or space), such that every finite collection of those variables has a multivariate normal distribution. https://en.wikipedia.org/wiki/Gaussian_process

  • The direction of a learning network becomes fixed as we increase the width towards infinity. (Or) As the width of the neural network increases, we see that the distribution of outputs over different random instantiations of the network becomes Gaussian (He 2020, he2020bayesian).

  • In linear algebra, an eigenvector, or characteristic vector, of a linear transformation is a non-zero vector that changes at most by a scalar factor when that linear transformation is applied to it. The corresponding eigenvalue \(\lambda\) is the factor by which the eigenvector is scaled. https://en.wikipedia.org/wiki/Eigenvalues_and_eigenvectors

  • A Taylor series, named after Brook Taylor who introduced it in 1715, is an infinite sum of terms that are expressed in terms of the function's derivatives at a single point. For most functions, the function and the sum of its Taylor series are equal near this point (see the formula after this list).

  • Transformers can do Bayesian inference. They propose prior-data fitted networks (PFNs). PFNs leverage large-scale machine learning techniques to approximate a large set of posteriors (Muller 2021, muller2021transformers).

  • Requires the ability to sample from a prior distribution over supervised learning tasks (or functions).

  • Their method restates the objective of posterior approximation as a supervised classification problem with set-valued input: it repeatedly draws a task (or function) from the prior, draws a set of data points and their labels from it, masks one of the labels, and learns to make probabilistic predictions for it based on the set-valued input of the rest of the data points.

  • PFNs can nearly perfectly mimic Gaussian processes, and also enable efficient Bayesian inference for intractable problems, with 200-fold speedups in the networks evaluated.

  • PFNs perform well in GP regression, Bayesian NNs, classification on tabular data, and few-shot image classification - these applications demonstrate the generality of PFNs.
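
For reference (a standard result, not from the talk), the Taylor series of \(f\) about the point \(a\) is \(f(x) = \sum_{n=0}^{\infty} \frac{f^{(n)}(a)}{n!}(x-a)^{n}\).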

2022-07-28 - Weekly

Location: CO355, Time: Friday 12pm-1pm, Attendees: Bing Xue, Mengjie Zhang, Bach Hoai Nguyen, Jesse Wood.

Notes:
  • PFR have performed preliminary EDA on the REIMS data: classification/regression tasks for Hoki, Mackerel and a Hoki-Mackerel mixture.

  • Do not need to go into maths at the moment, instead look at the high level ideas, see existing work in the field, and see where future work is needed.

  • Use GP on the GC data to get some preliminary work done for the PhD proposal. Don't set the tree depth too deep.

  • PSO is a wrapper, but what is it wrapping? It wasn't clear in my paper draft which classifier the PSO wrapped; it should be made explicit that the PSO used an SVM classifier.

  • Add a table for feature selection with a set k, to show the accuracy with the reduced feature set, compared to the full dataset from the classification section.

  • Include best (not just average) PSO run in the table.

  • Compare our results to Daniel's preliminary work, in terms of accuracy and time taken to perform the analysis.

  • Add “AI Application” as a keyword to the abstract.

TODO
  • [x] GP Tree on GC data.

  • [x] PSO wraps SVM classifier for AJCAI paper.

  • [x] Add table for Feature Selection to AJCAI paper.

  • [x] Include best (not just average) PSO run in the table.

  • [x] Add “AI Application” as a keyword to the abstract.

2022-07-28 - FASLIP

Location: Zoom, Time: Thursday 2pm-3pm, Attendees: Bing Xue, Mengjie Zhang, Bach Hoai Nguyen, Jesse Wood et al.

Bing suggested an interview with Darrell Whitley https://www.youtube.com/watch?v=ZJXxaoOBCOA

Notes:
  • Satellite scheduling in space using evolutionary computation. Detecting space junk too. EC has applications in aerospace (very similar to the GECCO 2022-07-10 GTOC Challenge).

  • EC expands the capabilities of AI to perform tasks beyond just the things humans are good at, i.e. vision and text processing. EC is on the bleeding edge of artificial intelligence and search algorithms.

  • We track up to 13,000 pieces of space junk. We try to ping every object once a week. Trying to ping these objects is an optimization problem - important for aerospace, as we don't want to hit space junk during a rocket launch.

  • General Electric have a jet engine for Boeing that used an evolutionary algorithm; EC designs engines that humans never thought of, through diversity.

Bing suggested an interview with Malcolm Heywood https://www.youtube.com/watch?v=3M3N2o1sGbM

Notes:
  • He did a PhD in neural networks, fuzzy logic, then settled on Evolutionary Computation. You have to try all these fields out, and find the one that suits you.

  • Gradient descent methods offer effective results, but are very constrained, and don't offer simple/elegant solutions.

  • Evolutionary algorithms offer effective solutions that can be run on a local machine, and don't require 10,000+ GPU hours.

  • Designing an agent to escape a swarm of opponent spaceships: the agent fell into a black hole that was a rounding error. This shows the surprising creativity of digital evolution (Lehman 2020, lehman2020evolution).

  • "Follow the money" is possibly a way for EC to break into the mainstream. Full self-driving would benefit from simple and interpretable solutions.

  • Hardware is so complex that we as humans don't have the capacity to write code that fully utilizes it (similar to Bjarne Stroustrup on C++ compiler optimization https://youtu.be/uTxRF5ag27A).

Bing suggested an interview with Erik Goodman https://www.youtube.com/watch?v=BkORxgpOc7w

Notes:
  • A center for the study of evolution is funded by the National Science Foundation. They have 8/10 years of funding left.

  • How do we define intelligence? Previously, we thought of this as the Turing test. But now, especially after Google AI engineer Blake Lemoine (https://twitter.com/cajundiscordian) claimed their LaMDA language model was sentient: we can easily mimic intelligent behaviour, but this doesn't make the machine intelligent.

  • We define intelligence as the ability of an agent to synthesize new knowledge and adapt to novelty.

  • When computer scientists listen to evolutionary biologists, and vice versa, we get interesting results.

  • Nature has massive parallelism embedded, allowing organisms to compete to survive and be the first to adapt to their environment.

  • In EC, parallelisation can correspond to population-based search.

  • Evolutionary deep learning is likely the future.

  • Democratization, small scale DL with EC.

2022-08-03 - Deep Learning

Location: Zoom, Time: Wednesday 3pm-4pm, Attendees: Daniel Braithwaithe, Bastiaan Kleign, Jesse Wood, et al.

Notes:
  • Grad Student Descent (Gencoglu 2019, gencoglu2019hark).

  • Learning strides in convolutional neural networks (Riad 2022, riad2022learning).

  • The convolution theorem: convolution in the signal domain corresponds to multiplication in the Fourier domain, and multiplication is more efficient than convolution (see the sketch after this list).

  • A 2013 paper proposes replacing the signal-domain convolution with multiplication in the Fourier domain (Mathieu 2013, mathieu2013fast).

  • The FFT is a linear unitary transform - gradients in the Fourier domain can be computed via the inverse Fourier transform.

  • It is non-trivial to learn non-linearities (activations) in the Fourier domain. We have to take the inverse Fourier transform (IFT), then apply the non-linearity, and transform back.

  • Learn kernels in the Fourier domain; spectral pooling projects into the Fourier domain and truncates the signal's frequency representation.

  • Pooling is an operation that reduces the spatial dimension of the input.

  • Adaptive attention span to learn the cropping size for spectral pooling (Riad 2022, riad2022learning). \(m_z(x)\) is a softclamp function to learn the attention span.
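
A minimal NumPy sketch (my own, not from the talk) of the convolution theorem, showing that circular convolution in the signal domain equals element-wise multiplication in the Fourier domain:

  import numpy as np

  rng = np.random.default_rng(0)
  x = rng.standard_normal(8)  # signal
  k = rng.standard_normal(8)  # kernel

  # Circular convolution computed directly from the definition.
  direct = np.array([sum(x[j] * k[(i - j) % 8] for j in range(8)) for i in range(8)])

  # The same result via the FFT: multiply the spectra, then invert.
  via_fft = np.fft.ifft(np.fft.fft(x) * np.fft.fft(k)).real

  assert np.allclose(direct, via_fft)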

2022-08-04 - Weekly

Location: CO352, Time: Friday 12pm - 1pm, Attendees: Bach Hoai Nguyen, Bing Xue, Mengjie Zhang, Jesse Wood.

Notes:
  • I need to put the table and new graphs on Overleaf, and respond to Bing's comments.

  • The submission is due on Friday, August 5th; Bach wants me to send the new results before midnight tonight.

  • Should still have access to the ECS cloud computing grid. This is a distributed system, ideal for EC algorithms that use parallelisation.

2022-08-04 - FASLIP

Location: CO350, Time: Thursday 2pm - 3pm, Attendees: Qinglan Fan, Bing Xue, Mengjie Zhang, Jesse Wood, et al.

Notes:
  • Classify an image into a category, i.e. cat or dog. ImageNet is a classic benchmark for this task.

  • Challenges:
    1. High dimensionality

    2. Variations - rotation, scale, hue, intra-class.

    3. Interpretability - NNs are blackbox.

  • CNNs have high classification accuracy, but!
    • require high sample complexity (big data).

    • need compute (TPUs, GPUs).

    • black-box means low interpretability.

  • Genetic Programming (GP) - is an evolutionary computation technique.

  • They use strongly-typed GP, where we define the input-output data types.

  • Motivations for GP:
    • Flexible-length representation

    • Evolving solutions automatically

    • Global search ability

    • interpretability

    • multi-task, multi-objective

  • The most important bottleneck is the representation (see 2022-07-10 - GECCO #2). A sufficient representation is needed to capture the complexity of the problem. Garbage in, garbage out.

  • Function set: image related operators, operators that are specialized to work on image data specifically.

  • Global and local features can be combined with flexible feature reuse. The proposed method can automatically learn to choose classification algorithms based on the task.

  • Terminal set: achieves flexible feature re-use and allows automatic classifier selection.

  • This is benchmarked on 12 standard image classification datasets (e.g. CIFAR-10, MNIST), and against state-of-the-art deep learning and traditional methods.

  • Results are comparable to state-of-the-art deep learning methods.

  • We can examine GP trees, especially for image classification, by seeing visually which features are used to make classification decisions. E.g. a facial expression classifier looks at the mouth.

  • We can’t use the results of the experiment to justify its parameter settings. This is tuning to the validation set, and not based on theory/previous work. Instead refer to initial experiments, background literature, and expert knowledge - (not) black magic, Cthulhu, Grad Student Descent (Gencoglu 2019, gencoglu2019hark).

2022-08-05 - ECRG

Location: Zoom, Time: Friday 3pm - 5pm, Attendees: Bach Hoai Nguyen, Bing Xue, Mengjie Zhang, Jesse Wood et al.

Notes:
  • The submission deadline is August 5th (today!!!) for AJCAI 2022, but this is likely flexible (as no explicit time was given). Can submit tomorrow with minor fixes.

  • Computer Communication and Artificial Intelligence (CCAI) - the deadline for this conference is Jan 10th 2023.

  • Bing’s inaugural lecture is on Tuesday evening, 9th August.

2022-08-10 - Deep Learning

Location: Zoom, Time: Wednesday 3pm - 4pm, Attendees: Felix, Bastiaan Kleign, Jesse Wood et al.

Notes:
  • Usually too expensive to compute the Jacobian matrix and its inverse. Most models restrict the function class to those with easy evaluation of the Jacobian determinant, or an efficient estimator thereof.

  • Self-normalizing flows replace expensive terms in the gradient with learned approximate inverses at each layer. This reduces the complexity from \(O(N^3)\) to \(O(N^2)\).

  • What is a Jacobian matrix? What is a normalizing flow? Inverse Jacobian? (See the note after this list.)

  • This model performs efficiently and well on high-dimensional image data, for example the MNIST handwriting dataset. A visual analysis of their reconstructions appears good.
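
For reference (the standard change-of-variables result, not from the talk): a normalizing flow models a density through an invertible map \(z = f(x)\), with \(\log p_x(x) = \log p_z(f(x)) + \log \left| \det J_f(x) \right|\), where \(J_f\) is the Jacobian of \(f\). The \(O(N^3)\) cost of this determinant is the expensive term that self-normalizing flows approximate.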

2022-08-11 - Weekly

Location: CO352, Time: Friday 12pm - 1pm, Attendees: Bing Xue, Mengjie Zhang, Jesse Wood.

Notes:
  • The Classification Map can be extended to automatically determine the boundary on the floating point line using clustering.

  • Can use multiple trees, one for each class (Multi-tree), or one-vs-rest (Committee).

  • Flexibility - there are a lot of options to explore with the GP representation.

  • Parsimony pressure - the fitness function includes accuracy and GP tree size. Can choose to only use parsimony pressure for selection, and evaluate fitness as accuracy alone (see the sketch after this list).

  • VUW has a thesis on program simplification, to eliminate redundancy in GP trees.

  • Start with the 4 arithmetic operators; can extend later if needed.

  • Diminishing returns - simplification may be a very costly research objective, but may have very little impact on the classification accuracy for the industry application. Industry research should be results driven.

  • AutoML (e.g. TPOT) - automated ML automates the application of machine learning to real-world problems.

  • Interpretability is a good objective for industry research.

  • Riccardo Poli has a GP field guide that is available for free online (Riccardo 2009, riccardo2009field). https://ia801902.us.archive.org/3/items/AFieldGuideToGeneticProgramming/AFieldGuideToGeneticProgramming.pdf
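
A minimal sketch (my own notation, not from the meeting) of parsimony pressure as a combined fitness, where alpha is an assumed penalty coefficient:

  # Parsimony pressure: reward accuracy, penalize large GP trees.
  def parsimony_fitness(accuracy, tree_size, alpha=0.01):
      return accuracy - alpha * tree_size

Alternatively, parsimony can be applied only during selection (e.g. DEAP's selDoubleTournament) while the fitness itself remains accuracy alone, as suggested in the meeting.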

2022-08-11 - FASLIP

Location: CO350, Time: Thursday 2pm - 3pm, Attendees: Hayden Anderson, Bing Xue, Jesse Wood et al.

Notes:
  • Explainable AI (XAI) considerations and perspectives from the social sciences.

  • Interpretability - levels of understanding of how an outcome is produced. The focus is on the system, not the outcome.

  • Explainability - an explanation of how the outcome is produced. The focus is on the outcome, not the system.

  • Psychology - from a psychology perspective these are the wrong way around; in psychology we reverse these definitions.

  • Interpretable: the ability of a human to derive meaning from a given stimulus. Often a high level of imprecision.

  • Context - who is it interpretable to? Is it the target audience of the interpretation? Is there assumed knowledge needed in order to correctly interpret the model?

  • A linear model is interpretable - we consider linear models white-box because they are linear…

  • But if we show a linear model to our grandmother, she would consider it nonsense.

  • Interpretability can be applied to GP trees, e.g. protected division, specialized operators, uncommon tree structures.

  • Fuzzy-trace theory is a mental model of how we process incoming stimuli, on a verbatim-to-gist scale. We constantly forget.

2022-08-17 - Deep Learning

Location: Zoom, Time: Wednesday 3pm - 4pm, Attendees: Demelza Robinson, Bastiaan Kleign, Jesse Wood et al.

Demelza gave a talk on Gaussian Processes for Regression. GPs can be used for fitting a function to data; we can train a GP to make predictions given training data.

Notes:
  • GPs find a conditional distribution of the targets given the inputs. Compare our prior beliefs with the likelihood using Bayes’ rule (see the sketch after this list).

  • To make predictions, average over all possible parameter values. A linear model lacks expressiveness, so we can map the problem to a higher dimensional space using basis functions.

  • Probabilistic classification, where test predictions are in the form of class probabilities.

  • Trickier, because for regression we could assume the likelihood function was Gaussian, but for classification the likelihood is non-Gaussian; the posterior can still be approximated.

  • Confusing words: posterior, prior, positive semi-definite, basis functions, kernel trick, reproducing kernel Hilbert space, heteroskedastic.

  • Heteroskedastic - we can observe the same features of an instance at two separate times, and it can belong to different classes at each time.
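
A minimal sketch (scikit-learn, my choice of kernel and noise level, not Demelza's code) of GP regression, fitting a function to data and predicting with uncertainty:

  import numpy as np
  from sklearn.gaussian_process import GaussianProcessRegressor
  from sklearn.gaussian_process.kernels import RBF

  # Noisy observations of a smooth function.
  X = np.linspace(0, 5, 20).reshape(-1, 1)
  y = np.sin(X).ravel() + 0.1 * np.random.default_rng(0).standard_normal(20)

  # alpha adds observation noise to the kernel diagonal.
  gpr = GaussianProcessRegressor(kernel=RBF(), alpha=0.1 ** 2).fit(X, y)

  # Posterior mean and standard deviation at a new input.
  mean, std = gpr.predict(np.array([[2.5]]), return_std=True)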

2022-08-18 - FASLIP

Location: CO350, Time: Thursday 2pm - 3pm, Attendees: Kaan Demir, Bach Hoai Nguyen, Jesse Wood et al.

Kaan gave a presentation on feature selection for multi-label classification.

Notes:
  • Predicting a set of labels that correspond to an instance. Different to binary/multi-class classification, as an instance can correspond to multiple classes simultaneously.

  • Problem transformation - can be used to transform the multi-label problem into a simpler representation.

  • Binary - break the multi-label problem into a set of binary classification problems, run a classifier on each, then concatenate the labels together to get the final prediction (see the sketch after this list).

  • For example, the SVM model uses a one-vs-rest approach for multi-class: it breaks the problem into a series of binary classification problems; the same solution was proposed for (Kerber 1992, kerber1992chimerge).

  • The issue with the binary classification approach is that the classifiers are independent, and share no information about the problem.

  • Classifier chains - label a class, and append that label as a feature for the next classifier.

  • Issue with classifier chains, there is no clear order to chain the classifiers in.

  • Label powerset - convert the multi-label problem into a multi-class problem.

  • Issues - very sparse, and a combinatorial explosion for many-label problems.

  • Multi-label KNN (ML-KNN) - an extremely expensive version of KNN that natively suits multi-label datasets. KNN scales with the number of instances, supports online learning, and requires the whole training set in memory; the complexity scales with the number of labels and instances.

  • Sparsity-based classification - differentiable, and works for optimizing with neural networks.

  • Feature selection - reduce the number of features; they can be irrelevant, redundant, or complementary.

  • Search - search methods can be sequential or guided-stochastic (EC), balancing a quality metric (classification accuracy) with feature interactions.

  • FS methods: MRMR, ReliefF, classifier-based, ML-KNN, sparsity-based models.

  • Sparsity-based models: \(\min_{Z} \|XZ - Y\|_{2,1} + \lambda \|Z\|_{2,1}\), where \(\|Z\|_{1,2}\) is the sum of each column norm, and \(\|Z\|_{2,1}\) is the sum of each row norm.

  • The \(\lambda\) provides a constraint to sparsely cluster the features into a lower dimensional representation. E.g. globe \(\to\) plane: the geometric locality (distance) is lost in this transformation.

  • Use a kernel to capture the non-linear interactions between features and conserve the distance in the manifold - a Gaussian affinity matrix.

  • Laplacian eigenmap - the overall “connectedness” of each feature toward all other features. Graph regularization - embed the Laplacian eigenmap into a linear model.

  • Use evolutionary computation, and cooperative co-evolution. We require EC to search this combinatorially explosive search space for multi-label feature selection.

  • Caveat: the choice of kernel, e.g. Radial Basis Function (RBF), Gaussian, linear, determines the non-linear patterns the Laplacian eigenmap can capture.

  • Similar to SVM, a kernel that accurately captures the complexity of the data must be chosen; but in this case it is difficult to evaluate the kernel quality - difficult to find an appropriate metric.

  • Gist: a linear model for multi-label classification that considers the non-linear interactions, preserving the manifold without a graph regularization term.
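
A minimal sketch (scikit-learn, my choice of base model, not Kaan's method) of the binary problem-transformation approach: one independent classifier per label, with the per-label predictions concatenated:

  import numpy as np
  from sklearn.linear_model import LogisticRegression
  from sklearn.multioutput import MultiOutputClassifier

  rng = np.random.default_rng(0)
  X = rng.standard_normal((100, 5))             # 100 instances, 5 features
  Y = (rng.random((100, 3)) > 0.5).astype(int)  # 3 binary labels per instance

  # One LogisticRegression per label, trained independently.
  clf = MultiOutputClassifier(LogisticRegression()).fit(X, Y)
  Y_pred = clf.predict(X)                       # shape (100, 3): one column per label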

2022-08-19 - Weekly

Location: C0364, Time: Friday 11am - 12pm, Attendees: Bach Hoai Nguyen, Jesse Wood.

Notes:
  • A one-tree approach is a very primitive genetic program for multi-class classification.

  • Multi-tree GP is better. Each tree separates one class from the rest; there is one tree for each of the classes (see the sketch after this list).

  • E.g. one tree would separate blue cod from all the other classes. The next tree would separate snapper from the other classes.

  • Class-dependent crossover - ensure crossover only happens within the corresponding trees, not across the multi-tree; i.e. crossover between snapper and blue cod trees is not allowed.

  • Mutation can happen to any tree; it is not class dependent.

  • Class-dependent Feature Construction; Genetic Programming for Multi-feature Construction on High-Dimensional Classification (Tran 2019, tran2019genetic).

  • Multi-feature Construction (MFC), Feature Selection (FS), Genetic Programming (GP).

  • Mid-trimester break: for next two weeks, meeting time will be Thursday 12pm (the old time).
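
A minimal sketch of multi-tree GP in Python with DEAP (my own simplification, assuming a dataset X of shape (n_samples, N_FEATURES) and integer labels y): one tree per class, winner-takes-all prediction, and class-dependent crossover:

  import operator
  import numpy as np
  from deap import base, creator, gp, tools

  N_CLASSES, N_FEATURES = 3, 4

  pset = gp.PrimitiveSet("MAIN", N_FEATURES)
  pset.addPrimitive(operator.add, 2)
  pset.addPrimitive(operator.sub, 2)
  pset.addPrimitive(operator.mul, 2)

  creator.create("FitnessMax", base.Fitness, weights=(1.0,))
  creator.create("Individual", list, fitness=creator.FitnessMax)  # a list of trees

  toolbox = base.Toolbox()
  toolbox.register("expr", gp.genHalfAndHalf, pset=pset, min_=2, max_=4)
  toolbox.register("tree", tools.initIterate, gp.PrimitiveTree, toolbox.expr)
  toolbox.register("individual", tools.initRepeat, creator.Individual,
                   toolbox.tree, n=N_CLASSES)

  def predict(individual, X):
      # Winner-takes-all: predict the class whose tree outputs the largest value.
      funcs = [gp.compile(tree, pset) for tree in individual]
      outputs = np.array([[f(*row) for f in funcs] for row in X])
      return outputs.argmax(axis=1)

  def evaluate(individual, X, y):
      return (np.mean(predict(individual, X) == y),)

  def class_dependent_crossover(ind1, ind2):
      # Only cross over trees of the same class (no snapper x blue cod mixing).
      for i in range(N_CLASSES):
          ind1[i], ind2[i] = gp.cxOnePoint(ind1[i], ind2[i])
      return ind1, ind2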

TODO:
  • [x] Implement Multi-tree GP in Python.

  • [x] Prepare for Kevin+Daniel visit next week.

  • [X] Read (Tran 2019, tran2019genetic).

2022-08-19 - ECRG

Location: Zoom, Time: Friday 3pm - 5pm, Attendees: Peng Wang, Yi Mei, Bach Hoai Nguyen, Jesse Wood et al.

Notes:
  • Marine Conference - Bing, Meng and Fangfang are at the Marine Conference in Nelson, so Yi chaired the ECRG meeting this week.

  • IEEE Symposium - register for the IEEE Symposium; presenting at this event provides free membership of the IEEE organisation. New students SHOULD register and attend. Most IEEE conferences require IEEE membership for attendance.

  • EvoStar - a major conference that we are targeting. The deadline is in September; it is important to start preparing the work we wish to submit to this conference.

  • Progress:
    • A single GP tree for multi-class classification,

    • will extend this to be multi-tree GP,

    • using one-vs-rest approach,

    • also prepare for FASLIP presentation next week,

    • content from this can be revised for IEEE Symposium.

Peng Wang gave a presentation “Differential Evolution-Based Feature Selection: A Niching-based GP Approach”. Peng is a third-year PhD candidate.

Notes:
  • Motivations - curse of dimensionality, redundancy in features, improve compute, and perhaps performance.

  • Goals - multi-objective (MO) provides a Pareto-optimal set of candidate solutions that are non-dominated and balance the desired objectives, i.e. accuracy and size.

  • Filter, Wrapper, Embedded are the three approaches to feature selection (FS) and feature construction (FC).

  • Embedded - feature selection is embedded in the learning process.

  • Wrapper - the feature selection method serves as a wrapper for an existing prediction algorithm (i.e. a classifier).

  • Filter - in a filter method, features are evaluated based on general characteristics; a classifier model is not used.

  • Aim - get better feature subsets with less redundancy; feature subsets that maintain classification performance.

  • Tradeoff - there is a tradeoff between accuracy and feature subset size.

  • Differential Evolution is a method that optimizes a problem by iteratively trying to improve a candidate solution with respect to a given measure of quality (see the sketch after this list). https://en.wikipedia.org/wiki/Differential_evolution

  • Real - for DE the genotype is some form of real-valued vector.

  • Operators - the mutation/crossover operators make use of the difference between two or more vectors in the population to create a new vector.

  • Differentiable - DE does not require the optimization problem to be differentiable; it does not rely on gradient descent as many deep learning methods do. This allows for more difficult tasks without domain expertise / differentiable mathematics.

  • Genotype - a genotype is the genetic makeup of an individual (in biological terms); genotypes are encoded as strings or trees of values. It is the representation, the encoding.

  • Phenotype - a phenotype is the expression of a trait; this is the decoded genotype. We evaluate a candidate individual in the environment to get its predictive output in the solution space.

  • Example: the gene for ginger hair is recessive and stored in an individual’s DNA - this is the genotype. The hair colour of the individual (are they ginger?) is the expression of that trait - the phenotype.

  • For evolutionary computation, an example is: genotype - a GP tree; phenotype - the prediction/output.
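
A minimal NumPy sketch (my own notation, not Peng's implementation) of one DE/rand/1/bin step: mutant \(v = x_{r1} + F(x_{r2} - x_{r3})\), binomial crossover with rate CR, then greedy selection:

  import numpy as np

  rng = np.random.default_rng(42)

  def de_step(population, fitness, F=0.5, CR=0.9):
      # population: (n, d) array of real-valued genotypes; fitness is minimised.
      n, d = population.shape
      new_pop = population.copy()
      for i in range(n):
          r1, r2, r3 = rng.choice([j for j in range(n) if j != i], size=3, replace=False)
          mutant = population[r1] + F * (population[r2] - population[r3])
          cross = rng.random(d) < CR
          cross[rng.integers(d)] = True  # take at least one gene from the mutant
          trial = np.where(cross, mutant, population[i])
          if fitness(trial) <= fitness(population[i]):
              new_pop[i] = trial
      return new_pop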

TODO:
  • [x] 2022-08-24 Register for the IEEE Symposium

  • [x] 2022-08-25 FASLIP Presentation

  • [x] 2022-08-31 Prepare IEEE Symposium presentation

  • [-] 2022-09-30 Plan an EvoStar 2023 paper.

2022-08-23 - PFR Daniel and Kevin

Location: CO345, Time: Tuesday 11am - 3pm, Attendees: Daniel Killeen, Kevin Mitchell, Mengjie Zhang, Bing Xue, Bach Hoai Nguyen, Jesse Wood et al.

We had a meeting in CO345 with Daniel and Kevin to discuss the domain expertise required for REIMS data.

Notes:
  • Kevin is from Callaghan, comes from a chemistry background, with university level knowledge in statistics.

  • Demelza,

  • GC-MS data isn’t something PFR want to pursue; ultimately the technique is slow, and they are looking for fast techniques.

  • The factory needs fast techniques.

  • REIMS - take a sample of anything, e.g. fish, shoot it with a laser; this creates ions, which are then sucked up.

  • REIMS data is similar to a chemical fingerprint; each peak corresponds to an ion with a particular mass-to-charge ratio (m/z).

  • Best characterized fish in the history of the planet.

  • Quality Control (QC) - or check samples; these are all identical, so if the technique were working properly they would be tightly clustered, but due to measurement noise they are not. The QC samples are a 50-50 mixture of the hoki and mackerel; they aim to be an average of the two fish.

  • If a sample varies by more than 30% RSD, we take it out, because the variables are not behaving.

  • Work with preprocessed data, get good results, then return to the raw data, and try to automate the entire process.

  • In chemistry data, sometimes PC-1 picks up noise in the dataset, then later PC-2 picks up the signal.

  • Lipidomics - every tissue has a very specific lipid profile; we plan to use the lipid profiles to distinguish tissue samples.

  • Kevin is currently processing oils, this is a slow and manual process, but will be high-resolution, another set of mass spectrometry data.

  • Research questions:
    1. Can REIMS data be used to classify different hoki tissues? What variables are responsible?

    2. Can REIMS data detect mineral oil contamination in hoki? At what concentration? What variables are responsible?

    3. Can REIMS data be used to distinguish between different hoki individuals? What variables are responsible?

    4. Can REIMS data detect hoki contamination in jack mackerel? At what concentration? What variables are responsible?

    5. Can REIMS data be used to classify different jack mackerel tissues? What variables are responsible?

    6. Can REIMS data detect mineral oil contamination in jack mackerel? At what concentration? What variables are responsible?

    7. Can REIMS data be used to distinguish between different batches of jack mackerel? What variables are responsible?

  • Good to relate results from a machine learning model, for example Principal Component Analysis, and try to explain the results using chemistry knowledge.

  • Principal Component Analysis (PCA) is an unsupervised learning technique that finds the linear combinations of features that maximise variance, projecting the data into a lower dimensional space (see the sketch after this list).

  • We expect supervised-learning methods, that have access to the class labels, will perform better than unsupervised learning methods.

  • There is a preference towards interpretable models, because we can try to explain these models using domain knowledge in chemistry.

  • We have Hoki contaminated with mineral oil (MO), often oil from boats gets into the fish, and contaminates the sample.

  • There are two types of data, spectroscopy and spectrometry, they sound similar but are very different.
    • spectroscopy - Spectroscopy is the study of the absorption and emission of light and other radiation by matter. It involves the splitting of light (or more precisely electromagnetic radiation) into its constituent wavelengths (a spectrum), which is done in much the same way as a prism splits light into a rainbow of colours.

    • spectrometry - Spectrometry is the measurement of the interactions between light and matter, and the reactions and measurements of radiation intensity and wavelength. In other words, spectrometry is a method of studying and measuring a specific spectrum, and it’s widely used for the spectroscopic analysis of sample materials.

  • Hoki + Mackerel
    • REIMS (AgResearch) + MS data (Callaghan)

  • Vibrational Spectroscopy
    • Reference data - qualitative data that takes months to generate.

    • Raw vibrational spectroscopy (Otago)

  • Seasonal sample (Dec 2023): Hoki, Mackerel, Green Shell Mussel (GSM)
    • REIMS + MS data

    • Vibrational Spectroscopy (Otago)

    • Same data, on different datasets.

  • Genetic Programming, can visualize the results and model, due to interpretable nature of GP trees.

  • We are working on the front-end of the factory, we get raw data fast, and process it fast.

  • No time-domain because the REIMS tool is instant.

  • No instrumental drift, the technique will get the same measurements for the same QC sample, even if years apart. Only day-to-day drift.

  • Mass spectrometry - m/z is the mass-to-charge ratio; most ions will be singly charged. REIMS uses time-of-flight MS. No time variable is needed, because the x-axis is the mass-to-charge ratio.

  • Time-of-Flight (TOF) is a mass analyser that utilises an electric field to accelerate generated ions through the same electrical potential, and then measures the time each ion takes to reach the detector.

  • Heavier ions take longer to travel through the acceleration shaft, and therefore have a longer time-of-flight; lighter ions take a shorter time. For an ion of mass \(m\) and charge \(z\) accelerated through potential \(V\) over flight length \(L\), \(zV = \frac{1}{2}mv^2\) gives \(t = L\sqrt{m/(2zV)} \propto \sqrt{m/z}\).

  • Good publishing strategy - it is possible to publish two papers on one dataset: the chemistry side (dataset) and the machine learning side (model).

  • Journal paper - 70% of the paper must be new; 30% can be from previous work, such as conference papers. Going from conference to journal, from a machine learning perspective, the work is on improving the algorithm.

  • It is FGR policy that students are first author on their papers; we can copy work from papers straight into the thesis, if we are first author (and did the work).

  • Sue just wants the box at the start of the factory that tells us what we need; we can worry about publishing the papers.

  • Orthogonal Projections to Latent Structures Discriminant Analysis (OPLS-DA) is a powerful statistical modeling tool that provides insights into separations between experimental groups based on high-dimensional spectral measurements from NMR, MS or other analytical instrumentation.

  • OPLS-DA is a supervised approach that pulls out characteristics that can linearly discriminate between groups, and is already popular in the chemistry field.

  • Main benefit of REIMS - REIMS was able to detect horse meat contamination in beef at as little as 3% contamination. We want to find contamination in fish, such as mineral oil, oxidation, free fatty acids, and mixed-species contamination.

  • Set up weekly meetings with Daniel: ask questions, track progress, and ensure we are on track and fulfilling the research objectives from PFR.
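
A minimal sketch (scikit-learn, with an assumed data shape, not PFR's pipeline) of the PCA analysis mentioned above, projecting high-dimensional spectra onto the directions of maximum variance:

  import numpy as np
  from sklearn.decomposition import PCA

  # Placeholder spectra: 60 samples x 1000 m/z bins (assumed shape).
  spectra = np.random.default_rng(0).standard_normal((60, 1000))

  pca = PCA(n_components=2)
  scores = pca.fit_transform(spectra)   # (60, 2): PC-1 and PC-2 scores per sample
  print(pca.explained_variance_ratio_)  # proportion of variance captured by each PC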

Afterwards, we had lunch, and informally touched on topics discussed in the meeting.

Notes:
  • PFR publish their datasets in chemistry journals, as open data available to the public.

  • It is normal for PhDs to encounter a motivational crisis during their study.

  • Possible to get stuck on the post-doc track; Daniel knows 50+ year old post-docs.

  • Meng will support work towards my pursuing post-doctoral work after my PhD.

  • Fish is relatively low-impact in bio-chemistry field, when compared to human healthcare research.

  • Can push the environment/ecology side of research to press importance of the work.

  • Easier to publish in high volume from machine learning side, rather than chemistry side.

  • Kevin is able to arrange time for me to visit Callaghan, and see how the REIMS data is prepared in-person.

  • Good to get hands-on experience with the REIMS data, to get a holistic view of the project scope.

2022-08-24 - FASLIP

Location: CO350, Time: Thursday 14:00 - 15:00, Attendees: Jesse Wood, Bach Nguyen, Bing Xue, Mengjie Zhang et al.

Jesse Wood (me) gave a talk on “Automated Fish Classification” this week. See resources folder for a copy of my presentation slides.

2022-08-25 - Weekly

Location: CO352, Time: Thursday 12:00 - 13:00, Attendees: Bach Nguyen, Bing Xue, Jesse Wood.

Notes:
  • They need volunteers for the university open day for the new AI course; I offered to volunteer from 12:00 - 15:00. It is hosted in AM102 - AM104.

  • The fitness function for Multi-tree GP should be a single objective: the classification accuracy.

  • Remove the sigmoid function; just classify output >0 as positive, otherwise negative.

  • Check the protected division operator; the expressions should not be returning inf or -inf (see the sketch below).
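
A minimal sketch (my own, one common convention rather than the exact operator discussed) of a protected division that can never return inf, -inf or nan:

  def protected_div(a, b):
      # Return 1.0 when the denominator is (near) zero, instead of dividing.
      return a / b if abs(b) > 1e-6 else 1.0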

TODO:
  • [x] Single objective function as classification accuracy.

  • [x] Remove sigmoid activation function.

  • [x] Prepare IEEE Symposium presentation.

  • [-] (removed) Check protected division operator.

2022-08-26 - ECRG

Location: Zoom, Time: Friday 15:00 - 17:00, Attendees: Gonglin Yuan, Bach Nguyen, Bing Xue, Mengjie Zhang, Jesse Wood et al.

Gonglin Yuan gave a talk on “Evolutionary Performance Predictor Genetic Algorithms (EPPGA)”.

Notes:
  • Evolutionary Performance Predictor Genetic Algorithms (EPPGA).

  • EPPGA is a method for neural architecture search.

  • Convolutional Neural Networks (CNN) are an efficient method for processing image data. Examples of CNNs on the ImageNet dataset include ResNet (He 2016), VGG-16 (Simonyan 2014) and AlexNet (Krizhevsky 2012).

  • Limitations of CNNs - domain experts in deep learning are required to find a good network architecture; this is trial and error, rules of thumb, monkeys on typewriters, black magic. Hyperparameter tuning is not often done by theory / informed design.

  • Neural Architecture Search (NAS) - using global search algorithms, i.e. Evolutionary Computation, to find optimal hyperparameters automatically.

  • Current methods to improve performance of NAS:
    1. Shallow training

    2. Parameter sharing

    3. Performance predictors

  • Evolutionary Neural Architecture Search (ENAS).

  • The efficient performance predictor based genetic algorithm (EPPGA) is better than (NPPGA), where the N stands for “not”.

  • The performance predictor models whether offspring can outperform their parents. The fitness can still be evaluated independently in the selection stage, with the performance predictor used only for crossover operations. So the performance predictor can be an approximation; it does not need to be exact.

2022-08-24 - IEEE Symposium

Location: CO350, Time: Wednesday 14:00 - 15:00, Attendees: Jesse Wood, Bach Nguyen, Bing Xue, Mengjie Zhang et al.

Jesse Wood (me) gave a talk on “Automated Fish Classification” this week.

Notes:
  • This was a 10 minute version of my 2022-08-24 - FASLIP talk.

  • See resources folder for a copy of my presentation slides.

2022-09-01 - Weekly

Location: CO364, Time: Thursday 12:00 - 12:30, Attendees: Bach Nguyen, Jesse Wood.

Note: Bing and Meng are at a conference in Auckland.

Notes:
  • Most work was preparing for the IEEE Symposium this week. I gave my “Automated Fish Classification” talk from FASLIP (last week).

  • After revising my notes from 2022-08-23 - PFR Daniel and Kevin, I have a much clearer idea for what my PhD proposal will be.

  • Mutation + Crossover = 100%; even though DEAP is a sequential genetic program, we set these parameters as such to avoid questions at conferences.

  • A fitness function of argmax(map > 0) is biased towards the first class, and does not penalize multi-label classifications - even though a multi-label classification is wrong for multi-class classification.

  • Add the Hamming distance - the number of positions at which two binary vectors differ, \(\|v_1 - v_2\|_1\) - as a distance-based regularization term to penalize multi-label classifications (see the sketch after this list).

  • Think about research objectives for the REIMS data; start writing them in the proposal, get heaps of ideas down on paper, then sift through them to find the diamonds in the rough.

  • The error of the GP tree is no longer the balanced classification accuracy; we must measure balanced accuracy and error as two independent statistics for the console/logging output.
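
A minimal NumPy sketch (my own) of the Hamming-distance penalty, counting the positions where two binary prediction vectors differ:

  import numpy as np

  def hamming(v1, v2):
      # Number of differing positions; equals the L1 norm for binary vectors.
      return int(np.sum(np.asarray(v1) != np.asarray(v2)))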

TODO:
  • [x] Write up research objectives for REIMS data.

  • [x] Hamming distance regularization term.

  • [x] Log error and balanced accuracy.

  • [x] Crossover + Mutation = 100%.

2022-09-01 - FASLIP

Location: CO350, Time: Thursday 14:00 - 15:00, Attendees: Dylon Zeng, Bach Nguyen, Jesse Wood et al.

Dylon Zeng gave a talk on Multi-Object Tracking for Mussel Farms.

Notes:
  • Images are taken from a moving boat using a GoPro.

  • Variation between different mussel farms: weather conditions, float colour, camera angle …

  • The YOLO v7 deep learning model does well at object detection, but requires tuning for each mussel farm - we want a generalized solution that works for all mussel farms.

  • Laplacian of Gaussian (LoG).

  • Pre-processing:
    1. Gray-world equalization.

    2. Homomorphic filtering.

    3. Perspective warping.

  • Q-Identifier, true and relative bearing between floats.

  • Future work:
    • Semi-supervised

    • Mechanical turk for annotated dataset.

    • Rough labels.

2022-09-02 - ECRG

Location: CO350, Time: Friday 15:00 - 17:00, Attendees: Jablin Lin, Bing Xue, Bach Nguyen, Yi Mei, Jesse Wood et al.

General notes:
  • IEEE Symposium: Jordan won best presentation, Christian won runner-up.

Jablin Lin gave a talk on “Evolutionary Transfer Learning for Feature Selection in Classification”.

Notes:
  • Real-world example of transfer learning is riding a bicycle and motorcycle.

  • We can transfer knowledge across domains to improve performance.

  • Evolutionary Sequential Transfer Learning; Multi-task, Multi-form optimization.

  • Feature selection and classification tasks are often related, and share common knowledge - we wish to transfer knowledge (i.e. similar solutions) across related tasks.

  • Feature selection is a multi-objective problem that balances the number of features selected and the classification accuracy.

  • Mixture model - A mixture distribution is made up of several component distributions \(P(x) = \sum_i P(c=i)P(x|c=i)\) (Goodfellow 2016, goodfellow2016deep).

2022-09-07 - Deep Learning

Location: Zoom, Time: Wednesday 15:00 - 16:00, Attendees: John (JP) Lewis, Bastiaan Kleign, Jesse Wood et al.

JP gave a presentation on “Diffusion”.

Notes:
  • What regularized auto-encoders learn from the data generating distribution. Summarizes a result from (Vincent 2011).

  • Tweedie’s formula (1954).

  • Minimizing the expected squared difference between \(s(y)\) and the score of \(p(y|x)\) is equivalent to performing score matching with the estimator \(s(y)\).

  • Imagen, Dalle 2, Stable Diffusion are all diffusion based text-to-image generating models.

  • (Ho 2020) Coarse-to-fine interpolation that varies the number of diffusion steps prior to latent mixing. The lossiness of the model can be controlled.

  • Bastiaan asked if diffusion will replace GANs, or are they another fad like normalizing flows; the forward-backward step process is very computationally expensive.

  • With 0 diffusion steps, we get mixed source images in the pixel space (a blend); with 1,000 steps, source information is lost and we get novel images.

2022-09-08 - FASLIP

Location: CO350, Time: Thursday 14:00 - 15:00, Attendees: Jinyan Li, Bing Xue, Mengjie Zhang, Yi Mei, Jesse et al.

Jinyan Li gave a talk on “Compression for DNA/RNA Sequencing”.

Notes:
  • Reduces complexity of the optimization problem by breaking down the problem space into smaller blocks.

  • Apply clustering so each block has a similar time-complexity, with better edges, so compression performance can be improved.

  • A combination of minimizers/maximizers for compressing a sequence of characters - a DNA sequence.

  • The problem is DNA/RNA sequencing: the human genome contains 3 billion base pairs, and around 30,000 genes.

  • File compression of DNA/RNA sequencing files, to allow for efficient distribution across the internet.

  • The hardware used for the experiments is a distributed computer cluster, if the method allows.

  • “Cell trajectory inference from Single-cell RNA Sequencing”

  • Process:
    1. t-SNE

    2. Density-based clustering

    3. KDE/MCD

    4. Trajectory formation

    5. Pseudo-time assignment

    6. Determine vague cells

  • Bifurcation topology of the cell trajectory - a difficult pattern to determine.

2022-09-09 - Weekly

Location: CO352, Time: Friday 11:30 - 12:00, Attendees: Bach Nguyen, Bing Xue, Mengjie Zhang, Jesse

Notes:
  • Go back to the original Thursday meeting time after next week.

  • Multi-tree GP for multi-class classification is a very difficult problem, hence my low classification accuracy so far; more work is needed to compete with SVM performance.

  • Instead, use multi-tree GP for feature construction; the algorithm is great at exploration because of its stochastic nature, but not so good at classification, as it has to learn this from scratch.

  • We can wrap the multi-tree feature construction in the SVM classifier, this way we play to the strengths of EC and SVM, respectively.

  • Report - write up the experiments in the form of a report, with parameter settings, pseudo-code, and results, so supervisors can discuss possible future directions (also good for a possible publication - the EvoStar deadline is looming).

  • I forgot to tell my supervisors about my progress towards forming the research objectives for my proposal! I should follow up with an email, or address this next meeting; I wrote an agenda post-it note to remind myself.

TODO:
  • [x] Wrapper-based Multi-tree GP for Multi-class Classification.

  • [x] Report describing implementation above.

  • [x] Discuss research objectives with supervisors.

2022-09-09 - ECRG

Location: CO350, Time: 15:00 - 17:00, Attendees: Junhao Huang, Yi Mei, Bach Nguyen, Jesse Wood et al.

General Notes:
  • The AJCAI paper outcomes - accepted/rejected - are to be announced on Monday 12th of September. I have submitted a paper (my first) - fingers crossed!

  • Progress report:
    1. Write research objectives for proposal

    2. Implemented Multi-tree GP for multi-class classification.

    3. Future work: wrapper-based feature construction with Multi-tree GP. (Note: [Yi] good to mention technical details in the progress report.)

  • [Yi] “When you receive a review, don’t rush into revisions: make a review letter template, address all comments with personal knowledge, then discuss with supervisors; once you are all on the same page, then begin revisions. This can save a lot of time when handling journal revisions.”

Junhao Huang gave a talk on “PSO for Compact NAS for Image Classification”.

Notes:
  • Neural Architecture Search (NAS)

  • Search strategies include:
    • Reinforcement Learning (RL)

    • Gradient Descent (GD)

    • Evolutionary Computation (EC)

  • NAS is a non-convex, non-differentiable hyper-parameter optimization problem.

  • Performance evaluation for NAS is incredibly expensive; networks have to be trained from scratch for each individual every generation.

  • MobileNet - an efficient and compressed CNN architecture that reduces the complexity of the NAS.

  • Contributions:
    • An alternate MBConv Block.

    • Integrated acceleration scheme.

    • Reduce time and model complexity for NAS.

  • Modified MBConv Block:
    1. Remove <unlegible> feature addition - replace with channel concatenation.

    2. Replace squeeze-and-excitation with channel concatenation.

  • The representation is similar to the classification map, where a floating point number line is used to select the type of node for the network, using a float-string representation for an individual.

  • Optimizations:
    1. Dynamic early stopping

    2. Input downsampling

    3. Architecture downsampling

  • Genotype - [sp-conf, sp-connections], the configuration and connections for the network architecture, respectively.

  • Experiments:
    • Datasets: CIFAR-10, CIFAR-100, ImageNet

    • Computation cost \(\approx\) 0.07% of the original cost.

    • Large-scale architecture transfer - CIFAR-10 \(\to\) CIFAR-100.

  • An alternate approach: pre-train the large-scale model on CIFAR-100, freeze the weights, and re-train the final 2 layers for CIFAR-10.

  • Benefits: the method required 1 GPU day, can be run on commodity hardware, performs well, and doesn’t need mega compute or a complex architecture.

2022-09-14 - Deep Learning

Location: Zoom, Time: Wednesday 14:00 - 15:00, Attendees: Bastiaan Kleign, Maxwell, Hayden, Daniel et al.

Bastiaan gave a talk on “Physics Informed Deep Learning” (Raissi 2019, raissi2019physics) https://maziarraissi.github.io/PINNs/

Notes:
  • Neural networks that are trained to solve supervised learning tasks while respecting any given law of physics.

  • The resulting networks form a new class of data-efficient universal function approximators that naturally encode any underlying physics law as prior information.

  • They propose continuous and discrete time models.
    • Continuous:
      1. Burgers’ Equation

      2. Schrödinger’s Equation

    • Discrete:
      1. Allen-Cahn

      2. Navier-Stokes

      3. Korteweg-de Vries

  • You need data complexity - lots of data points - to approximate an equation.

  • “The number of collocation points needed to globally enforce a physics informed constraint […] will increase exponentially.”

  • A more structured neural network representation leverages the classical Runge-Kutta time stepping schemes.

  • Runge-Kutta https://en.wikipedia.org/wiki/Runge%E2%80%93Kutta_methods

  • We can employ the gradient of inputs in more situations, e.g. normalizing flows, feature visualization, etc.

2022-09-16 - Weekly

Location: CO352, Time: Friday 11:00 - 12:00, Attendees: Bach Nguyen, Bing Xue, Mengjie Zhang, Jesse

Notes:
  • Increase the population size to at least 1,000 (\(\beta \times\) #features in Tran 2019).

  • Change “half and half” \(\to\) “Ramped half and half”.

  • Perform feature selection first, using MRMR, to reduce terminal set to top ranked features.

  • Winner-takes-all, multi-class classification method where largest output is the prediction. A class dependent representation is made.

  • Single-tree GP for multi-class classification is possible using a vector representation of shape (n_classes,).

  • Experiments, there are two possible avenues to go down:
    1. MCIFC

    2. Winner-takes-all

  • Work towards writing a paper for EvoStar 2023 - a conference held in Brno, Czech Republic - whose submission deadline is November 1st.

TODO:
  • [x] Write up experiments for EvoStar 2023.

  • [x] Increase population size.

  • [x] Change “half and half” \(\to\) “Ramped half and half”.

  • [x] Winner-takes-all Multi-tree GP.

  • [ ] Vector-based approach? (A third possible GP method to compare)

2022-09-16 - ECRG

Location: CO350, Time: 15:00 - 17:00, Attendees: Jordan MacLachlan, Yi Mei, Bach Nguyen, Jesse Wood et al.

General notes:
  • The government has removed all the restrictions of the traffic light system that were put in place to prevent COVID-19 transmission.

  • No need to wear face masks after Sunday at the university.

  • The EvoStar conference deadline is in November; ECRG are heavily involved in the conference.

  • The AJCAI 2022 notification is at the end of the week (today!)

  • Progress:
    1. Multi-tree GP wrapper-based FC.

    2. Better accuracy than other methods.

    3. Increasing population size helped a lot.

    4. Aim for EvoStar 2023 paper for November.

Jordan MacLachlan gave a presentation in CO350 on “GP for Emergency Medical Dispatch”.

Notes:
  • EMD is very underfunded and understaffed; one dispatcher had to serve a 50,000 km\(^2\) area of the lower North Island.

  • There isn’t an existing representation for modelling the task; Jordan had to devise a dynamic graph-based representation.

  • The fitness function is to minimize the average weighted response time.

  • Existing methods:
    1. Probabilistic models,

    2. Expert dispatch,

    3. Hard-rules (human designed)

  • Graph-based representations are NP-hard, similar to UCARP and DJSS, but with unique constraints for the EMD domain.

  • Approaches:
    1. Greedy

    2. Global

    3. Proactive

  • A dataset that efficiently captures the EMD problem does not exist; creating one is very time-consuming, and no public datasets were available.

  • The learned rules that employed the global methods showed significant improvement over greedy/local methods.

  • Potential to release the simulation software as open-source; however, the IP may be protected for potential business opportunities later.

  • Dataset may contain private information - federated and differential learning approaches may allow the use of datasets with private/personal information from many EMD dispatchers, without revealing sensitive information.

  • The fitness function is the weighted average response time; this accounts for the urgency of the emergency - if non-urgent, they can wait! If urgent, time is of the essence.

  • Urgency has a 30% chance of randomly changing (according to Wellington Free Ambulance). E.g. an urgent call reporting an animal attack on a child may be a hamster bite that drew no blood on the child’s finger.

2022-09-22 - Weekly

Location: CO352, Time: Friday 11:00 - 12:00, Attendees: Bach Nguyen, Bing Xue, Mengjie Zhang, Jesse

Notes:
  • Need to provide a travel plan for the conference. The conference runs from the 5th - 7th of December. The university can pay for accommodation, travel and insurance (for the conference duration). Any extra days will have to be covered at my own expense.

  • The GP is overfitting to the training set - because we are using the entire training set. Instead (especially for smaller datasets) we should employ 10-fold cross validation, to prevent the model from overfitting (see the sketch after this list).

  • For AJCAI 2022, a poster is still a paper; my paper will be published in the conference proceedings. This is officially my first accepted academic paper.

  • To prepare for AJCAI 2022, I need a camera-ready paper and poster for the October 5th registration deadline. I also need to organize my travel plans to Perth.

  • Bach showed me an exemplar poster from his previous work. A poster should provide a high-level summary of the most important ideas from the paper. I can split it into 4 individual A4 sheets of paper, and then tape them together. (Alternative: get an A6 poster printed at the university canvas shop.)
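
A minimal sketch (scikit-learn; train_and_predict is a hypothetical helper standing in for the GP training loop) of a 10-fold cross-validation fitness to reduce overfitting:

  import numpy as np
  from sklearn.metrics import balanced_accuracy_score
  from sklearn.model_selection import StratifiedKFold

  def cv_fitness(train_and_predict, X, y, n_splits=10, seed=42):
      # Average held-out balanced accuracy across the folds.
      skf = StratifiedKFold(n_splits=n_splits, shuffle=True, random_state=seed)
      scores = []
      for train_idx, test_idx in skf.split(X, y):
          y_pred = train_and_predict(X[train_idx], y[train_idx], X[test_idx])
          scores.append(balanced_accuracy_score(y[test_idx], y_pred))
      return float(np.mean(scores))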

TODO:
  • [x] Winner-takes-all Multi-tree GP.

  • [x] AJCAI Response Letter (DUE: 2022-10-05)

  • [x] Camera-ready paper for AJCAI (DUE: 2022-10-05).

  • [x] Poster for AJCAI (DUE: 2022-10-05).

  • [x] Organize travel plans for AJCAI 2022 (ASAP).

  • [x] Cross-validation fitness evaluation for MCIFC.

2022-09-22 - FASLIP

Location: CO350, Time: Thursday 14:00 - 15:00, Attendees: Faizal Hafiz, Bing Xue, Mengjie Zhang, Bach Nguyen, Jesse Wood et al.

Faizal Hafiz gave a talk on “Data Driven Modelling: Fundamental Challenges & Some Proposed Solutions” https://www.researchgate.net/profile/Faizal-Hafiz

Notes:
  • SKEMA is a Business School in France.

  • Pattern Recognition (PR) and System Identification (SI) are both sub-classes of data modelling.

  • Feature selection needs maximum relevance + minimum redundancy (Zhao 2019, Ding 2005).

  • Exploring feature subsets in high-dimensional data has a combinatorial explosion.

  • Non-linear Autoregressive with eXogenous inputs (NARX).

  • In SI, we want a mathematical expression which maps inputs \(\to\) outputs; no black-box models.

  • Attribute selection - identify a sparse set of significant attributes \(X^* \subset X_{model}\).

  • Proposed solution: Two-dimensional swarms (2DS).

  • Orthogonal forward regression with error reduction ratio (OFR-ERR).

  • OFR-ERR is one of the most successful algorithms when operating in orthogonal search spaces.

  • Limitations: the nesting effect - the algorithm cannot correct previous mistakes.

  • The floating search principle, a method from the PR community, can fix the nesting effect and correct past mistakes.

  • His work combines ideas from PR and SI, by generalizing the problems from both, to get a meta-heuristic that solves both.

2022-09-23 - ECRG

Location: CO350, Time: Friday 15:00 - 17:00, Attendees: Carlos A. Coello Coello, Bing Xue, Mengjie Zhang, Bach Nguyen, et al.

General Notes:
  • ECRG will be held in person from now on; we have permission from the school for this, given the COVID-19 restrictions have been lifted.

  • Carlos is an IEEE Fellow, a distinguished lecturer, Editor-in-Chief of IEEE Transactions on Evolutionary Computation, a previous Vice-President of CIS, and a big name in EMO with over 63,000 citations.

  • My intro [shorten/sharpen this]: “Hi, I’m Jesse, using EC to explore a fish oil chemistry dataset with high dimensionality and low sample size - used PSO, GP for multi-class classification. EC is useful to explore combinatorially explosive problem spaces, such as possible feature subsets”.

  • Bach’s intro: “using Evolutionary Machine Learning (EML) for feature selection, classification and transfer learning. A lecturer at the university”.

  • “Ying has a ‘boy’ in China” - Meng 2022.

Carlos gave a presentation on “Lessons learned after 25 years designing Multi-objective Evolutionary Algorithms” https://en.wikipedia.org/wiki/Carlos_A._Coello_Coello

Notes:
  • Carlos’s talk covered lessons that he had learnt through analogy from his research career.

  • We don’t publish our failures; people will often fall into the same traps that others have hit previously but not been able to publish.

  • As a PhD student, Carlos wanted to be unique; he had his own ideas, and didn’t want to follow the herd.

  • He settled on the topic of “An Empirical Study of Evolutionary Techniques for Multi-objective Optimization in Engineering Design” (1996).

  • Plymouth Engineering Design Centre (UK) as a Research Fellow - an interest in constraint-handling - still did EMO, but transferred the knowledge into a new domain.

  • The largest research center in Mexico; he joined the Computer Science section, a research centre that can grant degrees but has no undergraduates.

  • Passion for research is one of the most important requirements to join EVOCINV.

  • Discipline is a key element (deadlines were non-negotiable).

  • Smart students are challenged and pushed to their limits; impossible problems were used as bait.

  • Young people do ‘research by analogy’; they write the paper before they understand the problem. They throw a bunch of algorithms at a problem, find one that performs best with statistical significance, then publish in a journal. Similar to graduate student descent (Gencoglu 2019, gencoglu2019hark).

  • Instead, Carlos focussed on valuable contributions: understand the problem thoroughly, think about possible solutions, implement the solutions, and finally write the paper.

  • Lessons:
    1. Try to be the first.

    2. Do things that matter.

    3. Learn from your failures.

    4. Our legacy.

2022-09-28 - Deep Learning

Location: Zoom, Time: Wednesday 15:00 - 17:00, Attendees: Amr Mossy, Bastiaan Kleign, Maxwell, Daniel, Hayden et al.

Amr Moss discuessed the paper “SurVAE Flows: Surjections to Bridge the Gap between VAEs and Flows” (Nielsen 2020, nielsen2020survae) https://arxiv.org/abs/2007.02731

Notes:
  • Normalizing flows use bijective transformations to model densities.

  • A normalizing flow is a series of simple functions that are invertible, or whose analytic inverse can be calculated (see the sketch after these notes).

  • Variational Auto-Encoders (VAEs) learn stochastic transformations that are non-invertible. A variational auto-encoder is an auto-encoder whose training is regularized to avoid overfitting, ensuring that the latent space has good properties that enable a generative process. See chapter 14 on Regularized Autoencoders (Goodfellow 2016). https://www.deeplearningbook.org/contents/autoencoders.html

  • Surjective transformations bridge the gap between VAEs and normalizing flows: their transformations are deterministic in one direction - thereby allowing exact likelihood computations - and stochastic in the reverse direction - hence providing a lower bound on the corresponding likelihood.

  • They propose a modular framework of composable transformations that encompasses VAEs and normalizing flows.
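A toy change-of-variables example (NumPy) of the exact-likelihood property of bijections that the paper contrasts with the VAE lower bound - a one-dimensional affine flow, not the SurVAE method itself:

    import numpy as np

    a, b = 2.0, 0.5                # parameters of an invertible affine map

    def forward(x):                # z = f(x), deterministic and bijective
        return a * x + b

    def inverse(z):                # exact analytic inverse
        return (z - b) / a

    def log_prob_x(x):
        # Change of variables: log p(x) = log p_z(f(x)) + log|det df/dx|.
        z = forward(x)
        log_pz = -0.5 * (z ** 2 + np.log(2 * np.pi))   # standard normal base
        log_det = np.log(abs(a))                       # |df/dx| for an affine map
        return log_pz + log_det

    print(log_prob_x(np.array([0.0, 1.0, -1.0])))      # exact, no lower bound

Surjective (SurVAE) layers keep the deterministic direction exact, but in the stochastic direction the likelihood contribution is only bounded.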

2022-09-29 - Weekly

Location: CO352, Time: Thursday 12:00 - 13:00, Attendees: Bing Xue, Bach Nguyen, Jesse Wood

Notes:
  • A response letter is not needed but I wrote one for practice. My supervisors suggested I make the revisions, then send the final version to Bach to check.

  • The university will cover the conference and one day before/after. In total I can travel to Perth from the 4th to the 10th of December; they will cover insurance, accommodation and travel.

  • Bing sent me an email with the details regarding the trip and what the university will cover.

  • Continue writing my proposal. Possible research objectives:
    1. Deep learning state-of-the-art methods

    2. Evolutionary Computation

    3. Visualization

    4. Preprocessing

    5. Feature Selection / Feature Construction

    6. Transfer Learning

  • Performance of multi-tree GP for classification was poor on the test set (62% accuracy) - it overfits the training set and cannot generalize.

  • Performance of multi-tree GP for feature construction with a wrapper-based SVM classifier was good: 98% train accuracy in 200 generations, and 72% test accuracy (the wrapper evaluation is sketched after these notes).

  • Try GP on the fish part dataset instead of fish species; there is more room for improvement in test performance there.
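A rough sketch of the wrapper-based evaluation above using scikit-learn (not my actual DEAP implementation): each GP individual is a list of trees, each tree maps an input row to one constructed feature, and the fitness is the SVM's cross-validated accuracy on the constructed features:

    import numpy as np
    from sklearn.model_selection import cross_val_score
    from sklearn.svm import SVC

    def wrapper_fitness(trees, X, y, cv=5):
        # Build the constructed feature matrix, one column per tree.
        constructed = np.column_stack(
            [np.apply_along_axis(t, 1, X) for t in trees])
        # Wrapper fitness: cross-validated accuracy of the SVM classifier.
        return cross_val_score(SVC(), constructed, y, cv=cv).mean()

    # Hypothetical individual with two constructed features:
    trees = [lambda row: row[0] + row[1], lambda row: row[2] * row[3]]
    X, y = np.random.rand(60, 5), np.random.randint(0, 2, 60)
    print(wrapper_fitness(trees, X, y))

Using cross-validation inside the wrapper, rather than a single train split, is one way to reduce the overfitting seen in the multi-tree classification results.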

TODO:
  • [x] Send AJCAI paper zipfile to Bach for review.

  • [x] Submit camera-ready AJCAI paper.

  • [x] Write up research objectives for proposal.

  • [x] (1/2) Multi-tree GP on fish part dataset.

2022-09-29 - FASLIP

Location: CO350, Time: Thursday 14:00 - 15:00, Attendees: Qinyu Wang, Bing Xue, Mengjie Zhang, Jordan MacLachlan, Jesse Wood et al.

Notes:
  • Qinyu shared a presentation on “GP for fine-grained image classification”.

  • Differences between classes are very minute, e.g. classify very similar species of bird.

  • Almost all state-of-the-art approaches use CNNs from deep learning.

  • Limitations of CNNs: compute, interpretability, complexity, architecture design, and the supervised learning task requires annotated class labels.

  • “Part-based R-CNNs for Fine-grained Category Detection” (Zhang 2014) https://arxiv.org/abs/1407.3867

  • Design a GP method with program/tree simplification for image classification
    • Remove redundant/ineffective nodes

    • Simplify building blocks

  • Develop a GP method with knowledge transfer for fine-grained image classification
    • Generate simple sub-tasks.

    • Evaluation of sub-tasks.

    • How to transfer knowledge.

  • Improve the interpretability of well-performing GP based methods.

  • “This Looks Like That: Deep Learning for Interpretable Image Recognition” (Chen 2019) https://arxiv.org/abs/1806.10574

  • Automatically Defined Functions (ADFs) - first mentioned in Koza’s Genetic Programming II (Koza 1994)

2022-09-30 - ECRG

Location: CO350, Time: Friday 15:00 - 17:00, Attendees: Wolfgang Banzhaf, Bing Xue, Mengjie Zhang, Bach Nguyen, Jesse Wood et al.

General notes:
  • Wolfgang Banzhaf is the John R. Koza Chair for Genetic Programming in the Department of Computer Science and Engineering at Michigan State University. He published a foundational text, “Genetic Programming: An Introduction” (Banzhaf 2008).

  • 5 students had papers accepted to the Australasian Joint Conference on Artificial Intelligence (AJCAI).

  • Important to start booking flights for AJCAI in December; Bing sent an email with the details on organizing the trip. We will need to arrange a transfer in Auckland, Sydney or Melbourne for a flight to Perth.

  • Intro: “Hi Wolfgang, I’m working on chemistry datasets of fish tissue samples, high-dimensional with low sample size; they have a laser pen which can sample fish oil tissue quickly (Mass Spectrometry). Considering EC, with PSO/multi-tree GP, for detecting contamination in a fish processing factory.” [This intro needs to be shorter/sharper and rehearsed].

Meng Xu (Nora) gave a talk on “MOEA/D for Genetic Programming for Multi-Objective Job Shop Scheduling”.

Notes:
  • “Multi-objective evolutionary algorithm based on decomposition” (Zhang 2008) https://ieeexplore.ieee.org/abstract/document/4358754

  • Flexible - a job can be processed by a set of optional machines.

  • Dynamic - job shop scheduling (JSS) can encounter dynamic events.

  • Scheduling heuristics are often human-designed, a time-consuming process, which makes them rigid and unable to adapt to real-time dynamic events.

  • Hyper-heuristics - automatically design scheduling heuristics; often used to solve hard computational search problems, such as MO-DFJSS.

  • Limitations:
    1. No studies have been done that use MOEA/D for MO-DFJSS. Existing work has shown promise on the static JSS problem.

    2. MOGP/D has a training/test process, this may not be suitable for MO-DFJSS.

    3. In MO-DFJSS the objectives are not always in conflict with each other.

    4. Different ranges of objectives may cause the Pareto front to bias towards some objectives.

  • The aforementioned limitations shall be addressed with the following techniques (the decomposition step is sketched after these notes):
    1. Normalization

    2. Weight adjustment

    3. Map individuals to sub-problems

    4. Archive - store non-dominated solutions

    5. Subset selection - more efficient evaluation

  • The Poisson distribution is a discrete probability distribution that expresses the probability of a given number of events occurring in a fixed interval of time or space, if these events occur with a known constant mean rate and independently of the time since the last event. (Wikipedia) https://en.wikipedia.org/wiki/Poisson_distribution
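A minimal sketch of the decomposition idea behind MOEA/D - weight vectors and the Tchebycheff scalarization, with illustrative values, not Nora's actual method:

    import numpy as np

    def tchebycheff(objectives, weight, ideal):
        # Each weight vector turns the multi-objective problem into one
        # single-objective sub-problem; MOEA/D solves them collaboratively.
        return np.max(weight * np.abs(objectives - ideal))

    weights = np.array([[1.0, 0.0], [0.5, 0.5], [0.0, 1.0]])  # 3 sub-problems
    ideal = np.array([0.0, 0.0])        # best value seen so far per objective
    candidate = np.array([0.2, 0.6])    # objective vector of one solution

    for w in weights:
        print(w, tchebycheff(candidate, w, ideal))

Because the scalarization mixes raw objective values, objectives with different ranges bias the search - which is why normalization (technique 1 above) matters.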

2022-10-05 - Deep Learning

Location: Zoom, Time: Wednesday 15:00 - 17:00, Attendees: Felix Yan, Paul, JP, Hayden, Daniel, Jesse et al.

Felix Yan discussed the paper “Densely Connected Normalizing Flows” (Grcic 2021, grcic2021densly) https://openreview.net/pdf?id=JNSwviqJhS

Notes:
  • Normalizing flows are bijective mappings between input and latent representations with a fully factorized distribution.

  • Normalizing flows (NF) are attractive due to exact likelihood evaluation and efficient sampling.

  • However, their effective capacity is often insufficient, since bijectivity constraints limit the model width.

  • The proposed method addresses this limitation by incrementally padding intermediate representations with noise. Precondition noise in accordance with previous invertible units, coined “cross-unit coupling”.

  • Their invertible glow-like modules increase the expressivity by fusing a densely connected block with Nyström self-attention.

  • They refer to their proposed architecture as DenseFlow, since both cross-unit and intra-module couplings rely on dense connectivity.

  • Experiments show significant improvements due to the proposed contributions, and reveal state-of-the-art density estimation under moderate computing budgets.

2022-10-06 - Weekly

Location: CO352, Time: Thursday 12:00 - 13:00, Attendees: Daniel Killeen, Bing Xue, Bach Hoai Nguyen, Jesse Wood.

Notes:
  • The camera-ready version of the AJCAI paper has been submitted to the conference. This is all ready to go; all that is left is preparing flights, accommodation, registration, and the poster/presentation.

  • Showed Daniel my research objectives and the proposal outline draft; he said it was a good summary of what was discussed.

  • He clarified that there will be two state-of-the-art mass spectrometry datasets: (1) REIMS, (2) Direct Infusion MS.

  • Direct-Infusion Mass Spectrometry works on extracted oils, is time-consuming, and focuses on lipids.

  • He will send reference data, what we are trying to predict.

  • Also, generating a seasonal dataset, to see if there is variation between composition across seasons.

  • Benchmarks - compare two analytical state-of-the-art chemistry techniques using machine learning techniques.

  • “Never been more characterized fish in the history of the world!” - Daniel Killeen 2022.

  • A proper proposal template is available on the course website.

  • Daniel will send through a follow-up email with more information; to re-iterate, a brain dump of ideas.

TODO:
  • [x] Read Daniel's email

  • [x] Flights, accomodation, registration for AJCAI

  • [x] Notes / follow up on Daniel's email

  • [x] Poster/Presentation for AJCAI

  • [x] Proposal proper template + more writing

2022-10-06 - FASLIP

Location: CO350, Time: Thursday 14:00 - 15:00, Attendees: Ziyi Sun, Bing Xue, Bach Hoai Nguyen, Jordan, Jesse et al.

Ziyi Sun gave a talk about Mask R-CNNs.

Notes:
  • Instance segmentation - jointly estimate class labels and individual objects in an image.

  • R-CNN is region-based object detection (Girshick 2014, girshick2014rich), an application of CNNs for object localization, detection and segmentation.

  • Good quote from (Girshick 2014) to rephrase: “when labelled training data is scarce, supervised pretraining for an auxiliary task, followed by domain-specific fine-tuning, yields a significant performance boost.”

  • Beware: (Xin 2022) is a recent paper that suggests multi-task optimization is pointless (in deep learning).

  • To generate “proposals” for the region where the object lies, a small network is slid over a convolutional feature map that is output by the last convolutional layer.

  • Ziyi proposed a Mask R-CNN technique, an extension of R-CNN that uses masks to improve accuracy (a usage sketch follows these notes).

  • Feature Pyramid Networks (FPN) (Lin 2017, lin2017feature) are a “top-down architecture with lateral connections … developed for building high-level semantic feature maps at all scales.”
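For reference, a pretrained Mask R-CNN (with the FPN backbone mentioned above) is available off the shelf in torchvision; a minimal usage sketch, assuming torchvision >= 0.13 for the weights argument:

    import torch
    import torchvision

    # Mask R-CNN with a ResNet-50 FPN backbone, pretrained on COCO.
    model = torchvision.models.detection.maskrcnn_resnet50_fpn(weights="DEFAULT")
    model.eval()

    image = torch.rand(3, 480, 640)      # one RGB image, values in [0, 1]
    with torch.no_grad():
        (pred,) = model([image])         # list of images in, list of dicts out

    # Boxes, class labels and scores come from the R-CNN heads; the per-instance
    # masks are the extension that makes it *Mask* R-CNN.
    print(pred["boxes"].shape, pred["labels"].shape, pred["masks"].shape)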

2022-10-07 - ECRG

Location: CO350, Time: Friday 15:00 - 16:00, Attendees: Zhixing Huang, Wolfgang Banzhaf, Bing Xue, Yi Mei.

General notes:
  • Meng is attending an Agriculture + Seafood conference in Nelson; he is working on organizing some industry collaborations.

  • Harith is looking for people to review 80+ networking papers for an upcoming conference.

  • Good to book a meeting with Wolfgang to gain insight from his expertise. Send him an email or feel free to pop into his office.

  • Me:
    • Hengzhe’s DEAP 5x speedup is very effective for multi-tree GP, where each individual consists of multiple subtrees - a substantial speedup in evaluation time for my algorithms.

    • Very productive meeting with my industry partner discussing domain expertise in Mass Spectrometry SOTA methods; we clarified the research objectives and the research questions.

    • Bing suggested I carefully re-read Daniel’s email and get a good understanding of the problem before tackling the solutions.

Zhixing Huang gave a talk about “My Study on Linear Genetic Programming: A Brief Review”.

Notes:
  • Linear GP represents a program as a sequence of register-based instructions (Banzhaf 2009) - see the toy interpreter after these notes.

  • Each instruction includes a source register, a target register, and an operation.

  • Effective instructions (exons) and ineffective instructions (introns).

  • Steady-state algorithm (Weinstein 2022 pg 229). The new population is formed by the winners of tournament selection and the offspring of the winners.

  • Genetic operators: linear crossover, macro mutation, micro mutation.

  • Representation: LGP can be represented as a graph-based GP.

  • Applying Linear Genetic Programming (LGP) to Dynamic Job Shop Scheduling (DJSS).

  • Dynamic JSS can respond to dynamic events (see 2022-09-30 - ECRG for more).

  • GP-based hyper-heuristics (Nguyen 2007)

  • Instance rotation and generational EA (Hildebrandt 2010) - fitness between generations is no longer comparable with instance rotation.

  • Breakthroughs are presented as trial-and-error hyper-parameters that worked (Huang 2021, Huang 2022) - black magic?

  • Multi-task - optimize multiple similar tasks simultaneously, hoping that shared information among tasks can mutually enhance the optimization performance - beware, since (Xin 2022)!!!

  • Motivation - EC is time-consuming while training resources are limited. LGP can share knowledge through its graph-based structures.

  • Shared individuals - individuals encode solutions for multiple tasks, but can share common building blocks.

  • Multi-population framework - intra-task sharing, where individuals share knowledge across sub-populations.

  • Semantic LGP for solving symbolic regression (Huang ____).

  • Semantics: the register values calculated on each input instance, represented as a semantic matrix. Looking to find the inverse mapping of the semantic matrix using a (linear) learning model.

  • Select candidates based on the best performance of individual semantics.

  • “Mutate-and-divide”: recursively propagate the error through the program using the semantic matrix.

  • “LGP is a poor man’s ADF” – Wolfgang Banzhaf 2022.
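A toy register-machine interpreter to make the LGP representation concrete (my own sketch, not Zhixing's implementation): each instruction names a target register, an operation, and two source registers; instructions whose results never reach the output register are introns:

    PROGRAM = [
        (1, "add", 0, 2),   # r1 = r0 + r2
        (3, "mul", 1, 1),   # r3 = r1 * r1   (effective: feeds r0 below)
        (2, "sub", 2, 0),   # r2 = r2 - r0   (an intron: r2 is never read again)
        (0, "add", 3, 0),   # r0 = r3 + r0   (r0 is the program output)
    ]

    OPS = {"add": lambda a, b: a + b,
           "sub": lambda a, b: a - b,
           "mul": lambda a, b: a * b}

    def execute(program, x, n_registers=4):
        r = [x] + [1.0] * (n_registers - 1)   # r0 seeded with the input
        for target, op, a, b in program:
            r[target] = OPS[op](r[a], r[b])
        return r[0]

    print(execute(PROGRAM, 2.0))   # 11.0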


2022-10-12 - Deep Learning

Location: Zoom, Time: Wednesday 15:00 - 16:00, Attendees: Madhurjra Choudhary, Bastiaan Kleijn, Paul Teal, Daniel, Maxwell, Jesse et al.

Madhurjra discussed two papers:
  1. A deep learning method for bearing fault diagnosis based on cyclic spectral coherence and convolutional neural networks (chen2020deep) https://www.sciencedirect.com/science/article/pii/S0888327020300698

  2. Noise-Aware Sparse Gaussian Processes and Application to Reliable Industrial Machinery Health Monitoring (yang2022noise) https://ieeexplore.ieee.org/abstract/document/9864068

These are two papers, a CNN (Chen 2020) and a Gaussian Process (Yang 2022) method, for applications monitoring machinery in real industrial environments.

Notes for (Chen 2020):
  • (Chen 2020) propose Cyclic Spectral Coherence (CsCoh) + Convolutional Neural Networks (CNNs) for rolling element fault diagnosis.

  • Data:
    • The domain is rolling element fault diagnosis - i.e. ball bearings in a factory setting.

    • A rotating bearing will modulate (go up and down) in pitch in a non-periodic manner; this is a telltale sign of a faulty ball bearing.

  • Method:
    • Combine CsCoh + CNNs for fault diagnosis of rotating elements in a factory.

    • Cyclic Spectral Coherence (CsCoh) is used to preprocess vibration signals, estimated by the Fourier transform of the cyclic ACF (see paper for derivation).

    • Group Normalization (GN) is developed to reduce the internal covariate shift caused by data distribution discrepancy, which extends applications of the algorithm to real industrial environments.

  • Results:
    • Their proposed method improves classification performance, achieving the >95% accuracy needed for use in the real world.

    • CsCoh provides superior discriminative feature representations for bearing health statuses under varying conditions.

    • Group Normalization increases robustness for data from different domains (with different data distributions).

  • Why it matters?
    • Garbage in, garbage out - preprocessing can dramatically improve the performance of a CNN.

    • Group Normalization makes the method robust, and applicable to out-of-distribution data from unseen domains.

    • Detecting faults in ball bearings is crucial for safety, automation, and efficiency in factories.

Notes for (Yang 2022):
  • (Yang 2022) proposed a Noise-Aware Sparse Gaussian Process (NASGP) with a Bayesian Inference Network.

  • Data:
    • Domain - maintenance of machinery equipment requires real-time health monitoring. Most state-of-the-art models require high-quality monitoring data, but are not robust to the noise present in real-world applications.

    • Problem - predict an estimate of the remaining useful life of machinery equipment using noisy data.

  • Method:
    • Noise-Aware Sparse Gaussian Processes (NASGP) + Bayesian Inference Network.

  • Results:
    • NASGPs are capable of high-performance and credible assessment under strong noise.

    • Developed a generative additive model to bridge the gap between latent inference mechanism and domain expert knowledge.

    • Method worked well in two different domains: (1) remaining useful life prognosis, (2) fault diagnosis in rolling bearings.

  • Why it matters?
    • The method is robust to noise, and can be applied to real-world applications, not just academic benchmarks (toy datasets).

    • Method provides a generative additive model that works well in two different domains.

    • Important to monitor machinery equipment in real-world applications, to ensure safety, automation, and efficiency.


2022-10-13 - Weekly

Location: CO352, Time: Thursday 12:00 - 13:00, Attendees: Bing Xue, Bach Nguyen, Jesse

Notes:
  • Send Bing the passport scan + frequent flyer number to lock in the tickets for the flight to AJCAI 2022.

  • Bing has purchased a full registration for AJCAI 2022; at least one author needs full registration by October 14th.

  • Send Bing a copy of the registration confirmation email when it comes through.

  • Limitations - a good section to include as a header in my Literature Review notes.

  • Try cross-validation on the training set to improve generalization - the EvoStar deadline for submissions is coming up on November 1st.

  • Book accomodation for AJCAI 2022 at Ibis Styles Perth https://ibisstylesperth.com/

TODO:
  • [x] Passport + Frequent Flyer Number

  • [x] Book accomodation for AJCAI

  • [x] Cross-validation to improve test performance.

  • [!] Forward Registration Confirmation Email to Bing (never received this!)

2022-10-13 - FASLIP

Location: CO350, Time: Thursday 14:00 - 15:00, Attendees: Hengzhe Zhang, Mengjie Zhang, Bing Xue, Bach Nguyen, et al.

Hengzhe Zhang gave a presentation on “Genetic Programming for Feature Construction and Symbolic Regression”.

Notes:
  • Feature construction technique to transform features into a better feature space.

  • “An Evolutionary Forest for Regression” (Zhang 2021)

  • Good ensemble = good quality + good diversity

  • Methods:
    1. Niching and clearing

    2. Novelty search

    3. MAP-Elites

  • Genotype:
    • Semantic space - don’t sum loss (MSE), represent loss as a vector.

    • Behaviour space

    • MAP-Elites - grid

  • Dimensionality reduction:
    • PCA

    • t-SNE

    • VAE

  • These methods are not appropriate to define a behaviour space for GP. Euclidean distance is not an accurate measure of quality/diversity; cosine similarity is not ideal, but is a better measure of quality/diversity.

  • VAEs struggle on out-of-distribution data; they do not generalize well to new domains.

  • “Creating Diverse Ensembles for Classification with Genetic Programming” (Nickerson 2022)

  • Synthetic symmetric guided vectors, i.e. a mirror image in behaviour space, to increase diversity.

2022-10-14 - ECRG

Location: CO350, Time: Friday 15:00 - 17:00, Attendees: Julie Blommaert, Mengjie Zhang, Bing Xue, Yi Mei, Bach Nguyen, et al.

General notes:
  • Julie Blommaert from Plant and Food Research (PFR) is here at ECRG http://julieblommaert.com/

  • Yi had a survey paper on “Interpretable Programming in GP” accepted with minor revisions; I should read this.

  • IEEE CIS elections for board positions - a surprising number of people from Evolutionary Computation (EC). Vote Bing in this election.

  • The IEEE CEC deadline for tutorials is at the end of the month; plenary talks and conference attendance - start planning now!

  • The EvoStar deadline is November 1st - usually there is a 7-10 day extension for this conference.

  • Can make submission to EvoMUSART, although not entirely popular within the group, perhaps discuss with Bach/Meng/Bing.

  • Introduction: discussed my ENGR489 background, trip to PFR in Nelson, fish classification on gas chromatography. Now working on classification/regression/XAI for REIMS fish data, for detection of adulteration in food products.

Julie Blommaert gave a presentation on “Weird and Wonderful Genomes” http://julieblommaert.com/

Notes:
  • Julie did her PhD on zooplankton that live in sea water, and a post-doc in bird genomics phylogeny.

  • Phylogeny - a tree delineating evolutionary history, see (Weinstein 2022) ch. 1 and Bret’s PhD thesis.

  • Genomic data: high dimensionality, low sample size; existing work uses statistical methods and lacks state-of-the-art AIML approaches.

  • Take advantage of “weirdness in genomics” to apply domain expertise to enhance modelling of genomics data.

  • Genomics is an interdisciplinary field of biology focusing on the structure, function, evolution, mapping, and editing of genomes.

  • Size analogy:
    • Genome, from Rome -> Paris

    • Long read - bus

    • Short read - smart phone

  • Research objective:
    1. Is genetic variation in snapper linked to transposable elements in valuable traits?

    2. Where is the sex determination region hiding?

    3. Does the gut microbiome influence growth and interact with the host genome?

  • Genomic variations:
    • insertion/deletion

    • microsatellite

    • TE insertion

    • SNP

  • Visualizations (shown in the talk).
  • Transposable elements (TEs) - where they accumulate, they can change the expression of genes:
    • Maize corn -> rainbow corn.

    • (Bourque 2018, Hof 2016) related materials.

    • (Hof 2016) - a moth that changes the colour of its wings due to transposons.

  • Sex determination for snapper: variants, copy number, TE accumulation.

  • Fast vs slow growing snapper; F1 vs F4; sea pen vs ocean.

  • Def. transposons - a class of genetic elements that can “jump” to different locations within a genome. Although these elements are frequently called “jumping genes”, they are always maintained in an integrated site in the genome. In addition, most transposons eventually become inactive and no longer move. https://www.britannica.com/science/transposon

  • Effects of transposons are passed on to offspring; they are incorporated into the genome.

  • Hardware limitations of computers - memory, storage, read/write, GPUs/CPUs - are barriers to effective processing of genomic data (Kulasekara 2014).

Related:
  • Very similar 2022-09-08 - FASLIP where Jinyan Li gave a talk on “Compression for DNA/RNA Sequencing”.

2022-10-19 - Deep Learning

Location: Zoom, Time: Wednesday 15:00 - 17:00, Attendees: Syed Faisal Ishtiaq, Felix Yan, Paul Teal, JP Lewis et al.

Syed discussed the paper “Denoising Diffusion Implicit Models” (Song 2022); Bastiaan Kleijn previously discussed this paper in 2022-07-06 - Deep Learning.

Notes:
  • Denoising diffusion implicit models (DDIM) generalize the forward diffusion process used in DDPMs (Song 2020).

  • The proposed method can produce high-quality images 10x-50x faster, trading off sample quality against computation.

  • Can perform semantically meaningful interpolation in the latent space. DDIM can interpolate directly in the latent space, DDPM cannot; the model can interpolate smoothly between two images.

  • DDIM has an accelerated generative process: the denoising does not depend on a specific forward process, so we can choose forward processes with fewer timesteps, accelerating the corresponding generation without having to train a new model.

  • Principle: “Train a model with an arbitrary number of forward steps, but only sample from some of them in the generative process” (a sketch of one sampling step follows).
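A sketch of one DDIM update in NumPy, following the generative process of (Song 2020) as I understand it; the \(\bar{\alpha}\) values come from the trained DDPM's noise schedule, and sigma = 0 gives the deterministic sampler:

    import numpy as np

    def ddim_step(x_t, eps_pred, abar_t, abar_prev, sigma=0.0):
        # eps_pred is the model's noise estimate at the current timestep.
        x0_pred = (x_t - np.sqrt(1 - abar_t) * eps_pred) / np.sqrt(abar_t)
        direction = np.sqrt(1 - abar_prev - sigma ** 2) * eps_pred
        noise = sigma * np.random.randn(*x_t.shape)
        return np.sqrt(abar_prev) * x0_pred + direction + noise

Because the update only needs \(\bar{\alpha}\) at the timesteps actually visited, sampling can skip most of the training timesteps (e.g. visit every 20th), which is where the 10x-50x speedup comes from.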

2022-10-20 - Weekly

Location: CO352, Time: Thursday 12:00 - 13:00, Attendees: Bing Xue, Bach Nguyen, Jesse

Notes:
  • Gave a presentation about the (Black 2019) paper to my supervisors; slides available https://github.com/woodRock/fishy-business/blob/main/resources/2022-10-20_black2019rapid.pdf

  • I need to start writing the proposal, get my research objectives together, two per week. Need to finish my proposal ASAP.

  • Aim for a draft proposal by the end of the 9th month of my PhD. Need to get cracking, crack the whip.

  • Need to get some preliminary work done for my PhD proposal; currently exploring multi-tree classification with winner-takes-all and wrapper-based methods for gas chromatography.

TODO:
  • [x] IEEE Webmaster self-nomination

  • [x] Proposal - 2 research objectives

  • [x] Preliminary work

  • [-] NESI HPC Niwa Compute application

2022-10-20 - FASLIP

Location: CO350, Time: Thursday 15:00 - 16:00, Attendees: Dylon Zeng, Bing Xue, Bach Nguyen, Jesse et al.

Dylon Zeng suggested a talk from Simon Lucas on “Evolutionary Algorithms on Game AI”. https://www.youtube.com/watch?v=MM3JoMYcAYQ

Notes:
  • EA suited to problems where we can easily design a fitness function, example given is procedural generation of a 2D maze.

  • EAs work surprisingly well ‘out-of-the-box’ for select problems; population-based evolutionary algorithms are good at exploring combinatorially explosive search spaces.

  • Provided we don’t aim for optimal, we can find ‘good enough’ solutions for large search spaces.

  • GANs + EAs (Volz 2018), to evolve Mario levels in the latent space of a deep convolutional generative adversarial network.

  • Rolling Horizon Evolutionary Algorithm (RHEA) is good for designing agents that are good enough at a given task - not god-tier, not super-human.

  • You need a model that can be advanced forward rather rapidly.

  • RHEA lookahead, considers multiple futures, and chooses actions with best outcome, similar to death crystals from Rick and Morty S04E01 https://rickandmorty.fandom.com/wiki/Death_Crystal

  • NTBEA to tune the hyperparameters for balancing game play.

2022-10-21 - ECRG

Location: CO350, Time: Friday 15:00 - 17:00, Attendees: Hayden Anderson, Wolfgang Banzhaf, Bing Xue, Yi Mei, Bach Nguyen, Jesse et al.

General notes:
  • Computer science often doesn’t get its own category for academics. It is often lumped in with other disciplines, such as mathematics and engineering.

  • The EvoStar conference deadline is coming up, 2022-11-01. We are encouraged to submit papers for this conference.

  • Progress report:
  • Overseas conference leave (OCL) is for staff wishing to travel to conferences overseas; it is important to submit a paper to the conference, to increase the chances of OCL being accepted.

  • Counter-factual explanations? A few students mentioned this for explainable AI in Genetic Programming (GP).

  • Christian is doing “meta-learning for loss functions”; I should investigate this further.

  • There was a NIWA / VUW workshop on science/AI; I should follow up with Bing to get invited to future events.

Hayden Anderson gave a presentation on “The Programming Language Julia” https://julialang.org/

Notes:
  • Julia won the 2019 James H Wilkinson Prize for Numerical Software.

  • Used at NASA and CERN.

  • Read-eval-print loop (REPL): Julia is a compiled language, but the REPL allows for code testing during development.

  • We do not need to provide type annotations, as we would for strictly typed languages like C++ or Java; Julia is duck-typed, with optional static typing.

  • Julia supports C++-style string interpolation.

  • Short circuit evaluation, 1-indexed, single-line function definition, easy vectorization of functions.

  • Each function can have an arbitrary number of method implementations, each operating on different types (a rough Python analogy follows these notes).

  • Multiple-dispatch https://www.youtube.com/watch?v=TPuJsgyu87U

  • Make any type iterable, indexable, …

  • LLVM (Low Level Virtual Machine): compiles to bytecode, no C backend.

  • Julia has first-class functions; they can be assigned to variables and called using the standard function call syntax from the variable they were assigned to.

  • Julia represents code as a data structure in the language itself.
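Since these notes use Python elsewhere, here is a rough single-dispatch analogy of Julia's multiple dispatch: Python's functools dispatches on the first argument's type only, whereas Julia dispatches on all argument types and picks the most specific method:

    from functools import singledispatch

    @singledispatch
    def describe(x):            # fallback implementation
        return "something"

    @describe.register(int)
    def _(x):
        return "an integer"

    @describe.register(list)
    def _(x):
        return "a list"

    print(describe(3), describe([1, 2]), describe("hi"))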

2022-10-26 - Deep Learning

Location: Zoom, Time: Wednesday 15:00 - 17:00, Attendees: Bastiaan Kleijn, Felix Yan, Paul Teal, JP Lewis et al.

Notes:
  • Bastiaan presented a story involving denoising diffusion, neural ODEs, Fokker-Planck and forward/backward Kolmogorov, Langevin, and normalizing flows.

  • Fokker-Planck is \(\approx\) the forward Kolmogorov equation; Fokker and Planck derived it earlier than Kolmogorov https://en.wikipedia.org/wiki/Fokker%E2%80%93Planck_equation

  • Bastiaan went to the same school as Fokker, who derived the Fokker-Planck equation 50 years before he attended that school.

  • GANs are state-of-the-art but temperamental; we want to beat GANs.

  • DDPM (Ho 2020, Song 2020): slow backward model, synchrony, performance limited by an ELBO-like criterion.

  • Forward SDE and backward SDE \(=\) Langevin; no synchrony, good but also slow.

  • Normalizing flows (Kobyzev 2020): fast, but too much structure prevents good performance; max likelihood though.

  • CNF; fast, no structure requirements, max likelihood, yet not so good.

  • (Tegmark 2022) propose Poisson Flow Generative Models (PFGM), which map a uniform distribution on a high-dimensional hemisphere into any data distribution.

2022-10-27 - Weekly

Location: CO364, Time: Thursday 12:00 - 13:00, Attendees: Bach Nguyen, Jesse Wood.

Note: Meng and Bing are away in Auckland this week.

Notes:
  • t-SNE (Van 2008) is a black-box kernel-based method; it is good for visualization, but not so much for interpretability. The transformation applied to the data is not interpretable - we don’t get a direct mapping between the original features and their transformation (see the sketch after these notes).

  • I mentioned the GC-MS image representation as part of the preliminary work, and Daniel’s comments on the GC-MS datacube (Khakimov 2015).

  • The problem is not clear; this should be at the start of the introduction. More from the AIML side on the problem, less on chemistry, or make the equivalence between terms clear upfront.

  • Write a summary on my tree GP results. Analyze these results, compared to existing work.

  • Daniel’s focus is on the chemistry dataset, his expertise/research follows this, however my PhD is applying AIML to these datasets. My proposal should focus on AIML and attempt to bridge the gap between the two fields for this domain.
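A quick scikit-learn illustration of the t-SNE point above - the embedding is produced directly, with no reusable mapping back to (or from) the original features:

    import numpy as np
    from sklearn.manifold import TSNE

    X = np.random.rand(100, 50)     # e.g. 100 samples, 50 features
    embedding = TSNE(n_components=2, perplexity=30.0).fit_transform(X)
    print(embedding.shape)          # (100, 2) - coordinates only

    # Note: TSNE has no .transform() for new samples and no inverse mapping,
    # which is why it suits visualization rather than interpretability.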

TODO:
  • [x] Write a summary on my tree GP results. Analyze these results, compared to existing work.

  • [x] Focus on the problem from an AIML perspective.

  • [x] State equivalent terms from chemistry upfront (i.e. Table or Glossary).

Related:
  • t-SNE (Van 2008) is a dimensionality reduction / visualization technique.

  • (Khakimov 2015) use a datacube to visualize GC-MS data.

  • (Wood 2022) performs classification/feature selection on GC-MS fish data.

2022-10-27 - FASLIP

Location: CO350, Time: Thursday 14:00 - 15:00, Attendees: Mengjie Zhang, Bach Nguyen, Jordan MacLachlan, Christian Raymond, Jesse et al.

Mengjie Zhang gave a plenary talk at WCCI 2018 in Brazil on “Evolutionary Deep Learning for Image Analysis”. We watched a recording in today’s meeting https://ieeetv.ieee.org/mengjie-zhang-evolutionary-deep-learning-for-image-analysis

Notes:
  • Def. computer vision - the “art” of making computers see (and understand what they see).

  • Table of human and computer analogies for computer vision:

    Human                                           Computer
    Perception                                      Image Acquisition
    Selective information extraction                Feature Extraction
    Grouping by similarity                          Segmentation
    Spatial information                             3D information recovery
    Object Recognition + Semantic interpretation   Image understanding

  • Deep learning requires sufficient model complexity. Relates to (Chase 1973), which shows domain expertise allows people to build meaningful schemas to represent patterns.

  • Popular DL methods for computer vision:
    1. Convolutional Neural Networks (CNNs) (Lecun 1989)

    2. Deep Belief Networks

    3. Autoencoders (Kramer 1991)

  • Evolutionary Deep Learning (EDL) - a controversial claim by Meng that EC techniques are technically deep learning.

  • Limitations of DL:
    1. Hyperparameters - too many, manual tuning, black magic, grad student descent (Gencoglu 2019).

    2. Gradient descent - gets stuck in local optima, doesn’t guarantee a global optimum, requires differentiability and GPU/TPU compute.

    3. Fixed network structure - no flexible representation, rigid.

    4. Blackbox - not interpretable, difficult to verify, no human understanding of the model.

    5. Sample complexity - many samples needed, high volume of data required.

    6. Domain expertise - required in ML and Application domain, very expensive and time consuming.

  • EDL/EC methods address these limitations of conventional DL techniques.

  • (Wang 2018) propose EvoCNN, to automatically search for an optimal CNN architecture without any manual work.

  • Cartesian Genetic Programming (CGP) is a form of GP that uses a graph representation to encode a computer program. It is called Cartesian because it represents a program using a 2D grid of nodes.

  • 2-tier GP (2012) automatically determine facial features without any domain knowledge required.

  • ConvGP - an honours student of Meng introduced a convolutional operator to GP, to improve performance on image analysis.

  • EDL benefits:
    1. Great accuracy

    2. Efficiency

    3. Small training set \(\to\) surrogate model

    4. Interpretable

  • Meng includes an advert for a Wellington conference the university was hosting in 2018. Class act.

Related:
  • (Chase 1973) measures recognition of chess pieces by players of varying skill. Expert players could recall more pieces in standard positions. Caveat: skill level didn’t affect recall on random board positions.

  • See (Goodfellow 2016) chapter 14, Autoencoders.

  • (Gencoglu 2019) proposes the term “grad student descent” for the flawed process of tuning hyperparameters by hand.

2022-10-28 - ECRG

Location: CO350, Time: Friday 15:00 - 17:00, Attendees: Qi Chen, Wolfgang Banzhaf, Fanfan Chen, Bing, Meng et al.

General notes:
  • My topic: “Multi-label classification / multi-output regression for the application area of rapid determination of bulk composition and quality of marine biomass from mass spectrometry”.

Wolfgang Banzhaf gave a presentation on “Computational Evolution Needs Guidance”.

Notes:
  • Wolfgang comes from Baden-Württemberg, Germany - a region in the south-west of the country, known for its wine.

  • He said while touring local vineyards in Wellington with Meng, he liked the wine in New Zealand as well.

  • Baden-Württemberg has produced many famous people: Einstein, Ernst M…

  • The COVID-19 virus and all of its variants demonstrate the power of evolution - marvellous examples of natural evolution in the real world.

  • “From artificial evolution to computational evolution: a research agenda” (Banzhaf 2006).

  • EC must avoid the mistake of GOFAI, GOFAI abstracted away the substrate of knowledge.

  • The central dogma of EC is a unimodal representation:

    DNA       Genotype
    Proteins  Phenotype
    Function  Fitness

  • In reality this is a far more complex, graph-like process with many feedbacks.

  • Limitations:
    • Static fitness

    • Fixed representation

    • Closed systems

    • Genomes composed of discrete and independent genes

    • No role for expressed material

    • Scalability

    • No epigenetics

  • Breaking points: in biology the DNA specifies specific places where it can break; the locations for crossover are encoded in the genotype.

  • Bacterial stress response: stress-induced mutagenesis raises the level of mutation in biology. Dynamic/scheduled/adaptive mutation rates.

  • Niche construction: feedback loops make causation very difficult to determine, e.g. chicken and egg analogy.

  • Fitness is a naive (post-hoc explanation) metric, not the signal. It is a concept humans invented to describe an organism’s fit to the environment it is in.

  • Neutrality - can we properly jump to other places without a change in fitness; is this related to diversity?

  • GA vs GP; two Mexican hat functions where:
    • GA - find a global optimum in a continuous fitness landscape.

    • GP - find a mathematical model to represent the discrete fitness landscape.

    • Similar to (Dawkins 1995) “The Evolved Imagination: Animals as models of their world”.

  • Evolutionary design of structures, this is the heart of engineering, i.e. agile, iterations, development, science itself.

  • Science itself is an iterative refining of world models that represent the world at sufficient complexity to match empirical observation.

  • Guided mutation: provide guided mutations, constraints direct and reduce the search space for an evolutionary search algorithm.

  • Rules: have a clear picture of what is allowed beforehand; throw out everything else.

  • Secrets of evolution:
    1. Elements (both recursive and elementary)

    2. Randomness of combinations

    3. Rules

    4. Guidance

  • Koza’s sextic polynomial, e.g. \(x^6 - 2x^4 + x^2\).

  • Root mean squared error (RMSE) doesn’t get it right; it is a very local measure that considers \(|\hat{y} - y|\) (point-wise distance).

  • Correlation, which looks at the shape, is a global measure. Maximise correlation; then finally apply linear scaling (sketched after these notes).

  • Active learning: using the least amount of data that still solves the problem (look where it matters).

  • Benchmarked RMSE vs correlation + linear scaling on the Feynman AI benchmark dataset for symbolic regression.
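A small sketch of the correlation-then-linear-scaling recipe (my reconstruction, using the usual least-squares scaling):

    import numpy as np

    def correlation_fitness(y, p):
        # Global "shape" measure: maximize |Pearson correlation| during evolution.
        return abs(np.corrcoef(y, p)[0, 1])

    def linear_scale(y, p):
        # Afterwards, align the model output to the targets with the
        # least-squares slope and intercept.
        pc = p - p.mean()
        a = np.dot(y - y.mean(), pc) / np.dot(pc, pc)
        b = y.mean() - a * p.mean()
        return a * p + b

    y = np.array([1.0, 2.0, 3.0, 4.0])
    p = np.array([10.0, 12.0, 14.0, 16.0])    # right shape, wrong scale
    print(correlation_fitness(y, p))                          # 1.0
    print(np.sqrt(np.mean((linear_scale(y, p) - y) ** 2)))    # RMSE 0.0

RMSE alone would punish this individual heavily even though its shape is exactly right - which is the point of the correlation-based fitness.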

Related:
  • (Banzhaf 2006) - a review article on EC and its biological inspiration, published in Nature.

  • (Dawkins 1995) proposed that animals evolve as models of their world, similar to GP as a model of a discrete fitness landscape.

  • RMSE (root mean squared error) is given by \(\sqrt{\frac{1}{|T|}\sum_{i \in T} || y_i - \hat{y}_i ||^2}\), where \(T\) is the dataset, \(y_i\) is the actual value, \(\hat{y}_i\) is the predicted value, and \(||.||^2\) is the squared L2 norm. See (Goodfellow 2016) Chapter 2, page 36 for an explanation of the L2 norm.

  • (Tegmark 2020, Tegmark 2022), AI Feynman and AI Feynman 2.0, explore the simplifying properties of physics equations on the Feynman AI benchmark dataset.

  • Similar to Banzhaf’s work with correlation + linear scaling, (Tegmark 2020, Tegmark 2022) exploit the general symmetry of physics equations to reduce the search space from general symbolic regression to physics-based symbolic regression.

2022-10-31 - Guest Speaker

Location: CO350, Time: 15:00 - 16:00, Attendees: Huanhuan Chen, Mengjie Zhang, Bing Xue, Jesse et al.

Huanhuan Chen gave a talk on “Neural Network Learning in the model space” http://staff.ustc.edu.cn/~hchen/pub.htm

Notes:
  • Limitations of Deep Learning:
    1. High-volume - sample complexity

    2. Black-box

    3. Distance metric

  • Learning the model space with signal fitting models (e.g. deterministic reservoir computing).

  • Map local data into a function space, and construct models in that function space.

  • Grey box - try to improve the interpretability of learning methods by representing data with generative models.

  • Model learning framework:
    1. Generation

    2. Measurement

    3. Employment

  • Difficult to compare DL methods by comparing their parameter space, KL-divergence needs to use integrals to calculate \(D_{KL}(P||Q)\).

  • Fault diagnosis:
    1. Detection

    2. Isolation

    3. Identification

  • Reservoir computing is a framework for computation derived from Recurrent Neural Network (RNN) theory that maps input signals into higher-dimensional computational spaces through the dynamics of a fixed, non-linear system called a reservoir. After the input signal is fed to the reservoir, which is treated as a black box, a simple readout mechanism is trained to read the state of the reservoir and map it to the desired output (a minimal sketch follows these notes). https://en.wikipedia.org/wiki/Reservoir_computing

  • Van der Pol oscillator - a two-dimensional time series; a non-conservative oscillator with non-linear damping. It evolves in time according to a second-order differential equation. https://en.wikipedia.org/wiki/Van_der_Pol_oscillator

[Figure: Van der Pol oscillator phase portrait]
  • Conservative force: a force that only depends on the position of the object. A non-conservative force depends on the path taken; a scalar definition is not possible, because different paths result in different forces. E.g. gravity is a conservative force, friction is not!

  • A differential equation consisting of a function and its second-order derivative is called a second-order differential equation.

  • Damping is an influence within or upon an oscillatory system that has the effect of reducing or preventing its oscillation.

  • Spatio-temporal graphs to understand video semantics; apply Hidden Markov Models (HMMs) to perceive spatio-temporal information.
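A minimal echo state network sketch (NumPy) to make the "fixed reservoir + trained readout" idea concrete; sizes and scalings are illustrative:

    import numpy as np

    rng = np.random.default_rng(0)
    n_in, n_res = 1, 100

    # Fixed random reservoir (the black box); only the readout is trained.
    W_in = rng.uniform(-0.5, 0.5, (n_res, n_in))
    W = rng.uniform(-0.5, 0.5, (n_res, n_res))
    W *= 0.9 / np.max(np.abs(np.linalg.eigvals(W)))   # spectral radius < 1

    def run_reservoir(u):
        states, x = [], np.zeros(n_res)
        for u_t in u:
            x = np.tanh(W_in @ np.atleast_1d(u_t) + W @ x)
            states.append(x.copy())
        return np.array(states)

    # Train a least-squares readout to predict the next sample of a sine wave.
    u = np.sin(np.linspace(0, 20 * np.pi, 1000))
    X, y = run_reservoir(u[:-1]), u[1:]
    W_out = np.linalg.lstsq(X, y, rcond=None)[0]
    print(np.mean((X @ W_out - y) ** 2))              # training MSE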

Related:
  • See (Russell 2010), chapter 15, Probabilistic Reasoning over Time, for an explanation of Markov processes and Hidden Markov Models (HMMs).

2022-11-09 - Deep Learning

Location: Zoom, Time: 15:00 - 16:00, Attendees: Paul Teal, Bastiaan Kleijn, Maxwell Clarke, Hayden Dyne, Daniel Braithwaite et al.

Paul discussed the paper “The Submerged Part of the AI-Ceberg [Perspectives]” (Couillet 2022).

Notes:
  • The paper is a powerful critique of the sustainability and environmental impacts of AI on the planet.

  • Thoughts:
    1. Existential threat to our jobs, personal/touchy subject.

    2. AI applied directly to sustainability applications is ignored, e.g. JSS, routing, controlling plasma in a nuclear fusion reactor.

    3. Carbon is a negative externality; planned obsolescence; incentives for longevity.

    4. The article places blame at the foot of the consumer, not the companies which are responsible for 71% of GHG emissions.

    5. The population of Uganda uses less electricity than the population of California uses to heat their pools.

  • A negative externality is when the production/consumption of a product has a cost to a third party, e.g. carbon/GHG emissions, affective polarization.

  • Carbon credits, also known as carbon offsets, are permits that allow the owner to emit a certain amount of carbon dioxide or other greenhouse gases.

  • Maxwell’s thought: The solutions to carbon emissions of AI are non-specific to AI, and apply to all industries that consume carbon, so all industries.

Related:
  • See (Couillet 2022) for sustainability perspectives on AI.

  • See (DeepMind 2022) “Successfully controlling the nuclear fusion plasma in a tokamak with deep reinforcement learning”

  • See (Guardian 2017) Just 100 companies responsible for 71% of global emissions, study says

  • See (Lomberg 2021) California uses more electricity on pools and hot tubs than all 44m people use in Uganda

2022-11-10 - FASLIP

Location: CO350, Time: 14:00 - 15:00, Attendees: Prof. Xin Yao, Bing Xue, Bach Nguyen, Jordan MacLachlan, Jesse et al.

Prof. Xin Yao gave a talk, “What can evolutionary computation do for you?”, several years ago at the Workshop on Grand Challenges in Computational Intelligence. https://www.youtube.com/watch?v=pcl08XA1REM

Slides available here https://homepages.ecs.vuw.ac.nz/foswiki/pub/Users/BingXue/Topics/XinYao%20What%20EC%20can%20do%20For%20you.pdf

Notes:
  • Discovering “new” physical laws in astrophysics - Modelling Radial brightness Distributions in Elliptical Galaxies. (Li 2004).

  • As observational data increases, some of these laws do not seem to describe the data very well.

  • EC is an entirely data-driven approach that removes human bias and can complement existing work in another field (e.g. astrophysics).

  • Previous work had applied fitting algorithms to find suitable parameters for a function.

  • Their approach (Li 2004) was Genetic Programming (GP) and Fast Evolutionary Programming (FEP).

  • EC can be used to derive unknown constants in differential equations derived from physical laws.

  • You can publish papers in a journal by converting a MATLAB codebase for stochastic ranking (fancy bubble sort) into the C programming language.

  • Real-world problems have many constraints, e.g. linear, nonlinear, equality, inequality.

  • Constraint handling: EC works better (Runarsson 2000), it is more effective; good at dealing with non-differentiable and non-linear problems; avoids unnecessary and unrealistic assumptions.

  • Similar to 2022-09-23 - ECRG, where Carlos discussed constraint handling.

  • Route optimization for gritting trucks, where EC outperformed all other existing algorithms (Handa 2006).

  • Ubiquity of multi-objective problem solving in hardware: “Digital filter design using multiple Pareto fronts” (Schnier 2004).

  • Theoretical foundations for Evolutionary Computation are lacking. It is unclear from the outset whether EC, and which EC, will perform well before evaluating the algorithms on a dataset.

  • Interpretability - we want to understand the relationship between the problem and the algorithms.

  • Space and time complexity analysis of EC is lacking; we are usually happy with \(O(n^2)\), and exponentials don’t play nicely at scale.

  • EC is a dark horse; it can do far more than one might have initially thought.

  • This is an old talk, but his points are still valid today; the field is still growing, and EC is in a golden age for research.

Related:
  • (Li 2002) use evolutionary computation to solve differential equations for deriving physical laws.

  • (Li 2004) use evolutionary computation to find models that fit observational data in astrophysics.

  • (Runarsson 2000) used stochastic ranking (a bubble sort variant) for constrained optimization with Evolutionary Computation.

  • See 2022-09-23 - ECRG, where Carlos A. Coello Coello applied Multi-objective (MO) Evolutionary Computation (EC) with constraint handling.

2022-11-11 - Weekly

Location: CO352, Time: Friday 14:00 - 15:00, Attendees: Bing Xue, Bach Nguyen, Mengjie Zhang, Jesse

Notes:
  • Candidate Development Plan (CDP) - Write a draft for the CDP, first half is due by Wednesday 2022-11-16.
    • Previous goals;
      • [x] GP Algorithm

      • [x] Paper

      • [x] Agile

      • [!] Proposal

    • Future goals
      • [x] Proposal

      • [x] AJCAI Conference

      • [x] New piece of work

  • Proposal
    • Write chapter 1 (introduction) for the proposal - a full draft.

    • Write research objectives.

    • Important to finish the proposal before Christmas.

    • Check out Bing/Bach/Christian’s proposals as exemplars.

  • Holiday season - university closes from 2022-12-22 -> 2023-01-10 (22nd Dec -> 10th Jan).

  • Journals, can target journals for papers as well.
    • Journals have no deadlines.

    • Process: Submit -> Major revision -> Minor Revision -> Accept.

    • Target IEEE transactions for high impact journals.

  • Dylon’s proposal is next Monday 2022-11-14

TODO:
  • [x] Write draft CDP before 2022-11-16 and send to supervisors.

  • [x] Write chapter 1 introduction for my proposal.

  • [x] Write research objectives for my proposal.

  • Bach’s proposal feedback:
    • [x] Focus on problem from AIML perspective,

    • [x] Write a summary on my tree GP results. Analyze these results, compared to existing work.

    • [x] State equivalent terms from chemistry upfront (i.e. Table or Glossary).

  • [x] Forward Registration Confirmation Email to Bing (never received this!)

2022-11-11 - ECRG

Location: CO350, Time: 15:00 - 17:00, Attendees: Christian Raymond, Mengjie Zhang, Bing Xue, Wolfgang Banzhaf.

General notes:
  • The university has stopped supporting PhD/Masters students attending international conferences; the department will still try to support the students as best they can. No longer an automatic acceptance process.

  • If students have papers accepted at top journals/conferences, the department can request grant money to facilitate this.

  • Our papers, for the conferences/journals ECRG targets, are fully refereed (peer-reviewed), so ECRG is often given priority. Other disciplines are not always peer-reviewed.

  • ECRG was the largest contributor to conferences on Evolutionary Computation, e.g. GECCO, EvoStar.

  • Progress report:
    • CDP 2022-11-26

    • Proposal - Chapter 1 / Research Objectives

    • AJCAI poster/presentation 2022-11-24

  • “Who enjoyed the proposal stage?” – Meng. I raised my hand. “Who finished and enjoyed it?” – Meng.

Christian Raymond gave a talk on “Meta Learning loss functions”.

Notes:
  • M.C. Escher Drawing Hands (1948)

  • Humans can quickly/efficiently solve complex and novel tasks.

  • Inductive bias - the inductive bias of a learning algorithm is the set of assumptions that the learner uses to predict outputs for inputs it has not yet encountered (Wikipedia).

  • Meta-learning - intelligent systems leverage their past experiences over \(m\) learning episodes covering a distribution of related tasks to automatically select a suitable inductive bias.

  • Trends in AI: hand-designed features \(\rightarrow\) end-to-end training (AlexNet), automatically learned features \(\rightarrow\) automatically designed algorithms (meta-learning).

  • Taxonomy:
    • Meta-objective - Why?

    • Meta-representation - What?

    • Meta-optimizer - How?

  • Cost vs. loss vs. objective
    • Loss function - loss for an instance

    • Cost function - loss for a set of instances (a dataset)

    • Objective function - cost function + regularization term.

  • Limitations of loss:
    1. Selection

    2. Task specificity

    3. Goal-alignment.

  • Goodhart’s Law - “When a measure becomes a target, it ceases to be a good measure” (Wikipedia).

  • Search space design:
    1. Superset

    2. Dense

    3. Satisfy GP closure

    4. Task and model agnostic

  • Rather small population size, similar to neural architecture search (NAS).

  • Constraints:
    1. Required arguments.

    2. Non-negative output (i.e. a distance metric)

  • Meta-objective, this tells us what the task is, i.e. classification/regression.

  • Loss functions are converted from GP trees to loss networks.

  • The loss network is a computational graph; we can apply forward/backward propagation to the loss network to optimize the meta-parameters.

  • Affine transformation - a geometric transformation that preserves lines and parallelism (Wikipedia).

  • Automatic differentiation - a set of techniques to evaluate the derivative of a function specified by a computer program (Wikipedia).

  • Benchmarks: ML\(^3\), TaylorGLO, GP-LFL, and (theirs) EvoMAL.

2022-11-30 - Weekly

Location: CO352, Time: Wednesday 14:00 - 15:00, Attendees: Mengjie Zhang, Bach Nguyen, Bing Xue, Jesse

Notes:
  • CDP:
    • Current progress towards my thesis is “good”

    • I met 3/4 of my previous CDP goals.

    • CDP goals for next 3 months are very clear.

  • Proposal:
    • Revisit proposal (refresh memory)

    • Describe the data

    • Write up preliminary work

    • A few rounds of revision are needed

  • DEADLINE:
    • November 30th marks 9 months into my thesis, proposal should be done.

    • Focus on finishing my proposal before I go on holiday in January (2023-01-18)

TODO:
  • [x] Forward Registration Confirmation Email to Bing (never received this!)

  • [x] Write draft CDP before 2022-11-16 and send to supervisors.

  • [x] Submit CDP by November 30th 2022-11-30

  • [x] AJCAI print poster 2022-12-02

  • [x] AJCAI memorize presentation 2022-12-02

  • [x] AJCAI dress rehearsal at FASLIP 2022-12-01

  • [x] Write chapter 1 introduction for my proposal.

  • [x] Write research objectives for my proposal.

  • Bach’s proposal feedback:
    • [x] Focus on problem from AIML perspective,

    • [x] Write a summary on my tree GP results. Analyze these results, compared to existing work.

    • [x] State equivalent terms from chemistry upfront (i.e. Table or Glossary).

2022-11-30 - Deep Learning

Location: Zoom, Time: Wednesday 15:00 - 16:00, Attendees: Ali Knott, Bastiaan Kleijn, Paul Teal, Marcus Frean, Jesse et al.

Ali Knott gave a talk on social media recommender systems. This is related to his GPAI 2022 report https://gpai.ai/projects/responsible-ai/social-media-governance/transparency-mechanisms-for-social-media-recommender-algorithms.pdf

Notes:
  • Combat Terrorism and Violent Extremism Content (TVEC).

  • Deboost/derank unwanted content.

  • After the March 2019 Christchurch Mosque Shooting, a NZ-led call to end TVEC online.

  • Cognitive biases revealed by social media: moral-emotional expressions, violent comments, political out-groups, false information.

Related:
  • See (Huszar 2022) for study by Twitter employees on amplification of political content.

  • See (Jiang 2019) for degenerate feedback loops in social media recommender systems, from DeepMind.

2022-12-01 - FASLIP

Location: CO350, Time: Thursday 14:00 - 15:00, Attendees: Bin Tang, Jesse Wood, Bach Hoai Nguyen, Qi Chen, Andrew Lensen, Peter Andreae

(Joseph) Binh Tang gave a presentation on “A greedy algorithm for the discounted knapsack problem (DKP)”.

Notes:
  • Discounted Knapsack Problem (DKP):
    • N-groups of items.

    • Each item has a value and cost.

    • We must select a list of items to maximize value.

    • Discount - certain groups of items have a discount when purchased together; this creates complex feature interactions (NP-hard!).

  • All related work (GR-DKP, NROA, D-GROA) uses an H list of 3n items sorted in descending order by their value/cost ratios.

  • Tran proposes a binary representation, state transitions, and a contribution-based value-cost heuristic.

  • Borda count - a family of positional voting rules which gives each candidate, for each ballot, a number of points corresponding to the number of candidates ranked lower (see the sketch after these notes).

  • Consensus (not majoritarian) - the Borda count is intended to elect broadly acceptable options, rather than those preferred by the majority.

  • Proposed algorithm: an operation-based greedy algorithm for the DKP.
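A tiny Borda count sketch, to make the voting rule above concrete:

    from collections import defaultdict

    def borda(ballots):
        # Each ballot ranks candidates best-to-worst; a candidate scores one
        # point per candidate ranked below it on that ballot.
        scores = defaultdict(int)
        for ballot in ballots:
            n = len(ballot)
            for position, candidate in enumerate(ballot):
                scores[candidate] += n - 1 - position
        return max(scores, key=scores.get)

    ballots = [["A", "B", "C"], ["B", "C", "A"], ["B", "A", "C"]]
    print(borda(ballots))   # "B" wins with 5 points (A has 3, C has 1)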

Jesse Wood (me) gave a presentation on “Automated Fish Classification”. Here is the feedback from my supervisor and questions that were asked.

Questions:
  • Is this the first application of ML to gas chromatography on fish oil?

  • Why use gas chromatography (why not computer vision)?

Feedback:
  • Too much focus on gas chromatography.

  • Move PSO slide to the section on feature selection.

2022-12-02 - ECRG

Location: CO350, Time: Friday 15:00 - 17:00, Attendees: Yifan Yang, Yi Mei, Jordan MacLachlan, Hengzhe Zhang, Jesse et al.

Yifan Yang gave a presentation on adaptive mutation techniques for dual-tree genetic programming for DCDWSC.

Notes:
  • Task: deadline-constrained dynamic workflow scheduling in cloud computing (DCDWSC).

  • Adaptive mutation techniques for dual-tree genetic programming (DTGP).

  • Virtual Machine Selection Rules (VMSR).

  • Task Selection Rule (TSR).

  • Example: A problem instance contains 6 VM types and 30 random workflows.

  • Mutation success ratio \(\hat{\Phi}\) is the ratio of successful mutations to total mutations, \(\hat{\Phi} = \frac{S}{T}\), where \(S\) is the number of successful mutations and \(T\) is the total number of mutations.

  • 1/5 rule: decrease the mutation deviation \(\alpha\) when \(\hat{\Phi} < \frac{1}{5}\), and increase \(\alpha\) when \(\hat{\Phi} > \frac{1}{5}\) (sketched after these notes).

  • Control the mutation deviation based on the mutation success ratio.

  • The loss function is the log-likelihood; see (Goodfellow 2016) for the derivation of the log-likelihood from the maximum likelihood formula.
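A sketch of the classic 1/5 success rule (Rechenberg) that the adaptive mutation above is based on; the constant c is a conventional choice, not Yifan's exact setting:

    def adapt_deviation(alpha, successes, trials, c=0.85):
        # If more than 1/5 of recent mutations improved fitness, widen the
        # mutation deviation (explore); if fewer, shrink it (exploit).
        ratio = successes / trials
        if ratio > 1 / 5:
            return alpha / c      # increase step size
        if ratio < 1 / 5:
            return alpha * c      # decrease step size
        return alpha

    alpha = 1.0
    print(adapt_deviation(alpha, successes=7, trials=20))   # 7/20 > 1/5 -> widen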

2022-12-05 - AJCAI #01

Location: Freshwater Bay, Hyatt, Perth, Time: Monday 9:00 - 12:00, Attendees: Michael Stewart

A practical guide to knowledge graph construction from technical short text. Michael Stewart michael.stewart@uwa.edu.au

Notes:
  • A Google colab notebook for this workshop is available here https://colab.research.google.com/drive/1JK2UIb0XZwtLyduGUwNXL-fXjOdqekNk

  • A knowledge graph is a type of graph database that captures knowledge/information about objects, entities and relationships.

  • Echidna: an example of a knowledge graph that has been applied to the domain application of maintenance.

  • IBM, LinkedIn, AirBnB, Walmart, Ebay, Volvo - all companies that employ knowledge graphs in the real-world.

  • Wikipedia, freebase, YAGO, semantic scholar - examples of general purpose knowledge graphs.

  • “The search for information takes 14-20% of an engineer's time” - Deloitte.

  • Knowledge graphs serve as a hyperspace route (Star Wars) that link discrete places (information spaces) together.

  • Hyperspace routes https://static.wikia.nocookie.net/starwars/images/f/f2/GalaxyNEC.jpg

  • Flair is a deep learning library for natural language processing. https://github.com/flairNLP/flair

  • In the tutorial we aim for the knowledge layer of the pyramid.

  • Pyramid of knowledge https://mitosystems.com/wp-content/uploads/2012/07/PyramidSqare.png

  • NLTK is the best tokenizer to use in Python; it uses machine learning to perform smart tokenisation (see the sketch after the terminology list). https://www.nltk.org/

  • Terminology:
    • corpus - a body of text.

    • tokenisation - split text into smaller units.

    • vocabulary - set of unique words in corpus.

    • word embeddings - numerical representations of language.
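
A minimal sketch of the terminology above using NLTK (assumes NLTK is installed; the example sentence reuses the pump example from later in these notes):

  import nltk
  from nltk.tokenize import word_tokenize

  nltk.download("punkt", quiet=True)  # one-off download of the tokenizer models

  corpus = "Pump is broken. It is not working."  # corpus: a body of text
  tokens = word_tokenize(corpus.lower())         # tokenisation: split into units
  vocabulary = sorted(set(tokens))               # vocabulary: unique words in corpus

  print(tokens)      # ['pump', 'is', 'broken', '.', 'it', 'is', 'not', 'working', '.']
  print(vocabulary)  # ['.', 'broken', 'is', 'it', 'not', 'pump', 'working']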

  • Language models:
    • word2vec - word-context prediction.

    • GloVe - log-bilinear regression model.

    • fastText - feed-forward neural network.

    • ELMo - bi-directional LSTM.

    • BERT - bi-directional transformer.

    • Flair - bi-LSTM + conditional random fields.

  • Tasks:
    • Sequence labelling - assigning a label to every item in a sequence.

    • Text classification - assign one or more labels to an entire sequence.

  • ML progression:
    • Rule-based approaches to NLP have to be handcrafted; often used in industry, but very time consuming and laborious to maintain.

    • Feature extraction: hidden Markov models and conditional random fields, trained on hand-crafted features.

    • Representation learning: bi-directional LSTMs/Transformers, very little effort/domain expertise needed; they simply need a set of annotated training data.

  • Lexicon normalizer - a simple model that can replace a phrase with its correct form.
    • Gist: used to fix spelling/grammar mistakes.

  • Named entity recognition (NER) - extract entities from short text descriptions and construct nodes from these entities.

  • Labels:
    • B - beginning

    • I - inside

    • O - outside

    • tokens [“michael”,”works”,”at”,”university”,”of”,”western”,”australia”]

    • labels [“B-PER”,”O”,”O”,”B-ORG”,”I-ORG”,”I-ORG”,”I-ORG”]

    • (Note: Copilot wrote this above line purely from context)

  • A good idea is to create an “abstract base class” to represent a NER model. Concrete models inherit from it, and each model must implement the base class functions (a minimal sketch follows below).
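
A minimal sketch of the abstract-base-class idea using Python's abc module; DictionaryNER is a hypothetical toy model, not from the tutorial:

  from abc import ABC, abstractmethod

  class NERModel(ABC):
      @abstractmethod
      def predict(self, tokens: list[str]) -> list[str]:
          """Return one BIO label per token."""

  class DictionaryNER(NERModel):
      """Toy model: look tokens up in a gazetteer, e.g. {"michael": "PER"}."""
      def __init__(self, gazetteer: dict[str, str]):
          self.gazetteer = gazetteer

      def predict(self, tokens: list[str]) -> list[str]:
          return [f"B-{self.gazetteer[t]}" if t in self.gazetteer else "O"
                  for t in tokens]

  model = DictionaryNER({"michael": "PER", "pump": "OBJ"})
  print(model.predict(["michael", "works", "at", "university"]))  # ['B-PER', 'O', 'O', 'O']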

  • Relation extraction is the process of defining relationships between entities in text. It provides the relation component of each triple.

  • Query Entity Relation Model - we can mine knowledge by crafting semantically meaningful queries on the entity-relation model that was constructed from natural language.

  • LexiClean, a tool that can be used to rapidly produce synthetic training data https://github.com/nlp-tlp/lexiclean

  • Coreference resolution - automatically determine the coreference of words in text.
    • E.g. “Pump is broken. It is not working.”

    • In this example “it” references “pump”. We replace “it” with “pump” so information is not lost in tokenisation.

  • Related works:
    • Community detection

    • Knowledge graph embeddings

    • Reasoning over knowledge graphs

    • Ontologies

    • RDF graphs

    • Entity linking

  • Related:

Reinforcement learning - an agent interacts with an environment whose transitions can be stochastic or deterministic; the goal is to find a policy that maximizes expected return. Hung Le thai.le@deakin.edu.au

Notes:
  • Grid-world:
    • state - state is discrete

    • action - space is discrete

    • reward - nothing or n cheese

    • transition - move agent to grid / update cheese count

  • Choose the action that maximises the value (\(\epsilon\)-greedy algorithm) - bootstrap learning.

  • Memory - the Q-table in value learning can be considered a memory. It remembers how good a state-action pair is on average. The memory is very basic, non-smooth and redundant (Watkins 1992). A sketch of tabular \(\epsilon\)-greedy Q-learning follows below.
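
A minimal sketch of tabular \(\epsilon\)-greedy Q-learning (my own illustration; `env` is a hypothetical grid-world with a Gym-like reset/step interface):

  import random
  from collections import defaultdict

  def q_learning(env, episodes=500, alpha=0.1, gamma=0.99, epsilon=0.1):
      Q = defaultdict(float)  # the Q-table: a memory of state-action values
      actions = list(range(env.n_actions))
      for _ in range(episodes):
          state, done = env.reset(), False
          while not done:
              if random.random() < epsilon:   # explore
                  action = random.choice(actions)
              else:                           # exploit (greedy on the Q-table)
                  action = max(actions, key=lambda a: Q[(state, a)])
              next_state, reward, done = env.step(action)
              # bootstrap: move towards reward + discounted best next value
              best_next = max(Q[(next_state, a)] for a in actions)
              Q[(state, action)] += alpha * (reward + gamma * best_next - Q[(state, action)])
              state = next_state
      return Q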

  • RL challenge - the optimal policy can be complex. The solution to a task can be a very complex policy while the reward is very simple and sparse; it is very difficult to reach the goal state by accident.

  • Human vs. RL - most RL agents still fail compared to human performance. RL agents lack a good memory.

  • Neural networks store task-specific semantic memories.

  • Episodic memory is the memory of everyday events that can be explicitly stated or conjured.
    • e.g. Popped tyre, bump sound - car analogy from the Lex Fridman podcast.

  • Types of memory:
    • Working memory - short-term, quick, RAM.

    • Episodic memory - long-term, quick.

    • Semantic memory - long-term, slow.

  • Thinking fast and slow - Daniel Kahneman https://www.goodreads.com/book/show/11468377-thinking-fast-and-slow
    • System 1 - fast, automatic, emotional, intuitive, associative, unconscious, effortless.

    • System 2 - slow, controlled, logical, rational, analytical, conscious, effortful.

  • Deep Q-Network (DQN) is a type of reinforcement learning algorithm that uses a neural network to approximate the Q-value function for a given state-action pair. This allows the algorithm to learn from experience and make decisions based on what it has learned (Mnih 2013).

  • Semantic control:
    • the ability to regulate which memories are retrieved and used.

    • This can involve actively searching for specific memories, or suppressing unwanted memories.

    • Important for problem-solving/decision making.

    • Thought to be mediated by the prefrontal cortex, involved in executive function.

  • DQN limitations:
    1. Inefficiency

    2. Bootstrap noise

    3. Replay buffer

  • Episodic control paradigm:
    • key-value table, key \(\to\) experience, value \(\to\) returns.

    • Directly binds experience to return; returns to experiences that had a high return to make decisions.

  • They propose a hybrid of episodic + semantic memory.

  • Model-free limitations:
    1. Near-deterministic assumption

    2. Sample inefficiency

    3. Fixed tradeoff of episodic + semantic

  • RNN variants: RL-LSTM, RL-MEM

  • Distraction - navigation with distraction, a model that requires “semantic control”; it must learn to ignore noise in the benchmark dataset (Hung 2019).
    • working memory methods always perform better for these benchmarks.

  • Exploration limitations: rewards can be very sparse; RL agents cannot learn anything until they collect the right reward. Should they explore randomly forever?

  • Efficient exploration methods: need sampling-efficiency in the exploration of RL agents. Don't just randomly walk through the search space.

  • Intrinsic motivation - a hunch, a gut feeling, a desire. E.g. humans pursue purpose. Agents can be motivated towards interesting consequences.

  • Novelty - a state is boring if it can be reached from working memory - maximize entropy (information gain), punish repetition.

  • DeepMind has a maze benchmark dataset for reinforcement learning.

  • First return, then explore (Ecoffet 2021). Jump between states in working memory, and explore from those states that are promising.

  • Continuous action - the MuJoCo benchmark is that funny simulation of janky robots struggling to walk.

  • Trust-region is better than naive policy gradient.
    • Trust-region - if an adequate model is found within the trust region, then the trust region is expanded; conversely, if the approximation is poor, then the region is contracted.

  • Limitations: stuck in local optima? Store many past policies and rely on all of them. Back up the trust region from a virtual policy.

Related:

2022-12-07 - AJCAI

Location: Ballroom, Hyatt, Perth, Time: Wednesday 9:00 - 12:00, Attendees: Tim Miller

A talk from Tim Miller

Notes:
  • TODO Fill out this stuff!

Related:

2022-12-15 - Weekly

Location: CO352, Time: Thursday 14:00 - 15:00, Attendees: Mengjie Zhang, Bach Nguyen, Bing Xue, Jesse

Notes:
  • Presentation - Bing thought my presentation was really good. It showed passion for the subject. It made the topic interesting to the audience, then zoomed in on the technical details.

  • Plan to discuss my presentation at the ECRG meeting.

  • Don’t send Meng too many emails over the break. But I can email Bach/Bing over the university shutdown, from December 20th to January 10th.

TODO:
  • [x] Send Bing a copy of my slides.

  • [x] Continue writing my proposal.

  • [x] Write research objectives for my proposal.

  • Bach’s proposal feedback:
    • [x] Focus on problem from AIML perspective,

    • [x] Write a summary on my tree GP results. Analyze these results, compared to existing work.

    • [x] State equivalent terms from chemistry upfront (i.e. Table or Glossary).

2022-12-15 - FASLIP

Location: CO350, Time: Thursday 14:00 - 15:00, Attendees: Qinyu Wang, Mengjie Zhang, Bach Nguyen, Bing Xue, Jesse

Notes:
  • Qinyu Wang gave a presentation for her proposal, the topic was Genetic Programming for fine-grained image classification.

  • Fine-grained - distinguish different objects within a subclass of a parent class. I.e. dog species classification, where all dogs are variations of the dog species; or birds, where all birds are variations of the bird species.

  • Difficulties - large number of classes and limited number of instances - low sample complexity.

  • Proposed method - automatic preprocessing, flexible feature extraction.

2022-12-15 - ECRG

Location: CO350, Time: Thursday 14:00 - 15:00, Attendees: Neil, Bing Xue, Yi Mei, Mengjie Zhang, Bach Nguyen, Jesse

Notes:
  • Neil, Bing and Yi have been awarded a fellowship in Engineering New Zealand. We celebrate this in today’s ECRG.

  • This is the final formal week of the 2022 year, we will resume business as usual in mid-January, Wednesday 2023-01-11.

  • We should not be on-site during the university shutdown; we need to email security if we plan to be onsite.

  • Nobody email Meng, he needs a break during the university shutdown, from December 20th to January 10th.

  • Do not email Bing until after the New Year, supervisors need breaks too!

  • Progress:
    • Attended AJCAI 2022

    • Presented paper/poster

    • Presentation went well

    • Start writing my proposal

    • Submit proposal by mid-Jan

  • Participation award - Andrew mentioned that I attended the entire conference, the most attendance in the group.

  • Conference venues:
    • Next year AJCAI 2023 will be held in Brisbane

    • AJCAI 2024 will be held in Melbourne

    • IJCAI 2025 will be held in Shanghai.

  • Andrew was right! No one believed Andrew when he said all buses were free in the West Perth area. Instead we walked for 40 minutes to our hotel.

2023-01-12 - Weekly

Location: CO350, Time: Thursday 12:00 - 13:00, Attendees: Bach Nguyen, Bing Xue, Mengjie Zhang, Jesse Wood

Notes:
  • I should have a draft of my proposal; share it as a PDF with supervisors, and expect a lot of feedback on the first revision.

  • Overleaf - Make the proposal on Overleaf available to Bing, Bach and Meng.

  • Chemistry - Share with Daniel after a few rounds of revisions from supervisors, so as not to waste his time. Daniel can help with the introduction/background chapters on chemistry domain knowledge.

  • Ohope - I leave for my holiday on Wednesday 2023-01-18. Submit a draft to supervisors, then see if we need to have a meeting afterwards.

  • Write! - My supervisors expected the first or second chapter to be complete by New Year. I have set the goal of completing a first draft of the entire document before my holiday. Now it is time to write, write, write!

TODO:
  • Proposal
    • [x] Introduction

    • [x] Background

    • [x] Contributions

    • [x] Milestones

    • [x] Thesis Outlines

    • [x] Resources

  • [x] Write research objectives for my proposal.

  • Bach’s proposal feedback:
    • [x] Focus on problem from AIML perspective,

    • [x] Write a summary on my tree GP results. Analyze these results, compared to existing work.

    • [x] State equivalent terms from chemistry upfront (i.e. Table or Glossary).

2023-01-12 - FASLIP

Location: CO350, Time: Thursday 14:00 - 15:00, Attendees: Shanshan Tang, Bing Xue, Mengjie Zhang

Notes:
  • Topic - Research on multi-task learning algorithms for Alzheimer's disease prediction.

  • Alzheimer's is a gradually progressive neurodegenerative disease affecting cognitive functions. There is no cure, but early detection can prevent worsening.

  • Cognitive tests:
    • MMSE - mini mental state examination

    • CLOCK - clock drawing test, i.e. draw an analog clock

  • Neural imaging:
    • MRI - magnetic resonance imaging

  • Tasks:
    1. Identify relevant biomarkers.

    2. Predict cognitive scores from MRI.

  • MRI \(\to\) extraction \(\to\) features \(\to\) cognitive scores

  • Multitask regression - loss function \(\min_W \frac{1}{2} \sum_j \|y_j - X_j w_j\|^2_2 + \lambda R(W)\)

  • The regularization term contains a correlation matrix between tasks, not a simple sum.

  • Assumptions:
    1. All tasks related

    2. no outliers

    3. Tasks have group structures

    4. Relationship is not symmetric - i.e. directed acyclic graph (DAG)

2023-01-13 - ECRG

Location: CO350, Time: Friday 14:00 - 15:00, Attendees: Bing Xue, Yi Mei, Mengjie Zhang, Bach Nguyen, Jordan, Hayden, Hengzhe, Jesse

Meng gave his talk on “How to write a paper?”

Example Format:
  1. Abstract

  2. Introduction

  3. Background

  4. Proposed Method

  5. Results & Discussion

  6. Further Discussion

  7. Conclusion

  8. Acknowledgements

  9. References

Notes:
  • If you use the research methodology of another paper, discuss this with your supervisors; see if its evaluation methods are justified by theory!

  • Writing our ideas clearly is very difficult.

  • “If you can’t explain it to a five-year-old, you do not understand it.” - Feynman Technique.

  • When we write, we organize our thoughts and iron out contradictions, misconceptions and mistakes - JBP.

  • A thesis/journal paper will require a literature review; identify gaps in the existing literature.

  • See journal for table on literature gaps.

  • Novelty is often the first acceptance criteria for a paper.

  • For a thesis, the audience is a general computer scientist. We need to attract people's attention to read the paper! Hook, line, sinker!

  • Application area - neighbouring works have implemented similar techniques; this paper is a novel application of these methods in this field.

Explained:
  1. Abstract

  2. Introduction
    • General statement (scope)

    • Why (motivation)

    • What to do? (have done)

    • How to do? (proposed method)
      • Goals/aims/objectives/research objectives

      • Hypothesis/contributions (contributions are only positive)

      • Overarching goal - single sentence
        • Coherence; overall goal and objectives/contributions need to match up!

        • Objectives should address the gaps in existing literature with coherent logic. Limitations in existing work can be enumerated, but only the ones relevant to our work are required in the paper.

  3. Background
    • Background (general knowledge) - you will find this in a text book.

    • Related work - 5-10 years work
      • Don’t just list general descriptions of work.

      • Need to identify in detail limitations in their work.

      • Provide motivations for the paper based on related work.

    • Summary - provide coherence between objectives -> references.

  4. Proposed Method
    • Benchmark dataset/simulation

    • Comparison method (competitors)

    • Parameter settings/configurations

    • (Optional) performance measure (if non-standard, i.e. claiming interpretability)

    • Flow-chart / pseudo-code (reproducibility)

  5. Results & Discussion
    • Overall results

    • Ablation studies

    • Quantitative, e.g. tables, evolutionary process, numbers

    • Qualitative, e.g. interpretability of model, graphs, visualization

  6. Further Discussion
    • Unusual results:
      • Many - separate sections.

      • One - a single subsection

  7. Conclusion
    • Not a summary!

    • Have goals been achieved?

    • Something new!

  8. Acknowledgements

  9. References

2023-01-20 - ECRG

Location: CO350, Time: Friday 15:00 - 17:00, Attendees: Andy Song, Bing Xue, Yi Mei, Mengjie Zhang, Bach Nguyen, Jordan, Hayden, Hengzhe, Jesse

Andy Song from RMIT gave a talk “Fast Evolutionary Neural Architecture Search with Performance Predictors” https://www.rmit.edu.au/contact/staff-contacts/academic-staff/s/song-associate-professor-andy

Notes:
  • Neural Architecture Search (NAS) often requires a high computation cost to evaluate candidate networks from the search space.

  • That cost can be mitigated by performance prediction, so there is no need to evaluate every candidate network.

  • Evaluation:
    • NAS-Bench-201

    • DARTS

  • AmoebaNet is a representative evolutionary NAS method from (AAAI 2019).

  • Training a predictor can be difficult, requires a large number of evaluated architectures.

  • Song proposes a novel predictor-assisted evolutionary NAS (Pre-NAS).

  • Pre-NAS leverages new evolutionary search strategies and integrated high-fidelity weight inheritance over generations.

  • His most recent work on the topic has been published in TEVC/GECCO; (Peng 2022, Peng 2022) are the two papers on Pre-NAS.

  • Key differences between Pre-NAS and EAS:
    • Random mutations

    • mutation: operation/connection

    • representation: cell-based

    • 48 \(\to\) 0.6 GPU days
      • (Improvement in training time using Pre-NAS vs train from scratch!!!)

Related:
  • (Peng 2022) for IEEE TEVC, first paper on Pre-NAS.

  • (Peng 2022) for GECCO, follow-up paper on Pre-NAS.

2023-01-30 - Jordan’s Proposal

Location: CO350, Time: Monday 14:00 - 15:00, Attendees: Jordan MacLachlan, Andrew Lensen, Yi Mei, Mengjie Zhang, Bing Xue, Hayden, Hengzhe, Jesse

Jordan gave his proposal seminar for his PhD on “Genetic Programming for Emergency Medical Dispatch”.

Notes:
  • Vehicle types (ascending in urgency level):
    1. Patient transfer

    2. Ambulance

    3. Intensive care vehicles

  • Emergency Medical Dispatch (EMD)

  • Multi-agent, collaborative, non-stationary, multi-response.

  • \(C(i) = \frac{1}{n} \sum^{n}_{t=1} (\tau_i^a - \tau_i^r) \omega_t\)

  • Weighted average response time is \(\frac{\varphi_i}{(\tau_t^i)^2 \Theta_t}\)

  • Genetic Programming Hyper Heuristic (GPHH) is the proposed method to solve (optimize) the objective (fitness) function above.

  • GPHH provides more interpretable solutions than reinforcement learning (RL) techniques.

  • Dispatch methods:
    1. reactive

    2. global

    3. pro-active

  • Don’t need to manually design handcrafted rules, just need to identify important features, and feed to a GPHH.

  • Consider differential privacy and federated learning for preserving patient privacy.

  • [x] Give Jordan “Data Science” - MIT Press book, for Privacy and Ethics chapter, to read.

2023-02-02 - Weekly

Location: CO355 (Meng’s office), Time: Thursday, 12:00 - 13:00

Notes:
  • Plant and Food Research (PFR) are having a workshop on Thursday the 9th of February (2023-02-09). The workshop is being held in Lower Hutt.

  • Collaboration between:
    • Plant and Food Research

    • Callaghan Innovation

    • Victoria University of Wellington

    • Otago University

    • Deakin University (Victoria, Australia)

  • Domains:
    • Chemistry

    • Biology

    • Machine Learning

  • Presentation - the AJCAI 2022 presentation should be sufficient for the PFR workshop; I can add preliminary results from the proposal, if time allows.

TODO:
  • [x] Send supervisors draft for Chapter 4 - Preliminary work

  • [x] Chapter 2 - Literature Review

  • [x] Chapter 3 - Contributions

  • [x] PFR Workshop Presentation 2023-02-09

  • [x] (Optional) PFR Workshop + Preliminary Work

2023-02-02 - FASLIP

Location: CO350, Time: Thursday 14:00 - 15:00, Attendees: Hengzhe Zhang, Mengjie Zhang, Bing Xue, Yi Mei, Jordan MacLachlan, Andrew Lensen, Hayden Anderson, Jesse

Hengzhe gave a talk about “GP-based Ensemble Tips and Tricks”.

Notes:
  • Motivations:
    1. Generate diverse set of solutions in single run.

    2. Ensemble learning avoids overfitting.

    3. Multi-tree GP is much better than single-tree GP.

  • Protected division is non-differentiable and numerically unstable; use the analytical quotient instead: \(AQ(x,y) = \frac{x}{\sqrt{1 + y^2}}\) (a small sketch follows below).
    • For gradient descent methods, consider using the analytical quotient over protected division.
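
A small numpy sketch contrasting protected division with the analytical quotient above (my own illustration; values are chosen to show the behaviour near \(y = 0\)):

  import numpy as np

  def protected_div(x, y):
      # conventional GP protected division: return 1 when |y| is near zero
      safe_y = np.where(np.abs(y) < 1e-6, 1.0, y)
      return np.where(np.abs(y) < 1e-6, 1.0, x / safe_y)

  def analytical_quotient(x, y):
      return x / np.sqrt(1.0 + y ** 2)  # smooth and differentiable everywhere

  y = np.array([0.0, 1e-7, 1e-3, 1.0])
  x = np.ones_like(y)
  print(protected_div(x, y))        # [1. 1. 1000. 1.] - discontinuous near y = 0
  print(analytical_quotient(x, y))  # [1. 1. ~1. 0.707] - varies smoothly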

  • “Multi-tree GP is unbelievably good” - Hengzhe Zhang 2023

  • Boosting theory tells us weak learners are not weak.

  • Chinese proverb - “Two heads are better than one.”

  • ^ This relates to the strength of boosting / multi-tree GP methods.

  • (Hengzhe 2021) An Evolutionary Forest for Regression

  • Diversity is important for weak learners in an ensemble model; if the population is not diverse enough, the ensemble will overfit.

  • Lexicase selection > tournament selection

  • (Zhang 2023) SR-Forest: A Genetic Programming-based Heterogeneous Ensemble Learning Method with Symbolic Regression. IEEE TEVC.

  • A heterogeneous ensemble refers to an ensemble with classifiers from different tribes of AI (Domingos 2015), e.g. a mix of Random Decision Trees (RDTs) and Linear Regression (LR) trees.

  • Conversely, a homogeneous ensemble refers to an ensemble with classifiers that are identical, e.g. 5x random decision trees (RDTs).

Related:
  • (Hengzhe 2021) An Evolutionary Forest for Regression, TEVC 2021 paper.

  • (Hengzhe 2023) SR-Forest, an upcoming TEVC 2023 paper.

  • (Domingos 2015) The Master Algorithm - book from Pedro Domingos.

2023-02-03 - ECRG

Location: CO350, Time: Friday, 15:00 - 17:00, Attendees: Mengjie Zhang, Bing Xue, Yi Mei, Hui Ma, Andrew Lensen

General notes:
  • Jordan passed his proposal seminar that he gave on 2023-01-30 - Jordan’s proposal.

  • GECCO abstract registration is due on Sunday 1AM; Hui jokingly suggests we use ChatGPT to synthesize a fake but convincing abstract and submit it.

  • “Hi, I am Jesse, I am a PhD student… not candidate” - Meng believes that there is an important distinction between students who have, and have not, passed their proposal seminar.

  • Progress report (me):
    • Finished preliminary work

    • Writing Chapter 3 - Contributions

    • PFR workshop - Thursday 2023-02-09

  • Peng Wang, https://ecs.wgtn.ac.nz/Main/GradPengWang, has his last week as a PhD student at ECRG, next week he will join the post-doc faculty.

  • Hui Ma and Andrew Lensen are happy for me to tutor the SWEN304 - Databases course; I should apply online!

Jiabin Lin, https://ecs.wgtn.ac.nz/Main/GradJiabinLin, gave a talk on “AMTEA-based Multi-task Optimization for Multi-objective Feature Selection in Classification”.

Notes:
  • AMTEA - an evolutionary sequential transfer learning method which addresses a target task with the help of knowledge learned from a source domain.

  • Multi-task - multiple tasks are addressed simultaneously!

  • Online learning - online update of knowledge pool for multi-task.

  • They build a mixture model, see (Goodfellow 2016); the mixture includes probability distributions from each sub-task of the multi-task problem.

  • A factored Bernoulli distribution (Goodfellow 2016) is used to describe the joint probability distribution model.

  • Competitors:
    • SPEA2

    • SPEA2-FSMTO

  • Wilcoxon Signed Rank Test (W-test) for statistical significance of classification results.

  • Expectation-maximisation (EM) is used to fit the mixture model, see (Do 2008); a toy sketch follows below.
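
A toy sketch of fitting a two-component factored Bernoulli mixture with EM (my own illustration; the data, component count and initialisation are made up):

  import numpy as np

  rng = np.random.default_rng(0)
  X = rng.integers(0, 2, size=(200, 10)).astype(float)  # stand-in binary data

  K, eps = 2, 1e-9
  pi = np.full(K, 1.0 / K)                           # mixing weights
  p = rng.uniform(0.25, 0.75, size=(K, X.shape[1]))  # Bernoulli parameters

  for _ in range(50):
      # E-step: responsibilities r[n, k] proportional to pi_k * Bern(x_n | p_k)
      log_lik = X @ np.log(p + eps).T + (1 - X) @ np.log(1 - p + eps).T
      log_r = np.log(pi + eps) + log_lik
      log_r -= log_r.max(axis=1, keepdims=True)
      r = np.exp(log_r)
      r /= r.sum(axis=1, keepdims=True)
      # M-step: re-estimate mixing weights and Bernoulli parameters
      Nk = r.sum(axis=0)
      pi = Nk / len(X)
      p = (r.T @ X) / Nk[:, None]

  print(pi)  # fitted mixing proportions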

  • ChatGPT (OpenAI 2022):
    • Jesse: “Explain a mixture model to a five year old, using toys and colours as examples.”

    • Expectation: “A mixture model in statistics is a way to describe a set of data that comes from more than one type of underlying distribution.

    E.g. Imagine you have a box of toys that contains balls of two colours, red and blue. If you draw a toy from the box, it can be either red or blue. This is like having two separate “mixtures” in the box that are mixed together to form an overall collection. The mixture models help us understand the proportion of each colour in the box, as well as how they are combined.”

  • Transfer Learning (in principle): There exists some transferable skills between these tasks, the target and source, and the source may be easier to learn.
    • e.g. Riding a bike, and a motorcycle. Someone who can ride a bike, may be able to ride a motorcycle with less training.

Related:

2023-02-08 - Deep Learning

Location: Zoom, Time: Wednesday, 15:00 - 16:00, Attendees: Bastiaan Kleijn

Notes:
  • Particles - in the stochastic situation, a particle's movement is equivalent to the diffusion process of real-world particles in a gas, where the particles converge to an equilibrium.

  • I sent (Wood 2022) to Bastiaan, which cited (Karras 2022) paper, in September 2022.

  • \(dx = \left( f(t)\,x - \frac{1}{2} g(t)^2 \nabla_x \log p_t(x) \right) dt\)

  • Evolution of \(P_t(X)\) is governed by the Fokker-Planck equation.

  • For an initial \(x(0)\), the forward solution of Fokker-Planck can be written as:

  • \(p(x(t) \mid x(0)) = \mathcal{N} (x(t); s(t)x(0), s(t)^2\sigma(t)^2 I)\)

  • Physics analogy: Deforum art diffusion-based video generation uses (Karras 2022), which is based on diffusion models, which model the real-world process of diffusion of a gas in particle physics.
    • Perhaps understanding the underlying physics theory behind ideas that have been grafted into AIML is easier than launching straight into the differential calculus.

  • Denoiser - the MMSE estimator is optimal under the L2 norm, e.g. the best denoiser for Rembrandt paintings. We can't find a perfect denoiser, but we can approximate one by training a neural network to do so.

Related:

2023-02-08 - Callaghan Innovation Workshop

Location: Callaghan Innovation, Lower Hutt, Time: Thursday, 9:30 - 16:00, Attendees: Sue Marshal (PFR), Daniel Killeen (PFR), Mengjie Zhang, Bing Xue, Qi Chen, Amir

Callaghan Innovation, https://www.callaghaninnovation.govt.nz/, hosted an industry/academia workshop to share progress towards the Cyber-Marine project.

Collaboration between:
  • Plant and Food Research

  • Callaghan Innovation

  • Victoria University of Wellington

  • Otago University

    • Deakin University (Victoria, Australia)

Domains:
  • Chemistry

  • Biology

  • Machine Learning

Daniel Killeen daniel.killeen@plantandfood.co.nz gave a presentation on the first two research objectives of the Cyber-Marine program.

Notes:
  • Variable biomass - a variety of species coming in, in heterogeneous batches of species. A unique challenge for New Zealand fisheries, as opposed to other countries with homogeneous catches of one species, i.e. salmon in Norway.

  • Flex-factory - the aim is to build the flex-factory, irrespective of biomass coming in, optimize the biomass coming out.

  • Compositional reference data is “time-sensitive”. It is too expensive to analyze the biomass using traditional methods. We seek to optimize/automate the analysis of the biomass.

  • Data fusion - combine data streams from multiple streams of spectroscopic data for better predictions, analogous to multi-modal machine learning.

  • Homogenous wet biomass - traditionally, chemistry techniques are applied to dry powders. However, we are working on a homogenized wet biomass, a.k.a. raw fish that is unprocessed; this is state-of-the-art.

  • Reference data - at the moment there is literally no reference data; the last study was done in the 1980s. We need accurate profiles for NZ biomass.

  • Direct infusion HRMS - direct infusion high resolution mass spectrometry (HRMS) is a benchmark against which to compare the REIMS dataset.

  • Future dataset - all the models we build for the REIMS/HRMS can be applied to future datasets when PFR have created them.

  • Hoki-Mackerel - hoki-mackerel compositions; only four or five papers have been published on these datasets.

  • Bycatch - the New Zealand government plans to introduce new legislation that would require all biomass to come in; this would increase the heterogeneity of the mix of species in the biomass. More mixed species increases the challenge of creating value from the incoming biomass in the flex-factory.

  • Anomaly detection - given the variability of incoming biomass due to bycatch laws, we need a robust AIML model that can identify species outside of the training data entirely, and catch all anomalies in a mixed bag of bycatch - a robust model to handle heterogeneity.

  • Limitations - the REIMS/HRMS project is about finding the limitations of ambient mass spectrometry techniques for real-time rapid detection of raw homogeneous wet biomass.

  • The hodge-podge, a uniquely NZ problem - heterogeneous species in batches of hodge-podge catches. Canada, by contrast, is mostly homogeneous, i.e. a single species of salmon.

  • Seasonal variation - make sure the models are learning semantically meaningful \(\approx\) chemically specific signal, not batch/species noise, as NZ marine biomass has a seasonal variation, which leads to data drift.

  • Online/transfer learning - have an AI model that can grow with new datasets; online learning to cope with seasonal variation and data drift, and a transfer learning approach to share knowledge between related tasks.

Kirill Lagutin, a Principal Research Scientist from Callaghan Innovation https://www.rd.callaghaninnovation.govt.nz/about-us/our-people/page-33/, gave a presentation on hoki-mackerel compositions.

Notes:
  • Labels on food products have:
    • Serving information

    • Calories

    • Nutrients

  • Chemistry things they look for:
    • Lipid yields

    • Lipid classes

    • Fatty acids

    • Phospholipids (big bucks $$$ - 15x increase in price per capsule)

  • Hoki move all their lipids into their eggs when they spawn; they are left a wreck afterwards. Important to note: there is seasonal variation in the chemical composition of hoki, due to the spawning process.
    • I.e. we can't rely on lipids for detection of the hoki species, or else a spawning hoki would be missed by an AI model.

  • Def. “Spawn”,
    • ChatGPT: “In the context of hoki fish, “spawn” refers to the process of laying and fertilizing eggs. Hoki are a type of saltwater fish that belong to the cod family and are found in the cold waters of the southern Pacific Ocean. When hoki are ready to reproduce, they release their eggs and sperm into the water, where the eggs are fertilized and begin to develop into new hoki fish. This process is known as “spawning.”

  • Fatty Acids:
    • SUFA - Saturated fatty acids

    • PUFA - Polyunsaturated fatty acids

    • MUFA - Monounsaturated fatty acids

  • Fatty acids explained by ChatGPT:
    • SUFA, PUFA, and MUFA are acronyms used to classify different types of fatty acids based on their molecular structure.

    • SUFA stands for Saturated Fatty Acids, which are fatty acids that have no double bonds between the carbon atoms in their molecular structure. They are typically solid at room temperature and are known to raise LDL cholesterol levels (the “bad” cholesterol), which can contribute to the development of cardiovascular disease.

    • PUFA stands for Polyunsaturated Fatty Acids, which are fatty acids that have more than one double bond in their molecular structure. These double bonds cause the fatty acid molecule to bend and kink, making it less likely to pack together and more likely to remain liquid at room temperature. There are two main types of PUFAs: omega-3 fatty acids and omega-6 fatty acids. These are considered to be “good” fats and are important for overall health, as they provide a number of health benefits, including reducing inflammation and improving heart health.

    • MUFA stands for Monounsaturated Fatty Acids, which are fatty acids that have only one double bond in their molecular structure. These double bonds cause the fatty acid molecule to bend, but not as much as in PUFAs. MUFAs are liquid at room temperature and tend to be more stable than PUFAs, meaning they are less prone to oxidation and less likely to contribute to the development of chronic diseases. Foods high in MUFAs include olive oil, avocados, and nuts.

    • It is important to note that all three types of fatty acids are needed in the diet, but in different proportions. A healthy diet should include a balance of all three types, with an emphasis on PUFAs and MUFAs, and limited amounts of SUFAs.

  • Hoki-mackerel mix - the hoki-mackerel mix is made from weighed measurements, put in a blender, and then minced.

  • Mackerel (not worth it!) - mackerel are worth nothing, so the fishermen do not even catch this species to quota. There is no financial incentive for fishermen to catch it.
    • $2 per kilo (sold whole)

    • $8 per kilo (repurposed by flex-factory)

  • Story - the value of the fish is in the story: a fish byproduct is being repurposed into fish products at human food grade levels. Re-use, sustainability, save the planet, etc…

  • There is bigger variation within the individual hoki species than in the hoki-mackerel mix.

Jeremy Rooney gave a presentation on spectroscopy for multi-variate analysis of a hoki-mackerel dataset.

Notes:
  • Cross-validation - the hoki-mackerel datasets have a low sample size, so cross-validation is used to validate the model.

  • Multi-variate analysis - profile metrics for multiple features of interest in the hoki-mackerel dataset.

  • Spectroscopic datasets:
    • FT-Raman

    • Micro-Raman

    • InGaAs Raman

    • FT-NIR

  • “Spectrally meaningful” \(\approx\) “semantically meaningful”. There is a bit of noise, but we want “spectrally meaningful” information, similar to semantically meaningful gas chromatography from (Wood 2022).

  • Commercial vs research - commercial measurement methodologies are very shoddy; they only care about using the standard methods (prescribed in the 1950s). Researchers have more accurate methods (but industry refuses to use them!).

  • For international trade, the more error-prone standard methods are used! They care more about compliance than accuracy.

  • PUFA/MUFA saturated fatty acids are identified with spectroscopic methods.

  • Error - measurement techniques have a measurement error margin, e.g. FT-NIR; these error margins can be added as box-and-whisker plots or error bars, to demonstrate the uncertainty of the chemistry measurements.

  • Hoki - if you are in the hoki season, the incoming species for a fish processing factory are mostly homogeneous. However, outside of the hoki season, the incoming species are more heterogeneous (e.g. hodge-podge). AI models must be robust to deal with varying class imbalances.

  • Online vs. offline tradeoff:
    • There exists a tradeoff between speed and accuracy.

    • online:
      • quick/instant

      • low resolution

    • offline
      • 3 days

      • sent away

    • Important terminology when referring to factory equipment. We want to avoid offline measurement unless it is strictly necessary or provides a significant benefit.

  • Calibration: PFR calibrates the instruments of industry; industry is too lazy to calibrate its measurements itself. Industry cares more about compliance than accuracy; researchers care more about accuracy than compliance.
    • Similar to a professional vs novice guitarist: a professional will always tune their guitar before playing, whereas a novice will not.

Related:
  • (Wood 2022) performs interpretable/accurate/efficient automatic fish classification on Gas Chromatography data.

Mengjie Zhang gave a presentation on the contributions of ECRG to the Cyber-Marine project.

Notes:
  • Our team - five students from VUW who are working on Artificial Intelligence side of things with Cyber-Marine.

  • Topics/areas of interest:
    1. Data processing

    2. Data-driven modelling

    3. Interpreting model features

    4. Multi-objective optimization / Decision making

  • AJCAI 2023 - I presented with conference, journal, book chapter.

  • Jargon - the chemistry people say variable, the AI people say feature. The terminology can be used interchangeably, but there are important differences.
    • AI people use the term feature with domain agnosticism; we don't care about / understand the exact meaning of the feature with respect to the domain. In fact, we would rather not have to; it is good to build models that don't require domain expertise at all, or at least very little.

    • Chemistry people use the term variable. This refers to the domain and the task at hand. If they are interested in lipids, a variable is a lipid of interest. When a chemist says variable it is inherently linked to domain-specific knowledge, and means a very specific thing.

  • Center for Data Science and Artificial Intelligence (CDSAI) - VUW has approved the creation of a new center, which will be a hub for AI research and education.

Jesse Wood (look Ma! I’m on TV) gave my AJCAI 2022 presentation at the workshop:

Notes:
  • Identify important features with HRMS / GC techniques, then use faster, simpler detection methods in production.

  • In other words, prototypes are expensive, but proof-of-concept, find what is possible.

  • Production is hard, and we can take engineering shortcuts to make the prototype cheaper and more efficient at scale.

  • Ask Kirill for a new dataset on Mass Resonance for Vineyards, similar approach to (Wood 2022) for explainable AI models.

  • Daniel proposed 1-2 week visit to Callaghan Innovation, sit down with LCMS, understand the tool inside and out.

  • Domain expertise is time consuming!

  • Related important features back to domain expertise.

  • Multi-modal; both REIMS + HRMS, identify an important superset of features/variables.

  • GC-MS - 3D datacube for classification.

  • Blackbox - chemists can verify the model and build trust in its predictions.

  • Demo - every AI company needs a stunt department, to build trust and encourage academic/industry confidence in AI models.

Most important takeaway(s):
  • For adoption of technology, for example AI models, we need models the domain experts can understand and trust.

  • Explainable AI is almost more important than accuracy, for adoption of technology by domain experts in academia and industry.

  • For adoption in industry, there needs to be an economic incentive, accuracy is not enough! There needs to be profits.

2023-02-10 - ECRG

Location: CO350, Time: Friday, 15:00 - 17:00, Attendees: Mengjie Zhang, Bing Xue, Yi Mei, Andrew, Jordan, Hayden, Christian

General notes:
  • GECCO deadline - GECCO deadline is midnight Sunday, big conference for ECRG, important to submit work for this conference.

  • Evening - The ECRG official evening is between 5pm - 8pm, Meng is accepting later revisions by email.

  • Progress report:
    • Callaghan Innovation - Industry workshop.

    • Domain experts in chemistry fields care more about interpretability and explainable AI than perfect accuracy.

    • Toy datasets provide good benchmarks for state-of-the-art new methods, but real-world applications will have real-world impact.

  • GIS - Jordan is working on converting graphs for EMD to real-world maps.

  • NIWA - a former ECRG student, whose supervisors were Hui and Aaron, works for NIWA now. He still works closely with his former supervisors, and works with NIWA's NeSI High Performance Computing (HPC).

Hengzhe Zhang, https://ecs.wgtn.ac.nz/Main/GradHengzheZhang, gave a presentation on “Map-Elites with Cosine-Similarity for Evolutionary Ensemble Learning”

Notes:
  • Ensemble learning can ensure a diverse set of candidate solutions - this ensures that models don't overfit the training data.

  • A quality-diversity (QD) metric ensures that candidates have both quality and diversity in the population of candidate solutions.

  • Multi-dimensional archive of phenotypic elites (MAP-elites)

  • (Mouret 2015) is the paper that proposed MAP-elites.

  • Dimensionality reduction techniques:
    • Domain knowledge

    • Variational-autoencoders (VAE)

    • Cosine-based similarity

    • t-SNE, PCA, KPCA

  • Cosine-based KPCA is better than Euclidean distance and t-SNE for a large number of datasets, and no worse for the others (a cosine-similarity sketch follows below).
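
A minimal numpy sketch of the cosine similarity used as the behaviour-space metric above (my own illustration):

  import numpy as np

  def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
      return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

  a = np.array([1.0, 2.0, 3.0])
  b = np.array([2.0, 4.0, 6.0])   # same direction, different magnitude
  c = np.array([-1.0, 0.5, 0.0])
  print(cosine_similarity(a, b))  # 1.0 - scale-invariant, unlike Euclidean distance
  print(cosine_similarity(a, c))  # 0.0 - orthogonal behaviours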

Related:
  • (Mouret 2015) is the paper that proposed MAP-elites for diversity.

2023-02-13 - Deep Learning

Marcus Frean, https://people.wgtn.ac.nz/Marcus.Frean, gave a presentation on “Bayesian Linear Regression and Biological Neurons”.

Notes:
  • Bayesians track a distribution of parameters, \(p(W) \to p(W | \mathcal{D})\) and predict by integrating over that distribution.

  • Generally, Bayesian updating is hard. But it's easy for the right (“conjugate”) combinations of prior and likelihood, i.e. when the functional forms of prior and likelihood are such that the posterior takes the same form as the prior.

  • For example:
    • Gaussian prior, and

    • Linear/Gaussian likelihood

    • => Gaussian posterior

  • Gaussian-Linear sweet spot - with a Gaussian prior on weights \(p(W) = \mathcal{N}(w \mid m_0, V_0)\) and a linear prediction with Gaussian noise \(\mathcal{N}(y \mid w \cdot x, \sigma^2_{noise})\), linear algebra shows the posterior over weights is also Gaussian with updated precision and mean (a numpy sketch follows below):
    • precision: \(V_N^{-1} = V_0^{-1} + \frac{1}{\sigma^2} X^T X\)

    • mean: \(m_N = V_N (V_0^{-1} m_0 + \frac{1}{\sigma^2} X^T y)\)
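
A minimal numpy sketch of the conjugate Gaussian-linear update above (my own illustration; the data and noise level are made up, and the symbols follow the formulas):

  import numpy as np

  rng = np.random.default_rng(0)
  n, d, sigma = 50, 3, 0.1
  X = rng.normal(size=(n, d))
  w_true = np.array([0.5, -1.0, 2.0])
  y = X @ w_true + sigma * rng.normal(size=n)

  m0 = np.zeros(d)   # prior mean
  V0 = np.eye(d)     # prior covariance
  V0_inv = np.linalg.inv(V0)

  precision = V0_inv + (X.T @ X) / sigma**2         # V_N^{-1}
  V_N = np.linalg.inv(precision)                    # posterior covariance
  m_N = V_N @ (V0_inv @ m0 + (X.T @ y) / sigma**2)  # posterior mean

  print(m_N)  # close to w_true given 50 low-noise observations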

  • RVFL - Random Vector Functional Link - a neural network with random weights.

  • RVFL, a.k.a. “extreme learning machine”, a random hidden layer with non-linearity.

  • Kernel trick - Marcus uses a similar method to the non-linear SVM (Cortes 1995), the kernel trick, using a kernel to project the data into a higher dimensional space.

Related:
  • See (Cortes 1995) for the non-linear SVM and the kernel trick.

2023-02-16 - Weekly

Location: CO352, Time: Thursday, 12:00 - 13:00, Attendees: Bing Xue, Mengjie Zhang, Jesse Wood

Notes:
  • Proposal page limit - 40 pages + references - although ECRG is known for going over the page limit.

  • Introduction - my introduction is far too brief. I need to expand on this a lot more. This section is usually 10+ pages. I should briefly cover the contents of each section, and/or include a table of contents.

  • Write - I need to write; the 1-year deadline for the proposal submission, 2023-03-01, is rapidly approaching, and I need to submit before it.

  • Daniel and Bach send their apologies for not being able to attend the weekly meeting. Daniel is busy, and Bach just got back from overseas.

  • There was no weekly meeting last week, due to the PFR Workshop (see 2023-02-08 - Callaghan Innovation Workshop).

  • Story - after the PFR workshop, I can definitely see the forest for the trees. I see the “story” of the Cyber-Marine flex-factory, and its niche in the New Zealand fish processing market.

  • Outliers - PFR emphasized the robustness required; our AIML models need to handle out-of-distribution data, that is, to identify classes that are possibly not even in the training data. See (Di 2019) for anomaly detection using GANs, similar to (Black 2019), where thresholds are established for unknown outliers.

  • I have ordered my contributions as four tasks, in ascending order of difficulty. They all relate directly to domain-specific problems in fish processing. AIML techniques of increasing complexity will likely be required to solve these problems as their difficulty increases.

Related:
  • See PFR Workshop

  • See (Di 2019) for a literature survey on GANs for anomaly detection

  • See (Black 2019) for outlier detection with variance thresholding

2023-02-16 - FASLIP

Location: CO350, Time: Thursday, 14:00 - 15:00, Attendees: Yibin Sun, Heitor Murilo Gomes, Mengjie Zhang, Bing Xue, Jordan MacLachlan, Hengzhe Zhang, Jesse Wood

Notes:
  • Yibin Sun, https://www.researchgate.net/profile/Yibin_Sun3, is a PhD candidate from Waikato University; one of his supervisors is Heitor Murilo Gomes, http://heitorgomes.com/

  • Datastream - “Data like a stream” - an apt metaphor to describe the nature of a digital stream of data.
    • We may only have access to an instance for a short period of time, or once.

  • 3Vs
    • Volume - the amount of data

    • Velocity - the rate at which data is generated

    • Variety - the different types of data

  • Concept drift - the change in the underlying distribution of data.

  • Types of concept drift:
    • Virtual drift: a shift in distribution that does not affect decision boundaries.

    • Real drift: a shift in distribution that does affect decision boundaries.

  • Types of drift:
    • Sudden

    • Gradual

    • Incremental

    • Recurring - seasonal variation is an example.

  • Concept drift detection:
    • Statistical analysis

    • Performance Metrics

    • Drift Detection Algorithms

  • Drift detection algorithms (a minimal Page-Hinkley sketch follows below):
    1. Page-Hinkley

    2. ADWIN
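
A minimal sketch of the Page-Hinkley detector, assuming its common formulation: track the cumulative deviation of a stream from its running mean, and alarm when it exceeds the minimum seen so far plus a threshold (the delta and lambda values here are illustrative):

  class PageHinkley:
      def __init__(self, delta=0.005, lam=50.0):
          self.delta, self.lam = delta, lam  # tolerance and alarm threshold
          self.mean, self.n = 0.0, 0
          self.cum, self.cum_min = 0.0, 0.0

      def update(self, x: float) -> bool:
          self.n += 1
          self.mean += (x - self.mean) / self.n   # running mean
          self.cum += x - self.mean - self.delta  # cumulative deviation
          self.cum_min = min(self.cum_min, self.cum)
          return self.cum - self.cum_min > self.lam  # True => drift detected

  detector = PageHinkley()
  stream = [0.0] * 200 + [5.0] * 50  # sudden drift after 200 samples
  drift_at = next((i for i, x in enumerate(stream) if detector.update(x)), None)
  print(drift_at)  # detected shortly after index 200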

  • Fast Incremental Model Tree with drift detection (FIMT-DD)

  • Adaptive Random Forest for Regression (ADR-Reg)

  • Self optimizing k-nearest leaves (SOKNL)

Related:

2023-02-17 - ECRG

Location: CO350, Time: Friday, 15:00 - 17:00, Attendees: Mengjie Zhang, Bing Xue, Hui Ma, Hayden, Christian, Jesse

General notes:
  • Teaching starts:
    • AIML 427 - Data Processing

    • COMP 307 - Artificial Intelligence (tutor)

  • An A+ in COMP307 means a student can launch into a PhD / Summer Research / Masters project, according to Meng.

  • Human feedback - a human study provides qualitative/quantitative feedback on human interactions with an AIML model, good for claiming interpretability in explainable AI (XAI) research.

Hamish O’Keeffe okeefhami@myvuw.ac.nz gave a talk on CNN-based Instance Segmentation for the Assessment of Mussel Harvests.

Notes:
  • Application-based research, not a novel technique as in academia, but application of existing techniques in exciting new ways.

  • Computer vision tasks (increasing order of difficulty):
    1. Image classification

    2. Object Detection

    3. Segmentation

    4. Semantic Segmentation

    5. Instance Segmentation

  • Instance segmentation - combines object detection and semantic segmentation.

  • Panoptic segmentation - combines semantic segmentation and instance segmentation, “again things and stuff” – Hamish

  • Panoptic segmentation is used in the real-world application of self-driving cars, e.g. Tesla FSD.

  • Techniques:
    1. YOLACT

    2. CenterMask

    3. Anchor Free

    4. R-CNN

    5. Blend Mask

  • Augmentation: For datasets of small size, we can use augmentation to artificially increase the sample complexity of the image dataset.

  • Average precision (AP?) - the average precision of the model was evaluated on the mussel dataset; an average taken over Area Under the Curve (AUC) at thresholds \(\in [0.5, 0.95]\).

  • Framerate - 2.5 - 3.5 seconds per frame for Mussel Harvest Assessment. Slow, but still faster than manual assessment by the human eye.

  • Online learning - in deployment on mobile devices, users should be able to identify edge cases and send them back to become part of the new training data, similar to Tesla's FSD and the long tail of computer vision edge cases.

  • Standardization - checkered background to standardize the photos, also useful for measurements in future applications.

2023-02-22 - Deep Learning

Location: Zoom, Time: Wednesday, 15:00 - 16:00, Attendees: Bastiaan Kleijn, Maxwell Clarke, Paul Teal, Amr Mossy, Jesse Wood

Ali Knott gave a presentation on GPT-2 \(\to\) GPT-3

Notes:
  • query: a soft database lookup, that shows you what you should be focussing on, what is related.

  • Collocation, e.g. “pissed off”, refers to certain words or phrases that co-occur frequently within a given context.

  • Word embeddings - distributed input representations can be thought of as word embeddings.

  • Attention - using attention (Vaswani 2017) to represent an input sentence; attention can be calculated efficiently in parallel (a sketch follows below).

  • Transformer architectures often use stacked attention blocks, with multi-head attention mechanisms (i.e. hydra).
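
A minimal numpy sketch of scaled dot-product attention, \(softmax(QK^T / \sqrt{d_k})V\) from (Vaswani 2017), computed for a whole sentence in parallel (the shapes are made up):

  import numpy as np

  def attention(Q, K, V):
      d_k = K.shape[-1]
      scores = Q @ K.T / np.sqrt(d_k)  # soft database lookup: query vs keys
      weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
      weights /= weights.sum(axis=-1, keepdims=True)  # softmax over keys
      return weights @ V               # weighted sum of values

  rng = np.random.default_rng(0)
  seq_len, d_k = 5, 8  # 5 tokens, 8-dimensional head
  Q, K, V = (rng.normal(size=(seq_len, d_k)) for _ in range(3))
  print(attention(Q, K, V).shape)  # (5, 8): one output vector per token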

  • Transformer pre-training is essentially transfer learning - semi-supervised learning to learn general knowledge.

  • Generative pre-training (GPT) was proposed in (Radford 2018) from OpenAI.

  • BERT (Devlin 2018), a Google competitor, is a general-purpose bidirectional model; it can see both ways, left and right.

  • Tasks are often specified in text; a task-specific language model is conditioned to perform tasks.

  • Learn arbitrary tasks simply by being trained on a very large corpus.

  • Assumption that the training corpus includes textual descriptions of tasks, along with examples of their performance.

  • (Brown 2020) introduces in-context learning with GPT-3; the model can be taught a new task in the input text passed to the model.

  • Coined “few-shot” learning without fine-tuning; no further optimization needed, learning from examples at inference time.

Related:

TODO - Big ass note catchup needed here!!!

2023-05-03 - Deep Learning

Location: Zoom, Time: Wednesday, 15:00 - 16:00, Attendees: Bastiaan Kleijn, Maxwell Clarke, Paul Teal, Amr Mossy

Bastiaan presented a talk on the paper (Song 2023) on consistency models.

Notes:
  • Consistency models (Song 2023) - new family of generative models that achieve high sample quality without adversarial training.

  • Cheaper at inference and at least as good.

  • No text embedding in the latent space.

  • Applications; editing images, images \(\to\) painting/drawing, in-painting - filling missing pixels.

  • Denoising score matching from (Vincent 2011) - the paper shows a simple denoising autoencoder training criterion is equivalent to matching the score (with respect to the data) of a specific energy-based model to that of a nonparametric Parzen density estimator of the data.

  • (Karras 2022) provides a design space that clearly separates the concrete design choices for generative models. Before, the design was unnecessarily convoluted.

  • The original diffusion models for text-to-image were (Ho 2020, Song 2020). Papers from (Karras 2022) provide clarity to their design. However, (Song 2023) proposes consistency models as an alternative to diffusion altogether.

  • ControlNet from (Zhang 2023) adds conditional control to existing diffusion models, such as stable diffusion; for example, Canny edge maps for semantic consistency of edges in generated images.

Related:
  • See (Song 2023) - the paper presented today.

  • See (Vincent 2011) for score matching

  • See (Goodfellow 2016) Chapter 16, page 567, for energy-based models.

  • See (Karras 2022), which provides a concrete design space for generative models.

  • See (Ho 2020) Denoising Diffusion Probabilistic Models (DDPM) - the original diffusion paper.

  • See (Song 2020) Denoising Diffusion Implicit Models (DDIM) - faster diffusion process.

  • See (Zhang 2023) ControlNet, which adds conditional control to existing diffusion models, such as stable diffusion.

2023-05-04 - Weekly

Location: CO352, Time: Friday, 12:00 - 13:00, Attendees: Mengjie Zhang, Bing Xue, Jesse Wood

Notes:
  • Sunday 5 PM deadline; need to extend the bullet points and send a draft to my supervisors, who will give me feedback on Monday.

  • The Candidate Development Plan (CDP) is due this month; it is important to focus on the proposal submission for this month.

  • The preliminary work (3), Contributions (4), Resources (5), are for administration purposes. Don’t fuss about Literature Review (2) for now.

  • Important to stop adding to the proposal; assume a knowledge cut-off of today, and write up the ideas I have now.

TODO:
  • [x] Candidate Development Plan

  • [x] Proposal introduction draft to supervisors

2023-05-10 - Deep Learning

Location: Zoom, Time: Wednesday, 15:00 - 16:00, Attendees: Maxwell Clarke, Bastiaan Kleijn, Felix, Brijesh, Paul.

Notes:
  • The computation graph of a transformer is set up in a way that is very efficient to compute in parallel on GPU clusters.

  • (Zhai 2021) proposed the Attention Free Transformer (AFT), the predecessor to today’s paper, the Receptance Weighted Key Value (RWKV) transformer.

  • RWKV is an open source language model, pronounced “rwa-kuv”; it is a 7 billion parameter attention-free architecture, with hypothetically infinite context length.

  • A live demo of RWKV Raven is available on Hugging Face; see (Peng 2023) for the link.

  • Idea: replace the multi-head attention in a transformer with an Attention Free Transformer (AFT) from (Zhai 2021).

  • Linformer, Reformer, and Linear transformer, (Wang 2020, Kitaev 2020, Katharopoulos 2020) are the inspiration for RWKV.

  • Advantages:
    • RWKV takes less memory during inference.

    • Run a GPT-4-based start-up on cheaper hardware that requires less memory.

    • Democratization of AI technology: availability of LLMs on commodity hardware.

Related:

2023-05-25 - Weekly

Location: CO352, Time: Thursday, 12:00 - 12:30, Attendees: Mengjie Zhang, Bing Xue, Bach Hoai Nguyen, Jesse Wood

Notes:
  • Need to come in to university early to have enough time to work on the PhD during office hours. It is difficult to juggle the schedule with tutoring, personal commitments, and the PhD, without spending some more time in the office.

  • Candidate Development Plan is due, need to send my supervisors a draft today, so they can sign off on this.

TODO:
  • [x] Candidate Development Plan (CDP) due 2023-05-25 - today!

  • [x] Come in early for PhD work.

2023-05-25 - FASLIP

Location: CO350, Time: Thursday, 14:00 - 15:00, Attendees: Cheng Xie, Bing Xue, Mengjie Zhang, Hengzhe Zhang, Jordan MacLachlan

Cheng Xie from the Harbin Institute of Technology, China, gave a talk on Particle Swarm Optimization (PSO)-based Neural Architecture Search (NAS) for Physics-Guided Reversible Residual Neural Networks. https://ieeexplore.ieee.org/author/37089449701

Notes:
  • Turntable servo system - large experimental simulation equipment used to simulate flight attitude.

  • Use advanced control methods to improve the tracking performance of the servo system:
    1. Model predictive Control - Forward

    2. Supervised control - Inverse

    3. Inner model control - Forward/Inverse

  • (Xie 2023) is a physics-guided reversible residual neural network model, applied to build forward/inverse models for turntable servo systems.

  • A residual neural network (He 2016) is a deep learning model in which the weight layers learn residual functions with reference to the layer inputs.

  • Defn. a network with skip connections that perform identity mappings, merged with layer outputs by addition.

  • (He 2016) proposed ResNet for image recognition, the original ResNet paper that won the ILSVRC 2015 classification task. Residual blocks, or skip connections between layers.

  • Skip connections provide shortcuts for information flow between the layers of a neural network. Skip connections allow a network to better propagate information between layers, which improves performance overall (a minimal sketch follows below).
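
A minimal PyTorch sketch of a residual (skip) connection as in (He 2016); fully-connected layers are used here for brevity, rather than the convolutional blocks of the original ResNet:

  import torch
  import torch.nn as nn

  class ResidualBlock(nn.Module):
      def __init__(self, dim: int):
          super().__init__()
          self.body = nn.Sequential(
              nn.Linear(dim, dim), nn.ReLU(),
              nn.Linear(dim, dim),
          )

      def forward(self, x: torch.Tensor) -> torch.Tensor:
          return torch.relu(self.body(x) + x)  # skip connection: F(x) + x

  x = torch.randn(4, 16)
  print(ResidualBlock(16)(x).shape)  # torch.Size([4, 16])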

Related:

2023-05-30 - CDP Sign Off

Location: CO352, Time: Tuesday: 09:00 - 09:30, Attendees: Bing Xue, Bach Hoai Nguyen, Jesse Wood

Notes:
  • Bing thinks my CDP is very clear; I have laid out in detail what needs to be done, and how I will do it.

  • I personally had “minor concerns” about my progress on my proposal; however, Bing was not concerned.

  • I need to submit my proposal and move to full registration ASAP.

  • Send a draft of my proposal to my supervisors, and they will give me feedback.

2023-06-02 - ECRG

Location: CO350, Time: Friday, 15:00 - 17:00, Attendees: Pondy’s farewell.

Peter Andreae (“Pondy”) https://people.wgtn.ac.nz/Peter.Andreae

Notes:
  • Peng has officially completed his PhD, as he submitted his manuscript to the school library.

  • Visiting student from the US who is a biology major, doing a PhD in AI, with computer vision for duck identification with multi-modal images + genetic reference samples.

  • Progress report:
    • I have submitted a revised proposal introduction to my supervisors.

    • Based on Bach’s feedback to my previous draft.

    • Will revise and resubmit based on further feedback.

  • Passion: Pondy enjoyed his teaching the most, and told me in person it is important to show an interest (perhaps even exaggerated) in the topic you are teaching, to make the curriculum bearable for the students.

  • MIT researcher - Pondy is an MIT alumnus who did his PhD in symbolic AI in 1986. He has since conceded that GOFAI does not work; rather than sit in resentment, he decided to help others with their research, as a non-primary contributing author.

  • “To help others succeed in what they want to do, by making them do it better” - another pathway in academia (a support role).

  • Pondy warned about specializing too heavily in one field. It is important to have a broad knowledge of AI topics, and even of science in general, as this can facilitate future collaborations in fields you may not have otherwise known existed!

2023-08-22 - Weekly

Location: CO352, Time: Tuesday, 15:30 - 16:30, Attendees: Bing Xue, Jesse Wood

Notes:
  • 15-month deadline for provisional registration; the proposal needs to be submitted by then.

  • Proposal is a proof of work, or of research ability, that demonstrates I am capable of doing a PhD.

  • Routine is important: exercise, a sleep routine, a good diet, etc. - lifestyle changes to support good mental health.

2023-08-31 - Weekly

Location: CO364, Time: Thursday, 12:00 - 12:30, Attendees: Bach Hoai Nguyen, Jesse Wood

Note:
  • Health is the first priority.

  • No unnecessary pressure ATM, ease back into PhD work.

  • (If able) send supervisors a draft of my proposal, and they will give me feedback.

2023-09-07 - Weekly

Location: CO352, Time: Thursday, 10:00 - 10:30, Attendees: Bing Xue, Bach Hoai Nguyen, Mengjie Zhang, Jesse Wood

Notes:
  • Need to submit my proposal ASAP.

  • Send proposal to supervisors for feedback.

  • Get feedback on introduction from Daniel.

  • Changed weekly meeting time to Thursday, 10:00 - 10:30.

  • Good habit to be a morning person, and to be in the office early on a Thursday.

2023-09-07 - FASLIP

Location: CO350, Time: Thursday, 14:00 - 15:00, Attendees: Huixiang Zhen, Bing Xue, Mengjie Zhang, Hengzhe Zhang, Jesse Wood

Huixiang Zhen gave a talk on “Model Selection Based Offline Data-driven Evolutionary Algorithm (MSEA)”

Huixiang Zhen, China University of Geosciences, China, Nov. 2022-Nov. 2023 https://www.researchgate.net/profile/Huixiang-Zhen

Notes:
  • Offline - real fitness evaluation is not available during the optimization process; fitness can only be predicted from historical data.

  • Challenges:
    • Not enough data, especially for high-dimensional problems.

    • Cannot collect new data

    • Cannot validate the quality of the solution during optimization.

  • Symmetric search space - existing ensemble models have a preference for a symmetric search space (e.g. fitness landscape).

  • Expensive evaluation - true objective function may be intractable or very difficult/expensive to evaluate.

  • Instead, we use a surrogate model to approximate the true objective function.

  • Evaluate the fitness of candidate solutions using the surrogate model, rather than the true objective function (see the sketch after this list).

  • Airfoil - the cross-sectional shape of an aircraft wing; the shape is important for aerodynamics.

  • Real-world problem - airfoil optimization is an example of offline optimization. Design an airfoil specification that provides the largest lift coefficient.

  • (Sobieczky 1999) PARSEC - parametric model that uses 11 or 12 parameters to represent the major structural sectional features of an airfoil.

  • Including:
    • leading edge radii,

    • upper and lower crest location,

  • Constructs an airfoil using a sixth-order polynomial.
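
A minimal sketch of the surrogate idea (my own illustration, assuming scikit-learn; the MSEA talk selects among several such surrogates, which this sketch does not do): fit a regressor on the historical data, then let the evolutionary loop query the surrogate instead of the true objective.

    import numpy as np
    from sklearn.ensemble import RandomForestRegressor

    rng = np.random.default_rng(0)

    # Offline setting: only historical (x, f(x)) pairs are available.
    X_hist = rng.uniform(-5, 5, size=(100, 10))
    y_hist = np.sum(X_hist ** 2, axis=1)  # stand-in for an expensive objective

    surrogate = RandomForestRegressor(n_estimators=100, random_state=0)
    surrogate.fit(X_hist, y_hist)

    # Simple (mu + lambda)-style loop that never calls the true objective.
    pop = rng.uniform(-5, 5, size=(20, 10))
    for gen in range(50):
        children = pop + rng.normal(scale=0.3, size=pop.shape)  # Gaussian mutation
        union = np.vstack([pop, children])
        fitness = surrogate.predict(union)     # surrogate, not a real evaluation
        pop = union[np.argsort(fitness)[:20]]  # keep the 20 best (minimisation)

    best = pop[0]  # its true quality cannot be validated during optimization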

Related:

2023-09-21 - Weekly

Location: CO352, Time: Thursday, 10:00 - 10:30, Attendees: Bing Xue, Bach Hoai Nguyen, Mengjie Zhang, Jesse Wood

Notes:
  • Revise and resubmit the proposal based on Bach’s feedback. Submit within the next two weeks.

  • Mention that it is an ongoing project; new datasets or objectives may come through.

  • Need to submit the proposal ASAP, and move to full registration.

  • Bing, Meng, and Bach all play table tennis; a bit rusty, but they enjoy the game.

  • Just send chapter 1 (delete the other pages from the pdf) to Bing / Daniel.

  • Iteratively change it, small chunks often, revise based on feedback.

2023-09-21 - FASLIP

Location: CO350, Time: Thursday, 14:00 - 15:00, Attendees: Junhao Huang, Bing Xue, Mengjie Zhang, Hengzhe Zhang, Jesse Wood

Junhao Huang gave a talk on “Automated Design of Efficient Multi-scale Networks via Multi-Path Weight Sampling”.

Notes:
  • Single-branch CNNs - AlexNet, VGGNet

  • Multi-branch CNN - GoogLeNet

  • ResNet - skip connections between layers, to avoid the vanishing gradient problem and allow gradients to flow through the network.

  • DenseNet, MobileNet - efficient model representation for commodity hardware and mobile/client/CPU devices.

  • Neural architecture search (NAS) - automatically design network architectures in a predefined search space.

  • Huang proposes EMixNet - a lightweight multi-scale network.

  • Warm-starts with weights sampled from the supernet, then fine-tunes for 1 epoch.

  • 0.01% improvement on performance on CIFAR-10.

Related:

2023-09-28 - Weekly

Location: CO352, Time: Thursday, 10:00 - 10:30, Attendees: Bing Xue, Bach Hoai Nguyen, Mengjie Zhang, Jesse Wood

Notes:
  • Send an update tomorrow, and complete chapter 1 ASAP.

  • Deadline, needs to be completed within the next 3 weeks.

  • Maori Scholarship Evening - go to CO350 at 4pm; free food/drink, meet Maori undergraduate students.

2023-09-28 - FASLIP

Location: CO350, Time: Thursday, 14:00 - 15:00, Attendees: Peng Wang, Bing Xue, Mengjie Zhang, Jesse Wood

Peng Wang gave a talk on “GP for dimensionality reduction”.

Notes:
  • Feature selection - select a subset of features from the original feature set.

  • Feature construction - construct new features from the original feature set.

  • Single-tree GP struggles with multi-class classification tasks; can use a classification map (Smart 2005)

  • M2GP - a multi-dimensional genetic programming approach for multi-class classification problems.

  • M2GP is used for feature construction; it performs best with a random forest classifier appended for downstream classification tasks.

  • Limitations: M2GP has a fixed number of dimensions \(d\) for the constructed feature space.

  • M3GP - multi-class classification with GP, which progressively searches for an optimal number of dimensions \(d\) for the constructed feature space.

  • Pruning - the pruning operator iterates through each dimension, removes it, checks whether fitness improves, and keeps the removal if so (see the sketch after this list).

  • Similar to Accumulated Local Effects (ALE) plots, which are a tool for functional decomposition: they depict how changing a single feature impacts the model’s prediction.
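
A minimal sketch of that pruning operator (my paraphrase in plain Python; trees is a list of constructed dimensions and fitness is a hypothetical evaluation function):

    def prune_dimensions(trees, fitness):
        """Drop constructed dimensions one at a time, keeping any removal
        that improves fitness (assumes higher fitness is better)."""
        best = fitness(trees)
        i = 0
        while i < len(trees) and len(trees) > 1:
            candidate = trees[:i] + trees[i + 1:]  # remove dimension i
            score = fitness(candidate)
            if score > best:
                trees, best = candidate, score     # removal helped: keep it
            else:
                i += 1                             # removal hurt: keep dimension i
        return trees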

Related:

2023-10-04 - Deep Learning

Location: Zoom, Time: Wednesday, 15:00 - 16:00, Attendees: Ali Knott, Bastiaan Kleijn, Paul Teal, Marcus Frean.

Ali Knott gave a talk on “GPT Alignment” https://people.wgtn.ac.nz/ali.knott

Notes:
  • GPT can generalize to text/images that exist outside of training distribution entirely.

  • In-context learning (ICL): few-shot learning on unseen tasks, without further optimization. A few papers on this, but still largely a mystery as to how it works (Dong 2022).

  • Intuition: basically pointing to a region of the text latent space and saying “I want to produce text here.”

  • Prompt length: longer prompts point to more precise regions of the text latent space.

  • Next word prediction is orthogonal to harmful content.

  • Most harmful content can be detected due to its characteristics. Misinformation is much harder, and requires domain experts to detect.

  • Remove harmful content from the training set, otherwise the language model can emulate that behaviour implicitly.

  • Harmful content detection: Use (other) AI to filter the training set for harmful content.

  • Fine-tuning - transformer-based models are readily fine-tunable (this property originated with transformers).

  • Get GPT-4 to generate a range of responses, assign a reward value to each response, and train a reward model that maps any text onto a reward value.

  • This is done by a type of reinforcement learning called proximal policy optimization (PPO) (Schulman 2017).

  • Policy gradient methods work by estimating the policy gradient and plugging that estimate into stochastic gradient descent.

  • PPO is a policy gradient method that uses a surrogate objective function to estimate the policy gradient; here the reward model approximates human preferences.
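
For reference, the clipped surrogate objective from (Schulman 2017), with probability ratio \(r_t(\theta) = \pi_\theta(a_t \mid s_t) / \pi_{\theta_{\text{old}}}(a_t \mid s_t)\) and estimated advantage \(\hat{A}_t\):

\[ L^{\text{CLIP}}(\theta) = \mathbb{E}_t\left[ \min\left( r_t(\theta)\,\hat{A}_t,\ \operatorname{clip}(r_t(\theta),\, 1-\epsilon,\, 1+\epsilon)\,\hat{A}_t \right) \right] \]

The clip keeps each policy update close to the old policy, which stabilises the RLHF fine-tuning step described above.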

Related:

2023-10-06 - ECRG

Location: CO350, Time: Friday, 15:00 - 17:00, Attendees: Yi Mei, Felix Yan, Hayden, Kaan, Hengzhe, Peng, Junhao, Jesse Wood

General notes:
TODO:

Ruiqi Chen gave a talk on “Deep Reinforcement Learning for Job Shop Scheduling”. https://ruiqi-chen.cn/

Notes:
  • Job shop scheduling:
    • Job shop scheduling refers to the allocation of resources, such as machines and operators, subject to certain constraints.

    • It involves determining the order and timing of a set of jobs to be processed.

    • Goal of optimizing one (or more) objective(s), such as minimizing completion time, minimizing delays, or maximizing resource utilization.

  • Human-designed heuristics rely on domain expertise, and are often sub-optimal. They are static, and cannot adapt to changing conditions.

  • Traditional deep reinforcement learning (DRL) models have a fixed input size and a fixed parameterization (architecture), and do not generalize well to other problems.

  • The job shop scheduling problem can be represented as a disjunctive graph (Balas 1969).

  • Routing problems can be solved with attention-based representations (Kool 2019).

  • Node2vec (Grover 2016) is a technique for learning low-dimensional representations of nodes in a graph.

  • Word2vec (Mikolov 2013) is a technique for learning low-dimensional representations of words in a corpus (see the sketch below).
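
A minimal sketch of the word2vec side (my illustration, assuming the gensim library; the toy corpus is hypothetical): node2vec applies the same skip-gram machinery to node sequences generated by random walks over a graph.

    from gensim.models import Word2Vec

    # Toy corpus; node2vec would instead feed in random-walk node sequences.
    sentences = [
        ["job", "shop", "scheduling", "allocates", "machines"],
        ["machines", "process", "jobs", "under", "constraints"],
    ]

    # Skip-gram (sg=1) learns a low-dimensional vector per token.
    model = Word2Vec(sentences, vector_size=16, window=2, min_count=1, sg=1, seed=0)
    vec = model.wv["machines"]  # 16-dimensional embedding
    print(model.wv.most_similar("jobs", topn=2))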

Related:

2023-10-11 - Deep Learning

Location: Zoom, Time: Wednesday, 15:00 - 16:00, Attendees: Bastiaan Kleijn, Paul Teal, Marcus Frean, Hayden Dyne, Ali Knott, Jesse Wood

Bastiaan presented the (Liu 2023) work for discussion.

Notes:
  • One-step stable diffusion with rectified flow (Liu 2023).

  • Rectified flow is a unified ODE-based framework for generative modelling and domain transfer.

  • It uses interpolation to learn a vector field that maps between two probability distributions. The resulting flow trajectories are parallel and do not cross!

  • Rectification: given a flow already, this effectively gives us pairs that we can train on, sampled randomly from the distributions \(P(x)\) and \(Q(x)\).

  • Instability with InstaFlow in regions where there are holes in the probability distribution. Bastiaan suggests adding a small amount of noise to the input to avoid this instability.

  • SDXL is used to refine the generated output into high-resolution images.

  • The diffusion process, or denoising autoencoder architecture, is based on concepts from physics, e.g. thermodynamics (Sohl 2015)

  • Brownian motion is the random motion of particles suspended in a medium (a liquid or gas).

  • The many-body interactions that yield the Brownian pattern cannot be solved by a model accounting for every involved molecule.

  • Thus, only probability-based models can be used to describe Brownian motion (see the sketch below).
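
A minimal numpy sketch of that probabilistic view (my illustration): Brownian paths are simulated as accumulated Gaussian increments, the same forward noising process that diffusion models discretise.

    import numpy as np

    rng = np.random.default_rng(0)

    # Position is a sum of independent Gaussian increments, so only the
    # distribution over paths (not any single molecule) is modelled.
    T, n_steps, n_particles = 1.0, 1000, 10_000
    dt = T / n_steps
    increments = rng.normal(0.0, np.sqrt(dt), size=(n_steps, n_particles))
    paths = np.cumsum(increments, axis=0)  # shape: (n_steps, n_particles)

    # Diffusion signature: variance grows linearly in time, Var[x(t)] = t.
    print(paths[-1].var())  # ~1.0 for T = 1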

Related:

2023-10-12 - Weekly

Location: CO352, Time: Thursday, 10:00 - 10:30, Attendees: Bing Xue, Bach Hoai Nguyen, Mengjie Zhang, Daniel Killeen, Jesse Wood

Notes:
  • Need to submit my proposal this week. PGR gave me a 3-week deadline, which expires at the end of this week.

  • Next week’s meeting is at 10 AM; important to show up on time, and early.

  • I will submit this week, may have to work over the weekend.

  • Proposal won’t be perfect, but can revise it retroactively for the thesis.

  • Proposal seminar is a 35 - 40 minute presentation in which I defend my proposal.

  • Deadline is very early for the $20,000 scholarship; likely I won’t get it, as it is geared towards Education / Environment. Focus on the proposal first, and hope for the best.

2023-10-12 - FASLIP

Location: CO350, Time: Thursday, 14:00 - 15:00, Attendees: Lin, Bing Xue, Mengjie Zhang, Hengzhe Zhang, Jesse Wood

Lin gave a talk on “Image Operators for Image Processing and Computer Vision”.

Notes:
  • Detectors:
    1. Edge detectors

    2. Texture detectors

    3. Shape detectors

  • Sobel operator - edge detector (Sobel 1990)

  • Uses convolution to detect edges in an image: an isotropic 3x3 image gradient operator (see the sketch below).

  • Grey-level co-occurrence matrix (GLCM) - texture detector (Haralick 1973).

  • A GLCM is a histogram of co-occurring greyscale values at a given offset over an image.

  • Histogram of Oriented Gradients (HOG) - shape detector (McConnell 1986).

  • The technique counts occurrences of gradient orientation in localized portions of an image.
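
A minimal sketch of the Sobel operator (my illustration with numpy/scipy): two 3x3 kernels approximate the horizontal and vertical image gradients, and their magnitude highlights edges.

    import numpy as np
    from scipy.signal import convolve2d

    # Sobel kernels: horizontal (Gx) and vertical (Gy) gradient approximations.
    Gx = np.array([[-1, 0, 1],
                   [-2, 0, 2],
                   [-1, 0, 1]])
    Gy = Gx.T

    image = np.zeros((8, 8))
    image[:, 4:] = 1.0  # a vertical step edge

    gx = convolve2d(image, Gx, mode="same", boundary="symm")
    gy = convolve2d(image, Gy, mode="same", boundary="symm")
    magnitude = np.hypot(gx, gy)  # strong response along the edge column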

Related:

2023-10-13 - ECRG

Location: CO350, Time: Friday, 15:00 - 17:00, Attendees: Yi Mei, Felix Yan, Hayden, Kaan, Hengzhe, Peng, Junhao, Jesse Wood

Tapabrata Rohan Chakraborty (guest speaker) gave a keynote on CHAI. This talk was very popular, and we ran out of seats in the room. He is a highly decorated scholar in deep learning and medical applications; see his profile: https://www.turing.ac.uk/people/researchers/tapabrata-rohan-chakraborty

A/Prof at University College London (UCL); IEEE fellow; Alan Turing Institute; Turing-Roche Partnership; Big Data Institute, Oxford; UCL Cancer Institute; PhD at Otago; post-doc at Oxford.

Synopsis: a high-level talk, a brief introduction to all the work going on in the lab.

TRAIL:
  • Transparent and Robust Artificial Intelligence Laboratory (TRAIL)

  • Transparent AI
    • Explainable AI (XAI)

    • Fairness and bias

    • Causal ML

    • Multi-modal

  • Robust AI
    • Robust conditional learning

    • Uncertainty quantification

    • Dealing with missing data

    • Safe/ethical AI

UCL DL Research:
  1. Self-attention

  2. Graph neural networks - contributions:
    • use the learned features as a multi-head attention mechanism.

    • as an error signal, e.g. heatmaps.

    • XAI transformer output for contribution to domain knowledge (post-hoc).

High-dimensionality
  • 1GB per instance

  • too many features for DL.

  • frameworks to handle out-of-the-box.

  • e.g. genomic data, 1GB per instance, millions of features.

Multi-scale graphical neural network
  • segment data at different levels of resolution.

  • e.g. multiple resolutions of one image taken at different scales.

  • i.e. from low-resolution to high-resolution.

E2E data-driven
  • end-to-end data-driven DL with interpretable mechanistic model.

  • DL procedures coerce output that is fine-tuned by mechanistic model.

  • layers are differentiable, so errors can be back-propagated to train them together.

Real-world
  • models that are deployed in real-world domains

  • it can be expensive, time/money, to generate high-quality annotated label data.

Toy dataset
  • This is different to a real-world dataset.

  • must suit DL methods, and benchmarks, from CHAI.

  • Task: fuzzy waterflow model, a standard task in CHAI.

Causal inference:
  • Causal inference as Bayesian inference

  • Sample sufficiency

  • can derive counterfactual explanations that mimic causal inference.

  • i.e. examples of almost every <diseases>

  • go from Bayesian association to causal inference

Contrastive learning
  • compare two patients

  • ask doctors if there was a difference in the decision

  • intuition: what if?… scenarios, long tail of edge cases, e.g. Tesla FSD.

  • outliers

  • catastrophic forgetting

Causality
  • Proximal <illegible> distal causality (work in progress).

Robots
  • Automated annotator

  • semi-supervised artificial learning

  • <illegible> training generator

Structured missingness
  • learning from data with structured missingness - published in Nature Machine Intelligence 2023.

  • e.g. bias towards minorities, access to healthcare, bias inherent to the dataset.

  • goal: can missingness itself be a signal for bias in large-scale health datasets?

Quantify uncertainty
  • Conformal predictions to quantify uncertainty.

  • CHAI application domains do not trust black-box models.

  • Doctors want to be the ones making the decisions.

  • AI-powered tools can assist, but not replace, doctors.

Conformal validation - a four-way data split (see the sketch after this list):
  1. training

  2. validation

  3. conformal validation

  4. test
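
A minimal sketch of split conformal prediction (my illustration in plain numpy): the calibration step (3) computes nonconformity scores whose quantile becomes the threshold for building prediction sets at test time.

    import numpy as np

    def conformal_threshold(cal_probs, cal_labels, alpha=0.1):
        # Score = 1 - probability assigned to the true class on calibration data.
        scores = 1.0 - cal_probs[np.arange(len(cal_labels)), cal_labels]
        n = len(scores)
        level = min(1.0, np.ceil((n + 1) * (1 - alpha)) / n)
        return np.quantile(scores, level, method="higher")

    def prediction_sets(test_probs, q):
        # Every class whose score is below the threshold is included:
        # small sets signal confidence, large sets flag uncertainty.
        return [np.where(1.0 - p <= q)[0] for p in test_probs]

With coverage level \(1 - \alpha\), the sets contain the true class roughly 90% of the time for \(\alpha = 0.1\), which is the kind of quantified-uncertainty guarantee clinicians can act on.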

Prostate cancer
  • Application: quantifying uncertainty in automated prostate cancer detection

DLSS
  • low-resolution image modality, DLSS \(\to\) high-resolution image modality, e.g. MRI.

  • foundational models: transformers, neural rendering, GANs, etc…

Projects:
  1. Responsible AI in social media and health care.

  2. Global Partnership on AI (GPAI)

  3. 3D shape analysis of Lithic Fragments (Maori stone tools)
    • AI: mechanistic models + deep learning

IEEE CIM - IEEE Computational Intelligence Magazine
  • Special issue on AI for health care.

  • Guest editor: Tapabrata Rohan Chakraborty

2023-11-18 - Weekly

Location: CO352, Time: Thursday, 10:00 - 10:30, Attendees: Bing Xue, Bach Hoai Nguyen, Mengjie Zhang, Daniel Killeen, Jesse Wood

I sent my supervisors a copy of my slides, and an Overleaf link, in advance so Bing can bring it up on her screen.

Layout:
  1. Problem statement

  2. What? Why?

  3. Motivations
    • Identification

    • Contamination

    • Traceability

  4. Marine biomass
    • Identification

    • Contamination

    • Traceability

  5. Machine Learning
    • Feature construction

    • Few-shot similarity-based contrastive learning

    • Outlier thresholding

  6. Research Goals

  7. Preliminary Work
    • EDA of REIMS

    • GP on GC dataset

  8. Contributions, Timeline, Thesis Outline

Notes:
  • Use PowerPoint, not LaTeX: easier, faster.

  • The application domain is difficult for AIML people to understand.

  • Explain new AIML concepts that the group is not familiar with.

TODO:
  • [x] Hide code/datasets from Github repository - industry project IP.

  • [x] Send supervisors the slides.

  • [x] Practise presentation.

2023-10-18 - Deep Learning

Location: Zoom, Time: Wednesday, 15:00 - 16:00, Attendees: Bastiaan Kleijn, Paul Teal, Marcus Frean, Hayden Dyne, Ali Knott, Jesse Wood

I gave a talk on Falcon-130B, a new state-of-the-art language model, released open-source to the public by the United Arab Emirates. https://falconllm.tii.ae/

  • Falcon-130B - a large language model (LLM) from the United Arab Emirates (UAE), in Abu Dhabi; an open-source LLM released under an Apache 2.0 License, on GitHub and Hugging Face.

  • Bastiaan discussed CLIP

  • Bastiaan discussed Stable Diffusion Extra Large (SDXL)

Related:

2023-10-19 - FASLIP

Location: CO350, Time: Thursday, 14:00 - 15:00, Attendees: Lin, Bing Xue, Mengjie Zhang, Hengzhe Zhang, Jesse Wood

Improving Buoy Detection with Deep Transfer Learning for Mussel Farm Buoyancy - a talk by Carl McMillan (mcmillcar@myvuw.ac.nz)

Background:
  • YOLO is a state-of-the-art object detection model for computer vision.

  • Each iteration of YOLO is an improvement on the previous iteration.

  • YOLO versions are often made by different authors, and are not always compatible with each other.

Dataset:
  • Mussel buoy dataset - 1,000+ images of mussel buoys in the Marlborough Sounds, New Zealand.

  • Human-supervised label annotation of buoy images.

  • Images are drawn from a video of a boat travelling through mussel farms in the Marlborough Sounds.

Method:
  • Manual label annotation of randomly selected video frames to generate a train/test dataset.

  • YOLOv7 pretrained on the MS-COCO dataset, then retrained on the mussel buoy dataset.

Experiment:
  • Compare different YOLO model sizes (tiny, v7) for buoy detection.

  • Ablation studies - with adverse weather conditions

Results:
  • YOLO-tiny vs YOLOv7 model size: does not affect detection accuracy much.

  • Ablation studies - adverse weather conditions affect results more than model size.

Limitations:
  • Label annotation is painfully slow.

  • Carl labelled 1,000+ images by himself.

  • YOLOv7, newer models now exist.

  • Adverse weather conditions.

Future work:
  • Weather invariant buoy detection via data augmentation

  • Super-resolution of buoy images.
    • E.g.
      1. Neural rendering to upscale. (NR)

      2. Deep Learning Super Sampling (DLSS)

2023-10-20 - ECRG

Location: CO350, Time: Friday, 15:00 - 17:00, Attendees: Yi Mei, Wang (new), Bing Xue, Mengjie Zhang

General notes:
  • Professor Wang is a visiting researcher from China, part of the China & New Zealand international research transfer programme.

  • NZ + China research transfer - they previously offered only 2 weeks for the research and collaboration visit, but it has now been extended “to be longer because it was too short”.

  • Hayden ran a survey of 200 different 100-level students.
    • 50% did not attempt.

    • 10% did not finish.

    • 40% finished.

Progress:
  • I submitted my proposal on Sunday at 2 AM.

  • I have started preparing my proposal seminar.

  • I plan to have 2-3 rehearsals before the seminar.

  • Revise the seminar based on supervisors’ feedback.

2023-11-01 - Health + AI Seminar

Location: Zoom, Time: Monday, 15:00 - 16:00, Attendees: Jesse Wood

Notes:
  • NZ Algorithm Hub: an open-source repository of existing AI algorithms that can be applied to existing domain applications in health. It explicitly provides code for easy use by scientists without technical expertise in computer science. The seminar also discussed the ethical implications of each algorithm, and what applications it is suitable for.

  • De-identification: releasing sensitive data to the public in an anonymous manner that doesn’t share private/personal/medical information with the world. Encrypted datasets?

  • Bias: open-source medical datasets often remove outliers/minorities, at the risk of sharing their personal information due to the unique combination of their medical conditions. But that means public datasets are biased towards majority populations, and are unfairly unrepresentative of minority populations in a community.

  • ChatBots: level-1 help desk support for medical inquiries; summarisation of clinical notes from PDF documents, in lay terms or in technical jargon for practitioners.

  • New Zealand country-specific datasets, rather than reliance on international models that don’t generalise to our niche demographic population.

  • Understanding data sovereignty, see Rosey Dobson. What do people, end-users from the public, expect medical professionals and the government to do with their health records? The privacy and consent of patients must be honoured.

  • Narrow AI applications where AI can outperform human domain experts in medical diagnosis. For example, predicting a patient’s ethnicity from an x-ray: unintended phrenology from unintended capabilities of AI models that humans can’t understand. Important to ensure a patient’s sensitive, private, immutable characteristics are not made publicly available or exploited by bad-faith actors.

  • AI doomers: doomers make medical AI applications difficult, e.g. risk in safety-critical domains.

  • Maori: data sovereignty, co-governance, equity, DEI, colonialism, …, etc.

Notes:
  • AI to detect and cure disease and improve health

  • Biology breakthrough: the decoding of the human genome. If you open the double helix you have a sequence of letters, e.g. ATCG. The book of life.

  • Gene is like a word in the book. Proteins are the machines of life.

  • Biology applications: gene and protein networks

  • Medical diagnosis application: NZ has the highest breast cancer rate in the world.

  • Tools for diagnosis:
    1. ultrasound

    2. blood

    3. plasma

    4. biological sample

  • Detection is a classification task, cancer or benign

  • Motivations: results need to be accurate, interpretable, and trustworthy.

  • Methods: morphological feature extraction.

  • Hierarchical clustering to see feature importance.

  • Machine learning techniques:
    1. KNN

    2. LDA

    3. NC

    4. MLNN/SOM

  • MLNN/SOM gives a highly accurate and parsimonious output that extracts meaningful features and provides interpretable results.

  • For AI applications in a safety critical domain of medicine the models need to be interpretable, explainable, and trustworthy.

  • Biology jargon: genomics and proteomics, differentially expressed genes, active genes.

  • Protein and gene networks show how active genes and proteins are in performing a mechanistic operation or function in the body.

  • Human genome: ~30,000 genes.

  • Each person has 7,000+ genes.

  • Feature selection: select a small set of discriminating genes, bi-biological filter with best-first-search with SVM.

  • Dimensionality reduction via spectral clustering for genomic datasets. Spectral clustering groups genes that perform similar functions. E.g. 17 clusters for the cancer dataset, 15 clusters for the benign dataset, with two clusters shared in common. Can reduce the dimensionality of genomic datasets by filtering for the clusters that are useful in differentiating between benign and malignant cancer samples: a binary classification task that is a downstream medical application of the feature selection / feature construction.

  • 94% accuracy with just 13 genes, down from 97% accuracy with the full dataset. A reasonable tradeoff between accuracy and dimensionality, for smaller, explainable, interpretable models that can be easily understood and trusted by domain experts in medicine.

  • Future-proof vaccine development. AI applications for drug discovery: mRNA and micro-RNA. Current application: tuberculosis vaccine drug discovery. Future work: effective methods for developing COVID vaccines.

  • Deep neural networks and explainable AI for early detection of Alzheimer’s. Chris Hemsworth, Marvel’s Thor, developed early-onset dementia. Alzheimer’s is incurable, and untreatable once it reaches its last stage. My grandmother passed away from Alzheimer’s several years ago.

Questions and answers:
  • Q1. Advice for data scientist or computer scientists?

  • A1. You don’t need a formal background in the application domain. You can have expertise in software engineering or artificial intelligence and learn enough domain knowledge in the application of medical biology to be a researcher in AI in a real-world application domain. Resources available: online courses, YouTube, talks, events. Interest in the application domain (e.g. I’m interested in fish 🐟) is crucial to real-world AI applications.

  • Q2. Inter-disciplinary collaboration, how difficult is it for medical professionals to work with data scientists? 🥼

  • A2. Most difficulty comes from the bureaucracy needed to get access to the data, and to keep patients’ private medical information private. Most research collaborations are positive: medical professionals want to trust models and be able to understand them easily.

  • Q3. Tangent on drug discovery for cancer.

  • A3. Evolutionary game theory with evolutionary arms race between the cancer and the treatment. Don’t want to develop treatment-resistant super-cancers.

2023-11-23 - Taran Proposal Seminar

Location: CO431, Time: Thursday, 10:00 - 10:45, Attendees: Taran, Mengjie Zhang, Aaron Chen, Andrew Lensen, Jordan MacLauchlan, Jesse Wood

2023-11-23 - Weekly

Location: CO352, Time: Thursday, 12:00 - 12:30, Attendees: Bing Xue, Bach Hoai Nguyen, Mengjie Zhang, Jesse Wood

Candidate Development Plan (CDP)
  • Last 6 months
    • Proposal submission

    • Preliminary work

    • Literature review

    • Reverse suspension (TODO)

  • Next 6 months:
    • Proposal seminar

    • First contribution - identification

    • Paper: CNN/ST-MT-GP on GC,MS,GC+MS data

  • Issues
    • Mum passed away (discussed last CDP)

    • Psychiatric ward - 19 day involuntary suspension

    • 6 month voluntary reverse suspension

    • Funding for last 6 months of my PhD.

Convolutional Neural Networks:
  • Allowed to report the best + mean run

  • But cannot claim the best run as a contribution

  • Need to show that the best run is
    1. statistically significant,

    2. reproducible

    3. generalizable

    4. not overfitting or underfitting

Datasets:
  • GC,MS,GC+MS data:
    • Mass Spectrometry (MS)

    • Gas Chromatography (GC)

    • Gas Chromatography (GC) + Mass Spectrometry (MS) = Gas Chromatography-Mass Spectrometry (GC-MS)

    • GC+MS is a multi-modal dataset.

    • GC and MS are single-modal datasets.

  • Evaluate these datasets with classification techniques.
    • Convolutional Neural Network (CNN)

    • Single-Tree/Multi-tree Genetic Programming (ST/MT-GP)

    • Multi-Feature Class-Independent Feature Construction (MFCIFC)

    • Linear Discriminant Analysis (LDA)

    • Support Vector Machines (SVM)

    • Random Forest (RF)

    • K-Nearest Neighbour (KNN)

    • Multi-layer Perceptron (MLP)

    • Naive Bayes (NB)

    • Logistic Regression (LR)

    • Decision Tree (DT)

  • A model that generalizes and performs well across all datasets is ideal.

  • Compare these datasets to other datasets in the literature

  • Evaluate using classification accuracy (see the sketch below).
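
A minimal sketch of that comparison loop (my illustration, assuming scikit-learn; X and y are hypothetical placeholders for one of the GC/MS feature matrices and its labels):

    import numpy as np
    from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
    from sklearn.ensemble import RandomForestClassifier
    from sklearn.linear_model import LogisticRegression
    from sklearn.model_selection import cross_val_score
    from sklearn.naive_bayes import GaussianNB
    from sklearn.neighbors import KNeighborsClassifier
    from sklearn.pipeline import make_pipeline
    from sklearn.preprocessing import StandardScaler
    from sklearn.svm import SVC
    from sklearn.tree import DecisionTreeClassifier

    models = {
        "LDA": LinearDiscriminantAnalysis(),
        "SVM": SVC(),
        "RF": RandomForestClassifier(random_state=0),
        "KNN": KNeighborsClassifier(),
        "NB": GaussianNB(),
        "LR": LogisticRegression(max_iter=1000),
        "DT": DecisionTreeClassifier(random_state=0),
    }

    X, y = np.random.rand(60, 100), np.random.randint(0, 4, 60)  # placeholders

    for name, model in models.items():
        pipe = make_pipeline(StandardScaler(), model)
        scores = cross_val_score(pipe, X, y, cv=5)  # mean over folds, not the best run
        print(f"{name}: {scores.mean():.3f} +/- {scores.std():.3f}")

Reporting the cross-validated mean (with deviation) rather than a single best run is what makes the comparison statistically defensible, per the notes above.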

Next meeting:
  • Today we met at 12 PM because of Taran’s proposal seminar.

Related:
TODO:
  1. [x] First contribution

  2. [x] Proposal seminar

  3. [ ] Reverse suspension

  4. [ ] Paper: CNN/ST-MT-GP on GC,MS,GC+MS data

2023-11-23 - FASLIP

Location: CO350, Time: Thursday, 14:00 - 15:00, Attendees: Mohamed Rimas, Hamish O’Keefe, Bing Xue, Mengjie Zhang, Hengzhe Zhang, Jesse Wood

Rimas gave a presentation on “Bloat Detection by Tree Replacement in Genetic Programming based Symbolic Regression” https://ecs.wgtn.ac.nz/Main/GradMohamadRimasMohamadAnfar

What is it?
  • Bloat detection of Genetic Programming (GP) trees

  • Application domain of ransomware detection

  • AJCAI 2023 paper accepted to the conference, as a 7-minute talk.

Methods:
  • Bloat detection - a simple if statement is used to detect bloating.

  • Wilcoxon statistical significance test, not the standard t-test.

  • Standard GP for symbolic regression

  • Train for 60 generations

  • Population size 100

  • Terminal set: input as arguments

  • Function set: arithmetic, trigonometric, logarithmic, exponential
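
A minimal sketch of such a GP configuration (my illustration, assuming the DEAP library; the protected operators are my own additions to keep evaluation safe):

    import math
    import operator
    from deap import gp

    def protected_div(a, b):
        return a / b if abs(b) > 1e-9 else 1.0

    def protected_log(a):
        return math.log(abs(a)) if abs(a) > 1e-9 else 0.0

    def protected_exp(a):
        return math.exp(min(a, 50.0))  # avoid overflow

    # Terminal set: the input as an argument; function set as in the talk.
    pset = gp.PrimitiveSet("MAIN", 1)
    pset.renameArguments(ARG0="x")
    pset.addPrimitive(operator.add, 2)   # arithmetic
    pset.addPrimitive(operator.sub, 2)
    pset.addPrimitive(operator.mul, 2)
    pset.addPrimitive(protected_div, 2)
    pset.addPrimitive(math.sin, 1)       # trigonometric
    pset.addPrimitive(math.cos, 1)
    pset.addPrimitive(protected_log, 1)  # logarithmic
    pset.addPrimitive(protected_exp, 1)  # exponential

The population of 100 would then be evolved for 60 generations with a standard evolutionary loop (e.g. DEAP’s eaSimple).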

Results:
  • Improves tree size

  • Reduces bloating

  • No effect on fitness

Why does it matter?
  • Real-world application of AI in cybersecurity.

  • Bloat detection is a common problem in GP.

  • Prevent real-world harm from cyber-attacks.

Issues:
  • Note: fitness is not incorporated into the tree replacement algorithm.

  • Tree Replacement algorithm does not improve fitness.

Future work:
  • Incorporate fitness into the objective function of the tree replacement algorithm.

Related:
  • (Wood 2022) I previously had a paper accepted to AJCAI 2022.

Hamish gave a presentation on “Instance Segmentation Techniques using Neural Networks for Assessment of Green-Lipped Mussels”

What is it?
  • Computer vision segmentation tasks

Methods:
  • YOLACT - You Only Look At CoefficienTs (expansion via GitHub Copilot)

  • CenterMask - anchor-free instance masking (expansion via GitHub Copilot)

Results:
  • Average Precision (AP)
    • YOLACT low

    • CenterMask high

  • Frame rate (FPS)
    • YOLACT high

    • CenterMask low

Why it matters?
  • Niche New Zealand Seafood Industry

  • Real-world application of AI

Novelty:
  • Using a standardized checkered board for measuring fish size.

Issues:
  • Parallax effect of the checkerboard used for measuring fish size.

  • As you get closer to the fish or board, the apparent size of the checkerboard changes.

  • Impossible to fix, or account for, without a 3D camera, or deep learning.

Future work:
  • Fix the parallax effect with
    • 3D camera, or

    • deep learning, e.g.
      • denoising autoencoder,

      • vision transformer,

      • diffusion,

      • …,

      • etc…

Related:
  • (Wood 2022) I previously had a paper accepted to AJCAI 2022.

TODO:
  • [x] Hamish - 2 slides elevator pitch

  • [x] Rimas - 2 slides elevator pitch

2023-11-27 - GNS Science tour

Location: GNS Science, Time: Monday, 10:00 - 11:00, Attendees: Jesse Wood, Colin <last-name>

Green Hydrogen By Colin <last-name>

Hydrogen
  • Hydrogen, odourless colourless gas.

  • first element on the periodic table.

  • most abundant element in the universe.

  • fuel of stars, and our sun.

Colours of hydrogen
  • gold, green, pink hydrogen

  • trapped, zero-CO2, nuclear

  • grey hydrogen, natural gas, steam methane reformation.

Chemistry
  • only chemistry slide: \(2H_2O \to 2H_2 + O_2\)

  • electrolysis is very inefficient, needs to be innovated upon.

  • Electro catalytic reaction

  • I lied, here’s more: \(2H_2 + O_2 \to \text{Energy} + 2H_2O\)

Hydrogen Car
  • Electrolysis + Fuel cell = Hydrogen car

  • yellow hydrogen; if we were using a solar panel, it would be green hydrogen.

  • two tanks: one makes hydrogen, the other makes water. Usually these would be separated into a hydrogen car and a fuel station; the single hydrogen car here is for demonstrative effect.

  • see video

Hindenburg
  • Hydrogen is flammable.

  • long-haul trucks - green hydrogen.

  • Toyota FCV.

  • next generation FCEV.

  • Auckland - first hydrogen bus.

  • America’s Cup chase boat powered by hydrogen.

Hydrogen conference
  • New Zealand Hydrogen Symposium

  • Starts on 31st Jan

Research and Development
  • Needs to be cheaper and easier to make.

  • Need to replace lost energy.

  • Rare earth metals, or transition metals.

  • Catalyst: Iridium, Platinum.

  • cheaper, more efficient, electrolysis.

  • only need catalyst at the surface.

  • any catalyst not at surface is wasted.

  • make a very thin layer of catalyst.

Hydrogen microgrids in remote communities
  • Hydrogen supported microgrids.

  • supports remote communities.

  • previously: a diesel-powered microgrid supported their energy needs.

  • now: a hydrogen-powered microgrid supports their energy needs.

  • 10 days of autonomy if solar is not available.

Ngahina Marae
  • 750 Marae in Aotearoa

  • previously, diesel-powered generator

  • now, slowly, transferred to hydrogen-powered generators

  • Ngahina Marae - the first case study

“computer scientist so not a real scientist” 😭🥼 - Jesse Wood

Composite magnetic materials development for applications in in-road charging systems By <unknown>

Charge onroad
  • charge cars parked or at traffic lights.

  • need EVs to reduce emissions.

Diffusion of innovation:
  1. certain sectors are keen

  2. certain sectors want it easy

  3. certain sectors are laggards

Materials engineering
  • Different from Tesla coils

  • Wireless charging, capacitance charger

  • rare earth metals, transition metals, iridium and platinum - very brittle, and break easily.

  • Composite material, magnetic concrete

  • materials engineering

  • Soft magnetic composite (SMC)
    • mechanically robust

    • shapeable

    • low losses

  • Multi-objective (MO) optimisation problem with multiple tradeoffs between
    1. permeability

    2. losses

    3. durability

    4. cost

  • modelling through simulation is iterative: make the material, model it, test IRL, compare to the model, improve the model, improve the composite.

  • New Zealand iron sand composition - common on North Island beaches

  • Ion Beam Analysis (IBA) - smashing particles, to test composite materials for MO Pareto-optimal solutions.

Manufacturing methods
  1. 3D printed moulds

  2. pressed composite

  3. Binder

Permeability
  • \(B = \mu I / r^2\)

  • in air, \(\mu = 1\)

  • iron sand, \(\mu = 4\)

  • composites, \(\mu = 100\)

  • tradeoff: permeability vs durability

Shape of water
  • spherical/elongated particles

  • spherical: spread equally across the circle, not a good packing; an Escher tiling problem

  • elongated has much better permeability

Magnetic losses
  • inefficient \(\to\) high loss

  • high loss \(\to\) heat transfer

  • 😔

  • don’t want to start a fire 🔥

Electric recharging
  • charging pads under roads

  • charge while moving

  • charging pads for EVs on left most lane

“The slow lane, would turn into, a fast lane” #IAmSpeed

Energy efficiency Research By Murray Energy Material Scientist, SSIF ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Efficiency
  • using less energy to perform the same task.

  • thermal energy focus
    1. increasing thermal conductivity

    2. decreasing thermal conductivity

Fridges
  • fridges - enhancing heat transfer

  • energy rating +0.5 to +1

  • improving heat transfer in refrigerating units

Controlling heat transfer
  • structures:
    1. flat

    2. nanostructures

    3. microstructure

    4. hierarchical

Anisotropic wetting surface
  • anisotropic wetting surface development

  • works in separation, but fails integration testing

  • still an active research area.

Knudsen effect
  • Knudsen effect \(\to\) gas conduction: when the mean free path reaches the average size of a pore, the pore stops conducting heat.

  • pores need to be as small as possible, to prevent above, and ensure heat transfer.

  • solid conduction
    1. heavy elements

    2. strong anharmonicity

    3. complex cell unit

Heat Transfer
  • heat is transferred through a material by vibrations in the lattice

  • anharmonicity - bonds don’t have harmonics

  • complex cell - complex structures in the smallest unit of the material, bonds going all over the place.

  • nano particles (NP)

  • nano composites (NC)

  • super lattices (SL)

  • NPs, SLs

Allen-Feldman
  • Allen-Feldman theory
    1. propagon - atom propagating as a wave

    2. diffuson - <unseen>

    3. locon - atom chilling (a localised mode)

Energy Harvesting By Martin Markwitz

  • 66% of energy is wasted into heat transfer.

  • Voyager, rocket ship, space probes.

  • rock of plutonium; thermoelectric heat-transfer tools to harvest the heat.

  • cars - heat transfer from the engine to the car battery.

  • [REDACTED]

  • commercially sensitive

  • [REDACTED]

  • conductor: metal

  • insulator: glass

  • semi-conductor, somewhere in between

  • the arrangement of atoms gives a material its special properties

  • glass is a particular arrangement of oxygen and silicon that gives you a transparent window 🪟

“You guys are trying to make Greek letters really big, without breaking the laws of physics, or the materials you are working with.” - Jesse Wood

The rock does float, but not for the right reason #LK-99
  • diamagnetism, not Meissner effect

  • repulsion, not superconduction

By Gupta

  • arrange carbon in different configurations to create diamond 💎

  • doping a material, implant nitrogen/phosphorus, make it conductive.

  • carbon-based materials for quantum computers 🥼🧑‍🔬⌨️🖥️🖱️🧑‍🔬🥼

  • tungsten carbide

“You’ve heard of alchemy?”

Gupta said

“Did you make gold?” 🪙

I replied

“We did better, we made platinum!”

Gupta finished

Ammonia
  • in our bodies, the 4th most abundant element is nitrogen.

  • half of that nitrogen in the body is from ammonia.

  • half of the planet depends on ammonia.

  • plants can naturally fix nitrogen into ammonia.

  • 200,000,000 tonnes of ammonia produce 500,000,000 tonnes of carbon dioxide.

  • thermodynamics: entropy always wants to increase the number of molecules.

  • synthesize ammonia at a farm when it is needed. Excess ammonia can be stored as a fuel. Ammonia contains hydrogen, and has much greater volumetric density than regular hydrogen.

  • it’s a better energy vector

  • electrocatalyst can produce ammonia in seconds, rather than years by traditional chemistry methods.

  • A surface where the field passes electricity, a catalyst speeds up a reaction, electrocatalyst uses electrons to do this.

  • 500 bar pressure and 299-degree temperature: in WWI, Germany (Haber & Bosch) was trying to synthesize ammonia, and the Allies tried to cut off the ammonia supply.

Career pathway talks By et al.

Study
  • masters then PhD

  • research assistant, lab, desk job

  • research scientist

  • reports, emails, days like this

Next
  • chemical engineering

  • university of Canterbury

  • didn’t want to get a job

  • offered job at the university

  • research assistant

  • post-doctorate

  • patent office for 6 months

  • learning about intellectual property

  • Open Polytechnic drip-fed degree in economics.

  • research consulting for (originally) coal, (now) energy and consulting

  • young, dumb, and…

Maori + Chemistry
  • te reo māori is a valuable skill in industry/research.

  • applying chemistry to Māori knowledge.

  • outreach & education

  • future work: study geomorphology

  • Taniwha, science and Māori spoken word, purakou, parallels for explaining volcanoes 🌋

They said

“technical role, you are an aspiring scientist, and you are overqualified.”

She said
“Don’t put me in a box!

Please reconsider my application.”

Then she got the job.

Next
  • earth sciences at university

  • tried consulting but was never hired

  • Campbell Scientific: instrumentation company

  • basically a sales job, e.g. the Campbell data logger

  • on the phones

  • weather station for solar monitoring

  • solar farms being built in eastern Canada

  • blew it up on site plugging it in wrong.

  • working on every project in Ontario.

“No one ever told me what to do!

I just did what I thought was best. I booked flight, booked hotels, worked onsite.” 👨‍🔧

“This is a new role, we don’t really know

Just go do things, and work it out!” 💪

General
  • Wellington is a small place 🤏🌏

  • not what you know, but, who you know

  • science is broken

  • create the job role you dream of working

Advice for young scientists:
  1. Networking is valuable

  2. Know your audience

  3. Less is more

  4. Be yourself

  5. Writing is important

Lab and facilities tour By et al.

Notes:
  • see photos

  • Van de Graaff accelerator

  • Hip replacement coating via particle sputtering

  • in a thin layer, angstroms thick (1 million times smoother than a table), of composite silver.

  • They didn’t have to be replaced afterwards. Just used once. But it worked.

  • argon - the Penning Effect; Ag is a heavy element, efficient mass transfer when separating water with electrolysis.

  • mass spectrometer - limited to 100 atomic units (photo with helmet in it)

  • MS data with the X axis showing atomic weights. Much easier to interpret and understand.

  • AI for chemistry analysis, writing grants, and drafting emails. It gets 80% of the job done, and requires 20% human intervention.

2023-11-30 - FASLIP

Location: CO350, Time: Thursday, 14:00 - 15:00, Attendees: Emrah, Bing Xue, Mengjie Zhang, Hengzhe Zhang, Jesse Wood

Emrah gave a talk on nuclei segmentation and mitosis detection with deep learning.

What is it?
  • Nuclei segmentation is a key part of cancer diagnosis.

Background
  • Histopathology - Hematoxylin and eosin (H&E) stain

  • H&E is one of the principal tissue stains used in histology.

  • It is the most widely used stain in medical diagnosis and is often the gold standard.

  • For example, when a pathologist looks at a biopsy of a suspected cancer, the histological section is likely to be stained with H&E.

Motivations:
  • Use deep learning to automate nuclei segmentation and mitosis detection.

Method:
  • Encoder-decoder architecture

  • generative model

  • U-Net (modified)
    1. Reducing the depth of the architecture to 3

    2. Applying same padding to each convolutional layer

    3. Dice pixel classification layer

Results:
Related:
  • Paper from author (Emrah 2022)

  • Dice pixel classification layer (Shaukat 2022)

  • MoNuSeg dataset (Kumar 2019)

2023-12-01 - Weekly

Location: CO352, Time: Friday, 14:00 - 15:00, Attendees: Bing Xue, Bach Hoai Nguyen, Jesse Wood

Motivations and goals - the key focus of the proposal is coherence between these parts:
  • problem statement

  • motivations

  • goals

Exemplar - look at Jordan’s proposal as an example of how to structure these sections.

Goals != contributions

2023-12-02 - Proposal Seminar

Location: CO431, Time: Friday, 10:00 - 11:00, Attendees: Jesse Wood, Mengjie Zhang, Bing Xue, Bach Hoai Nguyen, Daniel Killeen, Andrew Lensen

I gave my proposal seminar today.

Presentation notes:
  • open the following;
    1. fish + ai

    2. preliminary work

    3. literature review graph

  • room: CO431 at 12 PM

Questions:

Q1. Technical details of the methods proposed.
A1. CNN, MCIFC, RF, SVM, PCA-LDA \(\to\) Occam’s razor.

Q2. (Excluding fish) what is the contribution?
A2. Applications in a new domain: fish + AI.

Q3. What are the statistical methods in question?
A3. Grubbs’ test, outlier thresholding, oil detection in fish.

Q4. Apply existing paradigms to a new domain? Fish + AI.

Q5. Human-powered, AI-assisted decision making.

2023-12-07 - Weekly

Location: CO352, Time: Thursday, 10:00 - 10:30, Attendees: Bing Xue, Bach Hoai Nguyen, Mengjie Zhang, Jesse Wood

Proposal seminar results - Ali Knott will send me an email with my results, soon!

Post-graduate people:
  • Ali Knott is the post-graduate coordinator of AI + CS

  • Jennifer Ferreira is the post-graduate coordinator for SE

  • Neil Dodgeson is the Dean of the Faculty of Graduate Research (FGR)

  • Nokuthaba is the Associate Dean of the Faculty of Graduate Research (FGR)

I will apply to change my course code to AIML 694, when it exists.

Course codes:
  • 690 - standard AIML

  • 692 - engineering AIML (TBD)

  • 694 - applications AIML (TBD)

Progress:
  • Exploratory data analysis of REIMS.

  • Standard AIML algorithms applied to REIMS.

  • DT and RF perform best, near-perfect accuracy.

Future work:
  • Apply ST-GP, MT-GP, MCIFC to REIMS.

  • Further exploratory data analysis.

  • Limitations, Motivations, Goals
    1. Limitations - EDA finds technical limitations

    2. Motivations - find technical details to address the limitations

    3. Goals - formalize technical details into novel approaches.

  • An example of PSO - limitations, motivations, goals:
    1. Limitations - REIMS is a high-dimensional, noisy, few-shot dataset.

    2. Motivations - Global search, feature interactions, non-global local optima.

    3. Goals - PSO for feature selection, can perform global search and address these technical limitations.

  • Feature selection
    • Meng suggested the following

    • Apply my feature selection analysis from GC to REIMS dataset.

    • See how many features are needed for good accuracy (a sketch follows this list).
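
A minimal sketch of that feature-count analysis (my illustration, assuming scikit-learn; X and y are hypothetical placeholders for the REIMS feature matrix and labels): rank features once, then evaluate gradually larger subsets.

    import numpy as np
    from sklearn.ensemble import RandomForestClassifier
    from sklearn.feature_selection import mutual_info_classif
    from sklearn.model_selection import cross_val_score

    X, y = np.random.rand(80, 500), np.random.randint(0, 4, 80)  # placeholders

    # Rank features by mutual information with the class label.
    order = np.argsort(mutual_info_classif(X, y, random_state=0))[::-1]

    for k in [1, 2, 5, 10, 20, 50, 100]:
        subset = X[:, order[:k]]
        acc = cross_val_score(RandomForestClassifier(random_state=0), subset, y, cv=5)
        print(f"top-{k} features: {acc.mean():.3f}")

The elbow of the resulting curve indicates how few features are actually needed for good accuracy.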

TODO:
  • Chapter 2: Identification
    • [ ] Explore the performance of gradually adding more features, as done for the AJCAI paper

    • [ ] Analyse DT results, like the selection of the features, over multiple runs of DT

    • [ ] Discuss the findings of EDA, to explore what are the technical challenges in the data and discuss what possible techniques can be used to address each of the challenges

    • [ ] Run experiments using the GP methods to see the results

    • [ ] Bring hard copies of results/writing or send the pdf file before the meeting.

  • Future work
    • [ ] Vector-based approach? (A third possible GP method to compare)

  • Papers

2023-12-07 - FASLIP

Location: CO350, Time: Thursday, 14:00 - 15:00, Attendees: Xinming Xi, Bing Xue, Bach Hoai Nguyen, Mengjie Zhang, Hengzhe Zhang, Jesse Wood

Xinming Xi, from the Southern University of Science and Technology, China, gave a talk on “Evolving circuits using Genetic Programming”.

Background:
  • Memristor - “resistor with memory”.
    • Calculate + store \(\to\) same time.

    • an analog process, faster than digital

    • physical device, non-volatile, state-of-the-art electronic device for circuit design.

    • A single memristor emulates a synapse in the brain.

  • Evolved Hardware Research (EHW)

Motivations
  • Manual design of circuits is a time consuming, laborious, and intensive task.

  • Hence Evolved Hardware Research (EHW) is needed to automate the design of circuits.

Method
  • Analog circuit topology synthesis by means of evolutionary computation.

  • Candidate solutions provide both circuit topology and component values.

  • Circuit representations can be: (1) string-based, (2) tree-based, or (3) graph-based.

  • PMOS/NMOS are voltage-controlled devices.

  • One tree to represent the circuit topology, one tree per function, to represent the value, e.g. R, NMOS, PMOS.

  • Structure checks to prevent loops in the circuit topology, e.g. short circuits.

  • three types of structure checks:
    1. all external terminals

    2. devices are connected

    3. random generated terminals

2023-12-08 - ECRG

Location: CO352, Time: Friday, 14:00 - 15:00, Attendees: Ruwang Jiao, Bing Xue, Bach Hoai Nguyen, Mengjie Zhang, Jesse Wood

Ruwang Jiao gave a talk on “Evolutionary Multi-objective Feature Selection for Classification” https://ruwangjiao.github.io/

Multi-objective
  • a set of tradeoffs between different feature subsets to meet the different requirements of different stakeholders.

Feature selection
  • high-dimensional datasets: curse of dimensionality

Classification
  • a simple example is to imagine a machine learning model for binary classification of cats and dogs.

Feature relevance
  • for cat and dog classification

  • age is an irrelevant feature.

Feature redundancy
  • for cat and dog classification

  • tail shape, and tail height

  • shape may capture information needed for classification, making height redundant.

Challenges of feature selection:
  1. complex feature interactions

  2. large search space

  3. objectives conflict

Existing methods
  1. brute force

  2. sequential search

  3. stochastic search

“Population-based search is particularly suited to multi-objective optimization” (Xue 2015)

Weakly relevant features can become complementary: \(f_1 + f_2 = f_{new}\)

Related:

2023-12-14 - Weekly

Location: CO352, Time: Thursday, 10:00 - 10:30, Attendees: Bing Xue, Jesse Wood

Suspension form - resubmit my suspension form with a better explanation for my suspension.

Proposal - need to provide some notes on technical details of my proposal for next round of Candidate Development Plan (CDP).

Progress - Implemented a Convolutional Neural Network (CNN) for identification task, research objective 1, sub-objective 1 - fish species classification.

Feature selection - see how many features are needed for good results.

GAN - good idea to focus on GANs for outlier thresholding for contamination of fish. But I should understand them inside and out, and dedicate more than one day to researching them.

Email - send Meng, Bing, Bach (i.e. supervisors) an email with the minutes for this meeting summarized.

2023-12-14 - FASLIP

Qinyu Wang gave a talk on “Genetic Programming with Aggregate Channel Features for Flower Localization” https://ecs.wgtn.ac.nz/Main/GradQinyuWang

Background:
  • Object localization processes
    1. Edge detection

    2. Pattern recognition

    3. Saliency detection.

  • Object localization
    • GBBox, oBBox, cBBox

Dataset:
  • Flower images.

  • 102 categories of flower images

  • 40 images per class

  • 224x224 resolution

Method:
  • YOLOv8 pretrained on the COCO dataset, 100 epochs to fine-tune on the flower dataset.

  • Detection accuracy \(\frac{TP}{ TP + FN}\)

  • Aggregate channels: a GP tree aggregates the input channels together, rather than treating them as separate channels.

  • Idea: allow feature interactions between multiple channels in aggregations generated by genetic programs.

2024-02-01 - FASLIP

Hengzhe gave a talk on his EuroGP paper “Improving Generalization of Evolutionary Feature Construction with Minimal Complexity Knee Points in Regression” https://ecs.wgtn.ac.nz/Main/GradHengzheZhang

Background:
  • regression - this method works for regression, but can be applied to other tasks, such as classification.

  • feature construction - construct a new set of features to improve learning performance

Existing work (Regression):
  1. Kernel-based methods

  2. Neural networks

  3. Genetic programming

What is it?
  • feature construction - convert features into a new feature space where the data is linearly separable

  • multi-objective: balance accuracy and complexity, where:
    • accuracy - regression performance

    • complexity - depth of the tree

  • Pareto optimal - solutions on the Pareto front represent the best tradeoffs between the two objectives, accuracy and complexity respectively.

Existing work (knee-points):
  • Angle-based methods

  • Utility function

  • Distance to extreme line

Proposed work:
  • clustering - use a clustering algorithm to determine knee points.

  • cluster with the largest bend angle becomes a knee point

  • NSGA-II - non-dominated sorting genetic algorithm for environmental selection.

Philosophy:
  • Occam’s razor - the simplest solution is the best solution.

2024-02-01 - Weekly

Location: CO352, Time: Thursday, 15:30 - 16:00, Attendees: Bing Xue, Mengjie Zhang, Jesse Wood

Notes:
  • Medical reason for absence from work.

  • Move meeting to Thursday afternoon at 3:30 PM.

  • Minutes - send a copy of the minutes to my supervisors after each meeting.

  • 100% accuracy - investigate results to see if this is too good to be true.

What I’ve done?
  • Classification of fish species and part

  • Convolutional Neural Network (CNN) on fish species and part

  • Feature selection on fish species.

TODO:
  • [ ] Feature selection on fish part

  • [ ] Decision tree explainability (XAI)

  • [ ] Write a draft for a thesis chapter on this work

2024-02-02 - ECRG

Location: CO352, Time: Friday, 15:00 - 17:00, Attendees: Lin Yida (Benny), Bing Xue, Mengjie Zhang, Christian Raymond, Jesse Wood

Lin Yida (Benny) is a new student from Taiwan.

Notes:
  • He studied at the National Taiwan University of Science and Technology.

  • He gave a presentation on his work he did there.

Previous work:
  • Fixed network transport job shop scheduling (FNT-JSS)

  • Wrote code in C# for simulations.

Background:
  • Pruning Radiata Pine trees.

  • Branches take nutrition from the tree trunk.

  • Pruning branches to ensure tree trunk is strong.

  • Want strong trunks in pine trees for high quality wood.

Data:
  • Drone imagery with monocular/stereo camera

  • (Why not) LiDAR

Method:
  • Computer vision for branch detection.

  • Manually annotate labels for data.

  • Camera types:
    1. monocular depth estimation

    2. stereo depth estimation

    3. multi-view depth estimation

Why it matters?
  • Forestry is an incredibly dangerous and laborious task.

  • Automatically identifying trees suitable for pruning is a valuable task.

  • It can save time, money, and lives.

2024-02-08 - Weekly

Location: CO352, Time: Thursday, 15:30 - 16:00, Attendees: Bing Xue, Jesse Wood

Notes:
  • Bach is still on leave.

  • Mengjie was busy.

  • Need to write drafts for experiments that I have run.

  • Decision tree results are good, but need to be explained.

What I’ve done
  • [x] Feature selection on fish part

  • [x] Decision tree explainability (XAI) for 100% accuracy

  • [ ] Write a draft for a thesis chapter on this work

TODO:
  • [x] Explain the decision tree output in English to accompany the figure (see the sketch after this list).
    • [x] Gini

    • [x] Value

    • [x] Samples

    • [x] Class

  • [ ] Make decision tree figure more interpretable.
    • [ ] pure nodes have color

    • [ ] no color for internal nodes

  • [ ] Write drafts for experiments that I have run.
    • [ ] Feature selection

    • [ ] Classification

    • [ ] CNN

    • [x] Decision tree
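
A minimal sketch of reading those fields off a tree (my illustration, assuming scikit-learn; iris is a stand-in for the fish data): plot_tree annotates every node with exactly the gini, samples, value, and class fields to be explained, and filled=True colours pure nodes as in the TODO above.

    import matplotlib.pyplot as plt
    from sklearn.datasets import load_iris
    from sklearn.tree import DecisionTreeClassifier, plot_tree

    X, y = load_iris(return_X_y=True)
    tree = DecisionTreeClassifier(random_state=0).fit(X, y)

    # Each node shows: gini (impurity, 0 = pure), samples (instances reaching
    # the node), value (per-class counts), class (majority class at the node).
    plot_tree(tree, filled=True, class_names=["setosa", "versicolor", "virginica"])
    plt.show()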

2024-02-09 - ECRG

Location: CO352, Time: Friday, 15:00 - 17:00, Attendees: Tom Bensky, Bing Xue, Mengjie Zhang, Christian Raymond, Jesse Wood

Tom Bensky, from California Polytechnic State University, San Luis Obispo (Cal Poly SLO), gave a talk on “GP for class scheduling in a large physics department”. He is visiting ECRG for a couple of months. https://physics.calpoly.edu/tbensky

Background:
  • Tom is a physics professor

  • In November 2022, ChatGPT took him by surprise in academia. Impossible to assess students’ work; can we trust that a student’s work wasn’t done by ChatGPT?

  • 5x7 images for the MNIST dataset using Artificial Neural Networks (ANNs) and squashing functions, during the second AI winter.

  • He was surprised that not everyone in ECRG was studying LLMs.

AI Timeline:
  • 1950 - turing test

  • 1975 - AI winter

  • 1980 - AI boom, expert systems

  • 1988 - Second AI winter

  • 2012 - Deep Learning revolution

  • 2019 - AI boom, LLMs, Attention, Transformers

  • 2022 - ChatGPT.

What was it?
  • Class scheduling for a large physics department.

  • Implementing a Genetic Algorithm (GA) to create teaching schedules for ~50 professors and ~200 classes in a university physics department.

Method:
  • An Australian colleague told him to take “long binary strings and mix them up and your results get better” - the Genetic Algorithm.

2024-02-15 - FASLIP

Location: CO350, Time: Thursday, 14:00 - 15:00, Attendees: Carl McMillan, Bing Xue, Mengjie Zhang, Jesse Wood

Carl McMillan gave a talk on “Evolutionary deep learning for building a buoyancy detection alerting system for NZ mussel farms”

https://ecs.wgtn.ac.nz/Main/GradCarlMcMillan

Mussels:
  • Perna canaliculus, the New Zealand green-lipped mussel, also known as the New Zealand mussel, the greenshell mussel, kuku, and kutai, is a bivalve mollusc in the family Mytilidae (the true mussels). P. canaliculus has economic importance as a cultivated species in New Zealand. https://en.wikipedia.org/wiki/Perna_canaliculus

Motivation:
  • When mussels grow on floats, they weigh them down.

  • This can cause the floats to sink, or even implode, due to water pressure.

  • The detection system needs to run on a portable device.

Limitations:
  • Few training examples.

  • Data has to be collected from the field.

  • Label annotations need to be done manually.

Why it matters?
  • Planned $3 billion expansion of the aquaculture industry in New Zealand.

Method:
  • YOLO, R-CNN, others

  • transfer learning to improve performance.

  • Float buoyancy:
    • 0 - 10% \(\to\) very low

    • 10 - 30% \(\to\) low

    • 30 - 60% \(\to\) medium

    • 60 - 100% \(\to\) high

2024-02-15 - Weekly

Location: CO352, Time: Thursday, 15:30 - 16:00, Attendees: Bing Xue, Jesse Wood

Decision tree:
  • The original 1984 book on decision trees (CART) mentions several applications.

  • One such application is mass spectra analysis for toxic chemical detection.

  • See (Breiman 2017)

What I’ve done
  • Written up an explanation of the decision tree results, and explained the model output.

  • Fish species, 100% accuracy on test and train with decision tree

  • Fish part, 100% accuracy on whole dataset

  • Need to exclude the test dataset from training.

  • It is not a validation set, considering the whole GP process.

  • Instead, call it a “sub-training” or “fitness evaluation” set.

TODO:
  • [ ] Wrapper-based multi-tree GP on train, subtraining, and test dataset.

  • [ ] Try \(\alpha \in \{0, 0.5, 1\}\) for the GP methods.

  • [ ] Decision tree
    • [ ] Make decision tree figure more interpretable.

    • [ ] pure nodes have color

    • [ ] no color for internal nodes

  • [ ] Write drafts for experiments that I have run.
    • [x] Decision tree

    • [ ] Feature selection

    • [ ] Classification

    • [ ] CNN

2024-02-16 - ECRG

Location: CO352, Time: Friday, 15:00 - 17:00, Attendees: Bernhard Pfahringer, Bing Xue, Mengjie Zhang, Jesse Wood

Bernhard Pfahringer gave a talk on “All you need is patience & persistence” https://profiles.waikato.ac.nz/bernhard.pfahringer

Notes:
  • Phind v10 language model. Prompt: “Who is Bernhard Pfahringer?”

  • TAIAO project, headed by Waikato, on environmental data streams.

  • “Plausible, fluent outputs with no guarantee of correctness.” - language models

  • Timeline:
    • 1990s - bagging and boosting, e.g. he attended conferences where AdaBoost was presented.

    • 1997 - DeepBlue

    • 2000s - SVMs

    • 2012 - AlexNet

    • 2017 - Transformers

    • 2022 - ChatGPT

  • Impossible nowadays to submit a paper with instant acceptance.

  • Data leakage - an important problem in industry projects and datastreams.

  • For me: data leakage (see the sketch after this list)
    • in the fish dataset, ensure the same fish isn’t seen twice across train and test

    • fish have IDs in the training labels; group the split by fish ID.

  • XGBoost - try using XGBoost for my dataset.

  • SO-KNL was mentioned; Bernhard is an author on this paper.
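
A minimal sketch of a leakage-free split for the fish data, assuming scikit-learn (fish_id is a hypothetical array of per-sample fish identifiers):

    import numpy as np
    from sklearn.model_selection import GroupShuffleSplit

    X = np.random.rand(100, 4)               # placeholder features
    y = np.random.randint(0, 2, 100)         # placeholder labels
    fish_id = np.random.randint(0, 20, 100)  # hypothetical fish IDs

    splitter = GroupShuffleSplit(n_splits=1, test_size=0.3, random_state=0)
    train_idx, test_idx = next(splitter.split(X, y, groups=fish_id))
    # no fish appears on both sides of the split
    assert set(fish_id[train_idx]).isdisjoint(fish_id[test_idx])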

Related:
  • SO-KNL (Sun 2022); the speaker is an author, along with Heitor.

2024-02-22 - FASLIP

Location: CO350, Time: Thursday, 14:00 - 15:00, Attendees: Bing Xue, Mengjie Zhang, Jesse Wood

Visiting student from Nanjing University, Mingqian, gave a talk on “Differentiable Architecture Search in Image Classification”.

Task:
  • Identify AI-generated images with good generalization.

Methods:
  • Stable Diffusion, GANs, and other image generation methods.

  • Differentiable Architecture Search (DARTS)

  • Attention mechanism

Results:
  • Performs well on in-distribution data, i.e. images from other Stable Diffusion models.

  • Does not generalize well to out-of-distribution data, i.e. other image generation methods.

Related:
  • Attention mechanism (Vaswani 2017)

  • Differentiable Architecture Search (Liu 2018)

  • Denoising diffusion probabilistic models (DDPM) (Ho 2022) - the original DDPM paper.

  • Denoising diffusion implicit models (DDIM) (Song 2022)

2024-02-22 - Weekly

Location: CO352, Time: Thursday, 15:30 - 16:00, Attendees: Bing Xue, Mengjie Zhang, Jesse Wood

What I’ve done?
  • [x] Wrapper-based multi-tree GP on train, subtraining, and test dataset.

  • [1/2] Write drafts for experiments that I have run.
    • [x] Decision tree

  • [x] CNN with heavy regularization on fish parts.

  • [x] Mass Bank of North America (MoNA) - mass spectrometry dataset (potential for transfer learning)

Results:
  • Fish parts:
    • MT-GP 50% on test dataset - very poor.

    • SVM gets 70% on test dataset - better.

    • CNN with heavy regularization 80% on test dataset - very good.

Discussion:
  • Meng suggested adding a distance measure to the fitness function.

  • Such as intra-class distance and inter-class distance (a sketch follows this list).

  • This has been done before in many MT-GP papers.

  • It serves as a regularization technique to improve generalization.
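
A minimal sketch of such a fitness (plain numpy, hypothetical weighting): wrapper accuracy regularised by intra-class compactness and inter-class separation of the constructed features:

    import numpy as np

    def distance_fitness(Z, y, acc, alpha=0.5):
        """Z: constructed features (n, d); y: labels; acc: wrapper accuracy."""
        classes = np.unique(y)
        cents = np.array([Z[y == c].mean(axis=0) for c in classes])
        cent_of = dict(zip(classes, cents))
        # mean distance of samples to their own class centroid (compactness)
        intra = np.mean([np.linalg.norm(z - cent_of[c]) for z, c in zip(Z, y)])
        # mean pairwise distance between class centroids (separation)
        pair = np.linalg.norm(cents[:, None] - cents[None, :], axis=-1)
        inter = pair.sum() / (len(classes) * (len(classes) - 1))
        # higher accuracy, tighter classes, wider separation -> higher fitness
        return acc + alpha * (inter - intra)

    Z, y = np.random.rand(60, 3), np.repeat([0, 1, 2], 20)
    print(distance_fitness(Z, y, acc=0.8))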

TODO:
  • [ ] Wrapper-based multi-tree GP
    • [ ] Intra/inter Distance metric for regularizations

    • [ ] Try \(\alpha \in \{0, 0.5, 1\}\) for the GP methods.

  • [ ] Decision tree
    • [ ] Make decision tree figure more interpretable.

    • [ ] pure nodes have color

    • [ ] no color for internal nodes

  • [ ] Write drafts for experiments that I have run.
    • [ ] Feature selection

    • [ ] Classification

    • [ ] CNN

2024-02-23 - ECRG

Location: AM101, Time: Friday, 15:00 - 17:00, Attendees: Steven L. Jacques, Mengjie Zhang, Jesse Wood, Christian Raymond, Kaan Demir

Steven L. Jacques gave a talk on “Skin optics: the connection between intrinsic properties of skin and its appearance”.

Notes:
  • Measurements \(\to\) theory \(\to\) optical properties \(\to\) effects of light

  • Measurements \(\to\) theory \(\to\) optical properties \(\to\) biological parameters \(\to\) clinical decision.

  • “Big difference between observation and intrinsic properties”

  • Method: optical light spectroscopy

  • Absorption - escaping light reflectance.

  • Fat droplets in milk scatter light; with only a tiny bit of milk, the light still escapes.

  • Cloudy eye in dead fish, no water to maintain constructive interference.

  • Optical Coherence Tomography (OCT) image

  • Blue skin is an illusion: it is actually a melanin layer under collagen.

2024-02-29 - FASLIP

Location: CO350, Time: Thursday, 14:00 - 15:00, Attendees: Junjia Feng, Bing Xue, Mengjie Zhang, Jesse Wood

Junjia Feng gave a talk on “Multi-modal aspect oriented sentiment analysis”

Background:
  • Multi-modal - image and text

  • Most tweets before 2015 were text only. After 2015, more images were included.

  • Sentiment analysis - positive, negative, neutral

  • Aspect oriented - fine grained
    • Document level

    • Sentence level

    • Aspect level

  • e.g. “Today’s salad was delicious, but the steak was not fresh”

  • Aspects, salad \(\to\) positive, steak \(\to\) negative

  • Related works: TomBERT, CapBERT, BLIP-2

Related:
  • BLIP-2 (Li 2023) propose Blip-2 for multi-modal pre-training.

  • TomBERT (Yu 2019) propose TomBERT for multi-modal sentiment classification.

2024-02-29 - Weekly

Location: CO352, Time: Thursday, 15:30 - 16:00, Attendees: Bing Xue, Mengjie Zhang, Jesse Wood

Notes:
  • Excellent results for wrapper-based multi-tree GP.

  • Result: 100% train accuracy, 75% test accuracy, in 3/5 runs.

  • Tinker - tinker with hyperparameters to see if results can be improved.

  • Too good to be true - +25% is a massive improvement, check the code to ensure this is not a mistake.

  • Feature analysis - find ways to visualize the constructed features. E.g. GP trees.

2024-03-01 - ECRG

Location: CO350, Time: Friday, 15:00 - 17:00, Attendees: Xiaocheng, Jigang Fan, Bing Xue, Mengjie Zhang, Jesse Wood

Xiaocheng gave a talk on “traffic assignment via genetic programming”.

Notes:
  • traffic efficiency

  • Simply building more roads can backfire.

  • Real world challenges:
    1. dynamic travel time

    2. unknown delay function

    3. dynamic/uncertain arrivals

  • Dynamic travel time - if everyone takes the shortest path, it may not be the fastest, due to increased traffic.

  • Nash equilibrium: drivers are selfish and will take the fastest route for themselves, sacrificing the global optimum - optimal system performance.

Jigang Fan gave a talk on “intelligent fault diagnosis methods”

Notes:
  • Intelligent fault diagnosis (IFD)

  • Rotating bearings are key components of rotating machinery.

  • Data: photo, thermography, sensors.

  • Class imbalance - in the real world, fault conditions contain far fewer samples than normal conditions.

  • Imbalanced data approaches:
    1. misclassification cost

    2. data augmentation

    3. hybrid methods

  • Proposed method is FAWGAN-GN

  • Visualization - 2D projection of class separability with t-SNE for dimensionality reduction (a sketch follows).
    • Note: I should use this for my wrapper-based GP method.
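
A minimal sketch of that visualisation, assuming scikit-learn and matplotlib (the digits data is a placeholder for GP-constructed features):

    import matplotlib.pyplot as plt
    from sklearn.datasets import load_digits
    from sklearn.manifold import TSNE

    X, y = load_digits(return_X_y=True)  # placeholder for constructed features
    Z = TSNE(n_components=2, random_state=0).fit_transform(X)
    plt.scatter(Z[:, 0], Z[:, 1], c=y, s=5, cmap="tab10")
    plt.title("t-SNE projection of constructed features")
    plt.show()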

2024-03-14 - FASLIP

Location: CO350, Time: Thursday, 14:00 - 15:00, Attendees: Serafina Slevin, Bing Xue, Mengjie Zhang, Jesse Wood

Serafina Slevin gave a talk on “AI in Climate Change: Ice sheet modelling using machine learning”

Background:
  • The Antarctic Ice Sheet (AIS) has huge potential to increase sea level under changing climate conditions.

  • The total volume of the AIS would raise sea level by 55 - 60 meters.

  • Predicting climate change is hard. The current best way to do so is climate simulations.

Motivations:
  • Mimic climate simulations

  • Reduce computation

  • Interpretable models

Method:
  • inputs: x-axis, y-axis, precipitation, air temperature

  • outputs: ice thickness, ice velocity, ice mask

  • Random forest, Logistic Regression, Linear/Polynomial SVM

  • Sequential modelling - taking the predicted value of one target as a feature for predicting the next (a sketch follows this list).
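
A minimal sketch of the sequential-modelling idea, assuming scikit-learn (random arrays stand in for the climate inputs and ice targets):

    import numpy as np
    from sklearn.ensemble import RandomForestRegressor
    from sklearn.multioutput import RegressorChain

    X = np.random.rand(200, 4)  # x, y, precipitation, air temperature (placeholders)
    Y = np.random.rand(200, 2)  # ice thickness, ice velocity (placeholders)

    # target 0 is predicted first, then fed as a feature when predicting target 1
    chain = RegressorChain(RandomForestRegressor(n_estimators=50), order=[0, 1])
    chain.fit(X, Y)
    print(chain.predict(X[:3]))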

Results:
  • Random forest gets the best results on the training set.

Future work:
  • GP for feature construction that feeds into a random forest classifier.

  • CNNs and RNNs, neural networks suited for spatial and temporal data, respectively.

2024-03-14 - Weekly

Location: CO352, Time: Thursday, 15:30 - 16:00, Attendees: Bing Xue, Mengjie Zhang, Jesse Wood

What I have done?
  • Results are stochastic for wrapper-based GP feature construction

  • Froze the seed for the random forest, so fitness evaluation is consistent.

  • Feature selection with MRMR can get 69% accuracy.

Randomness:
  • same random seed \(\to\) same results

  • different random seed \(\to\) different results

  • Control the random seed to get 30 independent runs (a sketch follows this list).
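
A minimal sketch of the run protocol (run_experiment is a hypothetical stand-in for one GP run):

    import numpy as np

    def run_experiment(seed):
        rng = np.random.default_rng(seed)  # pass the seed into the GP toolkit too
        return rng.random()                # placeholder for one run's test accuracy

    results = [run_experiment(seed) for seed in range(30)]  # 30 independent runs
    print(np.mean(results), np.std(results))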

Send an email/pdf with:
  • Evolutionary process

  • Final tree

  • Table with results

Related:
  • Original MRMR paper (Ding 2005)

  • Uber’s MRMR paper (Zhao 2019)

2024-03-21 - FASLIP

Location: CO350, Time: Thursday, 14:00 - 15:00, Attendees: Qinyu Wang, Bing Xue, Mengjie Zhang, Jesse Wood

Qinyu Wang gave a talk on “Genetic Programming with Aggregate Channel Features for Flower Localization” https://ecs.wgtn.ac.nz/Main/GradQinyuWang

Background:
  • Our brains can process an image to recognize a flower in a fraction of a second.

  • Task: image localization

  • Object localization processes
    1. Smoothing

    2. Edge detection

    3. Pattern recognition

    4. Saliency detection.

  • Object localization
    • GBBox, oBBox, cBBox

Limitations:
  • sliding window search mechanism

  • manually detected channel features

Dataset:
  • Flower images.

  • 102 categories of flower images

  • 40 images per class

  • 224x224 resolution

Method:
  • Proposed method: Strongly-typed tree-based genetic programming (GP)

  • Benchmarks:
    • YOLOv8 pretrained on the COCO dataset, fine-tuned for 100 epochs on the flower dataset.

  • Fitness
    • Detection accuracy \(\frac{TP}{TP + FN}\)

    • Average IoU \(\frac{1}{N}\sum^{N}_{i=1} \mathrm{IoU}(\text{prediction}_i, \text{ground truth}_i)\) (a sketch of IoU follows this list)

  • Aggregate channels: a GP tree aggregates input channels together, rather than treating them as separate channels.

  • Idea: allow feature interactions between multiple channels in aggregations generated by genetic programs.
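
A minimal sketch of the IoU term above (hypothetical axis-aligned boxes given as (x1, y1, x2, y2) tuples):

    def iou(a, b):
        """Intersection-over-union of two axis-aligned boxes."""
        x1, y1 = max(a[0], b[0]), max(a[1], b[1])
        x2, y2 = min(a[2], b[2]), min(a[3], b[3])
        inter = max(0, x2 - x1) * max(0, y2 - y1)
        union = ((a[2] - a[0]) * (a[3] - a[1])
                 + (b[2] - b[0]) * (b[3] - b[1]) - inter)
        return inter / union if union else 0.0

    print(iou((0, 0, 2, 2), (1, 1, 3, 3)))  # 1/7 ≈ 0.143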

Results:
  • Outperforms YOLOv8 on the flower localization task.

  • Automatically extract and aggregate features for flower localization.

2024-03-22 - ECRG

Location: CO350, Time: Friday, 15:00 - 17:00, Attendees: Guenther Raidl, Jigang Fan, Bing Xue, Mengjie Zhang, Jesse Wood

Günther Raidl gave a talk on “Advances in Large Neighborhood Search for (Dynamic) Electric Dial-A-Ride Problems” https://www.ac.tuwien.ac.at/people/raidl/

Background:
  • Types of travel:
    • public transit services

    • taxi services

    • Dial-A-Ride (DAR) services

  • Dial-A-Ride problem (DARP) is NP-hard

  • Generalization of
    • Capacitated vehicle routing problem

    • Pickup and delivery problem

  • Electric Autonomous DARP
    • Vehicles have to be charged

    • Total travel time

    • Excess ride time

Related work:
  • (Bongiovanni et al. 2019, 2022, 2023)

  • Mixed integer linear programming

  • Deterministic annealing

  • Bilevel large neighborhood search

Method:
  • Large Neighborhood Search (LNS) (a sketch follows this list)
    • set of destroy operators

    • set of repair operators

  • Linear Programming (LP)

  • Surrogate evaluation function
    • Learning a surrogate evaluation function

    • During a supervised-learning offline phase.
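
A minimal sketch of the LNS loop (hypothetical destroy/repair operators and cost function, not the speaker's implementation):

    import random

    def lns(initial, destroy_ops, repair_ops, cost, iters=1000):
        """Repeatedly ruin part of the incumbent and rebuild it, keeping improvements."""
        best = incumbent = initial
        for _ in range(iters):
            partial = random.choice(destroy_ops)(incumbent)  # remove some requests
            candidate = random.choice(repair_ops)(partial)   # reinsert them
            if cost(candidate) < cost(incumbent):
                incumbent = candidate
                if cost(incumbent) < cost(best):
                    best = incumbent
        return best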

Related:

2024-03-28 - Weekly

Location: CO352, Time: Thursday, 15:30 - 16:00, Attendees: Bing Xue, Mengjie Zhang, Jesse Wood

Notes:
  • AIRA is a workshop, not a real peer-reviewed conference. False alarm for accepted paper.

  • Wrapper-based multi-tree GP gets 48.61% \(\pm\) 19.04% test accuracy on fish part dataset in 10 independent runs.

  • A Transformer-based architecture gets 99.1% \(\pm\) 1.66% test accuracy on the fish species dataset in 10 independent runs.

  • Next week I should bring a print out of my results to my meeting with Bach.

  • Bing and Meng will be away for the next couple of weeks.

  • SVM \(\to\) KNN, for wrapper-based multi-tree GP.

  • A simpler classifier is less likely to overfit the training set.

  • Use cross-validation for the fish part dataset, due to the limited number of samples (see the sketch below).

  • Ask Daniel if there is any more data available for the fish parts classification problem.
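
A minimal sketch of the suggested change, assuming scikit-learn (the wine data is a placeholder for the fish-part features):

    from sklearn.datasets import load_wine
    from sklearn.model_selection import cross_val_score
    from sklearn.neighbors import KNeighborsClassifier

    X, y = load_wine(return_X_y=True)  # placeholder for constructed features
    # KNN as a simpler, less overfitting-prone wrapper classifier, scored with CV
    scores = cross_val_score(KNeighborsClassifier(n_neighbors=3), X, y, cv=5)
    print(scores.mean(), scores.std())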

2024-04-04 - FASLIP

Location: CO350, Time: Thursday, 14:00 - 15:00, Attendees: Binke Xue, Bing Xue, Mengjie Zhang, Jesse Wood

Binke Xue gave a talk on “Ensemble learning based on neural networks for tree segmentation”

Background:
  • Industry partner: Manaaki Whenua - Landcare Research, Informatics

Dataset:
  • Validation: 66 images

  • 256x256 pixel colour images

  • 512 images total

Method:
  • PSPnet, U-net, LinkNet

  • Resnet-152, ResNeXt

  • Dice coefficient as the evaluation metric for image segmentation (a sketch follows this list).
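
A minimal sketch of the Dice coefficient (hypothetical binary masks as numpy arrays):

    import numpy as np

    def dice(pred, target, eps=1e-7):
        """Dice = 2|A ∩ B| / (|A| + |B|) for binary masks."""
        inter = np.logical_and(pred, target).sum()
        return (2 * inter + eps) / (pred.sum() + target.sum() + eps)

    a = np.array([[1, 1], [0, 0]], dtype=bool)
    b = np.array([[1, 0], [0, 0]], dtype=bool)
    print(dice(a, b))  # 2*1 / (2 + 1) ≈ 0.667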

Results:
  • The ensemble method based on 3x LinkNet and 1x U-net performed worse than a single model on validation.

  • However, the ensemble method performed best on the test set.

Related:

2024-04-05 - ECRG

Location: CO350, Time: Friday, 15:00 - 17:00, Attendees: Zhixing Huang, Jigang Fan, Bing Xue, Mengjie Zhang, Jesse Wood

Zhixing Huang gave a talk on Linear Genetic Programming (LGP) for Dynamic Job Shop Scheduling (DJSS). https://ecs.wgtn.ac.nz/Main/GradZhixingHuang

Background:
  • Linear genetic programming (LGP)

  • A sequence of register based instructions

  • Sequential execution of each instruction

  • Destination register, source register, function (a sketch follows this list)
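
A minimal sketch of how such a program executes (hypothetical register-machine encoding):

    import operator

    def run_lgp(program, registers):
        """Each instruction writes dest = func(src1, src2); register 0 is the output."""
        for dest, func, src1, src2 in program:
            registers[dest] = func(registers[src1], registers[src2])
        return registers[0]

    # r0 = r1 + r2; r0 = r0 * r3
    prog = [(0, operator.add, 1, 2), (0, operator.mul, 0, 3)]
    print(run_lgp(prog, [0.0, 1.0, 2.0, 3.0]))  # (1 + 2) * 3 = 9.0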

Differences from regular GP
  • Genotype - many programs

  • Phenotype - most essential programs

  • Semantics - program behaviours

  • Fitness function - objective function

Motivations:
  1. Flexible building-block reuse and compact representation

  2. Multi-output, multi-input

  3. Assembly language

Limitations:
  1. Limited fitness evaluation

  2. LGP mainly focuses on classification/regression problems

Method:
  • Multi-task Linear Genetic Programming (MTLGP)

  • Multi-representation Linear Genetic programming (MRLGP)

  • Grammar-guided Linear Genetic Programming (GGMGP)

Conclusions:
  • GP search is essentially a symbolic search technique.

2024-04-11 - FASLIP

Location: CO350, Time: Thursday, 14:00 - 15:00, Attendees: Bach Nguyen, Mengjie Zhang, Jesse Wood

We watched the talk from Xin Yao on “What can evolutionary computation do for you?” (again) https://www.youtube.com/watch?v=pcl08XA1REM

Related:

2024-04-11 - Weekly

Location: CO352, Time: Thursday, 15:30 - 16:00, Attendees: Bach Nguyen, Jesse Wood

Notes:
  • Random forest for wrapper-based GP with 6 constructed features.

  • Class dependent GP. For fish parts, 6 classes, 6 trees.

  • Bing has a paper on class-independent GP in the Pattern Recognition journal.

2024-04-18 - FASLIP

Location: CO350, Time: Thursday, 14:00 - 15:00, Attendees: Yuye Zhang, Bing Xue, Jesse Wood

Yuye Zhang gave a talk on “Multimodal Machine Learning (MML)”

Background:
  • Modality - refers to the way in which something is expressed or perceived.

  • Unimodal - refers to a problem or dataset with a single modality.

  • Multimodal - refers to a problem or dataset with multiple modalities.

  • MML - multimodal machine learning

  • Types of multimodal data:
    1. Multimedia for the same object.

    2. Same media type, different sensors.

    3. Information with different structure or representation.

  • Principles of multimodal data:
    1. Modalities are heterogenous

    2. Modalities are connected

    3. Modalities interact

Existing works:
  • Contrastive Language Image Pretraining (CLIP) (Radford 2021)

  • Dataset: WebImageText (WIT) - 400 million text-image pairs

  • Momentum contrast for unsupervised visual representation learning (He 2020)

  • BERT: Pre-training of deep bidirectional transformers for language understanding (Devlin 2018).

  • Language-driven semantic segmentation (Li 2022)

  • Deep multimodal multilinear fusion with high order polynomial pooling (Hou 2019)

2024-04-18 - Weekly

Location: CO352, Time: Thursday, 15:30 - 16:00, Attendees: Bing Xue, Jesse Wood

Notes:
  • Embedded GP for fish parts gets 75% test accuracy.

  • Try embedded GP for fish species

  • Bring results printed out to my next meeting.

  • Write a report about class-dependent multi-tree GP.

  • Send Bing a PDF of my slides for my proposal.

2024-05-02 - Weekly

Location: CO352, Time: Thursday, 15:30 - 16:00, Attendees: Mengjie Zhang, Bach Hoai Nguyen, Jesse Wood

Notes:
  • Explain the output of the attention map.

  • Troubleshoot loss curve for CNN.

  • GP for oil and cross-species contamination detection.

2024-05-09 - Weekly

Location: CO352, Time: Thursday, 15:30 - 16:00, Attendees: Mengjie Zhang, Bach Hoai Nguyen, Jesse Wood

Notes:
  • Oil detection - need to split into train, validation, and test sets. Can’t tune on the test set.

  • Run experiments 30 times for statistical significance.

  • Write thesis chapters while I am running experiments.

  • Plan the next week’s work in advance, run new ideas by supervisors first.

2024-05-16 - Weekly

Location: CO352, Time: Thursday, 15:30 - 16:00, Attendees: Mengjie Zhang, Jesse Wood

Notes:
  • Candidate development plan is due tomorrow.

  • Change “cross-species” \(\to\) adulterated

  • We have 4 datasets, same data, different labels.

  • Start writing up my experiments in a paper format.

  • Investigate SVM results, try different kernels.

  • Add Linear Discriminant Analysis (LDA)

  • Compare features selected for each of the interpretable methods.

TODO:
  • [x] Candidate development plan

  • [x] Try different SVM kernels.

  • [x] Linear Discriminant Analysis (LDA)

  • [ ] Compare features selected for each of the interpretable methods.

  • [ ] Start writing up my experiments in a paper format.

  • [ ] Change “cross-species” \(\to\) adulterated

  • [ ] Selected feature comparison.