
Technical Interview Questions

Source: Notion | Last edited: 2024-07-18 | ID: 54848fd4-37c...


Questions for the data science challenge; strengths and weaknesses of the methods used

Kulaphong Jitareerat: UBC, GPA 9.4. Thai; bachelor's in Thailand, top of his class; just finished his master's in 2024.

Prior experience: placed well in Kaggle competitions.

Master's project in collaboration with a startup: discuss in detail constructing linear and nonlinear machine learning models and visualizing predictions using Looker.

Q:Can you explain the core principles of XGBoost and how it differs from traditional gradient boosting algorithms? Additionally, can you discuss some practical scenarios where you have successfully applied XGBoost in your work, and what were the results?

  • Boosting: iteratively adds new trees that correct the errors of the previous trees.
  • Objective function: consists of a training loss and a regularization term; the regularization term prevents overfitting.
  • Second-order derivatives: traditional gradient boosting uses only first-order gradients, while XGBoost uses both first- and second-order derivatives to optimize the model, resulting in faster and more stable convergence.
  • Pruning: traditional gradient boosting uses pre-pruning; XGBoost uses both pre- and post-pruning.

Q: If our dataset is imbalanced, what could you do?

Adjusting sample weights and applying regularization can significantly improve model robustness on imbalanced data.

Strengths: handles missing data, high-dimensional features, and imbalanced datasets.
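
A minimal sketch of the weight-based handling mentioned above, using only numpy; the class counts are illustrative. XGBoost's `scale_pos_weight` parameter is conventionally set to the negative/positive count ratio, and the equivalent per-sample weights work with any API that accepts `sample_weight`:

```python
import numpy as np

# Toy imbalanced labels: 90 negatives, 10 positives (illustrative only).
y = np.array([0] * 90 + [1] * 10)

# scale_pos_weight is commonly set to (negative count / positive count),
# so the minority class contributes comparably to the loss.
n_neg = int((y == 0).sum())
n_pos = int((y == 1).sum())
scale_pos_weight = n_neg / n_pos  # 9.0 here

# Equivalent per-sample weights for APIs that accept sample_weight instead.
sample_weight = np.where(y == 1, scale_pos_weight, 1.0)

print(scale_pos_weight)      # 9.0
print(sample_weight.sum())   # 90*1 + 10*9 = 180.0
```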

How did you approach feature engineering? (Do you know the drawbacks of XGBoost? For example, it requires careful upfront feature engineering. I saw some effort, such as adding lag and momentum features; why didn't you consider adding more?)

Other features worth considering include:

  • Lag features: capture temporal dependencies in time-series data.
  • Rolling mean features: smooth the data and capture long-term trends.
  • Rolling standard deviation features: capture the volatility of the data.
  • Rolling min and max features: capture the range and extremes of the data.
  • Momentum features: capture the speed of change in the data.
  • Seasonal features: capture periodic patterns in the time series.
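
The time-series features listed above can be sketched in a few lines of pandas; the series here is synthetic and the window/lag sizes are arbitrary choices for illustration:

```python
import pandas as pd

# Illustrative price series (synthetic: 1, 2, ..., 20).
df = pd.DataFrame({"price": range(1, 21)})

# Lag feature: value k steps back.
df["lag_3"] = df["price"].shift(3)
# Rolling mean / std: smoothing and volatility over a window.
df["roll_mean_5"] = df["price"].rolling(5).mean()
df["roll_std_5"] = df["price"].rolling(5).std()
# Rolling min / max: local range and extremes.
df["roll_min_5"] = df["price"].rolling(5).min()
df["roll_max_5"] = df["price"].rolling(5).max()
# Momentum: change over k steps (rate of change).
df["momentum_3"] = df["price"] - df["price"].shift(3)

print(df.tail(1).to_dict("records")[0])
```

In practice the early rows contain NaNs (not enough history for the window), which XGBoost can handle natively.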

Physicist

What are the advantages and drawbacks of the mean-variance approach?

  1. Could you share what is your thinking process when you were given the challenge problem?
  2. Explain the concept of the efficient frontier in portfolio optimization to someone with a non-financial background. How did you apply this concept in the challenge project?
  3. If a model is underfitting, what can you do? What about overfitting?
  4. How to know if a model is underfitting or overfitting?
  5. Explain the difference between supervised and unsupervised learning.
  6. Can you describe a machine learning project where you faced challenges due to your initial lack of familiarity, and how you overcame them?
  7. Can you walk us through how you approached learning the financial concepts needed for the challenge project? What resources did you use?
  8. Describe a complex problem you solved during your PhD or postdoctoral work. How did you approach it, and what was the outcome?
  9. How do you handle situations where your initial approach to solving a problem doesn’t work? Can you give an example?
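
For question 2, the efficient frontier can be illustrated with a small numpy simulation; the expected returns and covariance matrix below are hypothetical numbers, not market data:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical annualized expected returns and covariance for 3 assets.
mu = np.array([0.05, 0.08, 0.12])
cov = np.array([[0.04, 0.01, 0.00],
                [0.01, 0.09, 0.02],
                [0.00, 0.02, 0.16]])

# Sample random long-only portfolios (weights sum to 1) and record
# each portfolio's expected return and volatility.
w = rng.dirichlet(np.ones(3), size=5000)
rets = w @ mu
vols = np.sqrt(np.einsum("ij,jk,ik->i", w, cov, w))

# The efficient frontier is the upper-left edge of this cloud:
# for each volatility level, the highest achievable expected return.
sharpe_like = rets / vols  # ratio with zero risk-free rate, for illustration
i = sharpe_like.argmax()
print(round(rets[i], 4), round(vols[i], 4))
```

Plotting `vols` against `rets` makes the frontier visible as the boundary of the scatter, which is a useful way to explain the concept to a non-financial audience.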

PhD in computer science; bachelor's and master's both completed in Iran at fairly unremarkable schools (the bachelor's was even at a technical college); main project experience is in classification.

About submission

q: You split the data into train and test sets. Why do it this way? What problems could arise?

q: You set the number of epochs directly to 10. Why? (No early stopping was used.)

q: I noticed that you have used Optuna before for hyperparameter tuning. What do you think about hyperparameter tuning? Have you compared different packages that do HT?

For straightforward tasks, Scikit-learn’s GridSearchCV or RandomizedSearchCV may suffice. For more complex and large-scale problems, Hyperopt, Optuna, Ray Tune, or Bayesian Optimization offer more sophisticated and efficient solutions.
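
As a library-agnostic illustration of what these tuners do under the hood, here is a minimal random-search loop in plain Python; the objective function is a stand-in for a real cross-validation score, and the search space is made up for the example:

```python
import random

random.seed(0)

# Toy objective standing in for a cross-validated loss; in practice this
# would train and score a model for the given hyperparameters.
def objective(lr, n_trees):
    return (lr - 0.1) ** 2 + (n_trees - 200) ** 2 / 1e5

# Random search: sample configurations from the search space, keep the best.
best_cfg, best_loss = None, float("inf")
for _ in range(200):
    cfg = {"lr": random.uniform(0.001, 0.5),
           "n_trees": random.randint(50, 500)}
    loss = objective(**cfg)
    if loss < best_loss:
        best_cfg, best_loss = cfg, loss

print(best_cfg, round(best_loss, 5))
```

Grid search enumerates the space exhaustively, while Bayesian tools like Optuna or Hyperopt use the history of trials to propose more promising configurations; random search is the simple baseline both are compared against.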

q :What is the working principle of Long Short-Term Memory, and what problems does it solve?

LSTM (Long Short-Term Memory) networks are a type of recurrent neural network (RNN) designed to handle long-term dependencies.

  • LSTMs have a memory cell that can maintain information in memory for long periods of time.

  • The architecture includes gates (input gate, forget gate, output gate) that control the flow of information in and out of the memory cell.

Problems Solved:

  • LSTMs address the vanishing gradient problem common in standard RNNs, allowing them to learn long-term dependencies and sequences more effectively.

Bachelor's in India, master's at the University of Ottawa. Undergraduate internships were only 2 months each; after the bachelor's, wrote Node.js for half a year; during the master's did 2 internships, one of 3 months and one of 7 months.

  1. Has done NLP:
  • Tokenization:
    • Question: What is tokenization in NLP, and why is it important?
    • Expected Answer: Tokenization is the process of breaking down a text into smaller units called tokens, which can be words, subwords, or characters. It is important because it transforms raw text into a structured format that can be processed by machine learning models. Proper tokenization is crucial for the accuracy of NLP tasks such as parsing, sentiment analysis, and translation.
  • Word Embeddings:
    • Question: Explain the concept of word embeddings and name some commonly used word embedding techniques.
    • Expected Answer: Word embeddings are dense vector representations of words that capture their meanings and relationships based on context. Common techniques include Word2Vec, GloVe (Global Vectors for Word Representation), and FastText. These embeddings help in capturing semantic similarity and improving the performance of NLP models.
  2. Has done time series:
  • Stationarity:
    • Question: What is stationarity in time series analysis, and why is it important?
    • Expected Answer: A time series is stationary if its statistical properties, such as mean and variance, remain constant over time. Stationarity is important because many time series forecasting methods, including ARIMA, assume that the data is stationary. Non-stationary data can lead to unreliable and inaccurate models.
  • Autocorrelation Function (ACF) and Partial Autocorrelation Function (PACF):
    • Question: What are the Autocorrelation Function (ACF) and Partial Autocorrelation Function (PACF) in time series analysis? How are they used?
    • Expected Answer: ACF measures the correlation between the time series and its lagged values, helping to identify the overall correlation structure. PACF measures the correlation between the time series and its lagged values, controlling for the values of the intermediate lags. ACF and PACF are used to identify the order of AR and MA components in ARIMA models.
  3. Classical SARIMA: Describe a project where you applied SARIMA for time series forecasting. What challenges did you face, and how did you address them?
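
A sample ACF, as discussed in the time-series questions above, can be computed directly with numpy; the AR(1)-style series below is synthetic, chosen so the autocorrelation visibly decays with lag:

```python
import numpy as np

def acf(x, max_lag):
    """Sample autocorrelation of series x at lags 0..max_lag."""
    x = np.asarray(x, dtype=float)
    x = x - x.mean()
    denom = np.dot(x, x)
    return np.array([np.dot(x[:len(x) - k], x[k:]) / denom
                     for k in range(max_lag + 1)])

# AR(1)-like toy series: strong correlation at lag 1 that decays with lag.
rng = np.random.default_rng(0)
x = np.zeros(500)
for t in range(1, 500):
    x[t] = 0.8 * x[t - 1] + rng.normal()

r = acf(x, 3)
print(np.round(r, 2))  # r[0] is 1.0 by construction; r[1] is near 0.8
```

For an AR(1) process, the ACF decays geometrically while the PACF cuts off after lag 1, which is exactly the signature used to choose ARIMA orders.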

4. Research

q: Give an example showing how you integrated state-of-the-art techniques from research papers, and how you conducted data analysis and anomaly detection resulting in outstanding predictive performance and a commendable RMSE score.

  1. Pre-trained Model: Start with a large pre-trained model (e.g., GPT-3, BERT).
  2. Quantization: Apply quantization to reduce the precision of the model weights.
  3. Low-Rank Adaptation: Use low-rank adaptation to update only a small set of parameters.
  4. Fine-Tuning: Fine-tune the model on the target dataset using the QLoRA approach.
  5. Deployment: Deploy the fine-tuned model with reduced memory and computational requirements.
Key components:

  1. Quantization:
  • Definition: Quantization refers to the process of reducing the precision of the numbers used to represent a model’s weights. This reduces the memory footprint and can improve computational efficiency.
  • Application in QLoRA: QLoRA quantizes the frozen pre-trained model weights to low precision (4-bit in the original QLoRA work, instead of 16- or 32-bit floating point).
  2. Low-Rank Adaptation:
  • Definition: Low-Rank Adaptation (LoRA) involves decomposing a large matrix into two smaller matrices that approximate the original matrix. This reduces the number of parameters that need to be updated during fine-tuning.
  • Application in QLoRA: By combining low-rank adaptation with quantization, QLoRA updates a small number of parameters (low-rank matrices) while keeping the majority of the model’s parameters in a quantized form. This drastically reduces the number of parameters that need to be fine-tuned, saving memory and computational resources.
Benefits:

  1. Reduced Memory Usage: By quantizing the model weights, QLoRA significantly reduces the memory footprint of the model, making it feasible to fine-tune large models on hardware with limited memory.
  2. Efficient Fine-Tuning: The combination of quantization and low-rank adaptation allows for efficient fine-tuning, as only a small fraction of the model’s parameters need to be updated.
  3. Scalability: QLoRA can be applied to very large models, enabling fine-tuning of models that would otherwise be too resource-intensive to adapt.
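
The parameter-saving idea behind LoRA can be sketched with plain numpy; the dimensions and rank below are arbitrary, and real QLoRA additionally stores the frozen weights in quantized form (via the PEFT/bitsandbytes stack), which this sketch omits:

```python
import numpy as np

rng = np.random.default_rng(0)

d, k, r = 512, 512, 8  # original weight shape and a small LoRA rank

W = rng.normal(size=(d, k))          # frozen pre-trained weight (quantized in QLoRA)
A = rng.normal(size=(d, r)) * 0.01   # trainable low-rank factor
B = np.zeros((r, k))                 # B starts at zero, so the update starts at zero

# Effective weight during fine-tuning: only A and B are ever updated.
W_eff = W + A @ B

full_params = W.size
lora_params = A.size + B.size
print(lora_params, full_params, round(lora_params / full_params, 4))
# 8192 trainable vs 262144 frozen parameters, about 3.1%
```

The rank `r` controls the trade-off: a larger rank gives the adapter more capacity but erodes the memory savings.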

Thai, relatively older (40). Bachelor's in engineering; at 24 went to the University of Southern California for a master's in electrical engineering; at 34 did a master's in data science at UBC.

Q: Gathered business requirements from users, conducted data analysis, defined project objectives, established key product metrics, developed machine learning solutions, synthesized insights and outcomes, validated and monitored key performance indicators, drove continuous improvement and sustainability, and led a team of data scientists.

In brief: what was the project objective, what were the key metrics, what was the machine learning solution, and how did you drive continuous improvement?

Q : Developed large‑scale price and demand forecasting models on non‑stationary financial indexes including Brent, WTI, Naphtha, resulting in savings of $1.75M/year. Applied feature engineering and feature selection techniques to identify previously unnoticed leading features, and developed an explanation system pivotal for driving business decisions.

Could you share some highlights of this project?

Q: Mixed integer non-linear programming (MINLP) could be used for optimal portfolio allocation, and you have experience using it. What do you think about this method?

Q: You have many years of work experience and are familiar with many technologies. For each (or some) of these technologies, what are your key takeaways? For example: what are the key problems, how do you use it well, and when does it not work well?

The candidate profile we are looking for

  1. Calm and meticulous personality; careful in research; someone we can rely on.

  2. Strong data science skills; familiar with neural networks, traditional machine learning algorithms, and statistics.

  3. How to assess: briefly explain the core principles of common algorithms, e.g., LSTM, CNN, RNN, Transformer, reinforcement learning, Bayesian methods.

  4. Strong research ability; can quickly grasp novel ideas in papers and implement them.

  5. Strong communication skills; concise, good at summarizing, reduces communication cost.

  6. Strong initiative at work; thinks independently about what we should do and offers meaningful ideas.

  7. Technical Interview

  8. Past Experience, and Behavioural Questions

  9. Compensation Expectations, current salary?

  10. Any questions for us?

  1. Why did you choose this technique or method? What are its strengths and weaknesses? Did you consider alternatives?
  2. Have you had a project where you repeatedly reflected and refined your work to make the results better and better? If so, can you describe it? If not, why not?
  3. How did your previous boss evaluate you? Would you mind sharing your boss's contact information?
  4. We will discuss your challenge submission and explore your thought process, problem-solving skills, and technical expertise. Additionally, we would like to learn more about your past projects, including your contributions to these projects and feedback from your previous colleagues and supervisors.
  5. If I were an undergraduate, how would you explain LSTM, XGBoost, and statistical methods?
  1. Intro
  2. Small talk: Where are you currently located? How is the weather?
  3. Thanks for completing the coding challenge
  4. Schedule: the interview will have two parts: (1) past experience and projects, as well as reference checks; (2) technical discussion about the coding challenge solution and some machine learning questions.

Part 2 Scenario Questions: Assess meticulousness, calmness, and communication skills through specific scenarios.


Strong Proactivity, Provides Meaningful Ideas Rather Than Waiting for Tasks

  1. Reflecting on your career experiences, what is the technical accomplishment you are most proud of? Could you share the details and why it means so much to you?

  2. **Have you ever had an experience where you repeatedly thought about and improved a project to achieve better results? If so, can you describe it?**

  • Assessment Point: Evaluate the candidate’s proactivity and continuous improvement ability.
  1. As part of our hiring process, we conduct background checks. Would you be comfortable with us reaching out to your current or previous managers/supervisors? If you prefer that we do not contact your current manager, we can instead reach out to your previous ones. Nate: Digital Manager, Mr. Sayan Chindaprasert, managing multiple teams.

**Regarding your current/previous job, what is your supervisor’s name?**

JAVAD

Tom Chau, Present Lab / Hospital? - Strengths: perseverance, works hard, excited about the work, 15-16 hours; Weakness: EG Signals (lack of domain knowledge?), could be more productive.

Nate: Strengths: analytical thinking, problem solving, goes the extra mile. Weakness: market intelligence team; could know more about business knowledge.

First job: boss was a role model. Prioritization, time management.

Personal:

Strengths: eager to learn, adaptability

Weakness: compassionate to colleagues, offers help to colleagues

Dan Fuller, MUN, U of Saskatchewan - Strengths: works hard; Weakness: domain knowledge, learning curve.

Harsh

Rafael Diniz, Strengths: Good knowledge of ML, problem solving skills; Weaknesses: communication skill, business

Jeffrey Xu: Strengths: quick learner; Weaknesses:

**How do you spell it? How would your direct supervisor describe your strengths and weaknesses?**

Current host: Matteo Fasiello. Strengths: independence, persistence. Weakness: invests too much in details, which slows him down; overthinks; relies on other people.

  • Assessment Point: Evaluate the candidate’s work performance and credibility through third-party feedback.
  • Question 8: How do you proactively identify and solve problems in your work? Can you provide a specific example?
    • Assessment Point: Evaluate whether the candidate can proactively identify and solve problems rather than passively waiting for tasks.

Strong Research Skills, Quickly Understand and Implement New Concepts from Papers

  1. Can you briefly introduce a recent academic paper you have read, explain how you understood the new methods in it, and whether you have tried to implement any of the algorithms?
  • Assessment Point: Evaluate the candidate’s ability to learn and apply new knowledge, and their interest and capability in research. Examples: principal component pursuit, principal component analysis.

Strong Communication Skills, Concise and Clear, Good at Summarizing

  1. Can you briefly explain the main content of your current work and how you communicate effectively with other team members?
  • Assessment Point: Evaluate the candidate’s ability to express themselves and summarize key points clearly.
  1. How do you communicate with team members from different backgrounds (e.g., developers, product managers) in a project? How do you ensure the efficiency of communication?
  • Assessment Point: Assess whether the candidate can communicate clearly and effectively with other team members to reduce communication costs.
  1. Can you describe a research project you have been involved in and explain how you ensured its accuracy and thoroughness?
  • Assessment Point: By describing the project’s details and specific steps, evaluate whether the candidate demonstrates attention to detail in their work.
  1. Have you ever encountered situations in your career where you needed to repeatedly verify your code or validate data? How did you handle it?
  • Assessment Point: Evaluate the candidate’s ability to meticulously check data and research results.

Part 2: Technical Questions: Assess understanding of algorithms, research ability, and practical skills.


Can you briefly explain the core principles of the following algorithms: LSTM, CNN, RNN, Transformer, Reinforcement Learning, and Bayesian methods?

LSTM: What is the working principle of Long Short-Term Memory, and what problems does it solve?


Working Principle:

  • LSTM (Long Short-Term Memory) networks are a type of recurrent neural network (RNN) designed to handle long-term dependencies.
    • LSTMs have a memory cell that can maintain information in memory for long periods of time.

    • The architecture includes gates (input gate, forget gate, output gate) that control the flow of information in and out of the memory cell.

Problems Solved:

    • LSTMs address the vanishing gradient problem common in standard RNNs, allowing them to learn long-term dependencies and sequences more effectively.
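
A single LSTM step, as described above, can be written out in numpy to make the gate structure concrete; the weights are randomly initialized and the sequence is synthetic:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x, h, c, params):
    """One LSTM time step: gates decide what to forget, write, and expose."""
    Wf, Wi, Wo, Wg, bf, bi, bo, bg = params
    z = np.concatenate([h, x])
    f = sigmoid(Wf @ z + bf)   # forget gate: what to keep from the old cell state
    i = sigmoid(Wi @ z + bi)   # input gate: what new information to write
    o = sigmoid(Wo @ z + bo)   # output gate: what to expose as the hidden state
    g = np.tanh(Wg @ z + bg)   # candidate cell update
    c_new = f * c + i * g      # additive update eases gradient flow through time
    h_new = o * np.tanh(c_new)
    return h_new, c_new

rng = np.random.default_rng(0)
n_in, n_hid = 3, 4
params = tuple(rng.normal(scale=0.1, size=(n_hid, n_hid + n_in)) for _ in range(4)) \
         + tuple(np.zeros(n_hid) for _ in range(4))
h, c = np.zeros(n_hid), np.zeros(n_hid)
for x in rng.normal(size=(5, n_in)):   # run over a short synthetic sequence
    h, c = lstm_step(x, h, c, params)
print(h.shape, c.shape)
```

The additive form of `c_new` is the key point for interviews: gradients can flow through the cell state largely unattenuated, which is what mitigates the vanishing gradient problem.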

CNN: What is the role of convolution and pooling operations in Convolutional Neural Networks?

**Role of Convolution**:
- Convolutional layers apply filters (kernels) to input data to create feature maps, capturing spatial hierarchies in the data.
- Convolutions help in detecting patterns such as edges, textures, and more complex structures at higher layers.
**Role of Pooling**:
- Pooling layers reduce the spatial dimensions of the feature maps, typically through max pooling or average pooling.
- This operation helps in reducing the number of parameters, computational cost, and mitigates overfitting, while retaining the most important features.
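
Both operations can be implemented in a few lines of numpy to make the shapes concrete; the image and kernel here are toy values:

```python
import numpy as np

def conv2d(img, kernel):
    """Valid 2-D convolution (really cross-correlation, as in most DL libraries)."""
    kh, kw = kernel.shape
    H, W = img.shape
    out = np.zeros((H - kh + 1, W - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(img[i:i + kh, j:j + kw] * kernel)
    return out

def max_pool(x, size=2):
    """Non-overlapping max pooling: keep the strongest activation per window."""
    H, W = x.shape
    return x[:H - H % size, :W - W % size] \
        .reshape(H // size, size, W // size, size).max(axis=(1, 3))

img = np.arange(36, dtype=float).reshape(6, 6)
edge_kernel = np.array([[1.0, -1.0]])   # responds to horizontal intensity change
fmap = conv2d(img, edge_kernel)          # shape (6, 5)
pooled = max_pool(fmap)                  # shape (3, 2)
print(fmap.shape, pooled.shape)
```

Note how pooling shrinks the feature map: fewer parameters downstream, some translation invariance, and only the strongest responses survive.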

RNN: How do Recurrent Neural Networks handle sequential data, and what are their main drawbacks?

**Handling Sequential Data**:
- RNNs process sequences by maintaining a hidden state that captures information from previous time steps.
- They update the hidden state at each time step based on the current input and the previous hidden state, making them suitable for tasks involving sequential data.
**Main Drawbacks**:
- RNNs suffer from the vanishing and exploding gradient problems, making it difficult to learn long-term dependencies.
- They can be computationally expensive due to their sequential nature, which limits parallelization during training.
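
The recurrence can be made concrete with a minimal numpy loop; weights and inputs are random illustrative values, and the final comment hints at why gradients vanish:

```python
import numpy as np

rng = np.random.default_rng(0)
n_in, n_hid = 3, 4
Wx = rng.normal(scale=0.1, size=(n_hid, n_in))
Wh = rng.normal(scale=0.1, size=(n_hid, n_hid))
b = np.zeros(n_hid)

# The hidden state is updated sequentially from the current input and the
# previous hidden state; this recurrence is what prevents parallelizing over time.
h = np.zeros(n_hid)
for x in rng.normal(size=(10, n_in)):
    h = np.tanh(Wx @ x + Wh @ h + b)

# Vanishing gradients: backprop through time repeatedly multiplies by Wh
# (and tanh derivatives), so the influence of early inputs shrinks
# roughly geometrically with sequence length.
print(h.shape, np.linalg.norm(Wh) ** 10)
```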

Transformer: What are the advantages of Transformer compared to RNN, and how does the self-attention mechanism work?

**Advantages of Transformer**:
- Transformers leverage self-attention mechanisms, allowing them to consider all positions of the input sequence simultaneously, which improves parallelization and reduces training time.
- They handle long-range dependencies more effectively than RNNs because self-attention enables direct connections between distant positions.
**Self-Attention Mechanism**:
- Self-attention calculates a weighted sum of the input values, where the weights are determined by the similarity between input elements.
- It involves computing three vectors for each input element: Query (Q), Key (K), and Value (V). The attention score is obtained by taking the dot product of the Query with the Key, followed by a softmax to obtain attention weights. These weights are then used to compute a weighted sum of the Values.
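
The Q/K/V computation above can be written out directly in numpy; the shapes and weights are arbitrary illustrative choices:

```python
import numpy as np

def softmax(z, axis=-1):
    z = z - z.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def self_attention(X, Wq, Wk, Wv):
    """Scaled dot-product self-attention over a sequence X of shape (seq_len, d_model)."""
    Q, K, V = X @ Wq, X @ Wk, X @ Wv
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)      # similarity of every position to every other
    weights = softmax(scores, axis=-1)   # each row sums to 1: an attention distribution
    return weights @ V, weights          # weighted sum of the Values

rng = np.random.default_rng(0)
seq_len, d_model, d_k = 5, 8, 4
X = rng.normal(size=(seq_len, d_model))
Wq, Wk, Wv = (rng.normal(size=(d_model, d_k)) for _ in range(3))
out, weights = self_attention(X, Wq, Wk, Wv)
print(out.shape, weights.sum(axis=-1))
```

Because `scores` compares all position pairs at once, distant tokens are connected in a single step, whereas an RNN needs a chain of updates to relate them.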

Reinforcement Learning: What is the difference between the reward function and the value function in Reinforcement Learning?

**Reward Function**:
- The reward function provides immediate feedback to the agent about the quality of an action taken in a particular state.
- It is a scalar value received after each action, guiding the agent to learn which actions are beneficial.
**Value Function**:
- The value function estimates the long-term return (cumulative reward) expected from a given state (or state-action pair).
- There are two types: State Value Function (V(s)) which estimates the return starting from state s, and Action Value Function (Q(s, a)) which estimates the return starting from state s and taking action a.
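
The distinction can be made concrete with value iteration on a toy chain MDP: the reward function scores single transitions, while the value function the iteration converges to measures discounted long-term return:

```python
import numpy as np

# Toy 4-state chain MDP: from state s, action "right" moves to s+1, "stay" stays.
# Entering the last state yields reward 1; all other rewards are 0.
n_states, gamma = 4, 0.9

def step(s, a):
    s2 = min(s + 1, n_states - 1) if a == "right" else s
    r = 1.0 if (s2 == n_states - 1 and s != n_states - 1) else 0.0
    return s2, r

# Value iteration: V(s) converges to the best achievable discounted return,
# whereas the reward function above only scores one transition at a time.
V = np.zeros(n_states)
for _ in range(100):
    for s in range(n_states):
        V[s] = max(r + gamma * V[s2]
                   for s2, r in (step(s, a) for a in ("right", "stay")))

print(np.round(V, 3))  # values shrink with distance from the rewarding transition
```

States farther from the reward end up with value discounted by an extra factor of gamma per step, even though their immediate rewards are all zero.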

Bayesian: What is the basic idea of Bayesian methods, and how are they applied in machine learning?

**Basic Idea**:
- Bayesian methods are based on Bayes' theorem, which updates the probability of a hypothesis based on prior knowledge and new evidence.
- They provide a probabilistic approach to inference, incorporating prior beliefs and updating them with observed data.
**Application in Machine Learning**:
- Bayesian methods are used in various machine learning tasks, including Bayesian inference, Bayesian networks, and Bayesian optimization.
- They help in estimating the uncertainty in model predictions, incorporating prior knowledge, and making robust predictions by averaging over multiple hypotheses.
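
A minimal concrete instance of the Bayesian update is the conjugate Beta-Bernoulli model, in plain Python with illustrative coin-flip data:

```python
# Bayes' theorem in conjugate form: a Beta prior over a coin's heads
# probability, updated by Bernoulli observations.
alpha, beta = 1.0, 1.0           # uniform prior Beta(1, 1)
data = [1, 1, 0, 1, 0, 1, 1, 1]  # 6 heads, 2 tails (illustrative)

for x in data:
    alpha += x       # each head adds a pseudo-count to alpha
    beta += 1 - x    # each tail adds a pseudo-count to beta

posterior_mean = alpha / (alpha + beta)
print(alpha, beta, posterior_mean)  # Beta(7, 3), posterior mean 0.7
```

The posterior is a full distribution, not a point estimate, which is what gives Bayesian methods their built-in uncertainty quantification.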

Thank you for the detailed discussion on your technical skills and experience. It’s been very insightful. I think we’ve covered all the technical aspects thoroughly.

I’d like to shift gears and talk a bit about the compensation package and how we can make this opportunity mutually beneficial. Does that sound good to you? Our compensation package includes a base salary plus profit sharing from our trading system. Do you mind sharing your expected base salary?

Javad

165+17%

Harsh

95k

Nate

100k-120k

Any questions for us?

  • You can expect to hear from us in 2-3 weeks
  • Thanks for joining us

Trading system: profitable in 2020 and 2021 (7100% profit); roughly 25% profit in 2022; close to 100% profit in 2023. Sometimes, when the prediction power is low, we turn the system off.