Interview Questions
Source: Notion | Last edited: 2025-01-20 | ID: 63f1ab0a-f88...
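Introduce what the company does
We develop quantitative trading software and trading strategies for the cryptocurrency market (Bitcoin, Ethereum).
Discuss company performance
Our trading frequency is fairly high, but it is not traditional high-frequency trading. At peak, our monthly trading volume was roughly USD 5-10 billion. Last year, a third-party cryptocurrency fund research firm compared the results of quantitative trading funds side by side, and the net return we earned for clients ranked second.
Cryptocurrency funds are a fairly new field, so teams are usually small, and ours is too. We currently have 5 full-time people, including me, plus a few part-timers.
Cryptocurrency market volatility has dropped recently, which makes this the best time to build up technical depth. We want to use this window to find smart, hard-working researchers who can use machine learning or other methods to build better trading strategies, so that when the next bull market arrives, in roughly two and a half years, we can ride it.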
Tell me about your most impressive projects and biggest wins.
How do you spend your time during an average day, and what have you got done in the last month?
What have you done related to Data science or machine learning, data processing?
Briefly introduce a project you did using CNN, GRU, SVM, KNN, SOM, or Bayesian CNN.
(Talk to the candidates about what they’ve done.)
Go deep in a specific area and ask about what the candidate actually did—it’s easy to take credit for a successful project.
How do you solve missing-data or data-inconsistency problems? How do you use a CNN to process time series data?
(Ask them how they would solve a problem you are having related to the role they are interviewing for.)
Have someone do a day or two of work with you before you hire her; you can do this at night or on the weekends.
Give him the raw Google Trends data and ask him to process it and return a ready-to-use dataset.
For Alexander Loginov
General Introduction
- What machine learning area are you interested in working in?
- What are the main uses of your models?
- What technical or problem areas are your strengths? What particular problems are you good at solving: high-frequency intraday trading, stock selection?
- What machine learning models/algorithms have you used? (You mentioned image processing, social network mining, and anomaly detection in your postdoc.)
- What do you think are the key influencers of the success or failure of an intraday automated trading system?
Model training
- Could you explain how you train FXGP? For example, how often is FXGP retrained, and how long does it take?
- How do you decide your model needs to be retrained? On detecting poor trading behavior the FXGP training process is re-triggered from a completely new (set of) initial DT–TI population.
- How do your initial conditions affect your final outcome? (multi-agent FXGP)
- What computing resources (time/power) do you need to run GA or Genetic Programming?
Model interpretation
- GA, like an NN, is a black box. How do you interpret what your model has learned and what it is trying to do?
- What data have you used for your modelling? What differences are there?
- Is your method mainly used in the stock market? Our data is 24/7; there is no quiet time for your stock ranking and selection. How would you address that?
- What data do you use for the intraday automated trading system? OHLC? (You explored stock selection; we could try crypto symbol selection.)
- The NASDAQ 100 historical rates (‘Mid’ prices) for the period from August 1, 2014, to August 31, 2017, are used in Chapter 6. All stocks with missing or out of range date and time stamps and stocks with missing, ‘0’ or negative prices were excluded.
- Follow-up question: do you just exclude them? Does the resulting discontinuity cause any problem for the GP?
- How do you construct features? (Choose TIs from a set of TIs?)
Model validation
- How do you know your GP model is not overfitted? Have you used real money to validate your trading agent? How does it work?
Stock selection? For eonlab’s symbol selection
Thesis
- Do you imply that changing assets frequently could reduce transaction costs? The thesis says this is
under a frequent intraday trading scenario. Such a scenario implies that transaction costs have a more significant impact on profitability and investment decisions can be revised frequently
- In the formula of the Moving Sharpe Ratio, R_free is the best available risk-free rate of return. What is R_free? Why is it 0?
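For reference when discussing this question, a minimal sketch of a rolling Sharpe ratio. It assumes the textbook form (mean excess return divided by its sample standard deviation over a trailing window); the thesis's exact formula is not reproduced in these notes and may differ. The function name and the `window` and `r_free` parameters are illustrative. The 40-day default mirrors the estimation period asked about in the chap 6 notes, and `r_free = 0` mirrors the question above.

```python
from statistics import mean, stdev


def moving_sharpe_ratio(daily_returns, window=40, r_free=0.0):
    """Rolling Sharpe ratio over the trailing `window` daily returns.

    Assumption: textbook definition, not necessarily the thesis's formula.
    Returns a list aligned with `daily_returns`; an entry is None until a
    full window is available, or when the window has zero volatility.
    """
    if window < 2:
        raise ValueError("window must be at least 2 to estimate volatility")
    out = []
    for i in range(len(daily_returns)):
        if i + 1 < window:
            out.append(None)  # not enough history yet
            continue
        chunk = daily_returns[i + 1 - window : i + 1]
        excess = [r - r_free for r in chunk]  # with r_free = 0 this is just the return
        sd = stdev(excess)
        out.append(mean(excess) / sd if sd > 0 else None)
    return out
```

With `r_free = 0` the numerator is simply the mean daily return, which may be why the risk-free rate can be dropped at intraday horizons; that is worth confirming with the candidate.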

My notes on his thesis
chap 2
For example, in constructing both TI (as opposed to merely selecting from a set of predefined TI) and DT, performance is only directly expressed at the level of the DT
A symbiotic coevolutionary approach was previously proposed to link the fitness expressed at the level of DT to the TI (Figure 2.1) without having to define surrogate performance functions for the TI [98].
In this thesis the same coevolutionary approach will be assumed as the starting point. However, this will be extended to support multi-agent operation, stop-loss and take profit orders, and more efficient execution. The latter is particularly important because without this, real-time operation at minute-to-minute intervals would not be possible.
Could you explain what you mean by “without this, real-time operation at minute-to-minute intervals would not be possible”?
Retraining of the automated trading agent is explicitly triggered by performance of the champion agent. That is to say, when performance of a previously satisfactory model degrades below a threshold, the process generating the data has probably changed.
Could you explain what the threshold is? Is each trading agent an independent GP (the TIs are different, but are the DTs the same or different)? If they are different, do we need to retrain every agent or just one?
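To make the question concrete, a minimal sketch of what a threshold-based retrain trigger could look like. The thesis passage above does not state the performance measure or the threshold, so the summed return over recent trades, the `threshold` value, and `min_trades` are all hypothetical placeholders, not the author's method.

```python
def needs_retrain(recent_trade_returns, threshold=-0.02, min_trades=5):
    """Return True when the champion agent's recent performance is too poor.

    Hypothetical rule: retrain once the summed return of the recent trades
    drops below `threshold`. Fewer than `min_trades` trades is treated as
    too little evidence to decide.
    """
    if len(recent_trade_returns) < min_trades:
        return False
    return sum(recent_trade_returns) < threshold
```

For example, `needs_retrain([-0.01] * 5)` is True under these placeholder values, while five winning trades or a single large loss would not trigger a retrain.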
chap 3
Agents are evolved assuming a ‘Train–Validate–Trade’ cycle (Figure 3.1). Train and Validate represent two sequential historical partitions of the data from which Base FXGP evolves trading agents (DTs with linked TIs) and the best agent is then used for trading. Thus, given train sequential records of the Training partition, the TI and DT populations are coevolved. The next test sequential records from the Validation partition are used to verify and identify a single champion DT–TI combination (the champion agent). It is possible that the result of model validation is a failure to identify a champion agent, in which case the training cycle would be reinvoked from an entirely new initialization of the DT–TI populations. Assuming that a champion agent is identified then Trading may commence until one of several retrain criteria triggers the identification of a new trading agent.


How do you decide that the minimum TI population is 100? Does this affect training time much? How do you do this kind of hyperparameter tuning? N_t = 1000: how long is that, 1000 minutes? N_v = 500.
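A minimal sketch of the sequential Train and Validate partitions in the cycle quoted above, using N_t = 1000 and N_v = 500 from the question. It assumes the partitions are taken from the most recent records, which the quoted passage does not state; the function and parameter names are illustrative.

```python
def train_validate_split(records, n_train=1000, n_validate=500):
    """Split the most recent records into sequential Train and Validate partitions.

    Assumption: Validate immediately follows Train, and both end at the
    latest available record.
    """
    needed = n_train + n_validate
    if len(records) < needed:
        raise ValueError(f"need at least {needed} records, got {len(records)}")
    recent = records[-needed:]
    return recent[:n_train], recent[n_train:]
```

If each record were a 1-minute candlestick, which is exactly what the question asks, N_t = 1000 would cover 1000 minutes (about 16.7 hours of data) and N_v = 500 about 8.3 hours.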
chap 6
In addition, a general bias towards trading with single stocks per day was evident. That is to say, although increasing the number of stocks traded per day resulted in a lower ‘average loss per trade’ the corresponding ‘average profit per trade’ was also much lower.
A clear preference is exhibited for adopting a ranking based on either a simple Moving Average, or the Moving Sharpe Ratio (with Moving Sharpe Ratio showing better results over a simple Moving Average of Daily Returns in all cases), both in terms of the profitability and the Sharpe Ratio
The Moving Sharpe Ratio outperforms other investigated ways of prioritizing specific stocks for frequent intraday trading using the proposed FXGP algorithm.
A 40-day estimation period for the ranking statistic? What is the ranking statistic: is it the MSR? Why is a 40-day estimation period the preferred parametrization? Was any method used to find this optimal parameter?
chap 7
It has been shown that the floating spread with a median value of 0.02 USD results in a much worse performance than a fixed spread of 0.1 USD, i.e. an order of magnitude difference. T
Diagram 1
Question about diagram 1
On page 135, it says:
A cold start period provides data from which GP-trading agents are evolved for each of the N stocks in the portfolio. Stock data is described in terms of 1-minute candlesticks. Each GP agent learns to maximize their respective stock’s return independently by simultaneously designing the technical indicators and determining the buy-hold-sell signals.
At the end of the trading day, the return from each agent is used to rank each stock from the portfolio, and the stocks with the highest S ranks are selected for trading at the next trading day (Section 6.3). During trading day t + 1, the S agents selected to make investments, trade with a money management policy, whereas the remaining GP agents continue to trade under simulated conditions.”
How exactly does “a cold start period” provide data?
Do you mean that during trading day t+1, only the S selected agents trade with real money while the remaining GP agents run with simulated money, and that the returns of all these agents are then used to rank each stock in the portfolio (does the portfolio contain only the S stocks, or still all N stocks?), so that the stocks with the S highest ranks are selected for trading on day t+2?
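A minimal sketch of the daily selection step as I read it from the quoted passage: every agent reports a ranking statistic at the end of day t, the top S stocks trade live on day t+1, and the rest keep trading in simulation. This reflects my reading, which is still to be confirmed with the candidate; the function names, the dictionary layout, and the example symbols are illustrative.

```python
def select_top_s(agent_returns, s):
    """Return the S symbols whose agents had the highest ranking statistic."""
    ranked = sorted(agent_returns.items(), key=lambda kv: kv[1], reverse=True)
    return [symbol for symbol, _ in ranked[:s]]


def split_live_and_simulated(agent_returns, s):
    """Partition all N symbols into live (top S) and simulated (the rest)."""
    live = select_top_s(agent_returns, s)
    simulated = [symbol for symbol in agent_returns if symbol not in live]
    return live, simulated


# Example with made-up end-of-day statistics for N = 4 stocks and S = 2.
day_t = {"AAPL": 0.012, "MSFT": -0.004, "NVDA": 0.020, "TSLA": 0.001}
live, simulated = split_live_and_simulated(day_t, s=2)
```

Under this reading the portfolio keeps all N stocks, so a simulated agent can re-enter the live set the next day.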
For Sa Li
University of Alberta computing science professors (L-R) Ryan Hayward, Martin Mueller, Rich Sutton, and Michael Bowling from the Computer Games and Reinforcement Learning research groups, who supervised AlphaGo researchers David Silver, Aja Huang, and Marc Lanctot during their time at the University of Alberta.
Vancouver, 2020-2022, 2017-2022
Q: This is what you did after returning to Canada; why didn't you continue?
President, IntelliRaise Technologies Ltd
CEO, Wisdom Artificial Intelligence Cor 2017-2022
Nanjing, Xiaode Intelligent (孝得智能), Sept 2019 ~ April 2020
Dean, Nanjing Xiaode Intelligent Research & Design Institute
VP of R&D and Production, Nanjing Xiaode Intelligent Technology Co., Ltd, Nanjing, China, Sept 2019 ~ April 2020
Q: Was there any hands-on coding or modelling work?
§ Led the development of the Xiaode Internet of Things health management platform, and developed a universal IoT access protocol set
§ Led the development of institutional elderly care products and systems, including institutional elderly care information platform, face recognition and electronic fence system, UWB tracking system, continuous monitoring of vital signs, health table, universal nursing bed, nursing station management system, bedside smart touch screen, alarm center, etc.
§ Led the development of home care products and systems: the safety and health monitoring system for the elderly living alone, the tracking system for the demented elderly, the care system for the disabled elderly, intelligent voice robots, etc.
Vancouver, Sept. 2015 ~ August 2017
Founder and CTO, SimuMind Technology Ltd., Vancouver, Canada
§ Led the team to design and develop the world’s leading big data platform
§ Led the team to develop an automated big data test platform
§ Led the team to design and develop an AI mining analysis platform
§ Presided over the design and development of the first intelligent assistance system for oncologists in China
§ Presided over the design and development of the first intelligent auxiliary case handling system for Fujian Provincial People’s Procuratorate
§ In 2016 and 2017, presided over the design and development of user gas consumption prediction and abnormal consumption detection systems for large gas companies. Q: What specific work did you do?
§ Presided over the design and development of other artificial intelligence algorithm projects applied in the financial and public security fields. Q: What was the financial work?
Vancouver, May 2015 ~ Sept 2015
Chief Data Architect, www.awesense.com, Vancouver, Canada
§ Led the R&D team to build a big **data stream processing platform** using stream processing technology to process power data in real time
Vancouver, March 2012 ~ April 2015
Chief Data Scientist, Plentyoffish.com, Vancouver, Canada and San Jose, USA, March 2012 ~ April 2015
§ Used machine learning algorithms and data mining technology to design and implement online dating matching prediction models, advertising prediction models, user upgrade systems, and user response models. This matching prediction recommendation model significantly improved the matching rate, and it is currently the most advanced online dating recommendation system in the world.
§ Successfully built a Linux Ubuntu private cloud. This cloud computing system integrates multiple GPU servers and is managed by the SLURM system. This private cloud has the fastest large-scale multi-threaded computing speed in the world and is especially suitable for deep machine learning algorithms. Using this cloud platform, the speed of machine learning training on large data sets has been greatly improved. This work was specifically mentioned at the GTC2013 annual meeting and was rated as the most advanced technology at that time. This set of algorithms and code can be found at: https://github.com/alecinvan/cudaDBBP
§ Independently wrote a set of data processing systems in multiple programming languages specifically for the aforementioned cloud platforms.
§ Designed and developed a real-time big data analysis platform for massive data. This platform adopts the latest big data technology to realize the safe storage and real-time processing of big data.
Q: The start dates of the four jobs below overlap somewhat; how did you decide between them?
Vancouver, Mar 2014 ~ Jan 2015
Online market content management software using machine learning algorithms and data mining technology. Q: What model, what data, what results?
USA, Austin, senior programmer, May 2013 - Oct 2014
Matching system to match insurance companies, insurance agents, and customers
Q: What model and what data were used?
Vancouver, teaching at a vocational school, Jan 2012 - Sept 2013. Q: Why? Was your career not going well before that?
Vancouver, algorithm scientist, Canada’s Michael Smith Genome Sciences Centre, March 2009 ~ March 2012
§ Designed and independently wrote a set of data processing systems using **multiple programming languages** specifically for high-performance computer clusters and designed a set of statistical mathematical models to test and predict gene mutations and find affected chromosomal genes. Q: Which statistical models? Statistics does not seem to be your main specialty; how well do you understand this area?
§ Designed a statistical model to analyze the mouse sample data at different stages after the application of the drug and identify different gene expression levels after the application of the drug.
§ Multiple PostgreSQL databases have been built on the cloud platform for real-time data processing. Through this type of database, knowledge-based technologies can be successfully applied to life sciences.
§ Published paper in “Nature” magazine.
Vancouver, senior programmer, Sept 2007 ~ Jan 2009
Using machine learning algorithms to process and predict online gaming data
§ Used machine learning algorithms to design an online poker prediction system. Wrote parallel computing programs to improve processing speed.
§ Designed an online horse racing model using machine learning algorithms and invented a new set of data variable extraction coding system. Q: What is this? Can you explain?
§ Wrote a set of online affiliate marketing applications
Edmonton, embedded software systems development, programmer, Apr 2007 - Sept 2007
Edmonton, PhD, 2002-2007
Used a variety of machine learning algorithms (regression, support vector machine, neural network, genetic algorithms, classifier system, reinforcement learning, fuzzy system, etc.) to design an autonomous robot intelligent navigation system. Invented a new machine learning algorithm (Enhanced Classification Machine System) to design an autonomous robot navigation system.
Singapore, master's, 2000-2002
Singapore, programmer, 2001-2002, intelligent software for wireless communications. Q: What language?
Programmer, 3rd Academy of China Aerospace Science & Industry Corp, 1995-2000
Northwestern Polytechnical University (西北工业大学), bachelor's degree, 1991-1995
For Alex
How would you rank your strengths, in terms of presentation skills, writing skills, machine learning skills, statistical modelling, communication skills, programming skills?
What is your most impressive project as a data scientist?
In the “online dynamic pricing model using Machine Learning and Statistics” project, could you explain your process in detail?
post doc, 08-10
regression modelling, statistics and econometric methodologies to analyze economic policy levers
statistician 10-11
**discrete time series forecast models** and **statistical analysis of post-secondary enrollment rates in Ontario** to inform the allocation of over $200 million in long-term capital planning for the Province of Ontario
regression analysis, time series analysis and econometric methodologies
cofounder, data scientist 11-13
Multivariate predictive analysis
ontario, 13-14, Center for Global eHealth Innovation
implemented Social Media Analysis platform
time series analysis and other biostatistics methodologies
vancouver, 14-15, Neurio, ?
Implemented Machine Learning methodologies
vancouver, 15-19, GE Digital/Bitstew, time series
vancouver, 19-20, 2nd address, graphical models,
Ian Moore
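Pros: knows statistics and Bayesian methods, comes from a time series background, and has good programming habits, having worked as a developer. People who write well usually think clearly.
Cons: may still be working on other projects on the side.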
Alexander
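Pros: can do GP, is interested in trading, and has some understanding of the trading domain.
Cons: somewhat blunt, and not that pleasant to talk to. He probably has strong opinions of his own, and it is unclear whether he would accept the work the company assigns.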
Li chang
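Pros: works on RL and seems to understand it fairly deeply. Not that familiar with Binance's trading system; programming ability is unclear.
Cons: does not seem to have done much in other areas of deep learning.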