Original: http://blog.import.io/post/20-questions-to-detect-...

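Data scientist is officially the sexiest job of the 21st century, and everyone wants a piece of it. That means there are people who call themselves data scientists but do not actually have the right skills. Many people may believe they are data scientists purely because they work with data.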


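“Fake data scientists are often experts in one particular domain who insist that their domain is the one and only true data science. Such beliefs miss the point that data science refers to the application of the full arsenal of scientific tools and techniques (mathematics, computing, visualization, analytics, statistics, experimentation, problem definition, model building and validation, etc.) to gain discovery, insight and value from data collections.”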

Kirk Borne, Principal Data Scientist at Booz Allen Hamilton and founder of RocketDataScience.org

To help you sort the real data scientists from the fake (or misguided) ones, we have put together 20 interview questions:


  1. Explain what regularization is and why it is useful.
  2. Which data scientists do you admire most? Which startups?
  3. How would you validate a model you created to generate a predictive model of a quantitative outcome variable using multiple regression?
  4. Explain what precision and recall are. How do they relate to the ROC curve?
  5. How can you prove that one improvement you've brought to an algorithm is really an improvement over not doing anything?
  6. What is root cause analysis?
  7. Are you familiar with pricing optimization, price elasticity, inventory management, competitive intelligence? Give examples.
  8. What is statistical power?
  9. Explain what resampling methods are and why they are useful. Also explain their limitations.
  10. Is it better to have too many false positives, or too many false negatives? Explain.
  11. What is selection bias, why is it important and how can you avoid it?
  12. Give an example of how you would use experimental design to answer a question about user behavior.
  13. What is the difference between "long" and "wide" format data?
  14. What method do you use to determine whether the statistics published in an article (e.g. a newspaper) are either wrong or presented to support the author's point of view, rather than correct, comprehensive factual information on a specific subject?
  15. Explain Edward Tufte's concept of "chart junk."
  16. How would you screen for outliers and what should you do if you find one?
  17. How would you use either extreme value theory, Monte Carlo simulations or mathematical statistics (or anything else) to correctly estimate the chance of a very rare event?
  18. What is a recommendation engine? How does it work?
  19. Explain what a false positive and a false negative are. Why is it important to differentiate these from each other?
  20. Which tools do you use for visualization? What do you think of Tableau? R? SAS? (for graphs). How would you efficiently represent 5 dimensions in a chart (or in a video)?
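
Several of the questions above (4, 10 and 19) turn on false positives and false negatives. As a rough illustration only, and not part of the original list, the sketch below counts both kinds of error on a toy set of binary labels and derives precision and recall from them. The function name `precision_recall` and the example labels are hypothetical, chosen just for this sketch.

```python
def precision_recall(y_true, y_pred):
    """Return (precision, recall) for binary labels, with 1 as the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # true positives
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false positives
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # false negatives
    # Guard against division by zero when there are no predicted or actual positives.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall


# Toy labels (made up for illustration): one false negative and one false positive.
y_true = [1, 1, 1, 0, 0, 0, 0, 1]
y_pred = [1, 0, 1, 1, 0, 0, 0, 1]
precision, recall = precision_recall(y_true, y_pred)
print(precision, recall)
```

Precision falls as false positives increase, while recall falls as false negatives increase, which is why the two error types are worth telling apart.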


How would you quantify a real data scientist?


    Posted by MR. MINING on PIXNET