Yoojeong Koh
Data Scientist
Hi, this is my portfolio as a data scientist. I'm currently in graduate school studying for a Master's degree, and I majored in Statistics. Once I organize my projects and papers, I plan to build a more sophisticated portfolio on my GitHub :) Enjoy looking around at what this aspiring data scientist is doing. Thanks!
Tools I use
An interpreted, high-level, general-purpose programming language. It emphasizes code readability through its use of significant indentation. Its language constructs and object-oriented approach aim to help programmers write clear, logical code for small and large-scale projects.
I usually use RStudio for convenience. R is a programming language for statistical computing and graphics that you can use to clean, analyze, and graph your data. It is widely used by researchers from diverse disciplines to estimate and display results, and by teachers of statistics and research methods.
It is the standard and most widely used programming language for relational databases. It is used to manage and organize data in all sorts of systems in which various data relationships exist.
VBA stands for Visual Basic for Applications. Excel VBA is Microsoft’s programming language for Excel and all the other Microsoft Office programs. Power BI is a collection of software services, apps, and connectors that work together to turn your unrelated sources of data into coherent, visually immersive, and interactive insights. Power BI lets you easily connect to your data sources, visualize and discover what's important, and share that with anyone or everyone you want.
It is a command-driven software package used for statistical analysis and data visualization. It is available only for Windows operating systems. It is arguably one of the most widely used statistical software packages in both industry and academia.
SPSS stands for Statistical Package for the Social Sciences. It is used by many kinds of researchers for complex statistical data analysis. The SPSS software package was created for the management and statistical analysis of social science data.
My Projects
2020 BigContest
The goal was to predict a T-commerce company's sales performance for the following year and to propose an optimal channel organization plan. We used R for data restructuring and preprocessing, then used Python to derive feature importance and create an 'untact variable'. We then used ML/DL methods to build the best-performing model and made our predictions. For channel allocation, we drew on the EDA done in R to come up with the best allocation plan.
Forecasting with RNN
I forecasted the future accident rate based on policyholders' past accident data. I first used nested Monte Carlo simulation. Its accuracy was quite high, but it took a large amount of time to run. Therefore, after numerous simulations, I applied Least Squares Monte Carlo with an RNN to reduce the run time while keeping the accuracy. This approach was successful and produced relatively accurate predictions.
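To give a feel for the difference between the two approaches, here is a minimal sketch in Python. It is not the project code: the toy accident model (`accident_probability`), the risk factor `x`, and all sample sizes are made up for illustration, and a simple linear regression stands in for the RNN. Nested Monte Carlo runs many inner simulations for one scenario, while Least Squares Monte Carlo simulates once per scenario and then regresses the outcomes on the risk factor.

```python
import random


def accident_probability(x):
    # Hypothetical "true" model, used only to generate toy data.
    return 0.05 + 0.10 * x


def simulate_accident(x, rng):
    # One simulated policy year: 1.0 if an accident occurs, else 0.0.
    return 1.0 if rng.random() < accident_probability(x) else 0.0


def nested_estimate(x, n_inner, rng):
    # Nested Monte Carlo: many inner simulations for a single scenario x.
    return sum(simulate_accident(x, rng) for _ in range(n_inner)) / n_inner


def lsmc_fit(n_outer, rng):
    # Least Squares Monte Carlo: one simulation per scenario, then an
    # ordinary least squares fit of outcome on x (stand-in for the RNN).
    xs = [rng.random() for _ in range(n_outer)]
    ys = [simulate_accident(x, rng) for x in xs]
    mean_x = sum(xs) / n_outer
    mean_y = sum(ys) / n_outer
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


rng = random.Random(0)
intercept, slope = lsmc_fit(20000, rng)
lsmc_at_half = intercept + slope * 0.5          # fitted rate at x = 0.5
nested_at_half = nested_estimate(0.5, 20000, rng)
print("LSMC estimate at x=0.5:  ", round(lsmc_at_half, 4))
print("Nested estimate at x=0.5:", round(nested_at_half, 4))
```

In this toy setup the nested estimate needs a fresh batch of inner simulations for every scenario you ask about, whereas the fitted regression can be evaluated at any `x` without simulating again. That is where the time saving comes from.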
Optimal Stock Portfolio
We analyzed the stock price data of five companies. We calculated call and put option values and then identified an appropriate distribution to use. From these results, we predicted the future trend of each company's stock price. Then we applied the Sharpe ratio to derive the optimal investment proportion for each company based on the interest rate. Finally, we built a web application to display our optimal investment portfolio.
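As a rough illustration of the Sharpe ratio step, here is a small Python sketch. It is not the project code: the return series, the risk-free rate, and the two-asset grid search are all hypothetical and only show the idea of picking the weights with the highest ratio of mean excess return to its standard deviation.

```python
import statistics


def sharpe_ratio(returns, risk_free_rate):
    # Mean excess return divided by the sample standard deviation
    # of the excess returns.
    excess = [r - risk_free_rate for r in returns]
    return statistics.mean(excess) / statistics.stdev(excess)


def best_two_asset_weight(returns_a, returns_b, risk_free_rate, steps=100):
    # Grid search over the weight w placed on asset A (1 - w goes to B).
    best_w, best_sharpe = 0.0, float("-inf")
    for i in range(steps + 1):
        w = i / steps
        portfolio = [w * a + (1 - w) * b for a, b in zip(returns_a, returns_b)]
        s = sharpe_ratio(portfolio, risk_free_rate)
        if s > best_sharpe:
            best_w, best_sharpe = w, s
    return best_w, best_sharpe


# Hypothetical periodic returns for two companies (illustrative only).
returns_a = [0.01, 0.03, -0.02, 0.04, 0.02]
returns_b = [0.02, -0.01, 0.03, 0.00, 0.01]
risk_free = 0.001  # assumed per-period interest rate

weight_a, best = best_two_asset_weight(returns_a, returns_b, risk_free)
print("Weight on A:", weight_a, "Sharpe ratio:", round(best, 3))
```

With more than two companies, a grid search becomes impractical and a numerical optimizer is the usual choice, but the objective stays the same.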
Education