Data Projects
Analyzing Indonesia’s non‐oil/gas import‐export activities
Created SARIMA models of Indonesia's monthly non-oil/gas import and export time series from January 2014 to December 2022, and used the models to forecast values for the year following the end of the dataset.
Obtained SMAPE values between 4.9 and 6.4.
Tested for cointegration between the import and export time series using the augmented Engle-Granger test and found no evidence of cointegration.
Technical skills: SARIMA, ACF, PACF, Augmented Engle-Granger test.
Software used: pmdarima, statsmodels, scikit-learn, pandas, numpy, seaborn, matplotlib.
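For reference, the SMAPE metric reported above can be sketched as follows. This is a minimal illustration, not the project's code: it assumes the common percentage form of SMAPE (absolute error divided by the mean of the absolute actual and forecast values), and the function name `smape` and the sample numbers are hypothetical.

```python
def smape(actual, forecast):
    """Symmetric mean absolute percentage error, in percent.

    Assumes the variant 100/n * sum(|F - A| / ((|A| + |F|) / 2)).
    Pairs where both values are zero contribute zero error.
    """
    if len(actual) != len(forecast):
        raise ValueError("actual and forecast must have the same length")
    if not actual:
        raise ValueError("inputs must not be empty")

    total = 0.0
    for a, f in zip(actual, forecast):
        denom = (abs(a) + abs(f)) / 2.0
        if denom > 0:  # skip the 0/0 case instead of dividing by zero
            total += abs(f - a) / denom
    return 100.0 * total / len(actual)


# Hypothetical monthly values, for illustration only.
observed = [100.0, 200.0]
predicted = [110.0, 190.0]
print(round(smape(observed, predicted), 3))
```

Under this definition the score is bounded between 0 and 200, so lower values indicate forecasts closer to the observed series.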
Analyzing ten years of cats and dogs data at Austin Animal Shelter
Created random forest and gradient boosted trees (XGBoost) models to classify whether an animal at the shelter ends up with a "good" (e.g., adopted, returned to owner), "bad" (e.g., euthanized, lost), or "neutral" (e.g., transferred to a different shelter) outcome.
Handled the heavily imbalanced dataset (very few "bad" outcomes) with SMOTE-NC, undersampling, and class weighting techniques.
Obtained a weighted F1 score of 81%.
Found the most important features in the classification with permutation importance.
Technical skills: Gradient boosted trees, random forest, imbalanced-data sampling (SMOTE-NC, random undersampling), feature importance.
Software used: XGBoost, imblearn, scikit-learn, pandas, numpy, seaborn, matplotlib.
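As an illustration of the class weighting idea mentioned above, the sketch below computes "balanced" class weights, i.e. weights inversely proportional to class frequency, using the formula n_samples / (n_classes * class_count). The function name `balanced_class_weights` and the label counts are hypothetical and are not taken from the shelter dataset; the weighting scheme actually used in the project may have differed.

```python
from collections import Counter


def balanced_class_weights(labels):
    """Return a dict mapping each class to n_samples / (n_classes * class_count).

    Rare classes receive larger weights, so their misclassifications
    cost more during training.
    """
    counts = Counter(labels)
    n_samples = len(labels)
    n_classes = len(counts)
    return {cls: n_samples / (n_classes * cnt) for cls, cnt in counts.items()}


# Hypothetical, deliberately imbalanced outcome labels.
outcomes = ["good"] * 8 + ["neutral"] * 1 + ["bad"] * 1
weights = balanced_class_weights(outcomes)
for cls, w in sorted(weights.items()):
    print(cls, round(w, 3))
```

Weights of this form can be passed to a classifier as per-class or per-sample weights, which is an alternative to resampling the training data.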
Forecasting the Volatilities of ETFs
Modeled the daily volatilities of three ETFs (SPY, AEA, ASEA) from 01/01/2018 to 02/20/2023 with GARCH models.
Created a forecast of the daily volatilities of the three ETFs for the month after the end of the dataset.
Technical skills: GARCH, ACF, PACF.
Software used: pmdarima, statsmodels, scikit-learn, pandas, numpy, seaborn, matplotlib, scipy, yfinance.
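To show what a GARCH volatility forecast involves, here is a minimal sketch of the GARCH(1,1) conditional variance recursion and its multi-step forecast. It assumes a GARCH(1,1) specification with zero-mean returns; the parameter values (`omega`, `alpha`, `beta`), the function names, and the return series are hypothetical, and the model orders and fitted parameters used in the project are not stated here.

```python
def garch11_variances(returns, omega, alpha, beta):
    """Conditional variances: s2[t] = omega + alpha * r[t-1]**2 + beta * s2[t-1].

    The recursion is initialised at the unconditional variance
    omega / (1 - alpha - beta), which requires alpha + beta < 1.
    """
    if alpha + beta >= 1:
        raise ValueError("alpha + beta must be < 1 for a finite long-run variance")
    var = omega / (1 - alpha - beta)
    variances = [var]
    for r in returns[:-1]:
        var = omega + alpha * r ** 2 + beta * var
        variances.append(var)
    return variances


def garch11_forecast(last_return, last_variance, omega, alpha, beta, horizon):
    """Variance forecasts for 1..horizon steps ahead.

    The one-step forecast uses the last observed squared return; later
    steps replace the unknown squared return by its expected value,
    giving s2[h] = omega + (alpha + beta) * s2[h-1].
    """
    var = omega + alpha * last_return ** 2 + beta * last_variance
    forecasts = [var]
    for _ in range(horizon - 1):
        var = omega + (alpha + beta) * var
        forecasts.append(var)
    return forecasts


# Hypothetical parameters and returns, for illustration only.
omega, alpha, beta = 0.1, 0.1, 0.8
rets = [1.0, 2.0, 0.0]
s2 = garch11_variances(rets, omega, alpha, beta)
fc = garch11_forecast(rets[-1], s2[-1], omega, alpha, beta, horizon=5)
print([round(v ** 0.5, 4) for v in fc])  # forecast volatilities (std. dev.)
```

The forecasts decay geometrically toward the long-run variance omega / (1 - alpha - beta), which is why longer-horizon volatility forecasts from such a model flatten out.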