Data Projects

Analyzing Indonesia’s non-oil/gas import-export activities

Created SARIMA models of Indonesia's monthly non-oil/gas import and export time series from January 2014 to December 2022, and used the models to forecast the monthly values for 2023, the year after the end of the dataset.
Obtained SMAPE values between 4.9 and 6.4.
Tested the import and export time series for cointegration using the augmented Engle-Granger test and found no evidence of cointegration between them.
Technical skills: SARIMA, ACF, PACF, Augmented Engle-Granger test.
Software used: pmdarima, statsmodels, scikit-learn, pandas, numpy, seaborn, matplotlib.
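SMAPE compares the forecasts with the held-out actual values on a symmetric percentage scale. Below is a minimal Python sketch with numpy. It assumes the common definition, which puts the mean of |actual| and |forecast| in the denominator and reports the result in percent. The project description does not say which SMAPE variant was used, and the function name `smape` is illustrative.

```python
import numpy as np

def smape(actual, forecast):
    """Symmetric MAPE in percent, using (|actual| + |forecast|) / 2 as the denominator."""
    actual = np.asarray(actual, dtype=float)
    forecast = np.asarray(forecast, dtype=float)
    diff = np.abs(forecast - actual)
    denom = (np.abs(actual) + np.abs(forecast)) / 2.0
    # Count points where actual and forecast are both zero as zero error instead of 0/0
    ratio = np.divide(diff, denom, out=np.zeros_like(diff), where=denom != 0)
    return 100.0 * ratio.mean()

print(smape([100, 200], [110, 190]))  # about 7.33
```

In that example, the two per-point errors are 10/105 and 10/195, and their mean is about 7.33%. Other SMAPE variants drop the factor of 2 or report a fraction instead of a percentage. Scores are therefore only comparable when they use the same definition.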
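The Engle-Granger procedure has two steps. First, one series is regressed on the other. Second, a Dickey-Fuller-type unit-root regression is applied to the residuals, and stationary residuals suggest cointegration. The sketch below computes the test statistic with plain numpy and uses one augmentation lag by default. The function name and the lag choice are illustrative and not taken from the project. Because the residuals come from an estimated regression, the statistic must be compared with Engle-Granger (MacKinnon) critical values rather than standard Dickey-Fuller tables.

```python
import numpy as np

def engle_granger_stat(y, x, lags=1):
    """Augmented Engle-Granger test statistic for two series (illustrative).

    Step 1: OLS of y on a constant and x (the cointegrating regression).
    Step 2: ADF-style regression of the residual change on the lagged residual
    plus `lags` lagged residual changes, with no constant (OLS residuals already
    have mean zero). Returns the t-statistic on the lagged residual.
    """
    y = np.asarray(y, dtype=float)
    x = np.asarray(x, dtype=float)

    # Step 1: cointegrating regression y = a + b*x + e
    X = np.column_stack([np.ones_like(x), x])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta

    # Step 2: regress d(e)_t on e_{t-1} and d(e)_{t-1}, ..., d(e)_{t-lags}
    de = np.diff(resid)
    n = len(de)
    target = de[lags:]
    cols = [resid[lags:-1]]
    for k in range(1, lags + 1):
        cols.append(de[lags - k:n - k])
    Z = np.column_stack(cols)
    coef, *_ = np.linalg.lstsq(Z, target, rcond=None)
    err = target - Z @ coef
    sigma2 = (err @ err) / (len(target) - Z.shape[1])
    cov = sigma2 * np.linalg.inv(Z.T @ Z)
    return coef[0] / np.sqrt(cov[0, 0])


# Synthetic example: y tracks 2*x plus stationary noise, z is an unrelated random walk
rng = np.random.default_rng(0)
x = np.cumsum(rng.normal(size=500))
y = 2.0 * x + rng.normal(size=500)
z = np.cumsum(rng.normal(size=500))
print(engle_granger_stat(y, x))  # large negative value: residuals look stationary
print(engle_granger_stat(z, x))  # typically much closer to zero
```

In practice, `statsmodels.tsa.stattools.coint(y0, y1)` runs the augmented Engle-Granger test. It returns the statistic, a p-value, and critical values, and by default it chooses the lag length by AIC. A hand-rolled version like this one is mainly useful for seeing what the test does.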

Analyzing ten years of cat and dog data at the Austin Animal Shelter

Created random forest and gradient boosted trees (XGBoost) models to classify whether an animal at the shelter ends up with a "good" (e.g., adopted, returned to owner), "bad" (e.g., euthanized, lost), or "neutral" (e.g., transferred to a different shelter) outcome.
Handled the heavily imbalanced dataset (very few "bad" outcomes) with SMOTE-NC, undersampling, and class weighting.
Obtained a weighted F1 score of 81%.
Identified the most important features for the classification using permutation importance.
Technical skills: Gradient boosted trees, random forest, imbalanced-data resampling (SMOTE-NC, random undersampling), feature importance.
Software used: XGBoost, imblearn, scikit-learn, pandas, numpy, seaborn, matplotlib.
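Random undersampling balances the classes by discarding majority-class rows. Below is a minimal numpy sketch. It assumes every class is cut down to the size of the smallest one, which is what imbalanced-learn's `RandomUnderSampler` does with its default sampling strategy. The function `random_undersample` is hypothetical and is not the project's code.

```python
import numpy as np

def random_undersample(X, y, seed=0):
    """Randomly keep min(class count) rows from every class, without replacement."""
    X = np.asarray(X)
    y = np.asarray(y)
    rng = np.random.default_rng(seed)
    classes, counts = np.unique(y, return_counts=True)
    n_keep = counts.min()
    keep = [rng.choice(np.flatnonzero(y == c), size=n_keep, replace=False) for c in classes]
    keep = np.sort(np.concatenate(keep))  # preserve the original row order
    return X[keep], y[keep]

# Hypothetical data: 8 "good", 3 "neutral", 1 "bad" rows with two features each
X_demo = np.arange(24).reshape(12, 2)
y_demo = np.array(["good"] * 8 + ["neutral"] * 3 + ["bad"])
X_bal, y_bal = random_undersample(X_demo, y_demo)
print(np.unique(y_bal, return_counts=True))  # one row per class
```

Undersampling should only be applied to the training split, after the train/test split. That way the evaluation still reflects the real class balance.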
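Class weighting makes mistakes on rare classes cost more during training. Below is a minimal sketch of the "balanced" heuristic that scikit-learn documents for `class_weight="balanced"`: each class gets the weight n_samples / (n_classes * class_count). The sketch then expands those weights into per-sample weights. The page does not state which weighting scheme the project actually used, and the helper names are hypothetical.

```python
import numpy as np

def balanced_class_weights(y):
    """Map each class label to n_samples / (n_classes * class_count)."""
    classes, counts = np.unique(y, return_counts=True)
    weights = len(y) / (len(classes) * counts)
    return dict(zip(classes.tolist(), weights.tolist()))

def balanced_sample_weights(y):
    """Expand the per-class weights into one weight per training example."""
    class_w = balanced_class_weights(y)
    return np.array([class_w[label] for label in np.asarray(y).tolist()])

# Hypothetical label counts: many "good", a few "neutral", one "bad"
labels = ["good"] * 8 + ["neutral"] * 3 + ["bad"]
print(balanced_class_weights(labels))
# → {'bad': 4.0, 'good': 0.5, 'neutral': 1.3333333333333333}
```

XGBoost's scikit-learn wrapper accepts per-sample weights like these through `fit(X, y, sample_weight=...)`. scikit-learn's `RandomForestClassifier` can take `class_weight="balanced"` directly.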
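The weighted F1 score averages the per-class F1 scores, weighting each class by its number of true instances (its support). As a result, the common classes dominate the average. Below is a minimal numpy sketch of that definition, intended to match scikit-learn's `f1_score(..., average="weighted")`. The function name and example labels are hypothetical.

```python
import numpy as np

def weighted_f1(y_true, y_pred):
    """Support-weighted average of per-class F1 scores."""
    y_true = np.asarray(y_true)
    y_pred = np.asarray(y_pred)
    classes, support = np.unique(y_true, return_counts=True)
    per_class = []
    for c in classes:
        tp = np.sum((y_pred == c) & (y_true == c))
        fp = np.sum((y_pred == c) & (y_true != c))
        fn = np.sum((y_pred != c) & (y_true == c))
        # F1 = 2TP / (2TP + FP + FN); the denominator is positive because c occurs in y_true
        per_class.append(2 * tp / (2 * tp + fp + fn))
    return float(np.average(per_class, weights=support))

y_true = ["good", "good", "good", "bad", "neutral", "neutral"]
y_pred = ["good", "good", "neutral", "bad", "neutral", "good"]
print(weighted_f1(y_true, y_pred))  # (1*1.0 + 3*(4/6) + 2*0.5) / 6, about 0.667
```

Because of this weighting, a high weighted F1 can coexist with weak performance on a rare class such as "bad". The per-class scores are worth checking alongside the average.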
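Permutation importance measures how much a fitted model's score drops when one feature's values are shuffled, which breaks that feature's link to the target. Below is a minimal, model-agnostic sketch. It assumes `score_fn(X, y)` is any callable where higher is better, for example a metric computed from a trained model's predictions on a validation set. All names here are hypothetical. scikit-learn provides the same idea as `sklearn.inspection.permutation_importance`.

```python
import numpy as np

def permutation_importance(score_fn, X, y, n_repeats=5, seed=0):
    """Mean drop in score_fn(X, y) when each column of X is shuffled in turn."""
    X = np.asarray(X, dtype=float)
    rng = np.random.default_rng(seed)
    baseline = score_fn(X, y)
    importances = np.zeros(X.shape[1])
    for j in range(X.shape[1]):
        drops = []
        for _ in range(n_repeats):
            X_perm = X.copy()
            # Shuffle one column to break its relationship with y
            X_perm[:, j] = rng.permutation(X_perm[:, j])
            drops.append(baseline - score_fn(X_perm, y))
        importances[j] = np.mean(drops)
    return importances


# Toy check: a "model" that only looks at column 0, scored by accuracy
data_rng = np.random.default_rng(1)
X_demo = data_rng.normal(size=(200, 2))
y_demo = (X_demo[:, 0] > 0).astype(int)

def accuracy_of_toy_model(X, y):
    return float(np.mean((X[:, 0] > 0).astype(int) == y))

print(permutation_importance(accuracy_of_toy_model, X_demo, y_demo))
# column 0 gets a large importance (about 0.5); column 1 gets exactly 0.0
```

Computing the baseline and the shuffled scores on held-out data shows which features the model relies on to generalize, not just to fit the training set.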

Forecasting the Volatilities of ETFs

Modeled the daily volatilities of three ETFs (SPY, AEA, ASEA) from 01/01/2018 to 02/20/2023 with GARCH models.
Forecast the daily volatilities of the three ETFs for the month after the end of the dataset.
Technical skills: GARCH, ACF, PACF.
Software used: pmdarima, statsmodels, scikit-learn, pandas, numpy, seaborn, matplotlib, scipy, yfinance.
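In a GARCH(1,1) model, today's variance depends on yesterday's squared return and yesterday's variance: sigma2[t] = omega + alpha * r[t-1]**2 + beta * sigma2[t-1]. Multi-step forecasts from this model decay geometrically toward the long-run variance omega / (1 - alpha - beta). The sketch below assumes a zero-mean return series and a GARCH(1,1) specification. The page does not state the model orders or how the parameters were estimated. The parameter values in the usage lines are made up, and the function names are hypothetical.

```python
import numpy as np

def garch11_variance(returns, omega, alpha, beta):
    """Conditional variances for a zero-mean GARCH(1,1).

    sigma2[t] = omega + alpha * r[t-1]**2 + beta * sigma2[t-1].
    Returns len(returns) + 1 values; the last one is the one-step-ahead forecast.
    """
    r = np.asarray(returns, dtype=float)
    sigma2 = np.empty(len(r) + 1)
    sigma2[0] = r.var()  # start the recursion at the sample variance (a common choice)
    for t in range(1, len(r) + 1):
        sigma2[t] = omega + alpha * r[t - 1] ** 2 + beta * sigma2[t - 1]
    return sigma2

def garch11_neg_loglik(params, returns):
    """Gaussian negative log-likelihood, suitable for numerical minimisation."""
    omega, alpha, beta = params
    r = np.asarray(returns, dtype=float)
    sigma2 = garch11_variance(r, omega, alpha, beta)[:-1]
    return 0.5 * np.sum(np.log(2.0 * np.pi * sigma2) + r ** 2 / sigma2)

def garch11_forecast(sigma2_next, omega, alpha, beta, horizon):
    """Variance forecasts for h = 1..horizon; they decay toward omega / (1 - alpha - beta)."""
    persistence = alpha + beta
    long_run = omega / (1.0 - persistence)
    steps = np.arange(horizon)  # exponent h - 1 for h = 1..horizon
    return long_run + persistence ** steps * (sigma2_next - long_run)

# Hypothetical parameters (not fitted values from the project)
omega, alpha, beta = 1e-6, 0.1, 0.85
r = np.random.default_rng(3).normal(scale=0.01, size=250)  # stand-in daily returns
sigma2 = garch11_variance(r, omega, alpha, beta)
# Roughly one month of trading days, converted from variance to volatility
vol_path = np.sqrt(garch11_forecast(sigma2[-1], omega, alpha, beta, horizon=21))
```

To fit the model, `garch11_neg_loglik` could be minimized with `scipy.optimize.minimize`, using bounds that keep omega > 0 and alpha, beta >= 0. An inequality constraint alpha + beta < 1 (for example, with the SLSQP method) keeps the long-run variance finite.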
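GARCH modeling is usually motivated by volatility clustering. Clustering shows up as autocorrelation in squared returns, even when the returns themselves look uncorrelated. Below is a minimal sketch of the sample ACF. It is intended to match the default estimator in statsmodels' `acf`, which divides every lag's sum by the full-sample sum of squares. The function names are hypothetical, and the returns in the usage lines are simulated stand-ins. Under a white-noise null, autocorrelations outside roughly ±1.96/sqrt(n) are commonly flagged as significant.

```python
import numpy as np

def sample_acf(x, nlags):
    """Sample autocorrelations for lags 0..nlags (full-sample denominator)."""
    x = np.asarray(x, dtype=float)
    xc = x - x.mean()
    denom = xc @ xc
    acf = [1.0]
    for k in range(1, nlags + 1):
        acf.append((xc[k:] @ xc[:-k]) / denom)
    return np.array(acf)

def white_noise_band(n, z=1.96):
    """Approximate 95% band for sample autocorrelations of white noise."""
    return z / np.sqrt(n)

# Hypothetical usage on daily returns r: check the squared returns for ARCH effects
r = np.random.default_rng(2).normal(scale=0.01, size=1000)
acf_sq = sample_acf(r ** 2, nlags=10)
flagged = np.flatnonzero(np.abs(acf_sq[1:]) > white_noise_band(len(r))) + 1
print(flagged)  # lags whose autocorrelation falls outside the band
```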
