Built a distributed movie recommendation engine using Spark MLlib on AWS EMR to analyze the MovieLens 10M dataset (10 million ratings from 72,000 users across 10,000 movies). Leveraged MLlib…
Read full article →
Blog
Technical deep-dives into machine learning, AI algorithms, and data science projects.
I explored the Image Segmentation dataset from the UCI machine learning repository, which describes 2,310 segmented regions (e.g., brick, sky, foliage, window) with 19 numerical attributes. The…
Read full article →
Using the modified Jester online joke ratings dataset, I built an item-based recommender that suggests jokes to users based on how similarly other users rate the same jokes. The dataset contains…
Read full article →
Built a distributed movie recommendation engine using Apache Spark on AWS EMR to analyze the MovieLens 10M dataset (10 million ratings from 72,000 users across 10,000 movies). Implemented…
Read full article →
Implemented Google's PageRank algorithm from scratch using Python and pandas, demonstrating how web pages are ranked based on link structure. Built custom transition matrix creation and power…
Read full article →
Explored Apache Cassandra's CQL query patterns and Apache Storm's windowing mechanisms for real-time stream processing. Demonstrated Cassandra's limitations with ORDER BY (only works on…
Read full article →
Applied K-means clustering to 2,225 BBC News articles (2004-2005) using term frequency features. K=5 achieved 76.4% homogeneity and 76.7% completeness against editorial categories (business…
Read full article →
Built a comprehensive regression pipeline predicting violent crime rates across 1,994 US communities using 97 socioeconomic features (census + FBI data). Lasso achieved the best test RMSE of 0.143…
Read full article →
Deployed an HBase NoSQL database using Docker Compose with Master, RegionServer, and ZooKeeper components. Demonstrated column-oriented key-value storage operations including table creation, data…
Read full article →
Built a distributed movie recommendation engine using MapReduce (mrjob 0.6.12) on AWS EMR, processing 1 million ratings from the MovieLens dataset. Computed cosine similarity scores between…
Read full article →
Implemented custom KNN and Rocchio classifiers for newsgroup document classification (800 training, 200 test, 5,500 features). The Rocchio method achieved 71% accuracy, outperforming KNN (67% with…
Read full article →
This comprehensive tutorial covers exploratory data analysis and preprocessing techniques using the Adult Census dataset (32K+ records). Key topics include: handling missing values with…
Read full article →
Built a complete 4-stage ML pipeline to predict a developer's programming language choice using binary classification: (1) Cross-validation for model evaluation, (2) Grid search for automatic…
Read full article →
Implemented a Markov Decision Process (MDP) and value iteration to solve the FrozenLake 8x8 environment, achieving 100% success rate compared to 0.88% for random policies (113x improvement…
Read full article →
Built an end-to-end ML pipeline combining StandardScaler → PCA → KNeighborsClassifier, achieving 95% accuracy on the Iris dataset. Used GridSearchCV to optimize the entire pipeline simultaneously…
Read full article →
Implemented and compared three AI search algorithms (BFS, UCS, A*) for delivery route optimization on real-world map data from Tegucigalpa (24 locations, 64 streets). A* dramatically…
Read full article →
Applied PCA to reduce the Iris dataset from 4D to 2D while retaining 95.8% of the variance (PC1: 72.77%, PC2: 23.03%). Demonstrates dimensionality reduction fundamentals with bivariate visualization of…
Read full article →
Compared t-SNE, UMAP, and PCA for dimensionality reduction and visualization of high-dimensional data (500 samples, 4 clusters). UMAP best preserves global structure with clearer cluster…
Read full article →
Compared DBSCAN vs HDBSCAN for clustering Canadian museum locations (250+ samples). DBSCAN identified 6 clusters (eps=2, min_samples=5) with 3.6% noise, while HDBSCAN discovered…
Read full article →
Compared Decision Tree vs SVM for detecting fraud in 284K+ credit card transactions with extreme class imbalance (99.8% legitimate). Used undersampling to balance classes, achieving Decision…
Read full article →
Compared softmax regression vs One-vs-Rest and One-vs-One strategies for multi-class Iris classification (3 species, 150 samples). Softmax achieved 94.7% test accuracy, matching One-vs-Rest…
Read full article →
Developed a logistic regression model to predict telecom customer churn with 80% accuracy and 0.71 F1-score for the churn class. Analyzed 200 customer records with features including tenure, age…
Read full article →
Automated hyperparameter optimization for SVM on the Iris dataset, testing 270 parameter combinations (3 kernels × 9 C values × 10 gamma values) with 10-fold cross-validation, for 2,700 model fits in total. Found optimal configuration: RBF…
Read full article →
Implemented K-Means clustering for customer segmentation, identifying 3 optimal clusters using elbow method analysis (850 customers). Segments include education-focused (avg age 42, education…
Read full article →
Built an interpretable Decision Tree classifier for patient drug prescription with 95% accuracy (200 patient records). Model predicts Drug Y with 100% precision, Drug X with 91.7% precision…
Read full article →
Built an SVM classifier to predict benign vs malignant cancer cells with 95.71% accuracy on test data (683 samples). Used RBF kernel for non-linear classification, achieved 100% precision for…
Read full article →
Compared simple and multiple linear regression for predicting vehicle CO2 emissions (1,067 samples). Multiple regression with engine size, cylinders, and fuel consumption achieved R² = 0.8…
Read full article →
Built a Random Forest regressor for California housing price prediction (20,640 samples, 8 features), achieving R² = 0.81 on the test set (MAE: $33K, RMSE: $49K). Analyzed model performance across…
Read full article →
Deployed Node.js applications to Fly.io's global edge network with automatic HTTPS, demonstrating a modern serverless deployment workflow. Fly.io offers a free tier with 3 VMs, automatic scaling…
Read full article →
Explored the Bitcoin Lightning Network as a Layer 2 scaling solution enabling instant, low-fee micropayments off-chain. Lightning achieves millions of transactions per second vs Bitcoin's 7 TPS…
Read full article →