Tohid.

Available for opportunities · Gujarat, India

Tohid Shaikh

Backend Engineer · ML Engineer · Distributed Systems

Backend and ML engineer with hands-on experience building distributed systems, LLM-integrated pipelines, and data-intensive applications. From encrypted payment meshes to custom vector databases — I build the infrastructure, not just the model.

5M+ Records Processed
6 End-to-End Projects
92% Model Accuracy
10x Query Speedup

Currently: B.Tech · Electrical Eng. · GEC Dahod · Jun 2026

ABOUT

Building systems that actually work at scale.

I architect and ship production-grade backend systems and ML pipelines — not toy demos. My work spans encrypted payment mesh networks, custom vector databases built from scratch, and large-scale data engines processing 5M+ records.

I don't just train models; I engineer the infrastructure that makes them production-ready. Gossip-protocol networks, multithreaded DPI engines, RAG pipelines backed by sub-200ms ANN retrieval: I own the full stack, from algorithm design to deployment.

Delivered measurable impact as an ML Engineer Intern at Edunet Foundation × IBM SkillsBuild and at 3Skill. At 3Skill, I raised fraud recall from 61% to 84% on imbalanced real-world transaction data and cut the false-negative rate by 38% through feature selection and threshold tuning.

Actively seeking backend engineering, ML infrastructure, or full-stack roles at companies building at scale — where I can own systems end-to-end and ship things that matter.

tohid.config.json
{
  "role": "Backend & ML Engineer",
  "expertise": [
    "Distributed Systems",
    "LLM-Integrated Pipelines",
    "Cryptographic Networks"
  ],
  "shipped": [
    "UPI Offline Mesh (RSA+AES)",
    "Vector DB from scratch",
    "5M+ record UIDAI pipeline"
  ],
  "status": "Actively interviewing",
  "open_to": "Full-time · Backend · ML"
}
Python · Flask · REST APIs · SQLAlchemy · Multithreading · TCP/IP · LLM Integration · RAG Pipelines · Vector Databases · Anthropic API · Ollama · Scikit-learn · Pandas · NumPy · Random Forest · SMOTE · Git · Streamlit · Distributed Systems · System Design · DSA · HNSW

PROJECTS

Things I've built

🔐

01 · Backend Systems

UPI Offline Mesh

Python · Flask · SQLAlchemy · RSA-OAEP · AES-256-GCM · Multithreading

Offline UPI payment system using a gossip-based mesh network. Hybrid RSA-OAEP + AES-256-GCM encryption secures payment packets across virtual bridge nodes. Thread-safe idempotency cache with optimistic locking prevents duplicate settlements.

RSA+AES · Hybrid Encryption
0 dupes · Idempotency
Gossip · Mesh Protocol
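A minimal stdlib sketch of how the gossip relay and the duplicate-settlement guard could fit together. It assumes packets are already encrypted (the RSA-OAEP + AES-256-GCM step is omitted), and `Packet`, `MeshNode`, `gossip`, and `SettlementServer` are hypothetical names, not the project's code. The guard here is a plain mutex around check-and-insert, a simpler swap-in for the optimistic locking the project describes.

```python
import random
import threading
from dataclasses import dataclass, field

# Hypothetical names throughout; a sketch, not the project's implementation.


@dataclass(frozen=True)
class Packet:
    packet_id: str   # unique per payment; used for dedup and idempotency
    payload: bytes   # assumed to be ciphertext already (encryption omitted)


@dataclass(eq=False)
class MeshNode:
    name: str
    peers: list = field(default_factory=list)
    seen: set = field(default_factory=set)   # packet IDs this node has relayed
    is_bridge: bool = False                   # assumed: node that can reach the server


def gossip(origin, packet, fanout=2, rng=None):
    """Push gossip: each node relays a new packet to up to `fanout` random peers.
    Returns the bridge nodes the packet reached."""
    rng = rng or random.Random()
    origin.seen.add(packet.packet_id)
    frontier, bridges = [origin], []
    while frontier:
        node = frontier.pop()
        if node.is_bridge:
            bridges.append(node)
        for peer in rng.sample(node.peers, min(fanout, len(node.peers))):
            if packet.packet_id not in peer.seen:   # drop already-seen packets
                peer.seen.add(packet.packet_id)
                frontier.append(peer)
    return bridges


class SettlementServer:
    """Settles each packet at most once, however many bridges upload it."""

    def __init__(self):
        self._lock = threading.Lock()
        self._settled = {}   # packet_id -> name of the bridge that won

    def settle(self, packet, bridge_name):
        with self._lock:   # check-and-insert must be atomic across threads
            if packet.packet_id in self._settled:
                return False   # duplicate upload: idempotent no-op
            self._settled[packet.packet_id] = bridge_name
            return True
```

Because several bridges can carry the same packet out of the mesh, exactly-once settlement has to be enforced at the server rather than inside the mesh itself.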
🧠

02 · AI & ML

Vector Database + RAG Pipeline

Python · Flask · HNSW · KD-Tree · Ollama · REST API

Vector database built from scratch, supporting HNSW, KD-Tree, and brute-force search, with sub-200ms ANN query latency. A RAG pipeline on top uses Ollama for document-grounded question answering, with semantic chunking and real-time concurrent retrieval.

<200ms · ANN Latency
HNSW · Search Algo
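HNSW is too involved for a short example, but the other two search modes named above can be sketched with the standard library: an exact brute-force scan, and a KD-tree nearest-neighbour search that should always return an equally close point. The function names are hypothetical and the sketch covers 1-NN only, not the project's k-NN API.

```python
import random
from dataclasses import dataclass
from typing import Optional


def sq_dist(a, b):
    """Squared Euclidean distance (monotonic in distance, so no sqrt needed)."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def brute_force_nn(points, query):
    """Exact baseline: compare the query against every stored vector."""
    return min(points, key=lambda p: sq_dist(p, query))


@dataclass
class KDNode:
    point: tuple
    axis: int
    left: Optional["KDNode"] = None
    right: Optional["KDNode"] = None


def build_kdtree(points, depth=0):
    """Split on the median along one axis per level, cycling through axes."""
    if not points:
        return None
    axis = depth % len(points[0])
    pts = sorted(points, key=lambda p: p[axis])
    mid = len(pts) // 2
    return KDNode(pts[mid], axis,
                  build_kdtree(pts[:mid], depth + 1),
                  build_kdtree(pts[mid + 1:], depth + 1))


def kdtree_nn(node, query, best=None):
    """Exact nearest neighbour with branch-and-bound pruning."""
    if node is None:
        return best
    if best is None or sq_dist(node.point, query) < sq_dist(best, query):
        best = node.point
    diff = query[node.axis] - node.point[node.axis]
    near, far = (node.left, node.right) if diff < 0 else (node.right, node.left)
    best = kdtree_nn(near, query, best)
    # Visit the far side only if the splitting plane is closer than the best match.
    if diff ** 2 < sq_dist(best, query):
        best = kdtree_nn(far, query, best)
    return best


if __name__ == "__main__":
    rng = random.Random(0)
    vectors = [tuple(rng.random() for _ in range(3)) for _ in range(1000)]
    tree = build_kdtree(vectors)
    q = (0.5, 0.5, 0.5)
    print(kdtree_nn(tree, q) == brute_force_nn(vectors, q))
```

KD-tree pruning loses effectiveness as dimensionality grows, which is one reason graph indexes such as HNSW are the usual choice for high-dimensional embeddings.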
📊

03 · Data Engineering

UIDAI Aadhaar — Project DRAM

Python · Pandas · SciPy · Plotly · Streamlit

Processes 5M+ Aadhaar records from 12 data sources and classifies 807 districts using Z-score anomaly detection and a custom UER metric. Vectorized boolean masking delivers a 10x query speedup. Presented at the UIDAI National Innovation Challenge 2026.

5M+ · Records
10x · Speedup
807 · Districts
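A stdlib sketch of Z-score flagging over a per-district metric. The UER metric's definition isn't given here, so the input is a hypothetical `{district: value}` mapping and the threshold is an assumed parameter. In pandas, the vectorized form of the same test is a boolean mask such as `df[df["z"].abs() > threshold]`, which avoids a Python-level loop.

```python
from statistics import mean, pstdev


def zscore_anomalies(metric_by_district, threshold=3.0):
    """Flag districts whose metric lies more than `threshold` population
    standard deviations from the mean. Returns {district: "high" | "low"}."""
    values = list(metric_by_district.values())
    mu, sigma = mean(values), pstdev(values)
    if sigma == 0:
        return {}  # every district identical: nothing to flag
    flags = {}
    for district, value in metric_by_district.items():
        z = (value - mu) / sigma
        if z > threshold:
            flags[district] = "high"
        elif z < -threshold:
            flags[district] = "low"
    return flags
```

One caveat: with the population standard deviation, no Z-score can exceed √(n−1) in magnitude, so a threshold of 3 can never fire on ten or fewer districts. At 807 districts that bound is not a constraint.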

04 · Systems & Networking

Deep Packet Inspection Engine

Python · Multithreading · TCP/IP · TLS · PCAP

10-module DPI engine parsing raw PCAP files across Ethernet/IP/TCP/UDP layers. 20+ apps detected via TLS SNI fingerprinting with stateful 5-tuple flow tracking. Multithreaded Reader → Load Balancer → Fast Path pipeline with consistent hashing.

20+ · Apps Detected
10 · Modules
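One way a Load Balancer stage could assign flows to Fast Path threads is a consistent-hash ring keyed on a direction-normalized 5-tuple, so both halves of a connection land on the same worker and its flow state stays local to one thread. This stdlib sketch assumes SHA-1 as the hash, 64 virtual nodes per worker, and hypothetical worker names; the engine's actual keying may differ.

```python
import bisect
import hashlib


def ring_hash(key):
    """Stable 64-bit ring position (SHA-1 prefix; an assumed choice)."""
    return int.from_bytes(hashlib.sha1(key.encode()).digest()[:8], "big")


class ConsistentHashRing:
    """Maps flows to Fast Path workers (worker names are hypothetical)."""

    def __init__(self, workers, vnodes=64):
        # Each worker owns `vnodes` points on the ring to even out load.
        self._ring = sorted(
            (ring_hash(f"{w}#{i}"), w) for w in workers for i in range(vnodes)
        )
        self._points = [h for h, _ in self._ring]

    def worker_for(self, five_tuple):
        src_ip, src_port, dst_ip, dst_port, proto = five_tuple
        # Order the endpoints so A->B and B->A produce the same key.
        a, b = sorted([(src_ip, src_port), (dst_ip, dst_port)])
        key = f"{a[0]}:{a[1]}|{b[0]}:{b[1]}|{proto}"
        # First ring point clockwise from the key's hash, wrapping around.
        idx = bisect.bisect(self._points, ring_hash(key)) % len(self._points)
        return self._ring[idx][1]
```

Adding a worker only moves the flows whose ring position now falls to the new worker's virtual nodes, roughly 1/N of them, instead of reshuffling almost every flow as `hash % N` would.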

// work history

Where I've Worked

ML Engineer Intern

Edunet Foundation × IBM SkillsBuild

Jan 2026 – Feb 2026
  • Built a Random Forest classification pipeline on 12K records, achieving 91.2% accuracy and 0.87 F1-score through structured feature engineering and cross-validated model selection.
  • Designed modular preprocessing workflows using Pandas and Scikit-learn, eliminating data leakage and ensuring reproducibility across training and evaluation environments.
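The leakage point is easiest to see with cross-validation: any fitted preprocessing step must learn its statistics from each fold's training split only. This stdlib sketch uses standardization of a single numeric feature, with contiguous unshuffled folds, as a stand-in for any fitted step (tree models such as Random Forest don't need scaling themselves); the helper names are hypothetical. In scikit-learn, placing the preprocessing and the model in a `Pipeline` and passing it to `cross_val_score` gives the same per-fold refit.

```python
from statistics import mean, pstdev


def kfold_indices(n, k):
    """Yield (train_idx, val_idx) for k contiguous, non-overlapping folds."""
    start = 0
    for fold in range(k):
        size = n // k + (1 if fold < n % k else 0)
        val_idx = list(range(start, start + size))
        train_idx = list(range(0, start)) + list(range(start + size, n))
        yield train_idx, val_idx
        start += size


def fit_standardizer(values):
    """Learn mean and std from the given values only."""
    mu, sigma = mean(values), pstdev(values)
    return mu, (sigma or 1.0)   # guard against a constant column


def leak_free_folds(xs, k=5):
    """Standardize each fold using statistics from its training split only."""
    for train_idx, val_idx in kfold_indices(len(xs), k):
        mu, sigma = fit_standardizer([xs[i] for i in train_idx])  # train only
        train = [(xs[i] - mu) / sigma for i in train_idx]
        val = [(xs[i] - mu) / sigma for i in val_idx]  # reuse train statistics
        yield train, val
```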

ML Engineer Intern

3Skill

Dec 2025 – Jan 2026
  • Developed a fraud detection model on 20K+ imbalanced financial transactions; applied SMOTE oversampling to improve recall from 61% to 84%, reaching 92% accuracy and 0.89 ROC-AUC.
  • Reduced false-negative rate by 38% through targeted feature selection and threshold tuning on skewed real-world transaction data.
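SMOTE balances classes by synthesizing new minority points along line segments between a minority sample and one of its k nearest minority neighbours. In practice imbalanced-learn's `SMOTE` class is the usual tool; this stdlib sketch, with a hypothetical `smote` function, only shows the interpolation step. Oversampling should be applied to the training split alone so synthetic points never reach the evaluation data.

```python
import random


def smote(minority, n_synthetic, k=5, rng=None):
    """Generate synthetic minority samples by interpolating between a sample
    and one of its k nearest minority-class neighbours (the SMOTE idea)."""
    rng = rng or random.Random()
    if len(minority) < 2:
        raise ValueError("need at least two minority samples")
    k = min(k, len(minority) - 1)

    def sq_dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))

    # Precompute each sample's k nearest minority neighbours (excluding itself).
    neighbours = []
    for i, a in enumerate(minority):
        others = sorted((j for j in range(len(minority)) if j != i),
                        key=lambda j: sq_dist(a, minority[j]))
        neighbours.append(others[:k])

    synthetic = []
    for _ in range(n_synthetic):
        i = rng.randrange(len(minority))
        j = rng.choice(neighbours[i])
        gap = rng.random()  # position along the segment, in [0, 1)
        a, b = minority[i], minority[j]
        synthetic.append(tuple(x + gap * (y - x) for x, y in zip(a, b)))
    return synthetic
```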

// tech stack

Technical Arsenal

Languages

Python · Java · SQL · Bash

Backend & Systems

Flask · REST APIs · SQLAlchemy · Multithreading · TCP/IP · Distributed Systems

AI / LLM

LLM Integration · RAG Pipelines · Vector Databases · Anthropic Claude API · Ollama · Prompt Engineering

ML & Data

Scikit-learn · Pandas · NumPy · TF-IDF · SMOTE · Random Forest · NLP

Core CS

Data Structures & Algorithms · System Design · Scalable Architecture

Tools

Git · Streamlit · Jupyter · Plotly · Matplotlib

// get in touch

Let's build something great.

Open to full-time roles, internships, and interesting projects. If you're building at scale, let's talk.