Karan Singh, "Prompt Caching: Because Who Has Time for Slow AI?" Prompt caching: first introduced by Anthropic, adopted by OpenAI… and Llama to follow soon. Oct 2.
Karan Singh, "OpenAI's o1 Model: Paying for AI's Internal Monologue". The future of AI, where the thoughts are invisible and the meter is always ON! Sep 13.
Karan Singh, "Calculate: How much GPU Memory you need to serve any LLM?" Just tell me how much GPU memory I need to serve my LLM. Anyone else looking for this answer? Read on… Jul 11.
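As a quick illustration of the kind of estimate that article addresses, here is a minimal sketch using a common rule of thumb: weight memory is roughly parameter count times bytes per parameter, with about 20% overhead for the KV cache and activations. The function name `estimate_gpu_memory_gb` and the 1.2 overhead factor are assumptions for illustration, not necessarily the article's exact method.

```python
def estimate_gpu_memory_gb(params_billions: float,
                           bits_per_param: int = 16,
                           overhead: float = 1.2) -> float:
    """Rough GPU memory (GB) needed to serve an LLM: weights plus overhead.

    The 1.2 overhead factor (KV cache, activations, runtime buffers) is a
    common rule of thumb, assumed here for illustration.
    """
    bytes_per_param = bits_per_param / 8            # e.g. FP16 -> 2 bytes
    weights_gb = params_billions * bytes_per_param  # 1B params * 1 byte = 1 GB
    return weights_gb * overhead

# Example: a 70B-parameter model served in FP16 needs on the order of
# 70 * 2 * 1.2 = 168 GB, i.e. several 80 GB GPUs.
print(f"{estimate_gpu_memory_gb(70):.0f} GB")
```

Dropping to 8-bit or 4-bit quantization halves or quarters the weight term, which is why the bits-per-parameter knob dominates serving cost.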
Karan Singh, "Dead Cheap LLM Inferencing in Production". A guide to leveraging spot GPU instances for cost-effective LLM inference workloads. Jul 7.
Karan Singh in Towards Dev, "Redis-GoLang-React Sample Chat App on OpenShift (Kubernetes)". Here is my journal of deploying a sample Redis-GoLang-React real-time chat application on OpenShift. Jan 9, 2022.
Karan Singh, "Exposing a Local WebSocket Connection Securely with FRP & Caddy". A step-by-step implementation guide. Apr 20, 2023.