Introduction:
A recent client set us a demanding target: run 1 billion jobs a day on a platform built with Ruby on Rails and an Electron front end. Meeting it meant pushing past what a conventional Ruby and Rails setup delivers, so we reworked the stack layer by layer.
- Optimize SQL Queries with Datadog:
We started with the database. Using Datadog to surface slow SQL queries, we tuned them one at a time. This step laid the groundwork for the performance work that followed.
- Add Database Caching with Redis:
Next, we added Redis caching (https://aws.amazon.com/elasticache/redis/). Caching the results of selected database queries significantly reduced data retrieval times for frequently read data.
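The selective caching described above is commonly implemented as a cache-aside read: check the cache first, and run the query only on a miss. The sketch below illustrates the pattern in plain Ruby. `FakeCache`, `fetch_cached`, the key name, and the 60-second TTL are hypothetical names and values chosen for illustration; `FakeCache` is an in-memory stand-in so the example runs without a Redis server, and it is not the client the platform used.

```ruby
# In-memory stand-in for a Redis client (assumption: illustration only).
# It mimics just two operations: get, and set with an expiry in seconds.
class FakeCache
  def initialize(clock: -> { Process.clock_gettime(Process::CLOCK_MONOTONIC) })
    @store = {}
    @clock = clock
  end

  # Returns the cached value, or nil if the key is missing or expired.
  def get(key)
    value, expires_at = @store[key]
    return nil if expires_at.nil? || @clock.call >= expires_at

    value
  end

  # Stores a value that expires ttl seconds from now and returns it.
  def set(key, value, ttl:)
    @store[key] = [value, @clock.call + ttl]
    value
  end
end

# Cache-aside read: return the cached value on a hit; on a miss, run the
# block (standing in for the database query), cache the result, return it.
def fetch_cached(cache, key, ttl: 60)
  cached = cache.get(key)
  return cached unless cached.nil?

  cache.set(key, yield, ttl: ttl)
end

# Usage: count how often the "expensive query" actually runs.
db_calls = 0
cache = FakeCache.new
lookup = lambda do
  db_calls += 1
  { id: 42, name: "Example account" } # hypothetical query result
end

first  = fetch_cached(cache, "account:42") { lookup.call }
second = fetch_cached(cache, "account:42") { lookup.call }
puts "query executed #{db_calls} time(s); results equal: #{first == second}"
```

In a Rails application the same pattern is usually written as `Rails.cache.fetch(key, expires_in: ...) { ... }` with a Redis-backed cache store. Which queries are worth caching, and for how long, depends on how stale the data is allowed to be.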
- Multi-Threading with Sidekiq:
With Sidekiq processing the jobs, we first tried scaling vertically, increasing instance size and Sidekiq concurrency together and recording the jobs completed over 24 hours:
AWS m5n.xlarge:
- vCPU(4), RAM(16 GiB)
- Sidekiq Concurrency: 10
- Jobs (24 hours): 150k
AWS m5n.2xlarge:
- vCPU(8), RAM(32 GiB)
- Sidekiq Concurrency: 20
- Jobs (24 hours): 170k
AWS m7g.4xlarge:
- vCPU(16), RAM(64 GiB)
- Sidekiq Concurrency: 40
- Jobs (24 hours): 90k
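To make the three runs easier to compare, the sketch below divides each run's 24-hour job count by its Sidekiq concurrency. The figures are copied from the list above; the constant and method names (`RUNS`, `per_thread`, `per_second`) are hypothetical, and the calculation is simple arithmetic, not a profiling result.

```ruby
# Figures copied from the benchmark list above.
RUNS = [
  { instance: "m5n.xlarge",  vcpus: 4,  concurrency: 10, jobs_per_day: 150_000 },
  { instance: "m5n.2xlarge", vcpus: 8,  concurrency: 20, jobs_per_day: 170_000 },
  { instance: "m7g.4xlarge", vcpus: 16, concurrency: 40, jobs_per_day: 90_000 }
].freeze

SECONDS_PER_DAY = 24 * 60 * 60

# Jobs completed per Sidekiq thread over the 24-hour run.
def per_thread(run)
  run[:jobs_per_day] / run[:concurrency].to_f
end

# Average jobs completed per second across the whole instance.
def per_second(run)
  run[:jobs_per_day] / SECONDS_PER_DAY.to_f
end

RUNS.each do |run|
  puts format("%-12s %8.0f jobs/thread/day  %5.2f jobs/s",
              run[:instance], per_thread(run), per_second(run))
end
```

Per-thread throughput works out to 15,000, 8,500, and 2,250 jobs per day respectively: each added thread contributed less than the one before, and on the largest instance total throughput dropped as well.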
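Throughput did not scale with instance size or thread count, and on the largest instance it fell. This pointed to a fundamental limitation in Ruby's multi-threading. A likely factor is the Global VM Lock (GVL) in CRuby, which lets only one thread in a process execute Ruby code at a time, so adding threads to a single process yields little extra parallelism for CPU-bound work. We responded with three changes: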
Queue-Specific Sidekiq Servers:
- We dedicated each server to specific Sidekiq queues, so the infrastructure could be tuned for each type of task.
Capistrano Deployment Optimization:
- We adjusted the Capistrano deployment configuration to keep deployments resource-efficient.
Scale with Tiny Servers:
Limiting Sidekiq concurrency to 3-4 threads per process, we ran the workers on AWS t3.small instances, each serving its own queues:
AWS t3.small (instance 1):
- vCPU(2), RAM(2 GiB)
- Sidekiq Concurrency: 3
- Jobs (24 hours): 20k
- Queues: email_queue, broadcast_queue
AWS t3.small (instance 2):
- vCPU(2), RAM(2 GiB)
- Sidekiq Concurrency: 3
- Jobs (24 hours): 20k
- Queues: ai_context_remapping, ai_proctor
AWS t3.small (instance 3):
- vCPU(2), RAM(2 GiB)
- Sidekiq Concurrency: 3
- Jobs (24 hours): 20k
- Queue: file_processing
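One way to express the queue-to-instance mapping above is as data that generates each server's Sidekiq start command. The sketch below is an illustration under assumptions: the instance labels, `INSTANCES`, and `sidekiq_command` are hypothetical names, and the post does not say how the processes were actually launched. The only Sidekiq features it relies on are the CLI's `-c` (concurrency) and `-q` (queue) options.

```ruby
# Queue assignments and concurrency copied from the instance list above.
INSTANCES = {
  "instance 1" => { concurrency: 3, queues: %w[email_queue broadcast_queue] },
  "instance 2" => { concurrency: 3, queues: %w[ai_context_remapping ai_proctor] },
  "instance 3" => { concurrency: 3, queues: %w[file_processing] }
}.freeze

# Builds the Sidekiq command line for one instance: -c sets the number of
# worker threads, and each -q names a queue this process should consume.
def sidekiq_command(config)
  queue_flags = config[:queues].map { |queue| "-q #{queue}" }.join(" ")
  "bundle exec sidekiq -c #{config[:concurrency]} #{queue_flags}"
end

INSTANCES.each do |name, config|
  puts "#{name}: #{sidekiq_command(config)}"
end
```

Keeping the mapping in one place makes it straightforward to add another small instance for a busy queue, or to move a queue between instances, without touching the others.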
Conclusion: A Pioneering Solution
Our tailored approach, from tuned SQL queries and Redis caching to small, queue-specific Sidekiq servers, enabled us to scale the platform to approximately 1 billion jobs daily. It reflects our commitment to innovation and problem-solving.
Reach Out for Scalable Solutions!
For those seeking scalable solutions tailored to unique challenges, connect with us at info@bitsatom.com. Let's explore how we can transform challenges into success stories.