Qingyu Zhou

Summary

  • 10+ years of industry experience with back-end, infrastructure and application development
  • Excel in cloud solutions based on Amazon Web Services and on-premises datacenters
  • Familiar with information retrieval & observability infrastructure
  • Build, maintain, and scale large infrastructure fleets and distributed clusters

Contact

16322 NE 12th Pl
Bellevue, WA 98008

858-729-4674

Education

M.S. Computer Science Sept. 2015 — Dec. 2016

University of California, San Diego

POSITION

Software Engineer Nov. 2024 — Current

Amazon, Amazon.com Services LLC, Alexa - Conversational and Learning Team, Bellevue
  • Enhanced the prompt construction service, reducing latency by optimizing its iteration calls
  • Simplified prompt context onboarding and assembly workflows to reduce integration complexity
  • Extended the agent architecture from a single-agent model to a multi-agent system to support coordinated task planning and execution
  • Improved observability and tracing across multi-agent workflows to ensure end-to-end task traceability

AWS Engineering Manager May 2022 — Nov. 2024

Amazon, Amazon Web Services - External Security Services, Seattle
  • Managed and led the AWS GuardDuty Rainier team, part of the Control Plane teams
  • Focused on service metering & usage, security findings storage/decoration/publishing, and public APIs
  • Worked on security features including GuardDuty Malware, RDS, Lambda, and Container Protection
  • Expanded the service into new regions: UAE and Zurich
  • Ensured the service met compliance requirements and passed security reviews and penetration tests
  • Conducted monthly OLRs and weekly 1:1s, defined the operating plan, and participated in escalation on-call rotations

TuSimple Tech Lead Manager, Senior Software Engineer II Jun. 2020 — May 2022

TuSimple, Site Reliability Engineering, San Diego
  • Led and managed the Site Reliability Engineering team and grew it from one to seven members
  • Handled the OKR/budget planning, hiring & job posting, project management and engineer calibration
  • Worked on traffic engineering for L4/L7 load balancing (kube-vip & seesaw / nginx & haproxy), on-prem and in AWS
  • Developed the observability (logging, metrics), monitoring framework (cadence) and alerting stack (grafana/pagerduty)
  • Maintained the deployment platform (rancher) and the data plane of the ML Platforms (k8s & calico & volcano)
  • Supported the fuse (goofys) for the dataset, the NFS storage system (weka/DDN) and the streaming platform (Kafka)
  • Deployed vendor solutions: artifactory, github enterprise, x-ray, notary
  • Migrated SRE Python and Golang repos to monorepos (bazel)
  • Built the service catalog (backstage), inventory management (AWS SSM), and other internal tools

Uber Software Engineer April 2019 — Jun. 2020

Uber, Observability Log Search Team, Palo Alto & NYC
  • Built the query layer, a Lucene translator, for the next-generation logging platform
  • Worked on the storage layer of the new platform, based on ClickHouse, etcd, and ZooKeeper
  • Operated the existing ELK logging (storage & ingestion) stack
  • Improved the ingestion performance of Logstash pipelines

AWS Software Engineer Feb. 2017 — April 2019

Amazon, AWS Search Services, Palo Alto
  • Worked on the AWS Search Services team, focusing on AWS CloudSearch and Elasticsearch Service
  • Migrated Search Services infrastructure to AWS CloudFormation as an infrastructure-as-code project
  • Expanded Elasticsearch Service to new regions, including London, Paris, Beijing, Ningxia, Sweden
  • Implemented the Elasticsearch node-to-node encryption feature, based on AWS ELB/ACM/KMS and HAProxy
  • Drew on day-to-day operational experience to troubleshoot customer issues and maintained 100k+ EC2 instances