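This article draws on the technical report "SimpleVSF: VLM-Scoring Fusion for Trajectory Prediction of End-to-End Autonomous Driving", submitted by Inspur Information (浪潮信息) for the Autonomous Grand Challenge at ICCV 2025, a leading international autonomous driving competition. To evaluate driving systems in states beyond those observed during human data collection, driving trajectories are rolled out over a short simulation horizon; specifically, a simplified bird's-eye-view (BEV) abstraction of the scene is unrolled. The weights of the aggregated scores from different models (for example, multiple VLM-augmented scorers) are adjusted dynamically. Terms that come up along the way include the ViT-L model [8] and driving descriptions such as "sharp right turn".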
[6] Lee, Y.; Hwang, J.-w.; Lee, S.; Bae, Y.; Park, J. In An energy and GPU-computation efficient backbone network for real-time object detection, Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops, 2019.
[7] Fang, Y.; Sun, Q.; Wang, X.; Huang, T.; Wang, X.; Cao, Y., EVA-02: A visual representation for Neon Genesis. Image and Vision Computing 2024, 149, 105171.
[8] Dosovitskiy, A.; Beyer, L.; Kolesnikov, A.; Weissenborn, D.; Zhai, X.; Unterthiner, T.; Dehghani, M.; Minderer, M.; Heigold, G.; Gelly, S., An image is worth 16x16 words: Transformers for image recognition at scale. arXiv preprint arXiv:2010.11929 2020.