paper, [2] Meta-Mining Discriminative Samples for Kinship Verification. (arXiv 2022.10) oViT: An Accurate Second-Order Pruning Framework for Vision Transformers. (arXiv 2021.03) QPIC: Query-Based Pairwise Human-Object Interaction Detection with Image-Wide Contextual Information. Learning Highly Efficient Point-Based Detectors for 3D LiDAR Point Clouds. (arXiv 2022.11) PoET: Pose Estimation Transformer for Single-View, Multi-Object 6D Pose Estimation. (arXiv 2021.07) GiT: Graph Interactive Transformer for Vehicle Re-identification. (arXiv 2021.12) Uni-Perceiver: Pre-training Unified Architecture for Generic Perception for Zero-shot and Few-shot Tasks. paper | code, [4] Frequency-aware Discriminative Feature Learning Supervised by Single-Center Loss for Face Forgery Detection. (arXiv 2022.12) Transformer-Based Learned Optimization. paper, [1] LoFTR: Detector-Free Local Feature Matching with Transformers(LoFTR) (arXiv 2021.12) VUT: Versatile UI Transformer for Multi-Modal Multi-Task User Interface Modeling. (arXiv 2022.09) ViTKD: Practical Guidelines for ViT feature knowledge distillation. paper | code, Shot Contrastive Self-Supervised Learning for Scene Boundary Detection. (arXiv 2022.08) Unified Normalization for Accelerating and Stabilizing Transformers. (arXiv 2022.03) Self-Promoted Supervision for Few-Shot Transformer. (arXiv 2022.10) GGViT: Multistream Vision Transformer Network in Face2Face Facial Reenactment Detection. (arXiv 2022.08) MonoViT: Self-Supervised Monocular Depth Estimation with a Vision Transformer. paper, [98] Anticipative Video Transformer. (arXiv 2022.05) Dense residual Transformer for image denoising. (arXiv 2022.11) GLT-T: Global-Local Transformer Voting for 3D Single Object Tracking in Point Clouds. (arXiv 2021.08) Video Relation Detection via Tracklet based Visual Transformer. (arXiv 2022.05) Dual-Level Decoupled Transformer for Video Captioning. 
(arXiv 2022.09) Adaptive Sparse ViT: Towards Learnable Adaptive Token Pruning by Fully Exploiting Self-Attention. (arXiv 2021.12) LMR-CBT: Learning Modality-fused Representations with CB-Transformer for Multimodal Emotion Recognition from Unaligned Multimodal Sequences. (arXiv 2020.12) Toward Transformer-Based Object Detection. paper, [5] Less is More: CLIPBERT for Video-and-Language Learning via Sparse Sampling. (arXiv 2022.06) ICOS Protein Expression Segmentation: Can Transformer Networks Give Better Results. paper, [24] Line Segment Detection Using Transformers without Edges(Transformer) paper | code, [47] Class-Incremental Experience Replay for Continual Learning under Concept Drift. paper, [12] MetaSimulator: Simulating Unknown Target Models for Query-Efficient Black-box Attacks. (arXiv 2022.09) Multi-dataset Training of Transformers for Robust Action Recognition. paper | code, [1] Multiresolution Knowledge Distillation for Anomaly Detection. The experimental platform of this paper is a computer with an i5-11400 processor and an NVIDIA GTX1650s graphics card, running Ubuntu 18.04 with the Melodic release of the Robot Operating System (ROS). (arXiv 2022.03) PreTR: Spatio-Temporal Non-Autoregressive Trajectory Prediction Transformer. (arXiv 2021.09) Scale Efficiently: Insights from Pre-training and Fine-tuning Transformers. Neighbor search is at the core of many 3D algorithms. paper | code, [1] Learnable Companding Quantization for Accurate Low-bit Neural Networks. (arXiv 2022.07) Graph Neural Network and Spatiotemporal Transformer Attention for 3D Video Object Detection from Point Clouds. (arXiv 2022.10) Gastrointestinal Disorder Detection with a Transformer Based Approach. 
Conversely, a novel non-destructive method based on Terrestrial Laser Scanner (TLS) data and Quantitative Structure Models (3D tree modeling) automatically reconstructs the complete tree architecture, accounting for the biophysical structure of each individual tree, and thus provides more accurate AGB estimates. (arXiv 2021.12) Nonlinear Transform Source-Channel Coding for Semantic Communications. paper | code, [13] LiDAR-based Panoptic Segmentation via Dynamic Shifting Network( LiDAR ) paper | project | supplementary, [3] ACRE: Abstract Causal REasoning Beyond Covariation(ACRE) paper, [4] Skip-Convolutions for Efficient Video Processing. However, the improved AMCL algorithm in this paper falls short of the Cartographer algorithm in the positioning accuracy and stability of the yaw angle. (arXiv 2021.07) RAMS-Trans: Recurrent Attention Multi-scale Transformer for Fine-grained Image Recognition. (arXiv 2022.03) Hybrid Routing Transformer for Zero-Shot Learning. paper | code, [4] Contrastive Learning based Hybrid Networks for Long-Tailed Image Classification. Moore, T.; Stouch, D. A generalized extended Kalman filter implementation for the Robot Operating System. paper, [88] RSCA: Real-time Segmentation-based Context-Aware Scene Text Detection(RSCA) (arXiv 2022.10) VLT: Vision-Language Transformer and Query Generation for Referring Segmentation. paper | dataset, [10] Deep Animation Video Interpolation in the Wild. (arXiv 2022.01) Joint Liver and Hepatic Lesion Segmentation using a Hybrid CNN with Transformer Layers. (arXiv 2022.03) VL-InterpreT: An Interactive Visualization Tool for Interpreting Vision-Language Transformers. (arXiv 2021.11) DyTox: Transformers for Continual Learning with DYnamic TOken eXpansion. paper, [2] Depth from Camera Motion and Object Detection. (arXiv 2021.06) How to train your ViT? (arXiv 2021.12) Efficient Visual Tracking with Exemplar Transformers. 
(arXiv 2022.10) Curved Representation Space of Vision Transformers. (arXiv 2022.06) PST: Plant Segmentation Transformer Enhanced Phenotyping of MLS Oilseed Rape Point Cloud. paper | code, [11] Temporal Context Aggregation Network for Temporal Action Proposal Refinement. (arXiv 2022.07) Array Camera Image Fusion using Physics-Aware Transformers. (arXiv 2022.10) Hyper-Connected Transformer Network for Co-Learning Multi-Modality PET-CT Features. (arXiv 2021.03) DeepViT: Towards Deeper Vision Transformer. paper | project, [9] ViP-DeepLab: Learning Visual Perception with Depth-aware Video Panoptic Segmentation. paper | code, [10] KOALAnet: Blind Super-Resolution using Kernel-Oriented Adaptive Local Adjustment(KOALAnet) paper | code, [13] Back-tracing Representative Points for Voting-based 3D Object Detection in Point Clouds(3D) (ICLR'21) Deformable DETR: Deformable Transformers for End-to-End Object Detection. All authors have read and agreed to the published version of the manuscript. (arXiv 2021.10) Siamese Transformer Pyramid Networks for Real-Time UAV Tracking. paper, [4] Towards Automated and Marker-less Parkinson Disease Assessment: Predicting UPDRS Scores using Sit-stand videos(UPDRS) (arXiv 2022.08) InstanceFormer: An Online Video Instance Segmentation Framework. (arXiv 2022.07) Pyramid Transformer for Traffic Sign Detection. (arXiv 2022.03) Meta-attention for ViT-backed Continual Learning. (arXiv 2022.02) TraSeTR: Track-to-Segment Transformer with Contrastive Query for Instance-level Instrument Segmentation in Robotic Surgery. (arXiv 2022.04) ResT V2: Simpler, Faster and Stronger. (arXiv 2021.12) DualFormer: Local-Global Stratified Transformer for Efficient Video Recognition. (arXiv 2021.10) Spatial-Temporal Transformer for 3D Point Cloud Sequences. (arXiv 2021.08) No-Reference Image Quality Assessment via Transformers, Relative Ranking, and Self-Consistency. 
paper, [15] CodedStereo: Learned Phase Masks for Large Depth-of-field Stereo(CodedStereo) (arXiv 2022.11) D3ETR: Decoder Distillation for Detection Transformer. (arXiv 2021.06) DS-TransUNet: Dual Swin Transformer U-Net for Medical Image Segmentation. (arXiv 2022.05) Super Vision Transformer. paper, [14] PatchmatchNet: Learned Multi-View Patchmatch Stereo. paper | project, [6] Graph Stacked Hourglass Networks for 3D Human Pose Estimation(3D) paper, [18] Anchor-Constrained Viterbi for Set-Supervised Action Segmentation. (arXiv 2021.09) Anchor DETR: Query Design for Transformer-Based Detector. In total, we recorded 6 hours of traffic scenarios at 10–100 Hz using a variety of sensor modalities such as high-resolution color and grayscale stereo cameras, a Velodyne 3D laser scanner and a high-precision GPS/IMU inertial navigation system. paper | [code](https://github.com/MichaelFan01/STDC-Seg), [44] HyperSeg: Patch-wise Hypernetwork for Real-time Semantic Segmentation(HyperSeg Patch-wise Hypernetwork) paper | code (arXiv 2022.03) MDMMT-2: Multidomain Multimodal Transformer for Video Retrieval, One More Step Towards Generalization. paper | project, [6] Deep Implicit Moving Least-Squares Functions for 3D Reconstruction(3D) paper | code, [3] Center-based 3D Object Detection and Tracking(3D) (arXiv 2022.04) Long Video Generation with Time-Agnostic VQGAN and Time-Sensitive Transformer. (arXiv 2022.11) UniFormerV2: Spatiotemporal Learning by Arming Image ViTs with Video UniFormer. paper | code, [3] GDR-Net: Geometry-Guided Direct Regression Network for Monocular 6D Object Pose Estimation(6D) paper, [88] HOTR: End-to-End Human-Object Interaction Detection with Transformers(HOTR) (arXiv 2022.03) Style Transformer for Image Inversion and Editing. (arXiv 2022.11) Knowledge Distillation for Detection Transformer with Consistent Distillation Points Sampling. (arXiv 2022.03) ViT-P: Rethinking Data-efficient Vision Transformers from Locality. 
(arXiv 2021.03) U-Net Transformer: Self and Cross Attention for Medical Image Segmentation. paper | code, [19] InverseForm: A Loss Function for Structured Boundary-Aware Segmentation. (arXiv 2021.08) Boosting Few-shot Semantic Segmentation with Transformers. (arXiv 2022.06) SparseFormer: Attention-based Depth Completion Network. Qiu, D.; May, S.; Nüchter, A. GPU-accelerated nearest neighbor search for 3D registration. (arXiv 2021.09) Label-Attention Transformer with Geometrically Coherent Objects for Image Captioning. (arXiv 2022.08) Towards Accurate Facial Landmark Detection via Cascaded Transformers. (arXiv 2021.12) PE-former: Pose Estimation Transformer. paper, [6] Discovering Hidden Physics Behind Transport Dynamics. (arXiv 2021.12) Efficient Two-Stage Detection of Human-Object Interactions with a Novel Unary-Pairwise Transformer. paper | code, [31] NeX: Real-time View Synthesis with Neural Basis Expansion(NeX) paper | code, [1] PLADE-Net: Towards Pixel-Level Accuracy for Self-Supervised Single-View Depth Estimation with Neural Positional Encoding and Distilled Matting Loss. (arXiv 2022.01) RFormer: Transformer-based Generative Adversarial Network for Real Fundus Image Restoration on A New Clinical Benchmark. (arXiv 2022.06) Anomaly detection in surveillance videos using transformer based attention model. (arXiv 2022.10) Attention Swin U-Net: Cross-Contextual Attention Mechanism for Skin Lesion Segmentation. paper, [9] Riggable 3D Face Reconstruction via In-Network Optimization(3D) (arXiv 2022.01) GroupViT: Semantic Segmentation Emerges from Text Supervision. paper | code | video | project | -30xLIIF, [24] UP-DETR: Unsupervised Pre-training for Object Detection with Transformers paper | code | project, [9] PREDATOR: Registration of 3D Point Clouds with Low Overlap(3D) (arXiv 2021.06) Semantic Correspondence with Transformers. (arXiv 2021.06) A Latent Transformer for Disentangled and Identity-Preserving Face Editing. 
(arXiv 2022.06) Defending Backdoor Attacks on Vision Transformer via Patch Processing. paper, [21] Surrogate Gradient Field for Latent Space Manipulation. (arXiv 2022.11) Interaction Visual Transformer for Egocentric Action Anticipation. (arXiv 2021.10) History Aware Multimodal Transformer for Vision-and-Language Navigation. (arXiv 2022.08) U-Net vs Transformer: Is U-Net Outdated in Medical Image Registration. (arXiv 2021.12) Co-training Transformer with Videos and Images Improves Action Recognition. Zhao, S.; Gu, J.; Ou, Y.; Zhang, W.; Pu, J.; Peng, H. IRobot self-localization using EKF. paper, [26] Omni-supervised Point Cloud Segmentation via Gradual Receptive Field Component Reasoning. (arXiv 2022.04) Spatiality-guided Transformer for 3D Dense Captioning on Point Clouds. (arXiv 2022.06) SVoRT: Iterative Transformer for Slice-to-Volume Registration in Fetal Brain MRI. (arXiv 2022.11) BiViT: Extremely Compressed Binary Vision Transformer. (arXiv 2022.10) ImplantFormer: Vision Transformer based Implant Position Regression Using Dental CBCT Data. (arXiv 2021.12) Cost Aggregation Is All You Need for Few-Shot Segmentation. paper | project, [17] Single Image Depth Estimation using Wavelet Decomposition. (arXiv 2022.06) SimA: Simple Softmax-free Attention for Vision Transformers. (arXiv 2022.06) Spatial Entropy Regularization for Vision Transformers. (arXiv 2021.12) Temporal Transformer Networks with Self-Supervision for Action Recognition. (arXiv 2022.04) HiTPR: Hierarchical Transformer for Place Recognition in Point Cloud. (arXiv 2022.03) Self-Supervised Vision Transformers Learn Visual Concepts in Histopathology. (arXiv 2021.09) TxT: Crossmodal End-to-End Learning with Transformers. (arXiv 2022.04) Self-Calibrated Efficient Transformer for Lightweight Super-Resolution. 12: 2022: Low-latency trajectory planning for high-speed navigation in unknown environments. 
02EX5997), Annapolis, MD, USA, 8–11 July 2002; pp. paper, [10] Exemplar-Based Open-Set Panoptic Segmentation Network. (arXiv 2022.06) CRFormer: A Cross-Region Transformer for Shadow Removal. paper | code, [9] Context-Aware Layout to Image Generation with Enhanced Object Appearance. Direct LiDAR Odometry: Fast Localization with Dense Point Clouds (arXiv:2110.00605). (arXiv 2022.04) Vision Transformer Equipped with Neural Resizer on Facial Expression Recognition Task. (arXiv 2021.06) Transformer in Convolutional Neural Networks. (arXiv 2021.06) ViT-Inception-GAN for Image Colourising. (arXiv 2022.09) Masked Sinogram Model with Transformer for ill-Posed Computed Tomography Reconstruction: a Preliminary Study. Geometric Transformer for Fast and Robust Point Cloud Registration. (arXiv 2022.06) Cross-domain Detection Transformer based on Spatial-aware and Semantic-aware Token Alignment. (arXiv 2021.11) ATS: Adaptive Token Sampling For Efficient Vision Transformers. (arXiv 2021.08) Congested Crowd Instance Localization with Dilated Convolutional Swin Transformer. (arXiv 2022.10) Semi-UFormer: Semi-supervised Uncertainty-aware Transformer for Image Dehazing. (arXiv 2021.10) The Layout Generation Algorithm of Graphic Design Based on Transformer-CVAE. paper | dataset&code, [8] UAV-Human: A Large Benchmark for Human Behavior Understanding with Unmanned Aerial Vehicles. paper, [16] Self-supervised Motion Learning from Static Images. paper, [14] Diverse Semantic Image Synthesis via Probability Distribution Modeling. paper, [21] End-to-End Human Pose and Mesh Reconstruction with Transformers(Transformer) (arXiv 2022.08) Shuffle Instances-based Vision Transformer for Pancreatic Cancer ROSE Image Classification. (arXiv 2022.07) SSformer: A Lightweight Transformer for Semantic Segmentation. 
paper, [4] 3DCaricShop: A Dataset and A Baseline Method for Single-view 3D Caricature Face Reconstruction(3D) paper, Convex Online Video Frame Subset Selection using Multiple Criteria for Data Efficient Autonomous Driving. (arXiv 2022.04) Evaluating Vision Transformer Methods for Deep Reinforcement Learning from Pixels. (arXiv 2021.04) Efficient DETR: Improving End-to-End Object Detector with Dense Prior. paper, [19] Unsupervised Human Pose Estimation through Transforming Shape Templates. (arXiv 2022.03) HyperTransformer: A Textural and Spectral Feature Fusion Transformer for Pansharpening. (arXiv 2021.06) Instance-based Vision Transformer for Subtyping of Papillary Renal Cell Carcinoma in Histopathological Image. Statistical Texture LearningCNN+ (arXiv 2022.09) Transformer based Fingerprint Feature Extraction. Given an initial pose, the method is able to track the pose of the (arXiv 2021.04) Point Cloud Learning with Transformer. (arXiv 2021.11) Adaptive Fourier Neural Operators: Efficient Token Mixers for Transformers. (arXiv 2022.11) ASiT: Audio Spectrogram vIsion Transformer for General Audio Representation. (arXiv 2021.04) RelTransformer: Balancing the Visual Relationship Detection from Local Context, Scene and Memory. writing (original draft preparation), C.W. (arXiv 2022.05) Transformer-based Cross-Modal Recipe Embeddings with Large Batch Training. paper | code | -Inception convolution, [4] Coordinate Attention for Efficient Mobile Network Design. (arXiv 2022.10) Learning Texture Transformer Network for Light Field Super-Resolution. (arXiv 2022.10) PatchRot: A Self-Supervised Technique for Training Vision Transformers. (arXiv 2022.01) BOAT: Bilateral Local Attention Vision Transformer. (arXiv 2022.07) Rethinking Surgical Captioning: End-to-End Window-Based MLP Transformer Using Patches. 
paper, [3] Network Quantization with Element-wise Gradient Scaling. (arXiv 2021.10) Revitalizing CNN Attentions via Transformers in Self-Supervised Visual Representation Learning. In Proceedings of the 2020 IEEE International Symposium on Safety, Security, and Rescue Robotics (SSRR), Abu Dhabi, United Arab Emirates, 4–6 November 2020; pp. (arXiv 2022.03) Towards Exemplar-Free Continual Learning in Vision Transformers: an Account of Attention, Functional and Weight Regularization. (arXiv 2022.01) Swin UNETR: Swin Transformers for Semantic Segmentation of Brain Tumors in MRI Images. (arXiv 2022.05) SelfReformer: Self-Refined Network with Transformer for Salient Object Detection. (arXiv 2022.07) Scaling Novel Object Detection with Weakly Supervised Detection Transformers. (arXiv 2022.10) Transformers for Object Detection in Large Point Clouds. (arXiv 2022.08) EViT: Privacy-Preserving Image Retrieval via Encrypted Vision Transformer in Cloud Computing. (arXiv 2022.11) RadFormer: Transformers with Global-Local Attention for Interpretable and Accurate Gallbladder Cancer Detection. (arXiv 2021.04) Perceptual Image Quality Assessment with Transformers. (arXiv 2022.09) SeqOT: A Spatial-Temporal Transformer Network for Place Recognition Using Sequential LiDAR Data. (Model Training/Generalization), 24. Hartigan, J.A. (arXiv 2022.10) Exploiting the Joint Motion Synergy with Fusion Network Based On Transformer for 3D Human Pose Estimation. paper, [7] Group-aware Label Transfer for Domain Adaptive Person Re-identification. (arXiv 2021.04) Twins: Revisiting the Design of Spatial Attention in Vision Transformers. (arXiv 2021.09) GCsT: Graph Convolutional Skeleton Transformer for Action Recognition. (arXiv 2022.04) DearKD: Data-Efficient Early Knowledge Distillation for Vision Transformers. 
paper | code, [3] Revamping Cross-Modal Recipe Retrieval with Hierarchical Transformers and Self-supervised Learning(Transformer) (arXiv 2021.01) Tokens-to-Token ViT: Training Vision Transformers from Scratch on ImageNet. (arXiv 2022.04) Not All Tokens Are Equal: Human-centric Visual Analysis via Token Clustering Transformer. paper | project, [44] RobustNet: Improving Domain Generalization in Urban-Scene Segmentation via Instance Selective Whitening. (arXiv 2022.04) Multi-Task Distributed Learning using Vision Transformer with Random Patch Permutation. (arXiv 2022.09) Deep Convolutional Pooling Transformer for Deepfake Detection. (arXiv 2022.01) Swin transformers make strong contextual encoders for VHR image road extraction. (arXiv 2021.11) NomMer: Nominate Synergistic Context in Vision Transformer for Visual Recognition. (arXiv 2022.11) TransCC: Transformer-based Multiple Illuminant Color Constancy Using Multitask Learning. Learning an Overlap-Based Observation Model for 3D LiDAR Localization: 0931: Automatic Targetless Extrinsic Calibration of Multiple 3D LiDARs and Radars Monocular Deep Direct Visual Odometry: 2027: Task-Motion Planning for Safe and Efficient Urban Driving Learning a 2D Representation from Point Clouds for Fast and Efficient 3D Segal, A.; Haehnel, D.; Thrun, S. Generalized-ICP. (arXiv 2021.10) AFTer-UNet: Axial Fusion Transformer UNet for Medical Image Segmentation. (arXiv 2021.06) DocFormer: End-to-End Transformer for Document Understanding. (arXiv 2021.06) Patch Slimming for Efficient Vision Transformers. paper, [17] One Thing One Click: A Self-Training Approach for Weakly Supervised 3D Semantic Segmentation(3D) paper, [1] UP-DETR: Unsupervised Pre-training for Object Detection with Transformers (arXiv 2022.06) Spatial Transformer Network with Transfer Learning for Small-scale Fine-grained Skeleton-based Tai Chi Action Recognition. 
paper, [7] TextOCR: Towards large-scale end-to-end reasoning for arbitrary-shaped scene text(TextOCR) paper, [5] ContactOpt: Optimizing Contact to Improve Grasps(ContactOpt) (arXiv 2022.07) TANet: Transformer-based Asymmetric Network for RGB-D Salient Object Detection. (arXiv 2022.03) CTformer: Convolution-free Token2Token Dilated Vision Transformer for Low-dose CT Denoising. (arXiv 2021.11) PU-Transformer: Point Cloud Upsampling Transformer. (arXiv 2021.06) Refiner: Refining Self-attention for Vision Transformers. (arXiv 2022.04) UNetFormer: A Unified Vision Transformer Model and Pre-Training Framework for 3D Medical Image Segmentation. Our manuscript, Direct LiDAR Odometry: Fast Localization with Dense Point Clouds, has been accepted to IEEE Robotics and Automation Letters (RA-L). (arXiv 2021.06) Space-time Mixing Attention for Video Transformer. (arXiv 2021.11) Point-BERT: Pre-training 3D Point Cloud Transformers with Masked Point Modeling. (arXiv 2021.06) Semi-Autoregressive Transformer for Image Captioning. (arXiv 2021.12) SPTS: Single-Point Text Spotting. paper | code, [30] PV-RAFT: Point-Voxel Correlation Fields for Scene Flow Estimation of Point Clouds(PV-RAFT) (arXiv 2021.12) SeqFormer: a Frustratingly Simple Model for Video Instance Segmentation. [3] Removing Diffraction Image Artifacts in Under-Display Camera via Dynamic Skip Connection Network. (ICCV'21) PlaneTR: Structure-Guided Transformers for 3D Plane Recovery. paper | project, [83] Style-Aware Normalized Loss for Improving Arbitrary Style Transfer. (arXiv 2022.04) PSTR: End-to-End One-Step Person Search With Transformers. paper, Passive Inter-Photon Imaging. (arXiv 2021.08) HiFT: Hierarchical Feature Transformer for Aerial Tracking. (arXiv 2022.07) Weakly Supervised Object Localization via Transformer with Implicit Spatial Calibration. 
(arXiv 2022.03) HIPA: Hierarchical Patch Transformer for Single Image Super Resolution. (arXiv 2021.06) Improved Transformer for High-Resolution GANs. Fikri, A.A.; Anifah, L. Mapping and Positioning System on Omnidirectional Robot Using Simultaneous Localization and Mapping (SLAM) Method Based on Lidar. paper, [5] SLADE: A Self-Training Framework For Distance Metric Learning(SLADE) (arXiv 2022.01) Learning class prototypes from Synthetic InSAR with Vision Transformers. (arXiv 2021.05) Rethinking Skip Connection with Layer Normalization in Transformers and ResNets. (arXiv 2020.12) End-to-End Human Pose and Mesh Reconstruction with Transformers. (arXiv 2022.03) ViTransPAD: Video Transformer using convolution and self-attention for Face Presentation Attack Detection. (arXiv 2022.04) Stripformer: Strip Transformer for Fast Image Deblurring. (arXiv 2022.09) Pre-training image-language transformers for open-vocabulary tasks. (arXiv 2021.04) An Empirical Study of Training Self-Supervised Visual Transformers. paper, [90] High-level camera-LiDAR fusion for 3D object detection with machine learning( 3D -LiDAR ) (arXiv 2022.08) In the Eye of Transformer: Global-Local Correlation for Egocentric Gaze Estimation. (arXiv 2021.09) MISSFormer: An Effective Medical Image Segmentation Transformer. paper | code, [3] 3D Graph Anatomy Geometry-Integrated Network for Pancreatic Mass Segmentation, Diagnosis, and Quantitative Patient Management(3D), [2] Deep Lesion Tracker: Monitoring Lesions in 4D Longitudinal Imaging Studies(4D) (arXiv 2021.10) Geometry Attention Transformer with Position-aware LSTMs for Image Captioning. paper, [1] Deep Gradient Projection Networks for Pan-sharpening. paper, [1] Distilling Object Detectors via Decoupled Features, [3] Convolutional Neural Network Pruning with Structural Redundancy Reduction. (arXiv 2022.10) Prompt Generation Networks for Efficient Adaptation of Frozen Vision Transformers. 
paper, [8] A Bop and Beyond: A Second Order Optimizer for Binarized Neural Networks(Bop) (arXiv 2022.06) AntPivot: Livestream Highlight Detection via Hierarchical Attention Mechanism. (arXiv 2022.04) HiT-DVAE: Human Motion Generation via Hierarchical Transformer Dynamical VAE. (arXiv 2022.02) MaskGIT: Masked Generative Image Transformer. FAST-LIO-LOCALIZATION: the integration of FAST-LIO with a re-localization function module. (arXiv 2022.07) Conditional DETR V2: Efficient Detection Transformer with Box Queries. (arXiv 2022.06) Cross-Modal Transformer GAN: A Brain Structure-Function Deep Fusing Framework for Alzheimer's Disease. (arXiv 2021.11) MHFormer: Multi-Hypothesis Transformer for 3D Human Pose Estimation. paper, [10] RefineMask: Towards High-Quality Instance Segmentation with Fine-Grained Features(RefineMask) paper, 4Statistical Texture Learning 66. paper, [22] LAFEAT: Piercing Through Adversarial Defenses with Latent Features(LAFEAT) paper | project | video, [92] Deep Polarization Imaging for 3D shape and SVBRDF Acquisition(3DSVBRDF) (arXiv 2022.11) FedTune: A Deep Dive into Efficient Federated Fine-Tuning with Pre-trained Transformers. (arXiv 2021.08) Mounting Video Metadata on Transformer-based Language Model for Open-ended Video Question Answering. Our Direct LiDAR Odometry (DLO) method includes several key algorithmic innovations which prioritize computational efficiency and enable the use of full, minimally-preprocessed point clouds to provide accurate pose estimates. (arXiv 2022.03) Spatial-Temporal Parallel Transformer for Arm-Hand Dynamic Estimation. (arXiv 2021.11) Searching the Search Space of Vision Transformer. (arXiv 2022.04) BTranspose: Bottleneck Transformers for Human Pose Estimation with Self-Supervised Pre-Training. (arXiv 2022.03) ScalableViT: Rethinking the Context-oriented Generalization of Vision Transformer. Rebecq et al., CVPR 2019, Events-to-Video: Bringing Modern Computer Vision to Event Cameras. 
(arXiv 2022.11) Mean Shift Mask Transformer for Unseen Object Instance Segmentation. (arXiv 2021.12) CSformer: Bridging Convolution and Transformer for Compressive Sensing. (arXiv 2022.08) Toward Understanding WordArt: Corner-Guided Transformer for Scene Text Recognition. paper, [57] SimPoE: Simulated Character Control for 3D Human Pose Estimation(3D) (arXiv 2021.04) So-ViT: Mind Visual Tokens for Vision Transformer. In Proceedings of the 2008 IEEE International Conference on Robotics and Automation, Pasadena, CA, USA, 19–23 May 2008; pp. (arXiv 2021.06) Gaze Estimation using Transformer. (arXiv 2021.02) Training Vision Transformers for Image Retrieval. paper | code, 11DCL (arXiv 2022.05) Unraveling Attention via Convex Duality: Analysis and Interpretations of Vision Transformers. FAST-LIO: A Fast, Robust LiDAR-Inertial Odometry Package by Tightly-Coupled Iterated Kalman Filter. The positioning method proposed in this paper introduces a 3D point cloud alignment correction, which increases the computation time of the algorithm. (arXiv 2022.08) Multi-Feature Vision Transformer via Self-Supervised Representation Learning for Improvement of COVID-19 Diagnosis. paper, [22] How Well Do Self-Supervised Models Transfer? paper | code (arXiv 2022.03) Beyond Masking: Demystifying Token-Based Pre-Training for Vision Transformers. (arXiv 2022.11) Concealed Object Detection for Passive Millimeter-Wave Security Imaging Based on Task-Aligned Detection Transformer. (ICLR'21) VTNet: Visual Transformer Network for Object Goal Navigation. paper | code, [18] MonoRec: Semi-Supervised Dense Reconstruction in Dynamic Environments from a Single Moving Camera(MonoRec) (arXiv 2022.05) Cross-Enhancement Transformer for Action Segmentation. 
paper, [8] Contrastive Embedding for Generalized Zero-Shot Learning. paper, [15] ViPNAS: Efficient Video Pose Estimation via Neural Architecture Search(ViPNAS) (arXiv 2021.06) SegFormer: Simple and Efficient Design for Semantic Segmentation with Transformers. paper | code, [1] QAIR: Practical Query-efficient Black-Box Attacks for Image Retrieval. paper, PhySG: Inverse Rendering with Spherical Gaussians for Physics-based Material Editing and Relighting(PhySG) Event-Based Visual-Inertial Odometry on a Fixed-Wing Unmanned Aerial Vehicle. paper | project&dataset, [3] 3DCaricShop: A Dataset and A Baseline Method for Single-view 3D Caricature Face Reconstruction(3D) paper, [91] Wisdom for the Crowd: Discoursive Power in Annotation Instructions for Computer Vision. (arXiv 2021.06) Efficient Self-supervised Vision Transformers for Representation Learning. (arXiv 2022.07) Visual Representation Learning with Transformer: A Sequence-to-Sequence Perspective. (arXiv 2022.05) Better plain ViT baselines for ImageNet-1k. paper, Affect2MM: Affective Analysis of Multimedia Content Using Emotion Causality. paper, [3] Deep Occlusion-Aware Instance Segmentation with Overlapping BiLayers(BiLayer) odometry solution with consistent and accurate localization for computationally-limited robotic platforms. Datasets: the Cornell dataset, which consists of 1035 images of 280 different objects; and the Jacquard dataset, described in Jacquard: A Large Scale Dataset for Robotic Grasp Detection, in IEEE International Conference on Intelligent Robots and Systems, 2018. paper, [15] CoLA: Weakly-Supervised Temporal Action Localization with Snippet Contrastive Learning. (arXiv 2022.07) Weakly Supervised Grounding for VQA in Vision-Language Transformers. 
paper, [14] Heterogeneous Grid Convolution for Adaptive, Efficient, and Controllable Computation. paper | code, [7] Memory-guided Unsupervised Image-to-image Translation. (arXiv 2021.11) VLMo: Unified Vision-Language Pre-Training with Mixture-of-Modality-Experts. (arXiv 2022.03) D^2ETR: Decoder-Only DETR with Computationally Efficient Cross-Scale Attention. paper | code (arXiv 2022.04) Panoptic-PartFormer: Learning a Unified Model for Panoptic Part Segmentation. (arXiv 2021.10) Adversarial Token Attacks on Vision Transformers. (arXiv 2022.10) Strong Gravitational Lensing Parameter Estimation with Vision Transformer. (arXiv 2022.01) Brain Cancer Survival Prediction on Treatment-naive MRI using Deep Anchor Attention Learning with Vision Transformer. (arXiv 2022.09) Traffic Accident Risk Forecasting using Contextual Vision Transformers. paper | code, [2] PISE: Person Image Synthesis and Editing with Decoupled GAN(GAN) (arXiv 2021.11) Hepatic vessel segmentation based on 3D swin-transformer with inductive biased multi-head self-attention. paper, [14] DyCo3D: Robust Instance Segmentation of 3D Point Clouds through Dynamic Convolution(DyCo3D 3D ) (arXiv 2022.06) Where are my Neighbors? Improving vehicle localization using semantic and pole-like landmarks. (arXiv 2022.08) DPTNet: A Dual-Path Transformer Architecture for Scene Text Detection. (arXiv 2022.07) Diverse Dance Synthesis via Keyframes with Transformer Controllers. SKLRS201813B) of State Key Laboratory of Robotics and System (HIT) and Heilongjiang Province hundred million project science and technology major special projects (NO. (arXiv 2021.08) DPT: Deformable Patch-based Transformer for Visual Recognition. Our Direct LiDAR Odometry (DLO) method (arXiv 2022.07) TTVFI: Learning Trajectory-Aware Transformer for Video Frame Interpolation. (arXiv 2022.07) IDET: Iterative Difference-Enhanced Transformers for High-Quality Change Detection. (CVPR'21) Variational Transformer Networks for Layout Generation. 
paper, [5] Cross-Domain Similarity Learning for Face Recognition in Unseen Domains. paper | dataset&project, [10] Compatibility-aware Heterogeneous Visual Search. paper | code, [12] Fully Understanding Generic Objects: Modeling, Segmentation, and Reconstruction. (arXiv 2022.10) Scratching Visual Transformer's Back with Uniform Attention. paper | code, [4] Image-to-image Translation via Hierarchical Style Disentanglement. paper | project, Learning Triadic Belief Dynamics in Nonverbal Communication from Videos. paper | code, [5] Contrastive Neural Architecture Search with Neural Architecture Comparators. A multi-purpose solution for efficient event data processing. Method for registration of 3-D shapes. (arXiv 2021.03) Multimodal Motion Prediction with Stacked Transformers. paper, [31] Revisiting The Evaluation of Class Activation Mapping for Explainability: A Novel Metric and Experimental Analysis. (arXiv 2021.12) ELSA: Enhanced Local Self-Attention for Vision Transformer. paper, [9] Complementary Relation Contrastive Distillation. paper | benchmark, [14] Lifting 2D StyleGAN for 3D-Aware Face Generation( 2D StyleGAN 3D ) (arXiv 2022.03) Learning Affinity from Attention: End-to-End Weakly-Supervised Semantic Segmentation with Transformers. paper, [22] From Points to Multi-Object 3D Reconstruction(3D) paper | code, 10(CVPR2021 Oral) It features several algorithmic innovations that increase speed, accuracy, and robustness of pose estimation in perceptually-challenging environments and has been extensively tested on aerial and legged robots. (arXiv 2021.07) Image Fusion Transformer. paper, [8] Distilling Audio-Visual Knowledge by Compositional Contrastive Learning. (arXiv 2022.02) BViT: Broad Attention based Vision Transformer. (arXiv 2021.07) PiSLTRc: Position-informed Sign Language Transformer with Content-aware Convolution. 
(arXiv 2022.05) Breaking the Chain of Gradient Leakage in Vision Transformers. (arXiv 2021.08) Billion-Scale Pretraining with Vision Transformers for Multi-Task Visual Representations. (arXiv 2022.02) RNGDet: Road Network Graph Detection by Transformer in Aerial Images. paper, [28] Railroad is not a Train: Saliency as Pseudo-pixel Supervision for Weakly Supervised Semantic Segmentation. paper, [1] Automatic Vertebra Localization and Identification in CT by Spine Rectification and Anatomically-constrained Optimization(CT) paper, [16] Background-Aware Pooling and Noise-Aware Loss for Weakly-Supervised Semantic Segmentation. (arXiv 2022.03) DirecFormer: A Directed Attention in Transformer Approach to Robust Action Recognition. For this benchmark you may provide results using monocular or stereo visual odometry, laser-based SLAM or algorithms that combine visual and LIDAR information.