Lingbo Liu (刘凌波)

Data Mining Datasets

SHMetro Dataset: a large-scale dataset for metro ridership prediction [Download]
- 288 metro stations in Shanghai, China.
- A total of 811.8 million transaction records (7/01/2016-9/30/2016).
- For each station, the inflow and outflow are measured every 15 minutes.
Related Work: Physical-Virtual Collaboration Modeling for Intra- and Inter-Station Metro Ridership Prediction
HZMetro Dataset: a large-scale dataset for metro ridership prediction [Download]
- 80 metro stations in Hangzhou, China.
- A daily ridership of 2.35 million (01/01/2019-01/25/2019).
- For each station, the inflow and outflow are measured every 15 minutes.
Related Work: Physical-Virtual Collaboration Modeling for Intra- and Inter-Station Metro Ridership Prediction
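Both metro datasets reduce raw fare-card transactions to per-station inflow and outflow counts in 15-minute intervals. The sketch below illustrates that binning step, assuming each record is a hypothetical `(station_id, timestamp, direction)` tuple where `direction` is `"in"` for an entry tap and `"out"` for an exit tap. The real record schema of SHMetro and HZMetro may differ, and the function names and station ids are illustrative only.

```python
from collections import defaultdict
from datetime import date, datetime

INTERVAL_MINUTES = 15  # bin width used for SHMetro and HZMetro


def slot_of(ts: datetime, interval: int = INTERVAL_MINUTES) -> int:
    """Index of the interval within the day, e.g. 08:07 -> 32 for 15-minute bins."""
    return (ts.hour * 60 + ts.minute) // interval


def aggregate_flows(records, interval: int = INTERVAL_MINUTES):
    """Count inflow and outflow per (station, date, slot).

    Hypothetical record format: (station_id, timestamp, direction),
    where direction is "in" for an entry tap and "out" for an exit tap.
    """
    flows = defaultdict(lambda: [0, 0])  # key -> [inflow, outflow]
    for station, ts, direction in records:
        key = (station, ts.date(), slot_of(ts, interval))
        if direction == "in":
            flows[key][0] += 1
        elif direction == "out":
            flows[key][1] += 1
        else:
            raise ValueError(f"unknown direction: {direction!r}")
    return dict(flows)


records = [
    (101, datetime(2016, 7, 1, 8, 3), "in"),
    (101, datetime(2016, 7, 1, 8, 14), "in"),
    (101, datetime(2016, 7, 1, 8, 16), "out"),
    (205, datetime(2016, 7, 1, 8, 10), "out"),
]
flows = aggregate_flows(records)
print(flows[(101, date(2016, 7, 1), 32)])  # -> [2, 0]
```

Keying by date as well as slot keeps identical times on different days apart, so the counts for each station form one time series per day.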
NYC-TOD Dataset: a large-scale dataset for taxi origin-destination demand prediction [Download]
- We divided New York City into a 15×5 grid map.
- A total of 132 million taxi trip records in 2014.
- We measured the taxi demand between every pair of regions in each 30-minute interval.
Related Work: Contextualized Spatial-Temporal Network for Taxi Origin-Destination Demand Prediction
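Measuring origin-destination (OD) demand amounts to mapping each trip's pickup and dropoff points to grid cells and counting trips per cell pair in each 30-minute slot. A minimal sketch follows, assuming a hypothetical trip tuple `(pickup_time, pickup_lat, pickup_lon, dropoff_lat, dropoff_lon)`, binning by pickup time, and 15 rows along latitude by 5 columns along longitude; the bounding box, grid orientation, and all names here are placeholders, not the dataset's actual preprocessing.

```python
from collections import defaultdict
from dataclasses import dataclass
from datetime import date, datetime


@dataclass(frozen=True)
class Grid:
    """Rectangular lat/lon grid; the bounding box passed in is a placeholder."""
    lat_min: float
    lat_max: float
    lon_min: float
    lon_max: float
    n_rows: int = 15  # assumed: 15 cells along latitude
    n_cols: int = 5   # assumed: 5 cells along longitude

    def cell(self, lat: float, lon: float):
        """Flat region id in [0, n_rows * n_cols), or None if outside the box."""
        if not (self.lat_min <= lat < self.lat_max and self.lon_min <= lon < self.lon_max):
            return None
        row = int((lat - self.lat_min) / (self.lat_max - self.lat_min) * self.n_rows)
        col = int((lon - self.lon_min) / (self.lon_max - self.lon_min) * self.n_cols)
        # min() guards against floating-point rounding right at the upper edge
        return min(row, self.n_rows - 1) * self.n_cols + min(col, self.n_cols - 1)


def od_matrices(trips, grid: Grid, interval_minutes: int = 30):
    """One n x n origin-destination count matrix per (date, slot).

    Hypothetical trip format: (pickup_time, pickup_lat, pickup_lon, dropoff_lat, dropoff_lon).
    Trips are binned by pickup time; trips starting or ending outside the grid are skipped.
    """
    n = grid.n_rows * grid.n_cols
    mats = defaultdict(lambda: [[0] * n for _ in range(n)])
    for t, plat, plon, dlat, dlon in trips:
        o, d = grid.cell(plat, plon), grid.cell(dlat, dlon)
        if o is None or d is None:
            continue
        slot = (t.hour * 60 + t.minute) // interval_minutes
        mats[(t.date(), slot)][o][d] += 1
    return dict(mats)


grid = Grid(0.0, 15.0, 0.0, 5.0)  # toy box: every cell is 1 x 1
trips = [
    (datetime(2014, 1, 1, 8, 5), 0.5, 0.5, 2.5, 3.5),
    (datetime(2014, 1, 1, 8, 20), 0.5, 0.5, 2.5, 3.5),
    (datetime(2014, 1, 1, 8, 40), 2.5, 3.5, 0.5, 0.5),
]
mats = od_matrices(trips, grid)
print(mats[(date(2014, 1, 1), 16)][0][13])  # -> 2
```

With a 15×5 grid there are 75 regions, so each slot yields a 75×75 matrix in which entry `[o][d]` counts trips from region `o` to region `d`.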
TaxiNYC Dataset: a large-scale dataset for taxi pickup/dropoff prediction [Download]
- New York City is divided into a 15×7 grid map.
- A total of 132 million taxi trip records in 2014.
- For each region, the taxi pickup and dropoff demands are measured every 30 minutes.
Related Work: Dynamic Spatial-Temporal Representation Learning for Traffic Flow Prediction
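Per-region pickup and dropoff demand can be obtained by mapping each endpoint of a trip to a cell of the 15×7 grid and counting per 30-minute slot. The sketch below assumes a hypothetical trip tuple `(pickup_time, pickup_lat, pickup_lon, dropoff_time, dropoff_lat, dropoff_lon)`, bins pickups by pickup time and dropoffs by dropoff time, and takes a placeholder bounding box; these choices and all names are assumptions, not the dataset's documented procedure.

```python
from collections import defaultdict
from datetime import date, datetime

N_ROWS, N_COLS = 15, 7  # TaxiNYC grid size; row/column orientation is an assumption


def region_of(lat, lon, bbox, n_rows=N_ROWS, n_cols=N_COLS):
    """Flat region id for a point, or None if outside bbox=(lat_min, lat_max, lon_min, lon_max)."""
    lat_min, lat_max, lon_min, lon_max = bbox
    if not (lat_min <= lat < lat_max and lon_min <= lon < lon_max):
        return None
    row = min(int((lat - lat_min) / (lat_max - lat_min) * n_rows), n_rows - 1)
    col = min(int((lon - lon_min) / (lon_max - lon_min) * n_cols), n_cols - 1)
    return row * n_cols + col


def pickup_dropoff_counts(trips, bbox, interval_minutes=30):
    """Count pickups and dropoffs per (date, slot, region).

    Hypothetical trip format: (pickup_time, pickup_lat, pickup_lon,
    dropoff_time, dropoff_lat, dropoff_lon).
    """
    counts = defaultdict(lambda: [0, 0])  # key -> [pickups, dropoffs]
    for pt, plat, plon, dt, dlat, dlon in trips:
        r = region_of(plat, plon, bbox)
        if r is not None:  # pickup binned by pickup time
            counts[(pt.date(), (pt.hour * 60 + pt.minute) // interval_minutes, r)][0] += 1
        r = region_of(dlat, dlon, bbox)
        if r is not None:  # dropoff binned by dropoff time
            counts[(dt.date(), (dt.hour * 60 + dt.minute) // interval_minutes, r)][1] += 1
    return dict(counts)


bbox = (0.0, 15.0, 0.0, 7.0)  # toy box: every cell is 1 x 1
trips = [
    (datetime(2014, 1, 1, 8, 5), 0.5, 0.5, datetime(2014, 1, 1, 8, 35), 1.5, 2.5),
    (datetime(2014, 1, 1, 8, 10), 0.5, 0.5, datetime(2014, 1, 1, 8, 25), 0.5, 0.5),
]
counts = pickup_dropoff_counts(trips, bbox)
print(counts[(date(2014, 1, 1), 16, 0)])  # -> [2, 1]
```

Binning dropoffs by their own timestamp means a single trip can contribute to two different slots, which matches the idea of measuring pickup and dropoff demand separately.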

Computer Vision Datasets

RGBT Crowd Counting Benchmark: a large-scale RGBT dataset for crowd counting [Download]
- It consists of 2,030 pairs of 640×480 RGB-thermal images captured in various scenarios.
- 1,013 pairs were captured in bright conditions and 1,017 pairs in darkness.
- A total of 138,389 pedestrians are marked with point annotations, an average of 68 people per image.
Related Work: Cross-Modal Collaborative Representation Learning and a Large-Scale RGBT Benchmark for Crowd Counting
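Point annotations are commonly turned into density maps for training counting models: each annotated person spreads one unit of mass with a Gaussian kernel, so the map sums to the head count. The sketch below shows that general technique; the fixed `sigma`, the 3*sigma truncation, and the function name are assumptions for illustration, not necessarily how this benchmark or its related paper builds ground truth.

```python
import math


def density_map(points, width=640, height=480, sigma=4.0):
    """Render (x, y) point annotations as a density map summing to len(points).

    Each point is spread with a Gaussian truncated at radius 3*sigma and
    renormalized over the pixels inside the image, so every person adds exactly 1.
    Points are assumed to lie inside the image.
    """
    dmap = [[0.0] * width for _ in range(height)]
    radius = max(1, int(3 * sigma))
    for x, y in points:
        cx, cy = int(x), int(y)
        cells = []
        for yy in range(max(0, cy - radius), min(height, cy + radius + 1)):
            for xx in range(max(0, cx - radius), min(width, cx + radius + 1)):
                w = math.exp(-((xx - x) ** 2 + (yy - y) ** 2) / (2 * sigma ** 2))
                cells.append((yy, xx, w))
        total = sum(w for _, _, w in cells)
        for yy, xx, w in cells:
            dmap[yy][xx] += w / total  # normalize so this point contributes 1
    return dmap


points = [(10.0, 10.0), (0.0, 0.0), (63.5, 47.5)]  # toy annotations
dmap = density_map(points, width=64, height=48)
print(round(sum(map(sum, dmap)), 6))  # -> 3.0
```

Renormalizing each kernel over the in-image pixels keeps the total exact even for people standing at the image border, where part of the Gaussian would otherwise be cut off.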
SYSU16K Landmark Dataset: a large-scale dataset for facial landmark localization in the wild [Download]
- It contains 7,317 images with 16K faces collected from the Internet.
- Each face is accurately annotated with 72 landmarks.
- The faces in this dataset vary in pose, expression, illumination, and resolution, and may be severely occluded.
Related Work: Facial Landmark Machines: A Backbone-Branches Architecture with Progressive Representation Learning
WildSketch Dataset: the first medium-scale benchmark for unconstrained face sketch synthesis [Download]
- It contains 800 highly aligned face photo-sketch pairs.
- WildSketch is of higher quality and larger scale than the popular CUHK and AR datasets.
- WildSketch is much more challenging due to its large variations in pose, expression, ethnic origin, background, and illumination.
Related Work: Unconstrained Face Sketch Synthesis via Perception-Adaptive Network and A New Benchmark