5. Appendix

Appendix A

Variable Dictionary: Outcome and Structural Controls

| Variable | Type | Construction |
| --- | --- | --- |
| resale_price | Continuous | Raw transaction field (SGD) |
| log_resale_price | Continuous | np.log(resale_price); dependent variable in all regressions |
| floor_area_sqm | Continuous | Raw transaction field (square metres) |
| flat_type | Categorical | Coded 1-room=1, 2-room=2, 3-room=3, 4-room=4, 5-room=5, Executive=6, Multi-Generation=7; entered as C(flat_type), so the codes act as category labels rather than a linear scale |
| storey_mid | Continuous | (storey_low + storey_high) / 2 |
| remaining_lease_year | Continuous | Raw field; where missing, imputed as 99 − (transaction_year − lease_commence_date) |
| town | Categorical | Raw transaction field; entered as C(town) |
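
The derived columns in this table can be sketched in pandas as follows. This is a minimal illustration, not the project's actual pipeline: the toy values are invented, and `storey_low`, `storey_high`, `transaction_year` and `lease_commence_date` are assumed to already exist as numeric columns.

```python
import numpy as np
import pandas as pd

# Toy transactions; column names follow the dictionary, values are invented.
df = pd.DataFrame({
    "resale_price": [450000.0, 620000.0],
    "storey_low": [4, 10],
    "storey_high": [6, 12],
    "transaction_year": [2020, 2021],
    "lease_commence_date": [1990, 2005],
    "remaining_lease_year": [np.nan, 83.0],  # first row is missing
})

# Dependent variable: natural log of the resale price.
df["log_resale_price"] = np.log(df["resale_price"])

# Midpoint of the storey range.
df["storey_mid"] = (df["storey_low"] + df["storey_high"]) / 2

# Impute only where the raw field is missing:
# 99 - (transaction_year - lease_commence_date).
imputed = 99 - (df["transaction_year"] - df["lease_commence_date"])
df["remaining_lease_year"] = df["remaining_lease_year"].fillna(imputed)
```

Using `fillna` keeps every observed value of `remaining_lease_year` untouched and applies the 99-year formula only to the gaps.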

School Exposure Variables

| Variable | Type | Construction |
| --- | --- | --- |
| top_20_percent_count_0_1km | Integer | Count of top-quintile SRS schools within 0–1 km (BallTree spatial join) |
| top_20_percent_count_1_2km | Integer | Count of top-quintile SRS schools within 1–2 km |
| bottom_20_percent_count_0_1km | Integer | Count of bottom-quintile SRS schools within 0–1 km |
| bottom_20_percent_count_1_2km | Integer | Count of bottom-quintile SRS schools within 1–2 km |
| has_top20_0_1km_only | Binary | 1 if top_20_percent_count_0_1km ≥ 1 AND top_20_percent_count_1_2km = 0 |
| has_top20_1_2km_only | Binary | 1 if top_20_percent_count_0_1km = 0 AND top_20_percent_count_1_2km ≥ 1 |
| has_top20_both | Binary | 1 if top_20_percent_count_0_1km ≥ 1 AND top_20_percent_count_1_2km ≥ 1 |
| (baseline) | n/a | All three binary indicators = 0; no top-20% school within 2 km |
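
The ring counts and the three mutually exclusive indicators can be illustrated with a brute-force haversine computation. This sketch stands in for the BallTree spatial join rather than reproducing it. The coordinates are invented, the helper names are hypothetical, and treating exactly 1,000 m as belonging to the outer ring is an assumption.

```python
import numpy as np

EARTH_RADIUS_M = 6_371_000.0

def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in metres between points given in degrees."""
    lat1, lon1, lat2, lon2 = map(np.radians, (lat1, lon1, lat2, lon2))
    a = (np.sin((lat2 - lat1) / 2) ** 2
         + np.cos(lat1) * np.cos(lat2) * np.sin((lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_M * np.arcsin(np.sqrt(a))

def ring_counts(flat_lat, flat_lon, school_lat, school_lon):
    """Count schools in the 0-1 km and 1-2 km rings around each flat."""
    d = haversine_m(flat_lat[:, None], flat_lon[:, None],
                    school_lat[None, :], school_lon[None, :])
    count_0_1 = (d < 1000).sum(axis=1)                 # assumed: 1000 m is outside
    count_1_2 = ((d >= 1000) & (d < 2000)).sum(axis=1)
    return count_0_1, count_1_2

def exposure_indicators(count_0_1, count_1_2):
    """Three mutually exclusive dummies; all zero is the baseline group."""
    near, far = count_0_1 >= 1, count_1_2 >= 1
    return {
        "has_top20_0_1km_only": (near & ~far).astype(int),
        "has_top20_1_2km_only": (~near & far).astype(int),
        "has_top20_both": (near & far).astype(int),
    }

# Invented coordinates: three flats and three schools on one meridian.
flat_lat = np.array([1.3500, 1.4000, 1.3445])
flat_lon = np.full(3, 103.85)
school_lat = np.array([1.3550, 1.3635, 1.3800])
school_lon = np.full(3, 103.85)

c01, c12 = ring_counts(flat_lat, flat_lon, school_lat, school_lon)
flags = exposure_indicators(c01, c12)
```

Because the three dummies partition every flat with at least one top-quintile school within 2 km, each coefficient in the hedonic model is read relative to flats with no such school.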

RDD Variables

| Variable | Type | Construction |
| --- | --- | --- |
| nearest_top_20_percent_school_distance_m | Continuous | BallTree distance to nearest top-quintile school (metres); stored as distance_to_school_m in RDD.PY |
| running_m | Continuous | distance_to_school_m − 1000 |
| inside_1km | Binary | 1 if running_m < 0 |
| triangular_weight | Continuous | max(1 − abs(running_m) / bandwidth, 0) |
| matched_school_name | String | Name of the nearest top-quintile school; used as cluster variable in RDD |
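
As a worked illustration of the running variable, the treatment flag and the triangular weight, the sketch below applies the three formulas to a handful of invented distances. The function name and the 300 m bandwidth are hypothetical; in the analysis the bandwidth is data-driven.

```python
import numpy as np

def rdd_variables(distance_to_school_m, bandwidth):
    """Return running_m, inside_1km and triangular_weight for each distance."""
    d = np.asarray(distance_to_school_m, dtype=float)
    running_m = d - 1000.0                       # centred at the 1 km cutoff
    inside_1km = (running_m < 0).astype(int)     # strictly inside 1 km
    triangular_weight = np.maximum(1 - np.abs(running_m) / bandwidth, 0.0)
    return running_m, inside_1km, triangular_weight

# Illustrative distances (metres) and an assumed 300 m bandwidth.
running, inside, weight = rdd_variables([700, 1000, 1150, 1600], bandwidth=300.0)
```

The weight is 1 at the cutoff, falls linearly with distance from it, and is 0 for any observation at or beyond one bandwidth away, so such observations drop out of the local fit.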

Amenity Control Variables

| Distance Variable | Count Variable | Facility |
| --- | --- | --- |
| log_nearest_mrt_distance_m | mrt_count_within_500m | MRT/LRT stations |
| log_nearest_supermarket_distance_m | supermarket_count_within_500m | Supermarkets |
| log_nearest_shopping_mall_distance_m | shopping_mall_count_within_500m | Shopping malls |
| log_nearest_bus_stop_distance_m | bus_stop_count_within_500m | Bus stops |
| log_nearest_park_distance_m | park_count_within_500m | Parks |
| log_nearest_hawker_distance_m | hawker_count_within_500m | Hawker centres |
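
Each facility contributes one log-distance control and one count control. A minimal sketch of that pair, given a flat-by-facility matrix of distances in metres, is shown below. It assumes a natural log (as for the price variable), an inclusive 500 m radius and strictly positive distances; the function name and the distances are invented.

```python
import numpy as np

def amenity_controls(dist_m, radius_m=500.0):
    """dist_m: (n_flats, n_facilities) array of distances in metres.

    Returns (log of nearest distance, count of facilities within radius_m).
    """
    dist_m = np.asarray(dist_m, dtype=float)
    log_nearest = np.log(dist_m.min(axis=1))         # assumes distances > 0
    count_within = (dist_m <= radius_m).sum(axis=1)  # assumed inclusive radius
    return log_nearest, count_within

# Two flats, three facilities of one type (for example MRT stations).
dist = [[120.0, 480.0, 900.0],
        [650.0, 700.0, 1500.0]]
log_nearest_mrt_distance_m, mrt_count_within_500m = amenity_controls(dist)
```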

Appendix B

Model Specification Strings: Estimating Equation

log_resale_price ~
    has_top20_0_1km_only
  + has_top20_1_2km_only
  + has_top20_both
  + floor_area_sqm
  + C(flat_type)
  + storey_mid
  + remaining_lease_year
  + C(town)
  + C(p1_cycle_year)
  + log_nearest_supermarket_distance_m + supermarket_count_within_500m
  + log_nearest_shopping_mall_distance_m + shopping_mall_count_within_500m
  + log_nearest_mrt_distance_m + mrt_count_within_500m
  + log_nearest_bus_stop_distance_m + bus_stop_count_within_500m
  + log_nearest_park_distance_m + park_count_within_500m
  + log_nearest_hawker_distance_m + hawker_count_within_500m

Estimator: OLS (statsmodels smf.ols)
Inference: standard errors clustered by address (cov_type='cluster', groups=df['address'])
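
One way to keep a formula of this length free of typos is to assemble it programmatically. The sketch below rebuilds the same right-hand side from a list of facility names. The helper names `facility_controls` and `hedonic_formula` are hypothetical and not taken from the project code.

```python
# Facility stems as they appear in the amenity control variable names.
FACILITIES = ["supermarket", "shopping_mall", "mrt", "bus_stop", "park", "hawker"]

def facility_controls():
    """Return the 12 amenity terms: one log distance and one count per facility."""
    terms = []
    for name in FACILITIES:
        terms.append(f"log_nearest_{name}_distance_m")
        terms.append(f"{name}_count_within_500m")
    return terms

def hedonic_formula():
    """Assemble the estimating equation as a patsy-style formula string."""
    rhs = [
        "has_top20_0_1km_only", "has_top20_1_2km_only", "has_top20_both",
        "floor_area_sqm", "C(flat_type)", "storey_mid", "remaining_lease_year",
        "C(town)", "C(p1_cycle_year)",
    ] + facility_controls()
    return "log_resale_price ~ " + " + ".join(rhs)

formula = hedonic_formula()
```

With statsmodels installed, a string like this could be passed to `smf.ols(formula, data=df)` and fitted with `.fit(cov_type="cluster", cov_kwds={"groups": df["address"]})`, assuming `df` holds every column named in the formula.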

Appendix C

Model Specification Strings: Model Design

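Two count-based town-slope models are estimated. In both, the 0 term removes the global intercept, so C(town) supplies one intercept per town and the C(town) interaction supplies one slope per town on the school count. Model A uses the 0–1 km count; Model B uses the 0–2 km count (top_20_percent_count_0_2km, which is not listed in Appendix A and is presumably the sum of the 0–1 km and 1–2 km counts).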

Model A:
log_resale_price ~
    0
  + C(town)
  + C(town):top_20_percent_count_0_1km
  + floor_area_sqm
  + C(flat_type)
  + storey_mid
  + remaining_lease_year
  + C(p1_cycle_year)
  + [same 12 facility controls as the main hedonic OLS]

Model B:
log_resale_price ~
    0
  + C(town)
  + C(town):top_20_percent_count_0_2km
  + floor_area_sqm
  + C(flat_type)
  + storey_mid
  + remaining_lease_year
  + C(p1_cycle_year)
  + [same 12 facility controls as the main hedonic OLS]

Estimator: OLS (statsmodels smf.ols)
Inference: standard errors clustered by address (cov_type='cluster', groups=df['address'])
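
To show what the `0 + C(town) + C(town):count` structure estimates, the sketch below builds the equivalent design matrix by hand for two invented towns and recovers each town's intercept and slope by least squares. All names and numbers are illustrative, the data are noise-free, and the other controls are omitted.

```python
import numpy as np

def town_slope_design(town_codes, x, n_towns):
    """Columns: one intercept dummy per town, then one town-specific slope per town."""
    dummies = np.eye(n_towns)[town_codes]            # shape (n_obs, n_towns)
    return np.hstack([dummies, dummies * x[:, None]])

# Two towns with 50 flats each; school counts cycle through 0..3.
town = np.repeat([0, 1], 50)
count = np.tile([0.0, 1.0, 2.0, 3.0], 25)

true_intercepts = np.array([13.0, 12.5])
true_slopes = np.array([0.04, -0.01])
y = true_intercepts[town] + true_slopes[town] * count   # noise-free log price

X = town_slope_design(town, count, n_towns=2)
beta, *_ = np.linalg.lstsq(X, y, rcond=None)
# beta holds [intercept_town0, intercept_town1, slope_town0, slope_town1]
```

Dropping the global intercept is what lets every town appear with its own coefficient, instead of each being measured against an omitted reference town.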

Appendix D

Model Specification Strings: RDD via rdrobust

Y = log_resale_price
X = nearest_top_20_percent_school_distance_m (cutoff at 1,000 m; equivalently running_m = 0)

Estimator: local-linear RDD via rdrobust
Kernel: triangular
Main bandwidth: CCT / MSE-optimal bandwidth selected by rdrobust
Inference: robust bias-corrected rdrobust output with nearest-neighbour variance estimation
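
The core of a local-linear RDD with a triangular kernel can be sketched in plain NumPy. This is a deliberately simplified stand-in, not the rdrobust procedure: it uses a fixed, assumed bandwidth of 300 m, applies no bias correction and reports no standard errors. The function name and the data are invented, and the jump is defined here as the limit from outside 1 km minus the limit from inside, which is a choice made for this sketch.

```python
import numpy as np

def local_linear_jump(y, distance_m, cutoff=1000.0, bandwidth=300.0):
    """Difference in kernel-weighted local-linear intercepts at the cutoff."""
    y = np.asarray(y, dtype=float)
    r = np.asarray(distance_m, dtype=float) - cutoff       # running variable
    w = np.maximum(1 - np.abs(r) / bandwidth, 0.0)         # triangular kernel

    def intercept(side_mask):
        keep = side_mask & (w > 0)
        X = np.column_stack([np.ones(keep.sum()), r[keep]])
        sw = np.sqrt(w[keep])                               # weighted least squares
        coef, *_ = np.linalg.lstsq(X * sw[:, None], y[keep] * sw, rcond=None)
        return coef[0]                                      # fitted value at r = 0

    return intercept(r >= 0) - intercept(r < 0)

# Noise-free toy data: flats inside 1 km carry a 0.05 log-point premium.
distance = np.linspace(705.0, 1295.0, 60)
running = distance - 1000.0
log_price = 13.0 + 0.0001 * running + 0.05 * (running < 0)

jump = local_linear_jump(log_price, distance)
```

Under this sign convention a price premium for flats inside the 1 km radius shows up as a negative jump, so the sign of any reported estimate should be read against the convention of the software that produced it.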