5. Appendix
Appendix A
Variable Dictionary: Outcome and Structural Controls
| Variable | Type | Construction |
|---|---|---|
| resale_price | Continuous | Raw transaction field (SGD) |
| log_resale_price | Continuous | np.log(resale_price); dependent variable in all regressions |
| floor_area_sqm | Continuous | Raw transaction field |
| flat_type | Categorical | Ordinal encoding: 1-room=1, 2-room=2, 3-room=3, 4-room=4, 5-room=5, Executive=6, Multi-Generation=7; used as C(flat_type) |
| storey_mid | Continuous | (storey_low + storey_high) / 2 |
| remaining_lease_year | Continuous | Raw field; imputed as 99 − (transaction_year − lease_commence_date) where missing |
| town | Categorical | Raw transaction field; used as C(town) |
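
The derived fields in this table can be sketched in a few lines of plain Python. This is a minimal illustration on scalar values, not the project's code: the function names are hypothetical, `math.log` stands in for `np.log`, and a missing raw lease value is assumed to be represented as `None`.

```python
import math

LEASE_TERM_YEARS = 99  # lease length implied by the imputation rule in the table


def storey_mid(storey_low, storey_high):
    """Midpoint of the storey range, e.g. storeys 4 to 6 give 5.0."""
    return (storey_low + storey_high) / 2


def remaining_lease_year(raw_value, transaction_year, lease_commence_date):
    """Keep the raw field when present; otherwise impute from the lease start year."""
    if raw_value is not None:
        return raw_value
    return LEASE_TERM_YEARS - (transaction_year - lease_commence_date)


def log_resale_price(resale_price):
    """Natural log of the price in SGD (scalar stand-in for np.log)."""
    return math.log(resale_price)


# Hypothetical transaction: sold in 2020, lease commenced in 1990, raw lease missing.
print(storey_mid(4, 6))                        # midpoint of the storey range
print(remaining_lease_year(None, 2020, 1990))  # imputed remaining lease in years
```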
School Exposure Variables
| Variable | Type | Construction |
|---|---|---|
| top_20_percent_count_0_1km | Integer | Count of top-quintile SRS schools within 0–1 km (BallTree spatial join) |
| top_20_percent_count_1_2km | Integer | Count of top-quintile SRS schools within 1–2 km |
| bottom_20_percent_count_0_1km | Integer | Count of bottom-quintile SRS schools within 0–1 km |
| bottom_20_percent_count_1_2km | Integer | Count of bottom-quintile SRS schools within 1–2 km |
| has_top20_0_1km_only | Binary | 1 if top_20_percent_count_0_1km ≥ 1 AND top_20_percent_count_1_2km = 0 |
| has_top20_1_2km_only | Binary | 1 if top_20_percent_count_0_1km = 0 AND top_20_percent_count_1_2km ≥ 1 |
| has_top20_both | Binary | 1 if top_20_percent_count_0_1km ≥ 1 AND top_20_percent_count_1_2km ≥ 1 |
| (baseline) | n/a | All three binary indicators = 0; no top-20% school within 2 km |
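
The ring counts and the mutually exclusive indicators can be illustrated as follows. This sketch swaps the BallTree spatial join for a brute-force haversine loop so that it runs with the standard library only. The function names, the Earth-radius constant, the sample coordinates, and the boundary convention (a school at exactly 1,000 m falls in the outer ring, matching `inside_1km = 1 if running_m < 0`) are assumptions made for illustration.

```python
import math

EARTH_RADIUS_M = 6_371_000  # mean Earth radius; assumed constant


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in metres between two (lat, lon) points in degrees."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dphi = p2 - p1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def ring_counts(flat, schools):
    """Count schools in the 0-1 km and 1-2 km rings around one flat."""
    inner = outer = 0
    for lat, lon in schools:
        d = haversine_m(flat[0], flat[1], lat, lon)
        if d < 1000:
            inner += 1
        elif d < 2000:
            outer += 1
    return inner, outer


def exposure_indicators(count_0_1km, count_1_2km):
    """Map the two ring counts to the three exclusive indicators (all 0 = baseline)."""
    near, far = count_0_1km >= 1, count_1_2km >= 1
    return {
        "has_top20_0_1km_only": int(near and not far),
        "has_top20_1_2km_only": int(far and not near),
        "has_top20_both": int(near and far),
    }


# Hypothetical flat with three schools at roughly 0.56 km, 1.5 km and 3.3 km due north.
flat = (1.3500, 103.8500)
schools = [(1.3550, 103.8500), (1.3635, 103.8500), (1.3800, 103.8500)]
counts = ring_counts(flat, schools)
print(counts, exposure_indicators(*counts))
```

Because the three indicators partition the flats with any top-quintile school within 2 km, at most one of them equals 1 for a given transaction, and the baseline group is the set where all three are 0.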
RDD Variables
| Variable | Type | Construction |
|---|---|---|
| nearest_top_20_percent_school_distance_m | Continuous | BallTree distance to nearest top-quintile school (metres); stored as distance_to_school_m in RDD.PY |
| running_m | Continuous | distance_to_school_m − 1000 |
| inside_1km | Binary | 1 if running_m < 0 |
| triangular_weight | Continuous | max(1 − abs(running_m) / bandwidth, 0) |
| matched_school_name | String | Name of the nearest top-quintile school; used as cluster variable in RDD |
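
As a small worked example of the three derived RDD fields, the sketch below applies the table's formulas to single distances. The function name and the 200 m bandwidth are hypothetical and chosen only to make the arithmetic easy to follow.

```python
CUTOFF_M = 1000.0  # the 1 km threshold used to centre the running variable


def rdd_variables(distance_to_school_m, bandwidth):
    """Return (running_m, inside_1km, triangular_weight) for one observation."""
    running_m = distance_to_school_m - CUTOFF_M
    inside_1km = int(running_m < 0)
    # Weight falls linearly from 1 at the cutoff to 0 at the edge of the bandwidth.
    triangular_weight = max(1 - abs(running_m) / bandwidth, 0)
    return running_m, inside_1km, triangular_weight


# Hypothetical bandwidth of 200 m.
for distance in (900.0, 1000.0, 1300.0):
    print(distance, rdd_variables(distance, bandwidth=200.0))
```

A flat exactly at 1,000 m has `running_m = 0` and is therefore coded as outside the cutoff, while observations farther than one bandwidth from the cutoff receive zero weight.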
Amenity Control Variables
| Distance Variable | Count Variable | Facility |
|---|---|---|
| log_nearest_mrt_distance_m | mrt_count_within_500m | MRT/LRT stations |
| log_nearest_supermarket_distance_m | supermarket_count_within_500m | Supermarkets |
| log_nearest_shopping_mall_distance_m | shopping_mall_count_within_500m | Shopping malls |
| log_nearest_bus_stop_distance_m | bus_stop_count_within_500m | Bus stops |
| log_nearest_park_distance_m | park_count_within_500m | Parks |
| log_nearest_hawker_distance_m | hawker_count_within_500m | Hawker centres |
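
Each facility contributes one distance control and one count control. The sketch below shows one plausible construction from a list of flat-to-facility distances. The function name is hypothetical, and two details are assumptions not stated in the table: the log is taken as a plain natural log with no offset (so a zero distance would need separate handling), and the 500 m radius is treated as inclusive.

```python
import math


def amenity_controls(distances_m, radius_m=500.0):
    """Return (log of nearest distance, count of facilities within radius_m)."""
    nearest = min(distances_m)
    count_within = sum(1 for d in distances_m if d <= radius_m)
    return math.log(nearest), count_within


# Hypothetical distances (metres) from one flat to three MRT/LRT stations.
log_nearest_mrt_distance_m, mrt_count_within_500m = amenity_controls([120.0, 480.0, 750.0])
print(log_nearest_mrt_distance_m, mrt_count_within_500m)
```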
Appendix B
Model Specification Strings: Estimating Equation
log_resale_price ~
has_top20_0_1km_only
+ has_top20_1_2km_only
+ has_top20_both
+ floor_area_sqm
+ C(flat_type)
+ storey_mid
+ remaining_lease_year
+ C(town)
+ C(p1_cycle_year)
+ log_nearest_supermarket_distance_m + supermarket_count_within_500m
+ log_nearest_shopping_mall_distance_m + shopping_mall_count_within_500m
+ log_nearest_mrt_distance_m + mrt_count_within_500m
+ log_nearest_bus_stop_distance_m + bus_stop_count_within_500m
+ log_nearest_park_distance_m + park_count_within_500m
+ log_nearest_hawker_distance_m + hawker_count_within_500m
Estimator: OLS (statsmodels smf.ols)
Inference: cov_type='cluster', groups=df['address']
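
A minimal sketch of how this specification might be assembled and fitted is given below. The helper names are hypothetical, and `fit_main_model` assumes a pandas DataFrame `df` that already holds every column in the formula plus `address`, with no missing values in those columns (otherwise the cluster labels would not line up with the rows used in estimation). statsmodels is imported inside the function so that the formula builder itself has no third-party dependency.

```python
FACILITIES = ["supermarket", "shopping_mall", "mrt", "bus_stop", "park", "hawker"]


def facility_controls():
    """The 12 amenity terms: one log-distance and one 500 m count per facility."""
    terms = []
    for name in FACILITIES:
        terms.append(f"log_nearest_{name}_distance_m")
        terms.append(f"{name}_count_within_500m")
    return terms


def main_formula():
    """Formula string for the main hedonic OLS, in the order listed above."""
    terms = [
        "has_top20_0_1km_only", "has_top20_1_2km_only", "has_top20_both",
        "floor_area_sqm", "C(flat_type)", "storey_mid", "remaining_lease_year",
        "C(town)", "C(p1_cycle_year)",
    ] + facility_controls()
    return "log_resale_price ~ " + " + ".join(terms)


def fit_main_model(df):
    """Fit by OLS with standard errors clustered on address (assumes complete rows)."""
    import statsmodels.formula.api as smf  # imported lazily; needs statsmodels

    groups = df["address"].factorize()[0]  # integer cluster codes, one per row
    return smf.ols(main_formula(), data=df).fit(
        cov_type="cluster", cov_kwds={"groups": groups}
    )


print(main_formula())
```

Clustering on `address` allows the errors of repeated transactions at the same address to be correlated, which is the usual motivation for this choice when the same block appears many times in the sample.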
Appendix C
Model Specification Strings: Model Design
Two count-based town-slope models are estimated:
Model A:
log_resale_price ~
0
+ C(town)
+ C(town):top_20_percent_count_0_1km
+ floor_area_sqm
+ C(flat_type)
+ storey_mid
+ remaining_lease_year
+ C(p1_cycle_year)
+ [same 12 facility controls as the main hedonic OLS]
Model B:
log_resale_price ~
0
+ C(town)
+ C(town):top_20_percent_count_0_2km
+ floor_area_sqm
+ C(flat_type)
+ storey_mid
+ remaining_lease_year
+ C(p1_cycle_year)
+ [same 12 facility controls as the main hedonic OLS]
Estimator: OLS (statsmodels smf.ols)
Inference: cov_type='cluster', groups=df['address']
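
Read as an equation, the two formulas correspond roughly to the expression below. The symbols are labels introduced here for illustration and are not notation taken from the estimation code: t(i), f(i) and c(i) denote the town, flat type and p1_cycle_year of transaction i, N_i is top_20_percent_count_0_1km in Model A and top_20_percent_count_0_2km in Model B, and x_i collects floor_area_sqm, storey_mid, remaining_lease_year and the 12 facility controls.

```latex
\log P_{i}
  = \alpha_{t(i)}
  + \beta_{t(i)}\, N_{i}
  + \mathbf{x}_{i}^{\top}\boldsymbol{\gamma}
  + \lambda_{f(i)}
  + \delta_{c(i)}
  + \varepsilon_{i}
```

Dropping the global intercept (the `0` term) lets each town keep its own intercept \alpha_t instead of being measured against a reference town. Because the count variable enters only through the interaction C(town):count, with no main effect, the formula interface would be expected to produce one slope \beta_t per town, so each coefficient reads directly as that town's price response to one additional top-quintile school.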
Appendix D
Model Specification Strings: RDD via rdrobust
Y = log_resale_price
X = nearest_top_20_percent_school_distance_m (metres; cutoff at 1,000 m, equivalently running_m with cutoff 0)
Estimator: local-linear RDD via rdrobust
Kernel: triangular
Main bandwidth: CCT / MSE-optimal bandwidth selected by rdrobust
Inference: robust bias-corrected rdrobust output with nearest-neighbour variance estimation
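
The settings above might map onto the Python `rdrobust` package roughly as follows. This is a sketch under stated assumptions rather than the call used in the analysis: the helper names are hypothetical, the cutoff of 1,000 m is taken from the definition of running_m, and `bwselect="mserd"` is assumed to be the MSE-optimal selector referred to above. The package is imported inside `run_rdd` so that the settings can be inspected without it installed.

```python
CUTOFF_M = 1000.0  # distance threshold in metres, from running_m = distance - 1000


def rdrobust_settings():
    """Keyword arguments mirroring the specification listed above."""
    return {
        "c": CUTOFF_M,          # cutoff on the raw distance scale
        "p": 1,                 # local-linear point estimator
        "kernel": "tri",        # triangular kernel
        "bwselect": "mserd",    # assumed: common MSE-optimal bandwidth
        "vce": "nn",            # nearest-neighbour variance estimator
    }


def run_rdd(df):
    """Run the sharp RDD on a DataFrame holding the outcome and distance columns."""
    from rdrobust import rdrobust  # imported lazily; needs the rdrobust package

    return rdrobust(
        y=df["log_resale_price"],
        x=df["nearest_top_20_percent_school_distance_m"],
        **rdrobust_settings(),
    )


print(rdrobust_settings())
```

One point to keep in mind when reading the output: rdrobust reports the limit from above the cutoff minus the limit from below. Since flats inside 1 km lie below the cutoff, a price premium for being inside the radius would show up as a negative estimate unless the running variable is sign-flipped first.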