
[DOC] two notebooks on Bayesian probabilistic regression #520

Open · wants to merge 2 commits into base: main

Conversation

meraldoantonio (Contributor)

Reference Issues/PRs

Provides example for Bayesian Conjugate Linear Regressor #500

What does this implement/fix? Explain your changes.

This contribution includes the first two notebooks in a planned series of four. These two notebooks cover:

  1. The general theory behind Bayesian Linear Regression.
  2. The conjugate prior method for solving Bayesian Linear Regression and its implementation via the BayesianConjugateLinearRegressor estimator (see the usage sketch below).

In addition, a lightweight utils.py and a small synthetic dataset are included.
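For orientation, a rough usage sketch of the estimator covered in notebook 2 (illustrative only; the exact import path, constructor arguments, and data are placeholders and may differ from the notebooks):

    # illustrative usage sketch - import path and arguments are placeholders
    import numpy as np
    import pandas as pd
    from sklearn.model_selection import train_test_split

    from skpro.regression.bayesian import BayesianConjugateLinearRegressor  # path assumed

    # tiny synthetic dataset, standing in for the CSV shipped with notebook 2
    rng = np.random.default_rng(42)
    X = pd.DataFrame({"x": rng.uniform(-3, 3, size=100)})
    y = pd.DataFrame({"y": 2.0 * X["x"] + rng.normal(scale=0.5, size=100)})

    X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

    reg = BayesianConjugateLinearRegressor()  # default prior; arguments assumed
    reg.fit(X_train, y_train)

    y_pred = reg.predict(X_test)              # point predictions
    y_pred_proba = reg.predict_proba(X_test)  # posterior predictive distribution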

Does your contribution introduce a new dependency? If yes, which one?

None

What should a reviewer concentrate their feedback on?

Correctness of exposition

Did you add any tests for the change?

No

Any other comments?

No

PR checklist

No

For all contributions
  • I've added myself to the list of contributors with any new badges I've earned :-)
    How to: add yourself to the all-contributors file in the skpro root directory (not CONTRIBUTORS.md). Common badges: code - fixing a bug or adding code logic; doc - writing or improving documentation or docstrings; bug - reporting or diagnosing a bug (get this plus code if you also fixed the bug in the PR); maintenance - CI, test framework, release.
    See here for full badge reference
  • The PR title starts with either [ENH], [MNT], [DOC], or [BUG]. [BUG] - bugfix, [MNT] - CI, test framework, [ENH] - adding or improving code, [DOC] - writing or improving documentation or docstrings.
For new estimators
  • I've added the estimator to the API reference - in docs/source/api_reference/taskname.rst, follow the pattern.
  • I've added one or more illustrative usage examples to the docstring, in a pydocstyle compliant Examples section.
  • If the estimator relies on a soft dependency, I've set the python_dependencies tag and ensured
    dependency isolation, see the estimator dependencies guide.


@meraldoantonio (Contributor Author):

Hi @fkiraly,

This PR includes the first two example notebooks for the Bayesian estimators. I suggest we merge these first, and I can submit the remaining notebooks, which are still in progress, in a separate PR.

I haven’t made any major changes to these notebooks since your review in the old PR (#500).

Let me know your thoughts or if I need to make any changes to this PR, thanks!

"cell_type": "markdown",
"metadata": {},
"source": [
"This series of notebooks offers an in-depth exploration of the **Bayesian Linear Regression**. \n",
Collaborator comment:
"the" is superfluous

"Linear regression is a widely used model due to its simplicity and interpretability. \n",
"\n",
"\n",
"In its simplest form, it predicts a single target $t$ as the deterministic output of the function $y$, which in turn is a linear combination of input variables $\\mathbf{x} = (x_1, \\dots, x_D)^\\top$ and parameters $\\mathbf{w} = (w_0, w_1, \\dots, w_D)^\\top$:\n",
Collaborator comment:
I think introducing "t" is more confusing than helpful - it is basically the same as y from a rough semantic perspective, but a scalar rather than a function. I would not introduce it; instead, I would call the function that you currently call "y" a function "f", and then introduce data "y" later, to which the model is fit.

Further, I would use the notation $f(\mathbf{x} \mid \mathbf{w})$; the bar is common notation to separate inputs from parameters.
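For illustration, the suggested notation might render as follows (a sketch of the suggestion, not text from the notebooks):

$$
f(\mathbf{x} \mid \mathbf{w}) = w_0 + w_1 x_1 + \dots + w_D x_D
$$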

"\n",
"$$\n",
"\\begin{aligned}\n",
"p(t | \\mathbf{x}, \\mathbf{w}, \\beta) &= \\mathcal{N}(t | y(\\mathbf{x}, \\mathbf{w}), \\beta^{-1}) \\\\\n",
@fkiraly (Collaborator), Jan 25, 2025:

this notation for N is odd, I think incorrect?

What is it supposed to mean?

  • Should "t" be a free variable instead?
  • N usually stands for the distribution, not the pdf.
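For illustration, one way to write this so that $\mathcal{N}$ denotes the distribution rather than its pdf, with a free variable $y$ and the function written as $f$ (a sketch of the suggestion, not text from the notebooks):

$$
y \mid \mathbf{x}, \mathbf{w}, \beta \sim \mathcal{N}\left(f(\mathbf{x} \mid \mathbf{w}), \beta^{-1}\right)
$$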

" The second notebook delves into the concept of conjugate priors. By using conjugate distributions, we can derive posterior distributions analytically, simplifying Bayesian inference. This notebook also highlights how prior knowledge can influence the model and improve predictions in the presence of limited data.\n",
"\n",
"\n",
"3. **MCMC and Variational Inference** \n",
Collaborator comment:
there is also a fourth: the various strategies known as "approximate Bayes".

@fkiraly (Collaborator) left a comment:

Nice notebooks!

High-level, I think two points of improvement are most important:

  • it is crucial that a reader gets some signposting about what they can expect from the notebooks, otherwise notebook 1 may disappoint expectations. A typical reader - seeing the bayesian folder - will want to see how they can use skpro. However, that is not what notebook 1 is about.
    • it is still interesting, but perhaps say at the start that this is more of an explainer intro with code.
  • for code notebooks, the textbook style is discouraged; at the least, I would recommend a more telegraphic, bullet-point style. That makes the notebook less of a lengthy read.

On the mechanical side:

  • in notebook 2, I would suggest minimizing extraneous content that is added only for it. For instance, could we use one of the sklearn datasets instead of adding csvs to the folder (see the sketch after this list)?
  • we should also minimize "line-by-line" logic in the cells; 20 lines of data generation is not very useful from a didactic perspective. Can we outsource this to a data generation routine in the main repo? Or just use a dummy dataset from sklearn and split it up?
  • It is also important to split the vignettes more, e.g., have skpro vignettes separate from plotting cells.
  • for plotting distributions - at least univariate ones - you can use the skpro onboard plotting. If you think it is not sufficient, we should add more features to that.
    • multivariate is not yet supported, but that might be also a good small project?
  • Regarding the final summary, with advantages and disadvantages: conjugates are also typically simple, compute-efficient algorithms. A user may want to use evaluation to see how well they fare against more memory- or compute-heavy variants...
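As a concrete illustration of the dataset point above (a sketch of one possible approach, not code from the PR): load an sklearn dataset and split it, instead of bundling CSVs with the notebook.

    # illustrative sketch: replace the bundled CSV with an sklearn dataset
    from sklearn.datasets import load_diabetes
    from sklearn.model_selection import train_test_split

    X, y = load_diabetes(return_X_y=True, as_frame=True)
    X = X[["bmi"]]  # keep a single feature so the conjugate example stays simple

    X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)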

@fkiraly fkiraly added module:regression probabilistic regression module documentation Documentation & tutorials labels Jan 25, 2025
@fkiraly fkiraly changed the title [DOC] Finished first two Bayesian notebooks and their artefacts [DOC] two notebooks on Bayesian probabilistic regression Jan 25, 2025