The MulherTech.com.br blog is for women who work in and love technology. Topics covered on this blog: Project Management, Digital Marketing, Information Technology, Computer Science, Cloud Computing, Wireless Computer Networks, TCP/IP Networks (IPv4 and IPv6), 5G Networks, Linux, Windows Server, Windows 11, Data Science, Big Data, Artificial Intelligence, Web Programming Languages, Frontend, Backend, etc.
Saturday, March 23, 2024
Machine Learning With Python For Everyone
Friday, October 13, 2023
Python Machine Learning
- Scikit-Learn
- TensorFlow
- PyTorch
- Keras
- LightGBM
- XGBoost
- CatBoost
- OpenAI Gym
- Hugging Face Transformers
Thursday, March 30, 2023
Sunday, January 10, 2016
9 Free Books for Learning Data Mining and Data Analysis
Saturday, January 9, 2016
Common Errors in Machine Learning due to Poor Statistics Knowledge
Probably the worst error is thinking there is a correlation when that correlation is purely artificial. Take a data set with 100,000 variables and, say, 10 observations. Compute all the (99,999 * 100,000) / 2 cross-correlations. You are almost guaranteed to find one above 0.999. This is best illustrated in my article How to Lie with P-values, which also discusses how to handle and fix the problem.
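The effect is easy to reproduce on a smaller scale. The sketch below is only an illustration, not code from the article: it assumes NumPy is available, draws purely random and independent variables, and reports the strongest pairwise correlation it finds. The function name `max_spurious_correlation`, the default of 2,000 variables (rather than 100,000, to keep the correlation matrix small), and the seed are all choices made for this example.

```python
import numpy as np


def max_spurious_correlation(n_variables=2000, n_observations=10, seed=0):
    """Return the largest absolute correlation among independent random
    variables, together with the number of variable pairs examined."""
    rng = np.random.default_rng(seed)

    # Each column is one variable; all columns are independent noise,
    # so every true correlation is exactly zero.
    data = rng.standard_normal((n_observations, n_variables))

    # Sample correlation matrix between columns (n_variables x n_variables).
    corr = np.corrcoef(data, rowvar=False)

    # Ignore the diagonal, where each variable is compared with itself.
    np.fill_diagonal(corr, 0.0)

    n_pairs = n_variables * (n_variables - 1) // 2
    return float(np.abs(corr).max()), n_pairs


if __name__ == "__main__":
    strongest, n_pairs = max_spurious_correlation()
    print(f"pairs examined: {n_pairs}")
    print(f"strongest correlation found in pure noise: {strongest:.3f}")
```

Even at this reduced size, with roughly two million pairs and only 10 observations each, the strongest sample correlation typically lands well above 0.9 although no variable is related to any other. Raising `n_variables` increases the number of pairs quadratically and pushes the maximum closer to 1.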
This is done on such a large scale that I think it is probably the main cause of fake news, and the impact is disastrous for people who take for granted what they read in the news or hear from the government. Some people are sent to jail based on evidence tainted with major statistical flaws. Government money is spent, propaganda is generated, wars are started, and laws are created based on false evidence. Sometimes the data scientist has no choice but to knowingly cook the numbers to keep her job. Usually, these “bad stats” end up featured in beautiful but faulty visualizations: axes are truncated, charts are distorted, and observations and variables are carefully chosen just to make a (wrong) point.